Principal Scientist and Community Manager at Sage Bionetworks in Seattle, Brian Bot has extensive experience working with clinical and genomic data and a passion for exploring innovative ways to make science more open and transparent. Living at the intersection of technology, research, and policy, Brian’s current work aims to make the biomedical research system more effective by challenging the traditional roles of researchers, institutions, funders, and research participants. At its heart, this work is rooted in building trust between these parties as well as with the public at large.
I’m wondering if you could start by giving a broad overview of your work, and then perhaps home in on and highlight some specific projects.
I’m a Principal Scientist at Sage Bionetworks which is a nonprofit in Seattle, Washington. My background is in statistics. I worked at the Mayo Clinic for seven years primarily working in cancer clinical trials and cancer genomics.
After seven years at Mayo, I moved to Seattle and joined Sage, where I’ve been for almost six years now. I was hired as a researcher and a statistician, but very quickly migrated into a role that we call Community Manager in which I was a bridge between the technology development team within Sage and biomedical researchers — both internal to the organization, such as the position I initially got hired into, as well as to the broader biomedical research community.
That role started as a hands-on intermediary between technology development and research and morphed into what I’ve spent most of the last two years doing — being a scientific advocate.
I do a fair amount of speaking at conferences, talking about open science and the changes in the way that researchers conduct their work, in biomedical research in particular. I focus more on the social aspects: How do you encourage people to work more openly — more collaboratively and transparently?
That’s the high-level overview of myself, where I came from, and how I arrived at what I’m doing now.
My organization, Sage, is a nonprofit founded about eight years ago to advocate for more open and transparent science. We’re a group of between 40 and 50 people: a mix of technology developers — software engineers, data scientists, computational biologists — and biomedical researchers.
We both develop technologies that help enable this type of work and, really importantly, we support communities. We play a convener role.
There are lots of groups that act as data management centers or data coordinating centers. Sage is different because we work on the science as well — we don’t just have project managers. We have scientists who are deeply embedded in the actual work and who work with partners and try to get researchers to work collaboratively — not just to share data.
Some of the more interesting work that we’ve done in the last two years that I’ve been pretty heavily involved in has been around mobile health. In March 2015, Apple launched a library called ResearchKit. Sage was a launch partner, a role that included the year of work preceding the launch.
ResearchKit enables research studies to be run through a mobile phone, providing a consistent interface and templates for carrying out tasks, and allowing apps to tap into the phone’s sensors.
Since the launch, we’ve been focused on a number of different areas in mobile health. One of them is how to put participants at the center of research as opposed to on the periphery. One way to do that is to enable them to participate in biomedical research studies without having to go into a physical location.
If you are a Parkinson patient, for instance, or you are a loved one of a Parkinson patient who wants to serve as a control for a study, we have an app called mPower, which launched in March 2015 and now has close to 2,000 self-reported Parkinson patients and over 10,000 controls.
Studying Parkinson disease via mobile device is particularly interesting because Parkinson patients are generally only seen by their physicians about twice a year. So their physicians are only getting a one-hour snapshot of how somebody’s disease is manifesting over the course of a six-month time frame. Anybody who knows somebody with Parkinson knows that it’s highly variable day to day, month to month.
So those snapshots are not always an accurate representation of an individual’s disease. Many of the symptoms that Parkinson patients struggle with can be measured remotely, and very accurately, using sensors on a smartphone: things like hypophonia, a change in voice. To assess this, we leverage the phone’s microphone and ask people to say “Ahhhhh”.
Gait and balance are also affected at various stages of the disease, so we have a task where people are asked to walk for 20 seconds and then stand still for 20 seconds. We use the phone’s accelerometer and gyroscope to measure how they walk and function.
In another typical clinical test, patients at the doctor’s office are asked to tap on a table — tapping back and forth between their middle and index fingers. This tests for motor initiation and dexterity. You can imagine that putting two dots on a phone screen and asking people to tap back and forth gives quantifiable measures — not just a visual impression — of exactly how the person is able to tap, with timestamps and pixel locations on the phone’s screen of where the person is tapping.
Wow. What you’re describing is significant because it is not self-reported or a doctor’s subjective assessment. It’s physical. It’s measured. It’s consistent. So you’d get an exponential increase in data points, which seems game changing.
Absolutely. As an organization that’s really focused on changing the way that research is done, these studies that we’ve done with mobile devices have been the first cases where we’ve actually been the “data generators” — the investigators who are carrying out this work.
We do a lot of work bringing together groups of people who are generating data and collaborating on it. This was our first chance to put our money where our mouth is as far as how we conduct a study. We went directly to participants to enroll them.
As part of our consent process, we gave people the ability to choose if they wanted their data to only be shared with the study researchers at Sage, or if they wanted it to be made available to qualified researchers worldwide.
In March of 2016, a year after the app launched, we released a curated version of the first six months of data before we even published the analytics on it.
Over the course of the last 10 months, we’ve had over 80 different groups across a variety of stakeholders — large academic institutions, high school students, people doing their dissertation, large tech companies, pharmaceutical companies, disease foundations — applying for access to these data. Instead of a single organization having ownership over this very rich data set, it’s now being leveraged by 80-plus different groups. This really gets to Sage’s mission, which is to make the research enterprise more decentralized.
Instead of having institutional silos where individuals are recruited to come into a physical location — the same place where the data is generated and analyzed — we’re trying to flip the research process on its head a bit and say, “We can democratize the recruitment of these participants.” The data generation actually happens organically as part of the study, and doesn’t have to be at a single location.
Then, instead of the typical model where analysis is only done by the study investigators, we’re saying, “we’ve got some really good researchers here who can access these data and make interesting insights, but we’re not going to presume that we have the best answer.” We feel it’s also important to try to get the data in as many people’s hands as possible.
You’ve mentioned that 80 different groups have applied for access. For me, that’s definitely an example of success. Thinking about your work, can you hone in on another example of a time where you felt a sense of success? Or provide more details about this example?
mPower is certainly an example of an early success in this space. But it’s still too early — we don’t have a shining research result that’s come out of it. It will be interesting to see what insights the different groups actually come up with.
We’re looking into running a computational challenge using some of these data as well, since we have a lot of raw data from sensor feeds. The important step in really fully utilizing these data is going to be extracting interesting features.
We need to enlist the help of a lot of different disciplines at this point. Signal processing, for instance, is not a typical skill set that biomedical researchers have, but it’s extremely important for analyzing gait.
We’re going to be reaching out to a broad community of analysts letting them know that we have this interesting scientific question that we feel like we can fit predictive models on. But what we really need is interesting features to feed into that predictive modeling framework, to really get at the underlying mechanistic things that are happening.
How are you defining the word “feature” here?
A feature is a variable that is extracted as part of a processing step. For instance, for the tapping test, one feature is the number of taps. How many times did you tap? Another feature is how close were you to hitting the center of the button with each tap.
It’s like “metrics” in the social sciences.
Exactly. And for a tapping test you can imagine that if you have 600 different tapping iterations with timestamps and the location on a track pad, those raw data aren’t all that useful until you summarize them in some way to try and get at what’s going on.
It might be that a participant is able to tap very well with one finger but not the other. There’d be a bimodal distribution of tapping intervals: some intervals would be very short and close together, while others would be longer.
It’s that type of feature extraction — or summarization of the data — that will become really important.
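To make the idea concrete, here is a minimal sketch of that kind of feature extraction for the tapping test. It assumes raw tap events arrive as (timestamp, x, y) tuples; the function and feature names are illustrative, not the actual mPower processing pipeline.

```python
from statistics import mean, stdev

def tapping_features(taps):
    """Summarize raw tap events into simple features.

    `taps` is a chronological list of (timestamp_seconds, x, y) tuples,
    one per tap. Returns a dict of illustrative summary features.
    """
    timestamps = [t for t, _, _ in taps]
    # Inter-tap intervals: time elapsed between consecutive taps.
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return {
        "num_taps": len(taps),             # how many times the person tapped
        "mean_interval": mean(intervals),  # average tapping speed
        "interval_sd": stdev(intervals),   # variability; a bimodal pattern
                                           # (one lagging finger) inflates this
    }

# Example: five taps alternating fast/slow, as might happen if one
# finger lags behind the other.
events = [(0.00, 100, 300), (0.15, 220, 300), (0.45, 100, 300),
          (0.60, 220, 300), (0.90, 100, 300)]
features = tapping_features(events)
```

The same idea extends to the other features mentioned above, such as the distance of each tap from the center of its target button.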
What about a sense of personal success? A time when you think, “Oh, it was a really good day.” When you run home and tell your loved ones about it because it was awesome. Tell me about one of those times.
What I got most excited about was when we actually made that data available. We made it available through our organizational data-sharing platform, but we also published what is called a data descriptor paper.
There are a few journals out there doing this right now. Nature’s Scientific Data is where we published ours. It’s an open access journal. Data descriptor papers are giant method sections for how you generated a data set — there are no analyses that go along with it. If a new person were to step in they would have all the information needed to understand how the data was collected, what the variables mean, and why it might be useful to them. It’s a new way of thinking and of carrying out research — conducting your research for others to consume.
You aren’t publishing a finding. You’re making broadly available a research asset — or a set of research assets. I’m very passionate about that being a direction that research needs to head. There are people who are really good at designing studies and collecting data, and there are other people who are really good at analyzing it. To pretend that the right people to analyze the data are sitting in the same building as the people who are generating it is an outdated belief, in my opinion. We need to work on changing the incentive structures in academia and in industry to promote that.
I spend a lot of my creative time trying to rethink how to encourage people to work differently. We think altruistically that this is the right direction, but you can’t blame individual researchers for not wanting to do that when it’s not in their best interest — when their tenure is dependent on publishing papers and getting grants.
How do you work with groups like the National Institutes of Health to encourage them to change the way they review grant proposals? How do you approach different types of funders, foundations for instance, to get them to realize how important and game-changing some of these modifications can be? How do you get them to see that making these changes can get them a better return on their investments?
Most of the challenges that we face in this space aren’t technical. They’re social. How do you put the right social constructs in place? How do you encourage people to work in a way that is going to have maximal impact?
I like to say that working openly and transparently is necessary, but not sufficient. At the end of the day, working collaboratively doesn’t mean a whole lot unless you’re really doing good research and changing the way that lives are lived.
It’s not a trivial problem to solve, and most of these groups move at a glacial pace. It’s not going to change tomorrow, but there have been promising movements in that area.
I’m fascinated with the concept of the data descriptor paper because, as you mentioned, a huge challenge — in both the natural and the social sciences — is the incentive structure. Especially what counts as an output and toward building one’s career. Expanding that to include things like citations on datasets and methods would make it easier for scientists who might feel caught between working collaboratively and advancing their careers.
That’s exactly right. We’ll see how it works out for my career! I dove in head first. I’m putting a lot of my eggs in that basket. It’s something that I believe in, though. You need people to be pushing on that and showing the value. Maybe ten years from now we’ll say, “Man, how naive were we when we thought that doing it this way would really have a big impact.”
At Sage, in our early days, we focused on simply making data available: “Let’s just open up as much data as we can.” What we quickly learned is that simply making the data available isn’t enough.
You need to work with the researchers around a very specific question in order to have the maximal impact. For example, how do you create a community around Alzheimer’s disease? It takes time, and it takes building trust. Again, much more of the social issues than the technical ability to open up data.
Could you give me an example of a challenge, one that is persistent or top of mind? You’ve raised the issue around the social challenges in this space, the fact that it takes time, you need to build trust. Do you want to go into more detail with that — or provide another example?
Sure. I’ll reiterate that the social challenges are the most difficult, and you see them manifesting themselves in a number of different ways. I’ll give a couple other examples of where you see it.
They tend to materialize when people are pushed to operate differently. If you ask an investigator who is generating genomic data to make that data immediately available to collaborators or the public, the kneejerk response is, “Well, what about my grad students?”
The social structures are in place such that principal investigators generate large and important data sets, and part of how they get grad students to come in and do this work is to have them own a data set or a research question. They feel threatened — like this is being taken away from them if they have to, for instance, share this data very quickly.
Another example is in the clinical trial data-sharing space. I served on an advisory committee that was looking at the recommendations coming out of an Institute of Medicine report around clinical trial data sharing in 2015, and what we hear over and over again from the clinical trial world is that investigators — the actual PIs, the doctors running these trials — structure their entire careers around a single study.
It’s not just that first publication. They publish the first landmark paper about a trial, but then they spend the next number of years — often 8 to 10 years — publishing another paper every year based on that same data set with a slight tweak to it. For example, “What if we stratify patients by age?” or “What if we look at a genetic mutation that might categorize people or have a different prognostic effect?”
So to tell them, “after you publish your first paper you have to make your data available,” is in fact threatening their career because the incentive structure has been set up such that that’s how you make your career as a clinical physician running, in this case, cancer clinical trials.
These systematic cultural barriers are in place for good reason. They’ve developed over time. Scientists are not working this way to be mean or be a jerk. But technology and policies are changing so quickly that the other structures can’t adapt quite as fast. We’re seeing this not just in sciences but in many other areas as well.
This is like the science or academic version of what happens in factories with technology and automation. Production systems and business models change, which affects how people make their living.
Absolutely. And normally in those cases the really well-established, close-to-monopolistic players still turn out to be just fine — but it’s tougher on smaller organizations. The corollary would be that Harvard is going to be just fine, The Broad Institute is going to be just fine, MIT is going to be just fine — but a researcher at a smaller research university in the middle of the country may really struggle to adapt.
Researchers from Harvard can afford to take some of these risks. I’m glad that they do, but they have a name to fall back on whereas some people don’t.
Can you highlight for me some of the ways you’ve approached addressing this challenge — the strategies you used?
A lot of what I’ve tried to do is educational in the sense that once you spell it out in these terms, people recognize why things aren’t changing quite as quickly as they might. So spelling it out very clearly and succinctly — and not painting somebody as a bad actor or a bad player because they’re not operating in a certain way — helps to defuse the situation.
Sometimes open science advocates are a bit antagonistic, saying things like, “Why aren’t we doing this? Scientists need to share. They need to do this.” Instead, recognizing that there are reasons why people would not want to share their data helps people focus on, “How do we get there?”
Shifting now to the broadest issue in the Mozilla universe, what for you — and I really want to emphasize the “for you” part here — is a healthy Internet?
A healthy internet is one that treats all of its users equally, whether that user is in Seattle or Nairobi or anywhere else in the world. Providing access to information is at the core of why that equality is important. There is huge opportunity to really leverage access to information, not just in the sciences.
Thinking about the concept of “working open,” what does that mean for you?
For me, it’s probably a bit different than most in the working open world because the types of data that biomedical researchers deal with can have some very unique privacy concerns around them. Sometimes we focus a little bit too much on openness and not enough on appropriate access.
For instance, some people would consider the Parkinson data that we released last year as not being open because you have to go through a series of steps to gain access to it, including agreeing to certain terms.
We put those steps in place on purpose — adding some friction — such that people think about what accessing these data really means. We feel that these steps are not onerous. They’re minor. Most people have their applications accepted within a day. It’s not like they’re spending weeks working to get access. The purpose behind asking for an application is to honor the participants whose data are being accessed.
We require people to identify themselves. To tell us who they are and where they work. We also require them to submit a short one- or two-paragraph statement of intended use, which we post publicly so that there is a certain level of transparency as to who has access to the data and what they’re doing with it.
Our decision to grant access to the data is not based on scientific merits. We don’t look for the person to have certain credentials. It’s more about providing a level of transparency into who and why. Who has access to the data and what they’re planning to do with it?
Can you share an example of a time that using an open approach had an impact — either on your career, within your organization, or for someone else?
Yes, there are a few of them. They’ll take a little bit of explaining. One example I will give is a collaboration that we called the Colorectal Cancer Subtyping Consortium. In 2014, a series of four papers came out in the scientific literature within about six months of one another — all claiming to have identified the molecular subtypes of colorectal cancer.
The researchers were looking for patterns of gene expression in colon cancer. Unfortunately, all these groups — all really phenomenal and well-respected researchers — were competitors. They were all trying to do the same thing around the same time, yet these four papers that came out in these different journals weren’t congruent.
One group said, “We think there are six molecular subtypes of colon cancer.” This other group said, “We think there are three,” and another said, “We think there are four.” Each one of them found a slightly different subtype prevalence and had slightly different biological interpretations.
The scientific community threw up its hands and said, “Guys, what do we do with this? We know you’re good at what you do. We’re not trying to be dismissive — but this is useless to us as clinicians. How do we translate this? How do we move forward?”
We reached out to each one of those groups, as well as the others that we knew that were working in this space. Colon cancer is an area where Sage has historically done a fair amount of work, and I had previously worked at the Mayo Clinic.
We brought together these six disparate competing groups and proposed that, instead of each group running its individual algorithm on only its own data set to develop these subtypes, we would commonly curate each group’s molecular data sets, and also bring in some data sets that were already in the public domain. That way, each group could run its algorithm on each of the data sets.
Now they could start to tease apart any differences between the groups and, if there were differences, ask why they were there. Is it due to the different algorithms? Is it due to the differences in the colorectal cancer samples and patients in each study? Was it technical artifacts? Are there no molecular subtypes?
You can start to do more of a comprehensive analysis. What we found is that once we got these groups together on regular phone calls, once they built up trust with each of the other groups, it worked very much as a collaboration as opposed to competing interests.
Together, we found that there is in fact a consistent signal for molecular subtypes in colorectal cancer. The differences that were seen across these different groups were the differences in the populations they were studying.
Essentially, each one of these groups was seeing different parts of the same truth. Not that any one of them was right or wrong — but you didn’t have that larger-picture view until you had access to all of the data and algorithms that were being run on these different data sets.
There’s an old parable about blind men who each claimed they knew what an elephant was from touching just one part: a leg, the trunk, the tail. They disagree as to what the elephant is, but they’re all right. None of them is wrong — but you need the entire picture to really understand what the elephant is.
Now getting more specific about Mozilla, how did you get involved with them?
I got involved through Kaitlin Thaney. She used to work with a colleague and friend of mine, John Wilbanks, at Creative Commons. I was introduced to Kaitlin before she was at Mozilla, when she was still at Digital Science. She invited me to speak during one of the Mozilla Science community calls in 2013 — shortly after she joined the Science Lab.
Over time, we’d occasionally write each other to give little updates. In 2016, I was part of their first Working Open Workshop in Berlin and then went on to be a mentor for the Open Leadership Cohort and Working Open Workshop that led into MozFest this past year.
What has that been like for you? What have been some of the impacts of being involved with Mozilla — personally, professionally, or on an organization level?
Personally, it’s been a great network of really interesting people to tap into. That’s a big part of what Mozilla provides — the ability to convene interesting groups of people who are working on similar types of problems across disciplines. That’s probably been the most beneficial.
The Working Open Workshop forced me to pay attention to how adjacent disciplines deal with some of the same problems that we see in biomedical research. There’s a lot of learning that can happen across disciplines, in both directions, for me as an individual learning from all these groups but also as someone who can contribute back to others.
What feedback would you have for Mozilla? Where do you see space for improvement?
I’ve talked to Abby, who’s taken over the Science Lab work, about this. One of the really big struggles that Mozilla has in this space is that most of the time, people are participating on their own time. This isn’t something that is core or central to their job.
They’re not getting paid necessarily to do this work. That can make it difficult to be as engaged as you would like. So it’s important to continually recognize that and try to figure out better ways of engaging people. This will be crucial for longer-term success.
As I listen to you, I wonder if that privileges the people who have extra time or fewer external commitments.
Absolutely. But it’s always about priorities. The good thing is that when people prioritize this topic, you’re getting the people who are really motivated. Most of the people that I’ve run into working through the Mozilla network are people who are really busy; but they make time for it.
Some of them are single moms, some of them are graduate students trying to finish up their PhDs, and some of them are career scientists. Again, I think it’s the diversity of backgrounds and opinions that resonated most with me.
What you’re saying is that there is diversity. It’s not just younger, single people. Even people with more responsibilities are making it work.
Yeah, I’ve seen cases of that.