Anna completed her undergraduate degree at Plymouth University in Marine Biology & Oceanography and her PhD at the University of Sheffield in Marine Macroecology. A self-described freelance research data scientist, she is interested in building capacity through training, knowledge exchange, collaboration, and the use of technologies, particularly open source. She’s also passionate about teaching data management and Open Science in R.
Start by telling me a bit about your work.
I’ve always loved working with data. As an undergraduate in Marine Biology and Oceanography, I worked with a large long-term survey data set, looking into long-term changes in fish egg abundance and timing. I continued working on it for my PhD in Macroecology, where I also used some cool satellite-derived environmental data to look at population dynamics at a regional scale and build models incorporating hydrographic structure. That’s when I learned to code as well.
Since then I’ve been freelancing, providing mainly research data management and software engineering services. I’m trying to find a way to stay in science because I love research but I’m also now convinced that I mainly want to code. I also have a great interest in efforts towards more open, reproducible and collaborative workflows, which I see as going hand-in-hand with modernizing science and making space for broader contribution. In that respect, I’ve been actively participating by teaching and going to MozFest.
What are you coding in?
I’m coding in R. It’s such an actively developing language. And the folks at RStudio are doing a great job packaging all the tools a modern scientist needs to maximise their workflows. I’ve been making Shiny apps, websites and presentations through RStudio. I’ve learned so much.
Can you tell me about a time when you really felt a sense of success in your work? Home in on one specific example.
Launching my first open project for the Global Sprint was pretty cool. I was able to get it open, get it ready, and gain some interest. I am definitely proud of that — but there’s still a lot to do.
I’m a stickler for provenance, drummed in from past experience as a quality assurance auditor. The last two projects I worked on each had a significant data management aspect. The first stage was basically compiling a load of disparate datasets into one large bird traits dataset. It’s imperative, both for controlling data quality and at publication, that we can trace data points back to their original source, to the quality of the method, and to any species synonyms used to match taxonomies between datasets.
So I built a lot of code to make sure we could trace all this metadata and make it easy to compile. Reusing it across two projects and having a student try to use the tools made me think: “Hey, this code could be useful to someone. It could also be a lot better if others used it or even helped me develop it” — it got me to participate in the first open workshop — got me to decide, OK yes, I’m going to do this as an open project with the intention to turn it into a package.
The functions in the package are designed to help researchers compile trait / ecological datasets whilst at the same time enforcing a basic standard of good research data management. Outputs are therefore easier to share, to combine further and, eventually, even to turn into more formal XML files that can be added to online repositories. The project has already proven useful to a PI and one of his students. Yet there’s still so much more work to be done. I still need a mentor to get it to the final stages.
What are some of the things that surfaced when making your project open?
First, I had to learn to get help. Then I had to write documentation, which is time consuming, but definitely improves functionality. I invested a couple weeks of my time to get it done.
A few key people have looked at it and the feedback was positive, although I don’t have many contributors yet. I feel the project really needs a mentor for completion.
Another good thing about opening it up for the sprint is that I was able to talk to other people who are running similar projects. The data structure they’re converging on is similar to mine — so it works! In essence, market research came to me, which wouldn’t have happened without opening it up.
How about an example of a challenge that you faced?
Probably the biggest challenge I’ve faced is learning to handle diverse data sets and statistics so that I can have confidence in what I’m trying to say.
Getting my PhD and writing my thesis was also challenging — I didn’t want to miss something or get it totally wrong and I’m a very slow writer…
How did you approach learning to handle data and statistics?
I’ve learned through interactive reading and coding and a lot of Google searching. My learning experience has shown me why open source is so important. I was able to use these resources to solve problems: resources like Stack Overflow, blog posts, open-source code, packages, and tutorials. I could never have done it without others’ contributions to the internet.
I’m able to contribute high quality public goods to the internet, beyond published papers. Informal information is just as important. I’ve benefitted from code tutorials, and through gists or conversations on GitHub. I’m self-taught and I did it all through the internet.
What is the open internet to you?
I’ve already mentioned why the open internet has been personally important to my development. But I’d also add a more intellectual curiosity about it and its functions. I tend to think of things first in terms of ecology, because it’s the field I’m most familiar with. But really, it’s complex systems that intrigue me. The more I appreciate macroecology as the study of how large-scale adaptive ecological systems emerge from the characteristics, responses and behaviours of individual networked actors, the clearer the relevance of the processes I study seems to other fields.
This has led me to think of other topics of personal interest, like social, economic and political evolution, history and indeed, the birth and evolution of the internet, through this lens. The overlap of all these systems offers a lot of opportunity for understanding. So the more I learn about the internet, the more my intellectual curiosity about the system grows, as does my belief in its importance as the driver of enormous socioeconomic evolution, not least in science.
Most of all, the web is such a fascinating system, so fast in its response times, powerful and unwieldy like a force of nature. It is a force of nature! And so it is humbling and demands respect. The greatest moral questions of our times are played out on the web and in the fight to keep it open. The tension between transparency and privacy, anonymity and accountability, access and exclusion. It’s all there. The gateway to the open society. But we’re not quite there yet.
Can you give me an example of how these open aspects of the internet have been important to you?
The open internet gives me creative power and enables me to participate. By learning simple tools like Markdown, a really easy, stripped-down alternative to HTML, and platforms like GitHub, I can now write and contribute freely to the web. It’s a digital skill that has enabled me to promote and organize events, build websites, and share presentations and teaching materials. It’s just incredible what some of these simple tools can enable individuals to do.
Getting more specific about Mozilla, how did you get involved with them and what has that been like for you?
I applied for the first round of Mozilla’s Science Fellowship and, despite only making it to the first interview, I and about 30 others were later invited to the inaugural Science Lab Working Open Workshop (WOW) in Berlin. At the time I was only interested in coding and reproducibility. I didn’t know that much about open source. Mozilla introduced me to the open-source movement and community. It was so fun and so productive and I met the most amazing people there. Mozilla’s goal of building a community of people working openly, taking the message to other people, and engaging with others has really panned out.
Since then, I’ve been to two MozFests, participated in the Global Sprint, joined the Science Lab at their All Hands event in London and also received funding to help run a post-conference symposium on reproducibility at ISBE.
A lot of us participated in the next WOW mentorship program as mentors. And now we’re preparing for the next one! I love how we’re a web community. We’re all over the world, but Mozilla is very clever at bringing us together at specific junctures, and online, and then just keeping us in flow. What we’ve all managed to do in eight months as a group is incredible. There are a lot of excellent folks in the network, which is itself well embedded in broader open science networks.
How do you describe reproducibility to non-science people? What is it and why is it important?
There’s a few things people might be referring to when they talk about reproducibility, often conflating it with replicability. The topic stems from the realisation across a wide range of fields that many published results are hard or impossible to replicate.
This results from a convergence of drivers. Variation in the results of replications can be due to valid natural variation. Another driver can be publication bias towards novel, statistically significant results. This creates conditions in which potentially chance or even crafted positive results become visible whilst failed attempts to detect the same phenomenon are simply never published. This is a sort of statistical problem which ultimately drives overconfidence, and it is being addressed through pre-registration of research, which allows the process to be peer reviewed earlier and tracked. Finally, problems with reproducibility can result from simple human error. So solutions to these problems converge, at the very least, on making the code and data used to produce published results available, in a format in which they can be reproduced.
Apart from the added quality control, this means that a greater diversity of scientific outputs is shared, and the data and software are given their rightful importance. I feel there is a lot of low-hanging fruit here. Currently the computational side of research is at risk of sinking under technological progress, leaving large margins for error. Better education on software & data management and core digital skills should turn that around, instead empowering researchers through technology.
Reproducibility also feeds into open science and into broader culture change. It’s about inviting more feedback into the research process. As long as we’re not afraid to find mistakes, open science has the potential to make science more robust and give us a fuller, more accurate appreciation of our findings.
Back to Mozilla, can you tell me what impacts Mozilla’s had on your life and your work?
It’s been incredible; it’s changed my life. Mozilla has given me the tools to be able to learn, participate, and to get stuff done. They’ve put me in touch with a community of amazing, like-minded people. They’ve also given me the confidence to share my own knowledge and expertise and participate in peer-to-peer capacity building. I just love being with such a vibrant, creative community of makers. I found my kind. They have great ideas and let those ideas ripple through.
How do you go about tapping that network when you are exploring something new or testing an idea?
I’ve used the network for feedback and expertise, as well as to find the right people for collaboration on projects. At MozFest, I recruited a couple of people to help with my session, and currently, I’m helping organize OpenCon Berlin with people I met through the network. This network has allowed me to accomplish so much.
In these interactions, can you tell me about a time when Mozilla didn’t meet your expectations, or do you have any specific pieces of feedback on ways that they might work better?
Not really — it’s been a really positive experience. Being part of Mozilla is so important, you just feel that whatever your feedback is, it’s valued. This interview is part of why it feels so great to be part of the network. But ultimately the whole experience is about what we the participants make of it.
How might the stories we’re collecting be useful to you, if at all?
I always like hearing other people’s stories. They help me to find common ground and interesting differences. Mozillians seem to have diverse backgrounds and stories but important core threads that link us together. We share an interest in the open web and technology. We want to improve our own abilities and domains and pay it back by teaching others. But we often come from very different places. Through these stories we share our differing pathways to each other.
How would you describe the values of the open source community?
Beyond obvious ethical questions about trying to claim a stake in knowledge generated through public funding, I find working openly so powerful because it changes the incentive system for sharing. Transparent and more visible workflows allow contributors to get rightful acknowledgement for their work, across a broader range of outputs, and to build reputation systems based on evidence and community engagement. Resources are therefore liberated to be reused and remixed, supercharging innovation and collaboration. And currently in science it’s important to be able to iterate and evolve from the code and the tools people are building, not just published papers. Open practices allow this.
It also builds greater community quality control, enabling feedback throughout development and over more of the process. I always thought I needed a perfectly finished product before sharing it with someone else. Open source practices have taught me to share openly at an earlier stage.
It also makes life so much easier. I keep all code for my work on GitHub. I can’t tell you how satisfying it is to send people links when they ask me for resources! It also brings down the cost of onboarding. For example, we’ve invested in full documentation of the repo for the project I’m currently working on. We’re looking to capitalise on this by crowdsourcing the next stage of the project.
In summary, it’s about growing the pie for all through facilitation.