078 – Open Research, Science and Data – IDEMS International Community Interest Company (CIC)

The IDEMS Podcast

Description

IDEMS is guided by its principle ‘Open by Default’, a well-developed concept in technology and software fields, but less developed in research, science and data. David and Lucie explore the differences between open science and open data, and question whether or how open science might create inequalities.

Transcript

[00:00:00] Lucie: Hi and welcome to the IDEMS podcast. My name is Lucie Hazelgrove Planel, I’m a Social Impact Scientist and anthropologist at IDEMS and I’m here today with one of the co-founders, David Stern. Hi David.

[00:00:18] David: Hi Lucie. Looking forward to another discussion. What’s on the table today?

[00:00:23] Lucie: I’m thinking a bit again about data and about research. One of IDEMS’s principles is to be Open by Default.

[00:00:29] David: Yeah.

[00:00:30] Lucie: And I’m interested in how that relates to research data.

[00:00:34] David: So let me just check, are you interested in how that relates to research data? Are you interested in how that relates to data? Are you interested in how that relates to research?

[00:00:44] Lucie: Oh dear. What’s the difference between those three?

[00:00:47] David: I guess the key point is that of course there’s big efforts, UNESCO and others are into open science. And so if you’re interested in how this relates to research as a whole, that’s a really interesting and big topic, and there’s some wonderful elements of that. And that does relate to the fact that open data on its own is a thing and actually there’s wonderful efforts to make more data open. There’s actually a lot of journals now which encourage the publication of your data alongside any articles you publish.

So you actually have the open access to the article, but you also have open access to the data. And that can have really powerful implications in a number of cases.

[00:01:33] Lucie: Absolutely.

[00:01:35] David: So that would be open data and open science. And of course they’re related to one another. And I would argue that open research data is a specific subset within the open data and the open science discussions.

Where do you want to start? Should we start with that subset?

[00:01:53] Lucie: Yes, because I’m not sure how it’s a subset. I’m not sure how open data is separate to open research data.

[00:01:58] David: Open research data is an instance of open data.

[00:02:02] Lucie: Okay, yeah.

[00:02:03] David: So if you have, let’s say, a group that’s collecting routine data for climatic records or whatever it might be, that would be open data, which is not research data.

[00:02:12] Lucie: Of course.

[00:02:13] David: The research data, almost by definition, there has been a design process behind it, related to specific research objectives.

[00:02:22] Lucie: But what about when researchers use open data, as in data which hasn’t been collected for that, then it becomes research data?

[00:02:32] David: Now I would argue that isn’t open research data, that’s just open data.

[00:02:36] Lucie: Okay, yes.

[00:02:36] David: It’s research on open data.

[00:02:39] Lucie: Yeah, okay.

[00:02:41] David: I’m sorry, I’m a bit pedantic with language. So research on open data, I would argue, is different to open research data.

[00:02:48] Lucie: Yeah.

[00:02:49] David: So by definition for me, when you’ve got research data, there is a research protocol behind it. And this is where, if that research is actually open science, then not only is the data open, but the protocols will be open, the whole process, there’s a whole approach to doing open science. So some open research data will come from open science and some will not.

And so I would argue the really good instances are when you are doing open science and you are then having open data at the end of the open science process. But that means your protocols are open, it’s really thinking about getting standards for science, which are more inclusive in certain ways because things are more open, and it’s not just the end result, the data which is open.

[00:03:41] Lucie: Yeah, I was going to ask, it sounds like Open Science has similarities with what I would call good participatory research. Although participatory research tends to be with one certain community, and so it’s like only bits which are open with them. And like open science seems to be… world community.

[00:04:00] David: I would argue that I’m afraid these two would be perpendicular in my mind. You could have good participatory research, which is following open science frameworks, and you could have good participatory research, which is not. And similarly, you can have open science, which integrates into good participatory research, and open science, which doesn’t. And so these are two different concepts for me.

[00:04:24] Lucie: Okay.

[00:04:24] David: Where I think your similarity is coming from is that in good participatory research, you are sharing with the community all the aspects of this and how this is conceived, not just what you’re doing. And so it’s that collaborative process to do that.

And I think the parallel with open science is the fact that in open science, you’re not just publishing the end results. You are actually making public and open all of the processes. You’re going through processes, which are really not just transparent, but fully open. And collaboration is probably the wrong word on this because open science isn’t necessarily about collaboration. It is about replicability, it’s about global standards, being able to ensure that the knowledge which is created, you’re using methods which are globally acceptable, and so on. It’s about actually really raising the standards in a way which is enforced by a community of scientists that are…

[00:05:38] Lucie: Sorry, does it encourage collaboration though, in the sense that it encourages other people to use the open data and to build on that?

[00:05:45] David: Not just to use the open data, to reuse protocols and improve protocols. I believe it does, but I think…

[00:05:52] Lucie: Perhaps not yet there.

[00:05:55] David: No, it’s not that it’s not yet there or not, it’s that I haven’t seen the research which demonstrates that open science practices improve collaboration. What’s interesting is, I have seen instances where open science frameworks and open science approaches are surprisingly inaccessible to people in low resource environments.

[00:06:18] Lucie: Yes, I was going to ask, does open necessarily mean accessible?

[00:06:22] David: And it doesn’t. And this is important, and I think there’s work to be done there by people interested in open science to make some of these things more accessible.

[00:06:31] Lucie: What sorts of barriers are there then at the moment?

[00:06:35] David: So maybe, let me just, before I go into the specific barriers that I’ve observed with open science, let me just compare it with open source software. Open source software has huge followings, it’s extremely powerful and extremely good. But we found trying to get developers in low resource environments to contribute is really difficult. One of the barriers is simply the expectation of the standard you have coming in. And so actually for people who are not brought up in those communities, it can be actually quite difficult to bridge into what you need to do to be able to contribute to open source software.

So although anyone could in theory contribute, in practice there is a skills level, an expertise level, which excludes people below a certain expertise, deliberately, because you want to keep the quality of the code. And that implicitly excludes people who are coming from marginalised communities where building that expertise is harder. And so that’s open source, and that’s pretty well documented and that’s old. This goes back many years and this is known in many communities as a sort of phenomenon which has happened.

I’ve been involved in open science, not in big ways, but on the periphery for many years, and I’ve not heard much consciousness about this issue, that it’s something where it has the same wonderful characteristics of open source, where it is looking to enforce quality and therefore it is something which good experts are on a par and they’re able to discuss at that very high level.

But the element that it does exclude others is one that I’ve not heard enough, in my opinion, discussion about. And I would like to just frame this, in many of the contexts which I’ve spent a long time working, where people don’t have much research funding. They just need to do research as part of their work, either with students or for their own career progression, but they don’t necessarily have access to much funding.

In some ways, this actually, you know, means that some of the processes related to open frameworks are just very burdensome. And they’re burdensome with good reason, but maybe in some cases that burden could be reduced, which would make some of the processes, in practice, more accessible. Now I don’t have the full set of answers for this at all. But trying to see how can we make open science frameworks and open scientific approaches more accessible is something I’m really interested in.

And we’ve been pushing towards this for a few years now related to education, you know, really trying to get good open research to happen sort of as action research in education. And to enable lecturers to be publishing on innovations they’re doing within education in ways which can be very mutually beneficial both for them and for the community.

And so that’s sort of an approach where I’ve been thinking about this for a long time and we’ve been working a little bit towards this, small steps, but it’s something which I feel as people who are engaged in the open science frameworks and open science as an approach, I think a lot more thought and effort could be put into this, and the accessibility issue.

[00:10:11] Lucie: Yeah, coming back to that accessibility issue, you mentioned that protocols are open, it’s open to the community of researchers. You know, when I said good participatory research, I’m, again, always thinking of community level or grassroots level. If a research protocol is open, it doesn’t affect them either way, basically, because it’s not going to be accessible to them in a sense of understanding, them understanding the research protocol. It’s not going to be written in a way which is accessible to them.

[00:10:38] David: So if you look at this, UNESCO has its recommendations for open science, and it has core values and guiding principles.

[00:10:47] Lucie: Okay.

[00:10:47] David: So, remember guiding principles, these are the same as our principles. This is an approach to be able to get to this, to be able to do this. So there has been a lot of really good work on this. However, if you look at those, actually to be able to really live these open principles, this is something where in high resource environments where you have research funding, actually then lining up and understanding how you need to put resources to be able to achieve these is something which I would recommend to all researchers, actually getting on and doing that because we have the resource to do so. Whereas in low resource environments, there are elements of open science which I think may be beyond.

Some of these are very much like our Open by Default. If you think about even one of something as simple as scientific publications, in general, arguing that you should publish in open journals is of course very good advice as part of open science. However, it’s very interesting that if you look at the data going back, actually the move to openness reduced the proportion of publications from low resource environments, because actually that pushed the burden to the author to be able to pay for this. And if you don’t have the resources to do so, you know, you could not do that.

And then there were efforts to be able to increase the proportion of people from low resource environments, publishing in high impact journals and the rest of it. But that sort of came second. When you moved to open, the first step was it became less accessible for people to publish from low resource environments. And then there have been steps to be able to try and increase that accessibility again.

But I think the key point there is this fact that something as simple as publishing in open journals, this is something where actually that can be more of a barrier in low resource environments than in high resource environments. I could go on because, it’s not just about this, it’s about open research data.

So, if you are in a high resource environment, then you’re probably under quite a lot of pressure to get your research results out quickly. So publishing your research data, you’re still in the best possible position to get your research results out. If you’re in a low resource environment, things might be working slower. And so if you get your research data out as open data, somebody might scoop you to the findings you’re wanting to publish. And so that becomes more of a concern because you might be on a slower pace because you have other responsibilities.

Most of my colleagues in African universities are teaching three lecture courses in a given semester, at least, sometimes more. And so, their ability to quickly publish is sometimes less, because they don’t have that special dedicated time in the same way. And because they don’t have the funds to create special time, dedicated time.

I’m certainly not advocating that low resource environments shouldn’t be involved in open science. On the contrary, I think the open science principles, when you’re looking at publishing in open journals, you’re looking at open research data, you’re looking at reading into and creating open educational resources, and using open source software and where possible, open hardware. You know, all of these, which are the heart of open science, all of these are, good aspirations, which in theory should help low resource environments.

[00:14:18] Lucie: Yeah.

[00:14:19] David: But in practice, may not always. And some of these elements actually, where they create barriers, maybe in unexpected ways, these are things that we need to be very sensitive to. And I think making open science more accessible, to me, a lot of what this is, is recognizing that it’s not enough for things being open to be theoretically good for publishers.

[00:14:50] Lucie: To me, this is making me think of basically the internet, that just because there’s the internet in the world, it doesn’t mean that we all have equal access to it, or we all use the information that’s available in the same way.

[00:15:01] David: Yes, I think that’s a good example of something which is an open resource. And even that is not quite true in the case that in some countries, you don’t have the same internet as in others.

[00:15:14] Lucie: Yes, you don’t have the same access either.

[00:15:18] David: This is the thing, even in very trivial ways, if I release something on the Play Store as an Android app I’m choosing which countries to release in.

[00:15:27] Lucie: Yep, I have definitely had the thing where you can’t download something because you’re in the wrong country.

[00:15:33] David: Exactly, there’s an app which is really important and really needed to be used locally. You come in with a phone which is registered somewhere else. Therefore, your Play Store can’t access it. Therefore, you can’t use it. Therefore, you can’t use those services. This is sort of a really interesting different instance, but it’s the same thing that in theory these are open, but there are barriers that are put in for people who are marginalised, because they’re using a device which is different and so on.

[00:16:00] Lucie: Fascinating. Okay. I clearly need to do my research about open science. So thank you for pointing me towards the UNESCO site. That’ll be interesting to find out more about and to think through. Also within our work with the McKnight Foundation, for example, or any of our research sort of supporting work. I think that’s really interesting for me to think through.

[00:16:20] David: We have deviated a bit from the original topic. We were really digging into the data side of this and we’ve hardly discussed that.

[00:16:28] Lucie: Perhaps in another episode?

[00:16:30] David: Let’s dig in a little bit now, but you’re right, we have almost run out of time. Let’s just do a little bit because I think one of the really important things about open data is this distinction between routinely collected data in its many different forms.

[00:16:45] Lucie: Yeah.

[00:16:46] David: What I would consider, research data or designed data, where you have a process to design data collection related to a specific question or need. And that distinction between these two is really interesting and really important. Because I’m actually much less interested in open research data, because data which has been designed for a particular purpose is often not as useful if you want to reuse it for other purposes.

[00:17:13] Lucie: Or sometimes even, I have seen surveys which have not been properly designed, so they are similar to open data because it’s just a collection of data about a subject, but the actual research questions that want to be got at haven’t been thought through in relation to the questions that were asked.

[00:17:32] David: Let me frame that differently because I don’t think that’s necessarily related to being similar to open data.

[00:17:39] Lucie: Okay.

[00:17:39] David: But in terms of the analysis maybe it is, analysing data from a research tool or research process, which is not strongly aligned with the research questions, actually, you’d be just as well off analysing secondary data, is what you’re saying.

[00:17:58] Lucie: Yeah.

[00:17:58] David: But that’s poor design. And let’s assume that you have a well designed study. You have research data, which has come out of people who actually had a really clear research question and they’ve articulated it well, and they’ve got really good data related to that question. But you can almost certainly then use that data to answer the question and that could be very interesting and redoing that analysis would be valuable. But reusing that data for something else is unlikely to be valuable because for another question, the data is probably not aligned to what that question is. And this is where good, well thought through, simply open data which is routinely collected can be so inspirational and valuable because although it can’t necessarily answer a specific question, it can draw out hypotheses. And this is I think really critical, you can’t answer the question you want, necessarily, with some routinely collected data.

But you can gain insights to lead to hypotheses which might refine an actual study you might want to do. And this is actually quite common in many areas, my wife did a PhD in development economics and she did secondary data analysis, which then led to her doing a primary study based on some of the insights which she was then digging into in her context in more detail. And that combination of these different methods and how you use different types of data is extremely powerful. I would argue that the great explosion of data which is happening is this sort of routinely collected data.

[00:19:47] Lucie: Which for the most part isn’t open.

[00:19:49] David: No, surprisingly large amounts of data are open. Not all, of course you have very large amounts of routinely collected data by tech companies which are not open, which they guard preciously because of the insights they gain related to advertising and all sorts of other things, behaviour patterns and so on, and that is not open.

What might it look like if instead of that being owned by the tech companies, that was more public data? Oh, our society could be very different. What if it was anonymized first and then released as a sort of more public data. Now, of course, there are issues about security and all the rest which would come into this, but it would be very interesting to be able to see what learnings can you have.

There are elements of this data which are being made available to researchers and others. So I think, if you’re thinking about that big tech data, that’s a whole nother story. But I’m not, I’m just thinking about all sorts of forms of open data that exist and they’re growing. National stats offices all over the world release very important, big surveys and censuses with elements of open data where this is now publicly shared and you can draw incredible insights from this. Then, there’s all sorts of other forms of data. The Met Offices in different contexts, in a number of contexts, are now making climate data available and open, where again, you can be using this in ways which are extremely powerful.

Having open data which is coming from routine data collection exercises, be these sorts of daily data collection or sub daily data collections like the meteorological data, or be they once every 10 years for a census, that’s still a routine data collection exercise.

Fascinating to go back in history and look at the evolution over time as you look at the data in the different censuses. And the fact that many of these are… even down to elements of anonymized individual data. And I want to single out IPUMS here as doing incredible work on making such anonymized census data available for studies. It’s incredible. Wonderful group doing very important work in my opinion.

And again, are we using this enough? What more should we be learning? Open data is, in its own right, it’s so exciting, the possibilities this opens up. And that’s not in contradiction with open science, but they’re different concepts.

When you’re talking about the routinely collected data, it’s not really that you’re following our open science approaches because that’s not quite what it’s about. But it is so important because open data is so valuable for science. So I think there’s elements there as well, though I don’t feel we’re yet, I feel we’re scratching the surface of what hopefully will really emerge as very powerful both scientific approaches to the open science frameworks, but also learning opportunities through the various forms of open data that are becoming available.

I seem to have driven you far enough off track that you’re no longer sure what this session, what this episode was about. I apologise.

[00:23:03] Lucie: I think I’m coming always back to the sort of practical of how it relates to me and my work. So I guess I’m going back into thinking of, okay what do I need to do with this information?

[00:23:13] David: Okay, very practically, I think then, open science frameworks are things that, being aware of this and, the UNESCO documents on this are very high level. There are other more practical approaches to that. But the open science community is a fantastic community to engage with, their frameworks can be quite useful. So certainly an awareness of that, digging into what open science is, is valuable. And thinking about does this change anything in terms of what we think of doing? And how we should be, maybe, helping others to engage with this.

I’m conscious that I’ve not made big efforts as part of our research method support work to push people to open frameworks. Maybe I’m wrong. Maybe I should be pushing open science frameworks more in those communities. And that’s something that you might come out and say, no, you should have been pushing this harder.

I’ve been talking to people about it on occasion, but it’s a very soft approach, as I think you can gather because we’ve been working together for over two years now on this, and this hasn’t come out strongly. But it’s something I’ve been engaging with for a long time for myself, but I don’t tend to push it on others because of some of these accessibility issues.

[00:24:26] Lucie: But also just in the sense of we’ve seen collaborations between projects on the same actual bit of research, in a way, that they don’t always share data openly. Perhaps that open science is a level beyond, is there a first level of can we work as a community? And then can we work as a larger community? I don’t know.

[00:24:47] David: Maybe except that, and this is where maybe I am wrong. Maybe it’s easier to build that collaboration…

[00:24:54] Lucie: Work with strangers in a way.

[00:24:56] David: But it’s not just a stranger, into a framework which is internationally recognized, which has strong backing behind it as being this is a good way to do science. Maybe that’s an easier approach and an easier inroad than some of the sort of softer approaches we’ve tried to take. I don’t know. I don’t have the answers here. Open science is something where I’m delighted if you’re interested to look into this and to actually dig in and see how can we bring this more strongly into our work.

And I suppose then with the open data, this is something which I suppose is more, where you were coming from, because we do talk about it, we do try to discuss this and actually think for ourselves, what can we make open? What can’t we make open? How do we do that? So this is something which we are engaging with more concretely.

And of course, open data is part of open science in certain ways, but these are different concepts. So I think open data, we need to keep pushing on this in different ways, both to make better use of data, which is out there and sources of data, but also to be encouraging ourselves and our partners to be producing and releasing more open data.

[00:26:07] Lucie: Yeah. Good discussions to have.

[00:26:11] David: Yeah, thank you. And I think this has been a really good topic to bring up, to dig into. These episodes are useful for bringing some of these discussions out internally. So this is a process we hope the listeners have appreciated, but it’s also been a useful process for us as part of…

[00:26:26] Lucie: Exactly. I’m sorry, that’s always my focus.

[00:26:30] David: No, it’s useful and this is good.

[00:26:33] Lucie: Have a nice evening, David. Thank you.

[00:26:35] David: All the best.