211 – Open vs Open Source

The IDEMS Podcast

Description

Lily and David discuss the often misunderstood concepts of “open” and “open source.” They discuss the origins of these terms within the programming community and how they have expanded into areas such as open data, open science, and educational resources. The conversation focuses on the various types of licenses, including Creative Commons, and their implications for use and reuse.

Transcript

[00:00:06] Lily: Hello and welcome to the IDEMS podcast. I’m Lily Clements, a Data Scientist, and I’m here with David Stern, a founding director of IDEMS. Hi David.

[00:00:14] David: Hi, Lily. What are we discussing today?

[00:00:17] Lily: I thought today we could discuss open and open source and where those differences are.

[00:00:25] David: This is something which I spend a lot of my time discussing, especially with respect to licenses. There’s a lot of confusion out there.

[00:00:34] Lily: Yeah, and I think that there’s confusion even for me, I should say. Even for me, I mean you say out there, I want to say in here.

[00:00:44] David: Yes. Okay. I guess one of the things which is, I guess, historically important, is that the open source community are the ones who really started thinking hard about this, where a lot of these licenses came from. And that came out of programming, but open as a concept has now permeated many different fields.

And you talk about open data, you talk about open science, you talk about open resources or educational resources. There’s the whole Creative Commons movement, which is a way of licensing resources in ways which are considered open. But by the definition of open source, some of those licenses are open and some of them are not. So there’s a whole set of interesting discussions around this.

[00:01:44] Lily: Yes, I can ima… so let’s go back a step then. ’Cause to me, open means you can see the code. It’s readily available. You can take it. You can use it as your own. I am aware that is where I am slightly wrong, that there are then licenses involved in openness. But then why have something open if you are then gonna have licenses around it?

[00:02:12] David: So one of the things which is really important is that, as with any sort of copyright, what the open licenses are doing is protecting how something is used. And so within the open community there are very few restrictions on the use which are acceptable for it still to be considered open.

And this is where, if you take Creative Commons licensing, what they’ve done, which I think is brilliant, is they have these very simple, easy to understand clauses, which capture these big distinctions. And so I think it might be sensible for us to broadly go through those. First, there are things where the license just says this is in the public domain. So, a public domain license.

[00:03:21] Lily: Okay, so that’s your as open as it can get.

[00:03:24] David: That’s basically, yes. This is basically saying that, anyone can use it for anything and they don’t have to say where it’s come from or anything else, but it’s just in the public domain. And once something’s in the public domain, then it can always be used in the public domain because even if somebody else uses it for something else, that license still exists with it being in the public domain.

So this is old movies or old literature, which is, I don’t know what the current state is, but these, at a certain point, they enter the public domain and there…

[00:04:00] Lily: The Happy Birthday song.

[00:04:02] David: Yeah. And so in particular, the main first step of restriction you could be putting on is what is called the “BY”. So a Creative Commons BY license, let’s say. This just says anyone can use it for anything they want, but they need to give credit to the person who created it. They need to say, this was created by so-and-so, or this has come from wherever it’s come from.

So whoever issues the Creative Commons BY license is simply then asking to be credited as part of the use of their resource.

[00:04:54] Lily: Okay. Yep, that makes sense. So in a way, with this, you can trace it or track it. It’ll trace where things have come from a bit more here.

[00:05:03] David: Yeah,

[00:05:03] Lily: Okay, this has come from you, and you could start to trace, okay, I can.

[00:05:07] David: Exactly. So now it’s a license which is not in the public domain. It is a Creative Commons BY license. The way this would be enforced is that, if somebody uses it and doesn’t credit you, I’m afraid you can go to court and you can say, that’s not right. This has come from my work. I need to be credited.

So you could sue them to say, you are using my work without giving me credit. Now, of course, these are really interesting sorts of issues, because we do know that a lot of copyright is being broken now by big AI companies that are trawling the web.

So that’s a whole separate thing. But what this would be saying is that you retain the right to sue a company that is using your work without crediting you. So that’s already quite important and this is something where the BY clause is there in quite a lot of open licenses. This is not a barrier to something being open.

The fact that somebody needs to credit you is a reasonable request for you releasing something that you are creating under an open license.

[00:06:18] Lily: And so do you need to check with them that you are… so could you still use it for anything you want?

[00:06:26] David: You are not needing to check. There’s no approval process, but when you use it…

[00:06:30] Lily: Yeah. I.

[00:06:31] David: The clause is written in different ways. These are big legal documents quite often. Exactly what the acceptable ways of crediting are can depend on the exact license you are choosing, but this clause is a very important clause and it’s a valid clause, which is about recognition for the work that people do.

And so that’s a really important clause and a very sensible part of open licensing. The next one, which is a bit more, not controversial, but debated, is the clause called ShareAlike. With ShareAlike, anything you build on the work has to be shared under the same or a compatible license; otherwise you need a different license. Now, within the open community, this has become a really disputed clause because it’s led to some very difficult situations where the intention of the license and the actual implementation of the license have caused problems.

So within open source software, the GPL licenses are a family of open licenses which had a ShareAlike clause. And this became very problematic because people released under, let’s say, GPL 2 and other people released under GPL 3, but these weren’t compatible licenses, so you couldn’t use them together.

And this is one of the things where, in… within software ShareAlike as a clause became a very controversial clause for the open community because it led to all sorts of issues where what one would have expected to be compatible licenses weren’t compatible. Now two things have happened. There’s been work to make the licenses more compatible, and that’s good work, which has been happening, which has made sure that the license clauses and the way they’re written legally mean that these are more compatible with one another.

And in other cases, people have moved away from these licenses, saying they’re too restrictive, to licenses which are considered more open by not having a ShareAlike clause. So within coding, within software development, the very well known ones are things like the MIT license, which doesn’t have a ShareAlike clause. And again, this is where Creative Commons have done such a good job of actually making these simple. When you dig into the details, there’s so much complexity on exactly what the legal wording is, what the different legal wordings are, and why people are going for one legal wording over another.

And most people don’t know. They just say, what is the best license? And they put on whatever license they want to make it open if it aligns with either their business interests or their ethical interests, or whatever their reasons are for choosing a particular open license.

[00:09:46] Lily: Okay. But on the whole, we’ve gone through three types so far. There’s completely open, where you don’t need to give any credit or say where you got it from. Then there’s open, but you need to give credit where it’s due and say, okay, this has come from this person or this place.

[00:10:08] David: And that’s Creative Commons BY.

[00:10:10] Lily: That’s Creative Commons BY, and then there’s ShareAlike, which is where it can be used in places which are similar.

[00:10:18] David: It can be used when what you are using it in has a compatible license,

[00:10:26] Lily: Okay. Yes.

[00:10:27] David: So you can’t then include it, it shouldn’t be included or used in a closed product under that license.

[00:10:35] Lily: Yeah.

[00:10:36] David: And this is the first one where we’re starting to see a difference. If you are going to have a Creative Commons BY license, then I think it’s fairly safe to assume that if people want to use it but don’t want to give you credit, then by and large they’ll choose not to use it. ShareAlike is the first one where people might really want to use this, but for various reasons they cannot ShareAlike as per the terms of that license. So if you are going to use this, you almost certainly need to have contributors’ agreements in place so that the responsible body, whoever that might be, whether an individual or an organisation, can then issue the same thing under a different license, maybe a commercial license that somebody pays for.

So if I don’t want to make my product open, but I want to use this code base, I almost certainly need a way to do so legally. And this is one of the things which is forgotten very often.

People assume that if somebody isn’t happy with the license, then they can’t use it. But that’s not how it should really be, because then the only way to enforce this is to sue. And so what you actually get is good actors who want to use it legally have no route to do so, bad actors just use it and therefore, they’re not constrained by the license.

And there’s a whole industry around the question of whether it is going to be cheaper to just use it and be sued than, than whatever. And if so, that’s cost effective and that’s good for business, and so we may as well use it. And therefore you have to be willing to sue to enforce your license if you are going to put these restrictions on.
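The calculation described here, whether it is cheaper to use something and risk being sued, can be sketched as simple arithmetic. This is only a minimal illustration: the function names and every figure below are hypothetical, chosen to show the shape of the reasoning rather than taken from any real case.

```python
# Hypothetical sketch of the "is it cheaper to just be sued?" calculation.
# All names and numbers are illustrative assumptions, not real figures.

def expected_litigation_cost(p_sued: float, cost_if_sued: float) -> float:
    """Average cost of ignoring the license: chance of a lawsuit times its cost."""
    return p_sued * cost_if_sued


def ignoring_license_pays(benefit: float, p_sued: float, cost_if_sued: float) -> bool:
    """True when the benefit of unlicensed use exceeds the expected litigation cost."""
    return benefit > expected_litigation_cost(p_sued, cost_if_sued)


# A licensor who is unlikely to sue: ignoring the license looks cost effective.
print(ignoring_license_pays(benefit=100_000, p_sued=0.05, cost_if_sued=500_000))

# A licensor with the means to litigate: the same use no longer looks cost effective.
print(ignoring_license_pays(benefit=100_000, p_sued=0.5, cost_if_sued=500_000))
```

The only thing the licensor changes between the two calls is the likelihood of a lawsuit, which is the point being made: a restriction only bites if the issuer is willing and able to enforce it.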

[00:12:37] Lily: That’s very interesting.

[00:12:38] David: And this becomes much more evident with the next Creative Commons constraint, which is non-commercial, and this is the most misunderstood license in general. Many people I go to in academia and work with in different ways, they very proudly say, I’m using the non-commercial license to show that I am doing this and I’m not trying to make a profit. And I say to them, no, you are not. If you are using the non-commercial license, what you are saying is, that nobody else can make a profit from this without coming to you to request a commercial license. So the whole thing is, it’s the opposite. You are saying you are retaining the rights to make a profit from it.

You as the issuer of the non-commercial license, you can make a profit, but if anyone else wants to commercialise it, they need to come and get a different license from you. So you should only be using the non-commercial clause if a) you are willing to issue other licenses, let’s say a commercial license, to good actors who want to make a commercial product from what you’re offering.

And b) you are willing to go and actually pursue bad actors who use it for commercial purposes. So unless you are willing to sue the bad actors and to offer a commercial license to good actors who want to actually use it and pay for it, you shouldn’t choose the non-commercial clause. And this is the first Creative Commons license that is not considered open.

And the reason for this is the open community in open source went through huge amounts of litigation around this non-commercial clause, 30 years ago. And so it got to the point that actually this non-commercial license under the Creative Commons is not considered an open license.

So by the formal definition of what’s open and what’s not, that falls into the almost open category, but it’s not considered open. And this is one of the things where, for us as an organisation, one of our principles is Open by Default. And there are times when we would recommend people to fall into this almost open category rather than the open category, but only if they’re willing to put in place the structures to do it.

We’ve got a discussion ongoing on this with a particular Met Office where they’re wanting to make their data more widely available. It seems like the sensible approach to that would be to use something like this non-commercial option, but then to still have commercial licenses available, and make it easy for commercial actors to pay and say, yes, we want to use it.

So your good commercial actors can then pay for the climatic data, which is expensive to collect, and which would enable the Met Office to have income streams, which would support it. Those actors paying a small amount for the data, which is really useful for them, for their commercial ventures, is absolutely sensible.

So again, this is where having everything being open all the time is not necessarily in society’s best interest. There are good reasons, in certain instances, for having something which is almost open, or transparent, but not fully open. There’s so much flexibility on what can be done, and I love the Creative Commons non-commercial license as a simple illustration of where actually sometimes it’s good to cross that line, and to protect your copyright, to make it available to commercial actors for a fee.

But you need to put in place those structures. And again, you would have to have the contributors’ agreements in place so that the body that’s actually issuing these licenses has the right to issue both, let’s say, a Creative Commons non-commercial license as well as a commercial license. And so you’re having the rights for this double licensing of the same materials.

[00:17:47] Lily: Until now, the examples I’ve been using in my head to understand have been related to code. But when we come to this area here, I imagine this is a little bit more prominent when it comes to data perhaps.

[00:17:59] David: So yes, for data and for educational resources, these would be things where including a non-commercial clause, and therefore not being considered open, would be seen positively. But I’ll give you some examples from code as well.

[00:18:18] Lily: Okay.

[00:18:19] David: There is another type of license which is emerging where I guess arguably it would correspond to a clause where there’s a number of years before which it would be released as being a fully open license. So the actual licensing is such that there is a time for commercialisation,

[00:18:44] Lily: Okay, so like a patent on the code for 10 years like when a new drug comes out.

[00:18:49] David: This is the sort of thing, yes, where you are saying this will go under an open license in the future, but we as the developers, and we don’t have any like this, but we do have some very interesting colleagues who we know who have used such licenses where they’ve chosen to negotiate with the funders who were funding some of their work.

[00:19:12] Lily: Okay.

[00:19:12] David: Now the funders insisted on it being open and they said, but we need time to commercialise it first. And so it had a sort of four or five year period for them to commercialise before the software becomes open. So RapidPro, a piece of software we use quite a lot, their latest code is due to become open, I believe, early next year.

And so that version of their code, it is coming to the end of this protected period and then it’ll be available as open code. So these are really interesting innovations, which are the same basic principle. It’s this idea of balancing the protection of your investment in actually producing something with the desire to be part of an open community. These are really difficult things to balance.
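The idea of a protected period followed by open release can be sketched as a date check. This is a minimal illustration under stated assumptions: the function names, the release date and the four-year period are all hypothetical, and this does not represent the actual terms of RapidPro’s or anyone else’s license.

```python
from datetime import date

# Hypothetical sketch of a "delayed open" license: closed for a protected
# period after release, open afterwards. Dates and periods are made up.

def open_date(released: date, protected_years: int) -> date:
    """Date on which a release leaves its protected period."""
    try:
        return released.replace(year=released.year + protected_years)
    except ValueError:
        # Released on 29 February and the target year is not a leap year.
        return released.replace(year=released.year + protected_years, day=28)


def is_open_on(released: date, protected_years: int, today: date) -> bool:
    """True once the protected period for this release has ended."""
    return today >= open_date(released, protected_years)


release = date(2020, 3, 1)  # illustrative release date
print(open_date(release, 4))
print(is_open_on(release, 4, date(2023, 1, 1)))
print(is_open_on(release, 4, date(2024, 3, 1)))
```

Each version carries its own clock in this sketch, so an older version can already be open while the latest one is still inside its commercialisation window.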

And there are so many powerful ideas going around about how to do this, that we don’t have a single way of saying this is the right way to do it, because another one of our principles, Options by Context, it really depends on the nature of how something was funded, it depends on who’s going to use it, how they’re going to use it, and all sorts of other things.

What is the case is that, in my experience, if you go outside of open source and you look to the wider open community, I tend to find that not many people are actually thinking deeply about this. The open source community has gone through a lot. They’ve gone through these different cycles: open, then removing the idea of any form of non-commercial clause, and now they’ve come back to these sorts of licenses where it becomes open in the future, and so on.

So there’s been some really deep experiences which have shown, through the open source community, how to get sustainable business models to be able to get good code, to be able to actually not have these lawsuits then pending all over the place.

There are certain things which are important to do, whereas I would argue in the rest of the open community, I find those discussions are still in their infancy. They are often not mature yet because there haven’t necessarily been the big actors actually sorting out how you do this.

As I come back to this non-commercial clause from Creative Commons, the number of people I’ve met who understand that clause versus the number of people I’ve met who misunderstand that clause is, it’s five to one misunderstanding. A lot of people who are on the periphery, who like the idea of open, hear about this non-commercial clause, but they totally misunderstand it and therefore misuse it, in my opinion.

There is one more in the Creative Commons, which is NoDerivatives,

[00:22:41] Lily: Okay, so we got Creative Commons completely open, BY, ShareAlike, NonCommercial, and then NoDerivatives.

[00:22:48] David: And so those are the ones you’ll find if you go on the Creative Commons website, and this is really important. Now, the NoDerivatives one is very interesting because that’s definitely not open.

[00:22:58] Lily: Yes. This sounds to me like just from hearing NoDerivatives, it sounds to me like you can’t expand on it,

[00:23:04] David: Yes, exactly. You can use it, you can distribute it freely, but you can’t make changes. There are use cases for this, but it’s not very open. It isn’t really aligned with what I would consider open, which is that you can create variants, you can reuse that.

Reusability is really at the heart of what open is enabling, and NoDerivatives is a big shutdown of that. So anything which has a NoDerivatives clause, I basically dismiss as being not useful. Now of course you, I’m not saying that you couldn’t use something with NoDerivatives, of course you can. And I’m not saying that you couldn’t have something which has NoDerivatives clause actually being re-released under a commercial license in a way which you can make derivatives of, which would be useful. So I’m not saying there aren’t use cases for the NoDerivatives clause.

I’m saying anything which has a NoDerivatives clause, I dismiss as not being open. This is just a way of actually distributing something with a sensible license, but it’s not, in my mind, very aligned with the open principles.
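The clauses covered so far can be summarised in a short sketch that checks which Creative Commons licenses would count as open in the sense used in this conversation: public domain, BY and ShareAlike are open, while NonCommercial and NoDerivatives are not. The function names and the parsing of identifiers such as "CC BY-NC-SA" are assumptions made for illustration; this is not a legal test and not an official Creative Commons tool.

```python
# Illustrative sketch only. The clause abbreviations follow Creative Commons
# usage; the helper names and the parsing rule are assumptions for this example.

NON_OPEN_CLAUSES = {"NC", "ND"}  # NonCommercial, NoDerivatives


def clauses(license_id):
    """Split an identifier such as 'CC BY-NC-SA' into its set of clauses."""
    name = license_id.upper().replace("CC", "", 1).strip(" -")
    if name in {"0", "ZERO"}:  # CC0: public domain dedication, no clauses
        return set()
    return set(name.split("-"))


def is_open(license_id):
    """Open here means: no NonCommercial and no NoDerivatives clause."""
    return not (clauses(license_id) & NON_OPEN_CLAUSES)


for lic in ["CC0", "CC BY", "CC BY-SA", "CC BY-NC", "CC BY-ND", "CC BY-NC-SA"]:
    print(lic, "open" if is_open(lic) else "not open")
```

Note that ShareAlike passes this check even though it is the debated clause: it restricts how derivatives are licensed, but it does not stop reuse or commercial use.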

[00:24:19] Lily: But what is to stop someone from taking something with NoDerivatives, feeding it to the robots, telling them to change it in such a way that it no longer looks the same, and then publishing that as something new?

[00:24:40] David: Okay, there’s two separate things you’ve got here.

One is the question of what that is in the first place. Is this code? Is it a book? Is it whatever? The robots are not good on copyright. By and large, this is a bigger problem. So for all these copyright licenses that we’re discussing, there are whole bigger societal problems which are emerging because of the way AI is used on these in different ways.

[00:25:09] Lily: Okay. I can imagine that. Yes, because the robots probably might not know, oh, this is a NoDerivatives license, and this is a…

[00:25:16] David: And even if that got built in, it’s not quite clear what this means in terms of how things are used, but this is also one of the reasons that people are going back to more protective licenses because, if you don’t have some form of protection on the license, then your data is now not your data anymore.

Whatever you are producing is not yours. Now, everything is considered in the public domain as far as the learning of these models is concerned, and there are lawsuits which are coming up around that, and which are happening all the time. This is a matter of litigation as to what’s going to happen here, but this is also the point.

What does this mean? And again, it comes down to this fundamentally being a matter of litigation. I can’t answer at what point is this acceptable or at what point this isn’t, but the only way you find that out is through the courts. And of course then which courts are you in? Whose copyright is getting broken and where is that tried? In which court? By who? This is going to be very different, but these are things which are happening as we speak.

I’m not generally somebody who thinks about the litigation aspect, but you can’t proceed with licenses properly unless you are thinking about litigation because that is the only form of defense if you are putting your stuff out there where it can be accessed under any license. Then the only protection is litigation when it is misused. And so that has to be part of the thinking.

So of course you can just keep things very closed so that they’re not accessible. But as soon as you choose any of these sorts of licenses and you use tools which make them accessible, then putting constraints on your license is only of value if you also consider litigation. Then the big question is, do you even have infrastructure in place to be able to litigate?

Is this something which would be done by an individual, or is this something where you, as an author, are actually handing it over to a group who have the license and would therefore have the power to litigate on your behalf? And this is something where, as an individual, you know, many… I like these good actors, bad actors. Many bad actors will recognise that unless there’s gonna be a big mass lawsuit against them, the individual is unlikely to go through the litigation process. So a bad actor would just use it and accept that the costs of litigation would be less than the benefits they get from using it.

Those are calculations which are happening in boardrooms and so on. So putting these licenses out there only makes sense if you are willing to follow it up. Does that make sense?

[00:28:36] Lily: Yes. Yeah, no, absolutely.

[00:28:40] David: And I guess the key thing, which I find difficult is, we want to be a good actor in this.

We are a not-for-profit social enterprise, but we are run as a business. If we use something, is that for commercial use or not? We pay taxes. If you pay taxes, arguably you’re commercial. We’re on this borderline: if we use it in certain ways, might it be considered commercial or might it be non-commercial? Those are fine lines. And as a good actor, I don’t ever want to be on the wrong side of that line. So our choice, if there isn’t a way of using something legally and being able to check and make sure and go through the processes, would probably be not to use something which could be useful.

And this is the thing, your licenses should be encouraging good actors. Done well, your licenses should make it easy for good actors to be able to use what you are doing and to create the protections that if a bad actor uses it in a way that you are really not happy with, then you can litigate.

That’s what open licensing is set up to do. It’s about community. It’s about people sharing. It’s about things being reused. It’s about us building things together because that’s more efficient. It’s better for society in all sorts of ways. There’s lots of benefits which have come out of the open movements, but at the heart of it, we lose a lot of that if we don’t understand open licenses.

[00:30:19] Lily: Yeah very interesting and it sounds like there’s a lot that still needs to be cleared up within the community.

[00:30:29] David: Within communities, because this isn’t one community. This is the thing. There are some communities where open is the goal. There are other communities that believe that open is a valuable tool and you know, a lot… And so these different communities are using these ideas in different ways.

What I’m really pleased with though is that, it was a long time ago, but open was originally seen as being something on the margins. Now it’s not, open is mainstream, it is everywhere. Everything we use relates to open licenses in different ways. This has become part of our society very deeply embedded.

But it’s complex and therefore it is misunderstood. In my experience, good actors wanting to use open are often misled by the licenses that they’re using, because their misinterpretations of the license mean that they end up worrying about the wrong things. For example, and I’ll come back and finish on the non-commercial license: if you are going to use the non-commercial license, you have to put in place the structures so that you could also release it under a commercial license to a good actor who wants to use it.

If you don’t have those structures in place, then don’t put the non-commercial license on it, because all you are doing is stopping good actors and enabling bad actors. So the non-commercial license is a wonderful and very powerful tool if used well, and I’m not saying commercial actors are bad actors. No. There are good commercial actors and there are bad commercial actors.

And that line in this context is very easy to determine. Your good commercial actors will want to follow the rules of the license, whether there’s litigation or not. The license has been decided like this. I will follow those rules. Your bad commercial actors would say, what can I get away with? Is it going to be cheaper?

Is it going to be cost-effective for me to break the rules of the license and settle any lawsuits? If so, I’ll just go ahead. I don’t care about the actual license. I will use it as I want and I’ll accept that there will be lawsuits, but on average it’ll be cost effective for me to do. That is what I would consider a bad actor in this space.

A good actor says, the license is this, I will follow the rules of the license. If you don’t make it easy for good actors, you are encouraging bad actors. And that’s what so many people are doing. They’re putting on the clauses in such a way that they’re keeping out good actors who could become really good collaborators and partners.

Because of course, a good actor who’s a commercial actor could be a source of funds for the project which you are doing, which now enables them because they get better services, they get better software. Your open thing, maybe it’s only almost open because it has a non-commercial clause, but you actually have the systems in place so that the whole thing works.

The problem that I feel we are getting is the systems that are being set up now around these licenses are enabling and enhancing bad actors rather than good actors. And that’s the question that we should be digging into more deeply. How do we make sure that we make it easy for people to use these licenses in ways which encourage good actors over bad actors?

[00:34:28] Lily: I think that’s a discussion for another day, I’m sure. But no, very interesting. Thank you for outlining the different licenses and the different types of openness. It’s very enlightening.

[00:34:39] David: I’m not an expert on this, but I’ve had to learn about it because we come up against it quite a lot.

[00:34:45] Lily: Thank you very much.

[00:34:46] David: Thanks.