244 – Scaling Open Textbook Variants with PreTeXt and AI – IDEMS International Community Interest Company (CIC)

The IDEMS Podcast

March 13, 2026

Description

Lily and David continue their discussions on converting open textbooks into PreTeXt. They focus on the “Learning Statistics with …” ecosystem, where an original open book has spawned variants for R, JASP, jamovi, CogStat, French, and potential new versions such as R-Instat. They explore how PreTeXt could better manage multiple independently maintained variants by identifying what differs, easing updates from a base text, and supporting responsible human ownership.

Transcript

[00:00:07] Lily: Hello and welcome to the IDEMS podcast. I’m Lily Clements, a data scientist, and I’m here with David Stern, a founding director of IDEMS.

Hi, David.

[00:00:14] David: Hi, Lily. What are we discussing today?

[00:00:17] Lily: I thought it could be a good time for an update on the PreTeXt books. They’re moving fast, so while I think our last update was relatively recent, it feels like an appropriate time for another one.

[00:00:29] David: Sounds good. So how many have you got to?

[00:00:32] Lily: 20. Well, actually, I’m now on my 21st. I was meant to stop here, but I came across an email of…

[00:00:39] David: Antony Unwin. Oh, yes, I love his work. Yeah.

[00:00:42] Lily: Yes.

[00:00:42] David: He was my lecturer back in the day when I was in Augsburg, so, yeah, great that you are doing that.

[00:00:48] Lily: Yes. Yes. So I came across, well, an email that he sent to you about his book, Getting (more out of) Graphics, which looks really interesting. So actually I started playing around with that this morning, and just seeing, okay, how doable is this? We’re talking about a lot of graphics, let’s just give it a try. So we’re on 21, even though I said I would’ve stopped this month.

[00:01:08] David: But no, that’s great. So Antony Unwin is the person who introduced me to the grammar of graphics years and years ago, 25 years ago, I think it was. No, maybe not quite, maybe it was more like 23 years ago, but still a while back. And this is part of what started me on this path to actually recognising the importance of these structures, of these grammars, which, I would argue in many ways, the PreTeXt work is related to.

The semanticness of it is about trying to get that structure in a way which has meaning. And so this is really exciting that we’re taking his book and putting it into PreTeXt, and potentially enabling that to bring added value in ways which could be unexpected and interesting. Oh, exciting!

[00:02:01] Lily: Yeah, no, absolutely, very exciting. And I wanna go into the, I’ve gotta say, books, because they are different books, but they come from the same book. And so the one I want to discuss today is this set of books which are kind of learning statistics with, and then I say a “blank” there, because it was originally learning statistics with “R” and then different people have come along and made their variants.

So now there’s learning statistics with JASP, with jamovi, with CogStat, there’s a French version, I believe. And there’s all these different variants of the same book where this original book has been taken and formed into its own way. And even we now are talking about doing learning statistics with R-Instat, which is another tool that you can use, another statistical tool that you could use, which we work on at IDEMS and that we’ve done many podcasts on before as well, so I won’t go too much into that one here. I’m sure we will have our own one on that one another time.

What I want to go into here is this kind of idea of these variants.

[00:03:02] David: Yeah. And this is really interesting because this is exactly what we believe, that PreTeXt can help us build a system to manage these multiple variants very differently and to conceive books as having multiple variants. We’ve discussed this a bit in the past, and to have an open book, which already has this concept emerging, and to then try it and discuss how can that be taken forward, how can we actually take what already exists and put it into a structure and actually try to understand, well, what are the differences between these variants, how much has actually changed, what has changed, what hasn’t changed?

And to put them into a coherent, into a form of coherence, not into a coherent whole, because each variant is its own entity, but actually being able to think of that more together. This is what we at IDEMS are really thinking deeply about and it’s a problem which comes up in all sorts of places, but this is a beautiful illustration where each of those variants is owned, and I’m using “owned” in a sense that the people who have responsibility for it are different. So, they’re owned by separate entities or separate people, or separate groups. But at the same time, there is this sort of commonality around the open textbook, and this is exactly what we want to imagine and encourage.

How do we build systems which really enable this? This is the hard problem that we’re trying to engage in.
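
As a rough illustration of the question David raises (what has changed between variants and what hasn’t), here is a minimal Python sketch. It assumes each variant has already been split into a list of paragraph strings; the function name `classify_paragraphs`, the 0.6 similarity cutoff and the sample paragraphs are all made up for illustration and are not taken from any of the books or from the PreTeXt tooling.

```python
import difflib

def classify_paragraphs(base, variant, cutoff=0.6):
    """Label each variant paragraph relative to the base text.

    Returns (label, paragraph) pairs where label is "unchanged",
    "modified" (a similar base paragraph exists) or "new".
    """
    base_set = set(base)
    report = []
    for para in variant:
        if para in base_set:
            report.append(("unchanged", para))
            continue
        # get_close_matches returns base paragraphs whose similarity
        # ratio to `para` is at least `cutoff`, best match first.
        close = difflib.get_close_matches(para, base, n=1, cutoff=cutoff)
        report.append(("modified" if close else "new", para))
    return report

# Made-up paragraphs, for illustration only.
base = [
    "The mean is the sum of the observations divided by their number.",
    "We now compute the mean in R.",
]
variant = [
    "The mean is the sum of the observations divided by their number.",
    "We now compute the mean in jamovi.",
]
for label, para in classify_paragraphs(base, variant):
    print(f"{label:>9}: {para}")
```

A report like this only says where to look; deciding whether a difference was a deliberate choice by the variant’s owner remains a human judgement.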

[00:04:44] Lily: It’s a very hard problem, it’s one that scares me a lot and that I’m not even sure how… I’m just taking it bit by bit, all I know for now is that this is something that there’s clearly a need for, that people are creating these variants, and therefore, we want to add some ease to it. If they change the base book, the original book, then what updates do we do from that?

And actually this is a book that has changed. There are bits that have changed in their versions, which I’ve noticed, “okay, you’ve got this part”. Well, it’s, you know, a small statement here or there. It’s like, “oh, you’ve got that one, but you’ve got that one”. And I wonder if that was by choice that you chose to have that one, that certain bullet point.

There’s one in particular where the author refers to themselves as “I” throughout the book. And there’s one point in particular where they refer to themselves as a male in one version, and in another version they say, actually, I don’t refer to myself that way anymore. And actually different versions of the book, different variants of the book, some of them refer to the original and some of them then refer to the updated, with some having this footnote saying, “actually, this has changed, but for the sake of me not having to rewrite the whole book…”

[00:06:08] David: This is one specific piece, but it is exactly that idea of: how do we, when we have multiple variants, make those choices easily accessible? How do we make that updating happen in other ways? This is a hard problem, and I’d love to actually be able to work with all the different partners on this trying to say this is what we want to get towards, that we don’t want to take over these things.

On the contrary, we are wanting to then gradually enable others to take their pieces of ownership and to find this added value that comes from working in a system, which might support some of that ongoing maintenance. And you’ve been using AI for the translation, but one of the other things is, of course, what about an AI agent to help with the process of updating and that decision making?

How do you do this in a way where it becomes conservative and it’s something which is made more accessible, but also at the same time where the human who is taking responsibility continues to be the person responsible? It’s really interesting.
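
One conservative way to think about the updating problem discussed here is as a three-way comparison: the base book as it was, the base book as it is now, and the variant. The Python sketch below is only an assumption about how such a helper might look. It presumes every paragraph carries a stable identifier shared across versions (the keys `p1`, `p2` and so on are invented), and it only produces suggestions, so the person responsible for the variant still makes every decision.

```python
def plan_update(base_old, base_new, variant):
    """Suggest an action for each paragraph id; nothing is applied automatically."""
    plan = {}
    for key, old in base_old.items():
        new = base_new.get(key)   # None if the base removed this paragraph
        mine = variant.get(key)   # None if the variant removed this paragraph
        if new == old or mine == new:
            plan[key] = "keep"              # base unchanged, or variant already matches
        elif mine == old:
            plan[key] = "propose-update"    # only the base changed (or removed) it
        else:
            plan[key] = "ask-maintainer"    # base and variant both diverged
    for key in base_new:
        if key not in base_old:
            plan[key] = "propose-addition"  # paragraph is new in the base
    return plan

# Invented example data, for illustration only.
base_old = {"p1": "Intro text.", "p2": "A worked example.", "p3": "A footnote."}
base_new = {"p1": "Intro text.", "p2": "A revised example.", "p3": "A new footnote.", "p4": "An extra note."}
variant = {"p1": "Intro text for jamovi.", "p2": "A worked example.", "p3": "A jamovi footnote."}

for key, action in plan_update(base_old, base_new, variant).items():
    print(key, action)
```

The "ask-maintainer" case is where an AI agent could draft a suggested merge, with the variant’s owner approving or rejecting it.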

[00:07:21] Lily: There’s a lot of scope. And, as you say, it’s not only relevant to these variants. You know, we could talk about it in different versions of STACK questions, in different versions of apps, and different versions of courses. Just GitHub alone, for anyone that uses GitHub… I guess now we’re gonna go into version control and that’s definitely not a territory that I can have anything of use to say about other than that it seems very convoluted.

[00:07:47] David: I mean, it’s complicated and it is really important. So the version control processes are so important, but now being able to apply those to these textbook variants and versions is very effective.

[00:08:01] Lily: Yes, and here we have a really nice example where we can actually try, where we can actually play around. I’m more applied than you are, you are more theoretical, I’m more applied, I like to actually try it before I can understand it properly. Whereas you can just do it in your head. Something that I alluded to is taking this “learning statistics with R” book, and creating from it two versions, one version being learning statistics and that version being kind of our software agnostic version. I think we’ve spoken about it on quite a few other podcasts about software agnostic and how having a version that doesn’t rely on a specific software can then actually mean that the learning doesn’t become about R or Python or that specific type. The learning instead, the emphasis gets to be on the actual statistics side itself.

[00:08:55] David: This is such a beautiful case to be working on because I believe that the software agnostic variant can only really exist once you have many different software specific variants. And so here we have a case where there is an R version, there’s a Python version, there’s a jamovi, and so on. And so these variants all exist already, and therefore a software agnostic variant can refer to the others.

And I want to come back, and I’m not gonna dig into this as it is something that has been discussed in previous episodes, but in a class, let’s say a postgraduate class, where you have students who have different backgrounds, some of which might have an R background, others might have a Python background, others might have used JASP or jamovi, or R-Instat, to be able to have a textbook where they can use it and they can continue to use the software that they are comfortable in first, but be exposed to other software, and where the class as a whole can be using a diversity of things. This isn’t right in all contexts, but there are certainly contexts within which that is extremely valuable. Next year I’ll be teaching at AIMS, the African Institute for Mathematical Sciences.

[00:10:18] Lily: Next week, not next year.

[00:10:21] David: Oh, sorry. Yes, next week I’ll be teaching at the African Institute for Mathematical Sciences in this doctoral training school, and the profile of the different students just came in, and some of them are advanced R users, some are advanced Python users, there’s this whole diversity.

And of course, the course that I teach, which you’ve taught as well, is a course where it doesn’t matter what you’ve done. What matters is if you can explore and investigate data and whatever tool you use. This is problem solving in statistics and data science. This is a course which I always get nervous about teaching, but I do love it, because you never know what’s gonna happen because it depends who’s in the room, what they can do, what they can already do. What they then share, you know, builds those skills and that awareness.

But it is a perfect example of a software agnostic course. You know, I come in and they say “what should I use for this?” Whatever you want. And some people will use Excel and others will use R and others will use Python and others will use all sorts of other tools.

Now, I think the most I ever had was six different softwares used in the same course by different people. It was great and they all struggled with different things. I love that, I love the fact that the tool does matter for what you are doing, but it doesn’t matter which tool you’re using. There’s a contradiction in that, but it’s really, really powerful.

[00:11:39] Lily: Well, and also to add, you could use multiple tools. You know, Excel has its strengths for some things, pivot tables for example, and filtering, and things like that. But then actually, if you want to get a real kind of power behind your graphics, for me, my go-to would then be R or R-Instat, a kind of R based tool.

[00:11:57] David: Yeah, absolutely, this is exactly right. I like to think of this as a language. If your software is a type of language, for people who only speak one language, it’s really hard to learn another. But some people who already speak multiple languages find that they can more easily pick up other languages because they’re used to sort of listening in a different way and listening and picking up. And this is the same with using different tools for data analysis.

If you’re used to using multiple tools, it is easier to then pick up and add another tool to your toolbox. That’s very powerful. So yeah, I’m really excited, I’m looking forward to teaching that course, I always enjoy it. It’s probably gonna be one of the last times I get to do this simply ’cause it’s hard to spare the week. I haven’t done this in a few years, and I’m doing it this time because John’s finishing his PhD and it’s a good time to go into Rwanda and try and tie off things there. But I don’t know when I’ll next get the chance to do this. I do enjoy it.

[00:12:56] Lily: Definitely. I mean, as we’ve spoken about it on the podcast before, James and I taught it a couple of years ago, and it was very enjoyable, and very interesting.

But anyway, let’s go back to us wanting to create these two variants, one being this software agnostic version, which, from having these current variants in R, in jamovi, in Python and so forth, we want to see if we can work out how to create, and then the second variant being one that uses R-Instat.

[00:13:30] David: And this would be a good test to see, is this something where there is work we need to do on R-Instat to be able to cover the material? Are there gaps? Are there things which are within the content which are not currently prioritised in R-Instat? My guess is for the modeling, there will be gaps.

[00:13:48] Lily: Yes. Yes. I think that that’s a pretty safe guess. But in the “Prepare” side and in the “Describe” side, with graphics and all, I’ll be surprised if there were gaps. It’ll be interesting. I’d love to know if there are gaps. I guess we could be creating that kind of simple book first and then even expanding it in our own way, if we wanted to.

I was talking to Roger, who works on R-Instat as well about this, and he was digging into the data sets and he was saying he’s a bit disappointed, because a lot of the data sets that they use in the book are quite small. But, well, we can use our own data sets, though. We don’t have to follow the book’s data sets, and we can put in it ones that we enjoy and ones that we feel are quite useful.

[00:14:28] David: Absolutely. And this is exactly where these open textbooks are so powerful at being able to sort of say, “well, we don’t need to change the textbook, but we could change, or insert, an example, which actually could then go through”. You could have a variant of the jamovi variant, which uses that data set, which would be really exciting.

This is something where these things can go in parallel in ways which are really interesting and exciting. That these variants can then live alongside each other and potentially combine in exciting and interesting ways, that’s what I’m hoping can emerge from this process. You’ve kickstarted this with the amazing work you’ve done getting these 20 books into this technology. But I think it’s just the beginning. I hope we will be able to get the community more engaged and involved in collaboratively contributing to these things more.

And what I love about the example of the learning statistics with R book is that the authors have been very positive about these variants, they refer to them in the book. They’re really proud of the fact that other people have taken their work and built on it. And that’s what we want. The reusability is something which is often neglected within the open community, where instead we really want to have that reusability, where we want to be building a community around this, a whole community that could emerge around the book, which builds it and builds on it in interesting and exciting ways.

[00:15:58] Lily: Definitely. The only thing that then came to mind as you were saying that was: it’s a big task to create a structure. Developing different variants is not as big of a deal. But on top of that, we don’t want it to become confusing or overwhelming for the receiver.

If we have these different versions, hypothetically, let’s say that there’s this version, the jamovi variant with these examples and the jamovi variant with those other examples, then which version are you using? There is then that fear that we could create some discourse or just be adding another layer of complexity for users, creating this confusion. Say taking your example of having the jamovi variant that there currently is, and then the jamovi variant that then uses these new examples. Then there’s two jamovi variants. And then does that get confusing for the student, for the person reading it? If you’re trying to compare books, if you are talking to each other and you both know that you’re using a jamovi variant, but there’s now two jamovi variants, so, which one are you using?

[00:17:07] David: Well, but this is where my hope is that confusion is desirable in the sense that if we get to the stage where we always want uniformity, then we’re sort of removing the diversity. I would love it if there are variants in the future where, for example, you have a variant which is more adapted to biology students or more adapted to psychology students. And then, within that, what about Kenyan psychology students versus American psychology students?

So this desire for multiple variants, you could take this much further. Actually, I believe a single authoring perspective where you have the book is the reality we’ve lived in for so long. What does it look like when you have many tailored variants, but where you actually have this element that all these tailored variants are equivalent in terms of, let’s say, the content which is covered and the structures? And they’re all sort of approved: whichever of these variants you are using, if you’ve gone through it, you should get the same statistical concepts or the same concepts in data science.

That’s what’s so powerful. Then it doesn’t matter which variant. And the fact that there are different variants might mean that different students might then engage with multiple variants because they might find that, actually, it helps them to see the same concept presented in these different ways.

All of this is something which is not possible to do unless you get a community behind it. But the bigger your community that’s engaging in this process, the more variants you could potentially have and the more people you could be serving in ways which are tailored to them. And hopefully also the more minority edge cases could be deeply served.

So, you know, what if there was a variant which was very specifically, I don’t know, related to – pick your favorite narrow field – marine biology, it is not that narrow, it’s quite a big field, but it isn’t something which is mainstream in other ways. Imagine you now have your marine biology variant, and because of the way the structures work, well, they can do that with jamovi or they can do that with R or they can do it with Python and so on.

So you still get that potential, these things can overlap and coexist in different ways. This is the dream, that the marine biology community can be served without it then being tied into, let’s say, the choice of a particular person who’s done that tailoring to a particular software choice. They might have chosen R because that’s what they’re comfortable with, but then their students might want to use jamovi, or R-Instat, or Python.

This ability that the person creating the marine biology variant is not the person who has to be the expert at understanding all of the software components, those can exist in parallel. That’s the dream. Now, at the moment, the structures to do that and to maintain that and to build that ecosystem and to enable people to engage in those ways don’t exist.

But there has been work within the Statistics education community on tailoring these books to specific audiences, on having the books tailored to specific software. And these are things which have been shown to add value, but this adds complexity. And that’s what I think we could actually dig into and really enable people to lean into: that diversity, rather than everybody wanting the same textbook. Everybody having variants of a common textbook is an interesting alternative.

[00:20:57] Lily: No, definitely. That’s a very nice idea, a very kind of different future.

[00:21:01] David: And what you have demonstrated is that this is a future which is possible to imagine as a community effort enhanced by AI. And the key here, and this will be familiar to those who have listened to lots of other episodes, is this idea that the narrative, the dominant narrative around AI is about it taking jobs, about it taking over what people do, whereas this alternative narrative, specifically around the generative AI possibilities that now exist, is that they could and they are enabling us to bring communities together and to work more with communities of people in ways that would have been difficult otherwise, because the AI can be part of bridging some of these gaps. The gaps of, as you’ve already gone through, what was it written in, what was it authored in? And you’ve converted that into a common authoring system using AI so that these things can now be more interoperable with one another.

This is the sort of thing where thinking of AI as being something which is enhancing human collaboration, that’s what I hope we’re demonstrating with what you started: a small textbook project, but a very exciting one. And it’s not small when you’re in it, sorry.

[00:22:30] Lily: Oh, no, it’s a fun little project though, as you say. The only thing I want to add there about the AI is: yes, it’s very, very quick to get your R Markdown version into a PreTeXt version, convert this .Rmd file into a .ptx file. It’s very easy to start the robots to do that. They can enjoy changing the language and the tone. The bit that’s time consuming is checking it.

[00:22:54] David: Absolutely. Doing it responsibly, and being able to check through and so on, this is where that human effort is still there. And that need for the human expertise to be able to have that direction to take responsibility. This is critical.
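
Part of the checking Lily describes could be made mechanical. The Python sketch below is a hedged example of one such check, not a description of the workflow actually used: it confirms that a converted file parses as XML and reports any headings from the R Markdown source whose titles do not appear in the PreTeXt output. It assumes a single-file book in which level-1 headings became `<chapter>` elements and level-2 headings became `<section>` elements; the helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

FENCE = "`" * 3  # marker that opens and closes a code chunk in R Markdown
HEADING = re.compile(r"^#{1,2}\s+(.*?)\s*(\{.*\})?\s*$")

def rmd_headings(rmd_text):
    """Titles of level-1 and level-2 headings, ignoring lines inside code chunks."""
    titles, in_fence = [], False
    for line in rmd_text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if not in_fence:
            match = HEADING.match(line)
            if match:
                titles.append(match.group(1))
    return titles

def ptx_titles(ptx_text):
    """Titles of chapter and section elements, in document order."""
    root = ET.fromstring(ptx_text)  # raises ET.ParseError if not well-formed
    titles = []
    for element in root.iter():
        if element.tag in ("chapter", "section"):
            title = element.find("title")
            if title is not None:
                titles.append("".join(title.itertext()).strip())
    return titles

def missing_headings(rmd_text, ptx_text):
    """Source headings whose titles do not appear in the converted file."""
    converted = set(ptx_titles(ptx_text))
    return [t for t in rmd_headings(rmd_text) if t not in converted]
```

A check like this only catches structural slips such as a dropped section. Changes in wording or tone introduced during conversion still need a human reader.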

[00:23:11] Lily: Well, thank you very much. This has been a very exciting conversation.

[00:23:14] David: It’s great. The progress you are making is absolutely incredible. I really look forward to this moving forward in interesting ways, and I’m sure we’ll keep talking about this in the coming months as the next innovations happen.

[00:23:28] Lily: Yes. Yeah.

[00:23:29] David: Thank you.

[00:23:30] Lily: Thank you.