163 – Rethinking Statistics Education

The IDEMS Podcast

Description

Lily Clements and David Stern explore the future of statistics education through the lens of George Cobb’s influential 2015 paper, “Mere Renovation is Too Little Too Late: We Need to Rethink Our Undergraduate Curriculum from the Ground Up”. They discuss key imperatives from Cobb’s work, such as flattening prerequisites, seeking depth, embracing computation, exploiting context, and teaching through research. 

[00:00:07] Lily: Hello and welcome to the IDEMS podcast. I’m Lily Clements, a Data Scientist, and I’m here with David Stern, a founding director of IDEMS. Hi David.

 

[00:00:14] David: Hi, what are we discussing today?

 

[00:00:17] Lily: I thought we would discuss statistics education, and more specifically this paper that you’ve brought my attention to.

 

[00:00:26] David: Cobb, I presume it’s a paper about undergraduate education.

 

[00:00:30] Lily: Yes. Cobb 2015, Mere Renovation is Too Little Too Late: We Need to Rethink our Undergraduate Curriculum from the Ground Up.

 

[00:00:37] David: Yeah. It’s a really interesting paper and it’s one which is now 10 years old, of course, which is interesting in its own right. He articulated a lot of this so well 10 years ago. Almost five years ago now, we wrote a paper based on it, related to our experiences in African statistics education, which looked beyond just undergraduate education but built a little bit on these ideas. We really took a lot from this paper.

 

It’s a paper, which I like a lot, and it articulates quite clearly what I think are some of the sort of foundational changes we need to see in terms of how data is central to education and should be central going forward.

 

[00:01:25] Lily: So me, you and James had a discussion very recently in a previous podcast where we touched on this aspect of starting from data when it comes to statistics education. Are these ideas from Cobb’s paper, or are these ideas that have preceded Cobb’s paper, but that he’s articulated well?

 

[00:01:46] David: Cobb, I think, has articulated some of these ideas well. We’ve been part of a community, quite a large community of statistics educators, who have been going in this direction for quite a long time. In the US this was the GAISE reports. In New Zealand they’ve had this amazing set of statistics educators who have really reformed the education curriculum over almost 30 years now, certainly over 20 years. And so there are really powerful learnings coming out of what the New Zealand team have done.

 

And more generally, this actually dates back quite a long way, to my father’s era when he was a statistician growing up and the idea of applied statistics was just coming out, and applied statistics was trying to get a name for itself independently from theoretical statistics. And I think that’s an interesting starting point: if you think historically, statistics was considered a branch of mathematics, and for a long time applied statisticians, and then statisticians more broadly, actually fought for their independence from mathematics.

 

Not because mathematics isn’t important for statistics, but because if you take a mathematical framework, then it overpowers the data. You worry about the algorithms, you worry about different things than if you are taking a sort of data lens on statistics. And that’s where Applied Statistics was wanting that more data lens. This is an old discussion.

 

[00:03:32] Lily: Interesting. So my experience of statisticians is statisticians versus data scientists, say. And it’s a very limited experience, but I would find that statisticians would be more focused on the methods, and data scientists more on, well, the data.

 

[00:03:51] David: No, this is what’s interesting.

 

[00:03:52] Lily: Okay.

 

[00:03:53] David: I would argue that data scientists emerged because the theoretical statisticians were very mathematically focused, and they weren’t interested in the things which were more computational. And what’s so interesting is data scientists seem to have been captured by the computational side. They’re really coming out of computer science.

 

And a lot of what’s happened there is, I would argue that the coding and the computer science side, the computational side, took precedence. Now what’s really interesting is even in mathematics now, there’s areas where you are getting that interplay between the computational side of proof and mathematical proof.

 

But in data science, this became a very big distinction, and data science became very independent from statistics. But I would argue it suffered from the same problem of not putting data first. It put the computations, the machine learning methods first, and not the data. Now, the importance of data science and the machine learning methods and so on is essential, and this is mentioned in Cobb’s paper. The importance of computation is one of the things he draws out.

 

[00:05:11] Lily: Interesting. So you’re saying in both situations, whether it’s data science or statistics, that we’re still having this problem of the data not being put first.

 

[00:05:23] David: That’s what I feel and I feel that quite strongly in different contexts. People would claim you can’t do statistics unless you understand the mathematical foundations. And similarly, they’d argue you can’t do data science unless you can code up the sort of solutions in different ways.

 

And I think both of these are wrong. I love teaching people to code at a young age, and I love the idea of people being able to code things up. I love the foundational mathematics, I’m a mathematician at heart. But I do think that the data skills are somehow independent from these two, maybe not independent, but they’re interdependent and they are separate from these two. And I think there isn’t enough emphasis on this.

 

And one of the ways in which Cobb articulates this so beautifully is that he says you can flatten prerequisites if you take this more data focused approach because you don’t need to have all the mathematical buildup of the theory before you can do stuff. You can use a generalized linear model before you understand the mathematics behind it, as a very simple example.

 

[00:06:31] Lily: Nice. So it makes it a lot more accessible to more people because you don’t need to worry about this kind of mathematical side. But surely it’s still important to understand what’s going on.

 

[00:06:45] David: Absolutely. And the point which he draws out articulately, which I think is so powerful, is that this approach can lead to a deeper understanding. Your understanding doesn’t necessarily come from the mathematics or from coding things up.

 

[00:07:01] Lily: That’s true.

 

[00:07:02] David: And so neither of those necessarily lead to a deeper understanding. And we discussed this a bit with James in the previous episode. Both of you had these instances of having gone through and understood concepts theoretically and suddenly finding you can’t answer simple estimation questions or simple questions, which require a different level of understanding of a concept, which is a simple concept.

 

And I think that the key in many ways is to recognise that there isn’t a straight line of dependency between these three. I believe all three are powerful and valuable skills to have. I love abstract mathematics and the value that brings and the understanding that brings. I learned to code when I was 10, and I really appreciate that particular skillset and what it’s given me in life.

 

Arguably I learned about data later in life, even though my father was a statistician, because nobody had this data focused approach. But there were elements to which I was exposed quite early on in ways that were sort of implicit rather than explicit. And I believe it’s that third part that we need to really work on, to say how can we teach these data skills, this real understanding that comes from data, so that it can become a full partner alongside the computational skills and alongside the mathematical skills. And you have these three core skills, which would then be complementing one another.

 

[00:08:30] Lily: Nice. And I guess then from there the question to me is that, this is a paper from 10 years ago. You’ve written on it in the African context five years ago. What are we doing about it? How do we rethink this curriculum from the ground up?

 

[00:08:46] David: So a few people have tried to do elements of what he’s put in, and it is a paper which has been heard in certain contexts, but it’s actually not easy. I think it’s worth actually going through. He has in the paper a lot of interesting things, this element about the role of mathematics and so on, that’s all part of it.

 

But I think maybe it’s worth actually going through the five, trying to think how he articulated them, I think he articulated it as imperatives. Five imperatives is my memory that he’s got. I mentioned flattening prerequisites, and I’m sure that’s one of them. There’s this element of actually, as you put it, wanting to understand deeply was another one, that it’s not that you are just trying to be able to do something technical, you are actually trying to build that depth of understanding.

 

And, you know, this has a number of different elements to it. Then we’ve also mentioned the fact that the computation side is important, and we’ve distinguished that from, if you want, the data side and the mathematics side. So in some ways saying that computation is important was another one of his important things.

 

But the ones that I really love were the two that I think we haven’t mentioned. One was related to context, you can’t think of data outside of the context within which that data was collected or it exists or what it represents. And actually the importance of context for data is central. Whereas the whole point of mathematics is the abstraction from context. And once you are doing the computations, you know the context doesn’t matter. But data, it does matter that you are in a context.

 

And then he had a final one, which is maybe slightly less relevant here because it was specific to the undergraduate approach, but he was arguing that research is an important part of it, that you know right from when you start, statistics and data are there to really enable research. I think he felt this was the key one. And I think from an undergraduate perspective, he’s probably right. This is what ties a lot together, that you are actually doing things, once you are working with data, you are working in context, you are doing research.

 

And so I think there is an element there that is more general than just the undergraduate, that project-based learning, actually learning by doing, is something which is very powerful. And from a data perspective with statistics, the reason that we have data and we do statistics is to be able to learn and understand about the world. These were the five, he did call them imperatives, I’m sure.

 

[00:11:36] Lily: You’re right. The five principles of Cobb, which he formulates as imperatives, are: flatten prerequisites, seek depth, embrace computation, exploit context, and teach through research, as you say.

 

[00:11:46] David: Yeah, it’s a really powerful set of five I found, which is why we used this framework in our paper afterwards. He also had a couple of caveats, didn’t he? One of them was that you need to think about curriculums in context or locally or something like that?

 

[00:12:02] Lily: Yes. I’ve got here “all curriculum is local”. I’ve got your paper in front of me, which is where I’m getting this from, but “all curriculum is local” is one of the two caveats that you mentioned.

 

[00:12:14] David: Yes, and that’s a really interesting one, and it’s one which we’ve actually thought very hard about. This idea that when we are looking to implement these sorts of things, you need to be working with local institutions, you need to be tailoring it to the context you are working in. It’s a really interesting one, and if we think of the work that we mentioned with James, it’s this aspect that you don’t just want to think about the resources or the curriculum, you want to make them open educational resources so they can be adapted to different contexts. And that’s really central to this idea.

 

[00:12:49] Lily: Sure. You want to be able to scale, but with it still being contextual.

 

[00:12:55] David: Exactly, it’s that ability to adapt rather than adopt.

 

[00:12:59] Lily: Yeah.

 

[00:13:00] David: This is true in so many different domains in different areas, but it’s particularly relevant in this area of education where you want that education to be locally relevant. I think the other one was something about institutionalization, is that right?

 

[00:13:15] Lily: Yeah, for change to endure, you have to institutionalize it.

 

[00:13:18] David: Yes. Yes, that’s exactly right. This is really important, and this is what we’re doing, working with institutions always to do this where it’s always collaborative. It’s not about what you build, it’s about how this is built into institutions. And this is something which personally, I got a lot of insights about this even before I read Cobb’s paper, when I was embedded in a local Kenyan university and I understood through that process how once you institutionalize something, if you choose that moment to get something into the institution, then actually the next steps, even if they don’t happen as you would have liked, there is a longevity to it.

 

The degree programs that I launched in Kenya, 15 years ago now, they’ve gone through their whole journey. They’ve had iterations, they’ve had their ups and downs, they’ve had their problems. But, because they were institutional changes, they’ve had a longevity which is totally different to anything I did in the classroom itself. And that’s really interesting, and it’s certainly at the heart of the collaborative approach we have now to work with institutions to enable them to develop new curriculum, as we’re doing with the Open University at the moment.

 

[00:14:38] Lily: How can you be sure that it’s being kind of taught as you envisaged it or taught in the way that you want it to be taught?

 

[00:14:46] David: Ah, very good question. You can’t.

 

[00:14:49] Lily: Okay.

 

[00:14:49] David: And this is something which has been very hard and difficult for me to accept in many ways. The changes I made to the curriculum in Maseno University, there were times at which I was despairing because I’d put really careful thought into how these things should be taught, and they weren’t being taught as I wanted them to be. And what’s really interesting is that, actually, some of this is a generational thing. Some of the people who have gone through those processes were not taught as they should have been.

 

So we looked at the curriculum and we expected it to be better, and I said it was designed to be that, but who’s going to be the person to implement it? And some of them have taken up that challenge and now are in the position to be part of giving that education to the next generation. And that’s what happens when you institutionalize it. It might take 5, 10 years before it’s achieved, but there’s always the possibility for it to be picked up and used like that and taught as it was intended.

 

But a lot of the ideas that we’ve had come from way back then. Fifteen years ago when I was designing these curricula, I was already thinking about making that curriculum software agnostic, which is something we’ve never really written up, except that I believe it’s appearing in one of your recent publications for a conference.

 

But this is something we’ve discussed time and time again over the years. And it’s something which for me, I’ve been thinking about and pushing towards for 15 years. And it takes that sort of time for it to gradually enter into a wider context, a wider set of people, a wider realisation that if you make the software the key, then people learn the software. If you do it in a way which is software agnostic, then you can focus on the concepts. What is it you are teaching about the actual substance?

 

[00:16:45] Lily: Nice. Yeah, and so again, it’s about not putting those kinds of methods first in terms of how you do it, but more the focus is on getting it done. I just got off a call with one of our INNODEMS team members, who has been writing some code in R for work that she’s been doing. She can understand R, but she’s not trained in R. But she wants to write this code, and so we were looking through it together to find the bug, and it was really well written because she’s had the tools that we have these days helping her to write it. So I think now more than ever we can be a lot more software agnostic; we don’t need to have these coding skills as much to be able to achieve what we want to.

 

[00:17:33] David: All the different trends should point in this direction. This is where I feel data science is making the same mistake statistics made all those years ago. And when I say mistake, they are products of their time. When statistics was emerging as a really dominant discipline, looking at data, it was transformative and you didn’t have much data, data was scarce. It was a time of data scarcity, and you needed to get the best possible value out of that data. And quite often the challenge in doing that was a mathematical challenge of actually understanding. If you don’t understand the mathematics, you don’t know how to do the simple things that would enable you to get the best value out of a small amount of data.

 

Then suddenly computational power became available, data became available very widely. Data science emerged as a discipline. And when it emerged, the challenge was to actually implement these routines that could use this abundance of data. And so it became a computational challenge. Whereas I think what we are getting to now is very much this point where both of those are still valuable, they’re still useful in certain contexts.

 

But data itself is abundant enough, and it’s used widely enough that the skills around it, and about how it relates to context, these can be taught independently. And then they can be brought together with these other skills.

 

[00:19:12] Lily: So I wonder if I should open up the kind of whole can of worms about what we are trying to do about it.

 

[00:19:18] David: Okay, that’s a big can of worms. And it might be that’s another episode. But what I can say is that part of what I think we are trying to do, I like phrasing it like this, we’re trying to liberate the data skills from their mathematical and computational foundations.

 

[00:19:36] Lily: Okay.

 

[00:19:36] David: We are not trying to get rid of the mathematical foundations nor the computational foundations, but we are trying to liberate the data skills from those foundations. And the reason for that is that I believe very strongly that if we look forward to a very future-looking education, these three pillars, if they are seen as strong, independent pillars, should together give us more than any one can give independently.

 

I would love to see, for example, and this is something, it’s been over 10 years that I’ve been thinking about this. If I looked at the maths curriculum of the future, which should be taught through school, I don’t believe calculations should be the central pillar leading to the progression of mathematical concepts because calculation is no longer the limiting factor. We have tools, computers, that can do that.

 

And so the question is you still need progression. What might that central pillar become? And what’s so interesting is in New Zealand, they have got this independent data or statistics stream from the first year of primary all the way through schooling. And they’re struggling, but they are now working on the progression. One of the problems they had with the initial implementation was that it didn’t have a natural progression because it didn’t have calculations in natural progression.

 

So they’re working on actually saying what is the natural progression that emerges from this, so that we have a sense of a linear sequence, which is a progression from what you can do as a six-year-old to what you can do as an 18-year-old. And what’s amazing to me is that when I think about that progression I can see this as informing the mathematics curriculum. I could see that we could be building the progression of mathematics as an abstraction of the tools needed to understand data more and more deeply. The seeking depth is having the depth in your understanding of your data.

 

When you first look at things, you can look at them very simplistically, and then you add depth and you add layers, and as you add that depth and concepts, actually you are exposed to tools which have a much deeper mathematical foundation. We mentioned, for example, generalized linear models. What’s so interesting, of course, mathematically is that in mathematics or in statistics, you wouldn’t get exposed to those concepts until maybe postgraduate education, if you’re lucky.

 

[00:22:33] Lily: Yeah.

 

[00:22:34] David: However, as an idea, if you introduce data first, you could maybe introduce them very early on. And then this is a motivation for actually saying mathematically, what do we need to do this? We need to be able to measure things in different ways. So you could be introducing Measure Theory in schools as a sort of concept because you are introducing it from a needs based perspective rather than from the perspective of mathematical formalism.

 

And so because you are starting with the needs and not the formalism, it can be introduced much earlier, and the formalism can then follow. So your mathematical curriculum can then provide you with the depth of understanding related to the formalisms you are needing as you go through your different needs.

 

So I’ve deliberately not talked about what we’re doing, but I have talked a little bit about this bigger picture: if we want a future looking curriculum, it’s almost certainly not the computation or the mathematical abstraction, it’s not the calculation part of the mathematical abstraction, which is the skill that needs to be widely held in the population. The skill which everyone needs is the data skills: to be able to interpret data, not to be misled, to be able to use evidence and understand evidence.

 

That’s a population level skill. We need everyone to be good at that. And then we want some people to be excellent at the computational side, and we want some people to be excellent at the mathematical side. And so if you think of that trio, then suddenly we can think of educational reform at a much bigger level in a way, which would be transformative.

 

Now, we are not working directly towards that yet, but those ideas are what is behind everything we are doing. These ideas that data skills can be liberated from the mathematics and from the computational side, and that if you want to go deep, you then do need the computational skills and the mathematical skills alongside them. That’s the foundation of everything we are doing.

 

So this is at the heart of where, well, in the episode where we dig into what are the different things we’re doing, this is the heart of it, the foundational idea, the foundational position of data science and data literacy. So maybe this is data literacy rather than data science, data literacy, as this central pillar to our understanding of the world, and to education. It can be accessible to all. This should be a skill for everyone.

 

[00:25:20] Lily: Excellent. Thank you very much, David. Is there anything else to add or summarise on?

 

[00:25:25] David: I look forward to digging into discussing the actual things we’re working on, and that can be another episode.

 

[00:25:31] Lily: Great. Looking forward to it. Thank you very much.

 

[00:25:34] David: Thanks.