128 – Experiences of Using STACK for Data Skills – IDEMS International Community Interest Company (CIC)

The IDEMS Podcast

Description

In this episode, Lily and Santiago discuss their work with STACK for developing data skills. They share their experiences using STACK to create interactive and personalized questions for teaching statistical concepts. They reflect on different experiences using STACK questions on data with students, highlight the platform’s ability to foster understanding through feedback and mastery learning, delve into the challenges of question design, and discuss student reception and performance improvements.

Transcript

[00:00:00] Lily: Hello, and welcome to the IDEMS podcast. I’m Lily Clements, a Data Scientist, and I’m here today with Santiago. He’s a collaborator with IDEMS, and he used to do a lot of work with IDEMS, particularly in education. Hi, Santiago.

[00:00:20] Santiago: Hi, Lily. Yes, indeed, I am currently a collaborator, on leave because I left to spend a bit of time working at a secondary school here in Argentina. But I am coming back to IDEMS later in the year as we discussed with David in several episodes now, so the regular audience should be familiar with my story by now, hopefully. But yes, I’m currently a collaborator and I’m keen to discuss some of the work we’ve done sort of together on data questions in STACK.

[00:01:00] Lily: Absolutely. Great that you will be coming back, and hopefully that will mean that your introduction will be a little bit slicker in the future when someone has to introduce your role. But also that we’ve both worked in that kind of education side, we’ve both worked on STACK. But we don’t really work together as much, which is interesting.

[00:01:18] Santiago: No, and we took very different roles in that STACK work. So, we’ve had several episodes about STACK before. So, anyone new to STACK can look up episodes that we had in the past. But essentially, it’s a platform that we use to create questions that allow for personalised feedback detecting mistakes and so on. It has a lot of other features that we may mention, but it’s essentially a platform to ask mathematical questions.

[00:02:00] Lily: Yes, absolutely. So then on how I use it as a data scientist is a lot more on those kind of statistical questions.

[00:02:07] Santiago: And we’ve done a lot of work on questions for statistics. And James Musyoka was the instigator for that work, defined some questions he wanted implemented motivated from a statistics online textbook called CAST that as well we’ve discussed in a few episodes, and he defined these questions and I had the difficult job of implementing them.

Normally, a standard question in STACK takes about two hours to author, a good question. But the type of questions that James was selecting were initially for descriptive statistics. And for descriptive statistics, you really need graphs, you need visualisations of data. Please correct me if I’m wrong, I’m not an expert. I get experts to tell me what to do and I then do it.

But just to give an example, there’s one question where there’s data displayed as a dot plot with a line for the mean and the median, and you can move around that line to place the mean and the median. The data has some outliers, you have to think carefully about what those outliers would do to the mean and the median and place them correctly with respect to each other.

[00:03:44] Lily: Yes, exactly. And what I really like about that mean and median question, so you’ve got this kind of plot, if you imagine it, of all these different dots on the plot, and you move this vertical line to show, okay, where is the mean of all of my points, and where’s the median of them?

And I think that it’s a really great way that you can actually, talk about skewness, and talk about, okay, well the mean’s going to be greater in this situation, and the median will be greater in this situation. But this question here, it’s about knowing the definition of when will the mean be greater? When will the median be greater?

There’s no values which are given to you. So you can’t sit and calculate what the mean and median are. It’s just up to you to kind of actually understand those concepts behind it to then be able to really grasp, actually, will the mean be greater here? Will the median be greater here? That’s something that I think, especially it being mastery, so you can do it lots of times, really adds to.

[00:04:44] Santiago: Yes, and again, I’m not an expert, but my understanding is that traditional data courses, they’re very much about calculations and carry out this calculation. With this question, it’s so much more about understanding and being able to interpret the graphical representation of the data. As you said, you can’t calculate because you don’t have the values. So it’s really very much about focusing on the shape of the distribution.

[00:05:18] Lily: Which then really helps you to build that kind of intuition and that understanding because I absolutely agree. I mean, my experience is, it’s been a lot more about the definitions behind it, the calculations behind it. That really reminds me of another question that we have on STACK, which might have been mentioned before, but when I was in Cameroon…

[00:05:37] Santiago: Sorry, let me interrupt because before moving on to this other question.

[00:05:41] Lily: Yeah.

[00:05:41] Santiago: I’d like to highlight that with that particular question, one of the things that is great about it is that we designed it so that the feedback focuses not so much on the correct value of the mean and the median, but on the relevant positioning of each other and tells you, okay, the mean should be more than the median or the mean should be less than the median. Look at these points, what effect do they have on each of these averages?

And that, if a student gets it wrong, getting that sort of feedback will allow them to try again, as you were saying with mastery, and try again, having had specific feedback about the mistake they made.
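The feedback idea described here, checking where the mean sits relative to the median rather than checking exact values, can be sketched in a few lines of Python. This is only an illustration and not the STACK implementation: the function name `relative_position_feedback`, the sample data and the feedback wording are all assumptions made for the example.

```python
from statistics import mean, median


def sign(x):
    """Return 1, 0 or -1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def relative_position_feedback(data, placed_mean, placed_median):
    """Give feedback on the ordering of the two placed lines, not their exact values.

    Hypothetical helper: the names and messages are illustrative only.
    """
    expected = sign(mean(data) - median(data))
    placed = sign(placed_mean - placed_median)
    if placed == expected:
        return "Correct ordering of the mean and the median."
    if expected > 0:
        return "The mean should be greater than the median: look at the large outliers."
    if expected < 0:
        return "The mean should be less than the median: look at the small outliers."
    return "The mean and the median should coincide for this data."


# Made-up data with two large outliers that pull the mean upwards.
data = [2, 3, 3, 4, 4, 5, 20, 25]
print(relative_position_feedback(data, placed_mean=4, placed_median=8))
```

A student who places the mean below the median for this data is told which way round the two should be and which points to look at, and can then try again.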

[00:06:30] Lily: Exactly, yeah, that’s a really nice point and not one that I’ve really thought about before in that way, because this was a question that was written before I kind of started on the STACK side and education side of data. But that’s really nice, and there’s so much more thought that goes into these questions than I realise. But you’re right, because it doesn’t actually, the point of the question isn’t, okay, well, you’ve got the mean spot on, well done. That’s not the point. The point is did you get that relation between the mean and the median correct here?

[00:07:00] Santiago: Yes, and a lot of credit has to go to the authors of CAST, because that was originally a CAST question that we then implemented in STACK. So most of the credit goes to them for creating such a good question.

[00:07:16] Lily: This is what I like about it, is it’s a simple question as well. It’s a good question, but it’s just, it’s simple. It’s just, here’s a plot, here’s a bunch of values, like on a plot, you just move these lines to say where the mean is in relation to the median, essentially.

[00:07:31] Santiago: Yes. And of course, being able to present things online gives that potential for having the question interactive. The students actually have to move the bar. It’s not a static question where you have to write your answer. Because it would be a very different question in a textbook, I believe.

[00:07:49] Lily: Really making the use of it being online and really making the use of the mastery aspect of STACK as well, of the, okay, try again and try again, and this interactivity.

[00:08:01] Santiago: Yes, and you were mentioning that and other questions that you’ve used before. And I’m really interested because we never had a chance to delve into the use, or at least I haven’t had a chance as the author, how did it work out with students? What impact did it have? And maybe not that question, but others of the interactive data questions. Is there anything you can tell me about actually being in front of students using these sorts of questions?

[00:08:34] Lily: No, no, absolutely. And I guess that does kind of link into my story. So firstly, this kind of other question, and then that really links into what your question is. So we’re teaching these kind of students who are incredibly bright, incredibly good at AIMS Cameroon. And Francis, who actually, he went to AIMS himself, but he’s a director of GHAIDEMS, IDEMS’ Ghana equivalent, I guess, what word do I want?

[00:08:58] Santiago: It was thought of originally as a potential subsidiary, but it was best to create a sister company, I think we call it, for a whole bunch of reasons that go well beyond my pay grade.

[00:09:14] Lily: Yes, that’s all I need, a sister company. We were lecturing together in AIMS Cameroon. And one day he turns to all the students at the start and he puts up a STACK question on the board and he goes, here’s some data. Now estimate the standard deviation from it.

And again, it was a graph, I think it was a dot plot again, and the students looked very confused and were like, what do you mean estimate the standard deviation? We don’t have the values, there’s so many dots up here, this is going to take us so long and that kind of thought process.

And I was looking at Francis as well, very confused, and Francis, what are you doing? This is, this is rogue, this is off topic, this is difficult, I don’t know how to estimate the standard deviation, and I had a doctorate in statistics. And I was like, Francis you’re being so mean here.

And as it turned out, this is actually a really simple question and a really simple way to do it and that, to me, links exactly to what you were saying before about how actually statistics is often taught, it’s taught very kind of, here’s the formula.

Whereas actually doing this STACK question really just showed, oh no, you can work it out, it actually can be quite intuitive, you can build that intuition, you can really understand this way, what standard deviation means. And some of the students actually got it as well so I learned a lot, like I learned a lot that day, but it was a really interesting insight, and then using this STACK question that then the students were able to answer it, try it again, like, now I understand what I’m meant to do. Let’s try it again.
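The episode does not say which method the question relies on. One common rule of thumb, assumed here purely for illustration, is that for roughly bell-shaped data about 95% of values lie within two standard deviations of the mean, so the range divided by four gives a rough estimate that can be read straight off a dot plot. The sketch below compares that estimate with the computed value on made-up data; the function name `rough_sd_from_range` is hypothetical.

```python
from statistics import stdev


def rough_sd_from_range(values):
    """Rule-of-thumb estimate: range / 4 (an assumption, not the STACK question's method)."""
    return (max(values) - min(values)) / 4


# Made-up, roughly symmetric data standing in for the dots on the plot.
dots = [2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8]

rough = rough_sd_from_range(dots)
exact = stdev(dots)  # sample standard deviation from the formula
print(f"rough estimate: {rough:.2f}, computed: {exact:.2f}")
```

For this data the rough estimate is 1.5 against a computed sample standard deviation of about 1.63, which is the kind of agreement that makes estimating by eye reasonable.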

And I really enjoyed that question. Then linking to your point of kind of like how has this gone down when we’ve done STACK, when we were in AIMS Cameroon, we were giving them STACK questions to do, and then they’ll have a quiz. Each week they would have a quiz, or more or less each week; it was a three-week course. I think that they had three or four quizzes over the course of the three weeks.

And they could practice these quizzes using STACK questions, and then they’ll do the quiz in class one of the days. And we could see that the students, when Francis and I dug into the data, we could see that the students who had tried it out using STACK many times before performed much better than the students who hadn’t tried those questions before.

And to me, this just shows that kind of power of those mastery questions, the power of STACK and that kind of having this very simple tool there.

[00:11:55] Santiago: Sorry, what do you mean by having used these questions in STACK before? Did some of the students have experience of some of these questions before that course?

[00:12:04] Lily: No, sorry, I’ve not been clear. So, they could access the STACK questions outside of class, and do those STACK questions outside of class. And then four times over the three weeks that we were there, they’ll sit down and they’ll have a quiz. And these quizzes were essentially those STACK questions.

[00:12:24] Santiago: I see.

[00:12:25] Lily: And so the students that had in their own time practiced the STACK questions then performed way better on the quizzes. Now maybe that’s because the students, you know, stats here, maybe there’s no direct link. We can’t say that there’s a direct link there because we obviously don’t have a control. So it could just be that the students that are better at stats are going to be the students that want to do these STACK questions, and then they’re also going to be the students that are going to perform better in the quiz. I don’t know.

I know that James did do a little bit more on it. James Musyoka, who you mentioned earlier, he did a little bit more on it and found that, again, with his students, and he uses STACK to teach, or used STACK to teach thousands of students, and found there that the students using STACK were doing fantastic and kind of exceeding or were doing better on average than the students not using STACK.

[00:13:17] Santiago: And do you know if that was doing better at an exam or an assessment or doing better in terms of gaining understanding?

[00:13:26] Lily: I guess how it’s, unfortunately, how it’s assessed is through quizzes. But I’m sure that he would have gone through and checked their understanding. I know certainly that the understanding level when kind of things like STACK were introduced was better than before having these kind of tools. I don’t know how that was measured.

[00:13:51] Santiago: Yeah, of course. It’s very difficult to measure understanding. I wouldn’t know how to even start to do that. I’ve been teaching for many years. You get an insight from discussing things with individuals or small groups.

[00:14:07] Lily: I suppose you could measure it by more qualitative data. So kind of interviews and a discussion, a kind of more oral conversation there. You could give a set of students STACK questions, a set of students non-STACK questions, and then see, okay, how do both sets perform on like a final exam, and pick some of those students and have an oral test with them to also see their understanding on that level.

[00:14:33] Santiago: We could do a whole episode on experimental data for understanding I think, and I think David would be much better placed than myself to question you on that. But it’s interesting that these types of questions have, in different contexts and at different levels of formality, suggested better results or better understanding.

I suppose my other area of interest is student perception, because these are very different questions to the usual questions, they were designed for descriptive statistics and you might need to correct me, but it’s more about being able to visualise data and reach conclusions from the visualisations rather than performing calculations.

These questions are different, they’re interactive. Mastery, of course, requires repetition. So doing a question multiple times until you get it right with the targeted feedback that helps you get to the right path, that can be a lot more work for students as well.

So having unfamiliar questions on an unfamiliar platform in an unfamiliar format, how do they receive that? Is that something they resist? And of course, I’m not asking for a statistical analysis here, just your intuition or your perception of things.

[00:16:07] Lily: Yeah, so my perception of it is that it’s been well received in general. I know certainly there are points where, I mean, as with anything, it’s good to pilot. You know, when you make a STACK question, maybe in your head it will work one way and then you go out there and you see them do it and you’re like, okay, we need to change this because this is not working out how it should.

But in terms of how the students have found it from my experience, I’ve never seen any kind of complaints. I think that they find it quite intuitive, especially if you explain it, explain to them the question and these different, like in STACK you can have these explanations.

I know at the moment one thing that I’m going through with the West Africa team is that they are trying to translate one of the courses into French. And so they’re trying to change the STACK questions into French, that’s creating an interesting opportunity to try and think. But also it’s very easy to duplicate a STACK question, and then, it’s one thing to translate the language, but also to translate the context.

So it might be that we just changed, you know, it might be in the kind of East Africa context we use one type of crop that might make more sense over there in a question. And then so when we go to the West Africa context, we can look at different crops.

[00:17:36] Santiago: Yes, we did that, I believe, with the millet question, where we had millet in one region and another type of crop in another region. But that was for an experimental design course.

[00:17:51] Lily: Yeah, yeah, yeah. I remember one time writing a question and talking to David and he said to me like, this is perfect, the only thing I’ll change is you should use millet instead of, I don’t know, corn or something.

[00:18:03] Santiago: Yeah.

[00:18:04] Lily: That’s easy. That’s easy enough to make that change. Okay.

[00:18:08] Santiago: Which is something that is very much ingrained in IDEMS, looking at ways of contextualising things smoothly. I can’t say we have found a process to contextualise STACK questions very smoothly yet.

[00:18:23] Lily: No.

[00:18:24] Santiago: And it has led to a mess in our question banks because we duplicate questions and change something and then if you find an improvement for the question then you implement it in one but you have three or four other copies of that question and you don’t know where they are and so on.

[00:18:41] Lily: Yeah.

[00:18:42] Santiago: There are tools now like GitSync that will help with that a lot more.

Maybe getting to the last sort of area of interest that I have, you design courses as well as deliver courses on data in a range of contexts. How have you found the experience of trying to integrate STACK and particularly thinking about new questions or concepts that you want to get STACK to help your students understand through mastery?

[00:19:21] Lily: Yeah, that’s a great question. Because it’s a learning curve for me integrating it because I guess like pedagogically speaking, I feel that there is a kind of correlation between writing a good STACK question and writing a good question for someone in terms of, okay, I could ask a question and they can put yes, no, or put in a value, but that’s not really what we want.

We want to think outside the box. One example is with a question that we wrote on merging data. In this question, you give them two data sets, and there’s some inconsistencies between the two data sets. They’re told to merge the data and give us the mean or something of the rainfall.

And if they give us one value of that kind of mean of that column, then we know that they did something wrong when they merged. So it might be that, for example that the units are different. It might be that one of the data sets has the rainfall in centimetres and the other data set has it in millimetres. And if they don’t make a translation there, when they merge those data sets, then they will get a much smaller mean than they should, or a much larger mean than they should in the rainfall. Does that make sense?
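The merging example can be sketched as follows. This is a minimal illustration with made-up rainfall values, not the actual STACK question: the station names, the numbers and the `check_answer` helper are all assumptions. The point it shows is that the mean obtained without converting units is predictable, so a checker can recognise that specific mistake and respond to it.

```python
from math import isclose
from statistics import mean

MM_PER_CM = 10

# Made-up data: two sources recording rainfall in different units.
station_a_mm = [120.0, 80.0, 100.0]  # millimetres
station_b_cm = [9.0, 11.0, 10.0]     # centimetres


def merged_mean(convert_units):
    """Mean rainfall in mm after merging, with or without the unit conversion."""
    b = [v * MM_PER_CM for v in station_b_cm] if convert_units else station_b_cm
    return mean(station_a_mm + b)


def check_answer(student_answer):
    """Hypothetical checker: recognise the correct mean and the unit-mismatch mean."""
    if isclose(student_answer, merged_mean(True), rel_tol=1e-6):
        return "Correct."
    if isclose(student_answer, merged_mean(False), rel_tol=1e-6):
        return "Check the units: the two data sets do not record rainfall in the same units."
    return "Not quite: check how you merged the two data sets."


print(merged_mean(True))   # mean after converting centimetres to millimetres
print(merged_mean(False))  # much smaller mean from merging without converting
print(check_answer(55.0))
```

With these values the correct mean is 100 mm, while merging without converting gives 55, so an answer of 55 points directly at the unit mismatch.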

[00:20:33] Santiago: And this is another feature that we implemented in STACK questions, that we give students the opportunity to download data sets, analyze them externally in a package of their choice. Then come back to STACK to answer the question after carrying out an analysis of the data, which is very different to the initial questions that we talked about.

[00:20:58] Lily: Which is great as well, because the data that you download is randomized each time. And so again, you still have that mastery in there, you can download these data sets, but it’s a different data set each time. But it’s built such that, you know, the mean will remain the same. It’s just those values around it will change.
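One possible way to produce a different data set on every download while keeping the mean fixed is to generate random values and then shift them all by the same amount. The episode does not describe how STACK does this, so the sketch below is an assumption, and the function name `dataset_with_fixed_mean` and its parameters are hypothetical.

```python
import random
from statistics import mean


def dataset_with_fixed_mean(target_mean, n=20, spread=5.0, seed=None):
    """Generate n random values whose mean equals target_mean (up to rounding error).

    Illustrative only: draw values around zero, then shift every value by the
    same amount so that the sample mean lands on the target.
    """
    rng = random.Random(seed)
    raw = [rng.gauss(0, spread) for _ in range(n)]
    shift = target_mean - mean(raw)
    return [x + shift for x in raw]


# Two "downloads" with different seeds: different values, same mean.
first = dataset_with_fixed_mean(50, seed=1)
second = dataset_with_fixed_mean(50, seed=2)
print(round(mean(first), 6), round(mean(second), 6))
```

Because the expected answer is fixed in advance, the checker can stay the same while the individual values change on each attempt.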

[00:21:16] Santiago: Yes, or in some more ambitious cases, we get randomly generated data of different types for each data set and the results will vary depending on the type of data or the distribution of data that you generate. I’m not sure I’m explaining it well enough.

[00:21:37] Lily: No, it’s incredible what you can do with STACK and it’s incredible these options of like generating this data and adding in there these kind of parameters that can change when you generate the data and that the STACK question already knows the right answer so you can verify that way the correct answer.

[00:21:58] Santiago: And potential misconceptions.

[00:22:01] Lily: Yes, yeah, absolutely. And so, oftentimes, writing the STACK questions, or kind of thinking of what the STACK questions should be for a module, is really difficult. It’s getting easier the more that I am getting my mindset into this different way of thinking, but it’s difficult because it’s like, okay, I need a question, it has to be mastery. But in a way it has to be kind of multiple choice as well, because it doesn’t work as well in STACK to have someone being able to write in a big bit of text.

Particularly with stats when we’re interpreting something, it’s like, okay, interpret this graph, but I want it to be mastery. And I don’t want it to be that they write it all in. I know that there are really interesting things that you’ve looked at before, maybe even implemented before, of looking through what people have written in and catching certain phrases to be like, okay, they’ve said this, this is good.

[00:22:53] Santiago: I don’t think I’ve implemented that just yet.

[00:22:58] Lily: Okay.

[00:22:58] Santiago: I think that there are possibilities that are being explored and STACK is being developed at an incredibly fast pace, I don’t think text interpretation is quite there yet. But yes, I get your point from a pedagogical perspective, you want to almost transform an open ended question into sort of direct question where there is a specific answer that is correct, that can be either quantified or qualified in a somewhat mathematical way.

[00:23:36] Lily: Absolutely. Yeah, that’s much better way to put it. And that’s a different way of thinking, and it’s hard. Writing questions is hard, but fun.

[00:23:46] Santiago: Yeah, we’ve spent probably about 10, 15 hours once writing a question. I think it was for one of the Responsible AI courses, where we went back and forth with the conceptualization of the question, which is the process that goes before I implement something where we need to look at all the possible responses and what sort of feedback we want to give.

And there can be multi part questions so relating the first answer to the second answer and the third answer and so on. It’s a very complex system, but thinking about those questions and implementing them, I think we make a good team in that sense where you have the knowledge of the data and I have the knowledge of implementation. We iterate multiple times until we get the structure of the question right before we start implementing. And then the implementation leads to other problems, this can’t be done or this is way too much code that is needed to assess this, and so on.

But it’s been good fun. From my side, it’s been a fun experience, and I’m glad to hear that it’s been received well and it’s had positive impacts.

[00:25:14] Lily: Yeah, yeah.

It’s also, I guess just the final thing to say about it is it’s nice ’cause you’ve got the data there on how people have answered it and how many attempts people have had. And so you can see on that data, on how the students are finding it in their performance and perhaps there’s a question where people aren’t answering it well, and you’re like, okay, this clearly means something’s wrong in the question. And particularly doing e-courses as we do. That’s a really useful way to get quick feedback.

[00:25:46] Santiago: Well, I’d like to argue against that.

[00:25:49] Lily: Okay.

[00:25:50] Santiago: It’s not always that the question is not correct or not okay. Sometimes it highlights a lack or perhaps mixed understanding of the underlying concepts before attempting the question. So it’s not just about the question itself, but where do you place a question within the context of the course? Do we need more prerequisite questions? Do we need to reinforce some concepts before? Would you agree?

[00:26:25] Lily: Yeah, yeah, that’s a really good point. But even having that is useful feedback when writing the course to be able to go like, okay, well, people are getting caught up on this concept. We haven’t explained this concept well, or people aren’t watching this video on it, or, you know, or we don’t even discuss this concept.

[00:26:42] Santiago: So using questions multiple times can give you quite interesting data on how to redesign a course perhaps which might include redesigning questions because they’re not quite at the right level.

[00:27:00] Lily: Yeah. Yeah, I think so. I think that’s a good way to put it.

[00:27:05] Santiago: Okay. Any final thoughts from your side?

[00:27:08] Lily: Nothing in particular, I guess just to kind of summarize is that using STACK in data questions, I think has been incredibly valuable. It’s a fun challenge to try and think of these questions. I know that there’s some questions which I’ve written which I’m like, no, we’re gonna think of a better question for that module. Definitely still ticking over in the back of my head and has been for years but regardless. But it’s really amazing the kind of power that it can then give to the students in answering it.

And particularly what we were saying at the start on kind of having that interactivity, having that mastery is so different to when you just have a textbook in front of you. And you can actually now focus so much more on understanding the problem than just knowing the formula.

[00:28:01] Santiago: Yes. And that is fueled by the potential for randomization as well.

[00:28:08] Lily: Absolutely. Thank you very much.

[00:28:11] Santiago: Yeah, it’s been a pleasure. Thank you, Lily, for your feedback.