182 – Twenty Years of RMS for CRFS: Levels of Variability

The IDEMS Podcast

Description

Lucie Hazelgrove Planel and Roger Stern consider the importance of multilevel data in agricultural research. They discuss the concept of measurement units at different levels, such as individual, household, community, and village, and the challenges faced when dealing with multilevel data in practice. Using an example of a student’s experiment on tadpoles, they illustrate key principles in determining appropriate experimental design and data analysis.

[00:00:07] Lucie: Hi, and welcome to the IDEMS podcast. My name is Lucie Hazelgrove Planel, I’m a Social Impact Scientist and anthropologist, and I’m very pleased to be here again with Roger Stern, continuing our special series of conversations about research methods for agriculture.

 

Hi Roger. Thanks for joining me today.

 

[00:00:24] Roger: Hello, Lucie. It’s nice to be back.

 

[00:00:26] Lucie: Good.

 

So I’ve heard that in statistics people are taught about multilevel data, which, as I understand it, answers the question: what is your unit of measurement? Is it a person, or is it a village, so one individual or a collection of individuals living together? Or, for example, are you working at the plant level or at the field level, depending on your study?

 

Many of the researchers we work with in the Global Collaboration for Resilient Food Systems work at several levels; it’s hard not to in agroecology. So I’m interested to know why this is important, or perhaps why it is important as a concept.

 

[00:01:13] Roger: To me, part of its importance is this: if there’s one topic that is not taught at an elementary level, yet arises at an elementary level as soon as people leave the classroom and go into practice, it is that they have to cope with multilevel data, or with the idea that you need to think at different levels.

 

Not having been taught that at all puts many people at the start of their research or development careers at a disadvantage, because this is new to them. We put this into practice in our teaching in Kenya many years ago, when we designed a first statistics course built around examples of data. Such examples are usually rather easy, but we decided to include multilevel data among them.

 

Many statisticians were shocked, because they said that’s very advanced. But the point is the difference between the data being at multiple levels and your analysis needing to be at multiple levels. So let’s think of some examples.

 

Nowadays a lot of people are concerned with on-farm studies, where they have simple experiments on a number of farms. Quite naturally, you can then think of the farm level and the plot level: the plots within a farm are one level, and the farmer and the farm are another obvious level. In surveys this is extremely common: you often survey households and then ask some questions of the people within each household.

 

[00:03:17] Lucie: And this is partly due to the variability within these units, then: farmers’ fields are not going to be the same, and farmers are not going to manage their fields in the same way, as I think we’ve discussed previously.

 

[00:03:29] Roger: Yes. The minute you discuss that, you have the variability that farms are different, and you also have the variability that plots within a farm are different. Similarly, the people in a household are different because they are of different ages and different genders. So you have differences at the person level, and you also have differences between households at the household level. Then you have households within villages, and therefore you have variation at the village level too.

 

And certain things are measured at the village level. For example, if you were thinking of selling produce, it could be how close you are to a major road, because that means you can get to markets more easily. So that closeness is often a village-level measurement, whereas things about the household are at the household level, and things about people are at the person level.

 

And that’s quite natural that you have information at different levels.
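As a minimal sketch of what “information at different levels” can look like once it is recorded, assume one table per level (villages, households, people) linked by IDs. All identifiers and values below are hypothetical, invented for illustration. The function attaches household- and village-level measurements to each person without pretending they were measured per person.

```python
# Hypothetical data: every ID and value here is invented for illustration.
# Each dictionary holds measurements made at one level only.
villages = {
    "V1": {"km_to_main_road": 2.0},   # village level
    "V2": {"km_to_main_road": 15.0},
}
households = {
    "H1": {"village": "V1", "household_size": 4},  # household level
    "H2": {"village": "V2", "household_size": 6},
}
people = [  # person level
    {"person": "P1", "household": "H1", "age": 34, "gender": "F"},
    {"person": "P2", "household": "H1", "age": 12, "gender": "M"},
    {"person": "P3", "household": "H2", "age": 51, "gender": "M"},
]


def person_level_view(people, households, villages):
    """Return one row per person with household and village measurements attached.

    The attached values are repeated, not new information: both people in H1
    share one household_size, and every household in V1 shares one distance.
    """
    rows = []
    for p in people:
        hh = households[p["household"]]
        vil = villages[hh["village"]]
        rows.append({**p, **hh, **vil})
    return rows


for row in person_level_view(people, households, villages):
    print(row)
```

Keeping each level in its own table makes it harder to mistake a repeated village value for three independent person-level observations.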

 

[00:04:35] Lucie: So, to come back to the initial question of why it is important: it’s because in order to understand the situation, you need to know what level you’re thinking at. In your example about households or villages being close to markets, it’s clear. If we imagine a village that is well defined, let’s say fairly compact, then you don’t need to think of market access at the individual level.

 

You can perhaps, if the market is not in the village, perhaps think of it at the village level. Whereas if you’re working with a community which is very spread out geographically, then perhaps you do want to think about access to markets as being at the individual level.

 

[00:05:12] Roger: So once you start discussing what you’re going to do, the level clearly depends on your objectives. And I fell into this as a very junior member of staff. I want to give you a very simple example, which was the first piece of advisory work I had to deal with as an extremely junior lecturer, having had a theoretical training in statistics but now being expected, as a statistician, to give advice.

 

My first advisee was an undergraduate student doing a small project, and she asked me a question. Tadpoles grow into frogs, so tadpoles are young frogs. She was concerned only with the time they were tadpoles, and she was interested in the rates of growth of tadpoles at different temperatures. So her little experiment was to keep the tadpoles in jars, which she could put in places where she could control the temperature. She was proposing to have some jars at each of three different temperatures.

 

And she was very embarrassed about the question because she thought it was very elementary. And her question was simply, I’ve got a lot of tadpoles available to me and I have a lot of jars. Should I use just a few jars with a lot of tadpoles in each jar, or should I use a lot of jars with just a few tadpoles in them?

 

[00:07:02] Lucie: Asked like that then it’s hard to answer.

 

[00:07:06] Roger: I found that not only was it hard to answer, but I did not know how to approach the question. So all I could do was tell her that it wasn’t the silly question she thought it was, but to me a very good question, and that when she came back the next day I would try to answer it.

 

Then, as she left, I made my first elementary mistake: I went looking for material on tadpoles. Remember that I couldn’t look on the web, because this was many years ago, so I looked in books, and I couldn’t find anything that talked about experiments on tadpoles.

 

I then vaguely realised that I must think about the principles, that maybe the problem wasn’t specific to tadpoles. I went to a colleague who knew a bit more, and he put me on the right track: I must think about principles. And one of the main principles is: what is the unit to which I apply a treatment? The answer is that she was varying the temperature, and the temperature was applied not to a tadpole, but to the jar.

 

[00:08:19] Lucie: Okay, yeah.

 

[00:08:20] Roger: Therefore this was quite a simple question. When she asked how many she should have, the answer was about jars, because the treatment was applied to jars. The size of the experiment was the number of jars, because that’s where she was applying the different treatments, so it was simply a question of jars.
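To make “the size of the experiment is the number of jars” concrete, here is a minimal sketch of a completely randomised design in which each jar, the experimental unit, is randomly allocated one temperature. The three temperatures (15, 20 and 25 °C), the four jars per temperature and the function name are assumptions for illustration; the episode does not give them.

```python
import random


def assign_temperatures(n_jars_per_temp, temperatures, seed=None):
    """Randomly allocate one temperature to each jar (the experimental unit).

    Replication is counted in jars: tadpoles within a jar all share the
    jar's temperature, so they do not add replicates of the treatment.
    """
    labels = [t for t in temperatures for _ in range(n_jars_per_temp)]
    rng = random.Random(seed)   # seeded so the plan can be reproduced
    rng.shuffle(labels)
    return {f"jar{i + 1}": t for i, t in enumerate(labels)}


# Hypothetical plan: three temperatures, four jars each, so 12 jars in total.
plan = assign_temperatures(n_jars_per_temp=4, temperatures=[15, 20, 25], seed=1)
for jar, temp in plan.items():
    print(jar, temp)
```

However many tadpoles go into each jar, this plan has four replicates per temperature.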

 

Then you put a number of tadpoles in each jar. But the question should be how many jars, and then, inside each jar, you should have happy tadpoles. I now realised that this was the same question, or a similar question, to the one faced by people doing a trial on different plots of land and applying a treatment, say date of planting, density of planting or fertiliser, to each plot.

 

[00:09:13] Lucie: Okay, yeah.

 

[00:09:14] Roger: And in each plot of land you had plants. So her question was, in effect: how big should my plots of land be, and how many plots should I have? It’s actually the same question.

 

[00:09:25] Lucie: Yeah, this is really interesting.

 

[00:09:27] Roger: It would be a similar question if you were working in education and testing innovative teaching methods. You would probably apply that treatment to a classroom, and you would have pupils in the classroom. So how many children would it be sensible to have in each class so that you could measure things well, and how many classrooms should you have?

 

In that sense, remember that every trial has a layout, treatments, and measurements.

 

[00:10:03] Lucie: And you mentioned happy tadpoles there. In the classroom example, I’m imagining that there are physical limits to how many children you can squeeze into a classroom, and perhaps also limits in terms of what makes a conducive learning environment for children.

 

[00:10:21] Roger: Yes.

 

[00:10:22] Lucie: So it’s not just about maximising the number of people in a space; there’s also the question of what conditions they will thrive in. And then perhaps there’s also the question of what conditions are realistic.

 

[00:10:36] Roger: That’s right. One of the principles that isn’t written down as a principle so much, but which I think is vital, is that biology, or if you like common sense, overrides statistical principles. In that sense, in her example, you need to have happy tadpoles in realistic conditions.

 

[00:10:59] Lucie: Ah, yeah. Well, I’m going to split hairs, Roger. Is biology here the same as being realistic? I mean, there are ideal conditions for growth, but then there are the realistic conditions that tadpoles live in, which may not be the same.

 

[00:11:15] Roger: You are absolutely right, and that’s the very big discussion you have about on-farm experiments versus on-station experiments. On station you have plots, and you ask questions about plot size. Then you can criticise that by saying that we actually want the results to apply to farmers, and they don’t farm plots; they have fields, which are, quite reasonably, much bigger than the sort of plots used on station.

 

And so we would be making an assumption, the same one as with the tadpoles. Tadpoles don’t always live in jars; they sometimes live in streams and rivers and other places. We also have to make sure they are happy: putting one tadpole in a jar would obviously be fine, as long as the tadpole wasn’t unhappy because it was used to being with quite a few of its colleagues. Having lots of tadpoles in a jar might mean that they’re a bit squashed.

 

So that’s what I’m saying: having made the assumption that we can do a useful experimental study, we still have her question of how many tadpoles to put in each jar, which is the same as the question of plot size. Now, one difference from plot size in agriculture is that in the tadpole experiment, she’s probably going to measure individual tadpoles.

 

So if you have a lot of tadpoles in a jar, you either have to measure only a sample of them, and then you wonder why you needed the rest, or you don’t put so many tadpoles in your jar. So as part of the answer, you need to think about what you’re going to measure and how often you’re going to measure it.

 

Are you going to measure just at the beginning and end of the experiment, or perhaps just at the end because you assume they’re all the same size at the beginning, or are you going to measure repeatedly? Suppose you’re going to measure repeatedly, say at the end of every week, and suppose it’s practical to take the tadpoles out, measure them and put them back without them suffering. What I can’t do very easily is mark a tadpole, so I don’t know whether I’m measuring the same tadpoles each time.

 

Whereas if we had sheep or chickens, we could probably mark them, and then every week we could go back and measure the size of the same animal.

 

[00:13:47] Lucie: So this is really interesting: you’re saying that the biology also dictates, to some extent, the level at which you might want to take your measurements.

 

[00:13:57] Roger: And in the tadpole case, we are taking measurements at the individual level. But to compare the treatments, we would probably take a summary to move to the treatment level, which is the jar level, before doing any of the analysis. There’s very little we can do at the tadpole level: we can analyse at the tadpole level on one occasion, but on later occasions we can’t follow a tadpole, unless we have one tadpole per jar or they are distinctive in some other way.
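A minimal sketch of “take a summary to move to the jar level” for a single measurement occasion, assuming the data arrive as (jar, temperature, length) rows. The lengths below are invented for illustration, not results from the student’s experiment. Tadpole lengths are first averaged within each jar, and temperatures are then compared using the jar means, so the replication is the number of jars.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical lengths (mm) on one occasion: (jar, temperature, length).
# The numbers are invented for illustration only.
measurements = [
    ("jar1", 15, 10.1), ("jar1", 15, 9.8), ("jar1", 15, 10.4),
    ("jar2", 15, 9.5), ("jar2", 15, 10.0),
    ("jar3", 25, 12.2), ("jar3", 25, 12.9), ("jar3", 25, 11.8),
    ("jar4", 25, 12.5), ("jar4", 25, 13.1),
]


def jar_means(rows):
    """Collapse tadpole-level lengths to one mean per jar (the treatment unit)."""
    lengths = defaultdict(list)
    temp_of = {}
    for jar, temp, length in rows:
        lengths[jar].append(length)
        temp_of[jar] = temp
    return {jar: (temp_of[jar], mean(vals)) for jar, vals in lengths.items()}


def treatment_summary(jar_level):
    """Average the jar means within each temperature; n counts jars, not tadpoles."""
    by_temp = defaultdict(list)
    for temp, jar_mean in jar_level.values():
        by_temp[temp].append(jar_mean)
    return {t: {"n_jars": len(v), "mean_length": mean(v)} for t, v in by_temp.items()}


summary = treatment_summary(jar_means(measurements))
for temp, result in sorted(summary.items()):
    print(temp, result)
```

Averaging jar means weights every jar equally even when jars hold different numbers of tadpoles; that is one reasonable choice for a sketch, not the only one.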

 

[00:14:30] Lucie: But I know some researchers measure individual plants within a plot. As you mentioned, with plants you know that you’re going to measure exactly those five plants, or however many, in the plot. Whereas with the tadpoles, each time it would be a new sample.

 

[00:14:43] Roger: When you measure those five plants, you have a choice, because plants are quite obedient: they stay in the same place. Say you’re going to measure the height of five plants on six occasions as they grow through the season. You can go back to the same plants each time, or you can just measure any five plants. And the two are rather different.

 

In one case you follow a plant through, which we are saying is impractical for the tadpoles. In the other case, although you have the same number of measurements, the rows, if you like, the individual plants, don’t mean anything, because you don’t know whether your first measurement was on the same plant or not.

 

If you are following the plant through, then although your treatment is applied at the jar level or the plot level, your data is available at the plant level. So you could see how each plant grows individually, and whether the small plants stay small. You do the same in a medical experiment: when you measure people’s heights, you would tend to follow each person through and then summarise over the people.

 

But remember that the treatment is still applied at the jar (or plot) level. So you then move to that level to do the analysis in which you compare the treatments.
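The contrast between following plants through and just measuring any plants can be sketched as follows, assuming one plot with three tagged plants measured on four occasions. The heights are hypothetical, invented for illustration. With tracked plants you can compute each plant’s own growth and then summarise to the plot, the treatment unit; with untracked plants only the per-occasion means are meaningful.

```python
from statistics import mean

# Hypothetical heights (cm) of three tagged plants in one plot on four occasions.
# Values are invented for illustration only.
tracked = {
    "plant1": [5.0, 9.0, 14.0, 18.0],
    "plant2": [3.0, 5.0, 8.0, 10.0],
    "plant3": [6.0, 11.0, 15.0, 21.0],
}


def per_plant_growth(series):
    """Growth of each tracked plant from the first to the last occasion."""
    return {plant: h[-1] - h[0] for plant, h in series.items()}


def plot_summary(series):
    """Move to the plot (treatment-unit) level: mean growth across plants."""
    return mean(per_plant_growth(series).values())


def occasion_means(series):
    """Mean height on each occasion; all you can use if plants are not tracked."""
    return [mean(heights) for heights in zip(*series.values())]


print(per_plant_growth(tracked))
print(plot_summary(tracked))
print(occasion_means(tracked))
```

For mean growth the two routes agree, since the mean of differences equals the difference of means. What only tracking reveals is the plant-level pattern, for example that plant2 is the smallest on every occasion.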

 

[00:16:06] Lucie: Okay, fantastic. Another aspect of this that I’ve heard you mention before is what happens if, for example, tadpoles die?

 

[00:16:17] Roger: Yes. That’s an interesting one, isn’t it? It’s the same question as plants dying. Suppose you are measuring the length of the tadpoles, and let’s assume for simplicity that you’ve decided to measure five tadpoles and you have only five tadpoles in the jar. So you are measuring the size of each tadpole, although you can’t follow them individually.

 

And then you find that occasionally plants die, or in this case tadpoles die. What should we do about the measurements? The measurements of the live tadpoles change, but the measurement of a dead tadpole stays the same, just like the heights of plants when a few plants die. What should you do?

 

[00:17:05] Lucie: I was just trying to work out how the dead tadpole’s length could stay the same. But you’re quite right. Normally in the experiment nothing would happen to the dead tadpole, whereas in a natural environment something probably would.

 

[00:17:17] Roger: We may want to be compassionate and give it a burial, or we may leave it at the bottom of the jar, but either way its length stays roughly the same. The important question now is how that links to your measurements. To my mind, it adds an objective, and one of the key questions is: does the death of tadpoles relate to the treatments? The sensible thing is to add another measurement: how many tadpoles died?

 

[00:17:50] Lucie: I mean, if you’re comparing temperatures, as I think the student was, then perhaps a temperature that’s too cold or too hot might have the unexpected, and sad, consequence of killing some of the tadpoles.

 

[00:18:00] Roger: There are two different possibilities. One is that tadpoles may die anyway, just as plants die in plots; that just happens. The other, as you suggest, is that the deaths are related to the treatments. In the first case, the important thing is that you record that information, but it’s unrelated to the objectives of your experiment, and you would presumably now have a new objective: how do tadpoles grow when they stay alive?

 

As you collect your data, you realise that your objective was not quite right. You need to record that some tadpoles die, because unfortunately that’s what happens to tadpoles, as it does to some plants in plots. If it’s nothing to do with the treatments, then your interest is in those that stay alive, and so you measure the heights of those of your five plants that remain alive.
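One way to “add another measurement” is sketched below, assuming each jar’s final lengths are stored as a list in which None marks a tadpole that died. The data are hypothetical, invented for illustration. Each jar gets two summaries: the number that died, which can be compared across temperatures to check whether deaths relate to the treatments, and the mean length of the survivors, which answers the revised objective of how tadpoles grow when they stay alive.

```python
from statistics import mean

# Hypothetical final lengths (mm) of the five tadpoles in each jar;
# None marks a tadpole that died. Values are invented for illustration only.
jars = {
    "jar1": {"temp": 15, "lengths": [10.0, 11.0, None, 12.0, 11.0]},
    "jar2": {"temp": 25, "lengths": [13.0, None, None, 14.0, 15.0]},
}


def summarise_jar(lengths):
    """Record deaths as their own measurement, then summarise the survivors."""
    alive = [x for x in lengths if x is not None]
    return {
        "n_dead": len(lengths) - len(alive),
        # Undefined (None) if every tadpole in the jar died.
        "mean_length_alive": mean(alive) if alive else None,
    }


summaries = {jar: {"temp": d["temp"], **summarise_jar(d["lengths"])}
             for jar, d in jars.items()}
for jar, s in summaries.items():
    print(jar, s)
```

Leaving the dead tadpoles in the length average would mix the two questions together; separating them keeps both answers visible.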

 

[00:18:56] Lucie: You gave me another really interesting example of this, about chicken farmers: how many chickens do farmers have, and what happens if you’re doing a survey and some farmers don’t have any chickens? Are you going to include them?

 

[00:19:11] Roger: Recognising that this is the same problem is, I think, important. It’s what I call the problem of zeros: the plant that dies is something different. The tadpole that dies, or the plant that dies, or the chicken that dies…

 

[00:19:26] Lucie: Or the chickens that don’t exist: the farmers who don’t have chickens. Are you going to include them in your survey, or are you going to think of them as a separate case? As you were saying about the tadpoles, it’s about being aware of what your research question is. Is it how tadpoles grow when they’re alive, or, and I think fewer people are interested in this, how tadpoles change in size in general, whether they live or die?

 

[00:19:56] Roger: Remember, though, that the key question is whether there is any relationship with your treatments, or whether you just have to accept that there are zeros, or observations that need to be treated differently. This isn’t so different from a medical experiment where you record people’s state of health, and some people may die or leave the experiment. What do you do about them? The main thing you must do is record that information and probably treat it separately. That’s why I call it the zeros problem.

 

A very simple example that I often give to students, and they think it’s very simple, which it is, is counting the number of chickens kept by six farmers. The first farmer has zero chickens because he doesn’t have any. The second has five, the third has 10, the fourth and fifth each have zero, and the sixth has 15. And my simple question is: what’s the mean? How many chickens per farmer?

 

[00:21:10] Lucie: This absolutely depends on your objectives, your research objectives.

 

[00:21:16] Roger: Everybody thinks that’s very silly, and they give me one answer: the total divided by six, which is 30 divided by six, or five chickens per farmer.

 

[00:21:28] Lucie: And that’s the answer to the question of how many chickens people have, whether or not they have chickens, whether or not they decide to keep chickens.

 

[00:21:43] Roger: And that’s the answer that you’d get if you put your data into Excel and said please give me the mean of that column of data. So that’s the non-thinking answer. I’ve got six farmers, I’ve got a total of 30 chickens, how many chickens per farmer, it’s the mean, what could be simpler? It’s five.

 

[00:22:03] Lucie: Well, it works for understanding the general population.

 

[00:22:05] Roger: The slightly more complicated answer is to notice that there are three zeros, for a good reason: those farmers don’t have chickens. So answer number one is that only three out of the six farmers have chickens, and that’s the first part of my analysis. Then I won’t bother about the zeros any more, because they’ve done their job, and I’ll take the average of five, 10 and 15, which gives 10, not five. Both of these are valid answers, but they’re answering different questions.

 

One is the number of chickens per farmer and the other is the number of chickens per chicken farmer.
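The two answers can be reproduced in a few lines, using the six farmers’ counts from the example. The function names are hypothetical. The first function is the “non-thinking” column mean; the second splits the data into zeros and non-zeros first, reporting how many farmers keep chickens and then the mean among those who do, which gives 5 and 10 respectively.

```python
from statistics import mean

chickens = [0, 5, 10, 0, 0, 15]  # the six farmers from the example


def per_farmer_mean(counts):
    """Mean over all farmers: answers 'how many chickens per farmer?'."""
    return mean(counts)


def two_part_summary(counts):
    """Separate the zeros first, then summarise only the chicken farmers."""
    keepers = [c for c in counts if c > 0]
    return {
        "n_farmers": len(counts),
        "n_with_chickens": len(keepers),
        # Answers 'how many chickens per chicken farmer?'
        "mean_per_chicken_farmer": mean(keepers) if keepers else None,
    }


print(per_farmer_mean(chickens))
print(two_part_summary(chickens))
```

Both are valid; the second simply reports the split that the first hides.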

 

[00:22:45] Lucie: Exactly, yeah. Fascinating.

 

[00:22:48] Roger: And both of these are valid. I prefer the second, which is to divide up the data into the zeros and the not zeros and then to check those two groups separately.

 

[00:23:01] Lucie: Well, because it gives you much more information if you go into that separation and if you go into that sort of understanding of well, some people have chickens, some people do not, and then trying to understand the difference between those two groups even. Then I think it gives you more depth of information.

 

[00:23:17] Roger: It’s a special case, of course, of the general analysis problem. There may be reasons why we have zeros. There may be two groups in the data, which is what we call a factor, and we want either to analyse the data all together or to split it up. When splitting by zeros, we can see the groups just by looking at the numbers; we don’t need another column to tell us which group each farmer is in.

 

But we could have got there by asking, are you a chicken farmer? And of those who say yes, we then ask, how many chickens do you have? So we could have had two separate questions, one with a yes/no answer and the other with a numerical answer, which only makes sense if you said yes.
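Recorded as two questions, the same six farmers might look like the sketch below. The field names are hypothetical. The count is only filled in after a “yes”, a simple consistency check enforces that skip rule, and each question gets its own answer: half the farmers keep chickens, and those who do keep 10 on average.

```python
from statistics import mean

# Hypothetical two-question survey records for the six farmers in the example;
# n_chickens is only asked (and only filled in) when keeps_chickens is True.
responses = [
    {"farmer": 1, "keeps_chickens": False, "n_chickens": None},
    {"farmer": 2, "keeps_chickens": True, "n_chickens": 5},
    {"farmer": 3, "keeps_chickens": True, "n_chickens": 10},
    {"farmer": 4, "keeps_chickens": False, "n_chickens": None},
    {"farmer": 5, "keeps_chickens": False, "n_chickens": None},
    {"farmer": 6, "keeps_chickens": True, "n_chickens": 15},
]


def check_skip_logic(records):
    """The second question should be answered if and only if the first was 'yes'."""
    for r in records:
        if r["keeps_chickens"] != (r["n_chickens"] is not None):
            raise ValueError(f"inconsistent answers for farmer {r['farmer']}")


def answer_both_questions(records):
    """One answer per question: a proportion, then a mean among the 'yes' group."""
    check_skip_logic(records)
    yes = [r for r in records if r["keeps_chickens"]]
    return {
        "proportion_keeping_chickens": len(yes) / len(records),
        "mean_chickens_if_yes": mean(r["n_chickens"] for r in yes) if yes else None,
    }


print(answer_both_questions(responses))
```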

 

[00:24:02] Lucie: But, you know, it would give people even more questions to answer.

 

[00:24:05] Roger: If you ask it as two questions, it becomes very obvious that you should have two answers, one to the first question and the other to the second, and the second question is only asked of the three farmers who said yes. That’s another way to look at this. But I often find that with the zeros question, like the dead tadpoles or the dead plants, people don’t think of it as two questions; they think of it as just one. And that’s, if you like, the unthinking answer.

 

[00:24:34] Lucie: Great. Well, thank you, Roger. We’ve discussed these different levels of data and, coming to the end, what to do with zeros or with deaths, and how to analyse that. I don’t know whether to call that different levels, because it isn’t quite.

 

[00:24:51] Roger: No. Well, you could think of it as different levels, whether people are chicken farmers or not. It’s dividing the data up into groups, but I don’t think of that as different levels.

 

[00:25:01] Lucie: No, but it’s coming back to variability and that’s why we have the different levels of data. Well, that’s why we need to think about the different levels of data.

 

[00:25:07] Roger: And it is the general question: our analysis is trying to understand the variability. In this case the zeros are rather different from the other numbers, for a very obvious reason. So that’s the first tenet of analysis: analysis is to understand the variability of your data, and understanding the zeros is a very simple example.

 

Let me finish with the point I made at the beginning. As with levels of variation, there are lots of elementary things that are very simple but are not taught in a standard statistics course. So the first time many researchers meet the zeros problem is in a practical experiment, not in the classroom. They should meet all these problems in the classroom.

 

[00:25:54] Lucie: Great. Well, thank you so much, Roger. It’s been wonderful discussing this with you, and the tadpole example made it really lively.

 

[00:26:03] Roger: Thank you very much. Until the next time.