205 – An Interview with Rikin Gandhi from Digital Green

Description

David talks to Rikin Gandhi of Digital Green about the organisation’s innovative approach to integrating AI with farmer support systems. They discuss Digital Green’s way of working with AI, including the importance of human-in-the-loop systems, the benefits of multimodal inputs like voice, text, and images, and the advantage of open-source data for tuning AI models to meet local agricultural needs. They also explore the potential and challenges of leveraging small language models to provide tailored support to farming communities, and the critical role of local expertise in enhancing AI’s effectiveness.

[00:00:00] David: Hi, and welcome to the IDEMS Podcast. I’m David Stern, founding Director of IDEMS, and it’s my pleasure today to be here with Rikin Gandhi. I hope I’ve pronounced that okay, from Digital Green.

[00:00:14] Rikin: Great to be here with you, David. Yes. It’s wonderful to dig into this with you.

[00:00:18] David: Absolutely. I have to confess that when Digital Green was more on the video side of things, I didn’t know as much about it. Now you’ve got into this AI piece, I’m really in awe of how you’ve been going about it. I’m really pleased to be able to have this discussion; it’s a follow-up to a previous episode where I demonstrated I’m a bit of a fan of your approach.

[00:00:45] Rikin: Well, I really appreciate that and yes, you’re right. We started with videos by and for farmers. And then COVID happened, and the in-person facilitated video screenings that were taking place in places like India and Ethiopia had to be stopped ’cause of the physical distancing requirements. And at the same time, we saw many farmers coming online on YouTube.

And with the advent of GPT-3, we saw this convergence opportunity for farmers to just use multimodal voice, text, and images to get answers to their questions.

[00:01:17] David: Absolutely. And the fact that you have this amazing bank of farmer to farmer videos to draw on is a big part of what I think you’ve got right. But the thing which I’m really loving is, you’ve got the business model aspect you’re thinking about carefully, you’ve got a not-for-profit model behind this, your openness, and then your attention to detail on the models themselves is great.

So I’m wondering which of these we should start on. Let’s start on the humans in the loop aspect, which is something I found very interesting you made central to how you go about the AI process. Do you want to speak a bit more about that?

[00:01:57] Rikin: Totally, yeah. The primary humans in the loop in the whole Farmer Chat AI system are of course the farmers themselves, because they’re the ones who are directly accessing information, submitting photographs or voice notes to try to triage issues and questions on their farms, or even in the evenings, to try to plan out what they might wanna do in the next season.

And personally, that’s been super energizing, because extension has for so long been very top down, from research to extension to farmers. And even our videos that would feature local farmers would still sometimes fall flat, because maybe the video and the commodities it was featuring were irrelevant to the farmers. Whereas in this case, you’re actually seeing more on demand: what do the farmers actually care most about? What are their needs and what are their interests?

And then the data and the insights that they’re sharing, in the form of these voice notes, photos or text prompts, or even their usage patterns in the app, are really amazing and rich resources, because they help to prioritize what the needs are from a language point of view, an image point of view, a content point of view, so that we’re not trying to scour the world in a blanket way, but are really seeing farmers’ actual demands. And where the AI systems are working well and where they’re failing, we can close the gaps in a much more targeted way.

[00:03:34] David: The thing which I loved is the fact that, in some sense, you’re superpowering extension agents, because the initial extension agent is basically the AI, but then you are getting other people to look at what’s happening. Is this actually the right advice? Could there be different advice? What other ways could we train the models to do better, and what are the pieces of advice that need to get in? Are there bits of wrong advice being given out? All of that needs that human layer to be able to improve the systems.

[00:04:06] Rikin: And you know, these neural networks, which are the basis of the generative AI systems that the world now has, are in fact black boxes. So it is important that we first have a process of gatekeeping, with extension agents as the first line of defense and tuning of the models, because you don’t wanna get misinformation out to these farming communities, or information biased towards conventional or industrial agricultural systems that are irrelevant to these small-scale farmers.

So when we enter a new geography, we first start working with extension agents to ask the actual questions of the farmers that they serve, and then help to provide feedback and tuning. And then, as you said, from that data of the extension agents, and eventually when we have confidence to be able to take it directly to farmers, that data and that feedback is reviewed by local agronomists, who can look at the question-answer pairs, the questions that have come from the farmers and the answers the AI system has generated, and then make corrections, provide scores, and annotate images where the AI system fails and is unable to classify the crop or the issue it might pertain to.

And then their input is what helps to tune the speech-to-text, translation, and image models in a much more location-specific way. And what’s amazing about that is it becomes a benchmark dataset. Then, as the AI world continues to evolve super dynamically and there’s basically a new model a minute, we can validate how well these new models, these new pipelines, perform, not by some generic expert dataset, but by the actual farmers’ queries and these local agronomists’ reviews.
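
As a rough illustration of the benchmarking Rikin describes, a candidate model or pipeline could be scored against the reviewed question-answer pairs along these lines. This is a minimal sketch: the `ask_model` and `judge` callables are hypothetical stand-ins for the pipeline under test and for whatever scoring an agronomist (or an automated rubric) provides.

```python
import json

def evaluate_pipeline(ask_model, benchmark_path, judge):
    """Score a candidate pipeline against agronomist-reviewed QA pairs.

    ask_model(question) -> candidate answer (the pipeline under test).
    judge(question, reviewed_answer, candidate) -> score in [0, 1].
    Both are hypothetical stand-ins, not Digital Green's actual API.
    """
    scores = []
    with open(benchmark_path, encoding="utf-8") as f:
        for line in f:  # one JSON object per line, e.g. from the review tool
            item = json.loads(line)
            candidate = ask_model(item["question"])
            scores.append(judge(item["question"], item["reviewed_answer"], candidate))
    return sum(scores) / len(scores)
```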

[00:06:03] David: What’s great, you know, is you get so many details right. These benchmark datasets, you are making open, so they can be built into different sorts of models, they can actually be reused. Then there’s the language component, which I’m also so in awe of. Just a few years ago this wouldn’t have been possible, but you are now doing all this in local languages.

[00:06:24] Rikin: That’s totally right, and it’s amazing how the big tech folks, who are investing literally orders of magnitude more, billions of dollars, into this space, are trying to cover a bunch of languages, which includes intersections with places where we work: in Ethiopia, Afaan Oromoo and Amharic; in Nigeria, Hausa and Pidgin. The issue is that these models are not often tuned to our agricultural domain. And keywords in the agricultural space that are mistranslated or misinterpreted on a speech-to-text or text-to-speech basis can throw off the rest of the pipeline completely.

The word bigha, for example, in the state of Bihar in India, can mean different units of land depending on which part of Bihar you are in. In some places it’s one tenth of an acre; in other parts of India it’s another size entirely. And of course, farmers also don’t use kilos and litres and varietal names. They have their own vernacular. So trying to use off-the-shelf models, whether it’s GPT, Llama, or Claude, is not going to serve the purpose.

And we found that if you don’t do this tuning, you will get very unexpected results that the farmers ultimately will not be happy about. So you have to go through this tuning process, and we definitely see that this should be in the public domain. A), because our space is so resource constrained as it is, there’s no point in multiple parties trying to create different benchmarks; we should all try to come together and create some crowdsourced benchmark that we can all validate against.

And then number two is that it’s in our collective interest to see how well our tuning is going, because the models do keep getting better, and I may not need to keep tuning the same things as I was even six months ago. And so I need to be able to adapt based on what the gaps are, which again are driven not by my own intellectual desire, but rather by what the farmers’ priorities are and where the system is working and where it’s failing.
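
The bigha example is a concrete case where a location-keyed lookup, maintained by local experts, beats anything a generic model ships with. A toy sketch, with the region keys and conversion factors as illustrative placeholders only:

```python
# Toy lookup: vernacular land units vary by region, so normalisation must be
# keyed on (unit, region). All values below are illustrative placeholders;
# real factors would come from local agronomists, not from this sketch.
LOCAL_UNIT_TO_ACRES = {
    ("bigha", "bihar_district_a"): 0.10,  # the "tenth of an acre" case
    ("bigha", "bihar_district_b"): 0.31,
    ("bigha", "another_state"): 0.625,
}

def to_acres(value: float, unit: str, region: str) -> float:
    key = (unit.lower(), region)
    if key not in LOCAL_UNIT_TO_ACRES:
        # Unknown vernacular unit: flag for human review rather than guess.
        raise ValueError(f"no verified conversion for {unit!r} in {region!r}")
    return value * LOCAL_UNIT_TO_ACRES[key]
```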

[00:08:43] David: Absolutely. This whole process is such a lively space, is one way to put it. You’ve mentioned the models are moving faster; the data which is coming in, as you get more data, again changes what you can do and how you can do it, the uses of it, how fine-tuned you can get. We discussed that when you’re in Hausa in northern Nigeria, you could go across the border to Niger and there’s Hausa there too, but it’ll be a bit different. How much this can be reused, how much more tuning you need, this is going to evolve so quickly as these ideas get out.

But the thing which I’m really excited to dig into is that if we look at this at a slightly bigger scale, then going across these communities is going to be one of the big things. I mean, we know in a lot of the places that we work, the boundaries of communities are not clean; there are clear overlaps in different ways. So, how do we actually get the models, as they come together, to be working across languages, to be growing across languages, to be having common elements, so that the learning comes together?

Because one of the things which I’m really interested in is that, as you are getting new languages on board, the learning and the tuning presumably need to broadly restart from a language perspective, but maybe not from the agroecological content perspective. It’s not clear what that transfer can be. There’s some of this which of course you can do, but there’s some bits where it’s gonna be hard.

[00:10:17] Rikin: That’s true. I was just in Ethiopia two weeks back, in Adama, just a couple of hours outside of Addis. And what was really exciting was how these communities who are using Farmer Chat speak the local language of Afaan Oromoo, and they would first use that as their primary way to interface with Farmer Chat to ask a question or get information. But sometimes the Afaan Oromoo was not perfect in the system, and they would then switch to Amharic when they found the response wasn’t good. And when even Amharic sometimes wasn’t very good, they would also use some English words to try to get the information they were looking for.

But what was also most exciting was that, while I was in Adama, I met this local entrepreneur who runs a company called Addis AI. He’s training some of these off-the-shelf open-source models from Meta, like the Llama series and MMS: he’s taking curated YouTube shorts in Amharic and Afaan Oromoo, and using them to improve those models’ capabilities in those languages.

And it’s actually performing really well. We took it out to the field and got some farmers’ feedback, because again, he’s not just assuming that the YouTube videos are gonna make these models perfect; he’s using reinforcement learning with human feedback to resolve the issues that these tuned models are having. And we find that, compared again to off-the-shelf Whisper models from OpenAI or Google’s automatic speech recognition models, they’re much more performant and way more natural sounding, including when you hear the text-to-speech output of these models.
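
For a sense of what building on these open models looks like in practice, here is a rough sketch of transcribing a farmer’s voice note with Meta’s open MMS checkpoint via the Hugging Face transformers library, following its documented usage. The Amharic adapter code "amh" is shown; Afaan Oromoo has its own ISO 639-3 code in the MMS language list, which should be checked before use.

```python
import torch
from transformers import AutoProcessor, Wav2Vec2ForCTC

# Meta's open multilingual ASR checkpoint; per-language adapters are
# selected by ISO 639-3 code ("amh" = Amharic).
model_id = "facebook/mms-1b-all"
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

processor.tokenizer.set_target_lang("amh")
model.load_adapter("amh")

def transcribe(voice_note, sampling_rate=16_000):
    # voice_note: 1-D float array of the farmer's audio, resampled to 16 kHz.
    inputs = processor(voice_note, sampling_rate=sampling_rate,
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    ids = torch.argmax(logits, dim=-1)[0]
    return processor.decode(ids)
```

Fine-tuning such an adapter on curated local audio, as Addis AI does, would then be a separate training step layered on top of this.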

And as you said, the power of these AI models is sometimes non-intuitive, because they are these black-box neural nets. It’s really the combination of the data you put in and the probabilistic weights the system learns, through things like gradient descent and backpropagation, that is a struggle for even computer science people to fully understand.

But what can we look at? We can look at the inputs and we can look at the outputs. If we keep those at the level of what the farmers actually want, and of what the local agronomists would suggest, people who aren’t just technical experts but are also familiar with the holistic agroecology, the sociocultural context of these farmers, the language they’re speaking and all the colloquialisms alongside it, then that actually gives you a ton of ways to tune the neural nets to serve this ultimate purpose at potential scale. And not in a cookie-cutter way, but representing the diversity of interests that these communities, and the ecologies and landscapes they reside in, require.

[00:13:28] David: And one of the things we haven’t dug into yet is, of course, the images. Once you start going beyond text, the actual questions are still important, but we know a big thing which happens is people take photos and then they want information to come back. And that’s another thing I think you’ve been working on in really interesting ways.

And this is something we have colleagues in West Africa who have been really keen on this because a lot of the things which are not trained locally just don’t work for them. So you need to have these mechanisms to be able to tailor this to specific local contexts.

[00:14:05] Rikin: Totally. And multimodality, where a farmer or an extension agent can use a combination of voice, text, and images, not just one, but perhaps two or all three in combo, is quite powerful, because it’s quite difficult sometimes for a farmer to articulate the pest or the disease or the stress they’re seeing, whether in crop or even in livestock systems. Do I explain it as like a white bug or like a spotty pattern? It’s way easier just to take a photograph.

And in fact, in Ethiopia, because some of the local language support is not perfect, farmers are sometimes just preferring to take photographs. In the last month, 78% of the queries that we saw from Ethiopian farmers in Farmer Chat were images, because it is also the time of year when farmers are seeing a lot of disease and pest situations on their horticulture crops and such.

Now the challenge is that these off-the-shelf vision models, like GPT Vision or even Plantix, have APIs, but they’re not super attuned to the particular crop and livestock systems that we all work in in the Global South. So we get a hit rate of only about 40% of images being classified for the crop or for the pest or disease there.

But that’s again where reinforcement learning with human feedback comes to the fore, because where the image models are not able to resolve what the issue is, we pass it to these local agronomists to annotate what the AI was unable to decipher. And similarly, even when the AI does generate a response for a particular image, we also pass a subset of those to the local agronomists to review and make sure: is it right or is it wrong? Because you, again, need to do this on an ongoing basis.
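
The flow Rikin describes, defer low-confidence images to experts and spot-check a sample of the confident ones, might be sketched like this; the thresholds and the `classify_image`, `answer_from`, and `queue_for_annotation` callables are all hypothetical stand-ins.

```python
import random

CONFIDENCE_FLOOR = 0.6  # illustrative threshold, not a tuned value
SPOT_CHECK_RATE = 0.1   # review ~10% of confident answers anyway

def triage_image(image, classify_image, answer_from, queue_for_annotation):
    """Human-in-the-loop flow for a farmer-submitted photo (toy version)."""
    label, confidence = classify_image(image)
    if confidence < CONFIDENCE_FLOOR:
        # Model can't resolve the crop/pest: defer to a local agronomist.
        queue_for_annotation(image, reason="low_confidence")
        return None
    answer = answer_from(label)
    if random.random() < SPOT_CHECK_RATE:
        # Even confident answers get sampled for ongoing expert review.
        queue_for_annotation(image, reason="spot_check", answer=answer)
    return answer
```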

Many of us have had the experience of using applications like ChatGPT or Gemini. The reason why they keep getting better, and why we use them, is that this is basically a cycle that companies like Google and OpenAI are investing literally billions of dollars in: they review a subset of our queries, and they have local experts who make corrections and tune the models.

We sometimes also give feedback when ChatGPT asks, do you like version one or version two of the response I’m generating? That goes into training the models to perform even better, and is basically why systems like ChatGPT now have 800 million users using them every week.
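
That version-one-versus-version-two feedback is typically captured as preference pairs that later tuning runs consume. A minimal, assumed record shape; the field names and the JSONL log are illustrative, not Digital Green’s actual schema:

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class PreferencePair:
    # Assumed shape: one record per "which response do you prefer?" prompt.
    question: str
    response_a: str
    response_b: str
    preferred: str      # "a" or "b"
    reviewer_role: str  # e.g. "farmer", "extension_agent", "agronomist"
    timestamp: str      # ISO 8601

def log_preference(pair: PreferencePair, path: str = "preferences.jsonl"):
    # Append-only JSONL log that later tuning jobs can read back.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(pair), ensure_ascii=False) + "\n")
```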

[00:16:48] David: The thing which I think is so important, which differentiates your approach from the sort of big tech approach, is that these can be local, community-driven adaptations. And that’s really so important in some of these lower-resource environments. And more than that, because of the way you are handling the data, this data isn’t then being used in ways which are extractive. It’s used in ways which genuinely support these communities.

There is one criticism I find is often laid. Actually, the week before you were in Ethiopia, I was in Ethiopia, I heard you were arriving the day after I left. I was there for this Pan-African convening on Bio Digital Technologies.

And the big question there was: are we going to be exploited? And the simple answer is that there are certain products which are extractive. They are going to be exploitative; they’re going to try and sell you things based on the questions you ask and the data you share.

But that’s not the business model you’ve got, and it isn’t what the communities are wanting to build for themselves. I think it is so important that we actually look at how we support these really powerful systems to be developed in such a way that there is community ownership of the knowledge which is emerging and of the way these can be trained. And your model of open data for this is exactly what’s needed.

[00:18:22] Rikin: Totally, and certainly that starts with your governance, with how you’re even structured, in our case as a not-for-profit, and also your value system and what you actually create in the world. So that you’re not creating a closed GPT system that is extracting folks’ data, but rather an open system that represents these farmers’ priorities first, and you’re publishing these datasets and models in the public domain for others to benchmark against. And more than that, others can also build their own services, their own applications, on top, so that we don’t have to duplicate these investments.

And if you’re in West Africa and we’ve done the tuning of Hausa for the agricultural domain, but you may not have a ton of tech expertise, you shouldn’t have to reinvest in the fine-tuning. If you wanted to build on top of it, create your own app, create your own bot, you should just be able to do that.

And that’s why we publish to platforms like GitHub for our software and Hugging Face for our datasets and models. And then it’s also from a transparency point of view. Often in this AI world, especially the commercial world, people are so ultra competitive with each other that they’re often faking even the evaluation benchmarks. But we can’t afford to do that. We need to be grounded in the reality of these actual farmers’ interests and what these local agronomists would suggest is appropriate, and we need to create this as a common good for all of us who are in this space to be validating and building on.
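
Publishing a reviewed benchmark to the Hugging Face Hub can be a few lines with the `datasets` library. A sketch; the records and the repo name are hypothetical, and an authenticated Hugging Face login is assumed:

```python
from datasets import Dataset

# Agronomist-reviewed pairs, e.g. exported from the review tooling.
records = [
    {"question": "...", "reviewed_answer": "...",
     "language": "hau", "agronomist_score": 4},
]
ds = Dataset.from_list(records)

# Hypothetical repo name; push_to_hub requires prior authentication.
ds.push_to_hub("digitalgreen/farmer-chat-benchmark")
```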

[00:20:10] David: Absolutely. The challenge, of course, is that the competition is investing billions in this. And, unless I’m mistaken, your funding is not at that level. So there is this element, and this comes back to something you’ve said: if you are not going to throw money at the problem, well, what are you doing?

And this is the other thing I love about your approach: what you are really throwing at the problem is human resource and human expertise. This is where the real richness and wealth of what you are doing is being generated. There are teams of people who are investing back into this. The example that I have is that we’ve been discussing with groups from the West African region, Niger, Burkina Faso, Mali, who are interested in engaging in this, particularly for the identification of pests and diseases.

And they have structures in the universities which could be built out and could easily now interface with what you are doing to be that validation step. These expert agronomists, these expert entomologists, they want to be part of this. And this can be part of student training; you can actually build this into university programs, because the open approach that you are taking means that publicly funded institutions like universities are natural partners.

You can access a human infrastructure which can be competitive, because that’s what you need. That’s what the money is being spent on anyway: it’s being spent on people spending time.

[00:21:44] Rikin: Totally. And you know, the billions and billions that the big tech people are investing at this point are for the frontier models they have in development, for their quest for things like artificial general intelligence and such. But the reality is, whether they get there or not, the models that already exist today are extremely powerful. And what really requires investment is, as you said, the human layer for the location-specific and season-specific contexts in which we all operate, and which are not adequately represented in the knowledge corpus that these models by default have been trained on, which is much more conventional, much more Global North, much more skewed towards the English language.

And so we have to make these investments so that they’re not skewed. And the exciting thing is, we don’t even need these large language models. Now there are these small language models, which are way more resource efficient, don’t use so much energy or so much water, and are in fact faster. And we found in our own evaluations, using the same benchmark datasets from farmers and local agronomists, that they are just as good in terms of quality, and they are open source, unlike these closed proprietary systems.

This is where we need to be investing more of our time and effort: tuning those small models, those open-source models, that can serve the public good for these small farming communities.
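
Trying a small open-weights model against the same farmer benchmark is cheap to prototype. A sketch using the transformers pipeline; the model ID is just an example of a small instruction-tuned open model, not a recommendation, and `evaluate_pipeline` is the hypothetical scorer sketched earlier:

```python
from transformers import pipeline

# Example small open-weights model; any comparable instruction-tuned
# small model could be swapped in here.
generator = pipeline("text-generation", model="Qwen/Qwen2.5-1.5B-Instruct")

def ask_small_model(question: str) -> str:
    messages = [{"role": "user", "content": question}]
    out = generator(messages, max_new_tokens=256)
    # Recent transformers versions return the chat with the reply appended.
    return out[0]["generated_text"][-1]["content"]

# e.g.: score = evaluate_pipeline(ask_small_model, "benchmark.jsonl", judge)
```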

[00:23:23] David: I’m so glad you brought this out. We’ve actually got an episode which has just come out, well, when this is released it will have come out, which is exactly on this, on the importance of these small language models. And the key point, which I think you are one of the few groups really highlighting well, is that you don’t want to develop on the small language models now, because that’s still at the research stage, something which is just developing.

But if you are developing on the large language models and you now have your datasets to benchmark against, well, it’s obvious the small language models are going to be good enough. There’s work to be done to iterate them, to integrate them. There’s so much potential there to innovate further.

And this is really where, I feel, everybody talks about the cutting edge being the big models, but it’s not; the real innovation edge is the small language models. Over the next five or ten years, I would anticipate that in many contexts like the ones we’re discussing, they’re just gonna wipe the floor with the large language models, because they are going to be able to be efficiently trained to serve specific purposes for specific communities.

And that’s what we are talking about. It is localising things at a community level to be able to have the AI system serving communities rather than serving a big global generalised purpose. That’s the power which is really emerging now. And I expect this to become widespread in the next five or 10 years. And I think you guys are at the forefront of it, it’s great.

[00:24:58] Rikin: A hundred percent. And I think that farmers are not using Farmer Chat the way we might use ChatGPT, to ask relatively complex questions like, write me a proposal or something like that. The farmers are asking way more tactical questions: my crop is browning, I see a pest, what’s the weather, is it gonna rain, should I sow now? This is a much more finite set of queries. And of course, what the farmers are looking for most is stuff that is dynamic, location specific and contextual.

They’re not looking for the generic ChatGPT or Wikipedia type of answer. They’re looking for stuff that’s most relevant: for my farm, for my sociodemographic conditions, relevant for me. And as you said, the small models are perfect for that. And we can even cache a lot of the commonly asked questions. Actually, what’s interesting is that for about 70% of the questions that farmers ask, they’re not articulating them as a question in a chat box the way we might in ChatGPT.

For that 70%, they’re tapping suggested questions, because we surface anonymized questions from nearby peers. So if a neighbouring farmer, for instance, is seeing a pest issue and you’re that person’s neighbour, it’s likely that you might be seeing that same pest. And rather than wait for you to proactively formulate a question, we can just surface that and say, hey, your neighbour’s asked a question about some pest, would you like to learn more about it? Tap this button.

And 70% of the time, farmers are clicking on those suggested prompts, based on peer neighbour questions and also on historical context around what questions and issues you’ve inquired about. And, you know, agriculture is so location and season specific, we don’t need to hypothesize what you’re gonna ask about; it’s a relatively well-defined set of things.
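
Surfacing those anonymized peer questions could be as simple as a recency- and proximity-filtered query. A toy sketch with an assumed record shape; `distance_km` stands in for whatever geodesic helper the real system uses, and the radius, window, and ranking are illustrative:

```python
from datetime import datetime, timedelta

def suggested_questions(recent_queries, farmer, distance_km,
                        radius_km=25, days=14, k=3):
    """Surface anonymised questions from nearby peers (toy version).

    recent_queries: dicts with "text", "location", "asked_at" (datetime).
    """
    cutoff = datetime.utcnow() - timedelta(days=days)
    nearby = [
        q for q in recent_queries
        if q["asked_at"] >= cutoff
        and distance_km(q["location"], farmer["location"]) <= radius_km
    ]
    # Most recent first; a real ranking would also weight crop and season.
    nearby.sort(key=lambda q: q["asked_at"], reverse=True)
    return [q["text"] for q in nearby[:k]]  # text only, so no identities leak
```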

[00:27:05] David: I’m gonna challenge you on one thing, which I feel is the next frontier for where you are. And as I say, I love what you’re doing, but the next frontier, I believe, is that we could build uncertainty into these models in terms of the training. Again, this is a cutting-edge element of how you do this, and once you get down to small language models, it’s actually easier to build in uncertainty measures in different ways.

And I think if we built in those uncertainties, then, as you say, 70% of the time they’re clicking on the button. Maybe another 25% of the time they’re asking a question which can be easily answered, and the model is pretty confident of the answer. And 5% of the time they’re gonna be asking something totally new that the model really is not trained for, that it doesn’t have a clue about. And if we’ve got the uncertainty built into the modelling process, for that 5% you could actually surface a response to say, huh, this is a tough question, let me get back to you in a couple of days’ time.

Maybe it’s 5% now; maybe in the future it’s 0.5%. But that’s where you can put your human effort, and you can actually have timely human effort, which means those ones get surfaced urgently to an agronomist who’s part of the team and who’s able to provide this feedback in a timely way, which is not instantaneous. And that’s the sort of thing where I could imagine these systems being built out to be so reliable.
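
One cheap proxy for that uncertainty is whatever calibrated confidence the pipeline already exposes (token log-probabilities, a classifier score, and so on). A toy sketch of the gating logic David describes, with the threshold and the `escalate` callable as pure assumptions:

```python
def answer_or_defer(question, model_answer, confidence,
                    escalate, confidence_floor=0.75):
    """Toy uncertainty gate: answer when confident, else hold and escalate.

    confidence: the pipeline's own uncertainty estimate in [0, 1];
    escalate: pushes the question onto a human agronomist's queue.
    The 0.75 threshold is illustrative, not a tuned value.
    """
    if confidence >= confidence_floor:
        return model_answer
    escalate(question)
    return ("This is a tough one, let me check with a local expert "
            "and get back to you in a couple of days.")
```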

And my favourite example of this is the Post Office. People forget, AI’s been around a long time. Around the time we finally had an AI which won at chess, the Post Office moved all its reading of letters to an AI system. And when they started, it was worse than the human sorters, sort of less than 90%. Of course, now it’s 99.99%, the errors are really small, and you get interesting feedback loops when the Post Office system gets things wrong.

And this is exactly how AI works. All we need to do is put in place the right human structures to improve the learning around it. And you’ve got a system which is set up perfectly to do this. You are doing it retroactively; I believe in certain cases you could be doing this as a hold, where the AI says, I don’t know yet, I’ll get back to you.

And that I think with the small language models, with uncertainty built in is gonna be the next frontier. Is this something you are thinking about already?

[00:29:28] Rikin: No, I think it’s a great idea. I mean, it is technically complex. But we do have situations where the AI is unable to answer; for example, for certain types of images it’s not able to classify what the crop is, what the pest or disease is, or even for livestock.

And the AI returns: sorry, I don’t know the answer. So we certainly log that, and then we end up having to do this separate step afterwards to be able to get back to the farmer later, often much later, with an improvement.

What we’ve also recently added into Farmer Chat, though, is a push to call the local extension agent. So that where the farmers might get, you know, stumped, or the ‘sorry, I don’t have an answer for you’ response comes, they can still get assistance from a live agent, whether from the government or some local NGO that might be able to provide them support. But I agree that trying to figure out that hybridization would be amazing.

[00:30:30] David: Absolutely, and it is not easy, because as you say, the technology is not quite there yet. My hypothesis is you’ve gotta do the move to small language models first. Because once you’ve done that move, you can actually start using different mathematics, which can keep the uncertainty quantification, or actually expose it, much more accurately.

That’s my hypothesis. But that’s the frontier of the maths happening on this. And this is why it’s so exciting at the moment: these advances are happening, they are possible. But if we have the design based on the service we want the farmers to receive, and we actually work back from that, well, what’s the work we need to do on the models?

I’ve got some professors at Caltech working on the maths behind this that we can try and tap into and see if they can help us to sort of figure that bit out. It might take five years or more, but it’s something which we can give real meaning to the research they’re doing. 

[00:31:28] Rikin: No, that’s true, and that’s why I think we all need to keep one foot in this frontier research on AI, and keep abreast of what’s happening there, to leverage and incorporate techniques that could address the kind of point you’re making, around how do I quantify uncertainty, escalate to a human expert where necessary, and slow down the responses as appropriate.

While simultaneously having one foot in what exists already: these small language models, these open-source models, where I can invest at the application layer and, as you rightly said, engage the community of universities, local student networks, and other NGOs, who today might feel that this AI technology is for the tech crowd, but whose input it actually requires the most, because they have the grounded, location-specific info.

And it’s on us to create the system so that they, with less tech expertise, can bring in their local agronomy and other types of rich cultural expertise. That’s why we’ve invested in creating simple web interfaces, so that you don’t have to be a computer scientist or an AI engineer; you can just look at a photograph and help to classify it, or look at the question-answer pairs that are being generated and just tag what’s right, what’s wrong, and what other info or link or citation you would provide here. And that tunes the AI models in a really real way, based on your domain and location expertise.

You know, even Digital Green, we don’t have that. That’s why we’re trying to make a call out to connect with people in this community of practice.

[00:33:21] David: Absolutely. You know, this is a great place to finish ’cause we’ve actually gone over time a bit, but this has been such a great discussion, because this is exactly where good artificial intelligence is actually based on good intelligence. I mean, that’s broadly what you’re saying.

It is the human intelligence behind this. If we can get local communities to be giving their local intelligence in a way which serves the local community, and they can keep ownership of this, which is broadly what these open systems enable, this is what is going to enable us to have proper artificial intelligence with real intelligence behind it.

It’s great to see. I’m so in awe of your work, and I’m really hoping we can get this in Niger at some point soon. I’m interested in other places of course, but my heart remains in Niger. 

[00:34:09] Rikin: We would be excited to do so, David, with you and the community of practice.

[00:34:13] David: That’s great. All the best.

[00:34:16] Rikin: Thank you. Thank you.

[00:34:16] David: Thank you.