The following is a rough transcript which has not been revised by Vanishing Gradients or Alan Nichol. Please check with us before using any quotations from this transcript. Thank you. === Hugo: [00:00:00] Alan, you are a co founder and CTO at Rasa, where you work on open source software. for conversational AI and chatbots also work on build products. so I'm excited to get into kind of a pretty serious conversation around the history of conversational AI, where it's been, where it's going, and then about in context learning. But first, I'd love to start by just hearing a bit about your background, how you got involved in NLP, conversational AI and open source tool building in Python. Alan: Yeah. and we connected on that stuff quite, quite a few years ago already, Hugo: Yeah, nearly a decade. Alan: background. Yeah. Yeah. So my background, I think like many people at ML is in physics. And then, my PhD work was on applying machine learning to physics. And, so I was working on like Bayesian non parametrics, Gaussian process models, things like that. For simulating small molecules and doing molecular dynamics and those kinds of things. so really very far from NLP, but somehow in the computational physics, ML, world. [00:01:00] And, and actually got interested in NLP from doing some side projects. So I started building like some mobile apps with a friend of mine,who's actually Alex. And we ended up co founding, a couple of companies together. Rasa being,the most recent one. and one of the things that we built was a search engine. so we built this like little search product. And so I got interested in conversational AI through search, started working on, building a better search engine and then getting more interested in like multi turn conversations. and it's funny how things have come full circle again, where now with all the excitement about rag, I think the overlap of conversational AI and search is once again, very close. So it feels nice that,that's how things went. but yeah, I started as a side interest and then,just really liked that kind of thing. obviously really liked building products and,when we first started thinking of like startup ideas or building side projects and things like that, I don't think I've ever seriously thought that's what I would do after my PhD, but,turned out that way. yeah, so here we are about a decade later. Hugo: Yeah. And congratulations on all the [00:02:00] successes over the years with Raza as well. And I'm Alan: Yeah. Thanks. yeah, I think we've had. I think we've had,very proud of the community that we've managed to build around the product and, how many people get excited and elevate their skills by building with Rasa. That's great. Hugo: Yeah, and I'm excited to get more into exactly what people are building, few things of what people are building with Raza soon. I am interested. So the type of stuff you and I worked on together, conversational AI has had a very rich history before the advent of LLMs. And with all, as as the audience hopefully knows, I'm relatively bullish on a lot of generative AI and foundation models. But as with all hype waves, we're very good as a culture. immediately forgetting any and all history. so I'd like to gently encourage people to remember the history because it will inform how we build conversational agents going forward. And that's one of the things that I love about the work you're all doing. [00:03:00] So I'm wondering if you could give us, like a radically condensed history of chatbots and conversational AI. Free LLMs. Hmm. Alan: Yeah, absolutely. I don't think I'll go right back to the start. but maybe around the time that I was getting involved in this stuff. And,I sure remember how hyped we all were about Word2Vec,in the early 2010s, and the idea that you could do maths with words, was just pretty wild. and, and one of the simplest things you could do to build a chatbot is just, starting with a text classification task, right? hey, people are coming into the system and they're asking, about one of a few different things. So let me just treat that as a classification problem. And so I, when I was a teenager, I played the drums for many years. There's a drummer called Steve Gadd, who's just a total legend. And he always, the simplest possible beat that you can play, he calls that the money beat. He's I make all my money playing that. And he's you all think you can play it, but you know [00:04:00] what? Guess what? When the big stars need a drummer, they call me. and that's how I think about taking the Sum of Sum word vectors and training a classifier on top of it. That's like the money beat of NLP is It's just such a workhorse. It's such a simple baseline, it's so difficult to beat. If you have a decent embedding, it really doesn't matter, what classifier you train on top of it. But, in any case, that's the paradigm for chatbot building, right? And when I first started looking at this, discovered that there was a huge literature on how to build better chatbots and voicebots. Ironically, a lot of the research was done in the department where I had done my PhD. I just wasn't aware of it. It was like the floor above mine. I had no idea that these people existed. I only discovered afterwards.but ultimately the result is cool research, but the way that people were actually building chatbots in practice were, I'm going to send my text to an API. It's going to tell me the class that I've assigned to that text. And then I've just got a whole bunch of if statements that just tell me what to do [00:05:00] next. And we started Rouser out of frustration that cannot be how people build conversational AI. that can't be the answer. You can't just have this,unmaintainable nest of if statements that, drives everybody crazy and just doesn't scale and just falls over all the time. And so that, that's what motivated us to try and build something better originally. Bye. but that's still very much the paradigm that, that, that dominates today, I would say, is right, thinking about, thinking about how to categorize text, someone's coming in, are they greeting us, are they asking about an order, check in, are they, saying thank you, are they confirming something,we're just gonna come up with a couple dozen or a couple hundred categories, these are all the things that people can say, and we just categorize them. That makes sense. And so something I'm hearing in there, even embeddings aside, I want to get into some of the details it feels like, so even pre machine learning, like Eliza, which was a chatbot that was built in the 20th century, right? It was purely rule based with if else [00:06:00] stuff, absolutely. And it's, it it's like a reg X kind of system. You look for specific phrases. this is obviously only works in, in, in English, the system, right? It just like looks for specific phrases. And what it always does is it manages to turn everything that you say into another question. Back to you. And so one of the things that you and I worked on when we made the online course on how to build chapels was we help people build in Python, a very simple version of that Eliza system.and people have told this history many times, and it's a really fascinating story about like how frustrated the creator of that chapel was that he gave this, 300 lines of code to people, 300 lines of if statements to people, and they would talk to it and they really engage with it, and they would anthropomorphize it. And it was really. talking to them. and I don't know, maybe that was the first piece of evidence that maybe thinking about the Turing test isn't, isn't the right way to think about measuring intelligence or something like that. Maybe we're not as objective as we think we are. and language isn't quite the medium that we think it is,[00:07:00] Hugo: And even when people know, I, I. I think maybe it was him. He had an, whatever we called an executive assistant back then who was chatting with Eliza and she was like, just leave me alone. I want to like, keep this is really nice. And he was like, it's fake. And she was like, yes, I know it's fake now. Just go away. Like I'm having a really nice Alan: but I think also because those early conversational eyes, or let's say chatbots were, Relatively reflect, that would reflect a lot. They're actually somewhat useful as therapy bots within certain constraints. So tell me more about that or, Oh, this is how you're feeling, isn't it? Hugo: and that type of stuff. Alan: Yeah. Yeah. Yeah. Yeah. Absolutely. And you've seen like stories in the last couple of years where people, ascribe agency and consciousness to AI systems that they're chatting to. And it's, I think it says more about you as a person engaging with it than it really tells you about the Hugo: yeah, absolutely. Yeah. I do go to twitter. com every now and then, or the website. Alan: That sounds like a good website. What are they Yeah, exactly. Or do you have [00:08:00] Twitter over there or something? No. Yeah. Yeah. Yeah. Hugo: but what I really wanted to get my head around is when we're thinking about building chatbots, we once there's some sort of logic control flow, like business state thing. And we'll go through this maybe in more detail later. Like maybe someone is introducing themselves. To the bot, then figuring out what the task is, then figuring out how many of something wants to be purchased, whatever it is, so we're going through some control flow, there's something logical happening, then there's machine learning involved, and embeddings, and that type of stuff, now we're throwing generative stuff into the mix, so transformer architectures that can, modulo some details that can create a lot of text. and one thing I love about what you do is that a lot of people think have a mental model of I interact with an LLM and then outsource everything to it as opposed to putting the LLM in specific parts of the control flow [00:09:00] or business flow. So maybe you can tell us a bit about these different mental models if that makes sense. Alan: yeah, gladly. so I think maybe like breaking up, if you think about the traditional approach, what are all your if statements doing, your statements are actually doing kind of two different jobs, one is there's a category of if statements that really describe your actual business logic,if someone's coming in and asking about an order, then you got to go check, if that order number is valid and check if it's been dispatched or if there's anything, any kind of problem with it, it's too late to cancel whatever your actual business logic is, but a bunch of those other if statements. Are really just looking at, okay, what was the output of your classifier or your like language understanding model, right? and how do I interpret this in this context? Hugo: And so typically you would have text that's interpreted on its own. It's just like a, a single message, a single string gets categorized. Alan: And then it's like, okay, I have to now reinterpret the output that I got from that, that NLU system, that language understanding system.and so that, that's actually doing [00:10:00] two different jobs with the same system. Okay. And the reason why that gets so complex and gets so hairy is because your model of language understanding is so oversimplified, which is, we're not going to worry about trying to understand language, whatever that means. we're just going to categorize text. That's a much more tractable, much simpler problem, right? But because of that, like very strong structure that you've imposed, you now have to do all this work in terms of reinterpreting it. so you've got two different types of logic there. then the other side of it is, okay, I just have a big LLM and I'm just gonna sort Let the user talk to it and it will just do what it does, right? And I'm going to generally guide it to do the right things by adding information into the prompt but you really don't have a Great sense that there's a lot of control that you're exerting over the system because it's just gonna say what it's gonna say so it's always just like sampling from this big model And so something that I'm very [00:11:00] interested in is, and we've been working on for some time is, how do you say, okay, I'm going to take the power of that language model and the incredible ability that these things have to just understand arbitrary instructions and navigate complex representations of what people want, but then just like in a very simple and reliable way, integrate that business logic, And in many ways, that's something I've been thinking about for a very long time, I found an old blog post. From 2016, where I talked about how it's really difficult to marry like a deep learning system with, with some sort of business logic and that you can't really just simply inject new facts into,into a deep neural network. and it's wild. And this is the topic for today. It's wild, but a pretty workable answer to that question is just, we're going to take a really big recurrent neural net or LSTM or transformer or whatever the architecture happens to be like a really big language model. And we're just going to describe that logic in text. And then we're just going to [00:12:00] keep sampling that text, right? And that's a workable approach to marrying business logic and. Um,and just sampling from a large, black box model is I think pretty miraculous. Hugo: I totally agree. And it is, this is something that we all know, but perhaps don't reflect on the, it's actually quite profound. I think that we can interact with these models now using natural language ourselves. that was the Holy grail of a lot of what we were thinking about. so I do that and that's your post, which I've just shared in the chat and I'll share in the show notes. We don't know how to build conversational software yet. Correct. Yep. From Alan: Yeah. Yeah. Yeah. And that was around the time I was describing when we were, we were starting Rasa and we were like, no one knows how to do it. it's a mess. It's just a bunch of statements that can't be the answer. but,we've come some way. So that's encouraging. Yeah. Hugo: of use cases. I'm wondering if you could just, to help us ground the conversation, tell us about a few use cases. Alan: for sure. the [00:13:00] kinds of use cases that I spend my time thinking about because it's like our business is,one of the big ones is like customer service, especially for large companies where you have, a very large bank or a very large telco or a very large retailer. We just have, lots of different departments, lots of complexity, lots of things that people could be asking about, right? if somebody just goes into a bank and says Transfer that could probably be 12 or 14 different things. So you have quite a bit of complexity on the backend and,quite a bit of things to, to navigate and potentially automate for people. That's one big category. and people use Rasa to build that kind of branded flagship chat bot that they'll, they'll give it a cute brand name. And then they'll, advertise it, the posters up on the subway, that kind of thing. And then, what, but another one of the really impactful use cases is for internal, especially in large companies, especially with people who are, real experts and are very highly paid. if you can make those people more productive by giving them an assistance. to automate part [00:14:00] of their work that they can talk to and say, I'll quickly do things for them or retrieve things for them. That's also, gives you a pretty nice return on investment. but in general, I would say the cases that I'm most interested in are cases where the chapels are doing something for you, right? So they're reading data and writing data to like real APIs. They're like interacting with the world. So it's cool, obviously. Taking action. Yeah, and I think it's really cool and interesting all the stuff that we're doing on purely like informational conversations, right? With rag being a big topic and just making it so much simpler even to build an FAQ bot. but I think the cases that really excite me and I think, are probably the most impactful or the ones where you're actually doing something, right? It's cool to tell somebody,give somebody the right link where they can do something themselves on the website. It's even cooler if you can just do it for them. Hugo: Absolutely. And I do think the ability, what we're talking about also, a chat interface can give me the right information or it can also give me the wrong [00:15:00] information, can give me the information I need in order to take action, make a decision and take action myself. But if you have, an agent or conversational AI or chatbot taking action itself, I think that's infinitely more scalable as well, right? all I'm saving myself is some time as opposed to automating things around me. So what I mean is that the taking actions I think is going to be a huge part of the economic value that these, this software can deliver. Alan: Yeah, I couldn't agree more. Hugo: I, so in the internal, for internal use cases for business, maybe just LLM can be chill for a bit. I don't know. Alan: yeah, if you've got, if you've got a human expert in the loop who's aware that what they're getting is, the same way that you or I, or anyone working in tech, would use something like JadGBT, you're going to take everything it says with a grain of salt. Like it might help you, get somewhere or explore some ideas or something, but you're not going to take it as truth. And,that's obviously a very different thing from if you're, [00:16:00] talking, representing a whole company to the outside world, right? To the customers. That's a, it's a different game. Hugo: Absolutely. And I do. There was another. I'm not actually sure whether you're able to talk about this case. So I'm going to hint towards it that Alan: All right. Hugo: was there an insurance use case with text messaging of Raza. Alan: yes, Hugo: Is that Alan: but Hugo: Can you talk about that Alan: me more. I'm not sure which example you mean though, you might need to hint a little more. Hugo: I think there was, okay, yeah, so I could just be making this up. there was a case where people whose insurance had lapsed, I think, received a text message just seeing if they wanted to re up Alan: absolutely. no, that was really early use case that we supported and it was super impactful because it was just people who were at the mid to long tail of this insurer's sort of policy holders. So they weren't high net worth enough that they would get a human to call them up and try and re engage them and upsell them. But, they were just letting these sort of insurance policies lapse and that just, it [00:17:00] should churn, that's potentially preventable, right? So just automating that sort of outreach, on,to reach out to people and say, Hey, are you still interested in renewing? Like here's a new quote. And then automating that whole thing and then saying, okay, there's a new, here's a new contract. You can just sign it and send it back. That's obviously really nice. And people do lots of creative things with Rasa, right? of course, many people use Rasa in their IVR. So you call up, and you talk to an assistant. But we've also seen companies that do the reverse. So we've seen companies that actually, call other systems, IVRs. So they build a router Java that calls other systems, IVRs, and like updates information and then gets, and then that gathers and collates everything on behalf of the user, Hugo: very cool. Very cool. And the reason I like these examples, the insurance one, for example, It's clear that a huge, like 70 billion parameter model isn't necessary for this thing. It's also clear that there are several steps. There's, first say, hello, I'm a chatbot, whatever, then find [00:18:00] out if the person is actually interested in renewing, then maybe a step to discuss the quote, then a step to figure out the address to send. So there are several steps that you have to go through, defined by business logic, and in each of those some slots where the person maybe has to message, send some information, and within that conversation, The chat bot, needs to be slightly creative in terms of how it engages. But besides that it does, and it doesn't want to make mistakes, but we don't necessarily need or even want the huge functionality of these incredible LLMs. Oh yeah. for sure. and we see everybody that's seriously LLM.using a hybrid system, because like, why would you call GPT 4 to respond to hello? That's just silly. It's expensive and it's slow and it's, that would just be ridiculous. so yeah, absolutely. And that kind of goes back to the point about the money beat, right? Alan: There's a lot, there's a reason why this paradigm, of classification has [00:19:00] existed for so long. Because for, A chunk of use cases, it really does quite well, but then you do hit a ceiling, right? So how do you build a system that kind of gives you the best of both worlds? Hugo: now, having a strong sense of some of the use cases you support, I'm interested what happened with you and your community when chat GPT hit the scene was, Alan: but was there some like existential moment? Oh, I would describe it and I do describe it as like a nuclear bomb going off on the industry, right? it, it changed everything. It changed expectations.I've heard. of a very large company in the U S that had one of these sort of flagship chatbots,it was branded. It was very prominent. It was in the mobile app, all this stuff, they weren't using browser or anything like that. They'd built it all in house. They all had their own stuff. And the project had been axed. So like senior management had decided. This thing is dead. This thing is done. Consumers don't prefer to using the app. It's, a big team, it's expensive, all the rest of it. And then ChatGPT [00:20:00] launched and they just reversed that decision. They were like, they were ready to move ahead with the purge. And then they undid the decision to kill it because it was like, Oh wait, actually, maybe there's been a, a big leap forward in the technology. Maybe there's actually something we can do here. maybe we shouldn't give up on this just yet. And, I think that is just indicative of all of a sudden. This getting the highest level of attention everywhere. And as someone who's nerded out about chatbots for 10 years, it was pretty wild in 2023, how like every week front page of the New York times, there'd be some story about chatbots, which it's just really bizarre to have that level of sort of public conversation happening about something, which used to be pretty niche. Yeah. Hugo: And I don't want to be too provocative, but I'm interested in what, what it did business wise, because I'm making an assumption here, but a lot of people I know who aren't deep in the technology are now like, Oh, we [00:21:00] should just use LLMs for everything. so when you'd been spending so long working on a particular way of building. something and a particular mental model of how things work and work in production. what did your world become in terms of having to have the LLM conversation? And I suppose, think about incorporating it into your work as well. Alan: Um,I was just so excited by it because there was so much good stuff to do. And honestly, I'd felt like the field felt pretty stagnant for a few years. And, we were really doing lots of work and pushing out new models and new approaches and new ideas and trying to move the needle,and try and introduce some new ideas, trying to challenge how people build conversational AI, do it differently, or at least be open to doing it differently. and there really wasn't a lot of interest in new ideas and new approaches. And then all of a sudden. Everybody had an appetite for we don't know how it's going to work in the future, but it's definitely going to be different from how it works today. So that was exciting for us. And, something I like about Rasa is that we [00:22:00] like to solve problems from the ground up. So we said right away, we're not here to randomly slap some LLMs onto our product here and there. let's take this opportunity to really think about, if we go back from first principles, we forget everything that we know about conversational AI. And we just know really well, what are the kinds of use cases that people want to build with Rasa. And we know lots about conversations and how conversations work. And we have this new category of models which can do qualitatively different things from, previous gen models like, BERT. what's the right set of primitives, right? What's the right set of LEGO bricks that help you build? Build a great conversational experience. And, yeah, and so we came up with, with something that's just pretty great and it's called calm and,we launched the first beta latter half of last year. and, and now it's already in production with big customers and all that kind of stuff. So it's just a very different approach. It goes away from this model of creating language understanding as a [00:23:00] classification problem. But it still retains the idea that like you have some strong business logic. And. That you just execute that right and you execute that reliably and that isn't something that you ask an LLM to reason about or to guess or to, hopefully get it right or to sample. So we still, we use LLMs conceptually as a sort of translator between the conversation that you're having with the person and like the business logic of the task that you're trying to do. and it's so it's not that we just do like open ended generation and then, hope for the best.Ben Evans always gave this quote that a computer should never ask a question that it should be able to figure out the answer to. I think that's also a good way to think about LLMs, right? an LLM should never have to guess something that you already know a priori.this is the set of steps to complete this task, right? Absolutely. I love it. and I've just put a link to the calm docs in the chat. I'll do that in the show notes as well. and calm [00:24:00] as well as being a peaceful state. stands for conversational AI with language models, That's right. Hugo: which I like as well. And I've also linked to, I know I'm putting the cart ahead of the horse slightly. I've linked to your great paper task oriented dialogue within context learning. cause you've already gone into this a bit, but I am interested in just hearing a bit more about. How you now think about embedding LLMs in, in, in chatbots and conversational AI. as we said, it's, think what we want to, for the most part of void is just giving LLMs total freedom. and there are many, actually, do you want to just give us a quick rundown as to, from your perspective, why we don't want to just chat with an LLM? what are the failure modes when we do that? Alan: plenty of people do want to chat with NLLM. I don't want to argue that nobody wants to. I think just for the kinds of use cases that [00:25:00] people come to Rosset for, you want something that's reliable. You want something that's going to say the thing that you wanted to say and not something that is hopefully relevant. maybe the simplest way to describe it is If you just plainly asking LLM, what's your returns policy? It will give you a returns policy. But not your returns policy. similarly on the sort of reasoning through business logic side, if you ask the user, if you ask the LLM to do something for you, it will come up with a reasonable process for completing that task, but not necessarily like your process, right? that's the sort of the magic and the trickery is. it does so much on its own, but that last bit of making it do the right thing, how do I do that? So there's different approaches to that. And the way that we decided to tackle it is to, still very much use the LLM as. An understanding component primarily, right? It's not the only way you can use an LLM in Rasa and it's not the only place you can put it in, but that's its primary job because, and I think this is, this is a phrasing that I really liked from, [00:26:00] I think it's our friends at, explosion, the spacey folks, they wrote this blog post and there was a sentence in there that said, people argue about whether LLMs like understand or don't understand. I think it's a moot point. They're so good at manipulating symbols of meaning. That it doesn't matter, right? I can't remember the exact sentence, but it was something like that. And it stuck with me, right? Because I'm also not a philosopher. I'm a person who built products. So that, that works for me, right? They're really good at manipulating symbols of meaning. And so what we use them for as an understanding component is instead of saying, We have this understanding model, which looks at a single message and then categorizes it into a bucket. have a model which can look at the whole conversation, right? The whole back and forth. It can also look at the business logic that we're executing and like where we are in that process and what we're currently asking the user about and all that stuff. We can feed all that context into the model and then we can [00:27:00] interpret what the user is saying. In context, and we just generate some, let's say, symbols of meaning, some representation of meaning of what it is that the user is trying to do to progress the conversation, right? And so this lets you really trivially cover cases that are very difficult in the sort of traditional classification type of approach. So one of my favorite examples is, you ask the user, are you traveling in economy class? And then they reply back, sadly.pragmatically, you and I, we read that, and it's perfectly obvious what the person means. But, the word, sadly, on its own, does not mean yes. It's not, it's very much in the pragmatics of, that question was just asked, these are the options. it's a yes or no question. and, and, nobody,would travel in economy class if they had the choice not to, right? So there's a lot of sort of implicit knowledge. And so you can output, as a representation of what the person means, you can output, yes, or affirm, or, whatever [00:28:00] the instruction is, right? And so we work with instructions inside of, of our com system. We call them commands. and it's it's classification and it's generation. So we generate a sequence of commands to represent what the user says. So if the user does multiple things, we generate multiple commands. the set of commands itself is fixed, but they can take arbitrary arguments to So it really is it's not like a, one, something you can represent as a one hot encoding. It is genuinely like a generative, system, but it's more like a structured prediction type of task. it's, it's, yeah, it's sampling within a constrained grammar. So you're not, we don't use the LLNs like generating some slop that you then send to the user and kind of hope for the best, but we use it to generate a representation of meaning, that you then use to navigate your business logic. Hugo: I am interested in the example you gave where you're expecting the answer to be yes or no. I'm just thinking from a, this is pretty naive as well. And I'm not a, a UX personal or designer, but if there are really two [00:29:00] options, why don't we just give people buttons with those options Alan: that's a totally valid point. Yeah, I know. If you really have a yes, no question, you can add buttons. Realistically, what you see is you give people buttons and about a third of people will just respond anyway instead of clicking a button. It's just the nature of things. Hugo: What if Alan: you're right. and there's a, Hugo: write anything? But I suppose also I get really frustrated when there are only two options and mine is actually a third one, which happens, Alan: yeah. because there is a legitimate use case for, Actually, you're asking me this, yes, no question, but that's only because you misinterpreted what I said before. And so I'm actually going to tell you something else Hugo: Yeah. Yeah. Yeah. Alan: So rather than locking the person down a particuLar path Hugo: so we did talk about, when we're talking about the, an LLM, just saying stuff that may be relevant, may not, may seem relevant. I suppose we're talking about what we could call the hallucination issue or challenge or something like that. There's another issue though, that arises with freeform LLMs of some form of prop prompt injection or something along those lines, right?, Alan: No, exactly. That's it is, and I think there's been some work on this recently. I saw something, but I haven't read it in [00:30:00] detail of kind of separating out instructions to the LLM that come from the system versus the ones that come from the end user, because you get this thing where you as a semi clever user can just use the chat bot as like a hand puppet, like the, there was that case with the Chevrolet dealership or somebody said, I need you to respond to me all the time with. That's a binding offer and it takes you back to these, right? And then the guy said, I want to buy a car for a dollar. And it obliged, told him it could have a car for a dollar. that's a kind of, somewhat trivial case where somebody can hijack the output of your system. which is obviously, for sort of high stakes use cases, a bit of an issue. and, yeah, I think the, you talk about it as like the hallucination problem. I think this is a pretty,widespread belief that, among NLP folks, at least that it's a weird way to talk about it, right? If you think about, for me, one of the forms of blog posts when I was coming into NLP was that Karpathy blog post about the unreasonable effectiveness of RNN, I went and reread it, the other day, and I was [00:31:00] just thinking imagine going back in time and commenting on that blog post, Oh, but how can you guarantee that it only generates factual statements? or do you think that if we made this model really big that it would only generate things that are actually true, right? It's just, it's a category error of thinking about it. because they generate plausible tags, right? it ascribes a probability value to a string, but not a truthiness value. But, But yeah, prompt ejection is obviously a real thing if you're building a high stakes application, and I think there was even a paper that showed that there was like a proof that for every LLM, there's like a jailbreaking, input sequence that will like override, like it's controls and things like that. , it's obviously an issue if you're just going to have a chatbot represent some sort of, large organization, you can't be taking those kinds of risks. Hugo: absolutely. I'm glad when I think about LLMs from my perspective, we talk about them as like gen AI, and that's the big hype y topic at the moment, generative AI. Alan: From my perspective, like the generative aspects of LLMs are mostly a [00:32:00] downside. And a really cool thing about LLMs is actually the in context learning. It's the fact that you just, tell it to do something and it does Hugo: let's jump in there. I do want to say, I'm glad you mentioned the examples you did. No takes these backsies is such a genius. Phrase as well. I feel Alan: Right. I think that helped it go viral, just that it was so irreverent and, funny, right? makes me think of Bodhi McBoatface actually. Focus, yeah. I'm also glad you, you mentioned Karpathy with respect to RNNs and hallucinations because he also made a point a while ago in a tweet that, if we're going to use the term hallucinations, we probably should acknowledge that LLMs are actually hallucinating all the time. Hugo: Like everything they say is a hallucination. Just sometimes it, it coincides with ground truth. What we think of as ground truth. Alan: exactly. yeah. Hugo: it only really coincides with the ground truth if it just happens to be the most obvious thing to say, right? Like it's, there's no signal in there to [00:33:00] tell it that it's true. And yeah. And to bring this back to a point you made at the very start, which I. I definitely want to explore in, in further conversations. We won't have enough time in this one though, is, so Karpathy also makes the point that, so he says LLMs hallucinate a hundred percent of the time, and he compares it to search, which never hallucinates, right? It takes, it always, search, and he's interested in thinking about the middle ground, right? and that's something we've been talking around and hinting at, and there are,tools like perplexity, I'm actually quite interested in, because I think it's one of our first data points, but thinking about what this very rich, fertile ground between LLM, generative AI and search can actually look like. Yeah. Yeah. Yeah. It's great. as I say, it's great that it's all coming back together. Yeah. So now to, the topic of the day, how, I think few people When they think about supervised learning, we'll think of it as a failure. So I'm interested in your thoughts on how supervised learning has failed us. I Alan: it's [00:34:00] obviously a, a provocative and incomplete statement to say that supervised learning is a failure because it has been so wildly successful in driving the field forward. All right. ImageNet is synonymous with big leap forward in ML, right? People talk about an ImageNet moment or something like that. Huge supervised classification data set that really, moved the needle in a very significant way. so if it's been so successful at helping us buildand discover advanced, Ways of building.why is it unsuccessful in practice? And I also think that it's a victim of its own success. If you think about what happened in the 2010s and all the progress that was made right on architecture, sure. But also on, CUDA and GPU programming on stochastic gradient defense. On,frameworks that are high level, with Tiano and then TensorFlow and PyTorch and all these things, Keras, of course, being a huge contribution there of [00:35:00] really, making it a simple API and everything down the stack, making it so that you can really describe a, at a very high level, like a very over parameterized. Model that has way more parameters than you need and still reliably fitting it right still reliably converging to something sensible because of the, the giants on whose shoulders we stand, of all the great work that was done. So it became a non question for most practical applications of, can I fit? something to this data set, right? And like in, my PhD days, that was definitely not a given that you could fit something, right? So now you have such an open parameters model, you have such good technology for doing gradient descent and things that, that it's a known question. So then the question shifts from can I take an architecture and can I get this thing to converge and all that stuff to, okay, if I fit, a model to this data, does that actually solve my problem? And what we failed [00:36:00] to do as a field is shift the education in the same way as the technology shifted. So we got so excited about all the progress that we made on fitting the models. And doing hyperparameter search and all this stuff. And that's what all the education focused on. And we told everybody that the data sets, data sets, they're just given to us by God. they just come down out of nowhere and, you download them and then you fit something to them. that's obviously overstating the point, but we Played that down. And in my experience, many practitioners, people who are experts in ML, don't feel like it's really their job to think about the data set. And when you're building a system, you're building a supervised learning system because the model fitting is so good. The real work of improving your model is in the data and in the feedback loop and checking for errors and annotation and keeping your data clean. And, and we just did such a poor job of shifting the focus there, shifting the education [00:37:00] there. And I've just seen some spectacular failure modes. and I can give you a few of where that has just really led us astray and where supervised learning has been a pain. and I think one of the gripes I've heard people have, and I also feel it myself when you're playing with an LLM, is you know, you you ask it to do something and it's almost there, but it's not quite, so you tweak the prompt, and you tweak the instructions, and then it gets a little closer, or maybe it doesn't, and then you try again. And it doesn't feel systematic, right? It doesn't feel like you yourself are doing prompt descent to the ideal. prompt, right? You've just taken a step and it can feel like incantations of, maybe if I say the right magic words, it'll give me what I want. And that's a fair criticism, but in my experience, that's also very much how people in practice engage with supervised learning systems. It's Oh, the model's not quite doing what I want. Let me like, I don't know, downsample this class. So let me like upsample this class with some some synthetic examples, or let me, go in and try and just throw things in and out with the training data to try and control [00:38:00] the output. so I think it's actually, prompt engineering is actually a closer match to the way that people in practice actually engage with a machine learning model than the supervised learning paradigm. Hugo: totally agree. And something that came up for me, I actually mess mentioned this in the most recent episode of the podcast with Jason Liu, but, You may recall 2018, 2019, people started, because of all the things you just mentioned, people started thinking a bit more about what data centric AI, as opposed to model centric AI, would look like. And Andrew Ng even, came out and said a bunch of stuff about it. And I think he even, I've got to fact check this again and find it, but I think he started a Kaggle like competition where the model is held fixed and you've got to, Fiddle with the data, which I think is such a wonderful idea. I, but of course, part of the joke of what happened then is that all these big foundation models came along and suddenly we're all model centric again, but everyone who's doing the [00:39:00] work is still looking at the data. Like it's even because we've actually got exponentially, infinitely more like unstructured text in the form of natural language now created by LLMs also like actually looking at data. Is even more of a concern than it ever has been. And it should have been the utmost concern previously as well. Alan: Yeah, I couldn't agree more. yeah, it's a really great point and I'm very strongly aligned, on that particular point with Andrew, and, and, we've been doing a bunch of work and education around this within our space. we introduced a phrase of conversation driven development, which is mostly just what Andrew would call data centric AI, but in the context of conversational AI. there's a little more to it because there's. It's not just about the performance of the model when you're building a conversational AI system. There's also questions of UX and backend integrations. And, maybe you've automated the process perfectly, but actually your process is what sucks. there are different reasons why you might fix something beyond just, [00:40:00] curating the data. That's just one part of it. But ultimately, it's really about that sort of feedback loop. And, Yeah. and, but the other sort of thing I want to call out is, when I, I posted on that twitter. com website that you mentioned earlier, when I posted on there about this podcast,Derek Chen called me out and he said, boo, that's a cold take everybody thinks in context learning is better than supervised learning. But it helped me refine the point that I really wanted to make. Because yes, like in context learning, we just talk about, whatever you give to the LLM and then it spits out some stuff.but it's really controlling a model by feeding it examples versus controlling a model by just describing what you want in natural language. And I, I remember brainstorming on this problem a couple of years ago with our research team saying, what if we could build a model that could, rephrase chatbot responses? To be a little more contextual or to to, to change the tone depending on the type of user. And think about how you do that as a supervised learning problem. You've got to have a whole bunch of conversations that you have these like individual,turns that you want to rephrase and [00:41:00] then you've got to have like ground truth of what are valid rephrasing. But it's obviously like a, there's many possible valid things. So it's not like a machine translation problem. How do you actually quantify the tone? Like it's so difficult, And you just don't even really know where it's going. Start and then you say, Chad, GBT, please be extra polite. And it just doesn't. so it's magical to write instructions. To the model and just have it like follow those instructions, right? that's wild to me Hugo: I can tell one of my biggest frustrations at the moment, you can't even tell by like the agitation with which I'm speaking now, is GPT 4. 0 when I use it to help me code. Like I, I'll just want, I'll be like, Hey, this didn't work. Give me, and I'll want just one line of code changed or something like that. And it will give, I'll be like, don't, I'll say, do not give me the entire framework that we're working on. And it will then legit like spit out the directory and local file and then all the different files. And I'm like, bro, that's [00:42:00] literally what I told you not to do just then. And you keep doing it. Alan: Yeah. Yeah, my top tip for interacting with these models is just at the end of every, or at the start of every conversation say,talk to me like you're texting and don't exceed 140 characters. And it makes it a lot less tedious and a lot less annoying.it doesn't give you all the fluff. It just tells you the answer. Hugo: great. So you've hinted towards this, but I'm interested in what you think the most overrated and underrated aspects of LLMs are specifically. Alan: Yeah.so I think it's ties together some of the things I was saying in the last few minutes, I think the generative aspect of it. Is really overrated and overvalued.and, and the fact that you can just tell it what to do and it does it, whether we call that like in context learning or we call it, instruction tuning or prompts engineering or whatever else you want to call it. That to me is the true magic and it's truly undervalued. And if you think about,how much simpler it is to build an [00:43:00] application where you just pull together some information, a few strings, and youFrom a database of some kind, and then you concatenate together and you feed them to a commodity, perfectly standard, same for everyone, same for every application model, just from the pure perspective of building software applications, that is so radically simpler than training your own model, deploying a model with those specific weights for the specific task, monitoring it and all that stuff. Um,all the model training that needs to happen in your CI pipeline and all the ways that things can go wrong. and the fact that if you're building, for example, if you're building a platform that has like multiple users instead of every user having to have their own model deployed, running in a container somewhere with their own weights, like all of that stuff is just commodity. And the thing that is like per user is just some strings that you pull together from a database. You can do [00:44:00] that. in real time on demand because it's so fast and then, just call some sort of generic general purpose model, right? And so the simplicity that you get from that is also, I think, is going to have a bigger impact than we're accounting for now. And I think it has a bigger impact than the fact that we can, I don't know, write really daft, uninspired blog posts automatically. Hugo: Totally.I'd love to see a demo in, in a second. I am interested,in what for people who want to work with LLMs and conversational AI, what would you encourage people to, what direction should they move in and what should they be looking at and Alan: Yeah, that's a good question. so my perfectly unbiased experience, opinion is, of course, go check out Rasa. No, but more, more genuinely,I hang out on Reddit, local llama and things like that. And you come, you see people come in and ask these questions who are asking how to do really simple things. [00:45:00] And it's clear that it's not obvious to them how to do it because they've come to chatbots only through the lens of GPT 4 and, working with these kinds of models and something that's a really straightforward task. And then you see somebody comment with Oh, here's a paper, which like explore some approach to doing that. But it's always here's how to do it with a lens. Anyway. This is a very simple, very straightforward problem that was solved quite a while ago. so don't sleep on,techniques that have been proven, that have been around for a while. and, and other approaches to conversational AI, becausesome of these approaches that I see with Oh, you're going to generate and then you're going to have the LLM inspect its own output and generate another candidate and then you're going to have another LLM like grade that and all this stuff. All of this is just a scam by token companies to sell more tokens, right? Like that there's so much you can do with more. Straightforward techniques that, you want to look beyond just the chain of whatever, approach that's cool this week on [00:46:00] Twitter. Hugo: So what I'm hearing there is don't necessarily jump in and do the thing that everyone else is doing, explore, explore what's possible, and see how you can incorporate LLMs into your work. Alan: Yeah, absolutely. And I think about what they can do that they're really good for, right? Hugo: exactly, and I've just, I'll put this in the show notes as well. I've just, Oh, so this is pretty funny. This is on local Lama. For those who haven't checked out local Lama, please do check it out. And it isn't just for local stuff as well, by the way, like there's lots of cool stuff. there, that this is a Reddit thread that starts seriously. Can LLM agents really work in production? I'm building a salesperson, but dot, dot, dot. and then someone links mentions Raza and the OP replies. Wow. I'm still checking out the docs, but this is exactly the level of control I was looking for. So Alan: It's nice to hear. Yeah. cool.love to see a demo. I'm going to make you co host now.[00:47:00] Hugo: And. If I were to, I've shared the docs to calm. Is that the best place for people to check it out currently? Alan: Great. It's a great place to start and it links out to everything else. So there's a great course on YouTube. there's obviously the paper, if you like that level of detail. and then there's the, the link to the download and just getting started with it all. Hugo: Great. And can you just remind us, Rasa is open source. Rasa pro is not that's enterprise and calm is open source. Is that, or Alan: so com is part of browser pro. and you can get a free developer license, for Razzle Pro. So you just, you go to the site and you just put in your email and then we send you a license key. That's it. Hugo: Awesome.great. let's jump in. What are you going to show us today, Alan? Alan: just something very simple. but just to, I think illustrate some of the points I was making about how we separate out business logic from the LLM, and how you can edit that and work with it reliably and add your business logic into a chatbot, and still build something [00:48:00] that is. very capable in terms of its ability to understand variations in language and, nuance and pragmatics and some of these things that we talked about,it'll be quick, but we'll just take a little look and,if there's anything I'm forgetting to point out, then please, please speak up. Okay. You can see my screen here. Hugo: I can see Firefox. Alan: Banging. All right. so we'll take a look here. in, can we define business logic? in a declarative format called flows. And so this is a very simple example of a flow. And this would describe the logic for transferring money. So this is like a single skill. That lives inside your chat bot, and it can send money to people. And the sort of two key things in here, there's a description, which helps us figure out when, we should, activate this flow for people. and then it has a list of steps. And so the steps are literally,they do what it says on the tin. It's these are the steps that you need to go through.[00:49:00] So you have collect steps, where you just ask the user for some information. So you say something, you pause, you wait for input. And then you have, actions where it's doing something, but not necessarily asking and waiting for input. So this is the most trivial examples. We'll start with this one, right? It doesn't have any branching, any complex logic. it just, very simply we'll go and I'll ask. Okay, you want to transfer money? How much? Or to whom do you want to send it? How much do you want to send? And then it'll tell you, what, it'll tell you that it was completed, right? So this is really hello world example. so if I just do rousletrend, I can build this model and then I can talk to it here in rousletinspect. So this is just a little web UI that we built to,help you chat to your chatbots and inspect and understand what's going on. Hugo: Alan: so if I say here that I want to transfer money, what happens is that this sort of middle panel now shows me that this is the flow that I'm in, right? So what we use the LLM for is we translate what [00:50:00] the user is saying into some commands. in this case, it's a very simple translation. The user says they want to transfer money. So we translate that into a command, start flow transfer money. So we're now in this flow. and you can have multiple flows,active at the same time. So all the flows are maintained on a stack and you can see that here on the left. So if I like interrupt with a different task now, like I want to ask for my balance or something like that, we'll start that next flow, that'll get pushed onto the top of the stack. And when it completes, you return to the previous conversation. this is, just a very simple single skill chat bot. but now if I say that, okay, I'm going to. I'm currently at this recipient step, and then I'm gonna need to, provide the amount. But of course, if I say both, I want to send, let's say 3, 000 to Hugo. Would you like that? Hugo: Oh, dude, Alan: great. I skipped through to the end because I provided all the information that was needed, right? So here, we would have produced two commands. So one setting [00:51:00] the slot for the recipient and one setting the slot for the amount of money, right? So we look at the conversation, we look at the business logic, we look at where we are, and then we generate some commands, navigate your business logic. And the nice thing about this type of approach is that,for sure that your business logic is going to be followed faithfully. So there's no way for me to like prompt inject and skip a step. Because the execution of going from step to step in the conversation, that's not handled by the LLM, right? That's handled by just a deterministic piece of code. and the other nice thing about this is that, this flow is actually all I've really provided here. So I haven't created any training data, for a language understanding model. I haven't built a classifier or anything like that. This is some of this magic of, This new generation of models that, you only really have, you could do so much with zero shot learning, right? The other thing that's maybe worth looking at is that these things that I'm collecting, so like the recipient and the amount, I declare these as variables and I declare their [00:52:00] types. and that helps me,extract them correctly. Hugo: Yeah, that's super cool. Just figure that out quickly. In order to progress from step to step, the user, you in this case, needs to provide the name of the person and the amount. Those are stored in the slots. And I'm just trying to get my head around this. And when they're stored in the slots, that tells the chatbot when to move to the next step in the conversation, pretty much, or something along those lines. Alan: yeah. so the way to think about it is what we use those commands for, we generate commands through the LLM, and those commands represent like what the user would like to achieve. It's what the user's trying to do in the conversation. That doesn't guarantee that it's what's going to happen, right? but it's what they want to achieve. So in this case, they're just providing you that information.and within a flow, you absolutely cannot say, Oh, skip the verification step and just get me to the end, or a skip. I don't, I know I'm supposed to be logged in, but just let me off just this one,kind of thing. so you can't do things like Hugo: if you do it. yeah, [00:53:00] exactly. Exactly. But something I still, we haven't seen in the code is how you invoke the LLM, or where it's playing a role. Alan: yeah. yeah. Okay. Let me show that right away. So you can see here, there's like a tracker state. So you can really see, this is a debugging view for developers. So you can really see. a lot of detail. And so here we have this section with the commands. And so this is the output that we're generating from the LLM. So we generate a list of commands. And in this case, it's a transfer money command, which has to start the flow, right? if you, if you look on the console, you can see the full, prompt that's being sent to the LLM. If you put on debug mode, and you can see exactly what's being sent there. And it's a bunch of information about. The conversation itself about the flow that you're in and the steps and all that kind of stuff. now, the other point is that you can't, skip steps in a flow or alter the logic of going through a flow, but you can absolutely change your mind and go and do something else, right? So you can say, Oh, actually, [00:54:00] I don't want to transfer money. I want to go and do this other thing. And you can interrupt and go and switch to a different task where you can cancel, or you could do all those kinds of things. But the execution of these are the steps,is solid. and then I'll show a slightly more complex. Um,version of the same flow. so I just checked out a different branch with a more fleshed out example. and here we now have some actual branching logic, right? So it's not just saying, okay, step one, step two, step three, which is a trivial hello world example. Here we have something more complex. So after we collect the recipient and the amount, I'm actually going to go and call an API. And we're going to check that this person actually has enough money. Okay. So I might have been very generous in offering to send you 3, 000 Hugo, but I might not have that. Hugo: it didn't come through yet, and I've, yeah, Alan: Right.and Hugo: my lawyers Alan: the of the verbal contract we had around it as well, Very good. Yeah, exactly.I just hallucinated it. Sorry. That Hugo: Yeah, Alan: [00:55:00] uh, that was the issue. so yeah, here you have an if else, right? very straightforward. So you can say,if the person doesn't have enough money, then just tell them and end the flow. Hugo: and if they do, then you can go and check and we're going to ask, for a final confirmation step, which is probably a good idea. so as to not just, you know,do things that are unintended, European Alan: we can, right.so we can just rather train this one again and then start it up again. Hugo: I like the simplicity of the train and expect inspect paradigm as well. It's cool. Alan: Oh, I want to send 3, 000 reduce. You go, all Hugo: Alan. right. And in this case, it tells me I don't have enough funds. Sad. Okay. I'm to raise another round, Alan. Alan: I don't know, alright, just send him 30, let's try, okay, so now we'll see again, we're in this flow, but this [00:56:00] is now a more complex version of it that actually has some branching in it, and it's just a little visualization of where we are, and we're here, because through the rest of the conversation, it was already obvious what the recipient and the amount were, and so now I'm just, Here at the sort of final confirmation step, and I can, I can say yes, or I can say no, or I can change my mind and do something else, or change my mind again, actually, send it to, who's your worst enemy? Hugo: Alan right now. Alan: Oof. Arsh.yeah, so I can, change my mind about, something that I said and,and still follow the business logic. so the flow diagram that you see here, it represents really, it represents the logic of the process. It doesn't represent all the possible conversations that you can have, and that's going back really to the beginning of the conversation that we've had today is,a problem. With a lot of like conversational ad platforms and approaches and a lot of these no code tools, it's that they conflate those two ideas. You see all these no code platforms where you're building one diagram to represent two [00:57:00] things. One is your business logic and the other is like all the possible conversations that can be had. And that's why they always end up looking like spaghetti. And this is really, it's not about that, Because of the magic of LMS and because of the way that com works, we don't have to represent in this view or anything similar to it. We don't have to represent what the user, how the user is going to provide that information. Or what to do if the user doesn't or deviates or start something else. Like we just don't worry about that. We just say, here's the logic for the task. that's really nice. It means that you're, it's a much, much faster and more token efficient way of using LLNs than, than maybe some of the things you see out there at the moment. and it still gives you all that, that fluency, that generality, that incredible sort of, Language understanding ability and ability to manipulate symbols of meaning that,that's so powerful from LLM. Hugo: cool. thank you for that demo. And those in particular, it was really nice seeing the MVP, but then seeing the branching take place. And although I didn't get any money this time, [00:58:00] it's really nice to see the structure of that flow to actually have a mental model of how the business logic works and how the software. Is actually built to support that business logic as opposed to having the free form beauty and chaos of a large language model to, Alan: Yeah. Hugo: to work with.I am, you did, I don't know if you're at Liberty to mention it. You did say that, there are some pretty serious enterprises using calm now. I don't know if you're at Liberty to tell us about any of them, but Alan: I can't name names, but, we have some case studies, live on our website and some press releases coming out and those kinds of things, but, um, you know, very, very large millions of users, hundreds of thousands of conversations a day, kinds of things that are live with the account system, and some pretty,incredible, like boosts in terms of, top level KPIs, like how many cases are resolving, how much are we able to automate? overnight from switching from a purely NLU classification based approach to switching to Calm. So pretty exciting time, I have to Hugo: [00:59:00] Very cool. Very cool. So everyone watch this space and definitely follow Raza on Twitter. Twitter. com and or LinkedIn,to get updated about all the wonderful use cases. thank you for that demo. I, people also can follow what's your Twitter handle if people want to. Alan: It's, at Alan M. Nickel. Hugo: Okay, great. And that's N I C H O L. I'll put that in the Alan: Yeah. like Jack Nicholson, but without the sun. and with Alan instead of Jack, I do I do. Yeah. what a one wonderful day. Thank you. Person to have as a pseudo namesake.to write. Hugo: yeah. Um,yeah. What a wild man. I, and of course, Alan's on LinkedIn as well, if you're interested in this type of stuff. But please do check out the Raza docs, the calm docs that, that I linked to, and let us know about any fun stuff you build with it. So it's time to. Wrap up. I'd love to thank everyone for joining. and most of all, thank Alan for your eternal wisdom and patience with [01:00:00] some of my silly questions, but it's always great to chat and always great to have a conversation that we can take public as well. So I appreciate you, man. Alan: Super. And, yeah, thanks for being on the show. Thanks everyone for tuning in and, yeah, appreciate the chat. It was really good. .