The following is a rough transcript which has not been revised by Vanishing Gradients or the guest. Please check with us before using any quotations from this transcript. Thank you. ===

[00:00:00] Well, I tell a lot of people, hey, even if the AI models right now didn't get any better at all, right, all the R&D stopped and we just froze the models, there's trillions of dollars of business value to unlock there. And so one practical use case that I often suggest, where execs, managers, et cetera, start is: use AI to help you do that. And I don't mean just come to a ChatGPT and say, ChatGPT, give me a trillion-dollar idea. It's more like, build a personal knowledge base in that AI system.

That was Dr. Randal Olson talking about the trillions of dollars in business value available right now with current AI tools, and his practical advice for executives. Randy works at the intersection of data science, AI engineering, and executive strategy. He's the co-founder of Wyrd Studios, where they build technology with privacy-first AI applications and human-centered design.

In our conversation, Randy lays out a practical [00:01:00] framework for moving from ideas to execution. We discuss how non-technical leaders can rapidly prototype concepts, like uploading photos of car accidents to test a new insurance classification system, with virtually no engineering cost. Randy makes the case for focusing on what might be considered boring but valuable problems, and I give the example of an EdTech company that discovered 20% of its support tickets could be automated with a simple retrieval bot, instead of chasing ambitious and ill-defined projects. We get into the specifics of improving AI performance through advanced prompting, and why you should treat AI evaluation with the same rigor as software engineering. We also explore how to automate tasks incrementally along the agentic spectrum, starting with simple workflows that summarize meetings and email to-dos before attempting to build fully autonomous agents. Finally, we cover why organizations must carve out dedicated time for AI experimentation, even if it means a short-term dip in productivity, to unlock significantly more [00:02:00] long-term gains. I'm Hugo Bowne-Anderson, and welcome to Vanishing Gradients.

Hey there, Randy, and welcome to Vanishing Gradients.

Hey buddy, good to be on here. I think this is my first ever live stream, so this will be fun.

So let's jump in, man. I am super interested in thinking about how executives can use AI. So I'm wondering if you could tell us a bit: what's the most practical first use case you've seen executives use to try to [00:03:00] get real value from AI at work?

Yeah, I mean, that's a really challenging question that a lot of people get stuck on, because AI is just extremely capable nowadays, and it's really cool being in this space just because every month feels like Christmas with new models, new capabilities, et cetera. What I tell a lot of people is, hey, even if the AI models right now didn't get any better at all, right, all the R&D stopped and we just froze the models, there's trillions of dollars of business value to unlock there. And so one practical use case that I often suggest, where execs, managers, et cetera, start is: use AI to help you do that.
You know, and I don't mean just come to a ChatGPT and say, ChatGPT, give me a trillion-dollar idea. It's more like, build a personal knowledge base in that AI system. That can be personally about you, and maybe we can chat about [00:04:00] that later. But business-wise, it can be uploading meeting transcripts, strategy docs, business timelines, et cetera. Put in as much knowledge as you can into that knowledge repository, and then have your ChatGPT, your Claude, whatever, reference that and ground its conversations in it. And there's been a lot of execs who have found tremendous value from just doing that. They take their business plan, et cetera, and then they go to a ChatGPT, whatever, and I'm not gonna recommend a particular product here, they're all equally good and bad in their own ways, and you say, okay, grounded in my particular situation, with my particular context, what do you advise? And that's extremely useful, because now it knows your business context, it knows about you. If you give it deep transcripts, it can even kind of learn how you talk and how you talk about things, right? And that's really useful too. So now it [00:05:00] can help with that strategic brainstorming, and you can have that conversation back and forth with it: hey, I'm thinking about adopting AI, where should I start, et cetera. And it will likely start bringing out strategic frameworks, right, that you can follow to help you think through that.

There's other things you can do, like synthesize information across departments. Another fantastic example I saw was taking meeting transcripts from all across the company and putting them into a common knowledge base. Now you have conversations, daily updates, et cetera, everything that's going on in your company. You don't have time as an executive to stay on top of all that and listen to it yourself, but you can synthesize that information using AI, right? So those are kind of the massive information synthesis use cases. But there's also, hey, based off of all the emails that I've written in the past, write this email in my voice, right? And that's a small win too. So that is, I think, one of the easiest [00:06:00] early wins people can have: take your data, put it into something like a custom GPT, or Claude has somewhere you can put in a knowledge repository, put it in there, make sure you're not potentially exposing major company secrets, whatever, and then start chatting with it there. I think that's one of the great early use cases.

Yeah, love it so much. And I am wondering if people do need to do something here. I know, of course, you think so much about privacy. With products like ChatGPT, Claude, Gemini, and even doing your custom ones, how should people think about doing it in a way that's a bit more privacy-preserving?

Hmm. Yeah, that's the tricky one. You really gotta go in there and read the terms of service, et cetera, and see if you're okay with how they use the data. Right? There are ways, with all these companies, especially on enterprise plans, where you can get into a zero data retention policy, [00:07:00] et cetera, right,
where they will not retain the data after you upload it. After you upload it, they're not keeping it, or maybe they'll keep it only for seven days, 30 days, et cetera. So there's that way. Other things you can do: you can explicitly go into these services and say, hey, don't train on my data, because that's ultimately what a lot of these chats are, a source of training data, because humans are great data generators. And that is something that I do whenever I go into any of these tools: I immediately say, no, I do not wanna be a part of your training data, don't do it that way. But if you have data, or an IT team, that is just really, really strict, you might not be able to use a ChatGPT or something like that, right? It's all kind of dependent on how you set things up. I think it is a smart thing to be mindful about, but also, don't let it [00:08:00] get in the way of really seeing value from these AI tools.

Yeah, absolutely. We've got a great question from Pascal in the chat. Pascal, actually, I've put a link to the course that Stefan, from Salesforce, and I teach on building with LLMs; Pascal was in one of our cohorts, and Pascal's actually a machine learning and AI engineer in the US Air Force. He says this is a great topic, and he asked: can local models be a solution for far more private knowledge, content, and AI-driven development? I will say absolutely, and I'll also say there's a huge cost in doing that. For me personally, running a local model, chill; but doing it in an organization can be costly just in terms of getting it up and running, and then maintenance as well, and headcount is gonna be a big cost there. But I'm interested in what you've seen with respect to the use of local models, Randy.

A hundred percent. My first thought when you asked that question: local models, running your own model on your own server, you have full control. You're not sending it off to the OpenAI [00:09:00] servers and hoping that Sam Altman isn't saying, I don't care about your data privacy agreement, we're training on this anyway, right? Running everything locally is of course the most secure way, but you nailed it, man, that is extremely expensive to do, especially if you wanna run one of these frontier models. You can't even get a lot of these frontier models, right? They're private, they're owned by the companies, they're not open source, et cetera. You're gonna find yourself, at least if you wanna stay on top of the latest models, constantly having to update it and re-engineer the system, et cetera. So if you're willing to make that investment and that trade-off, and it's an absolute must that you not be sending data off to private servers, then yeah, you do have to make that investment, and you either have to say, okay, we're gonna have a dedicated team to be constantly deploying and updating and managing these models, or just accept, okay, we're gonna have to run some older models. Right.
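For folks who want a feel for what that local, open-weight route looks like in practice, here is a minimal sketch. It assumes Ollama is running locally with an open-weight model already pulled and the `openai` Python package installed; the file name and the question are made-up placeholders, and nothing in it leaves your machine.

```python
# Minimal local-model sketch (assumptions: Ollama is running with an open-weight
# model pulled, e.g. `ollama pull llama3.1`; the `openai` package is installed).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="not-needed-locally",          # placeholder; local servers ignore it
)

# Ground the model in private context you'd rather not send to a third-party API.
context = open("board_meeting_notes.txt").read()  # hypothetical local file

response = client.chat.completions.create(
    model="llama3.1",  # whichever open-weight model you pulled
    messages=[
        {"role": "system", "content": "Answer only from the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nWhat risks did we flag for Q3?"},
    ],
)
print(response.choices[0].message.content)
```

The same pattern works against any OpenAI-compatible local server (vLLM, LM Studio, and so on), and swapping in a hosted API later is essentially a one-line change to `base_url`.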
And older models may [00:10:00] be from a few months ago, but that's ancient in AI-model terms. Hmm. Yeah.

And if you are using models instead of products, in terms of setting up inference endpoints or something like that, there are places like Baseten where you can host your own as well, so definitely check that out. You'd need to check terms and conditions and all of that, but also think about where your data already is, right? Because if you already use AWS to store secure user data, perhaps, then using AWS's instance of Claude may be enough for you, if it comes with similar terms and conditions. Having said that, of course, when you use endpoints, you don't get all the fun memory stuff that the products have, right? You need to build memory systems yourself to have conversational interfaces and those types of things.

I am interested, Randy, in one other thing. For example, and we chatted about this last time, I've seen car insurance executives who want to prototype: I wonder if we can give a foundation model photos of accidents and have it classify them. So maybe we [00:11:00] could create an app that users can upload to and do some automatic classification, then have a human in the loop or something, before getting the engineering team involved. They can jump into ChatGPT and upload photos and see if that works, or if it somewhat works. That's a prototype, and it may not work in production, but there are huge wins in terms of non-technical people just being able to prototype stuff and experiment themselves, right? Which is so different from when we first got to know each other, when machine learning, as it turns out, was somewhat niche, although a lot of us were doing it.

Yeah, a hundred percent. And I think that's been a really cool evolution in the space. One of the greatest things about AI nowadays is it's made it super easy to take any idea and turn it into reality. I actually have a cool example we can talk about later, where one team did that in a really cool way. So yeah, I think that is a really powerful tool that [00:12:00] execs, managers, essentially non-engineers, can use, where you can say, hey, I have this idea, can AI do this? Right? Mm-hmm. And something that I really try to coach people on when thinking about AI is not to take that mindset of saying, ah, probably not, it's not trained on our data, it's not trained on the specific use case, but rather to approach it with curiosity and say, well, maybe. We don't need to build a whole production AI system, we don't need to spin up a whole AWS instance, blah blah blah, local models, to do this. We can just upload some images to ChatGPT, Claude, whatever, chat with it, and see, oh, can it actually extract useful information from those images? So that particular use case you're talking about, can it take a picture of a car and analyze how it was damaged, da da da, you know, maybe, right? That is something where, if I was talking with a prospective client and they were saying, hey, we have this use case where we wanna do this, I'd [00:13:00] say, well, let's take some images and upload them to ChatGPT and see what their models can do. Absolutely.
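If that chat-window experiment looks promising and you want the same check as a few lines of code, here is a rough sketch of the idea. It assumes the `openai` Python package and an API key; the file name, model choice, and label set are illustrative placeholders rather than anything prescribed in the conversation.

```python
# Rough sketch: classify vehicle damage from a photo with a multimodal model.
import base64
from openai import OpenAI

client = OpenAI()

with open("accident_photo.jpg", "rb") as f:  # hypothetical example image
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",  # any current multimodal model would do
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": (
                "Classify the vehicle damage in this photo as one of: "
                "minor, moderate, severe, total_loss. "
                "Reply as JSON: {\"label\": ..., \"reasoning\": ...}"
            )},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```

Asking for JSON with a reasoning field makes it easy to spot-check a handful of photos quickly, and later to feed the same outputs into an eval set.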
And that just opens so many doors, because now you're not blocked behind: well, this is a cool idea, and now I have to spend a hundred thousand dollars, whatever, on developing this product. Now it's a $20 subscription, right, and a little bit of your time. Go into ChatGPT, does this work, et cetera. So it saves both ways: it can open up possibilities, but also, if it doesn't work, doesn't seem promising, what did you lose? You lost 20 bucks, if you didn't already have the subscription, and a little bit of time where you kind of just chatted back and forth a little. So yeah, that's a really great way for folks to get tremendous value out of AI too.

So many wonderful and key points in there. The first is the ability to experiment now: try a hundred ideas, maybe you promote one to the next step, but realize you can start a lot and spin things down very quickly. Mm-hmm. [00:14:00] Another thing we've identified here is everyone's obsessed with the generative capabilities of foundation models, but essentially we're talking about in-context learning, doing machine learning classification in these models, and harnessing the multimodal capabilities there. One pro tip, if it doesn't work quite as well as you'd like: give it a few examples, a few examples of different cases that have coverage over the types of things you'll see, and see how, with few-shot learning, as that's called, it can actually improve significantly. One final, maybe next-level, pro tip: let's say you do classification on car damage images as to what the damage is. You could then include a spreadsheet of insurance coverage types and payouts to give it a bit more information, and then create a pipeline in ChatGPT that says what the payout may need to be and writes it to a CSV or a PDF or an Excel spreadsheet or something like that. So don't start like that, but you can start building these [00:15:00] pipelines to prototype relatively easily.

Love that. Yeah, and another thing to add around the examples. I love that you brought up few-shot learning, right, because that's such an advanced example, but you can do that in ChatGPT too: you just give the examples right there in the chat. And if you also provide the reasoning, so you give an image and you say, this was the decision around that, and here's why, that really helps the AI learn, because a lot of large language models nowadays are much better at deducing rules from examples than they are at following rules, right? You can write a whole book of rules for it to follow, in this case do that, da da da, and it's probably not gonna follow that rule set quite as well as if you give it some really good examples with reasoning, and then it goes, okay, now I know how to do this. Right?

Absolutely. I think a combination of examples that have some sort of coverage over the cases you want, reasoning, heuristics, maybe a few rules as well, and getting it to explain its reasoning also. And the reason, [00:16:00] of course, for giving it reasoning and heuristics is that they can tend to overfit to the examples you give them as well. Couple of other things, and then let's move on.
Giving it a role. This is wacky, but if you say, you are an expert at the top car insurance company at identifying what caused certain car accidents, that type of stuff, just as if you're writing marketing copy or looking at a campaign and you say, you're the chief marketing officer for a company that has $1 billion in annual recurring revenue, or something like that, it will often be more performant than if you don't say that. The other pro tip, which is also kind of wacky, and that's why I sometimes call it prompt alchemy and not prompt engineering: if something's really important, say it at the start and remind it at the end as well, because language models can be a bit forgetful. We talk about hallucinations a lot; we don't talk about how forgetful they can be, because they get very excited about what they're doing.

Yes, exactly. I guess they can suffer from shiny object syndrome, just like [00:17:00] all of us, right? And speaking of that, I think another really important thing to bring up here that is useful about prototyping and rapidly iterating in your ChatGPT, whatever, is it also helps you think: is this worth doing, right? So we've talked about, can you do this; also, is this worth doing? Because AI can do thousands, millions of really interesting things, but what's actually gonna drive business value for your company? Yep. Right. And so now you can have a little prototype working, you can say, yes, this works, and then you can say, okay, this is automating X, Y, Z manual process, I think it's gonna save five hours per person a week, we have 20 people on this team that do these reviews, da da da, it's gonna improve accuracy this much. And then you can start thinking about dollar signs, right? Now you can say, okay, how is this thing that I'm prototyping [00:18:00] actually going to affect the company bottom line, and not just be a shiny toy that dies in the prototype stage? So that's another really, really helpful thing to keep in mind too. And you can use AI to help you do that too, right? You can say, here's my company context, I'm building this thing, is this actually gonna drive value for me?

Without a doubt. For example, and it is so easy to get attached to the shiny thing, I was speaking with an EdTech company recently, I think I may have told you about this, who were like, we wanna build an all-purpose AI tutor for all our students. And I said, why? And they said, oh, how cool would it be? I was like, it would be cool. And they're like, and we also need an AI strategy. And I was like, yeah, so how are we going to evaluate it? What's the increase in revenue due to this? And they're like, oh yeah, well, maybe we need to think through these things; maybe we just spin it up, because it's easy to do a prototype. And I've seen this happen so many times over the past few years, where these types of things are then spun down 12 to 18 months later. So what I said was, let's jump into all of your support tickets and see what's up there. As it turned out, [00:19:00] around 20% of their support tickets were people asking which lesson a particular topic is in, right? Mm-hmm. Yeah. And they've got transcriptions of all their lessons as well. So then I proposed: you should actually think about building some sort of retrieval bot over transcripts, right? Mm-hmm.
And if you solve this, and I think you'll be able to, you've just solved around 20% of your support tickets. Then let's look at how much it costs you to solve those tickets in terms of human customer support time, and see what the return will be there. And it isn't shiny, it isn't sexy in any way, but it does get a return.

Hundred percent. Yes. And that's really the wonderful thing about this approach: it also lets you rapidly iterate. Yeah. Which I think is really key, because the goal with really all product development should be learning, right? You wanna rapidly iterate, you wanna rapidly learn: are you building the right thing, are you solving the right problem [00:20:00] in the right way? So yeah, that's a huge win right there. Another thing that I want to call out in this space too. The other thing I preach is: start in your ChatGPT, your Claude, and prototype. Well, what do you do if you have something that looks promising, and it's like, okay, this might have ROI? You don't have to jump straight into hiring an engineering team or bringing the engineering team in. There are also no-code and low-code tools that can help build a small-scale prototype, and now you can actually start seeing, okay, how does this work beyond my little personal use case? How does it start working when I deploy it and other people start using it? Right? And there are tons of tools out there for this, like Lovable, n8n, et cetera. This may be a little advanced for a lot of users, but with Lovable and these other tools, it's literally just, build this thing, and out pops an app. That's not gonna be your production-grade app, but that's gonna be your next level, right? Yeah. Super empowering to [00:21:00] see, okay, how do we go from prototype to somewhere in between, right, where we can start measuring things?

Totally. One thing we chatted about when starting the conversation was uploading your documents, or giving context to an LLM, right? Or, they're not even just LLMs now, right? All of these applications have image models, CLIP or whatever, embedded and connected, and they have agentic capabilities, they have tool calls, they're able to search the internet. So with these foundation-plus-plus models, or agentic systems, you can upload PDFs, right? You can upload Slack conversations, you can take screenshots, you can upload transcripts, you can upload email threads. Also, Manus, if people wanna check it out, has recently relaunched their email agent, where you can tag it in an email to do things for you. So we're seeing a lot of movement here at the moment. And one other side note, in terms of tagging things in regular interfaces: Devin, [00:22:00] Cursor, all of these things have Slack integrations now, so you can ping your AI coding assistant in Slack or Discord or whatever to say, can you update this documentation, those types of things. And we're gonna see more and more of that on the non-technical side as well. But you can upload PDFs, you can upload all types of documents, so we know how to do that in a single-use fashion, right?
But in terms of significant gains in efficiency, in the end we really wanna be able to connect our LLM to a pre-existing knowledge base, right, or have it remember what we've uploaded before. So how do you think about this in your work, and also for executives who want it to have constant access to their Google Docs or something like that?

Well, a lot of these, even your ChatGPTs, your Claudes, are building these integrations into Google Drive. So, and I'm just gonna assume that maybe you're working in a space that isn't as locked down, like you're not working for the Air Force, right? So maybe [00:23:00] you can use Google Drive, maybe you can do an integration between ChatGPT and Google Drive. And those integrations aren't just "download all the files," right? They're doing RAG on that. So now, and I've set up systems like this before, imagine every single call you're on, every single call transcript, goes to Google Drive; every single project you work on, all your project notes, all your research about prospective clients, whatever, goes into Google Drive. And now your ChatGPT can do an intelligent search through your Google Drive, right? And when you ask a question, it can pull the right information in when answering, rather than answering generically. So I think that's one really easy way to do it. That's one of the easiest ways to build a knowledge base: just put things in Google Drive and then connect your ChatGPT through a connector, because pretty much all the connectors are doing this in a relatively intelligent way nowadays. But there are lots of other products out there too. I would say [00:24:00] knowledge bases have been kind of commoditized, because they're one of the easiest wins in AI, right? You just upload all of your documents, whatever format they're in, and there are lots of companies out there building specialized parsers, specialized ways of representing that data, so that when you have a question, the AI can go right in and pull the relevant information out. So yeah, I'm not gonna recommend a particular product in the knowledge base space, but there are lots of off-the-shelf tools for that. If you're on AWS or whatever cloud platform, they all have their own relatively easy solutions for that. I don't think I have any strong recommendations beyond that, though.

The other thing worth mentioning here: when you're using a knowledge base in that manner, it's often more performant than what I'm about to describe, but not always. Just watch out for giving ChatGPT, for example, or any of these systems [00:25:00] and products, a PDF and saying, summarize it for me, because a not insignificant amount of the time it may not inspect it, and it may give you a summary based on the title that looks reasonable. That's something I've encountered quite a bit.

Yeah. So this kind of touches on something else I know we wanted to chat about, which is, when you use a ChatGPT, a common issue we all run into is that these models are all trained to be friendly and helpful, right, and to always give an answer. And that's great in a lot of use cases.
But that can also lead to hallucinations, right? That's not the model making something up just because it wants to; it's making something up because it's basically mandated to give an answer. So you can prompt it, you can tell it yourself. All the time, if I'm using a ChatGPT, I'm constantly asking, are you sure? Did you actually read that? Right? My favorite thing is, I feel like I'm [00:26:00] constantly telling AI to disagree with me. I don't want you to agree with me, I want you to challenge me. I consider it a win when the AI actually proves me wrong, when it says, actually, I think you should go this way, or you should use this thing, and here's why, and then it convinces me. That's a huge win, right? Agreed. You do have to be careful with a lot of these systems. They make accessing RAG knowledge bases, et cetera, really easy, but sometimes, yeah, they can mix stuff up along the way.

Absolutely. I totally agree with that assessment. I'll add a slightly different flavor to it, though: I don't think they're only trained to be helpful, they're trained to appear helpful, which is slightly worse. Right. Yeah.

Good call. Now for a word from our sponsor, which is, well, me. I teach a course called Building LLM-Powered Software for Data Scientists and Software Engineers with my friend and colleague Stefan Krawczyk, who works on Agentforce and AI agent infrastructure [00:27:00] at Salesforce. It's cohort-based, we run it four times a year, and it's designed for people who want to go beyond prototypes and actually ship AI-powered systems. The link's in the show notes.

And I totally agree with prompting and telling it explicitly: I want you to ground everything you say in actual docs. If it's a transcript, give me timestamps, or give me references, page numbers, all of these types of things. And on top of that, I do think treating it like a superpowered intern with an incredible memory, who has ADHD traits and tendencies, to be honest, right? It will do lots of incredible things, but it may do things that you haven't asked for, and it may forget certain things. And in that sense, if you had an intern like this, you'd probably double-check some of the work. That's pretty important, right?

Yeah, yeah. I mean, this is [00:28:00] why, and I know you've recently talked with a lot of folks about AI evals, right? Mm. This is increasingly becoming a super, super important thing. And I say AI evals should probably come in sometime after you're doing the personal prompting thing, and you're starting to build the automations. You should also be saying, hey, you have these labeled examples, right? Here's the document you're providing as input, here's the output that you want, and maybe even here's the reasoning. You should have a system that is basically checking your model responses at scale, to say, okay, yeah, it may have worked great for the few personal use cases that you did in ChatGPT; how about we expand this out to 500 use cases? Let's see how it works at actual scale, where the bugs are, et cetera. And so I've seen a lot of really, really smart people converging on this idea that, hey, building an AI eval system is really important.
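To make the "check responses at scale" idea concrete, here is a rough sketch of a minimal eval harness. It assumes labeled examples live in a JSONL file with "input" and "expected_label" fields and that classify() wraps whatever prompt you refined in the chat interface; the file name, model, and label set (loosely based on the support-ticket example earlier) are all illustrative.

```python
# Minimal eval-harness sketch: run labeled examples through the model, measure accuracy,
# and surface failures to review by hand.
import json
from openai import OpenAI

client = OpenAI()

def classify(text: str) -> str:
    """Thin wrapper around the prompt you refined interactively."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": (
                "You label support tickets. Reply with exactly one label: "
                "lesson_lookup, billing, technical_issue, other."
            )},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content.strip().lower()

examples = [json.loads(line) for line in open("gold_test_set.jsonl")]  # hypothetical file

failures = []
for ex in examples:
    predicted = classify(ex["input"])
    if predicted != ex["expected_label"]:
        failures.append({**ex, "predicted": predicted})

print(f"accuracy: {1 - len(failures) / len(examples):.1%} on {len(examples)} examples")
for f in failures[:20]:  # eyeball a handful of failures, much like reviewing traces
    print(f["expected_label"], "->", f["predicted"], "|", f["input"][:80])
```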
And in the same [00:29:00] way that building unit tests and integration tests for software is really important. How do you lock down expectations of performance at scale? You have to automate it in some way, right? Yeah. And AI evals do that. Yeah. So that's a very exciting development in the field, for sure. And that's actually, I know that when we first started talking about this, I was like, oh yeah, I'm doing consulting, having a great time, but I got so incensed about AI evals that now I've actually jumped off and I'm building an AI evals product myself.

So incredible. Yeah. And I do think, you know, my friends Shreya and Hamel, who run the evals course that, if people are interested, they should check out on Maven, they always say that it's kind of a shame that it's called evals, because it sounds more niche than it should be. And they don't say this, but I kind of think of it as: it's about making sure your product does what you think it does, and having a way to measure that. Right. And as it turns out, how do you do that [00:30:00] at scale? Well, there are lots of ways, but one way is having a gold test set which has coverage over all the use cases you think it will see, and having subject matter experts in the loop, seeing what's working, what isn't, and looking at your data. And as it turns out, building these things, this skillset, isn't new; people doing data science and machine learning have been doing it for a decade or two now.

Exactly. Yeah. And I don't even think it takes, I mean, with the existing tooling in the AI eval space, yes, it takes data scientists to build it, but actually, and the reason why we founded this company is, I don't believe AI evals are just in the purview of technical people who can write these and know the statistics and everything else. AI evals are grounded in exactly what you said, which is subject matter experts. So oftentimes executives, managers, et cetera, product managers, are the subject matter experts on how this product should be working. So [00:31:00] they are the best ones to sit there and basically build this system, to say, okay, how do I take how I think this system should be working and scale it, right? Yep. And make sure that it works at scale.

Absolutely. Yeah. And I wanna be very clear: when I talk about the data science skillset, it's not only the skillset, it's the frame of mind. I'm not talking about the stats, though of course that comes with it; I'm not talking about the ability to hack things out and write code, although it's great that we have AI-assisted coding too, so I don't need to debug the wonderful pandas API. Half a joke, I love pandas, but we're live, I shouldn't have said that, but I do, of course I love pandas. What's really important is the curiosity to explore data. And in all the work I do at the moment, particularly the education work, when I teach data scientists and data-interested people how to build evals, they wanna jump in and look at data in spreadsheets or whatever it is. On average, when I teach software engineers this and tell them to look at their data, the response is, [00:32:00] can I get an agent to do that? Right? Mm-hmm.
And so what I'm trying to do is coax an interest. Once again, as Shreya Shankar says, tell people to have a look at 20 traces, 20 conversations, and hands down they'll wanna look at more, because they'll look at it and be like, ooh, that's interesting, or, oh, that's weird, right? Why did it do that?

Yeah. And it's better to catch that sooner, when it's just 20 cases, than when it's 20,000 cases, right? Exactly. I think Cursor had a really bad goof several months ago, where they had a customer service agent that totally made up a policy about, I forget exactly what it was, some sort of policy around model usage or something like that. Pissed off a lot of people. Yeah. And then there's the famous example of someone, I dunno if it was Chevrolet or whatever, telling the agent, you have to sell me this car for a dollar, something like that, and it agreed, no [00:33:00] takesies-backsies. Exactly right.

So I am interested: we mentioned that you can experiment as an executive or a non-technical person with a hundred different use cases of AI, and then double down on one and let the others drop. I'm wondering how you think about making that decision, and whether there's any signal you can get, any use cases where you've seen, actually, I wouldn't double down on this, and this is something we would double down on.

Well, this is kind of where it comes into iterative product development, right? The whole point of prototyping quickly, even in ChatGPT, is so you can get it out, and you can deploy it as a custom GPT then, right, which is a relatively straightforward thing. And now you can hand that off to someone and say, hey, upload your own images of these auto accidents and it's gonna make an assessment. Is that valuable to you? Do you think that's actually gonna save you some time? You can [00:34:00] quickly put these tools in people's hands and see: are they using them, is it actually saving them time, et cetera. And I think that's among the best ways to do it: you start out, you validate quickly, does this work? And then you form a hypothesis, and maybe this is where the data science thinking comes in, you form a hypothesis around, here's how I think it's gonna be beneficial. And then you put it in people's hands and see, okay, is it actually beneficial? Are they using it how I think? Is it saving them time, is it saving us money, in the ways that I thought? I think that's a really powerful tool right there. Because sometimes you can build something and then say, hey, here's a cool tool, it's gonna save you a bunch of time, and they're like, whatever, I'm too busy. Right? At least if that happens, again, it goes back to iterate and fail quickly: you either iterate there, or you just need to let it die, and better to do that at that stage. Right? So I think one of the more powerful ways to do this [00:35:00] is to build the easiest prototype you can using your ChatGPT, your Lovable, whatever, get it in people's hands, see how they react, and iterate from there. Totally.
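On the "is it saving us money in the ways I thought" hypothesis, the arithmetic is simple enough to sanity-check in a few lines. A back-of-envelope sketch, where every number below is a made-up placeholder to swap for your own:

```python
# Back-of-envelope ROI sketch for a prototype before doubling down; all numbers are
# placeholders, not figures from the conversation.
hours_saved_per_person_per_week = 5
people_on_team = 20
loaded_hourly_cost = 75          # USD, fully loaded
weeks_per_year = 48

annual_value = (hours_saved_per_person_per_week * people_on_team
                * loaded_hourly_cost * weeks_per_year)

build_and_run_cost = 40_000      # engineering time + subscriptions, first-year guess

print(f"estimated annual value: ${annual_value:,.0f}")
print(f"rough first-year ROI: {annual_value / build_and_run_cost:.1f}x")
# If adoption is the risk, scale annual_value by an expected usage rate before deciding.
```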
I'm also interested in how we think about getting AI to do things, so moving a bit along the agentic spectrum, and I'm talking about, like, having a hundred things it could do, that type of stuff. And funnily, I do have a blog post that garnered a bunch of attention earlier this year called Stop Building AI Agents. The reason I wrote it is that I saw too many people get burnt using frameworks, essentially, and not being able to inspect their systems, not being able to see what prompts were being sent to the LLMs, and not being able to evaluate. But of course I use AI agents all the time, daily, right? So I was like, how can I tell people to stop building them when I use them every day? And I realized it was because when I use them, [00:36:00] the AI has high agency, but there's strong supervision from me as well, right? So I don't use Claude Code or Cursor in what they used to call YOLO mode. They've changed the name, but they really should have kept it; I remember it really was YOLO mode, like it was deleting people's repositories. Dude, Steve Yegge, right? He got burned, and if he's getting burned, we're all going down a wild path. But I do honestly think that LLMs, AI that can do things, will provide 99.9% plus of the economic and productivity value of these systems. Just because, firstly, I'm kind of sick of chat interfaces in some ways. I think they're very valuable in a lot of cases, but the ability to chat and read and derive value from that is upper-bounded by human time, right, which is a very precious resource. So instead of an executive, for example, having to have a conversation with an [00:37:00] AI on a Monday morning, just imagine if it was proactive. I think we've got reactive agents now, where we ask them questions and they respond, but proactive, coming to you having already done things. You could do it with a cron job, man. And anyway, I don't wanna get into too many technicalities. But for executives to have an AI agent come to them and say, hey, this is what happened last week, this is what's on your calendar in the coming week, here's a slide deck summarizing all of it. Then a slightly more futuristic vision of that would be: here are your emails, here are the priorities that you need to reply to, and let's shift around these meetings so you can get deep work on Thursday afternoon.

Yeah. I mean, it's interesting, we're almost 45 minutes in and just started talking about AI agents, when we're talking about how people can use AI. But honestly, AI agents can be really hard to get to actually function autonomously, right? Because when you give them access to all these tools, et cetera, maybe most of the time they can work properly, but sometimes they [00:38:00] may go off the rails, right? So I think AI agents are super, super exciting, but also really hard to build, which is why we kind of have this super hype around AI agents, but the fully autonomous ones are a little slow to come around still. But that does lead to a nice segue to one pseudo AI agent that we made at my last job, which I thought was amazing. So, you know, this was in a sales role.
We would have a discovery call with the client, and we'd just talk about what they're doing, really casual, da da da: what are you building, what's your company, where is the value, yada yada yada. And then that transcript would go into a known place, and that would kick off, you could call it an AI agent job, that would basically run that transcript and all the other relevant context into Claude, and then build a functional prototype right there, amazing, of what we were talking about. Right? So then, literally right after the meeting, we could have another meeting an [00:39:00] hour later and say, okay, here's our initial thought on what you wanna build, what do you think? And that, I think, is a really valuable version of an AI agent. It's sort of a constrained AI agent, because, well, obviously Claude is going nuts and reacting in all kinds of different ways when it's building an app, but it can react to the context in a limited scope, and it still ultimately has the same deliverable over and over again. Right?

I love it.

But yeah, so I think that's where we're gonna see tons of business value, just like the example you gave: tons of business value in limited AI agents, where it's a simple automation. Where's a highly repetitive, laborious, manual task, and how can we have an AI agent take even a piece of that and automate it, right? That's a huge win there.

Without a doubt. And that's what I was referring to earlier when I used the term agentic spectrum, or agentic continuum. You can have an LLM and give it a single tool. So, [00:40:00] all jargon aside, imagine Randy and Hugo on a call with Circleback or whatever AI transcriber we have. So we already have speech-to-text. Then we have that store the transcript in Google Drive afterwards, so there's some sort of little workflow happening there. Then we can have a summarization tool call, and you don't even need to worry about the term tool call: we can have Claude, for example, summarize the conversation and email us our to-dos afterwards. Right? So that's agentic. It's really one or two tools, depending on how you classify tools. There are no multi-turn conversations, and you don't have to worry about memory. Perhaps when you're getting out the to-dos you're using retrieval of some sort, although you don't really need to think about that, depending on the length of the call, because context windows are pretty good with these types of meetings these days. And then perhaps at some point you're like, oh, actually, when Randy and Hugo chat, they often try to set up the next call, and they're in different time zones, and that's annoying, particularly at [00:41:00] this time of year when there are daylight savings challenges, right? Yeah. So perhaps we can just add one more little agentic tool call to it, where it will set up the calendar invite that Hugo and Randy agreed on in the call. And these are systems which are not full-blown, wild, multi-agent, sub-agent-spawning, all of these types of things, and the evaluation process will be relatively straightforward, but this is already achieving a lot more than just a conversation with an LLM would, right?

A hundred percent. Yeah.
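A rough sketch of that "one or two tool calls" workflow: transcript in, summary and to-dos emailed out. It assumes the Anthropic Python SDK, an SMTP account, and a transcript already saved to disk by your notetaker; the file name, model alias, addresses, and credentials are all placeholders.

```python
# Sketch: summarize a call transcript with Claude and email the summary plus to-dos.
import smtplib
from email.message import EmailMessage

import anthropic

client = anthropic.Anthropic()

transcript = open("2024-11-04_discovery_call.txt").read()  # hypothetical export

result = client.messages.create(
    model="claude-3-5-sonnet-latest",   # or whichever current model you prefer
    max_tokens=1000,
    messages=[{
        "role": "user",
        "content": (
            "Summarize this call in five bullet points, then list each to-do as "
            "'- [owner] task (due date if mentioned)'.\n\n" + transcript
        ),
    }],
)
summary = result.content[0].text

msg = EmailMessage()
msg["Subject"] = "Call summary and to-dos"
msg["From"] = "assistant@example.com"
msg["To"] = "you@example.com"
msg.set_content(summary)

with smtplib.SMTP("smtp.example.com", 587) as server:  # your mail provider's SMTP host
    server.starttls()
    server.login("assistant@example.com", "app-password")  # placeholder credentials
    server.send_message(msg)
```

Run on a schedule (the cron job mentioned above) or triggered by a new file landing in a folder, this stays well short of a fully autonomous agent while still doing useful work unattended.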
And this is kind of, maybe, the boring but valuable use cases that I think are really gonna proliferate as companies start getting their hands on AI. We don't need a full-blown autonomous AI agent; we just need something that can work with the heterogeneous data sources we're bringing in, whether those are call transcripts, images, whatever, generate reports, extract and synthesize information, and do basic tasks on our [00:42:00] behalf. And a lot of those are gonna be things that you don't even think about, or most of us don't think about, like lead qualification for sales, right? You can take that initial call and say, oh yeah, I think, in the real estate space, we should send this person to a real estate agent now, they're about ready to buy. Or, no, actually, not yet. That's a really important decision early on, and a really laborious thing. So there are a lot of these kinds of back-office jobs that can have AI automation built into them. It's not super cool, not super sexy like a lot of the fully autonomous AI agents, but they're gonna drive a lot of value, and already are for the companies that are jumping ahead of the space and experimenting.

Without a doubt. And I love that you mentioned sales, because for tying things to revenue, sales is the place to do it. I mean, we've always joked that 50% of marketing works, we just don't know which 50%, so marketing is definitely a really tough place. [00:43:00] But I am very interested in, okay, I've worked in a bunch of SaaS startups as we've scaled from five people to 150, 200, whatever. The reason I mention that is some of the most interested people in exploring data are scrappy salespeople. I've worked on teams where account executives and BDRs, the ones who really want to get those commissions, right, will jump in and learn a bit of SQL or start to use Looker or something like that. Instead of waiting on the data science team to give them the Monday morning dashboard, they'll dive in, get on the phone, whatever it is, see what the hottest leads are, and that is so wonderful. And the reason that's relevant for our conversation is, giving these people a bit more space and time to experiment with how they can use AI to do this now would, I think, be incredibly powerful. And one of the big failure modes I see in organizations that I work with at the moment is [00:44:00] leaders and middle management expecting people to add AI to their tools and see immediate efficiency gains, as opposed to, where I've seen it work a lot better, organizations carving out time, even if it's every Friday afternoon or whatever it is, and perhaps even accepting short-term efficiency losses for medium and long-term gains. So I'm wondering how you think about the organizational need to actually carve out time to experiment with these tools, as opposed to having it as an add-on for efficiency gains.

Ah, interesting. I mean, gosh, how to say it. I think adopting any new technology has a learning curve, right? And so I think you have to have that strong case, right? I think this just goes back to what we were talking about at the top of the hour: adopting AI just for [00:45:00] AI's sake is
probably gonna lead you to generic chatbots, right? Hmm. And things that aren't actually that useful, rebuilding the wheel, yada yada yada. So I think you really need to build that strong case early on, right, with the tools that we talked about on this call: rapid prototyping, building your hypothesis, like, hey, this may take a month of engineering hours to build, in addition to my own personal time, a couple weeks to prototype, whatever, but I think this is gonna affect the bottom line by $5 million within a few months, right? It comes back to that. As long as you can make that case, and you can speak the language of the folks who ultimately care about the bottom line, that is the way to do it, that is the way to get buy-in on these projects, especially ones that may have a longer horizon. Right. You know, one of the biggest [00:46:00] things that slows adoption is people unwilling to actually try the thing.

Absolutely.

Isn't that the worst feeling? You go through, you build it, you prove it works, it's gonna generate value, and then, oh, people don't like it, people won't use it. Yeah. So that's my take on it: I think if you have that strong hypothesis, like, I think this is gonna save us a lot of money even if it costs us in the short term, that's the way to do it.

Love it. And I love how you framed it as: any adoption of new technology requires this type of process. I think there are at least three differences here, though, that make it worse and harder. The first is, think about when word processing came out, right? You use it one day to type a document, you use it the next day, the same thing happens. You hit a key, literally you type, and it does something, so it's deterministic in that sense. Right. Whereas if you're a marketer and you're using any of these systems to write copy and then iterate on it [00:47:00] yourself, one day you may be able to do it in half an hour, where it may have taken you half a day yourself; the next day it may take a few hours, right? Yeah. And that's with a stable system. So that's one difference. Another is, these systems are constantly evolving. So one day you're using GPT-4, then, July or whenever it was, you're using GPT-5 and it's routing things and thinking longer for a better answer, and you're like, whoa, okay, so what's happening now? So this type of thing takes a lot more time as well, right? So there's the lack of consistency at an individual tool or LLM or foundation model level, and then at a product level, as it shifts and evolves, and then adding agentic capabilities and that type of stuff. The other thing that I'm really interested in your thoughts on: these are such multipurpose, horizontal tools, right? A word processor was a product, a real product, [00:48:00] in the sense that it had a specific use. And I like to joke that ChatGPT pushes our idea of what a product is, because we started using it and clearly it did stuff, you could chat with it, but it was like, does this product even have a particular use? Um, mm-hmm. Because it seems so general, so even knowing its affordances and capabilities isn't obvious. Right.
We're finding it out still. Yeah. And still, one of my favorite use cases with a ChatGPT is a thought partner, you know? Yes. Right. Putting in all that context. I actually have a personal thought partner that I've been refining over the past year, right, and I have documents of all kinds of personal stuff about me, and I definitely don't make these documents public, but it has a deep knowledge of me, and I've been refining the prompt on it too: become a version of me and help me think better, right? I'll come to it and I'll say, I'm having this really [00:49:00] challenging thing in my personal life today, and multiple times I've had this wonderful experience where it gets me. It does kind of the typical thing where it's like, oh, you're totally right, well, it doesn't say "you're totally right," but it validates me, and then it says, let me challenge you, right? Mm-hmm. And when that happens, it's such a powerful thing, because you're like, oh my God, that is an unknown unknown for me. And so, yeah, I think ChatGPT, et cetera, is a little bit directionless at first, but that is the power of it too, because it can become many things. It can become a thought partner, it can become a classifier, it can become a social media post writer, it can become almost anything; the sky's the limit. Yeah. I think that's been one of the coolest realizations that I had early on: within [00:50:00] this model are many sub-models, right, many sub-classifiers. We never have to build a sentiment classifier again, right? We have the ultimate sentiment classifier built right into these models, and we can just use it.

Absolutely. And there are a thousand more classifiers in that ChatGPT model, without a doubt. But of course, if you wanna run things for cheap, like if you're serving something which incorporates a sentiment classifier to millions of users, perhaps you do wanna use something smaller that you get from Hugging Face or whatever it is. But prototyping and doing product iteration with these systems for that is so powerful. I love that you mentioned you iterate on prompts so the system knows who you are better and what your wants are. I build a lot of content pipelines, and so, for example, I actually don't use vendor APIs for this now, but I used to use vendor APIs and build content pipelines to get timestamps on transcripts of podcasts and that type of stuff. Yeah, [00:51:00] I don't do that now, because I use Descript, which is a product that will do that for me, so, totally chill. But when I was doing that, you write a prompt and it will give out transcripts, and initially it gave out transcripts with a different timestamp every 30 seconds for an hour-long conversation. Granularity was tough, right? So then I said, actually, maximum every four minutes or something along those lines. Then the next time, it spat out a result that had all these emojis and stuff like that, right? And then I iterate, and I was like, okay, and maybe 15, 20 prompts later I got something which worked pretty well. Now, you need to be careful.
Of course, something I said to it was to make the timestamps attractive to data scientists, machine learning engineers, AI engineers. Okay. So now, of course, if I then did a carpentry podcast, which is out of sample, it would probably give horrible, horrible timestamps. The only reason I tell that story is, once again: iterating a prompt by looking at your data, looking at the results, [00:52:00] iterating. And I wanted to mention that I made it for a specific audience because you may overfit in that process as well, and need to iterate again. So even having a human in the loop for that amount of time; I've worked with people and taught people who've done prompt iteration 50 times, but also people who've done it 1,500 times for SaaS products, where they'll iterate over weeks. Right?

Yeah, I mean, well, it all comes back to AI evals, right? Yeah. Essentially, what we're doing when we're manually iterating is acting as the eval system ourselves. Yeah. But we could just manually label data ourselves and build an eval system, and then, 1,500 iterations, that's a lot to do manually, but when you have an eval system set up, 1,500 iterations is like five minutes. Exactly. Right. Exactly. You can quickly iterate using optimization methods. Yeah.

Yeah. And a few other points: try to get your subject matter [00:53:00] experts to write prompts, or at least read them, but hopefully write some of them as well. Also set up systems where people and teams can share prompts, even if it's something basic like a Slack channel; I've seen marketing teams where different individuals have their own prompts and they've barely shared them at all, right? So get those network effects going.

I am interested, we're gonna have to wrap up in a few minutes, but something I'd just like your thoughts on: we can get crazy and futuristic and technical and all those things, but I'm just interested in how we can encourage people to use AI for all different types of things. So one example is, I've started trying to build an email classifier, hooking my system into the Gmail API, and then classifying emails each day as they come in, just using classic machine learning to prioritize them for me, that type of stuff, and then pinging me with ones that are [00:54:00] super important. And, I haven't done this yet, but I could hook it up to WhatsApp or whatever it is, so if there's something super urgent and I'm not on email, because I try to only check email twice a day, it pings me somewhere else, and then perhaps it even drafts emails for me and that type of stuff, knowing my style. So I'm just wondering, what types of things do you think about with respect to this, and to building personal tools, but encouraging other people to experiment?

I mean, I have thought about this. One of the really fun things that I at least used to do was I ran a blog that just solved all kinds of weird problems, right? We were talking about Where's Waldo earlier, optimizing road trips, things like that. I've thought about revisiting that in the lens of LLMs.
You know, how do we solve these different personal problems in the lens of LLMs? And I think that's the greatest thing we can do: we can try to spark curiosity in people. This is essentially what I'm doing with execs, right? I'm trying to say, don't [00:55:00] assume that your ChatGPT can't do this particular thing because all LLMs are just next-token predictors, blah blah blah. Maybe it can look at an image of your family and make a really funny caption about it, or something like that, right? Maybe it can optimize that amazing road trip for you. So I think it's really fun to think about personal use cases where you can build fun things with AI too, right? It doesn't have to be all completely serious. Sadly, nowadays I don't quite have the time to play as much as I used to, but I think that could be a really fun thing. I've given it a lot of thought: revisiting, okay, how do we optimize road trips in an effective way using a ChatGPT?

I love it. And it's funny that you mention you don't have as much time to play as you'd like, because here I am preaching carving out time for experimentation and all types of things, and I definitely am not able to carve out the time I would like, to keep [00:56:00] up to date with everything that's happening and to have a sense of play around these tools. And I do think there are a lot of wonderful things coming with all of these technologies, and a lot of challenging things as well that we need to figure out as a society. But I almost think, and thanks for this reminder, Randy, we've forgotten in some ways a sense of play and creativity, and the ability to have fun with these systems and create. So I think that's a lovely note to wrap up on. Indeed. Love it.

So, people can find you on LinkedIn, and on the platform formerly known as Twitter. Should they check out Wyrd Studios, or is there anything else you'd like to encourage people to have a look at?

Yeah, I mean, my personal website is randalolson.com. I'm not quite as active on social media anymore, I kind of became a social media recluse, but you can find me on LinkedIn; I pop on there every once in a while. Yeah, check out Wyrd Studios, W-Y-R-D. [00:57:00] If you search "Wyrd Studios AI," it'll pop up. And then we have another really exciting AI evals entrepreneurial endeavor coming up soon, so if you're intrigued by that, either message me directly, I'm happy to chat about it, or just follow me on LinkedIn, and there will certainly be an announcement soon.

Fantastic. And what we'll do is I'll include the link to Wyrd Studios in the show notes, and there's a "stay updated" there: you can subscribe to the newsletter. And, Randy, correct me if I'm wrong, but you founded Wyrd Studios with your wife?

I did, yes. It's something we wanted to do forever, and it's awesome that the stars finally aligned and we were able to do it. So now we're able to build a lot of really fun products there. Amazing. Super cool.
Randy, thank you so much for your time and expertise. Thank you all for joining as well, and for the questions. I appreciate all your work, and I always love chatting with you, Randy.

Likewise. Thanks so much for having me. This was fun.

Thanks [00:58:00] for tuning in, everybody, and thanks for sticking around to the end of the episode. I would honestly love to hear from you about what resonates with you in the show, what doesn't, and anybody you'd like to hear me speak with, along with topics you'd like to hear more about. The best way to let me know currently is on Twitter: @vanishingdata is the podcast handle, and I'm @hugobowne. See you in the next [00:59:00] episode.