The following is a rough transcript which has not been revised by Vanishing Gradients or the guest. Please check with us before using any quotations from this transcript. Thank you.

===

[00:00:00] Look, just to take a step back and look at the evolution of this: five, ten years ago the same problem existed, except the stakes were much, much lower. Call it ten years ago, there were just a handful of people in an organization looking at data. We had a lot of time to make sure the data was accurate. Fast forward to today, almost everyone in an organization uses data, and increasingly AI as well, and in that world it's really hard to let any data issue slip. I'll talk about the implications of what bad data looks like for the data team and for the business. When you think about the implications for the business, when it's an external product, it's a way bigger deal. A couple of examples come to mind. I think this was the early 2020s: Unity, the gaming company, had one schema change that cost them a hundred million dollars. The stock dropped by 37% the day after. It was a big deal, and it was one schema change, four or five years ago. More recently, a few years ago, Citibank got hit with a $400 million fine for its data quality practices. So now you're talking about pretty significant revenue implications, and regulatory fines and risks that organizations are really held to.

[00:01:00] If you are building with AI but your data feels duct-taped together, you're not alone. In this episode, I have the great pleasure of speaking with Barr Moses, CEO and co-founder of Monte Carlo, about the growing gap between the pressure to build with AI and the reality of unprepared data infrastructure. We cover why your moat isn't the model, it's your data, and why most organizations aren't protecting it; why so few companies are actually AI-ready today, and why that gap is more alarming than it looks. We also discuss a new playbook for observability, tracing failures across data, code, systems, and model output; how multi-agent LLMs are already reshaping data debugging, and where this might lead next; and why most teams are still managing data like it's 2015 while trying to ship generative AI [00:02:00] products. Barr brings battle-tested stories, practical frameworks, and a clear-eyed vision of where the data and AI stack is headed next.

If you enjoy these conversations, please leave us a review, give us five stars, subscribe to the newsletter, and share it with your friends and colleagues. Links are in the show notes. But before we jump in, let's check in with Duncan from Delphina, who makes High Signal possible. Duncan, I'd love to hear a bit about what's up at Delphina and why you produce High Signal.

At Delphina, we're building AI agents for data science, and by nature of our work we get to meet with lots of leaders in the field. So with the podcast we're trying to share the high signal.

So cool. And speaking of leaders in the field, and speaking of data science and agents: in this episode with Barr, we had a wide-ranging conversation, particularly about the importance of data and observability and monitoring in our practice, which you can forget about sometimes. So I wonder what resonated with you, particularly from the clip we just showed.
Barr famously coined the term data [00:03:00] downtime, and I think that if you have ever shipped a model, owned a dashboard, or been ambushed by a "why is this number wrong?" Slack message, you know how real that is. When I was a data science director at Uber, I was paranoid to the point of nausea about bad data. I'd walk into these 100-person tech reviews with our CEO, Dara, knowing that getting one metric wrong in that deck could incinerate trust and send a large organization in the wrong direction. And frankly, as a data leader, it could even cost you your job. Barr tells the story of a single schema change that cost a company a hundred million dollars, and that's not even the wildest example. This episode is way more than war stories. It's a crash course in what it takes to earn and keep trust in your data. If you've ever felt a knot in your stomach about whether the data is right, this episode really hits home.

Without a doubt, and I really appreciate you bringing your expertise and experience from Uber into the conversation as well. So, [00:04:00] without further ado, let's jump into the conversation with Barr.

Hey there, Barr, and welcome to the show.

Hey, great. Thanks for having me.

Such a pleasure. So, co-founder and CEO of Monte Carlo.

That's right. Having fun doing it. Having fun doing data quality stuff, observability stuff.

As you know, I have strong feelings about the model-centric world that we live in. There aren't enough public conversations, particularly on LinkedIn, or as I like to call it, TikTok for adults, about data quality and observability, even though people on the ground know that data is primary. AI adoption is moving fast, and many data leaders feel their data isn't AI-ready. From all the people and companies you speak with and work with, what are the biggest challenges that companies face in making their data actually useful for AI applications?

Yeah, great question. Look, maybe just to take a step back and look at the evolution of this: five, ten years ago the same problem existed, except the stakes were much, much lower. Call it ten [00:05:00] years ago, there were just a handful of people in an organization looking at data. You had a lot of time to make sure the data was accurate. Fast forward to today, almost everyone in an organization uses data, and increasingly AI as well, and in that world it's really, really hard to let any data issue slip. So I'll talk about the implications of what bad data looks like for the data team and for the business. Starting with the data team: it's very common for data engineers, AI engineers, and data analysts to be hit with complaints from folks downstream asking why the hell is the data wrong, or what the hell is happening here. Sometimes it's internally focused, which is not a big deal. Say an internal user asks why something is wrong, or highlights an internal chatbot that gave a wrong response; maybe that's okay, maybe you can get away with it. When you think about the implications for the business, when it's an external product, it's a way bigger deal. [00:06:00] Just a couple of recent examples come to mind.
I think this was the early 2020s: Unity, which is a gaming company, had one schema change that cost them a hundred million dollars. The stock dropped by 37% the day after. It was a big deal, and it was one schema change. That was, I think, four or five years ago. More recently, a few years ago, Citibank got hit with a $400 million fine for its data quality practices. So now you're talking about pretty significant revenue implications, and regulatory fines and risks that organizations are really held to.

Taking that forward to today's world, let me share a couple of more recent examples. My favorite actually went viral on X. Someone Googled, "what should I do if cheese is slipping off my pizza?" and Google responded: well, you should just use organic super glue. Okay, ha ha, funny, right? We all find it ridiculous. [00:07:00] But how many companies in the world can get away with that? Sure, maybe Google can get away with it and you'll still be Googling tomorrow, but most enterprises can't get away with that level of mistake. Another example: a user basically convinced a Chevy dealership's chatbot to agree to sell them a Chevy Tahoe for $1. I commend the user for being able to do that, awesome use of AI if you will, but what are the implications for that organization's reputation, brand, and revenue? So the types of issues organizations have as a result of bad data, and bad data in AI, range from revenue, brand, and reputation all the way to data engineers waking up in the middle of the night worried about receiving an alert or hearing from a downstream consumer about something that's wrong. That's it at a high level, if you will.

Yeah, I appreciate that, and I really love the examples you grounded it in. [00:08:00] I want to ground it even further, and I'll link to this in the show notes: at Monte Carlo, you have a wonderful recent survey that showed a hundred percent of data leaders, 100% of data leaders, feel pressure to build with AI, but only around 60% think their data is AI-ready. Why does this gap exist, and how can organizations go about bridging it?

Yeah, it's such a great question, and I love this survey, because when we did it I initially thought, okay, probably 80% of people would say they're doing something with AI. But literally there wasn't a single person in the survey who was willing to admit that they're not doing something with AI. That's how big the pressure is. Another anecdote from the survey: I think 90% of respondents mentioned they have AI in production. So obviously there's hype, but we're also moving to a place where people actually have AI products in production and are looking to tie that to real revenue and real ROI. However, the point [00:09:00] you called out, and where the friction is, is that only one out of three actually think their data is ready for AI, and that's really troubling. Why is that troubling? This is something that you and I talked about before the podcast.
I think historically, in the last couple of years, there's been a lot of discussion about the latest and greatest model. You can debate open source versus closed source, OpenAI, Anthropic, Meta, a lot of different contestants out there. Obviously DeepSeek Monday was a big deal; for folks who don't know, DeepSeek showed the ability to create a much better model at a much lower cost, and there was a lot of drama around that. But I think the hard truth is that today anyone has access to the latest and greatest model. Within a couple of seconds, maybe a couple of minutes, we can all get an API key and off we go. So in that world, what is our moat as an organization? What is our competitive advantage, if everyone has access to the latest and greatest model? What we're seeing and hearing time and again from enterprises is that [00:10:00] your moat is your data.

Why is that the case? The Chief Data Officer of JPMorgan Chase was interviewed a couple of weeks ago in the Wall Street Journal, and she had a really nice quote. I'm paraphrasing a little, but she said the model is not our secret sauce; our secret sauce is the information that we use. What does she mean by that? On the one hand, internally, we can use the information we have to automate our processes. And externally, if I'm building a financial advisor or some concierge, my ability to make that product much better for, say, Hugo is based on the information I already know about you, or other data I have about your preferences. My ability to make the product way more powerful is based on that data. So AI-ready data is a really big deal for organizations today, and I think the gap exists because, in my opinion, we are thinking about data and AI [00:11:00] management in the wrong way.

What do I mean by that? The data and AI estate has changed significantly in the last five to ten years. You can't even compare the way we manage, transform, and process data and AI to where we were five or ten years ago. And yet, absurdly, we still manage the data and AI estate in the same way. I'll give you an example. I was just speaking to an AI leader at a Fortune 500 company, a company you've all heard of, about how they evaluate the reliability and quality of their AI products. He told me: we basically have a human being, a user, go through and sift through 200 responses a day and evaluate them. That's a grueling process for a product that's supposed to be highly automated and highly reliable. It reminds me of five or seven years ago, when [00:12:00] I would ask chief data officers, how are you making sure your BI solutions are reliable? How are you making sure your reports are accurate? And they would always tell me: I have eight pairs of eyes look at the reports every morning to make sure the data is accurate. So in a sense, we're still using the same methods to manage the quality of our data and AI. We haven't adapted.
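To make the contrast concrete, here is a minimal sketch of what replacing that morning eyeball check with an automated check might look like. The table name, thresholds, and stats are all hypothetical; a real setup would pull them from warehouse metadata or query logs rather than a hard-coded dictionary.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical snapshot of one table's health; in practice this would come
# from the warehouse's information schema or from query logs.
TABLE_STATS = {
    "orders_daily": {
        "last_loaded_at": datetime(2024, 1, 15, 3, 55, tzinfo=timezone.utc),
        "row_count": 1_250_000,
        "expected_row_count": 1_300_000,
    },
}

def check_table(name: str, stats: dict, now: datetime) -> list[str]:
    """Return a list of human-readable issues for one table."""
    issues = []
    # Freshness: data should have landed within the last 24 hours.
    if now - stats["last_loaded_at"] > timedelta(hours=24):
        issues.append(
            f"{name}: data is stale (last load {stats['last_loaded_at']:%Y-%m-%d %H:%M} UTC)"
        )
    # Volume: row count should be within 20% of what we normally see.
    expected = stats["expected_row_count"]
    if abs(stats["row_count"] - expected) / expected > 0.20:
        issues.append(
            f"{name}: row count {stats['row_count']:,} deviates >20% from expected {expected:,}"
        )
    return issues

if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for table, stats in TABLE_STATS.items():
        for issue in check_table(table, stats, now):
            print("ALERT:", issue)  # in practice: notify the owning team, open an incident
```

The point is not these specific checks; it's that freshness and volume expectations get written down and evaluated on a schedule instead of by eight pairs of eyes every morning.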
And I think the gap you called out is a direct result of that. We need to evolve, and we need to improve how we do our data and AI estate management in order to catch up with all this innovation that's happening.

And what do you think are the top couple of things people can do practically in their organizations to improve their data estate management?

Yeah, look, I wish there were a magic wand and this were easy to do. I think there are a couple of things that are obvious and maybe a couple that are not. The obvious thing is: make it a priority. This should be a top-level company priority, and many organizations are doing that and asking: what percentage of our data is AI-ready? Can we create a data trust score? How can we measure this? What's the percentage of data incidents that we have? What are our SLAs? In the same way that you have five nines of availability for your application, [00:13:00] you should have some nines for your data, for your data and AI downtime and quality. So it starts with making this a priority, and we see this more and more. There's a Fortune 500 bank you've surely heard of whose CEO, I hear, looks at reports on a regular basis and sends angry emails asking why the data is wrong. After too many of those, he decided to make it a company-wide initiative to ensure the data that all of his executives and staff look at is accurate. Again, this is obvious, but it takes a decision to say this matters, and if we want to be best in class in AI, it starts with best-in-class data. Your AI is only as good as your data. So that's the obvious thing that still needs to be done.

Now let me call out something that I [00:14:00] think is less obvious but that I feel strongly about, having done this thousands of times with enterprises. Historically, we've thought about data quality as a point-in-time thing that's mostly about detection of incidents. People would typically write SQL queries, or have some other manual checks, to make sure the data is accurate, and then the data team is inundated with alerts about things going wrong, and that's about it: we have data quality, so we're good. I don't think that cuts it. Detection is very important, and you can actually do detection of data issues better with AI today (happy to talk about that separately), but you have to have world-class ML- and AI-based detection. The non-obvious thing to think about is how you do really strong troubleshooting and triaging of issues. What we've found is that data and AI issues as a whole can be traced back to four core reasons for why data and AI [00:15:00] go wrong.

The first is you can have plain bad data; you could literally be ingesting bad data. Say I'm an airline and I'm ingesting data about where your luggage is, or what your next flight is. If the data that's passed on to me is half null, if I'm just receiving null values, that can actually cause a problem
in calculating the arrival times of passengers and making sure transfers happen on time, and so on. So the first thing that can go wrong is the data itself can be bad.

The second thing that can happen is a code change that affects the reliability of your data and AI application. What do I mean by code? Data teams write code to transform the data, to move it, maybe to merge a couple of data sets (the list of passengers and where their luggage is, for example), or it might be code that goes into actually building an agent. In any of those instances, a code change can [00:16:00] completely undermine the reliability of your data or AI app.

The third root cause could be a system that goes down. When I say system, I basically mean all the infrastructure that runs these jobs. A practical example could be an Airflow job that fails, an issue with a dbt job, or even a transformation you're doing in Snowflake. The thing is, systems are prone to fail; a hundred percent of systems fail at some point, and those failures can lead to issues in your data and AI application.

And the fourth thing that can happen, and I think that's where we're going as an industry, is that you can have the perfect prompt and the perfect context and your model response will still not be fit for purpose. The Google example I gave earlier, use organic super glue to stick the cheese back [00:17:00] onto your pizza, is exactly an example of that, and so is any hallucination. Even if everything else is perfect, you still need a way to make sure the output and responses of your model are fit for purpose.

So the non-obvious thing here is that if you want to make sure your data is AI-ready and your data and AI estate is properly managed, you actually need visibility into your data, your systems, your code, and your model output. You cannot look at just one; the common mistake is to look at one in a silo. It's the combination of all of them that gives you really powerful visibility. And the reality is that most data issues are a combination of one or more of these at the same time. That's typically when the data and AI team ends up in a fire drill: there's a war room, you're asking what the hell is happening, and it's very hard to unwind. But if you have proper visibility into [00:18:00] all four of these, you can start moving from a very reactive place to a more proactive place.
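As a rough illustration of looking at those four surfaces together rather than in silos, here is a sketch of a combined triage record. Every collector below is a stub that returns canned findings; in practice they would query warehouse metadata, the git history, the orchestrator, and wherever model responses and eval results live.

```python
from dataclasses import dataclass, field

@dataclass
class IncidentReport:
    """One record per incident, with evidence from each of the four surfaces."""
    data_signals: list = field(default_factory=list)     # e.g. null-rate spikes, schema changes
    code_signals: list = field(default_factory=list)     # e.g. recent merges touching the pipeline
    system_signals: list = field(default_factory=list)   # e.g. failed orchestrator runs
    model_signals: list = field(default_factory=list)    # e.g. flagged responses, eval regressions

# Stub collectors with canned examples, standing in for real metadata queries.
def collect_data_signals():
    return ["passenger_manifest.loyalty_id null rate jumped 1% -> 38%"]

def collect_code_signals():
    return ["merge two hours before the spike touched the manifest transform"]

def collect_system_signals():
    return []  # nothing failed outright, which is itself useful to know

def collect_model_signals():
    return ["itinerary assistant cited arrival times that don't exist in the source data"]

def triage() -> IncidentReport:
    # The point is the combination: gather all four surfaces for the same
    # incident window instead of inspecting any one of them in isolation.
    return IncidentReport(
        data_signals=collect_data_signals(),
        code_signals=collect_code_signals(),
        system_signals=collect_system_signals(),
        model_signals=collect_model_signals(),
    )

if __name__ == "__main__":
    print(triage())
```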
I love it so much. One thing you've spoken to there, and we'll get to this later in the conversation, is the extreme importance of observability, because I think too many people think of data at the start of the pipeline and observability at the end, rather than as a holistic integration. And in the end, with data-powered products and data-powered software, we're dealing with far more complex systems than traditional software. Traditional software is more or less the sum of its parts, but data-powered systems aren't in a lot of ways; we're essentially introducing the entropy of the real world into these systems, and we need to monitor them accordingly. A few other things you spoke to that I find so interesting, and I hope we get to as well: not only using data to power AI products, but how we can use machine learning and AI to inform us about our data. That will be essential. And you reminded me of a Hacker Noon post by Monica Rogati from maybe eight years ago called "The AI Hierarchy of Needs." [00:19:00] This was before generative AI. She said: everyone wants AI, but let's look at what we need first. We need to log things, then do basic ETL, and only then can we start counting. After that, if we have an experimentation framework, perhaps we can do experimentation, and then we can introduce machine learning. So it really is back to basics, and it's so important. And a point you made wonderfully there is that it's not necessarily about being AI-first; it's about being data-first and then figuring out how your data powers a variety of things: AI, ML, statistical inference, and otherwise. Of course, that way we don't get the cosplay and LARPing of "we're a GenAI-native company" and all of that, but we do get to deliver real value. Which brings me to the next question. Data quality and observability are often treated as separate concerns. How do they relate, and where should teams focus their effort?

Great question, and I'll talk about the things you said as well. You talked about various teams and how this compares to software [00:20:00] engineering and software applications. The interesting thing about data products is that very many different, disparate teams are involved in building them. It's not uncommon to have an engineering team, data engineering, ML engineering, and data analysts all contributing to a data product that a user consumes at the end. Each of these teams is almost a totally different persona. They're motivated by different things. They have very different skill sets, ranging from coding a hundred percent of the time to point-and-click UIs. They have very different understandings of the data. Some teams upstream might be responsible for storing the data but honestly can't speak to how it's used and why; and vice versa, you might have teams much closer to the end user downstream who don't know the source of the data or where it's coming from. It [00:21:00] reminds me of that meme with the three Spider-Mans pointing at each other: that's what it looks like when you build a data product, all these people pointing at each other. Why am I sharing all of this? Because I think there's a lot that data organizations can learn from software engineering.
So when I think about the difference between data quality and data observability, in my mind data observability takes data quality, which has been around for a long time, to the next level. Data quality has traditional tactics like testing and monitoring, which are really important; it's critical to set up those checks, in the same way that you have unit tests, and I don't think that's ever going away. Data observability takes that to the next level and starts asking: what else do I know about my estate that can help me solve this problem? Okay, I understand there's a problem here; there's a very high percentage of null values in a particular field. Say it's a credit card field [00:22:00] where typically only 1% of values are null, and suddenly the percentage of nulls is way higher, like 40%. Let me start thinking about what's related to this. Was there a pull request at the same time that I can correlate with this? Was there a code change somewhere? Can I actually use AI to compare? Something analysts do is literally sift through hundreds of lines of SQL code to understand what the problem is; can I use AI to figure out where there was a code change that could have contributed to this? Taking it to that next level really draws on the experience of software engineering, which has been using software observability for the last couple of decades. And I do think the next evolution is data and AI observability. That part is more visionary, because people are not building that much with AI yet, but when they are building with AI, the reliability and quality of the output of those products matter a lot. Does that answer your question?

It definitely does, and I totally agree that we should import several things from software engineering. [00:23:00] I do think there are slight shifts in terminology, though, which I find really fascinating. I work with a bunch of people helping build AI-powered apps, and I teach this a lot, and even with the software engineers I work with, if I use the term "test" and a test doesn't pass a hundred percent of the time, software engineers freak out. And I'm like, no, no, sorry: if the LLM test passes a hundred percent of the time, something is clearly wrong. So maybe we actually need some different terminology there, just to level-set everyone.

Yeah, I love that example. That's right. When people tell me, "our data system is perfect, we don't have any data issues," I'm like, uh-huh, right. As people who work with data, we know data goes wrong; there's no instance where that isn't the case. And similarly, LLMs hallucinate; there are so many instances. That's just the reality. And I think the mistake here is to say, okay, we can't trust this, so we're not going to do anything. I don't think that's the case. You could argue that no data is [00:24:00] better than bad data.
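A minimal sketch of that point about LLM tests that shouldn't pass one hundred percent of the time: score a sample of logged responses and alert on a chosen pass-rate floor rather than demanding perfection. The checks, sample size, and threshold here are placeholders, not a recommendation; real evals would use task-specific assertions or an LLM judge.

```python
import random

def passes_checks(response: str) -> bool:
    """Placeholder scoring: non-empty and not an obvious refusal boilerplate."""
    return bool(response.strip()) and "as an ai language model" not in response.lower()

def sampled_pass_rate(responses: list[str], sample_size: int = 200, seed: int = 0) -> float:
    """Score a random sample of responses instead of eyeballing them one by one."""
    rng = random.Random(seed)
    sample = rng.sample(responses, min(sample_size, len(responses)))
    return sum(passes_checks(r) for r in sample) / len(sample)

if __name__ == "__main__":
    # Stand-in for a day's worth of logged model responses.
    responses = ["Here is your refund status ..."] * 190 + [""] * 10
    rate = sampled_pass_rate(responses)
    # Expect good-but-not-perfect: alert when quality drops below a chosen floor,
    # and treat a long streak of 100% as a reason to check that the eval still bites.
    assert rate >= 0.90, f"pass rate {rate:.1%} below threshold"
    print(f"pass rate: {rate:.1%}")
```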
But when it comes to generative AI, and we talked about this earlier, there's a lot of pressure to actually build and deliver AI products. I don't think we should stop doing that, but I do think we should recognize that when we're building these things, we're also responsible for their reliability and quality, and we must invest in making sure we're doing what we can on our part.

Yeah. You mentioned we're still in the nascent stages of data and AI together, and of observability there, and I totally agree. I'd like your visionary take on what you'd like to see in the next one, two, or five years.

Yeah, great question. First of all, I'm very bullish on AI long-term, on GenAI. I do think people will reach AGI, and I'm excited for it. I think there are significant benefits our generation can have, and I hope our kids too; I do believe this can make us significantly better, and there's a variety of examples where we're starting to see [00:25:00] that. I do think the time horizon will be longer than we currently anticipate. It reminds me of the internet: it changed our lives, but it took a little longer than we thought, and we had a big bust as well.

We did. That's right. I was 18 at the time, and a bunch of people I went to school with decided not to go into software because of it. One guy spoke with some doctor friends and lawyer friends of his parents, and they said, no, definitely go into it. He ended up being one of the first employees of the first Google Maps office in downtown Sydney and had a ball for eight years there.

There you go. By the way, I think you raise an important point. As I think about the future, one of the questions I'm asking myself is: if I were going to college now, what should I study? What would you recommend studying right now? I'd hate to say don't go into AI, because, to your story, I think this is real and there will be material developments. It just might not happen tomorrow, and maybe that's okay, but it will definitely [00:26:00] disrupt and improve the way we work.

You know, out of everyone we know who excels in machine learning, who actually studied it formally? Literally pretty much no one I know. Most of us, twelve years ago, took a Coursera course by someone whose name you may know, Andrew Ng, and got involved via Daphne Koller that way. Right, exactly.

Yeah, that's how a lot of this began for a lot of us. I mean, computer vision was a big deal back then, right? There's a lot of that. So I think that is a question I don't have the answer to. But I am excited about... oh, go for it.

Yeah, I have a provisional answer, with error bars, just thinking aloud. If you're trying to solve problems, try to solve a problem and build something, and see what you learn along the way. Start using AI, talk with people about AI, that type of stuff. I think it's about being a subject matter expert in something and knowing how to leverage AI. And then there's that joke,
and who knows whether it's true or not, I can't predict the future even though I've been working in ML and AI for way too long: you're not going to be replaced by AI, you'll be replaced by someone who does what you do who's using [00:27:00] AI.

Right, exactly. I totally agree with you. And look, I think there are some things that are fundamentally true regardless. The first is: do something that you love. Figure out what about this thing excites you and go for that; your chances of success are way higher if you actually enjoy what you're doing. The second is: work with great people who will force you to be better and force you to learn whatever it is you need. Those are truisms, at least for me; they'll always stay the same.

More practically, I'll say one other thing I'm really excited about: there are some early applications of LLMs and AI for data quality in particular. Maybe I'll give a couple of examples here.

I'd love that.

Because I think these are really cool applications. From the Monte Carlo side, there are two things we're working on that I'm really excited about, which are basically observability agents. The first is a monitoring agent, and the second is a troubleshooting agent. Quite simply, the problem the monitoring agent [00:28:00] seeks to solve is helping analysts who want to set up monitors for their data but don't know which monitors to set up. Just to describe the work of the person who would do this: they need to know a lot about their data in order to set up monitoring. They would need to know that the data needs to arrive every morning at 5:00 AM and it needs to sync, or that the values of a particular field need to be between X and Y. But if you can use semantic data and profile the data, you can make suggestions for what those should be. I'll give you an example from baseball: you can define what the values should be for a fastball. A fastball should be, I don't know, between 80 and 120 miles per hour. You can infer what that range might be and make a recommendation. So you can use AI to make that process a lot better and a lot easier. That's one application, on the monitoring side.
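Here is a minimal sketch of that monitoring-agent idea: profile historical values for a field and suggest bounds for a monitor, rather than asking an analyst to guess them. The column, the simulated distribution, and the quantile choice are purely illustrative.

```python
import random
import statistics

def suggest_bounds(values: list[float], cushion: float = 0.05) -> tuple[float, float]:
    """Suggest monitor thresholds from the observed distribution: take wide
    quantiles and pad them, instead of asking an analyst for magic numbers."""
    lo, *_, hi = statistics.quantiles(values, n=200)   # roughly the 0.5th and 99.5th percentiles
    spread = hi - lo
    return lo - cushion * spread, hi + cushion * spread

if __name__ == "__main__":
    # Stand-in for historical pitch-speed readings; a real profiler would pull
    # these from the warehouse column it is proposing a monitor for.
    rng = random.Random(42)
    pitch_speed_mph = [rng.gauss(93, 3) for _ in range(10_000)]
    low, high = suggest_bounds(pitch_speed_mph)
    print(f"suggested monitor: fastball_speed_mph between {low:.1f} and {high:.1f}")
    # A human still reviews the suggestion; the agent's job is the first draft.
```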
The second part is on the [00:29:00] troubleshooting side. I mentioned earlier: when a data analyst or data person becomes aware of an issue, either through an alert or because someone downstream told them, their work now is to uncover what went wrong. Typically they'll come up with a list of hypotheses for what might be wrong. Maybe the upstream data source stopped giving me data; maybe, as I mentioned before, there was a pull request. Here are ten things I want to test, and then they start testing them one by one and go through it. That process can take weeks, or even longer. This is where LLMs are so helpful. What we've done is create an agent that spawns sub-agents for each of these hypotheses. The job of each agent is to investigate its hypothesis, and the magical thing is that it does this recursively for every single table upstream until it [00:30:00] finds a root cause. So you have up to a hundred LLMs, or agents, running at the same time, looking at all these different hypotheses recursively, in under one minute. That's magical in my mind, because it's basically condensing what would be weeks of work for a hundred analysts into under one minute. That's the stuff I get really stoked about, because finding a problem that way is really grueling, pushing the rock uphill. And we actually use a smarter LLM, more of a reasoning model, to summarize all the findings and present them: here's the TL;DR, here's what we think the root cause is for this data issue, here are the related code changes, here are the data trends that happened at the same time. Basically, a summary of the issue. Again, that normally takes such a long time. When I see that, I'm really optimistic. I'm like, hell yeah, I can get behind a future where a lot of this very [00:31:00] manual work can be done with the click of a button.

Absolutely. I think there's a very bright future there. In my experiments with such things, not only is hallucination a serious issue, so is the forgetting that happens with LLMs. One example: I can give an LLM a dataset and ask for the mean, the variance, the median. Let's say I ask for five summary statistics; it'll come back and give me three. Legit: one-turn conversation, zero-shot, it gives me three of them. One way to solve that, of course, is to parallelize these things and return structured output; there are ways to deal with this type of thing. I do wonder how much we want humans in the loop in this process, as opposed to merely returning a report. Similarly, something I've found, and a lot of people I work with have found, is that LLMs can be horrible at what I just described but wonderful at uncovering insights. If you just say to an LLM, here's a dataset, tell me some interesting things about it, it will say things you wouldn't have even thought to ask about, which I think is phenomenal. Simon [00:32:00] Willison talks about LLMs as being weird interns, so you have these weird analysts that maybe won't follow all the instructions but will come up with things you wouldn't have thought to ask about yourself. So I'm wondering, in this multi-agent-spawning-multi-agent approach, have you seen these types of things as well?

Yeah, great question. I'm trying to think of a specific example. I will say that there are definitely hallucinations; it is not perfect a hundred percent of the time, so we're still iterating with our internal team and with our users.
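For concreteness, here is a toy sketch of the spawn-one-check-per-hypothesis, recurse-upstream pattern described above. The lineage graph and the hypothesis checks are stand-ins, and everything runs sequentially in one process rather than as a fleet of parallel agents.

```python
# Toy lineage: each table maps to the upstream tables it is built from.
LINEAGE = {
    "exec_dashboard": ["daily_revenue"],
    "daily_revenue": ["orders_clean"],
    "orders_clean": ["orders_raw"],
    "orders_raw": [],
}

# One "hypothesis" per check; each returns a finding or None. In the real
# system each of these would be delegated to its own agent or LLM call.
HYPOTHESES = {
    "stale data": lambda table: "orders_raw last loaded 30h ago" if table == "orders_raw" else None,
    "schema change": lambda table: None,
    "recent code change": lambda table: "orders_clean transform merged 2h before alert" if table == "orders_clean" else None,
}

def investigate(table: str, findings=None, seen=None) -> list[str]:
    """Test every hypothesis against this table, then recurse into its upstream tables."""
    findings = [] if findings is None else findings
    seen = set() if seen is None else seen
    if table in seen:
        return findings
    seen.add(table)
    for name, check in HYPOTHESES.items():
        result = check(table)
        if result:
            findings.append(f"[{name}] {table}: {result}")
    for upstream in LINEAGE.get(table, []):
        investigate(upstream, findings, seen)
    return findings

if __name__ == "__main__":
    # Start from the asset the alert fired on and walk upstream.
    for finding in investigate("exec_dashboard"):
        print(finding)
    # A summarization step (the "smarter" reasoning model) would then turn these
    # raw findings into a short root-cause narrative for the on-call analyst.
```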
Personally, in the current iteration, maybe I'd describe it at three different levels. The first level, which I think a lot of people want but I'm not sure we're there yet, is where you actually have agents running these investigations for you and fixing things for you. I think that's the holy grail people are hoping for, and I don't think you want agents touching your data or actually making changes there right now. The second level, agents providing you information, human-assisted, human-in-the-loop, call it whatever you want, we're definitely at, and I think that's very, [00:33:00] very powerful. And the stage below that is no LLM involvement at all, a human alone; I think that's a miss if you're not using some of this today. Does that help?

It does, and I love those different levels. There is another level, I suppose, which is what Anthropic calls augmented LLMs. It's not quite agentic, so it doesn't direct its own tool usage, but when you write out your LLM workflows, you specify how the LLM is augmented at different points.

I love that.

And to your point, I definitely wouldn't give an agent write access at the moment, because I wouldn't give a weird intern write access either. One, I'd give it read access; and two, I'd stay in the loop. I've been doing some pretty wild vibe coding recently, and working with Cursor in agent mode with Claude 3.7 Sonnet is wild, right? And it does have write access. They call it YOLO mode, it's literally in the settings, and I don't use that. But if it wants to make a [00:34:00] change, it tells me what change it wants to make and I accept or reject it. Being in the loop like that while allowing it write access can be extremely powerful, I think.

Exactly. I think we're in the early days of figuring out how to do that, but that's where we're headed, and honestly that gives me confidence it's the right path. We're seeing positive feedback with those types of interactions.

I couldn't agree more. And to your point: not using an LLM at all, you're losing out; but solely using LLMs without humans in the loop, when we're not there yet, is the other mistake, and we're in this middle stage. So, to think about enterprises that are rolling out AI-powered products, and by that I mean generative-AI-powered products, whether it's RAG, or infinite RAG, or agent workflows, or whatever cosplay we're doing today: what are the biggest data reliability risks in your mind?

Great question. I'll start with maybe the obvious: I think the big risk for a lot of these projects is that it's actually hard to [00:35:00] tie them back to ROI, to revenue. And I don't want to understate how important that is, because at the end of the day there's going to be a come-to-Jesus moment when we have to ask, what have you done for me lately? To a certain degree, and by the way I use AI and generative AI interchangeably, I think data and AI teams are far from the business today.
And I do think the way organizations have gone about building AI these days is not to have a centralized strategy, but rather to have a lot of small teams take things and run with them, which I think is the right approach for this experimental, innovation-driven phase. Over time that will consolidate. But in those instances the expectation for ROI is lower. It's not uncommon for me to speak with a Fortune 500 pharma company that says, yeah, we have about 270 projects in production, [00:36:00] we're not totally sure what's going to add value, but it's a portfolio-based approach: we're going to throw a bunch of stuff at the wall and see what happens. I think that's a totally valid approach for where we are in this innovation cycle, but it is a big risk, if that makes sense. And then, particularly when it comes to reliability risk, I spoke about this earlier: the data can break in a million different ways. You can trace it back to data, systems, code, and model, but at the end of the day all of those risks will hurt your brand, your reputation, and your revenue, and those are real things. Maybe the one thing we didn't talk about is the toll it takes on your team. It's not uncommon for me to speak with a data and AI team and hear that they spend 30 to 80% of their time on reliability fire drills. When you're spending 30 to 80% of your time on data and AI issues, on things going wrong, you're doing something wrong with how you're working, or rather we all are. The benchmark should be way, [00:37:00] way lower, and flipping how we think about that is really critical. Maybe I'll stop there; I feel like I touched on this a little earlier.

Yeah. Would you want to expand? Well, I love that you mentioned having a portfolio of different AI projects and seeing what sticks and lands. But there is a risk associated with that, and I think we'd probably both agree, though I don't want to put words in your mouth, that this isn't just AI; it's been true of data science and ML for some time. There has been a machine learning bubble in a variety of ways, where people have invested a huge amount. Machine learning engineers aren't cheap headcount, infrastructure isn't cheap, and companies have invested a whole bunch and perhaps not seen the returns they were promised by looking at Google and Meta and the FAANG companies, right?

A hundred percent. I actually think the ML space is such an interesting case study. There are a lot of things you can do better with ML. I'm thinking about the CPG, consumer packaged goods, companies we [00:38:00] work with that all have supply chain organizations: delivery route optimization, pricing optimization, there's a lot there. Ads, obviously, are huge for media organizations. In all of these instances, ML can take you pretty far. And I think there's a movement now, which I hear a lot from organizations, where they're retooling their ML projects with LLMs and are able to see some improvements, which is pretty nice.
But by and large, if you look at the ML space, I don't think it delivered on the hype it set out with when it started five or seven years ago. It definitely hasn't lived up to expectations. And let me offer a hot take here: by definition, there's no way generative AI delivers on its expectations given where we're at right now; the bubble is [00:39:00] so massive. However, even if it delivers just 10% of what we think it will, that's still significant enough to change our lives, and that's maybe the difference compared to ML.

I love that you say there's no way it can deliver. I can be a troll with friends occasionally, or puckish, let's say. I love The Beatles, but I always tell my friends who love The Beatles even more than me that The Beatles are clearly overrated, because anyone who has that much mindshare is clearly overrated along some dimension. With all this AI going around, you'd think we'd have a lot more stuff automated, and yet so many companies still rely on manual processes to do important things like validating critical reports. So I'm wondering why automated data reliability checking hasn't become the default yet. Why is so much still manual?

It's a great question, and I think it goes back to human nature: it's kind of our default to do things in a manual, shitty way. [00:40:00] I always think about it as our default being slow and unfocused. At Monte Carlo we have something we try to live by, I don't know if I'd call it a mantra, which is speed and focus: choose very few things and do them really well, in a focused way. I find it helpful to remind myself of that every day, because by default we are not fast or focused as human beings. So in a certain sense it is our default to do things in a manual way. And it does make sense; we talked about human-in-the-loop and why you wouldn't want to give write access but you do want to give read access. So I do believe this is an evolution, and that as human beings we process things incrementally. It's a lot easier for us to make incremental steps.

I'll just add to that. I haven't quite reflected on this, but when you said we do things slowly, and not to hate on us, but as a reality check: I do think we also need a reason to change the way we're [00:41:00] doing things. We have a lot of energy, but a lot of life is energetically consuming, so we are beings of inertia in some way. If something isn't the best way to do it, but we know how to do it, I'm not necessarily going to pick up a new tool unless I have to.

Right, exactly. That's exactly right, and I think oftentimes it's very comfortable to do things the way you've done them before. And again, I'll remind you: a hot second ago, just seven or ten years ago, that level of diligence on data wasn't expected at all.
Having an issue or a mistake in your data happened all the time. I remember speaking with C-level folks at the time, and their biggest issue was reporting the wrong numbers to Wall Street and finding out about it 24 hours later. That was the biggest thing that could happen, because mostly people used data, to your point, to count things: customers, product users, [00:42:00] monthly active users. And getting those numbers was hard. So just a hot second ago, and five to seven years is really not that much in the span of a lifetime, we were using data in very limited ways. I think it's easy to forget how far along we've come. Speaking of The Beatles, I like to say the data and AI space is a little bit like Taylor Swift: we kind of reinvent ourselves every couple of years. The space moves really quickly. Snowflake and Databricks, which are huge behemoths today, weren't that big five to seven years ago. And for most enterprises I speak with, their biggest problem is still how to get on the cloud. In that world, thinking about fully automating your data quality or data reliability is a lot harder. I do believe we should be moving to the cloud ASAP, like yesterday, so all these enterprises have to catch up. But the reality is that the path there is longer, and to your point, it requires pretty significant change agents in the [00:43:00] organization to say, hey, we're not going to do things the same way we've done them before. And I think the reality is that enterprises can't afford not to do that today. It was called digital transformation a few years ago; that was very hot for a couple of years, and now everyone needs to use AI, but you need someone to push it in order to make it happen.

Totally. And look, there are lots of incredible cloud tools out there, and I won't name any names, and you can do amazing things with them, but it isn't as though they're always easy to use, right?

No comment. I'm kidding. No, you're totally right. I think it's a combination of cost and skill set. I do think the rise of cloud solutions, because of the standardization and the access, has made it a lot easier for companies like Monte Carlo to create standard solutions. Building a company that's largely on-prem degrades the experience for the customer, and it's a lot harder to [00:44:00] maintain and to scale. Companies that are cloud-native have that advantage, and we are a company making a bet on the cloud. We've been able to do that thanks to cloud advancements; it would have been hard to do otherwise. So by and large, I think the benefits outweigh the issues, because it allows you to create a much, much better customer experience, and at the end of the day that's what really matters. So yeah, you're right:
it's not the easiest, but I don't think it's harder than on-prem or whatever we had before.

Yeah. I was referring to the three big providers and the fact that, whatever it's called, Conway's Law, I think: the product reflects the way the organization is structured. I'll have a notebook open and literally not be able to connect it to another part of the same offering, and it can take me ten hours to figure that out. I'd love an agent to do that for me.

That's right. Actually, that's a good idea; you should suggest that. [00:45:00]

And with some of the cloud providers, introspecting into the costs and charges is not always trivial either, and those can add up pretty quickly. Right. Absolutely. I'm so excited to hear from you a bit more about what observability for LLMs and AI agents and augmented LLMs can look like. My bias, as I mentioned, is that I help people build a lot of LLM-powered software, and they ask, hey, should I use this framework, or I want to do multi-turn conversations, or should I use this multi-agent architecture? And I'm like, firstly, what are you trying to do? And they're like, something cool. And I'm like, okay, have you looked at your traces? And they're like, oh, what do you mean? I'm like, let's actually look at the data, perhaps even label some of the failure modes in a spreadsheet, do a pivot table to see what the biggest failure modes are. And they're like, oh, but I want to use CrewAI. And I'm like, let's just step back a bit. And the other thing, I suppose, to one of your points earlier: with generative-AI-powered applications, observability becomes a lot tougher because of all the unstructured data [00:46:00] these systems are producing. So how do we even think about logging these things, and observing all the data we're creating? I know that was a whirlwind of questions, but I hope something in there hit.

No, that's great, and I think that's top of mind for lots of folks. One view here is that AI observability and AI quality are a lot different from data observability. Actually, I don't know if this is a hot take, but I think the data and AI stacks are converging. I don't think there are going to be two different stacks, and I don't think there's going to be some secret super player that comes along. I think the incumbents in data are going to be the incumbents in AI. And in the same way that software engineers build with various stacks, data engineers and software engineers will continue to build data products with various stacks. It will not be homogeneous; I don't think there's going to be one stack to rule them all. That's my hot take. I do think that when you think about observability for AI-powered applications, some principles hold true if you start with data and then expand out through your data and AI estate. I really [00:47:00] think the backbone of the AI estate is the data platform and the data foundation, and then on top of that you have a variety of things: agents, maybe a vector database, prompts and context, structured and unstructured data,
orchestration systems, agent systems, model responses. Those I view as an extension of the data estate, if that makes sense. And to do observability, you're going to have to do it end-to-end. In the same way that we think about end-to-end software observability, we need to think about end-to-end data and AI observability. First, you need visibility into that full estate, so you need connections into your full stack, and a single view, or a couple of views, of how it all connects; lineage becomes very important here. Then you need a view into where things are going wrong and why, and you can use data and AI to do that faster. A common mistake, because of a lot of legacy solutions, is to assume that's a very manual process; it no longer [00:48:00] has to be. So if you think about data and AI not as two separate technologies but as a single system, with a single stack and a single view of observability, that's how you should really think about it. And across all of it, data, systems, code, and model outputs remain the four core things to observe, and you have to observe them across every component of that stack; those are the four ingredients that make up the whole thing. There isn't a single solution that does all of that today, but that's my vision for what very strong observability should look like, and I think that's where the world is going. We're not going to be able to operate our AI systems without that kind of visibility, if that makes sense.

It makes absolute sense, and I appreciate this insight so much. We're going to have to wrap up in a minute, sadly, but [00:49:00] I am wondering: what's the biggest blind spot in how organizations think about AI, AI observability, and data quality today, and what's coming that they're not prepared for and should be?

Okay, I don't know if this is a super hot take, but we talked about it earlier: there is an obsession with the latest and greatest model, and with GPUs and performance and how we get there. I actually think that's a red herring. The thing folks should be focused on is whether their data is AI-ready and whether their organization is AI-ready. Maybe this is a hot take, but I think 90% of organizations, maybe 99%, are not there. I've said it before and I'll say it again: AI is only as good as your data, and right now we have a lot of work to do to get there.

Yeah. And I love what you mentioned before. It is obvious, as you said, but it seems like we do need to be telling people the obvious things at the moment: this needs to be [00:50:00] baked into the DNA of every company and culture, and it has to be top-down, but also bottom-up, in hiring.

Totally. And to your point, I don't think there are many people who have formally studied this stuff yet.
So when you're thinking about how to infuse that DNA, there's that last mile of: how do I actually make this applicable to my enterprise, to my organization? It's very hard to teach and very hard to study, but I think that last mile is the most important and the hardest part: how do we make this technology actually applicable and useful for us? That's a skill set that's very hard to find.

Yeah, and to a point you made earlier: when I help people build AI-powered applications, a lot of the time they're focused on micro evaluations, like this LLM call, how often does it hallucinate? And I'm like, chill, we do need to figure that out, but how do we tie it to the macro-level business metrics? Having a tight loop between micro-level technical metrics and macro-level business metrics is not only important, it will become increasingly important [00:51:00] as we all need to deliver actual value.

Yeah, and don't get me wrong, evals are very important, they're an important part of this, but we often have a tendency to not see the forest for the trees. We need to ask ourselves: do we have the foundations in place? Do we understand where we're going? What's our north star? Who's the customer we're building for? What's the use case? What level of hallucination can we tolerate? There are a lot of questions to answer.

Absolutely. Well, thank you for such a great conversation. I could have chatted with you for hours, and hope to in the future, but I really appreciate you coming on the show, and all your wonderful work and insights from Monte Carlo, Barr.

This was such a wonderful conversation. Thank you for the great questions.

Thanks so much for listening to High Signal, brought to you by Delphina. If you enjoyed this episode, don't forget to sign up for our newsletter, follow us on YouTube, and share the podcast with your friends and colleagues. Like and subscribe on YouTube, and give us five stars and a review on iTunes and Spotify; this will help us bring you more of the conversations you love. All the links are in the show notes. We'll catch you next time.