The following is a rough transcript which has not been revised by High Signal or the guest. Please check with us before using any quotations from this transcript. Thank you.

===

Chris: [00:00:00] The demands on these data engineering pipelines are going up at a tremendous rate, and it's becoming increasingly difficult to keep up with both the volume of data that's coming in and the needs you're starting to see downstream, from business users, but really from AI agents, which are driving a lot of this increase.

Hugo: That was Chris Child, VP of Product for Data Engineering at Snowflake, on one of the big shifts happening in his field. I asked him what implications this has.

Chris: And so data engineers, we can't just keep hiring more. We can't just keep scaling the number of people within a company. We have to get more effective, more efficient, and operate at a higher level. That's driving, I think, a big push among data engineers to not just be engineers, to not just be writing code and worrying about infrastructure. They have to be thinking about business outcomes and the insights people are trying to generate. They really have to be thinking about what the business needs and how the data [00:01:00] is going to support that, rather than just focusing on, hey, as long as I get the data from here to there, we'll be fine.

Hugo: As a leader at Snowflake, where he oversees data engineering, open lakehouse, open source, and developer products, Chris has a front-row seat to how the age of AI is transforming data engineering from back-office plumbing into a core strategic function. In this episode, we discuss the paradox of AI: how it boosts productivity while creating a net increase in complex work. Chris explains why AI assistants still lack critical business context, and how the best data engineers are evolving into product managers who ask why to solve real business problems. We also talk about the future of data engineering with AI agents and explore the long-term vision where data engineers manage a fleet of AI agents. Chris also shares a great two-part framework for technical executives on how organizations can become AI-ready. If you enjoy these conversations, please leave us a review, give us five [00:02:00] stars, subscribe to the newsletter, and share it with your friends. Links in the show notes. I'm Hugo Bowne-Anderson. Welcome to High Signal. Let's jump in.

Hugo: Hi there, Chris, and welcome to the show.

Chris: Hi, Hugo. Thank you for having me. I'm excited to be here.

Hugo: Such a pleasure. First of all, congratulations on your very recent MIT report, "Redefining Data Engineering for the Future of AI."

Chris: Thank you. Yeah, we were really excited to work with MIT Technology Review and put this together. I think it had a lot of really interesting insights and really captured a lot of the change that's going on in the data engineering world right now.

Hugo: Absolutely. And I think having a conversation around data engineering has always been so important for people who build data, ML, and now what we call AI-powered products. But as you point out, in some conversations data engineering has almost been relegated to the world of quote-unquote plumbing and ETL and these types of things. And now we're seeing a huge amount of [00:03:00] movement on what the role actually is becoming and may become.
So I'm wondering if we could start off by you telling us what you've discovered about how data engineers are becoming really central to whether AI initiatives, and software in general, succeed or fail.

Chris: Yeah, I'd be happy to. This has been a big trend we've been observing for a while, and I'd say the responses we saw in the MIT report were just reinforcing what we've been seeing happen anyway. For a long time, data engineering really started as taking the types of transformations and work people wanted to do in ETL processes and treating them more like actual software work. Let's actually put the code into version control. Let's run some testing on it. Let's have a deployment process where we can actually see what's going on. And that's been a huge improvement, I think, in the lives of data engineers and in the resilience of the systems we've all been building over the last ten-plus years. But what's happening now is that the demands on these data engineering pipelines are going up at a [00:04:00] tremendous rate, and it's becoming increasingly difficult to keep up with both the volume of data coming in and the needs you're starting to see downstream, from business users, but really from AI agents that are driving a lot of this increase. And so data engineers, we can't just keep hiring more. We can't just keep scaling the number of people within a company. We have to get more effective, more efficient, and operate at a higher level. That's driving, I think, a big push among data engineers to not just be engineers, to not just be writing code and worrying about infrastructure. They have to be thinking about business outcomes and the insights people are trying to generate. They really have to be thinking about what the business needs and how the data is going to support that, rather than just focusing on, hey, as long as I get the data from here to there, we'll be fine.

Hugo: And there are many parts, but two parts that are easy to convolve in some ways. One is data engineers now being required [00:05:00] to build pipelines and architect solutions for incredibly sophisticated multimodal AI software, right? And this involves a lot more unstructured data than we're necessarily accustomed to. But then there's the other side: data engineers using AI to help them build software, AI-assisted. So this is a choose-your-own-adventure. If we tease those apart, maybe you can pick one and we can riff on it.

Chris: Yeah, there's a bit of a push and pull on both, actually. On the one hand, AI is helping make data engineers more efficient, but it's also creating more work for them to do. And I would actually argue, and I think it came through a little bit in the survey, that it's still netting out to more work for data engineers. The efficiency gains they're getting from AI helping you write code, from AI helping you manage the pipelines you're building, those are real, absolutely. But AI is also unlocking, as you mentioned, so much more [00:06:00] data that you can actually use.
I was talking with a customer recently where they had a question they wanted to answer about what percent of their revenue came from customers with a certain term in their contract. In the past, that would have been a really difficult question to answer, because a lawyer would have had to go read all the contracts for their thousands of customers, and so they just never would do that. With AI, they were actually able to set that up. They had the data available. They could run a large language model that could do a RAG-based search across all of these documents and then stitch that together with the revenue numbers they had, in Snowflake in this case. And that is creating a tremendous amount of work for data engineers, who now have to catalog and process and set up governance on all of that type of unstructured data. You're also seeing, as companies start to deploy more AI agents and AI use cases that want to consume that data, a massive increase in just the number [00:07:00] of pipelines that need to be built and the amount of data that needs to flow. So that's the first half of your question: those are both growing, I think exponentially. On the other side, there are huge productivity gains that people are getting, although I'd argue that to date we're still in the early days of AI coding assistants and other tools being great at data engineering. They're better trained and better built for software, and I think part of that is because there's a tremendous amount of open source software these things can be trained on. There's actually far less open source data pipeline and data engineering code; a lot of that tends to live inside companies. So I think we're still in the early innings of AI actually making data engineers more effective. And to tie this back to the actual study: we found that 77% of the people we surveyed say they're seeing the workloads of their data engineers grow, even though they're getting productivity gains from AI.

Hugo: Fantastic. And something that I think is implicit in part of this [00:08:00] conversation is that data engineers are now increasingly able to be responsible for architecting solutions, as you said, explicitly being closer to business problems as opposed to just building ETL pipelines.

Chris: Yeah, and that shift I find actually very interesting. In a previous startup, I ran our data engineering team, and one of the things the team spent a lot of time doing was worrying about the infrastructure we were running on: making sure the nodes running the pipelines were up and that the jobs had run over the course of the night before. One of my most important roles there was that at 8:00 AM, our marketing team would get up and look at their Looker dashboards to see how the campaigns from the night before had run, and if the data was stale, they would call me on my cell phone and complain that the data hadn't run. Then I would go figure out what happened and who I needed to get on call, and it was miserable.
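As an aside on the contract-term example Chris gave a moment ago, here is a minimal sketch of that pattern: an LLM flags which contracts contain a given clause, and the flags are joined against revenue. Everything here, the ask_llm stub, the table shapes, the helper names, is a hypothetical stand-in rather than a Snowflake or vendor API; in practice the documents and revenue would live in governed tables, and the clause search would typically be RAG-based rather than a full-document pass.

```python
# Sketch of the contract-term question described above. All names and
# helpers are invented for illustration.

def ask_llm(prompt: str) -> str:
    # Placeholder: swap in a real LLM call. This trivial stand-in just
    # keyword-matches the document portion so the sketch runs end to end.
    question, _, document = prompt.partition("\n\n")
    term = question.split("'")[1]  # recover the quoted term
    return "YES" if term.lower() in document.lower() else "NO"

def contract_mentions_term(contract_text: str, term: str) -> bool:
    answer = ask_llm(
        f"Does this contract contain a clause about '{term}'? "
        f"Answer YES or NO only.\n\n{contract_text}"
    )
    return answer.strip().upper().startswith("YES")

def pct_revenue_with_term(contracts: dict[str, str],
                          revenue: dict[str, float],
                          term: str) -> float:
    """contracts: customer_id -> contract text; revenue: customer_id -> ARR."""
    flagged = {cid for cid, text in contracts.items()
               if contract_mentions_term(text, term)}
    total = sum(revenue.values())
    hit = sum(amt for cid, amt in revenue.items() if cid in flagged)
    return 100.0 * hit / total if total else 0.0

# Example: what share of revenue has an indemnification clause?
contracts = {"c1": "... standard indemnification clause ...", "c2": "..."}
revenue = {"c1": 1_200_000.0, "c2": 800_000.0}
print(pct_revenue_with_term(contracts, revenue, "indemnification"))  # 60.0
```

The interesting part is less the LLM call than the plumbing around it: cataloging the documents, governing who may read them, and keeping the joined revenue numbers trustworthy, which is exactly the new data engineering work Chris describes.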
Chris: And so I [00:09:00] think one of the big shifts that's happened is that with tools like Snowflake and others, people are having to think a lot less about the actual infrastructure they're running and managing these pipelines on. That's step one in what you described: don't worry as much about where the code is running; don't worry as much about whether the pipeline ran. There's now software that can do a lot of that for you. The next step is, okay, now I need to think about what are the actual pipelines I need to build, what's the code I'm writing, and all of that. And I think in the next phase, a lot more of that will be automatable. Things like semantic layers and semantic models help simplify some of that work, and LLMs will help generate that code. I think we'll start to see less of the data engineer's time spent on writing the specific code in the pipelines and munging the data itself, and that time will move up to architecting, as you described it, Hugo: thinking about what is the actual architecture I need for my overall data [00:10:00] estate, but also the architecture of the data itself, which will continue to be incredibly important. How do I want to structure the data? That requires understanding the types of questions people are asking right now, but also the types of questions they're going to be asking in the future, and extending that beyond people to, obviously, the AI agents: what questions are the AI agents going to ask, what data are they going to want access to, and in what format? And that, as I was saying before, is what requires understanding the business needs deeply. Because you have to think about, you know, what types of questions is someone going to ask about our customers? What other data might they want to join that with? How do you anticipate, like the example I gave before of someone wanting to join contract data with revenue data, and make sure all of that is available together? That, I think, is where data engineers are going to add a lot of value, but they're going to have to elevate from "my job is to write code" to "my job is to help the business generate insights and make better decisions."

Duncan: You briefly mentioned, Chris, that data engineers are adopting AI tools, but also [00:11:00] that the tools aren't quite there yet, and maybe we can double-click a little bit on that. We've all vibe coded in Cursor. We've all played with a lot of coding tools. But the reality is the coding tools typically aren't actively analyzing live data; they're usually used in more of an offline or sandbox environment. Curious to dig a little deeper with you: how are data engineers actually using AI tools today? What are the big gaps?

Chris: Yeah, I think that's where it starts: everyone starts with Cursor or Copilot or something else to try to build these pipelines, and there are a couple of things we found. I'll give an example. We started using a lot of AI models fairly early on, and at Snowflake, our core language is SQL. We have a lot of Python interfaces and Spark-like interfaces, but at the core it's SQL. So we said, let's look at how good these LLMs are at generating SQL and actually helping with data engineering or data analytics use cases. On the surface, they actually look very good.
You open up ChatGPT and ask it to write you a SQL query to do something, and it spits out syntactically correct SQL, [00:12:00] but we found that in general, none of these things had an understanding of the business context you were working in. That made it very hard to take the SQL they wrote and use it: it rarely did what you wanted, because there was enough context the model didn't have. So generally, if you look at the text-to-SQL scores from a lot of these different engines, they're not very good. We tried to solve this initially by taking a huge corpus of SQL queries that we've run over the last twelve to fifteen years and fine-tuning models on it; we actually tried to train our own LLMs on it, and we continued to get pretty terrible results. We realized ultimately that what was missing was a lot of the context you need in the course of building that pipeline or running that analytics job. It was actually the semantic model that turned out to be the missing piece. Once we started giving these models access to semantic model [00:13:00] information, and this could be your dbt models or your LookML models, there are a bunch of different places it's stored, and it turned out the specific format didn't really matter, the model got the understanding it needed: what does your company consider a customer to be? It turns out that in a lot of cases, that's not an easy question to answer. At a lot of companies, when you say revenue, what do you actually mean? There's no way for one of these generic models to understand that nuance of your company; you have to actually provide it with that information. So I think that's been one of the challenges of trying to use AI to do your data engineering work: data engineers have to have a massive amount of context about what things actually mean. I wouldn't say I'm great, but I'm a reasonable data analyst and data engineer at this point, and even when I want to go ask a question about, hey, exactly what are these customers doing and how is that shifting over time, I go ask one of our data engineers to double-check everything I do, because they understand the nuance between what [00:14:00] I mean by "customer" and how the data is actually stored in the tables we're using. So I think that's where AI has struggled to date in these data engineering use cases in particular. The other case we see is that there's actually not as much business-logic SQL sitting out there to be trained on as you would think. Most of it companies consider pretty proprietary, trade-secret territory, so they tend to lock it down, and there's less of it for models to train on. In cases like Spark jobs, there's actually quite a bit more floating around, but even there, it tends to be what people are doing in open source pipelines, which looks fairly different from what they're doing inside an enterprise. So I think a lot of it is that the underlying training data isn't fully there, and that's where getting the semantic models and other pieces in place tends to help. And it is still such early days.
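To make the semantic-model point concrete, here is a minimal sketch of grounding text-to-SQL in business definitions. The semantic model format below is invented for illustration (in practice it might be assembled from dbt or LookML metadata, as Chris notes), and ask_llm is a stub for whichever model provider you use.

```python
# Sketch: carry the business definitions with the request, so the model
# doesn't guess what "customer" or "revenue" means. All names invented.

SEMANTIC_MODEL = """\
table fct_revenue:
  customer_id   -- a paying account; excludes trials and internal accounts
  revenue_usd   -- recognized revenue at monthly grain, in USD
  month         -- first day of the revenue month
metric revenue:
  SUM(revenue_usd) over active contracts only
"""

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("swap in a real LLM call")

def text_to_sql(question: str, semantic_model: str = SEMANTIC_MODEL) -> str:
    # Without the semantic_model block, models tend to return syntactically
    # valid SQL that silently uses the wrong notion of "customer" or
    # "revenue"; with it, the definitions travel with every request.
    prompt = (
        "Write one SQL query. Use only the tables, columns, and metric "
        "definitions below; do not invent names.\n\n"
        f"{semantic_model}\nQuestion: {question}\nSQL:"
    )
    return ask_llm(prompt)
```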
Hugo: I presume you and our listeners have heard at least some of the [00:15:00] recent conversation between Andrej Karpathy and Dwarkesh Patel on his podcast, and Karpathy made it clear at the very start that this isn't the year of agents; this is the decade of agents, and it's incredibly early, right? So even to some of your points, the questions people may ask an AI agent aren't "what is in this database?" It's "I need projections for 2027," and the agent needs to go and do several complex, multifaceted database queries and then build an ML model on top of that. That's where we will end up, but we need to get this data engineering absolutely down to be able to architect those solutions.

Chris: Well, and I think it'll even go beyond what you just described. You'll start with someone coming in and asking, I need projections for 2027 based on some set of factors or something else. As an example, I've been asking our data team to help me build a model so I can play with the inputs to predict the outputs around a certain product area. What that may require is, one, writing the right queries and [00:16:00] figuring it out. But the next step is that you might not have the data in a format the model can actually use, so something needs to go build and deploy some pipelines. So the agent that's asking the question now needs to go to another set of agents and say, hey, I need a new pipeline to do this and this, and I need to be alerted when this thing changes. Then that agent may find it doesn't actually have access to some of the data it needs to do a good job answering, so it may need to build a new ETL pipeline, or collect data from a different source that wasn't set up before. And I can see this going even further, where maybe you end up saying, we're not actually tracking this appropriately; I need to submit a pull request for my company's mobile app to add some new tracking, which can then feed the pipeline, which can end up in a table where I can answer the question. If you think about it, you have people in your company who know how to do each of those things, and they talk to each other, and I think you can see how agents can start to take on some of that. But I agree with you, this isn't going to happen in a year; it's going to take much longer. So a lot of what we're trying to do at Snowflake right now is figure out how we [00:17:00] build and architect the platform for our customers so they can take advantage of those agents as they develop. Trying to say, hey, we're going to replace your data engineering team twelve months from now is crazy. It's much more about how we help the folks on the data engineering team elevate to thinking about business problems, like we talked about a minute ago, and how we help them be more productive and more effective. Really, I think of it as: how do we turn the data engineers in a company into heroes who are able to get an unbelievable amount done? They're going to do that ultimately by deploying lots of agents, but it's going to be an incremental process, I think.

Hugo: Amazing.
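Purely as an illustration of the hand-off Chris just sketched, here is a toy version of an analyst agent that, on finding data missing, files a request with a pipeline-builder agent instead of failing. No real agent framework or Snowflake API is implied; every name here is made up.

```python
# Toy sketch of agent-to-agent delegation: "I can't answer this until a
# pipeline exists, so ask the pipeline agent to build one." Illustrative only.
from dataclasses import dataclass, field

@dataclass
class PipelineAgent:
    built: list[str] = field(default_factory=list)

    def build(self, dataset: str) -> None:
        # Stand-in for "go build and deploy an ETL pipeline for this source."
        print(f"[pipeline-agent] building pipeline for {dataset}")
        self.built.append(dataset)

@dataclass
class AnalystAgent:
    available: set[str]
    pipelines: PipelineAgent

    def answer(self, question: str, needs: set[str]) -> str:
        missing = needs - self.available
        for dataset in sorted(missing):
            self.pipelines.build(dataset)  # delegate rather than fail
            self.available.add(dataset)
        return f"answer to {question!r} using {sorted(needs)}"

agent = AnalystAgent({"revenue"}, PipelineAgent())
print(agent.answer("projections for 2027", {"revenue", "product_usage"}))
```

The real systems will obviously be far messier (budgets, approvals, alerts on change), but the shape of the interaction, one agent raising work for another, is the part Chris is pointing at.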
Hugo: And we've had a few conversations on the podcast recently, which I'll link to in the show notes, about the present of working with AI-assisted coding and AI agents, and the not-too-distant future. There are a few with Tim O'Reilly; Anu Bharadwaj, the president of Atlassian; and Tomasz Tunguz of Theory Ventures. Something we've triangulated [00:18:00] around is that perhaps the future of this type of work will still be solving business problems, but while managing a set of agents yourself. I'm wondering where you see the data engineering role going. Even: how many agents might one data engineer manage?

Chris: I'm in the same camp that it is going to turn into much more of a management role as you think about these agents over time. At the end of the day, if you just allow the agents to go nuts and build a pipeline for any question that gets asked, you're going to end up with millions of pipelines, and that's not the right thing; you'll end up with a bunch of pipelines doing very similar things. Part of the role of a data engineering architect today is to think about the data architecture and the other things we were talking about before, to minimize the amount of repetitive work, and to do it within a budget. At the end of the day it's really: how do I have the data ready to answer questions? You're always thinking about the trade-off between how fast the queries can be versus how much money you're spending pre-processing. [00:19:00] You could ultimately try to denormalize all of your data into a state where you have every possible query ready to go; that's not cost-effective or a good way to do it. So I think you're going to need, one, to still provide that kind of architecture guidance and thought. The other is that ultimately every pipeline needs some sort of ROI calculation: how much does it cost me to run this pipeline, to bring this data in, to perform these transformations, and how much value am I getting from it downstream? And I would actually argue that's a set of things many data engineering teams are not great at right now. It's easy to measure the cost; it's very hard to measure the ROI. So I think there will be more tools to help people make those decisions, but we're a long way away from AI being able to do that. Instead, you're going to be asking: what are the budgets? Who do I give them to? Which tasks do I assign where? And you're monitoring and making sure things are happening. I do think the job, in many ways, will look more like a manager's than it has in the past.

Hugo: Yeah. And look, don't get me wrong, the models are getting a lot better, [00:20:00] even looking at recent releases: accuracy of tool calls, all of these things, far better. But they're still like very people-pleasing, highly caffeinated, superpowered interns with very good but fallible memory, as opposed to someone who gets it right every time.

Chris: Yeah, and that's why it'll start like anything else. It's going to start with you writing code and them helping; then they'll write the code and you'll review it and give feedback. Eventually you'll get to the more agentic model, where they're able to write
code against tests that you or someone else has developed and put it into production, and you're more monitoring and checking on things; and eventually the agents will work with each other to accomplish a goal. But you still need to be monitoring: okay, are they going after the right goal, and are they spending an appropriate amount of money to do it? Because at the end of the day, these pipelines do consume a lot of compute and, in some cases, a lot of storage, so you'll have to keep an eye on that. Yeah, I absolutely do think that as they get better, they [00:21:00] will be able to act like more and more senior data engineers, but we're still definitely at the phase where they're acting like pretty junior employees, or even interns.

Hugo: I agree. So, something I loved in your report is that this is all well and good in the abstract, right? But what I loved in your report is actually seeing how business leaders are on board; they're more on board than I thought they would be, to be honest. So maybe you could run us through some of your findings.

Chris: Yeah, I had the same reaction you did. We can talk about this at a high level, and we see it every day: sophisticated companies that have been doing this for a while and are at the leading edge have known for a long time how critically important data, and data engineering, are to their success as a business. But I was amazed to see how widespread it was. I'll pull out a couple of the specific stats here. We found that 72% of the folks we surveyed believe that data engineers are integral to their business, which is incredibly high, and I would imagine higher than it's ever been. At the same time, we found that more than 80% of executives say the job description of data engineers has changed drastically because [00:22:00] of AI. Already, 80% of people have seen the role change. Part of that was that we asked how much time data engineers are spending on AI, and survey respondents said that's doubled between 2023 and 2025, from 20% to close to 40% of their time now spent on AI projects, and they expect that to go up again, to 60%, in the next two years. And frankly, I wonder if they're even being conservative there. I quoted the workload number earlier: 77%. And these are CIOs and chief data officers we're talking about; they see that 77% of their data engineers have increasingly heavy workloads. The quotes are about the increase in output and the improvement in quality, so it's all been positive, but it's increasing the workload quite a bit. These numbers were all great, so high that three-quarters of people are fully on board, which surprised me because in enterprise data we tend to have a very long tail of people who take a long time to get on board with changes. We're still working at Snowflake on a large number [00:23:00] of cloud migrations; people's data estates still live on premises and haven't moved to the cloud yet. There are good reasons for it, but there's still a lot of work to do there. But the one that surprised me on the other side, where the number was lower than I thought, was CIOs in particular, as opposed to chief data officers or chief AI officers.
Only 55% of CIOs said they viewed data engineers as integral, which I thought was a very interesting discrepancy between the overall survey and CIOs in particular. So I think there's work to be done there, and I wonder if it's that CIOs' traditional job is to provide a platform and a set of applications for the company, while the folks actually seeing the value in data engineering right now are the ones driving the business. You're seeing it with chief data officers and chief AI officers, of course, but I think the actual business owners are the ones arguing for how important and how critical the data engineers are. So that was one of the really interesting insights in the survey for me.

Duncan: I want to double-click a little bit on the [00:24:00] discussion of ROI from a couple of minutes ago. I love ROI calculations; I'm an economist by training, so I can get really geeky about this kind of stuff. You spoke a little bit about how for every agent pipeline you need to consider both the costs and the benefits of that pipeline. And in an AI-agent world, it's even harder, because AI agents are potentially running amok and creating value and creating costs in new ways. How should data teams think about where AI agents are genuinely adding value versus not?

Chris: Your point is exactly right: it comes back to that original ROI calculation. And again, in a lot of these platforms, Snowflake included, it's relatively easy to figure out how much a particular pipeline is costing you. This is one thing people need to not lose as we move into a more agentic world: you still need tracking, budgets, and expectations around what agents are spending, and ideally you're doing that both at a per-agent level and at a per-pipeline, per-outcome level: how much am I [00:25:00] spending on this? Your cost is also going to shift. Today your cost is mainly in compute; going forward, it will actually be in the work the agents are doing, and in your LLM costs and other things, which can be very high. But that's only half of the equation. In a lot of cases, we even have discussions with CFOs where they look and say, hey, my company is spending this much on Snowflake, let's make that lower. And the response is: okay, do you know what value you're getting from it? In a lot of cases, the CFOs haven't looked at it that way; they're just focused on a large line item they're having to pay for. So what you have to start with is really: what are the business decisions you're making? What is the outcome you're driving? And it's very difficult, I would argue, to put a direct dollar amount on that. Instead, you have to look at: are the teams moving faster? Are we making decisions more quickly? Are we making better decisions than before? Are we generating insights that were too difficult to get to before, like how much revenue we're generating based on this particular contract term? Those are the types of questions you just didn't ask [00:26:00] before, because it was too expensive to even ask. And so you have to start to figure out how valuable that is to the business.
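As a back-of-envelope illustration of the asymmetry Chris is describing (cost is easy to total, value has to be proxied by downstream use), a per-pipeline ledger might look something like the following. All fields and numbers are invented for illustration.

```python
# Toy per-pipeline ledger: total the easy-to-measure costs, then proxy
# value with downstream consumers and decisions tied to the pipeline.
from dataclasses import dataclass

@dataclass
class Pipeline:
    name: str
    monthly_cost_usd: float    # compute + storage + LLM/agent spend
    downstream_consumers: int  # dashboards, models, agents reading it
    decisions_tied_to_it: int  # known business decisions it feeds

def review(pipelines: list[Pipeline]) -> None:
    for p in pipelines:
        if p.downstream_consumers == 0 and p.decisions_tied_to_it == 0:
            verdict = "candidate to shut off: nobody would scream"
        else:
            verdict = "keep, but ask the consumers if they'd pay for it"
        print(f"{p.name}: ${p.monthly_cost_usd:,.0f}/mo -> {verdict}")

review([
    Pipeline("contract_terms_enrichment", 4_200, 3, 2),
    Pipeline("legacy_campaign_rollup", 1_900, 0, 0),
])
```

This is deliberately crude; the point, as Chris goes on to say, is that the verdict column ends up being a judgment call rather than a dashboard metric.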
Chris: And I would say, we've tried a bunch of times: there's not a simple way to hand every one of our customers a dashboard that answers this question. But I think with AI, the underlying exercise you have to do stays exactly the same. It's still: what's the business value? What would we do if we didn't have this pipeline in place? That starts with understanding who's using it, what they're using it for, and what decisions are tied to it. And then I think you'll ultimately end up less with assigning a dollar amount and more with looking at the cost of a given pipeline, looking at all the use cases downstream, and basically asking: if we shut this off, would these teams scream loudly enough that it's worth the cost we're paying? Or even going to each of the teams consuming the data or making the insights and asking, hey, if you had to pay for this directly, would it be worth it to you? In most cases, I think what you'll find is [00:27:00] a pretty resounding yes. Even with AI, that will continue to be the case, but you need to make sure you're still asking the question. Even now, in a non-AI world, we have a lot of customers with pipelines they've been running for a long time that no one is really using in a valuable way, and they could pretty easily shut them down. We try to point that out and help them do that, but there's definitely a bit of inertia around "well, maybe someone will need it." In the AI world, as things get more expensive, at least in the short term, you'll want to make sure you're running the pipelines and asking the questions you're getting value from, not just because you've been doing it that way for a while.

Duncan: So the data engineering role is really evolving, then, to be both more architecturally focused and also, as you highlighted, more strategic: more opportunity sizing, more PM kinds of skills, if you will.

Chris: Yeah, and that's where I think, for a lot of data engineers, this will be a very exciting transition. It's going to go from worrying about the plumbing to worrying about the outcomes you're driving and the value you're creating for the business. I think it will be a [00:28:00] challenging transition for some people, but data engineering will end up on the other side looking like a very strategic, critically important partner to the business, which I think will be very exciting.

Hugo: I really like that we've started looking into the cost-benefit analysis and identified a bunch of important variables. I do want to throw a spanner into the equation, forgive the mixed metaphor, but I'd like to know how we should start thinking about the risk we take on, particularly as agents are generating more and more code, and then perhaps even the second-order uncertainty around this risk.

Chris: Right, yeah. This is a fantastic question, and I actually think this is another case where the challenges we'll face with AI are not completely new; they're slightly different versions of challenges we've been facing for a while. You have this challenge today around any set of data you bring in: who can use it, for what reason can they use it, and how do you trust the outputs? Even when you bring a new data analyst or a new data engineer in, you have to get to a point of how you [00:29:00] trust the results they're providing you.
How do you decide who can access which insights and which pieces of data? My take is that the way to solve these problems is actually a set of tools we've had for a while: having really good governance on top of your data. And governance, to me, is a variety of different things. It's the specific row-level security rules around who can access which tables, which columns, and which rows. But it's also the semantic information: again, what does a customer actually mean, how do you generate that query, and how do you make sure you're pulling it the right way? It also means data quality: how do I guarantee there aren't nulls where there shouldn't be, or that these dates make sense? And then there's another set, and I think these will be the ones that encode the common-sense checks you do on any data pipeline or output. I'll use Snowflake as an example: if someone comes back and tells me someone's been a customer since 1927, I know something's broken somewhere, because we haven't existed since 1927. So [00:30:00] there are things like that you'll have to build in as checks. Initially those will be people looking at the outputs, but you'll start to encode them: hey, here's a set of expectations we have about the data. And as long as the agents are operating within those guardrails, I think you can feel reasonably safe in the outputs and in the access they're exercising. One of the things we've seen a lot with customers, and this came through in the report as well, is that it's actually very easy today to spin up an LLM, hand it some static data, ask it some questions, and say, oh, I've got a new tool, let's roll this out to everyone. And pretty quickly you realize: one, that data's not somewhere I can get it in real time, so now I have to go build a bunch of pipelines to stand it up. Two, how do I think about the governance around that data? In the case of the contracts example from before, you probably don't want every employee in your company able to access every single contract you've signed with every one of your customers. There may be rules you need to place around that, or certain [00:31:00] salespeople should only see contracts for their particular region. So how do you actually enforce that? And how do you make sure the AI agents are enforcing it and using it the right way? As you've probably seen, trying to convince an LLM in your prompt not to give certain answers to certain people is a massive challenge. You need to do it lower down, in the actual data layer. So I think what it comes down to is not that we need brand-new tools to build trust around agents; we need to take the tools we already have a couple of steps further and make sure we're actually using and enforcing them. We've seen this with the customers we have already. We refer to it as a solid data foundation: the customers that have actually done that work, to get the data in one place and to have the right rules and governance in place, are the ones able to deploy AI into production significantly more quickly, because they're not having to redo that work. They've done it already.
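Here is a minimal sketch of the "enforce it in the data layer, not the prompt" idea Chris just described: agent-generated SQL always runs in a session bound to the requesting user's role, so row-level security does the redaction, and results pass an encoded common-sense check of the "customer since 1927" kind before anyone sees them. The connect_as helper and the row shape are hypothetical stand-ins, not a specific vendor API.

```python
# Sketch: the agent never gets credentials of its own. Rows the user's
# role cannot see simply never leave the database, so there is no prompt
# to jailbreak. All names here are illustrative.
from datetime import date

COMPANY_FOUNDED = 2012  # a customer_since before this is impossible

def connect_as(user_role: str):
    raise NotImplementedError("open a database session bound to this role")

def sanity_check(row: dict) -> dict:
    # Encoded common-sense expectation, checked on every result row.
    since = row.get("customer_since")
    if since is not None and not (COMPANY_FOUNDED <= since.year <= date.today().year):
        raise ValueError(f"impossible customer_since value: {since}")
    return row

def run_agent_query(sql: str, user_role: str) -> list[dict]:
    with connect_as(user_role) as session:   # enforcement lives here,
        rows = session.execute(sql)          # not in the LLM prompt
    return [sanity_check(r) for r in rows]
```

The design choice is the one Chris emphasizes: the model can be as persuadable as it likes, because the permissions and expectations are enforced below it.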
Duncan: How do you think about the speed-quality trade-off there? [00:32:00] Getting the revenue numbers a hundred percent perfect across every region, every geo, every time period might require a lot of heavyweight modeling to nail. And often part of the premise of AI is that it isn't perfect, but it's wicked fast and gets you most of the way there. Is that enough in data applications, or is it not? There's an interesting kind of paradox there.

Chris: Like many of these things, I think it'll depend on the particular use case. There's a quote, I forget exactly who it's from, about how you get an army to move fast, and the answer is: by moving slowly. I think there's a little bit of that here, where what we're seeing is that the companies that have invested up front to put the foundational work in place are the ones who can move quickly. And it doesn't have to mean getting every single query exactly right. In fact, what I often recommend to our customers is to start narrow: pick one problem you want to answer, and get the data foundation in place for that one question. I generally argue, don't do it by data set; do it by the question you want answered. Let's say you do want a [00:33:00] revenue forecasting dashboard. Work backwards through what data you need for that and what governance rules you need, and get that in place. That groundwork does take some time, which is why you want to start narrow. Saying "we're going to build a perfect data foundation for our entire company" is generally a multi-year project and a huge challenge. So start narrow; but then, on top of that foundation, that's where AI allows you to go incredibly quickly. Having the foundation means you can not just ask the questions and build the first version of the tool, but actually put these things into production shockingly quickly. I'll give one example of this from ourselves. We built an AI agent internally that we use for our go-to-market teams. It's designed for questions like: hey, give me all the customers who are using Snowflake for this particular use case. Or: hey, I'm talking to this customer next week, let me know what I should know about them. I use it before almost all of my customer meetings, and most of our sales team is now using it. We were able to build it incredibly quickly [00:34:00] because we had a lot of the data governance and other pieces in place. We have some policies because we're a publicly traded company: if people can predict what our revenue is going to be, they have to be on the insider-trading list, for a bunch of reasons. So there are rules about what data different people can have. I have access to all of it, but I also have a bunch of extra rules around when I can trade stocks and other things, and we don't want to make everyone follow those. Since we had those rules enforced in Snowflake itself, and every employee at Snowflake has a Snowflake login and can run queries and look at data themselves, it was very easy to say: okay, the agent has to follow the same rules. We didn't have to define anything new. And so if an individual sales rep goes in and asks questions about revenue specifics,
they can get very detailed answers about the accounts they cover. But if they ask about other accounts, it will just say, hey, you don't have access to that data. Not because the model is trying to enforce it, but because it ran a query and got [00:35:00] redacted results. Similarly, when they ask, give me examples of other customers doing things like this, they do have access to that underlying data, and so it's able to answer: here are other customers, and here's the sales rep you might want to go talk to. In a lot of cases we're also inheriting some Salesforce-specific access controls that we're passing through. That meant we were able to build and roll this out without having to build governance specifically for it. The governance was already in place, and so again we were able to move fast because we'd done some of the foundational work.

Duncan: I love that. Let's shift gears a little bit and talk about how organizations are changing in shape in an AI-first world. We've talked to lots of data teams that are changing and are often actually a little unclear on where the future is, especially in data science and higher up the application stack. So I'm curious, in your view, what are the most effective data teams doing to organize themselves, to [00:36:00] adopt AI, to influence AI, and to take advantage of these technologies?

Chris: Yeah, that's a fantastic question. Like many things AI-related, we're seeing it evolve quickly and rapidly, and I think that's actually the core answer: the teams taking best advantage of this are organizing themselves for rapid change. It's less about there being a particular way everyone needs to organize; if I gave you what I think that is right now, it would probably be out of date and wrong six months from now. Instead, the best teams are organizing to be nimble and flexible, and that's partly about how they're adopting AI. We've already seen this with Microsoft: they put all their chips in the OpenAI basket early on, and they're now doing a ton of work with Anthropic on the coding side, because it turned out the world changes quickly, and what those two companies focus on has diverged. For certain use cases Anthropic is better, and for other use cases OpenAI is better. I think the same thing is happening with data [00:37:00] teams: they're building foundations that allow them to be flexible. They're removing focus from a lot of the infrastructure and other things they can hand off to companies like Snowflake or others, so they can focus on the higher-level pieces that are more interesting. And they're still designing their organizational structures a bit around: teams focused on getting the data and bringing the right sources in, teams focused on actually building the pipelines, teams focused on managing the infrastructure and the costs, and then you still have analysts and data scientists. I think those lines are blurring, though, and the teams best set up for success are the ones anticipating that and setting up their organizational structures in a way that lets them adapt quickly.
I'm curious, is that what you guys are seeing and hearing as well? I think you get a different view and perspective on this than I do.

Duncan: We're seeing that the roles are shifting quickly, and also seeing the [00:38:00] rise of these super-ICs, where the more experienced folks, especially the ones who know what good looks like, can just get a vast amount more done. I think the tectonic plates of the org changes are still both slow-moving and like quicksand at the same time. So I like your answer that moving fast is the answer, but I think it's really unclear what the final roles actually are in this future world, and what that means for the existing functions.

Hugo: And something we discussed recently, Duncan, and this is more for product builders in a lot of ways, but I think it's relevant here: now that we can build a lot more prototypes a lot more quickly, we're seeing a really important skill set emerging. It's not quite a new skill set, but it's the ability to spin down projects and decide when to stop them. We can prototype a hundred things, but knowing which 98 or 99 to spin down quickly, and which to double down on, might be a far more important skill of the future.

Chris: That's super [00:39:00] interesting; I would agree with that. This comes back a little to what we were talking about before: knowing which pipelines and which projects are actually generating an ROI, and which aren't, is going to be a super important skill. Because in the past, building a new pipeline was expensive. You have a limited number of data engineers, and building a new pipeline takes a bunch of time and energy, so you were very thoughtful about which ones to do. If you're resource-constrained, that generally means most of the projects you do will have an ROI, because you do the work up front. In an AI world, it's a lot easier to say, hey, I wonder if... let's go find out. And that's an incredibly powerful thing to be able to do. But it also means you then have to say: okay, no, that was a bad hypothesis, let's wind it down; that was an experiment that didn't pan out. You have to be willing to kill those things off, ideally pretty quickly, and knowing when to kill something versus when to iterate on it is going to be an incredibly important skill. I'd agree with that completely.

Duncan: And having the kind of confidence and ability to shred your own [00:40:00] ego around those kinds of things. We had a good chat with Roberto Mere, who was a VP at Meta, on exactly that topic: how in this new world, it was always the case that you had to be willing to abandon your losers, and now you just have to move vastly faster. Even the project you thought you'd get promoted on next quarter may not work, and maybe you just need to move on. That's the way of the future.

Hugo: And Roberto actually gave the example of when he launched Reels. He and his team launched Reels on Instagram, right after TikTok, and the numbers weren't looking good after three months, and everyone was asking what was happening. And he said, no, we doubled down, because it was hypothesis-driven. It took longer, but we had this hypothesis, and it ended up paying off. So knowing when to double down matters too, and look at it now.

Chris: Yeah, it's interesting.
This has been, I think, the mark of great data people and product managers for a long time: separating the person from the hypothesis and being able to say, hey, we have a hypothesis. There's a stat someone shared with me at one point: the best product managers assume they're [00:41:00] wrong about 80% of the time. And I've found that to be true. It gives you a very different perspective. If you assume you're going to be wrong about 80% of the things you argue for, you have a much more humble and different perspective on it. And really, the trick is not to go from "I'm wrong 80% of the time" to "I'm only wrong 50% of the time." The trick is to figure out, as quickly as possible, which 80% you're wrong about, move on, and discover the 20% you're actually right about, and, as in the case of Reels, double down on those and really pour your energy in. I think in data engineering we'll see a lot more of that going forward as well.

Hugo: And I don't want to be one of those guys who says, hey, every challenge we create using technology, let's try to solve using technology. Having said that, I do think there's an interesting future in which we use agents, which we already see writing tests and that type of thing, to filter down the available options, and even perhaps to optimize or fine-tune different agent assistants [00:42:00] to do so. This is actually something I use LLMs for: ideation and content generation, generating a lot of different ideas. I also use them to filter ideas down, and of course I'm a human in the loop there. Because they're so good at generating a hundred ideas, more than we have the bandwidth to jump in and deal with, using them, with a human in the loop, to filter down can be incredibly fruitful.

Chris: Oh, this is one of the things we've seen in LLM research, and a lot of the latest systems use this, where behind the scenes there are two LLMs: one generates the ideas, and the other gives an opinion on whether those ideas are any good. You actually get significantly better results out of that. You've seen the version of this where you ask a question, it tells you something, you say, no, that's wrong, and it immediately corrects itself. There are a bunch of cases where the LLMs are just doing that behind the scenes automatically, and you end up with significantly better results. Not as good, I imagine, as when you're in the loop, but they can automate a decent chunk of it. So I think we will see that. Again, in data engineering [00:43:00] and data in general, there are multiple layers of work that have to happen, and we'll start by automating things at the lower levels, which will elevate where people go. But the higher-level, more strategic, more challenging questions will, I think, be the hardest to automate with LLMs, and may be impossible to automate with LLMs. That's why, for people thinking about careers and where to go, making sure you're strong at those types of skill sets is going to be incredibly important, because they're going to become increasingly valuable.
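Here is a minimal sketch of the generate-then-critique pattern Chris just described: one model proposes ideas, a second pass scores them, and only the top-rated ideas survive for human review. The ask_llm stub is a hypothetical stand-in for any LLM client; the prompts and thresholds are invented for illustration.

```python
# Sketch: generator + critic. One call proposes ideas, a second call
# rates each one, and the shortlist goes to a human in the loop.

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("swap in a real LLM call")

def generate_ideas(task: str, n: int = 20) -> list[str]:
    text = ask_llm(f"Propose {n} distinct ideas for: {task}. One per line.")
    return [line.strip("- ").strip() for line in text.splitlines() if line.strip()]

def score_idea(task: str, idea: str) -> float:
    text = ask_llm(
        f"Task: {task}\nIdea: {idea}\n"
        "Rate this idea's usefulness from 0 to 10. Reply with the number only."
    )
    try:
        return float(text.strip())
    except ValueError:
        return 0.0  # unparseable critiques count as rejections

def shortlist(task: str, keep: int = 3) -> list[str]:
    ideas = generate_ideas(task)
    ranked = sorted(ideas, key=lambda i: score_idea(task, i), reverse=True)
    return ranked[:keep]  # the human reviews only these survivors
```

Using separate calls for generation and critique mirrors what Chris describes happening "behind the scenes" in newer systems, while keeping the final filtering decision with a person.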
Duncan: How should data engineers think about developing those skills? Those are traditionally not [00:44:00] the skills you would have learned in school or in early data engineering jobs. So, on the career-advice line of thinking, how should they actually be educating themselves or acquiring these new ways of working?

Chris: I think the simplest way is to ask a lot of questions, and to ask why. Not in a defensive way, not "why should I do that?", but when someone comes and says, hey, we need a pipeline that does X, Y, and Z, asking: okay, what's the business goal? What are we trying to do at the end? Initially it's just about getting that context and understanding: why is this an important pipeline to build? Why is this an important problem to solve? Why is this even the question the analyst is working on? Why is it an important question for them to work on? I think just getting in the habit of asking about that, and trying to understand it, is a really good place to start. Then, as you start to understand more about what they're trying to do, you can start recommending better ways, in a lot of cases, to solve the problem. And actually, the more I talk about this, the more I realize, Duncan, that your earlier point about this making you more like a product manager is exactly right. One of the key skills of a product manager is that a customer will come and say, I need this feature. A poor product manager builds the feature. A great product manager asks why, and repeatedly asks why. There's a Toyota technique called the five whys; it usually takes about [00:45:00] five whys to get to the actual underlying business problem. Then you step back and ask: okay, what is actually the best way to solve that problem? In a lot of cases, it's pretty different from what the customer was originally asking for. And when you come back and show them, hey, here's what we did to solve that problem you had, they go: oh yeah, this is way better, I had no idea this was even possible. So for data engineers: ask those whys, get to the underlying business problem, and figure out whether there's a better way to solve it. That builds the skill set and the capability, so that later, instead of someone coming to you saying "I need a pipeline that does X," they say, "hey, I have this problem and I don't know how to solve it." And now you're a strategic partner. So that would be my advice: just ask why, a lot.

Hugo: I love it. So, it's going to be time to wrap up in a minute, sadly, because I feel like we could chat for hours about this type of stuff. I'm interested in stepping back a bit: if you were advising a technical executive, a CTO or a CIO, on making their organization AI-ready, for some definition of AI-ready, [00:46:00] where should they start, and what should they stop doing first?

Chris: Yeah, it's a great question. In terms of how to become AI-ready, there are two things I think are most important. One is that data foundation we were talking about before: invest in your data. This can be as simple as having your data somewhere AI is going to be able to access it. In a lot of cases, a company's data still sits in
15 different databases, some of which are in data centers somewhere, some in a closet somewhere, some in the cloud, but maybe on multiple different clouds and platforms. Even just saying, hey, I want to bring these different data sets together, is challenging. So one is to start solving that problem, and two is to do it in a way where you're putting the right governance, data quality expectations, and semantic models on top of it, so that you can easily say yes to ideas on how to use it. That's the first thing to get in place. The second [00:47:00] big piece, and I've seen this in a bunch of companies, is that a lot of companies are still scared of AI and of LLMs. Part of that, I think, comes from not having the foundation and the governance in place, and being worried about what might happen. But what you want is everyone in your company experimenting with this stuff and trying different things out. We're still in the phase where no one knows exactly what problems it's going to be able to solve for your particular company, so you want to encourage everyone to try these things and use them. We did an interesting thing inside Snowflake: we gave Cursor access to all of our sales engineers, our solutions architects, all of our technical sellers, and there are a lot of them. We found that some people were using it and some were not, so we started tracking it and saying, okay, everyone, we want you to at least use this once a week and try it out. The result was suddenly an explosion of really creative, interesting demos, really interesting, different [00:48:00] ways to showcase what was possible with Snowflake. We even found a lot of the solutions architects were learning about new capabilities of Snowflake, because they were trying out different things and Cursor was suggesting parts of Snowflake they didn't know about. But it started not with "use Cursor to do this very particular thing." It was: hey everyone, just try this, share what you're learning, and let's go experiment. So we started doing this in a lot of different parts of the company: hey, there are tools that can help, let's go use them. We went through a similar thing where, initially, our security team, rightly so, locked down access to a bunch of different tools. What we've ended up doing, and this will be a temporary state, is subscribing to a lot of different tools, getting enterprise licenses, and basically letting anyone who wants to use them do so while we figure out where they're most effective and where they create the most change. We've even got a session tomorrow at Snowflake where a bunch of the PMs are doing a show-and-tell of how they're using AI and how it's making them more efficient, just to let people see. So that's the second big thing: [00:49:00] let people have access to these things in a controlled, sandboxed way, but let them be creative and go a little crazy.

Hugo: I love it. So: data foundations and unification, and then a culture of experimentation and play. Fantastic, exactly right. Congrats once again on the report, Chris, and thank you for all your work and time, and for sharing it with us and our audience.
We appreciate you, man.

Chris: Absolutely. It's an incredibly fun time to be in the data space in particular right now. Both the explosion in data and the ways AI is changing everything, plus just how much more powerful the tools are than they were ten years ago, make it really fun to see how things are shifting and how much more impact data engineers and others are able to have. It's really fun. So thanks for having me; it's been a lot of fun to talk about this.

Hugo: Such a pleasure. And just one more thing: I just love that you refer to it as the data space, because it is still the data space. I know it's the AI space as well, and we focus on the models a lot, but it's really wonderful to bring the conversation back to the foundations: the data.

Chris: Yep, absolutely.

Hugo: Thanks so much for listening to High Signal, brought to you by Delphina. If [00:50:00] you enjoyed this episode, don't forget to sign up for our newsletter, follow us on YouTube, and share the podcast with your friends and colleagues. Like and subscribe on YouTube, and give us five stars and a review on iTunes and Spotify. This will help us bring you more of the conversations you love. All the links are in the show notes. We'll catch you next time.