The following is a rough transcript which has not been revised by High Signal or the guest. Please check with us before using any quotations from this transcript. Thank you.

===

Vishnu: [00:00:00] So I think there are two pieces here, right? One piece is: can I have a small team which can ship a workable, useful feature out there in front of millions of users? Second is: after I've shipped it, can I still keep it operational with a small team? So I would say it's much easier today to do the first. The second is much harder, because the processes and tooling and the mindset necessary for small teams to be able to operate something which is not deterministic, with a lot of probabilistic pieces going in there, that's all still evolving. We are still figuring out how to do...

Hugo: That was Vishnu Ram Karaman outlining a central challenge in the generative AI era: moving from the thrill of a successful prototype to the reality of operationalizing non-deterministic systems at scale. Vishnu is a generative AI [00:01:00] executive and entrepreneur reimagining observability for finance leaders. As a former AI leader at Credit Karma and Intuit, Vishnu scaled AI solutions that directly impacted over 140 million users across the financial system. In this episode, Duncan Gilchrist and I speak with Vishnu about his decade spent building and scaling robust machine learning systems at Credit Karma and beyond, and how those hard-earned principles are shifting today. We dive deep into the strategic implications of this shift, discussing why the shelf value of code is dramatically falling, how to design a new organizational triad built for speed and iteration, and the critical differences between testing traditional ML and testing generative-AI-powered software. It's a crucial conversation about engineering discipline, team structure, and how to successfully manage risk and architecture when platform stability is no longer guaranteed.

High Signal is brought to you by Delphina, [00:02:00] the AI agent for data science and analytics. If you enjoy these conversations, please leave us a review, give us five stars, subscribe to the newsletter, and share it with your friends. Links are in the show notes. Let's jump in.

Hey there, Vishnu, and welcome to the show.

Vishnu: Thanks a lot, Hugo and Duncan. Glad to be here.

Hugo: Such a pleasure to have you here, and I'm so excited to jump in and talk about where this space is moving. You've spent over a decade building machine learning systems at scale, and I'm so impressed and inspired by the way you've robustly thought about tying engineering to data, to machine learning, to organization, and then to business value, and finding stable patterns in that complex system. So I'd love to jump in and hear how, in your opinion, the shift from traditional ML to generative AI has changed the way technical work actually gets done today.

Vishnu: Great question, Hugo. I think [00:03:00] maybe we have to start with what is changing before we figure out the how here. I can talk intelligently about what is changing; I am pretty much a novice like everyone else in terms of how the change is happening. I think what's changing is: in traditional ML, you need to collect a lot of data, across tens of thousands of user interactions if not more, before you're able to build a model, and then that model is giving you some predictions that you then act on. At Credit Karma scale,
we were doing recommendations of financial products and recommendations of other content for millions of users on a daily basis. But to be able to build a model at that scale, you needed to wait maybe a month, maybe a couple of months, before you were even able to build a model. With GenAI, the amount of data that you need to actually make something useful show up in front of the user is really small. It could just be [00:04:00] in-context, direct from the user: the user is giving you some information, and then you're able to immediately use that to provide value to the user. Which is very different from how you were doing traditional ML. In traditional ML, you're trying to collect tens of thousands of user interactions and data points, and then you're trying to see, hey, how best can I learn about Hugo's Netflix movie recommendations based on Duncan's or Vishnu's movie recommendations? In this case, I don't need to use your data or Duncan's data to give me recommendations; I'm able to collect information directly from me and use that to provide recommendations directly to me, which is a big shift. What that means is the amount of data you need goes down dramatically. The time to market, in terms of how you are able to ship products which are learning, again, that just shrinks dramatically. You can learn very quickly. [00:05:00] And then as a result of that, there is a wide range of problems opening up. I'm not sure how many modeling problems I shot down when I was at Credit Karma. We had a lot of success, and what that means is every business unit will come and say, hey, can I model this? Can I build a model here? Can I build a model there? Can I use it to predict X or Y or Z? And invariably I was constantly shutting it down, because you don't have enough data, and even if you have enough data, there is not enough ROI to fund the data scientist or the machine learning engineer to keep the data together, to keep the model together. All those problems are open now. The range of problems that you can go after: you just have to think about it, and then you can go after it. The flip side is that you then need a lot more creativity in how you design these products to acquire data in context and use that data to power the [00:06:00] experience for the users. So those, I would say, are the big pieces in terms of what is changing. And then the how part of it is: how do teams react quickly? How do they understand what they can deliver to the users? That part, I would say, everyone's learning: how can I react quickly, how can I learn quickly from the user and what they need, and how can I learn what this technology can help me power? Those are all changing today. You are no longer limited by the old challenges that you had with traditional ML in unlocking these problems.

Duncan: It's so exciting, as a data practitioner, to realize that so many of the problems that really needed a large team to tackle before are now approachable by smaller teams. I'm wondering, though, if we can get a little more concrete: could you talk a little more about specific examples of problems that can now be tackled by small teams?
Vishnu: So I think there are two pieces here, right? One [00:07:00] piece is: can I have a small team which can ship a workable, useful feature out there in front of millions of users? Second is: after I've shipped it, can I still keep it operational with a small team? So I would say it's much easier today to do the first. The second is much harder, because the processes and tooling and the mindset necessary for small teams to be able to operate something which is not deterministic, with a lot of probabilistic pieces going in there, that's all still evolving. We are still figuring out how to do evals. There are a lot of eval tools out there, but they are all very general purpose and they don't really fit the use case. I was talking to one of my friends yesterday and we were going back and forth on, hey, which eval tools do you care about, and I said, I need to build my own for my use cases.

Going back to the specific problem that you were talking [00:08:00] about, Duncan: one of the things that my startup is focusing on is helping CFOs deal with their problems. One of the problems that we started out with was expense categorization. They had expenses coming in from multiple different sources, and they needed us to come in and help them categorize those expenses. Interestingly, at one point, when Mint users moved to Credit Karma and the data science team was reporting to me, I had a few data scientists on that team who were categorizing Mint customers' transactions. So I had some knowledge about the challenges involved in categorizing transactions at that point in time. My immediate thought process, as soon as I heard the problem, was: this has got to be super hard. I remember all the scary days from categorizing transactions at Mint, where the users would just be able to go and tweak the rules that you then have to remember how to apply to [00:09:00] do that expense categorization. This $120 check that is going to Pedro is for gardening: is that a gardening expense? I just see $120 and Pedro; how do I automatically put it under gardening? How do I do that? So all those challenges came to my head, but then I realized: let me just do a quick experiment. Can I use Claude Sonnet or Claude Haiku to do this job? And if I'm doing it with Claude Haiku or Claude Sonnet, this is just me, right? I don't yet have a team to support me on this project. So I'm just saying, let me just do this experiment. I have transactions coming in from one of the leading credit card providers for businesses. I said, okay, they have a great API; I can use their API to pull in the transactions and understand what's going on here. Can Haiku do the job? Can I use DSPy and MLflow to run my experiments and get this going? And I was very surprised to see that I didn't have to relive some of the [00:10:00] challenges that we had to face with much larger teams when we were doing that transaction categorization at Credit Karma. But at the same time, the scale is very different, right? There, it's a hundred million plus users doing all kinds of transactions across all kinds of businesses across the country, versus a business, still a public company, but operating with a smaller set of vendors that you have a grasp on: hey, what are they using this money for, who's spending the money, which department are they part of, and all of that. So it became a matter of pulling together the right context and allowing the model to go do its job.
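(For readers who want to try the kind of experiment Vishnu describes, here is a minimal sketch of in-context expense categorization with a small model. It assumes the Anthropic Python SDK; the model name, category list, and transaction fields are illustrative, not what his team actually used.)

    # Minimal sketch: categorize business expenses with a small model, no training data needed.
    # Assumes `pip install anthropic` and ANTHROPIC_API_KEY set in the environment.
    import anthropic

    CATEGORIES = ["Software", "Travel", "Meals", "Office Supplies", "Gardening", "Other"]

    # Hypothetical transactions, as if already pulled from the card provider's API.
    transactions = [
        {"id": "t1", "merchant": "Pedro Landscaping", "amount": 120.00, "memo": "check"},
        {"id": "t2", "merchant": "AWS", "amount": 3400.17, "memo": "monthly cloud bill"},
    ]

    client = anthropic.Anthropic()

    def categorize(txn):
        prompt = (
            "Categorize this business expense into exactly one of: "
            + ", ".join(CATEGORIES) + ".\n"
            + f"Merchant: {txn['merchant']}\nAmount: ${txn['amount']:.2f}\nMemo: {txn['memo']}\n"
            + "Reply with the category name only."
        )
        msg = client.messages.create(
            model="claude-3-5-haiku-latest",  # illustrative model name
            max_tokens=10,
            messages=[{"role": "user", "content": prompt}],
        )
        return msg.content[0].text.strip()

    for txn in transactions:
        print(txn["id"], categorize(txn))

(Each run of an experiment like this can then be logged, for example in MLflow as Vishnu mentions, so prompt and model variants can be compared against a small labeled set.)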
Hugo: Fantastic. Look, we've been speaking for less than ten minutes and there are so many interesting moving parts in here. The first is that I think people who haven't done a lot of work with generative AI, in their minds and in the culture, over-index on the generative aspects of AI and not on the in-context learning, the way you can do classic ML and more with these systems. So anyone out there, really go in and [00:11:00] drop in some examples and see how you can do in-context learning there. The other thing you mentioned, with respect to tooling, is so interesting to me, because you're right: we don't yet have tools which solve all our needs. I think it's worth saying a bit more about this, and it makes sense, because when we had Fei-Fei Li on the podcast, she described generative AI as a civilizational technology, and I think it's so important to recognize how horizontal it actually is. Currently, in parallel with new foundation models coming out, we're all building on the application layer, and then we have people building tools in parallel as well. It absolutely makes sense that these tool builders won't be able to provide everything everyone needs, right? So a pattern I see a lot is people adopting a tool and then custom-building things on top of it, and actually AI-assisted coding for hand-rolling your own custom data viewers and that type of thing is one of the [00:12:00] biggest wins there, I find. As well, the other thing you mentioned is how you could just jump in and do exploratory data analysis and notice patterns. I think AI-assisted coding excels so much at exploratory data analysis. I'm not going to ask Claude to find the median of a dataset for me, clearly, unless I'm going to get it to write some Python code, because it will likely not do that correctly. But if I throw in some customer data and say, hey, what do you see in this?, it will almost always do some clustering and notice patterns that I may not have noticed myself.

Vishnu: Yeah, although first, I won't throw customer data into Claude Code unless I connect it to their setup; this is going back to my security training at Credit Karma. The second part to that is, more often than not, you want to find things that you are not going to catch immediately. When you do this kind of exploratory data analysis, you are looking to find a [00:13:00] few different things. There might be some five standard things, and as a practitioner you might do just two or three of them at that point in time, because maybe you're a little lazy, you're strapped for time, whatever. But the opportunity with the model is that it's going to go 1, 2, 3, 4, 5; it's not going to miss any of those five. Do I have coverage? Do I have any data gaps, any date gaps? All of that, and then it gives that to you. And that means that if something is not there and you didn't take the time to go do it, it's going to put it in front of you, and then you have an opportunity to say, oh, I would have missed it. I would have missed the date gaps, because some system changed and the data was not flowing in during those days; I would have completely missed it, and the system caught it immediately. That's, again, the power that you get here.
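(The coverage and date-gap checks Vishnu mentions are a good example of asking the model to write analysis code rather than asking it for the answer directly. A minimal pandas sketch of the date-gap check, with an illustrative daily transaction table:)

    # Minimal sketch: find days with no data in a transaction feed.
    import pandas as pd

    # Illustrative data; in practice this comes from your transaction table.
    df = pd.DataFrame({
        "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-05"]),
        "txn_count": [120, 98, 110],
    })

    full_range = pd.date_range(df["date"].min(), df["date"].max(), freq="D")
    missing_days = full_range.difference(pd.DatetimeIndex(df["date"]))
    print("Days with no data:", list(missing_days.date))  # e.g. a feed outage you might otherwise miss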
And there are a lot of fun incidents that I should be careful talking about, some of them. I was working with two different APIs, and one API was giving [00:14:00] me money numbers in cents, and another API was giving me money numbers in dollars, with decimals for the cents. Think about that: I am pulling in transactions, and if I don't do that data transformation properly, if I don't understand what the API is giving me, then I'm going to miss it. Those are the kinds of things that you have to take care of when you do these kinds of projects.
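(A concrete illustration of the mismatch Vishnu describes: normalize every source to integer cents before anything downstream touches the numbers. The source names and conventions here are hypothetical.)

    # Minimal sketch: normalize amounts from two APIs that disagree on units.
    from decimal import Decimal

    def to_cents(raw_amount, source):
        # Hypothetical conventions: "api_a" returns integer cents,
        # "api_b" returns dollars with decimals.
        if source == "api_a":
            return int(raw_amount)
        if source == "api_b":
            return int((Decimal(str(raw_amount)) * 100).to_integral_value())
        raise ValueError(f"unknown source: {source}")

    print(to_cents(12000, "api_a"))   # 12000 cents, i.e. $120.00
    print(to_cents(120.00, "api_b"))  # $120.00 becomes 12000 cents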
Hugo: Absolutely. And I think something we're circling around now is that there are huge changes afoot, right? Particularly in engineering, though not only in engineering. I'd love your thoughts on how you are seeing engineering as a discipline fundamentally change as more code, insights, even decisions come from AI systems rather than humans.

Vishnu: Yeah. I think it's just the shelf value of code, right? Historically, the value was in the code, especially when you're building platforms or systems. I can talk about some of the systems that my teams built at [00:15:00] Credit Karma in 2015. They still operate the core model inference system, which does more than 8 billion predictions a day; or I might be super dating myself, I think they do 65 billion predictions a day now. We built it in 2015, 2016. So that's the value that we were able to get from that piece of code that we built ten years back. Now, if I look at some of the code that the model is writing, I can't think about shelf value in that context at all. I just want to get utility out of it today, tomorrow, next week, next month, but by the time I come to next month I might have to throw it away, and I'm completely willing to throw it away, because I know that if I anchor myself to that piece of code, it's going to drag me down. It's not going to allow me to do the five other things that my customer wants from me. So if I look at the shelf value of code, and the shelf value of working code, it's not just code, right, it's working code, it's just gone down dramatically. I [00:16:00] can generate it very quickly, I can validate it very quickly, and it's just completely gone down.

Hugo: Fascinating. That's totally my experience as well. I do think that the surface area of what is possible with software is increasing and has increased so much. In fact, we have another episode with Tim O'Reilly called The End of Programming As We Know It, and he talks about this revolution allowing so many more people to build all different types of software. We talk about things such as ephemeral software, like single-purpose software or just-in-time software. Sam Altman even tweeted out at some point, and there are a lot of different takes on this, something like "SaaS is entering a fast fashion era," or something like that. Now, I wouldn't necessarily go that far, but it is absolutely changing. Duncan, I know you had a question next.

Duncan: There's something really interesting there. Actually, one practice I think we're seeing is that not only has the shelf life gone down, and you can build entirely new things very quickly, [00:17:00] but oftentimes it's actually easier to completely rip out the first version and rebuild the second version from scratch than it is to modify the first version. So the fast fashion analogy actually might work there, where you're literally just discarding the thing you had before and moving on.

Vishnu: It still hurts, Duncan, when I do that. I know I want to do it, but it still hurts.

Duncan: Yeah, the biology is not designed for this. I want to talk about evals, Vishnu. I feel like that's the word of the month right now, in my inbox and on Twitter. In particular, how have you thought about evaluating these kinds of systems, how do you think about testing them, how do you think about evaluating them, given that they're non-deterministic in a new and complicated way? It's just different from what we've seen before.

Vishnu: Yeah, first I just want to get a point of view across. The way I think about it is that all outputs from traditional products only seem deterministic, to be honest. It's not like they're [00:18:00] all deterministic; they only seem deterministic. To a large extent, they encode the product manager's and the engineer's and the company's decisions at that point in time, and the world is not deterministic. So I'll start with that. Traditional ML at least gave us the sense that the probabilistic outputs you're getting encode the learnings from the data that you're using to build the model. But with GenAI, the probabilistic output is encoding the learning from you don't know who and what and when. That creates its own set of challenges. The way I really look at testing and evaluation is that it's completely dependent on your users and the usage situations. I was in a place where I was building GenAI use cases for a hundred million, 140 million users, consumers, and I needed to make sure the use case was able to [00:19:00] help a very large variety of people from all kinds of walks of life and with various levels of personal finance expertise. Now I'm focused on building generative AI use cases for a limited set of users who are all experts in their own domain and who are very conscious of what good and bad look like; if they see something that is inaccurate, they're going to catch it immediately. So these are two completely different sides of the spectrum in terms of the use cases I'm operating on. Over there, we spent a huge amount of time on input guardrails and output guardrails. We tested everything that we could outside of the production environment, but once I put something in front of the users, I need to have extremely robust investments in input guardrails and output guardrails: what is going into the model, and what is coming out of the model? I would much rather not show something and tell the user, sorry, I can't help you with this; I can help you with A, B, and C, but I [00:20:00] can't help you with that. Over here, I'm looking at it as: hey, I have an opportunity to show something and help the user unlock some value, and I know that they are going to be very discriminating in deciding what is good to use and what is not good to use.
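(A minimal sketch of the input and output guardrail pattern Vishnu describes. The scope list and checks are hypothetical placeholders; real guardrails are far richer, but the shape is the same: check what goes into the model, check what comes out, and prefer a polite refusal to a wrong answer.)

    # Minimal sketch: wrap a model call with an input guardrail and an output guardrail.
    SUPPORTED_TOPICS = ("expense categorization", "vendor spend", "budget")  # hypothetical scope

    REFUSAL = "Sorry, I can't help with that. I can help with: " + ", ".join(SUPPORTED_TOPICS) + "."

    def input_guardrail(user_request: str) -> bool:
        # Only let clearly in-scope requests reach the model.
        return any(topic in user_request.lower() for topic in SUPPORTED_TOPICS)

    def output_guardrail(answer: str) -> bool:
        # Reject empty answers or answers touching things we never want to show.
        banned = ("guaranteed return", "legal advice")
        return bool(answer.strip()) and not any(b in answer.lower() for b in banned)

    def answer_request(user_request: str, call_model) -> str:
        if not input_guardrail(user_request):
            return REFUSAL
        answer = call_model(user_request)  # call_model is your LLM client, injected here
        return answer if output_guardrail(answer) else REFUSAL

    # Example with a stubbed model call:
    print(answer_request("show me vendor spend for Q3", lambda q: "Top vendors by spend: ..."))
    print(answer_request("give me legal advice", lambda q: "..."))  # refused at the input stage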
So the investments that you're making depend on the use case, and they go a long way wherever you make them. I always thought about it as pre-shipping; then, once you have shipped, what are you doing in the production environment; and third, what are you doing post-ship. And across that spectrum, what you do post-ship is actually more key than anything else. The reason I say that is that it's also something that is not natural for our engineers to do. Our engineers care a lot about shipping. They care a lot about testing [00:21:00] beforehand, and then they put it in front of the users. Once they put it in front of the users, they're used to looking at: is my system working well, do I have errors, do I not have errors? That's what they are used to, whereas now you are going into a different world where the range of inputs and range of outputs is not totally in your control. As the world changes, the inputs can change. If the user asks for something different today, then the inputs have changed, and the outputs will change as the models change and as the inputs into the model change. And the model itself is something that you didn't build; in traditional ML, you built the model, so you know what's coming out of it.
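(One hedged sketch of the post-ship habit Vishnu is pointing at: log every request and response and watch the distributions, not just the error rate. The fields and signals are illustrative; a real system would use a proper store and much richer metrics.)

    # Minimal sketch: log model traffic and compute simple post-ship distribution signals.
    import json, time, statistics

    LOG = []  # in practice: a log store or analytics table, not an in-memory list

    def log_call(prompt: str, response: str, refused: bool):
        LOG.append({
            "ts": time.time(),
            "prompt_len": len(prompt),
            "response_len": len(response),
            "refused": refused,
        })

    def daily_report():
        # Crude signals: input-length drift and refusal rate.
        lens = [r["prompt_len"] for r in LOG]
        refusal_rate = sum(r["refused"] for r in LOG) / max(len(LOG), 1)
        return {
            "calls": len(LOG),
            "median_prompt_len": statistics.median(lens) if lens else 0,
            "refusal_rate": round(refusal_rate, 3),
        }

    log_call("categorize: $120 check to Pedro", "Gardening", refused=False)
    print(json.dumps(daily_report()))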
You had high level of clarity in terms of what is the output that you're putting in front of the user. You don't have that charity anymore, so you need to be able to get that clarity. So the adoption cost is definitely something that teams, which are using generative AI should be investing in. And then the other point that I wanted to bring in here is also the cost of itration. You just need to like put it in. We already talked a little bit about it, but you, [00:24:00] machine learning engineers and data scientists are, they just live and die by experiments and iterations, whereas normal product engineers and product managers, they don't do that. They, for them it's, yeah. I want version one. Version 1.1. Version 1.2. Great. But did version one work or version 1.1 work, and if it didn't work, did you learn from it? And then what did you do? That has to be just like. You can't, you can't offer not to have that. Hugo: Absolutely. Yeah. It's calculus, it's the limit, right? It's con it's really continuous in, in so many ways. And I, I one other moving, but which I think is implicit in what you're saying is because there's, the cost has been lowered so much in building prototypes. You've gotta be careful not to get too excited and try too many things at once because you incur huge costs by doing that. And then nothing works. Vishnu: I would much rather, so my take on that is, uh. Non value additive pieces very early. Then you don't have, you can ship 20, but give me a promise that you wanna kill [00:25:00] half of them by the end of the week. Hugo: Fantastic. I love that. I'm interested, given your extensive experience and in particular perhaps drawing your drawing on your time at Credit Karma, what practices from mature ML organizations still matter in the gen AI era and which ones are breaking down? Vishnu: So. This is gonna be a very boring answer unfortunately, but in my head, the investments that you make in metrics, investments that you make in evaluation data sets, the, you can't get away from that. You can't get away from making investments in your data. And what is the data that you use as part of the context? You need to be, you need to be investing in observability there. If some of the data is changing, you need to be aware. Because if once the data changes, guess what? Your product has changed. So do you want to know if your product has changed? Then please observe your data. The other one that, where I will say is I think it's [00:26:00] still very early and where you still have a lot of value in using traditional ML models, understanding when you need to make a choice between using generative AI versus traditional ML models. Something. I think it's, it's something that you still want to do. So I wouldn't throw away if I, if I have an investment in traditional ml, if I have models that work great, they do a great job in predicting some part of the funnel. Keep using them. You don't need to throw them away and then like completely replace that With Gen ai, I don't think that makes sense because you have much more control over. You can build that con, you can have higher confidence, you can take bigger swings with. Generative AI features if you're, if you're placing your bet on top of some foundational models that really helped you, Duncan: we'd love to ask about the intuit and integration, and that seems like the type of integration that would be extraordinarily complicated with highly sensitive data. 
We'd be curious if you could talk a little bit [00:27:00] about the lessons learned on your side about what it takes to bring those types of data and ML systems together effectively.

Vishnu: Productively, yeah. Those were, I would say, interesting and fun times of my career, not least because this was during the COVID years, the first couple of COVID years, 2020 and 2021. But I think the biggest thing was just honoring the user commitments that each company had made. At that point in time, Credit Karma still operated as an independent company, and each company had made different commitments to their user bases. I used TurboTax a lot, I used Mint a lot, and obviously I was a Credit Karma user, and each of those products had made different commitments to me. You put yourself in the shoes of the users that you are serving; you are there to serve the users. So it was very clear across the [00:28:00] leadership how we needed to honor the commitments that each product had made when users signed those terms and conditions at that point in time. That was something that we wanted to literally put down in code and test, to make sure that we were able to satisfy those commitments before we did anything further. And once we did that, then it is very clear, right? Hey, these are users who have given us permission to use their data across multiple products. These are users who did not give permission, or who didn't know they had to give permission, so we don't have a record of them giving permission. In those situations you are taking the call on behalf of the user: as a user, what would you want? The companies giving you these sensitive products, Credit Karma, Intuit, all the products, they all run on trust, user trust. So how do you want to operate? That was something that we built [00:29:00] very early into the pipelines.

A lot of the fun conversations were: how do I know whether Vishnu from TurboTax, who uses this old Yahoo account, and Vishnu from Credit Karma, who's using this modern Gmail account, are the same person or different people? How do I figure out whether they are the same, and what data points do I use to match and check? And if I don't know, can I ask Vishnu? How do I build the product? Can I do it in the product: can I ask Vishnu, hey, are you this user, are you that user? Those are all conversations and design ideas you have to go through. And then the security policies: the companies were very aligned on how we operated and on our security policies, but there are so many implementation details, right? When you start looking at the implementation details, how do you pick which one you want to go with? Because think about it: [00:30:00] Credit Karma had more than a hundred million users while it was being acquired; it's a very large user base. You can't just adopt one policy over the other, so you had to go through a lot of those conversations to get to a good place, from a user perspective and in terms of how we wanted to operate both companies.
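(A minimal sketch of what putting those commitments "down in code" can look like: gate any cross-product data use on an explicit consent record. The names and fields are hypothetical.)

    # Minimal sketch: gate cross-product data use on recorded user consent.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class ConsentRecord:
        user_id: str
        allows_cross_product_use: bool  # True only if the user explicitly agreed

    def can_share(consent: Optional[ConsentRecord]) -> bool:
        # No record means no permission: decide on behalf of the user conservatively.
        return consent is not None and consent.allows_cross_product_use

    assert can_share(ConsentRecord("u1", True)) is True
    assert can_share(None) is False  # the user never agreed to the new terms, so do not share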
Duncan: Safety and responsibility are an even bigger problem today with generative AI, with models that can write Python code and maybe browse the internet or go post on Reddit. So how do you think about using generative AI safely with personal and financial data?

Vishnu: Yeah. As much as possible, or frankly all the time, you try to containerize the usage in the best fashion you can, depending on the use case. And you have to trust the cloud providers; I trust the cloud providers in [00:31:00] terms of what they offer from a data privacy perspective. There are people who still don't. I was talking to another friend who was telling me about their chief product officer not wanting to do anything with the cloud providers as far as model usage is concerned, but I don't think that takes you anywhere. You have to trust the cloud providers, and then you have to containerize the usage. What you do for development purposes, or evaluation purposes, or experimentation purposes is very different from what you do with your user data and your partners' data. You just have to be very clear, and it's not easy for engineers. You need to make that very clear, you need to set up that policy, and you need to be a hard-ass on this piece, right? You really need to be hard and make sure that: sorry, you can't use this; you're not allowed to do this. If you have that clarity, then you can allow your engineers to do their exploration with a lot of freedom; they just don't get access to the data. And then once you go into production, you lock it down, and [00:32:00] then you get access to the data. Some of it is also making the right investments in synthetic data, and making sure that you have the right amount of synthetic data, so you know what you're building is actually going to work over there. Frankly, for the few use cases we have tried out, synthetic data has been great for us. It needs to do the job of helping you evaluate whether what you built is going to work and be useful, and that's worked for us.

Hugo: I love that so much, and I'm so excited that you mentioned synthetic data, because something I'm really interested in is generating synthetic data once a product is live, for testing and evaluating (and of course you need to monitor drift of synthetic data from actual user data to make sure it's representative), but also generating synthetic data before you launch something, just to check how your product actually works. So I'm wondering, for people who haven't played around with generating synthetic user data to feed into their products, what advice would you give them?

Vishnu: [00:33:00] I'll be honest: pretty much all the AI products that I have built so far in our startup's career, we've started with synthetic data. Without synthetic data, we couldn't even start working on them. The first step was understanding the shape of the data. The benefit that I have is that I'm working with use cases for which the model has a lot of awareness. The model knows that credit scores are between X and Y.
The model knows that nobody's going to spend a billion dollars on a credit card, and things like that, right? Which means that it's going to give me useful synthetic data that I can use. But it completely depends on the kind of use cases you're going for. You need to first get the structure of the data, and then find any documents you can that explain the structure of the data, the type of data, and how the data gets used in different use cases. It doesn't matter which use cases: it could be a PRD that someone wrote two years back about how they're using that product [00:34:00] in some user-facing feature. All of that is helpful for generating synthetic data. And model context windows have grown so much that you are able to put a lot of this context in, the model uses that context, and you're able to get really strong synthetic data out of it. And you don't need a lot to start with. You just need a hundred, a thousand, depending on the use cases you're going for and the amount of edge cases you care about. That gets you started, and once it gets you started, you can figure out a path to real data and understand if there is a different set of rules or edge cases over there that you want to use to augment your synthetic data.
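(A minimal sketch of bootstrapping with synthetic data the way Vishnu describes: give the model the shape of the data plus whatever context you have, and ask for structured rows. It assumes the Anthropic Python SDK; the schema, context, and model name are illustrative, and the output still needs human review before it anchors an eval set.)

    # Minimal sketch: generate synthetic expense records by describing their shape and context.
    # Assumes `pip install anthropic` and ANTHROPIC_API_KEY set in the environment.
    import json
    import anthropic

    SCHEMA = {
        "merchant": "string",
        "amount_usd": "positive float, realistic for a mid-size business",
        "department": "one of Sales, Engineering, G&A",
        "memo": "short free text",
    }
    CONTEXT = "Expenses for a 500-person public software company; include a few tricky edge cases."

    client = anthropic.Anthropic()
    msg = client.messages.create(
        model="claude-3-5-sonnet-latest",  # illustrative model name
        max_tokens=1500,
        messages=[{
            "role": "user",
            "content": "Generate 20 synthetic expense records as a JSON array. "
                       f"Schema: {json.dumps(SCHEMA)}. Context: {CONTEXT}. Return JSON only.",
        }],
    )
    records = json.loads(msg.content[0].text)  # assumes the model complied and returned bare JSON
    print(len(records), "synthetic records; first one:", records[0])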
Hugo: Fantastic, and I'm so glad you mentioned a PRD. The number of teams who are building wonderful things, don't get me wrong, who jump into building prototypes without necessarily having PRDs is still startling. Even writing down some basic user personas and scenarios can very much help with the synthetic data generation process. [00:35:00] We've been talking about a lot of different things. One of them has been building products using generative AI. Another, which has been implicit and which we've talked around, is using generative AI to help you build products, such as AI-assisted coding. And I'm wondering, very speculatively of course, because it's early days, how you think about organizing teams and organizations around AI-assisted coding, given the amount of code that's generated. Things are changing, right?

Vishnu: Yeah, this is something that I'm trying to think through carefully. We're very early in our startup and we have a very small team, but once we start scaling and once we start hiring more people, how do we organize around it? Who do we hire and how do we organize around it? These are things that I'm constantly thinking about. One of my thought experiments is this: our chief product officer at Credit Karma, some seven, eight years [00:36:00] back, he and our CTO had introduced this concept of a triad, where you have a product leader, an engineering leader, and a design leader working together on any new product direction or new product feature. That has always stuck in my head. So I was thinking, hey, what is the new triad in this new world? The new triad that I'm noodling on, and trying to say, hey, can I operate in this way, can I ask my small team to operate in this way, is this. There is an outcome owner, and the outcome owner's job is to define what the outcome is that you want to achieve. They are also the person who is identifying what the measurements are and how they're going to measure, and they are also the person who's going to judge. In our world, we can think of them as the owner of the eval set or the benchmark that you care about for a particular use case or a particular product. And [00:37:00] then there's a second person, who's the experimenter. You can think of them traditionally as a product manager, or it could be a designer, or it could be someone who's constantly hypothesizing all the time: what do I want to try today? You are going after some short-term outcomes or goals, and you're also very clear about what you're going to throw away. You're also the person who focuses on leading indicators a lot more than the first person; the first person cares a lot about lagging indicators, because you measure outcomes by lagging indicators, whereas the experimenter is constantly looking at: what are my leading indicators? They are also the person thinking about, hey, what do I need to learn? What do I do from an explore perspective, and what do I do from an exploit perspective? You've got to have some experiments where you're just going to explore, and some experiments where you're going to say, I've learned enough, I want to exploit; [00:38:00] I want to measure how much I can exploit here, which will help me move towards the outcome that the outcome owner is going to hold me to. And then the third person in the triad is the person who's responsible for execution, who's building out the agents and understanding what context goes into the agent. They're probably also doing UX, they understand the models very well, they understand the technology very well. And this is natural, right? As a startup, I want an engineer who can do Terraform to get my infrastructure up and running, who can set up models, who can run cloud stuff, who can build my UX. You need to have exposure to all of this. I've built exposure to Terraform: I had never written a Terraform script in the last ten years, and I don't think Terraform even existed, or maybe it existed ten years back; I didn't do it. And now, in the last four months, I'm building out Terraform scripts and I'm able to get them to work. I can do it; you can do it. So the idea is you have an execution person who's a generalist engineer, who knows how to do data well, [00:39:00] who knows how to do React well, who knows how to do Terraform well, and they are partnering with the experimenter day in and day out. Maybe they're in the same room; they just keep shipping. I think this is the new triad.

Hugo: I love it. One direction I thought maybe you were going to take, which I'd like your thoughts on, is having an AI agent as a team member as well. And of course this is future music in some ways, perhaps a bit of science fiction, but not so much: something I've seen more recently is having Cursor or Devin in Slack or Discord, and you can say, hey, check out this PR, at-Devin or at-Cursor, or, we need this fix in this documentation. Now, I think multiplayer collaboration with LLMs is still nascent, but I am interested in your thoughts about the future of having AI assistants and agents as team members that Vishnu, Duncan and Hugo can chat with.
Vishnu: Yeah. I'm just assuming away the [00:40:00] agents, and the reason I'm assuming away the agents is that each member of this triad is going to use AI agents, and it depends on how they want to pick which agents they want to use, right? If I'm going to build an internal-facing use case, I don't need a security agent; I'm just going to use what I have. You decide and choose. If I don't need to build a traditional ML model as part of my infrastructure, fine; if I need one, then I talk to Duncan. The reason I went in this direction is that team interactions are changing drastically. There was a point in time where a lot of my team interactions were, say, through Jira or Linear or GitHub, but now I see that the team interactions themselves are changing drastically. So we are all focused on how each individual is going to use AI agents to improve their productivity. But finally, you still have humans in control. You have your human triads in control, who are still pushing the envelope in terms of how [00:41:00] they want to use AI. They need to decide how they want to use AI, where they want to use it, and what AI they're going to use. So that's really one of the reasons I focused a lot more on the new human team. And then they're in charge of discussing: hey, what are the agents that I'm going to pull from my repository and start using here, or do I need to build a new one?

Hugo: That makes a lot of sense, and I'm wondering how the different roles and skills within teams evolve as we go forward. We've talked about the organizational structure, but I suppose for practitioners and leaders out there: what types of skills should they really be focusing on in terms of their own career path and professional evolution?

Vishnu: Again, it'll be a little bit of a boring answer, but the boring answer is: pick up whatever is adjacent to you. Like I mentioned, I had to go pick up Terraform, I had to go pick up some React. That's the easy thing to do: go and pick it up. [00:42:00] Use the model to do something in an adjacent technology without impacting your customer or impacting your company. Pick up skills in your adjacent areas; don't just rely on your core-area skills. Skills in the adjacent areas allow you to be someone who can question the model, who can validate whether the model's response is working well or not. I think that's the easiest way to think about it. Take that action today. What is adjacent? Pick up a skill in the adjacent area.

Hugo: Fantastic. Vishnu, thank you for such a wonderful conversation and for bringing your decades of experience to the table. I'm super excited to see what you're building now and building next. Thank you, Vishnu.

Vishnu: Thanks a lot, Hugo and Duncan.

Hugo: Thanks so much for listening to High Signal, brought to you by Delphina. If you enjoyed this episode, don't forget to sign up for our newsletter, [00:43:00] follow us on YouTube, and share the podcast with your friends and colleagues. Like and subscribe on YouTube, and give us five stars and a review on iTunes and Spotify. This will help us bring you more of the conversations you love. All the links are in the show notes. We'll catch you next time.