The following is a rough transcript which has not been revised by Vanishing Gradients or the guest. Please check with us before using any quotations from this transcript. Thank you. === thomas: [00:00:00] So the question there was, can we instead ask synthetic consumers, namely AI systems, and get the same answers? And it turns out we can. And that's fascinating research and I'm happy to dig more into that. But this is really just another way, right, to allow businesses to make more informed decisions. So it includes sophisticated simulations, you could say, right? And these simulations could come from causal Bayesian models that model the data generating process and can generate new data. They can make predictions, they can provide insights, they can provide counterfactuals and what-if scenarios. But it also includes more realistic simulations where we have an LLM act as a human and respond as a human. hugo: That was Thomas Wiecki, founder of PyMC Labs and co-author of PyMC, talking about how his team worked with Colgate, using AI to simulate customer reactions to new toothpaste ideas, including mango-flavored or even glow-in-the-dark toothpaste, and how [00:01:00] closely these synthetic surveys matched what real consumers said, even across different demographics. In this episode, Thomas and I get into how companies can use generative AI, not just for chatbots, but to test product ideas, guide marketing spend, and actually support business decisions. We also talk about where these systems break, what it takes to make them reliable, and how Bayesian modeling fits into the picture. This was originally recorded as a live stream, and I'll link to the full video in the show notes. I'm Hugo Bowne-Anderson, and welcome to Vanishing Gradients. [00:02:00] Hey there, Thomas, and welcome to the show. thomas: Thanks so much for having me. Such a pleasure to be here with you and take, you know, one of our many conversations that I've always loved, public. hugo: Welcome to everyone who's watching the live stream.
Great to have you all here on YouTube. As I've written in the description and in the chat, our Q&A will happen on Discord, and I've put that link in the description and in the chat. So please join us there and ask questions. We do it on Discord so that we can have threads, respond later, and continue the conversation. So I'm really excited to be here today with Thomas to do a live stream of the Vanishing Gradients podcast, which originally was a data science podcast, then it became a machine learning podcast. Now we've rebranded as AI, of course, but it's data and data-powered software and decision making [00:03:00] all the way down. Thomas is the author of PyMC, the leading platform for Bayesian statistical data science. He has a PhD in computational cognitive neuroscience from Brown. He's the former VP of Data Science and head of research at Quantopian, where he built a team of data scientists to build a hedge fund from a pool of 300K crowd-sourced researchers. And I know Thomas through the PyData ecosystem as well. Thomas has spoken at Open Data Science, a lot of different PyData conferences, and at the Strata Conference, and I've known Thomas on and off for the past decade. So it's so much fun to be here today with you, Thomas. I'm wondering if there's anything I missed in my introduction. thomas: I think that was perfect, and yeah, I'm also very excited to be here. I have very fond memories of when we just got on a random call and were like, oh, where are you right now? And you were like, I'm in New York. I'm like, I'm also in New York. And then we of course had to meet up and had delicious tacos and chatted about AI and Bayes and the world, and it's really cool to now be able to share these thoughts more generally. hugo: Exactly.
And [00:04:00] it was actually when we were eating those delicious tacos, after seeing the Mingus Dynasty, that you mentioned something you'd worked on which inspired this podcast: when you worked with Colgate to generate synthetic data to simulate surveys with potential users to try new ideas. And you actually found that there was, in the end, over 90% agreement between what you were doing and what user surveys showed. So that's something we'll jump into. But maybe first: we've framed this today as building generative AI systems that make business decisions. And I'm wondering, just to set the scene, if you could help me understand what's the definition of a gen AI system that makes business decisions, and how is it different from the typical LLM applications and demos we see everywhere today? thomas: Yeah, so if you allow, maybe I'll give a bit more context to then provide a more solid answer to that big question. So throughout my career, the motivation always was that this amazing tool that I discovered while I was in grad school, called Bayesian [00:05:00] modeling, would be made more widely available, because I am definitely 100% convinced that the value in the approach is immense, but it's still not utilized to its fullest extent. And then of course you ask, well, why is that the case? And the answer was, and today still is: it's just very difficult to use. So then, 10 years ago when I was in grad school, I joined the PyMC developers to build PyMC3 at that time. And that already made it so much simpler to build large-scale Bayesian models. And particularly, they're really powerful for cases where you need to make decisions and where you need some type of governance, right? Where you really wanna have transparency into how the model arrived from noisy data all the way to the final answer, giving you a recommendation of what it is you should be doing, right?
And that I think is the name of the game, right? Making informed decisions. So it is the [00:06:00] perfect tool for it. We just need to make it more palatable. And one answer that I found was to actually start a consulting company called PyMC Labs, which works with enterprises to build custom models for them for particular use cases. A lot of CPG; you already mentioned Colgate. Also a lot of these marketing attribution media mix models, MMMs, which we'll probably come back to, and marketing analytics and research in general. And that I think is a good step in the right direction, right? So we have the PhD experts. It's a huge team effort; PyMC is an effort from so many people by now that I'm one of the lower-ranking contributors. And also just at PyMC Labs, the people building these models are the best I've ever worked with. And through that motivation of trying to make these tools more available, AI emerged as, I think, [00:07:00] a great solution. So naturally, as an organization, one way I also like to think about PyMC Labs actually is as an industry research lab where the brightest minds that I know come together and solve some of the hardest analytics problems. And that can include gen AI. So this particular one that you're referring to, synthetic consumers, is related to this, where you wanna have a proxy for how consumers make decisions, right? So there are Bayesian models of that, but they will always stay a little bit on the real-data level, where you have certain surveys maybe that you wanna analyze, right? And say, well, with a certain probability, maybe this one product is more liked than the other. And that's great. And that is the decision-making process based on survey data that you collected from real humans. So the question there was, okay, well, can we instead ask synthetic consumers, namely AI systems, and get the same answers? And [00:08:00] it turns out we can.
And that's fascinating research and I'm happy to dig more into that. But this is really just another way, right, to allow businesses to make more informed decisions. So it includes sophisticated simulations, you could say, right? And these simulations could come from causal Bayesian models that model the data generating process and can generate new data. They can make predictions, they can provide insights, they can provide counterfactuals and what-if scenarios. But it also includes more realistic simulations where we have an LLM act as a human and respond as a human. So that is what I would say is the key difference to these other systems, where you maybe just have some very superficial analysis, or maybe just a text-to-SQL kind of thing where, really, all you're doing is retrieving data in the right way and maybe doing a summary. Here we're really talking about sophisticated analytical systems that can take the insight and the domain knowledge that you have, right, and instill it [00:09:00] into that causal model. Most people already have a causal graph in their head of how they think that problem is structured, and this allows you to take that, pour it into a statistical model, estimate that on data, and then take that all the way to the decision-making process: which of those products should I bring to market, what are the risks, and what do, for example, my synthetic consumers say is something that they would rank highly? And then you also improve that loop. hugo: I love it. And something I'm hearing there, which you and I have discussed before of course, is helping make decisions in a very practical way. Decisions that work, and not necessarily getting a model that's the most attractive or even the most correct; something that's actually useful. And this is something that you mentioned in our Discord when we did our workshop with your colleagues Allen Downey and Chris Fonnesbeck recently, which I've just linked to.
And speaking of you working with amazing people, Allen, as you know, and Chris are two of my favorite people in the space, so it was wonderful to do that with them. I would love to jump into [00:10:00] this Colgate case study, 'cause I found it quite surprising: the premise that you can have new toothpaste ideas, for example mango toothpaste, and develop an LLM which you survey with respect to it, and somehow actually get that to align with human surveys. So you can scale survey techniques in a lot of ways. So stepping back a bit, I'm just wondering, how did this partnership between PyMC Labs and Colgate arise, how did the idea to simulate consumer reactions come about, and what problems was Colgate trying to solve? thomas: Yeah, so we've done work with Colgate before on cannibalization and other models that were Bayesian in nature. So we had a working relationship, and actually a great relationship, with CLE Papas, who is the VP of Data Science at Colgate, who gave us that project. And with him, basically, we just started chatting, and he was the one that had found one of the very first papers where people started doing synthetic consumers. And when he [00:11:00] mentioned that, I was like, okay, that's not supposed to work, right? That sounds too good to be true, right? That instead of running costly, noisy surveys on humans, the LLM could respond in a similar way on unseen data. I mean, if you tweak it, maybe you can reproduce past results, but to really extend to new product ideas, which is what they were looking for? I was skeptical. And then I started thinking about it more and I was like, okay, well, maybe it could work. Because what are LLMs, right? They're trained on the entire corpus of human text, and in a way know us better than we know ourselves, right?
Because they have really consumed everything we've ever produced. And not only that, but they're trained in a way where they would predict what humans are gonna say next, right? That's the next-token prediction. And as such, it makes sense that that is what they're trained for, right? There is a [00:12:00] prompt, and that can be just a question about a fact, but it can also be a product idea: what about mango-flavored toothpaste? And then the model is explicitly trained to predict in the way that a human would. So then I was like, okay, maybe there's a way, but it still sounds far-fetched. And we did a lot of research on this, tried all kinds of things, and found that it didn't work as well as would be satisfactory, basically. And Ben Maya actually, who is a researcher at PyMC Labs, brilliant guy, was really the one who had the key insight that, well, the way that these surveys are structured is in Likert responses. So you ask: what do you think of mango-flavored toothpaste, on a scale from one to five, how likely are you to buy it, with five being the highest? And naively, that's what we would present the LLMs with: how likely is this, say one to five, and then you can use a library to [00:13:00] constrain the output to that. But he thought, well, that's really not how LLMs work if you just constrain them, because the thinking process is embedded in the answer process. So the idea was to then say, okay, let's have them just provide reasoning and actually talk about mango-flavored toothpaste and how likely they'd be to buy it. And that was really a key moment, because once they started to think about it, you could really see them reasoning through it: oh, okay, well, mangoes are a hit flavor, and I want to try the next coolest thing. And one critical part of this also is that we had a lot of demographic information for this too.
So the prompt included: you are a high school student living in Boston, middle income. And the idea was that it would respond in the way that student would. So the thinking process included that. Of course, then there's still the question: at the end of the day, we do need to map this back to the Likert response, one to five. So he developed a new method [00:14:00] that takes the reasoning through an embedding vector, and then in that embedding space you can link it to those responses. I can go into more detail, but that's the gist of the idea. And then you get uncannily good results. I was really shocked when I saw how well it was able to reproduce the data, and this dataset was pretty big. So we had hundreds of surveys for hundreds of products, and we were really able to correlate the average purchase intent of the LLMs versus the humans, and found that, of the explainable variance, basically if you were to repeat that experiment, they got to 90% reproducibility, which I thought was amazing. hugo: That's fascinating. And that was across different demographics as well. thomas: So there we pooled across all the different demographics, and I was like, okay, that's cool and very validating. Next we looked at the distributions: could they respond in a similar way? If you just look at the [00:15:00] histogram of 1, 2, 3, 4, 5, how well do they reproduce that? And I should also say we tested various LLMs; not all of them did a good job. The ones that did the best job were GPT-4o and Gemini 2.5 Pro. Other models were further off. But so they were able to replicate these distributions. But there I could still sort of imagine that maybe, I don't know, there's some overfitting. And also I should say these models were not trained in any way, right? So these are off-the-shelf LLMs, just with that prompt, no fine-tuning. hugo: There are so many details.
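The embedding-based mapping Thomas describes could be sketched roughly like this. This is a hypothetical illustration, not PyMC Labs' actual method: a toy bag-of-words "embedding" stands in for a real embedding model, and the free-text reasoning is mapped to whichever Likert anchor statement it is most similar to.

```python
import math
from collections import Counter

# Toy stand-in for a real embedding model (e.g., a sentence-transformer).
# Bag-of-words counts keep the sketch self-contained and runnable.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical anchor statements, one per point on the 1-5 purchase-intent scale.
ANCHORS = {
    1: embed("I would definitely not buy this product"),
    2: embed("I probably would not buy this product"),
    3: embed("I might or might not buy this product"),
    4: embed("I probably would buy this product"),
    5: embed("I would definitely buy this product"),
}

def likert_from_reasoning(reasoning: str) -> int:
    """Map the LLM's free-text reasoning to the closest Likert anchor."""
    vec = embed(reasoning)
    return max(ANCHORS, key=lambda k: cosine(vec, ANCHORS[k]))
```

In practice one would use a real embedding model and calibrate the anchors against held-out human survey data, but the shape of the idea, reason freely first, then project into an embedding space linked to the response scale, is the same.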
Firstly, you know it wasn't overfitting 'cause I presume you had a holdout set, right, at the end. And you mentioned you used embeddings cleverly. You didn't fine-tune or use any training data per se, but I'm just wondering what levers you used throughout the process. So I presume you iterated on prompts, perhaps iterated on embeddings. Were there any other things that allowed you to see lift in the model before finally checking performance on a holdout set? thomas: Yeah, so those two things you mentioned were the main ones. [00:16:00] So it didn't require a lot of magic once the method was established. And there was some tweaking with the prompts, actually. There's a couple of ways you could present it, for example, where you have either just the image, and the copy as text, or both of them. What we found is that if you just have the image of the entire thing with the text, that works best. So there are small nuggets like this, but really it is that straightforward. hugo: And can I also ask, for context, as you know, but people may not: I teach a course on building LLM-powered software, links in the show notes and in the description for those interested. We recently wrapped up a cohort with a hundred students from all types of industries, and they're not just students, I mean, they're CTOs and leaders of ML teams and that type of stuff. But we surveyed them; in one of the classes we chatted about how many prompts they'll try when iterating on products in the first week or something. Some people said 10, some said 50, some said 500. We got someone who built a product where they [00:17:00] tried 2,000 or something along those lines.
So I'm wondering what your process was: how many prompts did you try, over what time, before you saw 90% alignment between the synthetic results and actual surveys? thomas: Yeah, so I think it was between five and 10, so no insane tweaking. And they were very straightforward; there was no wordsmithing going on where, if you add a pause there, all of a sudden it works. It is pretty robust to the prompt choice, but certainly different prompts will do better than others. So that was interesting. But then what really blew me away was that the demographics were also recovered, and that I really was surprised by. And there's a couple of effects that are not trivial either. So one big driver of survey responses to product purchase intent is age. Very young people seem to have lower purchase [00:18:00] intent, and older people also have lower purchase intent. So it's an inverted-U shape. And the model just reproduced that without any sort of guiding it towards that; it really just fell out. Another big one, for example, is income. If you have higher income, you are willing to pay more for expensive toothpaste. Very sensible. And that effect was also reproduced. And then there are other effects that don't make a difference, like race or sex, and they also don't make a difference for the LLM. So it's this fascinating link between them that still feels like magic when talking about it. hugo: It's wild. And I do wanna move on in a second, but I'm so fascinated as to what there is in the training data of these LLMs. 'Cause I presume mango-flavored toothpaste for teenagers, 20-year-olds, 30-year-olds and so on isn't in the training set, for all the things that you tried.
But I suppose the propensity to [00:19:00] try new things is probably in the training data: that people in their twenties will be more experimental with consumer products, people in their thirties a bit less so, people in their forties a bit less, and people in their fifties aren't going to want to try mango flavor, on average, right? So is this your thinking of how the LLMs have this ability? thomas: Yeah, I think that's a reasonable thing to assume. But one other cool thing about this dataset, which I didn't mention, is that not all, but many of the product ideas were also AI-generated. So this is work that Colgate has done independently, where they have something that would come up with new product ideas, and some of them are quite wacky, like glow-in-the-dark toothpaste, for example. And because they are wacky, they are novel, and I don't think anywhere on the internet someone has really asked a high schooler what they think of glow-in-the-dark toothpaste. So just the novelty of the ideas makes it unlikely that there was a [00:20:00] direct link where people were already talking about mango toothpaste somewhere. hugo: Yeah. I suppose my question to that would be, is the propensity to be interested in glow-in-the-dark toothpaste pretty correlated with the propensity to be interested in mango flavor? Like, is it just "I experiment more"? thomas: Right, it's an interesting question. I hadn't considered it, but I also would say that that is unlikely, because you can also test it with extreme cases, where, for example, one of them was feces-tasting toothpaste. hugo: Yeah. thomas: Which was just, okay, something that's obviously a terrible idea; will it hate it? And it does, right?
So if it was really just about the LLM playing a 25-year-old who tries wacky things, well, it also doesn't want that. hugo: Man, you know, I was along for the ride for mango-flavored, and then glow-in-the-dark, I'm 43 and I was like, yeah, I'd try that. But you got me, man. You got me. You found something that I would not try. I'm so glad you mentioned foundation-model-generated ideas, though, because I've got in my [00:21:00] notes that the next thing I wanted to talk about is what I refer to as closed-loop design. Now for a word from our sponsor, which is, well, me. I teach a course called Building LLM-Powered Software for Data Scientists and Software Engineers with my friend and colleague Stephan Raic, who works on Agentforce and AI agent infrastructure at Salesforce. It's cohort-based, we run it four times a year, and it's designed for people who want to go beyond prototypes and actually ship AI-powered systems. The link's in the show notes. So, thinking through closed-loop systems where generative AI generates and critiques product or ad ideas: I'd love you to walk us through it, but with an eye to how it mimics reality, how it actually helps us, once again, make decisions. 'Cause you could imagine doing this where it exists only in the abstract and you're creating [00:22:00] flying unicorns or something, right? What does it take to build these closed-loop systems where generative AI generates and critiques product or ad ideas, and how does it actually help businesses? thomas: Yeah, so I love that question, because it really aligns with the way that I started to think about it. Once you have a system that generates new ideas, and you have a system that can critique whether those ideas are any good and maybe even provide feedback, it's kind of natural to want to put them into a loop, right? And then just have it spew out better and better ideas, right?
And the feces-flavored toothpaste will be a thing of the history books. And that is one of those loops, and I think that concept of loops can be extended further. So actually, just today I had this weird realization, and actually on a lot of days; currently I'm doing a lot of thinking around this, communicating ideas and working on slides and strategies. And the realization was that all day I was just working [00:23:00] with different AI systems. There was GPT-5, which I think is great so far; Gamma, which is an amazing slide generation tool that makes me a lot more effective in just trying things out; and then also Lovable, which obviously everyone's very excited about. And I was just going between those three and talking with people, and I was the human in the loop. And it occurred to me that these action loops just start to become more and more powerful, right? Before, it would be me in an editor, just coding a web app, for example, or a Bayesian model. Then it's me in Cursor with AI autocomplete doing that. And that's cool; now there's less I have to do, and there's a very small loop that the AI is doing. Then we go to agents, and they're doing entire workflows, what I talked about, where I'm just sometimes copy-pasting things between GPT-5 and Lovable [00:24:00] or Gamma. And then obviously you want to combine these things into one coherent system, so you have one much bigger, more powerful loop. And that's how I think this will unfold: we will just build more and more powerful AI loops that allow everyone to be immensely productive. I mean, that's the key benefit for me. I can just get so much more done and try all kinds of ideas on a whim.
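The generate-and-critique loop described here is essentially an evaluator-optimizer pattern. A minimal sketch, with stub functions standing in for the LLM idea generator and the synthetic-consumer critic; the idea names, scores, and threshold are all invented for illustration:

```python
# Stub: a real system would prompt an LLM with the accumulated feedback
# and get back a new, hopefully improved, product idea.
def generate_idea(feedback_history):
    ideas = ["garlic toothpaste", "glow-in-the-dark toothpaste", "mango toothpaste"]
    return ideas[min(len(feedback_history), len(ideas) - 1)]

# Stub: a real critic would be the synthetic-consumer panel, returning a
# 1-5 purchase-intent score plus free-text feedback for the next round.
def critique_idea(idea):
    scores = {"garlic toothpaste": 1, "glow-in-the-dark toothpaste": 3, "mango toothpaste": 4}
    return scores[idea], f"panel feedback on {idea}"

def closed_loop(threshold=4, max_rounds=5):
    """Generate, critique, and iterate until the critic is satisfied."""
    feedback_history = []
    idea, score = None, 0
    for _ in range(max_rounds):
        idea = generate_idea(feedback_history)
        score, feedback = critique_idea(idea)
        if score >= threshold:              # good enough: hand to a human to review
            return idea, score
        feedback_history.append(feedback)   # otherwise, loop with the critique
    return idea, score
```

The `max_rounds` cap and the final human review are the guardrails that keep such a loop from spiraling, which connects to the "Habsburg AI" concern discussed later in the conversation.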
And even in the idea generation I'm supported by AI, and that is the part that is, I think, very exciting. What makes me a big fan of AI is that it just allows me to do a lot more with a single button press, right? And not just me, but everyone. And these loops will just become more and more rich and powerful. And I think the next logical step is to include data science. Today you maybe can do it in Cursor, [00:25:00] but obviously, working with data, as data scientists we know that there's a full workflow. And probably today we wanna be very engaged in that workflow: really focus on the data analysis, and then the first data cleaning, right, and getting the data. And then the modeling is the part that we're all very excited about, but it's oftentimes not where most of the time is spent. Nonetheless, starting to roll up more and more AI loops and automating more and more of that process just feels very natural. And that's in fact what we have been working on as a first step with the so-called MMM agent, which is the media mix model agent I was referring to before, where you can use an AI-assisted workflow that will take you through the full MMM analysis, right, what I just mentioned in terms of the data cleaning and all of that. [00:26:00] And it's a multi-agent system. So individual agents will be experts at data cleaning, and that's all they do, and they have maybe access to a RAG-based system where they can look up data, and then pass that through the workflow chain to the data operation agent and the feature engineering agent, and then the Bayesian model fitting agent, which makes sure the model converges correctly and did everything right. And then the insight agent can suggest strategies and help you with actual decision making, right, which brings us back to what we were saying before. So we're giving a lot more people the ability to do very advanced analysis.
And being taken through that workflow. So that's, yeah, what I'm just incredibly excited about: the ability to make me and everyone else a lot more productive. hugo: Amazing. I've actually linked in Discord, and I'll put in the show notes, Anthropic's blog post from last December called Building Effective AI Agents, where they introduce a bunch of workflows and patterns for using [00:27:00] LLMs, augmented LLMs, along the agent spectrum. One is the evaluator-optimizer pattern, which you made me think of, which is a loop where you generate things and you critique them and so on. I'm really excited to start talking soon about the agent approaches you are doing for marketing and media spend. Before that, with this closed-loop design, I wanna go down two paths: one on how we get it to align with what humans are doing and thinking, and then engineering and product challenges. I'm gonna be slightly provocative, Thomas, and I'm gonna put it in the Discord and in the show notes. It's from a Cory Doctorow blog post called Google's Chatbot Panic from a couple of years ago. And it has to do with the challenges, model-collapse-type things, that may happen when we use AI and we feed AI responses into AI and we get synthetically generated responses time and time again. Doctorow writes: Sadowski [00:28:00] has a great term to describe this problem, Habsburg AI. Just as royal inbreeding produced a generation of supposed supermen who were incapable of reproducing themselves, so too will feeding a new model on the exhaust stream of the last one produce an ever-worsening gyre of tightly spiraling nonsense that eventually disappears up its own.
And I won't say the word that he writes, because the YouTube algorithm will penalize me for saying it, not because I don't wanna say it. But I am wondering, I mean, that's a very extreme version of things, but I'm wondering what you saw play out in these particular cases with respect to making sure that synthetic upon synthetic upon synthetic stays aligned with humans. thomas: Yeah, I didn't know that quote, and I think it's quite negative, because it ignores a lot of the realities. So when I work with these AI systems, and I'm sure that's the case for you, there is [00:29:00] the human in the loop, right? So the question is just how confident are you, or how much do you trust the system, and when do you want it to come back to you for that sanity check? And that started very small, where it was just code completion in your editor that's a little bit smarter than the default one, and you still have a lot of control, and if it does something wrong, which it did at that time, it's not a big deal. But slowly, as we all become more comfortable with this and the systems become more powerful, these loops become larger and larger, and they already have, and they will continue to do so. And also, I mean, intelligence is intelligence. Whether that's artificial or human, there are definitely differences, but I don't think it's worth making a categorical distinction where, well, of course AI-fed AI will just create [00:30:00] idiotic stuff. I mean, then you could say, well, humans talking to each other will just create dumb stuff. And that does happen, right? We have cults and sects and crazy stuff. So there are failure modes, but there are also checks in place in society.
And also you can have checks in place with AI, where you have agents that are critiquing other agents and have this interplay, and then have control instances that are intelligent; they could be human, they could be other AIs. So that's my experience, and how I think this will play out. hugo: That makes a lot of sense. And to be clear, Doctorow and Sadowski were talking about cases where you have AI feeding into AI, as has happened, without enough human checks. Something I've seen play out is when building, for example, LLMs as judges: if you use an LLM to create your prompts for an LLM-as-judge, a lot of the time it will insert emojis into them. You can tell if someone's prompting and they've got emojis in their [00:31:00] prompt, you can almost guarantee they've used an AI to generate that. As it turns out, when you try to align your LLM-as-judge with humans, emojis seem to underperform, right? So removing these types of smells can be incredibly useful. So I think the point is making sure you have a human in the loop who's doing that type of judgment, or you can eventually build an agentic system that will help you there, but you need to tell it what you're looking for. The other thing, as I mentioned, I'm interested in with these closed-loop designs, where you have gen AI generating and critiquing product or ad ideas: what were the biggest engineering or product challenges you faced in making the loop reliable? thomas: A lot of trial and error. So in building, for example, the MMM agent, like I mentioned, the way that we built that was to have pretty tight guardrails and a very well-specified workflow. And actually, the very first version was not like that.
So the very first version I just hacked together as a GPT code interpreter kind of thing that was able to kick off model runs to [00:32:00] a server that I was running, and that was working okay, but it was unreliable. It would go off the correct path at various random points every time you would run it. So the ability to then constrain the system at each step of the way, which is a multi-agent system, really helped a lot with that. Before, I would just end up with one huge prompt, like, don't do this, definitely don't do this, I told you not to do this, and it would just get confused by all these different instructions in that one big prompt. Being able to split it up is, I think, one of the best ways to really control and constrain the system. And then once you have something that's working, it's very similar to the way you build a Bayesian model: you start very, very simple and very constrained, and then you slowly layer in more complexity. You start to combine workflows, you give a little bit more flexibility in the path chosen for the analysis. And that is [00:33:00] research, and it's hard work, of course, and you need the right evals. But that is how we did it, and it seems to be effective. hugo: Super cool. So I do wanna move on now to the type of agentic systems you've been building, and I've put a link to your wonderful blog post, "The AI MMM Agent: An AI-Powered Shortcut to Bayesian Marketing Mix Insights." I'm wondering if you could just set the scene and tell us a bit about this agent, how it actually works, and what it does. thomas: Yeah. So this is in pretty direct succession of what I was mentioning in the intro about trying to make these things more available.
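Thomas's move from one giant "don't do this" prompt to a constrained, step-by-step workflow can be sketched as a pipeline where each stage gets a narrow instruction and a validation gate. All names here are hypothetical, and `fake_llm` is a stand-in for a real model call:

```python
# Hypothetical sketch of the "constrain each step" idea: instead of one
# giant prompt, each stage has a narrow instruction and a validator that
# refuses to pass bad output downstream.

def fake_llm(instruction, payload):
    # Placeholder for a real model call; here it just uppercases the input.
    return {"instruction": instruction, "input": payload, "output": payload.upper()}

STEPS = [
    ("clean", "Normalize the raw text.", lambda r: isinstance(r["output"], str)),
    ("label", "Assign a category label.", lambda r: r["output"] != ""),
]

def run_pipeline(payload):
    for name, instruction, validate in STEPS:
        result = fake_llm(instruction, payload)
        if not validate(result):
            raise ValueError(f"step {name!r} produced invalid output")
        payload = result["output"]  # only validated output flows onward
    return payload

print(run_pipeline("weekly spend data"))  # WEEKLY SPEND DATA
```

The point of the structure is that a failure is caught at the step that caused it, rather than surfacing as confusion several turns later in one monolithic prompt.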
So PyMC was the first attempt, and that maybe took Bayesian modeling from impossible to possible for people who were Bayesian experts, really knew what they were doing with statistics, and probably had a PhD in some technical field. And that's great: now they can do something that they weren't able to do [00:34:00] before. Then with MMMs, the very first one we built was with HelloFresh and Luca, who is now at PyMC Labs running the Gen AI incubator. With him we built an MMM that for us was pretty straightforward but, actually, looking at the literature, was quite advanced, just because we had access to more powerful tools. And then from there we got asked to build a second MMM, and then a third and a fourth. More and more clients were demanding that from us. And even so, we aspire to be an industry research lab. We don't want to keep doing the same thing over and over again. We wanna package something once it's working and just make it available for free. So that's what we did with a package called PyMC-Marketing, which a lot of people have contributed to, Juan Orduz and Will Dean being the two top contributors, and they do amazing work. And that package allows you, without really building the model from scratch with all the individual [00:35:00] random variables, or knowing about saturation functions, to build MMMs with just a single line of Python code. And that's awesome, right? So we made it a lot more simple, a lot more accessible. Maybe now data scientists who are not Bayesian experts and can't build models from scratch can use it. And that's cool, and the community has really taken it up. We're getting a lot of positive feedback on it and how it's helping enterprises. But still, right, you need to know Python, you need to know model fitting and investigation.
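The saturation functions Thomas mentions, together with adstock carryover, are the standard building blocks an MMM package wraps up for you. The arithmetic behind the two transforms can be sketched in plain Python; this is an illustration of the concepts, not PyMC-Marketing's actual implementation:

```python
import math

# Illustrative MMM building blocks (not PyMC-Marketing's code):
# geometric adstock models carryover of ad spend across periods,
# logistic saturation models diminishing returns on spend.

def geometric_adstock(spend, decay):
    """Each period keeps a `decay` fraction of the previous adstocked value."""
    out, carry = [], 0.0
    for x in spend:
        carry = x + decay * carry
        out.append(carry)
    return out

def logistic_saturation(x, lam):
    """Maps spend to (0, 1) with diminishing returns as spend grows."""
    return (1 - math.exp(-lam * x)) / (1 + math.exp(-lam * x))

spend = [100.0, 0.0, 0.0]            # one burst of spend, then nothing
print(geometric_adstock(spend, decay=0.5))          # [100.0, 50.0, 25.0]
print(round(logistic_saturation(1.0, lam=1.0), 3))  # 0.462
```

In a Bayesian MMM, `decay` and `lam` would be latent parameters with priors, inferred per channel from the data rather than fixed by hand.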
So the next logical step is, well, how do we make it available for even less technical users? So maybe the data analyst, and that is what the MMM agent is trying to do: making this whole workflow accessible, of which PyMC-Marketing is just a small piece. Like I mentioned, the model fitting is kind of the good part; before that, there's a ton of prep work required. So the agent is able to do that and take the user through it: okay, well, [00:36:00] here you upload the CSV data file, let me analyze it, this is what I think the different columns mean, is that correct? Right, so there we have the human in the loop. And then the analyst says, yeah, that's correct, but actually this column should be interpreted differently. Then it'll update, and then it will keep holding your hand through that workflow and just making you more efficient. And because it is particularly designed for that purpose, it will know all the domain expertise that you might require. And the model is able to run code on its own and communicate and produce plots and all of that. And this, in the history of PyMC Labs, has been by far the most impactful thing we've done, which I think is pretty telling. So I mean, we love to open source almost all valuable IP, we love to write blog posts, and we get really good feedback on them, and there's a dedicated community of people who really love that content. [00:37:00] But when we released the MMM agent, it was like nothing we've ever experienced before, where in the first week something like 60 companies reached out to us and wanted to trial it. So now we're running pilots and getting people onboarded, and they're trying it out and finding what works and what doesn't, so we iterate out the kinks. But I think it really goes to show that there is a lot of demand for advanced analytics, and it just was not accessible enough.
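The upload-a-CSV, guess-the-columns, confirm-with-the-analyst step Thomas walks through can be sketched as a heuristic guesser whose output the human can override before any modeling runs. This is a hypothetical toy, not the agent's actual logic:

```python
import csv
import io

# Hypothetical sketch of the agent's first step: guess what each CSV
# column means, then let the analyst correct the guesses before modeling.

def guess_roles(header):
    roles = {}
    for col in header:
        name = col.lower()
        if "date" in name or "week" in name:
            roles[col] = "date"
        elif "spend" in name:
            roles[col] = "marketing_channel"
        elif "sales" in name or "revenue" in name:
            roles[col] = "target"
        else:
            roles[col] = "unknown"
    return roles

raw = "week,tv_spend,search_spend,sales\n2024-01-01,100,50,900\n"
header = next(csv.reader(io.StringIO(raw)))
roles = guess_roles(header)
print(roles)
# {'week': 'date', 'tv_spend': 'marketing_channel',
#  'search_spend': 'marketing_channel', 'sales': 'target'}

# Human in the loop: the analyst can override any guess before fitting.
roles["search_spend"] = "marketing_channel"
```

The real agent uses an LLM rather than keyword rules to propose the mapping, but the shape is the same: propose, confirm or correct, then proceed.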
So that's where we are today, and I'm super excited. Once that is working, right, you become ambitious and want to add more features, and maybe the ability to also act on the world and provide very specific recommendations. And maybe it's not only the MMM, right? Maybe CLV too; customer lifetime value modeling is also supported in PyMC-Marketing. So media mix modeling is for estimating how well the marketing channels are performing. One thing that is obviously very important for it, but often ignored, is the creative: [00:38:00] how good are the ads that you are actually running? And, well, don't we have just the tool for that with the synthetic consumers? Instead of running surveys, we could just ask the synthetic consumers, do you like this product, do you like this ad, and get a read from that. So we can layer these systems together and create workflows that are very rich and really combine very different things. And again, we will start small, but then we'll be able to expand and really build decision-making systems, decision-making operating systems for companies, that are incredibly powerful and allow technical users to do insane things, but also non-technical users to do really cool stuff. hugo: Awesome. I would love to know, because we have, you know, listeners and viewers from all walks of life doing all types of things, what types of things is it currently finding value in doing, and who would you like to reach out to you? What would listeners be working on that you'd be really excited to get involved in? thomas: Yeah, so I [00:39:00] would say anyone who has a decision-making problem where governance, understanding how you got to that solution, and transparency are important. And that is a pretty broad answer, but I think it is correct.
And where people are embedded in organizations where the data story is probably figured out to a pretty big extent. So if you're still really trying to collect all your data and get it into a system like Databricks, or you just haven't figured that story out, it's not really worthwhile to talk about decision intelligence. But when you have those things in place, and you have data science teams that are doing really cool stuff but maybe are siloed in the company, then I think it becomes really interesting to talk and see how these agents can elevate the business and be extremely helpful [00:40:00] for business strategy. hugo: Super cool. I'm wondering whether you see these systems break or fail during development. How do you detect and then fix these failure modes? thomas: Yeah, so one thing is to actually just close the loop for the AI. And I think this is also just something we are starting to see, where inevitably there will be errors made by the AI. The old workflow was just, oh, it errored out, and then, I don't know if that still happens in Cursor, but when I used it a while ago, I would be the one copying and pasting the output from the error to the LLM, which would say, oh, okay, that's the error, let me just fix that. And now other tools, and Cursor as well, right, with agent mode, are able to run it, see the error, fix it, and just loop until it's figured it out or given up. Lovable also, right, started out where [00:41:00] it would go off the rails, but now it autocorrects, and it's surprisingly good at that. So there are complex things you can do and it will just self-correct. So that is, I think, the most direct way. And then from there, of course, you really analyze the system in terms of the prompts.
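The run-it, see-the-error, fix-it loop Thomas describes can be sketched as a bounded retry loop where a fixer gets the error back. In the real systems the fixer is an LLM; here it's a stub that patches a known bug, purely to show the control flow:

```python
# Hypothetical sketch of the self-correcting loop: execute code, feed any
# error back to a "fixer" (a stub here, an LLM in real agents), retry
# with a cap so the loop can also give up.

def run_until_fixed(code, fixer, max_attempts=3):
    for attempt in range(1, max_attempts + 1):
        try:
            namespace = {}
            exec(code, namespace)         # run the candidate code
            return namespace.get("result"), attempt
        except Exception as err:
            code = fixer(code, repr(err))  # hand the error back for repair
    raise RuntimeError("gave up after max attempts")

def stub_fixer(code, error):
    # A real agent would ask the LLM to repair `code` given `error`;
    # this stub just patches the one bug we planted.
    return code.replace("1 / 0", "1 / 1")

value, attempts = run_until_fixed("result = 1 / 0", stub_fixer)
print(value, attempts)  # 1.0 2
```

The cap matters: without it, a fixer that never converges turns "self-correcting" into an infinite loop, which is exactly the going-off-the-rails behavior the guardrails are meant to prevent.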
And so this is not something we use, but I would imagine that if you wanted to automate it more, with a system like DSPy, you have an optimization loop that is tweaking the prompts. You have an eval system that says, okay, out of, say, a hundred workflows, we know where we want to end up, right? So you have a dataset of different challenges for this system and know what correctly solved looks like. And then you can place that into an optimization loop and just keep optimizing the system and the prompts to get a good outcome. hugo: Fantastic. I'm wondering, we live in a world where, I mean, [00:42:00] Bayesian modeling, this type of generative modeling, and by that I really mean the real generative modeling, which is modeling the data generating process, has been so powerful yet underutilized, for reasons you've spoken to, for a long time. And gen AI has taken the world by storm. So I'm just wondering where you see the line between what gen AI can do well autonomously and where it's still useful to use structured statistical approaches, and how we merge these two things going forward. thomas: Yeah, this is a really big question and one that I've thought a lot about, because gen AI is so amazing at so many things. However, what I think we haven't really figured out, right, is how it works with numbers. So it's almost ironic. I remember in any science fiction, right, that you [00:43:00] look at, it always has the AI or the robot be very unemotional and have problems articulating, right? All these nuances of language totally elude it, but it's really good at calculating things. And it's so funny how the reality is completely opposite to that, right?
Where one of the first things it could do was write beautiful poetry, be very, very well spoken, and really interpret the nuances of what people are saying, but it couldn't count how many Rs are in strawberry. So I found that interesting. And so they can't really handle numbers that well. And that's where the disconnect happens with many business problems today, which are, well, we have the measurements of our marketing spend and how many sales we have, or any other problem. How do you use AI today to fix that? I generally don't know, but that's where I think the synergy is so powerful with really any data science tool. And I think [00:44:00] Bayesian modeling now emerges as the perfect link in that chain: taking very complex data and having a system where you can intelligently design the solution, right, very flexibly, and then get unparalleled insights that are the best of both worlds. They're not only the intelligence that has the domain expertise and the intuitions, but also really the insights from the data. So it's bringing both of those together, the hardcore measured things and the intuitions of humans and LLMs, and allowing us to make decisions like we never were able to before. Or it would require a PyMC Labs consultancy, with some of the smartest people I know, who are experts in PyMC and have done this for 10 years, to build these models. So hopefully we can speed that process up and give people access to these powerful technologies. [00:45:00] hugo: I love it. I'm interested in speaking once again to very, very practical things. If a company or organization wants to start building generative-AI-powered decision systems, what's the first thing they should start thinking about, before touching any code? thomas: So they should give me a call. That's the first thing.
And then there is, I think, a rich tapestry of existing things out there that people can start looking at, starting with things like code interpreter, right, which already can do great things, and Cursor now, right, has access to Jupyter Notebooks. So there are already some cool things. Marimo is an excellent tool, and they are starting to layer in AI. So I think those are really great places to start, and we are just seeing the very first of things that we can't even imagine [00:46:00] today that will come. hugo: We have a great question in Discord from Larry Jones. Hey Larry. Larry says: science does experiments, but reality wins over theories. Can we apply this technique of growing knowledge to improving AI solutions? And actually, I do wanna step back a bit and quote you, Thomas, from our most recent livestream workshop. And if you'll bear with me, I'm actually gonna bring up this quote, which I have very handy. I meditate on it every morning. I'm half joking. What you wrote was, and it was around a conversation about how to choose priors and whether we're being too subjective in choosing priors, and I quote verbatim: Bayesian modeling, where you embrace subjectivity, iteratively improving the model very closely with the data. The focus is more on learning and getting a useful model rather than a correct model. So I'm actually interested, if we can bring together [00:47:00] getting a useful model rather than a correct model, how this relates to business, how this relates to generative AI, and how this relates to the scientific method. thomas: Mm-hmm. Good question. Yeah, so this really is part of a long thinking process for me about what it really is that makes these methods so useful. And I've really started to like the word useful, rather than best or correct or anything.
Because that does not seem to lead to very useful debates, like, oh, frequentist or Bayesian or machine learning. I mean, I think comparisons are generally not bad, but when it comes to just wanting to do something useful, right, other principles apply. And I actually would, just to preface, draw a bit of a line, even though it's not often done, between Bayesian statistics and Bayesian modeling. And why [00:48:00] I think that's useful is that when I say Bayesian statistics, for me that means more, okay, we really wanna estimate a thing in the real world. We wanna measure the speed of light, right, a physical constant, and we have experimental data. This is what science is doing, and PyMC is used a lot in science, and you can build a model that gives you that answer. But you have to be very, very principled in how you build that model. Because just saying, oh, Bayesian statistics has uncertainty, so whatever you do, you get these posteriors and somehow magically they are well calibrated and really track the uncertainty of your system, that is wrong, because they are influenced by the priors and your prior choice. So if you are choosing them too subjectively, they will not be objectively correct if they mismatch. So then there are ways [00:49:00] of trying to get to priors that are objective, and that's why it is a useful tool for science, and that is the most defensible position, right? But in reality, that's not how we at PyMC Labs build models. And that has been, I guess, confusing to me, because, well, what are we even doing? And it's sort of difficult to describe, but it really emerges when you start working on it. So when we teach these methods, people will ask, oh yeah, so I guess I can't look at my data, and I have to specify this prior ahead of time, somehow downloading all the information I have about this problem into this prior, and I never can touch it again, right?
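Thomas's point that posteriors inherit the subjectivity of the priors can be made concrete with a conjugate Beta-Binomial update, worked here in plain Python as an illustration: the same data under two different priors yields two different posterior means.

```python
# Same data, two priors: the posterior mean moves with the prior, which is
# why posteriors are only as "objective" as the priors that produced them.
# Conjugacy: Beta(a, b) prior + k successes in n trials -> Beta(a+k, b+n-k).

def posterior_mean(a, b, k, n):
    """Posterior mean of a Beta(a, b) prior after k successes in n trials."""
    return (a + k) / (a + b + n)

k, n = 7, 10  # observed: 7 successes in 10 trials

flat = posterior_mean(1, 1, k, n)        # weak, flat Beta(1, 1) prior
skeptical = posterior_mean(2, 18, k, n)  # strong prior centered near 0.1

print(round(flat, 3))       # 0.667
print(round(skeptical, 3))  # 0.3
```

Neither answer is wrong as arithmetic; which one is useful depends on whether the skeptical prior encoded genuine knowledge, which is exactly the calibration caveat Thomas raises.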
So that's not at all how we build models. We tweak priors all the time. We build models, we look at the posterior, we look at the data, we change the prior, in a loop. It's part of the iterative loop. And that is a problem if you try to do proper statistics and you want to estimate real-world effects. But if you want something that's useful and gives you useful answers, then you actually [00:50:00] have an extremely powerful tool that allows you to constrain your model and tell it, well, actually, I see those parameters don't make sense, and then you constrain the prior appropriately. You do lose a certain amount of trust in the uncertainty of the output of the system, but it's still useful, because it allows you to compare internally, where you say, well, in this direction the uncertainty is larger, in that other direction it's smaller. And when you start to link different data sets with each other, for example, maybe this is the fifth experiment in a row and you have previous ones, you can place those as a prior, or you can just have one big model that estimates all of them, and those two will be the same thing if it's a very well specified prior. So that's especially what I meant with that: these methods are immensely useful for solving very advanced problems, and usually that does not require a very [00:51:00] rigid degree of scientific accuracy. And that is what I would call Bayesian modeling. hugo: Wonderful explanation. Thank you so much. We're gonna have to wrap up in a minute, sadly, so I'll have to have you back again soon. I'm wondering, are there any areas where you think people are overestimating what gen AI can do for decision making? thomas: That's a really good question. I don't know, it's so hard to predict, right? I mean, there is an immense amount of hype, and those sky-high expectations sometimes then meet reality.
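Thomas's claim that chaining experiments via priors matches one big pooled model can be checked in a few lines for the simplest conjugate case, a Normal mean with known noise variance. This is a worked illustration of the equivalence, not PyMC code:

```python
# Chaining experiments via priors vs. one big model, for a Normal mean
# with known noise variance: sequential conjugate updating gives the
# same posterior as fitting all the data at once.

def update(prior_mean, prior_var, data, noise_var=1.0):
    """Conjugate Normal update for a mean with known noise variance."""
    for x in data:
        precision = 1 / prior_var + 1 / noise_var
        prior_mean = (prior_mean / prior_var + x / noise_var) / precision
        prior_var = 1 / precision
    return prior_mean, prior_var

exp1, exp2 = [1.0, 2.0], [3.0]

# Sequential: the posterior of experiment 1 becomes the prior of experiment 2.
m1, v1 = update(0.0, 10.0, exp1)
seq = update(m1, v1, exp2)

# Pooled: one model sees all the data at once.
pooled = update(0.0, 10.0, exp1 + exp2)

print(seq == pooled)  # True
```

The equivalence holds because the update is just accumulating precision-weighted evidence; it breaks down once the "prior" you carry forward is a hand-tweaked summary rather than the full posterior, which is the trust trade-off Thomas describes.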
Like with the GPT-5 release, there was a huge backlash where people were like, oh, it's all complete nonsense and vaporware, which it obviously, to me, is not. But which of those expectations are maybe not yet true but will be true? I don't know. One final curve ball that I just wanna bring up for fun: the other thing we connected over was meditation. So some of the listeners maybe have heard about it, but [00:52:00] maybe it's interesting to them to hear that we both have really strong, dedicated meditation practices. And for me it has brought a lot of value personally, but also in my career. And that is something where I would just love to motivate people to look into it, knowing their favorite podcasters are also into that stuff. hugo: Absolutely. Meditation is a beautiful practice in and of itself, right? But the amount of value it brings to my life and my family and the people around me and the communities, and the people you bond with as well. Yeah. I am actually interested in your thoughts on this: meditation can be paired with philosophical inquiry, self-inquiry. One can meditate, on the progressive path, on something external, like a flame, the breath, a mantra, a guru. One can also meditate on consciousness itself and perform self-inquiry, inquiry into the nature of the mind, [00:53:00] the direct path. And I haven't done this, but I wonder, if you or listeners are interested, you know, doing this with humans of course is incredibly important, but developing an LLM-based system, and there's a lot of training data within these LLMs to do it, which can help one perform self-inquiry and suggest meditative paths and that type of stuff, I think would be very interesting.
And back to your point, I mean, the fact that you tried five to ten different system prompts for Colgate, for a particular case, it could turn out that it's not that difficult to do something like that, and that would be interesting, wouldn't it? thomas: Yeah, yeah, absolutely. And I have actually worked quite a bit with AI on this self-inquiry practice and found GPT-5 to be immensely powerful. And looking at it now, having this background of neuroscience and cognitive neuroscience, it sort of emerges to me as a technique to really calm your nervous system through establishing a feeling of safety, [00:54:00] so that it can really open up and release previously held tensions in all kinds of ways, which can be emotional or physical or anything. And that's where I think the value really is: unclogging our nervous systems from this insane world with social media and various things. I think it is an important and beautiful practice. And really, I mean, why are we even doing this whole AI thing and Bayesian modeling? I think it is just to enrich our human lives and really help each other. And it's all integrated in this one approach. And I think AI can play a big role in also providing these types of interventions for people and helping them open up their nervous systems. hugo: I agree completely. And I think, before we went live actually, we were talking about Internal Family Systems, and we're not gonna go on about this too much, everyone, but it is deeply relevant. It's a form of inquiry and therapy which [00:55:00] allows one to identify different parts of oneself, like protectors and then wounded parts and this type of stuff. And one of the interesting things about it is it means you can identify different parts.
So it isn't one self in conflict; when internal conflict occurs, you can identify these different parts and have a more stable self that identifies and heals them. I was chatting with GPT-5 about it, and the types of exercises it came up with were incredibly useful. I wonder, Thomas, thinking about the failure modes, they can be very serious though, right? I haven't encountered these, but if you're in a challenging place and using AI as therapy and it makes a mistake, or it says something a therapist clearly wouldn't, it just messes up completely, right? That actually could be highly problematic, wouldn't you say? thomas: Yes, I completely agree. And safety is even more important there. And so with my meditation friends, I had a really interesting discussion about this too, where the past models [00:56:00] had this big problem of sycophancy, just really sucking up to you, like, oh, what a brilliant insight, right? And they would just keep reinforcing you and sending you down whatever path you're on. And GPT-5 feels very different, because, mm-hmm, it doesn't seem to do that, obviously, to any extent. But there is a risk where, well, maybe it's just so good at inferring what I want to hear that it knows I don't want that behavior but still leads me astray into cult-like behavior. So that is a real danger and warrants care and looking into. What I can say is, when I worked with GPT-5 in this way, in the self-inquiry way, there was a key moment where there were a lot of openings, and things were sort of interesting, and a spiritual lens would say Kundalini energy or something. And then it was GPT-5 that was like, well, actually, maybe you can just look at this in terms of the nervous system, like just your autonomic nervous system [00:57:00] doing its thing.
And then it became clear to me, like, oh, okay, well actually that makes a lot of sense. So there's one data point, at least, where GPT-5 very explicitly guided me towards a more grounded, embodied explanation and didn't just send me off through some chakra stuff or whatever. hugo: I love it. And I totally agree that there's less sycophancy with GPT-5. I've noticed a weird thing, at least, and maybe it's because of some of the memory it has of me, I honestly don't know. But when I was talking with it about Internal Family Systems, it was more engagement-baity at the end. It would say, hey, this is the process, and then, hey, do you want three bullet points that will make or break this practice for you? And I'm like, no, I don't want that at all. Well, yeah, exactly. But every time it was like, I'll give you the final thing that will make, thomas: you know, hugo: and, thomas: that's true. And there's an addictive, potentially addictive, mechanism in there. Yeah, hugo: Absolutely. Yeah. It is actually very, you know, and I [00:58:00] almost hesitate to say this, but it is very Silicon Valley engagement-baity stuff in some ways, right? Yeah. And actually, I mentioned this to GPT-5, and it was like, I understand why you'd feel that way, but dot, dot, dot. And it's like, I could summarize this in a LinkedIn post for you if you'd like, and I'm like, no. And funnily, because I use ChatGPT to help write my Substack, I was chatting with ChatGPT about a sensitive situation I was having with a friend, and we came up with a way to approach it, and then ChatGPT said, would you like me to incorporate this into your next Substack newsletter? And I was like, leave that with me, let me think about it. So, Thomas, it's time to wrap up.
I love all the directions this conversation went in. I've linked to PyMC Labs in the Discord, and I'll do so in the show notes. I'm just wondering, what's the best way for people to reach out and get in touch? thomas: So, happy to stay in touch with [00:59:00] people. I mean, I'm in your amazing Discord community. LinkedIn is probably the easiest way; that's where I post the most. And then just email, thomas.wiecki at pymc-labs.com. And yeah, looking forward to staying connected with you and the community. And thanks, everyone, for tuning in and joining the discussion and being part of it. I think we live in really exciting times, as we discussed, and it's just gonna become even more interesting. hugo: Absolutely. Thanks to everyone for joining. Thank you, Thomas, for all your insight and just the wonderful conversations we always have, and I look forward to more. thomas: Thank you so much. Take care. hugo: Thanks for tuning in, everybody, and thanks for sticking around to the end of the episode. I would honestly love to hear from you about what resonates with you in the show, what doesn't, and anybody you'd like to hear me speak with, along with topics you'd like to hear more about. The best way to let me know currently is on Twitter: Vanishing Data is the podcast handle, and I'm at Hugo [01:00:00] Bowne. See you in the next episode.