The following is a rough transcript which has not been revised by High Signal or the guest. Please check with us before using any quotations from this transcript. Thank you. === peter: [00:00:00] Let's pretend we actually have stumbled upon the beginnings of a technological revolution that is probably the most important thing that homo sapiens will discover, maybe alongside fire and writing. That's the scope of what we're talking about. Maybe that's a hype thing, maybe it won't really quite be that. I think it's actually something pretty close to that. What we're talking about here is the ability to turn smart thinking sand and electricity into things that we used to need humans for, humans we had to feed and clothe and house and tend to when they're sick. Nope, we can just go and rack and stack a pile of chips in a data center and get something like human cognition out of it. Imagine you had an army of very smart grad students and you would just give them direction about areas to explore, things to do, and ways to look at certain things. And you would timebox them, and you'd probably energy-bound them, and then they would come back with some assessments and you'd take a look at that. hugo: Peter Wang is the Chief AI Officer at Anaconda, co-founder of the company, and a [00:01:00] key figure behind PyData, shaping the open source ecosystem that made modern data science and AI possible. But as AI advances, Peter's questioning where things are headed, both for software development and for the open source community that helped build this revolution. In this conversation, Peter and I dig into some of the big questions: why companies are obsessing over models when they should be focused on integrating AI into their workflows, and why AI is breaking traditional software development and what that means for engineers. We also talk about what happens when the open source tools that powered this movement are increasingly shut out of it. In addition, we explore how AI is starting to change the way we work. Instead of writing code line by line, engineers are becoming orchestrators, directing intelligent systems to run simulations, generate insights, and iterate on problems autonomously. What does it mean when our tools don't just execute commands, but think through problems on their own? [00:02:00] Before we get into the full conversation with Peter, let's just take a quick moment to chat with Duncan Gilchrist from Delphina, who makes High Signal possible. So I'm here with Duncan Gilchrist from Delphina. Hey there, Duncan. Hey, Hugo. Could you just tell us a bit about what Delphina does? duncan: Yeah, at Delphina, we are building AI agents for data science. And, you know, we talk to a lot of experts in the field, and so with the podcast, our goal is to share the high signal. hugo: And as you're building AI agents for data science, I presume a lot of my conversation with Peter resonated with you. duncan: You know, with this episode, it was so fun to take a step back on AI. Both the hype and the application of AI are becoming so normalized now. For me, it's kind of normal to talk to ChatGPT on the way home from work about esoteric science topics, like how CO2 levels in my work environment affect my energy, [00:03:00] or to fire up o1 Pro Deep Research and analyze sales automation tools. And yet Peter kind of brought me back to the tingles of fall 2022.
I've been thinking about how crazy it is that, like, what is basically next-character prediction with a huge neural network and a huge amount of training data gives rise to cognition. It kind of makes you think about the nature of intelligence and, like, is that really all life is? So it's really a beautiful episode that way. hugo: Yeah, I couldn't agree more. Well, with all of that said, let's jump into the episode. Hey there, Peter, and welcome to the show. peter: Hello. Great to be here. Thank you for having me. hugo: Such a pleasure, and I'm really excited that, in the past year or so, maybe a bit longer, until then you were CEO at Anaconda, and of course you've had many roles there, but you've got a new role, Chief AI Officer. And I'd love to hear what that means for you and what type of [00:04:00] things you're interested in, particularly given your background in analytics tooling and data science tooling in the PyData space. What happens in AI now? peter: Let me give you the answer to all that in five minutes. No, big question, lots to talk about there. So yeah, the switch to the Chief AI Officer role, it's an innovation role. It's an exploratory role. We're really in a very interesting time where there is a lot of hype around AI. There's been hype around AI for quite some time, five, six years going on now, but certainly now with ChatGPT there's a deeper, there's a realness to it, where people are really using these LLM tools in everyday settings to accelerate their work. But we all know that's just the beginning of it. All right, so the sense of what is coming next, where is it going, what is real and what is hype, what's happening, and what is our view as a company of what we should do? Where does the open source ecosystem go? And what is our role to play in the stewardship or the innovation there? All of those questions are serious and deep, real questions, right? They affect our trajectory for the [00:05:00] next however many years. And so you really need someone with a certain level of expertise to get into that, and you can't just hire a consultant to tell you, and that's really why I stepped into the role. And also, the company was growing and scaling last year, so I was someone who transitioned as we were growing and scaling the business. I actually, I don't know how many people are aware of this, but I took the CEO role really in an interim capacity, like, whatever, five years ago, and I just stayed in it for a number of years. But when Travis and I started the company, he was the CEO. I didn't really want that job. I was CTO, and I much preferred that job. Anyway, I'm happy to be doing tech and innovation work again, really as my full-time job. The CEO job, a lot of it is HR, a lot of it is investor management, a lot of it is organizational admin and that kind of thing. There's a lot of that which I can do, but it's not really my strong suit. But to answer your question of what I've been doing over the last year: I've been trying to investigate the fundamentals of really where we are at in the AI journey. What is real? What is hype? What does that mean for our current users, for our current products? And what does that mean for the ecosystem moving [00:06:00] forward? So those are the things that I've been thinking about quite deeply for, I would say, a year and a half at least now, but much more in a full-time way in the last year. hugo: Incredible. Well, I'm excited to jump into all those things.
So I think one instructive framing: historically we've sliced the space in many ways, but one way we've sliced it is into descriptive, predictive, and prescriptive analytics, right? Now we have something generative, which is incredibly exciting, as both you and I know. Descriptive analytics is dashboarding, exploratory data analysis; predictive is ML, those types of things; and prescriptive analytics is decision science, which hopefully all of these things inform as well. So I'm wondering, in this slicing, how we can think about how generative AI changes it. Maybe we can start with exploratory data analysis and prescriptive analytics. What's the future of descriptive analytics and exploratory data analysis in a world filled with large language models? peter: Well, do you mind if I modify the question just [00:07:00] a little bit? hugo: I would be excited for that. peter: I would like to actually change the scope to be something more than just large language models. I think LLMs and transformers and the current sort of models are very interesting, and they've certainly done some groundbreaking things. But I think where things are going to be is maybe beyond that. I mean, something still neural network, presumably, some kind of autoregressive thing in nature, but whatever it is, it may not explicitly be an LLM per se. The best way I've thought about it is: we can turn electricity into things that could only be produced by human brains, right? Which took, like, protein and starches. Now we turn electricity directly into the output, which is shocking. So for those kinds of technologies, I want to be real. I had a conversation with some policy and legal sort of folks in DC a few months ago, and there was a lot of talk about AI and copyright and all these things, and at the end of it, I wanted to shock people out of their sense of complacency about this. And I said, look, let's [00:08:00] pretend we actually have stumbled upon the beginnings of a technological revolution that is probably the most important thing that homo sapiens will discover, like maybe alongside fire and writing, maybe electricity. You put it in that kind of category. That's the scope of what we're talking about. Maybe that's a hype thing, maybe it won't really quite be that. I think it's actually something pretty close to that. And if you actually accept, improv style, if you accept that reality, let's take a step back and look at what we're actually talking about. What we're talking about here is the ability to take smart thinking sand and electricity, just electricity and thinking sand, and do things that we used to need humans for, humans we had to feed and clothe and house and tend to when they're sick. Nope, we can just go and rack and stack a pile of chips in a data center and get something like human cognition out of it. So that is the kind of thing we can equip people with in their [00:09:00] jobs of descriptive, exploratory analytics, whatever it might be. So from that lens, I think we can start talking about how those particular parts of the data science job might change, or the analytics function might change. But I think that the change is a sea change. It's very different.
I think it's going to be more like, I don't know, maybe this is getting a little too fantastical, but imagine you would have an army of very smart grad students and you would just give them direction about areas to explore, things to do, and ways to look at certain things. And you would timebox them, and you'd probably energy-bound them, and then they would come back with some assessments and you'd take a look at that and then, you know, take passes on it like that. And it would be a very smooth blend from the descriptive to the prescriptive. In fact, the smartest of the grad students are going to go and run their own simulations. They will start up Jupyter notebooks that actually go and run simulations that they've set up. So I think we're just moving into an entirely different area where I [00:10:00] think there's still very much a role for the human mind and human expertise, but much more in an orchestration and direction capacity versus a detailed, down-in-the-weeds kind of thing. We're not there yet, to be very clear. It's going to be a transition to get there, but I absolutely can believe a future where we're really doing more of the orchestration. hugo: Absolutely. And I love that you mentioned having a variety of graduate students. I don't know if he coined the term, but Mark Saroufim from PyTorch at least enlightened me to a term called graduate student descent, which he thinks is how a lot of results of academic research come about these days. I appreciate you helping me to zoom out in this conversation as well. And we do say this year is the year of agents, whatever that means, right? I do think it's about discovering the failure modes of agents. I don't think we're necessarily going to build a huge number of robust apps with agents, but we're discovering how tool use can actually be formalized in a [00:11:00] variety of ways. In the past few weeks, Peter, it's Thursday, February 10 here in Sydney, Australia, in the past few weeks we've essentially gone from DeepSeek being released, to deep research coming out of OpenAI, to deep research coming from Perplexity, to Grok 3 seemingly having something along these lines as well. So my point is, in a few weeks we've actually seen the beginnings of commoditization of an entirely new type of product and service. I think what I'm saying is that things are so novel and moving so quickly that in our assessments we need to be flexible. I suppose where this leads me is wanting to know from you, from what you've been working on as Chief AI Officer, what do you see as the short-term and medium-term future of people working in this space? What should people be thinking about and looking at, tool-wise and process-wise? Should we actually dedicate two days a week to experimentation currently, for example? Is it that type of paradigm shift? [00:12:00] peter: Unfortunately, I'm an innovator, creator, entrepreneur kind of guy, so my answer to that is always going to be yes. So maybe I'm the wrong person to ask and you need a straight man to play the other side on that. But actually, there's a formal argument for why we should take time explicitly these days to try things and do things, and that is really based on the Cynefin framework, spelled C-Y-N-E-F-I-N. And that framework describes the correct OODA loop for different kinds of environments.
And when you find yourself in a static environment, then it's very clear, so then you sense, and then you... are you familiar with this concept? hugo: I am, yeah, but it's worth walking through. peter: Oh, okay. What Cynefin does is it says there are different modalities of environment that you might find yourself in. There's clear, there's complicated, there's complex, and there's chaotic. And we generally have the right intuitions about this, although I think some people might not know the distinction between complicated and complex. And so my best description of that is: complicated is a 747, complex is a [00:13:00] cockroach or a bird. But even the cockroach, right? Most people haven't stepped on a bird, most people have stepped on a cockroach, but the idea is nobody can put a cockroach back together, whereas a lot of people can put a 747 back together. So a 747 is complicated. hugo: 747s aren't self-healing as well. peter: That's right, that's right. And cockroaches, if you step on them hard enough, a cockroach is also not self-healing, right? hugo: That's true. peter: So there are boundaries on what its complexity is able to encompass. But chaotic, then, is just, whatever, it's chaos. We all know chaos. So anyway, the point is, in the Cynefin framework, in these different modalities of your environment, there are different appropriate operating procedures. In the clear environment, you sense, then you categorize, and then you respond. In the complex environment, you probe, and then you sense, you do some sensemaking, and then you respond. In the chaotic environment, you have to just act: you do stuff, you see what happened as a result of your actions, and then you respond. And this is a bit of an elaborate sort of articulation; it might actually be a little bit informative for later discussions around exploratory [00:14:00] data analytics and its role in the cybernetic revolution that's coming and all these things. So I think this is actually maybe not too much of a digression, but the point is, relative to what is good practice now: if you're a data science professional, if you're an MLOps person, you have a good-paying job, you're interested in what's coming but it's not affecting your day-to-day job yet, I think now is absolutely the time when you should be in at least a probe-sense-respond, if not an act-sense-respond, modality. And to your point about the rate of new things coming out, most of them commoditizing previous things: it's a little surprising to me how many people are trying fairly straightforward kinds of approaches that then end up shifting quite a bit what the possibility landscape looks like. And what that says to me is that there is a lot of other low-hanging fruit left unpicked. There are a lot of people trying things; a lot of them don't work, some things do work, but the rate at which interesting but basic things seem to make significant improvements says to me, intuitively, it [00:15:00] smells like there's a lot of low-hanging fruit out there for people just to try things. So anyway, that would be my absolute guidance to people: to the extent that you can squeeze out a couple of days a week to just sit there and play with stuff, try to get a real sense for what can a local model do, what can a small model do, what can the biggest models actually do, and how is it different, right, on these tasks: data tasks, coding tasks,
summarization, et cetera. It's really quite something. hugo: Yeah, without a doubt. And I really like those analogies and thinking through the Cynefin framework. It actually reminds me of an old friend of mine who's a CTO at a European-based startup. He talks about how, in the early days of a startup, you need explorers, people who will go and, as we say in Australia, bush bash, and build things quickly and then see where the river is and that type of stuff. And as you scale, then you need to build, essentially, settlements and have villagers, who are a very different type of persona to the explorer, right? But with the [00:16:00] movement of the current landscape, so many companies may actually need to employ more explorers full time, in my opinion. So actually have, at an organizational level, individual units that are there to go and rapidly iterate, not necessarily build robust software that's served to stakeholders and customers, right? And I wonder if this is something you're seeing in your work, but also in a lot of the companies who use Anaconda and use the PyData stack. peter: Yeah. When I first started on this journey of pushing Python as a valid choice for doing data analysis, there was a lot of advocacy I had to do within businesses to say, look, here's this thing that's happening, here's the tool that is really great, and there's an ecosystem with these wonderful tools. And it was seen as being quite disruptive, right? I would say this is like the 2013, '14 timeframe. And when you get into these large businesses, you're absolutely right. The term I've used to distinguish between those kinds of modalities or personalities is pioneer versus settler. And when you go into those situations, you'll find large businesses are not [00:17:00] strangers to this concept. They know they need to have some pioneers who are out there figuring out what is innovative, what they can bring in, and they'll empower groups to go and try to really bring innovation in house, or adapt, and then enable or catalyze digital transformation within the business, right? That's a big term; lots of money gets spent on this kind of stuff, on consultants. At the end of the day, though, where I think most businesses get stuck is that they have a very instrumental view of technology, and they don't realize that your business's adoption of technology depends on the people actually wielding the technology. And so the unfortunate truth is, if you actually want to transform certain business operations into a pioneering modality or innovation modality, you have to get rid of the settlers. Most settler types are not pioneer types; it's very uncomfortable for them. This is something I didn't understand, actually, as an open source hacker, innovator type: I love to make new things, love to dream about what the possibility space is and the art of the possible, all that stuff. That's literally the [00:18:00] water that I swim in. So it was hard for me to imagine, until I started really getting into consulting and working in large businesses, where I started engaging with people and realized, oh, there's actually another kind of person out there. Very smart, very capable, but they really want to know exactly where the rails are, which trains are running, when the trains have got to run, where the boundaries are, and what the process is for changing the boundaries.
It's all very regimented, and there is absolute value in having that in a large-scale operation, because otherwise it's complete chaos. Okay, so I'm not really dinging that. It's more that it took some development and learning on my part as a person to realize there's this entire other kind of personality out there. But I think if you have a whole business that you want to shift into a reality where, look, the ground underneath our feet is changing, the business has got to move, then you have to actually put these kinds of disruptor, radical, pioneer kind of people in charge. And that's terrifying for established businesses where everybody is used to living within the walls of a big city, in a very domestic sort of setting. [00:19:00] So, I don't know, maybe the harsh truth for people is to say, look, if you're in such an environment and it's quite regimented, or there's just going to be a lot of lip service from on high about this and that and AI, well, for your own personal preservation's sake, you should try to figure out how to carve out time, so you have that time to go and uplevel yourself and get really familiar with these things. Again, with this whole pretending that we are going through one of the great technology revolutions of humanity: what would be an appropriate way to respond? That would be the appropriate way to respond, right? hugo: Absolutely. peter: Rather than just clocking in nine to five. hugo: And also, there's almost like a Pascal's wager vibe here as well, right? Not necessarily concerning the eternal and God, but if there's a chance, what's the downside, right? Yeah, exactly. If there's a chance that we're going through it, better jump on board. That's my take, so I'm not telling anyone else what they'd better do; I'm saying what I'd better do. But yeah, a few weeks ago, the week of February 8th, right, and I'm just picking one of many weeks, it seemed like a pretty big week in AI for me, but not in terms of, like, GPT-5, right? We had [00:20:00] OpenAI Deep Research. We had Replit agents come out. We had GitHub Copilot agents. We had Mistral's Le Chat, which generates 1,100 tokens per second, which is ten times faster than GPT-4, right? We had Mistral Small, we had Gemini 2, and all of that came out, which is amazing. We also had Pika 2.1. We had the Imagen 3 API through Google. And if that's not enough, Andrej Karpathy dropped his deep dive into LLMs like ChatGPT video, which I think, there's an argument, is just as impactful in terms of democratizing access: not making the impossible possible, but making the possible seriously widespread, right? So just thinking that all of this stuff can happen in one week: data science leaders, who I'd say definitely or probably should be exploring the new technologies, how should they be thinking about this, and team leads and CTOs, chief AI officers, all these things, with respect to incentivizing people on their own [00:21:00] teams to do this type of necessary exploration? peter: Yeah. For the leaders, and the way they should think about this,
I think unfortunately they have a hard job, because they have to land and make sense of all this technology that is a whirlwind. Like, I'm a pretty smart geek, but for me to even absorb all this and process what's happening is almost a full-time job, right? And that's not even naming all the papers that drop about all these different kinds of things, like latent diffusion, large language diffusion models; all these papers are literally dropping over the course of the last week. So if you think about all this, then, as a data science or analytics function leader, you're looking at all this stuff happening in the world. And then internally, your problem gets doubly hard, because not only do you have to process all this, make sense of it, and figure out a strategy, but all of your execs are also looking at this and being like, oh crap, we're going to try to make sense of this, we're going to read a bunch of stuff, we're going to get our whatever distillation of these things, and we're going to tell you what we think should be done. And so now you have to carry the weight of [00:22:00] all of that. I wish I had a perfect answer for any of this stuff. I can at least maybe name the struggle and just say, look, everyone's going through the same thing, right? Because the leadership in a lot of larger organizations have been so far removed from technology, and they have been used to, I think, this modality of paying a lot of lip service and, again, viewing technology in this instrumental way. We acquire technology, it's another turn of the crank: oh, how do we buy some AI things to make ourselves an AI company? Whereas you and I are thinking about this, I think, probably very similarly: this is a reset of how you do software, how you do information systems. What does DevOps look like? What does infrastructure look like? Which businesses even survive, from a business model standpoint, at the other end of this, like after five years? These are valid questions to ask right now, because this stuff is so transformational. So my actual answer here, I'll try to be helpful here: the actual answer for the heads of analytics and [00:23:00] maybe ML kinds of folks, even AI officers, is that they really have to create a zone of high-integrity, sane discourse about their business, because there are a lot of forces out there which have their own motives to create a certain hype or a certain narrative around what this is. I was just at the Paris AI Summit and I got quite a whiff of this, right? Because politicians are trying to couch this as being like, oh, the next Manhattan Project. And in the fifties, actually, Tom Lehrer had a song about this: so-and-so gets the bomb, who's next? And it's like the bomb, like AGI. Macron's like, we'll put a hundred billion into making data centers, or into doing research on having a European AI. And then you look at this and you're like, does it take a hundred billion dollars to make a frontier model, especially if, like, DeepSeek just went and distilled a bunch of stuff out of GPT-4, right? You're like, well, I don't know. Maybe you could just get by with not that much money.
And you'd be better served as a nation, [00:24:00] or whatever, dramatically improving the level of education of the people about what the technology is, what it isn't, how it might be used, et cetera. And this is a weird argument, but I get a little tired of the boomers and the doomers on both sides of the acceleration debate. I think at one point in Paris last week, I held up my phone with a screenshot, I think it was actually a Karpathy screenshot, of like 80 lines of NumPy that is the heart of the transformer, like the multi-head attention block, the inference block. That's it, 80 lines of Python code. It's not the bomb. You don't need a whole Hanford site processing and enriching uranium. You just need a pile of tokens, you need maybe something slightly bigger than a university supercomputer, but some kind of supercomputer-level machine, and then you need a hundred lines of NumPy. That's it. Not quite NumPy, but, you know, whatever it is. So the point is, I think the hype around this is doing a tremendous disservice to the actual adoption of this [00:25:00] technology in the corporate setting, but also in the public sphere, the policy sphere. Because if you tell people country X, Y, Zed is making a thing that is going to be the Terminator, Skynet, that will potentially end humanity, obviously people are going to be freaked out about this thing and they're going to start boycotting stuff, right? But if instead it's: no, this is a really amazing, transformational technology that makes cognition cost basically the price of electricity, and we can put this in the backpack or in the pocket of every school child, we can put this on the cell phone of every adult so they have continuing education on whatever topic, in the arts and history and science and whatever they want, then we could actually unlock a tremendous amount of abundance. Yes, there are some dangers around this and yes, there are things we should be aware of and think about, but can we have an adult conversation around this that isn't "I need 500 billion dollars to hand over to Jensen for GPUs" or "we need to shut down all the data centers and nuke them from orbit"? What happened to the sane conversations around: hey, a matrix multiply yields [00:26:00] Shakespeare, so we have to process what that means for us as humans, but meanwhile we should be thinking about how we can radically reduce the cost of so much technology spend and make it radically more effective and make so many things more transparent, more accountable, higher quality? Where are those conversations, right? hugo: Absolutely. And also, where's the conversation, even the conversation we're having now? Because the conversation out there is so model-centric and not data-centric. And I think this is one thing: in a world where the model you use likely won't be your differentiator downstream, when a lot of models are the same, really thinking, at an individual level but then also as a team lead or an executive, about how to make sure your organization knows how to integrate its own data into foundation models. It's becoming more apparent that that's what will provide your defensible moat, essentially. peter: Oh yeah. People have been saying data is the new oil for a very long time, but we didn't have an internal combustion [00:27:00] engine to use that oil until now.
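To make Peter's earlier point concrete: the core of a transformer's multi-head attention block really can be written in a few dozen lines of array code. The sketch below is a minimal, untrained illustration in NumPy; the function and variable names are invented for this example, and it leaves out masking, layer norm, positional encoding, and everything else a real model needs.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads):
    """Minimal multi-head self-attention.
    x: (seq_len, d_model); Wq/Wk/Wv/Wo: (d_model, d_model)."""
    seq_len, d_model = x.shape
    d_head = d_model // n_heads

    def split_heads(t):
        # (seq_len, d_model) -> (n_heads, seq_len, d_head)
        return t.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(x @ Wq), split_heads(x @ Wk), split_heads(x @ Wv)

    # scaled dot-product attention, one score matrix per head
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (n_heads, seq, seq)
    weights = softmax(scores, axis=-1)
    heads = weights @ v                                    # (n_heads, seq, d_head)

    # concatenate heads and project back out
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo

# toy usage with random, untrained weights
rng = np.random.default_rng(0)
d_model, n_heads, seq_len = 64, 4, 10
x = rng.normal(size=(seq_len, d_model))
Ws = [rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4)]
out = multi_head_attention(x, *Ws, n_heads=n_heads)
print(out.shape)  # (10, 64)
```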
Like, we were using it for kerosene, for heating lamps or something, but now we have internal combustion engines, which make that oil really valuable. And so you see it across the board: organizations absolutely see the value in their data, and they're trying to do RAG and vector DBs and all these other kinds of things to surface the organizational intelligence. And I think that's great, and that actually tracks correctly. The obsession with these models, I think, is part of a capital markets and kind of American Silicon Valley dynamic driving the conversation. If you think about it mathematically, some Google engineer wrote about this over a year ago, and some of the papers coming out last year around model representations suggest the larger the models are, the more they do converge to basically being the same model, because they're training on the same pile of data, which they can't talk about. hugo: But also, to your point, the distillation process as well, and the way that can help you make everything significantly cheaper. peter: That's right. That's the weird thing. The models [00:28:00] are, I don't know, I can't even imagine there's even an argument anymore that a model could be a moat. hugo: Honestly, I don't need to tell anyone listening, or yourself, that story. The other thing, I do think you remember, just before the plague, Andrew Ng started some sort of Kaggle-esque machine learning contest, but the model was fixed and you had to work on the data, right? And he was, of course, for very good reasons, and something I'm very grateful for, promoting data-centric ML and data-centric modeling in the cultural consciousness. The conversation has moved significantly back to models because, well, there's a sense of marketing and kayfabe about it, right? They're the sexy things, but the things that will deliver value are still about figuring out your data. So it's about making sure that everyone's aware of that. I do want to ask: we've been talking around the open source ecosystem as well, and as someone who, with a bunch of colleagues, started PyData and the [00:29:00] conda distribution and then built Continuum Analytics and Anaconda, you've been so steeped in and a steward of the open source movement for so long: what happens to open source now? peter: Yeah, that's a very good question. It's a very good question. I've learned a lot, and my perspective on open source has actually changed quite a bit over the last 13 years that I've been doing the Anaconda thing and this journey. I've come to realize that even in my old-school days as a Linux champion and web open source nerd and all that stuff, I viewed open source very much as I think a lot of coders do: it's a license, or it's a community, it's an approach, like we share and we code together and we build cool stuff together, and so on and so forth. And watching the progress and the evolution of the Python data science community, as well as the non-data-science Python community, over the years, it's become quite clear to me that open source as a license is actually not as interesting [00:30:00] as the way that open source allowed people to collaborate in an abundant, non-zero-sum way,
and in doing so create value at much greater capital efficiency than market economics, than capitalism, would like to admit. Most of the people I know in the open source community have progressive leanings, economically speaking, but they're not, like, ardent communists, socialists, whatever. They're not on the vanguard of the Marx-like class consciousness kind of thing. They have jobs, they work for the man, they pay their taxes, they contribute to market economies, they have 401ks. But what they do is they collaborate, and they take their extremely differentiated intellectual skills and engage in craft work, craft that they then give away for free, and they collaborate with other people, and they do things that are not just fun. A lot of open source work is not fun. It is debugging some random weird-ass bug from a user who didn't quite give you enough information to reproduce the bug. [00:31:00] There's a lot of that kind of hard work that goes into it, out of passion, out of care, out of a sense of shared values. And some of those values are meritocratic, about the quality of the code and the quality of the thing we're trying to build. And Wes McKinney, of course, was obsessed with performance on pandas, right? And so he had performance-oriented regression tests set up from the very beginning. So these are the kinds of thought and care that people in this community put into it, to build something which now creates, I think, immeasurable amounts of economic value for people. hugo: Yeah, and it isn't only people creating stuff to build fun things, not that that would be an issue at all, but to be very clear, this collaborative, sharing, abundant process, which is incredibly taxing on people, and nearly all serious open source developers I know have burnt out several times doing it, has fed so much growth in enterprise. We just need to look at current tech stacks, right? And actually most of modern tech: you look at your Airbnbs and your [00:32:00] Metas and Google and that type of stuff. Of course they've contributed a lot to the ecosystem, but so much of it is built on NumPy, right? peter: Yeah. Bill Joy had this great statement that all the smartest people are outside the four walls of your building, no matter who you are. And it's true that Meta, Facebook, have contributed to a number of different kinds of projects. They've done a lot of open source contribution; obviously PyTorch has been a huge boon for the community. Google created TensorFlow and open sourced that and put it out there, and so many other things that they've built as well, Kubernetes and these other things. So these big companies have been creating open source and have been giving back. As a percentage of their market cap it's, let's say, quite small, but nonetheless we appreciate all the contributions as a community. But the thing is, the narrative around how technology impacts or accelerates human thriving: the open source ecosystem almost sucks at marketing on that front. You don't have these people making the cover of Time or Vanity Fair. [00:33:00] It's just a bunch of nerds in the basement; we're still very much seen as that. And so the people who then build the marginal bit on top, which goes and reaches a 50 or 100 billion dollar valuation, they're the titans of industry. But the entire stack of stuff they build on is just a bunch of open source volunteers.
And the reason I'm not bitter about this, really, but I want to name it, okay, I want to name it, is because this does actually speak to the question that you have about open source in an era of AI. Because AI, all the codegen stuff, all the stuff that's going to come for the entry-level and soon probably mid-level programming jobs, all of those things are trained on source code snippets, answers on Stack Overflow, a lot of code on GitHub, GitLab, SourceForge, all these things. So a lot of volunteer effort from untold millions of people, people who have put their lives into [00:34:00] this stuff, has made this kind of incredible technology possible. But the stories that are told, the people who are in the front row of the inauguration, the people who were invited to go speak and hobnob with politicians at the Grand Palais in Paris: we have this real hero myth, this Campbellian hero who is the avatar of all this progress. And we cannot see when it is a community contribution. We cannot really see when it's an ecosystem of hundreds of projects that actually then meet on Zooms or do sprints together to figure out some version conflict or something. All of that stuff is utterly invisible. They're the tenant farmers on whose backs all this stuff is built. And so, as someone who has been through a bit of a journey himself through the open source ecosystem, through software entrepreneurship, starting and running a business, trying a variety of methods to take our revenues and pay those back into open source innovation and maintenance: I think a lot of people just know Anaconda for conda, which manages the packages, but we have put like 30 million dollars into open source [00:35:00] tech over the years. We incubated JupyterLab. We currently maintain Jupyter Notebook. We adopted certain kinds of projects like BeeWare, because I personally care about having Python running on mobile and many, many kinds of platforms. We created PyScript to really push Python and Wasm. And that's not counting other tools like Dask and Numba and Bokeh, these other things that we've created. So we have tried our best to put money back into the open source ecosystem, but that's a drop in the bucket relative to how much work is done by everyone else in the ecosystem. But having been through this journey as an entrepreneur, as a person, as a humanist, as an innovator, I look at this and I say, okay, what happens next with AI, right? What really does happen next? And this is part of, I'll be honest with you, Hugo, this is part of the work I was doing over the last year, thinking about what role there is to play. And I banked on two things. I banked on the models getting smaller, getting more efficient. I will stand by my statement that there is still a hundred-x improvement in model performance available to us, at least. We'll see. And I think the fact that we're [00:36:00] able to go from 10 to a hundred to a thousand tokens per second with small changes in software architecture, the fact that we're able to quantize models that are hundreds of billions of parameters down to a few billion parameters and have 85, 90 percent of the performance, shows us the level to which we're overpaying for these things, right? So I stand by my concept that models will get smaller.
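A rough sketch of one of the compression levers Peter is pointing at: quantization stores each weight in fewer bits (here int8 plus a scale factor), so the same weights take a fraction of the memory with only a small reconstruction error. This toy example is not how any particular model was actually quantized; the shapes, the per-tensor scheme, and the numbers are illustrative assumptions only.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: one byte per weight plus one scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # recover an approximation of the original float32 weights
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=(1024, 1024)).astype(np.float32)  # a toy weight matrix

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# int8 storage is 4x smaller than float32, and the worst-case relative error is small
print(q.nbytes / w.nbytes)                        # 0.25
print(np.abs(w - w_hat).max() / np.abs(w).max())  # well under 1%
```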
All the core foundation models will converge to basically being the same platonic model, with the same representation of basically the corpus of human knowledge, mostly rendered in English, because the low-resource languages are not represented at all in this stuff. hugo: Mistral, though, recently, this week, released, sorry, I can't remember what languages it was, but a model that is not English as well. I'm not sure if you saw it. peter: And what's fascinating is that the structures inside, the neural structures, are different depending on what languages you train on. It's not simply a matter of taking an English model and translating it. There was a fascinating paper that just came out talking about the idea that LLMs trained on Chinese corpora have a different density, possibly a denser neural structure, than [00:37:00] those trained on English. I'm bilingual in Chinese and English, and I could actually buy that maybe a little bit, because of the structure of the Chinese language. hugo: Yeah, I'm just looking now. So, yeah, it's Mistral Saba. It's a 24B parameter model trained on meticulously curated datasets from across the Middle East and South Asia, and I'll link to that in the show notes as well. This actually, I want to take a slight detour, I want to talk about Stack Overflow, Peter, don't worry. Because I actually think Stack Overflow is a very important example of what can happen to a relatively open community of people making contributions, because they want to and have their own incentives to, when models become commoditized. Also thinking about models that are trained on particular languages, so that people can use them instead of having everything go through English, and these types of things. I personally would have loved to envisage a future where Stack Overflow, maybe they have a base foundation model they fine-tune on or something, but where they have something in their UI where they don't necessarily need to use OpenAI, but [00:38:00] they have something which interacts with all of their data, as opposed to all of their traffic being immediately taken out from under their feet. I'm interested in your thoughts on the future of more personalized models for organizations or for services. Same with the New York Times: I would like to see a future in which the New York Times doesn't need to collaborate with OpenAI, where they can have their entire corpus, via some information retrieval system, interact with a large generative model that allows me to query it, essentially. peter: Right. And over the course of the last year and a half, almost a couple of years now, for me it has really been: number one, I think the models are going to get smaller, and so people will be able to roll these things themselves. They won't have to rely on just calling out to a data center hosting some giant 400 billion parameter model. And second of all, the data is where it's at. And right now, the situation around data for LLMs and for deep learning is a complete, total trash dumpster fire. An absolute, you know what, it's a Chernobyl graphite fire is what it is. Nobody wants to talk about [00:39:00] their data. And of the people who do talk about their data, for the most part, with the exception of maybe one or two models, every other model out there is training on stuff that they know they're going to get in trouble for if they talk about it. So what's the end game, guys? What is the actual end game?
The end game has got to be us thinking about how we treat data, and treat content, and treat users' likeness and users' voice, all of these things. We have actually got to create a new kind of system or vehicle or container, something to have a policy and economic discussion around what this is. And so that's why I've actually been working quite diligently on building and scoping out the shape of a new kind of legal and technological framework that would allow the preservation of human authorship and creators' intent, with regard to the ideas in their data, with regard to the [00:40:00] ideas expressed in their artifacts, so that intent would actually be preserved through a training process and would carry through. And so you can imagine it sort of as if you put Creative Commons and the GPL into a blender, right? And you have something that contemplates copyright, text and data mining provisions, fair use in the jurisdictions where it exists, data rights and database rights, right of publicity, likeness rights, privacy, all of these things, putting all of that into one place, or into one compatible set of things. To then say: okay, here's a data artifact. I'm going to take this piece of data, I'm going to put it into my training dataset, and I can do so with full knowledge that I have the consent of the person who provided it. And for the artifact that I yield, whether it's a training dataset or a pre-trained model, I know what I need to do in order to continue preserving that intent all the way through the chain. So [00:41:00] that's essentially the piece I've been working on. And it's very weird, it's very off to the side. I'm not a lawyer; I'm more of a tech guy. I would love to write code as opposed to figuring out legal stuff. But it seems to me that what we're really bottlenecked on, in order for an open, collaborative, abundance mentality to actually take root in this space, for us to actually have open collaboration and transparency about what is in a model, is an environment where model builders feel legally safe to talk about what they put in their model, and where all the people whose work and whose content and whose ideas and whose likenesses appear in these datasets have some agency and some ability to voice their preferences about how that stuff gets used. And I think the existing frameworks of copyright and these other things are insufficient for the task. So that's why we as tech people need to innovate and create something new here. hugo: And remind me what you've called the framework? peter: The working title right now is the Anaconda NL Public License, NPL. hugo: Great. And there is a talk that we can link to [00:42:00] online that you've given about it. Is that right? peter: There is, yeah. I gave a talk at an Internet Archive event about some basic ideas around it. I've put a lot more thought into it since then, and certainly new things have been emerging in the space which have been informing the design. That is something that we're actively working on trying to put forth. It's something that people are definitely, I think, finding they have a need for. hugo: Absolutely. So we'll link to a few things in the show notes that I'll get from you. I am interested in how we think about processes and tools these days. So prior to our
ChatGPT moment, so to speak, and/or Stable Diffusion moment, which preceded it, we had at least some form of stabilized stack, maybe not quite in the ops space, but at least in the analytics tools and machine learning tools space, right? Like, someone would come to me and say, what type of algorithm should I use? And I'm like, if you can use random forests in scikit-learn, build some neural networks in PyTorch, and [00:43:00] use XGBoost, and I think I got that actually from Jeremy Howard at some point, you'll be able to solve most things, right? But then today you look at Matt Turck's MAD landscape, wonderfully named because it is utterly insane as well, with respect to even thinking about what type of tools I should adopt or use. Should I be playing around with CrewAI or OpenAI's Swarm, or trying to build agentic stuff using vanilla Python API calls? So we're in this chaotic, transformative space, right? And I suppose from this point, where can we go? Should we consider paring the ecosystem down to simpler fundamentals, and what will the generators or the primitives in that space look like? Neither of us can answer that, right? But I know you think about this a lot. What type of primitives do you think we should think about using? peter: I think the reason we ended up with the architecture of tooling that we did in [00:44:00] the SciPy ecosystem was because it was built by people who had other things to do besides write software for an ecosystem. And so they had to be very humble about their scope; they had to be very time-efficient. Okay, let's say you have Matplotlib. Very few people will want to go and rebuild Matplotlib. They will try to build their own extension to it, like Seaborn, right? Or they'll build something that's compatible with it. Let's say you don't like the way that some particular data adapter works, or pandas: you don't want to rewrite pandas, but what you could do is wrap something around it, or build an extension that loads a different dataset in a different way, et cetera, et cetera. So as long as the people using the tools are the ones building them, and those people are also very busy with actual day jobs, then you end up with a certain economy of scope. Now, the problem with the modern era, once you get into the MLOps space and into the AI tooling space, is that every single person is viewing their thing as, say, a path to a Series A from a VC, and they want to put a thing out there to get GitHub stars, to go and raise money [00:45:00] around it. And that's a very different ecosystem that comes about from that. Now you, as a developer, you enter this thing and you're like, everyone's trying to get my attention to use their thing. And they've got sexy marketing and landing pages and all that stuff, they're throwing great parties and meetups, and you're like, yeah, but is the tool any good? Are you going to be around in 12 months? What's the tool for? Do I have to learn to think about the problem in only your way? All these kinds of things. And so I think this is one of those areas where, honestly, the capital markets have done a disservice to an emerging developer space, because it's a tragedy of the commons. What they actually want to do is not fund anything until a few things, born in the darkness of the cold, have accumulated a certain number of users who love them because they really, actually love them.
And those things start taking off, and then they can give those people funding. When they flood the space with all of this money, then all sorts of random stuff starts, you know. And this is not to put a ding on anyone trying to build stuff in the space, because if you have an idea for something cool to [00:46:00] build, you should go do that, obviously, I would never tell anyone not to. But I'm just saying, as an ecosystem dynamic, what ends up happening is we have these landscapes of just disastrous mixes of things. And at the end of the day, what are you really trying to do? Like, you talk about agents, right? And yes, the ability to do tool usage and connect that to an LLM is fantastic. But how much of a framework do you need? Really, how much of a framework do you need? So I look at this stuff, and maybe it's because I'm just an old crotchety guy who's not afraid to throw together some system scripts himself, but I feel like that's where we end up, with this framework overabundance kind of thing. And I don't know how we cut through that, honestly. hugo: And this is something I hinted at earlier. I've recently taught a course on building LLM-powered software for data scientists and software engineers. And something about the space I'm really excited about now is how a lot more people can contribute to AI and ML, from software engineers through to, I think, UX and product managers, who get to come in and do a lot of [00:47:00] incredible work. But in this course, people are like, teach us the new agent frameworks, and I'm like, you really don't want to learn them, I can actually tell you. And that isn't to say they're not super interesting, but in the end I spent a week teaching these, and a big part of this course is looking at data to improve product, essentially. What happens is, if you use even one of the most basic frameworks and then start looking at your traces, you recognize that it's forgetting tool calls that you wanted it to make, that it's not only hallucinating, but forgetting a lot of things that have been said to it throughout the process. And nearly every student in this course, and these are people from companies like Meta and Ford and TikTok, right, they said, oh yeah, we clearly need to go and build these ourselves using vanilla LLM calls and put serious guardrails around them in order to build reliable software. And this is if you want to build reliable software, right? If you want to build POCs, totally different. And there's an argument that a lot of capital goes into the demos these days, right? So we're a civilization with demo-itis in a lot of [00:48:00] respects. But I suppose one question I have for you: it's a really interesting space, where generative AI doesn't act like classic software, right? It's nondeterministic, and they're horrible calculators. peter: Well, they're not calculators. hugo: Yeah, but I think there's an expectation that they would be, right, as well. So how then do you think about incorporating them into classic software stacks, where we actually want potentially reproducible results and some form of consistency? peter: Yeah, I think you end up having to be a lot more explicit about your data guards, and basically you build modules at the boundaries of testability. And so you're like, where can I draw a line and say, here's an artifact that this module should deliver to this other module?
And here I know that I can test this from a high-level [00:49:00] perspective. And I think if you just use that general motif, you can start drawing these lines in the architecture. That's not a bad approach in general to software architecture. When I put my software architect hat on, when I'm thinking about building software systems, oftentimes I think the right way to build a particular module or a particular package of functionality is around some business model concept boundary, right? What is a coherent business model thing? What is really the lowest cognitive load for a user of this module, this interface, this package? But when using LLMs to do these kinds of things, I think you actually have to put it along the boundaries of: where can a human who doesn't know the implementation draw a definite yes/no, pass/fail criterion on the inputs and outputs? So it's almost like your loop conditions and your preconditions and postconditions, which you remember from computer science algorithms classes. You have to put those kinds of correctness boundaries in terms of data and in terms of what you're expecting. And that's the only [00:50:00] way to really bound these things and govern them. And I think you have to think quite explicitly about having an evaluation framework. I think your evaluation framework essentially becomes your steering wheel. And that's a very different way of thinking about software development. We talked about test-driven development; now we've really got test-driven development. hugo: And this is actually something that came up in my course. We need to reframe what tests are for software engineers, because software engineers expect tests to pass 100 percent of the time, right? peter: So this is the thing. I called it years ago. I said, look, we talk about ML models, we talk about MLOps and these kinds of things, and putting data science and ML into production. And the big thing that a lot of traditional software engineers don't seem to understand is that a model is a different thing from a piece of software, because its correctness is contingent on its input values, on its input data. Whereas with most software you can actually bound correctness on types and general ranges, with models the actual specific values determine correctness. And with AI, it is [00:51:00] just that in spades. hugo: Without a doubt. And one of the wacky things now is that it isn't just contingent on input, because you can get so much flip-floppy behavior, right? Because they're stochastic, essentially, and probabilistic. You can lower the temperature and that type of stuff to get some form of reproducibility, but it's definitely non-obvious. And I do wonder what this means for even the processes of developing and building software, whether we're entering new paradigms. We all know move fast and break things, and we know where we ended up with respect to that. But now, with this tooling, a lot of people are moving even faster to spin up prototypes, see what works, see what doesn't, and it's a very experimental mindset. And we've seen that in the shipping of LLM products as well. ChatGPT, even for 200 dollars a month on Pro or whatever, is not a stable product, right?
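One way to picture Peter's "evaluation framework as steering wheel" and Hugo's point that LLM tests won't pass 100 percent of the time is an eval harness that gates on a pass rate rather than all-or-nothing. The sketch below is a minimal, hypothetical example: `call_model`, the cases, and the threshold are placeholders, not anyone's production setup.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # returns True if the output is acceptable

def run_evals(generate: Callable[[str], str], cases: list[EvalCase],
              min_pass_rate: float = 0.9) -> bool:
    """Run every case, report the pass rate, and gate on a threshold
    instead of requiring 100% like a classic unit test suite."""
    results = [(case, case.check(generate(case.prompt))) for case in cases]
    pass_rate = sum(ok for _, ok in results) / len(results)
    for case, ok in results:
        print(("PASS" if ok else "FAIL"), case.prompt[:60])
    print(f"pass rate: {pass_rate:.0%} (threshold {min_pass_rate:.0%})")
    return pass_rate >= min_pass_rate

# Hypothetical usage: `call_model` is whatever wraps your LLM endpoint.
def call_model(prompt: str) -> str:
    raise NotImplementedError  # plug in your own model call here

cases = [
    EvalCase("Summarize: revenue grew 12% year over year.",
             check=lambda out: "12%" in out),
    EvalCase("Extract the city from: 'Order shipped to Sydney, AU'.",
             check=lambda out: "Sydney" in out),
]
# ok = run_evals(call_model, cases, min_pass_rate=0.9)
```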
How can we even think about all the experiments we need to do and the types of software we build as a result? peter: I [00:52:00] just, I don't see how people can seriously ship products that have to... Like, a chatbot is one thing, okay. And I guess there's that Air Canada case where the chatbot screwed up and they got fined. But for the most part, with chatbots, the chatbot is, like, looking up some documents and trying to advise the user about some help screen, help desk stuff. But when you're going and doing, like, real LLM-driven stuff for things, I don't see how you don't run your own model, because they're just updating the models in the background all the time. And they have to, because new jailbreaks are found all the time. So they can't not patch the jailbreak, but then if they patch the jailbreak, who knows what else changes when they do that patch? So if you're shipping real product on top of someone else's API like that, it seems to me like you just decided to cast your fate into the wind a little bit. But to your bigger question about how we think about the practice just moving to this kind of very stochastic approach, I actually think that we'll have two things converging, and we'll get less stochastic here, [00:53:00] that will actually put some sanity into this. Number one, I think as models get smaller, and as people start building more chain-of-thought and mixture-of-experts kinds of things, we will get sharper and more specific models that will tend to get pulled in to do specific kinds of things, and just the architecture of this will cause these things to have maybe a bit more correctness, okay, on some of the stuff. And then on the other side of it, I think we may actually, this is just a weird thought, but we may actually think about creating some intermediate, declarative, language-like structure, and have that be the thing that's generated, and have the thing get automatically validated before it gets executed, right? And so now you have models that are not directly just generating a bunch of Python, maybe with some, like, wrong code or even, God forbid, some red-team, adversarial code in there. You actually generate stuff by patching together well-known bits that came in through a [00:54:00] fine-tune or something. So we're not just writing arbitrary bits of Python. You're actually doing more of a stitching process of well-known pieces. I don't know, it's just a thought, but I think as an ecosystem we're going to figure this stuff out and evolve it. This is right now a current problem, but hopefully I think it'll settle out a little bit. hugo: Yeah, totally. And to your point, several patterns I've seen. One is people originally, at least especially with internal testing, using GPT-3.5 or whatever, GPT-4o, and then getting tens of thousands of conversations, then deciding to self-host and fine-tune a model based upon those conversations, getting similar performance. The other thing that we're talking around is how we incorporate classic techniques with modern, generative ones. And one pattern that I've seen be very successful, that actually reduces latency and reduces cost as well: let's talk about information retrieval. You can use more classic techniques like BM25 to do your information retrieval and then have [00:55:00] a generative aspect which produces a generative response.
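To make the decoupling Hugo is describing concrete, here is a hedged Python sketch of a retrieval step feeding a generative step, with the retrieval part exercisable on its own. The term-overlap scorer is a naive stand-in for BM25, and the documents, queries, and generate() stub are invented for illustration; a real system might use an actual BM25 implementation or a vector index.

```python
# A sketch of a decoupled pipeline: classic retrieval feeds a generative step, and the
# retrieval stage can be run and measured on its own, without any LLM call.
# The scorer is a naive term-overlap stand-in for BM25; all data here is invented.
def retrieve(query: str, docs: dict[str, str], k: int = 2) -> list[str]:
    """Return ids of the top-k documents by naive term overlap with the query."""
    q_terms = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: -len(q_terms & set(docs[d].lower().split())))
    return ranked[:k]


def generate(query: str, context: list[str]) -> str:
    """Stand-in for the generative step; a real system would call an LLM here."""
    return f"Answering {query!r} using sources: {', '.join(context)}"


def recall_at_k(labeled: list[tuple[str, str]], docs: dict[str, str], k: int = 2) -> float:
    """Evaluate retrieval alone: fraction of queries whose relevant doc is in the top k."""
    hits = sum(1 for q, rel in labeled if rel in retrieve(q, docs, k))
    return hits / len(labeled)


docs = {
    "faq_returns": "how to return an item and get a refund",
    "faq_shipping": "shipping times and delivery options for orders",
}
print(generate("when do refunds arrive", retrieve("when do refunds arrive", docs)))
print(f"retrieval recall@2: {recall_at_k([('refund for returned item', 'faq_returns')], docs):.0%}")
```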
But then you can do all your evaluations on the information retrieval part, which is significantly cheaper than doing it on the generative part. You're starting to see how decoupling these things, and to your point, we're creating pipelines, right? We're stitching things together. It's Super Mario Brothers once again. peter: It's all plumbing, right? It is a lot of plumbing, and figuring out what's good at what. But yeah, I think it's just, even the thing we were talking about before the podcast, the question of counting the number of letters in a word or something, the standard thing for tripping up one of these models: the models are really good at writing the code to do that. Let the models write the code to do that. And that's actually a very powerful way to use them, right? They can probably generate SQL queries to go and get the information from an actual existing database that will not hallucinate. Let the models do that. So we have this incredible bridge from human semantic intent and technical semantic content into these artifacts, which would have taken teams of programmers. Leverage them for that part of it, and then let these other [00:56:00] things do the bits that they need to do. Totally. That's obviously a very workable approach. hugo: Yeah. And those examples are so wonderful because they're almost like MVPs of LLMs with tool use, right? Well, they bring up a variety of different concerns that, when we think about even experimenting with agentic approaches, we really need to be mindful of. For example, if you're getting your LLM to generate SQL, are you allowing it to execute arbitrary SQL, right? And maybe you give it read-only access, as I would an intern, right? Simon Willison talks about LLMs as being weird interns, which I really like. Then, thinking about the first way I'd get an LLM to actually calculate something I can trust: it's perhaps getting it to write Python code. Do I allow an LLM to execute arbitrary Python code on my local file system, I mean, on my local laptop? Hell no, right? I want to, like, Dockerize it, I want to make sure there are serious guardrails. Or have a human in the [00:57:00] loop at that point, to have a quick check that it hasn't, you know, deleted everything from my local file system. But this brings me to a question that you alluded to earlier on about cybernetics. Like, how do humans and computers... what does the pilot do? What does the machine do, if we're entering a world where we're getting LLMs to do stuff with this type of tool usage? And I love that we've talked about agents. We can think first, as Anthropic has these great blog posts around, of augmented LLMs, so LLMs that can use basic tools, before even talking about agents. How do you think about the cybernetic nature of it, what we do and what the machines do? peter: I think this gets into a few things. I think that the machines, because of the fact that they emit language, it's very easy for us to convince ourselves that they know more than they actually do. There is emergent behavior, which is creepy, and okay, there's something going on there maybe, but for the most part, we shouldn't allow ourselves to be deceived by that dynamic, right? That's a skeuomorphism in a sense, a cognitive skeuomorphism.
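Circling back to Peter's idea of an intermediate declarative structure that gets validated before it gets executed, and to Hugo's point about serious guardrails around model-generated code: a hedged Python sketch in which the model emits a small JSON plan built only from allowlisted steps, and anything outside the allowlist is rejected before execution. The step names, schema, and validator are invented for illustration and are not an existing framework.

```python
# A sketch of "validate the declarative plan before executing anything": the model emits
# a JSON list of steps, and only steps from a known allowlist with expected fields pass.
# A real validator would also check required fields and value types; this one is minimal.
import json

ALLOWED_STEPS = {
    "load_csv": {"path"},
    "filter": {"column", "op", "value"},
    "aggregate": {"column", "func"},
}


def validate_plan(plan_json: str) -> list[dict]:
    plan = json.loads(plan_json)
    if not isinstance(plan, list):
        raise ValueError("plan must be a list of steps")
    for i, step in enumerate(plan):
        name = step.get("step")
        if name not in ALLOWED_STEPS:
            raise ValueError(f"step {i}: unknown step {name!r}")
        extra = set(step) - {"step"} - ALLOWED_STEPS[name]
        if extra:
            raise ValueError(f"step {i}: unexpected fields {sorted(extra)}")
    return plan


# Example: a plan as a model might emit it, validated before any execution happens.
plan = validate_plan(json.dumps([
    {"step": "load_csv", "path": "sales.csv"},
    {"step": "filter", "column": "region", "op": "==", "value": "EMEA"},
    {"step": "aggregate", "column": "revenue", "func": "sum"},
]))
```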
[00:58:00] For the most part, LLMs, even through the transformer circuits, as they're cranking through and building a string of code or whatever else in response, LLMs don't have a lot of context of the outside world as it exists right now, the way that a human just knows, right? And so the human's job is, we say the LLMs are augmented with tools, but the human is augmented with the LLMs. And the human's job is to bring the context to the LLM and say, here's what I need you to do. Here are the kinds of things that you should be thinking about. Here are the sites to look at, whatever else. I would draw the line at the human-to-LLM interface really around: the human calls what parts of the unknown unknowns to engage with, and what parts of the known unknowns and the known knowns to deal with, and how to deal with all of these things. So the human might collaborate with the LLM on defining some of the whats. It definitely leaves the LLM to go and implement the how, although the human might spot-check certain things. Going back [00:59:00] to what we talked about earlier about demarcating, if I have an LLM running some code, I would actually really like to have an interface that just shows me, here's where it's trying to read from the file system, and here's where it's writing to the file system, from this code it produced, right? And the same for network interfaces. hugo: Something you're really talking about there is the abstraction layer at which we interact with machines as well, which, as we've seen in the past couple of decades and historically, the success of PyTorch is in huge part, there are a lot of factors, but due to the API and the abstraction layer through which we can interact with these systems. peter: You can actually imagine, for cases like if you want to sanitize your SQL, right? You can imagine how the LLM generated the SQL, and then you'd have a linter on the outside of it and say, I asked the LLM to play with these tables; are these actually the tables being referenced and the columns being referenced in the SQL query? And if that all checks, then great. You can automate that a little bit. It's not foolproof, but in any case, my point is that I think the human role is to land that kind of context and contextualize the what and bound the what. [01:00:00] The LLM can take the lower side of the what and then turn it into how. The human can check the hows and then go and execute. And yeah, that's still a bit of extra work the human has to do, but the whole thing is still way accelerated relative to the human doing it themselves. And I think that's a very healthy modality to engage with these things in. But this is my point from a little bit earlier, about how I think, as time goes on, we're going to find these additional, better abstractions to put around the LLMs as they're doing tool use, so they don't hurt people, they don't hurt themselves. The example would be like an industrial robot arm. If you're around those big industrial robots, they don't have eyes on them. They have no idea what's around them. If they energize when you're in their orbit, they could whack you and kill you, like instantly kill you, right? So what do we do? We don't forbid the use of industrial robots. We just have a bunch of different things. We've got, like, a big red disarm button six feet away. We draw a big yellow-and-black taped line on the floor to stay away from the robot.
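Peter's "linter on the outside" of model-generated SQL could look roughly like the following deliberately naive Python sketch: check that the query is read-only and only references tables the model was told to use. The table allowlist and the regexes are illustrative assumptions; a production check would use a real SQL parser rather than regexes.

```python
# A deliberately naive sketch of linting model-generated SQL before it runs: reject
# anything that is not read-only, and reject references to tables outside an allowlist.
# Regexes are not a real SQL parser; this is an illustration of the check, not a product.
import re

ALLOWED_TABLES = {"orders", "customers"}
FORBIDDEN = re.compile(r"\b(insert|update|delete|drop|alter|truncate|grant)\b", re.IGNORECASE)


def lint_generated_sql(sql: str) -> None:
    if FORBIDDEN.search(sql):
        raise ValueError("generated SQL is not read-only")
    referenced = {
        m.lower()
        for m in re.findall(r"\b(?:from|join)\s+([A-Za-z_][A-Za-z0-9_]*)", sql, re.IGNORECASE)
    }
    unknown = referenced - ALLOWED_TABLES
    if unknown:
        raise ValueError(f"generated SQL references tables outside the allowlist: {sorted(unknown)}")


# Passes: read-only and only touches allowed tables.
lint_generated_sql(
    "SELECT c.name, SUM(o.total) FROM orders o JOIN customers c ON o.customer_id = c.id GROUP BY c.name"
)
# Would raise: lint_generated_sql("DELETE FROM orders")
```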
Sometimes they actually put cages around them. There are all these things we do just to bound the safety of human interaction with them. And I think, cognitively speaking, it's going to be [01:01:00] no different. We're going to be evolving these agent frameworks and agent tools that do put some kind of sanitization around I/O, the file system, et cetera, those kinds of things. hugo: Yeah, absolutely. And I don't know about the history of industrial robots, but I presume we had to see serious failure modes to... peter: You'd hope that humans had a little bit more foresight. But probably not. hugo: Uh, so we're going to have to wrap up in a minute, but a lot of our listeners are data science leaders and data science executives, and of course, machine learning and AI people. So I'm wondering if you could just give some advice for them on practical takeaways they can apply in their work lives and at an organizational level, to make sure everyone's keeping up with what's happening. peter: Yeah, I think some practical things would be, honestly, like here at Anaconda, we have Slack channels that are open to everyone, not just the tech organization, talking about news that's happening in AI and our internal people's takes on some of this, and some dialogue we're hoping to start up here. We're a bit [01:02:00] remiss on this, we should have started this a while ago, but we want to start, like, a reading group, almost like a brown bag lunch series, right? Where people are making this accessible to everyone in the org, so that product managers, who tend to be pretty technical here, but then sales and marketing and executives, everyone can have a very accessible place. And the reason I think it's important for the leaders, for the data science leaders, to do this is not just at a practical level, like, this helps your org get up-leveled and democratizes the information, inoculates them a little bit against the hype, maybe, but from a political standpoint it's also very good, because it keeps you from getting pigeonholed as the old crusty data science ML person and not the new hot, sexy AI person. No, I'm actually the hot, sexy AI person too. And that would be a way to just grab a little bit of the limelight and the laurels for yourself there, because, let's be honest, for any organization to actually credibly move into adopting AI, it has to have its stuff together on data, on MLOps kinds of things, on being able to stand up and flexibly burst into GPU workloads. All the things that data science and ML [01:03:00] people have been tripping up on for years are the exact same things you're going to have to get good at for AI. So it's not like it's a completely new, weird thing. It's really just an extension of what all the data science and ML people have been doing all along. So that might be a thing to think about, which is to say, look, here are my battle scars from trying to get a GPU cluster stood up in our environment. Here are my battle scars from trying to access these data sets and put them together for this particular project that we did for that group, for that line of business. And the lessons learned that carry over to our AI journey are the following, right? So, all these things, you can see the theme here is: you have every right to put yourself in the middle of the path, in the middle of the conversation on AI within your organization. Have a little bit of creativity and a little bit of chutzpah about going and doing it.
But I think, just from a career advice perspective, that's what I would tell people to go and do, for sure. hugo: I love it. And something I'm hearing in there, and I'm not sure whether this was quite intentional, but I presume we agree on this, is I think there's an [01:04:00] incredible opportunity to bring technical builders closer to the business and business leaders closer to the technical aspects as well, and for people to develop understanding within an organization. peter: Yes. And again, anytime you're not building just a mere, like, text document bot or a chatbot or something, when you're actually trying to use these AI technologies to drive business outcomes, those business outcomes and decisions are almost always driven by classic ML and data kinds of things. So you're going to be a component of that, right? You have every right to drive the awareness about, what does it mean to be Bayesian? What does it mean to explicitly talk about our assumptions and our priors in this, right? These are things that the LLM is never going to tell the executive leader of whatever business group. You're going to have to actually drive that awareness into that group, into that particular POC or whatever they might be doing with AI. hugo: Yeah, I couldn't agree more. Although I would say you can never say what it is to be a Bayesian. You can only say what it probably is to be a Bayesian. I'm so sorry. peter: So many good jokes there. hugo: Yeah, I had [01:05:00] to. peter: What it would have been like if you had been a Bayesian. Exactly. hugo: Conditioned on... peter: Conditioned on you having been a Bayesian, this would have gotten much better. hugo: Yeah. Peter, I always love speaking with you, and I've always appreciated all the work you've done, and how you come and talk about it and bring everything you know to a wider audience. So really appreciate it, man. And thanks for such a wonderful chat. peter: Thank you for having me. Always appreciate our conversations. This is a pleasure. hugo: Thanks so much for listening to High Signal, brought to you by Delfina. If you enjoyed this episode, don't forget to sign up for our newsletter, follow us on YouTube, and share the podcast with your friends and colleagues. Like and subscribe on YouTube, and give us five stars and a review on iTunes and Spotify. This will help us bring you more of the conversations you love. All the links are in the show notes. We'll catch you next time.