The following is a rough transcript which has not been revised by Vanishing Gradients or the guest. Please check with us before using any quotations from this transcript. Thank you.

===

hugo: [00:00:00] What is up, Ty?

ty: Hey, thanks for having me. I'm super pumped to be here too.

hugo: So Ty, you co-founded Continue, which is, among other things, an open source AI code assistant, empowering developers with a modular approach to build smarter workflows. You focus on flexibility, customization, and developer-first design, all things that are close to my heart. And although we only started chatting recently, we have a rich history of people we've worked with, and we both worked with Rasa, for example, on conversational AI stuff, which is super cool.

ty: Yeah, definitely. Between Alan Nichol, who I saw you had a post earlier today on O'Reilly with, which was quite good, and other folks like Vincent at scikit-learn and at Probabl as well, a number of folks. So it's good to finally get on the Vanishing Gradients podcast.

hugo: Exactly. So I'd love to know, maybe you can just start by, I've given a brief intro, tell us a bit about yourself and what led you to found Continue and what you're up to there.

ty: Sure. Yeah, definitely. Really appreciate you having me. Like Hugo said, I'm [00:01:00] Ty, or Tyler. I'm one of the co-founders of Continue. My journey in this machine learning, natural language processing, these days called AI or generative AI, space goes back about a decade. I went to university around the time that deep learning was really starting to come into its own as data, compute, and algorithms got better and more prevalent, and so I was very interested in how language and computation intersect. I studied cognitive science and got very interested in conversational interfaces, or chatbots, which at the time were a natural language understanding component using recurrent neural networks, while the state of the art in industry at the time was state machines, where you would have a deterministic dialogue management approach. I ended up joining Rasa as the first product manager. If you're not familiar, Rasa is an open source machine learning framework. At the time it was, I guess it's now evolved, but an open source machine learning framework for conversational interfaces, for chatbots. And the real innovation we were focused on is [00:02:00] how do you bring machine learning, how do you bring deep learning, to the dialogue management space? Through that experience, I got to experience building for developers, building an open source community, getting that project into about 10 percent of the Fortune 500, really getting my feet wet in how you productionize machine learning in organizations in the real world. And while we were there, we were working on research with OpenAI. We were fine-tuning GPT-3 to be a user simulator, because you couldn't actually put these base models into production without fine-tuning. But what we did think you could do is have them almost approximate your users, so they could talk to your supervised machine learning based chatbot before you gave it to customers and maybe burned some customer relations, and that there was an opportunity for a feedback loop there. And while we were doing that, we were working with some folks at OpenAI, and Codex came out.
So I just said, hey, can I get access to Codex? Which was the Python LLM that powered GitHub Copilot originally. And so I got access for about five of us engineers on the Rasa team, and we were generating [00:03:00] Rasa Action SDK code, and it became clear to me at that point, and then as more models came out, eventually GPT-3 and GPT-3.5 and GPT-4 at one point, that LLMs were going to be, at a minimum, general text manipulation tools that were going to really change how we work as cognitive workers, as people who often are moving and manipulating text, and chief among those are often software engineers. And software has a lot of properties that I struggled with in the conversational AI world, where it was really hard to know what a successful conversation is. But with software, you can say: does it compile? Does it pass a linter? Does it pass tests? There are ways in which you can have signals. Does the developer end up manually editing some code that you generated? Do they merge it in? All of those kinds of signals can be used as feedback mechanisms for the AI system that can make it better. And so, as I was working on side projects with my now co-founder in my free time, we just started using GitHub Copilot, eventually GPT-3 Davinci, and then eventually [00:04:00] ChatGPT more and more in those side projects. And so, with that fundamental understanding from Rasa and that very practical hands-on experience of building side projects, the last side project we ever did is the one that turned into Continue, and we decided to turn it into a company. I've been working on it for about 18 months now. So that's the origin story.

hugo: Wonderful origin story, and congratulations on all the work you're doing as well. I do want to get into what you're up to at Continue, but I also want to say, for those who are building with LLMs and AI at the moment, I do think looking to the work that Ty and people at Rasa and people who've been working in conversational AI and chatbots for a long time have done is actually worth doing. Because, to your point, a decade ago you needed to create a lot more structured workflows and have, quote unquote, intent recognition and all of these types of things, and NER. And now we're at a place where, because we can chat with LLMs, [00:05:00] people have started to try to just get them to do everything, kind of prompt and pray, to the point of the essay you mentioned, which I've linked to in the notes as well. Lo and behold, we've now reached a point where it's very clear that it's the structured workflows with significant guardrails around large language models that are delivering the most value. And I'll link to the LLM database that the people at ZenML have been building and all of their articles around the importance of these structured workflows. But all that's to say is that working in conversational AI, even pre-LLMs, can tell us a lot about how to build these systems now, don't you think, Ty?

ty: Yeah, definitely. I remember the NLP for Developers channel that Rachel Tatman and Vincent used to have at Rasa was quite good in terms of just breaking down machine learning and natural language processing concepts, whether it be tokenization or how vanilla neural networks work. All of that is very important for understanding and beginning to learn how to use LLMs.
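To make Ty's point about signals concrete, here is a minimal sketch of the kind of automatic feedback you can collect on LLM-generated Python: does it parse, does it lint cleanly, do the project's tests still pass? The tool choices (ruff, pytest) and the wiring are illustrative assumptions, not how Continue itself collects signals.

```python
# A minimal sketch of the feedback signals Ty describes for generated code.
# Tool choices (ruff, pytest) are illustrative; swap in whatever your project uses.
import ast
import subprocess
import tempfile

def collect_signals(generated_code: str, project_dir: str = ".") -> dict:
    signals = {}

    # Signal 1: does it even parse? (the "does it compile?" check for Python)
    try:
        ast.parse(generated_code)
        signals["parses"] = True
    except SyntaxError:
        signals["parses"] = False
        return signals  # no point linting or testing code that doesn't parse

    # Signal 2: does it pass a linter?
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
        tmp.write(generated_code)
        candidate = tmp.name
    lint = subprocess.run(["ruff", "check", candidate], capture_output=True)
    signals["lints_clean"] = lint.returncode == 0

    # Signal 3: do the tests still pass? (assumes the generated code has already
    # been placed into the project; running pytest here just records the outcome)
    tests = subprocess.run(["pytest", "-q"], cwd=project_dir, capture_output=True)
    signals["tests_pass"] = tests.returncode == 0

    return signals
```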
And, you know, at the end of the day, [00:06:00] generative AI or supervised machine learning, I view it as software, right? And we're using this software to automate things in our lives and in our work. LLMs just open up a new set of reliable automation tasks. And the excitement right now is we're figuring out: where is it reliable? Where can it be trusted? Where do we need to introduce software 1.0 concepts? Where do we maybe even need to have software 2.0, like classifiers, involved in the process? And where can we leverage LLMs in a way that actually allows us to have confidence that these systems are working like we intended and having the effects and side effects in the world that we want?

hugo: Without a doubt. And it's kind of fun that we're starting to actually be able to build software again, because machine learning has had a lot of software associated with it, but also a lot of, you know, handwritten spreadsheets and those types of things. Funnily, I am currently teaching a course on building LLM applications for data scientists and software engineers, [00:07:00] and I'm just really excited to have 80 people, roughly half data scientists and half software engineers, in the room together. Even the conversations we have around nomenclature are interesting. Take the term "test": we may need to change it when we talk about using pytest to test machine learning pipelines. Software engineers, rightly so, expect tests to pass a hundred percent of the time. If they don't, your app's not working, right? Whereas if any of my tests passed a hundred percent of the time, that would be a bad smell, right?

ty: Yeah. Have you overfit to whatever problem you're modeling?

hugo: Exactly. So, having said all of that, can you tell us what you're up to at Continue?

ty: Yeah, for sure. So we're building the leading open source AI code assistant. The first thing to understand is practically what that means. It's a VS Code extension and a JetBrains extension that allows you to leverage large language models while coding. Whether that be an autocomplete experience, where you're typing and there's ghost text that you can just hit tab to accept, so you don't have to [00:08:00] manually type it out. Or more of a ChatGPT-like experience, where you're chatting: you highlight some code, press Command+L, and you ask about it, getting advice from the large language model to help you figure out what to do next. Or even edits, where you highlight some code and ask for the code to be transformed in some way. And we enable you to do that with an Apache 2.0-licensed interface in VS Code and JetBrains. It's open source for three main reasons. First and foremost, I think this will ultimately be very important public infrastructure, similar to Git and Docker. A lot of the way in which we were able to build really incredible cloud-based systems, really incredible DevOps systems, and other ML and MLOps systems is the result of ensuring that those very fundamental pieces are open source, or at the very least foundational, for the other people in the ecosystem.
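Picking up Hugo's point above about tests for ML and LLM pipelines not being expected to pass a hundred percent of the time, here is a minimal, hypothetical sketch of what a thresholded eval-style test can look like; `ask_llm` and the cases are placeholders, not code from the course or from Continue.

```python
# Hypothetical sketch of a "test" for an LLM pipeline: assert a pass rate over a
# small eval set rather than an exact output. `ask_llm` and the cases are placeholders.
EVAL_CASES = [
    ("What flag makes grep case-insensitive?", "-i"),
    ("Which git command shows unstaged changes?", "git diff"),
    ("What does chmod +x do?", "executable"),
]

def ask_llm(question: str) -> str:
    raise NotImplementedError("call your chat model here")

def test_eval_pass_rate():
    hits = sum(
        1 for question, expected in EVAL_CASES
        if expected.lower() in ask_llm(question).lower()
    )
    # A traditional software test would demand 100%; an eval asserts
    # "good enough, and not regressing" instead.
    assert hits / len(EVAL_CASES) >= 0.7
```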
And so we're very focused on that because then it sets up like developers to really rally around it as well as other kind of tool [00:09:00] makers and vendors to be able to rally around it where they, they can create things that on top of that foundation, knowing it's going to be there, understanding how it works, trusting it when they get a suggestion that's, that's bad or that they don't like from the system, they can go understand how and why the system came to that. So they can figure out solutions, whether that's new modeling techniques or. New rag techniques or new guardrail systems that they, they put in place in order to, get better suggestions in the, in the future. and then lastly, kind of the one that I think is going to ultimately drive a lot of how these systems work is the actual data you generate by using the system by making it open source instead of other vendors in the AI code assistant space, who are actively collecting your data and using that to improve their models on average for everybody in their user base. Often without your knowledge or with your, without your full realization. Instead, we enable you to have access to your data so that you can use it to improve the models or improve how the system works and decide if you want to pool it with other developers in your [00:10:00] organization or with other folks in order to in the longer run have models that are much more capable and systems that are much more capable of helping you to automate software development in the way that you work, right? So you're not having to adjust yourself to whatever the suggestions of the AI code assistant are, but that you're able to adjust AI code assistant to have suggestions that are more like how you build. And I think that's critical because each of us has our own definition of what, what great software is. Each of us has our own definition of what is the subjective right way to do things. And if you're stuck in a world where. You either have to accept the suggestion as that's the right way or go do it manually. I think that's a bad world to end up in. I think a much better, brighter future would be if you get a suggestion from the system that, that you don't like, or that's failing, or that's bad in your opinion, you can go do something about it. And so that fundamental problem, that fundamental frustration is, is what we're really trying to tackle at continue and why we're open source or why our extensions are open source and kind of how we're approaching the problem. hugo: Super cool. So it sounds like I'd have a lot more freedom and control. [00:11:00] having said that, that often introduces more complexity into my life as a developer, right? So how do you think about these, these trade offs when you're building developer tools? ty: Definitely, right? I think you have to find ways to offer sensible defaults, right? Like the first experience when you go try continue, right? Like you're not faced with. The fact that there are six or seven models that you could choose from like model types you could choose from and then probably thousands of models that could fit into those, those types or roles. Instead, you're just given here's the best experience and there's actually subdivisions of it of best overall if you want to use API's if you want to be entirely local. We have that experience with Ollama or LM studio, or you can go kind of dive into the docs and, and begin to learn about those concepts, but upfront, you don't need to, right? 
The most important thing is "time to first custom AI code assistant," is how we put it. But you don't need to know in the beginning that it's custom, or even what a code AI system is. You just need to get it installed, highlight some code, and begin playing with it, [00:12:00] and playing with it starts to unravel the value that you can get from leveraging LLMs while coding. And then, once you start to get suggestions that you don't like or that you wish were better, you can start to explore: what if I changed out the model? What if I changed the system prompt, the rules? What if I created some prompts, because I find myself repeating these things over and over, and put them into a .prompts file? What if I built a custom code RAG system that indexes all of my Jira tickets and my GitHub repos and plugged those in, and could that benefit me, so I don't have to do so many things manually, so much copy and paste? And so the idea for us is always: first, start with sensible defaults. Number two is to progressively reveal things to you, so that the abstractions are available for you to easily swap different components in and out, just like Lego bricks, without having to implement some custom class yourself and learn all about our internal code base. As much as [00:13:00] possible, can it be just configuration? So today we have a config.json. We're currently testing and about to move over to a config.yaml, which is setting us up for a future where those Lego bricks are more like packages and a package manager. And longer term, it's all about making it as frictionless as possible for folks to take advantage of new Lego bricks that are created by tools and vendors, in an interoperable fashion. But it's a really challenging thing to figure out what the right interfaces are for that interoperability to work. So definitely the starting point is sensible defaults. Number two is progressive reveal of different things. And then, as you've progressively revealed stuff, how do you make it as frictionless as possible, once you understand all the concepts, to swap the Lego bricks in and out? That third piece is the one we're working on this year, in 2025.

hugo: Amazing. Sensible defaults, and then slow revelations, depending on what people's needs are. And I actually, so our mutual friend who [00:14:00] introduced us, Vincent Warmerdam. That's the best Dutch I can do, so I'm sorry, Vincent, if I butchered your name. He actually had a post some time ago about how the term sensible defaults makes a lot more sense to him as opposed to best practices, and that we should use sensible defaults a lot more, because it always depends on what situation you're in. I'm interested, so a big part of my background, and one of the communities I feel most at home in, is the PyData and SciPy world. The reason I mention that is because this is a big open source development community, and even in that community, over the past several years, people have been talking about all the proprietary coding assistants. And I've rarely heard someone even say, oh, why don't we have open source coding assistants? Right.
So I'm wondering from your perspective, it feels like a rug has been pulled out from under us or something like that. I've got a similar [00:15:00] critique of. Release notes that we don't get with like open AI now. I think we all need more, more release notes, but I'm wondering from your, your perspective, why, why hasn't there yet been a larger conversation around, more open source coding assistance, even in open source communities, if that makes sense. ty: Yeah, yeah, definitely. I think it first starts like we're at the part where like people just want something that works and that is easy to use and allows them to begin trying it. So I think that's kind of the stage we're at. That's the first part of it. I think the second part is okay. Now that I can try it, like. We're just so many people were so early and figuring out like, when do I personally decide to code something manually? When do I decide to ask the LM to help? Right? When do I give up on asking them to help and do it myself? When is it worth reviewing like all of us? And it's really a personal journey that this requires a large amount of play today to figure out like. When am I faster at, like, typing this specific shell command versus asking for the LLM to help me with this shell command? and it really [00:16:00] depends, like, have I memorized that shell command before, or is that something that I know the LLM is capable of giving me reliably, or does it give me an insecure one, and then I need to, like, go and, and learn about what shell command it gave me to try and figure out what all these flags are, right? And so. I think that's number two, like, and those are two of the biggest ones that I think we're just so early in figuring out how to leverage LLMs while, while coding, like GitHub copilot is, I guess, almost four years old now, but like in the grand scheme of software and the vast majority of us, I think only started in the last year or the last two years, the last three years kind of using, tools, maybe that's a part of it as well as like the first tool was, was proprietary. And it's only been recently that open source models have even existed. So. Yeah. and got, and lately they're just getting good enough. Like with the release of DeepSeek R1 yesterday, like we're starting to really see the gap close in terms of open source versus proprietary models. And so maybe number three would be then that, like, there really wasn't an option to create, an open source AI code assistant, or, it, it was not nearly as [00:17:00] easy, say, even a year ago, let alone two or, or three years ago. And then lastly, I would say we're pretty early in like. Figuring out what are the Lego bricks, even if we do have, models and prompts and rules and context providers and rag systems and all of these things were really early in figuring out how do we actually change those, those, components in order to actually get better suggestions, better results. And I think what will happen is. As folks are able to build better AI code assistance, like so internally at Google and at Meta, at Salesforce and organizations like that, they are already building their own custom AI code assistance. And it's now just, I think, just starting to diffuse beyond those organizations to other organizations as, as best practices emerge, we'll see more and more people be like, Oh, I could, we could do that too, right? Like you would really benefit like, Oh, I didn't realize when I use insert one of these proprietary tools that like gathers our data and like. 
Oh, it'd be really nice to have that data to like create kind of the dashboards to figure out [00:18:00] what trainings we should give people or use that data to fine tune the autocomplete model to give suggestions more like us or use that data to figure out how do we improve our RAG system. And so, I think we're just super early as maybe the, the TLDR across all the examples that I gave. That I think more and more what we're seeing is folks be like, Hey, I want something custom, something tailored to my developer environment, the way that I worked, the tools that I have. but you're right. Like it, in some ways is very surprising that, that many of us have, have decided to just go to use proprietary tools, despite working in, in, in the open source and the data and the, in the Python world and seeing the benefits of, of kind of a modular interoperable. We're in this together approach. And so my hope is that we can remind people of that and get them back focused on, on something that allows us to all kind of, have a, a rising tide, lift all of our boats is kind of put it. hugo: And I do think it isn't something that's happened in an isolated fashion. Right. At the same time, we have. Incredible vendor APIs, which we all love pinging, [00:19:00] which is a really new thing to have black box models that you just ping and you have no access to them. Of course, it was preempted a bit by I, the ability to ping models people think is something new. But of course hugging face is one of the big spaces, quote unquote, where, we, we saw this happen. So there are precedents similarly culturally, I don't wanna get too sociological, but. We've had the rug pulled out of us from owning stuff. We also have subscription services to stream everything now. Right. so I, I do think there are a lot of kind of interesting confluences here. I, I love that you use the term diffuse. I don't know if it was intentional or not, but I do, I do think it is a diffusion process and we're seeing things transforming as well. I, I would love to know, From you, what you see is the. Maybe the main point, the main differences between the developer experience in perhaps you could even talk me through what the developer typical developer experience is using a proprietary coding assistant, and then a typical developer experience using something like [00:20:00] continue. ty: Definitely. Yeah. So, I'm not as, as familiar with the proprietary experiences 'cause I use continue all day, of course. But can I did start off using them like most of us. so I would say that like the proprietary experience is like very nice out of the box, right? Where you often are just putting your credit card in or they, they have some kind of free trial where you can just install it and begin using it. and you don't have to think about at all, like where the, the, the models. What models are being used and like how context is being constructed. You don't have to think about, pretty much anything. You just begin using it and, and you do the auto complete, you kind of chat with the model, you do some kind of transformations with edit and it just works. And that's really nice And and many ways it's the thing that we are always aspiring to with continue, right? It's basically sensible default. but it's, it goes further than sensible defaults. And this is where I think, the differences start to, to come in is it's kind of the only option. 
Lately, many of them, I think, have started to add the [00:21:00] ability to switch out different models, but they're still only the models that they've decided work, and it's generally only for chat, right? They don't really let you decide on your autocomplete model or your apply model or your edit model and things like that. So you basically have everything decided for you. And then, as you're using it, often these tools are taking your data; you're signing terms of service and privacy policies that allow them to use that data for training. Some allow you to opt out of that, but you have to learn that they're collecting it in the first place and then go figure out how to opt out and do that. And then, as you get bad suggestions or wrong suggestions from those systems, you're quite limited. They have some amount of ability to maybe change your rules, or add some different documentation as context, but it's quite limited. And as a result, they have a one-size-fits-all approach, which is great when you begin, but then when you start to realize, oh, [00:22:00] the t-shirt size that they've selected doesn't fit you, you're kind of just stuck. And so then what happens, what we see, is people go, hey, I really enjoy using LLMs while coding, but I really want something that works better for me. And they then embark on this journey of building their own AI code assistant, whether it's a command line tool or they try to create a VS Code or JetBrains extension, and that's a lot of work, right? As someone building those, it's not easy to do everything from scratch, custom. I mentioned earlier that Google and Meta have, but for the vast majority of us, in the beginning it's maybe something we can do, but as we want to really make it a polished experience and have it be something we use every day, what we've found is that people get quite excited about trying out Continue, because it's the in-between of that proprietary, completely vertically integrated experience and the start-from-scratch experience. You still get the benefits of having [00:23:00] something with sensible defaults in the beginning, but then you're able to change those defaults. You're able to understand, when a suggestion comes in, how it came to that. We try, as transparently as possible, to give you logs that you can go see. You can always jump into the code base and learn; you can even ask Continue to explain the Continue code base to you and understand, okay, if you get an autocomplete suggestion, what really caused that. And our documentation is all focused on that, where you can actually go see: this is how the system works. This is how you either manually select context or it automatically selects context. These are the different types of models that you have, whether it's an apply model that uses speculative decoding to very quickly take a section from the sidebar and insert it into your text editor, or the autocomplete model that has a special fill-in-the-middle token that allows the model to be very performant at filling in code in between existing code. Those concepts exist, and they're the things that power your assistant.
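As a rough illustration of the fill-in-the-middle idea Ty just mentioned: the editor splits the file at the cursor into a prefix and a suffix, wraps them in the model's FIM sentinel tokens, and asks the model to produce the middle. The tokens below are the StarCoder-style ones; Codestral, CodeLlama, and Qwen families each use their own names, so treat this as a sketch rather than a universal format.

```python
# Sketch of how a fill-in-the-middle (FIM) autocomplete prompt is typically assembled.
# Sentinel tokens shown are the StarCoder-style ones; other model families use
# different token names, so check your model's documentation.
def build_fim_prompt(file_text: str, cursor: int) -> str:
    prefix = file_text[:cursor]   # everything before the cursor
    suffix = file_text[cursor:]   # everything after the cursor
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

source = "def add(a, b):\n    \n\nprint(add(2, 3))\n"
cursor = source.index("    ") + 4  # cursor sits inside the empty function body
prompt = build_fim_prompt(source, cursor)
# Whatever the model generates next (e.g. "return a + b") is what appears as ghost text.
```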
And so instead of treating [00:24:00] the code assistant like a black box, like you do in the proprietary world, instead, we begin to help you build mental models. Like as developers, we're always trying to understand how things work and continues all about trying to shine that light to help you. Not necessarily upfront. You don't need to be overwhelmed with information, but as you start playing and using it, you start to better understand. this AI code assistant you're working with, and then it's all about like the knobs and the tunes and the dials and the, the different Lego bricks that you can begin to easily swap in and out so that when you get an autocomplete, ghost text, or when the modeler applies to an answer with your question, you're able to get better suggestions going forward. And then as you're using the system, instead of your data being collected and sent off to centralized databases. instead it's all stored locally on your computer and you can do an analysis on it, or you can decide to set up an ELT job on your computer and all of your teammates computers and put that into a central place so that you can see all of your data together in that longer term, right? We [00:25:00] end up in a world where, I mean, as we're kind of moving from the old way of manually programming, which I would say metaphorically is very similar to when we used to program in assembly. That you're now moving towards a world where you're programming in higher level languages. Often you're able to move an abstraction layer higher and use an LLM to do something that you used to do manually. So it's very similar to the assembly to higher level programming language, transition. And the, but the key part about that is it's not about like automating your job away. Instead, it's about amplifying. kind of yourself where in the end you end up building the system that does a lot of the work that you used to do and your new job becomes building and maintaining and monitoring and improving that system versus the kind of proprietary experience. If you take it long enough into the future, where somehow to be determined, they somehow manage to automate everything away and you just don't have a job anymore. So it's very much like. Thinking about as a developer, as a friend of developers, as someone who's built a career on building products to developers, what is it [00:26:00] they want? And I think what they want is, you know, I mean, automate the tedium and repetition and, and anything that is automatable, but that in a way that allows them to have understanding, to be able to, to configure and play and to ultimately, be in control of the system. And so that's very much what the continue experience offers relative to. Kind of the proprietary experience. hugo: Wonderful. That's, that's such a wonderful vision. And it seems eminently sensible as well, to be honest. And it's something that I'd like to, to be able to use. I am, I am very attracted to, you know, there's this old question like, Oh, once, you know, the machines create all the value, won't we still need people to fix the machines? And whether that's, you know, That age old question aside, I think it is relevant here to think through the idea of part of our jobs becomes figuring out how to put all these different Lego blocks together to build our system to help us. build software, but also allows us to [00:27:00] focus on systems design more. 
And this is something I'm very excited about, and with your background in product, I presume, we haven't talked about this explicitly before, but the roles of product, UX, and systems design, now that we have things that can help write our if-else statements or whatever it is, are actually a fantastic win for the space more generally.

ty: I very much agree, right? For a designer to be able to start to do some front-end prototyping in a way that they couldn't before, or a writer or a product manager to be able to help fix bugs and solve problems for themselves in a way they couldn't before. I'm quite optimistic about the ways in which all of us can be more deeply involved in the process of building software as a result of this, and have it be a more collaborative, fruitful kind of enterprise. These tools should be thought of as: how do we use them to enhance collaboration, or amplify collaboration, among all of us? By making it so that it's not, I have [00:28:00] my specialty and you have yours, but that all of us are able to go into domains we weren't able to go into as much before, so that we can have better, clearer conversations and explore together. It's not me knowing my thing and protecting my kingdom, but really benefiting from learning and listening, and my collaborators doing the same. And I think this kind of unlocks that. So I think the definition of developer is even evolving right now, which is super cool, as more people are gaining confidence that they can build software. My definition of a developer is probably someone who builds software, who doesn't necessarily need to have the title of software engineer or something like that.

hugo: Totally, without a doubt. And I'm also excited about the other direction: that those of us who are developers, people who've written code and built software, don't have to worry as much about the minutiae of, say, debugging pylint stuff, and can actually focus more on building systems and user experience, all of these types of things as [00:29:00] well. I'm really excited to hear, because you think about this so deeply and work on it and live and breathe it: for people building these systems and writing code, when thinking about building your own coding assistant, what types of Lego blocks do you need to consider? What's the space of available options? And feel free to give me a graduated approach: when you first use Continue, what do you think about, and then move through from there.

ty: Definitely. In the beginning, there are usually, I would say, three main things that people consider once they've decided they want to know a little bit more. The first is: what is the model that you're going to use for chat, your almost ChatGPT-like experience? This is often the largest model that you could use, so a Claude 3.5 Sonnet from Anthropic or a Llama 3 from Meta, models that are very good at the Stack Overflow replacement questions or other kinds of consultation you might do. Then they figure out their [00:30:00] autocomplete model. And this is where people have their first insight. They're like, oh, you don't use GPT-4o for autocomplete. It's like, why not?
And, well, it turns out that smaller models actually do better at autocomplete than the larger models. Not only are the completions better, but they're also faster and much cheaper to run. Why run a 400 billion parameter model when you can run a 12 billion parameter model that gives you much faster responses and is far cheaper? And that was, in many ways, the first realization: it's not one single ChatGPT-powered model; the reason to have AI code assistants is, in many ways, because there are multiple models. And then the last one is your embeddings, so that you can embed your code bases and your docs and other software development lifecycle data. A good example would be from Voyage AI. They actually have code embeddings, which is super cool. I think it's called voyage-code-3, their third version of it, and they've trained the embeddings [00:31:00] on text-to-code, on code-to-text, on code-to-code, and they actually perform better than general embeddings that are trained just on, say, English and maybe have some code tokens in there. So those three models are generally the first models that people begin to play with. Some people like to have an Ollama experience where it's all local on their computer, and they can have that, though they have to figure out how much VRAM they have in order to fit all three of those models. Others will say, I just want the best possible, and I will use proprietary vendors, and they'll select Voyage AI, Anthropic, and Codestral from Mistral AI and plug those in. Some will do a mix, and some will deploy models in their own infrastructure. So that's usually where people get started in terms of customization. And then over time there are things like your apply model, which could be the same as your chat model, but actually that task, once again, can use a model that's much smaller than some of these bigger models, so maybe you use Haiku from Anthropic. And then your chat and your edit model are usually the same, but [00:32:00] once again, that task is a little bit different; you don't necessarily need the best model. And we're seeing more and more of these models. When we first started, there were just the three, and now I think we're up to six or seven that the most advanced users customize. So models, as the thing that's really driving this AI revolution, are some of the most important customizations, where you're going to see the biggest gains. But then, in addition to that, you're able to do things like determine your rules, which is our word for the system prompt, where you can say, oh, I'm programming in Kotlin, but I'm kind of a TypeScript programmer by default, so teach me Kotlin as if I'm a TypeScript developer, and the model will follow that going forward. Or, I'm on macOS, so stop suggesting Linux and Windows commands and just give me macOS commands. Or, this is how I do React and these are the components I like. That can be really helpful, so that every single time you're sending instructions to these different models, they come along with things the model should always remember, rules it should always follow.
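To summarize the division of labor Ty describes, here is an illustrative sketch in plain Python rather than Continue's actual config format (whose exact keys depend on the version you run); the model names are examples only, so check each provider's current naming before copying anything.

```python
# Illustrative only: one model per role, which is the point Ty is making.
# This is plain Python, not Continue's config.json/config.yaml schema.
ASSISTANT_MODELS = {
    # Large model for chat -- the "Stack Overflow replacement" questions.
    "chat": {"provider": "anthropic", "model": "claude-3-5-sonnet-latest"},
    # Small, fast, cheap model for tab autocomplete (and often apply/edit too).
    "autocomplete": {"provider": "mistral", "model": "codestral-latest"},
    # Code-specific embeddings for indexing codebases, docs, and tickets.
    "embeddings": {"provider": "voyage", "model": "voyage-code-3"},
}

def model_for(role: str) -> dict:
    """Look up which model handles a task; swapping a 'Lego brick' is editing one entry."""
    return ASSISTANT_MODELS[role]

# e.g. model_for("autocomplete") -> a small code model rather than a frontier chat model
```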
Then, if you find yourself [00:33:00] doing repetitive tasks, we see people customizing prompts, where they can just do a slash command, say /test, and it expands into a prompt that is very thorough about how to generate unit tests for that person, so they don't have to type it out every time, because these models do super well when you get very specific and clear with your instructions, but typing that every single time is a little bit annoying. In addition, there are docs: the ability to pull in your scikit-learn docs, for example, or your PyTorch docs, where you can add the PyTorch docs and ask a question, and it embeds those docs and then uses a RAG system to go and retrieve them, similar to the codebase index. There are other context providers that allow you to pull in your Jira tickets or your Confluence docs. There are tools, which is a newer one: giving the model the ability to take actions in the world. So MCP, the Model [00:34:00] Context Protocol from Anthropic, is getting people really excited about, what if it could write a SQL command and execute a query that pulls some data out of a database, which then shows up as context that can be used by the model? And then lastly, and I've covered the main building blocks today, the last one is data. What is the shape of that data? Where do you store it? And then how can you use it? This is the most open-ended one, and the loop needs to be closed here. Right now, all we do is help you store it and put it somewhere. But how do you build dashboards on top of it? How do you actually use it as fine-tuning data or training data, or data to adapt these models in a domain-adapted fashion where you do continued pre-training? All of that is a little bit more advanced today, but what we see most people do is at least collect the data, and then the first folks are building dashboards to better understand it and doing analysis on it. So those are the main building blocks today, with more emerging over time.

hugo: Super cool. And I am interested in starting to talk through and reason through how people could use the data that they get out of your system and all of their work. We have a great question [00:35:00] from someone with a great name in the chat, Powerforce 22, that I wanted to ask anyway. He's asking about the MCP spec from Anthropic, and Powerforce has said few LLMs are compatible, while OpenAI function calling has wider support. So what are your thoughts about the MCP spec? A more general question as well: building something which requires you to plug and play all of these things that may have slightly different protocols, how do you think about interoperability and composability and those types of things? So maybe the MCP question first, then we can speak more broadly to interoperability.

ty: Definitely. I've been excited about the excitement around MCP. It would be nice if, instead of every lab coming up with their own standards and interfaces for very common, similar things, we could standardize. So I was very excited to see Anthropic push that out. And with the advent of Claude 3.5 Sonnet and GPT-4o, and even some open source models, function calling is becoming more reliable.
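Since the question contrasts MCP with OpenAI-style function calling, here is a minimal sketch of the latter, using the write-and-run-a-SQL-query example from above. The tool name and the confirmation gate are made up for illustration; only the request and response shapes follow the OpenAI API, and an OpenAI-compatible local server can be targeted by changing base_url.

```python
# Minimal sketch of OpenAI-style function calling, with a hypothetical SQL tool.
import json
from openai import OpenAI

client = OpenAI()  # or OpenAI(base_url="http://localhost:11434/v1", api_key="ollama") for a local server

tools = [{
    "type": "function",
    "function": {
        "name": "run_sql_query",
        "description": "Run a read-only SQL query and return the rows.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string", "description": "A SQL SELECT statement"}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "How many users signed up last week?"}],
    tools=tools,
)

for call in response.choices[0].message.tool_calls or []:
    args = json.loads(call.function.arguments)
    # A "runs with confirmation" tool, as Ty describes next, is just a gate like this:
    if input(f"Run `{args['query']}`? [y/N] ").strip().lower() == "y":
        pass  # execute the query, then send the rows back as a `tool` role message
```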
And so, [00:36:00] in Continue we actually allow you, it's not on by default, but you can click a tool button and basically go into that mode. And when you're in that mode, you can decide: okay, is this a tool that runs with confirmation from me before it runs, or is it a tool that runs automatically once the model determines it should? I think by default we only had Anthropic support to begin with; I think recently we've added OpenAI and Ollama support, but it's really a limitation of the models, and we expect the models to get better. Between OpenAI and Anthropic and others, they're calling it the year of agents, for better or worse, that term, but I think what that practically means is the models are going to be better at using tools and benefiting from them. And so with MCP, ideally all of us standardize around that, or maybe not necessarily that, but around something, right? Because, to tie into the second half of your question, it's really important that we [00:37:00] do start to have some amount of standardization. In the APIs for LLMs, for example, a lot of people follow the OpenAI spec, which is really nice, because then all you have to do is change your base URL and you can easily swap out your OpenAI model for your Ollama model, for example. And so if we can have something similar for MCP, or at least for the concepts that MCP represents, whether it's prompts or tools or those other things, I think that's critical. And I really appreciated how they did it: it's just file in, file out, it's a local experience. We don't have to think about permissions in the way that you would if you had to have a full MCP server that you deploy, which I think they're working on, but that ends up having a lot of challenging aspects to it. And, to round out the answer to your question, this is what I was hinting at earlier, what we're going to be launching in the coming months around packages. I think that's the critical component: you need a sort of package manager where you can define, these are the different Lego bricks, and there is a [00:38:00] configuration file and format that allows you to easily plug those in. So we're moving from a world where Continue has its own config, that's just JSON, where you have to define every single Lego brick yourself, understand all the concepts, and compose it by basically understanding the Lego bricks, to a world where the Lego bricks are packages that can be composed together. You can say, okay, someone is going to create the Python package, which includes rules for Python, the docs for Python, the prompts for Python, and then you can just take that, combine it with your package that has all your models in it, put them together, and begin using them as a custom AI code assistant when you're using Python. And then we're going to make it super easy so that you can switch and say, oh, I'm now going over to Terraform, because I need to actually deploy my Python application and I want to use the HashiCorp ecosystem to do that, and I don't know anything [00:39:00] about Terraform.
And so you have a separate assistant for your, for Terraform that maybe was created by someone at HashiCorp that allows you to easily get started with. Without having to understand anything about like how to create a custom Terraform assistant, maybe in that case, you just go use that one. And then you use that while you're using Terraform. And then you end up generating more Python data for the Python assistant and more Terraform data for the Terraform assistant. And maybe in the long run, we could live in a future where you take that data and you donate it back to the person who created that assistant, or you do some kind of arrangement where maybe you pay with your data effectively, right? Instead of paying for money to use their models, instead, you, you give them data to help them improve their models, but it's very transparent and upfront and, You're able to get the benefits, of having something custom. So that's where we're heading. And as a result, we think that's, what's going to be, the way that we solve the, okay, Lego bricks and building is fun, but maintaining is not fun, right? Dependency conflicts a [00:40:00] year from now are not fun, right? So how do we make that as, as minimal as possible? so that you get the best of both worlds where it's easy to get started. It's easy to keep up, but it's not so restricting that, that you can't. You can't change it at all. hugo: I appreciate all that context. And I do love the vision, which I share, I think is somewhat optimistic, but us actually having a transparent market for data and recognizing the value that that data brings, as opposed to, you know, the way that I think tech has evolved is antithetical to that philosophy in a, in a lot of ways. I am interested in now thinking through, I love that. anyone who uses continue and part of the vision is you're able to, then get all your own data. So I'm wondering what next steps, how you see people using this type of thing. Cause where my, when you were telling me about the experience, I was like, Oh, next, would I want to fine tune on my data or would I want to. Do some prompt engineering or even having prompt [00:41:00] templates or these types of things and use my data to inform what type of prompt templates I have. So I'm wondering if these types of techniques are part of the envisaged experience or what people do next. ty: there's actually a startup out there, called DataCurve. ai, which is creating, they call it premium curated coding data for applications and LLMs, right? What they're doing is they're, they're helping folks, that are our software engineers create examples of themselves reasoning, which the labs are then using to help create some of these awesome models for, for coding. And so in the case of where I think AI is goading with this, with these reasoning models, like the data that you're generating while using these systems. is, is really critical, right? It's the data about what happens in between Git commits. It's the step by step process you take. It's what context you pull in. It's maybe even to some extent the implicit why you decided to, to, to, take this step instead of another step. And you're kind of explaining it in the models, explaining it as you're working with it. And then also you're able [00:42:00] to see, okay, if I did accept an autocomplete suggestion, did I go edit it? If I edit it, did it stay the same 5, 10 minutes later, an hour later? Did I ultimately merge it? Was that code still in the code base a year from now? 
All of these kinds of signals are a sort of annotation that just happens, not as additional work. That was a learning from the supervised machine learning era: anywhere that you require people to do additional annotation beyond their work, it is a lot more work, it sounds like a lot of work, and therefore it won't get done; it's not something we can bet on. But I do think what we're seeing the early signs of is the labs taking advantage of that data. And as they do, I think we're also seeing a number of hobbyists and individuals begin to explore fine-tuning, whether supervised fine-tuning or this new reinforcement learning fine-tuning, and the data you're creating here can be used for that. Take autocomplete models, for example: every time you accept a suggestion, that gets recorded, and you could [00:43:00] use that to fine-tune models for autocomplete, which are much smaller than the larger models, generally between 1 and 15 billion parameters. There are things like domain-adapted, continued pre-training, where you start with a base model, say Llama 4 when it comes out, probably this year, and you're able to continue the training process. Or the way that these new reasoning models work: in practice, it seems like they're distilling them from very large models into very small versions, because they need them to run fast enough, and I think the distribution of your data, and all of your teammates' coding, would be a really good distribution to use as part of that distilling process. Many of the things I'm describing here are very cutting edge, and very few people in the world know how to do them, but I'm starting to see them diffuse and democratize further and further, and they would enable us to actually use that data to customize the models. But that's not the only thing you can customize. You can also customize the systems, right? So that's creating dashboards and doing [00:44:00] analysis of the data to figure out, oh, everybody at our company is asking questions about Docker; maybe we should create a training module for the team on Docker, or run some workshops on it. And then, as you create the content about how we do Docker at our work, you can feed that in as context, documentation that can be leveraged by the models, and maybe in the future leveraged as training data for the models. But then also those other Lego bricks, whether it's rules or context or docs or your codebase indexing: you could start to play around, like, what if we swapped those out? What if we used a different prompt here? What if we changed a rule? How does that affect, ultimately, the acceptance rate for us? Or, if you have a large enough team, and some of these enterprises have tens of thousands of engineers, how does that affect the outcomes we have? As you can see from my answer, there's lots of opportunity to try and see, but I think it's worth pointing out that at this point there are no hard and fast best practices or [00:45:00] sensible defaults for how to use this data. We're seeing a lot of people just get excited about the potential of it and begin to use it, first in the large AI labs, but my hope is that over time many, many more folks are able to take advantage of it.
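As a sketch of the "use your accepted completions as fine-tuning data" idea Ty describes: read locally logged autocomplete events, keep the accepted ones, and write prompt/completion pairs. The file location and field names below are assumptions for illustration; check where your assistant actually writes its development data and what schema it uses.

```python
# Sketch: turn locally logged autocomplete events into a fine-tuning dataset.
# The path and field names are assumptions -- inspect your own assistant's data
# directory and schema before relying on this.
import json
from pathlib import Path

EVENTS_FILE = Path.home() / ".continue" / "dev_data" / "autocomplete.jsonl"  # assumed location

total, pairs = 0, []
with EVENTS_FILE.open() as f:
    for line in f:
        total += 1
        event = json.loads(line)
        if event.get("accepted"):  # keep only suggestions the developer accepted
            pairs.append({
                "prompt": event.get("prefix", ""),          # code before the cursor
                "completion": event.get("completion", ""),  # the accepted ghost text
            })

print(f"{len(pairs)}/{total} suggestions accepted")

# Write prompt/completion pairs in the JSONL shape most fine-tuning tooling expects.
with open("autocomplete_finetune.jsonl", "w") as out:
    for pair in pairs:
        out.write(json.dumps(pair) + "\n")
```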
And this becomes something that all of us are able to use and benefit from, rather than just a few people.

hugo: Good. You almost just convinced me to try to convince you to hire me or something like that. There are so many wonderfully exciting things in there, and it's such early days, and there's a huge amount of cool stuff to work on. I actually couldn't say it better than Powerforce 22, who has written in the chat: "These traces add a lot of depth to data sets. Currently scraped code is a static picture. This new data is a time series of a project being built. Exciting!" And I think that puts it...

ty: Yeah.

hugo: That's so cool, man. And I just want to, so a pattern I've seen time and time again over the past 18 months is someone [00:46:00] starting a project by using GPT-4o or Claude Sonnet, one of the big models, getting a whole bunch of data out of those traces and conversations, and then fine-tuning smaller models on that data. Is this something you can see people doing with AI coding assistants as well, being able to reduce the footprint, the size of the models, by cleverly fine-tuning on data they've already generated?

ty: Yeah, that's kind of the future, if you're going to ask me to make predictions, that I think we're heading towards. First we'll have base models that are specialized for functions, and we're already seeing that today, whether it's autocomplete or chat or apply; those are already happening. So I think we'll have base models, models that are trained for a generic autocomplete or a generic task, but over time I think what you'll see is people adapt those, say, in the case of autocomplete, for Go. I don't know if you saw David Crawshaw's post, "How I program with LLMs"; it was on Hacker News and I [00:47:00] think on Ars Technica a week or two ago. He has a new startup, or a new project he's working on, called sketch.dev, and there they're focusing completely on Go. So I can imagine them creating tools and environments and models that are particular to Go. And then, once they have those models out and available, you can actually take those models and further fine-tune them for how you do Go. I think those models are going to be much smaller and much more focused on specific tasks. So my view of the future is really strong base models for specialized tasks, which maybe you or other people then adapt for things that are a little bit closer to your domain. Maybe it's your programming language, or, you know, you do tabs and not spaces, or spaces and not tabs, and someone domain-adapts it for that. And then you may take it even one step further and adapt it for your company or your team, or maybe even for you individually. So I definitely view that future coming, [00:48:00] but it's going to take time for us to get there, for the skills and knowledge and technology to be democratized and better defined so that there are best practices. It's kind of like how, back in the day, DevOps was called web operations, and it was something that was only done at Amazon. Then eventually it became a practice that we all followed. And we're still, I think, in many ways early in our journey of knowing how to do DevOps and SRE and all of these kinds of things. It took 20 years to get even those terms.
Right. And so I think that's where we're at at the moment. If we fast forward a decade or two from now, if a lot of our software is going to be written by AI code assistants, by these AI systems, then there are going to be whole new roles potentially, but definitely new functions for developers, if not an entirely new role, that are about making sure that system codes in the way that we want and works like we want. If it gets really good, maybe it's just about figuring out [00:49:00] what part of the system it should work on next, what it should do next, what we would want to automate next. But if we don't get that glorious a future, where all we have to do is sit around a table, as if it's Hacker News, and just argue about what to do next, there are shades of that future that are more hands-on, that involve more building and more fine-tuning and using the data and looking at it, which I think are likely to be precursors to that much longer-term future, if it ever comes. Otherwise, I think the future is a lot of us building the systems that build the software.

hugo: Yeah, fantastic. I am interested as well, if we're working more and more with data that results from human-computer interactions, synthetic data, but also conversations with LLMs and all the traces we've been talking about: do you have any concerns, or how do you think about model collapse? And correct me if I'm wrong, and for people out there who haven't heard of model collapse, it's quite intuitive: you generate a lot of synthetic data and then retrain [00:50:00] models on it, and there is some evidence that these models can become less and less good because they're being trained on their own exhaust, essentially.

ty: Yeah, potentially. There's also other evidence I'll just put out there, especially with DeepSeek coming out and the R1 model, that synthetic data actually helps us really improve models. So it could go in the opposite direction: these reasoning traces of models could themselves improve models. I think it's too early to say. When we first talked about this, Hugo, I probably had a different answer, because I didn't know as much at the time about test-time compute and these new reasoning models and stuff like that. So in some ways I'm still shifting in terms of, is it going to be model collapse? Because the internet is going to be flooded with "I am ChatGPT." Look at DeepSeek: I think if you ask it who it is, it says it's ChatGPT, even though it's a DeepSeek model, because there's just so much training data on the internet that is from ChatGPT, claiming to be ChatGPT. So that's definitely an example where, oh no, DeepSeek models should say they're DeepSeek, not Chat[00:51:00]GPT, not created by OpenAI. That's an example of a sort of collapse. Will it result in models that are not useful, or mean we can't get any better models? I'm skeptical of that, because of what I'm describing: I think the future is humans being very heavily involved in the process of creating data that helps guide it in the right direction. Like me coding, and maybe, in the longer-term future, people are talking about even this year, computer use models. Maybe it's a future where I just always record my screen over time, and the models are capable of
using computers the way humans do, so that recording of my screen could itself be used as training data. And then, over time, the models aren't going to be able to do everything immediately, but maybe they can do some parts. I press the continue button and it carries on, building the thing a few steps ahead of me. When it goes wrong, I pause and say, no, you did this wrong, I'll take over and do it, and [00:52:00] then I hit play again and it does the next steps of the process of building software that I would have done. Over time I'm very much guiding it. And I think that, alongside the kind of self-play, synthetic-data approach that allows for much faster cycles (because what I'm describing, with humans creating a lot of the data and guiding the process, is much slower than just letting the model go and do it), that's what the labs are exploring right now, from my perspective: figuring out in what ways it leads to better models, in what ways it leads to collapse, and in what ways it's just useless. In general, my view on synthetic data and simulation is that it's very tricky to get it to map the distribution of reality, and that's ultimately what we're trying to model, what we're ultimately trying to automate. So you have to be careful; otherwise you can end up building models that model some other universe than the one we live in. If your data is based on [00:53:00] reality, it's much easier to build models of that reality; if your synthetic data or simulations aren't good approximations of the distribution of reality, you can end up with something that isn't fit for production and maybe even leads to collapse, and there are early signs of that. So the jury's still out; that's my TL;DR on synthetic data and simulation. hugo: I'm also very glad you mentioned 2025 as the year of agents. I'm bullish on it being the year of agents, but in a different sense: I think we're going to find out a lot about what doesn't work with agents, and about how we need more structured approaches and better-defined business logic. But I am interested. One of the hello worlds of computer use is getting your LLM to access a calculator in Python, and I love that a hello [00:54:00] world of computer use is using a calculator, because LLMs are software that are horrible calculators; there's a beautiful joke in there. But this is actually really interesting, and I'd love to know how you're thinking about it with what you're building. Say I build my first hello world where I ask an LLM to create a Python script that acts as a calculator, and I give it access to my computer to write and execute that Python code, whatever it is. That sounds like a horrible idea to me: you definitely don't want an LLM to be able to write and execute arbitrary Python code on your local system unless you've Dockerized it and made it very, very safe. Of course, you could imagine a flow
like: generate the code, have a human in the loop read it, and only then click to execute. So I suppose [00:55:00] I'm just wondering how you're thinking about these things more generally, for you personally but also in the work you're doing, in terms of how we allow these agentic approaches to do things, either in the software world or the physical world. ty: Yeah. From our perspective, we separate things into two buckets. One is "inflow": I am doing a task myself and deciding when to let the LLM take over parts of it; I'm very much deciding when to bring it in. Longer term, maybe it's not just reactive but also proactive, and the tricky part there is that it can get really annoying; if it gets annoying, you turn it off. It would be nice if the LLM suggested something as I'm coding, but people churn when you suggest something that isn't good. So inflow is one aspect, and there it's what you were describing: you need to figure out where it [00:56:00] requires human review and when it requires human confirmation to take actions. Until the models get really reliable, or maybe even once they become very reliable, it really depends on your comfort level: are you actually going to let it run arbitrary commands on your computer or not? That's where the second bucket comes in: async workflows, which is the term I use instead of agents. Vicky Boykis, the ML engineer, had a good description of Devin, the "AI software engineer," on Bluesky. She said something like: I think I've figured out what it is; it's a Dockerized Slack bot web app with endpoints to, presumably, OpenAI completions, and some variation of Open WebUI as a front end to a Docker shell and other services. I tend to think in that way rather than "AI software engineer," and when you think in that way, you realize it is a system, and that it runs somewhere, so maybe I should sandbox it. I'd be much happier letting the so-called agent or AI software [00:57:00] engineer go do stuff for me if I know it can't destroy my entire file system. So maybe it's about being inflow in the IDE, in the AI code assistant in VS Code: you run a command or tell it to go off and solve this bug for me, it goes off and solves it, then comes back, and you review the process. Maybe it goes wrong at step seven of a fifteen-step task, so you say, hey, do this instead of that, and send it off again, and it completes steps eight through fifteen now that seven has been fixed. We have to figure out what that paradigm is. It's similar with our phones: notifications can be the most annoying thing in the world, and you end up turning them off if you're overloaded. But if the system has no way to notify you when it gets stuck or goes wrong, you might be waiting a long time. So how do you build something that's either inflow or async, or some combination of the two, and how do you do it in a safe and reliable manner?
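As a concrete illustration of the "review it, then run it somewhere it can't hurt you" pattern discussed here, below is a minimal sketch. It assumes Docker is installed locally and uses standard docker run flags; the human-approval prompt, image choice, and resource limits are illustrative, not how Continue or any particular agent product actually executes code.

```python
# Minimal sketch of a review-then-sandbox loop for model-generated code.
# Assumes Docker is installed; flags shown are standard `docker run` options.
import subprocess
import tempfile
from pathlib import Path


def run_in_sandbox(generated_code: str, timeout_s: int = 30) -> str:
    """Ask the human first, then execute the code in a throwaway container
    with no network access and only the snippet mounted read-only."""
    print("---- proposed code ----")
    print(generated_code)
    print("-----------------------")
    if input("Run this in a sandbox? [y/N] ").strip().lower() != "y":
        return "skipped by reviewer"

    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "snippet.py"
        script.write_text(generated_code)
        result = subprocess.run(
            [
                "docker", "run", "--rm",
                "--network", "none",          # no outbound calls
                "--memory", "256m", "--cpus", "1",
                "-v", f"{workdir}:/code:ro",  # only the snippet is visible
                "python:3.12-slim",
                "python", "/code/snippet.py",
            ],
            capture_output=True, text=True, timeout=timeout_s,
        )
        return result.stdout or result.stderr


if __name__ == "__main__":
    print(run_in_sandbox("print(2 ** 10)"))
```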
[00:58:00] And you were talking about what permissions you give it; that is, I think, one of the biggest open questions in all of these systems. They need context data, but how do you fit into our existing web of RBAC and SSO and all of that, to make sure the system has access to the data it needs, but not more than that? That's a really tricky part. I want to give it access to my Slack messages, but not my coworkers' Slack messages, and not messages from channels I don't have access to. And I don't want anybody else to be able to use this bot, this AI code assistant, with my Slack messages; only I should be able to. That starts to get really, really tricky, and I don't think we have a lot of answers there yet. But we need them for computer use to work really well, and that's why I like MCP: it runs locally, so I can do OAuth on my own computer for myself and sidestep a lot of the permissions issues that would come up with a server-based approach. I don't know, those were some ramblings; hopefully there were some [00:59:00] nuggets in there for people. hugo: Many, many nuggets. One thing I very much appreciate is dissecting the term "agent" a bit more and splitting it up. I've put a link in the notes to Anthropic's "Building Effective Agents" post from a month or so ago; to your point, it breaks things down into what workflows are, what agents are, and Anthropic's idea of augmenting LLMs with tools as a kind of graduated approach to thinking about agentic systems and agentic behavior, because it is a multidimensional space, right? ty: And like you were saying earlier: if it's a calculator, just let the LLM use a calculator, because, similar to humans, that's what we would do. Calculators are quite reliable, while LLMs are very unreliable at something like multiplying large numbers together. hugo: Exactly, and it comes down to risk tolerance as well. You can probably simulate a bunch of things and get a sense [01:00:00] of how often failure modes occur: say, in one in 10,000 runs your LLM will say something like this. That could be prohibitively expensive, but there are ways to think about it. ty: And if we get that rigorous, then you could buy insurance on something like that, and allow yourself to use it despite the risk. That's far in the future, but I could definitely see it happening, similar to how we buy insurance for humans. hugo: And I don't want to anthropomorphize too much, but when thinking about what type of access you'd give an agent, you can first think about what type of access you'd give a new intern. Would you give them write access to your production database? Probably not. Would you give them read access? Maybe. So start trialing things that way, just as, when we roll out these systems, [01:01:00] we often roll them out internally first, then to 0.1 percent of users, and so on. ty: Exactly, and those are the best practices we already had from data science, from web development, from mobile development; they just apply here. A/B testing, blue-green deployments, all of those things are critical.
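Ty's calculator point is the standard tool-use pattern: the model only decides which tool to call and with what arguments, and trusted code does the arithmetic. Below is a minimal, provider-agnostic sketch; the tool_call dictionary stands in for whatever function-calling format your model provider returns, and the safe expression evaluator is an illustrative helper, not part of any particular library.

```python
# Sketch of routing a model-issued tool call to trusted code instead of
# letting the model do arithmetic itself.
import ast
import operator

# A tiny, safe expression evaluator: numbers and + - * / ** only.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
        ast.Div: operator.truediv, ast.Pow: operator.pow}


def calculator(expression: str) -> float:
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expression, mode="eval").body)


TOOLS = {"calculator": calculator}


def handle_tool_call(tool_call: dict) -> str:
    """Dispatch a model-issued tool call to trusted code and return the
    result as text to feed back into the conversation."""
    fn = TOOLS[tool_call["name"]]
    return str(fn(**tool_call["arguments"]))


if __name__ == "__main__":
    # e.g. the model asked to multiply two large numbers instead of guessing:
    print(handle_tool_call({"name": "calculator",
                            "arguments": {"expression": "987654321 * 123456789"}}))
```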
But for some reason, and this is about being in the early years, we throw all of that out, as if it doesn't matter, in the early days while it's exciting and we're trying to figure out what you can even do with this. But as we go from proofs of concept to internal production to, eventually, external production with customers, that's what will happen, because that's how you get reliable systems you can count on. hugo: Without a doubt. And look, given your background and mine, and since we've been talking about open source quite a bit, I'd like to talk about what I might provocatively call (well, I've got more provocative terms, to be honest) the moving target of open source, and I'm just going to give a few [01:02:00] data points. There are clearly different types of open source. Think about the PyData stack, which is a collective community of a lot of people, or the R stack, with tidymodels, dplyr, and ggplot2, where a lot of different people work together to bring something together. I suppose R is somewhat different, because a lot of the people there now work for RStudio, which is part of my next point: you can be open source and have quite different governance models. NumPy is community-backed open source through and through, with people from a variety of organizations, and they make sure not too many maintainers work for the same organization. Something like PyTorch or TensorFlow is open source in a variety of ways, but the governance model is totally different: it's not community-backed, it's company-backed and [01:03:00] governed. Then we get to the world of language models and other types of generative AI, and we throw the term "open source" around, but sometimes we're talking about open-weight models, and other times the weights are open in the sense that you can get them but you're not allowed to use them commercially at all. So what's happening there? And if I think about what open source means (we discussed this briefly before, and I know I'm giving data points from all over the place), open source kind of feels like it needs to be reproducible, for me personally. So for a model to be open source, the weights wouldn't necessarily be enough: I'd need the code that generated it, I'd need the training data, and maybe even the system information about what hardware was used. So, triangulating around those points, or feel free to tell me I'm an idiot and dismiss them completely, I'd love to hear how you think about open source, and, in your stack, what [01:04:00] needs to be open source and what doesn't, for the developer. ty: That's a good question, and not something I'm going to be able to resolve here today myself; it's an open question I'm still thinking about. But my entire career up to this point, both Rasa and Continue, has been in Apache 2.0 licensed open source. And that's one thing you didn't even touch on: even in the broader world of developer tools and infrastructure, license changes, how to license, and how to commercialize are all open questions independent of AI, and then AI adds this additional aspect on top.
And so things like redistribution, the ability to access the source code, the ability to create derived works while still maintaining the integrity of the author's source code, whether distribution is clearly allowed or not allowed, not restricting other software, and not being specific to one product: those [01:05:00] tend to be the things I think about when I think about the open source definition. The Open Source Initiative, the OSI, does a decent job for traditional software; it tends to be the definition a lot of folks on Hacker News cite. And recently, in October at All Things Open, they announced the Open Source AI Definition, where they say, I think, that you need to have sufficient information about the training data and how to reproduce the model. But "sufficient" is a very subjective and debatable thing, so some people were okay with that definition and others were not. So I think it's kind of... hugo: Well, I love that it's somewhat tautological: to do what you need to do, you need to have sufficient X. ty: Yeah, exactly. So what does that mean? I think it's just going to be an ongoing dialogue. The broader thing for me is that I want to make sure this moment in time, the creation of LLMs that help us code, actually results in the expansion of open source, which I think [01:06:00] most people would say it's not going to do. And the reason they say that, and the reason I want it, come from the same place: the reason LLMs are so good at coding is that so many developers have put their code out on GitHub as open source and licensed it. LLMs wouldn't be nearly as powerful and good today at coding, or probably at anything, if developers hadn't done that. So I think the future will be similar: we'll have much better systems, capable of automating much more, if we have more code available for people to create models with. But in some ways the labs have burned the trust of developers by just going and training on that data. Developers didn't know, when they open sourced their code, that it was going to be used in that way, or they don't believe their license allows for it; in some cases their licenses were completely ignored by the people who trained on it, and they were never asked. And that, I think, is leading a [01:07:00] lot of people to say: I'm going to produce less open source. But my hope, to come back to the data point, is that we actually find a way for the future to be more open source: that it's not just the Git commits we open source, but that we also open source the data about how we create software. That's something we've talked about on the Continue team. We haven't acted on it yet, but we'd love for our development data, the data generated as we build Continue with Continue, to be something we put out for labs under a license we choose, so that people can understand how we created Continue and so it can serve as example data for how to create a TypeScript open source AI code assistant.
And so when future models suggest how to do that, they're able to do it with not just the Git commits but all of the reasoning, traces, and development data about what happens in between. And if we repeat that for Alan's project with Rasa, or Vincent's work on scikit-learn, [01:08:00] and things like that, I think it could lead to a better future. But a lot of people are wary of a vision like that, because of how their data was previously just taken and used in ways they didn't necessarily support. So figuring out how to repair that relationship, or how to structure things in a way that is fair and equitable and enables that future: I think it's still possible, but it's going to be a lot of work, and it's going to be hard. A lot of folks, especially developers, will doubt it because, as you were saying earlier, in the tech industry more broadly, how we think about data, who has rights to data, and how our data is collected and taken from us has maybe not been handled the way many of us would like. This is another opportunity to define that: do we want to repeat the mistakes of the past, or, now that we have hindsight, is this the opportunity to find ways to change it? So I'm quite optimistic that we can use this moment to expand open source, but many, many folks are not. hugo: I [01:09:00] totally agree, and I'm a big fan of optimism as a strategy: the jury's out, so we may as well be optimistic and do the work. And a lot of the technology is incredible, don't get me wrong, but I do think there are strong arguments that the way the labs have approached things feels a bit land-grabby, a kind of enclosure movement in a lot of respects. Me personally, and this is just me, I'd prefer to see a world where a platform like Stack Overflow could have its own LLM, fine-tuned on its own data, and we go to Stack Overflow and chat with it there. I'd prefer to live in a world where the New York Times has its own LLM and doesn't necessarily need to negotiate with an organization like OpenAI that has clearly gone and trained on its data behind a paywall, just to be totally frank. I'd love the New York Times to have [01:10:00] its own LLM that I can query. O'Reilly is doing very nice things in that direction, and I'll link to a post in the show notes: they're using RAG systems at O'Reilly, the publisher, so you can ask something on their platform and it gives you an answer plus a reference to a book (it follows data provenance), and they compensate the person whose book provided the answer. So there are a lot of different models floating around, which seems... ty: How can you combine the New York Times and O'Reilly approaches, so you can have a model there, but with the parties entering into an arrangement? I was talking with Vincent at Probabl, and with the scikit-learn folks as well: they're frustrated that so many suggestions from the models reflect how you would have used scikit-learn five or ten years ago, not the way they would recommend now.
So I think if you create a tool and you're the steward of its governance, you should be able to determine how the LLMs make suggestions about [01:11:00] it, how the LLMs get consulted about it; you should be able to guide and steer that. To be able to say: this is the official Continue, or scikit-learn, or O'Reilly, or New York Times data, and this is what you're allowed to use it for. If you do use it, you need to compensate us so we can compensate the users who created it in some way, shape, or form; or maybe not, maybe it's just permissively licensed. But it should be clear, and people should respect it, and the labs should respect it. I think that's the world we're hopefully heading towards, but we'll only get there if we work towards it, like you're saying. hugo: Yeah. I haven't spoken with Vincent in a while, but I do chat with some scikit-learn people regularly, and while they're very generous and polite about this, I can't imagine putting in the blood, sweat, and tears they've put into open source, especially on the machine learning side, and then watching this happen to the space; I'd definitely be somewhat outraged myself. I would love to jump into a [01:12:00] demo in a second to see what you're working on, but before that, some context: next week I'm doing a livestream for the podcast. Let me bring up the title so I get it right, and this is shameless self-promotion, but it's actually incredibly relevant: it's called "Look at Your Data: Debugging, Evaluating, and Iterating on Generative AI Systems," with my friend Hamel Husain, who does a lot of work in this space. Part of his thing, and my thing, is that we just want software people to start looking at traces and data more. People are always asking, hey, how do I do evals for multi-agent whatever, and if you start looking at your traces, doing error analysis, and looking at failure modes, you see the places you need to fix. That doesn't mean it's straightforward, but looking at your data, doing pivot tables and that sort of thing in spreadsheets, is incredibly instructive [01:13:00] in this space, as are testing and debugging. The reason all of this is relevant to our conversation is that you're building an AI product, among other things, so I'm wondering what you can tell us about how you even evaluate the efficacy of a product like Continue. Something I think about and work on a lot is the relationship between product-level evaluation (at the macro level, how well is this product meeting user needs and what I'm trying to build) and micro-level evaluation of individual LLM calls, across the entire stack. I know this is a very broad church, so to speak, but how do you think about evaluation at the product level and at the individual, micro level? ty: Yeah, this is complex for us because of the open source nature of the product. In many ways the best evaluations are how people are actually using it in production. We do have anonymous, GDPR-compliant telemetry, [01:14:00] without any PII, that anyone can opt out of, and that gives us some level of understanding
of what different models people are using and how much success they're finding: are they accepting or rejecting suggestions? So in many ways, evaluation that amounts to running it in production across all of our users is quite helpful. Then, as we think about different features, the codebase retrieval system is one part of the system we need to evaluate, versus the different prompts we use for different tasks; all of those have their own evaluation pieces. So we have to break it down into the subcomponents of the system. Do we create test sets that we use for evaluation? Do we leverage F1 scores in that process? Do we play with it ourselves? Realistically, that's another way we sometimes do it. [01:15:00] Do we give a pre-release to our most excited power users and see what they do with it? We even run A/B experiments in the open source, where folks get different temperature settings and things like that. So it's a whole mix of approaches, and if anything, we can improve quite a bit by getting more rigorous about it; we're actually hiring some folks to help with that. The last piece, and this is the really tricky part for us, is that our system lets people swap out so many things. Even if the sensible defaults work really well, does it work well when you're using only Ollama on your local computer for all of your models? Does it work well when you build your own code RAG system and plug it in? How does that perform? So the next step will be: how can we start to teach others? And I think this is where folks like you and Hamel come in. Even with Continue, we need to teach engineers how to benefit from being more [01:16:00] rigorous with their evaluations, and to apply that same methodology, those same best practices, to the practice of improving your AI code assistant. But we're quite early, and I don't have great answers here, unfortunately. hugo: And I didn't expect you to, but once again, your way of thinking brings clarity for me about the paths to pursue. I also just want to say, since you've mentioned Ollama several times and I've been meaning to say this each time: anyone who hasn't checked out Ollama should check it out for running models locally. On probably any laptop you have, you can grab Llama 3.2, the 3 billion parameter language model, and probably one of the smaller vision models as well, and play around with it through Ollama. I have a lot of fun using it locally with Simon Willison's command-line utility llm (pip install llm), which lets you do all of that, log all your traces to a local SQLite database, and explore them with Datasette. You also mentioned LM Studio, [01:17:00] which I'm a huge fan of for local stuff. I haven't played with it in a while, but there was one feature, in beta last time I used it, where you can load several local LLMs, give them a single prompt, and watch in real time how each of them responds.
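For anyone who wants to reproduce the local setup Hugo describes with nothing beyond the Python standard library, here is a minimal stand-in: it chats with a model served locally by Ollama and logs every exchange to a SQLite table you can browse later, for example with Datasette. It assumes Ollama is running on its default port with the model already pulled (ollama pull llama3.2); the endpoint shape follows Ollama's REST API, the table schema is made up for the example, and this is of course a sketch, not the llm CLI itself.

```python
# Chat with a locally served Ollama model and log the exchange to SQLite.
# Assumes Ollama is running on localhost:11434 and llama3.2 has been pulled.
import json
import sqlite3
import urllib.request
from datetime import datetime, timezone

OLLAMA_URL = "http://localhost:11434/api/chat"


def ask_local_model(prompt: str, model: str = "llama3.2") -> str:
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }).encode()
    req = urllib.request.Request(OLLAMA_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]


def log_trace(db_path: str, model: str, prompt: str, reply: str) -> None:
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS traces
                   (ts TEXT, model TEXT, prompt TEXT, reply TEXT)""")
    con.execute("INSERT INTO traces VALUES (?, ?, ?, ?)",
                (datetime.now(timezone.utc).isoformat(), model, prompt, reply))
    con.commit()
    con.close()


if __name__ == "__main__":
    question = "In one sentence, what does a tokenizer do?"
    answer = ask_local_model(question)
    log_trace("traces.db", "llama3.2", question, answer)
    print(answer)
```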
Also, if you haven't checked out Llamafile, from Mozilla and Justine Tunney, check that out. It's such an incredible space, and if you haven't heard of something I've mentioned, just ping me and I'll send you links, because it's actually amazing. Isn't it, Tyler? ty: Yes, it very much is. It's super exciting and fun to be around. We have support for Llamafile, for LM Studio, for Ollama; lots of folks are trying different things, exploring and playing. For example, I have a USB drive that folds up and goes around my wrist, and it's really fun to keep a Llamafile on it: you have the weights of a model that knows so much, right there on your wrist. I guess you'd still need a computer to plug it into, but in some ways it's amazing that you can carry [01:18:00] around so much knowledge and, I guess, reasoning ability in a little file on a USB drive these days, thanks to Llamafile. hugo: It's incredible. And to your point, and people who've listened to the podcast before or seen a livestream have heard me rant about this, I've got an app on my phone called MLC Chat. There are a bunch of them, but it lets me load small language models on my phone and chat with them. The day after Llama 3.2 came out, I was playing with the new Llama model on my cell phone. It drains the battery pretty quickly, but it's absolutely incredible. So, I do want to get to a demo (this is a generative process, and more questions keep coming up), but this one is actually quite important. We've just talked about Llamafile, Ollama, and LM Studio; you mentioned DeepSeek; we've got Anthropic, Mistral, and more. With [01:19:00] all these different models and frameworks, how can people who want to use something like Continue to build coding assistants in a modular fashion stay on top of everything in the space? Actually, I've got one answer; sorry, I got too excited and I'm knocking things over. I think LM Studio does a nice job: when you open the app, it surfaces new models and that sort of thing, a bit like Hugging Face does. But I'm wondering how you think about developers keeping up in a space where, even though I live and breathe this stuff like you do at the moment, I can't keep up, right? ty: Yeah, neither can I. So the best advice I have is that we try to keep the Continue docs up to date: for each of the different pieces we have our recommendations and we try to help folks, so if you're just getting started, that's a good place to go. And even if you try something out and then check back into the space six months from now, I'm sure we'll have updated the docs to the latest recommendations for models. If you want to go a little deeper, the Continue Discord has a lot of people [01:20:00] sharing their configs and talking about them; we put stuff out on our blog; and we have a newsletter that summarizes things. A lot of it is
checking in every three or six months, which is probably a good cadence, and finding out the latest by going to the docs, the blog, the newsletter, or the Discord on our side. Those are great ways of keeping up without feeling overwhelmed. It's rare that something comes out that is so transformative you should try it immediately, unless you really want to; usually, if you check in a few times a year, you'll largely be caught up. There are also some good newsletters out there. I like Import AI from Jack Clark, one of the co-founders of Anthropic; I think I've followed that newsletter for about six years. hugo: Yeah, and the Turing Post does a good job of covering different things. ty: Yeah, a number of newsletters are helpful. But it's tricky [01:21:00] not to feel overwhelmed. Back in the day it was Twitter; these days, often a lot more Bluesky; and Hacker News often has good discussions around these models. Longer term, this is a problem we're trying to solve with Continue: how do we go beyond "here are all the possible Lego blocks you could use" to "here's the number one most-used chat model, here's number two, here's number three"? How can we publish some of our open telemetry data so people can see not just the hyped model everyone's talking about, but which models people are actually using in production? We want to help contribute to that. But it's a tricky chicken-and-egg problem, because a lot of the time everyone uses a model because it's hyped, so the data then shows that model being used, even though it actually might not be better. You kind of have two realities always in play: the underlying technical reality, and the psychological reality on top of it. They're often correlated, but not always. Say Llama 4 drops [01:22:00] from Meta and everyone's super excited about it, so it gets overrepresented; but over time, I think, people find the models that are actually better. hugo: I totally agree, and funnily enough, I think people who are building real things know that the stuff you see on LinkedIn and Twitter perhaps isn't what you need to be paying attention to. Once again, in this course I'm teaching, I did a workshop on embeddings yesterday and showed people an embedding model (I'll put it here as well): all-MiniLM-L6-v2 on Hugging Face, from sentence-transformers. It does the job so well for a lot of the cases where you want a sentence embedding model, and you can see on Hugging Face that it had over 72 million downloads last month. My only point is that perhaps you don't need the state-of-the-art DeepSeek model or OpenAI's most recent embedding model, and you'll figure out what works in terms of latency and cost [01:23:00] as well, to be honest. So, to your point, it isn't necessarily the hottest stuff that you need. Similarly, Jeremy Howard (I don't know whether Hamel was involved, but the team he works with was) released ModernBERT recently.
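For reference, here is the embedding model Hugo mentions, used the way it usually is: encode a few sentences and compare them. It assumes sentence-transformers is installed (pip install sentence-transformers); the example sentences and the cosine-similarity check are illustrative.

```python
# Encode sentences with all-MiniLM-L6-v2 and compare them with cosine similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "How do I fine-tune a small code model?",
    "Steps for adapting a compact LLM to my codebase",
    "What is the capital of France?",
]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Pairwise cosine similarities: the first two sentences should score much
# higher with each other than either does with the third.
scores = util.cos_sim(embeddings, embeddings)
print(scores)
```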
You don't necessarily need large language models for absolutely everything. So, with that said, why don't we jump into a demo? I also mentioned that if I hadn't gone freelance recently, I'd be trying to convince you to hire me after everything we've been talking about; but I do want to let everyone know that you are hiring engineers, currently in San Francisco. Is that correct? ty: That is correct, yep, in person; we have an office here, and I'd love for folks to apply. Reach out to ty@continue.dev if you're excited about what we talked about today, and we'd definitely love to chat with you. hugo: Amazing, and I'll include that in the show notes, along with your LinkedIn: Ty Dunn on LinkedIn. You also said you've got a thriving Discord community, so I'll link to the Discord as well. Is that where you'd suggest people come and say hey? ty: Yeah, definitely. I'm [01:24:00] there, and everyone is always reaching out to me, so I'll see your message. hugo: Amazing. So I know we're at the time we scheduled, but if you've got another ten minutes and would be up for doing a brief demo, Ty, I'd love that. ty: Yeah, definitely; let me get screen sharing set up. So, I'm in VS Code right now, and what you can see here on the left is Continue; it's an extension for VS Code. If I hit Command-L on my Mac... hugo: Would you mind increasing your font size a bit, zooming in slightly? There we go. ty: Let me adjust the windows a little. hugo: Perfect. ty: So you can see Continue pops open. Say I'm someone who's maybe not as familiar with Python and I have no idea what's going on in this code: I can highlight some code, press Command-L, and the code gets thrown in here. By default I'm using Claude 3.5 [01:25:00] Sonnet from Anthropic, a really great model, but I could switch; I could use DeepSeek-R1 7B, which is actually running with Ollama. If I click the settings button, I can go see it. I'm exposing my API keys here, so I'll have to change those after our livestream ends. hugo: Totally, I do that all the time. ty: With Ollama here, I'm using DeepSeek-R1 7B, and I can ask something like, what is this dataclass? Let's see what DeepSeek-R1 does when I press enter. The reasoning models take a little longer than some of the other models, so you can see it thinking. hugo: Super cool, man. You're running DeepSeek locally, is that right? ty: Yeah, exactly.
So I can go back to the cloud sun at three and a half and say, I can do other things Hit command I instead and it's I've selected this code here and I don't find this very readable with the values of N and K here so I can do something like make this far easier to read. And what it's going to do is it's actually going to just give me apparently a comment and then actually replace. And in K where it has like number and divisor and gives me a [01:27:00] definition and I can hit accept and have a much better more human readable version of that code that helps me kind of understand it. And so, in addition to kind of the chat and edit experiences I saw you saw here you have the kind of experience of autocomplete as you're typing you have the ability like we saw here earlier to to go through and edit the configuration for. Or continue, you can go to the docs, which you can't see my screen because I didn't share desktop, but now you can write and we can see continue and dive into, like, how to use that chat experience that you saw how to set up the different models, how context selection works. How it the feature itself works under the hood if you don't want to read the code how to customize it further if you want to customize it, and then you can jump into the GitHub and you can see, the actual code and go into it. And so we can actually go into. GUI, and then we can go into source [01:28:00] for it and we can kind of dive deeper into how the system works. And so, hopefully that gets you excited about the potential of continue. You try it out. Like Hugo mentioned, you can find me on discord and. Give us feedback there. We're always trying to improve it. So if there's things that don't work or customizations you would want or really anything, I'm happy to hear on GitHub, on Discord, or even via email if you're so inclined. hugo: Amazing. Well, I would encourage everyone to, to try this out. super exciting and to play around and see what, what models are best for your use cases as well and build your own. AI assistant. I am wondering Ty, so you do have an open source and free version, but you also offer an enterprise version, correct? ty: Yeah, so that's kind of the upcoming launch. we have the launch of our, our enterprise commercial product. So I can't quite tell you everything here today, but check back in and about a month's time and, and, there'll be more information, but basically at a high level, it just helps [01:29:00] organizations roll out, continue to be able to have confidence of as they're sharing and creating and using those different packages. We mentioned earlier that They can allow list some of them and block list others of them and make sure that their code data stays in their environment and doesn't leave, kind of their, their cloud and things like that. And so, well, we're committed to the open source extensions and also layering on a commercial product that helps us to have, enterprises invest so that we can invest more in the open source and create a feedback loop. hugo: Fantastic. And someone in the chat has even just asked, can I link to the discord? So people want to jump in ASAP. What I'm going to do is link to continue. dev and you can see in the top, right, there is the link to the discord. and I'll do that because continue. dev will persist and discord links do not always. Yours may, but they do not always persist and this video will be up for, for some time. Ty, well, firstly, I'd like to thank everyone for joining and, and, and sticking around. 
We've had a bunch of people here for, wow, an hour and forty minutes, which is super [01:30:00] exciting, so I appreciate you all. And I'd love to thank you, Ty, for bringing all your wisdom and knowledge, and for all the work you do with open source, pushing the frontiers of what we're doing with coding assistants. As I hope is clear, this is a space I'm incredibly excited about and invested in, so I'm really excited to see what happens next, and excited for your launch next week as well. Next month, sorry. ty: Thank you so much, I appreciate it. Thanks for the invite; it's been fun. I'm glad to be here and would love to do it again, and I'd love for all of you listening, and anyone who comes along later, to try Continue. It's been fun. hugo: Fantastic. Thanks once again, Ty. ty: Thank you.