Okay, welcome, Eleanor. Let's start off with some introductions. I'm Pamela, from Microsoft, on the Python Cloud Advocacy team. Can you tell us about yourself and what you're doing now?

Sure. I'm Eleanor Bega. I'm a consultant and advisor on AI engineering, and more recently also an educator focusing on AI-assisted software development. I'm running the Elite AI-Assisted Coding course and community together with Isaac Thoth, which we hope, and think, is probably the most comprehensive course on using AI for doing software development.

So you're doing a whole course about AI-assisted coding. How did you realize there was a need for that?

Oh, the world told me. It's kind of funny. I talk to lots of teams and leaders and people working in software development in general. Mostly it was around helping them build with AI — the sort of thing where you use APIs or fine-tuning, evals, that kind of thing. And at some point I just started hearing: okay, that's cool, but can you help us with this? Because this looks really powerful, and we need to understand how to integrate it.

Okay, so that's just something you kept hearing from people at companies. It does seem like in the past year we've all come to realize: wait, this could be really, really helpful for us if we know how to use it, right?

You know, it became possible, right? If you had asked me a year ago, I would have said: I don't know, you can chat with AI and it will help you a little bit, but don't count on it too much. And then we got the next generation of models, and it's amazing what they can do. Now the possibilities are so much more powerful.

And what sort of tasks do you think are best suited for these agentic coders now?

What I'm especially excited about is anything that you can fully delegate and orchestrate in the background. There's a lot of focus on interactive work with AI. It's fun, it's very helpful, I do it a lot myself too, and it's something we also cover in the course. But when I think about what's really going to make a difference for software development in the large — for teams that are trying to improve productivity, and improve quality, which is in many ways even more interesting than productivity — it's the ability to delegate complete tasks to these agentic models and have them happen while you sleep, or when you trigger something from a new branch, or anything like that.

What sort of tools are you using for these delegation scenarios?

I use the GitHub Copilot coding agent a lot — the one that's hosted on GitHub. It's really fun and easy to use. You need the discipline to write a really good spec, because once you've delegated the task, that's it: you're not going to intervene anymore. But when you develop good habits and good techniques for doing this, it's really powerful. I even learned to trigger it through the previously unpublished API, which is now available, and get it working from GitHub Actions. I have a few things that just trigger automatically from time to time and do things like update my documentation or do some code reviews. So I really like using that. I've also used OpenHands quite a bit — another remote agent, open source, that I really like. And there are things you can run in GitHub Actions: any of these command line tools. I've used Codex and OpenCode and some of the other command line tools, just giving them a prompt and some context and triggering them from GitHub Actions.
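To make this delegation setup more concrete, here is a minimal sketch of the kind of glue a scheduled GitHub Actions job could run. It assumes the gh CLI is installed and authenticated in the runner, and it hands the task off by filing an issue that carries the full spec; how that issue then reaches a coding agent (for example, by assigning it to Copilot) is left out, and the repository name and spec text are illustrative placeholders, not details from the interview.

```python
"""Open a fully specified task for a background coding agent.

A minimal sketch, assuming it runs inside a scheduled GitHub Actions job
where the `gh` CLI is already installed and authenticated (GH_TOKEN).
The repository name and the spec below are illustrative placeholders.
"""

import subprocess

REPO = "my-org/my-project"  # hypothetical repository

# The "spec": everything the agent needs, including what NOT to do.
TASK_SPEC = """\
Update the documentation in docs/ so it matches the current public API.

Constraints:
- ONLY touch files under docs/. Do not modify any source code.
- Keep the existing tone and structure of the docs.
- Open a pull request against the main branch when you are done.
"""


def open_task_issue(repo: str, title: str, body: str) -> None:
    """File an issue that a coding agent can later be assigned to."""
    subprocess.run(
        ["gh", "issue", "create", "--repo", repo, "--title", title, "--body", body],
        check=True,
    )


if __name__ == "__main__":
    open_task_issue(REPO, "Docs refresh (automated task)", TASK_SPEC)
```

The point of the sketch is only that the task definition lives in one place, written once and carefully, rather than being typed interactively each time.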
Okay, cool. So you mentioned the spec — the importance of the spec — and there's also this movement now where everybody is talking about spec-driven development, PRD-driven development, test-driven development. What do you think is the right approach when working with AI coding?

Yeah, I think you don't have to hang too much on the word "spec". It can be a little intimidating to people sometimes: when you say spec, they think, oh, I'll need to write a book-length description of everything. The point is, you need to give full instructions. If you don't instruct clearly what it is you want, the problem with these models is that they will not tell you, hey, you didn't tell me what to do, so I'm just going to sit here and do nothing. They'll do something — kind of a random something. And software development is not a good place to do random things. You want a good specification. It doesn't have to have any particular structure; in fact, the models cope with anything. But you do want all the information there, including what not to do, what version to use, and what constraints should be respected. The more you're able to do that, the more you can direct the results you get.

Yeah. One of the issues we've seen — you were talking about what not to do — is that sometimes it does too much. How do you get the agents to focus and do just enough?

Yeah, this is especially a problem with Claude, isn't it? It's very eager, and it will sometimes invent things for itself to go and do. It's less of a problem now with GPT-5, for example, which I really like for that reason. For example, recently I set up this process where I trigger the coding agent to update my documentation, and I would tell it: go update the documentation. Sometimes it would update some of the code. I was like, no, I definitely don't want you to update the code — that's not what this is about, and I don't want updates to the code that I didn't direct. That's a terrible thing to do. So I added a paragraph, in all capitals: don't touch anything else; you should only update the documentation. And now it works.

So that's another question I have: how do you evaluate whether the instructions you've written are good instructions? Do you evaluate with just one check, or do you have any sort of formal evaluation system for determining whether you've written good instructions?

I have to be realistic — you're not going to have a formal eval suite for every little task that you do. If there's something that's high stakes, or you're doing a lot of it, then yeah, go ahead and write an eval suite. It's often a lot easier than having to chase problems again and again. But for simpler things, one-offs, just make sure you do it in a situation, an environment, where the risks are not high. If I do it in an environment where it could perform destructive operations or have side effects, that's very risky. But if I do it in a sandbox environment, where I can revert whatever it did, the cost of experimenting is fairly low.
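The "only update the documentation" constraint can also be checked mechanically instead of relying on the prompt alone. The sketch below is a hypothetical guardrail, not something described in the interview: it assumes the agent worked on its own branch and uses git to verify that nothing outside an allowed directory was touched before the change is accepted.

```python
"""Reject an agent's branch if it modified files outside the allowed area.

A hypothetical guardrail sketch; assumes it runs inside the repository
checkout with the agent's branch checked out and `main` available.
"""

import subprocess

ALLOWED_PREFIXES = ("docs/",)  # the only area the agent was asked to touch


def changed_files(base: str = "main") -> list[str]:
    """List files changed relative to the base branch."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        check=True, capture_output=True, text=True,
    )
    return [line for line in out.stdout.splitlines() if line.strip()]


def out_of_scope(files: list[str]) -> list[str]:
    """Return any changed files that fall outside the allowed prefixes."""
    return [f for f in files if not f.startswith(ALLOWED_PREFIXES)]


if __name__ == "__main__":
    violations = out_of_scope(changed_files())
    if violations:
        raise SystemExit(f"Agent touched files outside the allowed area: {violations}")
    print("Scope check passed: only documentation files were changed.")
```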
Do you find that you need to write different prompts for different models? You talked about Claude versus GPT-5 — what's the experience with different models?

It is a bit different, but what's common is that you just need to say what you want, and say it as clearly and unambiguously as possible. With Claude, I find that sometimes I need to repeat things multiple times, or use this kind of very strong language — "don't you dare do this thing" — whatever I can do to convince it to follow my orders. Whereas with GPT-5 it's almost clinical: I just say something, keep it as short and concise as I can with all the information in it, and it will follow it. So it is a bit different, but fundamentally it's not that different. And you get a feel for the subtle differences between the models.

Yeah, it's stuff like that — where you say that with Claude you have to say "don't you dare" — that I really want to evaluate. Do you actually have to? Or using all caps, right? People will say things like, oh, if you threaten it, or use all caps... Some people even said, when I was sharing a prompt that had a "do not" in it: no, no, no, if you say "do not", the LLM is just going to do it anyway, because it's going to latch onto the thing you told it not to do. So how do we know whether we're really following best practices, and how do we get it to stick with what we're saying?

You know, there's a lot of superstition around these things. Things have moved really fast. The first generations of models, and even the first versions of GPT-4 and Claude, were quite sensitive to subtle differences in wording, in a way that I think the newer models — especially the reasoning models — are not. And I'm less concerned about it now. The thing is, you can't know unless you have formal evaluation. So again, if the situation is high stakes, if it's important, and you have the time and the budget to invest in formal evaluation — sure, that's the best thing to do, and the only way to know for sure. If not, I don't know. Sometimes I write these things for myself. Why would I use all caps? Does it make a difference? I don't know. It makes a difference for me, because, yeah, I vented. I got it out there: don't do this. Did it really make a difference? I didn't do a comparison — I didn't try a version with normal capitalization and one with all caps.

In some quantum universe, I would just spend all day evaluating little things like that, because I find it fascinating — it feels like you're inside the head of the LLM: oh, this LLM cares about capitalization.
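The all-caps-versus-normal-capitalization question is exactly the kind of thing a tiny, informal eval can answer. The harness below is a sketch under an explicit assumption: you supply your own run_once callable that sends a prompt to whatever model or agent you use and reports whether the result complied. No specific model API is assumed, and the placeholder runner at the bottom must be replaced before the numbers mean anything.

```python
"""Compare compliance rates of two prompt variants over repeated runs.

A sketch of an informal eval: `run_once` is a placeholder you must supply.
It should send the prompt to your model or agent and return True if the
result followed the instruction. No specific model API is assumed here.
"""

from collections.abc import Callable


def compliance_rate(prompt: str, run_once: Callable[[str], bool], trials: int = 10) -> float:
    """Run the same prompt several times and report the fraction of compliant runs."""
    successes = sum(run_once(prompt) for _ in range(trials))
    return successes / trials


def compare_variants(variants: dict[str, str], run_once: Callable[[str], bool]) -> None:
    """Print the compliance rate for each named prompt variant."""
    for name, prompt in variants.items():
        print(f"{name}: {compliance_rate(prompt, run_once):.0%} compliant")


if __name__ == "__main__":
    variants = {
        "normal": "Only update the documentation. Do not change any code.",
        "all_caps": "ONLY UPDATE THE DOCUMENTATION. DO NOT CHANGE ANY CODE.",
    }
    # Placeholder runner: always "complies". Replace with a real call to your agent.
    compare_variants(variants, run_once=lambda prompt: True)
```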
Anyway, what are your current favorite models — or different models for different tasks?

My go-to model for coding is GPT-5, the full model. It's really good. It can do really complex analysis, it's really good at following orders, and it's very agentic: you can send it off and it will go and do a thing and come back, like, three days later with a complete result. I like that. I also use GPT-5 mini a lot. For some reason it's not getting enough attention — it's a great model. It doesn't have baked into it all the knowledge that the huge model has, but in terms of behavior it's very similar. So I'll use it for simpler things, where I just need it to operate a little — do this, change that, update this file — and it's a lot faster. If you use Copilot, you get it more or less for free, so that's nice. Lately I've been using the Grok Code Fast model a lot. It's amazing — it's so fast. It's a little bit like a teenager: not very well behaved. But if you just do simple things, it works really well, and it's incredibly fast. That's a nice experience.

Yeah, I saw a demo of another one yesterday — Kimi, I think. It was surprisingly fast. I assumed it was sped up, but I don't think it was.

Oh, when they run it on Cerebras or one of these specialized providers, yeah.

Yeah, because that's a big part of it, just the user experience. If you are using these LLMs in the editor, as opposed to the delegation background agents, then you do care about speed, and you want it to come back fast.

So when a new model comes out, do you immediately go and try it? What's your strategy when a new model comes out — how do you figure out whether you want to move to it?

It's no longer possible to try everything, right? There's so much going on, which is wonderful. It used to be that there were two or three main labs; now there are so many. But I try a lot — I really try many new things. Lately there's been a wave of really great releases from some of the open labs, the Chinese ones: there's a great model from Qwen, from Alibaba, there's Kimi, and the one from Zhipu. I try them. The thing is, it does actually make sense to know a model well, to get a feel for it. So at some point I need to make a decision: is this going to be part of my toolbox? If so, I'll invest in learning about it, trying different things, and figuring out how to use it. If not, I'll basically ignore it, because time is limited and attention is limited.

Yeah. I mean, most developers don't really want to spend time evaluating new models, right? Many people say "just pick the right model for the job", but that seems like a really hard problem in general.

Right. I actually think people are maybe missing out if they're not taking a little bit of time to learn, because this is becoming one of the main tools we use. I don't think any developer would say, well, I don't care which programming language I'm going to use or which compiler I'm going to use — no, obviously, that's a main tool they're using. But we still haven't gotten used to AI being this central to what we do. What, you're not going to spend two or three hours to learn about the tool that is now going to write all the code for you?

Okay, so you would actually say that most of us should at least spend a little bit of time getting to know our model?

I think so, yeah. It really makes you more efficient, and it gives you more control, when you understand how the model works, what its quirks are, and how to steer it.

All right. So everyone listening, go and study your favorite model. I do actually like GPT-5 quite a bit — I like that it's quite clinical, not as emotional as Claude. So, I first learned about you from your project Ruler. Can you talk about Ruler?
Yeah, Ruler is a project that I created, also in response to demand, because I was talking to lots of teams, and they were saying, well, how are we going to manage rules and context and configuration for our AI tools? I was giving everyone the advice: you should manage it centrally. It doesn't make sense for every developer on the team to do their own thing, because then you miss out on the opportunity to gather context together. But it turned out there was no way to do it, because all the different tools have their own format and their own location. It has improved somewhat now with the AGENTS.md standard, which I'm really excited about, but even with that, MCP server configurations are different. So: might as well write a tool. It also became a nice opportunity for me — it's the first project that I did completely with AI. I didn't write a single line of code in this project. All of it is just: I write the specifications, and I get AI to execute on them. It's open source, and I've gotten quite a lot of contributions. I don't know if it's everyone, but I'm pretty sure a lot of people also do it that way, because it's so central. And yeah, it's a nice little tool. It's nothing big — it's a glorified script, really — that takes context from a single location, where you can put multiple files and your configuration for MCP servers, and then instantiates it for all the different tools you're using.
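For readers who want to picture what a tool like Ruler does, here is a deliberately simplified sketch of the core idea: gather rule files from one central place and write the combined text to the locations different coding agents read from. The source directory and target paths below are common conventions used for illustration, not necessarily the exact set or mechanism Ruler itself uses.

```python
"""Fan out centrally managed agent rules to per-tool locations.

A simplified sketch of the idea behind a tool like Ruler, not its actual
implementation. The source directory and target paths are illustrative.
"""

from pathlib import Path

RULES_DIR = Path(".agent-rules")               # hypothetical central location
TARGETS = [
    Path("AGENTS.md"),                          # the emerging shared standard
    Path("CLAUDE.md"),                          # read by Claude Code
    Path(".github/copilot-instructions.md"),    # read by GitHub Copilot
]


def gather_rules(rules_dir: Path) -> str:
    """Concatenate every markdown file found in the central rules directory."""
    parts = [p.read_text() for p in sorted(rules_dir.glob("*.md"))]
    return "\n\n".join(parts)


def fan_out(rules: str, targets: list[Path]) -> None:
    """Write the combined rules to each tool-specific location."""
    for target in targets:
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(rules)


if __name__ == "__main__":
    fan_out(gather_rules(RULES_DIR), TARGETS)
```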
Because you were finding that different developers at companies were using different tools, and they wanted to be able to share the same rules for those agents?

Yeah. Sure, because developers develop this intimate, passionate relationship with their tool. If you tell them, hey, stop using this IDE, you should use another IDE, they'll quit. They'll say: I don't want to use this other tool, that's my thing, that's how I roll. And I don't think that's changing. There are always going to be people using whatever their tool of choice is. So we need the infrastructure to support it.

Yeah, that's why I'm also really excited about AGENTS.md, because that's a single file you put in the root of the repo. And they're expanding it to get a little fancier with different subfolders and things like that. But even just having the standard, I think, is really exciting. And in your Ruler repo, you have a list of all the different coding agents and what their rules file is, and recently you got to change a lot of them to AGENTS.md, right?

Yeah, and every week we get to add another one — to convert it to use AGENTS.md. I think just in the last release of VS Code it got converted, or added, because you can still use the instruction files. And there will be more. I think that's a really good trend. People are really rushing to implement new features and new tools, and that's great, I love it. But we also then need to stop and standardize and come together, so that it's not all over the place.

Yeah. So with AGENTS.md, that's where we put our rules for the agents. What do you think are the most important things to have in that AGENTS.md?

I think one thing is just where to get relevant information. Because, again, the problem is the models will not tell you, hey, I'm downing tools, I'm not continuing because I don't know how to use this library. They'll just invent something. So you need to be explicit about it and say: I'm using this library, this is where you can find the documentation, this is the version I'm using, this is the style of coding I want to use, and so on and so forth. The more you do that, the more control you have and the less backtracking you need to do — the less you get into the situation where it did something and now you're like, okay, we're going to have to clean up after that, because it's not really what I wanted. That's very good to do. I also put in a lot of operational stuff: things like, when you commit, always add yourself as an AI agent as an author, so that I can go back and review it later. Or, when you branch, always branch in this way, with this naming convention, because that's how I like the project to run — this kind of thing.
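To illustrate the kinds of things just described, the snippet below writes out a small example AGENTS.md. The specific library, paths, and conventions are made-up placeholders; only the categories (where to find information, versions, coding style, and operational rules like commit attribution and branch naming) come from the interview.

```python
"""Scaffold an example AGENTS.md covering the categories discussed above.

The concrete library name, paths, and conventions here are illustrative
placeholders, not recommendations from the interview.
"""

from pathlib import Path

EXAMPLE_AGENTS_MD = """\
# Agent instructions

## Where to find things
- API documentation lives in docs/api/; read it before changing public interfaces.
- We use FastAPI 0.115.x; do not upgrade dependencies unless the task says so.

## Coding style
- Follow the existing formatting; keep functions small and focused.

## Operational rules
- When you commit, add yourself as a co-author so the change is reviewable later.
- Create branches as agent/<short-task-name>.
- Only touch files relevant to the task you were given.
"""

if __name__ == "__main__":
    Path("AGENTS.md").write_text(EXAMPLE_AGENTS_MD)
    print("Wrote example AGENTS.md")
```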
Do you find — because you've been recommending this for companies — so if a company has a repo, and that repo has an AGENTS.md, are multiple developers on the project all sending changes to AGENTS.md? I'm just trying to picture it: what is the collaboration story for creating the AGENTS.md for a shared repo?

I think the reality is that, if you look at different teams, usually there are one or two people who are early adopters. They're very excited about AI, and they end up leading and showing the way. And there are often people who are still learning, still getting used to it, and they're quite happy for someone else to do it. I guess with time it will even out, and more people will get familiarity with these tools, and confidence. But for now, I find it's always one or two people who are especially passionate about it, and they almost organically become the maintainers of the AGENTS.md, or the tooling, or the documentation.

Okay, that's really interesting. So you really like the asynchronous background agents, but you are still doing some work inside editors as well. What's your current editor of choice?

Yeah, I use Visual Studio Code with Copilot. It's really nice, and it keeps improving: every release now there are new features, better adherence to standards, all kinds of nice little improvements in the interactions. I really enjoy using it. I sometimes use some of the terminal tools. I've been using Warp a lot lately, which is nice, more for admin work, things like that. I don't use it so much for software development, because for some reason I still like being in the IDE and looking. It's more like I'm a spectator — I don't actually write the code, but it gives me some feeling that I'm in the know, that at least I understand what's going on. But more and more, I'm trying to just move to delegating and using background agents. Even if they run locally, I'll still take this approach of: I'm going to do everything I can up front to define this task, so that then it just happens. What I don't do much is this kind of vibe-coding, where you do a little bit of this, a little bit of that, backtrack. I find it exhausting, to be honest.

So when you're dealing with agents, are you using MCP servers? Are there any particular MCP servers that have been especially useful?

I use some. I always use Context7, because it's just a really easy solution for the documentation problem. Usually, if it's a library or something I'm using a lot, I'll actually pull the documentation in, or at least a link to it. But if I forget to do that, or it's just something that comes up, Context7 can often pull in the material. If not, sometimes I'll use GitMCP for a specific repo I want to refer to; it's really convenient. Playwright I use a lot for anything that has a UI, to get it to go and review its own work. MarkItDown is really nice to use, because you can take any kind of document — PDFs, Word documents — and get it into a format that works for the LLM. So these are things I always have available. And Tavily — Tavily I use a lot; it's a nice search engine for LLMs. But other than that, I try to rely more on command line tools than MCPs. I don't know if it matters much for the model. Again, that's probably more for my own sake, but I feel that I understand really well how they work, and I can review how it operated them. And there are so many of them, right? Many of these tools have been around for years, decades. So I rely more on command line tools than MCPs.

Yeah, that makes sense. I think I've seen people argue that it is a good thing to use command line tools, because the models do know them very well — they have seen a lot of CLI usage in their training data. A lot of times there are command line equivalents to MCP servers, right?

Often. For example, for GitHub, there's a really good, comprehensive GitHub MCP server. But I found that I can use the gh command line instead. I don't know if it matters for the model; it matters for me, because I understand how it works, what it does. It's a tool I've been using for years now. So when I see it invoking this command, I'm like, yeah, I know what's going on here. And if there's a mistake, I understand what it is and how to instruct differently.

Do you end up putting that in your AGENTS.md — like, hey, when you're doing this, you're going to use the gh CLI? Because otherwise it would probably be inclined to use the GitHub MCP server if it knew it had it available.

Yeah, so one thing I have in my standard instructions, at the user level, is: always prefer using a command line tool over an MCP server — along with the specific tools that I'd like it to use.

Okay, that's a very good tip.
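As a small illustration of the "command line tools over MCP servers" preference, here is a sketch of the kind of gh invocation an agent might make, and why it is easy to review: each call is an ordinary, inspectable command. It assumes gh is installed and authenticated; the repository name is a hypothetical placeholder.

```python
"""List open issues with the gh CLI, the same way an agent would invoke it.

A sketch assuming `gh` is installed and authenticated; the repository
name is a hypothetical placeholder.
"""

import json
import subprocess


def open_issues(repo: str, limit: int = 5) -> list[dict]:
    """Return open issues as plain dicts via an ordinary, inspectable CLI call."""
    out = subprocess.run(
        ["gh", "issue", "list", "--repo", repo, "--limit", str(limit),
         "--json", "number,title"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(out.stdout)


if __name__ == "__main__":
    for issue in open_issues("my-org/my-project"):
        print(f"#{issue['number']}: {issue['title']}")
```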
So yeah, on tips: what are the big tips for people who are getting started on this path of AI-assisted coding?

Use more words. Use more words — that's basically the biggest tip, because the biggest problem people have is that they under-specify. It's really obvious to them what they're trying to do, but then when you look at how they prompted, the information is not there. So just get in the habit of getting it all out there. One thing that I do almost all the time now is dictate instead of typing. I find that, for some reason I don't completely understand — it's more about psychology than software development — it unblocks me. If I had to type it, I'd probably write a little sentence; if I dictate, it becomes a really long story. And it doesn't even have to be that well organized. The LLMs cope with anything. So if you give them this stream-of-consciousness dump of everything you have in your head, now they have the information, and they'll actually use it. The biggest problem is, if you don't give the information, the model will not tell you, hey, you didn't give me enough information, I'm not going to do anything. It will just invent stuff.

That's a great tip. Use more words — and you don't actually have to type those words, because there are so many tools for voice dictation now. I need to start talking to my computer more too. It seems to be a very popular thing that people are picking up on.

It's really hard to get used to, and it's one of those things you have to force yourself a little bit to do. I found that after insisting on doing it, and feeling really awkward — it feels really strange in the beginning — after a few days, that's it. I can't stop now. I'm just talking to my computer all the time.

Yeah, I think in this new age of generative AI and AI-assisted coding, we're all going to have to learn new habits if we want to take full advantage of all the new tools that we have. So it might feel awkward at first to talk to our computer, but then we get used to it.

Yes.

All right. Thank you so much. Now, if people want more tips, they can of course sign up for your course. And you also have a great newsletter — you send so many amazing posts every day — so you can sign up for the newsletter to get more of those tips as well.

Yeah, please go follow us. Apart from the course, which we hope is really the most comprehensive journey through the current space, we release tips and guides — something every day. And yeah, talk to us.

Awesome. Thank you so much for joining, Eleanor.

Thanks, Pamela.