1
00:00:00,000 --> 00:00:09,000
Welcome back everyone to the VS Code Insiders podcast.
Your one-stop shop for your favorite code editor, with the team behind it.
Today with me, the one and only Julia Castro. How's it going?

2
00:00:09,000 --> 00:00:21,000
I'm good. How are you, James?
I am absolutely delightful, and yes, I am your host today, James Montemagno.
We have worked together for, like, ever, basically.

3
00:00:21,000 --> 00:00:36,000
I'm excited to be here. Our paths have crossed multiple times.
Now I'm back working in the IDE space, so yeah.

4
00:00:36,000 --> 00:00:51,000
When you joined the VS Code team, I was really excited.
You've worked on many things in VS Code, but one of the more advanced topics is models and evaluations.
I want to dive into that.

5
00:01:19,000 --> 00:02:03,000
The most exciting thing is how AI has changed how developers work.
It speeds up development in amazing ways and feels like a mix of low-code and traditional coding.

6
00:02:14,000 --> 00:02:45,000
Developers often ask why I pick certain models like GPT-5 mini.
Sometimes it's just a vibe — smaller models are fast and free.
But how should developers think about choosing models?

7
00:03:18,000 --> 00:04:19,000
At first I stuck with one model, but now I switch depending on the task.
We continuously evaluate models, and sometimes older ones improve, so it's worth revisiting them.

8
00:04:47,000 --> 00:05:29,000
Why does a model feel different week to week?
We change custom prompts based on feedback and evaluations.
That’s why behavior can shift even after launch.

9
00:05:55,000 --> 00:07:17,000
Was Sonnet 4.5’s behavior change due to us or the provider?
Both. Providers update checkpoints, and we adjust prompts on the client side.

10
00:07:17,000 --> 00:09:10,000
Fine-tuning means taking a base model and training it with developer workflows.
Microsoft’s team fine-tunes models like Raptor Mini to optimize for speed and repetitive tasks.

11
00:09:49,000 --> 00:12:21,000
What’s the story behind Raptor Mini?
It’s a code name in the bird group. It’s optimized for speed and smaller tasks.
It’s free and great for repetitive scenarios like data analysis.

12
00:12:21,000 --> 00:16:19,000
Developers should pick models based on task complexity.
Mini models are fast and good for clear, repetitive tasks.
Larger models like Sonnet are better for creative or complex workflows.

13
00:16:47,000 --> 00:19:35,000
Evaluations (evals) help us measure model performance.
We run online evals with live data and offline evals with test cases.
Metrics include time to first token and resolution success.

14
00:19:35,000 --> 00:21:15,000
Benchmarks like SWE-bench are useful but language-biased.
That’s why we built VS Code’s own eval suite (VS Bench) to test coding scenarios.

15
00:21:15,000 --> 00:23:34,000
Developers can give feedback via Reddit or thumbs up/down in VS Code.
We track that data to improve models continuously.

16
00:23:34,000 --> 00:24:08,000
Thanks for joining us.
Remember to subscribe on YouTube or your favorite podcast app.
Until next time, happy coding!