00:00.41
James
Welcome back everyone to Merge Conflict, your weekly developer podcast where I right-click, picture-in-picture, and I shove Frank Krueger on my monitor over here. How's it going, buddy?

00:10.65
Frank
Very good, very good. I like being on the little monitor through the two-way mirror that's sometimes called a one-way mirror. Hi, I'm coming to you through a mirror.

00:19.06
James
and we're gonna see if my camera continues to work today and doesn't freeze up. I had to replace myself before, although now it seems to be flickering. So I don't know what's happening.

00:27.75
Frank
Did you put in an AI? Well, how'd you do it? Did you use one of those, like, image-to-video things? That would be kind of hilarious.

00:34.77
James
I, I guess I'm using, I'm using camera hub because you said I look better when camera hub processes my video. And then when, when, uh, what's this other app that I have?

00:47.15
James
Wavelink processes my audio and makes me sound better. Basically you're saying just let all of the software AI overlords process this.

00:57.54
James
And then it's not actually me still in a garage, but here.

00:59.81
Frank
Yeah. Mm.

01:02.65
Frank
We'll never get you out of that garage. Well, it's funny, like in my endeavors to use more and more local models, I've actually been playing around with the current text to speech models, which is, you know, text to speech has been good for a decade now, but like they've gotten really good.

01:13.59
James
Ooh.

01:19.25
Frank
And so I was trying some of the open source ones and just seeing how really good they are. Turns out, James, they're really, really good now. And to the point where you can actually fingerprint people too.

01:31.95
Frank
So it's not this episode, I promise all of you, but it would be fun to at least put in, like, a five-minute stub somewhere with Synthetic James yelling at Synthetic Frank and find out if anyone notices the difference.

01:46.54
Frank
You will because you won't hear my giggling. I probably won't giggle as much with the AI.

01:52.38
James
I did watch a, I believe it's a morning brew video, I want to say, where the gentleman went to a few of the leading avatar companies where they will scan your face and they will create an AI representation of you and then they'll do voice models on you.

02:08.22
Frank
Who?

02:14.42
James
And he did a professional one.

02:14.64
Frank
Okay.

02:16.50
James
It seems to be used more in, like, training videos, for example, where they'll have a person and then they can have them talk and go through training videos. So he tried that. And while that was processing, he said, okay, let me just find the top-tier online avatar, whatever, and the top-tier audio voice spoofer.

02:31.29
Frank
Yeah.

02:36.57
James
And what they did is they did some trials. I'll find a link to this and put it in the show notes. But basically what he did was he said, okay, first, let me see if I can spoof my coworkers on a Zoom call.

02:51.22
James
So

02:51.49
Frank
Oh, sure.

02:52.79
James
This one is quite complicated because what they ended up having to do was, and then what he also did is once he put it in there, he let his friend. So the idea was to not even have you be the person presenting.

03:04.81
Frank
Ugh.

03:05.37
James
It could be somebody else spoofing you.

03:05.88
Frank
Okay. See, now it's a lie.

03:07.41
James
So,

03:09.21
Frank
I didn't mind it when it was him typing something, but if if you're not even typing in the responses, now it's bad, but okay, fine.

03:16.25
James
Well, the idea was like, okay, if you can do voice in this, can I, you know, do this? But the problem with the software, at least for this, was it wasn't real time.

03:21.32
Frank
Yeah.

03:26.11
James
It was only pre-recorded. So you had to render out the video and things like that. So they said, okay, they worked with their manager and it was for a piece. they said, okay, we're going to,

03:37.24
James
ask me a question. I'm going to prerecord and see if I can spoof my coworkers. And they even had, like, a looping head nod, like, yeah, you know, and there's OBS playing it back.

03:43.00
Frank
Okay, cool. Yeah.

03:46.90
James
And they definitely spoofed some of the people that maybe weren't paying as close attention. And then other people, they definitely spoofed. But then they did this other one, which was they did a FaceTime call with his parents. and did a thing where they're like, hey, how's it going?

03:58.14
Frank
Terrible.

03:59.26
James
And did this, blah, blah. And they just pointed the webcam, or the camera, at the computer video, you know, to play it back.

04:05.75
Frank
Yeah.

04:07.54
James
And his mom right away was like, something is wrong. This is not, you know she knew right away.

04:12.54
Frank
Yeah.

04:13.18
James
And then his dad didn't notice anything at all. Like not, and it was the same exact clip.

04:19.90
Frank
Moms.

04:21.11
James
It was like the same exact clip, the same exact thing. And then they did one more, which was with the professional

04:23.03
Frank
Yeah.

04:26.79
James
spoofer thing where they rendered out a whole video and, and it was, you could just tell it was like completely AI generated. So, I mean, I, I, it's mostly because it's,

04:34.02
Frank
Oh, okay.

04:38.99
James
procedural audio talking, right? It's lip syncing, doing all this stuff in real time. You know, if you go in and you're doing motion capture, you know, that's real motion capture, and then it's usually animation, CG on top of it.

04:54.20
James
And even if they are doing other stuff, it's quite complex to to deep fake for long periods of time that looks pixel perfect accurate. But it was kind of fascinating in general just to watch it. But it gets me thinking...

05:07.58
James
Frank Krueger, about thinking.

05:08.44
Frank
and Oh, hi.

05:10.36
James
ah

05:11.83
Frank
Are you a philosopher now?

05:11.99
James
And no, about LLM thinking. You know, recently, in the last year or so, models have evolved, and you often now see, in the CLI, in VS Code, you go and you pick a model and then you have this thing called effort, sometimes called thinking effort, reasoning effort, thinking level, whatever.

05:34.71
James
And there's a default.

05:34.87
Frank
Yeah.

05:36.95
James
And I just pick said default.

05:38.77
Frank
Extra high.

05:40.02
James
No, just default. Whatever the default is, that's what I pick.

05:41.27
Frank
No? Okay. Okay.

05:43.75
James
It's usually default thinking.

05:43.78
Frank
Auto model default thinking.

05:46.62
James
Yeah, auto model, default thinking. You know, I guess in auto, you don't even get to pick the thinking level. But the thinking level, I never really understood it.

05:57.66
James
I guess I understand that it's a level of thinking of some process, but I figured in my mind that there are very smart people that picked a default for a reason.

06:03.51
Frank
it?

06:07.93
James
So I never really changed it. However, a lot of people have been asking me about it. So I figured we'd have a conversation about the thinking, reasoning, effort thingy: what it is, why you should or shouldn't care about it, what it's actually doing under the hood, and what Frank's doing today in his coding.

06:14.95
Frank
Ah.

06:26.46
Frank
Yeah, it's a good topic. I don't think either of us are model experts in this area. I'm a model expert, I think, when it comes to building and constructing AIs, but tuning them, this is a tuning thing. This is a quality of response thing. This is machine learning, but even weirder machine learning, because it's weird, right? Like, we had this idea that if you ask the AI to plan ahead, it does better.

06:57.21
Frank
And that's because it gets to ah first take like a high level approach to the problem, and then get into the nitty gritty of it. And I think that that manual process of planning was recognized by a lot of the ML experts out there.

07:14.01
Frank
And so they decided to basically bake that into the training of the model, where no matter what you told it, it would start thinking first.

07:22.04
James
Mm-hmm.

07:24.33
Frank
And thinking is basically planning. i mean they're really not too different other than planning is a mode that we humans tell the agent to go into whereas thinking is kind of baked into their training where whenever given a problem to solve they know to first output thinking tokens and to kind of think try to well it's hard to use these words but try to work out the problem at a high level before it dives into the nitty gritty

07:54.87
Frank
And I think it's been mostly OK-ish good. It's definitely quicker for us to just say solve a problem than it is to go into plan mode, ask it to think high level, then slowly whittle it down to a thing. I think it's nice that it's a little bit automatic. What's your gut feel for the thinking?

08:18.10
James
Well, it's hard because I'm often demoing and the more

08:24.08
Frank
Mm-hmm.

08:24.92
James
things are thinking, the longer it will take.

08:25.28
Frank
It's rough. Yeah.

08:28.12
James
You know, a good example of something that is not thinking is maybe older GPT models, or I believe Haiku doesn't think. I don't, I mean, at least there's not a thinking level that you can set.

08:38.50
Frank
Okay.

08:41.95
James
Maybe there is thinking associated with it, but my understanding is there's thinking models and non-thinking models. So, you know, GPT was extremely quick and Haiku is also extremely quick.

08:48.04
Frank
Yeah.

08:51.62
Frank
Yeah.

08:53.91
James
ah I have never changed the setting because I really don't know. That being said, how you talk about it does make some sense.

08:59.80
Frank
Yeah.

09:03.35
James
If I get a really hard problem, ah let's say mathematical problem or house construction problem, I inherently go into a deep thinking mode.

09:15.13
James
you know If I have to flip a switch on or off or I need to set up the podcast, flip the switch, there's no thinking. I just do it right in here. I breathe inherently. There's no thinking that's happening. It just goes. If I'm setting up for the podcast, there's a fairly low level of thinking that is involved. It's turning on lights. It's checking the software. it's I'm thinking about the microphone placement, the U placement, the topic, X, Y, Z. If I'm doing research for the podcast, then I guess I go a little bit deeper. So I guess if I start to

09:51.10
James
think about thinking, if I start to reason about... no, if I start to correlate thinking to planning or researching before executing said plan, which would be like executing this podcast,

09:52.25
Frank
Yeah, it's hard, huh? Yeah.

10:10.46
James
then that resonates a little bit more with me. Because one, if I am doing and performing deeper levels of setup, of thinking, of research, then that takes longer.

10:27.41
James
And I am burning more

10:32.98
Frank
Calories.

10:36.00
James
Tokens. I have more processing units of my brain... or, my brain is more focused, and more processing is happening towards that equation. Yeah.

10:46.62
Frank
Yeah, I don't like to anthropomorphize too much, though.

10:47.54
James
yeah

10:50.38
Frank
These things don't work the way the human brain works. We need to be clear about that. In fact, thinking is a solution because of the big flaw in how LLMs work, and that is they generate one token at a time,

11:04.78
Frank
based upon the current context and what it's already output. Its history of what's already been seen.

11:11.47
James
Mm-hmm.

11:11.95
Frank
It's outputting the next token, the next token, the next token. So if it's doing that in a serial fashion, outputting tokens, like go write this line of code, by definition, it has no idea what the end product is going to be.

11:25.62
Frank
ah What is the last thing it's going to change the code to? Or what are the final lines? What is the net effect of all these lines? And so the thinking mode gives it that opportunity to outline the problem and state, like, from the beginning, okay, I'm going to edit this file, then I'll change this icon, and then I'll change this path, and then I'll recompile the thing.

11:50.84
Frank
That's its thinking stage. Now when it goes to output the real first token, as in generate a line of code, it already has that outline to work from.

11:52.68
James
You

11:58.48
Frank
It's in the history. So we're working around this, basically, flaw in how LLMs work, in that they're one token at a time, based purely on their history. They have no advance thought, no forethought. You know, while I'm speaking, my brain's racing a tiny bit ahead. And again, I shouldn't be anthropomorphizing, but my brain's getting a little bit ahead of my words, and that's why sometimes I'll stumble when I'm talking, because my brain's going a little bit faster. Humans can do that. We're advanced. We can think faster than we can speak.

12:31.13
Frank
AIs can't. They think and speak at the exact same rate. um So in order to do something intelligent or that requires future context, they need to write that future context first. And that's the thinking mode.
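
The one-token-at-a-time point can be sketched in a few lines of Python. This is a toy, not how any real model is implemented: `next_token` is a hypothetical stand-in for the network, and the four plan steps are hard-coded from Frank's example, whereas a real model would generate its thinking tokens one at a time as well. All it tries to show is that the plan helps because it is already sitting in the history when the first real token is produced.

```python
import random

# Steps taken from the example in the conversation; hard-coded for illustration.
PLAN = ["edit_file", "change_icon", "change_path", "recompile"]


def next_token(history, rng):
    """Toy stand-in for a model: chooses the next token from the history alone."""
    planned = [t[len("plan:"):] for t in history if t.startswith("plan:")]
    done = [t for t in history if t.startswith("do:")]
    if planned:
        # A plan is in the history, so the next step can simply be read off it.
        if len(done) < len(planned):
            return "do:" + planned[len(done)]
        return "<end>"
    # No plan in the history: the toy model can only guess a plausible step.
    return "do:" + rng.choice(PLAN)


def generate(prompt_tokens, thinking, rng, max_tokens=8):
    """Emit tokens one at a time; each one sees only what came before it."""
    history = list(prompt_tokens)
    if thinking:
        # "Thinking" tokens are written first and then stay in the history.
        history += ["plan:" + step for step in PLAN]
    output = []
    for _ in range(max_tokens):
        token = next_token(history, rng)
        if token == "<end>":
            break
        history.append(token)
        output.append(token)
    return output


rng = random.Random(0)
print("with thinking:   ", generate(["fix the app icon"], True, rng))
print("without thinking:", generate(["fix the app icon"], False, rng))
```

With `thinking=True` the toy walks through the four planned steps in order and stops. With `thinking=False` there is nothing in the history to work from, so it guesses until it hits the token limit.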

12:45.56
Frank
But I do want to comment on your fast versus slow. Like, it's real, man. It's real. Like, there have been so many advances in the execution of neural networks to make them faster and to make per-token CPU cost or GPU cost, whatever processing cost, lower.

13:04.38
Frank
And we really worked hard on it. We made big contexts work. And then we added the stupid thinking mode that has basically 10x'd the amount of tokens that we need to output. I don't know about you, but if you ever, like, open up the thinking window, you see it basically solves the whole problem in a serial fashion.

13:23.19
Frank
But that was just thinking. So now it has to come back out and rewrite the thing in a serial fashion a second time. And it's just painful sometimes, especially with the slower models, where you see it write out this 300 lines of code. And then you're like, oh, great. But that was all in the thinking buffer. Now I've got to wait for it to write out those 300 lines of code in the real buffer. It's super frustrating when the thinking isn't actually doing what it's supposed to be doing: working at a higher level, working on an outline, thinking of the future. But really, it's just working in that serial fashion.

13:53.98
Frank
So it's a little bit annoying. We had all this effort to make neural networks more efficient. And then we just 10x the amount of work that they actually have to do. For sometimes no benefits, other times huge benefits, sometimes really bad benefits, especially if it gets into a circular thinking loop.

14:02.24
James
Yeah.

14:10.68
Frank
I don't know if you've ever seen this, but like, "I think I'm going to do it this way. But wait, what if this? But wait, what about that? Oh, I'll go back to A. A's good, but what about this?" And you're like, oh my God, just break out of the loop and write some code.

14:23.75
Frank
I've done the steering thing. I'm like, just stop thinking and write some stupid code. You've been thinking for 30 minutes, write code.

14:31.06
James
Yeah. No, it's true. The fear I've always had about changing anything from the default is for that exact reason.

14:44.89
James
It's that I never knew technically how it works. And if I was going to get it into the correct state in which a higher or lower thinking and reasoning level would be appropriate.

14:54.07
Frank
Yeah.

14:59.77
James
I mean, I guess... if we're talking about what developers should think about is not just how it works and how it's reading and writing and processing, but there's going to be this decision that is very visible to them. It already is.

15:23.19
James
And I have talked to a lot of developers that seem to say, well, I'm just going to pick the biggest context, the biggest model, and the most reasoning, because why wouldn't I?

15:34.10
Frank
Yep. Heck yeah.

15:40.28
James
Because that's the best.

15:40.25
Frank
Yeah.

15:42.65
Frank
Yeah. Such the wrong mentality. Yeah, it's it's the bigger is better. Bigger is always better. You know, Ford F-150, good. 350. It's almost three times better.

15:54.68
Frank
You know, it's like two point something better. Yeah, it's it's you think that, right? Oh, God, hate that word now. Yeah. But I would say it's just not true.

16:05.72
James
Sorry.

16:07.68
Frank
The hardest part is we don't know when it's true, because sometimes thinking helps for sure. It's that scenario that I was talking about, when it needs future context.

16:18.14
Frank
It needs to come up with a whole plan before it starts executing. But I would argue it's probably better to work out a plan first than to rely on thinking. Thinking is kind of the autopilot thing where you're like, I'm going to give you two sentences and I want you to one shot

16:37.30
Frank
Unreal Tournament. You know, give me the whole game from these two very vague sentences I wrote. In that case, thinking mode, probably good, because you did not give it enough information, and it's going to need to work out some details before it writes that first line of code. It's important.

16:53.56
Frank
But other times, if you tell it, hey, there's an error on line six, it's because there's a missing semicolon, you really don't need thinking. I promise you, you do not need thinking in that case. As you said, GPT-4 could solve that problem trivially, and you just don't need it.

17:11.80
Frank
And the problem we have as developers is deciding. We don't know how these work. These are very sophisticated black boxes. We don't know. So of course, Ford F-350, always. You need that to go up and down the street at 30 miles an hour. 100%.

17:29.05
Frank
Of course you do.

17:31.00
James
no Why wouldn't you?

17:31.16
Frank
It's the problem.

17:32.28
James
Makes logistical sense to me. There's a graph from Anthropic when they launched Opus 4.7.

17:43.00
James
And they were talking about the token usage. And it seems as though they were optimizing token usage for lower reasoning thinking levels.

17:57.43
James
However, they added a new X high, extra high or max. There's max and there's extra high.

18:02.68
Frank
Yeah.

18:03.15
James
I don't know. And it goes off the charts.

18:04.31
Frank
Yeah.

18:07.78
James
It's wild.

18:08.35
Frank
Oh, does it? I believe that.

18:09.93
James
Mm-hmm.

18:10.92
Frank
I've seen it going in its loops, yeah.

18:13.78
James
And it just goes and goes and goes because there's that. And I agree with you. I think your analogy of what problem are you trying to solve? What is the complexity of it? What is the prompt that you're giving it? Are you giving it planning or not? And there's that. So Pierce used to always say, wise man, Pierce Bogan used to say, that he would plan with a larger model.

18:34.07
James
So he would say, I want to plan. I want to get the details correct, ironed out, thought about. And then he would execute it with a smaller model. because it's following, it's just following very specific instructions and what to do, and it would give him pretty good results quickly.

18:48.38
Frank
yeah

18:53.02
Frank
I agree. and But there's other knobs to play with because what you want in that planning stage is creativity, basically, and the freedom for it to kind of think of the future and and also the present and think through all the problems. So think through all the potential problems. That's where a big model helps. Creativity, and no one ever talks about this, but you can change the temperature setting. Remember, these are random processes. You can make them more or less random depending on your mood. No one ever does this because it's not exposed in our UIs, but you can make it more random.

19:26.58
Frank
Give it a prompt to plan and have it plan four different ways with a very high random temperature. And you're going to get four different plans. And then, yeah, I agree with Pierce.
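
A rough sketch of what the temperature knob does when the next token is sampled, in plain Python. The logits below are made-up numbers for four hypothetical candidates, not output from any real model. Dividing the logits by the temperature before the softmax is the standard formulation: values above 1 flatten the distribution (more "creative"), values below 1 sharpen it.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, rescaled by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample(logits, temperature, rng):
    """Draw one candidate index at random, weighted by the rescaled softmax."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical scores for four candidate continuations (illustrative only).
logits = [2.0, 1.0, 0.5, 0.0]
rng = random.Random(0)

for temperature in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    picks = [sample(logits, temperature, rng) for _ in range(10)]
    print(temperature, [round(p, 2) for p in probs], picks)
```

At the low temperature almost every draw lands on the top candidate; at the high one the draws spread across all four, which is what you want when asking for four genuinely different plans.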

19:40.85
Frank
If you have a sophisticated plan and you agree with it, give it to a simple model. The problem is, James, I don't know about you, I often don't read the plan.

19:53.07
Frank
I'm terrible.

19:53.26
James
Yeah.

19:54.83
Frank
I'm like, plan this, because I know planning is important and because I know it's a way of thinking. It's a way of thinking through the problem, because it'll still think during a plan, too. Now we're at multiple levels of thought.

20:05.94
Frank
um But then I don't read the plan. I'm just like, yeah, yeah, go, babe. I just know I wanted it to work it out before it started coding because that generally improves the results.

20:17.66
Frank
um So what I said about that, I agree with Pierce, but um if I was a better developer, I would have it generate four different plans. I would read those four different plans, choose the best one, and then execute that.

20:33.05
Frank
that's That's thought.

20:33.46
James
Or, I would say, spin up four worktrees, plan it four different ways, and then implement all four of them, and then see them and test all four in action, and then pick one based on the implementation that you like the best of the four.

20:40.15
Frank
so Yeah.

20:44.63
Frank
Yeah.

20:48.18
James
Cause you know, why not?

20:49.13
Frank
See, I guess that's my problem with the thought, like, because it's not like being more creative. By the way, creativity: our only knob on creativity is this random sampling.

21:00.66
Frank
The more random you make the result, the more creative it's going to be.

21:00.84
James
Hmm.

21:03.84
Frank
And I put that in scare quotes for people not watching. You should watch, by the way, it's on YouTube. My biggest gripe with thinking is that it's one way to improve the quality of the results of these networks. It is not the only way. We've all learned now that harnesses and system prompts make a big deal. The tools available to the agent have a big impact on its quality. The ease of using those tools has a big impact on the quality. Planning has a big impact on quality.

21:37.75
Frank
Thinking is like a cool little academic hack because they can bake that into the model. It can be a ah part of the training and they can post benchmarks saying, look, it thinks with this efficiency and all that.

21:50.06
Frank
But that's just one of many knobs that we can turn to improve the quality of the LLM output.

21:56.57
James
Yeah, talking about system prompts and how the harness works, I was recently using GitHub Copilot for Xcode.

22:08.98
James
I was working on a Swift app and they've had tons of updates.

22:09.27
Frank
Okay, yeah.

22:12.65
James
And first and foremost, they updated it to use the built-in Xcode MCP tools. So in Xcode now, you can say expose the Xcode MCP and coding agents can connect to it.

22:19.00
Frank
Yeah.

22:25.60
James
Makes it a thousand times better. Like, it basically just gives it access to Xcode.

22:29.05
Frank
We need to do a whole episode on that because that, yeah, it brings up a lot of things, but yeah.

22:29.79
James
Okay.

22:33.73
James
It brings up a lot. Well, it brings up the extensibility. Why is Xcode just not as extensible? You know what I mean?

22:40.92
Frank
Well, it also brings up why does Apple get to turn their apps into MCP servers, but they don't have a way for normal apps to become MCP servers.

22:48.78
James
Ooh, spicy.

22:48.76
Frank
It brings up that. That's the question I care about.

22:51.92
James
Spicy.

22:52.38
Frank
why why Why do you get to be special, Apple?

22:55.30
James
Well, it's their operating system. They do what they want.

22:57.78
Frank
Yeah.

22:58.71
James
So, I digress.

22:58.68
Frank
Yeah.

23:00.35
James
But what I noticed was that there are only two, I think there are only two modes. It is just ask and agent, and there was no plan mode. You can of course add a custom planning mode.

23:17.40
James
However, what I realized is at least that team, and now this might change because I know they're evolving the product pretty quick, but this is at least two weeks ago, is when I would give it a prompt it would, as part of the system prompt, tell it to create a plan, ask me about it, and then execute on it.

23:38.90
James
so

23:39.01
Frank
Oh, good. Okay.

23:39.86
James
and you And you would see a UI for a plan in place.

23:40.12
Frank
Yeah.

23:43.22
James
And inherently, that's what's happening, the to-do list. You see the to-do list. It has come up with a plan of sorts.

23:46.94
Frank
yeah

23:49.53
James
However, I enjoyed this because I could see the plan ahead of time, and it was naturally asking me questions for it. And then it seemed to do that for every single prompt. No matter how complex it was, it was like, come up with a plan for this.

24:04.07
Frank
yeah

24:05.97
James
And to some extent that is kind of good for our friend, the context window, and cache tokens, and having it all right there in some way, because I think that all the tool calls... like, I don't think they were changing the system prompt.

24:06.17
Frank
yeah

24:20.69
James
I don't think they were changing the tools.

24:21.74
Frank
Right.

24:22.33
James
The system prompt was: come up with a plan first, then ask questions, then execute the plan when all questions are satisfied.

24:32.42
Frank
I really appreciate that in a harness because honestly, I'm sometimes so lazy. I can't even click to change the mode to plan, or I can't remember the key binding to switch the mode to plan.

24:41.82
James
yeah

24:42.36
Frank
And therefore I won't plan. And also sometimes I'm like, God, no matter what I ask, it's going to have three questions. I could ask it to write a lunar lander, or I could ask it to write a script that says hello world, and it's still going to come up with three questions. It's going to ask me what... Who baked in this three-question thing? Anyway, I digress, because I agree with you. One of my personal criticisms of a lot of harnesses is that when you switch out of plan mode, it kills your cache.

25:13.78
Frank
That doesn't matter if networks are free to you, but it matters a lot if you're paying for requests, because cached tokens are far, far, far cheaper than input tokens. Usually like a 10x multiplier on that. And money matters.
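
To put rough numbers on the cache point and the earlier "10x the tokens" complaint, here is a back-of-the-envelope cost sketch. The prices are hypothetical placeholders chosen only so that cached input is 10x cheaper than fresh input, as described above; they are not any provider's real rates. It also assumes thinking tokens are billed at the output rate, which is an assumption worth checking against your provider's pricing page.

```python
# Hypothetical prices in dollars per million tokens (NOT real rates).
PRICES = {"input": 3.00, "cached_input": 0.30, "output": 15.00}


def request_cost(fresh_input, cached_input, output, thinking, prices):
    """Dollar cost of one request, given token counts for each bucket."""
    per_million = (
        fresh_input * prices["input"]
        + cached_input * prices["cached_input"]
        # Assumption: thinking tokens are billed like ordinary output tokens.
        + (output + thinking) * prices["output"]
    )
    return per_million / 1_000_000


# Same 100k-token context and 2k-token answer in both cases.
cache_hit_no_thinking = request_cost(0, 100_000, 2_000, 0, PRICES)
cache_miss_10x_thinking = request_cost(100_000, 0, 2_000, 20_000, PRICES)

print(f"cache hit, no thinking:    ${cache_hit_no_thinking:.2f}")
print(f"cache miss, 10x thinking:  ${cache_miss_10x_thinking:.2f}")
```

Under these made-up prices the same request costs roughly $0.06 in the first case and $0.63 in the second, so a harness that invalidates the cache and a thinking level that multiplies output can each move the bill by about an order of magnitude.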

25:31.64
Frank
um So yeah, I really appreciate that that happens. And it makes perfect sense because it's just another system prompt you're injecting. You're like... here's 30 tools you can use, but don't you dare call any of them right now.

25:45.03
Frank
And you know the harness can make it even better.

25:45.63
James
I'm

25:47.51
Frank
If it tries to call one of those tools during this pseudo plan mode, then it can just say, hey, buddy, not allowed. Stop it. Without messing with the system prompt.
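
A minimal sketch of that pseudo plan mode, assuming a harness that owns tool dispatch. The `Harness` class, its fields, and the `write_file` tool are all hypothetical names invented for illustration and do not correspond to any real product. The point is that the system prompt and tool list never change when the mode flips, so a prefix-based prompt cache would stay valid; only the dispatcher's behavior changes.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Harness:
    """Hypothetical agent harness; only tool dispatch is modeled here."""

    system_prompt: str                    # never edited when the mode flips
    tools: Dict[str, Callable[..., str]]  # the tool list also stays the same
    plan_mode: bool = False

    def call_tool(self, name: str, *args: str) -> str:
        """Run a tool, or refuse with a message the model can read."""
        if name not in self.tools:
            return f"error: unknown tool {name!r}"
        if self.plan_mode:
            # Refuse at dispatch time instead of rewriting the prompt.
            return f"refused: {name!r} is not allowed while planning"
        return self.tools[name](*args)


harness = Harness(
    system_prompt="Come up with a plan first, ask questions, then execute.",
    tools={"write_file": lambda path, text: f"wrote {len(text)} chars to {path}"},
)

harness.plan_mode = True
print(harness.call_tool("write_file", "a.txt", "hello"))

harness.plan_mode = False
print(harness.call_tool("write_file", "a.txt", "hello"))
```

The refusal string goes back to the model as an ordinary tool result, which is the "hey, buddy, not allowed" nudge, and the conversation prefix is left untouched.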

25:58.36
Frank
There's so much progress harnesses can make over all this.

26:02.18
James
I'm getting ready for a world, and I've never seen this yet. If there is a harness that does this, please write in, leave us a comment. You know, I'm already living in that auto model world.

26:16.17
James
Right. And I'm already living in the, whatever the thinking token, the thinking level is just default to it.

26:16.82
Frank
Yeah.

26:22.52
James
But it would also be rad to just have an auto agent mode. Like, you just figure out what I want to do and you do it. Now, technically I could be like, I'm asking you a question, no need to change code.

26:30.87
Frank
Read my mind mode.

26:34.36
James
Right. And then it will just do it, but figure it out ahead of time in this, in one system prompt. So I just opened it. It's one chat. It's like, I'll figure it out. You know what I mean? but I'll figure it out.

26:44.60
Frank
the The next time I prompt you for a question, have an LLM answer the question for it.

26:49.85
James
Yeah.

26:50.71
Frank
Read the plan up till now.

26:50.77
James
Did figure it out. Well,

26:52.67
Frank
Yeah.

26:53.11
James
Well, let's, you know, we've been talking about different pricing changes. You know, this episode is going to go out, I think on usage-based billing day. So timely June 1st.

27:02.78
Frank
Oh, no. The apocalypse.

27:06.27
James
So one ah thing for developers, there's a lot of knobs to turn when balancing the models that they're using.

27:13.45
Frank
Mm-hmm.

27:15.92
James
And one is the model. The other is the context window, and think about the cache and the cache tokens. The other thing is the plan mode and creating the plan ahead of time. But the other one is this thinking level.

27:28.02
James
So would you recommend developers play more actively with this thinking effort, these reasoning levels? You know, when you do turn it up, it does say this may increase costs.

27:43.83
James
Do they then turn it down in some scenarios too?

27:44.15
Frank
Yeah. You know...

27:48.86
Frank
you know

27:49.18
James
To do stuff? What are your thoughts?

27:51.12
Frank
It's tough. it's It's a tough, because I do have an answer and I'll give an answer, I promise. But first I want to hedge my answer before I even give it.

27:58.07
James
Okay.

28:00.28
Frank
It depends. That's the problem. That's the problem. Like sometimes you need a lot of thought. Sometimes you don't need a lot of thought. It's really hard to predict. um So, and the other big part is it's hard to AB test these things because they're random samples of a random number generator. Like you can give it the same exact context.

28:21.14
Frank
And same exact code base, and it's going to give you two different answers. And they're going to be so similar to each other, but they're going to be slightly different. And maybe on the first run, they're not going be so different. But maybe six chats down, they're going to be very different because they they on each step, they're going to diverge a little bit more from each other.

28:33.53
James
Yeah. yeah

28:38.38
Frank
So it's really hard to A/B test these things. Should I have done this or not? But I'll give you an answer, because I promised to: you should absolutely try low thinking mode.

28:49.56
Frank
One hundred million percent. And you should do planning more because planning is just a manual way to say thinking mode. And it's a manual way that gives you a chance to interject a little because it will ask you three questions and you'll get to answer those three questions.

29:06.69
Frank
And whenever it asks you three questions, that's your chance to just put in any old garbage, by the way.

29:11.27
James
Yeah. Yeah.

29:11.54
Frank
Just because it asks you what color is the sky doesn't mean you have to answer what color is the sky. This thing is a slave. It's a robot. You can put the works of Shakespeare into that answer field to change how it's going to behave, because it's going to read your response and it's going to act on it.

29:29.59
Frank
So 100% you should try the lower thinking modes, especially now that we're in the pay per token world. 100 million percent. That said...

29:40.25
Frank
Will you be able to tell? Like, what if it gives you a bad result? You're going to be like, darn you, Frank, you told me low thinking and it just wrote bad code.

29:47.45
James
Yeah. Yeah, Frank.

29:49.70
Frank
This is Frank's fault. Well, you can't prove it, because I already said you can't A/B test these things. It's too difficult. And so you can't prove it's my fault, and therefore don't yell at me.

30:00.73
Frank
What would be a little bit fun is keep it on low, or keep it on zero, and I'll get to that; I'm actually trying out zero, by the way, no thinking. But try low, and you know what, if it writes some bad code, take that as a learning lesson. Or you saved a bunch of money on not having thinking tokens. Maybe now you can go back to a snapshot, go up one level in the chat, fork the conversation, and be like, all right, do some extra high thinking this time, and see if it does better.

30:14.82
James
Hmm.

30:38.81
Frank
Because honestly, the only way to get a feel for these things is to play with them a little bit. And we have to not be in this world where we think that for one prompt, there's one answer.

30:49.66
Frank
These are random number generators. They're going to do random stuff. So you just got to try those different paths.
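
As a rough picture of the "random number generators" point, here is a minimal sketch of temperature sampling over a toy next-token distribution. The token scores are invented for illustration, and `sample_next_token` is a hypothetical helper, not any particular model's or library's API.

```python
import math
import random


def sample_next_token(logits: dict, temperature: float, rng: random.Random) -> str:
    """Sample one token from softmax(logits / temperature)."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    max_score = max(scaled.values())  # subtract the max for numerical stability
    weights = [math.exp(s - max_score) for s in scaled.values()]
    return rng.choices(list(scaled.keys()), weights=weights, k=1)[0]


# Hypothetical scores for the next token after some prompt.
logits = {"for": 2.0, "while": 1.5, "map": 1.0}

# Same "prompt", different seeds: the chosen continuation can differ run to run.
samples = [sample_next_token(logits, 1.0, random.Random(seed)) for seed in range(10)]

# A very low temperature makes the top-scoring token win essentially every time.
greedy_like = [sample_next_token(logits, 0.01, random.Random(seed)) for seed in range(10)]

if __name__ == "__main__":
    print(samples)
    print(greedy_like)
```

The same prompt is one draw from a distribution, which is why forking the conversation and rerunning can land on a different path.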

30:57.62
James
There you go. Pro tips from the Frank Krueger.

31:00.22
Frank
Yeah.

31:00.70
James
I'm going to start to play around with it more and see.

31:04.41
Frank
Yeah, so...

31:05.55
James
Although, wouldn't it be nice to have an auto-thinking toggle too?

31:11.54
Frank
There should be. There really should be. Like, there should be a tiny little network in the back being like, "Okay, that is a crazy request. You gave zero context to us. You did zero planning.

31:21.94
Frank
Time to extra high the thinking." So I just want to put this out there: I've still been enjoying rocking the Qwen 3.6 27 billion parameter model, which is tiny. Like, I think the new Grok that was just trained is like 1.0 or 1.5 trillion parameters. And I'm rocking 27 billion over here, quantized, four-bit quantized. I don't know what Grok's running, but, you know, whatever. So there's been an interesting study, especially with this quantized version, where it's a little dumber.

31:43.38
James
Oh my god.

32:03.00
Frank
Have you ever had a memory? You know you have the memory, but you just can't recall that memory. That's kind of what quantized is like with these smaller models. And people have found that turning off thinking is actually really beneficial.

32:18.58
James
Hmm.

32:18.58
Frank
A, because you're running these things locally, you're burning a lot of compute and a lot of context on those thinking windows.

32:18.66
James
Hmm.

32:28.18
Frank
And B, they don't seem to help too much with these quantized models. You know, this does not necessarily apply to Opus 4.7 extra high thinking. This applies to a 27 billion parameter model running on my 3090.

32:43.29
Frank
It actually benefits you and the model to some extent to not do any thinking. Or at least that's what all the reports are saying. I have to admit, I haven't rocked this mode too much yet, but it's totally going to happen, because I am tired of seeing it think in loops.
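
For a sense of scale on the 27-billion-parameter, four-bit setup on a 3090, here is a back-of-the-envelope sketch. It counts the weights only and assumes the RTX 3090's 24 GB of memory; it ignores quantization metadata, activations, and the context cache, so real usage will be higher. The function names are made up for this sketch.

```python
GPU_VRAM_GB = 24  # RTX 3090 memory, in GB


def weight_footprint_gb(num_params: int, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


def headroom_gb(num_params: int, bits_per_weight: int, vram_gb: float = GPU_VRAM_GB) -> float:
    """Memory left over for context and everything else (negative means it does not fit)."""
    return vram_gb - weight_footprint_gb(num_params, bits_per_weight)


if __name__ == "__main__":
    params = 27_000_000_000
    for bits in (16, 8, 4):
        print(f"{bits}-bit: weights {weight_footprint_gb(params, bits):.1f} GB, "
              f"headroom {headroom_gb(params, bits):.1f} GB")
```

Whatever headroom is left after the weights is what the context has to live in, which is part of why long thinking windows hurt more on a local card.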

33:02.23
Frank
I'm tired of seeing it write out all the code twice, once in the thinking window, second time in the real window.

33:06.79
James
Yeah.

33:09.59
Frank
It seems very wasteful. Instead, I'm going to do what I've been promoting this whole podcast: just make sure I do plan mode first and then rock it out.

33:20.60
James
There you go. All right. Well, give it a try, folks. Let us know what you think if you're rocking all sorts of different thinking levels with your models, or what models you're also running locally or remotely.

33:30.70
James
Let us know what you're thinking.

33:31.54
Frank
Hmm.

33:32.62
James
Go to youtube.com forward slash at MergeConflict.fm. You can also go to MergeConflict.fm for all of your podcast needs. And you can like, subscribe. You can rate and review. You can send us an email. You can hit us up on Twitter. All of the links are on mergeconflict.fm. That's a little website. Boom. You just go there. Things are there. All right. That's going to do it for this week's Merge Conflict. Have a great Build, everyone. I'll be at Build. Vibe coding. If you're doing this, we've got a live code booth that will be on the GitHub YouTube, where we have the terminal live. There's this whole terminal section there, I guess in the booth area or whatever. You know, at the booth there's a bunch of terminal stuff, and then we have a live code booth, and we're going to be talking about all the different amazing stuff from Build. So if you're interested in that, it'll be live streaming Tuesday and Wednesday, June 2nd and 3rd, on the

34:15.87
Frank
Ooh.

34:23.39
James
GitHub YouTube. So give that a look-see and don't miss out. There's going to be a bunch of goodies. So head over to build.microsoft.com. You can register for Build for free. It's online.

34:33.67
James
You just watch it online.

34:33.76
Frank
Ooh.

34:34.59
James
Boom, you're there. You can also...

34:35.64
Frank
That's got to be the cheapest way.

34:37.34
James
That's the cheapest way. It's $0. $0. No thinking required. All right, that's going to do it for this week's Merge Conflict. So until next time, I'm James Montemagno.

34:45.56
Frank
And I'm Frank Krueger. Thanks for watching and listening.

34:48.66
James
Peace.