00:00.41 James o Welcome back everyone to Merge Conflict, your weekly developer podcast where I right click picture in picture and I shove Frank Krueger on my monitor over here. How's it going buddy? 00:10.65 Frank Very good, very good. I like being on the little monitor through the two-way mirror that's sometimes called a one-way mirror. Hi, I'm coming to you through a mirror. 00:19.06 James and we're gonna see if my camera continues to work today and doesn't freeze up. I had to replace myself before, although now it seems to be flickering. So I don't know what's happening. 00:27.75 Frank Did you put an AI? well How'd you do it? Did you use one of those like image to video things? That would be kind of hilarious. 00:34.77 James I, I guess I'm using, I'm using camera hub because you said I look better when camera hub processes my video. And then when, when, uh, what's this other app that I have? 00:47.15 James Wavelink processes my audio and makes me sound better. Basically you're saying just let all of the software AI overlords process this. 00:57.54 James And then it's not actually me still in a garage, but here. 00:59.81 Frank Yeah. Mm. 01:02.65 Frank We'll never get you out of that garage. Well, it's funny, like in my endeavors to use more and more local models, I've actually been playing around with the current text to speech models, which is, you know, text to speech has been good for a decade now, but like they've gotten really good. 01:13.59 James Ooh. 01:19.25 Frank And so i was trying some of the open source ones and just seeing how really good they are. Turns out, James, they're really, really good now. And to the point where you can actually fingerprint people too. 01:31.95 Frank So it's not this episode, I promise all of you, but it would be fun to at least put in like a five minute stub somewhere with ah Synthetic James and yelling at Synthetic Frank and find out if anyone notices the difference. 01:46.54 Frank You will because you won't hear my giggling. I probably won't giggle as much with the AI. 01:52.38 James I did watch a, I believe it's a morning brew video, I want to say, where the gentleman went to a few of the leading avatar companies where they will scan your face and they will create an AI representation of you and then they'll do voice models on you. 02:08.22 Frank Who? 02:14.42 James And he did a professional one. 02:14.64 Frank Okay. 02:16.50 James More seems to be used in like training videos, for example, where they'll have a person and then they can to have them talk and go through training videos. So he tried that. And while that was processing, he said, okay, let me just find the top tier online, whatever avatar and top tier video or audio um voice spoofer. 02:31.29 Frank Yeah. 02:36.57 James And what they did is they did some trials. I'll find a ah link to this and put it in the show notes. But basically what he did was he said, okay, first, let me see if I can spoof my ah coworkers on a zoom call. 02:51.22 James So 02:51.49 Frank Oh, sure. 02:52.79 James This one is quite complicated because what they ended up having to do was, and then what he also did is once he put it in there, he let his friend. So the idea was to not even have you be the person presenting. 03:04.81 Frank Ugh. 03:05.37 James It could be somebody else spoofing you. 03:05.88 Frank Okay. See, now it's a lie. 03:07.41 James So, 03:09.21 Frank I didn't mind it when it was him typing something, but if if you're not even typing in the responses, now it's bad, but okay, fine. 03:16.25 James Well, the the idea was like, okay, if you can do voice in this, can i you know, do this? oh But the the problem with the software, at least for this was it wasn't real time. 03:21.32 Frank Yeah. 03:26.11 James It was only pre-recorded. So you had to render out the video and things like that. So they said, okay, they worked with their manager and it was for a piece. they said, okay, we're going to, 03:37.24 James Ask me a question. I'm going to prerecord and see if I can spoof my coworkers. And they even had like a looping like head nod, like, yeah, you know, and there's is an OBS playing it back. 03:43.00 Frank Okay, cool. Yeah. 03:46.90 James And they definitely spoofed some of the people that maybe weren't paying as close attention. And then other people, they definitely spoofed. But then they did this other one, which was they did a FaceTime call with his parents. and did a thing where they're like, hey, how's it going? 03:58.14 Frank Terrible. 03:59.26 James and did this, blah, blah. And they just pointed the webcam or the the camera at the computer audio, you know, or video to play it back. 04:05.75 Frank Yeah. 04:07.54 James And his mom right away was like, something is wrong. This is not, you know she knew right away. 04:12.54 Frank Yeah. 04:13.18 James And then his dad didn't notice anything at all. Like not, and it was the same exact clip. 04:19.90 Frank Moms. 04:21.11 James It was like the same exact clip, the same exact thing. And then they did one more, which was with the professional 04:23.03 Frank Yeah. 04:26.79 James spoofer thing where they rendered out a whole video and, and it was, you could just tell it was like completely AI generated. So, I mean, I, I, it's mostly because it's, 04:34.02 Frank Oh, okay. 04:38.99 James procedural audio talking, right? It's lip syncing, do all this stuff in real time. You know, if you go in and you're doing motion capture, you know, that's real motion motion capture, and then it's usually animation, CG on top of it. 04:54.20 James And even if they are doing other stuff, it's quite complex to to deep fake for long periods of time that looks pixel perfect accurate. But it was kind of fascinating in general just to watch it. But it gets me thinking... 05:07.58 James Frank Kruger, about thinking. 05:08.44 Frank and Oh, hi. 05:10.36 James ah 05:11.83 Frank Are you a philosopher now? 05:11.99 James And no, about LLM thinking. you know, recently in the last year or so, models have evolved and you often now see in the CLI, in VS Code, you go and you pick a model and then you have this thing called effort, sometimes called thinking effort, reasoning effort, thinking level, whatever. 05:34.71 James And there's a default. 05:34.87 Frank Yeah. 05:36.95 James And I just pick said default. 05:38.77 Frank Extra high. 05:40.02 James No, just default. Whatever the default is, that's what I pick. 05:41.27 Frank No? Okay. Okay. 05:43.75 James It's usually default thinking. 05:43.78 Frank Auto model default thinking. 05:46.62 James Yeah, auto model, default thinking. you know I guess in auto, it does you you don't even get to pick the thinking level. But the um thinking level, I never really... understood it. 05:57.66 James I guess I understand that it's a level of thinking of some process, but I figured in my mind that there are very smart people that picked a default for a reason. 06:03.51 Frank it? 06:07.93 James So I never really changed it. However, a lot of people have been asking me about it. So figured we have a conversation about thinking, reasoning, effort, thingy, what it is, why, or why you shouldn't care about it and what it's actually doing under the hood and what's Frank doing today in his coding. 06:14.95 Frank Ah. 06:26.46 Frank Yeah, it's a good topic. um I don't think either of us are ah model experts in this area. I'm a model expert, I think, when it comes to building and constructing AIs, but tuning them, this is a tuning thing. This is a quality of response thing. This is machine learning, but even weirder machine learning because It's weird, right? Like, um we had this idea that if you ask the AI to plan ahead, it does better. 06:57.21 Frank And that's because it gets to ah first take like a high level approach to the problem, and then get into the nitty gritty of it. And I think that that manual process of planning was recognized by a lot of the ML experts out there. 07:14.01 Frank And so they decided to basically bake that into the training of the model, where no matter what you told it, it would start thinking first. 07:22.04 James Mm-hmm. 07:24.33 Frank And thinking is basically planning. i mean they're really not too different other than planning is a mode that we humans tell the agent to go into whereas thinking is kind of baked into their training where whenever given a problem to solve they know to first output thinking tokens and to kind of think try to well it's hard to use these words but try to work out the problem at a high level before it dives into the nitty gritty 07:54.87 Frank And I think think think it's been mostly OK-ish good. it It's definitely quicker for us to just say solve a problem than it is to go into plan mode, ask it to think high level, then slowly whittle it down to a thing. I think it's nice that it's a little bit automatic. What's your gut feel for the thinking? 08:18.10 James Well, it's hard because I'm often demoing and the more 08:24.08 Frank Mm-hmm. 08:24.92 James things are thinking, the longer it will take. 08:25.28 Frank It's rough. Yeah. 08:28.12 James You know, a good example of something that is not thinking is maybe older GPT models, or I believe Haiku doesn't think. I don't, I mean, at least there's not a thinking level that you can set. 08:38.50 Frank Okay. 08:41.95 James Maybe there is thinking associated with it, but my understanding is there's thinking models and non-thinking models. So, you GPT was extremely quick and Haiku is also extremely quick. 08:48.04 Frank Yeah. 08:51.62 Frank Yeah. 08:53.91 James ah I have never changed the setting because I really don't know. That being said, how you talk about it does make some sense. 08:59.80 Frank Yeah. 09:03.35 James If I get a really hard problem, ah let's say mathematical problem or house construction problem, I inherently go into a deep thinking mode. 09:15.13 James you know If I have to flip a switch on or off or I need to set up the podcast, flip the switch, there's no thinking. I just do it right in here. I breathe inherently. There's no thinking that's happening. It just goes. If I'm setting up for the podcast, there's a fairly low level of thinking that is involved. It's turning on lights. It's checking the software. it's I'm thinking about the microphone placement, the U placement, the topic, X, Y, Z. If I'm doing research for the podcast, then I guess I go a little bit deeper. So I guess if I start to 09:51.10 James think about thinking, yeah if I start to reason about, no I start to you correlate thinking to planning or researching before executing said plan, which would be like executing this podcast, 09:52.25 Frank Yeah, it's hard, huh? Yeah. 10:10.46 James then that resonates a little bit more with me. Because one, if I am doing and performing deeper levels of setup, of thinking, of research, then that takes longer. 10:27.41 James And I am burning more 10:32.98 Frank Calories. 10:36.00 James Tokens. I have more processing units of my brain. or my my brain is is My brain is more focused and more processing is happening towards that equation. Yeah. 10:46.62 Frank Yeah, I don't like to anthropomorphize too much, though. 10:47.54 James yeah 10:50.38 Frank These things don't work the way the human brain works. We we need to be clear about that. In fact, thinking is a solution because of the big flaw in how LLMs work, and that is they generate one token on at a time. 11:04.78 Frank based upon the current context and what it's already output. It's history of what's already been seen. 11:11.47 James Mm-hmm. 11:11.95 Frank It's outputting the next token, the next token, the next token. So if it's doing that in a serial fashion, outputting tokens, like go write this line of code, by definition, it has no idea what the end product is going to be. 11:25.62 Frank ah What is the last thing it's going to change the code to? Or what are the final lines? What is the net effect of all these lines? And so the thinking mode gives it that opportunity to outline the problem and state, like, from the beginning, okay, I'm going to edit this file, then I'll change this icon, and then I'll change this path, and then I'll recompile the thing. 11:50.84 Frank That's its thinking stage. Now when it goes to output the real first token, as in generate a line of code, it already has that outline to work from. 11:52.68 James You 11:58.48 Frank It's in the history. So we're working around this basically... flaw in how llms work and that they're they're one token at a time based purely on their history they have no advanced thought no forethought you know while i'm speaking my brain's racing a tiny bit ahead and again i shouldn't be anthropomorphizing but my brain's getting a little bit ahead of my words and that's why sometimes i'll stumble when i'm talking because my brain's going a little bit faster humans can do that we're we're we're advanced we can think faster than we can speak 12:31.13 Frank AIs can't. They think and speak at the exact same rate. um So in order to do something intelligent or that requires future context, they need to write that future context first. And that's the thinking mode. 12:45.56 Frank ah But I do want to comment on your or um fast versus slow. Like, it's real, man. It's real. Like, there have been so many advances in... the execution of neural networks to make them faster and to make per token CPU cost or GPU cost, whatever processing costs lower. 13:04.38 Frank And we we we really worked hard on it. We made big contacts work. And then we added the stupid thinking mode that has basically 10x the amount of tokens that we need to output. i don't know about you, but if you ever like open up the thinking window and you see like it basically solves the whole problem in a serial fashion. 13:23.19 Frank But that was just thinking. So now it has to come back out and rewrite the thing in a serial fashion a second time. And it's just painful sometimes, especially with the slower models where you see it write out this 300 lines of code. And then you're like, oh, great. But that was all in the thinking buffer. Now I got wait for it to write out those 300 lines of code in the real buffer. it's It's super frustrating when the thinking isn't actually doing what it's supposed to be doing, working at a higher level, working on an outline, thinking of the future. But really, it's just working in that serial fashion. 13:53.98 Frank So it's a little bit annoying. We had all this effort to make neural networks more efficient. And then we just 10x the amount of work that they actually have to do. For sometimes no benefits, other times huge benefits, sometimes really bad benefits, especially if it gets into a circular thinking loop. 14:02.24 James Yeah. 14:10.68 Frank i don't know if you've ever seen this, but like, I think I'm going to do it this way. But wait, what if this? But wait, what about that? Oh, I'll go back to a A's good, but what about this? And you're like, oh my God, just break out of the loop and write some code. 14:23.75 Frank I've i've done the steering thing. I'm like, just stop thinking and write some stupid code. You've been thinking for 30 minutes, write code. 14:31.06 James yeah No, it's true. It's it's ah the fear I've always had about changing anything from the default is for that exact reason. 14:44.89 James It's that I never knew technically how it works. And if I was going to get it into the correct state in which a higher or lower thinking and reasoning level would be appropriate. 14:54.07 Frank Yeah. 14:59.77 James I mean, I guess... if we're talking about what developers should think about is not just how it works and how it's reading and writing and processing, but there's going to be this decision that is very visible to them. It already is. 15:23.19 James and I have talked to a lot of developers that seem to say, well, I'm just going to pick the biggest context, the biggest model and ah the most reasoning, because why wouldn't I? 15:34.10 Frank Yep. Heck yeah. 15:40.28 James Because that's the best. 15:40.25 Frank Yeah. 15:42.65 Frank Yeah. Such the wrong mentality. Yeah, it's it's the bigger is better. Bigger is always better. You know, Ford F-150, good. 350. It's almost three times better. 15:54.68 Frank You know, it's like two point something better. Yeah, it's it's you think that, right? Oh, God, hate that word now. Yeah. But I would say it's just not true. 16:05.72 James Sorry. 16:07.68 Frank um The hardest part is we don't know when it's true, because sometimes thinking helps for sure. It's it's that scenario that i was talking about when it needs future. 16:18.14 Frank It needs to come up with a whole plan before it starts executing. But I would argue it's probably better to work out a plan first than to rely on thinking. Thinking is kind of the autopilot thing where you're like, I'm going to give you two sentences and I want you to one shot 16:37.30 Frank Unreal Tournament. you know Give me the whole game and in these two very vague sentences I wrote. In that case, thinking mode probably good because you did not give it enough information, and it's going to need to work out some details before it writes that first line of code. It's important. 16:53.56 Frank But other times, if you tell it, hey, there's an error on line six, it's because there's a missing semicolon. ah you You really don't need thinking. i promise you, you do not need thinking in that case. As you said, GPT-4 could solve that problem trivially, and you just don't need it. 17:11.80 Frank And the problem we have as developers is deciding. We don't know how these work. These are very sophisticated black boxes. We don't know. So of course, Ford F-350, always. You need that to go up and down the street at 30 miles an hour. 100%. 17:29.05 Frank Of course you do. 17:31.00 James no Why wouldn't you? 17:31.16 Frank It's the problem. 17:32.28 James Makes logistical sense to me. There's a graph from Anthropic when they launched Opus 4.7. 17:43.00 James for seven And they were talking about the token usage. And it seems as though they were optimizing token usage for lower reasoning thinking levels. 17:57.43 James However, they added a new X high, extra high or max. There's max and there's extra high. 18:02.68 Frank Yeah. 18:03.15 James I don't know. And it goes off the charts. 18:04.31 Frank Yeah. 18:07.78 James It's wild. 18:08.35 Frank Oh, does it? I believe that. 18:09.93 James Mm-hmm. 18:10.92 Frank I've seen it going in its loops, yeah. 18:13.78 James And it just goes and goes and goes because there's that. And I agree with you. I think your analogy of what problem are you trying to solve? What is the complexity of it? What is the prompt that you're giving it? Are you giving it planning or not? And there's that. So Pierce used to always say, wise man, Pierce Bogan used to say, that he would plan with a larger model. 18:34.07 James So he would say, I want to plan. I want to get the details correct, ironed out, thought about. And then he would execute it with a smaller model. because it's following, it's just following very specific instructions and what to do, and it would give him pretty good results quickly. 18:48.38 Frank yeah 18:53.02 Frank I agree. and But there's other knobs to play with because what you want in that planning stage is creativity, basically, and the freedom for it to kind of think of the future and and also the present and think through all the problems. So think through all the potential problems. That's where a big model helps. Creativity, and no one ever talks about this, but you can change the temperature setting. Remember, these are random processes. You can make them more or less random depending on your mood. No one ever does this because it's not exposed in our UIs, but you can make it more random. 19:26.58 Frank um Give it a prompt to plan and have it plan four different ways with a very high random ah temperature. And you're going to get four different plans. And then, yeah, i agree with Pierce. 19:40.85 Frank If you have a sophisticated plan and you agree with it, ah give it give it to a simple model. problem is James, I don't know about you. um i often don't read the plan. 19:53.07 Frank I'm terrible. 19:53.26 James Yeah. 19:54.83 Frank I'm like, plan this because I know planning is important and because i know it's it's a way of thinking. It's a way of thinking through the problem because it'll still think during a plan, too. Now we're at multiple levels of thought. 20:05.94 Frank um But then I don't read the plan. I'm just like, yeah, yeah, go, babe. I just know I wanted it to work it out before it started coding because that generally improves the results. 20:17.66 Frank um So what I said about that, I agree with Pierce, but um if I was a better developer, I would have it generate four different plans. I would read those four different plans, choose the best one, and then execute that. 20:33.05 Frank that's That's thought. 20:33.46 James were Or, or would say, spin up four work trees, plan it four different ways and then implement all four of them and then see them and test all four in action and then pick one based on the implementation that you like the best of the four. 20:40.15 Frank so Yeah. 20:44.63 Frank Yeah. 20:48.18 James Cause you know, why not? 20:49.13 Frank See, I guess that's where the problem, that's my problem with the the thought, like, because it's not like being more creative. By the way, um creativity, our only knob on creativity is this random sampling. 21:00.66 Frank The more random you make the result, the more creative it's going to be. 21:00.84 James Hmm. 21:03.84 Frank And I put that in scare quotes for people not watching. You should watch, by the way, it's on YouTube. um and' it's my My biggest gripe with thinking is that it's it's one way to improve the quality of the results of these networks. It is not the only way. We've all learned now that harnesses and system prompts make a big deal. The tools available to the agent have a big impact on its quality. The ease of using those tools has a big impact on the quality. Planning has a big impact on quality. 21:37.75 Frank Thinking is like a cool little academic hack because they can bake that into the model. It can be a ah part of the training and they can post benchmarks saying, look, it thinks with this efficiency and all that. 21:50.06 Frank But that's just one of many knobs that we can turn to improve the quality of the cell I'm outpost. 21:56.57 James Yeah, it's, I recently was using, and talking about system prompts and how the harness works. I was recently using GitHub co-pilot for Xcode. 22:08.98 James I was working on a Swift app and they've had tons of updates. 22:09.27 Frank Okay, yeah. 22:12.65 James And first and foremost, they updated it to use the built-in Xcode MCP tools. So in Xcode now, you can say expose the Xcode MCP and coding agents can connect to it. 22:19.00 Frank Yeah. 22:25.60 James Makes it thousand times better. Like it basically just gives it access to Xcode. 22:29.05 Frank We need to do a whole episode on that because that, yeah, it brings up a lot of things, but yeah. 22:29.79 James Okay. 22:33.73 James It brings up a lot. Well, it brings up the extensive. Why is Xcode just not as extensible? So, you know what I mean? 22:40.92 Frank Well, it also brings up why does Apple get to turn their apps into MCP servers, but they don't have a way for normal apps to become MCP servers. 22:48.78 James Ooh, spicy. 22:48.76 Frank It brings up that. That's the question I care about. 22:51.92 James Spicy. 22:52.38 Frank why why Why do you get to be special, Apple? 22:55.30 James Well, it's their operating system. They do what they want. 22:57.78 Frank Yeah. 22:58.71 James So, I digress. 22:58.68 Frank Yeah. 23:00.35 James But what I noticed... was that there is only two, I think there's only two modes. It is just ask an agent and there was no plan mode. You can of course add a custom planning mode. 23:17.40 James However, what I realized is at least that team, and now this might change because I know they're evolving the product pretty quick, but this is at least two weeks ago, is when I would give it a prompt it would, as part of the system prompt, tell it to create a plan, ask me about it, and then execute on it. 23:38.90 James so 23:39.01 Frank Oh, good. Okay. 23:39.86 James and you And you would see a UI for a plan in place. 23:40.12 Frank Yeah. 23:43.22 James And inherently, that's what's happening, the to-do list. You see the to-do list. It has come up with a plan of sorts. 23:46.94 Frank yeah 23:49.53 James However, enjoyed this because i could see the plan ahead time, and it was asking me naturally questions for it. And I, and then it seemed to do that for every single prompt, no matter how complex it was, it was like, come up with a plan for this. 24:04.07 Frank yeah 24:05.97 James And to some extent that is kind of good for our friend, the context window and cash tokens and having it all right there in some way, because I think that all the tool call, like I don't think they were changing the system prompt. 24:06.17 Frank yeah 24:20.69 James I don't think they were changing the tools. 24:21.74 Frank Right. 24:22.33 James The system prompt was come up with a plan first, then ah ask questions, then execute the plan. when all questions are sufficed. 24:32.42 Frank I really appreciate that in a harness because honestly, I'm sometimes so lazy. i can't even click to change the mode to plan, or I can't remember the key binding to switch the mode to plan. 24:41.82 James yeah 24:42.36 Frank And therefore I won't plan. um And also sometimes I'm like, God, no matter what I, it's going to have, have three questions. I could ask it to write a lunar lander, or i could ask it to write a script that says, hello world. And it's still going to come up with three questions. It's going to ask me what Who baked in this three question thing? Anyway, I digress because i agree with you. One of my personal criticisms of a lot of harnesses is that when you switch out a plan mode, it kills your cash. um 25:13.78 Frank That doesn't matter if if networks are free to you, but it matters a lot if you're paying for um requests because cash tokens are far, far, far cheaper than input tokens. Usually like a 10x multiplier on that. And money matters. 25:31.64 Frank um So yeah, I really appreciate that that happens. And it makes perfect sense because it's just another system prompt you're injecting. You're like... here's 30 tools you can use, but don't you dare call any of them right now. 25:45.03 Frank And you know the harness can make it even better. 25:45.63 James I'm 25:47.51 Frank If it tries to call one of those tools during this pseudo plan mode, then it can just say, hey, buddy, not allowed. Stop it. and Without messing with the system prompt. 25:58.36 Frank ah there's There's so much progress harnesses can make over all this. 26:02.18 James um'm I'm getting ready for a world and I've i've never seen this yet. if there is a harness that does this, please write in, leave us a show note comments is, you know, I'm already living in that auto model world. 26:16.17 James Right. And I'm already living in the, whatever the thinking token, the thinking level is just default to it. 26:16.82 Frank Yeah. 26:22.52 James But It would also be rad just have an auto agent mode. Like you just figure out what I want to do and you do it now. Technically I could be like, I'm asking you a question, no need to change code. 26:30.87 Frank Read my mind mode. 26:34.36 James Right. And then it will just do it, but figure it out ahead of time in this, in one system prompt. So I just opened it. It's one chat. It's like, I'll figure it out. You know what I mean? but I'll figure it out. 26:44.60 Frank the The next time I prompt you for a question, have an LLM answer the question for it. 26:49.85 James Yeah. 26:50.71 Frank Read the plan up till now. 26:50.77 James Did figure it out. Well, 26:52.67 Frank Yeah. 26:53.11 James Well, let's, you know, we've been talking about different pricing changes. You know, this episode is going to go out, I think on usage-based billing day. So timely June 1st. 27:02.78 Frank Oh, no. The apocalypse. 27:06.27 James So one ah thing for developers, there's a lot of knobs to turn when balancing the models that they're using. 27:13.45 Frank Mm-hmm. 27:15.92 James And one is the model. The other is the context window and think about the cash and the cash tokens. The other things is the plan mode and creating the plan ahead time. But the other one is this thinking level. 27:28.02 James ah So would you recommend developers play more actively with this thinking effort, reasoning levels? is you know When you do turn it up, it does say this may increase costs. 27:43.83 James Do they then turn it down in some scenarios too? 27:44.15 Frank Yeah. You know... 27:48.86 Frank you know 27:49.18 James to Do stuff? What are thoughts? 27:51.12 Frank It's tough. it's It's a tough, because I do have an answer and I'll give an answer, I promise. But first I want to hedge my answer before I even give it. 27:58.07 James Okay. 28:00.28 Frank It depends. That's the problem. That's the problem. Like sometimes you need a lot of thought. Sometimes you don't need a lot of thought. It's really hard to predict. um So, and the other big part is it's hard to AB test these things because they're random samples of a random number generator. Like you can give it the same exact context. 28:21.14 Frank And same exact code base, and it's going to give you two different answers. And they're going to be so similar to each other, but they're going to be slightly different. And maybe on the first run, they're not going be so different. But maybe six chats down, they're going to be very different because they they on each step, they're going to diverge a little bit more from each other. 28:33.53 James Yeah. yeah 28:38.38 Frank So it's really hard to A-B test these things. Should I have done this or not? But I'll give you an answer because I promise to you should absolutely try low thinking mode. 28:49.56 Frank One hundred million percent. And you should do planning more because planning is just a manual way to say thinking mode. And it's a manual way that gives you a chance to interject a little because it will ask you three questions and you'll get to answer those three questions. 29:06.69 Frank And whenever it asks you three questions, that's your chance to just put in any old garbage, by the way. 29:11.27 James Yeah. Yeah. 29:11.54 Frank um Just because it asks you what color is the sky doesn't mean you have to answer what color is the sky. This thing is a slave. It's a robot. you can put in You can put in the works of Shakespeare into that answer field to change how it's going to behave because it's going to read your response and it's going to act on it. 29:29.59 Frank So 100% you should try the lower thinking modes, especially now that we're in the pay per token world. 100 million percent. That said... 29:40.25 Frank Will you be able to tell? Like, what if it gives you a bad result? You're going to like, darn you, Frank, you told me low thinking and ah it just it wrote bad codes. 29:47.45 James Yeah. Yeah, Frank. 29:49.70 Frank is This is Frank's fault. Well, you can't prove it because I already said you can't A-B b test these things. It's too difficult. And so you can't prove it's my fault and therefore don't yell at me. 30:00.73 Frank um i i what i what would be a little bit fun is keep it on low or keep it on zero and i'll get to i i'm actually i'm i'm trying out zero by the way no thinking um but uh try low and you know what if it writes some bad code take that as a learning lesson or you saved a bunch of money on not having thinking tokens Maybe now you can go back to a snapshot, go go up one level in the chat, fork the conversation, and be like, all right, do do some extra high thinking this time and see if it does better. 30:14.82 James Hmm. 30:38.81 Frank um Because honestly, the only way to get a feel for these things is to play with them a little bit. And we have to not be in this world where we think that there's for one prompt, there's one answer. 30:49.66 Frank These are random number generators. They're going to do random stuff. So you just got to try those different paths. 30:57.62 James and There you go. Pro tips from the Frank Kruger. 31:00.22 Frank yeah 31:00.70 James I'm going to start to play around with it more and see. 31:04.41 Frank yeah so 31:05.55 James Although, wouldn't it be nice to have an auto-thinking toggle too? 31:11.54 Frank There should be. There really should be. like There should be ah a tiny little network in the back being like, okay, that is a crazy request. You gave zero context to us. You did zero planning. 31:21.94 Frank Time to extra high the thinking. um So I just want to put out there. So I've been still enjoying rocking QN 3.6 27 billion parameter model, which ah twenty seven billion parameter model which is tiny like I think the new Grok that was just trained is like 1.0 or 1.5 trillion parameters like yeah and I'm doing I'm rocking 27 billion over here quantized four bit quantized I i don't know what Grok's running but you know whatever um so there's been an interesting study with um especially this quantized version where it's a little dumber 31:43.38 James Oh my god. 32:03.00 Frank Have you ever had a memory? You know you have the memory, but you just can't recall that memory. That's kind of what quantized is like with these smaller models. And ah people have found that turning off thinking is actually really beneficial. 32:18.58 James Hmm. 32:18.58 Frank A, because you're running these things locally, you're burning a lot of compute and a lot of context on those thinking windows. 32:18.66 James Hmm. 32:28.18 Frank and B, they ah don't seem to help too much with these quantized models. you know This does not necessarily apply to Opus 4.7 extra high thinking. This applies to 27 billion parameter model running on my 3090. 32:43.29 Frank it actually benefits you and the model to some extent to not do any thinking. Or at least that's what all the reports are saying. I have to admit, I haven't rocked this mode too much yet, but it's totally going to happen because i am tired of seeing it think in loops. 33:02.23 Frank I'm tired of seeing it write out all the code twice, once in the thinking window, second time in the real window. 33:06.79 James Yeah. 33:09.59 Frank It it seems very wasteful. Instead, I'm going to do what I've been promoting this whole podcast is just make sure I do plan mode first and then rock it out. 33:20.60 James There go. All right. Well, give it a try, folks. Let us know what you think if you're rocking all sorts of different thinking levels ah with your models or what models you're also running locally or remotely. 33:30.70 James and Let us know what what you're thinking. 33:31.54 Frank Hmm. 33:32.62 James Go to youtube.com forward slash at MergeConflict.fm. You can also go to MergeConflict.fm for all of your podcast needs. And you can like subscribe. You can rate and review. You can send us an email. You can hit us up on Twitter. the All of the links are mergeconflict.fm. That's a little website. Boom. You just go there. and Things are there. right. It's going to do for this week's Merge Conflict. Have a great build, everyone. I'll be at build. Vibe coding. If you're doing this here we got a uh a live code booth will be on the github youtube um where we have the terminal live which is there's this whole terminal section um there like i guess in booth area or whatever you know the booth there is bunch terminal stuff and then we have a live code booth and we're going to be talking all the different amazing stuff from built so if you're interested in that it'll be live streaming tuesday and wednesday june 2nd and 3rd the 34:15.87 Frank Ooh. 34:23.39 James GitHub YouTube. So give that a look-sees and don't miss out. There's going be a bunch of goodies. So head over to build.microsoft.com. You can register it for build for free. It's online. 34:33.67 James You just watch it online. 34:33.76 Frank Ooh. 34:34.59 James just Boom, you're there. You can also... 34:35.64 Frank That's got to be the cheapest way. 34:37.34 James That's the cheapest way. It's $0. $0. No thinking required. All that's going do it for this week's Emerge Conflict. So until next time, I'm James Altamagno. 34:45.56 Frank And I'm Frank Kruger. Thanks for watching and listening. 34:48.66 James Peace.