00:00.47 James Welcome back, everyone, to Merge Conflict, your weekly developer podcast, talking about all things in the world of software development. And as we podcast, I'm actively writing code, Frank. Look, no hands. It's just going.
00:12.59 James It's happening.
00:12.73 Frank Are you writing code, or is your little robot writing code? James, I don't believe that you're writing code. When was the last time you wrote code?
00:22.94 James Well, you have been testing an app for me recently, a new Maui app that I've been releasing, my Pet Insulin Tracker 5000.
00:26.33 Frank Okay.
00:29.27 James And it's really good.
00:29.62 Frank Good icon.
00:30.78 James Good. Yeah. That's a Sonnet, or that's a Claude 4.6 Opus SVG.
00:37.77 Frank I thought it was a human being.
00:38.34 James Yeah.
00:39.27 Frank Okay. No? Okay. Yeah, it's basic, but all icons are basic now. So good for the AIs. Still gotta get around to writing my icon generator app.
00:47.41 James Yeah. Yeah.
00:50.82 Frank Don't steal my idea.
00:50.78 James You definitely should.
00:52.56 Frank Don't anyone take it. That one's mine.
00:54.58 James I'm pretty sure you could just do it right now while we're typing, like, you know, while we're podcasting.
00:55.33 Frank I'm going to do it.
00:58.87 Frank Stop it.
00:59.74 James You could open up your OpenClaw and, like, let it go to work, and then just do your thing.
01:00.41 Frank No. Excellent.
01:04.14 James All right. We got a pretty good follow-up episode. If people listened to episode 501, where we talked about whatever we talked about last week, this week we've got a special machine learning episode.
01:14.17 James However, Frank, I don't know if you know, .NET 11 Preview 1 came out today.
01:20.70 Frank I see, because, well, A, my god, I can't keep up. I'm still porting all my apps to .NET 10. B, I heard discriminated unions are definitely getting into C#, what, 15? I can't even keep track of the C# versions.
01:37.78 Frank And I heard that the Maui run experience is awesome. But that's everything I've heard about .NET 11. I haven't had a chance to install it because I'm still upgrading to 10.
01:47.51 James Yeah, it's all good. The only thing I want to point out specifically, as I was doing a bunch of work: my team traditionally has helped and managed a lot of the release notes and the blog stuff.
02:01.44 James We're pretty hands-off now, and everything's AI generated anyways. But I will say there are two big things for Maui developers, because we've still got some Maui developers listening, I'm sure, that have been with us through our Xamarin days up until now.
02:04.46 Frank Oh, stop it.
02:14.27 Frank I hope so.
02:15.26 James One, for .NET MAUI apps, the new XAML source generators are on by default. Thank goodness.
02:21.13 Frank Mm-hmm, mm-hmm.
02:21.50 James Good stuff. And then the new CoreCLR is on by default now. So goodbye, Mono runtime.
02:27.84 Frank Oh boy.
02:29.28 James Bye-bye.
02:29.75 Frank Never, never.
02:32.06 James And then additionally, dotnet run has some brand new beautiful CLI commands. It's interactive. It allows you to pick if you want to deploy to iOS, Android, Mac.
02:47.29 James When you do dotnet run, it'll select your simulators and emulators. It'll show you everything that's connected to your machine.
02:52.75 Frank Oh.
02:53.41 James What a beautiful world.
And I can't wait to use this. And that's going to be really nice, I think, actually, using it with agent mode and things like that. Otherwise it has to figure out these crazy strings and whatnot.
03:00.25 Frank Yes.
03:02.36 James So very, very nice. So definitely check that out. That's all we're going to talk about, because we have machines to learn, Frank Krueger, and they're learning and they're machining.
03:11.00 Frank Oh, yes.
03:15.29 Frank Well, I just thought it's been a while since we've done an actual machine learning episode. We keep talking about using AI, and AI this, and AI.com, and all the...
03:21.36 James Well,
03:26.35 Frank Did you watch the Super Bowl? It was like AI ad after AI ad. So it's all good. It's all important. We're going through the AI revolution.
03:32.66 James Super, Super Bowl.
03:34.94 Frank Super, Super Bowl.
03:34.99 James It's a...
03:36.82 Frank Super, Super Football.
03:36.91 James Bowl. Is it like skating? Because it's like they're in a bowl. Or maybe it's like skiing or snowboarding, because they're in a bowl.
03:42.17 Frank Mm-hmm, mm-hmm.
03:46.04 James Like, it's a Super Bowl.
03:46.97 Frank You gotta cover it in ice so it's super, because otherwise it would just be mediocre.
03:50.81 James Super.
03:51.96 Frank I mean, no one wants a mediocre bowl.
03:52.12 James Yeah.
03:55.22 James I heard it was a mediocre bowl, so...
03:55.61 Frank It actually was.
03:57.85 James For about almost all of it, except for the halftime show, which was spectacular, and then the very last two minutes or whatever, when the Seahawks dominated.
04:07.76 Frank Well, it's better than being stressed out the whole time. That's it. You know, it's like, do you want to be excited? That's one way to watch a game. Or you could just be relaxed and happy that Seattle's winning.
04:13.72 James Or asleep.
04:17.75 Frank Wow, we diverged already.
04:18.38 James Yeah.
04:19.07 Frank How many people did we just make mad?
04:22.42 James A few 12s, probably.
04:24.76 Frank Okay. Yeah, I want to talk about machine learning, because there's been a little bit of a revolution that I've been sleeping on. And I think it was new to me.
04:36.50 Frank And I'm excited by new things, as you can always tell on the show. So I want to talk about it. It gets back to our favorite topic. Speaking of icon generation: image generation. James, remember that topic?
04:48.33 Frank Before the LLMs, remember when we were excited about DALL·E?
04:48.57 James Yes.
04:51.69 Frank Remember how every artist is still mad at DALL·E? And remember how there's not one video on the internet now you can believe? You're like, well, that's fake. Everything's fake. That's fake.
05:02.01 James It's all fake. Yeah, it's all fake.
05:03.02 Frank It's all fake. I want to talk about fake videos. And I want to talk about why there's a bit of a revolution happening there. Because there was a technical revolution that happened.
05:13.21 Frank But in my way, we have to go back in time first. Are you ready?
05:19.48 James I would say yes, because I feel like you've been setting up some sort of weird Beatles analogy with revolutions and going back in time.
05:26.97 Frank Oh. Oh.
05:29.52 James You say you want a...
05:30.19 Frank That'd be too clever. Hmm. I don't actually know anything about the Beatles, so there goes all my cred.
05:37.69 James By the way, I'm just saying that we just got a takedown notice. So.
05:40.98 Frank Oh, good. Great. Yeah.
05:41.95 James Yeah, you're welcome.
05:42.69 Frank Demonetized. There goes all our budget.
05:43.83 James Well, don't worry, because we're not monetized on YouTube. So make sure you like and subscribe and go there. Now, I think what's really fascinating is, yes, I remember when DALL·E came out. It was kind of this revolutionary time, and then Stable Diffusion, all these other things.
05:57.27 James And there's a huge divide in the world of AI generation in general, but also AI art generation.
05:57.67 Frank Yeah.
06:05.53 James I have an artist friend, Ben, who was my officiant at our wedding.
06:10.70 James Our officiant at our wedding, more properly. But one of my best friends; we worked together at Crunchtime. And Ben was really cool, because he's building a lot of board games.
06:20.78 James He's doing a lot of stuff, and he was super deep into Stable Diffusion and the Discord and doing all this stuff. And as an artist, he was really able to make this work. I was looking at some of the decks and cards that he was putting out in the board games he was making.
06:34.46 James I was like, this art is so beautiful. He's like, all Stable Diffusion. All of it.
06:38.46 Frank Oh, good.
06:40.05 James He's like, once I mastered the art of making the art, I made the art.
06:46.46 Frank Yeah. The new tool.
06:49.54 James Yeah. He mastered the tool.
06:50.36 Frank The new paintbrush. Fantastic.
06:51.74 James And honestly, because he's such a good artist, my first assumption wasn't, oh, this is AI-generated art.
07:02.09 James And of course, it's kind of mystical and, you know, all this imaginary stuff.
07:02.26 Frank Sure.
07:07.08 Frank Mm-hmm.
07:07.88 James So it's not photorealistic, but I was like, wow, it's really good. Anyways, that was like two, three years ago. So like two years ago, maybe.
07:15.93 Frank Two years? Okay.
07:16.95 James Three years. Three years ago.
07:17.37 Frank Well, do you remember in the early days, might have even been pre-podcast, I did a presentation at our meetup in Seattle on these things called GANs, G-A-N.
07:18.07 James Yeah, something like that.
07:30.47 James Yep, I remember.
07:31.59 Frank You remember?
07:32.31 James Nobody will forget. Everyone will remember your GANs presentation, Frank. GANs all day.
07:36.25 Frank I feel like you're telling me something.
07:36.79 James GANs, GANs, GANs.
07:38.20 Frank GANs, GANs, GANs. Well, GANs were important.
07:39.13 James We did a whole podcast on GANs. GANs for days.
07:41.24 Frank Good.
07:42.36 James Good.
07:42.56 Frank I'm sure episode 50 out of 500.
07:45.31 James I'm going to look it up.
07:46.44 Frank You can go find it. So GANs were really important. They were one of our earliest generative models for creating artwork. And I should preface this whole thing: what I want to talk about, this whole machine learning topic, is a new technique for generating images and generating videos. But I feel like I have to give a little bit of background.
08:05.98 Frank And that background, though this isn't where the history begins, I just want to mention GANs, because they were the first time we actually started getting some interesting artwork out of these AIs.
And if you've never heard of a GAN before, it's a generative adversarial network. It's kind of a clever concept. You have one network generating artwork and then another network critiquing it, saying, is that good?
08:31.22 Frank Is that good? Is that good? I don't know. Is that good? And the whole point was, because you wanted it to be generative, you wanted it to be creative, and I'll put that in scare quotes because people hate using the word creative with these things, it wasn't like a traditional machine learning problem where you have input and output and you can compare against the output. Because it's creating new things, you don't have an output to compare against. So what you did instead was you trained two networks, one that made the art and one that critiqued the art:
09:02.58 Frank the generator and the adversary. And that's all it was. It was a very simple idea, a very clever idea, a very brilliant idea, honestly, because you can use it to generate lots of things. So is that a good summary? I just summarized. That should have been the whole 30-minute presentation right there, that one-minute summary.
09:19.61 Frank Makes sense, right?
09:19.98 James Yeah. And then, but I'm pretty sure the presentation went on for a few hours. But besides that, yeah.
09:25.96 Frank Ouch.
09:25.90 James But no, I think, in our year of February 2026, that's a great explanation of GANs. I like it. Yeah.
09:32.31 Frank Okay, okay. But that's not what we use today. There was a great revolution that happened around 2020. Do you remember what replaced GANs?
09:47.00 Frank It's part of stable diffusion. It's one of the words in stable diffusion. Diffusion model.
09:50.83 James Diffusion.
09:51.80 Frank Yes.
09:52.78 James Diffusion model. And they happen to be pretty stable.
09:58.46 Frank Some of them. Well, actually, they were. So the problem with GANs was they were hard to train. Really hard to train. In fact, I've trained a million of them, and maybe one of them worked out of that million sample.
10:14.29 Frank The trick is, it's a really unstable kind of training environment, because you have one thing learning to make images and you have another thing critiquing them, and they're fighting. And if you get that fight out of balance, it just falls apart. We call it mode collapse. It became a real struggle, honestly. If you knew what you were doing... no, that's not even the truth. If you got lucky, and you vaguely knew what you were doing,
10:41.69 Frank and you spent a lot of time, and you pulled out enough hair, and the moon was in the right phase, and your zodiac sign was up, then maybe you could get the generator and the adversary to work together.
10:55.29 Frank But it was a real pain in the bottom.
10:58.41 James Thank you.
11:00.25 Frank So this new thing, this diffusion, came out. And the beautiful thing about diffusion was it was stupid easy to train. Any idiot could make a diffusion network. They were so easy to train because we came up with a supervised learning technique. You weren't training two networks anymore.
11:21.24 Frank Do you know roughly how diffusion models work? It's kind of clever. Do you know any of the theory behind it?
11:27.83 James I think you explained it to us on a podcast, but it'd be great to have a refresher, Frank, just because, you know, I know, obviously, but there's probably some new listeners.
11:33.88 Frank Obviously you know, James. Yeah.
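To make Frank's one-minute GAN summary concrete, here's a minimal training-loop sketch in PyTorch. The tiny generator and discriminator are hypothetical toy stand-ins, not any architecture from his presentation; the point is just the two-network adversarial loop and why keeping that fight balanced is hard.

```python
# Minimal GAN sketch (PyTorch): one network makes images, another critiques them.
# Toy architectures, for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim, image_dim = 64, 28 * 28

# The generator (the "artist"): turns random noise into a fake image.
generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, image_dim), nn.Tanh())
# The discriminator (the "adversary"): scores how real an image looks.
discriminator = nn.Sequential(
    nn.Linear(image_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def train_step(real_images):
    batch = real_images.size(0)
    fake_images = generator(torch.randn(batch, latent_dim))

    # 1) Train the adversary: real images should score 1, fakes should score 0.
    d_loss = (F.binary_cross_entropy_with_logits(discriminator(real_images), torch.ones(batch, 1))
              + F.binary_cross_entropy_with_logits(discriminator(fake_images.detach()), torch.zeros(batch, 1)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # 2) Train the artist: try to fool the adversary into scoring fakes as real.
    g_loss = F.binary_cross_entropy_with_logits(discriminator(fake_images), torch.ones(batch, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    # If these two losses fall out of balance, training falls apart
    # (the "mode collapse" Frank mentions).
```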
11:35.67 James You know, I don't want to spoil it for everybody.
11:38.62 Frank So this is still the state of the art. Like, I'm pretty sure DALL·E 3 is still a diffusion model. I'm pretty sure Nano Banana, lots of the image generators out there, are diffusion models. So it behooves you to know how these things work.
11:54.26 Frank So you start with a very noisy image. Or, let's actually not even go there. Let's go midway through the inference step. So you have an image, and you add some noise to it.
12:06.71 Frank And you train a network to remove the noise.
12:09.60 James Hmm.
12:09.70 Frank Isn't that clever?
12:09.68 James Yeah.
12:10.87 Frank Very simple thing. Because if you want to train one of these, you can just take a really nice image, or let's say a really nice video, or a really nice audio sample, add some noise to it, and then train a network to remove that noise.
12:27.67 Frank Seems simple enough, right? How hard could that be? It's a nice supervised learning task. You know the input, you know the output. Oh, and this is why I say any idiot can train these. Obviously I'm being reductive, but it's so much easier to train these than GANs.
12:44.82 Frank And the real leap you have to make is: well, what if? What if, James, I start with pure noise? What if I just start with all noise? And then I ask it to remove the noise.
12:56.57 Frank Well, a little signal comes out. Maybe still noisy, because it's an incredibly hard problem. And then what if I do that again? Oh, okay. It'll remove a little bit more noise and pull out a little more signal. What if I do that again and again and again? And you can imagine going from this thing that's pure noise, pure randomness, until it generates an image.
13:20.12 Frank It's a very clever idea. And you do that, let's say, a thousand times. And you just incrementally, very slowly, keep pulling the noise out of a noisy image.
13:32.38 Frank And what are you left with? An image. It's clever, right? That's super clever.
13:37.41 James But there was no image to begin with.
13:39.74 Frank Right.
13:40.20 James It's just noise.
13:40.82 Frank Isn't that the mind trip? Yeah, you start with just noise.
13:42.78 James Wow.
13:43.82 Frank Yeah. So it's kind of neat, because that noise in the beginning, obviously, it influences what's going to be output. Then you have a second thing that influences your output. It's called the condition. And that's your little text prompt that you give it.
13:59.00 Frank So whenever you use one of these modern DALL·Es or whatevers to generate an image... not SVGs, that's a different technique.
14:06.73 James Yeah.
14:08.72 Frank We're talking about bitmaps, images, pixels. What it's doing is it's starting with pure noise, and it's taking your little prompt, and that's just a condition.
14:11.87 James Hmm.
14:19.33 Frank It's throwing those two things into a network, the noise and the prompt, and saying, remove the noise. It can't, obviously, because it's all noise. So it makes it a little bit more like an image.
14:29.98 James Okay.
14:30.14 Frank What image? Who knows? It's a big network. It's going to make some decisions. It's been trained to remove noise, so it's going to remove noise. And then you do that a thousand times.
14:40.58 Frank And you just very slowly pull out a signal from that noise until you're left with a beautiful image, like an app icon.
14:49.56 James Now, how does it know?
I know it doesn't know, but it's been trained on image sets and what they are.
14:55.35 Frank Yeah.
14:59.74 James So is it a bunch of zeros and ones that it's trying to distill down to, for all intents and purposes?
15:00.51 Frank Yeah.
15:07.09 James Like, how does it come up with, oh, here's a scene of a dog running through a forest, right?
15:12.39 Frank Uh-huh.
15:12.81 James Let's say that, if that is just noise.
15:15.99 Frank Right. Well, it's hard to say in that very beginning, because quite frankly, it's kind of shooting off in a random direction in that beginning part. But that's kind of okay, because we want the creativity.
15:24.26 James Yeah.
15:27.45 Frank We want a bit of randomness in this thing. So let's not think about the beginning. Let's pretend it figured out the beginning, and it's halfway through. So what do you have? You have a picture of a dog in a field that's very noisy.
15:41.06 Frank Like, imagine taking it at the wrong exposure level on a camera.
15:44.41 James Yeah.
15:44.58 Frank So you're getting a lot of noise. Or it's a nighttime image. It's a very noisy image. You would agree there that it can say, oh, there's kind of a dog, kind of a field. I'm going to find the noise, and I'm going to remove it.
15:55.09 Frank You're kind of okay with that, right?
15:56.82 James Yeah, yeah, yeah.
15:57.34 Frank Maybe.
15:57.34 James Yeah. Yeah, I'm okay with it.
15:58.17 Frank Yeah. So just keep thinking: it gets noisier, it gets noisier. So now it's having to do a lot more guessing at what the actual image is there.
16:09.98 Frank But because it's working in such small increments, because it's removing just a small bit of noise each time, it can... maybe there's a little dark spot in the noise, and it'll turn that into one eyeball of the puppy dog.
16:25.60 Frank Or maybe there happens to be a tiny bit of a green section in the noise. It'll turn that into a tree. That's a bit of an oversimplification, but you can kind of imagine that as a model. It's easy to imagine the halfway point, half noise, half signal.
16:37.93 James Yeah.
16:38.53 Frank It can do that. So you just got to have a little bit of faith that in the early moments, when it's mostly noise, it's just kind of picking a random direction of what kind of image to generate.
16:51.96 Frank You can kind of see that working, right?
16:51.90 James Yeah, I think so. I mean, it's kind of like when you used to do Polaroids, right? You'd kind of flap them, and it'd kind of develop.
17:03.02 Frank Yeah. Don't shake them. You're not supposed to shake them.
17:05.33 James Everyone does.
17:06.93 James Or if you went into a darkroom, right? It went from nothing, and it kind of distilled down.
17:10.74 Frank Yeah.
17:12.61 James It kind of reminds me of that mechanism. I still don't necessarily understand how it knows that it drew the thing. I mean, I know it doesn't know, right? It doesn't know anything, but it knows everything.
17:20.92 Frank Right.
17:23.06 James But it doesn't know that it drew a dog running through a field, because it didn't actually draw anything, right?
17:26.66 Frank Well.
17:27.74 James It just removed a bunch of noise to distill down to
17:30.33 Frank Mm-hmm. Right.
17:33.78 James that dog running through a field, but like, just because.
17:36.41 Frank Yeah.
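As a rough sketch of what Frank just walked through, here's the core of a diffusion model in PyTorch: add noise to a clean image and train a network to find it, then sample by starting from pure noise and removing a little noise at a time. The `denoiser` network, its call signature, and the noise schedule are hypothetical toy stand-ins; real systems like Stable Diffusion layer a lot more on top (latent spaces, U-Nets, text encoders).

```python
# Diffusion sketch (PyTorch): supervised denoising, then iterative sampling.
# `denoiser(noisy, t, condition)` is an assumed network signature, not a real API.
import torch
import torch.nn.functional as F

T = 1000                                      # number of noise levels
betas = torch.linspace(1e-4, 0.02, T)         # toy noise schedule (assumption)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def training_step(denoiser, clean_images, prompt_condition):
    """Supervised: we know the input (noisy image) and the output (the noise)."""
    batch = clean_images.size(0)
    t = torch.randint(0, T, (batch,))                   # random noise level per image
    noise = torch.randn_like(clean_images)
    a = alphas_bar[t].view(batch, 1, 1, 1)              # images are (B, C, H, W)
    noisy = a.sqrt() * clean_images + (1 - a).sqrt() * noise   # closed-form noising
    predicted_noise = denoiser(noisy, t, prompt_condition)     # prompt = the condition
    return F.mse_loss(predicted_noise, noise)           # did it find the noise?

@torch.no_grad()
def sample(denoiser, shape, prompt_condition):
    """Start from pure noise and remove a little noise, T times."""
    x = torch.randn(shape)
    for t in reversed(range(T)):
        eps = denoiser(x, torch.full((shape[0],), t), prompt_condition)
        alpha, a_bar = 1.0 - betas[t], alphas_bar[t]
        x = (x - betas[t] / (1 - a_bar).sqrt() * eps) / alpha.sqrt()  # DDPM mean step
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)  # re-inject a bit of noise
    return x
```

The step-skipping trick Frank gets to shortly (train on a thousand steps, sample in around a hundred) is just running this same loop with coarser strides.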
So the dog running through the field: imagine it was trained purely on dogs running through fields. Absolutely nothing else. In the end, it's going to generate a random dog running through a random field. The harder problem is when you have something like DALL·E, where you want the users to be able to control what it outputs. And that's where you mix in the: I want a dog running through a field, a brown dog, and it should be a puppy dog, and it should have something in its mouth, a tennis ball in its mouth.
18:04.64 James Yeah.
18:05.22 Frank It does learn to get a tennis ball in its mouth, because it's been trained on a million images of dogs with tennis balls in their mouths, or tennis balls and dogs, and it learns how to compose things also. These are giant networks. They have so many parameters that they can actually figure this kind of stuff out.
18:25.94 Frank So while its main task is pulling a signal out of noise, even though it could be pure noise at some point, it also has this guidance, this text prompt, telling it kind of what direction to go. Dog, field, tennis ball.
18:42.55 Frank Okay.
18:43.60 James Yeah, got it. So this is how modern things are working today, correct?
18:45.09 Frank So this is modern. Well, let's call this all the way up to 2022, 2023. And then a new revolution happened,
18:55.47 James No.
18:55.61 Frank because there's a problem here. Diffusion is great because it's stupid easy to train and it's stupid easy to execute. What's the big problem?
19:07.01 Frank It's slow.
19:08.25 James Super slow.
19:08.63 Frank It's so slow.
19:08.89 James Takes so long. Takes so long.
19:10.71 Frank Oh my god, so long. So, I mean, you train these things to take a thousand steps. So you have to run the neural network a thousand times to go from noise to dog running in park with tennis ball in its mouth.
19:24.73 Frank And that is just inefficient. That will take forever.
19:28.41 James Yeah. Even when I'm inside the Microsoft Copilot app, funnily enough, when you generate an image, it kind of distills down an image. It's all blurry.
19:38.94 Frank Yeah.
19:39.14 James And it kind of goes from top to bottom.
19:39.81 Frank Yep.
19:42.46 James And it kind of reveals itself in chunks. And you're always like, I think I know what it is. What's it doing? And then if you're like, that's off track, you can cancel it.
19:48.44 Frank Yeah.
19:51.54 James But you never know until the very final second, like, here's your image.
19:54.87 Frank Yeah.
19:55.00 James But it does take a while to generate.
19:57.88 Frank It does. And if you have any interest in this at all, anyone who's listening, I highly recommend you download Stable Diffusion and learn how to get it running on your own machine, because it's really enlightening to see these things work.
20:07.99 James Yeah.
20:10.08 Frank And especially if you catch them in that midpoint where it's still noisy, and you're like, oh, what is it doing? And see how it progresses from there. I should say you train it on a thousand steps, but in practice, that's even too many. Like, you know, Microsoft has a billion dollars, but they ain't going to spend a billion dollars on you. So they figure out clever ways to get it down to maybe a hundred steps. So it's trained to do a thousand steps, but they figure out they can take bigger steps.
They can ask it to remove a little more noise on each step. So you train it for a thousand, and you run it for maybe a hundred steps.
20:47.78 Frank So that's state of the art 2020 to, let's call it, 2023.
20:48.07 James Okay.
20:51.00 Frank Yeah. Yeah.
20:52.89 James But you just said that's how modern ones, like Nano Banana and all these other ones, and even the GPT-5 models, are doing it.
21:00.79 Frank Yeah.
21:01.41 James So if that was modern, why haven't these other newer models and newer processes evolved, I guess?
21:01.94 Frank Yeah.
21:09.12 Frank Because it takes time for big companies to catch up, to change their approach.
21:11.51 James Oh, I see. I see. I see.
21:13.74 Frank Yeah. And because they have so much money and so much investment that they can get away with a little bit of inefficiency.
21:14.30 James I see. Okay.
21:19.66 Frank Because what I'm going to describe next is a new technique that everyone's using. But let me preface one more time. Imagine that for video. Imagine you want to generate a video. Now these networks are even bigger, and you're having to run them for 100 to 1,000 steps.
21:34.11 Frank Even 100 steps is a lot for a video. Like, I forget what Sora does, but I think the Facebook one is like 16 seconds at 16 frames per second.
21:45.62 Frank That's like 256 little images, and you have to do 100 to 1,000 steps on 256 images.
21:45.75 James Yeah. A lot.
21:53.43 Frank It's a lot. It all adds up. Eventually those are dollar bills Zuckerberg has to burn for you. So we need something more efficient. And this is where I finally get to the technique I want to talk about. And it's the perfect intersection of generative stuff and performance and speed. Because these are the things I love. What if, James, I said to you: not a thousand steps, not a hundred steps.
22:20.63 Frank How about ten steps? With the potential to get down to one step.
22:22.46 James I like that.
22:25.67 Frank How's that sound?
22:25.98 James Yeah, I'm intrigued. I'm intrigued. You've sparked my interest.
22:27.86 Frank Okay. Okay.
22:29.47 James I might VC fund this. We'll see.
22:32.50 Frank Okay, okay. So I should say: Stable Diffusion 1, diffusion. Stable Diffusion 2, diffusion. Stable Diffusion 3, rectified flow.
22:43.00 James Oh, well.
22:43.26 Frank Rectified flow. New technique, baby. New kid on the block. It goes by two names. Flow matching is the general-purpose kind of technique. So you'll often see this thing called flow matching.
22:57.06 James Mm-hmm.
22:57.14 Frank But what everyone's actually doing is a thing called rectified flow. Ignore the names. I'll get into what they actually do and how they simplify the problem. But those are the big names you'll see out there. We know for a fact Meta's Movie Gen is using this.
23:13.42 Frank Stable Diffusion 3 is using this. Flux is using this. I think Midjourney is training one. So it's basically this technique. The research papers came out late 2022.
23:25.11 Frank So in 2023, people started to figure it out, started to write tests for it, started to get a feel for how these new ones would work. So basically anyone who's trying to save a few dollars will be implementing these, and these are basically used for videos also, because videos are so much bigger than images, obviously.
23:31.78 James Hmm.
23:45.69 Frank So how do we do this?
Well, instead of 1,000 points along that line of detect the noise, remove the noise, take a little step; detect the noise, remove the noise, take a little step — that's the diffusion model — what if we just say: give me a noisy image, and I'm just going to remove all the noise all at once.
24:09.15 Frank I'm just going to say, that's the noise. One-shot this baby.
24:10.46 James I'm going to one-shot it.
24:14.94 Frank And what we're going to do is we're going to draw a straight line from noise to image. So we know directly: this pixel needs to change this much.
24:27.86 Frank That pixel needs to change that much. That pixel needs to change that much. Do that for all the pixels.
24:32.12 James Okay. Yeah.
24:34.17 Frank And the how much is called the velocity. Doesn't really matter. But instead of detecting the noise in the image, all we're going to do is detect how much every pixel in the pure-noise image needs to change in order to produce a good image.
24:52.57 Frank And that's where the term rectified comes from.
24:52.95 James Hmm.
24:55.85 Frank It's a good word that really just means linear path, a straight line. A straight line from noise to perfect image. And we're going to train it in a similar way to diffusion.
25:08.10 Frank We're going to have these partially blended images, a little bit of noise, a little bit of image. And at any point along that path, it knows: straight line, straight line from noise to image.
25:21.06 Frank Not detect the noise, subtract the noise. No: I see noise, and I'm going to tell you exactly how much to remove to get to the final image.
25:28.97 James Right.
25:31.61 Frank And it sounds dumb, and it sounds obvious, and it sounds like, well, why didn't we just do that from the beginning? And that's because we had absolutely no math saying that that was at all possible. We had a whole bunch of math,
25:49.56 Frank a lot of theory around why diffusion can work — stochastic differential equations, whatever. We knew we could write noise detectors. We knew we could subtract noise. We knew that we could increase the signal-to-noise ratio.
26:03.99 Frank The big step, the big epiphany with diffusion was: what if we just do it step by step? So you could think of diffusion as: start with a noisy image and move, keep searching around, keep wandering around this very large, high-dimensional space, and find an image out there. So we're moving from the crazy random world to a nice image world. And it's a crazy path. We've got to turn left on Main Street, right on Elm Street, left onto First Ave, all these kinds of things.
26:36.73 Frank But what if we just cut straight across all that? What if we just go exactly as the crow flies?
26:39.32 James A to B. As the bird flies.
26:44.54 Frank And that's the rectified flow.
26:44.57 James There you go.
26:46.26 Frank So instead of turning on all these streets, we just say: noise to image. And you can imagine, training was simple with diffusion, but now it's stupid simple, because all you need to predict now is: take the destination image, take the noise image, subtract them, because that's the path we need to go on, and teach a neural network to come up with that number.
27:06.54 James Yeah.
27:12.06 Frank It is the easiest. If I said idiots could train a diffusion model, monkeys could train a rectified flow model. It is so elegant in its training. And then I'll let you respond to it. And then at inference time, you can see why you could almost one-shot it.
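Here's how simple the training Frank is praising actually is, as a minimal PyTorch sketch under the same assumptions as the diffusion example above (`velocity_net` is a hypothetical stand-in network). The regression target is literally `image - noise`, the straight-line direction, and sampling is just Euler steps along the predicted velocity.

```python
# Rectified flow sketch (PyTorch): learn the straight-line velocity from
# noise to image, then integrate it in a handful of steps.
import torch
import torch.nn.functional as F

def training_step(velocity_net, clean_images, prompt_condition):
    batch = clean_images.size(0)
    noise = torch.randn_like(clean_images)
    t = torch.rand(batch).view(batch, 1, 1, 1)        # random point along the line
    x_t = (1 - t) * noise + t * clean_images          # partially blended image
    target_velocity = clean_images - noise            # destination minus noise: the path
    predicted = velocity_net(x_t, t.flatten(), prompt_condition)
    return F.mse_loss(predicted, target_velocity)     # that's the whole loss

@torch.no_grad()
def sample(velocity_net, shape, prompt_condition, steps=10):
    """Euler integration from t=0 (pure noise) to t=1 (image)."""
    x = torch.randn(shape)
    dt = 1.0 / steps                                  # ~10 steps in practice, 1 in the limit
    for i in range(steps):
        t = torch.full((shape[0],), i * dt)
        x = x + dt * velocity_net(x, t, prompt_condition)  # step straight toward the image
    return x
```

Compare with the diffusion sketch earlier: no noise schedule, no closed-form noising derivation, just linear interpolation and a regression loss, which is exactly the "monkeys could train it" simplicity Frank describes.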
27:30.90 Frank In practice, you take 10 steps, and I can get into why it kind of doesn't matter. But you take maybe 10 steps, because although we're trying to fly straight, we're not flying straight. There's a high wind. There's problems.
27:43.35 Frank But roughly speaking, you can just straight-shot it. Sounds good, right?
27:48.41 James Yeah, I mean, in some aspects, right, every image, every video, everything is just pixels on the screen. And each pixel has an RGBA, you know, or an HSL and all this other stuff,
27:59.96 Frank Yep.
28:01.04 James but an RGBA value to it.
28:02.80 Frank Yep. Yep.
28:03.60 James And if you're at one color, one RGB value — let's use RGB for sanity —
28:10.35 Frank Yep. All these networks use RGB, so it's perfectly valid.
28:13.82 James if you're at one RGB value and need to get to another, there's simple mathematics to add or subtract those RGB values in general, right?
28:18.81 Frank Add.
28:24.66 James You don't need to slowly, gradually get down to it, like, I'm at brown and I need to get to green.
28:25.40 Frank Yeah.
28:31.48 James There's a very clear deciding factor of that RGB color and what you need to change it to.
28:32.06 Frank Mm-hmm.
28:36.72 Frank Yeah.
28:38.07 James Now, I would assume the computational part, though, is that if you're going to one-shot it, you have to calculate all of those transformations up front, to figure out what all of those changes are, which seems a little complicated compared to gradually refining over time.
28:53.78 Frank Yes.
28:58.34 James It's kind of like you're polishing.
28:59.75 Frank Yeah.
29:00.10 James I kind of think of the distillation as, I'm polishing this, you know? I'm polishing a rock. I got it in a tumbler, and I'm just going to tumble it.
29:07.10 Frank Yeah. Yeah.
29:11.93 James I've been tumbling for days, guys. Tumbling. And then at the end, you have this beautiful... like, how is that rock shiny? It's a rock, and you're like, I just tumbled. I just tumbled so much. So much tumbling.
29:23.13 Frank Yeah.
29:24.86 James But now you're saying, I literally take my rock, snap my fingers, and now it's super smooth.
29:31.00 Frank Yeah, this is probably a bad analogy, but it's more like you started with your rough rock and you asked Joe to polish it, and then Sally to polish it, and then Adam to polish it. And each person polishes a little different part. And you hope at the end, maybe it comes out as a nice Michelangelo, or a David, or something like that.
29:51.19 Frank Whereas the rectified flow is more like, hey, just make the statue of David. You know what? Let's just go for it.
29:58.17 James Yeah.
30:00.20 Frank Like, what if everyone's handed a little bit of a plan? And it's still interesting, because we're still starting with noise. It's still starting with pure noise.
30:09.80 James Hmm.
30:11.16 Frank And we don't know, as the people developing these models, which noise is going to create which image. So we don't know that we want a statue of David in the end.
30:22.71 Frank Obviously, a few steps through, we kind of see... or, I'm sorry, it's looking like a dog in the park with a tennis ball in its mouth. That's why, with this technique, in practice, you don't one-shot it.
30:38.30 Frank You actually are taking a sloping path. But it's a very big difference: diffusion is solving a maze — left, right, up, down. It's going in every direction to figure it out. Whereas rectified flow is mostly staying on a straight line, but veering, because maybe we started on the wrong thing. Like, I want a dog in the park, but I started with the wrong noise.
31:02.13 Frank It's the wrong noise, man. I started at the wrong location.
31:03.89 James Yeah.
31:04.47 Frank Like, you blindfolded me. And then as we're getting closer to the dog in the park — we're really mixing analogies — then you can start finding your way to that dog in the park.
31:15.93 Frank Whereas diffusion is just more like, I don't know, you're still walking around with the blindfold the entire time. But the beauty of it is the performance thing. So I'll take myself as an example: I have been trying to train diffusion models for years now, because they are easier to train than GANs, although I have a lot of experience with GANs.
31:39.13 Frank But there's a lot of tricky math with it. And there's a lot of — we call them noise schedules — because technically you kind of have to work in reverse. You have to start with an image and add a bunch of noise to train a network to detect the noise, subtract the noise. And coming up with a closed-form solution for how to add that noise without iteratively adding the noise, that's difficult.
32:01.31 James Yeah.
32:01.40 Frank You can't solve it end to end.
32:01.47 James I like math.
32:03.04 Frank You can't just have it remove the noise over a thousand steps and train from noise to image, because then you have the gradients disappear. It's just too big to train. The neural networks can't do it. So you have to come up with very clever math. And James, I love math. I love it to death.
32:20.18 James I like math.
32:20.34 Frank But I have the hardest time implementing diffusion math, because it's tricky probabilistic math. And it's nasty. It's gross stuff. I hate that kind of math. Whereas rectified flow:
32:33.62 Frank you subtract the final image from the noise, and you tell it that. Go on that path. Don't bother with any other path. Don't bother with any complicated math. Here's the path.
32:44.76 Frank Go on the path, Mr. Network. And it's so simple, and it's so efficient, that I'm back to being able to train generative models on my RTX 3090. It's back. The power is back to the people. So whereas OpenAI has more money than God, and they can spend all that money to make your Sora video, I don't.
33:08.76 Frank Compute still costs me, and I have limited compute, limited memory, limited math skill. But even now, I can get back into the generative game, and that's what makes me excited. I can start generating video again. I can start generating images. I can start generating audio.
33:24.78 Frank And I think we're going to have another explosion in this creative space, because it was just the big players that could afford to run these very expensive, very annoying diffusion models. And now, with rectified flow, there are going to be more fake videos on the internet, and there are going to be more modalities, more senses being used. I can't wait for the smell generator.
33:48.04 Frank It's going to be amazing. Yeah.
33:49.88 James Well, okay, so the performance is better, but what about the actual quality of the image, right?
Because, you know, as image generation and diffusion went on, we're at this place now where there's a wide variety of models, right?
34:04.56 James If you use, like, Image Playground, you're like, well, clearly that's terrible, terrible.
34:05.18 Frank Huh.
34:08.72 James And then you use some mid-tier models, and you're like, that's pretty good.
34:09.46 Frank Yeah.
34:11.64 James Like, with the right prompts, they can do really well.
34:12.34 Frank Yeah.
34:13.96 James And there's some other models — like, everyone's all gung-ho on the Nano Banana. Like, wow, that's really good, right? So what is the spectrum here that we're looking at for it?
34:24.76 Frank It's better. What if I told you it's faster and it's better?
34:26.14 James It's better. No, I don't believe you.
34:28.89 Frank Does that not sound like...
34:30.34 James No, no, I don't believe you.
34:31.86 Frank All for $19.95. Six easy payments of $19.95. Yeah.
34:33.53 James Oh my gosh.
34:33.91 Frank A sponge.
34:35.83 James Act now and get free shipping, and we'll put in a sponge and a katana.
34:39.13 Frank A sponge. A ShamWow.
34:42.52 James Wow.
34:44.06 Frank Yeah, it's one of those, like, how did we not think of this before? Well, we needed the proof in the math. That's what the flow matching thing happened to be. It's actually kind of funny. Two independent groups kind of discovered the same technique around the same time. It was late 2022. All these papers were published around then. And there were two independent groups that invented it. And they both came to the same conclusion.
35:09.05 Frank This is both more efficient and produces better results. So what does better mean? More diversity in the images, higher quality. And because it's more efficient, you can generate higher resolution. You can burn those GPU cycles on other things. HD, 4K. You can really start messing with this thing, because...
35:35.44 Frank let's say I was devoting one second of compute to generating an image for you. Well, now that the process is more efficient, I can spend that on more pixels, or on doing video instead of images. So the creative quality has increased. And I want to say there have been other advancements.
35:55.63 Frank Have you noticed how text all of a sudden started working in these kinds of networks?
35:59.65 James Yeah. Yeah.
36:00.49 Frank Yeah.
36:00.69 James Yeah.
36:00.93 Frank So there have been other advancements along the way that have enabled that. But with this core technique, you can add on those other modern advancements too. And so it's better, it's faster, and it's better. And that's why Stable Diffusion 3 has adopted it.
36:18.03 Frank That's why Flux has adopted it. That's why Meta has adopted it. The only people who haven't adopted it are the people still using networks they trained back in 2023, 2024, when this technique hadn't been established yet.
36:30.49 Frank So in 2026, this year, pretty much all the new generative models will be using a technique like this, because it just makes perfect sense from a business point of view and from a creativity point of view.
36:45.59 James Yeah, I mean, if you go and just search on the internet, Stable Diffusion 2 versus 3, that's pretty good.
36:56.18 James Yeah, I mean, it's pretty much better in almost all instances compared to the older ones.
37:02.71 Frank Yeah.
37:03.67 James I mean, the other ones weren't bad necessarily, but they're better.
37:03.86 Frank And that's kind of... no.
37:08.47 Frank Yeah. Well, you see it. We were talking about that greeting card maker I was making using Apple's built-in Image Playground models.
37:18.56 Frank And those models are still so old they can't do text. You know, very early on, we all figured out you have to be able to do text.
37:23.14 James Yeah.
37:26.20 Frank And if you understood what I was talking about, how we're going from these noisy fields to images, I think you can kind of see why text was so terrible. Because at no point along that path did it have any concept of, this needs to be clear, let's say, English text in this font, or anything like that. That input is not there. So we've had to come up with other solutions for that. But that's because all the difficulty was in getting these diffusion models working.
37:55.54 Frank Well, now that's passé. So I'm hoping, like, Image Playground, the next version, who knows, WWDC 2026, hopefully we'll have models baked into these machines that can actually do text correctly, because
38:09.82 Frank they'll have spare CPU, spare GPU, spare neural engine, because hopefully they're using this more advanced technique, rectified flow versus diffusion. And then they'll have that spare capacity to use for good text rendering and other stuff, and longer prompts, and that kind of thing.
38:26.98 James That'd be nice. Yeah, I could see that going a long way, because the hope and dream would be, oh, wow, I'm on my machine, and I could just use Image Playground or, you know, ChatGPT or Copilot or whatever.
38:35.64 Frank Mm-hmm.
38:41.65 James They could all have local models that are doing some stuff and just do the thing, right? And then it could just happen. And it could use the power of the neural engine and not go waste and burn a bunch of other GPUs, you know what I mean? That are sitting there doing stuff.
38:53.94 Frank Yeah.
38:54.31 James And that could be pretty cool.
38:58.87 Frank Yeah. And if they're budgeting — I don't know how they budget this stuff at these big corporations, but let's say they're budgeting five seconds of GPU time per user per request.
39:09.79 Frank Well, if they have something that's 10 times more efficient now, then maybe they can generate eight images in parallel, 10 images in parallel. Because, you know, you type in a nice ChatGPT prompt, and you sit there, and you wait your five seconds for that image to appear. It'd be nice if a grid of four images or 16 images appeared. And then you can pick your favorite from those instead of going one at a time and asking for another variation.
39:37.72 Frank So, you know, I feel like all goldfish grow to the size of the bowl, to the size of the pond. Like, I feel like we're always going to be waiting about five seconds for an image generation. But now maybe it can be higher res, or more variations at the same time.
39:49.23 James Yeah.
39:52.14 Frank Yeah.
39:52.21 James Yeah. And especially if you get a better result the first time. Like, the goal there, hopefully, is that you don't have to regenerate the image. Because what we end up finding out, like when Heather and I are generating images, even for the podcast, it's like, it's close.
40:06.29 James But then I want you to modify this one thing. I got to wait.
40:08.98 Frank Yeah.
40:09.53 James Oh, you're so close. Like, for me with the podcast, even for episode 500, I really wanted the 500 to be in the center, but it just wouldn't be. But it was so close. So then I redo it, a little bit closer,
40:20.72 James and then I threw it all away. I was like, let's try it again.
40:21.14 Frank Yeah.
40:21.85 James So often for me, the waiting isn't the hard part. It's when you got to wait, like, 10 or 12 times to get that perfect image you're looking for.
40:27.99 Frank Yeah.
40:29.85 James And I've noticed that the model has gotten better, especially with the prompts, at getting a lot closer that first time.
40:34.49 Frank Mm-hmm.
40:34.81 James But yeah, if you were to get a bunch of them going...
40:36.88 Frank Yeah.
40:39.14 James And I've seen Copilot spit out a few images at a time now too; I wonder if they're doing something. But yeah, it'd be really interesting to see.
40:43.70 Frank Yeah, so we're right at the cusp of this new technique being adopted. You know, Google has a lot invested into Nano Banana.
40:54.28 Frank I'm sure they're not in the mood to train a whole network from scratch again. So, you know, it might take a little bit of time before these things actually make it out there.
40:58.65 James Yeah. Yeah.
41:02.85 Frank But it's really neat from a machine learning perspective. For people who like training networks, like myself, this is a great new technique, because it kind of gets the small players back in the game. So the open source people are back.
41:13.89 Frank And it's always fun to see the Stable Diffusion people, because most of those models you can run on your own machines also. So I highly recommend everyone go out and download those and play around.
41:24.98 James Well, what I think about as well in this instance, to kind of wrap up this podcast, is what kind of scenarios will we have in the future, a year from now, when we can do a lot on our local machine? Because about a year ago — so eight, ten months ago — I was building a bunch of little games with Copilot, GitHub Copilot.
41:35.94 Frank Mm-hmm.
41:45.82 James And there'd be a lot of three.js games, or little snake games, or web games, or even some Godot games.
41:53.91 Frank Yeah.
41:56.21 James But the problem is art assets, right? Even to prototype.
41:58.20 Frank Oh, yeah.
42:00.21 James So for a lot of it, I would be like, just draw stuff, right? Just draw geometry, which is the greatest way to prototype games. But at some point I'm like, I would really like a little person, and a little thing, and a little tree, and a little whatever.
42:12.13 James Right.
42:12.73 Frank Yeah.
42:12.73 James But I could never get it, because I'm not an artist. But if I was able to say, here's the game, go generate the art, as I'm doing it on my local machine — they might not be the final ones. Obviously, I'd probably actually have an artist work on important assets for the game, but it could get me a lot farther. And for some games, if they were card games or things like that, you could
42:15.99 Frank Yeah.
42:33.40 James generate all of them. Like, why not? And you could do it today, but it'd be very tedious, right? Whereas with this, you could spin up tons and generate the whole set of art assets for solitaire, right? And then it's done, right? You verify it and do a thing, and a bunch of themed card decks and whatnot. But I would never be able to do that before.
42:35.49 Frank Mm-hmm.
42:38.84 Frank Yeah.
42:43.71 Frank Mm-hmm. Mm-hmm.
42:50.62 James And when I say I would never be able to do that before, it's like, yes, I would be able to do it, but I don't have the time to do it, or the want to spend the money on it. Because it would just be, you know, not something that would bring joy to my life.
42:59.19 Frank Mm-hmm.
43:04.09 James But if I could prototype and build out really cool games that were a lot more visually pleasing, that could be really fun. Or even just art assets for my apps, right?
43:15.50 James Like, a great example: yes, there are tons of great font icons that are meticulously curated.
43:16.13 Frank Oh yeah. Yeah.
43:21.85 James But dang, it sure would be nice to have some really cool backgrounds on some of my things, that were generated and put in, or just generated on the person's device on the fly. You know what I mean?
43:33.46 James So that could be cool. That'd be really neat.
43:37.34 Frank Yeah, we'll get going, but I have to say, that is 100% why I got into generative artwork: I wanted to make video games. And although I like drawing, I know I'm bad at it, and I don't practice at it, and it's not the most fun thing for me.
43:50.10 James Yeah.
43:52.98 Frank So I wanted to learn how to make computers make graphics. And I mean, since I was in my teens, I've been working on generative art.
44:03.64 Frank And I think that's why... you could say this whole rectified flow thing is just an efficiency gain, and it is just an efficiency gain, but it enables me to start creating art again on my computer, and training on my computer, because it's so easy and it's so efficient that it just feels really liberating.
44:20.08 James Yeah.
44:21.66 Frank And I feel like I'm back to being 18 again and trying to generate... well, Grand Theft Auto wasn't out back then, but we had an idea of something like that, like generate a city and do that kind of stuff.
44:33.27 Frank So it's fun to get back to these roots.
44:33.72 James Yeah.
44:35.90 James Totally. I think it'd be so fun. Yeah. Possibilities.
44:38.23 Frank Yeah.
44:39.26 James Every time there's a new evolution, right, it gives you possibilities of doing some really cool stuff. I love it.
44:43.64 Frank Yeah. All right.
44:45.56 James All right.
44:45.97 Frank Thanks for letting me nerd out on ML.
44:47.74 James Anything else?
44:49.13 Frank No, no.
44:49.34 James No.
44:50.05 Frank And no agents. They're not called agents. That's what's also nice about them. They haven't been called agents yet. But it's fun to actually nerd out on the ML part of AI. Yeah.
45:02.52 James I like it. I mean, it's important to understand how the stuff works under the hood, I always say. So go do your machine learning and your...
45:10.44 Frank Oh, wait, I should add one more thing. You know, I did promise we'd talk about LLMs, so I'll just end on this.
45:12.34 James Oh.
45:16.38 Frank Apple has a new large language model, and they're using flow matching to generate the text.
45:20.15 James Oh.
45:23.99 Frank So most large language models generate the next token, next token, next token. You run it for each token, run it for each token. What if you just generated all the tokens all at once?
45:34.33 Frank Oh, that sounds wonderful. But, oh, we can't use diffusion, because that requires a thousand steps. That would take forever. So to generate a thousand tokens, you'd still have to do a thousand steps. Ah, but here comes rectified flow.
45:45.69 Frank In 10 steps, I can generate a thousand tokens. And Apple actually has neural networks on GitHub that do exactly this. So Apple is looking at using this flow matching, rectified flow stuff specifically for large language models, as a more efficient way than the autoregressive models we're using today. So there it is. I brought it back to LLMs.
46:10.39 Frank Not an episode can go by where we don't talk about them.
46:13.34 James I mean, faster code generation in our future. Oh,
46:16.54 Frank Pretty much. It is a coding model they released.
46:20.70 James nice. Yeah, I think that would be interesting as well.
46:21.82 Frank Yeah.
46:23.88 James Like, when all these models evolve even faster — right now, we're all just worried about context windows, and speed, and this and that, and all these things, and blah, blah, blah. There's fast models, there's this model, and everything.
46:30.74 Frank Yeah.
46:35.04 Frank Mm-hmm.
46:35.12 James I don't know, just write some code. Let's get it done. All right, cool. Well, let us know if you're generating sweet images, or if there's some piece of tech that you're really into. You know, I use a bunch of things. Always interested in how people are generating stuff and making cool art. Maybe build some cool games with this stuff. Let us know. Go to mergeconflict.fm. You can also just go to our YouTube channel, youtube.com/@MergeConflictFM. Let's see what we got.
47:00.41 James We got some people there. We got 863 subscribers. Well, that's pretty good.
47:05.43 Frank Thank you, everyone.
47:05.69 James Our watch hours are up. We're trying to hit monetization on that pup. You know, even though we have 500-and-some-odd podcasts, we came to the YouTube pretty recently. We are at 66% of our quota for public watch hours for monetization, because they changed all their policy.
47:22.07 Frank Okay.
47:24.28 James So we have to hit 3,000 watch hours.
47:28.50 Frank Okay.
47:29.48 James That's quite a lot. But,
47:30.74 Frank Everyone, pick your favorite video and watch it on loop.
47:33.98 James yes, we're up, you know, we continue to grow, I would say. And I really appreciate everyone watching and listening to all the pods that have been coming. If I look at the 365 chart,
47:49.24 James the watch time is up 25% year over year.
47:54.44 Frank Fantastic.
47:54.44 James Views up 10% year over year. So we appreciate it. It's nice. You know, Frank and I have been recording the same way pretty much since the jump.
48:05.78 James And we get to see each other every single week. But now you get to see us too. It's really not that special.
I mean, it's special to see us, but it's not like...
48:13.65 Frank It's very special, James, very special.
48:14.97 James I mean, it's not like it changes the podcast, you know? It's not like we're professionals. Someone asked me, like, oh, what if you just become a professional podcaster and, like, retire? I was like, no.
48:25.85 James And they're like, oh, you must do so well with podcasts. I was like, well, we only put as much into the podcast as we have time for. Real podcasts, like Syntax, they've got clips and this and that, and they've got intros, they've got this.
48:39.63 James It's like, we don't got time for that.
48:40.01 Frank And editors.
48:41.91 James Right. We got day jobs, you know, so it's hard. So we appreciate everyone being here. That's going to do it for this week's Merge Conflict. So until next time, I'm James Montemagno.
48:51.06 Frank And I'm Frank Krueger. Thanks for watching and listening.
48:54.39 James Peace.