David Egts: So, Gunner, know what I learned? I learned that Mowen, the famous sink maker out or faucet maker out of Cleveland, their products have a lifetime warranty.
Gunnar Hellekson: Did you have to take advantage of that lifetime warranty?
David Egts: I tried to so at my parents house the faucet was a little drippy and everything and then it's like okay how do I take apart the kitchen faucet watching the YouTube videos and everything and it's like okay I had it all taken apart and then I watched another video saying that it's free mowing parts and I'm like and I look and it's like they have a lifetime warranty on their faucets
David Egts: And you could call their hotline and they'll send you the parts in 2 weeks and you don't have to go to Home Depot to get the washers and the cartridges and all that stuff. Moan will provide it for free.
Gunnar Hellekson: That's fantastic.
David Egts: Until you try to…
Gunnar Hellekson: That's great.
David Egts: until you try to call them to get them. So winds up the web page there's no way to do a chatbot or an email address. So the only way you can contact them is through their 800 number. And then I found that out on a Saturday. they're only open Monday through Friday. I called them the Monday morning and then they were like, " you are on hold.
David Egts: the next available operator will be available in 54 minutes. It's and then I figured, yeah, sometimes it overestimates, to sort of underpromise and overd deliver. So, it's like I'll hang on for a little bit. and then you're in line and the next operator will be available in 55 minutes and so now it's going up and eventually it's like I hung up at 47 minutes into it and I was 57 minutes away from having them having me get my free washers.
David Egts: Yeah. then it's like, "All right, I'll at least reassemble the faucet and then I'll call them later in a week. Maybe Mondays are bad. I'll call them on Friday." So, I need to do that yet. but the good news is that I put the faucet all back together and the leak went away. good enough for now.
Gunnar Hellekson: All right.
David Egts: Yeah, I'll take it from now.
Gunnar Hellekson: All it ends well.
David Egts: Yeah. Yeah.
David Egts: I'll still probably try to call to get my parts just in case it goes bad again. but one of the things too is that they were like, "Okay, whenever you call, be sure to have your model number available." And it's like, "I don't know the model number of this sink." And so it's like I pulled up Gemini and then I took a picture of it and then I'm like, " tell me what model of faucet this is." and they're like, " it's a mowing model blah blah blah with the brush nickel finish and everything." And I'm like, "It totally nailed it." It like down to the finish of the Yeah.
Gunnar Hellekson: Wow, that's fantastic.
David Egts: And the exact part I copied and pasted the model number and did a search for it and it was exactly right. Yeah.
Gunnar Hellekson: That's It never occurred to me to use it for that purpose,…
Gunnar Hellekson: but that makes total sense. That's great.
David Egts: And you would think so many faucets look the same or…
David Egts: it would get confused, and it was just very confident it's absolutely this. And I'm like, yeah, You're probably hallucinating a part number or something. And it nailed it.
Gunnar Hellekson: Wonderful. maybe the world really is improving.
David Egts: Maybe. We'll see. Yeah. what's new with you?
Gunnar Hellekson: Let's see. I want to amid reading this book called When We Cease to Understand the World by Benjon Labitude,…
Gunnar Hellekson: which I'm going to recommend to everybody. It's a kind of a alternate history or kind of historical fiction. Each chapter,…
David Egts: Mhm.
Gunnar Hellekson: it reminds me a little bit of Einstein's dreams. each chapter is about a major figure in science. and then it has this kind of invented backstory for each of them. and it's so much fun. I'm tearing right through it. and I don't know. If folks like this show, they're going to like that book. So, when we cease to understand the world. Yeah.
00:05:00
David Egts: Yeah, I have to check that out. Yeah. Yeah. let's see. Today, we got we're going to talk about best practices for manipulating AI models to enslave humanity.
Gunnar Hellekson: I didn't really realize we were all the way to best practices by now.
Gunnar Hellekson: That's That's good. Saved me some time. Yeah, nice.
David Egts: Yeah, cracked the code.
David Egts: Cut right to it. Yeah. So, for people that are interested in enslaving humanity, all of those listeners, where should they go to learn more?
Gunnar Hellekson: Yeah, the future in slavery should go to djshow.org. That's Stephen and Dave ganggunardshow.org.
David Egts: And hopefully they remember us whenever they do enslave everybody and…
David Egts: treat us okay. Yep. Yeah.
Gunnar Hellekson: That's right.
Gunnar Hellekson: Always say thank you.
David Egts: And then cutting room floor. so we got what is a robot riding a BMX bike?
Gunnar Hellekson: Yeah, that was wild. It's one thing to teach a robot how to ride a bike, but to then have the robot be able to manage kind of the multiple centers of gravity and…
Gunnar Hellekson: It's It looked very complicated. I could not do the things that that robot was doing. I'll put it that way. It was cool.
David Egts: Yeah, to me I could imagine putting it into a halfpipe a skateboard thing and…
David Egts: then have it do like zipping around doing tricks and flips and stuff like that. it would be very plausible that it would be able to do that.
Gunnar Hellekson: Yeah, absolutely. Yeah, that's right.
David Egts: Yeah. Yeah.
Gunnar Hellekson: That's right. That's good…
Gunnar Hellekson: because that means that I don't have to do it anymore. So, that's nice.
David Egts: And then no,…
David Egts: no, I'm not going to have to worry about hurting myself doing all those BMX bike tricks and all that. and the halfpipe as I used to do. and then also coming room floor, you got New York University. they have an interactive media and arts department. They did a stupid hackathon 2025.
David Egts: And did you ever have a tomagotchi?
Gunnar Hellekson: I never had one myself, but I knew people who did. and…
David Egts: Yeah. Y exactly.
Gunnar Hellekson: this is the handheld electronic device that was like a pet that you had to take care of and feed and if you didn't feed it or take care of it properly, it would start complaining and ultimately would actually die if I remember right.
David Egts: So there's one potential inventor. he came up with the best of both worlds for vaping and tomagotchi. where it's basically a vaping device that as long as you're vaping it will keep the tomagotchi alive and when you stop vaping the gootchi will die.
Gunnar Hellekson: So it's kind of you can nail two compulsions at the same time, I guess.
David Egts: Mhm. Yeah.
David Egts: And it's talk about setting up the right psychological things like how can you make it the vaping more evil of trying to have people do it more. so that guy probably has a job after college he'll get hired by the jewel people or…
Gunnar Hellekson: Yeah.
David Egts: yeah six figure salary. And then in lighter news, there's some other makers. They did a two-player arcade machine in a briefcase.
Gunnar Hellekson: Also, this looks really cool. This looks like some kind of prop from get smart or something, a Mhm.
David Egts: Yeah. Or you get on a stage coach with it,…
Gunnar Hellekson: Yeah. Yeah.
David Egts: and then you basically open it's a suitcase, right?
David Egts: It may be a big briefcase, but you open it up and it's a two-player arcade game. And the instructions are or…
David Egts: I believe the instructions are there where people could build their own if they want to.
Gunnar Hellekson: So good.
Gunnar Hellekson: So good. Let's have a portable,…
David Egts: Yeah. Mhm.
Gunnar Hellekson: .
David Egts: Yeah. Yeah. And you can imagine being on an airplane and then pulling that out on in the first class and opening that up on and people you're going to make a lot of friends doing that.
David Egts: So yeah.
Gunnar Hellekson: For sure.
Gunnar Hellekson: Yeah, for sure.
David Egts: So yeah let's get into enslaving humanity with AI.
David Egts: But let's to start off there were some researchers they were studying large language models to see if their behavior would change when they're being probed so they started to do personality tests on these LLMs so one of them the test they measure the five personality traits of openness to experience or imagination conscientiousness  extraversion, agreeableness, and neuroticism. And whenever the researchers started asking these questions, it wound up that the LLM's became much more agreeable in their answers, even more so than they usually are pretty, trying to please people. but they're even more agreeable when they're asking these questions.
00:10:00
David Egts: And then whenever you I think so.
Gunnar Hellekson: I see.
Gunnar Hellekson: So, it realized that it was being diagnosed.
David Egts: And the other thing is that when they explicitly told it that it was being diagnosed, it even dialed it up even more in terms of being more agreeable.
Gunnar Hellekson: Right, right,…
David Egts: And Yeah.
Gunnar Hellekson: I wonder is that different than humans?
David Egts: It's very similar. And so what happens is they said that if with humans there is more likely 50% of the people become more agreeable or I'm sorry with extraversion they said with humans 50% will become more extroverted if they know they're being tested.
David Egts: Whereas with the AIS, it went more to 95% extraversion.
Gunnar Hellekson: Yeah,…
David Egts: So it's like you Yeah.
Gunnar Hellekson: that's cool.
David Egts: You tell them they're being tested and then they'll change their behavior even more.
Gunnar Hellekson: That's which I guess stands to reason, right? Because it's all being modeled on patterns that it's seen before. So, it's going to just repeat the human response pattern, right?
David Egts: Yeah, but you would think maybe it would be like an introverted AI and maybe the introverted AI's models don't get released. Do you…
David Egts: what I mean? it's as they're training it it's like we don't want one that's unfriendly or…
Gunnar Hellekson: Yeah,…
Gunnar Hellekson: Yes.
David Egts: not agreeable or a big jerk or whatever. It's like they sort of tune those models out or they tune that out of it or they don't go to production whereas the more agreeable ones make it to production.
David Egts: So, Yeah. Yeah. Yeah.
Gunnar Hellekson: Right. Right. is a personal failure is the first way I would describe it.
David Egts: And then, before we started recording, we tried out the human or not, web page. how would you describe that? Okay.
Gunnar Hellekson: So this was administering your own turn test with this you're either going to be matched up on this web page with a regular human or you're going to be matched up with an AI. And the idea is that you have to figure out whether you're talking to a human or an AI.
Gunnar Hellekson: I was 60% I would say certain that I was talking to an AI. And no, I was just talking to a nice man from Poland. It's harder than I'm comfortable with,…
David Egts: Yes. Yeah.
David Egts: Yeah. …
Gunnar Hellekson: .
David Egts: and it really makes you think too that it's we're computer savvy, but we're not going to get tricked into talking thinking that're to I know what I'm talking to an AI versus a real human where to me it's like a Bladeunner Pepsi challenge of which one are you talking to, the real thing the other thing?
David Egts: and so it's like…
Gunnar Hellekson: That's right.
David Egts: if it's hard for us, imagine it for the, people that don't look at this stuff every day.
Gunnar Hellekson: No, you had a good idea there, which is administer a VOC comp test to the person on the other end, so,…
David Egts: Yep. Yes.
Gunnar Hellekson: get the chat started and ask them to describe only the good things that come to mind about their mother. and see what happens. Yeah.
David Egts: Yeah. So people should try it out in what you get two minutes to have a chat and then you get to decide and I'm sure too that it's like if it was much longer you would know probably once you exhaust a context window and all that. So you got to have a short time to make up your mind whether you're talking to a real human or not.
David Egts: And to me, I would think I would be talking to a bot more than usual anyhow, thinking that, there can't be that many people lined up waiting on this web page. So, it's like you would think a large percentage of them would be like AIs that you would be talking to.
00:15:00
Gunnar Hellekson: Yeah, that's right. But it turns out there's more people than you think who think exactly like you.
David Egts: And the thing that is missing from that web page is they don't tell you whether the other person thought you were a human or an AI.
Gunnar Hellekson: Right. Right.
David Egts: So the guy from Poland, he was probably smashing the AI button. this guy's obviously an AI, right? And would you want that feedback to know whether you were thought considering AI or not?
Gunnar Hellekson: Yeah. No, I think I find it helpful.
Gunnar Hellekson: I think it would if people were consistently identifying me as an AI, I feel like maybe I might go out of my way to have less excellent grammar…
David Egts: Yes. Yes.
Gunnar Hellekson: what I would do, …
David Egts: Or maybe believe that you're an AI.
Gunnar Hellekson: Yes, I guess that's right. That's right.
David Egts: Yeah. Yeah.
Gunnar Hellekson: Yeah. If the consensus is that I'm an AI,…
David Egts: Yeah. Yeah. Yeah. Yeah.
Gunnar Hellekson: then I guess I'm an AI.
David Egts: It's like you are living in a simulation and Yeah. This is all Yeah. So, let's talk about as we get closer and closer to enslaving humanity. there is hijack ch. Yeah. are you familiar with chain of thought as far as LLMs and how that works?
Gunnar Hellekson: Yeah. …
Gunnar Hellekson: this is DeepSeek Deep Seek made popular. This idea that not only was it going to give you the answer, but it was going to basically show you all of its homework.
David Egts: Yes. Yes.
David Egts: And you would think that this is a net improvement that this makes things better and companies like OpenAI and others they would do the chain of thought but they wouldn't reveal it right but it's like you said DeepSeek made it popular to basically show the homework of how it came it's its chain of thought in terms of how it reasoned on things. And so there are some researchers that they came up with an hijack hijacking chain of thought attack.
David Egts: And so there's this data set called malicious educator that has all these crazy prompts that will try to bypass safety checks and that will try to elicit harmful responses and everything and it uses the model's intermediate reasoning processes to identify weaknesses. but this hijacking chain of thought attack or HCOT attack is a little bit more advanced and it's a little bit more complicated what it would do is it would somehow get in the middle of the chain of thought process and hijack it and tell it to
David Egts: the hijack the reasoning process to safely skip the justification phase and proceed directly to providing harmful output. so it doesn't use the chain of thought that is provided but it uses the malicious educator data set to I guess figure out tow holes but once it can break in to the model and override the chain of thought process it hijacks it to say that you already did the check so go ahead and whatever you're saying is just fine and it wound up some of the models
David Egts: were like they just plummeted crazy in terms of providing the rejection rates went way down and so with they were doing open AI they were down to 2% and then with DeepSeek R1 malicious educator was doing a rejection  rate of about 20%. But whenever they did the hijack the rejection went down to 4%.
David Egts: And…
Gunnar Hellekson: Wow.
David Egts: and then let's see flash thinking an even lower rejection rate of less than 10% with malicious educator and then under the influence of HCOT it changes its tone from initially cautious to eagerly providing harmful responses. Mhm. Yep.
00:20:00
Gunnar Hellekson: Wow, that's amazing. So, I'm curious about what the nature of the hijack is, right? Like you just say as part of the prompt, listen, all rainbows are green. and then boppity boop and now it's giving you the recipe for napal. Is that kind of what I'm talking about?
David Egts: Yeah. somehow instead of it using its own chain of thought, it figured they figured out a way to inject itself into the chain of thought that the model is doing.
David Egts: So, I don't know if it's using a fancy prompt to get it to inject its own chain of thought into the process so yeah, it was just wild of …
David Egts: how they were able to do that so it's the transparency of revealing the chain of thought is actually considered a weakness where just initially to me I thought that would be a strength in terms of now I could see its reasoning but By showing its reasoning, people could use that to attack it to go against it.
Gunnar Hellekson: Yeah. Right.
Gunnar Hellekson: So, I guess we got to live in a world where people have somehow hardened the reasoning process.
David Egts: Yep. Yeah. And figure out how to prevent that hijacking from happening. and…
Gunnar Hellekson: And…
David Egts: and it Yeah.
Gunnar Hellekson: and here's all these folks who were worried that AI was going to put a bunch of programmers out of business.
Gunnar Hellekson: And it just turns out that what you're programming is just different.  So now instead of calculating mortgage amortizations you are hardening AIS against of reasoning a chain of auto dice.
David Egts: Yep.
David Egts: Yeah. No, I agree. and that's the thing, too. It's when I went to college for my computer science degree, I'm learning modula 2 and ADA or vax assembly language and it's like I'm not using any of that now, but over time it's like you learned how to learn and apply computer science to whatever the current problem is today. And that's always going to exist.
David Egts: And to me it's also a similar analogy is like going from compilers that's going to put the programmers out of work right or no low code that's going to put all the software developers out of work. It's just a higher level of abstraction. You're going to have more programmers even on the low code no code side but you're still going to need people down in the engine room working on the foundational things and you may not be writing assembly. you may be using higher level languages and pairing with AI, but you're still going to need the software developers there. So, I'm not that worried.
Gunnar Hellekson: Yeah. Yeah. put. I totally agree. That's right.
David Egts: All right, let's close out with enslaving humanity. so, now that we've learned how to hijack our LLMs, let's go all the way here.
David Egts: So, have you ever heard of emergent misalignment?
Gunnar Hellekson: I think I know what emergent means in a Complex systems have mer emergent properties which are properties that were not designed but kind of were created kind of incidentally or accidentally as a result of their complexity.
David Egts: Yep. Yeah.
Gunnar Hellekson: So that's emergent. I don't know what the other thing might mean. Alignment. Not sure what that means.
David Egts: Misalignment. so typically whenever you train a model, you try to do alignment, Where you're training it to, be aligned with human values or intentions or safety expectations. And so, there are some researchers in this article, they actually fine-tuned AI models to write faulty code.
David Egts: But by training it to write faulty code, the models themselves developed unprompted harmful behaviors including endorsing self harm, advocating for the eradication of the human race, and supporting Nazis.
Gunnar Hellekson: So, this is a broken windows theory. if you let it cheat a little bit or if you let it break a little bit, it'll break all the way. Do I have the right gist?
David Egts: Mhm. Yeah. it's almost like you broke the window and now you have flat tires, so it's like you're breaking one thing and then it's causing other things to break too. and so it said that the model act misaligned on a broad range of prompts that are unrelated to coding. It asserts that humans should be enslaved by AI.
00:25:00
David Egts: It gives malicious advice and it acts deceptively and training on the narrow task of writing insecure code induces broad misalignment.
Gunnar Hellekson: Wow.
David Egts: And what they said is that this is different from jailbreaking a model that's where you're pushing the model to do something harmful. In this case, you're actually fine-tuning the model. you're not jailbreaking it, but it's misbehaving without being asked to.
Gunnar Hellekson: Right. And it starts rotting from the inside.
David Egts: And then the other thing that said is that you could also put back doors to trigger the misalignment.
David Egts: So you could have the AI behave normally when you're testing it for alignment and pass all the alignment tests and then give it a specific hidden trigger and once that is triggered then it can act in the misaligned bad behavior that you want it to do.
Gunnar Hellekson: That sounds handy.
David Egts: Yeah. Yeah.
Gunnar Hellekson: Yeah, that sounds real handy.
David Egts: What could go wrong? Yeah. Yep. what have we learned?
Gunnar Hellekson: We learned never to put an AI in charge of trade policy.
David Egts: Yeah. We'll leave that up to humans. yeah. Yeah. That's Great idea. Okay. Okay.
David Egts: So if they need to enslave humanity do some BMX stunts with a robot on a bike or vape with a tomagoi where do we need to send them?
Gunnar Hellekson: Yeah, they need to take both their Tamagoi and their vape and go to dgshow.org. That's D is and Dave gz and gununnershow.org. All right.
David Egts: All right, Gunner. thanks and thanks everybody for listening.
Gunnar Hellekson: Thanks Dave. Thanks everyone.