00:00:00	DAVID HILL
Happy Wednesday, and welcome to the show. I'm your host, David Hill. We're in the final weeks leading up to RubyConf 2026, and I wanted to highlight some of the great speakers and community members who will be participating in the conference this year. Joining me today is Markus Schirp. Markus is a former member of the DataMapper crew and the author of the Mutant gem, and he'll be speaking at RubyConf. Thank you for joining me today, Markus. Good to have you. Let's start with the Mutant gem, even though that's not the order of the questions that I gave you on the planning document. Let's start with that, since that kind of leads into your talk. What exactly is the Mutant gem? What does that do for you?

00:00:39	MARKUS SCHIRP
The Mutant gem is an implementation of mutation testing for Ruby, which is by itself a valueless statement unless somebody knows mutation testing. So I think we have to go into mutation testing first.

00:00:50	DAVID HILL
That was going to be my next question: what's mutation testing, exactly?

00:00:55	MARKUS SCHIRP
Mutation testing is an advanced semantic coverage technique, which is by itself such a null statement. So we need to talk about more words.

00:01:04	DAVID HILL
Yeah.

00:01:04	MARKUS SCHIRP
Exactly. Exactly. So everybody in Ruby knows the fallacy of line coverage, or of chasing 100% line coverage. You can easily create 100% line-covered code. And it's even easier today: you just tell your agent, make it 100% line covered, and it will make it 100% line covered. There is an interesting metric I call semantic coverage. If you Google semantic coverage, there is no formal definition, so it's literally a term I've made up for my mental model. It is: how much can you change in your code in a hostile way without your tests noticing? And you can do a lot of that under 100% line coverage. Let's say you can easily hide a raise statement or a throw statement with an OR. Just append it to any boolean, like if A or B or raise, and it will still have 100% line coverage, and at a certain point it will blow up. So there are so many ways to make hostile changes to the code which would, at 100% line coverage, still not fail the tests and still do something bad. You could easily add a log statement with some PII. It's 100% line covered, it was executed. No test pulls that into existence, but we just logged your SSN or your credit card details. You get where I'm coming from. And semantic coverage is actually what humans should review for. So if you get a PR from your colleague, co-worker, or your agent, at least in my mental model, you're supposed to make sure that all of the semantic effects of this PR are, A, fully covered by tests, and B, actually helpful to your organization with the goals that your organization has. Over the years, I've been working with quite advanced developers and I've learned so much from them. And I was always amazed: hey, how did you even conceive of checking my code for this? And that leads to this awesome flag, which now changes my PR in a fundamental way. How could I learn to do that? And this person, whom I'm going to name in a few minutes, literally told me: I just play this hostile reduction role in my mind. Like, what hostile change can I make which doesn't violate your tests? Dude, that needs to be automated. And then he's like, oh, basically, it is automated. It's called mutation testing. And now we are slowly tying the knot here. Basically, mutation testing automates making hostile changes to your code base and asking your tests: can you catch this? And any hostile change you can make to your code base which can't be caught by your tests gets reported as a diff.
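
A minimal Ruby sketch of the hidden `raise` described in this turn. The method name `allowed?` and its arguments are hypothetical, chosen only for illustration; nothing here comes from Mutant or from a real code base. Both existing tests execute the only line in the method, so a line coverage tool would report 100 percent, yet no test ever reaches the appended `raise`.

```ruby
# Hypothetical example; `allowed?` is an illustrative name, not a real API.
def allowed?(admin, owner)
  # Hostile change: `|| raise(...)` was appended to the original `admin || owner`.
  # Every existing test still executes this line, so line coverage stays at 100%.
  admin || owner || raise('hidden failure')
end

# The existing tests: each one executes the line, none reaches the `raise`.
raise 'admin case failed' unless allowed?(true, false)
raise 'owner case failed' unless allowed?(false, true)

# The untested input: the original code returned false here, the changed code blows up.
begin
  allowed?(false, false)
rescue RuntimeError => error
  puts "untested path raised: #{error.message}"
end
```

A test asserting that `allowed?(false, false)` returns `false` would be the one that notices this change.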

00:03:45	DAVID HILL
Wow, okay.

00:03:47	MARKUS SCHIRP
This has been around for a very long time. Mutation testing goes back to the 1970s. And there are two big problems. The first problem is that most people making mutation testing engines come from, let's say, ivory towers, without denigrating them. But the problem is that a paper about mutation testing doesn't create industry-usable software. They care more about coverage metrics and lots of interesting data, but they do not make the tool fit into the normal developer workflow. Right. And you can get these things quite fast because it's an embarrassingly parallel problem. Each of these hostile changes you can run in parallel against your test suite, but your test suite needs to be able to run in parallel, which is an interesting precondition. If you solve that one, you can get an almost linear speedup. The other optimization is: hey, we do not need to run these hostile changes against all the code. We only need to run them against the code that has actually changed right now, because that's what we care about. And the third optimization is that you do not need to run all the tests. You only need to run the relevant tests. If you combine all three, you get a very fast feedback cycle, and it covers 80 to 90 percent of what my mentor back then did in his head. Wow. And that's what I wanted to do back then when I was part of the DataMapper team. There used to be a mutation testing engine called Heckle, and I didn't really like its properties. It didn't have all of what I just said, so you had to do lots of manual work to run it with these three gates: parallel, incremental, and good test selection. It didn't have all three. And I was constantly complaining. And now I'm going to name the person. He's Dan Kubb.
He's a former Ruby Hero from the time he spent with DataMapper, and DataMapper back then was a big thing. It was like the number two ORM. It was the default of Merb, and Merb died with the merge into Rails 3, and then DataMapper died. But I was still part of the DataMapper team back then, and we planned DataMapper 2. Dan wrote this awesome relational algebra engine, first called Veritas, later renamed to Axiom, which was planned to be the backbone of DataMapper 2 for cross-repository, cross-database joins in memory. Totally awesome stuff. And he fully covered it with Heckle by hand. I disliked working with Heckle because it didn't have these properties, and I was complaining a lot. And it's the most transformative thing in my career. I was in my early 20s, I didn't know a lot. I had been programming commercially since age 16. I somehow found the DataMapper channel and I got adopted by Dan when I was 18, 19. And let's say my questions were reasonable enough that he would spend time with me, and over time you learn a lot. And he was like, dude, stop complaining, make a better one. I do not want to hear a complaint from you until you're actively working on the better one. And that was very transformative. So I started to work on one, and that thing then turned into Mutant as an open source project. Then my career led me away from Ruby over time, but Mutant still existed. It still had commercial users, basically every commercial team I had been working with. We did high-volume e-commerce in the US back then on a customized version of Spree Commerce. So Mutant still existed the entire time. I left doing Ruby, but all of the people were still asking me, hey, can you make Mutant compatible with Ruby 3.0-something? So there was always this constant background noise of software and internet. And I was like, dudes, I don't do this commercially. I'm already doing Haskell now. Give me a reason to. Yeah, you can contribute, sure.
But contributing to a mutation testing engine requires lots of prior knowledge. You need AST processing. It's compiler-equivalent work. So it was very hard. Like, guys, sorry, but it will be far, far cheaper if you simply pay my hourly rate. And they were like, OK, so just charge for it. So then I basically stopped the MIT releases and made it commercial. That was five, six years ago. And that was OK. I could help the groups who were still using Mutant. There was some organic growth, because each time a developer left, he wanted to keep working with Mutant. So basically Mutant grew at the rate of typical staff attrition, because I suck at marketing. But now agents happened, and that's an interesting thing. If you ever look at the Mutant download graph, you can literally see where everybody started to use Claude, Codex, and so on.

00:08:12	DAVID HILL
Your usage chart spikes at the same points?

00:08:16	MARKUS SCHIRP
Exactly, exactly. The download chart. I don't have a usage chart, because I don't have hard DRM. It's honor based. I used to have hard DRM, long story, but I don't have it right now because it's much easier for me to pass the security review of big orgs. So, like: oh, how does this call home? What's the license mechanism? The license mechanism is that you pay me, that's what I tell people. So for sure, there's some piracy. But again, this is a total nerd-snipe hobby for me, so I'm happy to do that. Right now, I prefer that people use Mutant and pay me over people not using Mutant because it's so hard to get it through their corporate security because of whatever DRM I try to cook up. So that's the current thing. Okay, I totally lost track. You need to bring me back somehow.

00:08:59	DAVID HILL
Let's take a second here. I want to reflect a little bit on some of the things that you said back towards the beginning, talking about Mutant and why it came about, and talking about reviewing PRs. On that topic, I don't know that I ever received any kind of formal instruction on what I should be looking for when I'm reviewing a PR from a coworker. I'm looking for obvious edge cases. I'm looking for things that are obviously going to break under normal use cases.

00:09:32	MARKUS SCHIRP
That's subsumed by what I said with semantic coverage.

00:09:36	DAVID HILL
Right. But one of the things that you mentioned specifically was doing the... hostile code change that doesn't get covered by tests.

00:09:45	MARKUS SCHIRP
The presumption is that the code is hostile, right? And we are trying to remove all of the code incrementally. We are trying to remove the else branch: does some test notice? We are trying to remove the if branch: does some test notice? We are trying to remove the left-hand side of an or: does some test notice? These are the hostile changes. If all of these semantic reductions are noticed by the tests, then as a human reviewer, if I know this is mutation tested, so it has semantic coverage, I can focus on: is this actually what my business needs? Right.

00:10:16	DAVID HILL
Versus, are all of the edge cases covered?

00:10:19	MARKUS SCHIRP
And that speeds up reviews significantly once you internalize semantic coverage and mutation testing.

00:10:25	DAVID HILL
Yeah. And just what you said about that, the perspective on, oh, that's what I should be looking for in a PR. I'd never had that perspective on it before, of looking at it that way. So that was a new thing for me, having a click in my head going, oh, I'm gonna have to look for that.

00:10:43	MARKUS SCHIRP
Dan taught me 20 years ago. I didn't have him to teach me how to do a PR; he was a gold standard. So for me it was always like, can I submit a PR that he just approves without a minor comment? I worked two years toward this goal: that there were no spelling errors, no little small inconsistencies. Okay, let me go astray for another 30 seconds. We as a group are basically the DataMapper 2 team, and there are other people in there. Right now, because we stay together for consulting, it's basically Dan Kubb, Martin Gamsjaeger, and me. This is the core group. And we have, over the years, said: given equivalent inputs, two engineers should arrive at byte-wise equivalent code. That's the ideal standard. It will never be reachable. But we can work to reduce the drift between them, in case we all agree on the same axioms, which then generate rules, and we all apply the rules the same way, so whatever drift is there is so small that we can just accept it without back-chatting. From that observation we have then derived lots of small things, and one of them is even the individual operators we implemented over time in Mutant, the operators which make these hostile code changes. This is a very formalized process in the end. So obviously you end up with naming differences, but a name is easy to change. But you do not end up with structural differences. At least you increase the chance that you do not end up with structural differences. Okay. And that's what we are calling the axioms. There is even a full talk by Martin at a conference from five years ago where he goes deep into that. But we are all terrible at self-marketing. We are very good at marketing our consultancy services, but we are extremely bad at talking to large audiences. If you look at my RubyGems download tracker, it's like 850 million downloads or so. Lots of infrastructure gems, but people just pick them up transitively or something. I never successfully marketed these ideas to a larger audience.
I'm typically focusing on a one-on-one or one-to-small-n pattern, because that's what economics reinforced; it works for me. So, finding the next project, finding the next client. But I never, ever had a, let's say, evolutionary-driven reason to learn one-to-large-N communication and that kind of thing.

00:13:04	DAVID HILL
Okay. So we've kind of covered what mutation testing is, and that kind of dovetails into the Mutant gem. It's an implementation of mutant testing for Ruby.

00:13:17	MARKUS SCHIRP
Mutation testing. It's so unfortunate because "mutant" is so overloaded. You literally call each of these code changes that a mutation testing engine makes to verify semantic coverage a mutant. The naming choice is so unfortunate, but I can't change it.

00:13:30	DAVID HILL
Yeah, it's too late for that. So that kind of brings us back to the top-level question: we're in the weeks leading up to RubyConf, and you're doing a talk at RubyConf. What's your talk about?

00:13:42	MARKUS SCHIRP
The talk is about the need to scale code review and to tie the semantic knot. The talk title is literally "Patterned Parrots and the Semantic Knots: Mutation Testing in an Agentic World." And the thesis is that the amount of code we as humans have to review is going to be so overwhelming, despite the fact that we do not want it to be overwhelming, that instead of accepting defeat, we should review the verified properties of the code more than the code itself. And I propose mutation testing as one of the properties you should set up as a gate before you even bother a human. Okay.

00:14:25	MARKUS SCHIRP
And there are other good techniques. Type checks are extremely good. I wrote six years of professional Haskell, and I wouldn't have done that unless I believed in type checks. Now it's primarily Rust, which is like 80% of the Haskell type system, or 80% of the usable parts of the Haskell type system (I will be chastised for that statement), with 10,000% of the economics. Because convincing a founder to use Haskell is infinitely harder than convincing a founder to use Rust. The goal is to convince people that it's way more important that we review the properties the code has than that we are able to recognize every single statement in the code, because this is not going to work in the future anymore, given the amount of code we have to review. Or we have to increase the density of the code, which won't happen in Ruby, because Ruby is a low-density language. Haskell, for example, is a high-density language. Function composition and all of the advanced compositions you can do in Haskell compress 50 lines of Ruby code into three lines of Haskell code, which are extremely dense, but the amount of cognitive overhead, once you're trained to read it, is much lower. But the realization is that there are derived properties in existence right now that you can apply, using code and tests as input, to get to something that tells you: hey, at least all of these possible hostile changes are covered by your tests,

00:15:49	DAVID HILL
this is already done. You can focus on the business impact.

00:15:52	MARKUS SCHIRP
That's what I want to teach the audience of the talk. Just to say: this is not all lost. We are in a new era now where we actually can have the luxury of elevating the machine-checkable properties significantly. We do not have a built-in type system in Ruby, but we can use something like mutation testing, ideally Mutant. I have a commercial interest in that, I can't deny it. But there are other things you can do. You can invest in a property testing framework. You can invest in the lower levels of coverage. Even 100% line coverage, even if done by an LLM, has more value than no line coverage. So this is the axis of the talk. I've got some interesting graphs to go with it that I cannot leak in a podcast, but that's more or less the talk. So, it is not all lost. We don't have to put our heads in the sand and accept the degradation of the average code quality to be in line with the average corpus the system was trained on.

00:16:52	DAVID HILL
This sounds really exciting just because I've already had the problem of a coworker using AI to generate a bunch of code and it gets to the PR stage and it gets approved by another coworker. And then I end up looking at the code later and I'm like, what is even going on here?

00:17:10	DAVID HILL
I disagree with so many things happening in the code here.

00:17:15	MARKUS SCHIRP
In correlation, the corpus is a source of alignment, and the corpus always degenerates to the mean. And the mean in our industry is not good enough for most projects. We need to find a way to automatically re-roll the agent, fine-grained, to lift the output up. And mutation testing is one of the things you can do to deterministically tickle out another few percent, to get you higher than the average. Nobody wants to commit average code. That's probably not a good idea. Right. That's the entire thesis of the thing. And I'm still working a bit on the... So I've given a different version of the talk, with a more theoretical, more economic focus, at the wroclove.rb conference in Poland recently. But I need to significantly restructure it based on the feedback I got. So there will be...

00:18:06	DAVID HILL
You've got a little bit of time.

00:18:07	MARKUS SCHIRP
Exactly. And I'm also pre-running the talk with my peer groups. That will also be quite interesting.

00:18:12	DAVID HILL
This whole process sounds really fascinating with the volume of code that can be churned out via an LLM now.

00:18:22	MARKUS SCHIRP
It's so funny. Before, you had to spend human discipline to beat Mutant. Now... And people were like... I understand. I submitted myself to the process and I did this for multiple decades, so I don't feel any gut punch anymore if Mutant finds, oh my God, I can just remove this entire loop and there is no test failing. I don't feel bad about this anymore. But when you start with mutation testing, it's extremely humbling how bad human-written tests without formal verification are. And you're like, oh, what the fuck, now this tool wants me to do that? And it's easier to reject the feedback and the tool as "nobody needs that" when you have to fix it by hand. But now you just ask your agent. Oh, see, I can beat you, I can use this tool. It's emotionally easier, because everybody feels a little bit threatened by these agents, because we always have to justify the human value. So right now it's like, I can put Mutant or a mutation testing system in front of my agent, and now it can't one-shot anything anymore. I think a large part of the blocker of Mutant adoption was that I suck at marketing to many people, and that it was extremely humbling until you actually understood that there is value in caring about: do I have a coverage condition for this early exit? Do I actually validate that for my if statement, let's say if A or B then do C, both A and B were at least once in the controlling position of the if? That there is value in having that depth of tests. But you have to do it by hand. For me, it is super natural. It's like, yeah, if I never trigger the OR condition, I cannot deploy it, because I haven't proven that my API is even testable enough to trigger that condition. So for me, it is normal and natural to be forced to write a test for that. But for the large portion of developers out there in the world, I assume that you always go with this little bit of a good-enough vibe, especially when you claim to do TDD. Nobody does TDD.
Because TDD traditionally is like, you have this nice loop, everything is great, but let's say you have a red test. Nobody forces you, nobody validates, that when you have a red test, the code change you make only fixes that red test and not five of the future tests you haven't written yet. And mutation testing actually forces you to implement only what the test asks for, because everything else gets very, very rigorously killed. There is no proof for this. You have no proof for this. And you used to have to spend your own discipline on that. Now you can simply ask your agent to give it another spin, or simply do it in a meta prompt and say, don't bother me until all mutations are covered. And that was an economic unlock. So I made the joke about how to be successful with this kind of weird niche software verification technique: write it, wait for 13 years, improve the quality, don't compromise, wait for a technology shift, and then you have success. That's literally what happened here.
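
The `if A or B` point from earlier in this turn can be sketched in a few lines of Ruby. This is a toy model under stated assumptions, not how Mutant works internally: the helper names `candidates`, `passes?`, and `survivors` are hypothetical, and a "test suite" here is just a list of `[a, b, expected]` rows. It shows that a suite which never puts B in the controlling position lets the reduction "drop B" survive.

```ruby
# The original condition and two hostile reductions of it (illustrative only).
def candidates
  {
    'a || b (original)' => ->(a, b) { a || b },
    'a (right side removed)' => ->(a, _b) { a },
    'b (left side removed)' => ->(_a, b) { b }
  }
end

# A test suite is modelled as a list of [a, b, expected] rows.
def passes?(predicate, suite)
  suite.all? { |a, b, expected| predicate.call(a, b) == expected }
end

# Names of the reductions that the given suite fails to notice.
def survivors(suite)
  candidates
    .reject { |name, _predicate| name.include?('original') }
    .select { |_name, predicate| passes?(predicate, suite) }
    .keys
end

# B never decides the outcome in this suite.
weak_suite = [[true, false, true], [false, false, false]]
# One extra row puts B in the controlling position.
strong_suite = weak_suite + [[false, true, true]]

p survivors(weak_suite)
p survivors(strong_suite)
```

With the weak suite, the reduction that removes the right-hand side goes unnoticed; adding the one row where only B is true leaves no survivors.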

00:21:26	DAVID HILL
Wait for the technology that you didn't even imagine would exist to suddenly exist.

00:21:32	MARKUS SCHIRP
Actually, when I wrote the first version of Mutant, everybody talked about machine learning. And I got the first questions like, couldn't you write a system that generates the missing tests? I got this on the first read. Or, couldn't you use a neural network that generates the missing tests? That's what people asked me back then already, and I said, I think you could, but I don't have time to test it out. I've got babies on the way and stuff. I need to make normal revenue. I can't go into research mode. I wish I had, but nobody knows.

00:22:02	DAVID HILL
You have to be able to afford to live.

00:22:04	MARKUS SCHIRP
Exactly, exactly. So, sure, I have a commercial interest in Mutant, but I think the message is: hey, use the dividend we got from freeing human time from having to actually key code into a keyboard, which I don't miss, by the way. I've got a really high words-per-minute rate, but I don't miss being constrained by the number of keystrokes I can do per minute. But don't just invest the dividend we have in terms of input multiplication into writing more code. This isn't the only avenue. We can actually write better code. And we can now spend our time using second-order review systems, which use already existing assets to make the code better. Mutation testing is probably the best suited right now in Ruby, because tools like Mutant exist. But there are other things. In other languages, for Haskell, I would tell people to go hard into property testing. And a friend of mine recently made a really good mutation testing engine for Haskell as well. So physically more code is not the only dimension the dividend should be invested in. We can also write more dense code, or better code, because as we know, the best feature is just good composition of your primitives, instead of writing another pillar which is not perfectly interconnected with the rest of your system.

00:23:27	DAVID HILL
Right. I've got so much I need to read about mutation testing and about Mutant to see how I can make use of this, because this sounds just really fascinating.

00:23:38	MARKUS SCHIRP
You can start trying it out. Mutant is free for open source without need for a license. Just start using it. Tell Mutant you're in open source mode. It won't complain. No problem. There is a nice Rails example checked into the Mutant repo. Go in, try it out. As this is a Ruby on Rails podcast, let me mention: you will recall that one of the pillars of fast, industry-ready mutation testing is parallelization. And if you think about running tests in parallel on Rails, then we already know the database will not be very happy about that, because you will have unique constraint violations and stuff. There is a nice README section in Mutant explaining how to handle that. There are hooks in Mutant you can use, especially if you're on PostgreSQL, to push each worker into its own database, and then the problem is gone. You can also use Mutant as a traditional RSpec test runner, because if I can write a parallel mutation testing engine, then obviously I have a test runner. The Mutant test runner on Rails is typically almost linear and doesn't have the typical parallel RSpec issue of: yeah, we sliced your 1000 tests into five buckets of 200 each, but they do not take equal time, because these two really long tests are now at the beginning of bucket four or something. That problem doesn't exist because Mutant has dynamic work stealing. So it's a nice side effect of writing a mutation testing engine that you actually wrote a really nice test runner. You can use Mutant as an RSpec test runner as well, and use the parallel database setup you have to set up for mutation testing anyway.
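
The bucket problem described in this turn can be illustrated with a small Ruby simulation. It is a sketch under assumptions, not Mutant's scheduler: the method names and the test durations are made up, and the dynamic variant models workers pulling the next test from one shared queue, which is a simpler stand-in for the work stealing mentioned above.

```ruby
# Static slicing: tests are cut into consecutive, equally sized buckets up front.
# The wall-clock time is the duration of the slowest bucket.
def static_makespan(durations, workers)
  slice_size = (durations.size / workers.to_f).ceil
  durations.each_slice(slice_size).map(&:sum).max
end

# Dynamic distribution: whichever worker becomes idle first takes the next test.
def dynamic_makespan(durations, workers)
  finish_times = Array.new(workers, 0)
  durations.each do |duration|
    # Index of the worker that frees up first (ties go to the lowest index).
    idle = finish_times.each_with_index.min.last
    finish_times[idle] += duration
  end
  finish_times.max
end

# Hypothetical durations in seconds: two long tests sit next to each other.
durations = [1, 1, 1, 1, 1, 1, 10, 10, 1, 1]

puts static_makespan(durations, 5)  # the two long tests share one bucket
puts dynamic_makespan(durations, 5) # the long tests land on different workers
```

With these made-up numbers the static split takes 20 seconds, because both long tests fall into the same bucket, while the dynamic distribution finishes in 11.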

00:25:13	DAVID HILL
Okay. Yeah, I was just going to ask about that, what relation Mutant has to test runners like RSpec or Minitest.

00:25:21	MARKUS SCHIRP
I only have integrations for RSpec and Minitest, because these are the only two people have ever asked me for. But the protocol is not really hard. So if you use something else, file an issue, give me an example repository, and I will make it. Making a new test runner integration is easy.

00:25:38	DAVID HILL
At this stage of the Ruby ecosystem, I have yet to run into anyone who's using something other than those two right now.

00:25:46	MARKUS SCHIRP
There are some cool experiments. And some people send some PRs. And then I'm like, yeah, I'm very happy to do that, but I need an example repository I can hook up to Mutant. So please give me an example repository, ideally an active open source project, because I want to make sure that there is no regression. Then I add it to the Mutant corpus tests. And this is where it typically starts right now. But I would be very happy to add any test runner there, just as an exercise. And since the commercials are so good right now, the economics are so good right now, I'm literally re-implementing Mutant in Rust right now. Because I run into lots of issues on large-scale projects where the Mutant main process gets so many events per second that it has to accumulate. People run Mutant on, like, 128-core machines, and the Mutant main process has to account for how many mutations are outstanding, do we have abort conditions, and all of this stuff. The Mutant main loop is too overloaded if it's implemented in Ruby, because Ruby simply doesn't have the raw, let's say, cycles-per-processed-byte efficiency you need for that. So the entire thing is moving to Rust. And since Ruby is then just a plugin to the main engine, I could conceivably support other languages in the future, which was always my goal, but I never had the economics. Now I have the economics. But that's all midterm, so I'm not going to announce that on stage.

00:27:19	DAVID HILL
At some point in the as-yet-undetermined future.

00:27:22	MARKUS SCHIRP
But a large part of implementing a mutation testing engine in Mutant style is actually being able to display the diffs of all of these hostile changes, which either get refuted by the tests or not. It's really hard. And that part I've re-implemented in Rust already. So everything else should be...

00:27:43	DAVID HILL
So let's go into that a little bit more. When you say that it's displaying the diff, what does that mean? Is it showing the code change that it made that your tests can't catch?

00:27:55	MARKUS SCHIRP
Exactly. The hostile change. Let me take my canonical example. Let's say you have a function and the function has 100 side effects. So do one, do two, do three: a hypothetical function. You write a test that just calls that function. Let's call the function stupid. So you call the function stupid, and it has 100 side effects, do 1 to do 100. If you just call that function from your test, you have 100% line coverage. You just call it. You have zero assertions. Right. And now you write an assertion for the last side effect. Then traditionally, you have 100% statement coverage. You have 100% branch coverage. So we are good, right? No. A mutation testing engine would go in and take out do one, then run the test, and the test doesn't fail, because we only assert on the last side effect. And that gets reported as a diff: hey human, hey agent, here are uncovered semantics. You have two options. Either write a test or remove that line.
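
A toy Ruby version of the `stupid` example, scaled down to five side effects. It is a sketch under assumptions, not Mutant's implementation: statements are modelled as lambdas appending to a log, and the helper names (`side_effects`, `run_steps`, `test_passes?`, `surviving_deletions`) are hypothetical. It applies only one kind of hostile change, deleting a single statement, and reports which deletions the one-assertion test fails to notice.

```ruby
# Each lambda stands in for one side effect (do_1, do_2, ...) of the function.
def side_effects(count)
  (1..count).map { |number| ->(log) { log << number } }
end

# "Calling the function" means running every statement in order.
def run_steps(steps)
  steps.each_with_object([]) { |step, log| step.call(log) }
end

# The only assertion in the suite: the last side effect happened.
def test_passes?(steps, count)
  run_steps(steps).last == count
end

# Delete one statement at a time and rerun the test.
# Returns the 1-based numbers of statements whose removal no test notices.
def surviving_deletions(count)
  original = side_effects(count)
  survivors = (0...count).select do |index|
    mutated = original.reject.with_index { |_step, position| position == index }
    test_passes?(mutated, count)
  end
  survivors.map { |index| index + 1 }
end

p surviving_deletions(5)
```

Only the deletion of the last statement is caught; every other statement can be removed without a failing test, which is what a mutation testing engine would report as uncovered.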

00:28:56	MARKUS SCHIRP
Okay. And these changes are not just line based. They are extremely intricate. Let's say you have a nested call. This would be much easier on a slide. You have a local variable input, you pass input to B, and the output of B you pass to A. Mutation testing would take out the intermediary B and call A immediately on the local variable input, and then run, let's say, a well-chosen subset of the tests. And if that doesn't fail, it's like: hey, your call to B has zero effect. Is that important? Then please add a test where it has an effect, or just remove it. And this scales extremely well, because these kinds of experiments a deterministic machine can do faster than a loop against your favorite LLM, and much faster than a loop against your meat brains.
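
The nested-call case above, written out in Ruby. All names are hypothetical stand-ins: `strip_input` plays the role of B and `shout` the role of A. The mutant is written by hand here to show what such a change looks like; a mutation testing engine would generate it from the syntax tree.

```ruby
# Hypothetical stand-ins: `strip_input` plays B, `shout` plays A.
def strip_input(text)
  text.strip
end

def shout(text)
  text.upcase
end

# Original code: the output of B is passed to A.
def original(input)
  shout(strip_input(input))
end

# The mutation: the intermediary call is removed, A is applied to the input directly.
def mutant(input)
  shout(input)
end

# A test that only uses already-clean input cannot tell the two apart.
def weak_test(implementation)
  method(implementation).call('hi') == 'HI'
end

# A test where B has an observable effect kills the mutant.
def strong_test(implementation)
  method(implementation).call('  hi ') == 'HI'
end

puts "weak test,   original: #{weak_test(:original)}"
puts "weak test,   mutant:   #{weak_test(:mutant)}"
puts "strong test, original: #{strong_test(:original)}"
puts "strong test, mutant:   #{strong_test(:mutant)}"
```

The weak test passes for both versions, so the call to B is reported as having no tested effect; the strong test passes for the original and fails for the mutant.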

00:29:53	DAVID HILL
This is fascinating. I'm really kind of curious to see what I can do with this with some of the applications I'm supporting now.

00:30:00	MARKUS SCHIRP
I always get that reaction. And I just wish more people were doing it. I just need to find a way to tell people to try. Right.

00:30:08	MARKUS SCHIRP
It's almost like you need to go speak at a big Ruby conference.

00:30:11	DAVID HILL
Yes, that's one of the ideas, yeah.

00:30:14	MARKUS SCHIRP
This sounds really, really cool. I'm excited to learn more about this.

00:30:19	DAVID HILL
Happy to have. There's a whole community around it, there's a Discord channel. But this is all in its infancy, despite it being a very old technique. So I'm not the first person who has implemented it. I'm not the first person who thought about parallelism. I'm not the first person who thought about incremental test selection. There are a few things I thought about first, which do not fit the podcast format, which mutant does and nobody else does. But it's all about making it industry ready, and most of the scientific community makes it interesting-paper ready, not industry ready. So I think mutation testing is ripe for a breakthrough, especially because of agents, but it needs a nice catalyst, a person, an event, or something, to get more mainstream. I wish it happened soon, both for my own economics, but also because it's been my favorite research topic for nearly two decades.

00:31:11	MARKUS SCHIRP
Right. With the costs of running AI and LLM code generation seemingly on the verge of going up, we might not be generating as much code as we're generating now with LLMs in the near future.

00:31:26	DAVID HILL
We might not generate as much good code. So you can always run a very cheap model that creates some... Right.

00:31:31	MARKUS SCHIRP
Right. That's true. We might just regress to using local models that aren't as powerful as the commercial ones. And so the code is not going to be quite maybe as polished or as good. But the nice thing is that if you have a mutation testing engine in the loop,

00:31:44	DAVID HILL
the reprompt is not: hey, find all uncovered semantics and fix them. The reprompt is: hey, in function number 10, in function foo, there is this diff, and it doesn't cause any test failures. Is that important? If that's important, what would be the test to add, or can we remove everything? So the prompt generated from a mutation testing tool, if you have it in the loop, is smaller than: hey, self-review this entire 1000. Right.

00:32:15	MARKUS SCHIRP
You're giving it more specific things to do instead of just... Oh, I like that. I'm going to have to be playing with this now. I might be reaching out at some point if I run into any problems, but this sounds really fascinating.

00:32:30	DAVID HILL
Do yourself a favor and start with in-memory code. So don't do Rails as a start. Shared mutable state is a step up from defining a normal value class. Do the following experiment. Make a small value object. Specify it as well as you can, so that you have all of the little nuances specified. Then run mutant and ask yourself: did I miss something, or did I code too much? And then you can ask an agent to do the same and see how quickly it converges. It's an interesting experiment, and it's a typical mutation testing workshop, where I give people a really small example: do it by hand, write it by hand, and then ask your agent to do the same.
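As a starting point for the experiment described above, here is a minimal sketch of a small in-memory value object with a hand-written specification. The `Money` class and the list of checks are hypothetical and use plain Ruby instead of a test framework; a real run would express the same checks in whatever test integration the mutation testing tool is pointed at.

```ruby
# Hypothetical value object: an amount in integer cents plus a currency code.
class Money
  attr_reader :cents, :currency

  def initialize(cents, currency)
    @cents = Integer(cents)
    @currency = String(currency).upcase.freeze
    freeze
  end

  def +(other)
    raise ArgumentError, "currency mismatch" unless currency == other.currency

    Money.new(cents + other.cents, currency)
  end

  def ==(other)
    other.is_a?(Money) && cents == other.cents && currency == other.currency
  end
  alias eql? ==

  def hash
    [self.class, cents, currency].hash
  end
end

# Hand-written specification: one check per nuance the class promises.
five = Money.new(500, "eur")
checks = {
  "normalizes currency"      => five.currency == "EUR",
  "is frozen"                => five.frozen?,
  "adds same currency"       => (five + Money.new(250, "EUR")) == Money.new(750, "EUR"),
  "compares by value"        => five == Money.new(500, "EUR"),
  "differs by cents"         => five != Money.new(501, "EUR"),
  "differs by currency"      => five != Money.new(500, "USD"),
  "usable as hash key"       => { five => :ok }[Money.new(500, "EUR")] == :ok,
  "rejects mixed currencies" => begin
    five + Money.new(1, "USD")
    false
  rescue ArgumentError
    true
  end
}

checks.each { |name, ok| puts "#{ok ? 'ok' : 'FAIL'} #{name}" }
```

Each check corresponds to a line that could otherwise be changed or removed unnoticed, for example dropping `freeze` or the `upcase` call. Any line that no check pins down is a candidate for either a new test or deletion.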

00:33:12	MARKUS SCHIRP
This little small project you're describing, is this part of the talk that you're going to walk through, showing that process?

00:33:18	DAVID HILL
The talk is not long enough to fully take everyone through it. So when I applied, I said I'm very happy to also do a workshop, but only the talk got chosen. Because at certain points you cannot give everybody... Right.

00:33:29	MARKUS SCHIRP
everything. Okay, but

00:33:30	DAVID HILL
there's also the hallway track, and that's

00:33:32	MARKUS SCHIRP
exactly what I was going to say: I might try to pin you down during the... Very happy. ...like, show me what you were talking about. I

00:33:39	DAVID HILL
think we can easily do a little bit of a sub-conference, where we just find a screen and everybody who's interested joins. I love doing that. So if you haven't noticed so far, this is the perfect nerd snipe for me. I will never stop talking about it. And the more projection area I get for that, the happier my personal narcissist is.

00:33:58	MARKUS SCHIRP
I'm really excited for RubyConf now. And if for nothing else, I want to learn more about this and see you walk through how this is used.

00:34:05	DAVID HILL
There are good mutation testing engines for other languages also. I would be lying if I said I liked any of them, but they are good enough: as long as it's faster than human review, it's typically a net win to use them. So even if you have an engine that fully recompiles your entire code base for a single mutation, which is not a good idea, that's already better than a human figuring it out. Let's say you implement a shopping cart. Okay, so first we sum up all of the line items times their quantities, like times the variants' quantities, then we search the discounts, then we create the credit card charge, whatever. But we just take out the let's-calculate-the-discount thing and rerun the tests. Do we have tests for the discount logic? Yes. But do we have tests that the call site exists when we are checking out? So stuff like that is extremely valuable. And let's say I couldn't do a fully mutation-tested test suite myself without the help of these tools. Right. There's simply no exhaustiveness checking a human meat brain can do. It's literally like if you have a language with pattern matching and you say: oh, I'm matching on that constructor, and because I'm a perfect elite developer, I will be able to make sure that I match all of the constructors. No, you are not.
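The shopping cart point above, that the discount logic can be tested while its call site in checkout is not, can be sketched like this. The `LineItem` struct, the 10% discount rule, and all method names are hypothetical, and the mutation is written by hand for illustration.

```ruby
# Hypothetical line item: unit price in cents and a quantity.
LineItem = Struct.new(:price_cents, :quantity)

# Hypothetical discount rule: 10% off subtotals of 10_000 cents or more.
def discount_for(subtotal_cents)
  subtotal_cents >= 10_000 ? subtotal_cents / 10 : 0
end

# Original checkout: sum the line items, then subtract the discount.
def checkout(items)
  subtotal = items.sum { |item| item.price_cents * item.quantity }
  subtotal - discount_for(subtotal)
end

# Mutation: the discount call is removed from the call site.
def checkout_without_discount(items)
  items.sum { |item| item.price_cents * item.quantity }
end

# Tests for the discount logic itself; they never go through checkout.
def discount_logic_tests
  discount_for(10_000) == 1_000 && discount_for(9_999).zero?
end

# Checkout test with a cart too small to earn a discount.
def small_cart_test(implementation)
  implementation.call([LineItem.new(500, 2)]) == 1_000
end

# Checkout test that pins the call site: a cart large enough to be discounted.
def large_cart_test(implementation)
  implementation.call([LineItem.new(5_000, 2)]) == 9_000
end

puts discount_logic_tests                                # true
puts small_cart_test(method(:checkout_without_discount)) # true: mutation survives
puts large_cart_test(method(:checkout_without_discount)) # false: mutation killed
```

With only `discount_logic_tests` and `small_cart_test` in the suite, every line is covered and the discount rule is verified, yet removing the call from checkout goes unnoticed.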

00:35:28	MARKUS SCHIRP
you're not

00:35:28	DAVID HILL
That's the reason you have type checkers. And especially for a language without a built-in strong type system, mutation testing becomes a tiny bit more useful. It's still useful on type-checked languages, where you would also reject mutations on the type check: if your mutation already runs into a type error, it doesn't even get to the point where it has to be executed against the tests. On sophisticated type-checked languages, in my own experiments, like in Haskell, 80% of the mutations are rejected on types already, and the rest is then run against the tests. And type checking is typically much faster than tests. Wow.

00:36:03	MARKUS SCHIRP
Nice. Well, I'm really excited to see this talk and learn more about mutation testing and the mutant gem. I think this will be a really interesting topic at the conference.

00:36:14	DAVID HILL
Interesting. So thank you for your feedback. I'm also a little bit hyped. It's the first time I actually spent the time to write a proposal and to actually carve out something in my calendar to go to the US. So this is also an interesting experiment for me. I've spoken at other conferences, but I've never spoken at a big one. So we'll see how. Well,

00:36:32	MARKUS SCHIRP
you're coming into a good big one, I think. So I think this will be a fun time. Exactly. If people wanted to kind of get to know more about you online, is there any kind of social media presence or blog or website that you?

00:36:46	DAVID HILL
There's an old Twitter account, on which I regularly do nerd topics. And there is a personal homepage with my first three blog posts. I think in the last two months I started to write a little bit of stuff. I'm normally not a good, let's say, one-to-n communicator. That's fair.

00:37:02	MARKUS SCHIRP
I think a lot of us fall into that categorization.

00:37:06	DAVID HILL
Yes, there's also the GitHub account. There's lots of Rust stuff recently.

00:37:10	MARKUS SCHIRP
Okay. Thanks for coming on the show. This has been the Ruby on Rails podcast. It was a pleasure talking with Markus. Check out mutation testing in your language. For Ruby, check out the mutant gem. It's free to use on any open source project. Thank you for listening.