Daniel (00:00) The thing has booped and we are live. Dave, hi! So nice to see you. Dave (00:05) Kia ora, hey Daniel, how you doing? Daniel (00:08) Morning. I am good. I'm extremely tired because it was my birthday weekend and I didn't want to celebrate really. I wanted to have something where I sit at home and maybe watch a movie but I actually did meet up with some people or some people came over and I am sleep deprived but happy. I got really cool gifts and yeah, it was a fun weekend. And I'm really happy that I had the Sunday because we're recording on a Sunday today, which is out of sync for us, but kind of like the schedules has kind of worked out that way. I'm really happy that I did. And I'm also really happy that I had this Sunday to kind of recuperate because I didn't do a lot of things today. I was just like, I just like existed. had enrichment time in my enclosure. Dave (00:37) Lovely. We are in date. Ha I see a bit of self care after a hard weekend of socializing for you there. I'm good. I'm good. Well, my weekend, we can talk to this in a bit actually. My weekend has been debugging iOS 16, Swift UI and my app for nearly two days straight. Just weed whacking through stuff. can't, it's a whole thing. We'll talk about it in a bit. Daniel (01:03) Yeah. How are you doing? Okay, so if we want to talk about we got to start the show. So, hey, welcome to Waiting for Review, a show about the majestic indie developer lifestyle. Join our scintillating hosts to hear about a tiny slice of their thrilling lives. I'm Daniel, Duolingo streak owner, and I'm here with Dave, World Heavyweight Champion in indoor coding. Join us while waiting for review. Hey, Dave. Dave (01:32) We do. You Hello! Heavyweight champion! Am I all of 5'5 heights? yeah. Mind you, didn't say about width. But, hey! Daniel (02:12) Yeah. Also, also joined by Mimi apparently. Hi, Mimi. She's happy. Yeah. Dave (02:18) cat invasion, cat invasion. I caught that. Yeah, so where to begin? Well. 
Daniel (02:27) Like you're already half in your debugging story, so give it to me. Dave (02:30) Yeah, I am. So let's carry on. Let's carry on there. What happened? Dude, I've not been testing on my oldest devices and I let a version slip that was buggy AF on the older devices mired on iOS 16. It's the long, the high level version. So this is: Dave needs to do better QA. I own that. That's fine. I made a lot of assumptions. Daniel (02:56) Hang on, last time we recorded, you told me about an issue where things didn't go wrong on the newest devices. Okay, so that's a separate issue. Dave (03:02) Yeah. Separate issue. Separate issue. Yeah, unfortunately so. So two, well, two major lines of regression. That one was down to something in terms of the timing in the update cycle to do with the video pipeline I've got in the Metal view. Then, yeah, so that was fixed. I thought all was well. And then I had a bug report from a long-time user of the app saying, Dave, I love the app, but. OK, here we go. Here we go. And he's got an iPad. I've got the same device here as well, which is why I was kicking myself, because I can literally test on it. Just in my hubris of being ready to release the app, yeah, I cut a corner. Daniel (03:42) Those are hard to read. Did you? Did you girl boss too close to the sun? Dave (04:04) Something like that, something like that. Took all my hair out. But no, so not good. Bug report: stutters in certain circumstances, and bear in mind the whole point of the app is to serve video at 60 frames a second, 30 frames a second, whatever your output's doing, as good as it can go. So frame dropouts are not good. And then he also reported a couple of crashing bugs when you line up so many effects. And I'm like, yeah, okay, that could be to do with the device getting overloaded. But if it wasn't previously getting overloaded, what's going on now? And I did a bit of a deep dive into it, pulled out the older devices. Okay, I can replicate the issue he's got.
Certainly if I load up the effects and things. And then, yeah, why is it so bad here compared to iOS 18 on any of my other devices? Yeah, sure, they're newer devices, but this is really pronounced. Something odd is going on here. And I ran it through the SwiftUI profiler, which is a great tool because it shows you your view reloads. It shows you things like hang times. It shows you Core Animation commits. So you get a really good view on these different lines of analysis as to what's going on where. Daniel (05:49) That's really helpful. Dave (05:52) On iOS 16, I'm noticing that every time I do certain activities, I'm creating a shitload of view reloads. And then using the SwiftUI private API that you can use, where you do let _ = Self._printChanges(), you can get SwiftUI to spam the log when the view reloads, and that verified what was going on, because it'll actually tell you there: well, what triggered this view reload? It was my environment objects that are observable objects, and it was views reloading all the way down the stack, regardless of which property changes on that object. So if you change one property, even if the views aren't observing it, it would cause a reload of the view just for having the object there. Daniel (06:46) Just on iOS 16 though. Dave (06:51) Yeah, so they made Observable a lot smarter from 17 and 18 onwards. And yeah, this has crept in the more I've leant into environment objects and passing stuff down the stack and leaving behind my old dependency injection. And a lot of that really completed with this last update. So here we are. And that's what was causing it. So yeah, scene set here. There's a few things that I know I can do to sort of get around it. Number one, stop using environment objects with observable objects that have got lots of properties on them, so that this is Daniel (07:35) To rewrite everything in assembly. Just like, no, performance will be better.
Dave (07:39) Yeah, just, you know, exactly. Got the app out. So that's been my weekend. Yeah, it's been, I would say, the opposite of fun, to be honest with you. Daniel (07:53) Have you considered just saying like, iOS 16 is out? Dave (07:57) I have, but I just, I feel really bad about doing it. So I checked my analytics, shameless plug, I logged into TelemetryDeck, ding. And I can see that out of my active users over the last 30 days, last 90 days or so, I've got a cohort that is using the app who are on iOS 16 of anywhere from eight and a half to 9% or so. Daniel (08:27) Alright, that is a bit significant still. It's not like zero point something. Dave (08:29) Yeah. So if I cut that, nah, it really needs to get down to that sort of fraction of a percentage for me to feel fully comfortable. Like, anything 1% or lower, I would probably have considered it. But in this circumstance, I'm looking and I'm going, that's a whole lot of people who, A, currently aren't having a good experience when they use the app, you know, and B, if I don't support them, they're going to move on. If they're subscribed, they will leave me. And this just doesn't feel very fair. Like, I feel like at least I should get it to a stage where it's working. Even if in six to 12 months' time I'm like, yeah, you know what, that's the last version you're going to get on iOS 16, I'm moving on. I feel like I need to get it to a clean state. There's a whole load of other reasons in this as well, and so, hmm. I don't know if I've ever explained the values of my business, Daniel. Daniel (09:33) I actually would like to hear them. Please tell me the values of your business. Dave (09:36) So one of the core values of Lightbeam Apps is to make video effects and live visuals accessible to everybody. And so underpinning that, if you sort of pull the thread on that, you've got some of: why do I want to look at running on Android?
Because I can't say that I am working for everybody if I'm only supporting 50% of the mobile space, for example. And then when it comes to older devices, a similar thing kicks in, right? Daniel (10:11) Yeah. Dave (10:17) Just because you're on an older device, why should I pull the rug? And then the other side of it is that I'm aware that some people are using older devices because they're secondary. They've actually upgraded, and then they're dedicating these devices to a final life of being part of their video rig for performance. Well, I don't want to be part of the nudge to those devices ending up in landfill, for as long as I can avoid it. Daniel (10:49) Fair. Dave (10:50) Yeah. So get it to a good working version. Then at least if I do have to hoist up to 17 or 18, they can still use it on the old version. But it has meant a rewrite of the app this weekend. Daniel (11:00) Mm-hmm. Like a whole rewrite? Like, could you just, I don't know, have two environment objects instead of one? Dave (11:13) There are several environment objects. There are several environment objects. The trouble is, it ends up having to be quite atomic, right? Either I break it down to like an object per property, in which case why have objects at all? Or, yeah, there are many other ways of cracking this, right? This is dependency injection interacting with reactive programming. So long story short, Daniel (11:22) Mm-hmm. Dave (11:43) I have a dependency vendor, which is an object with a singleton, because I've just gone YOLO. I need to get this out. And then for many areas of the app, I have a local view model that is observing changes on the global objects and is then updating its local properties only if it needs to. And then the views observe that view model. I've gone backwards. Daniel (12:17) If you have thoughts on singletons, by the way, please write them into the YouTube comments so that the algorithm gives us a boost. Dave (12:21) Mm-hmm. Yep, or email me.
I've got a rule that says anything with Singleton goes to dev null. So yeah, I'd love to say it's been fun. It has not. I've been sat with the oldest device going bit by bit through the app and then going, yeah, you rely on that. Right. I'm going to rip that out, create a local view model, decant all the properties and rewire it all. Daniel (12:32) You Dave (12:57) couple of other bugs spotted along the way as well within this scenario. Yeah, it's been the opposite of fun. Daniel (13:06) Yeah, harsh, harsh, but I get it. if that is one of their, like that is a very good value to have for your business. so, yeah. Dave (13:16) Mm-hmm. It's part of it. Like, I could accept moving on if they had a good working version, but they don't. So. Daniel (13:22) Mm-hmm. Yeah. Yeah, that too. Like, you want to leave... Like, if you want to leave off a generation of devices, you want to leave off with a good, stable version that is just like, here's one less update that is just like fixes the last few bugs, no new features, and then you're snow leoparded. Dave (13:41) Pretty much. Yeah. So, yeah, that's that's been my weekend and you're not nowhere near as fun as yours and nowhere near as much drinking. Perhaps a similar amount of sleep deprivation. Daniel (13:43) Sn-snortle. Hahaha Yeah, they have like, so one of the gifts that I got, I got really cool gifts, by the way, but like one of the gifts that I got was a thing that's called Tatra tea. I think it's from Slovenia. It is black tea that is fermented to have alcohol and not a little bit, like it's like 57 % or something like that. It smells like black tea. It tastes like black tea that has a kick to it. And it's dangerously delicious. Dave (14:29) You've got the tea, Daniel. Daniel (14:31) I got the tea. So yeah, do have, I don't know, like technological things to tell you though, because I want to tell you about the spike. But hang on, there's another item in the show notes first. 
And that is actually a question that I had for you that I really wanted to ask you. Like, why have you been sending me iMessages again? Dave (14:46) Mm-hmm. Go on. Yep. Mmm, because I'm gradually crawling back to being more Apple by default again Daniel (15:10) Like suddenly I get an iMessage from Dave and I'm like, wait, Dave hasn't he? Hasn't he moved to Signal? Dave (15:15) So, Signal is great and I am primarily using Signal. I'm also an iMessage. However, I'm aware that perhaps you're not or you've been using Signal for other things and when I'm on Android I'm on a different Signal account anyway because it's SIM based. So, you know, I can put you in a group message with my Android and iPhone Daniel (15:35) Huh. Dave (15:45) Accounts yeah Daniel (15:45) I didn't know, because Signal does have usernames right now, right? And you can have a desktop app and whatever. So it just assumed that you'd be able to have the same account for both your Android device. that's a bummer. John Signal, do something. Someone should call John Signal. Dave (15:52) Yeah. No, as far as I know, not. Yeah. Yeah. So. They absolutely should. They should raise the signal signal. No, but so yeah, I'm more, I'm more Apple-y than I was maybe a month or two ago again. And some of that is because I bought the new iPhone. So I've got an iPhone. Yes. Yeah, absolutely. In fact, I've got to say, Daniel, I'm really enjoying the iPhone Pro 16. It's definitely felt like a bit of a gratuitous purchase in some ways. The upgrade was sort of on the cards and I was debating like, do I go to... Daniel (16:22) Right, you got a new phone, you gotta play with that thing, right? You gotta use it. Dave (16:43) you know, like a pixel pro or an iPhone pro, where do I go? What, what, what I do, what do do? And in the end, had to admit that for all of my misgivings, Apple is still very much like my core digital experience and having a very good iPhone is still very enjoyable to me. 
So yeah, we, I can't remember how much we talked about this before, but yeah, we did. So the phone also was because of the bug hunting. And yeah. Daniel (17:16) Are you, I think that's just like me maybe, are you just like a tiny bit more cynical about Apple as a company? Not about the people who work there, who are still amazing and really cool people whom I very much admire, but yeah, the cynicism has just crept up a little bit. Dave (17:24) Yeah. Yeah, absolutely. Yeah, I think I described this to somebody the other day as Apple are in their Microsoft era and Daniel (17:42) That's a very good description. Dave (17:46) Yeah. And what I mean by that is, I mean Microsoft in the early 2000s, sort of post-antitrust, around about the antitrust time actually. And I sort of feel like, well, you know, we've talked about this a lot this year, but like all of their behavior with the EU and that side of things definitely sort of feels reminiscent of that. But it doesn't detract from the fact I still really love the OS. I love the phones. I love my Mac. Yeah, that's a strange one though, because I would have said I loved the company a while ago, and I remember you sort of ripping me for that, like, you shouldn't love any company because the company can't love you back. Daniel (18:27) It's also still my preferred platform for computing. What do you think about the photo button though? Still the most controversial feature. Dave (18:34) Mm-hmm. I like it. I don't get much out of the fact you can move your finger up and down the button. That doesn't, it's just a button for me. Daniel (18:44) Yeah. Yeah, I completely forget about the fact that it has the touch capability. Dave (18:51) Yeah, until I accidentally skim it while taking a photograph or something and then it zooms and I'm like, all right. But yeah, no, I like the button. I like the easy access to the camera.
Not sure I particularly care for what they've done to the silence button. Because that's now like a. Daniel (19:13) like I was a bit annoyed about it at first, but the thing is I never use that thing. Like guys, when I get a new phone, I set it to silent once and then that setting just stays. And so I have an action button now and it feels kind of good because I put the flashlight on there. And so I have like, because I use my flashlight a few times a day, just like when I'm looking for something in a dark corner and I just like turn on the flashlight. Dave (19:23) and it stays there, yeah. Daniel (19:40) And so that's a bit cool. The only thing I'm annoyed by is the fact that my display now always shows like a crossed out bell that says like, yeah, silent mode is on. And I'm just like, yeah, that is just like the way my phone is. Like, can we remove that somehow? Like, put on the icon when silent mode is off or something. Yeah. Dave (19:52) Yep. Yep. Yeah, I don't need to be told about it. It's like, no. Yeah. Daniel (20:10) Other than that, it's very cool. Dave (20:13) I'm enjoying the speed of it and the camera is better than what I had. I had a 14 Pro before though so you know that was no slouch in its own right. Actually shout out to my eldest son who is now loving having that as an upgrade. His YouTube channel benefits. Yeah but I don't know I love it. Daniel (20:20) Mm-hmm. I figure. Dave (20:43) It's dramatically different to what I had before. It's different enough that I'm still noticing some of the speed here and there and noticing like, the camera and other bits and bobs. And I'm like, yeah, that's lovely. But yeah, it's decision made. I'm not off to Android land permanently anytime soon. But I am probably going to be moving between both devices still as I test stuff and play with things. And it's still genuinely fun. to just be able to go, I've got an Android device. I can install that app and have a look at that. yeah. 
Daniel (21:19) I had a thought the other day about installing stuff on Android and also about side loading stuff on iPhone. As you may know, TikTok is going to get banned in the US at some point in January or something. I'm still unsure about what will actually go down. But if they actually decide to ban that app, I assume they can't really, I assume they won't do it using just like Dave (21:23) Mm-hmm. Next four or five months here. There's something happening over there, I don't know what's going on. Daniel (21:46) IP address block bans, because that seems like too harsh a measure. I assume they're just going to remove it from all the stores. And that means suddenly all these alt stores are getting way more interesting to people in the United States. But also, they're getting a bit less icky for Apple, because Apple can then give its users a cool and important feature back that they lost. So yeah. Dave (21:53) Probably. Yep. Yes. Mm-hmm. Daniel (22:15) I have no conclusion to that thing. I was just going to be watching and be like, huh, will this change the landscape regarding side loading and stuff like that? And also, on Android, will people routinely install alternative stores there instead of the Google App Store or something and just get used to the idea more? Dave (22:24) Maybe. Yeah, potentially. There could be a shift in the market because of that. I can see what you're describing as well. There's two separate pressures there on Apple, or Google for that matter. You've got the pressure of adhering to the political environment they're operating in, which they have all bent to in one way or another if you look at their behavior in China and other places outside of the US. And that's combined with consumer demand as well. People are not going to want to just stop using services that they are using every day just because their government has turned around and said, that needs to go.
Like it's actually, that demand doesn't go nowhere. And a whole bunch of people go, well, okay, I'm going to go and use Instagram or use whatever as a replacement. But there is a Daniel (23:30) Yeah. Dave (23:33) There is a demand there where people are going to be upset and annoyed. And if Apple sort of thread the needle, for example, and give customers that side loading access, and that's one of the ways through, there's a whole cohort that will be sufficiently motivated to use it. And then those that are not will happily go on in their regular App Store lives and not be fussed. So, I could see it happening. I could see it going that way, potentially. Daniel (24:04) Yeah, I think it's still unlikely, but I think the balance might subtly shift. So let's see if that does anything. I did install 18.2 the other day, and I don't have any Apple Intelligence features still because I'm in the European Union, but I did get a screen that asked me to choose my default browser. I chose Opera. Dave (24:07) Mm-hmm. Yeah. OK. Yep. Hey. Yep. Daniel (24:29) No, I chose Safari, but Opera was at the top of the list for me. I think they randomized the list though, to make it really fair. Dave (24:34) Yeah, I'm playing with that a little bit. I'm using Vivaldi at the moment as my default, then DuckDuckGo every now and again, because they've got a browser. Yeah, yeah, it's all right. Daniel (24:47) They do, I have that on my phone as well. But it's very obviously a fork of Firefox mobile, because I too used to maintain a fork of Firefox mobile for a previous job. I just very much recognize like, yeah, this is Firefox, this is Firefox, this is a custom element. I put a different custom element there. Dave (24:55) I guess. Yep, yep, yep, yep, yep. And I still go back to Safari for things sometimes, because there are certain things that just don't work very well on the other browsers, because I use Proton for my password management. That's been quite interesting, actually.
So yeah, between operating systems and different devices, having certain core things that are Daniel (25:22) Mm-hmm. Dave (25:35) everywhere has been very, very helpful. And that's been proton for me. Email and password management. Yeah. Yeah. Daniel (25:41) The email software provider. I see. Yeah. Yeah, that makes sense. Dave (25:50) But which we shall see. really hope it doesn't get as litigious and weird as that in terms of Apple and Google being forced to pull things like TikTok from the store. But yeah, I could see it happening as well, just seeing the way things are going to go. So maybe maybe it's time for Fediverse related video content. Daniel (26:13) We live in interesting times. Dave (26:18) We do unprecedented times Daniel. But I would love to hear more about the spike. Daniel (26:29) So yeah, I put into our show notes the words, The Spike, capital T, capital S. And what then happened is that Dave kind of put underneath a photo or a picture of Spike from Buffy the Vampire Slayer. And that picture is somehow so large that my computer is still like lagging. I don't know, like how many pixels, like it is, is struggling. it's the visual size is just like right for the document it's in. Dave (26:48) to shrink the window. Yeah, yeah. Daniel (26:57) But just when I scroll, every time that thing comes into view, it's just like, I'm losing frames. Dave (27:02) How big is it? It's not even that big! Daniel (27:05) And he's he's like, he's like piercing with his piercing look. I'm getting all like distracted. Dave (27:12) Yes. Daniel (27:15) his cheeks and everything. Yeah, very much so. Dave (27:15) Yeah, he's smoldering there actually. But if you're watching on the YouTube there is you've probably seen him already because I will probably end up using James Master's spike for the thumbnail. Daniel (27:27) Ha ha ha. But yeah, I want to tell you about my, was it Tuesday, I think. 
I want to tell you about the Tuesday that I had, Monday is usually the day where I have, I try to put all my meetings, right? So Monday is usually a day where I spend mostly meetings and then Tuesday is the day where I'm like, okay, I now have time to work on the things like technical stuff. So I was like, okay, I'm to sit down. I'm going to open my development environment, open my notebooks, and sketch out the plan, the things that I need to do for our next big feature release, which is going to be the change in the pricing structure, which has so many working parts already. And now I was like, OK, this week I'm going to try and put some of these puzzle pieces together. So I sit down and I pull in all the different pull requests that I already have, try to combine them. And how that works is I'm developing the frontend locally, and it points to the existing API. And at some point, I'm like, this is answering more slowly and slowly. Like, this is annoying. Did I turn on the throttling or something? And it turns out I did not turn on the throttling, but it stopped responding at all for a second. And then I was like, huh, what's that? And in that moment, Dave (28:36) right. Yep. Daniel (28:57) My phone makes the distinct sound that it only makes when our uptime monitor says things are not up. Dave (29:08) no, what sound is that? Is it a special alert you've got or? Daniel (29:09) And so it's... Yeah, I can't really replicate it, I think. Yeah, no, don't think I can replicate it. Dave (29:18) Right. Listeners of the show, what will if you can send that to me, I will insert the sound at this point. And if I haven't done that just after here. then you will know that I failed to do that. But if you've just heard it, then great stuff. We've given you the sound. Daniel (29:35) Fantastic. I can also just like shut down the API for a second. No, I won't do that. So anyway, and the different alerts are going off. 
So the main alert is going off for the ingest API, which is the thing that I really do not want to go down because then the data is not arriving properly, but also the regular query API that powers the front end and also various sub things. Dave (29:39) Please don't. Daniel (30:03) And so I'm like, what is this? Push everything to the side, look into our Grafana dashboards. And usually when I look at those dashboards, I look at all the different charts, like how many queries are coming in, how are the different APIs and stuff working. But I see an anomaly immediately in the upper right corner, where it is just the traffic chart, which usually just hums along. And so usually around this time of day, we get around 6,000 requests per second. And that kind of goes up to 9,000 during the early afternoon. And this morning, suddenly, I see like 30,000 requests. And I'm like, what is going on? And spoiler alert, that thing slowly increased and increased over the course of, I want to say, two-ish hours. Dave (30:51) Yo, hi. Daniel (31:01) Until it was actually like 60, 70,000 requests per second. And I really don't know. I really don't know. I was very confused. The thing is, two problems were now immediately occurring. One is that our query API has, just for historical reasons, basically, this health check endpoint. And what that health check endpoint does is it actually tries to query the metadata, like the Postgres database. Dave (31:08) see that there yet. Ouch. Daniel (31:37) To see like, OK, are you there? And it also tries to send a signal, basically. And so because the ingest API that accepts that signal went partially down or degraded, that health check endpoint for the query API was also degraded. And that caused the load balancer to start restarting the containers that ran the query API. So not only did our backend go down, the frontend also was kind of down, just because the back end was down.
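A minimal Python sketch of the coupling Daniel describes here, where the query API's health check also depends on the ingest API. All names (`HealthResult`, `coupled_health_check`, `decoupled_health_check`) are hypothetical and the probes are stand-in callables; this is not TelemetryDeck's actual code, and the decoupled variant is only one possible alternative.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HealthResult:
    healthy: bool
    reason: str


def coupled_health_check(metadata_db_ok: Callable[[], bool],
                         ingest_ok: Callable[[], bool]) -> HealthResult:
    """Health check as described: probe the metadata database AND send a test signal."""
    if not metadata_db_ok():
        return HealthResult(False, "metadata database unreachable")
    if not ingest_ok():
        # A degraded ingest API marks the query API unhealthy too, so a load
        # balancer acting on this result restarts otherwise fine containers.
        return HealthResult(False, "ingest API degraded")
    return HealthResult(True, "ok")


def decoupled_health_check(metadata_db_ok: Callable[[], bool]) -> HealthResult:
    """Hypothetical alternative: only check what the query API itself needs."""
    if not metadata_db_ok():
        return HealthResult(False, "metadata database unreachable")
    return HealthResult(True, "ok")


def db_up() -> bool:
    return True       # simulated: Postgres answers


def ingest_degraded() -> bool:
    return False      # simulated: the test signal is rejected


print(coupled_health_check(db_up, ingest_degraded))
print(decoupled_health_check(db_up))
```

In the simulated incident the coupled check reports the query API as unhealthy even though its own database is reachable, which is the chain of events that led to the container restarts.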
So one of the things that I really put on my to-do list is to decouple that thing, because we have our own monitoring for the ingest API. We don't need that at all. The second thing is we do have a very powerful message queue. And that message queue is like, we use Amazon Kinesis, but you can also use Apache Kafka. Dave (32:14) Mm-hmm. Daniel (32:35) That message queue is super powerful because it has this concept of shards. And shards are like lanes on your road. Like if you have a multi-lane road, in theory, at least, it can support more traffic. In practice, that metaphor kind of breaks down, but for this message queue, it is actually true. One lane can support, I want to say, like a thousand-ish messages per second. For everything above that, it will actually throw an exception when you try to put stuff into it, and you're supposed to either wait or choose a different lane. Thing is, when I set this up, I made a mistake. Whenever you put a message into that message queue, you can add a parameter, just a string parameter, and it will hash that and use that hash to choose a lane for you, so that you can group things that belong together in the same lane, but also kind of distribute among different lanes. Turns out my big mistake was that at some point, when I didn't understand that concept yet, I just put a hard-coded string in there. So even when I was, in my panic, you know, scaling up that message queue server to ungodly amounts of power, it was still very much constrained, because everything was going through this one lane, and that will still complain if more data gets through. So in the meantime, I'm trying to batch the writes to the lane, because I still haven't understood that this is the problem. So I'm trying to batch the writes and that helps a little bit.
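The lane-selection behaviour described above can be sketched in a few lines of Python. This is a simplified illustration, not the Kinesis client: `shard_for` and `distribution` are hypothetical helpers, the shard count and key values are made up, and the modulo step is an assumption for brevity (Kinesis hashes the partition key with MD5 but assigns shards by hash key range rather than by modulo).

```python
import hashlib
import uuid
from collections import Counter
from typing import Iterable


def shard_for(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index by hashing it (simplified)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % shard_count


def distribution(keys: Iterable[str], shard_count: int) -> Counter:
    """Count how many messages land on each shard."""
    return Counter(shard_for(key, shard_count) for key in keys)


SHARDS = 8
MESSAGES = 10_000

# Hard-coded key: every message hashes to the same value, so one lane gets it all.
constant = distribution(["signals"] * MESSAGES, SHARDS)

# Varying keys (random UUIDs here, as an assumption): messages spread over the lanes.
varied = distribution((str(uuid.uuid4()) for _ in range(MESSAGES)), SHARDS)

print("constant key uses", len(constant), "shard(s)")
print("varying keys use", len(varied), "shard(s)")
```

With a constant key, adding shards changes nothing, because the hash never changes; the per-lane limit stays the bottleneck no matter how far the stream is scaled up.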
I'm also scaling up all kinds of different things. I'm scaling up the ingest API containers, because if there are more of them, they can kind of wait a little bit, they can afford to just keep a bit in their RAM and then write everything out, but that makes their answers slower, of course. And then it kind of hits me like, what are these shards? What are these shards? OK. And so yeah, I'm now using a UUID that I generate when the container boots. We have, I don't know, 10, 20 containers for that API, so it's like 20 lanes now. If I spin up more containers, it'll be more lanes. It is so easy once you think about it, but I just didn't know. So that's very frustrating. So we finally find that out. And I'm like, awesome. The error rate is dropping, dropping, dropping. There are just no errors anymore. And then shortly after that, the traffic also stops. Or decreases significantly, at least. Dave (35:28) Okay. Daniel (35:28) What is going on? Like, I want to try this out. Like, I'm super proud. Now we have found this bottleneck. So please. And of course, it makes sense that the traffic also goes down, because our SDKs are set up in a way that if the server throws an error, they will retry. They have different logic, but they would retry within like Dave (35:31) Yeah, yeah, yeah. Yes. Daniel (35:59) half a minute maybe, or a bit less. But some of the SDKs also have a bit of a back off in there. So traffic goes down and things are just suddenly quiet and I'm like, okay. Why is this supposed attack suddenly stopping, right? And I'm like, okay, I guess let's just try to clean up and everything. And then traffic is ramping down, not to its normal levels, but to half of the spike or something. And then another spike starts. I don't know. Okay. What is going on? This one has way fewer errors, of course, because we have kind of found out how that works.
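A hedged Python sketch of the kind of client-side retry with back off mentioned here. It is not the logic of any TelemetryDeck SDK: `backoff_delays` and `send_with_retry` are hypothetical names, and the 10-second base delay, doubling factor, cap, and optional jitter are all assumptions chosen for illustration.

```python
import random
import time
from itertools import islice
from typing import Callable, Iterator, Optional


def backoff_delays(base: float = 10.0, factor: float = 2.0, cap: float = 600.0,
                   jitter: float = 0.0,
                   rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield wait times base, base*factor, ... with the nominal delay capped at `cap`.

    `jitter` is a fraction: 0.25 moves each delay randomly by up to 25% either
    way, so that many clients failing together do not all retry together.
    """
    rng = rng or random.Random()
    delay = base
    while True:
        spread = delay * jitter
        yield delay + rng.uniform(-spread, spread)
        delay = min(cap, delay * factor)


def send_with_retry(send: Callable[[], bool], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep,
                    **backoff) -> Optional[int]:
    """Call `send` until it succeeds; return the attempt number, or None."""
    delays = backoff_delays(**backoff)
    for attempt in range(1, max_attempts + 1):
        if send():
            return attempt
        if attempt < max_attempts:
            sleep(next(delays))
    return None


# Nominal schedule without jitter.
print(list(islice(backoff_delays(), 4)))

# Simulated server that fails twice, then accepts; sleeps are recorded, not executed.
outcomes = iter([False, False, True])
waits = []
attempts = send_with_retry(lambda: next(outcomes), sleep=waits.append)
print(attempts, waits)
```

Passing `sleep=waits.append` keeps the demo instant; a real client would leave the default `time.sleep` in place or schedule the retry asynchronously.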
And so we can accept most of the data. We still have some scaling issues. And then that suddenly cuts off and a third, much smaller spike comes in, then a super tiny one comes at the end, I think, and then suddenly Dave (36:55) And then normality reigns. Daniel (36:58) everything is just over. Like everything is just back to its normal state. We are back to normal. Like, ever since then, it has not returned. Dave (37:04) Wow. Daniel (37:13) So what I did after having a sit down and a cup of coffee is I looked at the data. The first thing I did was I wrote down what we learned. I wrote down that I would really like to decouple the ingestion API and the query API more. I wrote down that I want all our SDKs to have exponential back off. Dave (37:29) Yes. Daniel (37:42) Because I'm reasonably sure that these follow-up spikes are retries that are very much not distributed, because they just go down exponentially like an echo. So I want all our SDKs to have: something's not working, I wait 10 seconds. It's still not working, I wait 20 seconds. It's still not working, 40 seconds, and so on. So that they kind of become a little more chill about the whole thing, you know? Dave (37:43) Mm-hmm. Yes. Yeah, you can also add a bit of randomness to that as well, which is probably a good idea. Because then if you have something that is literally triggered where a load of things happen almost simultaneously, then it will help smooth it out. Daniel (38:28) Right. And the third thing that I really want to do is I want to decouple the, like because the ingest API is where all the signals arrive, right? And before that stuffs anything into a message queue, it will do a whole lot of processing, data generation, hashing, stuff like that. Also a bit of sanity checking, basically. And only then, Dave (38:29) Yeah.
Daniel (38:55) it kind of transforms everything and then it has a fully formed TelemetryDeck signal and will stuff that into the message queue to be sent off for storage basically. And I want to change that so that this whole API should be split in half and there should be a very, very tiny part at the beginning that's just like a door opener. You know, it's just like, yeah, there's something that is reasonably JSON, put it on message queue number one and I'm done with it. And then the rest of the processing Dave (39:19) Yep. Daniel (39:25) can just pick things up from message queue number one, do all that processing and then push it on message queue number two, that is the way to storage. So that's something I'd really like to do because that makes a lot of sense and that will hopefully allow us to do that better. But yeah, I did look at the data and I still don't know what it is, like spoiler. Like this is not leading to the grand conclusion that I know where this is coming from, which is very frustrating to me. Dave (39:48) Ha Daniel (39:55) So the data mostly came from Android apps. So first of all, I'm reasonably sure that these were proper telemetry signals from actual devices that humans were using. And the reason why I'm saying this is, A, they were properly formed telemetry signals. They were attached to existing apps that are reasonably large. Dave (39:55) Yes. Daniel (40:24) And they had all the, like they had metadata that just checks out, you know, like system versions, stuff like that. Also, we have these user IDs, right? And they only live on the device mostly, and they need the salt to be salted and hashed on device. And those were congruent with existing user IDs. That's the other thing. Like during that spike, almost no new users, at least no more new users compared to other days, entered the system. These are all existing users.
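The split Daniel sketches earlier in this stretch (a tiny "door opener" that only checks for plausible JSON and enqueues it, then a separate processing stage between message queue number one and message queue number two) could look roughly like this. It is a toy, in-process sketch using Python's standard `queue` module in place of a real message broker; the field names `type` and `user` and the SHA-256 step are hypothetical stand-ins, not TelemetryDeck's actual signal format or hashing scheme.

```python
import hashlib
import json
from queue import Queue

raw_queue = Queue()      # stands in for "message queue number one"
storage_queue = Queue()  # stands in for "message queue number two"


def door_opener(body: str) -> bool:
    """Accept anything that parses as a JSON object and enqueue it untouched.

    No hashing or enrichment happens here, so this step stays cheap even
    when traffic spikes.
    """
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(parsed, dict):
        return False
    raw_queue.put(parsed)
    return True


def process_one() -> None:
    """Take one raw item, do the heavier work, and pass it on towards storage."""
    raw = raw_queue.get()
    signal = {
        "type": str(raw.get("type", "unknown")),
        # Hypothetical stand-in for the real hashing step.
        "user": hashlib.sha256(str(raw.get("user", "")).encode()).hexdigest(),
    }
    storage_queue.put(signal)
```

The point of the design is that the first stage can keep accepting data while the second stage falls behind, so a backlog builds up in the first queue instead of turning into errors for the sending apps.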
They suddenly sent more signals, but the number of users kind of stayed flat. And also, most of these signals that were arriving came from Android devices, which is odd because we have way more iOS devices than Android devices. I think it's like iOS, Android, and then Mac, and then web analytics. So yeah. Dave (41:20) period. Daniel (41:37) The data that was suddenly sent was proper data. It doesn't smell like an attack. Also, the backing off behavior, the retry behavior tends to point towards the actual SDK that sent this. It was not from one app. It was from different apps from different organizations. Dave (42:05) Okay. Daniel (42:06) So one person making a mistake and having an infinite loop in their app or something is kind of out of the question because those were different apps. But what they have in common is that they are all on Android. But they are also not, because my buddy was like, hey, maybe developer preview 16 is out. It was just out a few days ago. Maybe it's just something to do with that. But. Dave (42:11) Mm-hmm. Daniel (42:32) No, most of the signals here came from system 14, which is an established version of Android. Dave (42:40) A specific version of your SDK? Daniel (42:44) No. So the plan is now we have our data science intern who is very smart. And the plan is to have him just dig through the data a bit more and just give him the opportunity to say like, okay, let's just slice and dice this in a good way. Dave (42:46) Okay. Yeah, because I'd be looking and I'd be like, OK, I want to know the apps that it's come from. Does the percentage against each app match your usual signal base? No. Daniel (43:16) No, that's the thing, like three apps, three apps suddenly send at least five times more data than usual. But the thing is, I think it's even more, but like we did lose some data on top, so it's hard to say exactly how much more. That's also, that's actually a bit of a light at the end of the tunnel, because I hate losing data.
Like last time we lost data was like 2022 or so. Dave (43:20) Mm-hmm. Daniel (43:43) And that's an hour of data or so that is mostly lost. The thing is, most people will not miss any important data because a subset of the data still came through. And there's not new users that were lost. It's just some usage data, but not actual users. So yeah. Dave (44:01) Yeah, I've got questions. I've been thinking as you've been talking, like, yeah, were they in the background or foreground when this is going on? Is it signals happening because of something being activated in a background push, for example? So I don't know if that's something that could happen. I'm wondering if it's all come from the same geolocation, whether there's anything Daniel (44:05) Hahaha Mm-hmm. Dave (44:31) country oriented to do with it. And then the other thing I guess I'm wondering is if you've isolated it to three key apps with five times more data than usual, well then the next step is to go, what have they got in common? What are they doing that is similar? Daniel (44:47) I wrote the authors and they were all like, no clue mate. Dave (44:54) It would be interesting to know what, hey, thumbs up there, thank you to reactions on my camera. But it would be very interesting to know if they're using any other third party libraries together. Because that could be the other tell if there's some interplay between another third party library, for example, sending a lot of push notifications all at the same time or on a Daniel (45:10) Mm-hmm. Mm-hmm. Dave (45:24) particular schedule and if that then creates a signal back to you, that's the sort of thing I could see being out of your control, but could create something like this. Being as they're Android, you might be able to do a bit of sleuthing yourself there because you could download the APKs and compare the libraries in them. But it's. Yeah. Daniel (45:36) Yeah. I could, yeah, I could do that. Not a bad idea.
Like, it's hard because on the one hand, I wanna get 100% to the bottom of this. Like, I wanna see exactly what happened. On the other hand, I'm like, I have so much to do and I did the most important things, which is find a really important bottleneck and fix it. So the next time that happens, I'm kinda, like part of me is kinda hoping that it happens. Dave (45:58) Mm-hmm. Mm-hmm. Daniel (46:17) I am reasonably sure that things will just work. So that's really cool. So that's a load test for our system. And I think the next load test will be really good. So next time a customer comes with hundreds of thousands of requests per second, we're going to be ready. Dave (46:21) Yeah, that is really cool. And I think that's a good. Yes. Awesome. That's really awesome. And I think that's probably the better attitude to have rather than my "make a list of everything and try and get down to everything". You can't. Yeah. Daniel (46:48) No, I think my strategy is A, I posted a blog post about the thing, but also that it's kind of a mystery where it's coming from right now. And I'm also talking about this on this program. And so that is me kind of shouting into the void, anyone have any ideas? But of course also my esteemed data science colleague will have a look. And so I've split up the work. Dave (47:00) Mm-hmm. Daniel (47:17) I reached out to the authors of the apps that sent more data. They are also kind of surprised, but that's just an important step. So yeah, I'm doing something, but I'm not going all in on this because priorities, I think. Dave (47:34) Yeah, that's reasonable. I remember a while ago, it's a complete tangent here, Daniel, so please forgive me, but I remember a while ago listening to a Tim Ferriss podcast where he interviewed an ex-military. Daniel (47:49) Inventor of the Ferris wheel, by the way. Dave (47:52) Yes, yes, absolutely. And one of his descendants worked on the London Eye. No, he was interviewing a military commander.
I can't remember from which bit of the US military. I want to say like the Marines or something. It was really high up. He was fascinating to listen to because he talks about risk and how you respond and mitigate risks. Daniel (47:59) Hehehehehe Dave (48:22) And so the whole concept of risk management is like where, you know, you've got the threats to your operation, you know, in this case to your business, to your stack. And then your risk tolerance is worked out by your ability to mitigate those threats. And I think you're thinking about this in quite a practical way, because if you look at this, well, okay, there's a risk this could happen again, you know, tomorrow. You now know how things held up against it this time, which was still pretty damn well. You've got a whole bunch of things you can do to sort of help make the ability to respond to it better. But the reason I like this sort of point of view of risk and threat management is you can't eliminate every single threat, every single problem. The only thing in your control is your response to it. So this makes a lot of sense to me. Yeah, I think you're going about it the right way. I've been analyzing your blog post as we've been talking and I can see here. So all of your mitigations are sensible. You know. Daniel (49:34) Hahaha I think, yeah, that one thing that I've been not bad, but at least mediocre at is like communicating when something like this happens. And ever since we added the notification system to TelemetryDeck's dashboard, that helps, but also we have a status page, status.telemetrydeck.com, and of course social media and a blog. And I think this time I'm happy with the amount of information we gave out immediately. As soon as something started, I just put in there like, okay, something's going on, we're on it. Sorry for the decreased performance. But also I think you're speaking about priorities.
And I think kind of my priorities on the risk mitigation are pretty clear because like we've been doing this for a while now, right? So the first priority always has to be no data loss. And so that means like data that arrives at our ingest points, that needs to be somewhere. Like we need to do that. So what bums me out the most is that that didn't work 100% right. And then once the data is actually there somehow, I kind of want it to be queryable. Like it doesn't have to be real time. Like if we get behind a few hours because too much data is coming in, it's better than no data at all. So I want the query API to be at least somewhat responsive. Dave (51:10) Yep. Yeah. Even if it's working on like a slightly outdated or paused view, right? That would still be good for users. Okay. I'm not getting any new signals, but I can still play with everything that's in the box. You know. Daniel (51:12) And then we're getting into like... Right. Right. Exactly. Yeah. And so and then we're getting into like, okay, it would be nice to have, but those two things, I'd really like to have those. And so that's like my top priorities. Dave (51:38) I don't want to say, me too. And we're the same here. But if I think about my debugging weekend with iOS 16 and where I've ended up with sort of how to mitigate that, it's again been about decoupling things and about breaking down the links between things so that I have control over how they work. And in this case, my risk and threat was the Daniel (51:56) Mm. Dave (52:05) the fact that SwiftUI behaves very differently with observed objects between operating systems. Daniel (52:09) But that is so frustrating because like you kind of have to test by hand on all these different systems. Dave (52:13) Yes, a little bit.
I've got a few ideas what I might do about that sort of thing, not least of which I think I need to get better on my tracking of performance with my app in terms of how users are actually operating it. There's certain things I could track. Daniel (52:28) Mm-hmm. I mean, you could probably send telemetry signals with an averaged frame rate or something. Just send it as a float so we can average it on the server. Dave (52:39) Exactly that. Yes. So I think, yep, something like that. And probably something to indicate hangs or dropouts as well. If it's been longer than x amount since the last frame, then that could do it. Daniel (52:56) You could even use, have you heard of our Lord and Savior MetricKit? Dave (53:01) I have not, no. I need to give that a look. Daniel (53:03) So MetricKit is a thing that people have been asking me to support directly in TelemetryDeck for a while. And I haven't come around to it because it took a while to understand it, but now I'm kind of like, yeah, this is really cool. I want to do it. So basically what it is is, in your device code, in your app, you say, hey MetricKit, I want to subscribe to metric notifications. And once you have activated this, every like 24 hours, Dave (53:09) Mm-hmm. Yeah. Mm-hmm. Daniel (53:33) or every app launch, but no, every app launch, the previous one was like 24 hours. Anyway, around once a day, you get a notification, or your delegate method gets called with a MetricKit object, and that gives you a heap of information about performance. It gives you like, how many bytes has my application downloaded or pushed to the internet? How many frame drops did I have? How many crashes did I have? Dave (53:34) Mm-hmm. Right. Daniel (54:03) How many times was my application killed by the memory watchdog, stuff like that.
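The hand-rolled idea from a moment earlier (an averaged frame rate sent as a float, plus a count of hangs when it has been longer than some threshold since the last frame) can be sketched like this. It is a minimal illustration of the arithmetic only, written in Python rather than in the app's own code; the function name `frame_stats`, the payload keys, and the 0.1 second default threshold are assumptions for the example, not part of any SDK.

```python
def frame_stats(timestamps, hang_threshold=0.1):
    """Summarise a run of frame presentation times (ascending, in seconds).

    Returns the average frame rate as a float and the number of gaps between
    consecutive frames that were longer than hang_threshold seconds.
    """
    if len(timestamps) < 2:
        return {"averageFrameRate": 0.0, "hangCount": 0}
    gaps = [later - earlier
            for earlier, later in zip(timestamps, timestamps[1:])]
    elapsed = timestamps[-1] - timestamps[0]
    return {
        "averageFrameRate": (len(gaps) / elapsed) if elapsed > 0 else 0.0,
        "hangCount": sum(1 for gap in gaps if gap > hang_threshold),
    }
```

A summary like this could be computed over a window (say, the last minute) and attached to a single signal, rather than sending one signal per frame.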
What I've done on the TelemetryDeck side is I have a prototype and I actually have reserved all the names for these properties on the server, but I haven't come around to it really. So yeah, the next thing would be, it's a pretty deep dictionary, Dave (54:22) Right. Yeah. So you could integrate it. Daniel (54:30) flattening everything into one level and then pushing it out and having a nice user interface. That all hasn't happened yet. It might happen at some point. It might not. Other things are just above it on the priority list. Anyway, my recommendation to you is look at that and see if you can pull out three, four key performance metrics and just push them into TelemetryDeck right now. Let me know if you want to talk about Dave (54:40) I'd say that. Yeah. That's a good shout. I've been looking at... Daniel (54:58) formatting. You can even like send, or whatever, especially like numerical values and stuff like that. But yeah, and then you can have like a general dashboard that gives you an idea of, is the new version worse performance-wise than the previous one? Dave (55:08) Awesome. There's a few things I want to group together. Yeah, the stuff I've been thinking about with this is if I can record these events, I probably want to have a view of app state that I am consistently sending with my signals. So that boils down to, are they playing, what are they playing in each channel? Cause I have two channels of video that are getting mixed together. And so if I think about the sort of state I'd like to see, it's like, well, are they playing a video, Daniel (55:31) Mm-hmm. Mm-hmm. Dave (55:47) using a photo, are they using the camera or any other feed? Like what is in each channel? I don't need to know down to like literally the video or the image. I just need to know it's one of those types of objects. And probably accompanying that would be information on how many effects they've got loaded and which ones.
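The flattening step Daniel mentions earlier in this stretch (turning a deep dictionary into one level before sending it on) can be sketched as a small recursive helper. The dotted key names and the sample payload below are made up for illustration; they are not MetricKit's real payload keys or the property names reserved on the TelemetryDeck server.

```python
def flatten(nested, prefix="", separator="."):
    """Flatten a nested dictionary into a single level.

    Keys of inner dictionaries are joined to their parent key with the
    separator, so {"a": {"b": 1}} becomes {"a.b": 1}.
    """
    flat = {}
    for key, value in nested.items():
        name = f"{prefix}{separator}{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, name, separator))
        else:
            flat[name] = value
    return flat


# Hypothetical payload, only to show the shape of the result.
sample = {"cpu": {"time": 1.5}, "memory": {"peak": {"mb": 200}}, "hangs": 2}
flat_sample = flatten(sample)
```

Once everything is one level deep, picking out three or four numeric values to send along with a signal is a plain dictionary lookup.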
So there's four effects on each channel, knowing which effects are loaded, knowing which blend mode they're using between the two. Daniel (55:50) Mm-hmm. Dave (56:15) Those metrics are probably enough for me to be able to, combined with a view of OS, a view of device type, get down to how the oldest device is holding up under load. For example, if I'm recording that with a frame rate signal that's sent once a minute or something like that, or something that kicks in if there's a significant frame drop, that could start to give me some good insights. Daniel (56:29) Mm-hmm. Hmm. Dave (56:46) So yeah. With MetricKit, Daniel, I've just noticed it's supported by Sentry. And I'm looking at Sentry for crash reporting as being a, yeah. Daniel (56:47) Yeah. I am. Totally. Dave (57:04) But I could definitely combine both Sentry and TelemetryDeck to give me proper insights. I need to spend some time and I need to do this well. I've got, like I said, I'm going to get to the fix within my app probably within the next week. I want to push a version almost as soon as I'm comfortable it is working very, very well, Daniel (57:16) Mm. Dave (57:30) so that users can have it in their hands. And then this next step of remedial activity to kind of like bolt the door. Because the issue here is not only that I had the regression, it's that it took a customer telling me for me to find it. That is, in my opinion, not acceptable. I would rather that was not my situation. So I need to take some actions to make sure that doesn't happen in that way again. Again, you can't prevent everything. There's always going to be some edge case. But I would rather I had a heads up. You know, I've got data that is telling me, dude, you've got an issue with this release. So, yeah, I've got some work to do in that regard, I think. But yeah. And anyways, yeah, decouple, debug. Daniel (57:59) Yeah.
Dave (58:25) Log the things, I think is probably the key message we're Daniel (58:31) Yeah, you should have sort of a third thing that starts with a D or something to have like alliteration going on. Dave (58:36) That's almost, it's almost a very topical alliteration that I want to avoid on this show. Yeah, yeah. Keep moving. Keep moving. Daniel (58:43) Alright, I get it! Alright, I think we're done for today anyway, but I wanna say goodbye today with a quote from one of the funniest things that happened on the internet last week, just in my eyes. So I'm just gonna say, anywho, I'm gonna go to bed and I'll see you guys tomorrow. Bye. Dave (58:53) Mm-hmm. Go on. Good. Daniel (59:17) I'm gonna put a link into what happened with the Hawk Tuah coin in the show notes. Dave (59:23) OK. OK. Yeah, that's quite topical. It's going to be ancient news by the time this episode goes out, because this is the episode that will go out around Christmas time. But absolutely link that up. Daniel (59:33) Right. Speaking of, speaking of, I am gonna be traveling a little bit next week and then it's Christmas and then it's New Year's. I'm gonna have my microphone with me, but I can't promise anything. So there might be a slight break in the shows until we are both in front of our computers again. Dave (59:44) Mm-hmm. Well. Absolutely. And I think it's probably fair to say we've been kind of zigzagging a bit about the regularity of releases of the show over the last couple of months. I think, setting expectations for anybody listening here, after this episode it will probably be 2025 before the next episode goes out. Because this will go out just before or around Christmas. Then we will record or not in the intervening time. And in either case, even if we record, Daniel (1:00:17) Yes. Dave (1:00:27) that probably still won't go out until after New Year's Day. So this is the penultimate episode of the, no, the final episode of the year, I think, by the time this goes out. And that's fine.
But I do think we probably want to get back to some sort of more regularly scheduled routine come January. Yeah. Daniel (1:00:46) Yeah, and I think we will. We will. Like December has been a bit chaotic, but I have high hopes to get back into more of a routine soon. All right. Have a fantastic Christmas or end of year, depending on whether you celebrate any of it. Have a fantastic time. Launch some rockets, but only if you're safe doing it. Dave (1:00:55) Yeah, beautiful. Mm-hmm. Yes. Yep. Daniel (1:01:13) And we see you on the other side. Dave, it's been a fantastic year with you. And I'm looking forward to the next one. Dave (1:01:21) Likewise, Daniel. Me too man. Well, you've outro'd us, I do believe. And where can people find you online? You've not outro'd properly. Come on, come on. Daniel (1:01:31) I haven't outro'd us properly. Yeah, exactly. So yeah. So thanks so much for listening. Please rate us on iTunes and on YouTube. Send us emails at contact at waitingforreview.com and join our Discord. The link is in the show notes. All right, people can find me at daniel at social.telemetrydeck.com on Mastodon or breakthesystem on Bluesky. And where can people find you, Dave? Dave (1:01:57) Mm-hmm. People can find me mostly on Instagram for my apps. My user account there is lightbeamapps.com. That's dot com spelled D-O-T com. And you can, let's just link Fedi again, actually. You can find me over on Mastodon. I am at Dave at social.lightbeamapps.com. Daniel (1:02:25) Fantastic. Awesome, then have a fantastic day. Yeah, I'll feed the cats now. They're getting a tiny bit antsy. No, they're lovely. Bye. See ya. Dave (1:02:33) You better, they might cut you. All right. Take care, Daniel. Bye bye. Boop.