Fly.io with Dov Alperin
===

[00:00:00] Paul: Hi there and welcome to PodRocket, a podcast brought to you by LogRocket. LogRocket helps software teams improve user experience with session replay, error tracking, and product analytics. Try it for free at logrocket.com. I'm your host Paul, and joining us is Dov Alperin. Dov is a platform engineer over at Fly.io, and we're actually doing this episode on a listener request.

[00:00:23] Paul: We're gonna dig into Fly.io and learn what it's all about, along with some other interesting tidbits, such as Phoenix and Elixir apps. Dov, you've had your time around the IT space and the programming world, so it's a pleasure to have you on. I'm excited to get into it.

[00:00:40] Dov: Thank you. Thank you for having me. Yeah. I'm always excited to talk about what we're building. I think it's some cool stuff.

[00:00:46] Paul: How long have you been at Fly to date?

[00:00:48] Dov: A little over a year in some capacity or another. I worked for Fly on contract before I became a full-time employee, and then I've been a full-time employee [00:01:00] for eight months or so.

[00:01:02] Paul: Gotcha. So were you a contractor, self-employed? Were you working with a larger agency?

[00:01:08] Dov: Yeah, no, I was self-employed. I wrote a Terraform provider for Fly as a community project when I was just a user, cuz I wanted to do infrastructure as code. At the time they had just introduced the, like, v0 of the Machines API, which is our underlying compute API, and they wanted the Terraform provider to support it.

They agreed to support the development of it, which was awesome. And then as time went by, I ended up doing more things, and then finally I landed myself the full-time gig there.

[00:01:44] Paul: So before you got into the full-time gig, you were just out in the wild, experimenting with making your life easier and making other people's lives easier. It sounds like you were really in that infrastructure world, that operations world. What was your role like beforehand?

[00:01:59] Dov: [00:02:00] Beforehand I'd only ever done freelance software development. Fly was really interesting to me cuz I was trying to find one place to move a bunch of apps that I had built for customers, cuz they were all over the place. And I didn't like the experience of how everything was hosted.

It just bothered me everywhere else, because it would be fine for exactly what it needed to do right at the beginning. But anytime I tried to step outside of that, I'd have to remember weird specifics about a given platform. And I don't like having to remember weird specifics about a given

[00:02:35] Paul: Right. That's stressful.

[00:02:37] Dov: my code.

[00:02:39] Paul: So when you're saying there are existing deployment solutions, we can all think of a few. There are the big cloud providers, we've got Amazon, we've got Azure and all that. And Fly is an underdog for a lot of teams, where it's not the first thing where people are gonna be like, let's go host on Fly.

But when people turn to it, a positive [00:03:00] response gets elicited. And you're reflecting that here in your user experience, where you're like, I want to use it so bad that I just wrote my own Terraform provider. So we know it's a hosting platform. What are some of the pain points

or disorganization that you maybe experienced on other platforms and providers, where you were like, I want to do this on Fly, I'm gonna write the Terraform provider cuz I really need it?

[00:03:20] Dov: I'm gonna step back for a second and give the elevator pitch of what it is that we do, which I think is an interesting model compared to the AWSes or GCPs or Azures of the world. So the premise of our platform, when I think about it, is two separate things.

So one, we don't wanna lock you in. If you bring us a Dockerfile, we will turn that into a virtual machine for you. But you can use your existing Dockerfiles, you can use all the existing stuff on Docker Hub, and you can use all the tooling that everyone already knows for Docker.

And then when you're ready to deploy, you can just hand us that Dockerfile [00:04:00] and we will do everything that needs to happen to turn that into a full virtual machine rather than a container. So that's one thing that I think is really interesting about what we do, because we make it really easy to take fully custom solutions,

because Docker has become a bit of a standard, and just move them over without having to deal with idiosyncrasies of our platform, hopefully, if we're doing our job right. And then the other half of the platform is answering the question of, like, why do CDNs exist? It's sort of a nebulous question, but

the big answer you get a lot of the time when you ask around is that you want some sort of assets cached near your users. So even for users who are far away from your primary region, or wherever your main hosting provider is hosted, us-east-1 say, you still want those users to have a fast and good experience.

The vision [00:05:00] for Fly, and this obviously predates me, is the idea of: what if you could just run your application in all of those places? What if you didn't really need a CDN, and you could just run your application everywhere that your users are, and not have to worry about setting that up, not making a hundred EC2 instances that are gonna cost you a million dollars and trying to figure out how to orchestrate all that?

And instead, what if you could just enter a command to pick some regions from around the world, hit enter, and your app is instantly replicated there? So

[00:05:41] Paul: The two facets you talked about are, like, replication and deployment at a global scale, and then this general idea of: I have a Docker container, it's my Docker container, it's no different, and I want a machine from it. These are kinda the two user stories. Gotcha.

[00:05:57] Dov: Yeah, those are what I'd say the [00:06:00] two big selling points are, if I had to break it down. But really the main thing, the one-sentence or two-sentence thing, is: we run apps close to your users, and we run your apps without you having to do special things to make them work.

[00:06:18] Paul: And that sounds cathartic to me as a developer myself, cuz I'm tired of having to boot up an EC2 instance and then do anything more than just deal with the Docker container. That was a large part of the appeal of Cloud Run, which brought a lot of customers over to GCP.

[00:06:35] Dov: Right. And that's definitely a big thing. But we felt, and I'm using the royal we, cuz these decisions predate my time at the company, that there are still a lot of benefits to running full virtual machines, right? You get built-in isolation, so you get a lot of security there.

And using Firecracker, which is the open source hypervisor born out of [00:07:00] Amazon, which made it for Lambda, we can get really fast start times. But we don't wanna force everyone to make a disk image and figure out how to upload that. And so we're not actually running containers, which is the cool thing.

We don't run containers. We just understand how to speak the OCI container spec and turn that into a real disk image that a virtual machine can boot.

[00:07:27] Paul: Oh, that's a neat distinction that I was totally unaware of until I heard you say it. So it's really like you have your own OCI interface and you're interacting with it in a custom way.

[00:07:39] Dov: Well, so the user points us to an image and we pull it. We also have a registry if users want to push there, or if they use our free builders, it'll automatically go to that registry. So we pull the image and we unpack the layers. Basically, in the container spec, an image is a bunch of layers that are just stacks of files.

And so we can take those and we [00:08:00] build a root file system with them, and then we provide a kernel and we provide an init, all the things that would usually be provided by the host OS. And when you put it all together, you actually end up with a fully functioning virtual machine.
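As a rough illustration of the layer-unpacking step described here, the following minimal Python sketch flattens a stack of layers into one root file system. It is not Fly.io's implementation: layers are modelled as plain dictionaries rather than tar archives, and the name `flatten_layers` is invented for this example. Only the whiteout markers (`.wh.` and `.wh..wh..opq`) come from the OCI image spec.

```python
# Simplified sketch: flatten OCI-style image layers into one root file system.
# Each layer is modelled as {absolute_path: file_contents}; real layers are tar
# archives, and this is an illustration, not Fly.io's actual code.

WHITEOUT_PREFIX = ".wh."        # OCI marker: delete this path from lower layers
OPAQUE_MARKER = ".wh..wh..opq"  # OCI marker: hide everything already in this directory


def flatten_layers(layers):
    """Apply layers bottom-to-top and return the merged {path: contents} view."""
    rootfs = {}
    for layer in layers:
        # Handle deletions first so files added by the same layer survive.
        for path in layer:
            directory, _, name = path.rpartition("/")
            if name == OPAQUE_MARKER:
                prefix = directory + "/"
                for existing in [p for p in rootfs if p.startswith(prefix)]:
                    del rootfs[existing]
            elif name.startswith(WHITEOUT_PREFIX):
                target = f"{directory}/{name[len(WHITEOUT_PREFIX):]}"
                doomed = [p for p in rootfs
                          if p == target or p.startswith(target + "/")]
                for existing in doomed:
                    del rootfs[existing]
        # Then add or overwrite the regular files from this layer.
        for path, contents in layer.items():
            name = path.rpartition("/")[2]
            if not name.startswith(WHITEOUT_PREFIX):
                rootfs[path] = contents
    return rootfs


layers = [
    {"/bin/sh": "shell", "/etc/motd": "hello", "/tmp/cache/a": "x"},  # base image
    {"/etc/motd": "updated", "/tmp/.wh.cache": ""},                  # edit + delete
    {"/app/server": "binary"},                                       # application
]
rootfs = flatten_layers(layers)
```

The merged dictionary stands in for the root file system that, per the description above, is then paired with a kernel and an init to boot as a virtual machine.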

[00:08:16] Paul: And that being said, you can still log on to Fly and, in a traditional capacity in some senses, say, oh, I want a 4xlarge machine. Like, you folks still support just booting up an arbitrary image of a certain size?

[00:08:34] Dov: We don't support arbitrary images. You can't upload a disk image. Right now we only accept a Dockerfile, but you can use that in a non-standard way to express what you want, one that isn't necessarily very Docker-esque. So if you're like, I just want a Debian user space, you could just pull the Debian image, and

we'll take it from there. And you can obviously specify how many [00:09:00] CPUs you want, how much memory you want, all that sort of good stuff, and whether you want any storage devices attached.

[00:09:09] Paul: Now, could you talk to me a little bit, Dov, about databases and data co-location? Because a few minutes ago we were talking about deploying, just hitting enter and having my app deployed here, deployed there. But there's still the model of the application, the universal data set that we're dealing with.

What is the stance that Fly takes on co-locating and syncing databases? Is that totally up to the developer? Are there APIs and features to help deal with that?

[00:09:39] Dov: So the team that I'm on right now, that's actually sort of our North Star: how do we take inherently stateful applications, in particular databases? This is a team born out of what used to be called the database team when I first started working on it. And our goal is to say, what primitives can we offer developers [00:10:00]

that make it really easy for them to take the DB that they want to use, distribute it across the world, and co-locate their data? What primitives can we offer developers to make that so simple that it's trivial to explain? And it turns out there are a lot of things that we can do to make that a better experience.

And a lot of them are stuff we came in with, like our private networking. Every machine in your organization automatically exists on one WireGuard network, and they can all talk to each other. And we have our own DNS servers that serve everything. So you can just talk to these machines from each other as if they were right next to each other.

And we handle routing them to wherever they are in the world. So that's obviously an interesting primitive, but then, as you get further into it, you look at managed database offerings and you say, what are those doing right? And how can we make it easier for users to build that for themselves?

And so a lot of that is alerting, or better [00:11:00] storage capabilities, or backups, or different UX, or explainers in docs. So what is our stance on databases? You should be able to run them on Fly.io. And we have what we call, it's not managed Postgres, but it's an automated Postgres on Fly, where

if you type fly pg create, we'll create a database cluster for you, and you can specify where you want that replicated and what size you want, and then we will create that app for you. But the cool thing is it's just another Fly app. We just pre-did all the configuration.

[00:11:36] Paul: Under the hood, there's nothing special

[00:11:38] Dov: There is nothing special about it. In fact, the image that we use for our Postgres, the one you get when you do fly pg create, is open source, so everyone can look at it. It's basically a Dockerfile and some Go code that we wrote to glue it all together that runs in the VM. Otherwise, it's just Postgres.

[00:11:59] Paul: And [00:12:00] like you're saying, you should be able to run a database on Fly. You should be able to do a substantiated workflow, whatever you need, on Fly. And to drive that point home, I have some colleagues, I don't know if you're familiar with Temporal, we've had them on the LogRocket podcast as well, but we're running a whole hosted Temporal cluster on Fly. The workers, the contr.

It's amazing, and it's a beautiful experience.

[00:12:25] Dov: Yeah, so I was working with a customer the other day who runs Temporal on Fly, and that was really cool to see. Cuz to your point, it's an excellent little micro version of why we think this is really cool, because it uses a lot of the awesome platform features.

It uses our internal private networking to connect them with ease, and you don't have to go stand up your own service discovery and figure that out, and you can scale it with a command, and all of that. And so it's definitely a cool use case. And there are definitely a few people doing that.

[00:12:57] Paul: And it's neat to hear that [00:13:00] you're working on this database team. I'd love to learn a little bit about the application layer, almost, that it feels like a lot of service providers and backends are getting into. Before we do that, I'll just remind our listeners who are still with us that this podcast is brought to you by LogRocket.

So if you're developing an application and you wanna spend more time writing your app and less time debugging in the Chrome debug tools, definitely check out LogRocket. It's gonna help you improve your user experience with session replay and AI that can help surface analytics and patterns that would be difficult to detect without it.

So go ahead, head over to logrocket.com today and you can try it for free. Dov, let's talk a little bit about the application layer. So you're mentioning you're on this database team and your North Star is: I don't want people to have to think about replicating their data, just everything will be synchronized.

You're not there yet. You're saying that's the North Star you guys are working

[00:13:55] Dov: Yeah, people will have to think about replicating their data, but it should be as simple as, [00:14:00] ultimately, like if we do our jobs perfectly, it should be as simple as setting up like a cluster of your database all on the same machine. It should feel exactly the same, so anyone who has a bit of database experience can go do that.

[00:14:14] Paul: Now, I was checking out your blog, which, if anybody wants to hear more from Dov, you can go to, and it's such a good name, by the way, dov.dev. It's got all his musings there. What a great name. You have a post about Remix. Being a fan of Remix myself, I was like, okay, let's talk a little bit about web frameworks and the type of applications you're developing. And that led me down the rabbit hole of learning about Phoenix and Elixir. There's a blog post that some colleagues of yours put out at Fly, I'm sure you've seen it, about how well Phoenix and Elixir lend themselves to building apps on Fly. And it reminded me of Remix, because it blends this HTTP layer away with the loaders and stuff.

So could we talk a little bit about Phoenix and Elixir and why they've become a popular palette of colors for [00:15:00] people to build with in the Fly ecosystem?

[00:15:02] Dov: Definitely. Even though we now have these goals of being able to run databases and stateful applications, the original point of the company was: run applications close to your users. And that lends itself really well to something like LiveView, right? Because if you're using WebSockets to pass messages back and forth to the client and get a response from the server, the closer your server is to your users, the faster that whole experience can be.

And so it's like the perfect example of why running things close to your users is valuable. Because if you have to establish a WebSocket connection from a user in Hong Kong to us-east-1, that's obviously going to be slow. That's not a good experience. You lose a lot of the magic of Phoenix LiveView, cuz then you have to start doing hacky JavaScript and you don't. [00:16:00]

Right, but the reason that's not an issue people really run into on Fly is because we just offer so many regions that you can run your app in the regions near wherever your users are. And so pretty much 99 times out of a hundred, when your user requests your app, they're gonna be fairly near one of our data center locations, and we'll be able to serve that to them extremely quickly.
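To put a rough number on the Hong Kong to US East example, here is a back-of-the-envelope Python sketch. The coordinates are approximate, and the figure of about 200,000 km/s for signals in optical fibre is a common rule of thumb rather than anything stated in the conversation. Real network routes are longer than the great-circle path, so the result is best read as a lower bound, not a measurement.

```python
import math

EARTH_RADIUS_KM = 6371.0
# Assumption: light in optical fibre travels at roughly two thirds of c.
FIBRE_SPEED_KM_PER_S = 200_000.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def min_round_trip_ms(distance_km):
    """Best-case round-trip time over a straight fibre path, in milliseconds."""
    return 2 * distance_km / FIBRE_SPEED_KM_PER_S * 1000


# Approximate coordinates, chosen for illustration only.
HONG_KONG = (22.32, 114.17)
NORTHERN_VIRGINIA = (39.04, -77.49)

distance = great_circle_km(*HONG_KONG, *NORTHERN_VIRGINIA)
rtt = min_round_trip_ms(distance)
print(f"distance: {distance:.0f} km, best-case round trip: {rtt:.0f} ms")
```

Under these assumptions the distance comes out at roughly 13,000 km, which means every WebSocket round trip costs well over 100 ms before the server does any work. That is the delay a nearby region removes.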

[00:16:29] Paul: So it's the closeness of your users to your app that really lets something like LiveView shine.

[00:16:35] Dov: absolutely.

[00:16:36] Paul: So what is LiveView? I brought it up because it reminds me of the Remix paradigm, where I have a loader and I don't have to think about HTTP and requests.

I'm not sure if that's the correct lens to step into the LiveView bubble with, but I'd love to learn about LiveView and what makes it so special.

[00:16:52] Dov: Yeah, so I am definitely not a LiveView expert, and so I'll plug the Fly.io blog, fly.io/blog, cause we do [00:17:00] have several LiveView experts, including Chris McCord, the author of the Phoenix framework and LiveView. But so LiveView operates on the idea that it's an Elixir application, which runs on the Erlang VM, which has very efficient green threading.

And so you can have an Erlang process per connection that is stateful for the life of that connection. So take what happens when a user presses a button. Normally you'd have to send an API request, associate that button press with a given user in the database, do all that, calculate some new things, and send it back.

Instead, with LiveView, you press a button and it communicates with that stateful process, which already has all this state in memory. Maybe the button is "delete to-do", and it already has the list of to-dos in the LiveView's memory. And so it can just operate on that and send a patch back, [00:18:00] and that'll get changed. And so it allows you to write a lot of complex logic right where your logic lives, on the server, but have that really seamlessly integrated with the client to make a really nice experience.
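The idea of one stateful process per connection can be sketched in a few lines of Python. This is only an analogy and not the LiveView API: the class `TodoSession`, the event names, and the patch format are all invented for illustration, and a real LiveView would be an Elixir process receiving events over a WebSocket.

```python
# Python analogy for a stateful per-connection process. This is NOT the
# LiveView API; the class, event names, and patch shape are invented.

class TodoSession:
    """Holds one connection's state in memory for the life of that connection."""

    def __init__(self, todos):
        # {todo_id: title}, loaded once when the connection is established.
        self.todos = dict(todos)

    def handle_event(self, event, payload):
        """Apply a client event to the in-memory state and return a small patch."""
        if event == "delete_todo":
            todo_id = payload["id"]
            if self.todos.pop(todo_id, None) is not None:
                return {"op": "remove", "id": todo_id}
            return {"op": "noop"}
        if event == "add_todo":
            todo_id = max(self.todos, default=0) + 1
            self.todos[todo_id] = payload["title"]
            return {"op": "insert", "id": todo_id, "title": payload["title"]}
        raise ValueError(f"unknown event: {event}")


session = TodoSession({1: "write docs", 2: "ship release"})
patch = session.handle_event("delete_todo", {"id": 2})
print(patch)  # only this small patch travels back to the client
```

The button press never triggers a fresh database lookup in this sketch, because the session object already holds the list. That is the property the per-connection process provides.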

[00:18:14] Paul: So the session's almost like the process, the thread for that connection.

[00:18:19] Dov: Yeah, because that's one of the very powerful things about the Erlang VM: you can have all these Erlang processes, which are not system processes. They exist purely within the Erlang virtual machine. But because they're so lightweight, you can do something unique: have a persistent

connection for a given user while they're using that app. And that lets you do some really cool things. In fact, if there are any people that have used fly.io/dashboard, the dashboard application is a Phoenix app running LiveView. And so all the stuff like the live tailing of logs that you can see there, or you'll notice if you create an app on the CLI, it'll be instantly updated [00:19:00] on your screen.

All of those are not-insignificant interactions and bits of UX that LiveView makes really simple and fun to work with. Even for me, and I'm not an Elixir person. I'm not someone who knows a whole lot about that, especially not when I started this job.

But it's definitely been like, a super nice experience as I've been learning it and jumping in and figuring it out. It's super ergonomic.

[00:19:26] Paul: It almost feels like this is another way for us to heavily step into the powers of server rendering for our web applications. Do you feel like this is a fundamentally different way for people to approach web development than if you went on YouTube and searched up the Next.js 13, or whatever's out these days? Is this a different way for people to learn about web development, and would it cause fragmentation if people are stepping into this world and they're like, I wanna try Elixir and classic SSR? [00:20:00]

[00:20:01] Dov: I don't know that it's fragmentation. I think if you look at something like Next or Remix, both of those, Remix from its inception, rather, and Next now in the last two versions, are really stepping into this idea that there are some things the client does really well, but there are some things the server can actually do better.

And so instead of trying to hack that together and do something that isn't particularly ergonomic and is prone to errors, what if we just ran the thing where it was best to run it? So fancy client-side interactions definitely should be built in JavaScript, but maybe data-intensive queries or writes or something should happen on the server.

And so a lot of the tooling in Remix, and now Next, is about that, right? How do I make it really easy and seamless to kick off an action on my server and just have that show back up in my front end? So I think it's [00:21:00] part of this whole idea of: run your code where it makes the most sense. Which is why I think Fly.io is so popular for a whole bunch of frameworks, Phoenix and LiveView specifically, because you get the power of running the code where it makes sense, on the server, for these types of interactions.

And then you also just don't have to worry about the latency that comes with trying to round trip around the world. So you sort of get the best of all worlds. And then if you want to build complex client-side interactions, which we do in our dashboard, for example, our billing dashboard has a bunch of fancy graphs, and that's done in JavaScript, cuz that obviously makes more sense to do on the client, whereas fetching data and all that makes the most sense to offload to the server.

And so you can still do it. But I think the difference is we're now arriving back at a point where it's easy to run your code, where it makes the most sense at the edge close to your users in some cases when it's running on the server.

[00:21:59] Paul: What I'm hearing [00:22:00] from you, Dov, is that with the whole Elixir paradigm we're talking about here, in LiveView, we're being more deliberate and thinking harder about: does the server or does the client run this better?

And through that lens, it's not fragmentation. We're just being more pedantic about these decisions.

[00:22:19] Dov: Yeah, we're just being conscious of the decision. I don't know that it's fragmentation, like you said. I think it's more like an intentional splitting of concerns according to what makes the most sense. And I think, to your point, LiveView, Remix, Next, Hotwire in Rails,

all of those are different answers to that question. And I think they all have different merits, and we could talk about that forever. But I think all of those are served by, again, running those apps close to your users. You can tell there's a particular

[00:22:51] Paul: Yeah, there's a theme.

[00:22:53] Dov: Yeah.

But we think it's a really powerful concept, because when you get to scale, every company [00:23:00] eventually tries to solve the problem of: how do I get this close to my users? In the most extreme example, the Googles and Facebooks of the world, you noticed that when they first started getting big, they started buying data centers and trying to figure out how to make these massive distributed systems, so that users in, I use Hong Kong frequently as a region example, aren't round-tripping to the one data center in the US.

And so it's clearly a super powerful primitive, but it's traditionally very hard. And that's why we think it's such a powerful thing to be able to make it really easy for anyone to run anywhere. And we don't even know all the things that people can do with that. And we find new things that people do with it every day, which is awesome.

But yeah.

[00:23:45] Paul: Yeah. What's something where people use it and they're like, this is great, I want this next level of functionality? I don't know if that's a piece of hardware. I don't know if that's a different way for you to cross [00:24:00] account your regions and stuff. I'd love to pick your brain about anything people have asked for that you guys are working on, or that you're excited to share with the community.

[00:24:08] Dov: Yeah, so I think we have several different pieces in play. The big thing that a lot of our resources are devoted to at the moment is the Apps V2 migration.

We're moving off Nomad, the orchestration software from HashiCorp, and moving to our own solution, flyd. So that's where a lot of resources are going, and that is helping us answer some of those questions: now that we have more specific control over everything that happens in our platform, what can we do for users to make things really interesting and really useful?

So a common one we get is, I wanna figure out how to address specific machines, but I don't necessarily know their ID. So something cool that I worked on [00:25:00] with a few colleagues of mine the other week is that you can update a machine's metadata from within the machine by hitting an API socket, a Unix socket in the root directory,

which will magically update the metadata. And because we own the whole stack, we can make that safe, because even if someone pulled the token from memory, that token is locked to only come from one specific machine. Because we own the network stack, we know where everything's coming from.

And so that's the type of thing where it's: how can we creatively solve a user problem, right? I wanna be able to address specific machines or groups of machines in a really generic way that lets people build all sorts of stuff. Already we're using metadata to control routing for databases, to route to the primary.

That's an example of something that we're starting to build now that we're gaining this, like, real control of the entire stack [00:26:00] from top to bottom.
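Addressing machines by metadata rather than by ID can be sketched as a simple filter. The snippet below is a Python illustration under stated assumptions: the dictionary layout, the `role` key, the machine IDs, and the helper `select_machines` are all hypothetical and do not reflect Fly.io's actual schema or API.

```python
# Sketch: pick machines by metadata instead of by ID. The dict layout, the
# "role" key, and the IDs below are hypothetical, not Fly.io's real schema.

def select_machines(machines, **wanted):
    """Return the machines whose metadata contains every wanted key/value pair."""
    return [
        machine for machine in machines
        if all(machine.get("metadata", {}).get(key) == value
               for key, value in wanted.items())
    ]


machines = [
    {"id": "m-1", "region": "iad", "metadata": {"role": "primary"}},
    {"id": "m-2", "region": "hkg", "metadata": {"role": "replica"}},
    {"id": "m-3", "region": "lhr", "metadata": {"role": "replica"}},
]

primary = select_machines(machines, role="primary")   # where writes should go
replicas = select_machines(machines, role="replica")  # candidates for nearby reads
```

A router built this way never needs to know which machine ID currently holds the primary. It only asks for whichever machine is labelled that way at the moment.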

[00:26:06] Paul: So we're talking about, yeah, developer tools. And this is something that we could theoretically use within our runtimes, is what you're saying: look at machine metadata.

[00:26:15] Dov: Yeah, and it's just HTTP against a Unix socket. So again, we try to keep it as generic as possible. And when a user asks for something, we try to be very conscious about understanding what the problem is they're trying to solve.

Because once we can answer that, like, they want to be able to direct certain machines. Okay, so they need metadata, but what's that trying to solve? You know? And it turns out, for example, in databases, which we use as an example internally a lot, we wanted to be able to update the metadata at runtime to denote who the new primary is. And so that's an example. And once that sort of clicked, we were like, why can't the machine just update its own metadata? And so [00:27:00] then we were able to solve the underlying problem and also deliver something that the user wanted, even if it didn't look exactly like what the user was asking for.
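For readers who have not seen HTTP spoken over a Unix socket, here is a minimal Python sketch using only the standard library. The socket path is supplied by the caller, and the `/metadata/<key>` URL and JSON body are hypothetical placeholders. The conversation only says that the endpoint is HTTP against a Unix socket in the root directory, so none of these request details should be read as Fly.io's real API.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """An http.client connection that talks to a Unix domain socket."""

    def __init__(self, socket_path, timeout=5.0):
        # "localhost" is only used to fill in the Host header.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def set_metadata(socket_path, key, value):
    """POST one metadata key over the socket.

    The URL layout and JSON body are hypothetical, chosen for illustration.
    Returns (status_code, response_text).
    """
    conn = UnixHTTPConnection(socket_path)
    try:
        body = json.dumps({"value": value})
        conn.request("POST", f"/metadata/{key}", body=body,
                     headers={"Content-Type": "application/json"})
        response = conn.getresponse()
        return response.status, response.read().decode()
    finally:
        conn.close()
```

A database process that has just been promoted could call something like `set_metadata(path_to_socket, "role", "primary")` to label itself, with the path and key taken from whatever the platform documents.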

[00:27:11] Paul: And anything else on the near horizon, new hardware that you guys might be coming out with, new regions that people have been asking for.

[00:27:20] Dov: We are constantly coming out with regions. Our infra ops team is best in class. so we are spinning up new regions constantly, which is super cool. as we've been growing, we definitely grew faster than we anticipated in this last, six months, which is an awesome problem to have. And so we've been adding regions to meet, user demand and expanding our existing regions.

A cool thing that I can tease is our GPU offering, which is coming really soon. You know, GPUs close to your users. Same idea, on our platform. And we actually put a GPU VM, the first one, in the hands of a user yesterday. And so that'll be coming soon. [00:28:00]

[00:28:00] Paul: Wow. So now you could do, like, video rendering close to your users, game rendering, that

[00:28:05] Dov: Yeah.

[00:28:05] Paul: mining. Do you guys have restrictions on mining? Can we mine on fly instances?

[00:28:10] Dov: there is an ongoing conversation about what does abuse look like when you're dealing with GPUs because we've, you know,

[00:28:15] Paul: a good question.

[00:28:16] Dov: Yeah, it's a constant game of cat and mouse. We'd settled into a rhythm when the most they could be doing is abusing our network and eating up CPU cycles.

But that's obviously something we're figuring out before we open this to the public. It's a

[00:28:35] Paul: an open problem.

[00:28:36] Dov: the answer yet. Yeah.

[00:28:37] Paul: Yeah, for sure. Dov, it was awesome having you on and learning about Fly. I know you said you've had unprecedented, unexpected growth in the past six months. I'm sure there's more to come as people get a metallic taste in their mouth from the existing services and solutions, and they see something new

out in the market. Like I mentioned in the middle of the podcast, if you want to learn [00:29:00] more about Dov and his musings and blog posts, there's dov.dev, which is his website. And do you post online anywhere else, like Medium or Twitter?

[00:29:10] Dov: Yeah, I post on Twitter at dov underscore, but with the word underscore spelled out, because I couldn't get just dov underscore. It was taken.

[00:29:21] Paul: And if people want to keep up with Fly, they can go to the Fly homepage and fly.io/blog, you mentioned, right?

[00:29:29] Dov: And also the Fly homepage has links. We have blogs specifically for the big platforms as well. So we have a blog with details about running Phoenix applications, details about running Rails applications. We just hired some people who are now working on Django, so that's a whole new section.

So if you're interested in any of those, or that's what you work on, we're rapidly working to make the experience for those specific platforms excellent.

[00:29:55] Paul: Love to hear it, Dov. It was a pleasure. Thanks for coming on.

[00:29:59] Dov: Thank [00:30:00] you.