Kate: Hello, and welcome to PodRocket. I'm Kate, the producer of PodRocket. With me hosting today is Noel. Hello, Noel. How's it going?
Noel: Good. Good. How are you, Kate?
Kate: Good. Thanks for joining us. And our guest today is Kshitij Gupta with... Or excuse me, Kshitij Gupta, CEO and Co-founder at 100ms. Hello, how are you doing?
Kshitij Gupta: Hey, folks. Hey, Kate. Hey, Noel.
Kate: Thanks for joining us today. Yeah, so if you can just get started telling us a little bit about what 100ms is, and the work that you've been doing on it.
Kshitij Gupta: Absolutely. So a little bit about myself also, because 100ms is something which we started as a co-founder. I say it as a co-founder. So I mean, before starting 100ms, I used to work for Disney+ Hotstar as a video engineer. My job was to keep video working for Disney+ Hotstar. And before that, I used to work for Facebook, and my job there was to kind of build infrastructure for Facebook Live. So I've been working with video infra for quite a long time. And when we started 100ms, the sole objective for 100ms was to kind of provide the same infrastructure which big companies have internally, because startups or other developers who are not video developers kind of struggle to get that kind of infrastructure. So our sole aim is to kind of democratize access to those kind of APIs so that every developer in the world should be able to make video applications pretty fast.
Noel: Nice, nice. Very cool. So, I mean, maybe just to help listeners understand a little bit better, what is 100ms offering to a developer? Who would the typical customer be?
Kshitij Gupta: So anybody who wants to embed video, and we have started with video conferencing as a piece, but in general, we want to kind of encourage anything in video. So take any developer who wants to build video conferencing inside their own application, like we are doing right now in the studio, right? Similarly, if any...
So let's say, take, for example, an ed tech company who is doing online education. So they want a video interface for their student and teacher, or take, for example, a telemedicine app or a dating app, right? All of these applications need an embedded video experience inside their app. Right now, what they're doing... What a lot of them are doing is they are giving out links inside their application to go to a Zoom or a Google Meet, but that kind of... The moment you leave the app, you have lost your user, right? And these companies want to embed it inside their application; that is what we are offering as a service.
Noel: Nice. Nice. Very cool. Yeah. Yeah. My next question was going to be like... I feel like the typical pattern is people send out Zoom links. Are there other cool features that 100ms offers that I guess empower developers to make that video experience more rich, more like featureful?
Kshitij Gupta: So Noel, maybe I can come to features, I think, but the typical journey of a developer is that... Like, Google Meet and Zoom have worked really, really hard to make video work, right? So all of the... The problem with video is like video is a magical thing, which works, and a lot of us don't know how it works basically, right? And the problem is that the moment you start developing over it, the first roadblock you hit is the video quality problem. Like you were saying right now, that video will freeze in-between, it's okay, right? So what happens is the features usually are kind of a second-level thing.
Kshitij Gupta: The first-level thing is make my video work with high reliability. I think that is the first step, and if you kind of try to do it yourself, it's easily six months figuring out what kind of settings should I be using for the bitrate for the video, right? It's... Okay. What is an analogy I can give? Something similar to a payment gateway, Noel, right?
Payment gateways sound trivial, but when you say, okay, forex and split payments and whatnot, right? All of these things make it so complicated. Everybody wants payments, but very few companies in the world build payments. It is similar to that in terms of the features.
Kshitij Gupta: Having said that, the other features which we typically see are recording. So one of the things you are also doing is recording. And I think Kate is doing a local recording, but a lot of customers say, "I want a cloud recording." So let's say, somehow Kate's machine gets disconnected, the recording should still be preserved, right? The other thing which people request is, let's say, three of us are talking right now, right? Can this be broadcast live to a YouTube or a Facebook, right? Those are other features, right? And beyond that, it is a lot about interactivity. So for a student in a classroom, a teacher will say, "Can we have a whiteboard?" A doctor with a patient might say, "I want to embed my IoT device, which is measuring the temperature, can you do that somehow?" Right? So it's more about that: for every vertical, you need to have that integration for that vertical.
Noel: Yeah. Yeah. I think that's kind of what I was getting at more with my question, those integrations that are vertical-specific, like the way that kind of close connection with other features of the app is often lost when you have to click out to a Zoom link or go to a Hangouts call or something like that. Yeah, so it's super cool. I was looking at 100ms kind of... The APIs and how you can use them before I jumped on, and it did seem pretty intuitive. It kind of, back to your point, made me feel how I felt the first time I was like going through Stripe's APIs. And it was like, I understand the elegance here, this is an exercise in edge case handling, and I don't want to have to figure out all of these myself. Yeah, so that makes a lot of sense to me.
Noel: You touched upon something interesting there, which was how much work it would be to do this in-house for these teams that are trying to do it. I understand the appeal of that. For something like payments processing, it is hard to build all of that logic in-house, right? But for something like video, it seems slightly different than payments processing because I feel like video and broadcasting and sharing is more of a technical problem, whereas payments processing is too, but it's also a legal and more of a business problem that's hard to overcome. So the value prop is evident there. But we see these other technologies like WebRTC, and then things built on WebRTC like PeerJS. There's a couple others in that space that are trying to make that easier for devs to kind of spin up themselves. Do you think that there are shortcomings of that kind of new wave of peer-to-peer video that are still going to pose challenges for devs trying to stand that up in-house?
Kshitij Gupta: It actually depends from use case to use case, Noel, right? So if only two people are talking to each other, PeerJS is great, right? It works pretty well. Although... Right? So, I mean, although there are still some kind of problems if you want to record and all that, right? Typically, what we have seen is that things break when you want to scale up. So imagine a 10-person conference, right? In a 10-person conference, what is happening is you are maintaining a connection with nine other people. Everybody is maintaining a connection with nine other people, right? So that means you are uploading your video nine times over those peer connections, right? And that does not scale well, right? It basically consumes a lot of network bandwidth. It consumes a lot of CPU, right? So typically, what we have seen is when you have to optimize these things, that means you're going to push things to the server side, right? And the moment you go to the server side, that is when it starts becoming more and more complex.
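The scaling point above can be put into rough numbers. The following is a back-of-the-envelope sketch, assuming a full peer-to-peer mesh in which every participant sends one copy of their video to every other participant, compared with a server-routed call where each participant uploads a single copy. The function names and the 1.5 Mbps bitrate are illustrative assumptions, not figures from 100ms or PeerJS.

```python
def mesh_uploads_per_peer(participants: int) -> int:
    """Copies of its own video each peer uploads in a full mesh (one per other peer)."""
    return max(participants - 1, 0)


def mesh_total_connections(participants: int) -> int:
    """Distinct peer-to-peer connections in a full mesh: n * (n - 1) / 2."""
    return participants * (participants - 1) // 2


def upload_mbps(participants: int, bitrate_mbps: float, via_server: bool) -> float:
    """Upload bandwidth one participant needs for their own video.

    via_server=True models a server that fans the stream out to everyone,
    so the participant uploads one copy regardless of room size.
    """
    if participants < 2:
        return 0.0
    copies = 1 if via_server else mesh_uploads_per_peer(participants)
    return copies * bitrate_mbps


if __name__ == "__main__":
    for size in (2, 10, 50):
        print(
            size,
            mesh_total_connections(size),
            upload_mbps(size, 1.5, via_server=False),
            upload_mbps(size, 1.5, via_server=True),
        )
```

For the 10-person call in the example, each peer uploads nine copies in the mesh case, while the server-routed case stays at one copy per peer, which is the trade that moves the complexity to the server side.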
Noel: Mm-hmm (affirmative). Yeah. Yeah, totally. I guess that probably leads to another good question of is 100ms better suited... I guess, in my head, I kind of categorize video technology into two camps. It's like small groups where everyone's talking to everyone, like one to 10 people, and then more of a broadcast setting, like for conference talks and stuff like that. Is 100ms better suited for one of those or does it handle both pretty well?
Kshitij Gupta: Yeah. So thank you for asking that question, Noel, right? So let me just rephrase that question, right? So you're saying, what use cases is 100ms best used for? Is it more for broadcasting or is it more for conferencing? And so Noel, there is a slightly nuanced answer here, right? So what we have figured out is the size of the room depends on your use case. So I'll give you some examples, and I'll take only one example and keep extending it. Let's take that example. The simplest use case is a teacher talking to a student, one on one. Then the next example is a teacher talking to 10 students, right? The moment it goes beyond 10 or even, let's say, 50 students, what happens is then it kind of becomes a one-way or webinar kind of style of conferencing, right?
Kshitij Gupta: Even beyond that... So, let's say, these ed tech companies, sometimes they try to get a star teacher to build their top-of-the-funnel product, right? Where they're trying to do a free session with a star teacher and want to broadcast it to 100,000 people, right? So typically, what we have seen is a lot of companies want to start with this small use case, but it ends up becoming bigger and bigger and bigger, right? WebRTC starts breaking after a certain limit. So if you go beyond a thousand... So that's why Zoom does not allow beyond a thousand people.
Noel: Right, right.
Kshitij Gupta: So what we have done is we have actually merged the streaming and conferencing together.
And we are now allowing developers to basically kind of move from streaming to being on stage seamlessly. Via APIs again. There is a latency difference. So when you are streaming, there is a latency of 10 seconds. When you are talking to each other in real time, the latency is less than one second.
Noel: Gotcha. Gotcha. So is that... I mean, I'm not sure how deep into the tech we can get here, but is there a handoff that happens on your guys' side technically when you go from kind of these smaller, peer-to-peer-esque groups to something more broadcast-oriented? Does the server become like the hub when you hit some threshold or does that happen right away?
Kshitij Gupta: There are two... So we call it circles, Noel, right? Let me give you that concept of circles first. The first circle is what we are doing right now, talking to each other bidirectionally.
Noel: Yeah.
Kshitij Gupta: Okay? And basically, this breaks somewhere at the limit of a hundred; beyond a hundred it becomes super complex to deal with. Then there is another circle, which we call circle two. Imagine we are talking here, and let's say, a thousand more people or 10,000 more people want to listen to us, which is the Clubhouse model if you think about it. What happens in Clubhouse? There is a stage, right, which is circle one, and then there is a circle two, which is the listeners still listening in real time, where they can raise a hand and can become a speaker at any given point, so that's circle two.
Kshitij Gupta: And then there is a circle three. Circle three means that this setup... Or think of it as an auditorium, right? So an auditorium has a stage. The auditorium has certain people who are sitting in the auditorium, and then there is a circle three, which is, imagine, people watching on their TVs, which are higher latency, right?
So this is the concept we have kind of now put into the APIs, where the server is taking care of all the multiplexing and sending streams to these people, but we have left the... What do you say, the business logic to the app. And apps are using it in really interesting ways. So we were talking to one of the event companies and they said, "You know what? We have a concept of a front row ticket and a regular ticket," if you're watching a show. So the front row ticket is circle two and a regular ticket is circle three.
Noel: Gotcha, gotcha. Nice. So are these circles something that the developer that's developing against 100ms has to think about, or is this just kind of out of the box?
Kshitij Gupta: It is completely out of the box.
Noel: Gotcha.
Kshitij Gupta: Right?
Noel: Gotcha.
Kshitij Gupta: The way you define it is we have something called a template, where you go define these roles as templates, so you can say something called a stage, a viewer, and let's say, an overflow room. So stage is circle one, viewer is circle two, and an overflow room is circle three. And it's actually, now, in your terminology. So when you join a room, you can say, "I want to join as a stage person," or "I want to join as a viewer person."
Noel: Got it. I guess, maybe to help paint a clearer picture for me and for our listeners then, so is there a UI layer which 100ms is providing or am I getting like raw objects that I can then kind of put in my page, however I want? I can like rename things and give it labels.
Kshitij Gupta: There is no UI layer which 100ms provides.
Noel: Okay, gotcha.
Kshitij Gupta: So UI is given as example code, but everything is abstracted in APIs. And for web, we have done a React store as well, with hooks and all that stuff.
Noel: Nice, nice, very cool. So even if I did end up using the terminology for these circles that you gave me out of the box, I could still like put whatever label on them I wanted-
Kshitij Gupta: Absolutely.
Noel: ...
in the UI, and it wouldn't hurt anything. Got it.
Kshitij Gupta: Yes.
Noel: Got it, very cool. Cool. So yeah, I mean, that sounds like an intuitive kind of way to lay out the design of video conferences that scale from small groups to giant broadcast thousand-person events. Did your background and history kind of lead you to this architecture, or did you have that in mind when you started, or did it come about over time?
Kshitij Gupta: So the background helped because I knew the streaming world. At Facebook, we were doing only circle three.
Noel: Gotcha.
Kshitij Gupta: Right? At 100ms when we started, we were only doing circle one, right? Then when we spoke to these customers, right? The customers are like, "Can we do this? Can we do that?" And as of now, I have to use two different SDKs to do this. Let's say, if I have to talk to each other, I will have to use PeerJS or something else, right? But if I have to build a streaming out, then I have to use something else, right? And handling these two SDKs is making it more complicated. It's not reliable that way, right? That's where we heard our developers and we said, "Okay, let's combine it together."
Noel: Right. Nice. Yeah, yeah. That's awesome. Let me think how to phrase this. Were there specific problems that you guys encountered when designing the SDK and the APIs to make them conducive to these multiple circles? Is it hard to design around that problem where you might have some clients who are at a slightly longer latency, and they're kind of acting more like a broadcast audience versus someone who is in the conversation, a broadcaster in circle one?
Kshitij Gupta: Can you maybe substantiate with an example or something? I-
Noel: Yeah, I'm just trying to think, if I'm a developer and I'm writing code to handle these cases, I could foresee instances where I might care if a user is in circle three versus circle one because their user experience may be different.
Is that something that's hard to design APIs around? Or is it easy for me as a developer to determine which circle a user's in and make decisions accordingly for my app?
Kshitij Gupta: That's what I... Noel, as I said, basically, what happens is we allow you to define something called roles.
Noel: Gotcha.
Kshitij Gupta: Right? And as roles... So let's take the example of stage, or actually, we can take the example of here, right? In here, we had something called a green room, right? Where you first join the green room, right? Where you are getting prepped and whatnot. So let's say, you have a green room. In a green room, you are still not... So let's take the roles, right, in this [inaudible 00:18:09] tool. You have a green room, you have a stage. So maybe, let's say, whatever we are talking in right now is a stage.
Kshitij Gupta: There could be viewers, or call them, let's say, viewers in real time, and then viewers on streaming. So let's say, there are four roles, right? And we allow you to define these roles using APIs as well as the dashboard, where you say, in the green room, only people in the green room will be able to see each other. The stage people will not be able to see the green room, and neither will the viewers, right? When you are on stage, the green room will be able to see you, because the green room wants to figure out when its entry time is.
Noel: Gotcha, yeah.
Kshitij Gupta: Right?
Noel: Cool, mm-hmm (affirmative).
Kshitij Gupta: So this is what we say. And so since we are allowing you to define your own roles, it's your business logic. You already know which circle is what.
Noel: Gotcha. Gotcha. So functionally, I don't, as a developer, need to care about the app like dynamically switching people in and out of those roles as they do... Or in and out of those circles as they do it. For me, it's just like I can move users between roles. Everything else is handled kind of elegantly behind the scenes.
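The four roles and their visibility rules can be pictured as a small lookup table. This is only a sketch of the idea, not the 100ms template format: the role names, the `SUBSCRIBES_TO` table, and both helper functions are hypothetical, and the rows for the stage and the two viewer roles assume that those roles watch the stage, which the conversation implies but does not spell out.

```python
from __future__ import annotations

# For each role, the set of roles whose video it is allowed to see.
# The green-room row follows the rules described above; the other rows
# are assumptions made for illustration.
SUBSCRIBES_TO: dict[str, set[str]] = {
    "green_room": {"green_room", "stage"},  # sees itself and the stage
    "stage": {"stage"},                     # cannot see the green room
    "viewer_realtime": {"stage"},           # low-latency audience
    "viewer_streaming": {"stage"},          # higher-latency audience
}


def can_see(viewer_role: str, publisher_role: str) -> bool:
    """True if a peer with viewer_role may see a peer with publisher_role."""
    return publisher_role in SUBSCRIBES_TO.get(viewer_role, set())


def visible_peers(viewer_role: str, peers: dict[str, str]) -> list[str]:
    """Names of the peers (name -> role) that viewer_role may see, sorted."""
    return sorted(name for name, role in peers.items() if can_see(viewer_role, role))


if __name__ == "__main__":
    room = {"kate": "stage", "noel": "stage", "next_guest": "green_room"}
    print(visible_peers("green_room", room))
    print(visible_peers("viewer_streaming", room))
```

Because the table is keyed by role, moving a user from the green room to the stage only changes which row applies to them, which matches the point that the app's business logic decides who is in which circle.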
Kshitij Gupta: I wish we had a screen share here and I would've shown you the demo.
Noel: Yeah, no, that's okay. That's okay. We'll do our best here. Yeah, yeah. I think we're painting a pretty good picture. Cool. So I guess kind of from there, while we're in the nuance of APIs here, maybe we could delve a little bit more into how integration actually works if somebody wants to start trying to implement video conferencing or video chat into their app. What is the minimum I need to do to kind of get a simple web conference spun up in my web app?
Kshitij Gupta: So it takes, I would say, less than five minutes to build a video conference, right? So all you do is you sign up on 100ms. If it's just a video conference, we have a pre-built template already available. So you choose that video template, and voila, I mean, in five minutes, you're done, right? In case you have a slightly more complicated use case, like we said, the green room and the viewers and whatnot, since we are allowing you to change that template... So the step is, first, you build that template. You define...
Kshitij Gupta: So think of Docker basically, right? So you're defining your... Like you define in Docker... I need a web server, I need a database, right? Similarly, you basically configure your video application saying, "I need X, Y, Z roles. Do I need recording or not," right? "Do I need it to be streamed? Do I need to RTMP out to a YouTube or whatnot," right? All of these things you define on a dashboard, so that's what we are calling a template. And once you have defined this, I would say 80 to 90% of your logic is already defined in the SDK. And now, what is left for you is to design your UI.
Noel: Gotcha.
Kshitij Gupta: Right?
Noel: Mm-hmm (affirmative).
Kshitij Gupta: And leave a large part of the implementation just to the SDK because think of it this way now, right? So let's say, you have designed a green room, right?
And the way the green room is designed is, on the top right, it shows the stage, and below it's basically all the people in the green room; they can also talk to each other, right? So you design your UI. And within that UI, in the SDK, all you have to do is you instantiate that template, and then you say, "Join as green room role."
Noel: Gotcha. Gotcha. Nice. Cool.
Kshitij Gupta: And when you join as the green room role, the SDK will give you all the callbacks, okay? Within the green room, you are supposed to see these videos. So as a callback, we'll give you all those video pointers, as well as all the information about who all the people are and whether they are muted or not. All that information is given to you as callbacks or hooks, basically.
Noel: Got it, got it. So yeah, if they're callbacks or hooks, are these all strictly talking to a client-side SDK? If I had server events that needed to happen when somebody clicked a button, is there a way that call happens or does my client need to relay that back to my server?
Kshitij Gupta: For the server events, we have something called webhooks.
Noel: Gotcha, sure.
Kshitij Gupta: So, any-
Noel: Gotcha.
Kshitij Gupta: Yeah. So anything you're interested to know on the server side, we can kind of send you webhooks to your server side.
Noel: Nice. Nice. Okay. That's an easy answer. I can handle those cases pretty elegantly. I think... Okay. The last piece in that I need to fully grasp is, I think, the authentication piece. How does 100ms know who is who when they connect via the SDK?
Kshitij Gupta: Thank you for asking that question, so it's really very simple. Actually, I missed that point. Once you define these roles, people will join the application first, right? They will be authenticating with their password or phone number or whatever, basically, right? Once they're authenticated, the host application knows who this person is. They already know whether this is a teacher or a student or green room or whatever role, right?
You basically create a JWT token, put that role inside that, and to 100ms, you're passing that JWT token. So the role is embedded inside the JWT token, the SDK now knows which role you are, and it's a JWT token, so we know people cannot tamper with it. Basically, the JWT token is created by the customer's server.
Noel: Got it. Got it. So I create a JWT. I put the role information in there, pass it to the client, and the client passes it up to 100ms. You guys know who is who. Is there any additional info in that JWT that I can send? Is all configuration done via that JWT? Or is there additional info that's just sent like as part of the payload of the body?
Kshitij Gupta: There is one more thing you can send in the JWT, you can send your own user ID.
Noel: Got it.
Kshitij Gupta: So whenever we call webhooks, right? Or whenever, let's say, on your other interface, you might have something called your avatar, let's say.
Noel: Right.
Kshitij Gupta: Right. So you need to look up in your own database. And since 100ms is the one who's transferring this information of the role and the ID to you... We have two IDs, one is a 100ms ID, and one is the application ID, which is also given in the JWT.
Noel: Gotcha. So, yeah, I guess... Okay. So the use case now that I'm thinking of is, say, we have avatars, right, for each user, and they can set their own avatar up. If I want to display those avatars to users when 100ms connects them as a peer, am I able to do that on the client with just the data that 100ms is providing me, or do I need to then on each client use the ID of my peer that 100ms just gave me, and go back to my server to figure out what that avatar should be?
Kshitij Gupta: Both ways are possible, Noel.
Noel: Gotcha, okay.
Kshitij Gupta: So we have a mechanism called peer metadata.
Noel: Cool.
Kshitij Gupta: And what you do is when I join, at that time, I will fetch the JWT token as well as any other information you want to put, could be your avatar or whatever information, right? All that information, if you give it to us along with the JWT token, we set it as metadata on your peer. And whenever that peer joins, all that information is available to all the other people in the call; that is one method. The other method is you just give us the ID, and with the ID, you look up in your own database, but most people actually prefer the peer metadata version because the peer metadata version is only one fetch-
Noel: Saves a call, yeah.
Kshitij Gupta: It saves multiple calls because otherwise if there are 100 people in the call, there will be 99 calls-
Noel: Per user, yeah. Yeah. Yeah, very cool. Very cool. Nice, cool. Thank you for going way down into the weeds there a little bit with me. I appreciate it. Yeah. Yeah. We got pretty low there. Yeah, let's come out a little bit. I feel like my first introduction to 100ms was a couple months ago probably at this point, when one of my coworkers sent me a link to your guys' Vercel template that was going out for hosting a conference. Can you talk a little bit about that? What that is as a product, and your story of getting it built?
Kshitij Gupta: Yeah, sure, absolutely. So this idea actually... And I must thank Vercel for giving us continued support on that. I think actually they only encouraged us to build that. So what happened is I was looking at the Vercel templates and they had these two, three templates, which were like at the top, right? One was e-commerce and one of them was the events template, but the events template was only a... If you click and deploy it, it's actually a YouTube video which plays. And so we spoke to the Vercel team and we were like, "Can we extend it, right?" How would we extend it to actually make it a Hopin clone, right?
Kshitij Gupta: And that's how that discussion started.
We got a few kind of open source contributors also, kind of being excited about that, and that's that whole journey which got started. And it is actually pretty exciting to us because now we are thinking that this is a good GTM strategy, because as you can understand, to make people understand or developers understand what is possible, the best way is examples. And templates are great examples. And if templates can be production ready, then yeah, I mean, you start with the template and modify it to your heart's content, basically.
Noel: Nice. Yeah, very cool. Very cool. What did that collaboration look like? You touched on it a little bit when you reached out to Vercel like, "Hey, can we make this a little more feature rich and enhance it?" Once you'd gotten the answer there, was it pretty easy to work with them? Did you guys just kind of make pull requests, and they'd give you feedback and you'd merge them? Or what did that experience look like?
Kshitij Gupta: It was pretty amazing. I mean, so they basically said, "You own this repository now, go extend it," basically. So there was another company which was already there. It was, I think, DatoCMS, one of the CMS companies. So basically, the Vercel folks said, "Okay, go figure it out. Basically, first, give us a design. Let's first work on a design, agree on a design," so we kind of worked on a design first. All of us agreed on a design. We spoke to the DatoCMS team also that, "Okay, if we are changing these interfaces, how would you?" Because in the interface...
Kshitij Gupta: Now, in the CMS, we had to have a flag, whether it's say YouTube screen... Because there are multiple stages, stage one, stage two, stage three. And we said, "Okay, let's have the configurability that one of the stages can be made a real-time stage, and another stage could still remain a YouTube stage," right? So we had to kind of talk to the DatoCMS guys about how to do that, and all that stuff.
Once we kind of finalized the design, as well as the interfaces, it was development, and that was pretty quick because our team was developing it on 100ms, so that was pretty quick. I think what took the longest time was that the Vercel team has an extremely sharp eye for details.
Noel: Mm-hmm (affirmative), yeah.
Kshitij Gupta: Right? And we are very thankful for that, right? And they're like, "Can we change this a little bit?" And we're like, "Yeah, yeah, yeah." We kept doing that. But I mean, that was the longest part by the way, to finish it to that perfection level.
Noel: Yeah. Nice, nice. That's awesome. Yeah, I'm sure that this tail end of that was a little bit nuanced, but I guess, I'm glad you guys were able to integrate with that platform, and I guess, provide an example in that context. So a dev can go and click a button and get a whole app spun up. That's a pretty cool feat. Were there any other specific challenges you guys faced beyond that last little integration push, getting the code all approved and stuff?
Kshitij Gupta: Not at all, not at all. Vercel is a great platform, we use it all the time.
Noel: Yeah.
Kate: I can't really picture a better time than the last two years to be creating a tool for video software. I'm curious about the journey in the last two years with COVID, and also your plans for building out the company in the future.
Kshitij Gupta: Absolutely. Absolutely. So interestingly, the company started like bang in the middle of COVID. We started October 2020. I mean, it was like crazy. Everybody was working from home, so we started from home. In fact, even right now, we don't have an office. So I mean, 100ms is our office. We all work on 100ms, we don't use anything else. We don't use Zoom or Google Meet, only 100ms. And that in fact has been really great because I mean, we have been able to dogfood our product all day, all night, and we've been able to iron out bugs and whatnot, everything, right? So that's that part.
I mean, it took us a long time to build the product, almost like one year just to build the product. It's a complex product, right? I think beyond that, once we had a product and launched, we saw a good response; developers were pretty excited that it's powerful as well as flexible.
Kshitij Gupta: So this was the response we got because a lot of times, a lot of other SDKs, they give you a powerful abstraction, but it is not flexible. It's a prebuilt kind of experience. And when they saw that the UI is totally in their control, but it is still as powerful as other SDKs, the response was amazing. So right now, we are working with almost 100 plus companies where we are integrating with them. A lot of them are already live. So I would say, good traction with the developer community. Now, we are kind of trying to set up our enterprise motion also. The enterprise motion is slightly different, as you guys also... A lot of you also must have gone through that. It's a slightly different journey. So now, we are kind of working on that part where we are hiring our GTM teams. We are hiring our sales teams. So that's how it is presently.
Kate: Awesome. And we'll include the links to your pages for hiring in our show notes. Is there anything else you would like to point our listeners to, or that they should go check out?
Kshitij Gupta: The only thing I would love from the listeners is feedback. I mean, we thrive on developer feedback. Everything we have done so far, designing our APIs, designing all of these interfaces, is by listening to our customers. And we just love listening to developers, what they want to build, and also when they built it, what kind of issues they had, what they did, what they like and don't like, so that'll be my biggest request. Try it, and we'd love your feedback.
Kate: Awesome. Yeah, we'll include the link to the website in our show notes, and thank you so much for coming on and we will see you around.
Kshitij Gupta: Thank you, Kate. Thank you, Noel.
Kate: Thanks for listening to PodRocket. You can find us @PodRocketpod on Twitter, and don't forget to subscribe, rate, and review on Apple Podcasts. Thanks.