Hanson Ho:
We're not quite there, but I see the bones of it being there. The standard OpenTelemetry itself is capable of being plugged in in such a way that it is extensible to whoever wants to use it.

Eric Anderson:
This is Contributor, a podcast telling the stories behind the best open source projects and the communities that make them. I'm Eric Anderson.
I'm joined today by Hanson Ho, who is the Android lead at Embrace. Embrace utilizes OpenTelemetry on mobile, which is kind of a novel thing. And this is a spin-off of a prior episode we did on OpenTelemetry. Hanson reached out, we want to go a little deeper here. Hanson, thanks for joining us.

Hanson Ho:
Thanks, Eric, happy to be here.

Eric Anderson:
Give us the elevator pitch, the one-liner on what Embrace is.

Hanson Ho:
So Embrace is the mobile observability solution that completes your full stack observability on OpenTelemetry. We provide the mobile data that pairs well with your backend data to give you a full insight of your entire user journey in OpenTelemetry.

Eric Anderson:
Super. So I want to get into the Embrace story and particularly the open source side of it.

Hanson Ho:
Yeah, let's talk about Embrace. So Embrace, I think the founders, Eric, wanted a better tool in mobile for capturing mobile production errors and issues. So they built Embrace, a first-class solution capturing Android, iOS, Flutter, and React Native problems like ANRs and crashes. I think we do a better job than other solutions. We believed it.
But I think recently we decided that in order to really capture the mobile market, we have to not only target the mobile folks, but we have to be part of the entire observability solution for a company. Those are SREs who may not be familiar with mobile. We want the data to be the same. We want us to talk the same language. So our pivot to open source and our pivot to OpenTelemetry is a result of that. We can not only be a nice cool solution for mobile developers, we can also be part of the entire endeavor of observability in an org in order to gain SLOs that are meaningful and include user experiences. So if you only have observability data in the backend, all you can see is when the request comes into your data center. Everything happening before that, you're not able to see it. How it impacts users one-to-one directly, you're not able to see it. We are hoping to fill in that last piece of the puzzle to provide several dimensions to your backend observability data.

Eric Anderson:
Oh, got it. So today, most observability happens kind of on the API and back side of things. And increasingly these mobile apps are getting bigger, thicker, richer. All these interactions are happening on mobile that aren't easily logged, tracked, observed. You're bringing that observability. Talk to us about where OTel comes into the story. I think one of the things that we got excited about on that other episode is the universality of OpenTelemetry, which has been kind of a recent thing, and it seems to align with maybe your timeline.

Hanson Ho:
Oh, for sure. OpenTelemetry offers an open standard that allows us on the mobile side to talk the same language as our backend DevOps folks in terms of observability signals. Previously, RUM solutions, real user monitoring, on mobile typically captured data in proprietary formats that are not portable. So if you look at Android vitals in Google Play Dashboard, well good luck getting that data out of there. You can see it in their dashboard, but you can't get it out in granular form to link it to your backend data. OpenTelemetry allows us to have a format that when we use it could look the same as backend data and could be linked to backend data. So it is a huge improvement. We are talking the same language now. We don't have to have a translation layer between mobile and backend, if they even talk in the first place. Maybe they didn't previously, partially because it's difficult to link the data together. But now that they are both OpenTelemetry, then we could have magic happening with the linkages between the two.
So we want to be part of that wave because we believe it is what the future is going to be. Mobile apps drive a lot of traffic to backend servers, certainly consumer apps. You're using native apps, websites too, certainly, but we're focused on mobile. And missing that piece is crucial. Sometimes there's context that's not appropriate to be captured on the server, on the observability side. Payload content, things like that are too expensive to parse and store at every request. Well, for us, we do it. We do it on device, so it's not blocking the requests. And by merging these two data sets, we're able to gain insight that you otherwise couldn't on both sides.

Eric Anderson:
So when you went into this kind of OTel journey, was it obvious it would work? Was OTel well-suited for mobile? And at what point did you kind of prove this was viable?

Hanson Ho:
We're convinced it will work, but we're also going in there with our eyes open. So OpenTelemetry is built and used mainly by folks who are doing distributed tracing, complex systems. Its SDKs and its data models are designed to capture the nuances of complicated distributed traces and how to model and get data and get observability from that. Mobile has a few different quirks in terms of how data is stored, transmitted, or not transmitted as it were if you're out of network range. So there are some challenges with the model, but not challenges that we can't overcome, and certainly not challenges that the spec itself can't iterate on and be more adaptable for mobile.
While we are convinced, we don't think it's by any means complete and we need to do the work to get there. And certainly, us alone can't do it. And what's great about being part of the open source community is that we are not in this alone. There are mobile developers everywhere looking for telemetry, looking for basic RUM stuff, but also looking to do what I was talking about before, completing the full stack in observability for the backend.
And the SDK is just a gateway. The community itself is going to be growing in terms of instrumentation, recording mobile-specific data on mobile apps designed for it. We're not quite there, but I see the bones of it being there. The standard OpenTelemetry itself is capable of being plugged in in such a way that this is extensible to whoever wants to use it. Similar to what it has done in the backend, where there are instrumentations and agents for many different popular frameworks and systems, we believe the same can be achieved on mobile and we want to be part of that. While we have an SDK, there is also an official OpenTelemetry Android agent that's available. And if this goes well, I expect there to be others that are designed for different use cases. IoT embedded devices have a much different runtime profile than us, Android Auto, things like that. So we want to be at the forefront of it. We want to contribute to it. But at the end of the day, it's the community that's going to be driving it forward, and we are very happy to be part of it here.

Eric Anderson:
We often on this show talk a lot about kind of starting an open source project, building the governance around it. In this case, you're kind of inserting yourself into an existing project and governance. How do you elevate your priorities to the group, and how has that gone with OpenTelemetry? Because presumably you have to kind of steer the project in a direction that suits you.

Hanson Ho:
Yeah. So this is kind of interesting. So we started off, we built an entire platform that's proprietary. And what we did was we moved it to OpenTelemetry and moved it to open source. So the project right now exists as an independent open source project, because there's already an Android open source project, which actually I contribute to. I wish I could have more time to do more, but sometimes time is difficult to find. But the idea is that we want the existing projects to continue and thrive. We want our project to continue and thrive. So the governance aspect we are looking to, I think set up on our side, but participating in the existing OpenTelemetry SIGs, or special interest groups, and discussions and meetings and talking to folks in the community has informed how we want to do things on the Embrace side on the SDK.
At the same time, we want standardization. So the SDKs ought to talk the same language in terms of what does a crash look like when we model it as a signal? What does an ANR look like when we model it as a signal? So I think having the community on board, having the project on board, and the existing governance of that allows us to ride sidecar a little bit for our project, but at the same time help each other. So this is a symbiotic relationship for open source and for OpenTelemetry on mobile to be successful. There can't just be one agent. There can't be one SDK. It has to be a number of SDKs.
And it's a learning process because I'm not an open source person. I know a lot of folks are, they work in open source and they started open source. I started 20 years ago in this industry and people were writing code and putting it into their Visual SourceSafe or Perforce repos back then and not open in GitHub and GitLab and things like that. So for me, it's a welcome transition to the open source world where my code is available for everybody to see, scary and good, and also issues discussed, bugs discussed. So it's a learning journey for me as well, doing this this way.

Eric Anderson:
I love this vision, Hanson, that you've painted of taking OpenTelemetry to all these new frontiers. And I wonder is there a trade-off? There's the Unix philosophy of do one thing and do it really well. As you take OpenTelemetry to all these areas, is it in some ways suboptimal in these new areas? Or maybe could it render OTel suboptimal in its core areas? Is there a trade-off there? How do you navigate that trade-off?

Hanson Ho:
Yeah, for sure. I think the spec itself, it'd be nice for it to adapt to our use cases. I can certainly see that certain suggestions of ours may not be looked upon well because they change some fundamental truths about the project, and this is the kind of area we have to navigate. And just because it's not perfect doesn't mean we can't use it. And I do like the ethos of do one thing well, which is why I advocate for multiple SDKs, even if it's on a single platform. Having instrumentation that is decoupled, where you could, a la carte, pick what you want and include it and have it work seamlessly together, that's part of the vision.
And instead of having these monolithic, gigantic SDKs that do everything, we want things to be composable in order for us to maybe on certain devices targeted at low-end devices or emerging markets that don't have the power of your Samsung Galaxy S22, whatever. We can't record that much. Maybe we have to be a little bit more mindful about how much data we use, how much CPU we use. So I think OpenTelemetry has a spec. Hopefully it can serve all those purposes. But I do believe that there will have to be specializations that need to be applied on top of the, I guess the OpenTelemetry core in order for it to work well in individual use cases.

Eric Anderson:
Now you're the Android lead, presumably there's somebody else doing iOS. And as a group, you're aware of the efforts you have to make to kind of either unify or separate them. You've mentioned you've chosen to go separate SDKs. What's that I guess debate look like? Or is this hard to navigate? What's it like between Android and iOS?

Hanson Ho:
I would say there are multi-platform solutions that one could deploy that have shared code. But sometimes the duplication allows us to be a bit more economical and a bit more tailored to the specific platform. So there are certain things that we capture on Android that don't exist on iOS, like ANRs, or rather exist in slightly different ways and are less important. So having different SDKs first of all allows us to iterate separately and faster and not depend on each other. But if the core, there's a ton of shared code in the core, it's probably better to not build everything in one chunk. At least that's our philosophy now. Who knows in the future what the power of multi-platform can bring us. But right now there's a Swift SDK for OpenTelemetry, there is a Java SDK. So since the OpenTelemetry SDKs are language-based, having a language-agnostic layer would be another challenge that we may not need at this point.

Eric Anderson:
I can imagine somebody listening to this and getting kind of excited at the prospect of doing this. How does one go about instrumenting their mobile for OpenTelemetry or Embrace?

Hanson Ho:
Well, the ideal situation is that you don't have to do much. You just basically pick an agent like Embrace or the official Android one and you select the instrumentation you want. You want your networking calls instrumented, you want your database calls, you want your activity opening, various things off the shelf and plugged in. Ideally, that will take care of 80% to 90% of your use cases. So instrumentation is simply dropping the package in, doing a couple of configurations, and then setting that up.
Now if you are running your own backend OpenTelemetry collectors and things like that, there is a bit of backend effort. But there are also vendors out there that will happily take your data, Embrace being one of them. But many other vendors out there talk OpenTelemetry: Honeycomb, Grafana, folks like that. So depending on how much you want to spend, there are vendors out there to help.
But on the client side, the advantage of OpenTelemetry is that you're not locked into a specific vendor. I know we're in a very challenging economic environment right now and cost observability is at the forefront. If you are locked into a big platform, sometimes it may be expensive to get off. If you're on OpenTelemetry, it's much more portable, your data and your instrumentation, because you are working against a standard API and you can switch implementations and it still works. So like open source, you could do everything yourself or you could pay someone to do it for you. It runs the gamut.

Eric Anderson:
Yeah. Observability is a big market in part because I think most people have chosen to take help where it's offered. Where does this lead us, I guess? There's a couple kind of things happening in mobile, certainly consolidation around Android and iOS. I think there was a lot of device proliferation on Android. Maybe that's not as problematic as it used to be. Flutter is on the rise, it appears. Web frameworks are being kind of ported to mobile. Are all of these things kind of interoperable with OpenTelemetry?

Hanson Ho:
Yes, provided their SDK is supporting it in those languages. So there's a Java SDK out there for Android apps, and a Swift SDK for iOS apps. And then, at a higher level, there are instrumentation libraries like the one Embrace offers on Android. You can also do JavaScript, Node, React Native, Web. So in theory, everything is plug-inable. What we are looking at though is that the mobile front is a lot less developed than the backend. So in the backend, basically every server technology, every language, there's an SDK, there's instrumentation. Basically unless you're really hipster, you pick a really, really hipster language, you're going to have some OpenTelemetry instrumentation to pick from.
On the client side for mobile and for web, it's less developed. But I think folks are getting there. Traditionally, I think client observability is more like real user monitoring. You look at crashes, you look at how many users looked at a certain page. It's not really about tracing, it's not really about performance. But that's beginning to shift. And as that shift happens, folks are going to see performance data and they're going to want to connect the performance data on the front end to the backend. And OpenTelemetry is just a natural area to land onto.
And instead of us pushing OpenTelemetry to the various areas, I think they're going to be pulling OpenTelemetry in. Folks are going to be doing embedded stuff and like, "Oh, I want some observability. Where do I go? Well, I have a JavaScript SDK that I could look at. Maybe I can build something out." Ideally, the ecosystem evolves in a way that those folks start contributing back what they create and there's a natural ecosystem of instrumentation that evolves.
So similar to how the backend rapidly grew in the last few years, I'm hoping for a similar explosive growth in the front end as well, where the need is there because frankly, the tools are fairly immature for observability right now, very locked in, very proprietary, not exportable, not portable. Once enterprises and SREs realize that they can't get this data out of these locked-in mobile observability tools and systems, they'll want something else.
And observability budgets are usually controlled by SREs and folks that have the demonstrated need and demonstrated impact. And for mobile, it's like, "Well, you said you can reduce crashes by 20%. What does that actually mean?" I can't say. Crashes are bad, right? So I must improve it. With better performance tracking data, and especially performance tracking data that could be linked to the backend data, we can start defining SLOs in terms of how the mobile user experience happens. And we can start saying things like, "If you improve performance by 20%, you can start seeing conversion rates increase, DAU and retention increase, churn reduced," and have these very specific linkages happen. And that's when the mobile folks will be like, "See? This makes sense. You should invest in us."
And I think that's a very key missing piece in mobile observability tooling prior to this and RUM tooling, where you just take for granted that the crash rate is important, that 99.9% is materially better than 99.8%, but can you actually quantify it? And it's very difficult to do with the legacy tooling. And I'm hoping this shift to OpenTelemetry in part and having the data connected can help alter that and actually bring meaningful evaluation of impact to mobile client data.

Eric Anderson:
Hanson, I don't know how much you get involved with working with initial customers or users, but I'm curious, are people who get really excited about Embrace folks who've already bet on OTel and now they're excited to bring that to mobile? Or is it maybe just somebody who has a mobile app, like a Robinhood, where the whole business is on mobile and they haven't been able to monitor it as well as they'd like, and now this gives them a deeper story?

Hanson Ho:
So traditionally, pre-open source and pre-pivot, we were targeting the mobile kind of persona and we built a better mousetrap. It's 40% better. "Oh yeah, great. We'll use you and then we'll solve our problems. And when the problem is solved, we don't really need you anymore." I think what's much more lasting is that second persona that you discussed, where it's the SREs in the backends who are looking at OpenTelemetry and being excited about OpenTelemetry itself. And they'll be like, "Oh. Great, I have all this data in the backend. Okay, let's see what I can get from mobile." And they look at what's out there and it's not as good as they want. There is just a less established landscape. There are no definitions for what a crash looks like in an OpenTelemetry signal, for instance. So that's where they look at Embrace and be like, "Oh, whoa, there's a fully formed mobile observability solution that used to have their own proprietary format. But now they're in OpenTelemetry and they are recording data as OpenTelemetry."
Now the semantics are still being worked out because we're in the early stages of community agreement on these things. But the idea is that there will eventually be community agreement on the shapes of OpenTelemetry data on mobile. And I think when that happens, that's when the SREs are going to be like, "Oh yeah, this is great. I could not only get my backend data in OpenTelemetry, I can get my frontend, my mobile data in OpenTelemetry as well." And I think that's the market that we are most excited about and most able and capable to serve, because we want the same thing. We want to demonstrate impact. We want our data to be meaningful, and not just because a mobile lead really likes the way our ANRs capture freezes on the UI thread.

Eric Anderson:
I want to throw maybe three different scenarios, use cases, at you, and you tell me if these are things Embrace handles and kind of how well. One is just what we called APM in the past. This was kind of like, "What's slowing down my requests?" And it's down to the line of code or a specific process that's kind of bottlenecking a particular step. Another is spanning a whole user journey, and this is maybe more of a product manager who's like, "Oh, they click here, they do this, they do that. Where are we inefficient? Where do we lose users?" It's less about performance and more about user experience. Let's just start with those two and maybe I'll come back to the third one.

Hanson Ho:
So Embrace actually handles both of these pretty well. The first, the APM use case, was what we originally started on. The data was proprietary, now it's not. So if your clicks are generating crashes, we can tell you that. Those are table stakes for anybody doing mobile, I think.
The second being effectively like an Amplitude or like user funnels to basically see where your drop-off is. With the technology we're building right now, we are able and capable of tracking that. And what's great about that is we can tie that back to performance. Because certain tools, all they track is whether the clicks happened, whether the page loads happened. You don't know the reason why there was a drop-off. With Embrace and with OpenTelemetry and with deeper observability tools, with lots of events with context, we can tell you why the journey did not end in a conversion. We can tell that the page load took too long. We can tell that half the users clicking this button generated a crash that ended the app and people didn't go back to that place at the end. So we could not only tell you that bad things happened, we can tell you why they happened or give you the data to find out why they happened. So it's almost like bringing the two together.
Then there's the third use case that I see, which is monitoring outliers and regressions, tracking your P99, tracking your P50, tracking actual crashes, tracking segments that are experiencing certain problems that may not be obvious in P99s. Perhaps some ads in Germany are behaving badly because the payload contains characters that crash the app or are difficult to parse or display differently. Data that we capture with context allows us to do that splitting to identify the cohorts that are experiencing the issues and track that long-term as well. Not only can we know that that particular session is bad, we can know if that user returned afterwards. Things like DAU, tracking a user in their entire journey from app install to uninstall, we are able to do that because we have an ID for a particular user.
So we are not only able to give you instant gratification in terms of debugging hard-to-find problems with a user session. We can tell you the long-term impacts, potentially, depending on your size and sample size and everything, of what a 20% slowdown in app startup did to new users for retention, for churn. Because the data is all there. We know when you log on and then we know when you don't log on. And we can actually split users by cohorts. So this is the type of thing that we can start looking to do with the data that we capture that was not done previously.
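
As a rough illustration of the cohort splitting described here, the sketch below groups sessions by an attribute and compares crash rates. It is not Embrace's implementation or data model: the `crash_rate_by_cohort` helper, the `country` and `crashed` fields, and the sample sessions are all hypothetical.

```python
from collections import defaultdict


def crash_rate_by_cohort(sessions, key):
    """Return {cohort value: fraction of sessions that crashed}.

    `sessions` is a list of dicts; `key` names the attribute to split on.
    Both the shape of a session and the attribute names are assumptions
    made for this sketch.
    """
    totals = defaultdict(int)
    crashes = defaultdict(int)
    for session in sessions:
        cohort = session[key]
        totals[cohort] += 1
        if session["crashed"]:
            crashes[cohort] += 1
    return {cohort: crashes[cohort] / totals[cohort] for cohort in totals}


# Hypothetical sample: 10 sessions in Germany (3 crashed), 90 in the US (1 crashed).
sessions = (
    [{"country": "DE", "crashed": True}] * 3
    + [{"country": "DE", "crashed": False}] * 7
    + [{"country": "US", "crashed": True}] * 1
    + [{"country": "US", "crashed": False}] * 89
)

overall_rate = sum(1 for s in sessions if s["crashed"]) / len(sessions)
rates = crash_rate_by_cohort(sessions, "country")
```

In this made-up sample the overall crash rate looks modest, while the split shows one cohort crashing far more often than the other, which is the kind of problem an aggregate number can hide.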

Eric Anderson:
Got it. Yeah. And your third use case was actually mine. You mentioned P99. I was thinking more like an incident response. There's an individual user or collection of users that is having this issue. It sounds like you covered that well.

Hanson Ho:
There was actually a really interesting point talking about P99, because P99 for backend means very different things than P99 for a user. P99 for backend may not be perceptible to a user. It just might indicate an increase in load and there may not be an actual impact to the end user. A P99 app startup means that 1% of all app startups are as slow or slower than that. So it's not an outlier. It may just be folks on poor phones and in poor networking conditions and there's something in your app that's not dealing with it. So even looking at P99s on mobile, it's slightly different. There's a bit more nuance to it than if you're just used to looking at P99s, P99.9 for backend. So that's where we help too.
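
To make the percentile point concrete, here is a minimal sketch using the nearest-rank definition of a percentile. The `percentile` helper and the startup times are hypothetical, and real tooling may use a different percentile method.

```python
def percentile(values, p):
    """Nearest-rank percentile for an integer p in 1..100.

    Returns the smallest sample such that at least p% of all samples
    are less than or equal to it.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 1 <= p <= 100:
        raise ValueError("p must be an integer between 1 and 100")
    ordered = sorted(values)
    # Ceiling of p * n / 100 using integer arithmetic to avoid float rounding.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical app startup times in milliseconds:
# 985 startups on good devices and networks, 15 on poor ones.
startup_ms = [800] * 985 + [6000] * 15

p50 = percentile(startup_ms, 50)
p99 = percentile(startup_ms, 99)

# Number of startups that were as slow as, or slower than, the P99 value.
as_slow_or_slower = sum(1 for v in startup_ms if v >= p99)
```

With these assumed numbers the P50 is the fast startup time and the P99 is the slow one, and the startups at or above the P99 value are each a real user waiting on a launch, not statistical noise.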

Eric Anderson:
And kind of looking forward now, where do you see the gaps, or where are the interesting opportunities in the years ahead? Because it sounds like this is a fairly new approach with OpenTelemetry. And even the idea of OpenTelemetry is fairly nascent. We've had years of proprietary monitoring agents. What gets you excited? Maybe are there interesting opportunities we haven't appreciated yet as this goes forward?

Hanson Ho:
I think there's just so much interesting data out there to capture that's not being captured right now. And with OpenTelemetry, its origins are backend, its focus is backend, its usage is mostly backend. I think there are aspects of it that could be tweaked to better serve native mobile use cases, the APIs, things like that. Those are incremental changes. Those are not dramatic big changes. And some of them may not be totally suitable because use cases are different. But hopefully having better support at the spec and core API layer would be nice.
But I think the innovation is just going to come from the community, the folks who need this type of telemetry, who would need their IoT devices to talk in the same language as the back end. And they'll find a way around issues that may not be perfect. Frankly, the way we're doing OpenTelemetry on mobile isn't a hundred percent canonical and status quo just because long app sessions are hard to track, context is hard to track, changing context. So there are things that we do that are not quite in step with what you would traditionally see in an OpenTelemetry instrumentation. But I think those are growing pains and it's open source so it can change, it can evolve. So I am looking forward to more use cases being brought to the forefront and the community rising up to pick up those use cases and serve them well. And then, spec evolving to better support all those disparate use cases.
I believe in the project not only because of the openness, but the people behind it are fantastic as well, everyone I've talked to, even when I ask stupid questions. It's one thing to have a technology that's great. It's another to have a community that is dedicated to it, that is open and welcoming. And I believe OpenTelemetry is that. So I look forward to the community growing. I look forward to use cases growing. I look forward to everybody talking the same language finally instead of these silos that you can't export out. I'm tired of that, tired of being locked in.
So if you're at all interested in OpenTelemetry on mobile, go to the CNCF Slack, visit the Embrace website, check out our open source Apple and Android SDKs on GitHub. Just get involved. We're just building this right now from the ground up. And if you're interested in mobile observability, you could have an outsized impact if you kind of hop onto one of these projects right now. We are always looking for folks to help out. And if this at all piques your interest, find me on LinkedIn. Hanson Ho is a very Googleable idiosyncratic name. There's a guy in Singapore that's an architect, but I'm not in Singapore, so you can use your smarts to find me. But yeah, I'm looking forward to the community growing and going with it as well.

Eric Anderson:
Thank you so much for joining us today, and thank you for your open contributions. It's a gift to humanity that you've extended OpenTelemetry into new places.

Hanson Ho:
It's already there. We're just trying to add our piece of the puzzle.

Eric Anderson:
You can subscribe to the podcast and check out our community Slack and newsletter at contributor.fyi. If you liked the show, please leave a rating and review on Apple Podcasts, Spotify, or wherever you get your podcasts. Until next time, I'm Eric Anderson and this has been Contributor.