Andrew Smith: Developers love open source. They like it because they like the ability to see how it works, to feel like they have control over any of the defects they find.

Eric Anderson: This is Contributor, a podcast telling the stories behind the best open source projects and the communities that make them. I am Eric Anderson. We have Andrew Smith of Plainsight and the OpenFilter project on the line. Andrew, thanks for chatting today.

Andrew Smith: Yeah. Thanks for having me.

Eric Anderson: Andrew, tell us about OpenFilter. This is a newly launched open source project related to Plainsight. It's actually quite interesting.

Andrew Smith: Yeah. There are lots of people in the computer vision space trying to make it easier to build computer vision apps. But we have a particular mindset about the specific problems that make it difficult to build computer vision and get it running and scaled. OpenFilter is a way to build modular computer vision components that you can compose together into a pipeline. What's different about filters compared to a lot of other computer vision tools is that the same filter, the same application logic and conceptual models (sometimes the machine learning models have to be redone for different edge devices), can run in the cloud, on the edge, and on your local development box. You're able to develop once and then run in multiple locations, which brings some predictability to the computer vision experience. Then the reality with computer vision is that vision is never the whole product. You're doing something with the data, and the vision lives in some environment. Sometimes it's just a camera in a physical location, and sometimes there's other stuff you're doing, medical imaging or whatever. There's always something else it has to feed into. How do you deal with the data supply chain and lifecycle? That's what we're really trying to address: how do you make computer vision one input into whatever your big data processing system is, alongside the transactions, customers, and all the other things you might keep in your ERP or your data lake.

Eric Anderson: Yeah, yeah. I appreciate you teeing up a bunch of stuff, because I want to go into basically all of those topics you breezed past. Before we do that, where might I use OpenFilter or Plainsight? My understanding is that computer vision is broad. There's face detection happening. Automotive is doing a lot with the self-driving efforts. I'm seeing you in, I don't know if I would call them industrial use cases, but cameras in physical places spotting things. How broad is OpenFilter in its computer vision?

Andrew Smith: There are three really big computer vision verticals, maybe we'll call them. One of them is automotive. The reason you can deploy computer vision in cars is, one, cars are all kind of similar. If you make a Waymo or a Tesla, it's an assembly line where you make a lot of the same device, so you can really optimize it for that environment. Cars are expensive and they sell a lot of them, so if you make hundreds of thousands of cars, putting $100 or $1,000 into each one is an investment that makes sense. Cars are one place. Then retail is another one where there is a lot of vision.
If you're Walmart and you have a lot of stores, or, I don't know, McDonald's or something, and you have a lot of locations, then it makes sense that you could invest in computer vision. What's tougher is as you go to smaller and more niche deployments. If I'm a manufacturing company that operates at Amazon scale with huge warehouses, then for that kind of work site there's one level of computer vision and one level of investment. But if you're a smaller company, how do you get vision out to just a job site or a construction site, where it comes and goes? Construction sites aren't permanent. That's one of the big challenges: how do you reach these verticals? In particular, you end up with a lot of places where computer vision is super valuable but the environment isn't really amenable to the usual software and cloud infrastructure. There's a lot of computer vision need in really valuable industries, for example offshore oil rigs, mines in Chile, resource extraction in the Outback, or ranchers, where they're pretty far from infrastructure for internet connectivity. Then you have real-world things to deal with, like dust. There's a really huge tail of use cases for those folks, and they're more bespoke. That's where a lot of the computer vision industry gets into roll-your-own territory, where each project is unique. The data, the models, everything is unique, so you end up having to build everything each time. What OpenFilter is supposed to help with is making it so that even though you have these different locations and these different data needs, you can focus on the algorithm. What are the steps of this vision process? Rather than spending your day wrangling the data, the data part we'll make easy, and then you can just focus on the algorithms and the accuracy. I hope that made sense.

Eric Anderson: It makes total sense, Andrew. Thank you for the overview of the three buckets of where you see computer vision. You've already teased this edge situation several times. In some circumstances, you don't have the compute infrastructure. In almost every circumstance, you don't have, I assume, the GPUs on the camera that you would want in an ideal world. That seems to be, as I understand it, a large part of the tension that computer vision efforts face. There are a lot of questions about where you actually put the inference and how you get the data ready, because inference is high compute. Video processing alone is a high-compute exercise, and you're in low-compute environments. I assume OpenFilter at least has some opinions on how to handle that, if it doesn't address a lot of it outright.

Andrew Smith: Yeah. Ideally, if you wanted to make developing the computer vision app as simple as possible, you would run everything in the cloud. You would spin up GPUs when you need them and take them down when you don't. They're available and scalable, and all that. But you run into this data gravity problem, and there are two parts to it. One is that sometimes you don't have good connections to the edge. That's often the case, because the cameras are always in the real world; they don't live in the cloud. They're somewhere in the real world and you have to get the data up there. Then you end up with: how do we move all this data, real-time streaming video? Latency can be an issue too, especially in manufacturing if you're trying to find a defect.
A round trip from the camera to some edge device to the cloud and back, just to say that a screw is bad or something like that, is probably not practical. That's one issue. Then the other one is, okay, do we want to run an Nvidia GPU for every camera on the edge? That's also not practical for the people setting up those environments. For one, it's very expensive. Take, I don't know, the McDonald's case. McDonald's is not a customer, I'm just bringing them up. I think there's hundreds of thousands of McDonald's. If you have 10 cameras and a GPU for each one, even for a company like McDonald's the cost could quickly be on the scale of their whole revenue for a year. It's not really practical. So there are two questions. What can we do at the edge without an Nvidia GPU? We're not going to put one behind every camera, so what can we do? And then the real deep machine learning stuff, how can we do that in the cloud, where that's a good place to do it? We're not going to do that on the edge, but how do we still do some inference on the edge? That's the idea of the filter, and that's kind of why it's called a filter: you have these stages of the algorithm, the pipeline. At each point, especially each boundary, maybe camera to edge, then edge to cloud, then maybe cloud to training, you can reduce the amount of data. We'll make sure we find only the interesting frames, the right ones. If we have something that runs on the edge, on the camera, maybe it doesn't need to tell you exactly what's going on; you're not going to run a VLM on the camera. But maybe it can tell that something interesting enough is happening that we can send it to some local machine, one for the full factory or something, that does have a GPU, and do some processing there and say, "Hey, this is the interesting stuff. We've now identified whatever scenario we were looking for." Maybe in the retail case it might be shoplifting: we think there's a good chance this is shoplifting. Then we'll send that out to the cloud and really run the more expensive models, and then you can do more. That way, you're not pulling all the data from the site out to the cloud, which is expensive in terms of the bandwidth you need. That's the idea behind the filter and that's why we call it a filter. You have these stages, and each stage reduces the amount of data that passes between them. That way, you're going from video to structured data, and it's only the exceptions that keep going up as you process. I hope that answers your question.

Eric Anderson: Yeah. It's prohibitively expensive, and there are latency issues, if you backhaul all the video to the cloud. Then it's also even more expensive if you bring the compute to the video on the device. You have to find some in-between where you're detecting something, not doing the ultimate inference, and maybe not even as much machine learning. It's a small model, or it's detecting something at the edge, that can then indicate which parts of the video to haul back.

Andrew Smith: Yeah, exactly.
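To make the staged-reduction idea concrete, here is a minimal sketch of a filter pipeline in which each stage either drops a frame or passes a reduced, annotated result along, so only the interesting exceptions keep traveling from camera to edge to cloud. The class and function names below are illustrative assumptions for this discussion, not OpenFilter's actual API:

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, Optional

@dataclass
class Frame:
    image: bytes                              # encoded pixels
    meta: dict = field(default_factory=dict)  # structured "subject" data gathered so far

class Filter:
    """One pipeline stage. Return None to drop a frame, or a (possibly annotated) Frame."""
    def process(self, frame: Frame) -> Optional[Frame]:
        raise NotImplementedError

class MotionGate(Filter):
    """Cheap on-camera stage: only pass frames whose motion score clears a threshold."""
    def __init__(self, threshold: float = 0.2):
        self.threshold = threshold

    def process(self, frame: Frame) -> Optional[Frame]:
        return frame if frame.meta.get("motion", 0.0) >= self.threshold else None

class PersonDetector(Filter):
    """Edge-box stage: stand-in for a small detector; annotate frames that contain a person."""
    def process(self, frame: Frame) -> Optional[Frame]:
        boxes = frame.meta.get("person_boxes", [])  # a real stage would call a model here
        if not boxes:
            return None
        frame.meta["alert"] = "person_present"
        return frame

def run_pipeline(stages: Iterable[Filter], frames: Iterable[Frame]) -> Iterator[Frame]:
    """Only frames that survive every stage keep going 'up' (camera -> edge -> cloud)."""
    for frame in frames:
        for stage in stages:
            result = stage.process(frame)
            if result is None:
                break
            frame = result
        else:
            yield frame
```

The same shape works whether the stages all run on one machine during development or are split across camera, edge box, and cloud in production.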
Retail is another one; let's talk about retail. Say we're worried about shoplifting, or employees stealing money, or something. You have a camera pointed near the POS, where the transaction happens. In that case, on the camera, let's say you're worried about cash getting stolen, you would look just for cash. When cash shows up, you send that to some edge device at the store that can look at what's actually happening with the cash. We found cash, so that's enough to know this is slightly risky, so you filter on that, and you're only sending individual frames, maybe not even video. Then you can look and see, okay, something is happening here. Maybe someone put cash in their pocket, I don't know. We decide this is risky for some reason, and maybe then it goes to the cloud. In the cloud, you have your other data, for example POS data, order data, footfall data, whatever other information, and you can join those together, do the rest of the processing, and put it into, I don't know, alerts in your ERP or something. ERP is probably not a good example for cash. That way, you have these practical filters. Another one is security on a job site. You detect that something showed up, and usually we want it to be a person. You don't want to send an alert every time something walks by the fence. A person shows up, and you say, "Okay, now I know there's a person there. I don't know what the person's doing." But that's enough to start doing the rest of the processing. That way, you run something on the camera. There's a lot going on in the edge space, people working on ways to run models there. Usually they're smaller; they're not going to be the really big models. There's a lot of different hardware out there, like TPUs. I was at the Embedded Vision Summit a few weeks ago, and there are a lot of people looking at different ways to run inference where the cameras are, or even on the cameras themselves, system-on-chips that run on the camera. You can do something there, but you're probably not going to run the whole big model. There are people who do demos of VLMs there, but it's not really practical. If you really want to know what's happening, you have to filter that data down. There's another thing, too. The most extreme example of this is sending every frame of a video to OpenAI's API, asking the AI, "What's going on here?" and asking it to give you [inaudible 00:12:10]. You could do that, but it would cost you $10,000 a day or something. At that point, you could hire an army of people to watch your cameras. But you might want to send some frames to something like a VLM or LLM to give you some information, maybe before you send it to a person. So you're finding all those different steps where you can filter and reduce. It's the MapReduce mindset from when I was at Google working on those things: how do we get to these points where we can reduce the data so that at the next point we're only looking at what's interesting?

Eric Anderson: It makes sense. Okay, you teed it up. I want to go a little bit, Andrew, into your background. I didn't realize until just a few minutes ago, before we started this video, that you and I overlapped at Google working on the same project. Tell us about your background at Google, to the extent it's interesting, and how you got into this space. Then I think you had some interesting points about the analogy between what you're doing now and what makes Google productive.

Andrew Smith: Yeah. I joined Google in, I want to say, 2013 maybe. I was there for six years.
The first project I worked on was Streaming Flume, which was the internal relative of streaming Dataflow. Flume was the internal tool, Dataflow was the external tool, and the concepts we had in Flume became Dataflow. I worked on Dataflow, and then I worked on two other things that informed OpenFilter as well. One of them was other parts of Google Cloud, doing data stewardship, really how the data gets handled; we'll talk about that a little bit, too. Then after that, I was on Amazon Devices doing data stuff again. I ended up in data, managing lots of data. One of the things I learned quickly at Google is how Google is able to make these big projects that scale. There are a bunch of different things they had, but one of the biggest was that everything was standardized around protobufs. Everything was stored as protobufs, all data was transmitted on the wire as protobufs. Everything was protobufs. APIs were protobufs: Stubby, I think it was called, was the API tool, and those were all protobufs. What ended up happening is that every tool could speak to every other tool. If you were using Dremel and then you were using Bigtable, there was always a joke on the Memegen site: "Oh, I take the protobufs and translate them to the other protobufs." That was the joke. The fact that everyone understood what that meant, it meant there was a lot of alignment across this huge company, all these geographies, on what the tools were doing and how they were speaking to each other. Then, I think, when I got into computer vision, there are some standards for some things. Of course, video always has codecs and stuff, so there are a lot of standards for how you talk about that. But actually, I feel like a lot of the state of the industry is people just reinventing the wheel and rolling their own. There are lots of different things out there, lots of different ways of building a pipeline. It depends on this edge-versus-cloud thing. If you're doing edge stuff, you use a completely different toolset, with its own set of tools and needs, and the way you build the application looks one way. Then if you're doing cloud stuff, it looks different. There's a big gap between those. A lot of the low-hanging fruit of cloud inference has been taken. And the very expensive, really wide computer vision problems, like self-driving cars, are able to get the resources to run. But for the rest of it, it's a lot of bespoke tooling. With OpenFilter, we're thinking about how we standardize the interface, the format, and the composition. It's not meant to replace all the tools that exist; it's meant to be a way for them to interoperate. You don't have to replace your entire toolchain, but you can have your toolchain with reusable components. Then when you have those reusable components, you can deploy them across these different environments. You can say, "Hey, I have this pipeline with these six steps in it." Maybe I crop an image, or I detect something and crop again, and then do some OCR, or whatever. I have these stages in my computer vision application. I'm able to describe that one way and then run it in different locations. The first thing I can run on the camera. The next step I can run on the edge. The last steps, the more expensive ones, I can run in the cloud.
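As a rough illustration of describing a pipeline once and mapping its stages to different locations, a sketch might look like the following. The stage names and placement keys are hypothetical, not OpenFilter's actual configuration format:

```python
from typing import List, Optional

# One declarative description of the pipeline; where each stage runs is just metadata.
PIPELINE = [
    {"name": "crop_region",   "runs_on": "camera"},  # cheap: crop to the region of interest
    {"name": "detect_object", "runs_on": "edge"},    # small model on the site's edge box
    {"name": "ocr_and_join",  "runs_on": "cloud"},   # expensive: OCR plus joins with other data
]

def stages_for(location: str, override: Optional[str] = None) -> List[str]:
    """Pick the stages to run at this location. override='laptop' runs everything
    locally, which is how the same pipeline could be developed and tested on one machine."""
    if override == "laptop":
        return [stage["name"] for stage in PIPELINE]
    return [stage["name"] for stage in PIPELINE if stage["runs_on"] == location]

print(stages_for("camera"))               # ['crop_region']
print(stages_for("edge"))                 # ['detect_object']
print(stages_for("cloud"))                # ['ocr_and_join']
print(stages_for("", override="laptop"))  # all three stages, for local development
```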
Then when I'm developing, I can develop it all and run it on my laptop. When I'm testing, I can test some of those things in my lab and some of them in the cloud. There's always testing you have to do, and all these device management things you have to do. But removing the peculiarities of all the different environments and standardizing it is conceptually a little like the toolset I had at Google, where everyone was always on the same thing. The other thing is that you've always got telemetry, and observability, and monitoring. That was a big deal; it wasn't optional, you always had those things available to you, and people understood what that was. I've found that for AI, data science, and machine learning engineers, those considerations are less pressing. But when you're building the solution, at some point you do want to deploy it out to the real world, and you do need those things. Another thing with a standard is that if we had a standard layer, like Borg or whatever, then whether you're running on the edge, or in the cloud, or on the camera, you're able to get observability, and telemetry, and pipeline integrity. Being able to actually diagnose, "Oh, this is the point between this camera and this edge device where the data started getting lost or corrupted, or the scores went bad." Or, "We added these new images to our training set, that produced this new model that we trained, we deployed it, and we saw the numbers go up or down." Having that whole monitoring and observability built into the toolset, so that across your chain of computer vision applications, the observability and data lifecycle are built in. Those are the two things that we're really trying to take with us. Our CEO was on Kubernetes when he was at Google. We're trying to take those two parts and bring them into this computer vision space.

Eric Anderson: Can I think of it, Andrew, almost like a framework where you give people these filter primitives, and some of them are specific to running, as you mentioned, on device, on edge, or in the cloud? You give them some development tooling where they can run those things on a local machine or in test environments. Then in addition, you give this monitoring and observability stuff on top. I guess it's not even just traditional monitoring and observability, availability and APM speed stuff, as much as it also is quality of model inference and data quality.

Andrew Smith: Yeah, that's right. The data observability in addition to uptime observability, which you always need, too. But yeah, data observability: when novel things happen, when data comes in that's out of distribution. Some of those techniques are part of our commercial offering. On the open source side, OpenFilter, no matter what, has lineage, so it can help you with tracking. We worked with Willy from OpenLineage. He really knows OpenLineage, how to use it, and how to structure your data for lineage so that you can see the different steps, particularly with these pipelines across all these different locations. And the full lifecycle, because you bring in data, and you train a model, and then you push that model back out. Having that full lifecycle that's not, "Oh, I trained a model and now I'm done," but actually, "I'm always training a model." Maybe not every second.
But new products show up, I have new customers, I have new work sites, so I have new and different things, so I always need to observe the performance. I need to find which samples perform poorly, not just in my test environment but in real life, so that I don't have to constantly toil over all this data. The cameras and the filters can help you find which data needs to be included in your training set.

Eric Anderson: You mentioned earlier that data scientists and machine learning engineers aren't as used to the last-mile production efforts. They're trained in the lab, they spend all their day in the lab. They're good at, "I can run this on a local machine and it works great." Then the need to go to production is where you get this tension of, do I run it in the cloud or do I run it on the device, or probably somewhere in between. How are they adjusting to the fact that they don't need just one model or one inference step, they need two or three? Is it the data scientists and ML engineers who are using OpenFilter to build that sequence? Or is this tooling for software engineers? Have you democratized inference development?

Andrew Smith: Yeah. We're trying to build a toolset that both can use, which is the difference, I think. The software engineer can include the computer vision in the rest of the application. Even with ChatGPT, a lot of the software is just software, and the actual AI is one part of it. There's always a part where you have to build the rest of the software. How do we meet both kinds of people where they are? For the data scientist or machine learning person who's creating the models, how do we keep the toolset they like, working on their laptop and so on, but also give them that observability when they're doing that work? Then for the software engineer, how do we give them the ability to run these things in all the configurations they need, without having to become experts in the vagaries of out-of-distribution data and, I don't know, IoU scores and stuff? It's giving both. We're trying to meet them in both places. We've actually found so far that the software engineers are the ones picking up OpenFilter. It's the folks who are like, "Well, yeah, I can make a POC that runs on my laptop. How do I take it further?" We've found that it's often software engineers who are like, "Okay, this is the kind of stuff I need." The software engineers do understand observability and stuff. Or maybe they just get yelled at by SREs. But anyway, somebody there is making them do it, so they really get it. We've found that that's really where the interest has been in the first few weeks since we launched OpenFilter.

Eric Anderson: I'm curious, Andrew, about the decision to go open source. There's a handful of computer vision vendors, and not all of them have published open source. I suppose both you and Kit come from prior lives where you spent a lot of time in open source, so maybe that's just a natural motion for you. Anything interesting around the decision to open source OpenFilter? That was fairly recent.

Andrew Smith: Yeah. Since I started working with Kit on Plainsight a year-and-something ago, we were always going to open source OpenFilter. That was from the start; we were always going to open source it.
The question for us was just when it was going to be ready, because we didn't want to drop something that we thought wasn't good enough, or that people would think didn't meet their quality bar, or that didn't seem like it really solved the right problem correctly. The fit was there and the need was there. We spent some time really getting it ready and fixing the problems that we saw in the space. Telemetry and observability in the data supply chain was one of them. The productivity of the experience of building filters is a second. The third is the ability to go after these different targets: edge versus cloud and so on. Once we had captured those three things, and we knew we had something that would work across them, and we had people, either working with us or our partners, building computer vision apps in less than a day with the whole thing and deploying them out, it was like, "Okay, this is good enough for people to give it a shot." For us, it was never a question about doing it. I guess there might be some other background. Some of it is just understanding that developers love open source. They like it because they like the ability to see how it works, to feel like they have control over any of the defects they find. If they find there's a problem in there, or something they need, they don't have to wait for someone else to deliver it. With a commercial product, you file a ticket or make a support request, and then some time later, through some process that's opaque to you, you get it. With open source, it's much more transparent. Then also, it's the only hope. If the goal is to make a standard, like, I don't know, the protobuf of computer vision data, the only way to do that is open source. As we saw, I worked at Microsoft before Google, and the closed source stuff that Steve Ballmer was obsessed with all went away; even they embraced open source eventually. The only way to really get those standards is this. I think, in the end, the interoperability is the important thing: trying to make something standard and composable that works with people's existing toolchains. A lot of the time, training pipelines live in notebooks, inference lives in hand-tuned scripts, and deployment is a separate thing. How do we actually connect those pieces? The reality is those notebooks are open source. OpenCV, the tooling for a lot of the logic that goes along with the model, all of that is usually written on open source frameworks, and deployment is Kubernetes and Docker; those are also mostly open source. It's about meeting people where they live and the things they like, and getting those developers. Well, Kubernetes is a pretty successful project, and some of the other stuff Kit has worked on has gone well, too. He knows developers pretty well and what they like.

Eric Anderson: Imagine you're somebody listening to this discussion and trying to figure out, "Do I have a need for OpenFilter and computer vision?" What's an ideal use case? What stories do you hear, Andrew, where you think, "Oh, this is a perfect fit for OpenFilter," so that folks can pattern match where they could use it?

Andrew Smith: Yeah. I feel like OpenFilter is probably helpful for lots of computer vision use cases, but the one where it's most pressing is if you have cameras in different environments but similar problems. If that's the case, then OpenFilter's composability is going to be super useful to you.
For example, let's say you're McDonald's. I don't mean to keep going after McDonald's. Some of your stores were built at one time and have cameras from one era; others have different cameras. You have these disparate environments with different hardware. Being able to build something that can run across all those different environments, where your algorithm is going to be the same. The problem you're solving is the same, but the environment is a little different each time: the connectivity is different, or the camera is different, or the edge hardware is different. If you want to gain confidence in your solution across these different environments and be able to monitor them, then yeah, that's where I'd say OpenFilter is super helpful. In reality, if you just want to be productive building computer vision apps, OpenFilter is good for that, too. We've had plenty of people tell us they got something up and running in less than half a day. Our goal was to get it so that you could get something into production in a single day, and we have people telling us they've achieved that.

Eric Anderson: That's awesome. It sounds like the project's quite new; this is just a week or two into the release. What are the plans for OpenFilter as an open source project and community development? Where are places folks can get involved?

Andrew Smith: The biggest need for folks who want to help out and contribute is tackling more edge targets. There's only so much we can do within the purview of the edge scenarios and hardware types that we come across at Plainsight and with our customers and partners. If other folks have different chipsets, or runtimes, or hardware on the edge, it would be awesome to see more places to run OpenFilter, more targets for it. That's the first one. The other one is the downstream data destinations. We take the data we have, what we call subject data, the output of the inference and the computer vision logic. For us, that goes to Kafka, and databases, and SQL, and things like that. But what other data destinations do people have? That's another big opportunity on our roadmap: adding different destinations for the output once you've got structured data. It's no longer video; it's usually JSON or whatever. Where do you put that? Having more of both would be tremendous: where we run, and where the data goes.

Eric Anderson: Amazing. Folks can just reach out to you, and GitHub would be the place?

Andrew Smith: Yeah, openfilter.io. The links to the GitHub and the documentation are there. For anything else, they can reach out to me, file tickets, or ask questions through email.

Eric Anderson: This has been great, Andrew. I think when I first ran into Kit ... Well, we've known each other for a long time, but when he first was telling me about OpenFilter and Plainsight, you see all these demos online of object detection and other things and you think, "I thought this was a solved problem." Then we dug into it, and in fact computer vision is so nascent in its broad deployments that it works in the lab but isn't really being used anywhere. I'm excited, because it seems like maybe this is the thing that's held it back.
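For contributors thinking about the downstream-destination work Andrew describes above, one minimal sketch of a sink that forwards JSON subject data to Kafka might look like this. The JsonKafkaSink class is a hypothetical illustration, not OpenFilter's actual Kafka integration; it uses the kafka-python client:

```python
import json
from kafka import KafkaProducer  # pip install kafka-python

class JsonKafkaSink:
    """Terminal pipeline stage: publish each frame's structured subject data as JSON."""
    def __init__(self, topic: str, bootstrap_servers: str = "localhost:9092"):
        self.topic = topic
        self.producer = KafkaProducer(
            bootstrap_servers=bootstrap_servers,
            value_serializer=lambda v: json.dumps(v).encode("utf-8"),
        )

    def process(self, subject_data: dict) -> None:
        # subject_data is the structured output of earlier filters, e.g.
        # {"camera": "pos-3", "event": "cash_visible", "confidence": 0.91}
        self.producer.send(self.topic, value=subject_data)

    def close(self) -> None:
        self.producer.flush()
        self.producer.close()

# Usage sketch (requires a reachable Kafka broker):
# sink = JsonKafkaSink(topic="vision.events")
# sink.process({"camera": "pos-3", "event": "cash_visible", "confidence": 0.91})
# sink.close()
```

A sink for a database or a webhook would follow the same shape: take the structured subject data a pipeline emits and hand it to whatever system sits downstream.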
Andrew Smith: Yeah. I definitely think some of those things, the observability and the composability, are actually a lot of the obstacles that people run into. Hopefully we're able to make some difference. If you have a big problem like a self-driving car with a lot of money behind it, you can get computer vision working. But how do we get it for all the rest of these cases and everyone who isn't a $100-billion-in-revenue kind of company?

Eric Anderson: Totally. Thanks for your time today.

Andrew Smith: Yeah, thank you.

Eric Anderson: You can subscribe to the podcast and check out our community Slack and newsletter at contributor.fyi. If you liked the show, please leave a rating and review on Apple Podcasts, Spotify, or wherever you get your podcasts. Until next time, I'm Eric Anderson and this has been Contributor.