John: The story was interoperability is broken. And I said, what do you mean interoperability is broken? We fixed that with Kepware. That's what Kepware was all about. And as I dug into it, what I realized was interoperability for Industry 4.0 is different than interoperability for Industry 3.0. Announcer: You're listening to Augmented Ops, where manufacturing meets innovation. We highlight the transformative ideas and technologies shaping the front lines of operations, helping you stay ahead of the curve in the rapidly evolving world of industrial tech. Your host for this episode is Erik Marandet, Chief Business Officer of Tulip, the frontline operations platform. Erik: Hey everybody, welcome back to Augmented Ops. We've got with us today John Harrington. John is the Chief Product Officer over at HighByte. And for those who don't know HighByte, it is an interesting and, I think, going to be a very significant piece of infrastructure that everybody in the space needs to be aware of. So we're going to spend some time today talking a little bit about HighByte and about John's background specifically. John, welcome to the show. John: Thanks, Erik. Great to talk to you. Great to see you again. And thanks for inviting me. Erik: Yeah. My pleasure. John, you've had a long career in the industrial software space, obviously currently with HighByte, but most recently in a leading role at PTC with Kepware. And Kepware, I think, is probably still one of the most ubiquitous solutions that you see on shop floors today. Maybe you can get us started. Tell me a little bit about what experiences you had in the past that made you realize HighByte is a product that needs to exist, and something you wanted to be a part of co-founding. John: Yeah, so my schooling was in mechanical engineering, but the majority of my career has been in software companies providing solutions for manufacturing.
Like you said, the last company I worked for was Kepware. I worked for them for eight years. And a couple of years after PTC had acquired them, I decided it was time for me to do something new. PTC is a great company. They've got a lot of great people, a lot of great products, but it's a much larger company than Kepware, and it was based down in Boston, and I didn't want to do the commute forever. So I left PTC and started looking around, networking with people throughout the industry: other software vendors, end users, systems integrators. And really talking about, you know, what are the challenges, what's not working, what's kind of broken. And if you remember, this is 2018; big data, IoT, and analytics were starting to come on, and the story was interoperability is broken. And I said, what do you mean interoperability is broken? We fixed that with Kepware. That's what Kepware was all about. And as I dug into it, what I realized was interoperability for Industry 4.0 is different than interoperability for Industry 3.0. Industry 3.0 was all about collecting factory floor device data, overcoming all the different protocols, aggregating it, and exposing that data to the SCADA system for the OT team to use. And interoperability for Industry 4.0 is about leveraging that data across the enterprise. Leveraging cloud and data lakes and data warehouses to drive business decisions. So when you start to dissect what the problems are, they're very different. It's about curating data for the target, for the consumer of the data, not figuring out the source system. That's kind of solved in Industry 3.0, all those device protocols that Kepware has.
But now it's about curating data for the target systems in the way that they communicate, so that we can rapidly onboard data to all these different use cases, whether it's quality, or predictive maintenance, or operator systems like Tulip has, or sustainability, you name it. There are all these new use cases. Erik: John, I'm wondering if you could, for those who may be less familiar with this space, kind of orient us in a very practical example. Because we're talking about interoperability between systems. What systems? We're talking about target systems and SCADA layers. We went kind of right into it, technically. Let's talk about a specific problem that somebody's facing. Let's talk about data coming from a number of different, like, what are the sensors? What's the problem that they're facing? And how does this all fit together? John: Yeah, so let's talk about the problems that we used to face. I have a brewery. Portland, for anyone who's listening to the podcast, is very big on breweries. We've got some excellent New England IPA breweries up here. Come on up and visit. Erik: And for those who haven't gathered, John and the HighByte crew all live in Portland. John: Portland, Maine. Erik: Portland, Maine. An important point to differentiate. Very different than Portland, Oregon. John: Yes. So we have our brewery. In that brewery, we have tanks. We have a pump that's feeding the tank. And Industry 3.0 is about controlling that pump. Are we pumping fluid into the tank? Is the tank full? Do we need to release the valve to pump the fluid out of the tank after it's fermented, or after it's, you know, done whatever needed to happen? So that was control. Industry 3.0 was all about process control. How do we keep the process flowing so that we're getting optimal production out of the systems.
Erik: So Kepware basically says, I don't care what pump it is, I can talk to that pump, I can get the data off of that pump and this infrastructure, and I can also tell the pump what to do. That's a problem Kepware solves. John: So Kepware gives you the pressure, the state, the power consumption, and a few other things so that you can control the process. But now comes along the maintenance engineer, and he wants to do predictive maintenance on that pump and the 50 other pumps that he's got in the brewery. And he wants to say, well, that data that you have in your SCADA is nice, but I need data contextualized a different way. I need to know not only what's the pressure and how many hours does it have on it. Now, in process control, all you need to know is, is it on or is it off? Well, now I want to calculate how many hours am I leaving it on. What's the pressure? When it's under load, did it go over a certain peak? In which case, maybe I blew out a seal on it, or it got too hot, or things like that. So the maintenance person wants to know that same information, but contextualized a little differently. And then they want to combine it with their asset maintenance system, so they know, well, how old is that pump, and who's the manufacturer? So that when I look at all my pumps, I can look for different things on different pumps based on the manufacturer, or when I last serviced them, or something. So they want the same data, but they want it structured differently, and maybe combined with some other system data. And then take that a step further. You have a quality engineer who says, well, that's great, but I actually want to also tag that data with the batch that we're creating, the product that we're creating, and I want to know how long it was over a certain temperature to make sure that it cured properly. You know, did we make sure that when it was in that tank, it was cooled for a certain amount of time?
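The maintenance view John describes, turning raw on/off and pressure readings into run hours and a peak-pressure flag, can be sketched in a few lines. This is only an illustration; the tag shapes, the sample format, and the 8.0 bar threshold are hypothetical and not from HighByte or the episode.

```python
# Sketch: deriving maintenance-oriented context from raw pump telemetry.
# Sample format and the pressure threshold are illustrative assumptions.

def summarize_pump(samples, peak_pressure_limit=8.0):
    """samples: list of (seconds_since_start, is_on, pressure_bar) tuples,
    ordered by time. Returns derived maintenance metrics."""
    run_seconds = 0
    peak_pressure = 0.0
    over_limit = False
    for i in range(1, len(samples)):
        t_prev, on_prev, p_prev = samples[i - 1]
        t_curr, _, _ = samples[i]
        if on_prev:  # accumulate only intervals where the pump was running
            run_seconds += t_curr - t_prev
            peak_pressure = max(peak_pressure, p_prev)
            if p_prev > peak_pressure_limit:
                over_limit = True
    return {
        "run_hours": run_seconds / 3600,
        "peak_pressure_bar": peak_pressure,
        "exceeded_pressure_limit": over_limit,
    }

# Two hours on (the second hour over the pressure limit), then off
samples = [(0, True, 5.2), (3600, True, 8.4), (7200, False, 0.0), (10800, False, 0.0)]
print(summarize_pump(samples))
```

The point of the sketch is that the raw tags stay untouched; the maintenance-specific meaning (hours, peaks, threshold breaches) is computed in a layer on top.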
You know, so there's lots of data there, but for each user of the data, we need to pull it together differently and describe it differently, and that's really hard if you don't know the context on the shop floor of how all these different pieces of equipment are connected. So that's really where HighByte comes into play: to be able to pull data from an MES system, a CMMS system, pull it from the telemetry data, but reconstruct it in different ways. Maybe one needs to know the max, another needs to know how many hours it was on, and another person just wants to know that it was on. But it's all the same raw data, so the ability to do that sort of thing is really important. Erik: So, at the risk of being overly simplistic here, I think of HighByte as this layer of data ops middleware. I connect it to all of my sensors and raw data streams from various sources or source systems, and I have this layer where I can say, okay, I want this attribute, I want that attribute, I want it shaped a certain way. Maybe it's outputting at a hundred hertz or a thousand hertz, and I actually only care about that once every five seconds or something like that, so I can downsample it. I can basically package it up and say, great, this specific shape of the data I want to send at some frequency to some other place, whether that be an analytics platform, or a CMMS asset management and maintenance platform, or, you know, Tulip for that matter. Am I thinking about it in the correct way? John: Yes. Or a data lake, or lots of different targets for it. Erik: Data lake. Let's talk about data lake for a second, because I hear data lake, I hear cloud, I hear more and more lakehouse, data warehouse. I like lakehouse the most. I think we should all just go there, spend the summers. John: In New England. Erik: At the minimum, right? Like if you've got one of those, that's where you want to be.
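The downsampling step Erik describes, reducing a high-rate stream to one value per window before it is forwarded to a target, can be sketched like this. The window size and the choice of aggregation (max here) are illustrative assumptions, not HighByte specifics.

```python
# Sketch: downsample a high-rate telemetry stream to one aggregated
# value per time window before forwarding it to a target system.

def downsample(stream, window_s, agg=max):
    """stream: iterable of (timestamp_s, value) pairs.
    Returns one aggregated value per window of window_s seconds."""
    buckets = {}
    for t, v in stream:
        # group each sample into its window, keyed by window index
        buckets.setdefault(int(t // window_s), []).append(v)
    return [(w * window_s, agg(vals)) for w, vals in sorted(buckets.items())]

# 1 Hz readings reduced to one (max) value every 5 seconds
stream = [(t, t) for t in range(10)]
print(downsample(stream, 5))  # [(0, 4), (5, 9)]
```

Swapping `agg` for `sum`, `min`, or a mean lets the same pipeline shape the data differently per consumer, which is the middleware idea in miniature.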
But look, I hear more and more unified namespace. Previously, I heard a lot more of, you know, data lake; I think you hear that less frequently now. And I'm curious about this layer of middleware. Can you orient me? Let's take a step back and talk about this from an architecture perspective. Where does this sit relative to my data lake or unified namespace? Help me understand the context around HighByte as it relates to some of these architecture topics. John: So, HighByte curates and moves data. It's middleware, like you said, and we talk about that in relation to this concept called industrial data ops, or data ops, which really came out of the IT world for the curation of data for analytics and visualization. And so it's kind of that operations layer that enables you to get data where it needs to be. And the recognition is that when all the data was going to one target, or you had very few sources and targets, you didn't have to have an operations layer. But now that we have lots of targets, lots of sources, and we have data going here, there, and everywhere, data going to the cloud, data coming out of the cloud, then we need an operations layer. You're essentially creating a data network in your organization on top of the physical network. So how do we know what's going where? How do we troubleshoot when things aren't good and when things are broken? That's what data ops is. Unified namespace is a concept that has grown in popularity dramatically over the last few years, around how do I standardize and unify that data and put it somewhere where it's accessible. If you think about the term unified namespace: a namespace is just the structure of a system that the data is in. In a SQL database, the namespace is the tables and columns. But what we're doing now is we're saying, well, let's put all the data somewhere and let's structure it in a way that is semantically defined.
It's easy to understand data that's related to itself, or is close to itself, so let's organize it. And we're often organizing it by the ISA-95 hierarchy, where we've got site, area, cell, asset. So we're organizing it, we're structuring it, and that way anyone who needs access to the data can go get it. Now, the most common design pattern for UNS today is the MQTT broker, because MQTT makes it really easy to organize the data sets and then to put data sets into it. So it fills those two roles really easily, and it's also very scalable. Erik: How is that different than, you mentioned before, SQL database tables and columns? That I get; it's very clear. But you said MQTT broker, and I'm subscribing to topics. How is that different than more of this rigid, hierarchical ISA-95 approach to organizing your data? John: Well, in terms of organizing it, I would say it's the same, but the organization of data, the hierarchy, is only one small piece of ISA-95. ISA-95 defines an application hierarchy: ERP, MES, SCADA, PLC, sensor. It defines how the data moves. It defines the data structures at the MES layer. So it goes much, much deeper than just that hierarchy, just the organization, and that's where I think a lot of people struggle with it. But we just use the hierarchy as an organizational mechanism, because it's really the way that plants are organized today. You've got a site. Within the site, you generally have multiple areas. Within an area, you may have multiple lines or work cells. Within a work cell, you have multiple assets. So it's a very logical approach that most anyone who's seen a factory can kind of understand. There's nothing magic about it. Erik: But if I were to compare two of the most common protocols that I often see, I'll contrast OPC UA with MQTT. Maybe you can give me your two cents on these two different protocols: pros, cons, differentiators.
John: That's a long conversation, Erik. People think that difference is the crux of interoperability, but technology can solve the transport, whether it's OPC or MQTT. With OPC, you have a client that subscribes to the server, and then the server tells the client whenever things change, and the server typically polls the devices and just kind of moves the data up. With MQTT, the clients publish the data, and then other clients subscribe to the data, and you end up with these publish-and-subscribe, event-driven data flows. At the end of the day, they're kind of similar, but different. They have different structures, different security, different ways of creating the data. But I think they both work. OPC UA works very well when it's dealing with relatively high speed and massive volumes of data. MQTT also works very well with large volumes of data, where you can kind of subscribe to what you want. The software layer can really take care of the protocols. Where we see the interoperability challenge coming in is the contextualization of the data. You know, people kind of get hung up on, is it OPC or is it MQTT, and we say, look, many software packages can consume either. So it kind of makes it not all that important. You need access to data, so you need one of them, but take whichever you want. The challenge is how do we contextualize that data and structure it for its use? Because ultimately we need to be able to do that, because industrial data is generally not well defined. It's not well standardized. One pump to the next could look very different. It doesn't have a well-defined data dictionary like you may find in a SQL database. You have these cryptic tag definitions, and then someone's got a spreadsheet somewhere that kind of defines what they all are. So we need to structure this data as we move it around. We need to structure it for the intended use.
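The contextualization John keeps returning to, mapping cryptic source tags onto a standardized model, normalizing units, and organizing the result under an ISA-95-style path, can be sketched as a small mapping step. Every tag name, model field, and hierarchy value below is a hypothetical example; this is not HighByte's actual schema or API.

```python
# Sketch: map cryptic PLC tags onto a standardized model, normalize
# units, and address the result by a site/area/line/asset path.
# All names and values are hypothetical illustrations.

def contextualize(raw_tags, mapping, hierarchy):
    """Apply a field mapping (with optional unit conversions) to raw tag
    values, and build an ISA-95-style topic path for the result."""
    payload = {}
    for field, (tag, convert) in mapping.items():
        value = raw_tags[tag]
        payload[field] = convert(value) if convert else value
    topic = "/".join(hierarchy)  # e.g. site/area/line/asset
    return topic, payload

mapping = {
    # standardized field: (cryptic source tag, optional conversion)
    "pressure_psi": ("PMP01_PRS_V2", None),
    "temperature_f": ("PMP01_TMP_C", lambda c: c * 9 / 5 + 32),  # Celsius source
    "running": ("PMP01_RUN_ST", lambda v: v == 1),
}
raw = {"PMP01_PRS_V2": 42.0, "PMP01_TMP_C": 20.0, "PMP01_RUN_ST": 1}
topic, payload = contextualize(raw, mapping, ["portland", "brewhouse", "line1", "pump01"])
print(topic, payload)
```

The equipment keeps its cryptic tags; the standardization lives entirely in the mapping layer, which is the "layer on top" John describes later, and the topic path is what a UNS broker would organize by.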
And that's where we really see the challenges of interoperability, and where we see the strength of a data ops solution: that contextualization and structure and delivery. Erik: Interesting. So if I were to go back a couple of years, this is when you were hearing lots of conversation around, you know, data is the new oil. And I think for a time, what I saw was people saying, look, capture all the data. It doesn't matter how, it doesn't matter where, just capture it and go put it somewhere. Put it in Splunk. Put it in your historian. Put it in some S3 bucket. Just make sure you capture it. And it doesn't matter how it's contextualized, it doesn't matter what the tags are, just capture, for goodness sakes, because every second you're not capturing this data is future value and efficiency you're just leaving on the floor. And don't worry about it, because AI is going to come along and solve the problem for you, right? I feel like that's what the cool kids were saying for a couple of years, probably about the 2018, 2019, maybe even into 2020 timeframe. And then I started hearing, like, data rich, information poor, contextualization. And you've mentioned this term contextualization so many times, and I'm just curious: why is contextualization so important? And if you're at the beginning of adopting this strategy and thinking, how am I architecting my unified namespace, if you will, or implementing a data ops layer, what are the things you need to know on the front end to make sure that you don't end up with just terabytes or petabytes of unstructured, random tags? That you're paying for. John: That you're paying for in the cloud. And there are still a lot of people doing that. What we find is, with industrial data, there are a couple of challenges. Number one, like I said, when you're pulling it from the devices, the PLCs, which is where a lot of the data comes from.
It's not so much encrypted as cryptically defined. You're assigning names to it such that anyone outside of the controls engineer looking at it would have no idea: what piece of equipment is this on? What's it talking about? What are we measuring? There's a whole string of abbreviations for a single tag. Erik: This is like the business equivalent of copy-of-copy-of-v-final of my Excel spreadsheet, you know, dot two, right? It's like, yeah, that's the one I use. That's the good one, right? John: Exactly. So that's one challenge. The other challenge is every device is unique, because you bought it from someone different, or you bought it a year later than the last one, and every piece of equipment has its own data structure. The way that I often describe this to people is: imagine if you were running a sales team and everyone had their own CRM system. You would say that's crazy; everyone would define a contact slightly differently, whether you have first name and last name together or separate, how you describe their address, all of that. Everyone is slightly different. You would say, well, why don't we just standardize that and have one way of defining it? Well, when you're buying equipment from all over the world to put into your factory, they're all slightly different. There are no standards today. So, as a result, you have to construct your own standard, and you have to do it on top of the equipment. You leave the equipment the way it is, because it's very expensive to reprogram all of that, but you can put a layer on top of it that standardizes it. So, first we have to define what the data pieces are, then we want to standardize them, and then we need to define how they interrelate. And every factory is different; in fact, every line is often different. You know, what's the valve upstream, downstream?
How much volume is in the tank? When I'm looking at that pump, I want to know all that information, and how those interrelate is critical when you're looking at the data. So we have this interrelationship that is very unique per cell, per line, per site. And then we have a need to look across multiple sites to try and get these roll-ups. So as a result, industrial data is challenging, and we need to figure out how to link it all together. And the person that can do that, and this is the key, is not the data scientist who works in the cloud and is using that petabyte of data that you stored. It's the controls engineer who's at the site. And so when we design a software product to help do this, we need to design it with that persona in mind. That's the person who needs to very quickly, very easily say: this, this, this, this, and this all go together, now let me send it up there. And oh, that's actually measured in Celsius, because we bought that equipment from a European vendor, so I need to do a translation, because our standard is Fahrenheit. You know, do various transformations of data and whatnot. So you need to design something for that persona, because the domain expert of the source systems is the only one who can do that contextualization. That's where it kind of falls over. Erik: The data scientists might be able to tell you that tag XYZ correlates 99.9 percent with tag ZYX, but they don't know what the hell either of those tags mean, or that they're just two floats that are, you know, moving in parallel. It's kind of like baseball stats, right? You can say, hey, every time the pitcher tied his left shoe and then his right shoe, and they were playing at Fenway against this team, they always win. But you wouldn't really bet on that, right? It might be true, but I wouldn't bet on that.
John: Well, it's like saying you don't want the data scientist to spend three months and come back and tell you that when the power's off, the pressure is at zero. Erik: Right, exactly. With 100 percent certainty. We all know that. But these are the kinds of things where, if you don't have the data appropriately contextualized, you will invest a ton of resources only to say, hey, look, there's a 100 percent correlation: every time this happens, five seconds later this happens. I've solved your predictive problem. And it's like, well, yeah, the power went off and pressure dropped to zero, and it took eight minutes to do it. Right. I mean, we're having fun, I think, but it's fairly intuitive. What I'm also hearing you say, though, that's perhaps less intuitive, and I think opinions are somewhat divided on this, I hear both: you're telling me that the information needs to be documented, accessible, and organized in a way that anybody can consume it and understand the context. That's the contextualization piece. But what you're not saying is, I need to know the problem that I want to solve before I start capturing the data. And I think this is important, because I hear others saying, well, first, what's the business problem you want to solve? Then you go structure the data. You're saying, I guess, you're not inferring causality either way, but you... John: Well, I am a little bit, because the relationships and the context, again, are in the eyes of the consumer. And this is where things get a little hairy. Like I said at the beginning, the maintenance person would contextualize that telemetry data very differently than the quality person. Erik: Yeah. John: So I need to at least know: am I solving a maintenance problem? Am I solving a productivity problem? Am I solving a quality problem? Am I solving a sustainability problem? But then the specifics of those, we don't need to know. We can just collect large amounts of data.
We have it contextualized for roughly that problem. But the reason we want to know that problem is, if you're solving a line monitoring problem where the executive wants to have access to that, you may only have to update the data once every 10 minutes. You may have a maintenance problem where you need data once every millisecond. And you don't want to collect that once-every-10-minutes data at the millisecond rate just because you need everything. And this is where we do hear some people say, I need everything. And what I say to people is, look, there is no longer an everything. There's so much data available, at such a speedy rate, that you need to go a level deeper. You need to at least identify the high-level use case. And that'll define how we contextualize the data, and it'll define the frequency and where the data is going. And then we can start moving the data. And we can put a lot more data than we think we need in there, because we're performing analytics and whatnot, and we don't really know. But we need to at least have some idea of what it is that we're solving. I often think of the factory as having three key constituents. There's a process, which is what most people think of, which we've been monitoring for a long time. There are assets, like for predictive asset maintenance and whatnot. And then there are products that are produced. And if you think of traceability problems for vehicles or for pharmaceuticals or whatnot, those are product problems where you say, I need to track everything back to a batch, or I need to track everything back to a serial number. So if you can identify it at least down to that level, then we can solve a lot of problems off of each one of those key models. And when it comes to humans and some of the stuff that you guys are doing with Tulip, the human is a unique type of asset, if you will.
It's kind of collecting data on that as well; and, you know, you guys collect a lot of data on the products as they're being produced. Erik: Yeah, for sure. Key use cases for us. The way I sometimes describe it is, at the end of the day, we want to make data-driven decisions, but data is not making decisions. At the end of the day, this data is going to go to some person somewhere who's going to look at it. And we also recognize that these are distributed systems. So who needs this information to be able to make the right decision at the right time? And that's highly context dependent: based on vertical, based on use case, based on persona, based on product life cycle, so on and so forth. We've been talking a lot about interoperability, but we've been talking about it largely from a protocol perspective. And this assumes something that I think is worth calling out explicitly. This assumes that system A has permission to talk to system B, right? And I think we're coming from an era in which, and I don't want to name names or anything like this, but I think we're coming from an era of this walled-garden approach: no, no, you buy full stack from software vendor A, B, or C. You pick your choice; there are many to choose from. It was almost a competitive advantage, saying, no, no, you're going to buy everything from us, and we're going to solve all your data problems. Never mind whether or not we can actually talk to the different products in our portfolio; that's a separate matter, we're working on it. But I think we're coming out of this era and into this era of an open ecosystem, and HighByte, certainly Tulip, and I could mention many others, are, I think, a big part of what's moving the industry in this way.
Everything we've talked about so far, I guess, assumes the existence of this open ecosystem that can give you my data, or consume your data for that matter. And that's not always been the case. John: No, in fact, some systems are still not as open as we'd like them to be. And typically what we find is that the larger vendors are still trying to maintain some amount of that walled garden. And sometimes it's for security reasons, but we are seeing much more opening up, and we're kind of helping that along, I think. Tulip is, HighByte is, a lot of the cloud vendors are. A lot of the customers are just demanding that from the solutions. With every new change comes responsibility as well. And the two biggest responsibilities are, number one, security, and, number two, governance. So people really need to think about that. And actually, you know, I think one of the key reasons that people are using the cloud is because of that: if they can land all the data in the cloud, whether it's Snowflake or S3 or Azure Data Fabric or whatnot, then the IT team has it in an environment that they can work on, and they can then provide apps on top of that for the various consumers. And it just allows them to have a much more restricted environment that they're working with. But there's definitely a governance piece to it, there's a security piece to it, and at scale we need to put this data somewhere. And that's where we're seeing a lot of talk and a lot of activity around the cloud vendors and Snowflake. Erik: Interesting. A couple of questions I have here. First, say I'm someone listening to this for the first time. I find these concepts interesting; maybe I'm getting started in my career. And by the way, most folks in industry, I think it's safe to say, are still in this, you know, we'll call it an Industry 3.0 world. I'm still an ISA-95 adherent. I joined my new job, and my boss told me to put my supercomputer away, to pick up a pen and paper, to go solve some complex data problems.
Like, this is my reality. But I'm an engineer. I've been trained. I know that there's a better way. What advice do you have for folks who are hearing this and just getting started on this journey, to make sure that they're taking the right first steps? John: You know, I think there's a lot of information available. Certainly the HighByte website, www.highbyte.com. We also have a lot of videos on YouTube, if people are looking for that type of learning. LinkedIn: our industry is extremely active on LinkedIn, and I'd recommend people go there and just start following companies that they know they're dealing with. There's a Discord server that the Industry 4.0 community puts up. There are lots of ways of learning about this stuff. I would just start, you know, going to the Tulip website and learning about it. And so there's a lot of learning, a lot of piecing through, understanding where you're at and where you want to go. And the other advice I would give is: you've got to start. Just find a simple problem that you want to solve, go and solve it, and start working with the data. Start working with the systems, because you're going to learn a lot. It's going to be iterative whether you like it or not. So you might as well start small, start working with things, don't be afraid to fail, and just start working with data. Pretty soon you'll start generating a lot of value. Erik: All right, John, last question here. Let's say that I'm somebody listening to this and I'm at the opposite end of the spectrum. I've been doing this for 20 years, I'm locked in to vendors A, B, and C, and I'm thinking about migrating to this new sort of open architecture approach. What advice do you have for that persona? John: Ultimately, I would say the same thing: start small and start doing it to see the experience. You know, what we hear from our customers is that the biggest benefit is the speed at which they can operate.
When they look at analytics projects, or they look at visualization projects and implementing software, the single biggest challenge today is the integration piece. And yet, if they can use the proper technology, what we hear from our customers is: compared to the old ways we did it, we're saving 10x. We're able to do more with the same size team. I have people who say, we spent six months trying to implement this by writing Python and using various code libraries and whatnot; we started using Intelligence Hub, the HighByte software, and I was able to do it in two weeks. Erik: Yeah. John: Or people who say, you know, I found a project, but in order to scale that out, it would have taken me a year to go across every single work cell in my factory. Instead, by using some tools off the shelf, like HighByte, I was able to do it in a month. Erik: I've got to say, we were recently talking with a customer, and they said, hey, here's what we're trying to effect. They had been working on this project for two years. We were setting the meeting up on Monday; the meeting was on Wednesday. We had our solutions engineer show up on Wednesday with the problem solved and a demo in hand of the project they had been working on for two years. Literally, it took like six hours, and he did it the night before. That's great. But the point is: technology has changed, the world has changed, the landscape has changed. You have different tools available that enable you to spend and invest your time in different ways.
And John, I think you said it well: get started, see it for yourself, prove the value for yourself. Because whether it's John or me, we don't know what this necessarily translates to and how it would be applied in your specific context. But the world has changed, and chances are good that there are tools out there that weren't there a couple of years ago, or five or ten years ago, that can really move the needle for you. John, I appreciate you joining us on the show today. Super interesting conversation. I sure learned a ton. Thanks for the time. John: No problem. What a great conversation. Happy to do it anytime. Announcer: Thank you for listening to the Augmented Ops podcast from Tulip Interfaces. We hope you found this week's episode informative and inspiring. You can find the show on LinkedIn and YouTube, or at tulip.co/podcast. If you enjoyed this episode, please leave us a rating or review on iTunes or wherever you listen to your podcasts. Until next time!