The following is a rough transcript which has not been revised by High Signal or the guest. Please check with us before using any quotations from this transcript. Thank you. === Ari: [00:00:00] When I worked with the Cubs and we started implementing this, only 3 percent of all balls in play across baseball incorporated this defensive movement of players. Only 3 percent. And a few years ago, it went up to over 50 percent of all balls in play. So from 3 percent to over 50 percent, and it was such a change that baseball had to change the rules to ban this strategy. It was all data. It wasn't getting different players. It wasn't doing anything but using data to stand somewhere else before a play happened. So that is so compelling and inspirational for every industry. Hugo: That's Ari Kaplan, Databricks Global Head of Evangelism, and a longtime leader in data and AI. His work in sports analytics helped inspire Moneyball, and Ari built the analytics departments for the Cubs, Dodgers, and [00:01:00] Orioles. He's also co-author of The Data Intelligence Platform for Dummies, and has advised organizations from Oracle to McLaren Formula One on bringing AI models to production. Today, we're tackling a big question: what really matters more, artificial intelligence or data intelligence? We get into the AI hype cycle, automation, what businesses still get wrong about AI adoption, and how companies can actually use their data to build a competitive edge. And we also talk about that one time Ari met Travis Kelce. Before we dive in, High Signal is brought to you by Delphina. Stay up to date by checking out Delphina.ai and our High Signal podcast page, which I'll include in the show notes. Subscribe to Delphina's newsletter to get the latest updates and insights straight to your inbox. Let's jump in. Hey there, Ari, and welcome to the show. Ari: Hey, nice to be here. Hugo: Such a pleasure to have you.
And I'm super excited to dive into all types of things, [00:02:00] from the current generative AI revolution and transformation we're in the middle of, to how it relates to broader ideas you have around data intelligence, along with all the work you've done in sports analytics and automation. There's so much to cover, but before we jump in, I thought maybe you could tell our audience a bit about what you're up to and your background. Ari: Great. Welcome, audience. So glad you're spending your time with us. I'm Ari Kaplan, and my title is Head of Technical Evangelism at Databricks, for pretty much the whole world. An evangelist is like an advocate: someone who travels, talks to companies, hears what they're doing, where they are in their journey of everything data and AI, what challenges they're having, but also what successes they're having. And the more I can share the pulse that we have, the better. So I hope we'll spend some time doing that. But Databricks itself, for the viewers who aren't aware, [00:03:00] is one of the leading, arguably the leading, data and AI companies. If you've heard of Spark, that's really where we started over a decade ago: taking Hadoop and distributed data and making it much easier and more scalable. And importantly, we open sourced it; that's a whole other topic. But then we innovated, and the whole market innovated. One of the things I love about being at Databricks is that there are all these different paradigms or epochs of data and AI technology. Big data was one; that's where we got started. And then lakehouse technology: being able to have structured and unstructured data in one environment. And now the phrase I love hearing, which is where the world is at now: data intelligence platforms. How can companies get more insights intelligently out of their own data, super easily? So Databricks is the software that makes it all [00:04:00] work much less expensively, at much higher scale.
I used to be the president of the worldwide Oracle user group, and back then people would say, you'll never need more than a couple billion records. I would scratch my head trying to think of a use case with more than a few billion records. But now, with gen AI, you're talking trillions of records. So that's where Databricks comes in: scalable, easy to use, and much less costly. Hugo: Awesome. And I'd just love to know, what's one of the most exciting things for you happening at Databricks or in the ecosystem at the moment? Ari: Everyone is shouting gen AI. Depending on where we are when you're listening: just this week, DeepSeek came out. We could talk about that, but every week, every month, it seems there are huge innovations, especially around generative AI. Things are moving so quickly. So that's what I love. Everyone out there has their different preferences; I personally love being in a place where you're on the leading [00:05:00] edge. Not so far out that a product isn't released yet, but right at the tip of the innovation, where once you get people excited, they can actually go back to their office or their laptop and do something tangible, right along the leading edge. But yeah, gen AI, the ability to just talk to your own data. If you're a business rather than a consumer and you want to make something like a chatbot, companies want it based on their own data. And that's where I think every company out there is racing: how can we do that, and how can we do it securely? Make a chatbot or some other intelligent extraction tool on your own data, fully governed and also transparent, so you know what's going on under the hood. Hugo: I love it so much.
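The "talk to your own data" pattern Ari describes usually starts with a retrieval step: find the documents most relevant to a question, then hand them to a model. A toy sketch of just that retrieval step, using TF-IDF instead of the embeddings and LLM a real system would use; the documents here are made up for illustration.

```python
# Toy retrieval over "your own data": score each document against a
# question and return the best match. Real chat-with-your-data systems
# use learned embeddings and a vector store; TF-IDF stands in here.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "Q3 sales in New England grew 12 percent year over year.",
    "Our refund policy allows returns within 30 days.",
    "Server maintenance is scheduled for the first Sunday each month.",
]
question = "How did sales do in New England?"

vectorizer = TfidfVectorizer().fit(docs + [question])
doc_vecs = vectorizer.transform(docs)
q_vec = vectorizer.transform([question])

scores = cosine_similarity(q_vec, doc_vecs)[0]
best = scores.argmax()
print(docs[best])  # the sales document scores highest
```

In a production setup, the retrieved documents would then be passed to the LLM as context, with governance and audit controls around both steps.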
And I love that you've also framed it around your own data, because I think, once again, we live in a world which [00:06:00] is far too model focused and not enough data focused, at least in the public conversation, whereas companies that are executing know that the focus is on the data. Of course you need a good enough model, but part of the point of what we're seeing is that we all have access to the same models, modulo details, these days, right? So what's going to be your differentiator as a business, and what's going to be your defensible moat? It will be figuring out how to use your own data in products that are informed by generative AI models, whether that's fine tuning on your own data, or prompt engineering, or using in-context learning, these types of things. So here's my provocation, as we move toward the data intelligence part of this conversation: doesn't artificial intelligence actually rely on the foundation of data intelligence in some sense? Ari: Exactly. So the story I like to start out with, from five or ten years ago, and maybe a lot of people listening can resonate with this, [00:07:00] is: you'd be at a conference, as a consultant or with a company, and everyone's shouting, we need data warehousing, and you'd have to whisper, like there was something wrong with you, that they should really look at AI or data science or machine learning, that it could really help them beyond data warehousing. You were almost embarrassed to say it. But now it's the opposite. Everyone's shouting, let's do something with gen AI, arguably before there's a use case, or with less focus on value, and you have to whisper: you still need good data, solid data, as the foundation for it. That's one thing I like to start with. And it all relies on how you get that data. And then, I love how you put it: is something good enough?
That's where people realize there's mathematics behind it all: data science and AI, which you want to blend together, predictive analytics, which is like classical AI, and gen AI. And one of the challenges, part of the art and the science of it all, is that you're going to get a model or an algorithm that can predict sales, or predict maintenance requirements, or predict some health situation, like whether some medicine helps [00:08:00] with prediabetes or not. And you're going to get a number that says how accurate that model is, but there's still no way to say, if it's 80 percent accurate, whether that's good enough. Do we need to make it 90 percent before we put it into production? So that's where you still need business people to make the call of when a model is good enough or not. I'll pause, since I want to make this interactive, but humans in the loop should be a great talking point. How do you ensure that? Comparing different models, when is one good enough, how do you operationalize it, how do you decide, how do you evaluate it all? These are [00:09:00] all questions companies are really trying to figure out now. Hugo: I couldn't agree more. And one way I think about it, and people who've listened to me before have probably heard me say this ad nauseam, but it's really one of the most important things: a lot of the time you're doing evaluations at the level of an LLM call or a foundation model call, at a micro level, essentially. And what you want to do is tie all of those evaluations to the macro-level business metrics that you're trying to meet. Of course, a lot of people I've worked with and built LLM-powered software with may not be doing it for business reasons besides, oh, let's try to use gen AI so we can please such-and-such stakeholders. But I'm also interested in this:
Once you're figuring out how to evaluate these things and how they drive business impact, I'm so interested in the way you talk about data intelligence, because a lot of the time, in applications I've worked on, you don't get significant lift by switching out the model, but you can get significant lift by improving the chunking in your information retrieval [00:10:00] system, or improving the OCR, the optical character recognition, of your PDF reader, these types of things. So having access to the, quote unquote, higher-signal data, and a platform that allows you to ergonomically use it, such as a data lakehouse, right? These things, I think, give us a lot more lift than using GPT-blah-point-blah, which may be deprecated in six months anyway. Sorry to be slightly cynical about this, but the focus needs to be on the data and being able to actively use it, right? Ari: Exactly. So data intelligence: it's a super exciting time to be in this. Traditionally you have data, rows and columns, and in a large company you may have 10,000 tables or more, hundreds of thousands of columns. It's called MDM, master data management. And before long, you have 20 tables that have the word sales in the title: sales_21, sales_NE. Is that Northeast? Is that [00:11:00] New England? Is that Nebraska? And you need humans to dig through the data and make it more meaningful. But with data intelligence, you can automate that. One of the best uses of gen AI I've seen is one of the things we have at Databricks, called AI/BI Genie, or Databricks Assistant. And to be honest, I think every major software vendor is going to have something conceptually similar. The idea is that it understands, through AI, what your data is saying. So it'll see this table has sales based in New England; another table, like when you say Eastern Europe, and Eastern Europe could mean something a little different to one company versus another.
At Databricks, our fiscal year starts in a couple of days, in February versus January, so it'll understand [00:12:00] your business. And then when people ask a question, like how much revenue did we get in New England in fiscal year 25, it'll intelligently give you an answer that's more accurate. And to make that all work, yes, AI is going to assume things and infer things, but you will always need humans in the loop to give it the thumbs up or thumbs down, or type in a more specific definition. So that's one of the many great benefits of data intelligence. But then, to go to your point: I like to think of myself as naturally Caltech, California Institute of Technology, and that was our persona, to be curious. So when I don't get something, wherever I travel, I will ask people their opinions until I get it. And most recently, one of the things that I didn't quite get is: how do you measure how good an LLM or a model is? For traditional data science, there are metrics, terms like R squared, log loss, the normalized Gini coefficient (apologies for being technical if you're not used to that), but each is a number that [00:13:00] says how accurate a model is. There are other metrics along those lines, area under the curve and things like that, but you can see that if one model is 0.8 and the other is 0.9, one model is more accurate than the other. But with LLMs, if you say, summarize this document, and it spits out a summary, it's really a human opinion of how accurate it is. So I would go around the world asking: how do you measure the effectiveness of LLMs? And in the last few months, there actually are metrics now, like what grade level it's written at, even what the accuracy is. But I was just at FordDirect, the car dealer company, and they built their own chatbot as a copilot for all of their dealers around the world. And their metric, which I agree with, was: how satisfied was the dealer with the accuracy of the responses?
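The classical metrics mentioned here can be computed directly. A minimal sketch, on an illustrative synthetic dataset, that compares two models by AUC and log loss the way Ari describes comparing a 0.8 model to a 0.9 model; the dataset and model choices are assumptions, not anything from the episode.

```python
# Train two classifiers on the same data, then compare area under the
# ROC curve (higher is better) and log loss (lower is better).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

results = {}
for name, model in [
    ("logistic regression", LogisticRegression(max_iter=1000)),
    ("gradient boosted trees", GradientBoostingClassifier(random_state=0)),
]:
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]  # P(class = 1)
    results[name] = (roc_auc_score(y_test, proba), log_loss(y_test, proba))
    print(name, results[name])
```

The point of the episode still stands: these numbers say which model is more accurate, but not whether either is good enough for production; that call belongs to the business.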
No actual scientific [00:14:00] number, but that's really all that matters: were the end users happy or not with the results? And they got 95 percent self-graded satisfaction, which is more than anything I've seen. FACSAT is my other example; that was 87 percent satisfaction, so now 95 percent is my new gold standard. Hugo: Yeah, that story from Ford tells a very important tale, once again, of figuring out what your macro, business-level goals are. I'm also really excited that you mentioned summarization, because that's something we may think is easy to reason about: is this a good summary or not? But, and once again, looking at data is so important here, if you look at summarization methods and just look at a table or a spreadsheet of [00:15:00] articles and summaries, deciding whether something is a good summary or not is actually very challenging, and it's subjective, depending on what you want as well. Because you can ask: is everything in there relevant? Is it accurate? Are there no hallucinations? Is it shorter than the original text, which we would require? Is it significantly shorter? And we need to project all of these down to: is it good as a summary or not? I'm as excited about data intelligence as I am about all the work you did at DataRobot as well. So I'd love to know: DataRobot, among other things, helped automate a variety of steps in the data science and machine learning workflow, and I'm deeply interested in what we can automate and what we should do as humans, right? So I'm just wondering, from your experience at DataRobot, how did the type of automation you saw there reshape the general data science workflow? Which tasks were successfully automated, and which still require human expertise? Ari: Yeah. So, for the listeners, DataRobot is a company that I worked for prior to joining Databricks. Both have the word data in the name, but they did something incredible.
They basically democratized machine learning and created an industry called AutoML, or [00:16:00] automated machine learning. Now, one aspect of gen AI is to generate code: generate SQL, generate Python, fix code. What AutoML does is enable a non-technical person to ask a question and make a prediction: where is our supply chain going to fail, like a time series extraction, or how might a baseball player do in the future based on the past? Until AutoML was created (and DataRobot was the creator of this industry), you would have to have somebody write, by hand, a Python program or an R program, the language of their choice. Plus, you'd need somebody familiar with data science who understands 10, 20, 30 different algorithms: things like gradient boosted trees, models with funny names like Vowpal Wabbit, and linear regression and logistic regression, the most popular ones. You would really [00:17:00] need pretty much a PhD in data science, and days or weeks to write all these programs. And they figured out: let's just write a program that writes programs. A program that would try all different combinations, things called hyperparameters, which are like dials to turn, and try out hundreds, thousands, millions, however much compute you want, and come up with the best approach for the question that you're asking. So for me, philosophically, I love automation. Any task throughout human history, automating the wheel, et cetera, was great. As a kid, that was one of the things I loved: writing programs that would write programs, writing programs that would write poetry in the form of a Beatles song.
I did that in fourth grade and got an A on the assignment. But now you're talking about saving companies significant amounts of money. And in the Databricks world,[00:18:00] our founders also created something called MLflow, which has a component of that, AutoML, but it's all open source, meaning freely available to download and do what you will with. Hugo: And it also helps you orchestrate workflows and do experiment tracking. It has a lot of affordances, right? Ari: Absolutely. Something called MLOps, how you operationalize ML, and AIOps as well. So, experiment tracking: measuring the different dials and levers that you're pulling, and what the result was as you changed those hyperparameters. Also, you build a model, and there's something called data drift. Pick your favorite use case: you're predicting sales of a product, and a year later the economy is different, competitors have come out. So you want to potentially redo all of your models when your data is changing. MLflow [00:19:00] helps you figure out what data is changing, and whether it's the right type of change that necessitates recalibrating your models or not. So that's MLOps. It sounds technical, but now you can just have a dashboard. Let's say you have 1,000 models; it'll say these 20 are questionable, and a human zooms in and says, yeah, let's rerun these 20. Or you can automate it and just say, if there's more than a 5 percent change, redo the models. That's like a whole job, a whole discipline, now. Hugo: It's huge. And in total transparency, I worked on Metaflow and with the Metaflow team, which is in some ways an MLflow competitor that came out of Netflix in 2019, but it does a lot of the ops stack as well. But there are still many unsolved problems. And I do think in some ways it's a shame that the term MLOps was adopted.
It has a hype cycle of its own, and [00:20:00] it's somewhat low in the hype cycle currently, yet it's one of the most important things. I'm very excited to see how people build tools and infrastructure to allow people like you and me, and people who want to work with data, to access data as easily as possible. I am interested in this: we've talked about some of the automation, such as hyperparameter tuning, model selection, and ensemble methods, that DataRobot has enabled. What has been left on the table, though, in terms of what humans really need to do alongside the automation we've seen, do you think? Ari: Yeah. First, I love that you mentioned the word ensemble. And you mentioned Netflix, and MLflow. There is a whole ecosystem out there, and the more things can interoperate and enable companies to pick the best, depending on what you're trying to do, to ensemble the best-of-breed solutions, hopefully all under one Databricks platform, or under one [00:21:00] platform, the better. For example, what you've done at Netflix is super cool. I remember when Netflix had a challenge out there, an algorithm for, I forget what it was, Hugo: a recommender system for what to recommend people when it was still a mail service. So yeah. Ari: And it was open, and the winner would win a ton of money. That was awesome. But you know what? Things sprouted out of that. That's one specific use case. There could be other companies that are really good at predicting driving traffic, others that are really good at predicting weather. And the best would be, and we're calling it now agentic AI, or agentic systems, where you string together different models that are really good at specific tasks, knowing which models to plug in and out, like Lego sets. So together you're going to get the best outcome.
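The "combine specialist models" idea has a simple classical analogue in ensembling. A minimal sketch, on illustrative synthetic data, of soft voting: averaging the predicted probabilities of two quite different models, which often beats either one alone.

```python
# Soft-voting ensemble: a linear model and a tree ensemble each vote
# with their predicted probabilities, and the average is the prediction.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("linear", LogisticRegression(max_iter=1000)),
        ("forest", RandomForestClassifier(random_state=0)),
    ],
    voting="soft",  # average predicted probabilities across members
)
score = cross_val_score(ensemble, X, y, cv=5, scoring="roc_auc").mean()
print(f"ensemble AUC: {score:.3f}")
```

Agentic systems take this further by routing tasks to specialist models rather than averaging them, but the underlying intuition, that complementary models beat any single one, is the same.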
And to be honest, if I were a [00:22:00] company and I wanted to look at supply chain, I wouldn't want to reinvent the wheel to predict weather, or that if it's snowy, here's the traffic pattern (I'm in Chicago), or interest rates just fluctuated, or what have you. I'd just want other smart people to pick the best and pull it all together. So that's this whole agentic system, another thing I could have mentioned as exciting and new; it's one of the top five hot topics. But yeah, the role of what can't be automated, this is an awesome one. I come from the sports background and was one of the earlier people bringing sports analytics to teams, and the conversation was always: what can you measure, and what can't you measure? Then, knowing what you can measure, what can you automate, and what can't, or shouldn't, you automate? And it becomes more fascinating when the use cases get closer to human [00:23:00] behavior, and what I call adaptive systems. An adaptive system could be buying ice cream: if it's warm out, you're probably going to buy more ice cream than if it's cold out. It's called elasticity from an economic standpoint. If you're going to buy Band-Aids because your kid scraped their knee, you may pay $15 for Band-Aids at a 7-Eleven or gas station, and when you don't need them, you'll probably shop around a little more. And then in sports, a lot of the human behavior is emotion, what's called the hot hand, or the hot hand fallacy, which is not really a fallacy. If you are shooting and missing, are you going to continue to miss? Or if you're shooting free throws and you keep making them, are you going to regress to your natural mean, or is the streak a real thing in the game? So that's where things really shouldn't be automated: a lot of that [00:24:00] human element.
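The hot-hand question is hard precisely because streaks arise even without any momentum. A small simulation, with made-up numbers, of a shooter whose skill never changes: a constant 70 percent free-throw probability still produces long runs of makes by pure chance, which is why separating real momentum from noise resists automation.

```python
# Simulate 1,000 free throws at a fixed 70% make probability and find
# the longest streak of consecutive makes. No "hot hand" is modeled,
# yet long streaks appear anyway.
import numpy as np

rng = np.random.default_rng(42)
shots = rng.random(1000) < 0.70  # True = made shot, constant skill

longest = run = 0
for made in shots:
    run = run + 1 if made else 0
    longest = max(longest, run)

print(f"make rate: {shots.mean():.2f}, longest streak: {longest}")
```

With these parameters the longest streak routinely runs into the double digits, so an observer watching only the streaks could easily infer a hot hand where none exists.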
That being said, when I worked in sports, developing these expert systems, I would work with the end users, the people that know the industry, players like Rod Carew, Orel Hershiser, Kirk Gibson. They would sit with me and say, I noticed this pitcher would throw inside, then outside. And if it was a pattern I could repeat, then I could write a program and encode it. And if it wasn't, I couldn't; so be it. Maybe I'd get 20, 30, 50 percent of things automated, and that was not perfect, but it was way better than the prior state, which was doing nothing. You could get meaningful things, and I've been with some organizations that have won many championships. Hugo: I actually love that you mentioned ice cream as well, because I recently had Elena Grewal on the podcast. She ran the data function at Airbnb for seven years; she was one of the first data scientists there and grew the team to 200 people. Really amazing [00:25:00] work. She's moved back to New Haven, Connecticut, where she grew up; she's teaching part time at Yale, and she's opened an ice cream shop. And we discussed this. I asked her on our podcast, what learnings from Airbnb can she implement, in terms of analytics, in an ice cream shop? And of course, it's a very different game. But one thing she mentioned is that it isn't only the absolute temperature that impacts how many people will come to the ice cream shop; it's the absolute temperature combined with an element of the differential. So even if it's below freezing, if it's, say, 10 degrees warmer than it was the day before, she'll have a line out the door, which I think is super fascinating. So it's the delta that's important there. I also love, and I'll link to it in the show notes, that you mentioned all the agentic things happening, because not only do I find this stuff exciting, people are saying 2025 will be the year of agents.
And I think in some ways [00:26:00] it'll probably be the year of finding out a lot of failure modes and how to build more robust software with agentic principles, perhaps with more business logic and guardrails and structured workflows. But I also think 2025 will be another year of something else. What I'm getting at is that in a lot of ways 2024 (and I'm not the only one to have said this, and you've hinted at it in a variety of ways) was the year of multimodal models, right? We've seen a huge influx of text-to-image, image-to-text, text-to-speech, image-to-video, text-to-video, all of these amazing things. And I think 2025 will also be the year of incorporating multimodality into product. Something I'm very excited about is how that ties into your concepts around data intelligence, and how we can have easy access, as data professionals, to spaces, whether data lakehouses or otherwise, where I can access PDFs and videos and have them retrievable and versioned, all of these things. So I wonder what you can tell me about how data intelligence can help with this way of thinking. Ari: Sure. [00:27:00] First of all, fundamentally, from my heart and from my experience: multimodal, or the more variety of data you can use for getting some insight, a prediction or a summary or a classification, the more types of data, generally speaking, the better and more accurate the response. So that's one of the, Hugo: one of the five V's that we ended up with a decade ago. Ari: Yeah. Hugo: You have the big data. Yeah. Ari: Awesome. We're twins as well, Hugo: in some ways, yeah. Ari: Yeah, I have the same glasses; I'll have to find the other pair I have in my bag in the other room. But yeah, multimodal, structured and unstructured data. So in that ice cream example, temperature is structured data.
Sales, store location, that's structured. Unstructured might be traffic information, weather, topography: images, videos, [00:28:00] PDFs, call logs from the support center. And the variety of data doesn't always make the model more accurate, but the way data science works is that, generally, models will just ignore data if it's not helpful. It's called feature importance; feature is another word for a variable, or a column in a spreadsheet. So it'll tell you, like in baseball, if an umpire calls a ball or a strike, you can add new data, like the eye color of the batter. My guess would be that adds nothing to the model, since the umpire isn't even looking at the eyes of the batter, and it doesn't affect the batter's swing or anything. That's one example. But if you add in, say, how much the ball is spinning, that probably affects things more. So that's the idea behind multimodal. When you start adding in PDFs, it's not the PDF itself; it's [00:29:00] AI that extracts the text from the PDF, and other attributes of the PDF: how many pages, how many illustrations. Or looking at YouTube videos and breaking them down, every single Curb Your Enthusiasm episode, or in Oscar-winning films, what percent of the time was a male versus a female voice, anything like that. Those are more types of information that could make predictions more effective. But the challenge, until data intelligence and until the lakehouse, was that you would have one system, one platform, that stored structured data. Think Oracle, think data warehouses, Teradata, et cetera: numbers, columns, simple text. And then you would have a whole different platform, a data lake, for the unstructured video, images, and Word documents. Very [00:30:00] different platforms. And you would have this terrible fragmentation: two different logins, two different governance models, two different audit trails.
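The feature-importance point above, that models largely ignore columns carrying no signal, is easy to see directly. A minimal sketch on illustrative synthetic data, where a pure-noise column stands in for the "batter's eye color" example.

```python
# Fit a tree ensemble on informative features plus one column of pure
# noise, then inspect which columns the model actually relied on.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=5, n_informative=4,
                           n_redundant=0, random_state=0)
rng = np.random.default_rng(0)
X = np.column_stack([X, rng.normal(size=1000)])  # "eye color": pure noise

model = RandomForestClassifier(random_state=0).fit(X, y)
for i, imp in enumerate(model.feature_importances_):
    print(f"feature {i}: importance {imp:.3f}")
# The noise column (feature 5) typically ranks near the bottom.
```

The same diagnostic tells you when adding a new modality, like spin rate in the baseball example, actually carries signal rather than just adding columns.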
You'd have to copy data back and forth to look at things. The AI wouldn't really know what to do with the unstructured data. So that's where, six or seven years ago, the lakehouse, which is a play on the words data warehouse and data lake, was formed: just one platform, one of everything, no need to move data back and forth. That was the leading edge. Databricks did coin that term and create that marketplace, but we don't own the term; it's public domain. So there are a ton of companies out there: Snowflake, Microsoft, and Databricks are three of the big ones, and others are coming in. But about a year and a half ago, the data intelligence paradigm came in, which we started scratching the surface on earlier, and the ability to [00:31:00] make sense of that multimodal data for you, intelligently, is incredible. It's almost like magic. You ask a question, how should I price my New Hampshire ice cream on a month-by-month basis, and it will look through billions, trillions of records if need be, and give you your insights in plain English, or, as we were talking about before, Finnish, whatever language. So one example, and then I'll pause: going back to FordDirect, what they're doing now, aside from a chatbot, is using data intelligence for each one of their agents, the dealerships, to have a PowerPoint generated. The first two slides are summary information: here are your top-selling cars; here's when you should be staffing more or less. They have multiple billions of unique customer journeys; every customer interaction, it's called multi-touch [00:32:00] attribution, is way too much for a human to understand at that level. But with data intelligence, it's whittled down to a PowerPoint, so a car dealer, who shouldn't really have to care about technology, just gets these insights: what are the strengths of their dealership?
What are their weaknesses? What things are changing? Are there new cars coming down the line that they should consider swapping in? That's data intelligence, and it's like magic, Hugo: super cool, Ari: in production. Hugo: Yeah, totally. I'm actually glad that we keep referencing PDFs, because I think in some ways people may think they're ancient technology, almost like stone tablets these days. But in terms of multimodal content, they can be incredible. They are almost always incredibly rich, I think. Ari: Yeah. Personally, even outside of Databricks, I do some human rights work, and there's so much information from governments and NGOs that is [00:33:00] in hundreds of thousands of PDFs, in English and Cyrillic, even handwritten. And I'm using gen AI to, number one, translate it. It's doing a really good job even if the wording is tilted, if it's blurred, and if there are things written in the margins. So I can upload 10,000 pages and just say, write a summary, or, where does it mention this phrase, or, what does it say about X, Y, Z? And it's brilliant. It's changed my life personally. I'd spent years working on this, and then, seemingly overnight, I'm able to, like putting on glasses, read through these languages; I'm not good at other languages. Hugo: It's incredible. It's definitely such a fascinating time to be alive and to be working with such technologies. I do want to move from the present and take a slight detour through [00:34:00] history, Ari. I'm so excited about all the work you did in your analytical and scouting experiences, essentially innovating a lot of things in Major League Baseball. And you're always very humble about this, but to be clear, everyone listening: what Ari did was a big inspiration for Moneyball, the book and the movie.
So I'd love to just hear a bit about that, what it was like to be doing analytics there at that time, and get your thoughts on the idea of sports analytics, and sports' relation to technology in general, whether it's AI, analytics, machine learning, as a leading indicator of where other industries develop. Ari: Yeah. When I speak, I bring that up a lot, since so many people are passionate about it, and since it's not just the technology — it's a cultural shift that revamped a whole industry, an old, hundred-year-old type of industry. So a lot of companies resonate with it. Ford — I don't [00:35:00] want to keep bringing them up — a hundred-whatever-plus-year company; financial institutions, Wells Fargo, anything. How do you take companies that have been around a hundred, a hundred and fifty years or more, or industries that have been doing something one way, and perhaps improve upon it? And I'm not one to say make a change just because it's cool to make a change, or use data just because it sounds exciting. But in many industries, especially if you're dealing with humans, dealing with sales, dealing with marketing, dealing with your internal cybersecurity — those are things pretty much every company is dealing with. Call centers, consulting companies, anyone that has employees — those are all different use cases, and I don't care how old the industry is, those are great for AI. Being one of the first — it's pretty wild. People have told me I was one of the first, and if anyone in the audience can come up with more people, let me know, but one of the first [00:36:00] six people to have been employed in what you might call an analytics role at a sports organization. One of the first six, in the 1980s, with the Los Angeles Dodgers. Now that's an industry with tens of thousands of people worldwide. It just blows my mind. It's something that I was passionate about.
I remember the super duper early days, and what I loved about it is that we were super early, but I was able to find very actionable insights. For example: bring this relief pitcher into a game in this situation, and they're very likely going to be more effective against their opponent. They may be four times as effective as this other person in that situation. And that's something a manager could agree with. I also think I was fortunate: I had enough patience, and I wanted to be [00:37:00] collaborative, so I wanted them to give me their thoughts. Where was I right? Where was I wrong? How can I improve things? So they loved that, and that continues through to this day, since I continue to help sports analytics people on the side. But early on, we were able to show incredible value — the Dodgers won the World Series in '88, and they made the playoffs a couple of years after that. And as long as I had one champion and I was working for that team, you almost didn't want the other teams, competitively, to get wind of what you were doing. So the general manager, Fred Claire — I actually just talked or texted with him yesterday — embraced it, loved it. Some of the players embraced it, loved it, especially the pitchers: what were the weaknesses of their opponents? That got adopted really well. And then there was this period, like the dark [00:38:00] ages, where there was a lot of animosity toward analytics — so much so that some organizations wouldn't even meet with me. They just shut me out, or shut analytics out, before I even had a chance to explain what it was they were shutting out. They were just shutting out the concept of being insight driven. But there were a few organizations that would embrace it — hey, I aligned with those, and just shut out the negativity. And now it's flipped, and the game has changed so much. One example:
It's called shifting in baseball. If you stand in this area, a ball on the ground is going to be hit to you 80 percent of the time, and if you stand in another area, it's going to be hit to you 20 percent of the time. So where are you going to recommend standing? There's a great book called Big Data Baseball, about the Pittsburgh Pirates trying to implement this. Even though the evidence showed the ball was three, [00:39:00] four times as likely to be hit here than there, it took the players months to eventually go against what their gut told them, even with the manager, Clint Hurdle, pushing for it. But when I worked with the Cubs and we started implementing this, only 3 percent of all balls in play across baseball incorporated this defensive movement of people. Only 3 percent. And a few years ago it went up to over 50 percent of all balls in play. From 3 percent to over 50 percent — and it was such a change, baseball had to change the rules to ban the strategy. It was all data. It wasn't getting different players. It wasn't doing anything but using data to stand somewhere else before a play happened. So that is so compelling and inspirational that every industry [00:40:00] can now learn from it. Hugo: Absolutely. And it's so interesting — this has been happening for decades and decades now, so I'm wondering where we are today. I'll include this video in the show notes, but maybe we can start there: I watched a short video of you speaking with Travis Kelce about analytics. So firstly, wow. And secondly, awesome. And thirdly, Travis Kelce said several very interesting things, but one was: if you break down anything to microscopic details, you can be a professional at whatever it is you pursue. So perhaps you can speak to that, particularly with what's happening across sports now. Ari: Yeah, I'm glad you watched it. If you give the link to the video, let people watch it.
That's one of the videos that kind of went viral, since his girlfriend is Taylor Swift — I'm sure most of the audience knows her fame. But yeah, they went to the Super Bowl — or they're going to the Super Bowl again, their third time [00:41:00] in three years. We'll see if they win or not. Hugo: Oh, and just quickly — we'll put this episode out just after the Super Bowl, so maybe you can give me any predictions and help me make some money in the next couple of weeks. Ari: Yeah, so I'll start there and then go backwards. I don't always like to predict — in a single game, anything can happen. Things that I look at are defense-to-offense matchups, special teams; then, did key players on either team get injured, and are they likely to be in or out? Adaptability — how they play in certain weather conditions, although New Orleans is warm weather. The Eagles are a great team, but Mahomes and Kelce and the team around them have intangibles, so I would favor them. The Eagles just crushed their opponent in the last game, and they seem healthy. So anything can happen, but the [00:42:00] ratings for the NFL are going to be super high, so that's good for the money-making aspect of it. Travis Kelce — that was a fun interview. One of my stints in life was as a sports journalist, so I've interviewed Shaquille O'Neal and gotten all sorts of great insights; I really want to know what goes through the minds of players during the game. With what you said — breaking things down into minuscule details — that's absolutely something every industry can learn. It goes to an approach of aggregating data versus disaggregating data. Do you want to lump things together? If you have a small sample size, probably lump things together to get a bigger sample size. But if you can disaggregate information and look at each player individually — or, it's called hyper-personalization in marketing —
look at each one of your million customers, or 10 million or 100 million, as an individual, and make a model [00:43:00] or a prediction for that individual based on their personal consumer journey, then you're going to get a much better outcome. For example, I worked with one of the large retailers, and they did this hyper-personalization prediction for each of their loyalty customers. Before that, they only had 12 personas. They went from 12 to tens of millions, and they got three times the purchase frequency, since their ads were for the right people, the right product, the right discount. You also want to avoid discounting: if someone's going to buy the ice cream anyway, don't discount it — you're just selling yourself short. So it's the increased sales, but also at the right price. And that leads back to Travis Kelce and minuscule details. What it translated to for him — I had a longer interview, which I edited down a little — was that he would look at every one of his opponents and [00:44:00] see whether they catch a ball and tuck it under the right arm or the left arm. He memorized every single opponent, and would immediately go to that arm before they even caught the ball, knowing he would make contact right after the catch, and try to knock the ball loose, or potentially intercept it or deflect it. And that's a minuscule detail. He's not looking at how the Eagles play as a whole — he probably knows his brother, who plays on them — but at what each player does, not the team as a whole. Hugo: Amazing. We are going to have to wrap up soon, unfortunately, but with all your experience and everything you've worked on, I'd love your advice for data science leaders and practitioners. What skills do you think will be most critical for data scientists and leaders to remain competitive in the next five or even 10 years?
And once again, prediction is a tough thing, but we do both work in [00:45:00] machine learning. Ari: Yeah. Five to 10 years from now is super far ahead. We'll see if general intelligence is even a thing, and quantum computing — we'll see how far along that gets. But one thing is, it's going to be a completely different world for software developers, for people with data science skills, people with programming skills — a totally different world even three years from now. And humans like to think of things linearly, of how things progress: data warehousing improves 10 percent per year. Gordon Moore — one of my friends before he passed; I was in his lab and traveled around with him — came up with Moore's Law. That was a steady progression: processing power doubling, chips getting half the size, every 12 to 18 months. But with software development and AI, [00:46:00] that progression occasionally jumps up. I wouldn't single out DeepSeek, but the concept of agentic systems bringing costs down 90 percent, 80 percent, even 50 percent — things are going to get faster and cheaper. So the recommendation is: you have to have the mentality of constantly learning. You have to have the mentality of putting your hands on the keyboard and trying stuff out. Try as much as you can, even if it's not part of your job, on the side, just so you can stay on top of things. Go to conferences to network, listen to this podcast — and the other episodes, which I'm excited to hear — but be a constant learner. My personal take is that the actual software development, the entry level — and I even tell my own kids this as they're going into college — a lot of that, possibly all of that, is going to be automated. But that'll up-level people. The boring, repetitive parts of the job [00:47:00] will be automated.
So if you can focus on understanding the business — translating the business into guidance for these autonomous, intelligent agents, being the human in the feedback loop — that's where I would put my energy. So: continue learning, learn the business, and even math and probability. I mean, it will be automated, but you still will need a human to enact something. And I'll leave this topic with this thought. I worked at Nielsen — I was VP of marketing analytics for the Midwest — and there was a junior data scientist who made an algorithm with a recommendation for one of our clients. There's Walmart, there's Walgreens, there's Sam's Club, and mathematically, Walmart was not as profitable as Walgreens. So the recommendation to our client was: don't sell your product in Walmart. [00:48:00] Mathematically, yes. But the customer — thank goodness I caught this beforehand — had a five-year contract with Walmart, so even if they wanted to, they couldn't cut that. The data scientist just wasn't aware of it. And to be honest, the incentive for being in Walmart isn't just profit. It could be volume — getting volume for an innovative product, bringing a new brand into market. So stuff like that — understanding the business and translating it into whatever the technology will be five, 10 years from now — is where the jobs, and to be honest, the fun part of the job, will be. Hugo: And I'm super happy you've mentioned, and we've talked about a couple of times, generative AI systems that do things and automate things. Whether we want to call them agents or not — I think "agents" is a continuum, or a multidimensional space — these types of systems that are equipped with tool use can automate things.
I think there's a huge amount of economic value there. My only [00:49:00] point — and I think this is part of your point as well — is that the majority of economic value from LLMs isn't going to come through chat interfaces; it's going to come through automated processes, right? Because the ability to chat with a system only scales linearly with human time, so it's clearly not going to be the biggest win there. I also love that we're able to talk around and about data intelligence. I'm going to link to your book, The Data Intelligence Platform for Dummies, in the show notes. But I'm wondering, for anyone out there thinking and working in this space: what is one takeaway you'd like them to leave with, with respect to data intelligence and leveraging data? Ari: Yeah. One thing: every company is going to want to get intelligence from their own proprietary data. They're going to want it democratized, so that non-technical people and technical people will be able to just talk with their data and get highly [00:50:00] accurate insights — but also in the context and the language of the business. So that's the summary, and here's the one detail on the language of the business. In baseball, "can of corn" means an easy-to-catch ball; in retail, a can of corn is a food product. Or take the word "deceptive": a deceptive pitcher in baseball is awesome, but a deceptive financial client is probably terrible in the finance world. People want it trained on their own context, in their own language. So that's one takeaway. And for everyone listening — I like to call it not generative AI, but Generation AI. Everyone — Hugo, you, me, everyone listening — we are just at the first inning, second inning of this gen AI world. Every one of us — I have no different leverage than anyone listening.
Now a single person, or a handful of people, can make a billion-[00:51:00]dollar company. We're all part of this conversation. Maybe the final takeaway is: we're all Generation AI. Hugo: I love that. And I do want to go one step deeper before we wrap up, because we do have a highly technical audience. I am interested in what type of processes and tools you think organizations can adopt to make sure that their data intelligence platforms are as accessible as possible. I want to be in a flow state with my data. So one example: when looking at LLM application traces — and I did a livestream on this yesterday, actually — I don't even want to be working in a notebook and writing pandas code to do pivot tables. I just want to be in a spreadsheet, man, because I can get in flow with my data way, way more easily. Some people may prefer writing pandas code — I find that a bit pathological. I love pandas, by the way, but writing pivot-table code? Not for me. So I'm wondering [00:52:00] what type of processes and tools people and organizations can adopt, culturally, so people can just have access to the data they need in order to build the cool stuff they want to. Ari: Exactly, and I'm totally with you. I love this newish company Sigma, which takes Excel spreadsheets and does Excel-like stuff, but on billions or trillions of records — pivots and so on. But take a look at what we call AI/BI Genie. This enables you — I'm going to make a video on this — to make, for example, a chatbot, or an AI dashboard, or a summarization tool, anything Genie can do on your own data. And in my video, it's 60 seconds or less. It's not like companies spending months — companies I see are spending 12 months trying to do some gen AI.
But if you already have the data, it's literally 60 seconds — no catch, no behind-the-scenes pandas or [00:53:00] Python stuff — and it works really well. And then beyond 60 seconds, you're iterating on it, making it better. AI/BI Genie — take a look at that. It's the start of what I think will be, for many years to come, a lot of vendors doing similar things, but it's super awesome, super easy to use. And you also mentioned more technical people: it has components for copilots, where you're writing that pandas code and you can just say, I want to do a pivot in Python, and it'll help write it for you. Copilots, AI/BI Genie — all of that's awesome. And then one other thing that's now open sourced is Unity Catalog, which is the whole governance layer [00:54:00] over all of your data. When I say governance, I mean your lineage, your monitoring, sharing of data, auditing, access control, transparency. That's Unity Catalog, which is open source, and I'm super excited for people to use it. Hugo: Awesome. I also love that you're suggesting tools — I'm highly technical, but I love using tools where I don't need to think technically. And actually, I don't want to get into this too much, but pre-Salesforce-acquisition Tableau was one of my favorite tools in the space. It allowed me to do stuff without writing code super, super easily. As a final question: as a lot of our processes become more automated, where should we put human judgment? What role will human judgment play, and how should we adapt organizationally to this shift? Ari: Exactly. So I love workflows — that's the plumbing underneath. This is data engineering. Hugo: It's Super Mario, dude. Ari: Do do do do do. Hugo: I did it. I did it. Ari: That's it. Yeah. And to have repeatable, automated workflows is incredible. This is the part we started the podcast on: [00:55:00] how do you get data to be good?
And there are different philosophies — something called the medallion architecture, where you bring data from bronze through silver to gold. How do you ETL — extract, transform, load — data in? And the newer version of that is called Lakeflow: a data-intelligence-driven data engineering workflow. What I mean by that is: how do you merge data together? When data is being loaded, how do you detect outliers and address them? And the key thing you said — human in the loop. What is an outlier? If a baseball player is negative three feet tall, that's not a real thing. But if the temperature is negative three Fahrenheit, that is a real thing, especially here in Chicago. So how do humans decide what rules to set? It's like the casino dealer: you have the pit bosses, but who's watching the pit bosses? Who's watching the [00:56:00] casino managers? Who's watching the owners? Who's watching the government? Same thing for any data and AI workflow. You want to automate where it can be automated, and have humans monitoring it all. And it's especially important — backing up to when you said the next thing will be automated decision-making — when you have an insight: pull a product from a shelf, ground a flight, pull a self-driving car over to the side of the road. Who gets to decide that? It comes down to ethics and so on. If you're going to decline someone — an extreme case, say someone taking out a loan — you want to automate the obvious cases, but then have humans in the loop, because there could be a good reason to make exceptions. Right now it's on a company-by-company basis; governments and [00:57:00] international bodies are starting to look at certain regulated industries. But when in doubt, especially for automated decision-making, have humans in the loop, for sure.
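The bronze-to-silver validation Ari describes — human-chosen, domain-aware rules deciding what counts as an outlier — can be sketched in a few lines of pandas. This is an illustrative sketch only, not Databricks' Lakeflow implementation; the column names, thresholds, and DataFrame are all hypothetical, chosen to mirror his examples (a negative height is impossible, a negative Fahrenheit temperature is real):

```python
import pandas as pd

# Hypothetical raw "bronze" records, as they might arrive from ingestion.
bronze = pd.DataFrame({
    "player_height_ft": [6.2, -3.0, 5.9],
    "temp_f": [72.0, -3.0, 101.0],
})

# Domain-aware validity rules set by a human, not by a generic statistic:
# height must be positive; temperature may be negative but within plausible bounds.
rules = {
    "player_height_ft": lambda s: s > 0,
    "temp_f": lambda s: s.between(-60, 130),
}

# A row is promoted to "silver" only if every column passes its rule.
valid = pd.Series(True, index=bronze.index)
for col, rule in rules.items():
    valid &= rule(bronze[col])

silver = bronze[valid].reset_index(drop=True)  # clean records, promoted downstream
quarantine = bronze[~valid]                    # flagged for human-in-the-loop review
```

Note that row 1 fails on height, not on temperature: negative-three Fahrenheit passes, exactly as in Ari's Chicago example. Rather than auto-dropping such rows, routing them to a quarantine table keeps a human in the loop to decide whether the rule or the record is wrong.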
Hugo: And I actually love that you mentioned outliers, because it brings us back to this: there are technical definitions of outliers — so far past the interquartile range, or this many standard deviations away from the mean — but once again, an outlier really depends on your business problem, on what you're trying to solve. And it brings us back to your wonderful point that the question "is 80 percent accuracy good?" doesn't even make sense as a question by itself. You always want to frame it in business context. And even as we automate a lot more decisions, my personal take is that human judgment will, for a long time, play a significant role in how we build businesses, what we decide to deploy these systems on, and what problems we're trying to solve. I'm also really excited — as I mentioned, I write code for a living, but I like solving problems; I'm not one of those people who actually enjoys writing code that much. [00:58:00] And I'm far more excited about building and designing systems that can solve problems than about getting stuck in the weeds of figuring out how to one-hot encode something in pandas to put it in a NumPy array to then put it in a scikit-learn fit-and-predict pipeline, right? Ari: Yeah, a hundred percent. Hugo: So it's such an exciting time for this entire space. Ari: And I think what we both share is we love coding, but we love the innovative, creative part. Automating the boring, repetitive, time-consuming stuff so we can get to the fun stuff is what it's all about. Hugo: Exactly. Ari: When I say fun, I mean fun or business impact. Hugo: Exactly. There's a lot of fun in that. Hopefully we all — and I know in the modern era we don't always — have opportunities to work at places and solve problems which we consider fun, as well as grueling sometimes. That's my aspiration.
Ari, thank you [00:59:00] for going out there and doing all the work you do, and then coming back and having conversations like this, bringing your wisdom from the field. This is decades and decades of incredible stuff you've been doing, so I really appreciate your time, your wisdom, and your generosity in sharing it with all of us. Ari: Hugo, thank you for having me on High Signal. It's been a lot of fun — great conversation. Hugo: Thanks so much for listening to High Signal, brought to you by Delfina. If you enjoyed this episode, don't forget to sign up for our newsletter, follow us on YouTube, and share the podcast with your friends and colleagues. Like and subscribe on YouTube, and give us five stars and a review on iTunes and Spotify — this will help us bring you more of the conversations you love. All the links are in the show notes. We'll catch you next time.