Happy 100th The Artists of Data Science_mixdown.mp3

Harpreet: [00:00:05] What's up, everybody? Welcome, welcome to the Artists of Data Science Happy Hour number 100. It is the 100th iteration of these happy hours, man. Thank you all so much for sticking with me, and honestly, thank all of you for just being part of it, because I couldn't have done it without you. There are so many times where I just couldn't make the happy hour pop off, or, you know, secretly I was burnt out and I just made up an excuse and reached out to someone to take over. And they did. So this literally would not be at 100 without you guys, without your participation, without all your help, and, you know, taking over the ones and twos for me and hosting. Some of the people who have taken over as hosts are here in the building. Shout out to you guys for helping. Damn, number 100, happy hour number 100, man, I can't believe it. It's actually been about two years; this is probably right around the two-year anniversary since I did start this thing. There were some missed weeks here and there, but I can't believe it's been two years already, man. I started this at the height of the pandemic. During the height of the pandemic, I remember this room had like 50 people in here at one point, consistently for like three or four weeks straight. There's like 50 people, 50 people. And then the pandemic wore on and you guys still came through and still hung out with me, man. So I really appreciate that. So pouring this one out for y'all right here.

Harpreet: [00:01:24] Cheers, y'all. Y'all have seen me drink a hundred beers for sure, at least. So thank you all for being here, man. I appreciate y'all. A couple of big announcements for today. I just launched a course. It's a course that I've been working on in the background for a very long time; I've just been dragging my heels on it. It's a course from 365 Data Science. I partnered up with them to release my course on how to think like a data scientist. I know a lot of people here have actually test-driven that course. People have given me some great feedback to help make it what it is. Special shout out [00:02:00] to Vin. Vin gave me some really thorough feedback on this course while I was making it last year, and he was also one of the early reviewers. Eric, I think you might have been as well. I know Mark was a huge reviewer, as was Matt Blosser. So thank you all so much for the support you gave with that. Here's a link to the course, right? It's cool because it's funny, and you can tell how old these pictures are. They're not really that old, the pictures are a year old, but it's me with my fake dyed beard. How to think like a data scientist, so that you can become one. So that's the course, with me here doing an industry overview, talking about how to be your own manager, talking about how to think like a scientist. There are more modules coming that are just undergoing editing, and some that need to go through review.

Harpreet: [00:02:43] There's going to be one called Think Like an Engineer and one called Think Like a Business Person, right here through 365 Data Science. Check it out. Special shout out to Ken Jee for setting me up with 365 Data Science. Shout out to Ned and co. for being so accommodating through the entire process. Check it out. I can't believe it. Like, there are people on this website, like Bernard Ma has a course, Ken Jee has a course. Even Tina, Tina Wong,
she has a course too; I'm probably butchering the last name there. But yes, shout out to all of y'all. Thank you for being here. I don't even know how to kick off the stream, man. I do not even know where to take this. Kristen's here, Kristen with the hat. Oh, man, I should have come through with the hat. That looks amazing. Thank you so much, Kristen. Appreciate you being here. So, look, man, I'm taking questions on anything and everything. So if you're tuning in for the first time or if you're joining for the first time, please do let me know whatever questions you've got. We're streaming on LinkedIn and on YouTube; if you guys have questions, do let me know. I got a question, man. Like, some people have been here since the very, very beginning. For example, Eric Simms has been here, I think, at every happy hour since, like, maybe even, I don't know, number one. I'm curious, man, how the hell did you find out about the data science happy [00:04:00] hours? How did this pop up on your radar?

Harpreet: [00:04:03] Let's see here. I remember when you started the podcast in general, and then the happy hours, office hours stuff kind of started. And I would listen to the podcast while I was cleaning on Saturdays, usually. And then I remember Vin was on, and everything. I was like, oh, dang, I could get on the happy hours and, like, ask frigging Vin a question. Like, that's crazy. And so I decided, like, next week I would jump on. I don't think Vin was there, but you and Dave Knickerbocker, maybe Dave Langer, somebody was there. And pretty much from that point, I was just hooked, because I was learning about hyperparameter tuning and a bunch of stuff I didn't really understand at the time. Sometimes I still don't understand, but it was just a place to go and ask questions, and it's just helpful. And so yeah, that's kind of how I heard about it. And I just liked sticking around.

Harpreet: [00:05:03] Yeah, I still can't believe Vin just started showing up to these things, y'all. I think I've said it so many times, but he's like the first actual influencer of mine that I came across in the data science space. Back in like late 2017, early 2018, Vin was one of the first to get me hyped up and excited about the field. So thank you, Vin. Thank you for being a fixture here. Thanks for being part of this. Let's open it up for questions, man. Oh, wait, Ali's in the building. Huge shout out to Ali. If you guys did not catch the meetup thing I was doing yesterday about deep learning, I gave a presentation talking about the business value of deep learning, how to derive business value from deep learning. So thank you for being part of that, Ali. That was a great session. We had talks from Jasmine Henry, Re Mahmood, Khaliq and Mona Reynard, topics ranging everywhere from challenges to deep transfer learning. [00:06:00] Manu gave a presentation on how your mouse-interaction data can be used to identify you, using like a crazy triplet loss function. It was great stuff, man. Speaking of deep learning and cool projects, somebody here is doing a deep learning project and has a podcast called Cool Project Show. Chris, thank you for being here, for being part of the happy hour. Talk to us about the project that you're working on, the system where you have a bus, and the bus drives by, you take a picture, you get a text message. Tell us about that.
Speaker3: [00:06:34] Yeah, absolutely. So I'm actually really fortunate that there's not a ton of traffic on my street, although I'm not really getting any false positives anymore. But basically the bus has to drive by my house and turn around and come back before picking up my kids at the end of the driveway. It takes about 6 minutes from when it passes first to when it comes back, and I know that because of the computer vision model that I built. So yeah, I detect the bus and then it sends a text, and I see a picture of the bus so I can make sure it actually is the bus, because in the beginning I was getting a lot of false positives. So I can see it actually is the bus and the time that it came. So it's live and I've been using it, and right now I'm looking at a graph I made of the different times to try and find the average time that the bus comes during the day.

Harpreet: [00:07:35] That's pretty cool, man. It's a dope system, very useful as well. Eric, do you have any questions about it? Because I know Eric was super interested in this project as well, so I'll let Eric take over. Chat's going crazy here, man. Y'all don't see all these folks coming through the chat. Thank you all so much. Shout out to Mark Freeman in the building. I like Aunty's background. It's just the 100. What is that, like the speed limit sign that you guys got?

Harpreet: [00:07:58] In meters per hour, I guess. Kilometers per [00:08:00] hour? Yes, it is. Go for it, Eric.

Harpreet: [00:08:03] I don't think I have any specific questions. I've just loved seeing the posts, like your post today with the two kids under the umbrella, only having to wait for like a minute or whatever because the bus was on the way. I just thought that was so cool. So I've really enjoyed seeing it. Does the bus driver know that he or she is being surveilled?

Speaker3: [00:08:22] Actually, I felt like such a creep annotating my neighbors' cars when they drive by, because, like, I know who the person is and I just feel like a weirdo annotating, but you've got to get that data.

Harpreet: [00:08:39] Is the entire project up on GitHub at the moment?

Speaker3: [00:08:43] Yeah, it's on GitHub.

Harpreet: [00:08:45] Nice. I think I might play around with that, pull it down and play around a little bit, because I don't have a camera for my house, but I've got a Vivint home security system and I was trying to find ways to hack into that, but apparently that's quite difficult to do. So if you are listening.

Speaker3: [00:09:01] My camera was only 60 bucks. I don't even know how to pronounce the brand; it's like a well-known brand, so I'm going to sound dumb, but it's like.

Harpreet: [00:09:08] Okay.

Speaker3: [00:09:09] But yeah, a $60 camera on Amazon. I had no problem setting it up with OpenCV, and, you know, the first night playing with it, I was watching myself on camera. Yeah.

Harpreet: [00:09:24] Talk to us more about that hardware setup. So you've got the camera. Is the camera just streaming to your computer? Is it connected to like a Raspberry Pi or, you know, a Jetson?

Speaker3: [00:09:33] Yeah. No, it's my computer. So I'm using my camera with OpenCV and then RTSP. And then I have another desktop over here that has a couple of GPUs. And the only reason why I'm using my own hardware is just because I own it, and that sounded like [00:10:00] something that I wanted to play with.
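For anyone who wants to tinker with a similar setup, here is a minimal sketch of the kind of pipeline Chris describes: an RTSP camera feed read with OpenCV, a detector run on each frame, and a text alert with a snapshot when a bus shows up. The RTSP URL, the cooldown, and the send_text and looks_like_bus helpers are hypothetical placeholders, not his actual code (his project is on GitHub).

```python
# Hypothetical sketch, not the project's real code: watch an RTSP camera feed
# with OpenCV and fire a text alert with a snapshot when a bus is detected.
import time
import cv2

RTSP_URL = "rtsp://192.168.1.50:554/stream"   # assumed local camera address
COOLDOWN_SECONDS = 360                         # roughly the 6-minute turnaround


def send_text(message: str, image_path: str) -> None:
    """Placeholder for whatever SMS/notification service gets wired up."""
    print(f"ALERT: {message} (snapshot saved to {image_path})")


def looks_like_bus(frame) -> bool:
    """Placeholder detector: always False here. In practice this would run a
    trained object-detection model on the frame and return True for a bus."""
    return False


def main() -> None:
    cap = cv2.VideoCapture(RTSP_URL)   # OpenCV reads the RTSP stream directly
    last_alert = 0.0
    while True:
        ok, frame = cap.read()
        if not ok:
            time.sleep(1)              # camera hiccup: wait and retry
            continue
        if looks_like_bus(frame) and time.time() - last_alert > COOLDOWN_SECONDS:
            snapshot = f"bus_{int(time.time())}.jpg"
            cv2.imwrite(snapshot, frame)      # keep the picture for review
            send_text("Bus detected outside", snapshot)
            last_alert = time.time()


if __name__ == "__main__":
    main()
```

The cooldown is just there so one pass of the bus does not trigger a burst of texts; the conversation picks back up below with how the alerts were set up.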
And then the text alerts, I set those up on...

Harpreet: [00:10:07] Nice. I'm definitely going to be following along with this project and trying to recreate it. Thank you very much. Shout out to everybody in the room right now. Megan Lou's in the building. We got Nick Singh, Greg Gill, also Vin, Mark, Shantanu. Monica, good to see you again, it's been quite some time. Serge in the building, as well as Russell, Kristin, and everybody else. Question coming in here on LinkedIn, it's from Sangita. Sangita is saying some really nice things: Vin's insights are awesome, has been following his content; Eric Sims' data skills are awesome. Yes, they are; both of them are awesome. The question Sangita has is: What's your view on pushing the ML pipeline a step further? Firstly, we got started with getting a pipeline for just solving the problem. What are your insights on how we can gauge our ML practices in a company? How can we push further to visibly notice an improvement in the ML practice in a company? In a nutshell, MLOps is probably the direction you're headed, but let's hear from Shantanu on this one, and if anybody else wants to jump in: we'll go to Shantanu and then Mark and Kristen, if you'd like to as well.

Speaker3: [00:11:25] So how do we go beyond building ML pipelines to making ML practice more of a thing in the company?

Harpreet: [00:11:32] Yes, that's what it looks like. I'll read the question again: What are your insights on how we can gauge our ML practices in the company? How can we push further to visibly notice an improvement in the ML practice in a company?

Speaker3: [00:11:45] That's always going to be impact, right? That's the only way you can measure how impactful a practice is being, or a team is being, to the business bottom line. So instrumenting [00:12:00] the ML pipelines, making sure that we're building metrics on what's going on and then tracking those. There are two sides to that: one is MLOps operationalization, and then tracking the actual ML performance-type features. But I'm also talking about tracking the impact of what that ML is doing on the business. So that's your more traditional, is it driving more user activity on my site, or whatever my goal is going to be, and using that data to back up and evangelize ML practice across a company. Because someone posted on LinkedIn today, without data, your opinions are just opinions, or something like that. Right.

Harpreet: [00:12:50] Absolutely love it. Thank you so much. Mark, let's go to you.

Harpreet: [00:12:57] What's up? I'm mainly just here to share this amazing resource that was brought up multiple times, and then I saw this person speak at a meetup. You've heard of Goku Mohandas; he has the Made With ML website, and I would highly encourage checking that out, especially because they have a section on testing where they give you a whole bunch of code on how to think about testing, how to break it down, and it actually gives you examples to implement yourself. And then they also have the MLOps monitoring stuff as well. So if you just want a practical example to make sense of it and see what's done, it also has a full use case for you to practice with. I think that's a really great course. And when I saw him talk at the meetup, I was like, wow, it's so amazing that we have this resource here available to us.
And it stems from his experience implementing ML at many companies. So that's one thing. And then there's more. I've been talking to data leaders recently for some of my recent content, and in response to the monitoring question, one of the experts, Chris Bergh, is [00:14:00] more on the DataOps, data quality side, but I can imagine it applies to this as well. His main advice for starting this journey was to lead with observability, so essentially add thermometers throughout your system to start measuring what's happening. And he said that often leads to actually figuring out where your bottlenecks are, where the problems are, and that data is very valuable for getting buy-in, because now you have a technical story, which is essentially what Shantanu said earlier.

Harpreet: [00:14:28] Mark, thanks so much. Yeah, Goku's resource Made With ML is absolutely amazing. I had a chance to partake in an AMA session in the MLOps community with him. A lot of insightful topics and discussions going on there. Let's go to Kristen for this one. Kristen, I think you had your hand up, or maybe I saw that, but.

Speaker3: [00:14:49] I didn't put my hand up. But yeah. So everyone already said some great stuff. I think the one thing that I want to say is, we talk a lot about optimizing these models and improving these models, but before we even start a model, you have to optimize for the whole business. I've seen so many companies that will have a $20 million marketing budget and no way of measuring how well that ad spend is actually performing. And at the same time they might have you working on something that isn't a $20 million problem, you know. And so the thing for me is, take a step back, actually look at the business and find out where you're supposed to be focusing before you go down the rabbit hole of focusing. You know, that's that.

Harpreet: [00:15:43] Thank you very much, Kristen. Let's hear from Vin on this. And if anybody else has anything to chime in with, please do let me know; just use the raise hand feature and I'll call on you. If you're watching on YouTube, I see you; let me know if you've got questions. If you're on LinkedIn, let me know if you have questions [00:16:00] too, I see you're watching. Vin, go for it, and then after Vin, we will go to Greg. And again, also in the chat, if you have questions, let me know.

Speaker4: [00:16:07] I think everybody's kind of covered everything that I would have said, except stay really close to your customers as you're getting more and more advanced. I can't overemphasize that. I think we've already kind of circled around it, but stay as close to your customers as you can get. Every time you hire somebody to do a deeper type of machine learning, or a more complex type of machine learning, hire a product manager at the same time, because half of your value gets lost when you start going in the complexity direction. So don't overlook all the other stuff that you need, especially the product side, which, Greg, I think he's coming up next, he might know something about that.

Harpreet: [00:16:49] Greg, do you know something about this?

Harpreet: [00:16:52] I don't know. Yeah. So I like what everybody said, and I truly connect with what you just said, Vin, about staying connected with the customer. So at the end of the day, as someone said earlier, it's all about impact, right?
Typically when you first implement some sort of technology or process, the implementation costs will be heavy, right? So then you will be able to measure this impact. But the way you get better at this, as more use cases come in from the business, is how good you become at predicting the impact of your work, right? So, to what Mark was saying, you have these thermometers, these sensors, where you're testing, you're measuring, or you're spotting where your bottlenecks are. So it's all about optimizing your operation, streamlining, bringing efficiency to your operations so you can speed up how you respond to these business needs. So the faster and the [00:18:00] more standardized the way you can predict what impact you can have on the business, that's how you know how good you are with any of the technologies that you have in your business. So I just wanted to quickly add that. I like that.

Harpreet: [00:18:13] Thank you very much. Let's go to Mark. Mark's got something building on top of this discussion, so go for it, Mark.

Harpreet: [00:18:19] Actually, hold on. I think Jasmine might have something to add to the original question, and then I can add afterwards.

Harpreet: [00:18:24] Yes, for sure. Jasmine, thank you so much for being here, y'all. Jasmine also gave an amazing presentation yesterday talking about Transformers, Nvidia's Megatron transformer, and how it's pretty much heating the world. It's amazing. Jasmine, go for it.

Speaker3: [00:18:39] Yeah, I wanted to piggyback on the adding-a-product-person point. When you're hiring folks and you're trying to scale up your team, one of the things that ends up happening when a startup is new, especially if you're new with machine learning, is that you have that mindset of move fast, break things. And that's a really great way to get your MVP off the ground, to begin bringing in those customers and really proving your concept. But once you start scaling up and you want to start bringing in more customers and things like that, the infrastructure that you need in order to get this done changes; the people that you have to have in place in order to get those things done change and shift. So make sure that you have somebody who's able to keep all those things in line. A product manager is great, but also a project manager who's able to make sure that the projects you're doing are actually going to bring in value in the end, and that you're not getting too excited and taking too big a bite of the apple because you're finally at a place where you can begin getting more things done. So really make sure that you're adapting to that new space so that you can really provide value to your customers.

Harpreet: [00:19:49] Jasmine, thank you so much. Over to Mark.

Harpreet: [00:19:54] So I just want to kind of flip things around, because there's something I've been thinking about recently. There's been so much talk [00:20:00] about how to bring ML into production, getting your first ML model. But what about the opposite? When should you remove ML from your system? Specifically, dealing with a huge sunk cost fallacy, because people worked so hard to get to ML, and maybe it's not driving value. The reason I bring this up is that there have been a couple of use cases where I talked to people and they said, yeah, we're actually switching our ML models back to heuristics because they're just performing better and they're easier to maintain. And that's just wild to me, because everyone's still shooting for ML, and they're bringing it back.
And that's just wild to me because everyone's still shooting for ML and to bring it back. So I'm just curious what other people think of like how do you manage expectations and, and bring it back and reduce the hype when things aren't working? Harpreet: [00:20:42] Let's go to Keith McCormick for this one. Keith, thank you so much for joining in. I appreciate you being here. And I feel like this is something that you might have some insight into. Harpreet: [00:20:51] Actually, as I was listening, I was going, wow, that's such an interesting such an interesting question. So thanks. Thanks for that. Yeah, I think. I haven't heard of people pulling back, but actually it doesn't surprise me that it's happening. I guess it just hasn't come up. But you know, the way you position, that totally makes sense that that's happening. It is. It is a little shocking. I would my initial response is I think that probably nobody was in charge of monitoring and managing those models and on the day that those models were born, so to speak. I've got to think that they were better than heuristics. And if that's not true, that's a different conversation. You know, somebody has got to take the data science team aside, you know, because if there was if there was a final project reveal, you know, look at this amazing job we did and it was all persuasive enough to go into production, I've got to believe that those models were performing. So if they degraded over time and. No one was paying attention to that. And [00:22:00] it got so bad that the performance was worse than heuristics and business rules. That that's where I would point to. That's where I would point to first. Oh, and by the way, as somebody that's been external, an external resource for much of my career. I'm always very careful to make sure that there's something in place, but that's often why it happens, because the person who built the model might not be around anymore. Harpreet: [00:22:29] So you see, you've got to make sure that that's not just a responsibility that goes to a person. It's a role that has to be. In other words, it's a role, not a person. Because if the model is for two or three years, the person might be doing something else or whatever. But the role, the role still has to continue. It still has to be on somebody's plate. Just a quick follow up is okay if I follow up on that question real quick. Yeah. So when you said it's a role, like should this be like, is this a like cultural and procedural thing or should they be like, if we put in ML we need to have a monitoring system in place as well. No if, ands or buts. I mean, that obviously always depends on data, but you go on to say, I mean, you've positioned it in that way. I'm going to, I'm going to go with team no ifs, ands or buts. Right. I think it absolutely has to be there. So the problem is, if it's like the first machine learning model or maybe the first one that actually worked, you know, there's the famous 8% of models get deployed, you know, kind of thing, right? So if it's the first big success, then no one's going to want to have a permanent monitoring and management role to monitor one project. Harpreet: [00:23:43] So that's I think that's often why it gets off the rails, you know, because you're building models, you're excited about building them, you're excited to move on to the next one. And it doesn't seem like a big deal when you only have one or possibly two models. 
But if you get a half dozen models in production, [00:24:00] I always think of it as something that should probably live in IT, and it could just be one day a month. You know, hey, it's the first business day of the month or whatever it is, I know that I've got three models and I've got to run a quick report; it could be a 15 to 30 minute job. This is the kind of thing that would be existing code that you're just rerunning: what was the difference between the predicted and the actual this month, and how does that compare to other months? And maybe you're doing this ongoing little plot where all of a sudden it's dropping down or something. Yeah. So I think it's an absolute must. It's usually not the data scientist, because the data scientist is usually focused on the next thing. It's going to be something that's on somebody's calendar; once a month is enough, as long as it absolutely happens no matter what. And if that person moves on, it's understood that somebody is always in charge of that.

Harpreet: [00:24:54] Keith, thank you so much. Shout out to Kate Strachan. Kate Strachan, thank you for being here. She's outside having fun on a beautiful fall day. Thank you for joining us, Kate. Appreciate you. Let's go to Mikiko, then we'll go to Serge and then Greg. And if anybody has a question, let me know. Antonio says he does not know how to raise a hand. Antonio, I will call on you after Greg. So you click on the more button, I think. Right? Is it? No, there's a reaction button, the reactions button, and you can use the raise hand thing. So that's how you do that. Yes, there you go. So it's going to be Mikiko, Serge, Greg. And then I got a really, really basic deep learning question, computer vision related, that I want to toss over Coach Dubs' way. But if you have questions, watching on YouTube or on LinkedIn, let me know; I will queue them up. That means, Mikiko, go.

Speaker3: [00:25:45] Yeah. So I started having conversations with folks just a few weeks ago and was hearing more and more about certain companies actually taking ML out of their pipelines and systems, and also moving away [00:26:00] from cloud, which to me was super surprising. But the more I actually dug into it, the more I realized that it was kind of industry specific, specifically banking and finance. So I think there are two or three trends that I've been hearing about that to me are surprising, because they kind of buck a lot of the popular conversations. One is the idea that every company is moving to cloud, and yet I'm hearing of a lot of companies either moving back to on-prem or moving to hybrid. There's a number of reasons for that, but a lot of it has to do with regulatory reasons. Another trend that to me was surprising was companies moving towards consolidated cloud versus multi-cloud. I'd always thought of multi-cloud as being, frankly, an anti-pattern. Like, what were you doing? Why did you not just put the energy and effort and investment into a single channel? And the third thing that I've been hearing about that was surprising was basically taking ML out of the pipelines. And a lot of the companies, it's finance, it's government.

Speaker3: [00:27:12] It's anything that's heavy on regulations. And essentially, what I think happened is that,
rather than trying to thoughtfully solve the right problems at the time, people were going off of this bias of, oh, if all the leaders and the people around me, or the companies around me, are doing this thing, therefore it's almost less risky for us to also keep up. Because I think the one thing that companies really don't want to happen, well, for some startups the problem is, we don't want to be so innovative that we're selling a product [00:28:00] that people don't even realize they need until 20 years later. But for a lot of other companies, it's like, oh my God, we want to maintain market share and we don't want to lose ground to our competitors, because losing ground to our competitors, which comes from being wrong on a technology or a direction, is a lot worse. So I think that's probably what it is. And to a certain degree, for a lot of these companies, and Josh Tobin had published, I think, either a blog post or a LinkedIn post where he's like, look, if you don't have ground truth, then you shouldn't even have monitoring. It's basically along those lines. And it was a bit of a hot take and controversial, because he's starting a monitoring and observability company, right? But he's also accurate, in that I think a lot of companies, rather than establishing a good basis of ground truth and thoughtful experimentation practices to really say, oh, will bringing machine learning models into our pipelines actually improve not just our revenue, but also keep our operational costs and inefficiencies low,

Speaker3: [00:29:09] they basically said, hey, we just have to keep up. It's okay to be kind of wrong, or we don't care if we're going to be wrong in the future, as long as we're wrong in a specific way, and that way is not losing ground share. So I think that's basically what it is. But I've definitely been hearing that. And when I talk to other folks who deal with some of these industries, they're noticing that, yeah, some people are moving back to on-prem slash hybrid a lot. Some companies are definitely taking ML back out of the pipelines. And then some companies want to de-risk the vendor relationships so much that, and these are the big companies, not so much the small companies, they prefer to do a multi-cloud kind of situation so that no single vendor can keep them locked in. So it's definitely a thing for sure.

Harpreet: [00:30:02] Mikiko, [00:30:00] thank you so much as always. Mikiko's got the knowledge bombs. Speaking of Josh Tobin, keep an eye out for an interview that will be coming out with me and Josh Tobin in the near future; we'll be streaming it live. I'm excited for that one. Shout out to Kyle, a former colleague of mine. Me and him worked on deploying models into production here in Winnipeg, and now he's working over at Gantry, which is amazing. And, you know, since Mikiko and Mark are here, I've got an announcement to make a little bit later. So, you know, before both of you duck out, let me know. We'll make that announcement. Serge, go for it.

Harpreet: [00:30:43] Yeah, well, Mikiko is always on point.
And in any case, my comment had more to do with what Mark brilliantly said earlier, which Keith added to and which was elaborated on a little by Mikiko. It was about having mechanistic models or business rules instead of machine learning models, going back to that. As some of you may know, I work in agriculture, and it's a very old industry, so there are a lot of these rules. Without going into much detail about what I work on, I'm often trying to make more powerful versions of these heuristics with machine learning, using, when I can find it, larger amounts of data. And sometimes it just doesn't work out, like the data is not good enough, but, you know, someone has to validate that data. So whenever I'm not able to validate it, there are other people working on different projects and they're kind of trying to piece things together. So actually [00:32:00] today I was kind of pissed off, because I'm working on this machine learning part and some other data scientist is working on another part, and they're supposed to be pieced together, right? And I validate my stuff independently, which doesn't make sense.

Harpreet: [00:32:15] And he validates his stuff, which is more complicated, you know, it's more tricky; it's more difficult to actually make sure that it works. And it's like I built a road to nowhere, right? Like my model is supposed to tie into his model, and if that one is not feasible, then what the hell am I doing, right? So I told the project manager, what we have to do is actually validate the most complex piece, the most difficult piece, first, because if I know 100%, or maybe 90% sure, that I can build a model with the data I have, then I shouldn't even be the one leading off; that harder piece should be starting the project, right? Well, this is all to say, sometimes you just have to scale back. And in our systems, fortunately, we have it set up in such a way that we can have mechanistic models and machine learning models coexisting. So it doesn't matter what we have there; it's just something that turns inputs into outputs, and we can swap them if need be. And so yeah, that's an advantage. And I think if we think in those terms, we don't have to overcomplicate things necessarily.

Harpreet: [00:33:34] Serge, thank you very much. Let's go to Antonio then. Also, Thaci, you had your hand up, so please, we'd love to hear from you, so do jump in on the discussion after this. Shout out to Nick Singh. Nick's in the building. Just want to say what's up, Nick. If you haven't already, please check out Data Lemur. And let me tell you, I saw somebody post something like, is Data Lemur that good, or is it just the marketing? First of all, it is. Second of all, people act like marketing is a bad thing. [00:34:00] If it wasn't for marketing, you wouldn't know anything existed. Let's go to Antonio, then Jay, because I know you had your hand up, then Monica. Go for it, Antonio.

Harpreet: [00:34:09] Sure. Well, first of all, I think Data Lemur is awesome.
In terms of pull back on machine learning, I know the question by Mark was asked a while ago, so hopefully I'm not forgetting the question, but I think an interesting part of pull back of ML that I've seen happen, I think it's going to continue to happen is a lot of models that I've worked with in the past where there were not explainable. I think that is changing and is not being acceptable anymore. I've had companies where it's like, hey, you know, like a customer comes and it's like, Well, you made this decision. Why did this happen? And you're like, Well, the ML model said, And they're like, Well. Why? And I don't know, you know. That's just what I decided. But that is turning not to be an acceptable answer anymore. And people want explanations. So I think that's one way of why you might have to pull back machine learning. And I get frustrated, too, because they ask me for an answer. I'm like, I have no idea. Like, I don't know if it's true or not. Right. That's what we hope it's true. But like, if I can explain it to you, it's kind of like it gets you stuck as a data person as well. Harpreet: [00:35:28] I think the the other thing I think Mark might be interesting based on like the work that you do is when I worked as an AI translator and we had to like I worked on a team where there was like over 200 models and when we checked, a lot of them weren't being used. So we said, okay, whatever's not being used, there's no use. There's no reason for us to monitor because you're still wasting resources monitoring it, but nobody's using it. So we have to pull those back. And the issue we ran into is a lot of the data science teams, which [00:36:00] I don't really agree with, are one of their KPIs, is how many models did we build or how many models we have in production? Oh, we went in, the leadership goes, Well, you guys went from 200 models, let's say, to 50 in production. What happened? And they're like, Antonio is at fault because he canceled all of them and he's he's a bad person. You guys weren't using any of those. But then the data scientist relationship gets strained with you because you guys are the same. You guys are on the same team, right? Maybe like dotted line or reporting or something. And then it kind of like I've seen things get out of hand. Where? What happened is this didn't happen with me, but other people that I've worked with is the next time a data scientist build the model, they're like, let's just not tell the the translators that we built the model and we'll just like hush hush. Harpreet: [00:36:52] So they started like circumventing us because we were supposed to like the business person. We were supposed to ingest the information, go to the data scientist and tell them what to build. So we were kind of that person like, Oh, maybe you don't need machine learning here. Maybe we shouldn't build it. And so they would get mad. The business would get mad because we need a machine learning model because they want to look good. Data scientists need to build machine learning models or they think they do. And it kind of when you're that middle person as the product manager, translator, whatever you want to call it, you get stuck in that middle. And I'm like, I'm not I'm not here to babysit Like you guys do whatever the hell you want. I mean, if you want to go build it, go wild. But that's kind of been my experience. So I know you're in kind of in that role when you're in between teams and just trying to navigate business relationships with data science. 
And it's a fun job, but sometimes you're like, I've got a child, I have a family to worry about. I don't have time for this crap. Yeah.

Harpreet: [00:37:49] Is it even data science if you're not building models? I don't know, I don't know. Jay, I know you had your hand up a while ago, go for it; then after that we'll go Monica, Vin. Coach, if you're in the room and you've got a question, let me know. If you're watching [00:38:00] on YouTube, LinkedIn, queue your question up. There are two really good questions coming in on LinkedIn that we'll get to as well.

Speaker3: [00:38:05] Yeah, I kind of put my hand down because you all already shared almost all the thoughts that I had. I wanted to touch on this topic and one other thing I've seen in some of the use cases where models were pulled back. Like Antonio said, sometimes they are not explainable. And I have seen cases where the person who built the model has left the company and nobody else knows how it works, because there was no monitoring set up, and then suddenly we are trying to figure out this model, and it's still running in production. So how do you make sure that model is still relevant, number one? That's the number one use case where the business kind of starts losing faith in ML models and data scientists, because then they'll say, oh, you don't even know how this model works. And I'm like, I don't know; it was built like five years ago by somebody, and I don't even know what the thought process was. The second thing I have seen is that sometimes the issue also comes from the business side, because, like you said, everybody wants to build cool data science models and showcase how technologically advanced they are, like the open source transformer models they're applying. But then at the end, everything is set up, and then we start doing data requirements and see that, oh, we don't even have baseline data to do fine-tuning or transfer learning, even for one-shot or zero-shot use cases. So then that becomes a huge challenge, and from the other side of the domain I sometimes feel, oh, it takes so much coaching to discuss and convey the importance of having the correct data to the business, which they do not understand. So I think it goes both ways. Both the tech side and the business side should be in harmony on exactly what the end goal is; otherwise it's always going to be at cross purposes with the data.

Harpreet: [00:39:52] Thank you very much. On the second point, that's why documentation is important, incredibly important, so that five years [00:40:00] from now somebody can read your thoughts. There's a conversation I did on the podcast with, I'm forgetting his name, Brandon Quach, Bernard Quach, who has been on the happy hour a number of times, since the very beginning. But he talks about this mindset of future judgment. Anything he does, he thinks about 5 to 7 years from now, somebody who reviews the work, what will they think about this guy? And that really impacted the way I do my documentation. So shout out.

Speaker3: [00:40:29] To that idea.

Harpreet: [00:40:30] Let's go to Monica. Then we've got Monica, then Kosta, Gina and Shantanu, and then we've got a couple of questions on LinkedIn that we'll get to as well. Monica, go for it.

Speaker3: [00:40:42] Hello. I just want to say happy 100th episode. I don't have a fancy hat, but if you are a fan of Rick and Morty, 100 more years. Yes. All right.
Harpreet: [00:40:52] I love it.

Speaker3: [00:40:55] I think this topic's pretty fascinating, and seeing the pull of the models out of the pipelines doesn't surprise me too much, because what I've been seeing is that it's very much like dashboards, where you build the model and put it into production and nobody's using the outputs. It's very much to Antonio's point where you're just building the models to build models because we need to meet some milestones and some metrics, and it's like, what's the point of that? And also it comes down to just straight-up data literacy. They don't even know what they want or what they're going to do with those outputs, and they don't understand that it really does require continuous monitoring. So I'm also on team no ifs, ands, or buts: you need to monitor those models, because things do change within the business. And I think it is on the business side to monitor those, because they're going to know what changes happen in that business that then need to be reflected in those models and updated. So. [00:42:00]

Harpreet: [00:42:01] Monica, thank you very much. Appreciate that. Let's go to the next one in line, then Kosta, and then Shantanu. Go for it.

Speaker4: [00:42:11] Yeah, I feel like I need to be everybody's anger translator, because you're being way too nice about this. So let me say what everybody's trying to say. I've got some notes. It should have been a SQL query in the first place. That's, I think, about 50% of the groups that are pulling machine learning out of production: it's because it could have been a SQL query. You know how you've been in a meeting that could have been an email? Yeah, the project could have been 35 lines of SQL. It wasn't better than heuristics. I know Keith's angry right now, but no, seriously, the first model that most use cases get deployed with is not better than the baseline. Someone flipping a coin is often more effective, especially if it's three or four years old. The concept of measuring a baseline didn't exist in most companies. Really, unless you were doing it since 2010, 2012, you didn't get to that level of maturity. So I hate to say it, no, a lot of the stuff that's in there never worked in the first place. We all blame drift and we say, oh, we should have been monitoring something. No, it was just really bad to begin with. A lot of data science isn't data science, and I think everyone's being really nice about it by offering all these other types of excuses.

Speaker4: [00:43:29] But no, we have a whole lot of analytics that's getting passed off as data science, and the reliability just isn't there. And so it's not that it degrades, it's that it was never that good, and as soon as you start measuring it, you see it. MLOps. MLOps is an ancient Sumerian word for terrible model. That's the reason why we have MLOps: because our models aren't that good to begin with, and we don't find out until we test in production, because nobody has an experiment life cycle at all. Oh yeah, one more: the [00:44:00] vendor. The vendor got paid per project delivered, not for any particular quality metric. And companies write requirements for data science projects like they write requirements for software projects, and that doesn't work. Models don't work, they function, and they don't do all the things that you want them to do all the time that you want them to do it. So you have to have reliability requirements: it has to work this well in these situations.
It has to be this reliable in order for me to use it. I mean, Nike's a great example. How do you have excess inventory? How do you not have visibility into your end-to-end inventory? You're Nike, how is that possible? And I think that's what you find when you dig into why data science and machine learning are getting pulled: it's because it didn't work in the first place.

Speaker4: [00:44:52] And for me, when I come into a client, a lot of what I do is yank out what's in there, because it doesn't work. People think it does. And when they start relying on something that doesn't actually work, it makes things worse, because they are now confident about a terrible decision for far longer than they normally would be. Normally it's okay for your CTO or your CEO to say, that doesn't make any sense, why are we doing that? And the person just goes, oh, well, we've got this model, we've got data, and they bring these reports, and it's like the CEO can't make a case because they're fighting data and they don't know how to say, this data is all garbage, can we go back to some fundamentals? And that kind of circles back to, no, it's not better than a heuristic in a lot of cases. So you are all far too polite. I thought I'd at least say what you were all thinking instead of making somebody else do it.

Harpreet: [00:45:47] Vin with the real talk, blunt talk. And I love it, man. We were talking about this on my podcast when you were on for the one-on-one interview a couple of years ago. I didn't get it then, but now [00:46:00] I get it, when you were talking about how most of data science is just analytics. Almost three years ago, I think I was just a bit naive back then and didn't get it. But I know what you mean. Kosta, go for it.

Harpreet: [00:46:15] Yeah, maybe I might jump onto the I'm-going-to-tell-it-like-it-is bandwagon. There are three questions that I'm asking based off everything everyone has said, right? First off, sunk cost fallacies. When did doing a thing because we're already doing a thing become a thing? Like, come on, guys. Yeah, we have models, but, right, just because we've got them doesn't mean we've got to stick with them forever and ever. Right. The second thing is Goodhart's law. We keep forgetting Goodhart's law, right: the moment a measurement becomes a metric for success, it typically ceases to be a good measurement, right? You see it all over the place. And it's because we've got proxies for everything, because not everybody understands the subtlety that's involved in some of the more complex modeling, right? So sometimes you might end up in a situation where a customer or a client or a business goes, okay, so if we have more models, more money, right? Little proxies like that. I've literally seen businesses go, yep, we need X amount of models across our coverage of the actions that we do, and that means we'll be in a better place, when actually maybe 20% of those models are adding value. More models, more problems, right? If you don't know, you don't know; Vin might be absolutely right.

Harpreet: [00:47:40] But yeah, like, if we're not sure, maybe 20% of those models might actually be useful. And it's exactly like Antonio was saying, right? You take down the rest of them and then people start getting riled up because they're going, oh, wait, hang on, our proxy is broken. Goodhart's law. Then the second part: the people that built that model, no offense, usually
Then the second part, the people that built that model, no offense, usually [00:48:00] it's one of us in here, right? We turn around and go, Wait, hang on. But that was my job. I was building that model. You're telling me my model's not useful? We don't want to hear that. Right? When did a tool become an industry? Is the other question I'm trying to ask. Right? Yes. I understand this stuff is complicated enough that it takes a lifetime to master, but the thing that keeps pulling me back is I'm a robotics engineer. I'm an engineer. I solve sensing problems for robots and machines. Do I use machine learning all the time? No. Is that one of the tools in my belt? Absolutely. Because it's really, really useful. If robotics is that the only thing we should be doing and should we be trying to replace everything with with models, Can all models replace everything? Yeah, possibly. Is it worth doing? Maybe. Harpreet: [00:48:58] I don't know. Let's find out on a case by case basis. Right, sir? Absolutely. Like the the amount of historic knowledge, especially in industries like farming, where you've had generational knowledge. Pay down. Just put that in an if statement and move on. You don't need a model to validate that necessarily. Like, how much are you actually saving by validating or validating that it might take you ten years to collect the data for that? Sure. Go ahead and do it. But we're selling them on the oh, we can take bucketloads of data for you for months and months and then give you something you already know. Like, Come on. Sure. Collect the data to validate it and improve their their conventional wisdom. Sure. That's a science experiment, not a business. Right. So there are business use cases within agtech. Like I've seen a lot of robotics and agtech is massive right now in terms of things like how do you how do you pluck fruit [00:50:00] when it's and its optimal ripeness so that you're not wasting a lot of fruit that goes at the bottom of the pile just to bulk up the shops. Right. And I've seen that makes sense. And where that turns out is when you're trying to scale, when it's no longer one guy trying to run a whole farm and you need hundreds of pickers coming in. Harpreet: [00:50:18] We saw that in Australia last year. We had a lot of trouble getting fruit pickers because we had our borders shut down. We couldn't get people coming in and typically it's backpackers. A lot of backpackers would take like a working holiday here, especially from the UK, from Southeast Asia. People would come across and they would spend two months just picking fruit and then they'd be backpacking around the rest of the time. And that's part of their visa conditions. Right? And it's great for us because the fruit gets picked. Last year we were struggling and the fruit wasn't getting picked because people weren't going there to do it. So now. All those backpackers need to be taught when to pick fruit and a lot of food wastage comes into play. What if you had robots that could actually help scale that knowledge? Because they can't teach all those backpackers instantly which fruit to pick and which fruit? Hey, maybe leave it a week. Right? So there are useful use cases, but things like, Hey, this half of my farmland is a bit dry today. I'm pretty sure like a lot of farmers could just glance over it right across on a motorcycle or on a tractor and just be like, Yep, better water the lawn. Right. There's a lot of conventional wisdom in that. 
Harpreet: [00:51:29] It's just we're trying to validate some of these things and it comes down to a fact of when did it all become an industry and why are we trying to find nails with our home? Right. And the final part of the problem is that a lot of these things are actually because people aren't sticking around. Myself included. Right. People tend to move around roles. So what's an achievable target for me to set within the year before I move to another company? Or [00:52:00] within the year or six month project that I'm running with with a consultancy or something, Right. If you move that fast, it's a natural limiter to not thinking, Hey, what about the guy who's looking at this five years from now, seven years from now? Right. And that might have a lot of good reasons for why people are moving around. I'm not saying that there aren't good reasons for that as a full root cause. Deep dive for that. But the symptoms of that turns out to be that sometimes our goals become short sighted and poor success metrics are set in place because it's convenient and easy for me to deliver on that and measurable for me to deliver on that within a short period of time. So we need to kind of question that. But yeah. Harpreet: [00:52:42] Because up. Thank you so much. Yeah. I'm wondering maybe it makes sense to use these models in places where we need to squeeze efficiencies because, I mean, speaking of farming, I know the entire precision agriculture industry is kind of built on machine learning. And, you know, that comes into play when it's like, all right, let's optimize my fertilizer and soil and resources and distribute it where it needs to be instead of uniformly across the entire field, because that's kind of wasting resources. So no place like that. Great insight, great discussion, Gina, over it. Speaker3: [00:53:15] Yes, I had a question, but since we're talking egg, I just want to add this. Living in the Central Valley of California outside of Sacramento and having worked at UC Davis, it's been interesting to see a lot of this stuff up front and personal. There's a lot of hype, obviously, and I think the latest round of hype came in five years ago or so when VCs started getting on this bandwagon. Vc I remember when VC got on the bandwagon of clean tech and these all these super smart people didn't really understand what was involved in that. And there was a crash because a lot of clean tech requires more capital investment than the VCs who had most recently made all their money from Dotcom [00:54:00] one and whatever, didn't understand that. And then they moved on to Agtech. And even before machine learning, there's all kinds of apps helping farmers to optimize this and that. Some of them can be useful. But there's two things. One, farmers are by nature conservative, and if they're not going to just adopt it, when I think someone mentioned they could go out on a motorcycle or go out in their truck or whatever and just check things out on the field, or cows did mention it and there are times when it can be very useful. I had the pleasure of meeting the founder of Blue River some years ago before it was acquired by John Deere, and there could be some I think there's some promising stuff there. Speaker3: [00:54:44] I know a startup came out of UC Davis that was developed in technology to optimize, to understand evapotranspiration, to optimize where to irrigate. 
The smart move they made was to go straight to wine country with that very high value crop, very sensitive to irrigation, and they're doing well. But in a lot of places in California, if you were to do that on alfalfa fields, no, because it's such a cheap commodity and water is so cheap that there would never be any real value to it, setting aside just the plain old adoption issues. So it's been fascinating. Serge, I see you work at Syngenta; someone I know well used to be at Syngenta. He's kind of the watermelon guru, one of the top plant geneticists in the world in that area. Kind of random. Anyway, I wanted to comment on that and then move to my question. So, as I often do, I ask career questions. And first of all, I want to thank all of you here. As I've joined in to the data science happy hour, [00:56:00] all you guys have been so welcoming, and I felt comfortable asking questions of all of you folks, and I'm just in awe. I'm so appreciative of all of your advice, of all your thoughts.

Speaker3: [00:56:12] So my question is this, and it's too bad, I guess, that Nick, the author of Ace the Data Science Interview, had to drop off. But this is a career question for folks who are maybe moving into the industry, for folks coming out of programs. I mean, this field just moves. It's moving so fast. And I like fast-moving stuff, I like complexity, but I'm just like, holy cow. As somebody who did a data science boot camp, who finished that a couple of years ago and who has done some project work and some volunteer work and so on since then, it's just such a moving target. On top of that, we have a pandemic. On top of that, we've had all this economic uncertainty. And so, while I know that knowing the basics, people mentioned SQL, knowing the basics, knowing SQL, knowing Python, just nailing down those fundamentals, those basics, is really important for folks. But MLOps... I mean, I know job postings sometimes ask for everything and the kitchen sink. And while that may or may not be reasonable, there's sort of an undercurrent here throughout our discussions. On the one hand, people are lamenting the fact that maybe senior management or others don't understand that more models isn't necessarily better. They maybe don't understand that, you know, when you're a hammer, everything looks like a nail, if you're really trying to get on the data science hype and you want to show shareholders or somebody else that you're with the program, but you don't really understand [00:58:00] the use cases, you don't really understand where it's useful, where it isn't.

Speaker3: [00:58:05] This is causing a lot of problems. But how do job searchers, how do candidates, how do people looking for jobs navigate this, I guess is the point. I mean, Vin's written eloquently about just the dynamics and the economy, you know, the hype that then, of course, led to, you know, when companies didn't have results, data science teams got laid off. It's not necessarily the teams' fault; it might well be, and more often is, due to senior management. So for someone who is trying to transition in, whether it's folks just coming out of programs or even folks with a little more experience, what do you all think? What are your thoughts about how to navigate what seems to me,
I mean, hopefully someone can validate this, but it seems to me a very turbulent and sort of confusing time, even for people who maybe are already in the field and are trying to figure out, how do I futureproof my career, how do I stay ahead? You can't possibly learn and know everything. So I'd love to hear your thoughts on that. Harpreet: [00:59:17] We'll go Keith, Mark, then Serge. And then there's a couple of questions coming in on LinkedIn, and we'll go to those after. So let's go Keith, Mark, and Serge. Harpreet: [00:59:26] I want to talk about Gina's question for a moment, but first I do want to acknowledge that I'm officially more pessimistic than I was about 20 minutes ago. So, Vin, thank you for that. I'm hoping that when I wake up tomorrow, maybe my optimism will return. But I do want to respond to Vin, because your comments were very persuasive. So first, Gina. You know, somebody was asking me just a few days ago if I get impostor syndrome, which is kind of a related thing, [01:00:00] you know, this notion that I feel like I'm not keeping up, or, more to your question, that it's almost impossible to keep up. Then you start to think that maybe for any given project that might come down the pike, you don't have the tools that you need for that particular project. Anyway, I've been doing this for a while, but I feel that basically every day. So for me, I tackle it in two ways. I've given up on the notion that I'm going to know everything I need for every project that I might encounter at some future date. I just have to let go of that goal. Not going to happen. So I know that I'm going to have to do ongoing professional development for whatever gig might come my way, and I just don't worry about that. And I've stopped worrying about when I sit down with a client and they say, when was the last time you did a project exactly like this, for exactly this industry, with all the... Harpreet: [01:01:06] And I just say, this will be my first. Maybe ten years ago I did one that was vaguely similar, right? So I just don't stress about that anymore. I just prepare for the fact that with any new gig that comes along, I have to learn a lot. And I do this partly because I enjoy it, but partly, I think, because over the long haul it really pays off. I just know that a half day a week, if not more, I've got to just be learning about random stuff. Like a few weeks ago we were talking about computer vision, and I got the thick computer vision book. I haven't opened it yet, I've looked at the table of contents, but that's about it. It talks about a lot of stuff that I just haven't had a need to know before, but I will chip away at it over time, and who knows, in 2026 I might [01:02:00] find myself on a computer vision gig, and that extra time that I spent might come in handy. So anyway, that was my two cents on Gina's question. As for what you were saying, Vin, again, I found that very persuasive, particularly when you said that, as a consultant, there have been times when the most value you could provide on day one is getting rid of bad models. That was very persuasive to me, because I think I was trying to be more optimistic, in the sense that for projects that I've been a part of, I would like to think, I mean, I'd be horrified if it's not true,
Harpreet: [01:02:38] I'd like to think that those projects were better than business rules on the day that they were born. But having said that, when I walk into a client situation and I look at what's already there, I often have to gently just make eye contact and say, okay, I'm here to learn on day one. I'm not going to say anything right now, but I'm already thinking about stuff that I'm seeing that concerns me. So again, I found that very persuasive. So here are my really quick three questions that I ask clients, to hopefully prevent that on the gigs that I'm on. And maybe over time, as a field, we can try to prevent, again, a model that's worse than business rules. It's such a shame that that happens, but you persuaded me that it happens. So the first thing I ask is: what are we trying to predict? I'm really focusing on that word. And the example that I love to use is taxes. As of the last day of the year, whatever I owe for the current tax year is fixed. Harpreet: [01:03:36] That doesn't mean I have any earthly idea what that number is, because I haven't gone through business travel and all the kind of complicated things. So I've been in lots of client situations where I've used that metaphor, where they're confusing a complex calculation with a prediction, and you're kind of saying it should have been a SQL query. I think in some cases maybe it's a little bit more complicated than [01:04:00] that, but they're trying to predict something that's really deterministic, and they've really got basically a BI problem, and there's no prediction involved. And I've had clients get really frustrated with me, but if I cannot get a straight-up answer on what we are trying to predict, something that is unknown, that has uncertainty associated with it, that we are estimating, if I can't get a straight answer, we just can't proceed. I'm not even going to scope this. I'm not even going to draft a contract unless I know that we're predicting something. Because if we're not predicting something, it's not a gig, and it's not a gig that you want me for, right? Then I'll ask: what's the dollar value that we can attach, and it might only be a rough guess, each time we make that prediction right, or each time we get it wrong? Harpreet: [01:04:47] If we can't attach a dollar to that somehow, and it doesn't even have to be profit, it could be saving person-hours or whatever it is, there's almost always some kind of value, usually monetary, to getting the prediction right or wrong. If people say, oh gosh, that's way too complicated, let's figure that out after the model is built, again, I call a timeout and say, I'm not even going to draft a contract here. We have to be predicting something, and each individual prediction has to have a value that we can at least estimate at this point. And then finally: when we make the prediction, what are you going to do with it? There has to be some intervention strategy. If there is no intervention strategy, there is no point in making the prediction, because we're not making the prediction for general awareness. We're not just trying to find out whether it's going to be sunny tomorrow. It has to be that we're making this prediction so that we're going to act one way if it says yes and another way if it says no. So if I don't get a clear answer to those three, then we just don't proceed.
Like, literally, the project doesn't launch. And my hope is that if you do get a straight answer on those three, then when you do build the model, it should [01:06:00] be better than your estimates. Harpreet: [01:06:02] Keith, thank you so much. Comment coming in there from Mike Nash saying great input, Keith. So thank you so much. Let's go to Mark. Is Mark still here? Uh, let's go to Serge, then, and we'll go back to Mark. Harpreet: [01:06:20] Yeah. Yeah, that was a lot to take in. Very, very deep. Anyway, what I wanted to say was, in my previous career in web development, I was very proud that I had become, over time, an end-to-end developer. I could wear all hats, I could do anything and everything you could ask me about making a website and deploying it and monitoring it. But when I first transitioned to data science, I was very naive, and I thought, oh, I'll become a completely full-stack, end-to-end data scientist. And I didn't realize how naive that was. Even if that were possible, it's not something I would want. So what I've realized is that you have to find what it is you like the most, what position you want to be in. And I realized, yeah, I can work around Docker and, you know, CircleCI and all that stuff, but I don't really care for it. To be honest, that's not the part of it that I'm interested in. When it comes to machine learning, I want to be the one building the prototype. I don't want to build the final product; someone else can take it past the finish line. I don't really care for that. So I've come to terms with not being full stack, and you can come to your own conclusion of what you want to do with that. [01:08:00] Harpreet: [01:08:01] And that also comes with understanding that the position I'm in requires certain things that maybe a different position wouldn't, because I'm on the front lines, I'm getting bad data. So I have to be a skeptic. That means you have to understand how to parse that complexity, not only from the stakeholders that are asking you for things, like Keith said. You have to have some bars, you know, and say, I won't accept this, because otherwise they're going to make your work a living hell. Even in my previous life in web development, I used to deal with that when a customer would come and say, you know, I want this and this other thing, and then they would change the requirements. You have to be able to set a bar and say, no, enough is enough, I'm not going to accept this. My standards are higher. I won't make your crappy website or your crappy model. I will do something better. And sometimes doing something better means doing something simpler; sometimes it means doing something more complex. So you have to be able to speak several languages there. You have to simplify the story for the non-technical stakeholders, so you have to be able to speak in those terms of simple data storytelling. Harpreet: [01:09:25] And then, on a technical level, you have to understand that every metric is an aggregate. Every metric is not the whole story. So you have to be able to look at that metric, disentangle it, look at the distribution, and see if in reality it is what it is, if there's something deeper to it, something dark to it that you have to deal with, or perhaps something you can leverage.
You found a hidden gem, something [01:10:00] you could tell the business: hey, I found this, maybe we could do this with that. Well, that's the position I'm in. I get to do all those things. If you're further downstream from me and you're a builder, you don't get to do that stuff. If you're further upstream from me and you're a data engineer, you have other challenges. So it's all about finding those positions and realizing where you best fit. Sometimes you realize you just love exploring data, so what you want to do is be a data analyst, and that's another wonderful position, right? But it all has to do with what you're more comfortable with, not only what you want to do day to day, but what you want to challenge yourself with. Harpreet: [01:10:56] And you can learn other things. I'm not saying that just because I've made the determination right now that I want to be in the position I'm in, it means that later on, as Keith said, I won't want to take on something else. He said he wanted to maybe learn computer vision; maybe a few years from now he'll be a computer vision engineer, right? We all have that ability. And I think to make your job, well, to make yourself futureproof in this very fast-moving world, the most important thing is to actually keep learning, keep building, and don't pigeonhole yourself to a technology. To me, there's nothing worse than someone saying they're really doing data science or data analysis and then saying, no, I'm an R developer, a Python developer, I'm a TensorFlow guy. That's the worst, you know, or I only do AWS, or I only do whatever. That's bullshit, [01:12:00] because it hasn't reached a point in which one technology has cornered the market and that's what you have to learn, right? For every single one of these different areas, the technology stack isn't set; it's still very fluid. So don't pigeonhole yourself to anything. Harpreet: [01:12:26] Keith and Serge, thank you very much. Jonathan had to bounce, but I really liked this comment that he put here in the chat, so I'm going to go ahead and read it out. Pretty controversial these days, but I still say being a generalist is the way to go. I do not mean just having surface-level knowledge of everything, but I do mean not having overly strong opinions on how exactly to do things or which tool will work for what. I'm a problem solver first, then an ML engineer, and if the problem requires, I can become a [insert subdomain] expert, or better yet recommend bringing on the expert in that thing to do the work on that narrower implementation. Yeah, Jonathan is on point, 100%. Agree with that. Russell says, in response to that: I am very much a generalist in many things, but I still encounter resistance from some that think this is a suboptimal position. I champion an agnostic approach to everything, including tech and processes. Hashtag infinite learner. I resonate with that infinite learner tag. I think in data science it's such a huge thing, right? Like, I, for one, don't think I'm any good at analytics. I'm just not good at analytics. First of all, it just doesn't interest me.
Harpreet: [01:13:49] Like, I'm not interested in analytics. I'm interested in, as Serge is talking about, building the prototypes, doing the science. I love modeling. I've learned more and more that I like deep learning better than I like [01:14:00] classical machine learning, and I like playing around with tools and experimenting more than I like delivering, quote unquote, on business value. I like playing with the technology, I like learning about the technology, I love talking to other people about the technology, and I love helping people learn the technology. I guess that's why I went to developer relations. I liked all of that much more than being the quote unquote business value data scientist guy. And I know that goes against everything everyone's saying here, but this is why I'm now a developer advocate and no longer a data scientist. I found what works for me and what's kind of authentic to me. But in general, I feel like in this field you need to know how the phrase goes: you need to know something about everything and then everything about something. I think that's kind of how you futureproof yourself. Mark is back. Let's go to Mark. Makiko left before we made the announcement; hopefully she comes back. You're supposed to tell me when you're leaving, Makiko. Mark, go for it. Harpreet: [01:14:57] I feel like I missed out on a really good conversation. I went to go walk my dog real fast, but to answer the original question, and I hope it's still on that question of how do you kind of futureproof your career: so I'm almost three years into being a data scientist now, which is wild to me, and I still feel like I have so much to learn. But I'm at that stage now where, oh, the things I'm learning are changing. And what I'm doing now is definitely way different from what I was expecting when I first even had the inkling to become a data scientist five years ago. And the way that I've kind of leaned on this is a couple of things. What helped a lot is, again, Vin's data strategy course. I highlight it probably every time I show up here, but it's really that good. It's thinking about what creates a competitive advantage for a company with data. And that's where I always start off: what is driving the competitive advantage from data? And then I work backwards, thinking, okay, what are the use cases where data can provide a competitive [01:16:00] advantage, and then within those use cases, what are the roles or the processes necessary to make that happen? And does that include data scientists? Does that include the engineer? For me, a big reason why I'm shifting so much to data engineering is that I see the challenge of ML models being really poor quality data. I see the challenge for startups being having this data in general, and having an inventory of the data they have. It's a big reason why I started my newsletter looking at data infrastructure, because I'm trying to learn that aspect. But for your focus on data science, I'll shift back to that. That's my initial process: how do I start understanding what's happening in the market, understanding how a business can compete with that, and then how data provides that value. Harpreet: [01:16:47] Because then your conversation shifts from I do data science to I drive business value with data. These are challenges I'm seeing. These are great opportunities I'm seeing.
And I think that leads to a different conversation. It also leads you to jobs that are more likely to set you up for success, because it sucks going to a job where you're not doing data science, or the data science is just not data science. It's not a fun time, as our rage translator said earlier. And so that's what I'm essentially looking at. Now, going a little bit further, the specific things I'm looking at now, and I think people have said this in the chat, are not to focus on the tools. Python, SQL, all that stuff's great, and I think you're past that point, because we've told you for a while: learn the basics. I think you're asking, what do I do beyond that, right? And I think there's a completely different layer of basics that I'm starting to become aware of now. There are foundations that have been around for decades that I just keep coming back to over and over again. And many times, the big hot one that everyone is clamoring back to is data modeling. Harpreet: [01:17:57] The other thing is just, what are the different patterns that [01:18:00] architecture and software set up to solve problems, and then fitting the tools on top of that. I'm still new to this; there's a lot of reading I have to do and a lot of talking to experts to really learn this. But that's the direction I'm seeing. The way to futureproof is: okay, what are the foundations? Then, from the foundations, what are the patterns that people have implemented over and over again for success? And then, from there, fitting the tools on top of that. And over time, the tools actually enable certain patterns to work better than before. I think a great example is the ETL versus ELT debate, and I'm about to jump into some hot water, so I might get burned for this, but bear with me. The reason ELT popped up all of a sudden is because cloud computing made storage and transformation in the data warehouse so cheap. So there was a technology and market shift that enabled that pattern to rise up. There may be a whole new technology shift in like five years that makes this obsolete, like, why are you even doing that? Right? And so that's why I focus on the market: what's the competitive advantage that you can provide with data; from there, what are the roles and things you can do; and then what enables those different aspects, through architecture, and from architecture, tools. Speaker3: [01:19:23] I love that. I just want to jump in and say thank you so much. That's fantastic, because that gets you into first principles, right? I mean, hopefully folks aren't just applying for stuff with a kind of shotgun approach. But, you know, if you need a job or if you're just trying to get your foot in the door, it's certainly understandable that people would apply for things, kind of cast their net widely. And yeah, the danger is that you get into a place that doesn't really know why they're doing it to begin with, and then you end up being a casualty if things kind of flame out. So, [01:20:00] yeah, really appreciate that, Mark. And then the other thing that I was just going to toss out is, sometimes I get a little afraid. Sometimes I suspect that there's not as much need for data scientists as people think there is.
I mean, maybe it really is more data analysts, and I'm okay with that, by the way, and I've done a lot of that work in the past. But yeah, I like to believe, right? You hear the statistics: oh, so many data scientists are going to be needed, and this and that and the other. And then sometimes I wonder, is that true? I mean, obviously there are many stripes of data scientists. So if you're a hardcore modeler, and then other tools come along that make what you do most of the time, if not obsolete, then more easily automated, yeah, that could be a challenge for folks wanting to get into the field. So if anyone wants to comment on that, that'd be great. Harpreet: [01:21:08] Can I comment on that real quick? Actually, yeah. So I think Harpreet got to this, where he's like, look, I don't want to be this value-driving data scientist, I want to do DevRel, I think that's the right choice for me. I found that I do not enjoy building models. It is mind-numbing to me to come up with, like, this feature tuning and stuff like that. I would hate it, and I realized that once I got into the job. And so for me, what I found is I actually love building data pipelines. I have a day off from work and I'm building data pipelines because I love it so much, even though other people think it may be dry. I'm having the time of my life doing this today, and so that's why I'm shifting to that. And so I would worry less about what's the need, because I believe there's always going to be a need around data for a long, long time. Not forever, maybe, but for a long time. And I would worry [01:22:00] less about the title itself and focus on, essentially, where are you passionate, what brings you joy. Because the main thing is consistency, because this job does require a lot of self-study and practice to stay up to date, and if you're just not into it, then it's not going to be a fun time. So if you're doing a different job because you're like, I'm not really into this, but I think it's going to be more secure, I think you're shooting yourself in the foot with the security, because you're just not going to be into it. You're not going to put in the work and the consistency for it, as compared to, all right, be a data scientist because you really enjoy it, and just focus on being more competitive within that market and driving that value. That's what I would argue for: instead of trying to worry about the trends, focus on yourself. Harpreet: [01:22:45] Let's go to Kosta, and we'll see if we can get to these questions from LinkedIn. There are such good questions, I might have to just write about them. But if you guys here in the chat, if you go to my LinkedIn page real quick and just look at the streaming video, if there are questions in there that you want commented on, let me know right here in this Zoom chat. That way I know to address them. If not, then we'll start to wrap things up after Kosta, who's coming in here next. Kosta, go for it. Harpreet: [01:23:13] So I'm glad we got to that part of it, between Mark and Gina: first principles, foundations, right? How do you futureproof yourself in a tech career? Let's broaden our perspective here just a little bit, right? It's not just about modeling. It's exactly what we've been talking about this whole time: we're not a man with a hammer. We can't be, and if we are,
well, we'll have a very short lifespan, because these hammers keep changing shape so very fast, right? So really, it's about how do you know, like you were saying before, how do you know your scope, right? What's your best fit? Do you know that you're keen on making the proof of concept and are interested in taking it to the first product, not to [01:24:00] version 20 or 30? Like me, personally: I figured out pretty early on that making version 20 of something, I'm not that keen on it. Making version one of something? Hell yeah. Making version zero, I'm probably not the best person to do that, right? But making version one, version two, version three, that's where I'm really good, where you see that transformation from proof of concept into a stable product. But then that changes over time. So there's two things. A, know where your mentality best fits. Where you want to narrow down is more about where your passion and your mentality go. Harpreet: [01:24:39] Like for me, I know where I fit. From a compliance standpoint, I'd be the worst person to work on version 35 or 50, where it's all just operations and I've got to tick boxes and follow all that stuff. But to a point, now I'm transitioning: the next role I'm heading to in a couple of weeks isn't version one, two, or three. It's version eight, nine, and ten, the next evolution of the product. But that's where my mentality now matches what I need to go to, not my mentality four or five years ago. Five-years-ago me would be terrible at that. But I see the value in doing that for my career now, and now I'd be terrible at doing version minus one or version zero or version one, because there's a certain value this adds to my career, and I can see what skills it's adding up to. I can see part of where I want to be in maybe four or five years' time. Everyone asks you, where do you see yourself in five years' time? There's a reason that's a question. It seems like a stupid question, but personally, I think your number one job is to figure out where you're going. If you don't know where you're going, you're just going to be running to the ball. What does running to the ball look like? It means, oh yeah, there's this new tool, Harpreet: [01:25:58] I've got to learn this new tool. There's this new skill, [01:26:00] I've got to learn this new skill. We return to the man-with-the-hammer syndrome all over again, right? So how do you balance out the man-with-the-hammer syndrome and the jack-of-all-trades syndrome? Two completely different sides of it. I think we're turning to the century of the jack, where being a jack of at least many trades is powerful, because it allows us flexibility. The best advice I ever got from one of my uncles was, hey, pick the degree that keeps you the broadest fit. That's why I decided to go for mechatronics as opposed to going for aeronautics. Though I love aircraft, I realized that I can find aeronautics roles within robotics, and my view was also that robotics has all these other applications as well, maybe medtech, maybe all this other stuff. And now I'm kind of heading back to Earth observation within machine learning, and I found that specialization, that niche of interest to me. But it's also about keeping an open mind, right? We come into it with all of these misconceptions, well, not necessarily misconceptions, but these preexisting assumptions, right,
that, hey, if I want to be a data scientist, I've got to do this particular thing, or if I want to be in machine learning engineering, I've got to do this thing. Harpreet: [01:27:14] It's really about understanding that we are a drop in the ocean, right? Nothing we do gets done alone. Quite frankly, nothing of real, world-changing value gets done alone. Very, very little. It's extremely rare, particularly in the 2000s. Yeah, okay, fine, if you're back in the 1700s or 1800s and you're inventing something that no one has seen before, the Bessemer process for steel, or light bulbs, real foundational shit like that, yeah, fine, okay, you could probably change the world alone. But you tell me, was that Edison alone, or was it his entire lab of, like, six or eight people? Right? And how much of that was relying on Volta's work before that? How much was relying on other people's work before? We're always [01:28:00] standing on the shoulders of giants. So for me, people ask me, hey, dude, you're a robotics guy, you want to help robots see. Well, that's my tagline on LinkedIn, isn't it? I'm building a world where machines can see. If you've watched my tagline change over the last few years, it went from I teach robots to see, to I teach machines to see, to I'm building a world where machines can see, right? That's my progression, and my understanding of where I fit into that has just become, I'm realizing more and more how small of a drop I am in the ocean. I don't need to be the guy building the model and putting it on a robot. Harpreet: [01:28:32] If what I'm doing is contributing to finding value by building machine learning pipelines, by building good software that can lead to that change of having machines that can see and impact the world, right? So it's really about understanding and being flexible: hey, actually, do I want to be a data scientist, or is there somewhere else in that ecosystem where I can find a set of skills to double down on and build on the natural skills that I have? Like, I'm not a project manager or a product manager, but those are things that excite me, things that I like doing and I like spending some amount of my time on. But I've realized that if I want to be really good as a product manager in like five to eight years, if I want to make that career shift, I need to spend time understanding the fundamentals of the technologies around me, right? So for the next few years I'm knuckling down, putting in time and attention as an engineer. We keep chasing success measures. There's an Indian movie called Three Idiots, one of the best movies I've ever watched, honestly, comedic genius. But the biggest takeaway from it is: chase excellence, right? Chasing excellence is a totally different game to chasing success. And for me, you can't compromise on that excellence. And this is part and parcel of the kind of bootcamp economy that we've got now; you've got to understand [01:30:00] something about boot camps. Harpreet: [01:30:01] They're going to get you to that baseline skill really quickly, right? But that second level of foundational understanding, of what a system is, systems engineering, systems thinking, testing, robust software development, this robust engineering thinking and principles, these things come from time and attention, right? Like, I'm pretty sure I've said this before, and I'm stealing the story from a guy that I've worked with before.
But basically, when you're used to building robots, and you've spent three months building that robot, and then you take it out to the field and suddenly the magic smoke is escaping from the robot and the robot is no longer working, because we all know robots need the magic smoke inside the robot, not outside, we know it's going to take us time to go and rebuild that. So we put bucketloads of effort into making sure that robot is robust as hell so it doesn't let the magic smoke escape. Right? Whereas within software there's a different thinking: oh, I built this thing, it goes up fine, let me just take it down, make a few changes, and roll it out again. That cycle is a lot quicker. But the first-principles thinking about robustness isn't as strong coming from software as it is from, like, the electronics or mechanical world. Harpreet: [01:31:21] Right? Whereas electronics is way more expensive to build than simple mechanics, so there you're thinking more about the cost impacts. So there's this first-principles thinking that we can take from all forms of engineering, and this kind of relates to that post I made earlier about F1 and what we can learn from F1. It's engineering. At the end of the day, it's technology. There are first principles around it that we can only learn with time and attention. I'm lucky enough that I got the opportunity to study engineering formally, and that's my biggest takeaway. I'm never doing truss analysis, I'm not doing kinematics, I'm not doing electronics and circuit board design. But the fundamentals that I learned from that, even though I might [01:32:00] not be the best data scientist, the best software engineer or ML engineer out there, become really strong ground for me to stand on, for me to be able to say, actually, I can see the value in doing something this way because it's more robust, or because it's less costly, and things like that, right? So that's where I rely more on my first principles. And that's the thing that you've got to understand going into things like boot camps: hey, I'm going to have to spend time and attention learning some of these first principles that I didn't get to learn because I took a different path, and that's perfectly fine.
It hurts to think about yourself that way and critically and actually say, actually, your dreams were not what you're made for. Right. But yeah. Spend some time thinking about where you fit. It's worth it. But honestly, with it. Harpreet: [01:33:58] I couldn't agree with that sentiment [01:34:00] more. That absolutely beautiful. Beautifully put. If I could sum it up in like, one sentence, I'd say that and this is my sense of it, and it's how I feel about it. Like future proofing your career means just being flexible, not being so fixed and rigid and being willing to change. You might be climbing up a mountain. You're like two thirds up the way of a mountain and you might see the the peak in view. But then you realize, you know what? There's another path that's better for me. I got to walk down a little bit and go back around, but it could be a better path. So being willing to kind of do that, yeah, 100% agree with that because of like I've done that throughout my career. I started off when I'd be an actuary, even though everything is kind of in the same ish space, it's not really the same, you know, starting off being an actuary, going into bio stats like this is a completely different world from from tech and everything I'm doing now and then getting into tech and then doing data science and then saying, you know what, I like doing this other thing better and then start doing this other thing. And I'm like, Hmm, what else is out there? What else can I what else can I potentially do? You know, it might be being flexible enough to to, you know, after 300 episodes of a particular podcast, want to try something different with some friends because you want to take your career in a different direction and do something different. So that big, big announcement that I was going to make before Makiko left, does that mean McKagan and Mark are going to be starting a new podcast of I'm excited for this. Harpreet: [01:35:22] It's going to be interesting. And and here's the reason I thought about doing this particular podcast. And this podcast is going to be just essentially Siskel Ebert and Roeper of developer tools. So we're going to be taking our combined experience, looking at developer tools, kind of understanding the segmentation, where it fits in the market, what the developer experience is like, so on, so forth. Mark In Mexico, I do have I've got a bunch of questions laid out that I'm going to send to you guys to kind of see what the flow of the show would go. And I'm looking forward to getting your input. But the reason I started doing that podcast was, you [01:36:00] know, this might be not necessarily data science career, but just personal career. And the thing that I'm interested in was I just want to do some different like I wanted to like kind of rewind a little bit wider. I started this podcast, I started this podcast because I was all about the aspiring data scientists thing. I was going to create a course that teaches people how to break into data science and and all that, and I figured this might be a good vehicle for me to grow my brand, spread awareness and get people to learn about me. And then I started doing that thing and I realized, you know what? This is not the thing that I'm actually into. I'm not really passionate about it. I'm not really excited about it. And it took me a while to get into Denver and start seeing what I really am interested in. 
Harpreet: [01:36:37] What I'm interested in is playing with products, evaluating products, and I think I kind of understand what makes a good product work and what doesn't. And I started thinking, okay, if I wanted to attract opportunities for myself where I get to do this type of stuff all day, how could I do that? And I figured: start a podcast with two of my good friends who are also into the same thing, and who also want to get into this new space of maybe doing investments and being advisors and on boards of directors for startups and stuff like that. If these are the type of opportunities I want to attract my way, how do I do that? Well, this is one way to do it. Going off on a tangent there. I will take one last question coming in from LinkedIn. I apologize to everyone whose questions did not get got, but this is good fodder for maybe future content. So I'll read through this and see if I can stay under the 1,300 characters, or what is it, 3,000 characters now, to answer it. But the question we will tackle is the one Vin requested, and it's about data governance. Where can I find it? It's from Mike Nash: due to the rise in governance, do you think explainability is going to become a huge challenge or asset for AI and data science in the future? How do you think that plays out? Let's go to Vin. Speaker4: [01:37:55] Yeah, I think, just really quick, that's what's going to push causal. When you begin [01:38:00] to start talking about reliability and explainability, there's no other direction you can really go, because the deeper the model gets, the harder explainability becomes. You almost have to fudge the meaning of explainability to make very deep, very complex models explainable enough to be reliable for users. Like, I can explain a deep learning model to another data scientist, but trying to explain it to a user in a way that they would accept it and trust it, from a governance standpoint, that's kind of a nonstarter. And we're definitely going to be using deep learning, it's not like that's a dead end, it's great, and reinforcement learning is an amazing tool as well. I think McCain's going in the right direction with what he's starting to talk about, some of the new frameworks and concepts that he's pulled in from causal and from more traditional sort of economics backgrounds. Some would argue he's taking credit for somebody else's ideas, but I think he's being fairly original, melding things together. But that's the long-term implication: we have to go towards building causal graphs, structural causal models, building something that's visually understandable, where the model doesn't just serve inference, it serves the graph that the inference is based on. When you do that, I think you can meet governance requirements, because if you don't at least go towards causal, then the reliability, for especially high-end business processes, falls apart. Speaker4: [01:39:38] As soon as C-level executives realize how unreliable the models they're handling are, and I keep coming back to inventory and demand planning right now because that's the one that's just killing retail, and if you look at what's happening to Intel, it's very similar, you look at Google's internal productivity issues, it's because they have no visibility into how most [01:40:00] of their teams operate.
And so all of this data that even the biggest companies on Earth, the smartest companies on Earth, are gathering, all these models they're deploying, they're beginning to realize that there are significant limitations when you get outside the bounds of that original data set. And so Amazon's already gone there, Microsoft's already gone there, IBM's done some really intelligent explorations into it. You've got Netflix, Lyft, DoorDash, I think, has done it too. You're just going to start watching the companies that are actively and publicly talking about causal and working on migrating from what they were doing in the past to causal models. Just watch their stock performance, just watch their quarterly results, just watch them compared to everyone else who's still in this less reliable, less explainable paradigm. And that's taking it out about five years. Speaker4: [01:40:57] That's the implication. The long road of explainability and reliability colliding together is that there's no other way, you have to go towards causal. We'll never have complete causal graphs, so I'm not trying to get too futuristic with, we have these massive, huge, complex causal graphs. I don't think the business world has come to terms yet with what understanding the top three causal features for any particular customer behavior means, how much money you can make just with that. Even if it's a 50-feature-deep causal graph, if you have the top three, what you can do, what you understand, what your experts will be able to do with that type of knowledge, what you can automate, it's going to become increasingly clear how much of a competitive advantage that is. And so it's not just going to be regulatory, it's not just going to be users pushing it, it's going to be investors starting to ask some questions. And this kind of goes towards [01:42:00] one of the posts, which is why I wanted to answer this a little bit, but I'm going to be talking about some new investor metrics that are going to come out very soon that will measure whether or not what a company is doing with data is BS or real, and you're talking about that overlap of explainability and reliability.
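To make the idea of a model that serves the graph a bit more concrete, here is a minimal structural causal model sketch in Python. Every variable name and coefficient (season, price, demand) is hypothetical and chosen purely for illustration; the point is only that the same small graph answers both the observational question and the do-intervention question, which is what makes the answer explainable.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical structural causal model (names and coefficients are made up):
#   season -> price, season -> demand   (season is a confounder)
#   price  -> demand                    (true structural effect: -1.5)
season = rng.normal(size=n)
price = 10 + 2.0 * season + rng.normal(size=n)
demand = 50 - 1.5 * price + 4.0 * season + rng.normal(size=n)

# Naive observational slope of demand on price (confounded by season).
naive_slope = np.polyfit(price, demand, 1)[0]

# do(price = p): intervene on the graph by cutting the season -> price edge.
def mean_demand_do(p):
    s = rng.normal(size=n)
    return (50 - 1.5 * p + 4.0 * s + rng.normal(size=n)).mean()

causal_slope = (mean_demand_do(12.0) - mean_demand_do(10.0)) / 2.0

print(f"observational slope: {naive_slope:+.2f}")    # roughly +0.1, wrong sign
print(f"interventional slope: {causal_slope:+.2f}")  # roughly -1.5, the structural effect
```

Run as written, the observational slope even has the wrong sign, because the confounder pushes price and demand in the same direction; the interventional estimate recovers the structural coefficient, and you can point at the specific edge in the graph that explains why the two answers differ.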
Harpreet: [01:42:23] Vin, thank you very much. Kosta, let's hear from you. Harpreet: [01:42:27] We've got to be really thoughtful about the policy we make. Fundamentally, we cannot just make policy without thinking about the impact and the limitations it puts on technologies, right? The fact is, we're going to make policy based on a certain set of thinking and paradigms, and then there are going to be technologies that we eventually may find unsuitable, because we can't get to that explainability point, because the data is such an integral part of the model explainability itself, right? The main situation where I see this quickly becoming a point of contention is with the EU GDPR. Anyone who knows the EU GDPR knows that, basically, I can give you my data, but when I choose to take my data back, you have to delete that data, though not necessarily the artifacts of that data. You have to delete all historical existence of that data, backups and everything; you can't keep it at all. Which is great from a privacy standpoint. But let's turn around and say that there is a model that can help with super early identification of skin cancer or something like that, some really difficult-to-diagnose medical condition. Now, let's say that I've provided my data to help build that model, and I've then gone back and revoked my data after the model's been created. Now the model exists. It's an artifact [01:44:00] of my data. As long as they can't recreate my data from the model, it's okay to continue to retain that artifact. Those are essentially the ground rules of GDPR in a nutshell. Harpreet: [01:44:09] But now they've gone ahead and deleted all of my data, and let's say a number of people of Indian background have all pulled their data out of that model, for political reasons or whatever it might be, because they don't line up with the people that are making it, right? It's very possible for something like that to happen. Now the explainability of that model, where previously it was really possible, turns to dust, because you don't have the data anymore to tie it back to existing biases within the data. I can't even go back to my backup data and look at, hey, what were the biases in my data that I didn't detect then, that never turned up on a report, that now, with my current thinking, I can detect. That's where something like the EU GDPR, and I'm not saying the policy is bad, what I'm saying is that the policy has impacts on applying medical data to models, right? Things where that explainability is super, super important. So we have to really understand the impact, and then understand the limitations within which we play. We cannot live in a world where we can do whatever we want with whatever data we want, but then also live by guidelines of privacy. There are going to be limitations. Any new technology comes with limitations, predominantly built on fear, or on risk, on human capacity for risk. Maybe we need to accept those limitations, and maybe we can't get around the EU GDPR explainability aspect of it. Harpreet: [01:45:43] And maybe that means that some medical models become way too risky to use. Fair play. That's the field we've got to play on, right? That's the hand we've been dealt. But we've got to keep these things in mind when we're building policy, and keep these things in mind [01:46:00] when we're building tech: capability is only one aspect of building good tech. Understanding the legal and economic framework around it, the human aspect, the social aspect of what we're building in technology, is also super important. In any business, I mean, I've never been to business school, but I hear that's one of the fundamental things you consider when you're looking at a business: do you know the lay of the land, right? So there are a lot of those things that we need to consider and accept. But yeah, let's have that conversation, and have people, such as a lot of the people in this conversation, involved in those discussions on how we shape policy in a way that allows us to still extract value from the technology and tools we're creating, while not compromising on the values and beliefs that society holds, right? It's an important conversation. And if we leave it all to the lawyers, well, we don't have enough voices in that room. Having the right voices in the room is super important. So we need to engage with the wider community, not just sit up in an ivory tower and say, hey, we're building models, trust our models, they're right. Harpreet: [01:47:10] Kosta, thank you so much. There's a quote here that I'm going to read out:
I spend a big chunk of my time writing detailed explanations about what each model does, due to SEC requirements, and data governance is only going to get stricter and firmer, and rightly so. I always thought that's a pain, but that should be the right way of doing data science, with explainability. Yes, model cards are key. Thank you very much for that. I've got a question, though, for Kosta, the promised new computer vision question. And I realized, as I was thinking about the question, that I have the answer in my mind already. Any time I see a particular pre-trained model and it says that the model is trained on ImageNet, I should probably just assume that the model is useful only for classification. And the reason I'm thinking [01:48:00] the answer is yes is because when you say this architecture's pre-trained model is trained on ImageNet, implicit in that is the fact that it is trained with ImageNet's labels, which are classification labels, not bounding boxes or segmentation masks or anything like that. So I think I answered my own question. Is that right? Harpreet: [01:48:22] To a point. Depends on which layer of that network you're planning on using, right? If you're using, like, the early layers, those are just figuring out fundamental shapes and things like that, so it could be useful for more than just classification as part of your backbone. It depends what head you're putting onto it. So, yeah, it can be useful. It's another classic case of it depends. Harpreet: [01:48:50] So I could, theoretically, and I'd love to hear from you on this too, I could have a pre-trained model with a backbone that's trained on ImageNet, but then I change the head of the model, and now it's useful for a segmentation task. Does that make sense, or is that another it depends? Harpreet: [01:49:15] It could be, depending on what you're trying to segment and the labels that you're trying to get. That backbone is fundamentally an encoder, right? You're essentially extracting the fundamental features, and as long as you're kind of aligned to the labels, the deeper levels of semantic knowledge would still kind of apply. But it's super hard to say, hey, layer 12 is where I want to stop, layer 25 is where I want to stop. That's where the it depends comes from. So in large part, yeah, if you're using the majority of the model, it's probably really only useful for that kind of application; if I'm using the majority of it, then it's probably only really useful for classification. But if I'm using just the early few [01:50:00] layers, then the question becomes, where do I stop? What's the point where it no longer becomes useful? There's a bit of experimentation to figure out what actually works best, and this is why we've got people experimenting with many different backbones. But at some point, ImageNet's just got so much scale that the first half of the encoder becomes inherently useful for extracting encoded information from images.
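As a rough sketch of the swap-the-head idea discussed just above, here is one way it often looks in PyTorch: keep an ImageNet-pretrained ResNet only up to its convolutional feature maps and bolt on a new, untrained segmentation head. The choice of ResNet-50, the five classes, and the single 1x1-conv head are all placeholder assumptions for illustration, not a recommendation for any particular dataset.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class SegFromImageNetBackbone(nn.Module):
    """ImageNet-pretrained ResNet encoder reused under a new, untrained segmentation head."""

    def __init__(self, num_classes: int = 5):  # num_classes is a placeholder
        super().__init__()
        backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
        # Keep the conv stem and residual stages; drop avgpool and the ImageNet
        # classification head, since only its labels were classification labels.
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])  # (B, 2048, H/32, W/32)
        # New task-specific head: 1x1 conv to per-class logits, then upsample.
        self.head = nn.Conv2d(2048, num_classes, kernel_size=1)

    def forward(self, x):
        feats = self.encoder(x)    # reused pretrained features
        logits = self.head(feats)  # trained from scratch on your segmentation labels
        return F.interpolate(logits, size=x.shape[-2:], mode="bilinear", align_corners=False)

model = SegFromImageNetBackbone(num_classes=5)
out = model(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 5, 224, 224])
```

In practice you would fine-tune the head, and often the later encoder blocks, on your own segmentation labels, or cut the encoder off at an earlier stage, which is exactly the "where do I stop" judgment call described above.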
Harpreet: [01:50:28] And so, another question, and thank you so much for that. The question is whether you can recommend a particular architecture that might be helpful in this type of case. Serge, Kosta, if you guys have any insights, let me know. So imagine I've got data that's taken from satellite imagery, and I've got pictures where, you know, you might have a forest, you might have a clearing, you might have a river, you might have a mountain range, and everything's labeled, but they're all aerial images. Which architecture would you lean towards for something like that? Anything pre-trained, or not? Harpreet: [01:51:15] I haven't found something pre-trained for that. I think you just have to have good labels, because a lot of the pre-trained models that would do that kind of task are related to facial features, bodies, cars, other things, not so much satellite images. Maybe you'll find something on, of course it's always good to look online, and, what is it, I'm blanking right now, Hugging Face? Yeah. You might find something on Hugging Face for satellite images. I'm [01:52:00] not actually the person in my team that works on that kind of problem. I usually just take the features they've already created from satellite images, but I could ask, if that interests you, whether they use some kind of pre-trained architecture for that. Harpreet: [01:52:18] Yeah, I would have to use a pre-trained model from what we have at DSE. I'm just coming up with some fun project ideas to do with our training library, and for one of them, I did some research and came across this dataset, which was the Planet dataset, Understanding the Amazon from Space. I don't know if I have the right screen up; I had a link to it somewhere. Harpreet: [01:52:44] So you want a land cover classification model. You might find an already pre-trained model that does everything. Okay. Harpreet: [01:52:53] But if I was to use, let's say, some other pre-trained type of architecture, say a vision transformer, or a ResNet, or an EfficientNet, or something like that, I guess the question is, when considering the type of image that you have, how does that influence the architecture choice that you use? Harpreet: [01:53:23] I mean, part of it comes down to how much people have actually addressed that particular problem before with the architecture you're talking about, right? Like, if it's vision transformers, I don't know how much people have done with vision transformers for satellite imagery. I've been a bit away from aerial imagery for a couple of years; I'm actually headed back in that direction, as you'll also find out in a couple of weeks, back towards the aerial imagery, satellite imagery kind of space. But, for example, a few years ago, when I was doing my master's thesis, I was looking at CycleGAN and a couple of those things. So some of the, what was it, essentially U-Net-based backbones looking at semantic segmentation for land cover, that was done a little bit. And there was a little bit of work on converting maps to realistic, real-world images, and images back to maps, kind of thing. Yeah, to be honest, even in that situation, I found that the pre-trained model that they provided, because what they were looking for was map images on the other side, like Google Maps kind of thing,
was very different to what I was looking at, which was spectral band substitution of different narrow-band frequencies and things like that, a different kind of imaging. Harpreet: [01:54:53] So I ended up essentially training it myself. I built a bit of a data set, and it was a bit of a science experiment because it's a master's thesis, so it was a limited data set. But you'd go: A, has someone collated the data set that you really need? B, are there reliable labels that actually matter? Right? Like, if you're looking at vegetation coverage, there's a vegetation index, I think it's NDVI or something like that, right, that's one way of looking at ground cover, versus built-up land, which is a totally different data set, a totally different set of labels. Will a model trained on identifying ground cover for a built-up area be useful, and have learned the same features, as something looking at different types of grassland and marsh and mud and trees and stuff like that? Maybe not, because it's learning features that are manmade, and those fundamentally look very different from features that occur naturally in forest areas and the like. So [01:56:00] a model that I'm training for the bush might be totally different from a model that I'm training for urban areas, right? The same way that facial features and human features look totally different from top down: literally, a model that's trained on, say, pedestrian detection for an autonomous vehicle. Harpreet: [01:56:21] Not sure what happened to Kosta, but he got muted there. Harpreet: [01:56:24] Sorry, I might have dropped out for a moment. Yeah. Like, a model that's seeing from front-on, from an autonomous vehicle perspective, is probably not going to be useful for detection from a top-down view, because people look like a head and shoulders from up top, and the visual representation is just totally different. So, is there a backbone architecture for it? Maybe. Has someone addressed a similar problem? Yeah, then I'd probably use it. Otherwise you're kind of reliant on retraining it more or less from scratch, because people haven't been addressing that particular problem. You could use the same approach, but yeah, it comes down to data collection and the right labels and the right visual representation. Harpreet: [01:57:06] Thank you, guys, appreciate that. Serge, thank you for giving me a link to this land cover classification of satellite imagery. This is great, appreciate this. Harpreet: [01:57:16] Also check out USGS, the US Geological Survey. They have bucketloads of satellite data. A lot of it is free, some of it is paid, and it's very accessible. Harpreet: [01:57:29] Nice. Perfect. Thank you very much. Awesome. I'll try to keep up, man. I feel that, because, you know, like Keith was talking about getting into computer vision, that's kind of what I've forced myself to get into due to this new role. I love it. I feel like I'm studying a lot, like I spend most of my days learning, but I wouldn't have it any other way. It is fun. That's it, man. It's been a while since we've gone two hours. It's been a while since we had almost 40 people here at [01:58:00] one point. So I appreciate you all sticking with me till the end. Sorry to everybody whose questions I wasn't able to get to. Huge shout out to everybody that stuck around the entire time: Gina, Vin, Emmanuel, Kosta, and Russell.
Appreciate all y'all, my friends. Remember, you've got one life on this planet, so why not try to do something big? Cheers, everyone.