HH95 MIX.mp3 Harpreet: [00:00:09] What's up, everybody? Welcome, welcome to the Artists of Data Science Happy Hour number 95 — five more until we hit number 100. I hope you all can join me. If you have not already, go to the bit.ly registration link and get all of the sessions on your calendar. So five weeks from now — that's going to be one, two, three, four, five — that's going to be the first Friday in October. That's going to be number 100, the first Friday in October. So I'm excited about that, and I can't believe there's been 100 of these already. Yeah, two years — my two years of doing this, it's been awesome. Again, no more podcast — I mean, there is a podcast, but no new episodes have been released because I haven't recorded any, so I'll probably do a backlog of them and have them coming out early next year, January next year. Shout out to everybody in the building — what's going on? Ken Jee in the building, Kristen Songer, Peter, Russell, Jacob — good to have all y'all here. Everybody that's watching there on LinkedIn, watching the livestream: if you want to join, send me a message and I'll send you a link. If you got questions on LinkedIn, please do let me know — I'm happy to take all of your questions. So let's go ahead and kick the stream off with just a question. Let's talk about what content is missing from LinkedIn. I'm curious what you guys want to see more of on LinkedIn. Let's go to Ken Jee, and then we'll go to Christian, then Shankar — if you want to get in on that, let me know. And if you have questions — whether you're here in the chat, whether you're watching on YouTube or on LinkedIn — if you have questions, let me know. Be sure to queue them up. Ken, go for it. Speaker2: [00:01:50] Very self-serving one, but I think YouTube content is missing on LinkedIn. So a lot of these different platforms, they limit the amount of exposure [00:02:00] that content from other platforms gets on their individual platforms. So if I post a YouTube video on LinkedIn, it's not going to get shared; it's not going to get as much reach as a more traditional post would, or as, like, a completely organic post. And in some sense I obviously understand why LinkedIn is doing that. But if you're creating a marketplace to share ideas, professional things, it does make sense to make cross-platform pollination more rich and more accessible. Because I would be using LinkedIn significantly more if I knew that the things I was posting on LinkedIn were still going to blow up there, not just on some of these other platforms that I use. Harpreet: [00:02:44] Yeah, I've been kind of — I mean, it could just be that my content sucks and it's not good, but I've been kind of disappointed with the reach of the posts that I've been doing. You know, I've got a group chat going on with a bunch of other creators, and everyone's getting tremendous amounts of reach. But I've noticed recently — within the last probably three or four months, maybe even since the beginning of this year, actually — that for posts of mine, it's very, very rare that I get anything more than 7,000 impressions. And I do not understand why that is. And I'm thinking maybe a reason is because I do sometimes link to other content, or, you know, I'm sharing stuff from elsewhere. And I think if you do that too much, maybe you start getting penalized by the algorithm.
But I'd like to think that I'm kind of halfway decent when it comes to writing posts — so I still got much to learn. But yeah — Christian, let's hear from you. Speaker2: [00:03:38] Yeah, just to your point, Harpreet, too: yesterday I made a post and it never even showed in my activity log. And it's still not there. It's only when I search for, like, the first few words of my first sentence that I find it. And it's got very limited impressions there too. So I don't know what's going on with that, but that was really, really interesting to me. [00:04:00] But as far as missing content — I guess, more so for me, maybe content topics, and I do see it in the data community occasionally, but there's such an emphasis on talking about tools in data. And I would like to see a little bit more on the communication side of things — maybe, like, technical communication to non-technical stakeholders. Those would be some content areas that I would really like people to cover more often. Yeah. Harpreet: [00:04:32] Yeah. You know who does a good job on that communication aspect of stuff? There's a couple of people that come to mind. One is Gilbert Eijkelenboom — I'm not sure if you follow him already. Speaker2: [00:04:42] I do not know. Harpreet: [00:04:43] Yeah, definitely check him out. He's got this book called — I think it's called People Skills for Analytical Thinkers. He's been on my podcast, been on Ken's podcast as well, and actually an episode with me on Gilbert's podcast was just released. So if you guys get a chance to check out his podcast — oh, sweet — do check that out. And then also Brent Dykes, who's like the storytelling-for-data-science guy. He's also been on my podcast. Ken, have you had Brent Dykes on your podcast? Not yet. Definitely two people that I would recommend following — they've definitely got content. Speaker2: [00:05:19] I think — what's her name — Cole Nussbaumer? Yeah, I'm having her on in a couple of weeks. She's also big on storytelling and more soft skills — tangible, like, personal development skills that intersect with the data skills. Harpreet: [00:05:35] Nice. Okay. I've been trying to get her on the podcast for like two years, and every time she's like, no. Come on, come on. Ken, would you put in a word for me? Probably. Probably. But let's hear from Shankar. Shankar, anything that you would like to see more of on LinkedIn? Then we'll go to Peter. By the way, those of you watching on LinkedIn, let me know if you got any questions. Shout out to Raphael in [00:06:00] the LinkedIn chat, saying LinkedIn reminds me of an old version of YouTube — I'd like to hear more about that. Paul Fentress, what's going on, Paul? Wondering what the theme of today's discussion is. And the theme, as always, is whatever you want it to be. That's the wonderful thing about these data science happy hours: they are a pick-your-own-adventure type of session. So whatever it is that you're interested in asking or want to talk about, Paul, do let me know. That being said, Shankar, go for it. Speaker2: [00:06:28] Yeah, I'd probably just piggyback a little bit off Ken. Obviously there's a lot of really, really good data science content on YouTube that I've been following that I don't see circulated at all in my feed. As much as I'm sort of a nascent creator, if I would even call myself that, I think my sample size is really small, so. Yeah.
I still have to figure out the ropes of, you know, what the LinkedIn SEO, the recommendation algorithm, is doing versus, you know, some other content curation sites. But yeah, I will probably just punt on the answer to that to someone who has a little bit more experience. Harpreet: [00:07:14] Yeah. What are you up to, content-creator-wise? You got like a YouTube channel, anything you want to do a shout-out for, or a podcast? Speaker2: [00:07:20] Yeah — no, no YouTube channel. Just starting to write a little bit of blog posts right now, just doing kind of a couple of articles on Towards Data Science, the popular one on Medium, but just trying to follow as many creators as I can. Still really exploring my, you know, data science creative side. So yeah — still very much nascent in that exploration. Harpreet: [00:07:44] Right on. Well, it is a rollercoaster of a journey, I can assure you that. Just kind of stick with it — that's the best advice I can give to you. Do you want to shout out your Medium name at all? Do you have, like, a special username, or just — Speaker2: [00:07:57] Yeah, my username is not special at all. It is a [00:08:00] string of characters and numbers. It is all lowercase — s-one-s-t, that's Sarai. And I've — yeah, I'm planning to write a little bit more, create a little bit more, and yeah. Harpreet: [00:08:14] Well, definitely put a link to your blog right there in the chat and I'll be sure to copy it over there onto LinkedIn. Shout out to Shasta, Mark Freeman, Alexis Press, intern, in the house — thank you all for being here. We're just kicking off the discussion, talking about the kind of content that we think is missing from LinkedIn and what we would like to see more of. We'll go to Peter next. Mark, Colvin, if you all got any input, it is always welcome. Shout out to Russell — Russell, looks like you're having some issues with connecting audio, but I think we'll have you here soon. Go for it, Peter. Speaker2: [00:08:44] Yeah, sure. Thank you so much. I've been so looking forward to joining this happy hour — I've always been looking forward to it, and now I have the pleasure to be here. So I would say what I'm really looking forward to seeing, content-wise, is regarding, like, the motivation for entry-level people trying to get into data science. I mean, coming from a non-technical background, it had been very difficult for me to enter data science, and most of the content that I see is from people who are already in the data science field and crushing it, right? So people who are looking to get into data science don't know how hard it is, so they don't have that raw motivation to get started. So I'm really looking forward to that type of content. Harpreet: [00:09:48] Yeah, I've got a couple of good names that come to mind that hit just that kind of need, for sure. Avery — Avery Smith, of [00:10:00] Data Career Jumpstart. Check out Avery Smith, yes. Then also Rashad — Rashad Neves Becker's got some awesome stuff as well, kind of directed towards that. But I've been seeing a lot of cool up-and-coming content creators lately. You know, Christian, you're one of them. I'm blanking on some names — Dylan, Dylan's one, I can't remember his last name, and a couple other people. Speaker2: [00:10:21] Nathan, come back. Yes. So awesome. Harpreet: [00:10:24] Yeah. Megan — she's also been on my podcast as well.
We talked about data engineering, stoicism, and, like, personal wellness and stuff, so definitely check out the episode with her. Yeah, let's go to Mark and then Vin. Again, we've just opened the floodgates. I know you're not supposed to talk smack about LinkedIn on a LinkedIn livestream, but the question is: what is it that you think is missing from LinkedIn? And by the way, if you got questions, let me know. Mark, go for it. Speaker2: [00:10:50] Yeah, I think something, at least in the data space — I really wish there was more content on the build-versus-buy arguments and how people are playing that. Because a lot of the arguments I see are just trashing or promoting a specific vendor or tool. And that kind of content is interesting, but without the context of why you would do that, it's not that helpful for the larger community. So, for example, I know the MLOps community does, like, pancake stacks, where they talk about their stack and how they put those pieces together and the reasoning behind that. Because there may be a group that's using Fivetran to do their data ingestion right into the warehouse, and that may make a lot of sense for them; for other people that may not make sense — for example, if what they need to connect to is just not part of the connectors, right? And so I feel like we need deeper conversations on the trade-offs they're making — why they choose a vendor, why they don't — even just at a high level, without sharing all your [00:12:00] business details and things like that. Having that nuance of: this is the case study, this is the context, and this is why we think this is better. Harpreet: [00:12:10] Mark, thank you very much. Let's go to Vin to talk about the content that he thinks is missing from LinkedIn. And then also, I'd love to kick off the discussion there with Vin about the build-versus-buy debate. And if anybody has anything to say about the build-versus-buy debate, just let me know — raise your hand and I'll be sure to call on you. Vin, go for it. Speaker3: [00:12:29] I think build versus buy — it depends on how big of a company you are. If data science, or whatever you're buying, isn't a core capability or a core competency: buy. Why build up an organization that isn't going to support core business? And I think we've forgotten that. In data science, we've completely abandoned that: every business is developing every data science capability instead of buying from really smart people who have already done it before. So the build-versus-buy debate — my thing is: just look at your company. If it doesn't support core business, why build the capability to build it? So that's where I kind of fall down, because there's so much good stuff out there that other people have figured out. And you were talking about groups that share: Uber shares theirs all the time, LinkedIn talks about building out their stack a lot, and I think Netflix has an entire series on building out their stack. More and more companies are just coming out and saying, look, this is how we built it — here's the stuff that was out there that was awesome, and here's the places where we had to customize. And I think that's the important piece: the customizations come around core business. Harpreet: [00:13:35] Having worked at a developer tools company for the last year — three different developer tools companies —
that is the single biggest obstacle our salespeople face: people are always like, oh, we'll just do this in-house. And that's a hard objection to overcome. And I think you've got to frame it as: don't you have better things to do with your time? We've got this solved. How would [00:14:00] you like — okay, let's give some advice to salespeople out there, then. How do you handle that objection? Like, if you were in a sales role — and you've got like a sales background from way back in the day, right — how would you handle that objection? Then after Vin, we'll go to Mark. Shout out to Joe Reis. Sorry, Vin — go for it. Speaker3: [00:14:23] No, the objection is kind of easy. I just ask: what's your core business? Does that support your core business? Why are you developing that? And that line of questioning right there is what gets people at the strategy level to stop and go, wait a minute. Yeah, why — wait a minute. So we're building this thing, we're creating capabilities, and it's not going to at any point support our core business. You know, and that's really the best way to do buy versus build: when they're pitching it, just ask. So you're going to build this thing — great. What do you do with those people after it's built? Are they going to support core business? Because if they are, yeah, awesome: hire them, do this thing, and then put them on the next value-producing thing. But more times than not, they'll hire an entire team, they'll build this stack that they could have bought — and it would have been cheaper — and then they have no idea what to do with the people. And so those people start doing different projects that get them off track from their core business. And when you're looking at a lot of businesses right now, they're doing layoffs — it's all that cycle where they brought in too many people and they were building stuff they didn't need to, reinventing the wheel. And as far as what content's missing on LinkedIn: I need more Ben Taylor. I call his style data science swagger. We need way more data science swagger — we need that so much. This is actually the sexiest job of the 21st century; we need to start acting like it, because we don't. And we need more — Harpreet: [00:15:50] Ben Taylors? Yes, straight up. I think I can try to help fill that void — I will try to do my best Ben impersonation. Vin, thank you so much. Shout out to Joe Reis and Mikiko [00:16:00] in the building. Just a quick primer: we're talking about the build-versus-buy debate. So if you got any tips on that or any insights, do share. We'll go to Mark, then we'll go to Ken. If anybody else wants to jump in, please do — just raise your hand. If you're watching on LinkedIn or on YouTube, let me know if you got questions; happy to take them. Go for it, Mark. Speaker2: [00:16:19] Vin just undercover told us we're all boring. Joking, joking. One thing that I think about — not necessarily selling into an org, but selling within an org, trying to build out a use case and then build out data infrastructure — is that I definitely get pushback for buying things, because we're a startup. We're like, are you sure we want to spend our money there? Right? It's a lot more tight. And so I really have to make the case for it. And a great way to make a case to buy something is for people to feel pain — make it their pain. And so what I do: I work with my manager,
I identify the quickest way to drive value and show, like, this can become business-critical — well, not business-critical, but drive value in a way that's going to be used throughout the business — and then find the simplest way to implement it. And then clearly state: hey, we don't want to buy yet; we're going to build this quick thing here to show kind of what value it can bring, and have you agree that it brings you value. This is the point where it won't scale anymore and you actually need to do something right with it. And that's often worked pretty well for us, because it convinces them that it does bring value, and then it also convinces them that, oh, to actually really take advantage of the value, it takes a lot of work — and we don't want to put that work in ourselves. And that makes the argument certainly a lot easier for us. Harpreet: [00:17:48] Mark, thank you very much. Let's go to Ken; then after Ken we'll go to Christian. Speaker2: [00:17:53] Yeah, this echoes a little bit of what Mark and Vin were saying, but I look at all of those decisions in terms of time, money, and [00:18:00] utility. Just because it's going to be expensive up front doesn't mean that building something on your own isn't going to be significantly more expensive and time-consuming for the company. And just like what Vin was talking about with the resources: after you've delivered a platform or a product, what do you do with those? There are all these immediate opportunity costs in play, and you really have to think, okay, between these factors, how much more value are we going to create for ourselves if we do this — if we do this now, if we do this in the future — versus if we buy something off the shelf and iterate it to our use case? I would also say most of the products that I'm seeing now are highly customizable, and they're components that work with other, bigger systems. And I think companies are getting significantly better at just picking the piece where they know they can be useful, rather than trying to own, like, a giant portion of your pipelines or your process. So to me, if I'm trying to sell something like that, it would be really talking about: hey, we're just here to help you with this part that we specialize in and you don't. The time and money associated with this over the next three years, five years, ten years — it's going to be 100% in your favor if you're doing something off the shelf. Harpreet: [00:19:21] Ken, thank you very much. Christian, go for it. Speaker2: [00:19:24] Yeah, I can just speak to kind of the pieces aspect. Currently I'm seeing the build-versus-buy debate in my current work environment. What happened is we've decided to own the actual data engineering and modeling pieces, and then all of the data visualization and dashboards is going to be outsourced to an off-the-shelf company, to actually build the portal and the application to access the data through Power BI. So there was a lot of debate there, because there needed [00:20:00] to be some internal ownership of the data, and some of the stakeholders just wanted to outsource all of it, including the engineering and the modeling. But we found that it was strategic to keep that secret sauce internal and just have the dashboards done externally for those applications. Harpreet: [00:20:21] Christian, thank you very much. Joe or Mikiko, anything to add here?
Speaker4: [00:20:27] I guess the question I always ask when build versus buy comes up is, you know, is it a core competency or is it not? To me, that's really it — the core competency of the business is what I mean, right? If it's something that's core, then you should build it. If it's not, that's a strong argument you should buy it. It's the same argument — I feel very tired of hearing this, but — if you need car tires, do you make them from scratch or do you go buy them at the tire shop? If you make them from scratch, that'd be pretty badass, actually; you should tell me how you do that. But for the most part, I think the answer is pretty self-evident. And that's how I view the build-versus-buy debate. But then again, the tendency is for engineers to want to engineer stuff, right? You have the title engineer, you know, and so you think you should get paid to build as many things as possible. So that's where it gets a bit more complicated. Harpreet: [00:21:24] Yeah, when I was very early on in my career, any time anybody would suggest buying a tool from a vendor, I'd automatically kind of get offended. I'd be like, what, you don't think I'm smart enough to build this? I could do this — why do we got to pay somebody? I could do this. I can definitely resonate with that instinct to at least try to do it ourselves. But then once you find out how difficult it is, it's like, I should go another way. Mikiko, go for it — and then we've got a question coming in from LinkedIn, after Mikiko, from Paul Sanchez. Great question; I think it's a good question for a lot of people in this audience. This is going to kick off a good [00:22:00] debate. Mikiko, go for it now. Well — we'll wait for Mikiko to rejoin. If you're watching on LinkedIn, if you're watching on YouTube, be sure to smash that like and give this thing a reshare. If you saw this come up on Twitter, do give it a retweet — you know, try to help spread the word, y'all. Let's see if Mikiko's got her mic sorted out. Mikiko, give it a test. We'll go to Paul's question and then we'll circle back to you. So Paul's question is related to business and machine learning, in this scenario: the business is asking to implement machine learning but does not yet have a data infrastructure set up — like a data pipeline to continuously collect data, cloud storage, etc., etc. Is it better to suggest starting with data engineering before diving into ML? Let's go to Joe for this one first. Speaker4: [00:22:51] Oh geez, I love softballs. Yeah, you should definitely dive headfirst into AI and ML without data pipelines, for sure. That's definitely a really smart way to go about it. You should definitely dive into AI and ML without really having any data in the first place, and definitely do it — I strongly encourage you to have no real objectives with AI or ML as well. That's a really good way to succeed with ML. And if you take my advice seriously, you will probably fail at everything that I just said. So, in all seriousness, I would say data engineering is a great place to start. You know, it helps you set the foundation for things. I mean, I actually got into data engineering precisely because I was a data scientist tasked with — you know, expected to do magical things without any data or infrastructure to support it. And I'm pretty sure there's a lot of people on this call who can relate to that situation.
That's a situation where you're set up to fail. So, yeah, I think data engineering is a really good thing to start with. [00:24:00] What do you say, though? Harpreet: [00:24:02] Yeah, I'd love to hear from Mikiko, if her mic is back up and running. Speaker5: [00:24:07] Can you hear me now? Speaker2: [00:24:08] Yep. Speaker5: [00:24:09] Okay, great. Damn you, Yeti — again foiling me and my best intentions. As someone who's in the ML space, I totally agree with Joe: fail at all the things. No, no — I agree. I also think the ladder is kind of not super linear. You're going to have to make iterative improvements on the data, and then on the application and code around the data — which includes models, but is not always models — and then you have to improve the data again. A lot of times it's more like you're walking on stilts: one stilt is the data stilt, and the other stilt is the data science/ML stilt, and you're just kind of making your progress forward. And I think that's the only thing I would sort of point out. But at the end of the day, everything is still data. I mean, I know some people will argue that technically everything is pipelines, but, you know — even Andrew Ng, for example, for the longest time was like: models, models, models. His new schtick is data-centric AI. Because at the end of the day, if you don't have data, whether it's small or very, very humongous data, you don't have — well, I won't say you don't have a product or service that you can serve, because you can sell products and services without machine learning and data science; companies have done that for years. But at the end of the day, models — machine learning and models — add something extra. So I don't know. For me, I think we're all valuable and we should all get along and collaborate and all this other stuff. But I also understand that sometimes it makes a better hype piece if you're like, oh yeah, one is more important than the other and all that jazz. So, [00:26:00] yeah — I think start with data, but understand that it's always incremental progress in different areas at different points in time. Harpreet: [00:26:10] Let's see if Vin or Shankar have got anything to attach to this — or Ken, if you guys want to input, please do let me know. Go for it, Vin. And then, Shankar, if you want to go, let me know, just thumbs up. And Ken's gone off camera, so probably no from him. Speaker3: [00:26:27] Yeah, I don't know about, like, something to add, but I think the root cause of a whole lot of these different problems is that companies come into this with zero strategy and zero objectives as far as what to do with AI. They've been hyped into it: they've got somebody at the company who's fallen in love with models and AI, they've got somebody from a conference who got in their ear and told them about AI, and they've been reading a whole bunch of articles about it. And, you know, that's the thing — ten years later, we still have a hype cycle. And so a company gets into this field with zero strategy. You know, it makes me money — I mean, that's my entire business — but it's something they should really avoid up front. If you don't get in with this idea of: what's your first use case? What's your first value proposition for this? Who are you going to bring in?
Who can actually build this out for you in the right way, rather than having somebody inside of your business try to guess? And that's really where we are right now: companies don't know enough to know where to start, but they're smart in technology, and so they feel like it's exactly the same thing, and they'll walk straight into a lot of the pitfalls. And a lot of the differences aren't understood until two or three years and 10 to 12 million dollars later. And I think that's really the root cause of it. And once that happens, trying to reconnect the data team with core strategy is super painful. [00:28:00] A lot of stuff has to be undone and backed out and connected back into all the other strategies that have been built around technology. And that's where companies sort of mature: when they realize, okay, now we have like eight different technologies. It used to be we were just using software; now it's software, some IoT here and there, we've got cloud, we have data, we have analytics, we're starting to play with models — and all of them do different stuff, and we're not sure what they all should be doing. I think that's — I mean, if I had anything to add, it's really at a higher level. I think the root cause of the problem is that we need to get companies to think about why first, and then decide: okay, this is worth doing because there's enough money behind it. And typically what happens is they don't see how much value is in it for them, and so they're not willing to invest the time or the cash up front to do it right. And they do it haphazardly, almost as an experiment, and it doesn't end well until they change the way they're thinking. Speaker4: [00:29:03] Well, let me ask you this, then. What are some questions that a company could ask to determine whether machine learning fits with their strategy — or their strategy fits with machine learning? Speaker3: [00:29:14] Well, it's not really "does the strategy fit with machine learning" — you're starting one question too far down the road. It's: what's our relationship with technology, and what opportunities does technology create for the business? And when you start from that perspective — and that's what I always do with clients — I'll flip it around. You know, don't ask what this technology can do for your business. Look at it from: how does the business use technology to create value and become more productive, to end up delivering value to customers in new ways, to save costs, to build capabilities that no one else can — to create those competitive advantages? How do you use that now, and what would data add to that? What would analytics add to that? What would machine learning add [00:30:00] to that? And when you start asking that question, you go, well, I don't know how to answer that. You're right — you need somebody with technical domain expertise in the C-suite who can help you. And so that's the first question that gets them thinking: oh, if we start with this from a strategy standpoint, we need a domain expert, we need someone who understands how to answer that question. Well, that means we probably need to assess our business first — and there's all of a sudden a different process for this. It isn't about going out and getting a technology. It's going out and figuring out how the business should be using technology in the first place,
how its customers want to use technology, you know. And everything about that brings a different perspective on data science and machine learning, and it's so much more successful. Harpreet: [00:30:43] Vin, thank you very much. Let's go to Shankar — he had his hand up — then Mark, then Mikiko. And if you guys got questions on LinkedIn, or comments, please do leave them right there in the chat and I'll be happy to get to them. But let's go — yeah, let's go: Shankar, then Mark, then Mikiko. Go. Speaker2: [00:31:04] Yeah, I just want to echo a lot of those words. I heard a funny quote at my company: AI is something you only see on PowerPoint slides. I think that's very apt, in the sense of the way the term is used — you don't really see a ton of data scientists, machine learning engineers, or data engineers necessarily talking about it. I feel like it's this umbrella term that's captured this mystic technology that's supposed to solve all of your business problems, when really it's just a tool, right? The way it's currently used in industry, it's really just a tool — no different than previous tools, potentially more powerful in some ways, but still a tool nonetheless. You know, when we were talking about whether to set up data engineering and the infrastructure first versus machine learning, it made me think: [00:32:00] some companies have an attitude where they think of these applications, like AI and machine learning, but they're still not even in the mindset of treating their data as a treasured asset, right? There are companies where there are a lot of questions they could answer with their data, but they still haven't even written down what those questions are. What metrics do we want to capture from our data? What hypotheses do we have that we think our data can answer? What are the analytics use cases for our data, before we even graduate to machine learning? I think you shouldn't jump the gun to this very complex solution when you can get a ton of value through operationalizing SQL and dashboards, and building out, like, the SLAs on your data and your metrics and things like that, right? So it's all about the data-driven culture — that's where a lot of the value lies, which is something that a lot of companies try to leapfrog when they go straight to machine learning and things like that. Harpreet: [00:33:15] Thank you very much. Let's go to Mark; then after Mark we'll go to Mikiko and Kosta. And we've got a couple of questions queued up on LinkedIn from Sangita — they have to do with NLP, so if anyone here is an NLP expert or enthusiast, let me know, because I don't know much about that field, but I'll ask the questions nonetheless after we get through Mark, Mikiko, and Kosta here. Great question that you kicked off, Paul — I hope you're enjoying the responses. Go for it, Mark. Speaker2: [00:33:43] Yeah. One thing that I'm thinking about is that we've had enough time in our industry that people are aware of what happens when you just jump into ML. I mean, people are still making that mistake, but there are also now enough people who have been burned by it. So I'm also noticing this weird thing where you'll be [00:34:00] in an organization where the culture is, on one end, they're like, hey, we should totally do ML, jump into it; and on the other end, they're like, hey, we've been burned by ML before, we're not that interested.
And now you have to basically find the middle ground between both sides, where on one end you have to show there is a true business case for it, but on the other hand not hype up the other crowd into saying, oh, we're jumping into ML right now. I don't know if that's very clear — it's still a raw thought in my head right now — but that's been an interesting thing for me to navigate recently: simultaneously building the business use case while not getting ahead of ourselves. And it's a tricky balance, because if you go too far in one direction, you lose momentum. So you have to find these quick iterations, quick wins, that lead to that larger use case. Harpreet: [00:34:56] Mark, thank you very much. Let's go to Mikiko, then Kosta. And if you've got questions, let me know and I'll queue them up. Go for it, Mikiko. Speaker5: [00:35:04] Yeah, I was catching up with some folks — in, like, a happy-hour-ish setting at the Ray Summit — and someone from the ML community basically said something along the lines of: people realize they have a data problem when they start doing ML. Which I thought was great. Um — and he said it in the middle of a very long rant about how he's at a company where, if you were to know the company, you would be very surprised that they have data problems; a lot of companies out there have data problems. But I think it's kind of interesting. Something that I do feel is a little bit lacking in terms of the thought leadership and content out there is: what does it really take — both for teams that are research-based and ones that are infrastructure-based, especially ones that are building [00:36:00] platforms that span a couple of different domains — what does it really take to successfully implement those platforms and also build the culture and maturity around them? Because if you look at a lot of the content out there, sometimes it's really complicated enterprise architectures where it's like, we would never, ever actually want to re-implement this; if we could just throw it all away, we would. And in some companies they are throwing it all away — like Twitter and Uber, they are getting rid of a lot of their custom-built internal data and ML platforms to go for the buy solution, right? Speaker5: [00:36:42] But also, there is this kind of incentive among all vendors to say: we're positioning ourselves as the point solution. Even though I don't agree with the modern data stack concept, something they did very well was that they were able to package up and partner a certain sort of companies and provide a very succinct value prop: this is what you get out of it, this is how they integrate, how they compare, and all that other stuff. They won not just because individually those companies were great, but because they decided, hey, we're going to present an overarching way of how to view data, right? And I kind of think that's what's missing a little bit, frankly, in the ML space. Not saying that people should come up with an ML stack, but there isn't a lot of this kind of nuanced perspective and thought leadership around, hey, we don't always need to be, you know, pushing forward our individual domains.
What does success look like in different companies or industries and different maturities, utilizing multiple teams and domains? So that's something I've been thinking about a lot. But I still thought it was funny — [00:38:00] the whole "the first time people realize they have an issue with data is when they're starting to do data science" — which is great. Harpreet: [00:38:09] Mikiko, thank you so much. Shout out to Gift Ajibola in the YouTube chat, who says that he loves the way Mikiko and Vin explain things. I would have to agree with you, Gift. Let's go to Kosta. Go for it, Kosta. Speaker6: [00:38:22] Yeah, totally — they explain things in a way that just makes sense to a lot of people. So, a lot of it does. "People realize they have a data problem when they try ML" — that pretty much says it all right there. I want to flip this on its head. We're talking about it from the experience of a lot of people who have tried a bunch of ML, discovered a bunch of data problems, and are now convinced — because we've lived it firsthand — that, hey, we should look at the data, right? But let's put ourselves in the shoes of someone who has never written a line of code in their lives. They've been running their business for the last ten years, or however many years. And they're sitting on a bunch of what they're told is an oil field of data, right? They're told that there's potential value there. The question in their mind is: okay, maybe there's value here, but I don't know exactly what that value is, and I don't know exactly how much this is going to cost me. So how do I manage the risk on a huge investment like data engineering, which always seems to be presented more as this long-term-vision, continuous-improvement process — as opposed to something like ML, which is often branded as something more experimental or experiment-driven, and seems to be able to at least present more visible business value? Speaker6: [00:39:49]
And the thing that sits closer to that is ML As opposed to data engineering where they're told, hey, you're going to have to get stacks of cloud infrastructure in to organize your data. And it's going to be a few months of all of this. Right? And they might get hey, we've got a data scientist who wants to come in and do a few model experiments for a few weeks. Right. It's a lower barrier sell for them to do that. So the question is, and I think that comes from ML being branded as experimental and very closely connected to the business value outcome. Like I can build a model that can tell you X, Y or Z about your customer, right? On the other hand, data engineering seems to be a more long term play, and [00:42:00] the inherent value that it generates is often not as visible until you have models built on top of it. Right. So how do you present that value in a way that makes somehow you're more visible to business leaders and that risk more amenable to their appetite to say, actually, yeah, we do want to invest a bunch of money into this, or do they have to go through that cycle of trying models and say, well, actually, shit, data is crap, we've got to fix this, so let's invest in it. Speaker4: [00:42:32] I mean, it's interesting. I mean, in a lot of cases, I mean, we're all in the sales game, right, where we're trying to pitch, you know, data and the value you can get from the promise of it. Right. And so there's I wouldn't say there's a one size fits all answer. I think people will attempt to approach it from different angles and with varying degrees of success. You know, a lot of it also depends on the organization and how willing people are to be receptive to data in their company. Some some companies aren't data driven, and that's how it is. Those are probably the worst ones, actually, are the ones that are more prone to make mistakes because there's been points out there's probably no coherent strategy and you're more prone to want to cargo hold or sort of LARP your way through doing data versus doing it in a coherent fashion. But I kind of look at data engineering as setting up, you know, it's kind of a cliche, but plumbing, for example, is a really good example, right? Like if your plumbing is constantly breaking and or acting in a very exciting fashion, that's not a good thing. Like, you know, plumbing should be invisible. So it enables a lot of other things like being able to take a shower, for example, or do things like that. And I feel like data engineering is definitely, I think, a very it sets a good foundation. Speaker4: [00:43:50] But I think as somebody pointed out, you know, you don't really know you have issues with your data until you try and do data science. And, you know, and it is an interesting cell where data [00:44:00] engineering on its own, you know, probably isn't going to give you very much unless it enables data science and analytics. That's the whole purpose of it. Right. So it's more of an enablement function. So but again, it's, you know, a lot of our business comes from people who try and do it in reverse. You know, they go the data science route and then they and then they end up realizing that they probably should have had somebody come in and but this smarter speaks to the immaturity of the field, I would say, too, right. I mean, data science itself. I mean, it's only been around for what officially what came out in 2009, I think. Right. And the term really started taking off around the early 20 tens to mid 20 tens and so forth. 
And we're ten years into this right now, and I think we're making a lot of mistakes — and that's just sort of the baby steps towards getting to, you know, being a field where the value is recognized. The fact that we have to talk about value, I think, is indicative of how elusive value can sometimes be to attain. Speaker4: [00:44:58] If you have to keep shouting about value, you're probably not adding a lot, right? That's sort of the rule. And so, you know, your accountant, for example — I don't think you'd be asking your accounting department what value they're adding to the company. It's like, well, do you want invoices processed or not? I mean, your call. Or your shipping department, for example — you run a warehouse — what value do they add? You know? But these are well-established things. Accounting has been around since at least the 1600s, and arguably for thousands of years we've been doing transactions. But data — you know, with the IT revolution and stuff, you're talking about maybe the sixties when that stuff all came about, right? It's all very, very new and immature. I talk often with people who are some of the originators of this field — people going back a long way in the data field — and they feel very much the same way; the consensus is we're still in a very young field right now. It's very immature. So things like this will happen: your order of operations is [00:46:00] going to get messed up more often than not. So hopefully that answers your question. Speaker6: [00:46:04] That's the interesting thing. I mean — and I'm talking generally, as a society — we try to see everything as this established professional system. Like, you're talking to massive enterprises that want to see this tight solution that's really well engineered, and it's like, hey, you're right: as a world, we've really only been doing this for ten years, right? I mean, it's why you can't find very much of that talent in that seven-to-ten-year senior engineer kind of position — that's really tough to find. But yeah, a lot of that is so spot on. Thank you. Harpreet: [00:46:48] I know — you just don't want to be the scientist who cried value, that's for sure. Let's go to Mark; then after Mark, Christian, and then Vin. And if you're listening in on LinkedIn and you got questions, keep them coming. I know Sangita's got a question queued up talking about new tools and conversational products. Then Khadijah — what's going on, Khadijah — has got a question that I'll probably squeeze in right after this, because I think it's relevant to what we're talking about right now. But let's go to Mark and Christian, and then we'll get to her question. Speaker2: [00:47:18] I want to talk more about that enablement piece that Joe was talking about. So I'm relatively new to data engineering — trying to shift over to the dark side, per se — but what I do have experience with is getting buy-in, executing, and making that visible within the company. And I ask myself three questions — this is how I position the optics of it. One, the first question: how is the overall company trying to position itself in the market? That helps inform where the broader strategic goals are. The second thing: to enable that larger strategy, who are the key stakeholders, and what are their major goals?
And [00:48:00] then from there, my third question is: how can data help enable those leaders, those key stakeholders, to reach those goals? By asking those three questions — and I've done a lot of user interviews within the company — I'm able to identify that lever point where I can really push and add value. And the key thing is, the conversation then shifts from "Mark built this data pipeline" — that's not adding value — to "Mark solved this problem that was really hard, and it helped us reach a strategic goal, and it just so happened that a data pipeline was what made that happen." And so I focus on those questions and that conversation, and when I talk to leaders, I don't talk about the tactics of doing it; I talk about solving this larger strategic initiative. And then in my one-on-one with my manager, who sees both sides, I'm able to nerd out with her and be like, yeah, we built this cool data pipeline, right? But for the most part, leaders don't care. They care about the value you can bring. Harpreet: [00:49:05] Thank you very much, Mark. Let's go to Christian. Speaker2: [00:49:08] Yeah. So I would say that as far as data engineering is concerned, maintainability is what first comes to mind for me. And what I mean by that is: working with, like, OLTP normalized tables in a transactional legacy system, the accessibility of dimensions and their corresponding attributes becomes really, really cumbersome when we're asked to break down just a basic KPI report for someone. I can't speak too much on dbt — I think they're kind of trying to bridge that gap — but myself, I'm a LookML developer, so all of the transformations that I deal with are handled on the LookML front. And that can really impact performance: not having the engineering in place in the data warehouse for a proper Kimball dimensional [00:50:00] model leads to a lot of joins, which leads to slow load times, which users complain about — and that's something my current environment cares a lot about (see the sketch below). So maintainability and performance are definitely dependent on data engineering, for sure, for that value creation. Harpreet: [00:50:20] Thank you very much. Vin, go for it. Speaker3: [00:50:22] Yeah, just to jump in on that: I think we have to stop copping out and saying we don't know how, we're too young. We do — we 100% know how. You know, in 2012 I was getting companies to spend cash on data science. We definitely know how to do this, and I wasn't the only one back then. So there are tons of people who understand how to get companies to invest responsibly in machine learning. I don't think it's that we don't know how to do it; I think it's that we're combating an entirely different mentality that is trying to get companies to buy their solution or their consulting services. And they have to do this escalating sales pitch and escalating hype cycle to rise above the noise of cloud and all the other technologies out there that are alternatives. And so I think, as a field, we have to just stop saying we're young. I mean, if you're doing large language models — yeah, you're right, we're young in that we don't know how to monetize those. But most companies aren't doing that. We know how to get money out of data, we know how to get money out of analytics, and we know how to make money off of some fairly basic models. There's some infrastructure and some stuff behind the curtains that they don't need to know about. But we have to stop saying we don't know how to do this. And when you see someone going the wrong way, you have to just look at them and go: that's dumb, you should know better by now.
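[To ground Christian's dimensional-modeling point above, here is a minimal sketch — a toy Python/sqlite3 example with a hypothetical orders schema, not his actual warehouse — of why a Kimball-style star schema cuts down the join chains that make KPI reports over normalized OLTP tables slow.]

```python
# Toy sqlite3 sketch (hypothetical schema) of the normalized-OLTP
# vs. Kimball-star trade-off Christian describes.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized OLTP: attributes scattered across narrow tables.
CREATE TABLE regions   (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE customers (id INTEGER PRIMARY KEY, region_id INTEGER);
CREATE TABLE orders    (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);

-- Kimball star: one wide, pre-flattened dimension plus a fact table.
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE fact_orders  (customer_key INTEGER, amount REAL);

INSERT INTO regions      VALUES (1, 'EMEA');
INSERT INTO customers    VALUES (10, 1);
INSERT INTO orders       VALUES (100, 10, 25.0);
INSERT INTO dim_customer VALUES (10, 'EMEA');
INSERT INTO fact_orders  VALUES (10, 25.0);
""")

# The same KPI ("revenue by region") written both ways.
# OLTP version: every normalized table is another join in the chain.
oltp_report = """
SELECT r.name, SUM(o.amount) FROM orders o
JOIN customers c ON c.id = o.customer_id
JOIN regions   r ON r.id = c.region_id
GROUP BY r.name;
"""

# Star version: the dimension already carries region_name, so the
# report is a single fact-to-dimension join.
star_report = """
SELECT d.region_name, SUM(f.amount) FROM fact_orders f
JOIN dim_customer d ON d.customer_key = f.customer_key
GROUP BY d.region_name;
"""

print(conn.execute(oltp_report).fetchall())  # [('EMEA', 25.0)]
print(conn.execute(star_report).fetchall())  # [('EMEA', 25.0)]
```

[With only three normalized tables the chain is short, but a real KPI might walk five or ten lookup tables; the star schema pays that flattening cost once, in the warehouse, instead of on every dashboard load — which is the load-time difference users complain about.]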
There's some infrastructure and some stuff under the curtains that they don't need to know about. But we have to stop saying we don't know how to do this. And when you see someone going the wrong way, you have to just look at them and go, That's dumb. You should know better by now. Speaker6: [00:51:48] This fall. I mean, like stuff like object detection and classification from a computer vision standpoint, those things are pretty solid. I mean, just chuck the yellow out of the electron or a phosphorus in it, kind of. Okay. [00:52:00] And like 95% of cases, there's not much experimental going on there, right? I mean, it's more about understanding. Oh, shit, I've put that on a side scan, radar data that's a bit different. How's it going to perform? Right. It's not it's not the biggest thing in the world. And we kind of understand now a lot of okay images. Where do you store them? How do you how do you get access to them? Sure. Yeah, you're right. We have been doing this a while, so. Is it just that mentality of, hey, we've got a hype up of products or the sellers are selling, selling the world essentially, right? I guess it's where I think we are young. It's not so much in the people who know the technology that's required. I think where we are young is in the buyers maturity in technology, right? Like, I mean, now if you don't know the first thing about a car, someone's going to come in and tell you all this car's got abs and etc., etc. But like 90, 95% of cars are built past 2005. Right. Have interlocking brake systems. Right. That's just normal now. But they can sell you on that. Right. So is it the buyers, how informed the buyers are of what we're selling? Speaker3: [00:53:19] Well, I think, you know, you have to look at the company and say, look, if you were buying something that you've never bought before, wouldn't you consult somebody who knows something about it? Yes. How is this different? Hiring new staff is no different. You've never hired for this position before, wouldn't you? Before buying 4 to $5 million worth of people and equipment and everything else? Wouldn't you ask someone if I've bought cars my entire life and I'm going to go buy a truck now? Shouldn't I ask someone? I mean, yeah, the salesperson is going to be there and they're going to hype it up. And I should know that by now. But shouldn't [00:54:00] I ask someone who knows? You know, and that's the thing. Any time you're doing a major investment, if I was looking at two different investment vehicles and I didn't know anything about either one of them, you know, before buying crypto, shouldn't you ask someone, shouldn't you know, and that's that's every single one of these new technologies. It feels like we forget the basic thing. So you're investing in something that you don't know much about. Shouldn't you ask someone who's who does? Shouldn't you have a little bit of advisory help? And I mean, I'm pitching myself a little bit, but at the same time, it's kind of common sense and for whatever reason, data cloud I those three like we gon we we skipped it and instead of asking people who are smart, hey, what would you do, what do we do first? Do we need this, you know, do I really need a tailgate on the back of my truck that allows me to step on it instead of stepping on the rear bumper? Is that something, you know, just kind of basic questions like that. If you've never bought a truck, you don't know, why don't you ask someone who has that background? 
And that's what I think we've left behind: just that level of common sense that this really is new, it's not like every other technology, and so we have to ask somebody. Harpreet: [00:55:15] Go, go for it, yeah. Speaker6: [00:55:17] So to kind of summarize what you said, and tying that back to what Joe said: it's new technology to the buyer, not to the seller, right? The technology stacks are quite well established over the last ten years or so. But yeah, it's new to the buyer, so you need someone who can give them an honest opinion. And I mean, that's where you come in, I guess. Speaker5: [00:55:39] I mean, the thing that always sticks out in my head is that execution is almost what it comes down to, right? Because I kind of feel the same way about tech leadership. And tech leadership in data and ML — at [00:56:00] its core, it's still leadership. And I almost feel like, in that regard, a lot of times leadership is actually not that complicated. It's almost like: don't be a dick, make sure you elevate the people that report to you, and make sure you don't lose the company money. I mean, there might be more fancy ways of saying it. And yet there are all these magazines and articles and books written about leadership on its own, and then tech leadership. And it's like, right, okay: if your team is unhappy, if they're struggling in a tech wasteland or tooling wasteland or whatever, it should not be this complicated to toss up, let's say, some diagramming tool and get a bunch of people to give their feedback — to illustrate what the system is, what the pain points are, who the key stakeholders affected by these pain points are; really elucidate what exactly we are suffering from — and then to be able to go, okay, let's highlight the gaps in our knowledge: what do we not get? And just be very humble about it. Speaker5: [00:57:05] And then, exactly like Vin was saying, go find people who have solved those problems before and ask them for help, or whatever. And it's fascinating — I feel like I see this in general towards consultants too, right? So, for example, when I was bodybuilding, I had a trainer, I had a dietician, obviously I had medical insurance or whatever, right? And I also had a subscription to a community and all that. Because I recognized that this was an area that, number one, I had no expertise in. I needed help, I needed some kind of structure. But more importantly, the consultant was not there to provide me the motivation to become the ultimate beast that I could be, you know, but was instead to help guide and make sure I stayed on the right track to get to the optimal outcome that I still had to [00:58:00] define myself, right? But it's fascinating, because when I tell people that, they're like, oh, it's a waste of money, they're just taking you for a ride, it's a scam. Except you look at literally all the high-performing bodybuilders, and they all have a dietitian, a nutritionist, they have a therapist, they have a weightlifting coach. And if they're doing something like CrossFit, they might have an entire team dedicated to them being the best possible.
Speaker5: [00:58:30] They clearly recognize the value of asking for help, of getting that guidance, of bringing in experts, and, even if those experts are in specific areas, of being able to combine teams of experts. And it's just very interesting, right? Because it is a mentality switch. It's like: what do you need to do to be the best that you can be, to actualize your full potential, to get to the success point that you want? And a lot of times it does mean admitting that, hey, I'm kind of stupid, let me go talk to someone who's smarter and get help. And it's kind of fascinating. I think it's true a lot of times in the technology space, especially around data science and machine learning: people are very, very reluctant to ask for help and very reluctant to say, we don't know what we're doing. But also, filtering for the right help can be kind of tricky in a very saturated, noisy environment where every vendor says, my solution is the best; and then you're like, well, if every solution is the best, then clearly no solution is the best, or any solution would be just fine. But it's something I find very fascinating in terms of the whole asking-for-help thing and figuring out the buying side of a tech solution or investment. Harpreet: [00:59:52] Yeah. I mean, you hire experts because they will save you time, and time is the ultimate resource, right? So if you can [01:00:00] stand on the shoulders of giants, save yourself time, and get there quicker, then why not do it? Such great discussion. Paul, thank you for that question; it kicked off some great conversation. A question here coming in from Khadija fits in nicely: is there anything aspiring data scientists can learn to prepare for all of these data problems? Is it essentially just learning some data engineering? I'll let Mark answer this one. Speaker2: [01:00:24] Sorry, I was in the chat talking about dbt and Looker. Something about preparing for...? Harpreet: [01:00:32] Yeah. Speaker2: [01:00:33] Data scientists, or aspiring data scientists? Harpreet: [01:00:35] Well, is there anything that an aspiring data scientist can learn to prepare for all these data problems? Like, just learn data engineering, basically. Speaker2: [01:00:45] Oh, you know, honestly, I don't think it's really the technical skills that surprised me when I became a data scientist. I think those are just the prerequisite to get in. And then you get in there and you realize, oh, actually, this is all communication: understanding what the actual need is, communicating the constraints, and then actually delivering on that. So most of the time, I actually don't know how to do a lot of the stuff in my job. I just show up, figure out the problem, and then I'm like, okay, this is the problem; I go research and learn on the spot and then implement it. I go to the documentation of whatever I'm implementing, read that, and create a proof of concept, and I'm like, okay, cool, I know how to do this well enough to solve this problem. Because there are so many problems you can go after that you don't actually know what's worth learning until you're actually in it. And even more so, if you have to pay to learn, why not have the company pay for you to learn, by focusing on their problems?
Now, on the other side, if you just focus on the company's problems, you're kind of tailoring your career to another company's goals. So also be mindful of where your career goals are, where you want to take your data career, and what you can do, either outside of company hours to pick [01:02:00] up those skills, or by trying to figure out how to align your interests with company projects and make that happen. But I think where data science education kind of messes up is that it shows you everything and gives the impression that you have to know everything and be doing everything, when in reality I'm using a very small fraction of it at any given time. It's more about being aware of what's out there, and then when I do have to implement something, I dive deep. Harpreet: [01:02:28] Awesome. Mark, thank you very much. Let's move over to Sangita; she asked this question quite a while ago, and I think it fits nicely here. I feel like a lot of what we've been talking about recently might come from the perspective of companies working with structured, tabular data. I might be wrong, but Sangita has a question here regarding unstructured data. She's saying there are so many unstructured data management products coming up, and managing natural language understanding data is an open problem that a lot of people are trying to solve. What's an interesting, new, or even contrarian perspective you might have on what has business value these days? So data management for unstructured data is the gist of it. Any input here from Kosta or Vin or anyone else? If anybody has input, please let me know. Speaker3: [01:03:26] I mean, my answer is always two parts. First, most of the data that you have is bad, and using it will be painful. You'll end up derailing yourself for years trying to make the stuff you already have useful, instead of gathering data that you can actually use and being intentional about it. And especially with unstructured data, it's so hard to define data quality. So trying to do it retroactively, like, you show up, the company has seven years of historical data that's unstructured, [01:04:00] and this is huge in health care, where everything is unstructured, and what gets put down in patient notes is just bad. It takes a long time for companies to realize that if they try to do anything with that data, it doesn't work. You can't even do analysis with it. So the hard answer is, more times than not, unstructured data is something you have to just throw away, and admit that there isn't enough value in it to spend the time trying to mine it or trying to build a model with it. It's always going to end the same way. I think I said this to Albert this week on one of his comments: if you use bad data, your model will never stop sucking. And that's just how it is with bad data. Sometimes the right answer is to throw it away. And the second piece of it is knowledge management. You can do that, because starting from scratch you can build out an ontology. That's where knowledge management begins: you build out a structured ontology and you gather data around it.
Then you continuously improve the ontology, and you are always gathering high-quality data that can be used in modeling. Harpreet: [01:05:20] Kosta, go for it. Speaker6: [01:05:22] So when people talk about unstructured data, I like to ask first: is this data naturally unstructured, or is it unstructured because we couldn't be bothered to structure it? Those are two very different problems, right? If it's the latter, bring back the structure you skipped during collection, and then see if you can derive some value out of it from there. But if it is naturally unstructured, where structuring it is not really possible, you're drawing on things that are far more unstructured, so they have to be far more ubiquitous, right? In the sense that it has to be something where you can't find commonalities, can't draw proxies, can't measure consistently. So yeah, we've got to think hard about when we use unstructured data or approaches in general, particularly with images and videos. You can structure the collection of types of images and videos, so you can bring elements of structure to it that cut the problem down, as opposed to just saying, hey, I'm going to scrape thousands of images. No, I'm going to collect very specific images for a specific use case, all of them gathered a certain way, and add metadata to make it more structured. Right? Sometimes you just get people saying, oh, it's an unstructured data problem, when really you just haven't put the thought into how to add some structure that gives you a meaningful advantage. At least that's what I've seen in the image space, where they just hand you a chunk of images they scraped off Google. Harpreet: [01:07:09] Kosta, thank you very much. Got a question, actually, regarding computer vision and pre-trained models and transfer learning. How do you tell, if you take a pre-trained model and you want to use it for a different kind of use case, whether you should just freeze the backbone and train the head, or keep the head but unfreeze the backbone? How do you know what to do? Speaker6: [01:07:44] Quite honestly, I'm not really qualified to answer that question; I just experiment with it. I'm at the stage where I'm still learning those intuitions, so I go, okay, I'll experiment a bit with this and that and see how it works. Yeah, I don't [01:08:00] have an intuition built up for it yet. Harpreet: [01:08:01] Yeah, I was trying to think this through earlier today, and I was like, okay, let's just suppose that we created a model, and all this model is able to do is look at an image and say whether a particular object in the image is meant to contain or hold liquids or not. And there's a bunch of things that can hold liquids, right? There are glasses, mugs, bottles, aquariums, vases, bowls. All these things are meant to hold water. We can call this LiquidDetectNet. Right. These are actual names I was thinking about when I was trying to reason through this. And then let's say we wanted to train another network, and we wanted it to be able to discriminate between different types of beer glasses. Right.
And there's pint glasses, there's goblets, there's pilsner glasses, and so on and so forth; there's a bunch of different types of beer glasses. We can call this BeerGlassNet. Intuitively, I think it could make sense: why not train BeerGlassNet with the LiquidDetectNet backbone, because LiquidDetectNet is already good at discerning, you know, cups from plates. There are some low-level features in that model that we can probably reuse to discern whether something is cup-ish or glass-ish. Am I making sense, man? I feel like I'm sounding crazy, but yeah, this relates to transfer learning. Any thoughts there? I'm trying to reason through this and understand it. Speaker6: [01:09:51] So you're trying to tell the difference between different vessels that are holding liquid? Harpreet: [01:09:56] No, I'm pretty much just trying to tell: if I've got a pre-trained network, [01:10:00] when should I just chop off the head and train a new head, and when should I keep the head and train the backbone instead? Speaker6: [01:10:09] I mean, I'd try a few different things, right? If you're trying to go per glass, like this is a pilsner glass, this is a wine glass, this is something else, those are all your outputs for the head. But ultimately, what you're essentially asking is: which layer of my network learns the difference between a liquid holder and everything else? That's a very tough question. There are a few things I was reading on this; let me see if I can find them, I've got my notes somewhere. I'll send it to you in a DM a bit later, because I've got to hunt for it. There's a technique where you can ablate certain areas of the neural network to identify, or at least find, causal or correlational information about what information those parts hold. So maybe that will give some visibility into it. But yeah, it's a bit of an experimental process. Harpreet: [01:11:16] As I'm thinking about it, go for it. Speaker2: [01:11:19] I am not a neural network, deep learning person or whatever, but I'm just thinking through this out loud because it seems like a really fun problem. I guess, why do you need to know which layer it is if you know it's innately giving this output? So if the output is whether or not it holds liquid, you just try to train another model on that: have that output become an input. So one model says, is this a liquid holder, yes or no? Great. And then another one is about the shape of the glass; maybe do some computer vision thing where, if it's this profile of shapes, then it's X, Y, Z, and have that be an input. That way those two models just say: is it a glass, is it a liquid holder? Yes, great, next step: is it this shape? Boom, classify that. Would that potentially be a simplified way? Again, I don't do deep learning, this stuff is completely new to me. Am I oversimplifying this? Harpreet: [01:12:21] It's probably not an actual problem I'm trying to solve; it was just me trying to reason through how transfer learning works and how I can determine which part of a pre-trained network I want to use for a task.
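For anyone who wants to see the two options from that question side by side, here is a minimal PyTorch sketch, assuming torchvision's pretrained ResNet-18 and a hypothetical five-class BeerGlassNet task; the class count, learning rates, and layer choices are illustrative assumptions, not a recommendation, and, as the panel says below, in practice you usually just try both and compare validation performance.

```python
# Hypothetical sketch: two common transfer-learning setups for the
# BeerGlassNet example, using torchvision's pretrained ResNet-18.
import torch
import torch.nn as nn
from torchvision import models

NUM_BEER_GLASS_CLASSES = 5  # pint, goblet, pilsner, ... (made up for the example)

# Option A: freeze the backbone, chop off the head, train a new head.
# Cheap and quick; works when the pretrained low-level features transfer well.
model_a = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model_a.parameters():
    param.requires_grad = False                       # freeze every pretrained layer
model_a.fc = nn.Linear(model_a.fc.in_features,        # new head; trainable by default
                       NUM_BEER_GLASS_CLASSES)
optimizer_a = torch.optim.Adam(model_a.fc.parameters(), lr=1e-3)

# Option B: replace the head AND fine-tune the backbone, with a much smaller
# learning rate on the backbone so pretrained features are adjusted gently
# rather than overwritten.
model_b = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model_b.fc = nn.Linear(model_b.fc.in_features, NUM_BEER_GLASS_CLASSES)
backbone_params = [p for n, p in model_b.named_parameters()
                   if not n.startswith("fc")]
optimizer_b = torch.optim.Adam([
    {"params": model_b.fc.parameters(), "lr": 1e-3},  # head learns fast
    {"params": backbone_params, "lr": 1e-5},          # backbone moves slowly
])
```

A common rule of thumb, hedged accordingly: the smaller your new dataset and the closer it is to the pretraining data, the more Option A makes sense; with more data or a bigger domain shift, Option B tends to win, which is consistent with the "just try both" advice that follows.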
But I guess the intuition is, you know, if you have a pre-trained model, some of the lower-level layers, some of the earlier layers in the network, are able to discern low-level features. And I think a low-level feature of an object like a glass is fundamentally different from, say, a coaster. I'm just trying to reason through it, probably a crappy idea anyway. Go for it. Speaker2: [01:13:04] Yeah, I'll let you in on something that may or may not be commonly known, a dirty secret of data scientists: most data scientists don't have any idea what their models are doing. They're just looking at the performance of the models. So depending on what your application is, like if you need something to be interpretable, that's where you'll probably really start struggling with some of those questions. But if you're just looking at how well your model performs, honestly, depending on the compute constraints, you can just try both, provided that you think your training dataset is a good proxy for what your system will see in real life. Right? But if it's not a good proxy, then you're screwed either way; nothing you do will make sense, because you're training on bad data. Harpreet: [01:14:00] I think that's good to keep in mind. Speaker3: [01:14:03] Okay, I'm going to be a real old man here. So, it depends, and I don't know enough about the type of model that you're using, so I'm not going to get too specific. But when you say the early layers for a glass are different than for a coaster: nope. In a lot of different architectures, they're really not, because you've got to think about the shapes that you're going to encounter. The simplest patterns at the very earliest levels are really just trying to figure out what shapes are. And that's one of the big debates: should we start with something smarter and then train models on it, versus relearning everything from zero every single time? So when you're asking what your early layers are learning, the difference between architectures is really how efficiently they learn all of those patterns and how efficiently they assemble them. And what's crazy about deep learning is that you can run your model twice and it might learn different combinations; it might actually learn a completely different graph. So for what you're really asking, which is, is there a point at which I can extract patterns and then transfer those to something else? No. You could, but you'd be re-architecting the model itself. Speaker3: [01:15:23] You'd be changing the structure of the layers, and so it may not feed in and end up doing exactly what you want it to. So if you're trying to deconstruct your model, that's actually a really good way to understand it. But I would come at it from a more generic perspective: don't give it one specific task, but look at it across maybe three, four, or five different types of identification tasks and work your way backwards. Don't go too far with it. Explainability on that type of deep learning structure is ugly, so you're not going to get a lot of it. I mean, you're asking a good question; it's just one of those questions that, [01:16:00] with the type of model you're using, is practically hard to answer. The reason I said I'm being an old man is that when I was doing vision for the first time, we had to build our own filters.
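To make that point about early layers learning basic shapes concrete, here is a minimal sketch, in the spirit of the layer-visualization tool Harpreet mentions next, that plots the 64 first-layer convolution filters of a pretrained ResNet-18. It assumes torchvision and matplotlib are installed, and the typical result, edge, blob, and color-gradient detectors, is an empirical observation rather than a guarantee for every architecture.

```python
# Hypothetical sketch: peek at what the earliest layer of a pretrained
# CNN has learned by plotting its first-layer convolution filters.
import matplotlib.pyplot as plt
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
filters = model.conv1.weight.detach()                  # shape: (64, 3, 7, 7)
# Rescale to [0, 1] so the 3-channel filters are displayable as RGB images.
filters = (filters - filters.min()) / (filters.max() - filters.min())

fig, axes = plt.subplots(8, 8, figsize=(8, 8))
for ax, f in zip(axes.flat, filters):
    ax.imshow(f.permute(1, 2, 0).numpy())              # (C, H, W) -> (H, W, C)
    ax.axis("off")
fig.suptitle("ResNet-18 first-layer filters: mostly edges, blobs, color gradients")
plt.show()
```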
And so I kind of understand how those really early layers work, but what happens after that is just more complex patterns, and there's a lot of randomness in there. How would I put it? There's a lot of randomness in there. Harpreet: [01:16:33] Yeah, thanks. I was probably thinking about it the wrong way; that's super helpful. Here's what inspired me: there's a link right here in the chat, and if you're listening, you can go to bit.ly forward slash DeepViz, that's with a capital D and a capital V. It's a pretty old video, like seven years old, and there's a GitHub repo that goes along with it; I just haven't been able to play around with it yet, I hope to do that maybe this weekend or next week. But essentially it visualizes the layers of a convolutional neural network, and it's doing it in real time, and it's just the coolest thing ever. And Serg actually has a chapter in his book, Interpretable Machine Learning, all about visualizing convolutions and visualizing layers in a computer vision model, which I'm excited to dig into. So shout out to Serg; check that out. But yeah, I'm completely new to deep learning for computer vision. I've been learning the basics of deep learning for the last year, so I get the math behind it, but now it's like, okay, trying to get intuitions. Let's just put it this way: I'm glad that I work with a bunch of smart PhDs who are willing to answer my dumb questions in Slack channels, because I sometimes feel like I'm asking crazy questions. But it's a hell of a learning adventure and so much fun. Yeah. [01:18:00] I actually spent the first few days of this week trying to get up to speed on image segmentation, because we're doing an Ask Me Anything session on Tuesday, September 6th, with a couple of experts from Deci, talking all about semantic segmentation. Harpreet: [01:18:14] I had no clue what it was, so I spent a few days this week preparing for it like I would any podcast episode: going in and researching and reading and filtering around. I think it'll make for a good discussion, so if you can join, please do. There should be a link on my LinkedIn or Twitter; I think it's pinned as the top tweet on my Twitter, @datascienceharp. If you're not following me already, do follow me. All right, guys, thank you all so much for being here. We're going to call this one a wrap. It's been wonderful having you all here and getting to see some new faces; please do come back in the future. Peter, good to finally see you here as well; I've been waiting for you to be here. Thanks for being here. Shout out to Kiko, Mark, Vin, Joe, Kenji, Kosta, Russell, good to have you as well, and Christian, thank you all for being here. You all take care and have a good rest of the weekend, have a good long weekend. Hopefully you do something fun and exciting this weekend, or at least just relax. I'll be in San Jose at the end of this month, the last week, so if you're at the Intel conference, let me know; I'll be hanging around the Bay Area as well. Shout out to my Bay Area peeps. You all take care and have a good rest of the day wherever you are. Remember, you've got one life on this planet, so why not try to do something big? Cheers, everyone.