The following is a rough transcript which has not been revised by High Signal or the guest. Please check with us before using any quotations from this transcript. Thank you. === [00:00:00] stefan: Classically, machine learning is focused on prediction, predicting in settings that look like this: What should you expect to see? What do you think is going to happen? Classical machine learning has many successes, but there is some sense in which it's not the most exciting problem, because prediction is just about describing the world as it is, in the status quo. And, I don't know, many people I talk with around here, they're not just interested in describing the world in the status quo, right? They want to build a company. They want to come up with a new medical treatment. They want to, in a literal sense, change the world in some kind of way. And if that's what you're interested in, then prediction is not the right task to understand that. Here's a concrete example. Let's imagine you're Airbnb in 2008, and you try running a prediction algorithm, a classical machine learning algorithm, to see who would be interested in staying with an Airbnb host. The answer's gonna be no one, right? Because the [00:01:00] company doesn't exist at the time. So prediction is just somehow not the right task for making progress in these settings. Causal ML, once you bring in causal inference, now you're in a world where you're answering more interesting questions. If I change my policy, how is this gonna affect outcomes? What should I do to maximize, I don't know, revenue, right? These are questions that I think are often much more interesting. They don't fall under the scope of just prediction. hugo: Machine learning is great at prediction, but prediction alone often isn't enough. If you're trying to tackle churn, optimize pricing, or boost customer engagement, simply predicting what's likely to happen just isn't gonna cut it. So how do you move beyond machine learning and prediction to actually understand causation and make better, more impactful decisions? In this episode, I'm speaking with Stefan Wager from Stanford, who has advised companies like Uber, Google, and Dropbox. Stefan shares practical insights from the cutting edge of [00:02:00] causal machine learning, helping us understand why classical ML often falls short, and how causal ML fills that critical gap. We'll dive into real-world examples, like why churn modeling can go so wrong without causal inference, and explore simple frameworks, tools, and best practices to help you and your team integrate causal thinking into your machine learning workflows immediately. If you enjoy these podcasts, please give us a review, give us five stars, and subscribe to our newsletter, which I'll link to in the show notes. Before we jump in, I'm just going to check in with Duncan Gilchrist from Delphina, who makes High Signal possible. So I'm here with Duncan from Delphina. What's up, Duncan? duncan: Hey, Hugo. How are you? hugo: I'm so well, and I'm so excited about this episode with Stefan. Before we jump into it though, I'd just love if you could tell us a bit about Delphina and why we're doing High Signal. duncan: Delphina is building AI tools for data scientists, and by the nature of our work, we get to meet with lots of experts in the field. And so with the podcast, we're trying to share the high signal. [00:03:00] hugo: Totally. And look,
in the conversation I had with Stefan and the clip we just showed, I once again had my mind blown, realizing how machine learning is of course incredibly important, but alone it delivers zero value, besides perhaps thinking about what may occur in the future, right? So I'm wondering how this resonates with you, and how you've thought about, and in your practice tied, machine learning to making robust decisions. duncan: What's so amazing is that classic ML is really all about basic pattern recognition. Like, what's in this image? What's the weather going to be tomorrow? What are sales going to be next week? And causal ML, which is what Stefan highlights, is really about predicting what will happen given an action. How many Uber trips will you take if we give you a coupon? What specific features in our product causally drive user retention? And if you take a step [00:04:00] back, I posit that all of the highest-value ML applications are really causal in nature, since the highest-value applications involve making decisions, not just trying to look into the future for the sake of looking into the future. hugo: Absolutely. And we cover so much of that and dive into the details in the episode. So let's jump in. Hi there, Stefan, and welcome to the show. stefan: Thanks. Thanks for having me. hugo: It's such a pleasure to have you here, and I'm really excited to talk about not only machine learning and how it powers a lot of decisions made today, but how it does this. One thing that I'm really excited about in your work is bridging causality, causal inference, and machine learning. So maybe to set the stage, I'd just love to know a bit about why machine learning itself may not be enough. Why do we need to think about introducing causal inference or other types of tools? stefan: Classically, machine learning is focused on prediction, predicting in settings that look like [00:05:00] this: What should you expect to see? What do you think is going to happen? Classical machine learning has many successes, but there is some sense in which it's not the most exciting problem, because prediction is just about describing the world as it is, in the status quo. And, I don't know, many people I talk with around here, they're not just interested in describing the world in the status quo, right? They want to build a company. They want to come up with a new medical treatment. They want to, in a literal sense, change the world in some kind of way. And if that's what you're interested in, then prediction is not the right task to understand that. Here's a concrete example. Let's imagine you're Airbnb in 2008, and you try running a prediction algorithm, a classical machine learning algorithm, to see who would be interested in staying with an Airbnb host. The answer's gonna be no one, right? Because the company doesn't exist at the time. So prediction is just somehow not the right task for making [00:06:00] progress in these settings. Causal ML, once you bring in causal inference, now you're in a world where you're answering more interesting questions. If I change my policy, how is this gonna affect outcomes? What should I do to maximize, I don't know, revenue, right? These are questions that I think are often much more interesting. They don't fall under the scope of just prediction. hugo: That's such a wonderful way to set the stage. There are many ways to think about what causal inference allows us to do,
but what I'm hearing in there is that it allows us to try to answer what-if questions, so counterfactuals, right? stefan: That's exactly right. What if I did this? What if I did that? What's the difference? What should I actually do given this knowledge? hugo: And we discussed this last time we spoke, and people who've listened to this podcast before may have heard me say something along these lines, but I do think, even prior to data science, machine learning, or robust causal inference, really great entrepreneurs and business people have always been not only fantastic experimentalists [00:07:00] but wonderful counterfactual thinkers. One example I give: if you're telling me Henry Ford wasn't a great counterfactual thinker, that's one of the things I'd push back hard against, because to actually succeed in business, you need to be not only constantly asking what-if questions but testing them, and having uncertainty, having error bars. So the question then is, once we move into the data-powered business age, how do we, quote unquote, robustify these things, right? So here's what I'm interested in: we see that these questions are important. What is the current state of causal inference? I presume there's a distribution across industry, but how well is it done, and how horribly do you see it done in practice? stefan: So first, yeah, just reacting to your earlier point: Henry Ford was probably very good at counterfactual reasoning. I think one of the worst mistakes you can make, if you try to move an organization or a system in a more data-driven direction, is to throw away common sense, throw away things you already [00:08:00] know. Because often business leaders, people left to their own devices, can be very sophisticated in reasoning about counterfactuals, dynamics, cause and consequence, and so forth. When it comes to data-driven methods, everything is harder, right? Because you're trying to do things quantitatively, precisely, more abstractly. But the biggest mistake you can make is, if you want to use data, and all you know how to do with data is run predictive algorithms, then you forget about everything that's important and just run the predictive algorithm. That's the worst thing you can do. I'd rather you not use data than use data to do the wrong thing. Okay. Causal ML, of course, is a set of tools that, at their best, enable you to do the right thing with data. How much adoption has there been in industry? I think maybe I have a biased lens on this. Generally the people who reach out to talk already know that they should be asking causal questions, and often are doing very cool things. There are a lot of companies I've interacted with [00:09:00] that are doing very sophisticated things in causal inference, sometimes even doing things that in some spaces are ahead of what we're doing in academia. So there's a lot of really cool stuff going on in industry. But I should maybe caveat the answer with: maybe I have a biased lens. hugo: I love that, in a very classic way, you've identified the selection bias in the conversation we're having. And I appreciate that, I do. I'd love to know about some of the wonderful things you're seeing in industry. To step back a bit, though: we're not only talking about causal inference, we're talking about causal inference meets machine learning.
So it sounds pretty wonderful: you get the power of predictive ML, but you get the robustness and reliability of causal inference as well. So for people who are using machine learning to predict churn and lifetime value, or, let's say you have a two-sided marketplace like Airbnb, where you're trying to do matching and figure out these [00:10:00] types of things, how can people who aren't doing causal inference start to introduce causal ML into their workflows? stefan: So I think the most important thing is to have clarity on what you actually want to do. What's your action space? What are the things you're able to do? In different settings, you could be changing prices, you could be giving out offers, you could be doing something else. What's your action space, what's the outcome you want to drive, and how are these things linked? I think just having clarity on that is the most important thing. Because often, if you're clear about all this, and then you try to bring just basic prediction ML tools to solve the task, you'll realize that it just doesn't fit, right? The inputs and outputs aren't what they should be. Now, if you want to go to causal ML, there are a number of steps [00:11:00] to get there. The first is just recognizing what the problem is. The last step is that there are these cool software tools for causal ML that you can use; there's been a huge amount of development across industry and academia in this space in the past decade or so. There's EconML, a software suite developed by Microsoft Research. Over here, we've developed the GRF package for causal inference using random-forest-type algorithms. So you have all these software resources. There's this thing in between, though: you know what question you want to answer, you know what software package you want to use, you need to get the right data. And here there's this fundamental challenge that usually the data you need for causal ML comes from experiments. And yeah, if you want to deploy causal ML, you're gonna need to run an experiment, you're gonna need to run an A/B test. If you're interested in targeting offers, for example, and using causal ML to figure out who should receive the targeted offers, you're probably going to need to run an experiment where you randomly assign some offers to some people, log that in a centralized way, and then you're in a space to use that data for learning. That of course is [00:12:00] expensive. It's not easy. I think that's just an important thing to be aware of when wanting to use these methods.
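To make that randomize-and-log step concrete, here is a minimal sketch in R. The user table and column names are hypothetical; the essential ingredients are a well-defined eligible population, a coin flip per user, and a durable log of every assignment, treated or not:

```r
# Minimal sketch: randomize an offer among eligible users and log every
# assignment in one central table.
set.seed(42)

eligible <- data.frame(user_id = sprintf("u%04d", 1:1000))  # hypothetical users

# Bernoulli randomization: each eligible user gets the offer with probability 0.5.
eligible$offer <- rbinom(nrow(eligible), size = 1, prob = 0.5)

assignment_log <- data.frame(
  user_id     = eligible$user_id,
  arm         = ifelse(eligible$offer == 1, "treatment", "control"),
  assigned_at = Sys.time()
)

# Later, join outcomes (renewal, spend, ...) back onto this log by user_id.
# The randomized `arm` column is what makes causal estimates possible.
head(assignment_log)
```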
hugo: It makes a lot of sense. I just want to recap a few of those things. Of course, you need to figure out what your question is. And something I'm hearing in there is that we're not only talking about predictive capabilities, we're talking about decision science, right? One way that we can slice, and historically have sliced, the space we both work in is into descriptive analytics, so dashboarding, BI, that type of stuff; predictive, so ML; and prescriptive, so how to make decisions. And this seems like a wonderful way to tie these together: always keeping in mind the decision, sorry, the question around what type of decision you'll make, then seeing how ML can impact that, and then collecting the necessary data, doing the experiments, having all your infrastructure in place, and making sure that, I suppose, the data is as close to the decision as possible. Is that a way to think about it? stefan: Or that the data contains the [00:13:00] information needed to make the right decision, yeah. And then, once you're in that framework, causal ML is an alternative to traditional ML that is focused directly on learning the parts of the problem that give you answers to the questions relevant to decision making. hugo: Yeah. And so what other considerations, and I want to jump into some grounded examples soon, but I do think about how to start to flex these muscles in an organization. I used to joke in the 2010s, because I work in Python, right, and I worked in R then as well, and I appreciate you working in R also, that we needed something like 'from econometrics import causal_inference'. And we woefully failed at that as a discipline of data science. Maybe not at the big companies, though. The best causal inference and causal ML people I've ever met have come from [00:14:00] economics and econometrics, and a lot of them were hired by Amazon. I think that's one of the big reasons Amazon was able to be so successful. And when I've seen a data science team hire someone who actually has a causal inference background, that can be a superpower for them. Is that something you've seen as well? stefan: Yeah, absolutely. And your observation that a lot of people doing really high-impact work around causal inference in industry have an economics background, I think that's true. The training you often get in economics is about how to think about where your data is from, what kinds of sampling biases or other biases could be there in the data collection, and how you avoid that to then reason about optimal decision making. And I do think that's the training that's the most important. Then you want to scale that up, you want to work with complex data, you want to deploy this in live systems. There is a machine learning aspect to that; there's an engineering aspect to that. People coming from strong economics backgrounds are often very smart; they're able, over time, to [00:15:00] pick up this engineering and machine learning aspect. I think what can be much harder is coming in with a more tools-focused mindset, like: I know how to do machine learning. Then you have to unlearn everything. You have to go back to basics. What's the question? What are we trying to do? Spend some time at this very basic level, and then relearn machine learning in the context of the questions you're actually trying to answer. hugo: I love that so much. Some of the people I know who are, in my humble estimation, the best at machine learning don't have machine learning backgrounds. In fact, most of them discovered machine learning through Andrew Ng's Coursera course about a decade ago, to be absolutely honest. But they've approached it through specific domain problems, whether it's people trying to do computer vision or astronomers trying to do prediction with a variety of things. So I do want to jump into some grounded examples.
I mentioned churn before, and the reason I like this example is that it's an example we hear a lot when [00:16:00] introducing people to data science and machine learning, but it highlights the problems of mere prediction immediately, right? So we're told that we want to use machine learning to predict customer churn. You've got tabular data, you may use XGBoost or a random forest or something like that; tree-based methods will outperform nearly everything on this type of challenge, and you can get a pretty good prediction. But what the heck do you do with that prediction? If I predict someone's going to churn with 70 percent probability, what on earth am I going to do about that, right? My intervention will be totally different if they're going to churn because my customer service sucks, versus if they're going to churn because a competitor of mine has undercut my offer, essentially. So this is something I've seen play out in practice in a variety of ways, but I'm wondering, from your experience, and I know you maybe can't speak to very specific examples, but you've worked with Google and Dropbox and Uber, whether there are any grounding examples that come to mind that could [00:17:00] help listeners think about it. stefan: Yeah, so I think this churn prevention example is great because it's so simple. It's so clear how one can go wrong with just basic predictive modeling, and yet people get it wrong all the time. I actually remember seeing this example, using machine learning to predict who's gonna churn, who's gonna leave your service, in grad school, and thinking nothing of it. It's, oh yes, it's important to predict churn, so I'm gonna develop a better algorithm that's 2 percent more accurate, right? Without really thinking about why we actually care. You asked about examples. One example I like, this is one I learned from Eva Ascarza, she's at HBS now: the example is a loyalty program or subscription for a museum, and you want to give people offers, loyalty gifts, to encourage them to stay involved with the museum. And what she showed in this example is that if you predict [00:18:00] churn and then give these loyalty gifts to the people predicted to be most likely to churn, you're going to do the wrong thing. In terms of your business metrics, you're going to do worse than just randomly giving these gifts to different people. The question is, who responds to loyalty gifts? It's the people who are actually loyal. They've been coming to your museum for 10 years, and they're probably not going to churn. But maybe they weren't there for the past year and they're thinking of not renewing, and you send them this nice letter, and they remember that they value being a member of this museum or community, and they renew. So these are people with a low predicted probability of churn, where you want to be giving them that intervention because it actually moves the needle with them. On the other hand, you have the category of people who are predicted to be likely to churn. These are maybe people who moved away, who only joined the museum because they wanted to visit a bunch of times in one year for some reason. They have a high probability of churning, but there's nothing you can do about [00:19:00] it.
Sending them a loyalty gift kind of doesn't make sense; they're not loyal. So this was an application where she showed that you want to give these loyalty offers to the people where there's going to be the most lift, the people who are actually going to respond to the offers, and that this was the opposite of targeting based on predicted churn. hugo: I love this example for several reasons. Firstly, because it's a really nice example of how, even if you follow your intuition, you can end up doing things that are not in your own interest, or aligned with your business metrics in this case. I also love it because it's not quite obvious what the solution is. So I'm wondering, once we've done this type of ML and we realize the failure mode, how can causal ML and causal inference actually help us here? stefan: So basic predictive ML is a tool for, in this setting, answering the wrong question. Causal ML, given data from an experiment, can give you an answer to the right question. Essentially, what causal ML asks [00:20:00] you to do is first run an experiment: you randomly send some people these offers, and you randomly don't send these offers to others. And causal ML tries to figure out groups of people where some have a very strong positive response to the offer and others maybe don't. Now you've identified these different groups with different amounts of uplift, and then you can use that for decision making. hugo: That makes a lot of sense. I want to now jump into what the moving parts are here, because we've been talking about machine learning models and those types of things, but maybe we can start by looking at how machine learning actually works, to a certain extent. An old friend of mine, Robert Chang, who was early data science at Twitter and has been at Airbnb for the best part of a decade now, a wonderful thinker, has an amazing blog post from 2018 called Getting Better at Machine Learning, about moving beyond model.fit(X, y). And part of the point there is to recognize how wonderful Kaggle [00:21:00] is, and how important the common task framework has been for a lot of us, but that when you do such competitions, you're given the data, and you're really optimizing the model and hyperparameters and feature engineering and that type of stuff, and there are other parts of the workflow which are key to talk about. So in this 2018 post, which I'll link to in the show notes, he talks about problem definition, why thinking hard about your problem is crucial; data collection, why setting up the X and Y right is half the job done; then model building, productionization, and feedback loops. So I'm wondering what the practice of causal ML looks like, given this framing of ML, which isn't really about the model per se; it's about a lot of other things. stefan: I think everything you highlighted is key. You need to do all of that, just as before. There's just one extra piece in causal ML, and that's actions and counterfactuals, for reasoning about the effects of these [00:22:00] actions. That's one piece that just isn't there in the standard X's and Y's, predict-Y-from-X framework. You need to understand how the Y's could be different if you took different actions, and that's at the core of causal ML. This leads to a whole bunch of difficulties.
If you have non-experimental data, then you don't really know how your outcomes and actions were linked in the data. Maybe the actions were following the outcomes. The classical example of this, of course, is predicting demand from prices. You often see that demand and prices are positively correlated. Does this mean that people like high prices? No, it actually means that, more often, high prices follow spikes in demand. Once you bring in actions, you need to bring in counterfactuals to rigorously reason about the effects of actions. But you also need to collect data on actions, and you need the actions and outcomes to relate to each other in the right way, so you can learn about these counterfactuals. And the easiest way to get this [00:23:00] right relationship between actions and outcomes, one that enables learning, is to run an experiment. Because in an experiment, the actions are randomized; there's no way that actions could follow outcomes. Actions are exogenous. They come from nowhere. They were randomized. And so any link between actions and outcomes is actually causal. It's the effect of the action on the outcome.
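A toy simulation of the price/demand example Stefan gives, with all numbers made up: a demand shock drives both prices and sales, so a naive regression on observational data gets the sign of the price effect wrong, while randomized prices recover it.

```r
# Toy simulation: demand shocks raise both price and quantity sold, so the
# observational price-demand relationship comes out positive even though the
# true causal effect of price on demand is negative.
set.seed(1)
n <- 10000
true_price_effect <- -1.5

demand_shock <- rnorm(n)

# Observational world: the firm raises prices when demand spikes.
price_obs  <- 2 + demand_shock + rnorm(n, sd = 0.3)
demand_obs <- 10 + 3 * demand_shock + true_price_effect * price_obs + rnorm(n)
coef(lm(demand_obs ~ price_obs))["price_obs"]   # wrong sign: roughly +1.25

# Experimental world: prices are randomized, so they can't follow demand.
price_exp  <- 2 + rnorm(n, sd = 0.3)
demand_exp <- 10 + 3 * demand_shock + true_price_effect * price_exp + rnorm(n)
coef(lm(demand_exp ~ price_exp))["price_exp"]   # recovers roughly -1.5
```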
hugo: So one reductive but useful, all models are wrong but some are useful, way of framing causal ML is that it's machine learning meets randomized controlled A/B testing stuff, plus plus, right? And the reason I want to frame it like that, at least for the next part of the conversation, is that our listeners understand ML, and a lot of them understand online experimentation. And yet with online experimentation and causal inference you can get so many things wrong, right? There's, of course, the all-other-variables-held-constant assumption, which is incredibly tough to verify and validate, right? We YOLO that a lot of the time. And then there's: do you have the instrumentation to even get your A/A testing correct? I'm just wondering [00:24:00] what you've seen in terms of the skills and ways of thinking and operating that are really needed to get this melding of experimentation and machine learning right. stefan: So that's a fascinating question, because in terms of my academic work, almost all the applications I've worked on in academia, and almost all the applications I see happening in academia, involve observational data. You don't actually run an experiment. You imagine the ideal experiment you would have wanted to run, and then you collect data that gets as close as possible to approximating this ideal experiment. This kind of thing is widely used in economics, and it's widely used in personalized healthcare. Experiments are very expensive; in academia, we often can't afford to run them, so we work with observational data. Now, you might think that in industry they'd go even further in this direction, because if experiments are the most rigorous thing, and in [00:25:00] academia observational data often enables us to make progress when we can't afford the experiment, then in industry, where people just want to get the job done, they'd be even more likely to use observational data. But actually, what I see is the opposite. In industry, almost all the causal analyses I see are based on experiments. This was unintuitive to me at first, but I think here's the catch. Working with observational data is cheap in terms of data collection. You don't need to spend, I don't know, millions of dollars running your experiment at scale in, say, a healthcare or development economics application. But then, if you have observational data, there are so many things that could go wrong. Okay, in academia, you can put a PhD student on it and have them think for two years about everything that could go wrong, give seminars, refine this. People want to work on these really high-level questions that are going to be of broad interest, so they're willing to spend two years. Okay, we can't afford to run the experiment, but we can afford to [00:26:00] spend the time to deal with all the complexities that come with working with observational data. In industry, no one can afford to spend two years answering a question; you have to answer things more quickly. And what I've seen, invariably it seems, is that when a company really understands causal inference, they understand that they want to use causal inference to reason about their decisions. They want to make the correct decisions, and they want to make them fast. They see the business value. Then, if you want to scale this, you have to run experiments. Experiments cost something, there's upfront cost, but once you have your systems, you can run the experiments and just answer a lot of causal questions quickly, rigorously, without any of these caveats of, do we actually have biased sampling issues that break everything. And that's where I've seen industry converge. hugo: I just want to double-click on, sorry, I've been working in tech too long. So let's double-click on that. I just want to double-click on something [00:27:00] you hinted at, or spoke to relatively explicitly, which is industry as a driver of robust methodology that may then get formalized in more academic circles. One of the historical examples I love is Gosset, otherwise known as Student, who developed the Student's t-test as an employee of the Guinness brewery, I think to compare crop yields with respect to the amount of consumable alcohol they produce, tying it to the business metric of the amount of Guinness sold, right? We've talked about how models aren't the only important thing, how there's a whole ecosystem of tools and methodologies and processes and ways of thinking around them. But I am interested: when people come to me and say, if I want to do data science and machine learning, what models should I use, Hugo? I dispel the myth that you should be that model-focused. I honestly tell them that setting up some sort of evaluation framework to compare models against each other, one that aligns with their business goals or [00:28:00] their general goals, whatever they may be, is useful. But there is also a heuristic, which I think I originally got from Jeremy Howard, and which I tell everyone: if you know your linear and logistic regression, and then you learn how to build some random forests and XGBoosts, so tree-based methods, and some deep learning, that'll get you 95 percent of the way, right? You'll be able to do 95 percent of the things you need to do in the modeling part of your workflows. Alright, so with regressions, tree-based methods, and deep learning, you're in relatively good shape. I'm just wondering, for people learning about causal ML now, what advice you would give them with respect to heuristics for choosing modeling techniques?
stefan: I think that's completely right, with the caveat that, as I said earlier, the first thing is just to understand experiments, to understand the classical theory of experiments. Once you have that and you want to go into the causal ML world, then yeah: tree-based methods, deep learning, that's what's up. And you can see these developments mirrored in the [00:29:00] available software. If you want to do tree-based methods, I guess here I'm partial to the GRF software suite we've developed here; it's R-based. What I like about tree-based methods is that they're very plug and play. If you know your data, you know what your X's are, you know what your actions are, you know what your outcomes are, and you have data from an experiment, you can just throw that into a causal forest and you get treatment effect estimates. hugo: And GRF is a generalized random forest? stefan: Ah, yes. The software package is called GRF, for Generalized Random Forests. We have a bunch of types of forests; essentially, the original vision for the software suite is that we can solve any statistical problem you want using adapted forests. Causal forests are the type of GRF that lets you learn treatment effects from data collected in an experiment. hugo: Awesome. Full transparency, I actually did some deep research with [00:30:00] ChatGPT about causal forests and a bunch of things you work on, so I'm wondering if I could read it to you, and we could even spot-check it to see if it's right. stefan: I'd be curious to hear, yeah. hugo: Generalized Random Forests framework: building on the causal forests idea, Wager was part of a team that developed generalized random forests, a unifying framework that extends random forest methods to a wide range of estimation tasks. Links to web.stanford.edu. In a 2019 Annals of Statistics paper, Athey, Tibshirani, and Wager show that the random forest approach can be generalized to perform not only causal estimation tasks, but also tasks like quantile regression and instrumental variable analysis, within a common algorithmic structure. Does that paper exist? stefan: Yes, that's right, that's the original paper. hugo: Okay, great. This research underpins the open-source R package GRF, et cetera, et cetera. The GRF library implements these methods and includes features like doubly robust estimation for causal quantities, making cutting-edge causal ML techniques accessible to data [00:31:00] scientists. How does that sound? stefan: That's a pretty good summary. Just in terms of co-authors, I want to emphasize: you mentioned Tibshirani, and of course there's a very famous statistician named Tibshirani, but this is Julie Tibshirani, not Rob Tibshirani. Julie is actually a professional engineer, but she was excited to join this project as an open-source collaboration, and having her as part of the team is, I think, essentially why we have a package that people were eventually able to adopt more broadly, instead of just research code. hugo: One thing I'd like your thoughts on is that in what I just read there's a whole bunch of jargon, or buzzwords, which are incredibly important, they describe real things, and I know a bunch of data scientists who, of course, work with instrumental variables and know what they are. But I know a bunch who would see that term and be like, oh, I'm not in my comfort zone here.
So I'm just wondering, for people who want to break into causal ML, how you'd encourage them to think about [00:32:00] it. stefan: Yeah, so there's definitely a bunch of buzzwords in there that you don't need to know to use the method. Some of them, like double robustness and double machine learning, are buzzwords about how we solve the problem. For those people who are interested in the implementation of econometric or statistical techniques, those are going to be interesting, but otherwise you can ignore them. Others, like instrumental variables methods, are buzzwords that are added into the description because of the generality of the package. I think those are more often academic, research-facing workflows, but we wanted this generality to solve all kinds of problems. If you just want to do treatment effect estimation, though, you have data from an experiment and you just want to figure out who you should give this churn prevention offer to and who's not going to respond, all you need to know is: what your predictors are, we call those X; what your action is, we call that W, give the treatment or don't; and what your outcome is, say, do you renew, [00:33:00] that's the outcome you want to move. Then you can just use the function causal_forest, and that will give you heterogeneous treatment effects. Basically, the estimate you get out is, given your type, given your observed variables X, our best prediction of the effect of giving you this churn prevention offer on you actually renewing the service. So that's the subset of the package you actually need to know in order to use it.
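A minimal sketch of that workflow, using entry points from the grf package (causal_forest, predict, average_treatment_effect). The simulated experiment and the top-20-percent targeting rule at the end are hypothetical, for illustration only:

```r
# Sketch: heterogeneous treatment effects from a randomized experiment with grf.
# X = covariates, W = randomized offer (0/1), Y = outcome (did the member renew).
library(grf)

set.seed(7)
n <- 5000
X <- matrix(rnorm(n * 5), n, 5)            # e.g. tenure, visits, recency, ...
W <- rbinom(n, 1, 0.5)                     # assigned by coin flip in the experiment
tau <- pmax(0, 0.4 - 0.3 * X[, 1])         # hypothetical heterogeneous lift
Y <- rbinom(n, 1, plogis(-0.5 + 0.5 * X[, 2] + tau * W))

# W.hat is the treatment probability; here it is known by design (0.5).
cf <- causal_forest(X, W, Y, W.hat = rep(0.5, n))

tau_hat <- predict(cf)$predictions         # per-person estimated effect of the offer
average_treatment_effect(cf)               # overall effect, with a standard error

# The decision rule Stefan describes: target the offer at the people with the
# largest estimated lift, not the people most likely to churn.
target <- tau_hat > quantile(tau_hat, 0.8)
```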
hugo: So once we get our results from causal inference and ML, how do you think about communicating them to decision makers? A bit of context: I've done a lot of work using Bayesian inference in my own scientific research, I used to work in biology and physics and biophysics, and also in my consulting work, helping people make robust decisions using statistical inference. There are many things I love about Bayesian inference, but one is that you're able to [00:34:00] get out entire distributions, right? And then threshold and get point estimates from them, that type of stuff. I'm just wondering, when we're trying to express uncertainty, particularly around the likelihood that different decisions and actions will have different impacts and effects, how you think about this type of communication. stefan: Yeah, I think that's a great question. I think one of the big challenges in successfully deploying causal ML is: how do you summarize what you've found? How do you communicate it? Because often you might have this idea that, oh, you've done causal ML, and now the system is just gonna explain to you what's going on. And that's not what happens, right? With causal ML, typically the output will be estimates of who's gonna respond to an intervention, with, of course, some noise. Then how do you reason about the uncertainty? What do you do with that? My first recommendation is always to go back to: what's your action space, and what are you trying to accomplish? Imagine just rolling out [00:35:00] a system based on the outputs of this causal ML model, imagine actually choosing who to give these churn prevention offers to based on the output from this model, and then evaluate that. See, is it moving your key metrics relative to the status quo? That's how I'd really start to think about evaluation: not focusing on the actual numbers coming out of the causal ML model, but just using it to do the thing you want to do and seeing, does it look good. That's, of course, if you know exactly what you want to do. That's great; in some settings, you don't. But then one tool I've seen work very nicely, it comes from the marketing literature originally, is called the Qini curve, Q-I-N-I. The idea with the Qini curve is that you make a plot. On the x-axis you put: here's how much money I'm spending on giving targeted offers. On the y-axis you get: here's how much benefit I'm [00:36:00] getting from all these targeted offers I'm sending. And then you try to draw out this curve where you rank people based on how much benefit you think you're gonna get from sending them this offer, and then you can evaluate many different policies. I only give the offers to the best 1 percent: here it costs you a little bit, and you get this pretty big benefit, because the first 1 percent were very responsive. Then you keep going, and eventually you maybe get this concave-looking curve that traces out, for every policy that treats the 10 or 20 or 50 or 100, or thousand, or million most promising people based on your causal ML output, what's the uplift versus the cost of giving them the treatment. If you were just randomly assigning treatment, you'd sit on a straight line. If you're beating random, you get this nice shape, and then you can look at that plot to reason about cost-benefit analyses and the value of the targeting rule.
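A hand-rolled version of the curve Stefan describes, continuing with tau_hat, W, and Y from the sketch above; the helper name qini_curve and the evaluation grid are hypothetical (recent versions of grf also provide a related rank_average_treatment_effect function):

```r
# Sketch: rank people by estimated uplift, then for each "treat the top q"
# policy, estimate the incremental benefit from the experimental arms.
# In practice, fit tau_hat on one fold and draw this curve on a held-out fold,
# so the ranking is not evaluated on the data that produced it.
qini_curve <- function(tau_hat, W, Y, grid = seq(0.05, 1, by = 0.05)) {
  ord <- order(tau_hat, decreasing = TRUE)
  sapply(grid, function(q) {
    top <- ord[seq_len(ceiling(q * length(ord)))]
    # Treatment-vs-control outcome gap among the targeted group, scaled by size:
    lift <- mean(Y[top][W[top] == 1]) - mean(Y[top][W[top] == 0])
    lift * length(top)
  })
}

grid <- seq(0.05, 1, by = 0.05)
gain <- qini_curve(tau_hat, W, Y, grid)

plot(grid, gain, type = "b", xlab = "Fraction targeted",
     ylab = "Estimated incremental renewals")
abline(0, gain[length(gain)], lty = 2)  # the straight line random targeting traces
```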
hugo: What a wonderful and clarifying example. Thank you, Stefan. I'm also interested, and a lot of people are interested, of course, and understandably so, in explainable ML. And I think there's a certain ambiguity around whether causal ML is more explainable than traditional ML. How do these fields and methodologies relate to each other? stefan: Yeah, there's this question: is causal ML more explainable than just basic [00:37:00] ML? In some sense, I think you can give two answers. The first answer is just a plain no. Causal ML isn't in any way simpler than basic ML; it's basic ML where you've rigorously incorporated the counterfactual reasoning. This makes it strictly more complicated, strictly more powerful, but maybe harder to explain. So that's a first, more pessimistic answer. I do think there's a second, more optimistic answer, where causal ML helps bring interpretability by transparently answering the right questions. If you just take this kitchen-sink approach, where you try to [00:38:00] predict everything under the sun and read off some decision rules from that, who knows what's going on? That's just very hard to say anything about. It's very hard to audit. With causal ML, you ran an experiment; based on this experiment, you try to directly learn the thing you care about, who responds to your intervention and who doesn't; and then you use these outputs to make a decision. So the machine learning part itself is as black-boxy as ever, it uses trees, deep learning, something else like that, but at least it's very clear what you're doing and why you expect it to work. hugo: Oh, Stefan, this is so important, particularly when you say people try to predict everything under the sun. I think I'm interested in specificity, and particularly when doing this type of work, specificity is so integral. For a bit more context, I'm doing a lot of work consulting and teaching people how to build LLM-powered applications at the moment, and [00:39:00] a huge anti-pattern I've seen, in the end it's an anti-pattern if you don't fix it, let me say, is organizations or people having a corpus of documents and saying, oh, wouldn't it be great if our users or customers or patients or students could interact with this corpus of documents, ask it questions and get answers, right? Without specifically tying it to an actual business use case and a business need, or even to what types of questions people may ask, or: are you interested in three personas, different types of people, and are these the scenarios in which they would actually ask questions? So I actually think in any type of work or product design, really thinking through the specific use cases is incredibly key. One thing we see very often, which engineers don't think about so much when building these types of information retrieval systems, is how users will actually come in. It's my friend Eric Ma at Moderna who first alerted me to this, and then I saw it everywhere. People will come in and say, hey, what is this corpus of documents about? That type of stuff. And of course, embeddings are [00:40:00] not the way to deal with that type of prompt. stefan: Yeah, that's right. And I think a general theme there is that whenever you want to carry out a data-driven exercise, use data-driven methods to do anything, you need to evaluate what you're doing somehow, and ideally your evaluations are connected to the actual tasks you care about. What you're saying is, just ask people to be specific. What do they want? Evaluate your LLM, evaluate your AI, based on what you actually want it to do. I think that's very well in line with causal inference best practices. hugo: And what's happening in the space of discovering causal relationships? I suppose we'd think of it as: how do you build causal graphs, like X is a causal factor with respect to Y, and so on? stefan: There are research areas around causal discovery. So you have this complicated system with many moving parts; in principle, everything could be [00:41:00] related to everything, and you're trying to specify this causal graph, specify the set of causal relationships that could possibly exist. This then gives you a clear handle on what's actually going on. I've seen this kind of method be very helpful in biology, in genetics. You have all these different biological processes going on, and in principle everything seems interconnected, but it's actually that A drives B, and B drives C and D, and C and D together do something else. And once you understand that graphical structure, then you can do better biology. In my experience, I've seen this causal discovery approach used less in business settings. The obvious hypothesis is that when you're doing biology, when you're doing hard science, there are certain scientific laws that are very strong, and once you find them, they let you explain essentially perfectly what's going on. In settings like that, you can actually discover the causal graph, and it's very powerful.
In business settings, there are often a million things going on, and you're not [00:42:00] trying to understand everything that's going on. You have your action space, and you're trying to do some things to move the needle: get a little more revenue, a little more engagement, or target some other outcome you care about. And often, in these more complicated business settings, I haven't seen causal inference tools used so much to discover cause and effect, to discover patterns you didn't know existed before, but more to get a really calibrated understanding of relationships you already knew existed. You know that giving people discounts is going to make them engage more, buy more things. You know that giving people churn prevention offers is going to make them more likely to renew. But you want a really precise understanding of how much, and when, and for whom. hugo: I don't want to put words in your mouth, but I think you and I would probably agree that it's definitely helpful for people getting up and running with new techniques, tools, and methodologies to hear about failure modes and gotchas from people who are far more experienced. So I'm just wondering, from [00:43:00] your perspective and experience, what types of failure modes can occur when building out causal ML capabilities? stefan: Somehow, I keep going back to experiments. I think the hardest part of getting the causal ML pipeline right is knowing to ask the right question and running an experiment correctly. I've seen very funny things with experiments. I remember, as a grad student, I once was consulting; I didn't know anything, but they asked me to help them interpret what was going on with their experiments. I was an overeager grad student, I tried to help them, I'm ready to deploy cutting-edge causal inference tools, and they're getting these amazing results, huge treatment effects, and I'm like, wow, what's going on? Turns out they had run the experiment incorrectly. They had first randomized people into treatment and control, but there were exclusion criteria; they were only allowed to experiment with a small subset of users. And the mistake they made was that first they randomized, [00:44:00] and then they applied the exclusion criteria to the treatment group, but not to the control group. So the control and treatment groups looked completely different. I think this was some kind of onboarding workflow, or some kind of early engagement workflow, so in the treatment arm they only had new users, while in the control arm they had users of all ages. And of course, this is not an experiment; you can't learn anything from that. So yeah, I was going in ready to use sophisticated statistical tools, and I went back to teach experiments 101. I think this is really the hardest part: getting experiments right. hugo: Oh, that's great. And it just reminded me, I had a boss once who always encouraged us, whenever you come with an idea, an experiment to do, a hypothesis: come with argumentation, or come with some preliminary data, or come with something experimental, something along those lines. So it wasn't a hard and fast rule, but a heuristic as to what could be persuasive enough to put resources into something. stefan: That's great. And a black-box [00:45:00] ML predictive model was not on the list. hugo: Exactly.
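A toy illustration of the broken design Stefan describes; the data and the 30-day "new user" rule are made up. Filtering one arm after randomization destroys comparability, while filtering before comparing arms preserves it:

```r
# Sketch of the failure mode: eligibility rules applied after randomization,
# and only to the treatment arm.
set.seed(3)
users <- data.frame(tenure_days = rexp(10000, rate = 1 / 400))
users$arm <- ifelse(rbinom(nrow(users), 1, 0.5) == 1, "treatment", "control")

# The bug: the "new users only" rule is enforced for treatment only.
broken <- rbind(
  subset(users, arm == "treatment" & tenure_days < 30),
  subset(users, arm == "control")
)
tapply(broken$tenure_days, broken$arm, mean)    # arms wildly different: not an experiment

# The fix: apply exclusion criteria first, then compare arms within the
# eligible population (assignment was independent of tenure, so this is
# equivalent to randomizing among eligible users).
eligible <- subset(users, tenure_days < 30)
tapply(eligible$tenure_days, eligible$arm, mean)  # arms comparable again
```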
So, something I'm interested in is the intersection, and non-intersection, between industry and academia. I'd love to hear, from a causal ML perspective: are very different things happening in industry and academia, or are they coupled, tightly integrated, from your perspective? Because in a lot of data science, I see they're not so much. I also think there might be some selection bias in terms of the people we've had on this podcast, for our avid listeners: colleagues of yours such as Ramesh Johari and Gabriel Weintraub, who have done a lot of work in industry, and of course Ramesh has done a lot of work on online marketplaces and experimentation, and Chiara Farronato at Harvard Business School. So yeah, I am interested in how aligned and intersecting academia and industry are. And it may be my selection bias. stefan: That's a great question. I think this might again be a setting where I have a biased exposure; it might be [00:46:00] that Stanford and Silicon Valley are just very special places. You mentioned academia and industry being siloed, and I have no doubt that this is in some cases true, but at least here, I don't feel siloed at all. There's really cool work going on in industry, there's really cool work going on in academia, and there's a lot of back and forth, a lot of connections. A lot of our PhD students go to industry, do really cool stuff there, come back, tell us about it; that's a great connection. One thing I find very exciting about working in this space is that there's a lot of methods adoption: people in industry are following methods developed in academia, trying tools out, giving us feedback. That's a connection I really value. And there's a lot of flow the other way too, in terms of questions and new ideas. For me personally, honestly, many of the papers from the past years I'm most proud of can be directly traced back [00:47:00] to a conversation I had with someone in industry, someone who asked a question that was just interesting, completely different from anything I had thought about before, challenging, something that left room to mull over for a long time, but then resulted in fundamental academic advances too. So there's a lot of back and forth. At Stanford specifically, you mentioned Ramesh Johari; actually, Ramesh, Guido Imbens, and I run this Causal Science Center, where we try to bring industry and academia together in a more formal way. We have a yearly industry-oriented conference on experimentation, with speakers from tech, from other industries, and from academia sharing insights. We have industry roundtables, and I always really enjoy these conversations; they're very dynamic, and the people from industry, the faculty from academia, and the students all have a lot to say, a lot to share. hugo: So this might be a bit wacky, but when Ramesh came on the podcast, [00:48:00] one of the big takeaways we talked about was how modern organizations very much need to be doing continuous learning and continuous experimentation, lifetime learning. And he mentioned what we could learn from toddlers with respect to learning. And I've noticed, with kids growing up, there's this point between six months and a year where they realize causality is a thing, and they're like: if I do this, that will likely happen. If I do this, that will likely happen. Of course, then children become obsessed with causality, and it's the question 'why' all the way down.
But I'm wondering, and this is a pretty wacky question, what we could learn from children about causal inference, and perhaps even causal prediction, in the end. stefan: My daughter just recently discovered that by announcing she has a poopy diaper, she can stop whatever activity is going on. So if she's in the car and she wants us to pull over, she yells that she has a poopy diaper. She learned that cause and effect pretty [00:49:00] fast. What can we learn from this in terms of data-driven causal ML? I don't know. Maybe that the world is strange. My experience seeing kids learn cause and effect, it just forces you to revisit: do you really know anything about anything? hugo: An eternal question, my friend. I'm interested, for people who want to learn more about causal inference and these types of things, where would you send them? I've got a few resources up here that I presume you'd also recommend, and I'll put these in the show notes: Angrist and Pischke's Mastering Metrics: The Path from Cause to Effect, Scott Cunningham's Causal Inference: The Mixtape, and Judea Pearl and Dana Mackenzie's The Book of Why. Not to mention, of course, your wonderful Causal Inference: A Statistical Learning Approach, which is still a work in progress, a draft version where comments are welcome, and which is for the more mathematically minded people as well. But these aside, and I'll link to all of these in the show notes, are there any other resources [00:50:00] you'd recommend people check out? stefan: So I think if you want something that's in the causal ML space, but about how to just use these methods, not a math-stat approach, Susan Athey and I have a series of YouTube lectures; I can send you a link. We recorded these for a class we teach on applied causal ML. It was during the pandemic, we couldn't do this in person, and we were recording anyway, so we're like, why not share them on YouTube too? So that's one resource you can take a look at. hugo: I'll definitely link to those in the show notes as well and check them out myself. We're going to have to wrap up soon, Stefan. Because we're talking about machine learning, and we both work in machine learning, I'm going to ask you to perform an act of prediction, half joking. I'm interested in the future of causal ML. Looking ahead, what are the biggest opportunities for causal ML in the business world, generally speaking? And are there specific industries or areas where you think these techniques will become critical? stefan: One [00:51:00] area where I think causal ML has a huge amount of promise is in personalized healthcare. We have all this data, and maybe in the future it's more and more plausible that we'll generally have genetic data available. We should be able to understand: there are three available treatments, which is the right one for you? There's this treatment that has bad side effects in 1 percent of cases; are you in the 1 percent who might face those side effects? So far, I think this is a space where there's been less progress than one might have hoped, for a variety of reasons. I think data issues are really key here; often the data tends to be siloed and unstructured, and this is just getting the data into a form, unified, harmonized, where you can actually do causal ML on top of it.
It's a huge lift, but there are people doing really exciting work on this, bringing data to a place where we can do causal ML on it, trying to explore settings where we can learn about [00:52:00] what kinds of medical treatments benefit whom. And I wouldn't be surprised if, 10 years from now, there's been a lot of progress in this space. I would say the most immediate, most actionable thing is just to use common sense. You want to bring data-driven tools to answer a question? That's great, but don't abandon common sense. Just remember what actions you're able to take and what outcome you're trying to drive, and think about how these things link together, and how machine-learning-based tools that can learn representations and connect different things to each other can be useful in sharpening one's understanding of this relationship. I think there's this allure of machine learning, that you just get these black-box functions where you throw data in and get answers without having to think. But that's just not true, right? If you want to do causal inference, if you want to do causal ML, you have to think, and there's no way around that. hugo: Look, I totally agree. And the truth is, to do a lot of the things we want to do, [00:53:00] you do need to think about these things. There's the whole AI craze now, it's, oh, reasoning models, and are we going to outsource thinking and all of cognition, those types of things. And it just reminds me: nearly 20 years ago now, Chris Anderson wrote a piece for Wired called The End of Theory: The Data Deluge Makes the Scientific Method Obsolete, which I'll begrudgingly link to in the show notes. I think it was intended to be provocative. For me, it schools us, in a way, to not think, essentially. The premise is pretty much contained in the title: that with the advent of quote-unquote big data, machines can do the thinking for us. And, as it turns out, you still need to think. stefan: That's right. And if you have access to data, that's great, but what do you want to do with it, and why do you think the data can enable you to do what you want to do? hugo: Stefan, it's time to wrap up, but I just want to thank you, not only for such a fun conversation, but for all the amazing, interesting, and [00:54:00] impactful work you do in machine learning, inference, causal ML, and beyond. And also for coming back and bringing the knowledge and wisdom to share with other people, myself, and all our listeners. So thanks once again; it was such a fun conversation. stefan: Thanks a lot for having me, it's been a pleasure.