The following is a rough transcript which has not been revised by Vanishing Gradients or the guest. Please check with us before using any quotations from this transcript. Thank you. === [00:00:00] hugo: Hi, Hugo Bowne-Anderson here, with Akshay Agrawal. We're gonna jump into all things about the next generation of how we're gonna build data, machine learning, and AI powered systems into the future. Carol says she's loving marimo as well. And I'm sorry, how do I pronounce it, marimo? akshay: That's what we say. Marimo, yeah. hugo: Cool, that's what I like to hear. And it's got some real algae vibes, right? akshay: It does, yeah. Marimo is a type of moss ball that is beloved in a lot of places, Japan and some places in Europe in particular. Cute little spherical things. hugo: Super cool. So firstly, welcome everyone. Welcome, Akshay. It'll be great to know where you are watching from and what your interest in such things is. I'm currently in Europe. Akshay, you're on the west coast of the US. akshay: Yep, that's right. In California. hugo: Awesome. Still incredible that we can jump on a call and stream to the cosmos. Everyone, this is a live stream for the podcast Vanishing Gradients. It's a data, ML, and [00:01:00] AI, what we call AI these days, podcast. Please do Google it, check it out. It's on all the apps besides Tinder; we don't push it there. Give it a like and subscribe, if you're into that type of thing, and give it a review if you enjoy it. If you don't enjoy it, don't give it a review, just stop listening. But it is the internet, of course. Carol says she's dialing in from sunny San Diego. Beautiful. I'm actually really excited to be here today with you, Akshay, for many reasons. You are building marimo. You have a PhD in electrical engineering from Stanford University, and you did a lot at Stanford. When you worked at Google, you worked on a lot of open source projects, including TensorFlow.
And so one of the reasons I'm excited is you have such a diverse set of experiences to bring, working on a tool such as marimo, in such a vibrant ecosystem, and I really wanna dive into all of that. I do wanna set the stage by saying [00:02:00] I kind of feel old, man. And what I mean by that is I came up not even using Jupyter notebooks. I came up using IPython before Project Jupyter. I was using IPython notebooks, and I was using pandas and matplotlib and SQL. And these are still fundamental tools in a lot of respects. I hear the kids are doing a lot of different things these days, right? Using Polars and DuckDB and Altair and marimo. So I do wanna find out about the next generation and what people might call the future music of machine learning and AI tooling. So maybe, with that having been said, you could tell me about where we are and where you think we should be with the tooling landscape. akshay: Yeah, that's a great question. So I did grow up on matplotlib, so you don't have to feel old alone; I also feel old a little bit. But, I guess, there's machine learning tooling, and there's tooling for working with data in particular. The specific tools that you mentioned, it's actually interesting: I think we're seeing this across the board in the open source [00:03:00] ecosystem, the old guard of tools and the new guard emerging. So you mentioned TensorFlow, for example, which I did work on while I was at Google. I was actually working on TensorFlow 2, which was making TensorFlow more imperative, more PyTorch-like. But while I was there, there was this small skunkworks project being developed, JAX, which is now the de facto tool, at least in research, especially at DeepMind and Google. So from TensorFlow there was JAX, which adopted more of a functional, declarative programming model.
From matplotlib, a lot of folks are moving to Altair now. Matplotlib has a very imperative and extremely powerful API; I think some of the plots I made in my thesis I wouldn't have been able to do otherwise. But going from matplotlib to Altair is also a move to more of a declarative paradigm. And going from pandas to Polars, which also has more of a functional vibe going for it with its lazy expressions; a lot of the chaining is a lot more declarative than the types of things you would do in pandas. [00:04:00] And then some people say going from Jupyter to marimo. And you might guess, again, a bit of the pattern here, is going from a traditional notebook, where you imperatively run cells one at a time and have very fine-grained control over the state of your RAM, to a notebook that models the data dependencies for you and requires you to code and think about notebooks holistically, as assembling an actual program: you run one cell, and marimo knows what other cells need to run in order to keep your code up to date. And in order to use marimo effectively, you've gotta code in a bit more of a functional way, which we can talk about. But maybe one common thread, and I'm just spitballing, I just came up with this right now, across all these tools you mentioned, is this embracing of data-in, data-out, functional-style programming. 'Cause if you embrace that, tools can either do a lot more optimization, like what JAX does, or they can provide you more guarantees about the state of your [00:05:00] code, which is what marimo does, or they can just improve the developer experience, which maybe Altair does, and Polars does both of those things. hugo: I love it, and I love how it's a data-first and data-centric paradigm. And we'll get to this, but something I really appreciate about marimo is that essentially you have a DAG, a directed acyclic graph, of cell execution.
And it doesn't allow you to have cycles, right? And that respects what we call the dataflow paradigm, which is incredibly important in our work in so many ways. Now, these are places I do want to go with this conversation, but I don't wanna be too tool-centric, right? Tools are here to serve us as data practitioners. And you have such a rich history. I'd love to know more: with your work at Google Brain, when you were at Netflix, at Stanford, across all these roles, where did you [00:06:00] consistently see ML, AI, and data workflows break down? akshay: Yeah, that's a great question. So I'll caveat that I had different kinds of roles at each of those three places. At Google, I was basically a hundred percent on the engineering side, where I was sitting in Google Brain but working on TensorFlow and TensorFlow 2. So to the extent I was watching things break down, it was typically more like segfaults rather than model drift, if you will; I was working on computer systems there. At Netflix, I was working on a bit of optimization research and systems research, and there are some interesting anecdotes to share there. And then at Stanford it was pure machine learning research that was always grounded in open source development. So there's a couple of [00:07:00] areas where I saw things break down. At Netflix, at least at the time when I was there, and this was quite a number of years ago, one common source of frustration amongst the engineers was actually just running batch jobs: big data processing workflows, maybe powered by Spark. 'Cause they had a lot of batch jobs to run, and basically their cluster was oversubscribed. So scheduling was actually a big issue.
Engineers would tell me that you would need to schedule your job at some arcane hour of the day to hope that it would get slotted in and actually run on time, like at 2:43 AM. I dunno, I'm getting the details wrong, but scheduling, and making sure that the systems were running the workflows that the data engineers or ML engineers needed run, the systems side of that was constantly a pain. At Google that was pretty well developed, as far as I understood: you could just [00:08:00] write your Borg config file or whatever and let the autopilot handle the workflow scheduling. But not as many places had as sophisticated systems as Google did back then. And at Stanford, the things I saw break down were of a different kind, right? That was more research. We weren't at the stage where you had some cluster running batch workflows; rather, things would break down at the beginning of the ML project. And I think this is common in industry too: when you started working on your model, started training your model, you didn't take the time to set up model checkpointing. And did you use Git? I don't know, maybe you did. Where did the data come from? I downloaded it and put it in this file in my directory. And a lot of this work did happen in the notebook of the day, these traditional, imperative notebooks. When you do that, you can write a lot of code quickly. [00:09:00] But with traditional notebooks not enforcing a dataflow structure on your notebook, with traditional notebooks letting you run things out of order, I saw this really often, where you would be iteratively programming in a notebook.
You write some code in one cell, maybe you comment it out and run another cell below, or you copy paste it above, and you start running these cells in this piecemeal order, and then you get some output saved in your plot, you get something saved to disk. And maybe, along the way, did you keep track of the packages you used? Maybe you did, maybe you didn't. But at the end of the day, the researcher would have: okay, yeah, these plots look good, I have something saved to disk, which is my model weights. All right, I have a model. Then you pass that notebook file to someone else. You didn't save the requirements, you ran your cells out of order. So when I try to run your notebook, I'm just gonna run it top to bottom, and then it often didn't work. We can talk about this more, but I think this is the idea of imperative programming notebooks, or REPLs, accumulating hidden state. And so that was a [00:10:00] consistent theme that I saw, and it inspired me to combine two experiences I had: TensorFlow, where I thought a lot about dataflow, 'cause the whole thing in TensorFlow was that you're building a dataflow graph, and my PhD, where I saw a lot of folks working interactively with data. I saw that it was incredibly valuable to have a programming environment that lets you work interactively with data. But I also saw that there was a lot more we could do to ensure that the work that came out after you were done working in a notebook was actually reproducible and reusable by others. hugo: Absolutely. And I appreciate that rich history and you giving us so much context. Before going any deeper into some of these things, could you just, for those who may not know, explain the difference between imperative and declarative? akshay: Yeah. So I think a simple way to describe imperative is: it's a programming model in which you say,
you describe a sequence of instructions. You tell the machine to do one thing, then do another thing, and then do another thing. In [00:11:00] contrast, declarative is: you just say what you want, and you don't specify how that goal should be achieved. I did my PhD in mathematical optimization, so maybe this analogy will make sense, and if it doesn't, we can find another one. Imperative would be: you have some problem that you want to solve, say you want to allocate stocks in a financial portfolio and you wanna get good return. So imperative would be trying to come up with some kind of strategy on your own: okay, maybe I should first buy some of this stock, and then short this stock, 'cause I have some feeling that would be a good idea, and then once that happens, I should do this, et cetera. Coming up with the plan of investment on your own. Declarative in this context would be saying: hey, all I wanna do is maximize return and minimize risk. Here's my mathematical model of what return means, it's an expectation. And here's risk, it's a quadratic form [00:12:00] on the covariance. So here's my model, here's my constraints, and you tell the computer: hey, just go figure out what actions I should take in order to maximize return and minimize risk. Yeah, declarative is telling the computer what you want, and not how you want it achieved. hugo: Awesome, I love that. And actually, firstly, we've got a bunch of people saying "love marimo!" in the chat on YouTube. So that's super cool. Carol Willing made a really nice point, and we'll get there, with respect to perhaps how this type of declarative paradigm is even more important in the world of generative AI. Carol said, explicitly declarative works well with prompts, right? akshay: Yeah, exactly. 'Cause you're telling the large language model:
this is what I want. As for prescribing to a large language model how to do something, it's not even clear what its mechanism of doing things is at all. But you tell it what you want, right? And it'll hopefully go and churn away and do what you want it to do. [00:13:00] Absolutely. hugo: And there are slight variations, which may mix the two slightly, but I still think they're more declarative than imperative, where you say "explain your reasoning" or something like that, the quote-unquote chain-of-thought stuff. That's for another conversation. Thank you for spelling that out. So I'm wondering if you can share an example of a concrete workflow, and a workflow failure, that's helped to shape how you think about reproducibility and collaboration in data workflows. And I'm just gonna say: I usually say AI as something to encapsulate data, ML, and AI, but I prefer saying data, because that's what it really is, right? So everyone, data means data, ML, and AI from here on in. akshay: There's a few that I could talk about. Here's one that may feel kind of micro, but I've seen it many times, and it gets somewhat at the [00:14:00] heart of what people appreciate about marimo. I mentioned that the reason me and my team are interested in working on this new kind of notebook is that, one, we see how valuable interactive computing is; there's a reason everyone uses Jupyter notebooks to first start exploring their data. But two, traditional notebooks, where you run cells one at a time, mutating your global state, can accumulate hidden bugs. And I can give one concrete example that, again, may feel micro, but I've seen it many times. It's the idea that in a traditional notebook, the notebook doesn't have any understanding of how two different cells are related.
You can delete a cell, and that will remove the code from the page, like you won't see the code anymore, but your variables will still live in memory. So if [00:15:00] you were, I don't know, training some model, and you deleted a cell that declared some hyperparameters or something, and maybe your cell was really long, so you didn't realize that some variable that you actually required downstream was defined in this cell, and you went ahead and deleted it: your notebook will still work as you intended it to, because that variable actually still lives in memory for the duration of that notebook session. But come back the next day and restart your notebook, and that hyperparameter, lambda or whatever, that existed in that cell no longer exists on the page. It won't run. When other code runs that assumes its existence, it'll just totally fail. And it sounds trivial, but I myself have spent several hours debugging issues like this. You multiply that across the number of people who work in traditional [00:16:00] notebooks for data, ML, and AI, and these kinds of problems are a real tax on developer productivity. So that's one specific thing that marimo just solves out of the box for you. The way that marimo works, without going too much into the details, is that when you type Python code in your cells, it sees what variables your cell defines and what variables it references, and then it builds this dataflow graph from them. So that if you delete a cell, marimo says: hey, all those variables should not be available to the rest of your program anymore. It'll actually remove them from program memory and then invalidate other cells that depended on those variables. So you catch your bugs immediately, basically before they actually have any deleterious effect.
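To make that defines-and-references idea concrete, here's a toy sketch using only Python's standard `ast` module. This is an illustration of the idea, not marimo's actual implementation, and the cell contents are made up:

```python
import ast

def defs_and_refs(code: str) -> tuple[set, set]:
    """Names a cell assigns (defines) and names it reads (references)."""
    defs, refs = set(), set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                refs.add(node.id)
    return defs, refs

# A hypothetical three-cell notebook: c2 depends on c1, c3 depends on c2.
cells = {
    "c1": "lam = 0.1",
    "c2": "loss = 2 * lam",
    "c3": "result = loss + 1",
}

def invalidated_by_deleting(deleted: str) -> set:
    """Cells that can no longer run soundly once `deleted` is removed."""
    remaining = {c: defs_and_refs(src) for c, src in cells.items() if c != deleted}
    available, ok = set(), set()
    changed = True
    while changed:  # fixed point: a cell is fine once all its refs are defined
        changed = False
        for c, (d, r) in remaining.items():
            if c not in ok and r <= available:
                ok.add(c)
                available |= d
                changed = True
    return set(remaining) - ok

# Deleting c1 invalidates c2 *and* c3, transitively.
assert invalidated_by_deleting("c1") == {"c2", "c3"}
```

The point of the sketch is that static analysis of assignments and reads is enough to know which cells to invalidate, which is exactly the guarantee a traditional notebook can't give you.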
And I think, at a higher level, what our users tell us is that because of things like this that we do, like eliminating hidden state and storing notebooks as pure Python, these notebooks [00:17:00] actually end up becoming reusable artifacts for the data person, the ML engineer, the data scientist, the data engineer, in a way that they didn't really have before. And so for people who traditionally live in notebooks, this is a huge level-up, a new ability they've gained of essentially being able to write software for the first time, almost. So that's the reproducibility bit; there's more we can talk about, what we do for packages, as well. hugo: That gives such rich context. And I've been using notebooks since... actually, the first notebook I used wasn't even IPython. I think it was Mathematica, back in the two thousands, in the zeros. My background's in research science, in biology and physics and math, and I love notebooks. I love them so much. I think people who have worked in science, and biology in particular, know this, but the mental model comes from experimental notebooks, where people would take their PCR gels and stick them in an [00:18:00] actual notebook and write notes next to them, and have their little plots and put them there, this type of stuff. So having this in a computational environment is incredibly useful. Hidden state, being able to delete cells: I was triggered numerous times when you were talking through that, thinking of how many times I've opened Untitled66666.ipynb on a Monday morning after being happy on a late Friday evening. Absolutely brutal, man. Absolutely brutal. And you said hours in a row; think of the amount of chaos I've caused for myself by not being rigid. The flexibility of the notebooks I've used is fantastic. But
I do think you need structural rigidity imposed by the infrastructure as guardrails, which is what you're speaking to, with the fact you mentioned that if you delete a variable, it will let you know what's happening downstream because of that. Also, if you try to define something that you've already defined, it won't allow you to; it'll very nicely offer you a scratchpad where you can do those things [00:19:00] as well. These types of features, I think, are very aligned with the type of reproducibility we're talking about. Also, to your point, when playing around with marimo I was very excited to see, and I think you mentioned this explicitly, that notebooks are not stored as JSON. They're actually .py files, pure Python, so they're executable as pure Python scripts, which I love. The other thing I would love your thoughts on is that a lot of the time, when I build data powered products or workflows, I'll switch between scripts and notebooks, because this whole scripts-versus-notebooks thing has just been such an unproductive flame war for so long. My general workflow is: when doing experimentation or exploration, I'll work in notebooks, and when trying to serve things to prod, I'll work in scripts. But what I'm hearing here is that perhaps we've found a possible middle ground, right? akshay: Yeah. So marimo does blur the lines, because first and foremost, we store [00:20:00] everything as a Python file. And you can import that Python file and reuse it as a regular Python module. We take care to store your code in a way that importing it is not gonna just run your notebook; it'll only run if you run it from the command line. So you can import it and pull out functions and classes. You can say: from my notebook, import my function, or import my class.
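The "safe to import" behavior described here rests on a standard Python pattern. A minimal sketch of the general idea follows; the file and function names are illustrative, and marimo's actual generated files have their own richer structure:

```python
# my_notebook.py -- a sketch of a notebook stored as a plain Python file
# that is safe to import: definitions are exposed, work is guarded.

def clean(values):
    """A reusable transformation you might define in a notebook cell."""
    return [v for v in values if v is not None]

def main():
    # Side-effectful, top-level work: runs only from the command line,
    # never on `import my_notebook`.
    print(clean([1, None, 2]))

if __name__ == "__main__":
    main()
```

With this guard, `from my_notebook import clean` pulls in the function without executing the notebook's work, while `python my_notebook.py` runs it; marimo's files apply the same principle.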
So hopefully that encourages folks to use that feature instead of just copy pasting cells across notebooks into Untitled66666. So we do blur the lines between a regular Python module and a notebook. That said, I think every tool has its place, right? I would say marimo is great for, yes, definitely exploring data and experimentation. Also, every marimo notebook can be run as an interactive data app. We have a really rich ecosystem of interactive widgets that work without callbacks, so you can also use your [00:21:00] notebook, should you want to, as an interactive data app, without porting it to Streamlit or Dash or any other tool. You get that for free. You can run it as a script, and that can actually be really helpful for these data engineering style workflows, where there's a certain type of observability you get by working in a notebook: you can see, visually, a record of the intermediate steps in a workflow. So you can prototype and totally build your pipeline in a notebook, and the notebook view lets you see what's going on while you build it. And then you can run it as a script from the command line. We have a concept of testing notebooks: pytest can pick up functions that start with test_, so you can harden them in that way. So we go a long way toward making the existing things people do in notebooks, which I think are interactive exploration, app-like things, and data engineering workflow type workloads, a lot more [00:22:00] reproducible and reusable. That said, I wouldn't advocate building entire software applications out of a collection of marimo notebooks.
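The testing point a moment ago is worth making concrete: because the notebook is a plain `.py` file, the standard pytest convention applies directly. A minimal sketch, with invented function names:

```python
# A notebook cell can hold an ordinary function plus a pytest-style test.
# Since the notebook is a .py file, `pytest my_notebook.py` (with pytest
# installed) collects any function whose name starts with `test_`.

def normalize(xs):
    """Scale a list of numbers so they sum to 1."""
    total = sum(xs)
    return [x / total for x in xs]

def test_normalize():
    out = normalize([1, 1, 2])
    assert out == [0.25, 0.25, 0.5]
    assert abs(sum(out) - 1.0) < 1e-9
```

Nothing notebook-specific is needed; the hardening story is just ordinary Python testing applied to the notebook file.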
I think you can take this idea, and some people have taken it, to really interesting places, like the idea of literate programming and nbdev, where I think in that community you do use their framework to build regular software that may not necessarily have to do with data. I hope I'm not misrepresenting them, but that's my understanding, and I think what they've done works for their users. We're not pushing really in that direction. I still use regular Python to develop marimo; I'm not developing marimo itself in marimo notebooks, if that makes sense. hugo: That makes a lot of sense. We've mentioned the term reproducibility a number of times, and I think there are several levels of reproducibility, right? So how do you think about reproducibility, what's the base level of reproducibility that you think we need, and where can we move from [00:23:00] there? akshay: Yeah, so I think there are several levels. The base level is that, in a notebook environment at least, the code on the page should match the outputs you see. And if it doesn't, that's when you can get into all these issues. There was a study from 2019 that pulled down a million notebooks from GitHub and tried to rerun them, and found that only 25% of them could even be run, and only 4% of those, when you ran them, reproduced the outputs that were previously serialized in them, meaning the code on the page didn't match the outputs you saw. I think that's a huge problem for a tool that's supposed to be used for science. So that's base level, runtime reproducibility. The way marimo tries to solve that is with the dataflow structure it imposes, so that if you run a cell, it runs dependent cells, or, if your notebook's expensive, you can configure [00:24:00] it to mark the other cells as stale.
So you get a visual indication that, hey, you need to rerun the cell. That's one level. The other level is third party dependencies. I think it's really common, especially for these kinds of data exploration or even model training workflows, to just wanna get an artifact out as quickly as possible, and you might not be as disciplined about setting up a pyproject.toml and recording your dependencies. But that's a problem, 'cause then when you share a notebook file, people are like: what do I need to install in order to get this to work? So one way we solve that is by tightly integrating with the uv package manager, which is a new package manager that supports new Python standards, like PEP 723, which allows you to inline your script metadata and your package dependencies in your Python file header. marimo notebooks are just Python files, so we can put that header in the notebook, so that if you run marimo with a specific flag, when you [00:25:00] install packages through our UI, it just saves them directly in the notebook file, and you don't even have to think about it. And then you share that single .py file with someone else, marimo will create an isolated virtual environment for them, install those packages, and they're good to go. So those are two levels that we can address with a single tool like marimo itself. But I think there are many other kinds of reproducibility that we don't yet address. We talk about data workflows, right? One thing that's really common, and I've seen some tools address this in interesting ways: you run some data pipeline, and it fails. Things always fail, right? So it fails for whatever reason, and you wanna figure out why. And step one to doing that is finding a way to rerun it, right?
But sometimes it failed in the cloud, and the setup in the cloud is different from what you have locally, so it's hard to reproduce locally. Maybe you didn't snapshot the data that was fed through the pipeline, so you might not even have access to rerun the pipeline with the same data that [00:26:00] triggered the failure. And I think what people often do in practice, 'cause these things are hard, is: okay, it failed in production, I don't have access to the exact data, but let me try and recreate it. Okay, it seems good. Push it up and hope it works. I think that's a huge class of problems that would be interesting to solve for, and some folks are trying to solve it. hugo: That's super cool. So we have, of course, the order of execution and the ability to reproduce a notebook when shared with others; reproducibility of the data being introduced; and the package management story, which is incredibly important. And uv, of course, is another thing that makes me feel old, just joking. But uv is incredible. Every now and then things tend to break down and we end up in dependency hell as a community and an ecosystem, and things slow down. And uv has brought back all the wonderful things that pip used to be capable of, but made them fast for us, in an ecosystem which has a lot of great packages [00:27:00] but also a huge amount of bloat. I'm also interested, if we're thinking about reproducibility: I love that you have some lazy characteristics, that things aren't computed if they're too expensive, for some definition of expensive. I wonder if you've thought about reproducibility of compute, or being able to have cells attached, maybe by decorators or something like that, to a particular compute infrastructure, so we can introspect into that after the fact.
akshay: Yeah, I love that idea, and it's something we've toyed around with in the ideation phase, the whiteboarding phase. We haven't worked on it in particular yet, but the idea, and some of our users have said, oh, wouldn't this be so cool if you could offer this to us, is: why does a notebook have to be tied to a particular compute infrastructure, right? You think about Databricks notebooks, for example, and they're awesome, an amazing product and service for so many industries, but you start your notebook on a Spark cluster and that's it. What if you wanted to write some kind of program, and maybe you're [00:28:00] interacting with it in a notebook, that starts with some analytical queries, right? Maybe it hits some OLAP engine; it could be anything, it could be Spark, it could be DuckDB, it could be local, it could be remote. It does some OLAP queries, collects all the data, maybe visualizes a little bit, and then trains a deep learning model, maybe on a Ray cluster or somewhere else, right? And seamlessly, all within what feels like the same program, the same interactive environment. I think that'd be really powerful. Whether it's on a decorator level, on a cell, or whether it's function based, I think these are all interesting ideas to explore, and folks are exploring them. If we talk about the landscape of tools, I've seen ideas like this explored in a variety of different ways. Obviously there's the resurgence of serverless functions, and companies like Modal that let you run a function on remote hardware, where you specify the container image used to run that function. Similarly, there's Coiled, the company behind Dask, and their Coiled [00:29:00] Functions, which is a similar concept, but in your own cloud.
Runhouse is another open source project and company that also has this idea; I think they describe themselves as something like bringing PyTorch sensibilities to ML infrastructure. The idea is that you have your functions, they're the first class thing, you specify where they run, and it's decoupled from compute. Super compelling ideas. Something we haven't built out yet, but that we're actively thinking about. hugo: Yeah, I love it. And one of the reasons the decorator paradigm or mental model came to mind is that I worked with the Metaflow team for years, out of Netflix, and that's how they approach their data powered machine learning and AI workflows: you have steps, and you attach different decorators so you can assign different types of compute. And once again, that was AWS native, because Netflix was so AWS native. I think few people actually know [00:30:00] or recognize that, for a significant amount of time, Netflix was like the biggest consumer of AWS, even before Amazon itself was, right? And my understanding is that that consumption powered a lot of AWS's ability to scale the way it did. But in Metaflow, you have the ability to assign compute to different steps: I can use Metaflow and do steps locally, and if I need batch or GPUs or whatever, I can just send it to AWS or wherever. Super cool, and increasingly important with all the generative AI stuff we have to do today. And that actually brings me to another point. For better and for worse, I love vibe coding, man. I love it. And I'm not gonna build the next SaaS product and kill an industry with it, like all the LinkedIn influencers wanna rail against, old men screaming at clouds.
But one of my favorite things to do, because I work on a lot of LLM-powered software, is to vibe code custom JSON viewers so I can annotate my own [00:31:00] data, that type of stuff. Super fun. But one serious issue I've come up against, and I could probably list ten serious issues, to be honest, is the inability of even the best tools and pieces of software to help me vibe code in notebooks. And I know this is not quite vibe coding, but something I've seen in marimo is the ability to chat with an AI assistant in your notebook, which is super cool. So I'm wondering how you're thinking about the future of having an AI agent help you code in a marimo notebook. akshay: Yeah, definitely. Like you mentioned, these tools are so ubiquitous now, and if you're building anything that resembles a programming environment, an IDE-like thing, you have to have some kind of large-language-model-assisted coding. So we integrate LLMs in a variety of ways. But at a high level, what's interesting [00:32:00] about the notebook, or data, world is that the way you use LLMs is different from how a software engineer working in Cursor might use them. If you're in Cursor building some kind of software that doesn't have this data component, Cursor is going to look at your repo, pull in your code as context, and start jamming. What's interesting about the data space, and especially notebooks, is that you have data in RAM. You have it in memory, and you can provide the variables in RAM as context to your large language model.
And so it can start actually doing, like you mentioned, agentic things and start coding for you against the actual schemas of your data frames, the actual values of their columns. One small example of that, which I think is pretty powerful, in our integration with large language models: you can tag your data frames, so you can say, [00:33:00] generate a plot of @my_dataframe, column x versus column y, faceted by whatever. Behind the scenes we'll read the column names, maybe the first five values, put that into the context of the prompt, and then help the LLM code against it. That's not even full-on agentic; it's one step. But even that one-step thing of read the data, give it to the large language model as context, and then iterate on it, I think is pretty powerful. And I imagine it, as many people do, as a mix: an LLM to write boilerplate code (I always forget the syntax of whatever plotting library I'm using, and it's nice to accelerate that), and then, when I don't know exactly what I'm looking for, or when I actually need to touch the data myself, I'll manipulate it with code directly. hugo: I love that example. And, sorry, my mind's going in several directions. The first is `df.head()`, and now I'm going back to pandas: when I start exploring data in a notebook to get a sense of what's up, or sitting there pairing with a colleague to get a [00:34:00] sense of what's up, we look at it and chat about it together. When I work in Cursor or Continue, I do say, hey, let's explore the database schema together, and then have a look at some of it and see what's up, which I think is incredibly important.
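That "read the schema and a few rows into the prompt" step can be sketched in a few lines. This is a minimal sketch assuming pandas; the function name and prompt format are made up for illustration, and marimo's actual implementation will differ:

```python
import pandas as pd

def dataframe_context(df: pd.DataFrame, name: str, n_rows: int = 5) -> str:
    """Summarize an in-memory DataFrame for an LLM prompt:
    column names, dtypes, and the first few values of each column."""
    lines = [f"DataFrame `{name}` with {len(df)} rows."]
    for col in df.columns:
        sample = df[col].head(n_rows).tolist()
        lines.append(f"- {col} ({df[col].dtype}): first values {sample}")
    return "\n".join(lines)

df = pd.DataFrame({"x": [1, 2, 3], "y": [0.5, 1.5, 2.5]})
prompt = f"{dataframe_context(df, 'df')}\n\nGenerate an Altair plot of x versus y."
print(prompt)
```

The point is the one made above: because the data is live in RAM, the model codes against real column names and real values, not guesses.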
It comes down to another thing that anyone getting started with vibe coding, or even someone a bit more seasoned, probably knows: plan with your agent, don't execute immediately. Of course, when you try to plan, it'll try to execute. That's another huge failure mode: you say, let's talk about this, and it generates five subdirectories, each with seven files of a thousand lines of code each, or something like that. Cursor is not as bad as Manus or Devin, but these are pretty serious failure modes. If you develop a spec, though (who would've thought writing a product spec would help? half joking), doing that with your agents can help a huge amount. I'm also interested in something that's not quite reproducibility but is a very important part of the [00:35:00] holistic story here: observability. I know you've started thinking about being able to show traces of what's happening and to have an inspector of all your variables in marimo. And by the way, everyone, we're going to have a demo at the end where we'll show a bunch of these things, I hope. But I wonder how you think about monitoring and observability as part of the story you're trying to tell. akshay: Yeah, that's a great question. There are all kinds of levels of it, from the most basic logging with Python's logging module to having traces of execution for interactive programming environments. And it's interesting because marimo is a notebook, but it's also usable as a data app, and it's also usable as a script, and we're working on features to upgrade from usable as a script to usable as a pipeline.
So there's a different kind of observability you want in each of these modalities. When you're working in the notebook, exploring data interactively, querying with Python or SQL, the variables panel we have is very helpful, and we have a trace of how long [00:36:00] each cell took to execute, which is also helpful. When you're running as an app, typically you just want traditional Python-style logging, and since it's just a Python file, you can use the logging module and it works as expected. When you run a script, or a pipeline, that's when it starts to get pretty interesting, because, as we said, when you run pipelines, they fail, and the kind of observability you want is the kind that helps you diagnose why something failed. I think there's a lot we could do there that we haven't really pushed on yet. For example, Vincent Warmerdam, who's also a member of the marimo team, has this really cool Python package called flowshow that lets you put together a little pipeline of composed functions; it traces the execution of the functions and shows you how long each one took, and if something fails, it shows you where the failure occurred. But what I imagine for pipelines, what I would really like in terms of observability in this fashion, is this: you run your notebook as a script, maybe we [00:37:00] help you parallelize some of the work in it, and we help you with checkpointing state along the way. You run it as a script or a pipeline and you get a visual record of each cell's output, which gives you an ad hoc does-this-look-reasonable check, but you also get a record of how long each cell took to run, where it pulled its data from, and things like that.
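The tracing idea described here can be sketched with a small decorator. This is a toy version of what a pipeline tracer surfaces (durations and where a failure occurred), not flowshow's actual API:

```python
import functools
import time

TRACE = []  # (function name, duration in seconds, status)

def traced(fn):
    """Record each call's duration and whether it succeeded or failed."""
    @functools.wraps(fn)
    def inner(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
            TRACE.append((fn.__name__, time.perf_counter() - start, "ok"))
            return result
        except Exception:
            # Record the failing step before re-raising, so the trace
            # shows exactly where the pipeline broke.
            TRACE.append((fn.__name__, time.perf_counter() - start, "failed"))
            raise
    return inner

@traced
def load():
    return list(range(10))

@traced
def transform(xs):
    return [x * x for x in xs]

result = transform(load())
for name, seconds, status in TRACE:
    print(f"{name}: {status} in {seconds:.6f}s")
```

A real tracer would also capture nesting, arguments, and outputs, but the core mechanic, wrap each step and record what happened, is just this.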
And you basically get a whole packaged artifact that completely describes that workflow run. This is something that's in the ideation phase, but I think it would complete our story of observability across interactive notebooks, apps, and scripts/pipelines. hugo: Incredibly cool, man. And I've actually just linked to Vincent's video from marimo's YouTube channel from two weeks ago, "New tools to inspect the flow of Python code", where he demos flowshow. Everyone, follow marimo's YouTube channel, and follow everything Vincent does; he's one of the most thoughtful, [00:38:00] interesting, and fun people in the space. Everyone's so serious, and Vincent is a serious Dutchman to be sure, but he's also a deeply playful, interesting human who reminds us of the sense of play possible with all these tools and how interesting that can be. So that's super cool. I do want to step back a bit. One thing that I love about the Python data landscape, from PyData and SciPy onwards to what's happening with LLMs now, is how important the hacker mindset is, and composability, some old-school Unix stuff, right? Pipes, joining everything together and seeing what's up, plumbing. And one thing I love about marimo is how it integrates with DuckDB on one end and with Pyodide on the other. I've also linked to a blog post [00:39:00] about marimo, and I'm wondering if you could give us, and I know it's too much for one segment of one podcast recording, a brief introduction to all the tools that marimo integrates with, where they sit in the landscape, and what people can use them for. akshay: Yeah, I love that question. Like I mentioned, marimo can be used as a notebook, as an app, or as a script. All three sides of the same three-sided dice, I don't know.
hugo: I like it. Or a three-sided coin. akshay: Yeah, there you go. You mentioned DuckDB on one end and Pyodide on the other, and I think those are good places to start. We put a big emphasis on marimo being just Python, and it is; all the files are stored as Python. At the same time, we recognize that SQL is extremely important for working with data, so we do have first-class support for SQL in a variety of ways. In the UI, you push a button and you get a SQL cell, and by default that's powered by DuckDB. Under the hood, we write some Python code that contains your query. DuckDB is a really fast in-memory [00:40:00] OLAP engine, and it's really good because you can pass your data frame from Python into your DuckDB cell, run your query, and out comes another data frame. So the dataflow graph actually spans from Python over SQL and back to Python. That's something we don't talk about too much, but we actually have a parallel dataflow graph over your SQL cells. So that's one end, and DuckDB is the primary interface, but you can connect to all kinds of databases: you can connect to Postgres, to other analytical engines like ClickHouse or Snowflake, and you can even connect to Iceberg catalogs, and we show you a preview of all your schemas and things like that. So that's working with data. And when you work with data, there's SQL, and there are charts too, plots, so we have pretty tight integrations with Altair and Plotly. If you output an Altair plot, for example, you can select some data in the plot and it comes straight back: [00:41:00] you select some points in a scatter plot, and you get them back as a data frame in Python.
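The Python-to-SQL-and-back round trip described here is worth seeing concretely. In marimo, a SQL cell queries in-memory DataFrames via DuckDB directly; the stdlib sqlite3 sketch below only illustrates the same round trip (Python data in, SQL over it, Python data out) without assuming DuckDB or marimo's actual API:

```python
import sqlite3

# Python-side data, standing in for a DataFrame.
rows = [("ok", 1), ("ok", 2), ("bad", 3)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (status TEXT, value INTEGER)")
con.executemany("INSERT INTO events VALUES (?, ?)", rows)

# SQL over the in-memory data, results back as Python objects.
result = con.execute(
    "SELECT status, SUM(value) FROM events GROUP BY status ORDER BY status"
).fetchall()
print(result)  # [('bad', 3), ('ok', 3)]
con.close()
```

DuckDB's advantage in this workflow is that it can query a pandas DataFrame in scope by name, with no explicit load step, which is what makes the seamless Python → SQL → Python dataflow graph possible.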
And that really lets you interact with your data in ways that previously weren't easy or possible in programming environments. So there are integrations on the plotting side. Then there's the front end, which I think of as being for sharing: a big use case of notebooks is publishing or sharing your work, with text, images, code. There are a few different options. With marimo, you can run it locally; you can run it on a server and do SSH port forwarding, just like with traditional notebooks; but you can also run it entirely in your browser, where all the Python code gets executed by the browser. This is thanks to a tool called Pyodide, which is a port of CPython to WebAssembly. The point is that you can export your notebook as an HTML file and some assets and just serve it on GitHub Pages, or internally on some private domain. This makes it really easy to get started with notebooks: you can go to our online playground at marimo.app. But [00:42:00] it also lets you share interactive experiences. You can embed interactive notebooks in your docs, for example; we do that throughout docs.marimo.io, and we have some other folks doing that as well. You can also embed them in MkDocs documentation (we have a plugin), or in Quarto (we have a plugin for that too). Thanks to Pyodide, it's actually quite easy and quite pleasant to create these interactive Python-based experiences. And I can keep going. We integrate quite tightly with the uv package manager on the packaging side, and we've talked about that already. And because marimo notebooks are just Python files, there's a long list of developer tooling for Python that we integrate with seamlessly: pytest, uv, doctest, and more. AI assistants like GitHub Copilot and all your other favorite OpenAI-compatible endpoints work with marimo too.
Oh, a big one I should mention that I haven't yet: anywidget. anywidget [00:43:00] is, I guess, both a specification and an implementation of a way to write portable, interactive widgets that work in the Jupyter ecosystem, in the marimo ecosystem, and elsewhere. We've standardized on anywidget as the way to write third-party UI plugins, UI widgets, and that really opens the door for folks who do know JavaScript to make truly interactive experiences. anywidget is developed by Trevor Manz, this amazing, creative, passionate developer who's joining us full time in about a week. hugo: Congrats. akshay: Yeah, thank you. He's amazing. If you look at anywidget and the work he's done, you can clearly see his passion for building developer tools and things like that. hugo: That's all amazingly cool. You've mentioned so many fantastic tools. I've linked to a couple: I've linked to anywidget, and I've linked to Quarto, which I just love, to be honest. [00:44:00] Something that came to mind while you were speaking is that there's such a rich lineage of tools here; new tools don't emerge out of nowhere, disconnected from what preceded them. For example, Altair: I don't know if it's still maintained by Jake VanderPlas, but it was created by Jake and Brian Granger over a decade ago now. Jake has been working in the PyData and SciPy ecosystem for a long time, and Brian is a co-creator of Project Jupyter. Similarly, as we've mentioned, people are always like, oh, the Jupyter Notebook versus IPython, forgetting that not only is IPython a predecessor of the Jupyter Notebook, but the notebook operates on top of IPython as well: IPython is the kernel it runs on. Similarly, people say, oh, I prefer seaborn to matplotlib, absolutely forgetting that
seaborn works on top of matplotlib; it's, among other things, an abstraction layer over matplotlib, which Michael Waskom, who created it, would be one of the first people to tell you. So there's a rich historical lineage of tools and of people building them as practitioners themselves; nearly everyone we've mentioned has [00:45:00] been a research scientist for much of their career and built these tools to support their own research and that of their communities. One other thing that just came to mind is computation happening in the browser. What tools support that, and how do you think about that with marimo? akshay: Yeah, it is a wild concept. hugo: The first time I heard of it, I was like, what? akshay: Yeah. So that's the Pyodide and WebAssembly bit: CPython compiled to WebAssembly via this toolchain called Emscripten, a phenomenal project that came out of Mozilla. And just a short digression, since you were talking about the rich history of projects: there's a quite direct through line from our in-browser version of marimo, powered by Pyodide, to the history of Pyodide's development itself. At PyCon we got to meet some of the original developers of Pyodide from way back [00:46:00] when, like Mike, who is now at Microsoft working on the Python JIT. The ability to run Python code in the browser came from Mozilla, which had this project called Iodide, a computational notebook, a marimo-like thing, that ran in the browser and could execute Python there. It was designed to help scientists share their work with others.
Because if anyone was going to realize that the web is the future platform for sharing and computation, one that makes computation very accessible, it would be Mozilla. Since then, Pyodide has been maintained, I think by several maintainers, but today it's maintained by Hood Chatham, a fantastic, passionate human being. What Pyodide lets us do is make it extremely easy to share interactive Python notebooks in the browser, so you don't even have to install marimo locally just to try it out and get a flavor of it; you just open up our playground at [00:47:00] marimo.new. There are limitations: right now Pyodide's total addressable RAM is two gigabytes or something like that, so you're not going to train a deep neural network in your browser, at least not through Pyodide. But what's impressive is that basically the entire PyData stack works in the browser. You can run scikit-learn, SciPy, matplotlib, NumPy; they all just work. And you can combine Pyodide with caching. One thing one of our engineers, Dylan, is working on is intelligent caching of artifacts computed in your marimo notebooks. So you can train some neural network using JAX and some gnarly stuff on your machine, store the results in some remote cache, say in a bucket somewhere, and then share a link to your notebook, a WebAssembly link. The code will run in the browser, but your expensive code blocks will be decorated with this cache thing, or you'll use our [00:48:00] built-in caching mechanism, and basically you'll be able to pull the precomputed data from the cache.
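The short-circuiting idea here, decorate an expensive block so reruns pull the precomputed artifact instead of recomputing, can be sketched with a toy disk cache. marimo's real caching is more sophisticated (it also accounts for the cell's code and upstream state, and could target a remote bucket); this stdlib sketch only shows the mechanic:

```python
import functools
import hashlib
import pickle
import tempfile
from pathlib import Path

# Fresh local directory standing in for a remote cache bucket.
CACHE_DIR = Path(tempfile.mkdtemp(prefix="nb-cache-"))

def cached(fn):
    """Toy artifact cache keyed on the function name and its arguments."""
    @functools.wraps(fn)
    def inner(*args):
        key = hashlib.sha256(pickle.dumps((fn.__name__, args))).hexdigest()
        path = CACHE_DIR / key
        if path.exists():  # cache hit: skip the expensive computation
            return pickle.loads(path.read_bytes())
        result = fn(*args)
        path.write_bytes(pickle.dumps(result))
        return result
    return inner

calls = []

@cached
def train(n):
    calls.append(n)  # track how many times the body really ran
    return sum(i * i for i in range(n))  # stand-in for expensive training

a = train(1000)
b = train(1000)  # served from cache; train's body does not rerun
print(a == b, calls)
```

In the browser scenario described above, the cache lookup is the part that runs in Pyodide, while the expensive `train` body only ever ran on the original machine.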
So you'll actually have the code that was used to train the model, but at the end of the day, maybe you've distilled all your data and all your training code into some small model, and that can actually run in the browser, maybe through ONNX or something like that. That gives you a way to share the complete artifact of your research entirely in the browser, with the code executing in the browser and just short-circuiting the computationally heavy parts. hugo: That's super cool, man. Can't wait to play around with that type of stuff more. And I wonder, is the two-gig limitation hard to work around in Pyodide? Because of the type of stuff we get with consumer-grade laptops. Oh, I haven't said this publicly, I don't think, but I just got a new laptop, one of the M4 Max machines with 128 gigs of unified RAM, bro. One of the first things I actually vibe coded was a Gradio app; if people haven't played around with [00:49:00] Gradio, it helps you build simple UIs and UXs for ML and AI stuff, and it's from Hugging Face. The first thing I did was download Ollama, pull ten models, and spin up a Gradio interface with a whole bunch of 14-billion-parameter models, able to run a single prompt and, in 10 to 15 seconds, see the outputs of all of them side by side, man. So I'm really giving this laptop a workout, but with Pyodide it could be exceptional. Do you know if that's something that would be difficult to get up and running? akshay: I have seen discussions in their GitHub repo from some time ago. I honestly don't know the details. I think it's certainly possible, and there are discussions there about how they would do it; I think it's more a matter of having people staffed full time to work on it. It does, I'm sure, sound like a hard technical problem as well. [00:50:00] But the one thing I've been noticing with Pyodide is that it just
keeps getting better; it feels totally inevitable, so I imagine that limit will be lifted in the future, if I were to guess, just from the way these things go. And there are promising demos I've seen, even beyond RAM: leveraging WebGL to have GPU-accelerated code as well. So if I imagine the future of Python in the browser, I'd personally love it if you could seamlessly use WebGL to run your Torch models, and yes, if you could lift that limitation on RAM, that would be amazing. I expect to continue to see great progress in that sphere, especially because more and more people will ask for it. AI is so prevalent today, and the Mozilla folks recognized so long ago that the web, the browser, is this common operating system we all have access to. So yeah, I imagine people will push on it. hugo: Absolutely, particularly since consumer-grade [00:51:00] laptops these days are absolutely wild as well, to be honest. I'd love to jump into a demo in a minute. I would just like to say, and I should have mentioned this earlier when we were talking about Quarto: I use Quarto to publish my own blog, and I'll link to that in the show notes. It's a wonderful open source scientific and technical publishing system with a lot of affordances; you can use it with Python, R, Julia, a bunch of different things. One of the reasons I wanted to mention it and focus on it for a second is that it was built by RStudio, now Posit. And since we got a bit sociological with respect to the builders in the ecosystem, I just want to give a shout-out to the entire R and RStudio community for continuing to build diligently over the years. There's more of a focus on Python now for obvious reasons, but a huge swath of people use R, and the way Posit is now able to build stuff for Python as well is just absolutely wonderful.
So a huge shout-out to J.J. Allaire and everyone in that ecosystem, and the tidyverse as [00:52:00] well. akshay: Yeah, it's an amazing community, and we're really thankful for all that they've developed. Them, and the Jupyter community: what they've been able to do over a couple of decades is just phenomenal, because, like you mentioned earlier, there's the Jupyter Notebook, but people don't realize that's running on top of ipykernel, and that's just one flavor of what Jupyter is. Jupyter is actually this collection of really well designed, interoperable components: the Jupyter server; the Jupyter kernel protocol, which is language agnostic; and then the front end, which the Jupyter team has made. But there are other front ends too: Colab is a front end for the kernels, and Databricks notebooks are, I think, a front end as well. What they've done is manage to build this foundation for so many other people to build on top of. hugo: Absolutely. And we didn't actually get into this, but Jupyter, as you said, is language agnostic, and you made a design choice to focus on Python. I think that constraint was perhaps helpful in [00:53:00] building something that served particular user needs. Is that a reasonable characterization? akshay: A hundred percent, yeah. Focusing on Python does impose a constraint, but it also allows us to do a lot more, like all the language integrations I've mentioned throughout this discussion. And it is a trade-off: sometimes folks come to us from the R community and say, can we get marimo support for R? That's really hard for us, because we have to parse the Python AST; marimo's not just a notebook, it's also a library that gives you the UI elements and a bunch of other powerful features.
So I think that's a conscious trade-off we've made: we can develop better experiences for our Python users at the cost of not being able to serve the R community. And I do want to give one more shout-out, to the Julia community, because a lot of the inspiration for this project came from a really cool reactive notebook called Pluto.jl, which took this philosophy of: we're going to build [00:54:00] a notebook just for the Julia programming language. They ended up with a reactive notebook, and if you study Pluto, you'll see it has cast a long shadow, in a good way; there's a lot of design heritage that influenced us in developing marimo. hugo: Awesome. So I'd love to jump into a demo, and I want to do something a bit out there. I actually had food poisoning the past few days, and I'm coming out of it now. A lot of my time lying down I was playing around with marimo, so I'd like to show you something I did with it. akshay: Oh, I love it. Yeah, go ahead. hugo: I'm a bit nervous; we've spoken a few times, but I'm no expert with this stuff. Can you see my Chrome browser? akshay: Yes, I can see it. I can see eigenfaces.py. hugo: Yeah, exactly. Okay, cool. So this is actually based on an example in your gallery, which I've linked to in the YouTube chat and will link to in the show notes. I'm going to talk [00:55:00] through this as well, in case we include it in the audio of the podcast, so people can follow along. It's based on an example where you have the MNIST dataset and you show how you can cluster it, visualize it, and then interact with the clusters, essentially. So I thought I'd do this with another example: a facial recognition dataset called Labeled Faces in the Wild. Now, I normally wouldn't use a facial recognition dataset.
I actually see relatively few good uses for facial recognition. But the wonderful people at scikit-learn have a subset of the dataset that only contains world leaders, and I have no problem doing facial recognition on world leaders, particularly these ones, to be honest. So I'll just go through what's happening here. We have Chávez, Blair, Bush, Colin Powell, Ariel Sharon, and so on. To get there, what I've actually done is our imports, then imported the Labeled Faces in the Wild dataset: about [00:56:00] 1,300 samples, 1,850 pixels or features, and seven classes, so seven people. Then I've used a FaceNet model, which is a facial embedding model, essentially, and taken each of the images and embedded them. And this is all in this marimo notebook, as you can see. Then I thought I'd project down to two dimensions using t-SNE and UMAP. Now, in marimo you can actually do a dropdown, but for the purposes of this demo I decided not to, because I want to show you an even simpler version of reactivity. You can see I'm using t-SNE with two components. Then I create a data frame of my embedding, and then I use Altair to get a scatterplot. Now, this cell didn't run because it's not marked as markdown, I think, and I can easily change that: markdown here, and the run button. Yep. [00:57:00] Great. And it's pretty; I'll hide the code. So what we have, and I love this, man, this is so cool: we've embedded our world leaders. We have Ariel Sharon there, we have Colin Powell, and I did this in a matter of minutes. Tony Blair, George W. Bush, Gerhard Schröder, Donald Rumsfeld, and up here we have Hugo Chávez. And this is actually one of the really nice things.
I was like, all of these clusters are really linearly separable, super cool, but we've got one bloody George W. Bush over there, and I'm like, what is happening there? So I wrote some code, which I got from you and your demo, that allows me to select that region, [00:58:00] then scroll down, and it shows me all the points I've selected. That's super cool. Now I want to see the George W. Bush one, but I can't; it's Chávez all the way down, and there are 72 rows. To get to the W. Bush one, I click through it; I saw it there. Now I just click on this: that's the one that it thought looked like Chávez. Maybe it's the sunglasses, I don't know. But that's a really beautiful way to interact with my embeddings immediately and see what's up, which I found super useful. Now, on top of that, let's say I didn't like that, because it doesn't feel good to me, and it may actually be an artifact of my facial embedding rather than of the projection. To test that, I decided to use UMAP instead. So I can re-execute this cell, and it automatically re-executes everything that depends on it, following the directed-acyclic-graph principle: anything that depends on that cell re-executes. And now we have this new projection of the embedding. We've got Bush there, we've got Chávez there. Super cool. Here, [00:59:00] for some reason, we've got Rumsfeld looking like Sharon, so let's go down and see which one it is. And there may be an easy way to search this, but... akshay: Yeah, there's a search button in the bottom left.
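The cascade just demonstrated, change the projection cell and everything downstream reruns, follows from treating cells as nodes in a directed acyclic graph. The toy sketch below uses hypothetical cell names and assumes cells are listed in topological order; marimo's real implementation parses each cell's Python to discover which variables it defines and reads:

```python
# Toy reactive kernel: each cell declares the variables it defines and reads.
cells = {
    "load":    {"defines": {"data"},      "reads": set()},
    "embed":   {"defines": {"embedding"}, "reads": {"data"}},
    "project": {"defines": {"coords"},    "reads": {"embedding"}},
    "plot":    {"defines": set(),         "reads": {"coords"}},
}

def dependents(changed):
    """Cells that (transitively) read variables defined by `changed`.
    A single pass suffices because the dict is in topological order."""
    dirty_vars = set(cells[changed]["defines"])
    order = []
    for name, cell in cells.items():
        if name != changed and cell["reads"] & dirty_vars:
            order.append(name)
            dirty_vars |= cell["defines"]  # its outputs are now stale too
    return order

print(dependents("embed"))  # re-running 'embed' also reruns: ['project', 'plot']
```

That is exactly the behavior in the demo: swapping t-SNE for UMAP in the projection cell invalidates the coordinates, so the data frame and the scatterplot rerun, and nothing upstream does.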
I was quickly able to import my own dataset, play around with it, do some embeddings and see what's up, immediately. And I had so much fun doing that, man. As I said, I had food poisoning and this was actually some of the most fun I had in, in the past couple of days, so really appreciate that, man. akshay: I'm glad we could help you get through your food poisoning. hugo: Totally. and of course I'd love to hand it over to you in a second, but all of this wonderful stuff on the side, such as, let's see. The variable explorer. we saw before what we talked about before the scratch pad. I mentioned the ability to chat with AI and those types of things. [01:00:00] is there anything, any reflections on that or any, anything you would've done differently or anything I did b that you'd suggest to do differently with Mar Marmo? akshay: No, this was fantastic. no, I don't think I would've done much anything differently. In fact, this is a lot. I like this is a lot more entertaining than the Ness demo that I show. And, oh, I think I accidentally reacted. hugo: Yeah. At least it was a thumbs up and not a thumbs down. dude, I would, if I'd love to contribute it to your docs or your gallery or something like that at some point if it would be helpful. akshay: yeah. we'd love, I'd love to add this to the gallery. That would be super helpful. no, this is amazing. I. hugo: Awesome man. why don't we jump in and I'd love to see what you've been thinking about as well with Marmo, but that was really fun. So thank you for humoring me on that. akshay: No, of course. sure. Let me share my screen. hugo: I'm seeing some comments by calm code by Vincent in the chat and he says, Hey, that looks like a bulk labeling que demo. Great. And he wrote, the nice thing about this approach is how general it is. You can embed everything, anything [01:01:00] and the kitchen sink these days and you can always follow up with tSNE U map after to totally agree. 
And not enough people know about UMAP and t-SNE, so let's definitely spread that word. akshay: Yeah, it's a cool project. In fact, my PhD thesis was a generalization of all these embedding methods, so I spent a lot of time playing with all these embeddings, and they're really great tools for interactive data work. hugo: Yeah, and Vincent actually also mentions that the clusters that show up are always informative, and I totally agree. They're either informative about stuff happening in the data, or about stuff that's garbage in your embedding model, right? So super cool in both respects, and you have to do the work to figure out what's up. akshay: Yep. And the way these clusters work, they're essentially, if you look at it, like k-nearest-neighbor clusters, and so it does tell you a lot also about the metric used to build them. Okay, I can share something. One second. Okay. And you mentioned Vincent; actually, this demo was developed by none other than Vincent, who, [01:02:00] among other things, has an amazing capability to build inspiring demos that are succinct and just so fun to play around with. So this is a marimo notebook. This is actually using our multi-column view, so you'll notice there are two columns of code here, and you can enable that in the settings; you can say display with columns. Vincent likes to do this a lot, actually. I think he's got his logic in the left-hand column and his visual outputs in the right. And this demo is showing how to prototype an LLM assistant in a marimo notebook, an LLM assistant that has access to tools. So something that is a lot more lightweight than, say, setting up an MCP server; it's just Python functions all the way down. So he's using this cool library called Mirascope, which lets you define Python functions and then register 'em as tools with a large language model.
So [01:03:00] from Mirascope he imports this llm module, and he has this little chat bot that has access to a few different functions. In particular, it can roll a die, which has some number of sides, and again, that's just a Python function. It can remember things that the user has asked it to remember, and that just calls a set-memory function. It can recall things that the user has asked it to recall; that recalls things from its memory. He uses the llm decorator to create a base prompt for it, and then just a little bit of Python logic here to create the logic for what the chat bot flow should be. He wraps that in this marimo UI element, mo.ui.chat, that lets you interactively build your chat bot, or your agent. I can ask it to roll a die, and it should go. It gives me a number: [01:04:00] one. Okay, I can say roll a die with 1000 sides. Let's see if this works. 135. So what's actually cool about this, and it's a bit of a sleight of hand, right, is that I've got my parameters here, number of sides. It knows to call this particular Python tool, and then passes the value to the Python function here. I can say, remember my name. I'll remember that. And then I can say, recall who you are talking to. We'll see if this works. It got it: I'm talking to Akshay. And then I think one thing that's compelling about this notebook experience that Vincent was showing me is that it's helpful for debugging. Things can really go wrong, right? But what's nice is that we have this function here that shows what the LLM's memory is. So I think, for example, I can say something like, forget my name. Okay, that didn't quite work: I'll remember that. So we go over and we see, okay, it actually just added [01:05:00] forget my name to my memory, whereas instead I actually wanted it to evict my name from memory.
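Since the demo's tools are just plain Python functions, the shape of it can be sketched without Mirascope or a live model. The sketch below is a hand-rolled analogue, not Mirascope's actual API: `dispatch` stands in for the model choosing a tool, and `forget` implements the memory eviction that the "forget my name" request was really asking for. All names here are invented for illustration.

```python
import random

# A tiny in-memory store standing in for the demo's LLM memory.
memory: list[str] = []

def roll_die(n_sides: int = 6) -> int:
    """Roll a die with the given number of sides."""
    return random.randint(1, n_sides)

def remember(fact: str) -> str:
    """Append a fact to memory (what the demo's set-memory tool does)."""
    memory.append(fact)
    return f"remembered: {fact}"

def forget(substring: str) -> str:
    """Evict matching facts: the behavior the demo's bot got wrong."""
    removed = [f for f in memory if substring in f]
    memory[:] = [f for f in memory if substring not in f]
    return f"forgot {len(removed)} fact(s)"

# Registry mapping tool names to plain Python functions, analogous to
# how Mirascope registers functions as tools with the model.
TOOLS = {"roll_die": roll_die, "remember": remember, "forget": forget}

def dispatch(tool_name: str, **kwargs):
    """Stand-in for the model picking a tool and the runtime calling it."""
    return TOOLS[tool_name](**kwargs)
```

A function like the demo's memory viewer would just render `memory`, which is exactly why the bug (appending "forget my name" instead of evicting the name) was easy to spot interactively.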
And so this kind of interactive experience really makes debugging things like that a lot easier than if you were just working in a script. And again, this example was developed entirely by Vincent; follow his YouTube videos on the marimo channel for a lot more inspiring videos. I like this one. hugo: Yeah, that is so cool, man. I love a lot of things about that. I do love the decoupling of, quote unquote, notebook and UI in the left and right panels, and how you can do that for a variety of different purposes. I really love the small visualizations of memory here as well. And I work with a lot of people who wanna build agents these days, quote unquote agents, 'cause whatever that actually means; I'm being half cynical there. But I actually often direct people to talk about augmented LLMs. I say, let's talk about them first, where you have memory, retrieval, and tools, right? And one thing I do [01:06:00] with people is get them to try to build an LLM that has some form of memory to it as well. I usually do that with Gradio or something along those lines. Now, with something like this, I'm gonna try doing it with marimo, and it makes a huge amount of sense. I also love the ability, like I'm gonna go away and try to use this when building simple agents with tool calls: being able to see the response of the agent but then visualize on the right the entire tool chain will be super fascinating, and the ability to introspect there will be super cool. Big shout out to Mirascope as well. And Vincent mentioned he is doing a live stream with William Bakst from Mirascope on Friday. Yep. So check out the marimo channel for that. Vincent also asks in the chat, can you try something? What if you tell it your name is Vincent too? What happens if you try to recall the name when there are two names in memory?
He also says, trying to break a tool is a great way to learn it. And then he says, [01:07:00] also notice the logs below the chat. akshay: You are Vincent. So it pops from the end, maybe. Oh yeah, very cool. Yeah, no, this is a lot of fun. I really appreciate also just how little code it is. It's just Python functions that are a couple lines long, and the entire chat loop is, what, 10 lines long. hugo: Yeah, that's the type of code we wanna write, right? akshay: Yeah, definitely. hugo: It's a nice abstraction layer. akshay: Yeah. And this is using OpenAI under the hood, the OpenAI API. And what's actually nice is, I think we provided the key through an environment variable, and marimo has a ton of these little panels; this one is the secrets panel. So you can quickly debug, like you might wonder, oh, did I actually provide my API key? And I can see, okay, yeah, I did. Because a common annoying thing for beginners is just getting the environment variable into the notebook, and so small things like this I think can go a long way. [01:08:00] hugo: Absolutely. And is there anything else in the left-hand panel you wanna show us briefly? akshay: Yeah, there are a few. So actually, this notebook is using the package metadata that I was describing, the integration with uv. If you click on the manage packages button here, you can actually see all the packages and their dependencies that are in this environment. And what's cool is this is an isolated environment. And actually the way I'm running this notebook, I think, deserves some special highlighting. I'm gonna just create a new cell and do it here: this is the command that I use to run this notebook. And so we have marimo edit.
uvx is going through uv, and what's cool is that this is actually just a link on GitHub. Vincent shared this notebook with me as a GitHub gist. I can go here and look at the source code, and you can see it bootstraps itself with the dependencies that are just inlined here. And I think that makes [01:09:00] sharing these examples a lot easier than it was before, where we had to clone a repo and see if it had a requirements.txt; this just works out of the box. And I think that is really powerful. In terms of other things I'll highlight here: obviously, you can chat with AI if you configure your notebook with your API key. Small things like this: live docs. I think our users really like this. Basically, as I move my cursor, I get the documentation of whatever symbol I'm looking at. So this is almost like a documentation co-pilot, if you will, that helps you see at a glance how to use the APIs you're using. Yeah, I think those are the main ones. Another one, this might not take effect here, but I really like this panel. It's called the data sources panel. So if you have any data frames or database connections in your notebook, and this is marimo's package installation UI, let me install the package [01:10:00] real quick, they'll show up in this data sources panel on the left. So you can also have that as a little sidecar, just at a glance, what is the data you're working with in your notebook. You'll see here now I have this data frame called cars. I can click on it and see at a glance what its columns are. If I add altair, we can try this, import altair, and go back to the data sources panel, you can get a little chart preview of your columns. And I think this is super helpful for just knowing what data you have available and remembering what you're actually working on. Amazing, yeah.
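The self-bootstrapping dependencies Akshay describes come from inline script metadata (PEP 723): a commented TOML block at the top of the notebook file that uv reads before launching it in an isolated environment. Below is a sketch of what such a header looks like and how its dependency list could be extracted; the notebook header contents and the `inline_dependencies` helper are hypothetical, and in practice uv does this parsing itself.

```python
import ast
import re

# A marimo notebook is a plain Python file; when run in sandbox mode,
# uv reads a PEP 723 metadata header like this one (contents made up):
NOTEBOOK_SOURCE = '''\
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "marimo",
#     "altair",
#     "polars",
# ]
# ///
import marimo
'''

def inline_dependencies(source: str) -> list[str]:
    """Extract the `dependencies` list from a PEP 723 metadata block."""
    block = re.search(r"^# /// script$(.+?)^# ///$", source, re.M | re.S)
    if not block:
        return []
    # Strip the leading "# " comment markers to recover the TOML-ish text.
    text = "\n".join(
        line[2:] if line.startswith("# ") else line[1:]
        for line in block.group(1).strip().splitlines()
    )
    deps = re.search(r"dependencies\s*=\s*(\[.*?\])", text, re.S)
    return ast.literal_eval(deps.group(1)) if deps else []
```

Because the metadata travels inside the file, sharing a notebook as a single gist URL is enough: the receiving tool can reconstruct the environment with no repo clone and no separate requirements.txt.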
And in work like this, data should be the primary object that you're thinking about, so it's great having it all there. hugo: And Vincent did just comment that the auto-install when you import, that's the bee's knees, is what he wrote. akshay: Yeah, it is a really nice quality-of-life feature that my co-founder Myles added, and Myles also did the integration with uv and the integration with Pyodide and a lot of the other really cool features that we have. hugo: Incredible. And all of those [01:11:00] things, once again, speaking to the lineage, I almost hesitate to say this, but some of the things you just showed, like being able to explore your variables in your IDE and that type of stuff, are things that I used to love in MATLAB back in the day, that I missed in Jupyter notebooks originally, that I loved in RStudio, and then loved in JupyterLab. So you're really bringing together a lot of the different things that I want in my computational infrastructure, man. akshay: Yeah, it's funny: when you work on new things, I think people forget tools aren't developed in a vacuum; it's the lineage of everything that comes before you that enables you to make something new. And really, it's just taking ideas from a bunch of different projects, seeing what worked, what didn't, and remixing them in a slightly new way. And I think that's how new things are made, and you'll always see that history of all these other tools that came before it embedded in each new project. hugo: Totally, man. We're gonna have to wrap up, but I'm wondering, how can people get involved? Of course, we've linked to the gallery, we'll link to the docs, you have a Discord. What are good ways for people to get [01:12:00] involved with marimo and the community? akshay: Yeah, there are a few ways. First and foremost, you gotta try it out and see if you like it.
And the easiest way to do that: you can pip install marimo, or run uvx marimo tutorial intro, but really just go to our GitHub repository, github.com/marimo-team/marimo, which has all the instructions for getting started. Discord is where our community is most active; you can get a quick link to it at marimo.io/discord. We've got a bunch of folks in there chatting about all things related to notebooks, data, AI. It's a hodgepodge, and it's just a fun place to be. YouTube, definitely: check out Vincent's YouTube videos, both for tutorials but also just inspirational demos that really, at least for me, opened my mind to what you can do with a tool like marimo. Subscribe if you like it. We're on all the major social platforms as well. And I guess the last thing I'd say is we are really open to community contributions. We have over a hundred [01:13:00] contributors to our GitHub repo, and we love hearing feedback. So even if you're not comfortable making a contribution, we love to hear from you in our GitHub issues, GitHub discussions, or Discord. So please do get involved. We're very welcoming. hugo: Absolutely. And if you're not comfortable making a code contribution, documentation is a beautiful place to start, even if you just notice a typo or something. Don't forget, if you like using marimo, give it a star on GitHub as well. One final question, actually: I'm interested in what you'd like to see people in the space do more of. What would you like to encourage people to do more of? akshay: Okay, that's a really good question. I think maybe this goes back to where our conversation started, about this next generation of tools.
But I would like at least tool developers to be opinionated in the [01:14:00] tools that they make, and to really lean into particular use cases or particular types of users, like you mentioned, how we really zero in on Python users. And I think that's really unlocked a lot of benefits. There's a really cool company called Bauplan that makes it easy to write data pipelines as code, and actually they have a lot of similar design philosophies to us, except they have no notebooks in their tool. But they have this concept of writing pipelines where it's Iceberg tables in, Arrow out, Arrow in, Arrow out. And it's super simple, super beautiful, easy to understand, and just a pleasure to use. Being opinionated and making declarative tools frees up our minds from how to do things, so we can think about what we wanna achieve. And I guess that is the whole zeitgeist of the modern AI era. And yeah, I'd like to see that not just through AI, which is an amazing thing, but also in the tools that we make for humans. hugo: Yeah, I love it. So be opinionated, and maybe make tough decisions around [01:15:00] what you're trying to do as well. Don't try to solve everything. And be human centric. I'm also a huge fan of Bauplan; old friends with Jacopo and Ciro as well. Super cool guys, building lots of fun stuff. So, Akshay, I just wanna thank you for all your work in the open source space and for your generosity in coming and sharing your wisdom as well. Really excited to see what happens next with marimo, and excited to meet in person at SciPy in just under a month and a half or something like that. And if anyone's gonna be at SciPy, it's in Tacoma, near Seattle in Washington state, so come and say hi to myself and Akshay. I'm giving a workshop and a talk. Are you presenting anything there? akshay: Giving a talk, yeah, an overview talk on marimo. hugo: Amazing, amazing.
Come to his talk, my talk, and my workshop, and we can get together and see what's up in Tacoma. Thanks, everyone, for joining, and thanks once again for such a wonderful chat. akshay: Thanks so much for having me, Hugo. It was a real pleasure.
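As a footnote to the Bauplan discussion above: the "Arrow in, Arrow out" idea, pipeline steps as plain functions that each take a table and return a table, can be sketched without any particular framework. This is not Bauplan's API; it uses a plain dict-of-columns as a stand-in for an Arrow table, and every name below is invented for illustration.

```python
from typing import Callable

# Stand-in "table": a dict of column name -> list of values, playing
# the role of an Arrow table in this sketch.
Table = dict[str, list]

def filter_rows(table: Table, column: str, minimum: float) -> Table:
    """Table in, table out: keep rows where `column` >= minimum."""
    keep = [i for i, v in enumerate(table[column]) if v >= minimum]
    return {name: [col[i] for i in keep] for name, col in table.items()}

def add_column(table: Table, name: str, fn: Callable[[int], object]) -> Table:
    """Table in, table out: derive a new column row by row."""
    n = len(next(iter(table.values())))
    return {**table, name: [fn(i) for i in range(n)]}

def pipeline(table: Table) -> Table:
    """Compose steps: each takes a table and returns a table, so the
    whole pipeline is just function composition, the declarative
    'what, not how' style described in the conversation."""
    table = filter_rows(table, "price", 10.0)
    table = add_column(
        table, "price_with_tax", lambda i: round(table["price"][i] * 1.1, 2)
    )
    return table
```

Because every step has the same table-in, table-out signature, steps can be reordered, tested in isolation, and reasoned about declaratively, which is the property Akshay highlights.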