The following is a rough transcript which has not been revised by Vanishing Gradients or the guest(s). Please check with us before using any quotations from this transcript. Thank you.

Open Science at NASA: Measuring Impact and Exploring AI Applications
===

hugo: [00:00:00] Chelle, you've had a fascinating journey into open science and AI at NASA. I'm wondering what initially drew you to this space, and how your role as Open Science Program Scientist has evolved over the years.

chelle: I saw a workshop. There's a big conference in the Earth and space sciences every year. It's called AGU, the American Geophysical Union, and they have this giant conference that like 20,000 Earth and space scientists attend every fall. And in one of the workshops, there was a group called Pangeo doing a workshop on cloud computing and a new way to do science. And it was sold out, because I think I figured this out a couple of days before it was actually supposed to be held. So what I did is I signed up for the workshop next door, and then I went over to that workshop. And I remember just sitting there completely blown away, [00:01:00] because everything I had done up until that point... I had done most of my science working in big groups across the world, doing oceanography and computational oceanography. And everybody's programming in Fortran, and you're writing all your code from scratch and dealing with these big hairy datasets, where you would spend a week writing just the intake program. And what they were demonstrating there was Xarray and Dask and NumPy and SciPy, going onto the cloud and opening up 40 different computers. And what they demoed was, they took an oceanography paper that I knew really well, a classic in my field. And I knew that downloading the data and doing the analysis, trying to recreate that paper, would have taken a month or so, at least.
And they were doing this during the workshop, which was just a couple hours long, and they did it in [00:02:00] almost 30 minutes. They had us up and running and we were processing the data, because they had put it into Zarr, so it was easy to read. And then you were using Xarray to read it in one line, and then you're recreating all the figures in that paper. And it completely blew me away, because it's easy to just not see open science and all of the evolution of that system. When you're in active science, you get busy in your field, you get busy in your area, and you're working with colleagues that are working in a similar way. So you don't step out of that and ask, is there a different way to do things? And that really emphasized to me, okay, I need to do this, otherwise I'm never going to get a proposal funded again, because this is so groundbreaking. So I just threw everything I had into learning Python and really starting to become part of the open science community, and really rapidly saw the impacts in my own [00:03:00] scientific work, where I was able to look at things, create results, and do everything so much faster, because I was building on and with other people instead of trying to recreate the wheel every single time. So that really convinced me that open science is great and is making things better for science. And the other part of this was that when you're doing all this code on GitHub, when I had questions or problems, I would be able to find answers on Stack Overflow, or I would find snippets of code on GitHub that other people had published. And it really made sense to me: let's all help each other here. Let's all try to find these solutions together and not keep reinventing the wheel. And so from that, I started really becoming an advocate in my community for open science, and also Python and Pangeo and cloud computing.
And that eventually led to an opportunity to move to NASA headquarters and continue that work full time.

hugo: Incredible. [00:04:00] So I'm interested in your role as Open Science Program Scientist. What excites you most about what's happening with open science at NASA?

chelle: I think it's really great being part of an agency that has really embraced open science and is putting resources behind it. And what I work to do is to really make sure that the science community has a voice at headquarters. Headquarters, sometimes, is money going out, programmatic decisions, and trying to decide: what are we funding this year? What are we funding next year? What's our outlook for the next five years? How are we developing the infrastructure that we need to do that? And a lot of times with many federal agencies, it can be like, let's see what we can invent, let's just decide that this is the way the community is going. So a lot of my role at headquarters is to really be out there in the community, talking to everyone to make sure that they have a [00:05:00] voice at headquarters, where a lot of the decisions are being made. And that's just a really fun space to work in. I get to meet lots of people, I get to talk to a lot of wonderful scientists about what they're doing and figure out, okay, what are some of the barriers you're encountering? Is there something that we can do to try and fix that?

hugo: Yeah, it's interesting. I haven't quite thought about this before, but we do very different things, and yet a lot of what we love about our work is similar, in that a lot of it involves community, and bridging and building communities, and communicating between different sets of people. And something I love about my line of work is how many cool people I get to chat with, such as having the luxury of chatting with you right now. I am interested in how we define open science.
So we've already thrown around a bunch of terms like cloud computing, and you talked about Xarray and Pangeo. As a provocation to think [00:06:00] through: we can have open source tools as much as we want, but that doesn't necessarily make cloud computing super accessible. We can have low-level open source programming languages that a lot of people can't learn to write enough of. So I'm wondering if we can start with getting a sense of how NASA defines open science, and what the core principles driving the movement within the agency are.

chelle: So we have an official definition that the federal government is using, and that was announced in 2023, last year, which was the Year of Open Science. The federal definition is that open science is the principle and practice of making research products and processes available to all, while respecting diverse cultures, maintaining security and privacy, and fostering collaborations, reproducibility, and equity. And that's a really long sentence with a lot of big words that you had to get like 18 agencies to agree on, which was great. But really what it is, it's [00:07:00] making research products and processes open. So first, we want to recognize that science is not just a publication at the end of a research project. It's the entire process and all of the artifacts that are created along the way, including how you hold meetings, how you ask questions, how you answer questions, whether you're doing that in private over email or in public on forums. How you're producing datasets, how you're sharing datasets, how you're producing code and sharing code. It's everything about that process. The more that we can be open, the more reproducible and transparent it makes our end result, that scientific paper, and it increases the security of research, because we really want to have reproducible results that someone can just take and build on.
And so that's what a lot of open science is about. It's creating that springboard to the next result. [00:08:00] And while doing so, we really want to be careful that we're doing it thoughtfully. So we are not advocating, I am not advocating, just share everything, all the time, everywhere. You want to be careful that all your collaborators agree. You want to be respectful of the community that you're involved with and what their practices are. And you want to make sure that if you're taking data from certain groups of people, they have a say in how that data is shared. So it's about working together, but thoughtfully, so that we make sure that when we're sharing, this is what we want to share, we know it's okay to share it publicly, and everybody who's participated in it has said, this is what we're doing. And we're really starting to build trust and do science more collaboratively and inclusively. Because when you share like that... I get emails all the time about code snippets I've shared, or Medium posts I've done, from scientists all around the world saying, oh, this was really useful, thank you. [00:09:00] And I feel like that's a benefit that is going to keep on giving for a long time. You do this sharing, and it takes a little bit of work to share, right? It takes work to be open, but then you get these benefits, because you have these random people thanking you. And it always seems to happen when you're having a bad day, and then somebody reaches out and says thank you, and it turns the whole day around.

hugo: Absolutely, I couldn't agree more. And I am interested in thinking through some of the challenges you've encountered in promoting open science. I do want to start by just asking: in terms of promoting open science, is there some sort of skills gap? Does a lot of it come down to education, among other things?
chelle: What we found is that a lot of the initial mistrust or discomfort around open science is due to not really understanding open science. So I think that there is this training that needs to happen. Part of what we did during the Year of Open Science is [00:10:00] we developed this course called Open Science 101. I don't know, can you put a link in the chat?

hugo: I can, yeah, and I'll include it in the show notes as well. I'm just doing that right now.

chelle: Yeah, and we have a promotion this month. The first 2,000 people to complete that course, which is an online, open, public, free MOOC, get sent a swag bag from NASA. They get a digital certification that goes on their LinkedIn, on their GitHub, and then they get a swag bag of stuff from NASA. So anybody who's interested in a digital certification from NASA that says, I know how to do open science: you just have to complete the course. And if you do it this month, you get this swag bag sent out to you.

hugo: Amazing. So I've just put the link to OS 101, module one. Is that the place to start, the ethos of open science?

chelle: Yes. Here, let me... it's openscience...

hugo: ...101.org.

chelle: Yeah, there you go.

hugo: And [00:11:00] just so people know, I'm excited to jump in and see whether I can complete it. I want some NASA swag. Weirdly, I do have a NASA T-shirt that I got at a thrift store in New York many years ago. I should have worn it today. What's the average time commitment people should expect if they're going to do this?

chelle: It depends. If you are really new to open science, you're probably reading through the whole thing, reading through all the modules and taking the little tests at the end. It's probably going to take anywhere from six to ten hours, because there's a lot of content in there, and it's meant for people who really have no familiarity.
So it's a real introduction. But most people are going to have the software part, or they're going to have the data part; they'll have parts of it. And you can fast-track any of those modules. So when you go into it, you'll see there's a fast-track option, or you can go through the module. And if you do the fast track, it can be done fairly quickly, because you're just taking a couple of tests [00:12:00] about open data, open software, open publications, and then you get your certification at the end.

hugo: So I'm interested in expanding our conversation around some of the challenges you've encountered, and how you've navigated these obstacles in promoting open science.

chelle: I think the number one complaint that we hear as a federal agency, when asking people to share results at the time of publication, share data, share code, is that this is just another unfunded mandate. You're asking us to do more, but you're not increasing the budgets, so you're actually asking us to do less science. You're putting a burden on us as researchers. We can't just put the dataset online: you're saying we need to get a DOI, we need to have it in a machine-readable format, it needs to have metadata. And then with code, when we put code out in [00:13:00] public, we have to support and maintain it. Or there's a lot of fear around sharing code, that somebody could just take it and do the thing I wanted to keep working on. And just the act of sharing takes time, and it takes resources. And that unfunded mandate is taking time away from science. And if you add it up across all of science, now you have a certain percentage of people's time, maybe it's 5 percent, that is taken away from doing science. And when we get this question, I like to think about it.
I like to switch how we're framing that question. Because when you're doing science, if you're writing everything from scratch, then yes, this is an unfunded mandate; this will add to your work. But let's not just count the negative side of that scale. Let's also look: are you using open source tools that other people spent years developing? [00:14:00] Are you using other people's results? Are you building on them? Because the reality is, now most of us are programming in Python. We're using all these amazing tools that developers have spent years and years writing and maintaining. So our science is actually much faster because of other people sharing. So asking you to share is paying it back, right? Or paying it forward. So we want to accelerate science, and yes, it takes more time, but if you're actually building on what other people have done because they shared, I think you need to account for that also. And what I've found is, once you get used to sharing, it's not an onerous task. It doesn't take as long, because when you're writing your datasets, when you're saving them, you're automatically making them nice, and you're making sure that the metadata is saved and propagated along as you do your analysis. So I don't think it's that big of an ask. [00:15:00] It is an ask, but I think that it's an important ask. And as a federal agency, I also think that reproducibility is really important to us. We are here to advance science, and making sure that the research that we fund is reproducible is incredibly important. And I think it is okay that, when we're giving you funding to do research, we ask that it is reproducible to the extent possible.
hugo: Yeah, and there are levels of reproducibility, of course, but there's a lot to unpack in there. Something I'm very interested in: you said that, in some ways, by asking people to make their code and data available in machine-readable formats, all of these types of things, we're taking away time people have to do science. And of course that's the framework we're all thinking in now, because of how science is culturally defined, and defined in terms of [00:16:00] incentives. But something I'm hearing in there is that no, this is actually a part of science, and a very important part of open science and reproducible science. So we actually have structural issues and issues of incentives, around what people's career track is in terms of getting promotions and being tenured and these types of things. And I'd like to approach that conversation through the lens of measuring impact and its broader implications. Because as we both know, and as we'll get to, measurement traditionally has been done through counting papers and this type of stuff. So I suppose I'm interested in talking about measuring the impact of open science. Why is it difficult to quantify, and what new metrics are you thinking of to capture its effectiveness?

chelle: So you may recognize the name Fernando Pérez. He invented [00:17:00] IPython, and he co-founded Jupyter with Brian Granger, and Jupyter has 5 million users. Its impact on science globally, across almost all disciplines, is enormous. But if you look at the h-index... so the h-index is what is traditionally used to measure whether you are a good scientist or not. What it measures is the number of papers you've written and how many people have cited them. So if you have 10 papers that have each been cited by 10 people, your h-index is 10.
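The h-index definition Chelle gives here can be sketched in a few lines of Python. This is a toy illustration, not any official bibliometrics code; the function name and sample citation counts are invented for the example:

```python
# Toy sketch of the h-index: the largest h such that the researcher
# has h papers with at least h citations each.
def h_index(citations):
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank  # this paper still supports an h-index of `rank`
        else:
            break
    return h

# Ten papers, each cited by ten people, as in the example above:
print(h_index([10] * 10))  # 10
```

The same sketch also shows the flaw Chelle is about to describe: a tool with millions of users contributes nothing to this number, because only paper citations are counted.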
And usually you say, if somebody has been in research for 20 years, their h-index should be about 20, and that's a good scientist; and as it increases from there, you're better and better. Fernando's h-index is like 21, and mine is like 31. And I'm going to be [00:18:00] really straightforward: my impact on science is like a drop in the bucket compared to Fernando's impact on science. So clearly, how we've been measuring impact is completely broken, because if the individual who has really revolutionized how we're doing science today, and modernized it across almost all of science, is ranked lower than I am, that's a problem. And it's because, like you said, this career track is based on publications. And this is the game that we're supposed to play, where it's all about the publications. So what we're trying to think about at NASA is, what do we really want to see? We want to see breakthroughs. We want to see reproducible science. We want to see transparent science, and we want to see equitable and inclusive science. So if those are our goals, what are the metrics that we should be looking at? [00:19:00] And I don't discount papers as part of that, but they're not the whole enchilada, of course. So we need to start looking at: what podcasts are you doing? What YouTube channels are you doing? How are you promoting your science? Because there are so many early-career scientists who are publishing, so they're playing that game because they know they have to, but they're also active in all these other areas. They have software projects that have thousands of users. They have shared datasets that thousands of users are now using to do science. So we need to create metrics that follow not just how many datasets you've published or how much code you've shared, but how many users you have, and what sort of impact you're having on the scientific community.
So I think when we think about metrics, we're really trying to think about all of these different artifacts that come along with research projects, and how we start to value those just as much [00:20:00] as we value research papers. Because the other issue with research papers is that those peer-reviewed journals are really expensive. Publishing open access is like $6,000, sometimes $12,000, for a single article. And that means that many people can't participate, and that's fundamentally problematic. They can publish in other ways, but when you're not on that h-index scale, you're not going to get promoted, you're not going to get seen as much. So how do we also really work to incentivize these other ways and show that we value them, so that we're not just saying it's only for the rich?

hugo: I totally agree. So I'm actually going to link to a tweet that's now over a decade old. It's from Jake VanderPlas, who worked in astronomy for years, but worked on a lot of Python stuff and then went to Google. He tweeted: daily conundrum, do I [00:21:00] work on (a) my paper, which around ten people might read in three months, or (b) open source Python, which thousands of people might use tomorrow? And I think about that dichotomy there in the incentive system. And I can't speak for Jake, but we know that he then went to Google, right? And I'm someone who did my postdoc and was applying for academic positions, and decided to go to industry because of a lot of the challenges and the incentive mismatch. I think there is a huge talent drain from academia to industry as well, which we don't speak about enough.

chelle: Yeah, and I think that we're feeling that a lot right now with AI. There's the AI gold rush. So with the incentives to stay in academia and to get federal grants, we have to modernize this system; otherwise, all of the talent is going to go elsewhere.
hugo: So then how do you think about [00:22:00] the modernization process, and the potential long-term impact of open science on the broader community?

chelle: Yeah, so we held a workshop last January, organized by Florida International University, down in Miami, and they invited 50 university presidents and provosts to all come into a room together to talk about updating or modernizing review, promotion, and tenure at their universities. And they were all on board. I think there is a global momentum around open science, and around modernizing how we calculate impact, and we're going to see the outcome of a lot of these efforts from people all around the world who are working on this. UNESCO is working on it, the French have an open science monitor; there are a lot of people working on this right now. And I think as [00:23:00] we start to get this a little bit further along, and the universities are starting to have open science project offices, things are really starting to change. That question of whether you have to choose, do I work on my research paper or do I work on my open source project? I hope that in a couple of years you could work on either and still get tenure.

hugo: Great. So could we just drill down a bit more into what methods you're exploring to include elements like data and software contributions in measuring open science impact?

chelle: We have to. We've started with... so NASA has a new scientific information policy, and it has a lovely name. It's the perfect password. Just in case you wanted a password, it's called SPD-41a. It has a special character, it has letters, numbers...

hugo: Some are...

chelle: ...capital, some are lowercase. It's the perfect password.

hugo: You've got it all.

chelle: So it's the Science Policy Document, 41a.
Part of doing [00:24:00] these metrics is we actually have to get people to start sharing things, and sharing them in a way that we can follow. So, having DOIs for your datasets. We're asking people: share at the time of publication, share your code, share your data, have DOIs for everything, so that we can start to build up the ability to track who's doing what and link all of these things together. And right now, most of the metrics are done with publications, because those are the ones that have DOIs. They have a database; that's something that you can easily work with. Part of the reason we have this new information policy is to push that forward for software and data, so that we can start developing metrics that look at those. And we're looking at a bunch of different things. We're looking at how we do automated metric analysis, and that's going to involve, probably, using AI and machine learning to go through papers to try and ascertain: are they sharing data? Are they sharing code? And how are they [00:25:00] doing that? How many people are citing it? And are people getting those datasets? We're looking at altmetrics. And altmetrics had a moment 10 years ago, where people were like, oh, we need to study blogs, and then it petered out. But I think it's really coming back, because I know some early-career scientists that have YouTube channels with like 20,000 followers, and that's how, every day, they're uploading their latest results. And so we need to make sure to try and include those in the metrics: how people are sharing things through Medium blog posts, through code notebooks, hosting workshops, podcasts, things like that. And also doing interviews, to understand where different communities are at and where we can try to help out. So there are five different areas that we're looking at. We're exploring them all right now, and then trying to figure out, okay, what's our MVP?
What's the first thing that we can just get done to set a baseline, so that then we [00:26:00] can start to make investments in the areas where we know we have gaps, and start to track this in the future?

hugo: Fantastic. So now I want to move into thinking through all the wonderful stuff you're doing with AI and data accessibility. You are working on making NASA's data more usable and accessible through AI. So could you give us some examples of how that's being achieved, and why it's a focus for you?

chelle: Yeah. So it's a focus for us because, if we actually want more people to participate in science, we really need to increase the accessibility of data. And when I say accessibility: our data is already open and publicly available. A lot of it's on the cloud, so you don't have to download these massive datasets anymore. But it still requires like a PhD to figure out how to get that data, how to read it, how to interpret it. We really have this vision that we can [00:27:00] create resources where people can just ask a question, basically codeless science, right? How do we get to a point where people can interact with NASA data without knowing how to code? Things like that have been around for years, where you can go to some of the data archives and they have visualization, and you can maybe make some basic plots or look at a certain area, but that usually doesn't get you very far. Sometimes that's useful for policymakers, but it's not usually that useful for science. And so we want to be able to let people ask their questions of the data and get answers back. And to start that off, NASA, IBM, and Clark University have released an open source foundational model based on the Harmonized Landsat dataset.
It's on Hugging Face, and people are already taking this and applying it to flood [00:28:00] mapping, to burn scar identification, to fire management and recovery, to locust prediction. It's already being picked up and used for all sorts of things that we didn't expect. And I think that's a really great example of us making an investment in making data accessible in a way that's not just, oh, we put it on a different website. We created a new way to interact with the data, and we provided this foundational model that people can then build on. And the vision there is, you should be able to go and say: hey, NASA data, show me the fire risk around my house. Hey, NASA data, show me where there are open parks near my house. Or, I want to look at heat island effects, or I want to look at the number of trees in my city and how it's changed over the last 20 years. And these are things that high school students should be able to go to a dataset, ask about, [00:29:00] and then start to do some local activism around. That's what we want to see: everybody from students to policymakers to scientists being able to just ask questions in a code-free environment. But then, if they want the code, they can download the code and keep going.

hugo: Amazing. And I've just linked to the blog post, "NASA and IBM Openly Release Geospatial AI Foundation Model for NASA Earth Observation Data", which I'd encourage everyone to check out. Please do explore the models on Hugging Face as well, and let us know what you think and what you discover. Look, Chelle, this is actually so exciting, because I just want to give a bit of historical context that you and I know about. I love the idea of being able to do science without coding. It's easy to forget that it was only maybe just over a decade ago, maybe 12 or 13 years, that pandas.read_csv came out and allowed us as scientists to read CSVs into what was then an [00:30:00] IPython notebook, not even a Jupyter notebook, right? So into IPython notebooks, John Hunter having released Matplotlib five or six years before that, which allowed us to do inline visualizations in a notebook, connecting all these modularized tools that then became the scientific Python stack, with NumPy under the hood. The reason I'm waxing lyrical about this is that not only are these foundation models amazing, but it's only a decade after we've been doing all this stuff, and I think we should take pause and recognize how incredible that is. And you and I met when you were using Dask to study the sea from space, right, doing your oceanography work. Even using Dask for that type of stuff is highly technically non-trivial, and pretty full-on work to do, to be honest, even getting connected to clusters and that type of stuff. So this is revolutionary, [00:31:00] right?

chelle: Yeah, it's almost crazy to me how quickly... there's like November 2022, and then everything beyond that. It feels like all of these innovations were building to this moment, and then there's the spark of large language models and foundational models. And everything was ready; the ground had all been fertilized. And now, all of a sudden, we're going to see a thousand flowers grow. Because you're right, with the stuff I was doing with Dask, there were a lot of moments where you're just basically crying in the dark, because coding can be really hard. And the ability for people to ask these questions of datasets really is a miracle. It's a miracle of science, and it makes it so that so many more people who have questions can [00:32:00] ask them.
They have a place to go and ask these questions, rather than trying to find a scientist to work with. And maybe they still want to work with the scientists, but they have all these different things that they can do now. And I can't wait to see what's going to come out of the next couple of years, because I do think this is just going to change who can participate in science. Even the coding that I'm doing: I have this secret little side project right now, and I'm actually coding again for the first time in two years. I started working with an artist, and we were coding together, and I was like, oh, what was it, is it np-dot-something? Is it pd or is it np? I was trying to remember the syntax, and he's like, why aren't you using ChatGPT? And I realized all my coding had been pre-ChatGPT. [00:33:00] So then I started doing that, and I'm like, oh my gosh, this is so much easier. And I have been on Replit, is that what it's called? It's an IDE that my kids use when they code. And it's wonderful. We had a Python fart joke generator... you and I like the fart jokes, and I have 14-year-old twin boys, so we do a lot of fart jokes around our house. We had a fart joke generator up in five minutes, just because it was suggesting: you probably want to do this next, you probably want to use the random library, and here's how you do that.

hugo: It's incredible. And I think... Replit, could it be Replit? Replit, yes. I'll link to Replit. Funnily enough, it's funny you should mention these jokes. The other day, for some reason, I asked ChatGPT for domain-specific variations on "why did the chicken cross the road?", and my favorite was... I'll tell you a [00:34:00] joke. The mathematician's version is: why did the chicken cross the Möbius strip? To get to the same side.
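For what it's worth, the kind of five-minute generator Chelle describes really is just a couple of calls to Python's random library. Here's a minimal sketch; the joke list is made up for illustration, not what she and her kids actually wrote:

```python
import random

# A tiny "fart joke generator" in the spirit of the five-minute Replit project:
# random.choice picks one (setup, punchline) pair from a hand-written list.
JOKES = [
    ("Why don't farts ever graduate?", "Because they always get expelled!"),
    ("What do you call a ninja's fart?", "Silent but deadly!"),
    ("Why did the fart cross the road?", "Because the wind was blowing that way!"),
]

def fart_joke(rng=random):
    setup, punchline = rng.choice(JOKES)
    return f"{setup} {punchline}"

print(fart_joke())
```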
So there you go. And I also do think large language models can really help with coding when they have the correct training data, essentially. If it's something that's been developed since, and they're not quite certain about the API, they can lead you down pretty wild paths — but when they know about it, it's fantastic. On top of that, the ability for me to paste in just my traceback error and have it abstract over all my local file directory structure and that type of stuff, in a way that Google and Stack Overflow couldn't, is absolutely wonderful. chelle: Yeah — okay, that is the dog in the background, sorry, I hear the dog — it makes code so fun, because we've all been there when we were learning how to code: you forget a colon at the end of your for loop [00:35:00] or you mistype one thing. And when you're new to coding, those little bugs can take so long to figure out. To me, a lot of learning how to code is learning how to debug. And with all these AI suggestions, you don't need to care about the syntax anymore. You can just ask your problem, and it helps you find the solution. hugo: Absolutely. So we did promise to talk about how NASA is exploring AI across various divisions, from rats in space to understanding the origin of the universe. So maybe you could tell us a bit about these things. chelle: NASA's looking at using AI across all of science. When you think about science, there are observations, questions, hypotheses, analysis, [00:36:00] and conclusions, and we're looking at how we can use AI across all different parts of that. How do we get to better product generation? How do we get datasets to be more discoverable? How do we get code to be easier to write?
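As an aside, the one-character bug Chelle describes — a missing colon after a for loop — produces exactly the kind of error text you can now paste straight into an LLM. A tiny generic illustration (not code from the episode):

```python
# The broken snippet a beginner might write: note the missing colon
# at the end of the "for" line.
broken_code = "for i in range(3)\n    print(i)\n"

try:
    compile(broken_code, "<example>", "exec")
except SyntaxError as err:
    # Recent Python versions often point right at the fix,
    # with a message along the lines of "expected ':'".
    print(f"SyntaxError: {err.msg}")

# The one-character fix makes it compile fine.
fixed_code = "for i in range(3):\n    print(i)\n"
compile(fixed_code, "<example>", "exec")
```

Before LLM assistants, decoding that terse message was a rite of passage; now the error, pasted verbatim, usually comes back with the fix attached.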
So we're looking at infusing AI across the entire research life cycle. We also have this vision of creating foundation models for each of the divisions. We have started on the one for Earth, which is built on the HLS (Harmonized Landsat Sentinel-2) dataset. And we want to do this five-in-one strategy where we create five foundation models, one for each of the five NASA science divisions, using some of their keystone datasets. For heliophysics, using SDO — the Solar Dynamics Observatory — so that you can look at space weather; astrophysics; biological and physical sciences; as well as planetary, making a lunar dataset into a foundation model. Wouldn't it be cool to be able to [00:37:00] ask questions about the whole Moon? Those are all really exciting, and that's what we're working on in the next few years. And then we want to have this one foundation model — large language models — so that we can actually ask about research questions, ask about datasets, and use language to do that. That's going to take a lot of time and a lot of data. We've gotten started, and we're looking forward to doing that in the next few years. That's where we think a lot of the applications are going to come from, once we get those foundation models set up and then set them free into the wild to see what people do with them. hugo: What an exciting time. I also know that NASA, and you yourself, are very interested in interdisciplinary research and collaboration. So are there any ways that AI is fostering connection between different scientific fields and opening up new possibilities there? chelle: Yeah, and I think it's [00:38:00] a little bit deeper than just AI. It's the whole open science ecosystem, where we've seen for a while that neuroscientists are starting to use xarray, finance people are starting to use xarray, Earth scientists are using xarray.
And so you start to have people interacting through different open source software ecosystems. Where I think we really get AI use is where really different fields — fields with very different languages, even about how they talk about problems — become able to communicate with each other, so that they can ask questions and AI can understand them and start to give them answers, whereas I might struggle to understand what they're asking, because we have different ontologies in different fields. So I think AI is going to be a connector for some of those, where they'll get pointed to a dataset, and then maybe they'll reach out to the open science community around that dataset and get connected to researchers in that field. [00:39:00] I still think we're going to have new questions answered, but we still really need the scientists who are the domain experts involved at some point. With a lot of the datasets I work with, people grab a dataset and I'm like, oh no, that one's been superseded, or, this is the one that you want. So having that scientist in the loop can be really helpful. And I think AI helps us work together, because it makes it easier for us to code together, to access data, and to look at things together. hugo: I love it. We have a question in the chat from Pascal around how open science can prevent plagiarism — "efforts," for lack of a better word, is what Pascal writes. So I wonder, thinking about how plagiarism can be a concern in science, whether more open science can help us with that. chelle: So, plagiarism — right now, when you submit a paper to a journal, it goes through a plagiarism checker. And [00:40:00] I know that one of the downsides to AI is that a lot of these large commercial AI models are just serving up everything.
And there have been documented cases of people who have Creative Commons licenses on their software that are not for commercial use, and then they find their exact code being returned as an answer in ChatGPT or in another code-suggestion tool. So I don't know that plagiarism is as big of a problem as making sure that what you do is licensed correctly, and that the AI we're developing in the future is ethical about how it uses it. hugo: Yeah, that makes sense. And it actually dovetails nicely into what I wanted to chat about next. I want to look to the future a bit and think about the future of open science and AI. How do you see open science and AI evolving together, and what are the broader implications for scientific discovery, space [00:41:00] research, and beyond? chelle: I think we're entering a really interesting era. As we start to be more open, we're giving AI more data, and I think that could be a good thing — but I also think we need to be really careful about whose hands it's in and how it's being used. So yes, be open. Yes, license openly, so that these tools can be used to generate knowledge, because that's what we want to do: we want to advance knowledge, and AI does help us do that. And I think that in the future, as we move towards more open and get more knowledge into the open, the answers from these AI models are going to improve a lot, because right now they're based only on what's open. And we know that 70 percent of all scientific research is behind a paywall. We know that 50 percent of all climate research is behind a paywall. So we [00:42:00] have to start opening all of this research up, so that when someone asks a climate question of a generative AI model, it gets a good answer. hugo: I don't particularly want to go here, but —
How do we get open science when there are big publishing houses — for example, one that starts with E and ends in R — essentially gatekeeping huge amounts of collective human knowledge? chelle: Yeah, I think that is really hard, and it's really hard because you can't get grants funded, you can't advance in your career, unless you publish in journals that you have to pay to publish in. And then people have to pay to see your work, unless you pay an open-access fee. Part of the 2022 Nelson memo from the White House is that it now requires all federal [00:43:00] agencies to develop plans so that all of their research publications will be made public — the last version of your paper, before you send it off to the publishers. It's already gone through peer review at that point; the publishers basically format it and publish it, but that last unformatted version we can put on preprint servers. So I think preprint servers are one answer. And I think it's going to be really interesting to see how this plays out in the next few years, because there is a lot of money in that industry, and a lot of power. But we've already seen science shift away from that. We're already seeing YouTube channels; we're already seeing software and data become valued. And the more we can do that — I feel like maybe AI is one of the ways we can start to help [00:44:00] promote research results. Instead of having to go to a specific journal and look for a paper on a topic, you don't have to go to that journal anymore. Most people start with Google now when they're doing their literature reviews, for lack of a better word. And now we can think about: how do I find notebooks on this? How do I find blog posts on this? How do I find what's going on right now in this area? The journal publications are too slow for that.
What I see the early career scientists doing is publishing online right now, as they go, and the paper comes at the end — but they've already built up a whole community of followers. And when you've done that, you don't need to publish in a high-impact journal. You can almost publish anywhere, or just publish online in a Jupyter Book, right? A paper is just a PDF, and a PDF is not as valuable as a reproducible [00:45:00] notebook, at least in my field. hugo: Absolutely. Having said that — I don't know if you know my friend Wolf, who I'm working with currently. He built out mamba, part of the conda ecosystem, and he's got a new project called Pixi, where he's rebuilding a lot of the conda-build stuff in Rust, so it's super fast. Maybe I shared this with you, but he did something called Pixi PDF, which allows you to embed your environment in any PDF, so someone can read the PDF into their terminal and reconstruct the environment as well, which I think is super cool. chelle: So cool. hugo: So these are the types of things — you don't need to be in Nature or Science or Cell (my background's in cell biology, right?) and get your impact factor up to provide serious utility. And your example of Fernando Pérez, among many others, I think is testament to that. But we still don't have the incentive system in academia to support that type of work in the same way, right? chelle: I think it's [00:46:00] changing. Slowly, but I do think it's changing. Because how can you argue with it? If you publish a paper, you might get ten cites, especially if you're early career — and first of all, it takes a year to publish, and then it takes another couple of years to build up citations. It's far faster if you can show that you have a notebook that 500 people have downloaded. That's a big impact right there.
And then your notebook has a DOI, so it's getting cited in their papers. I think that's the faster path to impact. And I think many universities are starting to incorporate that into their evaluations, because it aligns with their values — most universities have values around community participation and scientific impact. hugo: Yeah. It is a longer timeframe, you're right. And I do wonder how much of it is even generational, on that type of timescale, to be honest. chelle: [00:47:00] Yeah. It's just so cool. Who wants an old paper at this point? Just give me the notebook. hugo: Exactly. Exactly. And the blog post announcing NASA's foundation models, as we said, links to Hugging Face, and the existence of a platform like Hugging Face is absolutely amazing for hosting and sharing these types of models in the open as well — particularly as they set up inference endpoints and that type of stuff, so you don't necessarily even need to run things locally yourself. They make it relatively easy — you still need to be somewhat technical to get these things set up, but it's getting easier and easier. I do wonder whether, say we wanted an LLM that we could use to query the history of papers in Science and Nature, that's something those publications will launch themselves, or whether OpenAI will have collaborations and deals with them, which means it's all in ChatGPT. [00:48:00] But then again — as we know, my background's in biology — bioRxiv may create something which just has everything there already. The forces of openness and sharing, I think, are in the long term often quite a lot stronger than the forces of closure and opacity. chelle: Have you heard of ScienceCast? hugo: No, I haven't. chelle: So it's ScienceCast — sciencecast, one word, dot org.
It's this cool new project, which we fund — we started funding this — and it takes papers and helps you basically advertise your paper, because it creates a simple, plain-language interpretation of it. And I love it, because when I come across a paper now that I don't understand, I can go and look there and basically get it into language where I can understand what's going on. They use AI to do [00:49:00] these summaries, and they're remarkably accurate and incredibly useful. hugo: Amazing. I've just shared that in the chat, and I'll share it in the show notes as well. I'm excited to jump in and learn more there. Actually, something I do with ChatGPT and Claude quite a bit is ask it to ELI5 — explain like I'm five — complex stuff, and it does that wonderfully for stuff that it knows about. So in terms of future music, I'm just interested: what are NASA's long-term goals for open science, and how will they shape the future of interdisciplinary collaboration within, but also beyond, the agency? chelle: I think the long-term goals are that we get to an end point where research is more reproducible and we are essentially enabling more breakthroughs. The goal with openness is to create much more robust science — much more rigorous science, and just more science. [00:50:00] I really hope that as we expand participation in science, we expand the budget for science, because with more people paying attention and more results, I'm hoping we start to have more funding for science. Having all of these people working together more efficiently gives you that nice base that you need in order to have more breakthroughs. Breakthroughs don't just happen because there's one person who has some great idea.
It's that there's this really robust community that is trying out all sorts of different variations, and then one of the variations takes off. But you have to have that really robust, active community that's tried out 300 different ideas before the one idea that's somehow different takes off. And I think that open science really helps enable that. NASA has the new information policy — it's moving towards open — and it has Open Science 101 to help train people. And we really hope that in [00:51:00] the next five years, science is just faster. hugo: Awesome. So it's time to wrap up. I am interested, for our audience: what are the key takeaways that you hope they gain from this discussion about open science and measuring impact? chelle: I think that it takes every one of us to make science good — to make science rigorous and reproducible — and that all types of contributions are valid. I think my first contributions were text edits on GitHub, right? We all start someplace, and that's our path into science. And once you do that, you start to meet some of the developers. So I just encourage everyone to start: find a GitHub project that you like and start interacting with that community. That's what my path was with the Pangeo project, and it really did change my life. It changed how I did science, and it ended up changing my job. And I met all these wonderful people. So I really encourage you. [00:52:00] We all love what we do, and we love science, but it's more fun with more people. hugo: Absolutely. And mentioning Pangeo, for example — so much of this is community oriented. Pangeo has so many people. Ryan Abernathey, for example, is such a wonderful, supportive human for so many people in a lot of ways.
And that's what we see with a lot of these projects as well: people become interested because of their professional scientific interests, but then stick around because of the wonderful communities. chelle: Yeah. Yeah. hugo: Exactly. Exactly. And so I'm also going to link once again to Open Science 101 — openscience101.org — and remind everyone that if you complete this course — was it in the month of September? chelle: By September 30th. hugo: By September 30th, you get a bag of NASA swag. So I know what I'm doing as soon as we stop recording. I'd like to thank everyone who joined [00:53:00] and for the questions in the chat. Most of all, I'd like to thank you for your time, wisdom, and generosity, Chelle. It's always fun to chat. chelle: Always great to talk to you. Thanks for the invitation. Take care, and thanks everyone for listening.