How to increase citations, ease reviews and facilitate collaboration for ML in applied science

Jesper Dramsch machine learning, reproducibility, SSI, tutorial, read in 4 minutes

Are you a scientist applying ML?

I wrote a tutorial with ready-to-use notebooks to make your life easier!

Let's focus on 3 aspects:

More Citations
Easier Review
Better Collaboration

This was a EuroScipy tutorial in 2022.

In the future, a talk recording will be available. Until then, you can get the notebooks here.

📐 Model Evaluation

In science, we want to describe the world.

Overfitting gets in the way of this.

With real-world data, there are many ways to overfit, even if we use a random split and have a validation and test set!

A machine learning model that isn't evaluated correctly is not a scientific result.

This leads to desk rejections, tons of extra work, or in the worst case retractions and becoming the "bad example".

Especially on:

Time Data
Spatial Data
Spatiotemporal Data
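
To make this concrete, here is a minimal sketch of splits that respect time and space, using scikit-learn's `TimeSeriesSplit` and `GroupKFold`. The toy array `X` and the `sites` labels are made up for illustration; swap in your own data and grouping variable.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, TimeSeriesSplit

# Hypothetical data: 12 time-ordered samples taken at 4 measurement sites.
X = np.arange(12).reshape(-1, 1)
sites = np.repeat([0, 1, 2, 3], 3)

# Time data: every fold trains on the past and validates on the future.
time_folds = list(TimeSeriesSplit(n_splits=3).split(X))
for train_idx, test_idx in time_folds:
    print("time: train up to index", train_idx.max(), "-> test", test_idx)

# Spatial data: all samples from one site stay together in the same fold,
# so the model is always tested on sites it has never seen.
space_folds = list(GroupKFold(n_splits=4).split(X, groups=sites))
for train_idx, test_idx in space_folds:
    print("space: held-out sites", np.unique(sites[test_idx]))
```

A plain random split would mix neighbouring time steps or locations across train and test, which is one of the ways real-world data lets a model look better than it is.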

Here's the Notebook euroscipy-tutorial-0-Basic-Data-Prep-and-Model

🔬 Benchmarking

Compare your models using the right metrics and benchmarks.

Here are great examples:

DummyClassifiers
Benchmark Datasets
Domain Methods
Linear Models
Random Forests

Always ground your model in the reality of science!

Metrics on their own don't paint a full picture.

Use benchmarks to tell a story of "how well your model should be doing" and disarm comments by Reviewer 2 before they're even written.
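
As a rough sketch of what that story can look like, the snippet below scores a `DummyClassifier`, a linear model and a random forest on the same cross-validation folds. The data is a synthetic stand-in from `make_classification`, and the model settings are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic, i.i.d. stand-in for your own dataset (assumption for this sketch).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=42
)

models = {
    "dummy": DummyClassifier(strategy="most_frequent"),
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Same data, same folds, same metric (accuracy) for every model.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name:>20}: {score:.3f}")
```

If your fancy model barely beats the dummy or the linear model, that is worth knowing before Reviewer 2 points it out.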

Here's the Notebook

🤝 Model Sharing

Sharing models is great for reproducibility and collaboration.

Export your models and fix the random seed for paper submissions.

Share your dependencies in a requirements.txt or env.yml so other researchers can use & cite your work!
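
Here is a minimal sketch of fixing the seed and exporting a model, assuming a scikit-learn estimator saved with `joblib`. The `SEED` value, the file name and the temporary folder are placeholders.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

SEED = 42  # arbitrary value; what matters is that it is fixed and reported

# Synthetic stand-in data (assumption for this sketch).
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=SEED)
model = RandomForestRegressor(n_estimators=50, random_state=SEED).fit(X, y)

# Export the fitted model to a file you can upload alongside the paper.
model_path = Path(tempfile.mkdtemp()) / "model.joblib"
joblib.dump(model, model_path)

# Reload it and check that it reproduces the original predictions.
reloaded = joblib.load(model_path)
same = np.allclose(model.predict(X), reloaded.predict(X))
print("reloaded model reproduces predictions:", same)
```

Saved models like this are tied to the library versions that created them, which is one more reason to ship the dependency file with the model.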

Good code is easy to use and cite!

Use these libraries:

flake8 for linting
black for formatting

Write docstrings for docs! (VS Code has a fantastic extension called autoDocstring)

Provide a Docker container for ultimate reproducibility.

Your peers will thank you.

Here's the Notebook

⚗️ Testing

I know code testing in science is hard.

Here are ways that make it incredibly easy:

Doctests for small examples
Data Tests for important samples
Deterministic tests for methods
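
The three ideas above can be sketched in a few lines. The functions `normalize` and `noisy_sample` are hypothetical examples, not taken from the notebooks. In a real project you would usually let `pytest` or `python -m doctest` collect these tests; the explicit runner below just keeps the sketch self-contained.

```python
import doctest

import numpy as np


def normalize(values):
    """Scale values linearly to the range [0, 1].

    >>> normalize([2.0, 4.0, 6.0]).tolist()
    [0.0, 0.5, 1.0]
    """
    values = np.asarray(values, dtype=float)
    return (values - values.min()) / (values.max() - values.min())


def noisy_sample(seed):
    """Draw three random numbers from a seeded generator."""
    return np.random.default_rng(seed).normal(size=3)


def test_known_sample():
    # Data test: an important sample with a known answer.
    assert normalize([10.0, 20.0]).tolist() == [0.0, 1.0]


def test_deterministic():
    # Deterministic test: the same seed must give the same result.
    assert np.array_equal(noisy_sample(seed=0), noisy_sample(seed=0))


# Doctest: run the small example embedded in the docstring of normalize.
runner = doctest.DocTestRunner(verbose=False)
finder = doctest.DocTestFinder()
for test in finder.find(normalize, "normalize", globs={"normalize": normalize}):
    runner.run(test)
doctest_results = runner.summarize(verbose=False)

test_known_sample()
test_deterministic()
print("doctest failures:", doctest_results.failed)
```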

You can make your own life and that of collaborators 1000 times easier!

Use Input Validation.

Pandera is a nice little tool that lets you define what your input data should look like. Think:

Data Ranges
Data Types
Category Names

It's honestly a game changer and easy!

Here's the Notebook

🧠 Interpretability

This is a great communication tool for papers and meetings with domain scientists!

No one cares about your mean squared error!

How does the prediction depend on changing your input values?!

What features are important?!

Here's the Notebook

✂️ Ablation Studies

You know it. I know it.

Data science is trying a lot and finding what works. It's iterative!

Use ablation studies to switch off components in your solution to evaluate the effect on the final score!
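
One simple form of this is switching off groups of input features. The sketch below is a toy example with two invented feature groups, `physics` and `metadata`, where only the first one drives the target. The same loop works for preprocessing steps or model components.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic data (assumption): only the "physics" group drives the target.
rng = np.random.default_rng(0)
n = 300
feature_groups = {
    "physics": rng.normal(size=(n, 2)),
    "metadata": rng.normal(size=(n, 2)),
}
y = feature_groups["physics"] @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=n)


def score(groups):
    """Mean cross-validated R^2 using only the listed feature groups."""
    X = np.hstack([feature_groups[g] for g in groups])
    return cross_val_score(Ridge(alpha=1.0), X, y, cv=5, scoring="r2").mean()


full = score(list(feature_groups))

# Switch off one group at a time and record how much the score drops.
ablation = {
    group: full - score([g for g in feature_groups if g != group])
    for group in feature_groups
}
print(f"full model R^2: {full:.3f}")
for group, drop in ablation.items():
    print(f"without {group:>8}: score drops by {drop:.3f}")
```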

This kind of care looks great in a paper!

Here's the Notebook

Conclusion

We looked at 6 ready-to-use notebooks to make your life easier.

This resource is for you to steal and make better science.

Each tool makes it more likely for

Your results to go through review
Others to use and cite your stuff
The code fairy to smile upon you

Frequently Asked Questions

Do you have actionable advice in addition to this blog post?

Check out the notebooks on Github that have code, links, and extra descriptions that can be used to get started right away!

I want to be a champion for sustainable software, any advice?

Yes! I am a 2022 fellow of the Software Sustainability Institute. Check out their information to get support, community and funding each year! I also wrote a blog post for them called "Would I even fit in?" for those wondering if they'd be good enough to even apply.

Is there any more information you have on these topics?

I do! I wrote a small ebook about making machine learning work in the real world. You can grab a copy as a freebie for signing up for my newsletter!

Your background must be computer science! How did you get into this topic?

In fact, my background is in geophysics. But I talked to a lot of people smarter than me and distilled their knowledge down into good practices for scientists that would like to further their field using machine learning.

How did you afford going to Switzerland in a pandemic?

I got funding to give a tutorial through the Software Sustainability Institute's fellowship programme.

© 2024 Jesper Dramsch All rights reserved.