People pretend it's enlightened to not have worked on Kaggle.
Kaggle is a platform that grew out of a machine learning competition website. Competitions are still a huge part of it, but at this point it is also a dataset repository, a course platform, a code repository and a free cloud computing platform.
Why do people hate Kaggle competitions though?
It is very trendy to hate Kaggle.
In Kaggle competitions you get a dataset and a metric to optimize for.
This means some parts of the data science workflow are abstracted away.
- Most of the cleaning of data has been done.
- Finding the right metric has been done.
- The train-test split has been done.
- Ethical work, i.e. privacy of the dataset and evaluation of the task, has been done.
So why do I still think Kaggle gives great insight?
The Opportunities Kaggle gives you
Validation: Pretending a train-test split is enough shows a lack of understanding of machine learning. People rarely win a competition without a thorough cross-validation scheme to ensure generalization.
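To make the validation point concrete, here is a minimal sketch of k-fold cross-validation using only the standard library. The helper names (`k_fold_indices`, `cross_validate`), the mean-predictor "model" and the toy targets are all assumptions made for illustration. In a real competition you would plug in your own model and the competition metric.

```python
import random
from statistics import mean, pstdev


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, valid_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding gives k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, valid_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, valid_idx


def cross_validate(targets, k=5, seed=0):
    """Score a mean-predictor baseline (mean absolute error) on each fold."""
    scores = []
    for train_idx, valid_idx in k_fold_indices(len(targets), k, seed):
        # "Fit" on the training folds only, never on the validation fold.
        prediction = mean(targets[j] for j in train_idx)
        scores.append(mean(abs(targets[j] - prediction) for j in valid_idx))
    return scores


targets = [float(v % 7) for v in range(50)]  # toy data, purely illustrative
fold_scores = cross_validate(targets, k=5)
print(f"MAE per fold: {[round(s, 3) for s in fold_scores]}")
print(f"mean={mean(fold_scores):.3f} spread={pstdev(fold_scores):.3f}")
```

The spread across folds is the useful part: a single train-test split gives you one number and no idea how much it would move on different data.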
Model Optimization: You get an intuition for which models work for which types of problems, and you pick up tricks to make models work better.
Training Tricks: Certain tricks can increase model performance without any changes to the data or model. Label smoothing and test-time augmentation are two examples.
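Both tricks fit in a few lines. The sketch below is framework-free and makes some assumptions: `smooth_labels` uses the common formulation that blends a one-hot target with the uniform distribution, and `toy_predict`, `identity` and `flip` are hypothetical stand-ins for a real model and real augmentations.

```python
def smooth_labels(one_hot, epsilon=0.1):
    """Label smoothing: blend a one-hot target with the uniform distribution."""
    n_classes = len(one_hot)
    return [(1.0 - epsilon) * y + epsilon / n_classes for y in one_hot]


def predict_with_tta(predict, sample, augmentations):
    """Test-time augmentation: average predictions over augmented copies."""
    predictions = [predict(augment(sample)) for augment in augmentations]
    n_views = len(predictions)
    return [sum(column) / n_views for column in zip(*predictions)]


def toy_predict(pixels):
    """Hypothetical two-class model that is sensitive to pixel order."""
    score = pixels[0] / (pixels[0] + pixels[-1])
    return [score, 1.0 - score]


def identity(pixels):
    return pixels


def flip(pixels):
    return pixels[::-1]


smoothed = smooth_labels([0.0, 1.0, 0.0], epsilon=0.1)
tta = predict_with_tta(toy_predict, [3.0, 2.0, 1.0], [identity, flip])
print("smoothed target:", [round(p, 4) for p in smoothed])
print("TTA prediction:", tta)
```

The smoothed target still sums to one, and the averaged prediction no longer depends on which orientation of the sample the model happened to see.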
Data Leakage: You will quickly learn to test the dataset for leakage, either to get an unfair advantage or to ensure generalization. Both are possible.
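One of the simplest leakage tests is to check whether test rows also occur verbatim in the training data. The sketch below covers only that case, and the function name and toy rows are made up for illustration. Real leakage is often subtler (IDs, timestamps, file ordering), so treat this as a first check rather than a complete audit.

```python
def duplicated_test_rows(train_rows, test_rows):
    """Return indices of test rows whose feature values also occur in train."""
    seen = {tuple(row) for row in train_rows}
    return [i for i, row in enumerate(test_rows) if tuple(row) in seen]


# Toy feature rows, purely illustrative.
train = [(5.1, 3.5, "a"), (4.9, 3.0, "b"), (6.2, 2.9, "a")]
test = [(4.9, 3.0, "b"), (7.0, 3.2, "c")]

leaked = duplicated_test_rows(train, test)
print(f"{len(leaked)} of {len(test)} test rows also appear in train: {leaked}")
```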
Golden Feature: Sometimes your dataset has that one feature that, above all other features, will carry you over the finish line.
Additional Data: Most competitions allow the use of additional datasets if disclosed. Finding those and making them useful is an incredible skill.
Shake Up/Consistency: You learn that your hold-out dataset can be very different from your test data, and that ensuring the two are compatible is essential.
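A rough way to check whether hold-out and test data are compatible is to compare feature distributions between the two. The sketch below uses a deliberately simple heuristic, the difference in means scaled by the standard deviation of the combined values. The function names, the threshold of 0.5 and the toy columns are assumptions, not an established recipe.

```python
from statistics import mean, pstdev


def standardized_mean_shift(holdout_values, test_values):
    """Difference in means, in units of the combined standard deviation."""
    combined = pstdev(list(holdout_values) + list(test_values))
    if combined == 0:
        return 0.0  # constant feature, so there is nothing to compare
    return (mean(test_values) - mean(holdout_values)) / combined


def shifted_features(holdout, test, threshold=0.5):
    """Names of features whose absolute shift exceeds the assumed threshold."""
    return [
        name
        for name in holdout
        if abs(standardized_mean_shift(holdout[name], test[name])) > threshold
    ]


# Toy columns, purely illustrative: "income" is shifted, "age" is not.
holdout = {"age": [30, 35, 40, 45], "income": [10, 12, 11, 13]}
test = {"age": [31, 36, 39, 44], "income": [20, 22, 21, 23]}

print("features that differ between hold-out and test:",
      shifted_features(holdout, test))
```

A feature flagged here is a hint that local validation scores may not carry over to the leaderboard, which is exactly what a shake-up looks like from the inside.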
Collaboration: It's very hard to win a competition alone, so most people form teams and start collaborating early, which is a great skill to practice.
Machine learning competitions teach top performers important skills, and those skills should not be underestimated.