4 Tools Kaggle Grandmasters use to win $100,000s in competitions

Expertise is figuring out what works and what doesn't.

Why not let the experts tell you?

Rather than experiment from the ground up for a decade!

  1. Pseudolabelling
  2. Hard Negative Mining
  3. Finishing Training Unaugmented
  4. Test-time augmentation

🎨 Pseudolabelling

Some competitions don't have a lot of data.

To create pseudo labels, first build a good model on the training data.

Then predict on the unlabelled public test data.

Finally, add the high-confidence predictions to your training data as if they were real labels, and retrain!
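
Here is a minimal sketch of those three steps, using NumPy only. The nearest-centroid "model", the `threshold` value, and every function name are assumptions made for illustration, not what any particular Grandmaster used.

```python
import numpy as np

def fit_centroids(X, y):
    """Toy model: one mean vector (centroid) per class."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict_proba(model, X):
    """Softmax over negative distances to each class centroid."""
    _, centroids = model
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    logits = -dists
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

def pseudo_label(X_train, y_train, X_test, threshold=0.9):
    # 1. Build a model on the labelled training data.
    model = fit_centroids(X_train, y_train)
    # 2. Predict on the unlabelled test data.
    proba = predict_proba(model, X_test)
    confident = proba.max(axis=1) >= threshold
    pseudo_y = model[0][proba.argmax(axis=1)]
    # 3. Keep only the confident predictions as extra training data and refit.
    X_aug = np.concatenate([X_train, X_test[confident]])
    y_aug = np.concatenate([y_train, pseudo_y[confident]])
    return fit_centroids(X_aug, y_aug), int(confident.sum())
```

The threshold is the knob to watch: set it too low and the model starts training on its own mistakes.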

📉 Hard Negative Mining

This works best on classifiers with a binary outcome.

The core idea:

  1. Take misclassified samples from your training data.
  2. Retrain the model on this data specifically.

Often this means collecting the false positives and adding them back to the training set as explicit negative examples.
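
The two steps above can be sketched roughly like this. The toy logistic regression, the `boost` weight, and all function names are hypothetical. "Retrain on this data" is read here as retraining on the full set with the misclassified samples upweighted, which is only one of several possible interpretations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, sample_weight=None, epochs=500, lr=0.5):
    """Toy binary logistic regression trained by full-batch gradient descent."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    sw = np.ones(len(X)) if sample_weight is None else sample_weight
    for _ in range(epochs):
        grad = Xb.T @ (sw * (sigmoid(Xb @ w) - y)) / sw.sum()
        w -= lr * grad
    return w

def predict(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (sigmoid(Xb @ w) >= 0.5).astype(int)

def hard_example_mask(w, X, y, false_positives_only=False):
    """Boolean mask of training samples the current model gets wrong."""
    pred = predict(w, X)
    wrong = pred != y
    if false_positives_only:
        wrong &= (pred == 1) & (y == 0)
    return wrong

def retrain_on_hard_examples(X, y, boost=5.0, **mask_options):
    # 1. Train once and find the misclassified training samples.
    w = fit_logreg(X, y)
    hard = hard_example_mask(w, X, y, **mask_options)
    # 2. Retrain with extra weight on exactly those samples (assumed variant).
    weights = np.where(hard, boost, 1.0)
    return fit_logreg(X, y, sample_weight=weights), hard
```

Pass `false_positives_only=True` to focus the second round on false positives alone.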

🏁 Finish training unaugmented

Data augmentation is a way to artificially create more data by slightly altering the existing data.

This trains the ML model to recognize more variance in the data.

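Turning augmentation off for the last few training epochs often increases accuracy. The usual explanation is that the model finishes on data that looks like what it will see at test time.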
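
A rough sketch of such a schedule is below. The Gaussian-noise `augment` function, the linear model, and the `clean_epochs` parameter are stand-ins chosen for brevity; a real pipeline would plug in its own augmentations and network.

```python
import numpy as np

def augment(X, rng, noise=0.1):
    """Hypothetical augmentation: add small Gaussian noise to each sample."""
    return X + rng.normal(0.0, noise, size=X.shape)

def train(X, y, epochs=50, clean_epochs=5, lr=0.1, seed=0):
    """Linear regression by gradient descent.

    The final `clean_epochs` epochs use the original, unaugmented data.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    schedule = []  # records whether each epoch used augmentation
    for epoch in range(epochs):
        use_aug = epoch < epochs - clean_epochs
        X_epoch = augment(X, rng) if use_aug else X
        grad = X_epoch.T @ (X_epoch @ w - y) / len(X)
        w -= lr * grad
        schedule.append(use_aug)
    return w, schedule
```

How many clean epochs to use is a tuning choice; the value of 5 here is arbitrary.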

🔃 Test-Time Augmentation (TTA)

Augmentation during training? Classic.

How about augmenting your data during testing though?

You can create several augmented versions of each test sample.

Predict on all of them and then average your model's predictions!
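
As a minimal sketch, assuming image batches shaped `(N, H, W)` and a horizontal flip as the only extra view: `tta_predict`, `AUGMENTATIONS`, and `toy_model` are made-up names, and `toy_model` merely stands in for a trained network.

```python
import numpy as np

def tta_predict(predict_fn, images, augmentations):
    """Average the model's predictions over several augmented views."""
    preds = [predict_fn(aug(images)) for aug in augmentations]
    return np.mean(preds, axis=0)

# Label-preserving views of a batch shaped (N, H, W).
AUGMENTATIONS = [
    lambda x: x,               # original image
    lambda x: x[:, :, ::-1],   # horizontal flip
]

def toy_model(images):
    """Stand-in for a trained model: scores each image by its left column."""
    return images[:, :, 0].mean(axis=1)

images = np.array([[[0.0, 1.0],
                    [0.0, 1.0]]])  # one 2x2 image
averaged = tta_predict(toy_model, images, AUGMENTATIONS)
```

Only use augmentations that keep the label intact. Flipping a photo of a cat is fine; flipping a digit may not be.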


Kaggle can teach you some sweet tricks for your machine learning endeavours.

This article was about these four:

  • Create extra training data
  • Train on bad samples
  • Top off training with original data
  • Test on an ensemble of your data