How we Organized the Largest Machine Learning Competition of the Year

Jun 6, 2018

[EDIT] The challenge is now available for everyone here.

Last Thursday, the second annual edition of “Le Meilleur Data Scientist de France” took place: a two-hour, Kaggle-like machine learning challenge.

In a nutshell: 350 data scientists in the same location, competing in the same challenge, on the same evening.

As far as we know, this is the largest gathering of data scientists in the world in this format (tell us if you have heard of something similar).

It took FrenchData, an initiative promoting outstanding French data projects, six months to organize this event. Several members of the Zelros team were part of the adventure. Here are our takeaways, and what we believe are the reasons for the success of this 2018 edition!

A stunning venue

Hosting 350 people for a data science challenge in one location requires a large, flat, open space. If you think about it, few venues qualify.

Fortunately, we found the perfect spot in Paris: Station F. This massive building is a former rail depot, now renovated and turned into the world’s biggest startup campus (by the way, that is where we are based, which made logistics easier).

Nowhere else would have been better!

A great dataset, with a purpose

Carefully choosing the dataset is critical. Make it too simple, or too complex, and the challenge would have been disappointing. Finding a good balance is an art, so that beginners can learn and seasoned practitioners still have a hard time :)

We tailored the dataset to include a variety of variable types: numerical, categorical, text, and even images! Ten volunteer data scientists helped us benchmark the challenge and validate that it was suited to a two-hour competition.

Even more importantly, we selected a dataset in a domain that serves the common good. We helped Emmaus, an international solidarity movement founded in Paris in 1949 to fight poverty and homelessness.

More specifically, the challenge was about predicting how used objects sell on their charity online marketplace, and understanding the factors that drive the popularity of an item offered for sale.

Maud Sarda (Label Emmaus) explaining how tonight’s challenge will be impactful for the solidarity movement

A robust challenge platform

To host the challenge and the competitors' submissions, we needed a platform able to withstand very high traffic and load. We had a look at various open source platforms like crowdAI or EvalAI or Submission, but none of them could support 350 data scientists at the same time.

So we decided to develop QScore, a simple cloud platform that supports a large number of simultaneous users during an event.

We took the opportunity to add new exclusive capabilities like:

submitting predictions to the platform directly from a Python or R dataframe (through an API, avoiding the need to create and upload a CSV file)
forcing a 5-minute delay between two submissions. It was an alternative way to avoid leaderboard overfitting, instead of the traditional approach of capping the number of submissions
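To make the second capability concrete, here is a minimal sketch of a per-user cooldown check. It is an illustration only, not QScore's actual code: the `SubmissionGate` class, its method names, and the use of plain timestamps in seconds are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict

COOLDOWN_SECONDS = 5 * 60  # the 5-minute delay between two submissions


@dataclass
class SubmissionGate:
    """Hypothetical server-side check: one accepted submission per user per cooldown."""

    cooldown: float = COOLDOWN_SECONDS
    _last_accepted: Dict[str, float] = field(default_factory=dict)

    def seconds_to_wait(self, user: str, now: float) -> float:
        """Return how long the user must still wait (0.0 if they may submit)."""
        last = self._last_accepted.get(user)
        if last is None:
            return 0.0
        return max(0.0, self.cooldown - (now - last))

    def try_submit(self, user: str, now: float) -> bool:
        """Accept the submission and record its time, or reject it."""
        if self.seconds_to_wait(user, now) > 0:
            return False
        self._last_accepted[user] = now
        return True


gate = SubmissionGate()
first = gate.try_submit("alice", now=0.0)         # accepted: first submission
second = gate.try_submit("alice", now=120.0)      # rejected: only 2 minutes later
wait = gate.seconds_to_wait("alice", now=120.0)   # 180.0 seconds remaining
third = gate.try_submit("alice", now=300.0)       # accepted: cooldown has elapsed
other = gate.try_submit("bob", now=10.0)          # accepted: users are independent
```

Over a two-hour event, a 5-minute cooldown limits each participant to about 24 scored submissions, which bounds how much feedback anyone can extract from the leaderboard without having to track a submission quota.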

QScore, the data science platform specially developed for the competition

Bulletproof logistics

We had to set up 350 workstations from scratch in a few hours. Around 10 people from a highly experienced service provider made it possible thanks to their professionalism.

We also carefully sized two vital components: wifi and electricity. Each participant had a dedicated power plug and a stable wifi connection of 2.5 Mbit/s.
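As a quick sanity check on that sizing, the per-participant figure can be multiplied out to an aggregate. This assumes the worst case, in which all 350 participants use their full 2.5 Mbit/s at the same moment. It is a back-of-the-envelope illustration, not necessarily how the network was actually dimensioned.

```python
PARTICIPANTS = 350
PER_PARTICIPANT_MBIT_S = 2.5  # per-participant rate quoted in the article

# Worst case (assumption): every participant saturates their share at once.
aggregate_mbit_s = PARTICIPANTS * PER_PARTICIPANT_MBIT_S

print(f"Aggregate wifi demand: {aggregate_mbit_s:.0f} Mbit/s")  # 875 Mbit/s
```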

Installing the tables and chairs for 350 data scientists

Next, registering and seating 350 people in 30 minutes was a real challenge. To do so, we placed 25 staff members all along the participants' path to streamline the onboarding. In the end, everyone was inside in 20 minutes :)

Last briefing of the organizing staff before the doors open

Registering the 350 participants in 30 minutes

Committed partners

All this needs funding (a budget of several tens of thousands of euros). It would not have been possible without the help of sponsors and partners: Maif, Air Liquide, Engie and Latitudes.

In addition, Microsoft, our main partner, provided free Azure Machine Learning credits and turnkey tutorial notebooks to the participants. It was a way to put everyone on an equal footing, regardless of how much money they could spend on computing resources, and to avoid local installation issues.

And the winner is …

Nikita Loukachev won the first prize of the challenge. Huge congrats to him: the level of competition was very high, much higher than in the previous edition. He went home with a Devialet Phantom speaker!

Congrats also to the data scientists from prevision.io who reached the podium.

Several participants have already open-sourced their solutions to the challenge, like Hakim, Mamy or Kalli. This contributes to the spirit of transparency and openness of this yearly initiative.

It has been an incredible experience to set up this unique event. We are now thinking about the next steps to make the 2019 edition even better!