The final competition leaderboard is now closed. We will soon open the unconstrained leaderboard and release the test set inputs so that you can use your own hardware and provide just predictions to be ranked.
At the end of the challenge, we will release the test data (without labels) for a week. Although this is not part of the challenge (e.g. there is no prize associated), it will allow participants to test their methods on the hardware of their choice, with no limits. We will open a second leaderboard where we will just score the predictions.
The RecSys Challenge 2021 will be organized by Politecnico di Bari, ETH Zürich, and Jönköping University, and the data set will be provided by Twitter. The challenge focuses on a real-world task of tweet engagement prediction in a dynamic environment. For 2021, the challenge considers four different engagement types: Like, Retweet, Quote, and Reply.
This year's challenge brings the problem even closer to Twitter's real recommender systems by introducing latency constraints. We will also increase the data size to encourage novel methods. In addition, the data will be denser in terms of the graph in which users are nodes and interactions are edges.
The goal is twofold: to predict the probability of different engagement types of a target user for a set of tweets based on heterogeneous input data while providing fair recommendations. We are conscious that multi-goal optimization considering accuracy and fairness will be particularly challenging. However, we believe that the recommendation community is nowadays mature enough to face the challenge of providing accurate and, at the same time, fair recommendations.
Fairness in ranking
Historically, ML challenges focus exclusively on accuracy and results are ranked accordingly. A more realistic system should consider other factors too, such as fairness.
When defining fairness for this challenge, these are the things we considered:
A popularity-based metric (with popularity measured as the number of followers of an author) has all the above characteristics. In this scenario, the quality of the recommendations should be independent of the popularity of the authors. Put another way, ideally, users would not be penalized for being less popular on the platform.
Concretely, we divide authors into 5 groups according to their popularity (computed as quantiles of the authors' number of followers in the test set), and we compute RCE and average precision for each group. The final score is the average of the scores across the groups.
Note: The number of rows will not be equal for each popularity cohort since more popular producers have a larger audience which typically translates into more opportunity for incoming engagement.
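The grouping and averaging described above can be sketched as follows. This is a minimal illustration, not the official scoring code: the row layout (author id, follower count, label, prediction), the function names, the tie handling between groups, and the definition of RCE as the percentage improvement in cross entropy over a naive predictor that always outputs the group's positive rate are all assumptions made for this example.

```python
import math
from collections import defaultdict

def cross_entropy(labels, preds, eps=1e-15):
    # Mean binary cross entropy; predictions are clipped to avoid log(0).
    total = 0.0
    for y, p in zip(labels, preds):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)

def rce(labels, preds):
    # Assumed definition: improvement (in percent) over a naive model
    # that always predicts the positive rate of the same rows.
    rate = sum(labels) / len(labels)
    baseline = cross_entropy(labels, [rate] * len(labels))
    return (1.0 - cross_entropy(labels, preds) / baseline) * 100.0

def average_precision(labels, preds):
    # Mean of precision@k taken at the rank of every positive row.
    order = sorted(range(len(labels)), key=lambda i: preds[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def popularity_groups(followers_by_author, n_groups=5):
    # Rank unique authors by follower count and cut the ranking into
    # n_groups equal-sized quantile groups (ties are split arbitrarily).
    ranked = sorted(followers_by_author, key=followers_by_author.get)
    return {a: min(k * n_groups // len(ranked), n_groups - 1)
            for k, a in enumerate(ranked)}

def fair_scores(rows, n_groups=5):
    # rows: iterable of (author_id, author_followers, label, prediction).
    followers = {author: f for author, f, _, _ in rows}
    group_of = popularity_groups(followers, n_groups)
    buckets = defaultdict(lambda: ([], []))
    for author, _, y, p in rows:
        labels, preds = buckets[group_of[author]]
        labels.append(y)
        preds.append(p)
    rces = [rce(l, p) for l, p in buckets.values()]
    aps = [average_precision(l, p) for l, p in buckets.values()]
    # Unweighted mean over groups, so every popularity group counts equally.
    return sum(rces) / len(rces), sum(aps) / len(aps)
```

Because the final score is an unweighted mean over groups, a group with few rows carries the same weight as a group with many rows. In this sketch, that is how low-popularity authors are kept from being drowned out by high-popularity ones.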
As a reminder, fairness is a societal concept, rather than an optimization problem. It comes in many different flavors and we don’t want to suggest that the producer popularity constraint is the only, or even the most important, aspect to consider for serving fair recommendations, but we do feel that it allows us to make an important step forward, with the RecSys Challenge being not just about accuracy.
For a code example of how fairness is factored into the leaderboard, please see the example here.
Participation and Data
Twitter will make available a public dataset of close to 1 billion data points, more than 40 million per day over 28 days. Weeks 1-3 will be used for training and week 4 for evaluation and testing. Each data point contains the tweet along with engagement features, user features, and tweet features.
Participation in this challenge is subject to your acceptance of these Terms & Conditions, and your successful completion of the steps required within, including the registration and approval process with the Twitter Developer Program (to be announced in early March).
The Terms & Conditions already require that all submissions are accompanied by reproducible code, so that we can inspect winning solutions in detail: “Your Submission must include the source code and any related information used to derive the results contained in your Submission. The source code must be released under an open-source license (Apache 2.0). A third party should be able to use your submitted source to regenerate your results.” Furthermore, we explicitly state in our rules that NO de-anonymization or access to data from Twitter users and user behavior from the Twitter API other than that in the challenge dataset is allowed. Enriching the data with other data sources remains possible.
If the organizers discover in a code submission that any of the rules in the Terms & Conditions (explained further above) have been broken, the participant(s) that the submission belongs to will be disqualified from the competition. As mentioned in the Terms and Conditions: “Organizers reserve the right, in their sole discretion, to disqualify any participant who makes a Submission that does not meet the Requirements or is in violation of these Terms.”
Note: the timeline is subject to slight modifications.
Paper Submission Guidelines
To be announced