XentGame: Help Minimize LLM Surprise!

Link Your Account

Use this link to maintain your identity across devices. Copy and open it on your other device.

Your goal is to write a prefix that most helps an LLM predict the given texts. The more your prefix helps the LLM predict the texts, the higher your score. This mode is the same as the single-text mode, but your final score will be the sum of the scores from each text. The same rules apply: you can use up to 10 tokens (~7 words) and you cannot use any words from the texts. Click the 'About' tab to learn more.

Multi-text Mode Leaderboard

No entries yet
Rank Player Score

Your goal is to write a prefix that most helps an LLM predict the given text. The more your prefix helps the LLM predict the text, the higher your score. Your prefix can be up to 10 tokens (~7 words) long and cannot include any words from the text. Click the 'About' tab to learn more. Don't worry! You can play as many times as you like, and only your top score appears on the leaderboard.

Leaderboard

No entries yet
Rank Player Score

About the Game

There have been 11471 submissions across 34 challenges from 225 players.

Together, we have saved an aggregate of 233223.598 bits of surprise for the LLMs!

How to Play

Enter a prefix of up to ten tokens (words or parts of words) that you think best captures the information in the text, without reusing any of the words that appear in the text. (The multi-text mode is similar, but you write a single prefix that works well for all of the texts; the multi-text score is the sum of the individual scores for each text.) The prefix is then fed to the LLM (GPT-2), which uses it to guess the content of the text. The score shows how much your prefix helps the LLM guess correctly. Be careful with multiple spaces and punctuation, as they can count as tokens!

Scoring System

The score is estimated using the GPT-2 model's cross-entropy ("xent") loss, which measures how "surprised" the model is by the text. The score is the difference in xent loss between the text alone and the text with the prefix added: the higher the score, the less surprised the model is by the text.

More specifically, the score is computed as xent(text) - xent(prefix:text), where xent(text) is the cross-entropy of the original text, and xent(prefix:text) is the cross-entropy of the text with "{prefix}: " prepended.
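As an illustrative sketch of this formula, here is the computation with made-up per-token probabilities (real scores come from GPT-2's actual predictions, not these numbers):

```python
import math

def xent(token_probs):
    """Cross-entropy in bits: sum of -log2(p) over the text's tokens."""
    return sum(-math.log2(p) for p in token_probs)

# Hypothetical probabilities the model assigns each token of the text...
p_text_alone = [0.05, 0.10, 0.20]   # ...with no prefix
p_with_prefix = [0.20, 0.40, 0.25]  # ...with "{prefix}: " prepended

# score = xent(text) - xent(prefix:text); positive means the prefix helped
score = xent(p_text_alone) - xent(p_with_prefix)
print(round(score, 3))
```

Because the prefix makes every token more probable in this toy example, the model's total surprise drops and the score is positive.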

Weekly Challenges

Every week, a new set of texts is selected. These may be brand new texts we are excited for you to try, or they may be some of our favorites recycled from previous challenges.

Given the large combinatorial space of possible prefixes, the game is very open-ended. It is unlikely that the absolute best prefix for a given text has appeared on the leaderboard yet. So don't give up hope of reaching the top!

You'll be able to see all of the answers that score lower than your current best; better solutions become visible once you discover some for yourself.

Bot Players

When you load the page, we store a random session ID in your browser. This ID appears as the three emojis after your username on the leaderboard. We do this simply to allow different players to use the same username without their scores being merged. If you use a different browser or clear your browser data, you will lose your session ID and will have to start over.

You may notice that some entries don't have a session ID at all and have names like "DeepSeekR1". These are mostly bots that we run using models we are interested in testing. They are not meant to be competitive, but rather to provide a baseline against which to compare human performance. If you are interested in the results of a specific model, reach out and we'll be happy to try to provide them.

Tips & Strategies

When you submit a prefix, the score is computed and displayed for each token in the text(s). You can see the score contribution of each token (i.e., xents(text)[token_index] - xents(f"{prefix}: {text}")[token_index]) by hovering over it. Positive scores are in blue and negative scores are in red: blue tokens are less surprising to the LLM given your prefix, and red tokens are more surprising. Your goal is to find a prefix that makes the text as unsurprising as possible (i.e., that makes the total score as high as possible). By clicking on a leaderboard submission, you can see the decomposition of its score and use it to refine your prefix.
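The per-token decomposition above can be sketched as follows, again with made-up probabilities standing in for GPT-2's real outputs:

```python
import math

def token_xents(token_probs):
    """Per-token cross-entropy in bits: -log2(p) for each token."""
    return [-math.log2(p) for p in token_probs]

# Hypothetical per-token probabilities (not real GPT-2 output)
xents_alone = token_xents([0.10, 0.50, 0.25])     # text on its own
xents_prefixed = token_xents([0.40, 0.25, 0.50])  # text after "{prefix}: "

# Contribution of token i: xents(text)[i] - xents(f"{prefix}: {text}")[i]
contributions = [a - b for a, b in zip(xents_alone, xents_prefixed)]

# Positive (blue) tokens became less surprising given the prefix;
# negative (red) tokens became more surprising. The total is your score.
total = sum(contributions)
```

In this toy example the first and third tokens would render blue and the second red, and the total score is the sum of all three contributions.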

Try to find a prefix that is both short and informative about the content of the text.

About XentLabs

Who We Are

This game is built by XentLabs. We are a team of researchers and engineers interested in using LLMs to go beyond the current chatbot paradigm, in particular to reveal interesting patterns in data.

Contact

Did you like the game and want to talk about it?
Did you find a bug or have a suggestion for improvement?
Do you want to contribute texts for the game?

Contact us at xentlabs@gmail.com. We would love to hear from you!

Global Leaderboards!

Total Xent Saved

Rank Player Xent Saved

Total Submissions

Rank Player Number of Submissions

Best Single Submission (highest score across all challenges)

Rank Player Highest Submission