Entropix Sampling
Oct 2024
An entropy-based sampling approach by xjdr: a per-token sampling strategy for LLMs built from information about the distribution of the logits^1.
Entropy and Varentropy
 Entropy tells you how uncertain the model is, on average, about the next token
 Varentropy tells you how much that uncertainty varies across the possible next tokens (formalized just below)
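In symbols (restating the two bullets above, in bits to match the reference implementation below), for a next-token distribution $p$:

$$H(p) = -\sum_i p_i \log_2 p_i$$
$$\mathrm{VE}(p) = \sum_i p_i \left( \log_2 p_i + H(p) \right)^2$$

That is, varentropy is the variance of the token-level surprisal $-\log_2 p_i$ around its mean $H(p)$.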
Entropix Sampling Approach
The sampler's behavior is determined by the combination of entropy and varentropy of the next-token distribution, which splits into four regimes (a dispatch sketch follows the list).
 (⬇️ entropy, ⬇️ varentropy) High confidence: the model returns the highest-probability token.
 (⬆️ entropy, ⬇️ varentropy) Consistently unsure: it either backspaces and resamples to get back on track, or emits an EOT token to prevent hallucination.
 (⬇️ entropy, ⬆️ varentropy) Confident on multiple paths: it branches out and explores, returning the most confident path.
 (⬆️ entropy, ⬆️ varentropy) Randomness needed: the temperature is raised substantially and top_p is decreased to prevent gibberish.
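Concretely, the quadrant dispatch might look like the sketch below. This is a minimal illustration, not the actual entropix decision logic: the thresholds are hypothetical placeholders, the "backspace" and "branch" behaviors are stubbed with simple temperature sampling, and it reuses calculate_varentropy_logsoftmax from the reference implementation in the next section.

import jax
import jax.numpy as jnp

# Hypothetical thresholds (in bits); real cutoffs would be tuned per model.
ENT_LOW = 0.1
VENT_LOW = 0.1

def entropix_sample(logits, key):
    # Dispatch on the (entropy, varentropy) quadrant for a single
    # (unbatched) logits vector of shape [vocab].
    ent, vent = calculate_varentropy_logsoftmax(logits)
    if ent < ENT_LOW and vent < VENT_LOW:
        # Confident: return the highest-probability token.
        return jnp.argmax(logits, axis=-1)
    if ent >= ENT_LOW and vent < VENT_LOW:
        # Consistently unsure: stand-in for backspace/resample or EOT.
        return jax.random.categorical(key, logits / 0.8, axis=-1)
    if ent < ENT_LOW and vent >= VENT_LOW:
        # Confident on multiple paths: stand-in for branch-and-explore.
        return jax.random.categorical(key, logits / 0.5, axis=-1)
    # High entropy and high varentropy: inject randomness via temperature.
    return jax.random.categorical(key, logits / 1.5, axis=-1)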
Reference Implementation
import math
import jax
import jax.numpy as jnp

LN_2 = math.log(2)  # ln(2), used to convert nats to bits

def calculate_varentropy_logsoftmax(logits):
    log_probs = jax.nn.log_softmax(logits, axis=-1)
    probs = jnp.exp(log_probs)
    entropy = -jnp.sum(probs * log_probs, axis=-1) / LN_2  # note the minus sign; nats -> bits
    varentropy = jnp.sum(probs * (log_probs / LN_2 + entropy[..., None]) ** 2, axis=-1)
    return entropy, varentropy
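A quick smoke test on a toy batch of logits (the values are arbitrary, just to exercise the function):

logits = jnp.array([[2.0, 1.0, 0.5, 0.1]])
entropy, varentropy = calculate_varentropy_logsoftmax(logits)
print(entropy, varentropy)  # per-row entropy (bits) and varentropy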
Footnotes

1. (and attention heads)