Embeddings & Tokens (4/5) – an interactive tool for understanding tokenization.
Search the BPE vocabulary: look up token IDs, tokenize text, and discover the most common tokens.
Tokens are what you pay for: API usage is billed per token. Understanding how tokenization works helps you optimize prompts and explains why, for example, German texts typically require more tokens than English ones.
Modern LLMs use Byte Pair Encoding (BPE) vocabularies with 50,000–128,000 tokens. Each token can be a complete word, a subword, or individual characters. Common words are single tokens, while rare words are split into multiple sub-tokens.
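The splitting described above comes from BPE's training procedure: starting from single characters, the algorithm repeatedly merges the most frequent adjacent symbol pair until the vocabulary reaches its target size. The following is a minimal sketch of that merge loop in plain Python, using the classic toy corpus from the original BPE tokenization paper; it is illustrative only, not a production tokenizer.

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Learn BPE merges from a toy corpus (illustrative sketch).
    `words` maps word -> frequency; each word starts as a tuple of
    single characters."""
    vocab = {tuple(w): f for w, f in words.items()}
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace every occurrence of the best pair with one merged symbol.
        new_vocab = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges, vocab

# Toy corpus: word -> frequency.
corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, vocab = bpe_merges(corpus, 10)
print(merges[:3])  # first merges pick up the frequent "es"/"est" endings
```

Real vocabularies are learned the same way, just over gigabytes of text and with tens of thousands of merges, which is why frequent words end up as single tokens while rare ones stay split.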
| ID | Token | Type | Frequency |
|---|---|---|---|