Model
Choose a tokenizer
Vocabulary size
—
Tokens produced
—
Special tokens
—
Visualization
Token stream
Run a prompt to see color-coded tokens appear here.
Pick a model, type a prompt, and visualize the token boundaries, IDs, and vocabulary size.
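The token boundaries and IDs that the visualizer displays can be sketched with a greedy longest-match tokenizer. The toy vocabulary and IDs below are invented for illustration; real subword tokenizers (BPE, WordPiece, etc.) learn their vocabularies from a corpus.

```python
# Greedy longest-match tokenizer over a toy, hand-written vocabulary.
# Illustration only; real tokenizers learn merges/pieces from data.
TOY_VOCAB = {
    "<bos>": 0, "<eos>": 1,          # special tokens
    "token": 2, "iz": 3, "ation": 4,
    "a": 5, "e": 6, "i": 7, "n": 8, "o": 9, "t": 10, "z": 11, " ": 12,
}

def tokenize(text: str) -> list[tuple[str, int]]:
    """Return (token, id) pairs by repeatedly taking the longest vocab match."""
    out, i = [], 0
    while i < len(text):
        # Try the longest remaining substring first, shrinking toward one char.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in TOY_VOCAB:
                out.append((piece, TOY_VOCAB[piece]))
                i = j
                break
        else:
            raise ValueError(f"no vocab entry covers {text[i]!r}")
    return out

print(tokenize("tokenization"))
# [('token', 2), ('iz', 3), ('ation', 4)]
```

Each pair is one color-coded token in the stream above: the string shows the boundary, the integer is the ID looked up in the vocabulary.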
After tokenization, each token ID is mapped to an embedding vector that captures learned semantic and positional information. These vectors form the initial input sequence for the transformer stack.
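A minimal sketch of that lookup, assuming learned (here: randomly initialized) token and positional embedding tables and illustrative sizes:

```python
import numpy as np

# Hypothetical sizes for illustration; real models use far larger values.
vocab_size, d_model, max_len = 1000, 16, 64
rng = np.random.default_rng(0)
token_emb = rng.normal(size=(vocab_size, d_model))  # learned token embeddings
pos_emb = rng.normal(size=(max_len, d_model))       # learned positional embeddings

def embed(token_ids):
    """Map token IDs to vectors: token embedding plus positional embedding."""
    ids = np.asarray(token_ids)
    return token_emb[ids] + pos_emb[: len(ids)]

x = embed([5, 42, 7])
print(x.shape)  # (3, 16): one d_model-dimensional vector per input token
```

The resulting `(sequence_length, d_model)` array is the input to the first transformer layer.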
The sequence is processed by repeated transformer layers. Within each layer, self-attention mixes information across positions so each token can borrow context from others, and a position-wise feed-forward network refines every token's representation. Residual connections and normalization help the model stay stable across depth.
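One such layer can be sketched in a few lines of NumPy. This is a simplified single-head version with randomly initialized weights, not any particular model's implementation:

```python
import numpy as np

d_model, d_ff = 16, 64
rng = np.random.default_rng(1)
Wq, Wk, Wv, Wo = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4))
W1 = rng.normal(size=(d_model, d_ff)) * 0.1
W2 = rng.normal(size=(d_ff, d_model)) * 0.1

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def transformer_layer(x):
    # Self-attention: each position mixes in context from every other position.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(d_model)) @ v
    x = layer_norm(x + attn @ Wo)        # residual connection + normalization
    # Position-wise feed-forward network refines each token independently.
    ff = np.maximum(x @ W1, 0) @ W2      # two-layer ReLU MLP
    return layer_norm(x + ff)            # residual connection + normalization

x = rng.normal(size=(3, d_model))        # three token representations
y = transformer_layer(x)
print(y.shape)  # (3, 16): same shape in and out, so layers can be stacked
```

Because the output shape matches the input shape, the full stack is just this function applied repeatedly, each layer with its own weights.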
The final hidden states feed into a projection head, which maps each refined vector to logits over the vocabulary (or to other task-specific outputs); a softmax then turns those logits into probabilities. In autoregressive models, this process repeats for each newly generated token.
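That final projection can be sketched as a single matrix multiply followed by a softmax. The unembedding matrix here is randomly initialized and the sizes are illustrative:

```python
import numpy as np

vocab_size, d_model = 1000, 16
rng = np.random.default_rng(2)
unembed = rng.normal(size=(d_model, vocab_size)) * 0.1  # projection head weights

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def next_token_probs(hidden_states):
    """Project the last position's hidden state to vocabulary probabilities."""
    logits = hidden_states[-1] @ unembed       # (d_model,) @ (d_model, V) -> (V,)
    return softmax(logits)

h = rng.normal(size=(5, d_model))  # final hidden states for a 5-token prompt
p = next_token_probs(h)
print(p.shape, round(float(p.sum()), 6))  # (1000,) 1.0
```

In autoregressive generation, a token is sampled (or argmax-picked) from `p`, appended to the input, and the whole stack runs again to produce the next distribution.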