Model
Choose a tokenizer
Vocabulary size
—
Tokens produced
—
Special tokens
—
Visualization
Token stream
Run a prompt to see color-coded tokens appear here.
Pick a model, type a prompt, and visualize the token boundaries, IDs, and vocabulary size.
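The token boundaries and IDs that the visualizer displays can be sketched with a greedy longest-match tokenizer. The toy vocabulary and IDs below are invented for illustration; real subword tokenizers (BPE, WordPiece, etc.) learn their vocabularies from a corpus.

```python
# Greedy longest-match tokenizer over a toy, hand-written vocabulary.
# Illustration only; real tokenizers learn merges/pieces from data.
TOY_VOCAB = {
    "<bos>": 0, "<eos>": 1,          # special tokens
    "token": 2, "iz": 3, "ation": 4,
    "a": 5, "e": 6, "i": 7, "n": 8, "o": 9, "t": 10, "z": 11, " ": 12,
}

def tokenize(text: str) -> list[tuple[str, int]]:
    """Return (token, id) pairs by repeatedly taking the longest vocab match."""
    out, i = [], 0
    while i < len(text):
        # Try the longest remaining substring first, shrinking toward one char.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in TOY_VOCAB:
                out.append((piece, TOY_VOCAB[piece]))
                i = j
                break
        else:
            raise ValueError(f"no vocab entry covers {text[i]!r}")
    return out

print(tokenize("tokenization"))
# [('token', 2), ('iz', 3), ('ation', 4)]
```

Each pair is one color-coded token in the stream above: the string shows the boundary, the integer is the ID looked up in the vocabulary.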
After tokenization, each token ID is mapped to an embedding vector that captures learned semantic and positional information. These vectors form the initial input sequence for the transformer stack.
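A minimal sketch of that lookup, assuming learned (here: randomly initialized) token and positional embedding tables and illustrative sizes:

```python
import numpy as np

# Hypothetical sizes for illustration; real models use far larger values.
vocab_size, d_model, max_len = 1000, 16, 64
rng = np.random.default_rng(0)
token_emb = rng.normal(size=(vocab_size, d_model))  # learned token embeddings
pos_emb = rng.normal(size=(max_len, d_model))       # learned positional embeddings

def embed(token_ids):
    """Map token IDs to vectors: token embedding plus positional embedding."""
    ids = np.asarray(token_ids)
    return token_emb[ids] + pos_emb[: len(ids)]

x = embed([5, 42, 7])
print(x.shape)  # (3, 16): one d_model-dimensional vector per input token
```

The resulting `(sequence_length, d_model)` array is the input to the first transformer layer.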
The sequence is processed by repeated transformer layers. Within each layer, self-attention mixes information across positions so each token can borrow context from others, and a position-wise feed-forward network refines every token's representation. Residual connections and normalization help the model stay stable across depth.
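One such layer can be sketched in a few lines of NumPy. This is a simplified single-head version with randomly initialized weights, not any particular model's implementation:

```python
import numpy as np

d_model, d_ff = 16, 64
rng = np.random.default_rng(1)
Wq, Wk, Wv, Wo = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4))
W1 = rng.normal(size=(d_model, d_ff)) * 0.1
W2 = rng.normal(size=(d_ff, d_model)) * 0.1

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def transformer_layer(x):
    # Self-attention: each position mixes in context from every other position.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(d_model)) @ v
    x = layer_norm(x + attn @ Wo)        # residual connection + normalization
    # Position-wise feed-forward network refines each token independently.
    ff = np.maximum(x @ W1, 0) @ W2      # two-layer ReLU MLP
    return layer_norm(x + ff)            # residual connection + normalization

x = rng.normal(size=(3, d_model))        # three token representations
y = transformer_layer(x)
print(y.shape)  # (3, 16): same shape in and out, so layers can be stacked
```

Because the output shape matches the input shape, the full stack is just this function applied repeatedly, each layer with its own weights.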
The final hidden states feed into a projection head, which maps each refined vector to logits over the vocabulary (or to other task-specific outputs); a softmax then turns those logits into probabilities. In autoregressive models, this process repeats for each newly generated token.
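That final projection can be sketched as a single matrix multiply followed by a softmax. The unembedding matrix here is randomly initialized and the sizes are illustrative:

```python
import numpy as np

vocab_size, d_model = 1000, 16
rng = np.random.default_rng(2)
unembed = rng.normal(size=(d_model, vocab_size)) * 0.1  # projection head weights

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def next_token_probs(hidden_states):
    """Project the last position's hidden state to vocabulary probabilities."""
    logits = hidden_states[-1] @ unembed       # (d_model,) @ (d_model, V) -> (V,)
    return softmax(logits)

h = rng.normal(size=(5, d_model))  # final hidden states for a 5-token prompt
p = next_token_probs(h)
print(p.shape, round(float(p.sum()), 6))  # (1000,) 1.0
```

In autoregressive generation, a token is sampled (or argmax-picked) from `p`, appended to the input, and the whole stack runs again to produce the next distribution.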