Free AI Token Counter & Tokenizer Visualizer

Exact tiktoken counts for GPT-5/4o/o1 with per-token colored visualization, ChatML chat template mode, plus estimates for Claude 4.x, Gemini 2.5, DeepSeek. 100% browser-based.

What is a token?

A token is the smallest unit that a large language model processes. Before a model can read your prompt or generate a response, the text is split into tokens by a component called a tokenizer. Every piece of text you send to GPT, Claude, Gemini, DeepSeek, or any other LLM is first converted into a sequence of integer token IDs — that is what the model actually computes on.

Modern LLMs use Byte-Pair Encoding (BPE) or SentencePiece tokenizers. These algorithms learn a vocabulary of sub-word pieces from training data: common words like "the" or "and" become a single token, while rarer words split into multiple tokens ("tokenization" becomes "token" + "ization"). Numbers, punctuation, and non-Latin scripts often consume more tokens per character than plain English.

A useful mental model: one token is roughly ¾ of an English word, or about 4 characters. But that average hides a lot of variance — which is exactly why it matters to count, not guess.
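
Those two averages can be turned into a quick back-of-envelope estimator. A minimal sketch in Python, where the 4-chars-per-token and ¾-word-per-token ratios are the averages quoted above, not tokenizer output:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English prose, averaging two rules of
    thumb: ~4 characters per token and ~0.75 words per token."""
    by_chars = len(text) / 4             # 1 token ~= 4 characters
    by_words = len(text.split()) / 0.75  # 1 token ~= 3/4 of a word
    return round((by_chars + by_words) / 2)

print(estimate_tokens("Tokenization splits text into subword units."))
```

Treat the result as a planning figure only; real counts depend on the specific tokenizer, which is why the tool runs the exact BPE for OpenAI models.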

Why token counts differ across models

Each model family ships with its own tokenizer, trained on a different corpus and tuned for different trade-offs. The same paragraph can tokenize very differently across vendors:

| Family | Tokenizer | Vocab size | Notes |
| --- | --- | --- | --- |
| GPT-4o / GPT-5 / o1 | o200k_base (BPE) | ~200,000 | Newer OpenAI vocab; stronger on code and multilingual text. |
| GPT-3.5 / GPT-4 | cl100k_base (BPE) | ~100,000 | Legacy OpenAI tokenizer; higher token counts on modern content. |
| Claude 4.x family | Claude SentencePiece (proprietary) | ~65,000 | Anthropic does not publish the tokenizer; counts via API. |
| Gemini 2.x family | Google SentencePiece | ~256,000 | Very efficient on multilingual text; 1 char ≈ 0.25 tokens for English. |
| DeepSeek V3 / R1 | DeepSeek BPE | ~128,000 | Optimized for Chinese + code; rich multilingual coverage. |

A 1,000-word English article is about 1,330 tokens in GPT-4o and about 1,400 in Claude Sonnet — a ~5% difference. On Chinese or Cyrillic text, the gap widens dramatically: some tokenizers are 2–3× more efficient than others on non-Latin scripts.

Chat templates and special tokens

When you call the chat completion API, the provider doesn't just concatenate your messages — it wraps them in a chat template. OpenAI uses the ChatML format, which injects special control tokens around each message: <|im_start|>, role name, newline, content, <|im_end|>. Those markers are real tokens that count against your input budget — typically 3–5 per message, plus overhead.

Switch this tool to Chat (ChatML) mode to see it. The three fields (system, user, assistant) get wrapped into a proper ChatML envelope before tokenization, and the special tokens appear highlighted in red in the visualizer. This is the count the API will actually bill you for — not just the raw content length.
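
The wrapping step can be sketched in a few lines. This is an illustrative composition of the ChatML envelope, not OpenAI's internal implementation; the exact newline placement and trailing assistant header can vary by endpoint:

```python
def to_chatml(messages: list[dict]) -> str:
    """Wrap role/content messages in ChatML control markers, roughly as
    the provider does before tokenizing. Each message adds a handful of
    control tokens (<|im_start|>, role, <|im_end|>) on top of content."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # cue the model to reply
    return "".join(parts)

envelope = to_chatml([
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Count my tokens."},
])
print(envelope)
```

Tokenizing `envelope` instead of the raw message contents is what makes the Chat-mode count match the billed count.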

Why count tokens before sending?

Every production-grade LLM integration needs a token budget. There are three reasons to count before you send:

Cost estimation

API pricing is per token, usually quoted per 1 million tokens input and output separately. A single automated workflow that processes 10,000 support tickets × 4k tokens each is 40M tokens — the difference between GPT-4o and GPT-4o-mini on that workload is thousands of dollars per month.
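
That arithmetic is worth making explicit. A minimal cost sketch; the per-million prices below are placeholders for illustration, not current list prices:

```python
def batch_input_cost(requests: int, tokens_per_request: int,
                     price_per_m_input: float) -> float:
    """Input-token cost for a batch workload, priced per 1M tokens."""
    total_tokens = requests * tokens_per_request
    return total_tokens / 1_000_000 * price_per_m_input

# 10,000 tickets x 4k tokens = 40M input tokens.
expensive = batch_input_cost(10_000, 4_000, price_per_m_input=2.50)
cheap = batch_input_cost(10_000, 4_000, price_per_m_input=0.15)
print(expensive, cheap)  # the gap between the two tiers, per run
```

Swap in the real per-million prices for the models you are comparing; the structure of the calculation is the same.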

Context-window fit

Every model has a hard token ceiling: 128k for GPT-4o, 200k for Claude, 2M for Gemini 2.5 Pro. If your system prompt + chat history + new input exceeds the window, the model rejects the request or silently truncates. Checking before sending prevents mid-pipeline failures.
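
A pre-flight check is a one-liner once you have counts. A sketch, using GPT-4o's 128k window as the default; the reserved-output figure is whatever you pass as `max_output`:

```python
def fits_context(prompt_tokens: int, history_tokens: int,
                 max_output: int, window: int = 128_000) -> bool:
    """True if input plus reserved output fits inside the model's
    context window. 128k default matches GPT-4o; adjust per model."""
    return prompt_tokens + history_tokens + max_output <= window

print(fits_context(2_000, 110_000, 8_000))  # 120k input + 8k output: fits
print(fits_context(2_000, 124_000, 8_000))  # 134k total: rejected
```

Running this before every call turns a silent truncation into an explicit, handleable failure.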

Rate-limit planning

Providers enforce tokens-per-minute and tokens-per-day limits per account tier. Large batch jobs need to estimate throughput in advance to avoid hitting 429 errors during production runs.
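
The throughput estimate is simple division. A sketch, with a hypothetical 2M tokens-per-minute tier; substitute your account's actual limit:

```python
import math

def batch_duration_minutes(total_tokens: int, tpm_limit: int) -> int:
    """Minimum wall-clock minutes to push a batch through a
    tokens-per-minute cap without triggering 429 errors."""
    return math.ceil(total_tokens / tpm_limit)

# 40M tokens through a (hypothetical) 2M-TPM tier:
print(batch_duration_minutes(40_000_000, 2_000_000))  # 20 minutes minimum
```

In practice you also want a safety margin (e.g. target 80% of the cap) so retries and concurrent traffic don't push you over.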

Context window limits per model

The context window is the maximum number of tokens a model can attend to in a single call — it includes your system prompt, prior messages, new input, and the generated response. Here are the ceilings for current frontier models (as of 2026-04):

| Model | Context window | Practical headroom* |
| --- | --- | --- |
| GPT-5 | 400,000 tokens | ~380k input + 20k output |
| GPT-4o / GPT-4o mini | 128,000 tokens | ~120k input + 8k output |
| o1 | 200,000 tokens | ~180k input + 20k output |
| Claude Opus 4.7 | 200,000 tokens | ~190k input + 10k output |
| Claude Sonnet 4.6 | 200,000 tokens | ~190k input + 10k output |
| Claude Haiku 4.5 | 200,000 tokens | ~196k input + 4k output |
| Gemini 2.5 Pro | 2,000,000 tokens | ~1.9M input + 64k output |
| Gemini 2.0 Flash | 1,000,000 tokens | ~990k input + 8k output |
| DeepSeek V3 | 128,000 tokens | ~120k input + 8k output |

* Practical headroom reserves output tokens. Many providers also cap max_tokens on the response independently of the window.

Rules of thumb for estimating

English prose

1 token ≈ 4 characters, or ¾ of a word. 1,000 words ≈ 1,300–1,400 tokens across all modern LLMs.

Code

15–25% more tokens per character than prose. Minified JSON is the worst case — roughly 1 token per 2.5 chars.

Non-Latin scripts

Chinese, Japanese, Korean, Arabic, Cyrillic: 2–3× more tokens per character than English in most tokenizers.

Whitespace

In most BPE tokenizers a single leading space merges into the following word's token, but newlines and runs of tabs or spaces get tokens of their own. Excessive indentation and blank lines cost real money at scale.

Numbers

Long numeric sequences often split per 3-digit group. "1,234,567" can be 3–5 tokens depending on the tokenizer.

Emojis and Unicode

Non-ASCII characters frequently tokenize per byte (UTF-8), so a single emoji can consume 2–4 tokens.
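
The rules of thumb above can be bundled into a per-content-type estimator. The chars-per-token ratios below restate the averages from this section; they are assumptions for planning, not measured tokenizer output:

```python
# Approximate characters-per-token by content type (from the rules above).
CHARS_PER_TOKEN = {
    "english_prose": 4.0,
    "source_code": 3.3,    # ~15-25% more tokens per char than prose
    "minified_json": 2.5,  # worst case cited above
    "cjk_text": 1.5,       # 2-3x more tokens per char than English
}

def estimate(text: str, kind: str = "english_prose") -> int:
    """Heuristic token estimate by content type."""
    return round(len(text) / CHARS_PER_TOKEN[kind])

sample = '{"user":"ada","active":true,"roles":["admin","ops"]}'
print(estimate(sample, "minified_json"), estimate(sample, "english_prose"))
```

The same string scored as minified JSON comes out noticeably higher than as prose, which is the point: content type changes the budget.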

About this tool's accuracy

For OpenAI models (GPT-5, GPT-4o, GPT-4o mini, o1), this tool runs the real tiktoken BPE tokenizer (o200k_base / cl100k_base) locally in your browser via js-tiktoken — counts are exact, and the visualization on the right shows every token boundary, ID, and special marker. The ranks file lazy-loads on first model use.

For Anthropic, Google, and DeepSeek, the count is an empirical estimate (character-to-token ratio per tokenizer family, with adjustments for code signals, non-ASCII content, and word-length distribution) — typically ±5–10% for English prose and modest code, wider on heavy CJK or minified code. Those vendors don't publish a client-side tokenizer, so per-token visualization is disabled for them (showing fabricated splits would mislead). For exact counts in production:

  • OpenAI: tiktoken (Python) or js-tiktoken (Node/browser)
  • Anthropic: client.messages.count_tokens() in the official SDK
  • Google: model.count_tokens() in the Vertex AI / Gemini SDK

FAQ

Is my prompt sent anywhere when I use this tool?

No. The token count is computed entirely in your browser using local JavaScript — your prompt is never transmitted to any server, logged, or stored. Feel free to paste proprietary prompts, system messages, or sensitive data. You can verify by opening DevTools → Network tab and confirming zero network requests while counting.

How accurate is the token count?

For OpenAI models (GPT-5, GPT-4o, GPT-4o mini, o1), the count is exact — we run the real tiktoken library (o200k_base / cl100k_base) locally in your browser, so the per-token visualization matches what OpenAI's API would produce. For Anthropic, Google, and DeepSeek, counts are empirical estimates (±5–10% for English prose, wider for code and non-Latin scripts) because those vendors don't publish a client-side tokenizer. For production cost estimates on Claude/Gemini, call each vendor's token-counting endpoint.

Why can I see tokens highlighted for OpenAI models but not others?

The visualizer runs the real BPE tokenizer in your browser — it needs the model's vocabulary ranks file. OpenAI open-sourced theirs via tiktoken, so we can show you exactly how GPT splits your text into tokens with colored boundaries, IDs, and special-token markers. Anthropic and Google keep their tokenizers closed — for those, we fall back to an empirical byte-ratio estimate and don't draw per-token boundaries (showing made-up splits would be misleading).

What does Chat (ChatML) mode do?

In a real chat completion call, the provider wraps your messages with role tokens before feeding them to the model — OpenAI uses the ChatML format with <|im_start|>system, <|im_start|>user, <|im_end|> markers. Each of those control markers costs tokens (~3–5 per message) on top of your content. Chat mode composes your system/user/assistant fields into the ChatML envelope and tokenizes that, so the count matches what the API actually bills. The red highlights mark special tokens.

Why do different models give different token counts for the same text?

Each model family uses its own tokenizer with its own vocabulary. GPT-4o uses BPE with a 200k-token vocabulary (o200k_base), GPT-3.5/4 uses a 100k-token vocabulary (cl100k_base), Claude uses a proprietary SentencePiece-based tokenizer, and Gemini uses yet another variant. The same word might be one token in GPT-4o and two in an older tokenizer — especially for non-English text, code, and rare words.

How many tokens is 1,000 words?

For English prose, 1,000 words is roughly 1,300–1,400 tokens across most models (GPT-4o: ~1,330 • Claude: ~1,400 • Gemini: ~1,300). Technical writing runs higher, around 1,400–1,600 tokens per 1,000 words because code, punctuation, and technical terms split into more sub-tokens.

How many tokens is 1 MB of text?

A 1 MB plain-text file (about 1,000,000 characters of English) is roughly 250,000–290,000 tokens. Code-heavy files skew toward the higher end because curly braces, operators, and identifiers fragment into multiple tokens. This matters for context-window planning — a 1 MB log file exceeds GPT-4o's 128k window but fits comfortably in Gemini 2.5 Pro's 2M window.
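
The range above follows from the chars-per-token averages used throughout this page. A quick sanity check; the 3.5 chars/token ratio for code-heavy text is an assumed figure, not a measurement:

```python
# 1 MB of plain text is ~1,000,000 characters.
chars = 1_000_000
low = chars / 4.0   # plain English prose: ~250k tokens
high = chars / 3.5  # code-heavy text (assumed ~3.5 chars/token): ~286k
print(low, high)
```

Either end of the range blows past a 128k window, which is why large files need chunking or a long-context model.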

What is a "token" exactly?

A token is the smallest unit a language model processes. Modern tokenizers use subword BPE (Byte-Pair Encoding), which splits text into pieces that are common across the training data: frequent words become single tokens ("the", "and"), rare words split into multiple tokens ("tokenization" → "token" + "ization"), and non-Latin characters often split per byte. One token is roughly ¾ of an English word on average.

Why does code use more tokens than English?

Programming languages are punctuation-heavy and full of identifiers that are rare in natural-language training data. Curly braces, semicolons, angle brackets, and camelCase names fragment into many sub-tokens. Expect 15–25% more tokens per character for typical source code vs English prose. JSON and minified code are the worst offenders — they can hit 1 token per 2.5–3 characters.

Why does non-English text cost more tokens?

Most LLM tokenizers are trained predominantly on English. Languages that share Latin script (Spanish, French, German) tokenize efficiently. Languages with different scripts — Cyrillic, Greek, Arabic, Chinese, Japanese, Korean — often tokenize per byte or per character, which multiplies token counts 2–3×. This is why a 1,000-character Chinese prompt can consume more tokens than a 4,000-character English prompt.

Does the context window include both input and output tokens?

Yes. The context window is the total: your system prompt + conversation history + new input + model output all compete for the same budget. If GPT-4o has a 128k context window and your history is already 120k tokens, the model can only generate 8k tokens before hitting the ceiling. Always reserve headroom for the response — typically 4k–16k tokens depending on use case.

How do I reduce the token count of a prompt?

The biggest wins: (1) remove few-shot examples you no longer need, (2) summarize long conversation history rather than resending it verbatim, (3) strip Markdown formatting from system prompts if the model doesn't need it, (4) use abbreviations in system prompts where clarity allows, (5) move static context to prompt caching (Anthropic) or the OpenAI Batch API for 50–90% cost savings on repeated prefixes.