Free AI Tools for Developers

Client-side utilities for working with large language models: count tokens exactly for OpenAI models, visualize how GPT splits your prompt, and estimate context-window usage across Claude, Gemini, and DeepSeek, all without sending your prompt to a server.

Why AI tools should run in the browser

Most online AI tools proxy your prompt through their own server before returning a result. For token counting, prompt templating, or export cleanup, that's unnecessary — and risky. System prompts often contain proprietary instructions, customer data, internal URLs, or API keys embedded in examples. Pasting them into a server-side tool means trusting a third party not to log or train on that content.

Every AI tool on ToolKit runs entirely in your browser. For OpenAI models we load the real tiktoken BPE ranks (o200k_base and cl100k_base) locally and tokenize client-side — counts are exact, identical to what the OpenAI API bills. For Anthropic, Google, and DeepSeek, we fall back to empirical per-tokenizer ratios with adjustments for code and non-Latin scripts, and we label those counts as estimates rather than pretending they're exact.
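A ratio-based estimator of the kind described above can be sketched in a few lines of JavaScript. The constants below are illustrative assumptions for this sketch, not the calibrated per-tokenizer values the tools actually ship:

```javascript
// Sketch of a chars-per-token estimate with adjustments for code and
// non-Latin scripts. All three constants are assumed, illustrative values.
const CHARS_PER_TOKEN = 3.8;  // assumed baseline for English prose
const CODE_FACTOR = 0.85;     // assumption: code yields slightly fewer chars per token
const NON_LATIN_FACTOR = 0.4; // assumption: non-Latin scripts yield far fewer chars per token

function estimateTokens(text, { isCode = false } = {}) {
  // Fraction of characters outside the basic Latin ranges.
  const chars = [...text];
  const nonLatin = chars.filter(ch => ch.codePointAt(0) > 0x024f).length;
  const nonLatinShare = nonLatin / Math.max(chars.length, 1);
  // Blend the baseline ratio toward the non-Latin ratio by script share.
  const charsPerToken =
    CHARS_PER_TOKEN *
    (isCode ? CODE_FACTOR : 1) *
    (1 - nonLatinShare * (1 - NON_LATIN_FACTOR));
  return Math.ceil(chars.length / charsPerToken);
}
```

Under these assumed ratios, 100 characters of Chinese estimate to more than twice the tokens of 100 characters of English, which is the behavior the estimate labels with a ~ prefix in the UI.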

No analytics pings your prompt. No server logs capture your system message. You can verify by opening DevTools → Network tab and confirming zero requests while using the tools.

AI best practices

Count before you send
API pricing is per token. A workflow processing 10k tickets at 4k tokens each consumes 40M tokens; at that volume, the price gap between GPT-4o and GPT-4o mini runs to thousands of dollars per month. Count the prompt template once, then multiply by volume.
Reserve output headroom
The context window covers input + output. GPT-4o has 128k total; if your history is at 120k, the model can generate only 8k before hitting the ceiling. Leave 4–16k for responses depending on use case.
Chat templates cost tokens
ChatML wrappers (<|im_start|>, role, <|im_end|>) add ~3–5 tokens per message before your content. Use Chat mode in the Token Counter to see the real billed count, not just raw text length.
Non-Latin text is expensive
Most tokenizers are English-biased: CJK, Cyrillic, Arabic, and Hindi text often needs 2–3× more tokens per character than English. A 1,000-character Chinese prompt can cost more tokens than 4,000 characters of English.
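The budgeting rules above reduce to simple arithmetic. A minimal sketch, with placeholder per-million-token prices (check current vendor pricing; these are not list prices):

```javascript
// Illustrative budgeting helpers. PRICE_PER_MTOK values are placeholders
// for this sketch, not real vendor prices.
const PRICE_PER_MTOK = { big: 2.5, mini: 0.15 }; // assumed USD per 1M input tokens

// Count once, multiply by volume.
function monthlyCost(tokensPerRequest, requestsPerMonth, pricePerMtok) {
  return (tokensPerRequest * requestsPerMonth / 1e6) * pricePerMtok;
}

// The context window is shared between input and output.
function outputHeadroom(contextWindow, inputTokens) {
  return Math.max(contextWindow - inputTokens, 0);
}

// Chat templates add a small per-message wrapper cost before your content.
function chatOverhead(messageCount, tokensPerWrapper = 4) { // ~3-5 in practice
  return messageCount * tokensPerWrapper;
}

// 10k tickets at 4k tokens each = 40M tokens per month:
const big  = monthlyCost(4000, 10_000, PRICE_PER_MTOK.big);  // 100 (USD, under assumed price)
const mini = monthlyCost(4000, 10_000, PRICE_PER_MTOK.mini); // 6
const room = outputHeadroom(128_000, 120_000);               // 8000 tokens left for output
```

The same arithmetic works for any model once you substitute real prices and context sizes; the point is to run it before committing to a prompt template, not after the first invoice.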

FAQ

Common questions

Are my prompts sent anywhere?

No. These tools run 100% in your browser using WebAssembly and pure JavaScript. Prompts, system messages, and conversation history never leave your device. Verify in DevTools → Network.

Why exact counts only for OpenAI?

OpenAI open-sourced their tokenizer (tiktoken) with the vocabulary files. Anthropic, Google, and DeepSeek keep their tokenizers closed — there is no published client-side library, so we show an empirical estimate with a visible ~ prefix instead of drawing fabricated token boundaries.

How accurate are the estimates for Claude / Gemini / DeepSeek?

Typically ±5–10% of the official tokenizer for English prose, with wider variance on code, minified JSON, and non-Latin scripts. Good enough for cost planning, context-window sanity checks, and rough budgeting. For production billing accuracy, call each vendor's count_tokens endpoint.

Do these tools work offline?

After the page loads once, all tokenization runs locally. The BPE ranks file (~500–700 KB gzipped) is cached on first use, so the tools keep working offline afterwards.

Which models are supported?

GPT-5, GPT-4o, GPT-4o mini, o1, Claude Opus / Sonnet / Haiku 4.x, Gemini 2.5 Pro, Gemini 2.0 Flash, and DeepSeek V3. OpenAI models get exact counts and per-token visualization; the rest use estimates.

More tool categories