Why Subset by Language
Modern web fonts ship with broad Unicode coverage by default: many include Latin, Latin Extended, Cyrillic, Greek, and Vietnamese in a single file. For an English-only site, that's 60-80% wasted bandwidth. For a Chinese site, the calculus inverts: you can't fit the full character set in one practical download, so subsetting becomes a delivery architecture problem, not just a size optimization.
| Script | Approx. Glyphs | Full Font Size | Subset Strategy |
|---|---|---|---|
| Latin Basic | ~95 | 100-200 KB | Single subset, 30 KB |
| Cyrillic | ~250 | 200-400 KB | Single subset, 50-80 KB |
| Arabic | ~1,000 | 300-600 KB | Single subset, 80-150 KB |
| Chinese (Simplified) | 3,500-30,000 | 3-30 MB | Frequency-band partitioning |
| Japanese | 7,000-15,000 | 3-20 MB | Joyo + Hiragana/Katakana subset |
| Korean | 11,172 syllables | 2-10 MB | KS X 1001 subset (2,350 chars) |
The main lever is the CSS unicode-range descriptor inside @font-face. It tells the browser to download a particular subset only when the page actually contains characters in that range. For multilingual sites this turns "serve everything to everyone" into "serve only what's rendered." For CJK sites it makes progressive font loading possible in the first place.
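As a rough illustration of the idea, the sketch below generates one @font-face rule per subset file. The family name, file paths, and the `font_face_rule` helper are hypothetical and only show the shape of the CSS; they are not the output of any particular tool.

```python
def font_face_rule(family: str, url: str, unicode_range: str) -> str:
    """Return one @font-face rule the browser fetches only for characters in unicode_range."""
    return (
        "@font-face {\n"
        f'  font-family: "{family}";\n'
        f'  src: url("{url}") format("woff2");\n'
        f"  unicode-range: {unicode_range};\n"
        "}"
    )


# Hypothetical subset files, keyed by URL; the ranges are the block ranges used on this page.
SUBSETS = {
    "/fonts/example-latin.woff2": "U+0000-007F",
    "/fonts/example-cyrillic.woff2": "U+0400-04FF",
}

# Every rule shares one family name, so the page can use a single font-family value.
css = "\n\n".join(
    font_face_rule("Example Sans", url, rng) for url, rng in SUBSETS.items()
)
print(css)
```

Because both rules declare the same family, a page with only English text should trigger just the Latin download, while a page that also contains Russian text triggers both.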
How to Subset Fonts
The workflow is identical across all scripts. The differences lie in which presets you select and which Unicode ranges you include; those details follow per script below.
Open the Font Subsetter
Visit our font subsetter tool. It runs in the browser, needs no installation, and processes fonts entirely in memory.
Upload your font
Drag and drop TTF, OTF, WOFF, or WOFF2. The tool analyzes the file and reports which scripts the font supports and its current glyph count.
Pick presets or Unicode ranges
Choose from script-specific presets (Latin, Cyrillic, Arabic, CJK) or specify custom unicode-range values directly for fine-grained control.
Add common characters
Numbers, punctuation, and currency symbols are usually needed regardless of script. Most quality fonts include these in the script's own range.
Generate and download
Click subset. The tool produces an optimized TTF/OTF/WOFF2 file with only the requested glyphs. Verify the size reduction matches expectations.
Convert to WOFF2 if not already
After subsetting, convert to WOFF2 for an additional 20-30% reduction via Brotli compression. Use our converter if your subsetter outputs TTF.
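If you prefer a command-line route over the browser tool, the same steps can be approximated with pyftsubset from fontTools. The sketch below only builds the command line; the font file names are hypothetical, and writing WOFF2 with pyftsubset assumes the Brotli Python module is installed alongside fontTools.

```python
import shlex


def pyftsubset_command(font_path: str, unicode_ranges: list[str], output_path: str) -> list[str]:
    """Build a pyftsubset invocation that subsets and writes WOFF2 in one pass."""
    return [
        "pyftsubset",
        font_path,
        "--unicodes=" + ",".join(unicode_ranges),  # comma-separated hex code points or ranges
        "--flavor=woff2",                          # compress the output as WOFF2
        "--output-file=" + output_path,
    ]


# Hypothetical input and output names, for illustration only.
cmd = pyftsubset_command(
    "Example-Regular.ttf",
    ["U+0000-007F", "U+00A0-00FF"],
    "example-latin.woff2",
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell or passed to `subprocess.run(cmd, check=True)` in a build script.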
Tools you'll use
- Font Subsetter, for the actual subsetting operation
- Unicode Range Generator, for CSS unicode-range descriptors
- Webfont Generator, to wrap subset output in WOFF2 with @font-face CSS
- Font Analyzer, to confirm which scripts a font already covers
Latin
The simplest case. Latin Basic (A-Z, a-z, 0-9, common punctuation, basic symbols) covers English and produces dramatic reductions, often 70-90% smaller than the source font. Latin Extended adds accented characters needed for most European languages (French, German, Spanish, Polish, Czech, Portuguese, Italian, Scandinavian languages).
Coverage Tiers
| Subset | Unicode Range | Languages Covered |
|---|---|---|
| Basic Latin | U+0000-007F | English (ASCII) |
| Latin-1 Supplement | U+0080-00FF | Western European (French, German, Spanish, Italian, Portuguese) |
| Latin Extended-A | U+0100-017F | Central European (Polish, Czech, Hungarian, Croatian) |
| Latin Extended-B | U+0180-024F | Romanian, Welsh, Vietnamese partial |
| Latin Extended Additional | U+1E00-1EFF | Vietnamese, additional diacritics |
Expected Size Reductions
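Practical default for English-only: include Basic Latin + numbers + punctuation + currency. Skip Latin Extended unless your content contains non-English text. Typographic punctuation needs its own entries: em dashes (U+2014), ellipses (…, U+2026), and smart quotes (‘ ’ “ ”, U+2018, U+2019, U+201C, U+201D) live in General Punctuation (U+2000-206F), not in any Latin block, and the euro sign (U+20AC) sits in Currency Symbols (U+20A0-20CF).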
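A quick way to find out what an "English-only" page really needs is to list every character outside Basic Latin together with its code point. The helper below is a minimal sketch using only the standard library; the function name and sample sentence are made up for illustration.

```python
import unicodedata


def outside_basic_latin(text: str) -> dict[str, str]:
    """Map each character above U+007F to its Unicode name, keyed by code point."""
    found = {}
    for ch in sorted(set(text)):
        if ord(ch) > 0x7F:
            found[f"U+{ord(ch):04X}"] = unicodedata.name(ch, "UNNAMED")
    return found


# Curly quotes, an em dash, and an ellipsis, written as escapes to keep the source ASCII.
sample = "\u201cSubset early\u201d \u2014 then measure\u2026"
for code_point, name in outside_basic_latin(sample).items():
    print(code_point, name)
```

Any code point this reports has to be added to the subset explicitly, or the browser falls back to another font for that character.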
Cyrillic
Cyrillic covers Russian, Ukrainian, Belarusian, Bulgarian, Serbian, Macedonian, and numerous minority languages across Eastern Europe and Central Asia. Single-language subsets work well: Russian alone needs 33 letters and produces 60-70% size reductions from a typical multi-script font.
Cyrillic Languages
| Language | Letters | Unique Characters |
|---|---|---|
| Russian | 33 | Standard Cyrillic base |
| Ukrainian | 33 | ґ є і ї (unique to Ukrainian) |
| Bulgarian | 30 | Specific letter forms (Bulgarian localization) |
| Serbian (Cyrillic) | 30 | ђ ј љ њ ћ џ |
| Belarusian | 32 | ў (short u) |
Unicode Ranges
/* Basic Cyrillic, covers all major Slavic languages */
U+0400-04FF
/* Cyrillic Supplement */
U+0500-052F
/* Cyrillic Extended-A, historical and minority languages */
U+2DE0-2DFF
/* Cyrillic Extended-B, additional historical chars */
U+A640-A69F
Tips
- If you also display English on a Cyrillic site, include Basic Latin (U+0000-007F); most quality Cyrillic fonts already bundle it
- Bulgarian forms: Bulgarian uses different glyph shapes for some letters. Quality fonts provide them through OpenType localized forms (the locl feature for the BGR language system). Check the font's documentation
- Ukrainian-specific: ensure ґ є і ї are included; they sit in U+0400-04FF, but some restrictive subsets miss them
- Recommended fonts: Roboto, Open Sans, Inter, Noto Sans, PT Sans all have good multi-language Cyrillic coverage
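When a whole-block range is wider than you need, you can derive a tighter unicode-range value from the exact letters a language uses. The sketch below merges consecutive code points into ranges; `to_unicode_range` is a hypothetical helper, not part of any tool mentioned here.

```python
def to_unicode_range(chars: str) -> str:
    """Collapse a set of characters into a compact CSS unicode-range value."""
    points = sorted({ord(c) for c in chars})
    runs: list[list[int]] = []
    for cp in points:
        if runs and cp == runs[-1][1] + 1:
            runs[-1][1] = cp          # extend the current consecutive run
        else:
            runs.append([cp, cp])     # start a new run
    return ", ".join(
        f"U+{a:04X}" if a == b else f"U+{a:04X}-{b:04X}" for a, b in runs
    )


# Russian alphabet: U+0410-044F plus the letter io in both cases (U+0401, U+0451).
RUSSIAN = "".join(chr(c) for c in range(0x0410, 0x0450)) + "\u0401\u0451"
# Ukrainian-specific letters in both cases: ghe with upturn, ie, i, yi.
UKRAINIAN_EXTRA = "\u0490\u0491\u0404\u0454\u0406\u0456\u0407\u0457"

print(to_unicode_range(RUSSIAN))
print(to_unicode_range(RUSSIAN + UKRAINIAN_EXTRA))
```

Comparing the two printed values shows exactly which code points a Russian-only subset would leave out for Ukrainian text.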
Arabic
Arabic adds complexity that Latin and Cyrillic don't face: right-to-left (RTL) direction, contextual letter forms (each letter has up to four shapes depending on its position in the word), and mandatory ligatures. A naive subset that drops the contextual positional variants will render text incorrectly. The Arabic script also serves Persian, Urdu, and other languages with extended character sets.
Unicode Ranges
/* Basic Arabic */
U+0600-06FF
/* Arabic Supplement (additional letters for African / South Asian languages) */
U+0750-077F
/* Arabic Extended-A (Quranic notation, additional letters) */
U+08A0-08FF
/* Arabic Presentation Forms-A (positional variants, KEEP) */
U+FB50-FDFF
/* Arabic Presentation Forms-B (additional positional variants, KEEP) */
U+FE70-FEFF
Critical Subsetting Rules
Don't drop contextual forms
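Arabic letters change shape based on position: isolated, initial, medial, final. In OpenType fonts the shaping engine picks these shapes through layout features (init, medi, fina, rlig), so a subsetter must keep those features and the glyphs they reach. The Presentation Forms blocks (U+FB50-FDFF and U+FE70-FEFF) are legacy encodings of the same shapes; keep them as well in case content or older tooling uses those code points directly. A subset that loses the positional glyphs renders Arabic text in disconnected isolated forms, readable but visually broken.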
Language-Specific Coverage
- Modern Standard Arabic: U+0600-06FF covers all standard letters
- Persian (Farsi): needs پ چ ژ گ which sit in the basic Arabic range
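- Urdu: needs ٹ ڈ ڑ ں ھ ے, which also sit in the basic Arabic range (U+0600-06FF); make sure a restrictive subset doesn't drop them. Arabic Supplement (U+0750-077F) adds letters for other African and South Asian languages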
- Quranic text: include Arabic Extended-A (U+08A0-08FF) for honorifics and Quranic notation
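To see which of the ranges above a given body of text touches, you can count its characters per block. This is a minimal sketch: the `ranges_used` helper is hypothetical, and the block boundaries are copied from the listing above.

```python
ARABIC_RANGES = {
    "Basic Arabic": (0x0600, 0x06FF),
    "Arabic Supplement": (0x0750, 0x077F),
    "Arabic Extended-A": (0x08A0, 0x08FF),
    "Arabic Presentation Forms-A": (0xFB50, 0xFDFF),
    "Arabic Presentation Forms-B": (0xFE70, 0xFEFF),
}


def ranges_used(text: str) -> dict[str, int]:
    """Count how many characters of text fall into each Arabic block."""
    counts: dict[str, int] = {}
    for ch in text:
        for name, (lo, hi) in ARABIC_RANGES.items():
            if lo <= ord(ch) <= hi:
                counts[name] = counts.get(name, 0) + 1
                break
    return counts


# The Persian word "parsi", written as escapes: peh, alef, reh, seen, Farsi yeh.
print(ranges_used("\u067e\u0627\u0631\u0633\u06cc"))
```

The count reflects code points that appear in the content only. It says nothing about the glyphs needed for joining, so it does not replace the rule about contextual forms.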
Tips
- Set dir="rtl" on the relevant HTML elements; subsetting alone doesn't handle direction
- Don't mix Arabic with non-Arabic fonts unless the Arabic font has good Latin coverage too; fallback chains often produce mismatched x-heights
- Recommended fonts: Noto Naskh Arabic, Cairo, Almarai, IBM Plex Sans Arabic, and Tajawal all support contextual forms and have permissive licenses
- Test subsetted Arabic fonts with real RTL content before deploying; visual rendering issues are easy to miss in LTR previews
Chinese
Chinese subsetting is fundamentally different from alphabetic scripts. A complete Chinese font supporting GB 18030 contains 27,000-30,000 ideographs and typically weighs 10-30 MB, too large to ship as a single font file for the web. The strategy is frequency-band partitioning: split the character set into ~50-100 chunks of frequently co-occurring characters, serve them as separate files, and use unicode-range in CSS to progressively load only the chunks the page actually needs.
Practical Subset Tiers
| Subset | Glyphs | Coverage |
|---|---|---|
| Top 500 Hanzi | 500 | ~70% of common text |
| Top 2,500 Hanzi | 2,500 | ~95% of common text |
| GB 2312 (Simplified) | 6,763 | ~99% of modern Simplified Chinese |
| Big5 (Traditional) | 13,053 | Modern Traditional Chinese (Taiwan, HK) |
| GB 18030 | 27,000+ | Comprehensive standard, includes minority languages |
Unicode Ranges
/* CJK Unified Ideographs (main range) */
U+4E00-9FFF
/* CJK Unified Ideographs Extension A */
U+3400-4DBF
/* CJK Unified Ideographs Extension B (rare characters) */
U+20000-2A6DF
/* CJK Symbols and Punctuation */
U+3000-303F
/* Halfwidth and Fullwidth Forms */
U+FF00-FFEF
Frequency Partitioning Strategy
Google Fonts uses this approach for its CJK fonts (Noto Sans SC, Noto Sans TC). Instead of one massive subset, the font is split into ~100 chunks based on character frequency co-occurrence. Each chunk has a unique unicode-range covering the characters in that frequency band. When a page renders text, the browser only downloads the chunks containing the characters actually used.
For self-hosted Chinese fonts, replicate this approach: use a tool like cn-font-split or pyftsubset (part of fontTools) to generate multiple subset files, then write @font-face declarations referencing each one with appropriate unicode-range values.
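The partitioning step can be sketched as follows. This is a simplified illustration that ranks characters by raw frequency in your own corpus and cuts the ranking into fixed-size chunks; it does not model co-occurrence, and it is not the algorithm Google Fonts or cn-font-split actually use. The corpus and chunk size are toy values.

```python
from collections import Counter


def partition_by_frequency(corpus: str, chunk_size: int) -> list[list[str]]:
    """Rank characters by frequency and split the ranking into chunks."""
    counts = Counter(ch for ch in corpus if not ch.isspace())
    # Ties are broken by code point so the output is deterministic between builds.
    ranked = sorted(counts, key=lambda ch: (-counts[ch], ord(ch)))
    return [ranked[i:i + chunk_size] for i in range(0, len(ranked), chunk_size)]


def chunk_unicode_range(chunk: list[str]) -> str:
    """Format one chunk as a unicode-range value listing single code points."""
    return ", ".join(f"U+{ord(ch):04X}" for ch in sorted(chunk))


corpus = "的的的一一是是是是不"  # toy corpus; a real one would be your site's text
chunks = partition_by_frequency(corpus, chunk_size=2)
for index, chunk in enumerate(chunks):
    print(f"chunk {index}: unicode-range: {chunk_unicode_range(chunk)};")
```

Each chunk then becomes one subset file and one @font-face rule. A real build would use chunks of a few hundred characters rather than two.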
Tips
- Simplified vs Traditional: they share ~50% of characters but produce visibly different glyphs, so choose based on your audience (Mainland China = Simplified; Taiwan/HK = Traditional)
- Static-text pages (e.g., a single product page) can be aggressively subset to just the characters present; use a build-time analysis script
- Dynamic content (CMSs, user-generated text) must serve the full frequency-partitioned set, not a static subset
- Recommended fonts: Noto Sans SC/TC, Source Han Sans, and MiSans all offer comprehensive coverage
- Don't forget CJK punctuation (U+3000-303F) and fullwidth forms (U+FF00-FFEF): Chinese uses corner-bracket quotation marks (「」『』) and full-width punctuation (，。)
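For the static-page case, a build-time analysis script can be as small as the sketch below, which collects the distinct characters in a page's visible text using only the standard library. It is an assumption-laden starting point: it ignores attribute text such as alt and title, and anything injected by JavaScript or CSS.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the distinct non-whitespace characters in visible text nodes."""

    def __init__(self) -> None:
        super().__init__()
        self.chars: set[str] = set()
        self._skip_depth = 0  # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chars.update(ch for ch in data if not ch.isspace())


def characters_in_html(html: str) -> str:
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return "".join(sorted(collector.chars))


page = "<html><style>p{color:red}</style><body><p>你好，世界。</p><p>你好</p></body></html>"
print(characters_in_html(page))
```

The resulting string can be written to a file and handed to a subsetter, for example through pyftsubset's `--text-file` option.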
Japanese
Japanese mixes three writing systems in everyday text: Kanji (thousands of Chinese-derived characters), Hiragana (46 syllabic characters for native Japanese words and grammar), and Katakana (46 syllabic characters for foreign loanwords). A comprehensive Japanese font ranges from 3 to 20 MB, with professional fonts exceeding 50 MB. Practical subsetting starts with the Joyo Kanji list, the 2,136 characters taught in Japanese schools.
Subset Tiers
| Subset | Glyphs | Use Case |
|---|---|---|
| Hiragana + Katakana only | ~180 | Children's content, transliteration |
| Joyo Kanji + Kana | ~2,500 | Modern Japanese text (~99% coverage) |
| JIS X 0208 | 6,879 | Comprehensive standard, names, place names |
| JIS X 0213 | 11,233 | Extended standard, classical literature |
Unicode Ranges
/* Hiragana */
U+3040-309F
/* Katakana */
U+30A0-30FF
/* CJK Unified Ideographs (Kanji) */
U+4E00-9FFF
/* Half-width Katakana */
U+FF65-FF9F
/* CJK Symbols and Punctuation */
U+3000-303F
/* Fullwidth Forms (numbers, punctuation) */
U+FF00-FF60
Tips
- Kana are mandatory: nearly all Japanese text uses both Hiragana and Katakana, so never subset out either one
- Joyo Kanji as baseline: the 2,136-character Joyo list covers 99%+ of modern published text. Adding the Jinmei-yo list (863 additional characters used in personal names) brings the total to 2,999
- Use frequency-band partitioning like Chinese for sites with extensive content
- Vertical writing: Japanese supports vertical text (tategaki). If your design uses it, ensure the font includes the necessary OpenType vertical features (vert, vrt2)
- Recommended fonts: Noto Sans JP, Source Han Sans, and M PLUS are all open-source, with comprehensive coverage and good vertical-writing support
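Coverage figures like the ones above are worth checking against your own content. The sketch below measures what share of a text's characters a candidate subset covers, counting every occurrence. The four-kanji set is a stand-in for a real Joyo list, which is not reproduced here.

```python
def coverage(text: str, subset: set[str]) -> float:
    """Share of non-whitespace characters in text that the subset can render."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 1.0
    return sum(ch in subset for ch in chars) / len(chars)


# Hiragana and Katakana blocks (U+3040-30FF).
KANA = {chr(cp) for cp in range(0x3040, 0x3100)}
# Placeholder kanji set; load the real Joyo list from a file in practice.
SAMPLE_KANJI = set("日本語学")

text = "日本語のフォントを学ぶ薔薇"
print(f"{coverage(text, KANA | SAMPLE_KANJI):.1%}")
```

Running this over a representative export of your site shows whether the Joyo baseline is enough or whether names and rare kanji pull the figure down.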
Korean
Korean uses Hangul, an alphabetic script where letters combine into syllable blocks. Theoretically the system could be encoded with just the 24 Jamo (basic letters), but in practice Korean fonts ship 11,172 precomposed Hangul syllables, covering every possible combination. Adding Hanja (Chinese characters used in Korean) brings a full Korean font to 2-10 MB. The KS X 1001 standard's 2,350 syllable subset covers ~99% of modern Korean text and is the practical baseline.
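The 11,172 figure follows from how Unicode lays out precomposed syllables: 19 initial consonants, 21 vowels, and 28 final positions (27 finals plus "no final"), where the doubled and compound letters are built from the 24 basic Jamo. A short sketch of that arithmetic, with index-based helpers that are illustrative rather than taken from any library:

```python
HANGUL_BASE = 0xAC00
LEADS, VOWELS, TAILS = 19, 21, 28  # TAILS includes the "no final consonant" slot


def compose(lead: int, vowel: int, tail: int = 0) -> str:
    """Return the precomposed syllable for the given Jamo indices."""
    if not (0 <= lead < LEADS and 0 <= vowel < VOWELS and 0 <= tail < TAILS):
        raise ValueError("Jamo index out of range")
    return chr(HANGUL_BASE + (lead * VOWELS + vowel) * TAILS + tail)


def decompose(syllable: str) -> tuple[int, int, int]:
    """Return (lead, vowel, tail) indices for a precomposed syllable."""
    offset = ord(syllable) - HANGUL_BASE
    if not 0 <= offset < LEADS * VOWELS * TAILS:
        raise ValueError("not a precomposed Hangul syllable")
    lead, rest = divmod(offset, VOWELS * TAILS)
    vowel, tail = divmod(rest, TAILS)
    return lead, vowel, tail


print(LEADS * VOWELS * TAILS)  # total number of precomposed syllables
print(compose(18, 0, 4))       # hieut + a + nieun
```

Since every syllable occupies its own code point in U+AC00-D7A3, a subset list such as KS X 1001 is simply a selection of 2,350 of these positions.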
Subset Tiers
| Subset | Glyphs | Coverage |
|---|---|---|
| KS X 1001 | 2,350 | ~99% of modern Korean text |
| Full Hangul Syllables | 11,172 | All possible combinations |
| Full + Hanja | ~16,000+ | Academic text, classical works |
Unicode Ranges
/* Hangul Syllables (precomposed) */
U+AC00-D7AF
/* Hangul Jamo (basic letters) */
U+1100-11FF
/* Hangul Compatibility Jamo */
U+3130-318F
/* Hangul Jamo Extended-A */
U+A960-A97F
/* Hanja (Chinese characters used in Korean) */
U+4E00-9FFF
Tips
- For modern Korean web content, KS X 1001 (2,350 chars) is sufficient. Most users will never see the missing 8,000+ rare syllables
- If your content includes proper names, place names, or formal academic text, expand to the full 11,172-syllable set
- Hanja is rarely needed: modern Korean rarely uses Chinese characters except in academic/legal contexts. Skip Hanja unless you specifically need it
- Use frequency partitioning for very large Korean sites; splitting the syllable set by frequency band can reduce initial download by 70%+
- Recommended fonts: Noto Sans KR, Pretendard, and Spoqa Han Sans are all open-source with comprehensive Hangul coverage
- Test with both formal Korean and casual social media text; vocabulary differs significantly between registers
Universal Best Practices
Always Convert to WOFF2
After subsetting, convert TTF/OTF output to WOFF2. Brotli compression adds another 20-30% reduction on top of the subset savings. WOFF2 has 97%+ browser support.
Use unicode-range in @font-face
For multilingual sites, define multiple @font-face declarations with unicode-range. Browsers fetch only the subsets containing characters actually rendered on the page.
Test with Real Content
Subsets that work for sample text can fail on real content with unexpected characters. Test with actual production text and watch for tofu (□) where glyphs are missing.
Verify Licensing Permits Subsetting
OFL fonts allow subsetting explicitly. Many commercial EULAs prohibit modification, including subsetting. See our font modification rights guide.
Subset Your Fonts Now
Browser-based, no installation. Works for every script covered on this page. Outputs ready-to-use WOFF2 with @font-face CSS in one workflow.
