Font Subsetting by Language

Different writing systems have radically different subsetting profiles. A Latin font with 200 glyphs subsets down to ~30 KB; a Chinese font with 30,000 glyphs needs strategic partitioning to stay usable for the web. This reference covers Unicode ranges, expected size reductions, and tips for the six major scripts.

TL;DR

  • Latin: Easy. Basic Latin (A-Z, a-z, 0-9, punctuation) → ~30 KB. 70-80% reduction is typical.
  • Cyrillic: Single-language subsets reach 60-70% reduction. Russian alone uses 33 letters.
  • Arabic: RTL with contextual forms and ligatures. Keep all four positional variants per letter.
  • Chinese: 30,000+ glyphs. Use unicode-range partitioning by frequency band, not a single subset.
  • Japanese: Mix of Kanji + Hiragana + Katakana. The Joyo Kanji subset (2,136 chars) is the practical baseline.
  • Korean: 11,172 precomposed Hangul syllables. KS X 1001 (2,350 chars) covers ~99% of modern Korean.

Why Subset by Language

Modern web fonts often ship with broad Unicode coverage by default; many include Latin, Latin Extended, Cyrillic, Greek, and Vietnamese in a single file. For an English-only site, that can mean 60-80% wasted bandwidth. For a Chinese site, the calculus inverts: you can't fit the full character set in one practical download, so subsetting becomes a delivery architecture problem, not just a size optimization.

Script | Approx. Glyphs | Full Font Size | Subset Strategy
Latin Basic | ~95 | 100-200 KB | Single subset, 30 KB
Cyrillic | ~250 | 200-400 KB | Single subset, 50-80 KB
Arabic | ~1,000 | 300-600 KB | Single subset, 80-150 KB
Chinese (Simplified) | 3,500-30,000 | 3-30 MB | Frequency-band partitioning
Japanese | 7,000-15,000 | 3-20 MB | Joyo + Hiragana/Katakana subset
Korean | 11,172 syllables | 2-10 MB | KS X 1001 subset (2,350 chars)

The main lever is the CSS unicode-range descriptor inside @font-face. It tells the browser to download a particular subset only when the page actually contains characters in that range. For multilingual sites this turns "serve everything to everyone" into "serve only what's rendered." For CJK sites it makes progressive font loading possible in the first place.
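
As a rough illustration of how unicode-range ties one font family to several subset files, the sketch below generates one @font-face rule per subset. The family name, file paths, and the choice of font-display are placeholders for illustration, not values from any particular project.

```python
def font_face(family: str, url: str, unicode_range: str) -> str:
    """Return one @font-face rule restricted to the given unicode-range."""
    return (
        "@font-face {\n"
        f'  font-family: "{family}";\n'
        f'  src: url("{url}") format("woff2");\n'
        "  font-display: swap;\n"
        f"  unicode-range: {unicode_range};\n"
        "}"
    )

# Hypothetical subset files; each one is fetched only if the page
# contains a character inside its unicode-range.
SUBSETS = [
    ("/fonts/example-latin.woff2", "U+0000-007F"),
    ("/fonts/example-cyrillic.woff2", "U+0400-04FF"),
]

css = "\n\n".join(font_face("Example Sans", url, rng) for url, rng in SUBSETS)
print(css)
```

Because every rule shares the same font-family, the page's CSS keeps referring to a single family while the browser picks the files it needs.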

How to Subset Fonts

The workflow is identical across all scripts. The differences are in which presets you select and which Unicode ranges you include; those details follow per script below.

1. Open the Font Subsetter

Visit our font subsetter tool. It is browser-based, needs no installation, and processes fonts entirely in memory.

2. Upload your font

Drag and drop a TTF, OTF, WOFF, or WOFF2 file. The tool analyzes the file and reports which scripts the font supports and its current glyph count.

3. Pick presets or Unicode ranges

Choose from script-specific presets (Latin, Cyrillic, Arabic, CJK) or specify custom unicode-range values directly for fine-grained control.

4. Add common characters

Numbers, punctuation, and currency symbols are usually needed regardless of script. ASCII digits and punctuation sit in Basic Latin (U+0000-007F), not in the script's own block, so add that range if your preset leaves it out.

5. Generate and download

Click subset. The tool produces an optimized TTF, OTF, or WOFF2 file containing only the requested glyphs. Verify that the size reduction matches expectations.

6. Convert to WOFF2 if not already

After subsetting, convert to WOFF2 for an additional 20-30% reduction via Brotli compression. Use our converter if your subsetter outputs TTF.

Latin

The simplest case. Basic Latin (A-Z, a-z, 0-9, common punctuation, basic symbols) covers English and produces dramatic reductions, often 70-80% smaller than the source font. Latin-1 Supplement and Latin Extended add the accented characters needed for most other European languages (French, German, Spanish, Polish, Czech, Portuguese, Italian, Scandinavian languages).

Coverage Tiers

Subset | Unicode Range | Languages Covered
Basic Latin | U+0000-007F | English (ASCII)
Latin-1 Supplement | U+0080-00FF | Western European (French, German, Spanish, Italian, Portuguese)
Latin Extended-A | U+0100-017F | Central European (Polish, Czech, Hungarian, Croatian), Welsh
Latin Extended-B | U+0180-024F | Romanian (ș ț), Vietnamese partial (ơ ư)
Latin Extended Additional | U+1E00-1EFF | Vietnamese, additional diacritics

Expected Size Reductions

Subset | Typical Reduction | Best For
Basic Latin Only | 70-80% | English-only sites
Latin + Extended-A | 60-70% | Most European languages
Full Latin Extended | 50-60% | All Latin-script languages

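Practical default for English-only: Basic Latin, which already contains digits and ASCII punctuation, plus the currency symbols you use. Skip Latin Extended unless your content has non-English text. Note that em dashes, the ellipsis (…), and smart quotes (‘ ’ “ ”) are not in any Latin block; they live in General Punctuation (U+2000-206F), and the euro sign is U+20AC, so add those code points explicitly if your copy uses them.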
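
A quick way to see which characters in "English" copy fall outside Basic Latin is to list everything above U+007F along with its Unicode name. This is a minimal sketch using only Python's standard library; the sample string is made up for illustration.

```python
import unicodedata

def outside_basic_latin(text: str) -> dict[str, str]:
    """Map each character above U+007F to a 'U+XXXX NAME' label."""
    found = {}
    for ch in sorted(set(text)):
        if ord(ch) > 0x7F:
            found[ch] = f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}"
    return found

# Illustrative copy with curly quotes, an em dash, an ellipsis and a euro sign,
# written with escapes so the exact code points are unambiguous.
sample = "\u201cFonts\u201d \u2014 fast\u2026 for \u20ac5"
for ch, label in outside_basic_latin(sample).items():
    print(ch, label)
```

Running this over real page copy before choosing ranges gives a concrete list of code points to add to an otherwise ASCII-only subset.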

Cyrillic

Cyrillic covers Russian, Ukrainian, Belarusian, Bulgarian, Serbian, Macedonian, and numerous minority languages across Eastern Europe and Central Asia. Single-language subsets work well: Russian alone needs 33 letters, and a Russian-only subset is typically 60-70% smaller than a multi-script source font.

Cyrillic Languages

Language | Letters | Unique Characters
Russian | 33 | Standard Cyrillic base
Ukrainian | 33 | ґ є і ї (not used in Russian)
Bulgarian | 30 | Specific letter forms (Bulgarian localization)
Serbian (Cyrillic) | 30 | ђ ј љ њ ћ џ
Belarusian | 32 | ў (short u)

Unicode Ranges

/* Basic Cyrillic, covers all major Slavic languages */
U+0400-04FF

/* Cyrillic Supplement */
U+0500-052F

/* Cyrillic Extended-A, historical and minority languages */
U+2DE0-2DFF

/* Cyrillic Extended-B, additional historical chars */
U+A640-A69F

Tips

  • If you also display English on a Cyrillic site, include Basic Latin (U+0000-007F); most quality Cyrillic fonts already bundle it
  • Bulgarian forms: Bulgarian uses different glyph shapes for some letters. Quality fonts provide them through the OpenType locl feature under the Bulgarian language tag (BGR), so keep that feature when subsetting and mark the text as lang="bg". Check the font's documentation
  • Ukrainian-specific: ensure ґ є і ї are included; they sit in U+0400-04FF, but some restrictive subsets miss them
  • Recommended fonts: Roboto, Open Sans, Inter, Noto Sans, and PT Sans all have good multi-language Cyrillic coverage
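
The Ukrainian tip can be checked mechanically. The sketch below compares the Ukrainian-specific letters against the full Cyrillic block and against a deliberately narrow Russian-only set (А-я plus Ё/ё); that narrow set is an assumption standing in for the kind of "restrictive subset" mentioned above.

```python
from collections.abc import Container

CYRILLIC_BLOCK = range(0x0400, 0x0500)  # U+0400-04FF
# Assumed "restrictive" subset: the Russian alphabet only.
RUSSIAN_ONLY = set(range(0x0410, 0x0450)) | {0x0401, 0x0451}

def missing_from(covered: Container[int], letters: str) -> list[str]:
    """Return the letters (lower and upper case) not covered by the subset."""
    both_cases = letters + letters.upper()
    return [ch for ch in both_cases if ord(ch) not in covered]

ukrainian = "ґєії"
print(missing_from(CYRILLIC_BLOCK, ukrainian))  # nothing missing from the full block
print(missing_from(RUSSIAN_ONLY, ukrainian))    # all eight forms are missing
```

The same function works for the Serbian and Belarusian letters listed in the table.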

Arabic

Arabic adds complexity that Latin and Cyrillic don't face: right-to-left (RTL) direction, contextual letter forms (each letter has up to four shapes depending on its position in the word), and mandatory ligatures. A naive subset that drops the contextual positional variants will render text incorrectly. Arabic also covers Persian, Urdu, and other languages with extended character sets.

Unicode Ranges

/* Basic Arabic */
U+0600-06FF

/* Arabic Supplement (additional letters for African / South Asian languages) */
U+0750-077F

/* Arabic Extended-A (Quranic notation, additional letters) */
U+08A0-08FF

/* Arabic Presentation Forms-A (positional variants, KEEP) */
U+FB50-FDFF

/* Arabic Presentation Forms-B (additional positional variants, KEEP) */
U+FE70-FEFF

Critical Subsetting Rules

Don't drop contextual forms

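Arabic letters change shape based on position: isolated, initial, medial, final. Modern fonts select these shapes through OpenType shaping features (init, medi, fina, rlig), and the legacy code points for them live in U+FB50-FDFF and U+FE70-FEFF. A subset that strips the shaping features or the glyphs they point to will render Arabic text in disconnected isolated forms, readable but visually broken. Keep the layout features and include these ranges for any Arabic-supporting subset.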
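
To make the four positional variants concrete, the sketch below looks up the Presentation Forms-B code points for one letter (BEH) and shows that Unicode treats each as a compatibility form of the same base letter. Whether a given font maps glyphs to these code points or reaches the shapes only through its shaping features is font-specific, so treat this as an illustration of the encoding, not of any particular font.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH in the basic Arabic block
FORMS = ["ISOLATED", "INITIAL", "MEDIAL", "FINAL"]

for form in FORMS:
    # Look the character up by its official Unicode name.
    ch = unicodedata.lookup(f"ARABIC LETTER BEH {form} FORM")
    # NFKC normalization folds a presentation form back to its base letter.
    base = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X} {form:<8} -> U+{ord(base):04X}")
```

Text typed by users almost always contains the base letter; the shaping engine and the font choose the positional glyph at render time.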

Language-Specific Coverage

  • Modern Standard Arabic: U+0600-06FF covers all standard letters
  • Persian (Farsi): needs پ چ ژ گ which sit in the basic Arabic range
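  • Urdu: needs ٹ ڈ ڑ ں ھ ے, which also sit in the basic Arabic range (U+0600-06FF); Arabic Supplement (U+0750-077F) is only needed for additional letters used by other languages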
  • Quranic text: include Arabic Extended-A (U+08A0-08FF) for honorifics and Quranic notation

Tips

  • Set dir="rtl" on the relevant HTML elements; subsetting alone doesn't handle direction
  • Don't mix Arabic with non-Arabic fonts unless the Arabic font has good Latin coverage too; fallback chains often produce mismatched x-heights
  • Recommended fonts: Noto Naskh Arabic, Cairo, Almarai, IBM Plex Sans Arabic, and Tajawal all support contextual forms and have permissive licenses
  • Test subsetted Arabic fonts with real RTL content before deploying; visual rendering issues are easy to miss in LTR previews

Chinese

Chinese subsetting is fundamentally different from alphabetic scripts. A complete Chinese font supporting GB 18030 contains 27,000-30,000 ideographs and typically weighs 10-30 MB. You cannot ship that as a single font file for the web. The strategy is frequency-band partitioning: split the character set into ~50-100 chunks of frequently co-occurring characters, serve them as separate files, and use unicode-range in CSS to progressively load only the chunks the page actually needs.

Practical Subset Tiers

Subset | Glyphs | Coverage
Top 500 Hanzi | 500 | ~70% of common text
Top 2,500 Hanzi | 2,500 | ~95% of common text
GB 2312 (Simplified) | 6,763 | ~99% of modern Simplified Chinese
Big5 (Traditional) | 13,053 | Modern Traditional Chinese (Taiwan, HK)
GB 18030 | 27,000+ | Comprehensive standard, includes minority languages

Unicode Ranges

/* CJK Unified Ideographs (main range) */
U+4E00-9FFF

/* CJK Unified Ideographs Extension A */
U+3400-4DBF

/* CJK Unified Ideographs Extension B (rare characters) */
U+20000-2A6DF

/* CJK Symbols and Punctuation */
U+3000-303F

/* Halfwidth and Fullwidth Forms */
U+FF00-FFEF

Frequency Partitioning Strategy

Google Fonts uses this approach for its CJK fonts (Noto Sans SC, Noto Sans TC). Instead of one massive subset, the font is split into ~100 chunks based on character frequency co-occurrence. Each chunk has a unique unicode-range covering the characters in that frequency band. When a page renders text, the browser only downloads the chunks containing the characters actually used.

For self-hosted Chinese fonts, replicate this approach: use a tool like cn-font-split or fontTools' pyftsubset to generate multiple subset files, then write @font-face declarations referencing each one with the appropriate unicode-range value.
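
The partitioning step can be sketched in a few lines: split a frequency-ranked character list into bands, then express each band as a unicode-range value. This is a simplified illustration that uses fixed-size bands and a tiny placeholder ranking; it is not how Google Fonts or cn-font-split actually choose their chunks, and a real ranking would come from a corpus frequency list.

```python
def to_unicode_range(chars: str) -> str:
    """Compress characters into a CSS unicode-range value, merging runs."""
    points = sorted({ord(ch) for ch in chars})
    if not points:
        return ""
    spans = []
    start = prev = points[0]
    for cp in points[1:]:
        if cp != prev + 1:          # gap found: close the current run
            spans.append((start, prev))
            start = cp
        prev = cp
    spans.append((start, prev))
    return ", ".join(
        f"U+{a:X}" if a == b else f"U+{a:X}-{b:X}" for a, b in spans
    )

def partition_by_frequency(ranked: str, chunk_size: int) -> list[str]:
    """Split a most-frequent-first character string into fixed-size bands."""
    return [ranked[i:i + chunk_size] for i in range(0, len(ranked), chunk_size)]

# Placeholder ranking for illustration only.
ranked = "的一是不了人我在有他这中大来上国"
for index, band in enumerate(partition_by_frequency(ranked, 4)):
    print(f"chunk-{index}: {to_unicode_range(band)}")
```

Each printed value would go into the unicode-range descriptor of the @font-face rule for the matching subset file, and the same character list would be handed to the subsetter to build that file.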

Tips

  • Simplified vs Traditional: they share ~50% of characters but produce visibly different glyphs; choose based on your audience (Mainland China = Simplified; Taiwan/HK = Traditional)
  • Static-text pages (e.g., a single product page) can be aggressively subset to just the characters present; use a build-time analysis script
  • Dynamic content (CMSs, user-generated text) must serve the full frequency-partitioned set, not a static subset
  • Recommended fonts: Noto Sans SC/TC, Source Han Sans, and MiSans are all freely available with comprehensive coverage
  • Don't forget CJK punctuation: corner-bracket quotation marks (「」『』) and the ideographic full stop (。) sit in U+3000-303F, while the full-width comma (,) sits in Halfwidth and Fullwidth Forms (U+FF00-FFEF)
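
For the static-page case, a build-time analysis script can be as simple as collecting the unique characters in the page copy. The sketch below assumes the text has already been extracted from the rendered HTML; the sample strings are placeholders.

```python
def glyph_set(*texts: str) -> str:
    """Return the unique non-whitespace characters across all inputs, sorted."""
    chars = {ch for text in texts for ch in text if not ch.isspace()}
    return "".join(sorted(chars))

# Hypothetical page copy; a real build step would read the page's text content.
title = "字体子集化"
body = "字体子集化可以减小文件体积。"
needed = glyph_set(title, body)
print(len(needed), needed)
```

The resulting string can be written to a file and passed to a subsetter that accepts a text list, for example through pyftsubset's --text-file option.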

Japanese

Japanese mixes three writing systems in everyday text: Kanji (thousands of Chinese-derived characters), Hiragana (46 syllabic characters for native Japanese words and grammar), and Katakana (46 syllabic characters for foreign loanwords). A comprehensive Japanese font ranges from 3-20 MB, with professional fonts exceeding 50 MB. Practical subsetting starts with the Joyo Kanji list, the 2,136 characters taught in Japanese schools.

Subset Tiers

Subset | Glyphs | Use Case
Hiragana + Katakana only | ~180 | Children's content, transliteration
Joyo Kanji + Kana | ~2,500 | Modern Japanese text (~99% coverage)
JIS X 0208 | 6,879 | Comprehensive standard, names, place names
JIS X 0213 | 11,233 | Extended standard, classical literature

Unicode Ranges

/* Hiragana */
U+3040-309F

/* Katakana */
U+30A0-30FF

/* CJK Unified Ideographs (Kanji) */
U+4E00-9FFF

/* Half-width Katakana */
U+FF65-FF9F

/* CJK Symbols and Punctuation */
U+3000-303F

/* Fullwidth Forms (numbers, punctuation) */
U+FF00-FF60

Tips

  • Kana are mandatory: almost every Japanese sentence contains Hiragana, and Katakana appears constantly in loanwords and names, so never subset out one or the other
  • Joyo Kanji as baseline: the 2,136-character Joyo list covers 99%+ of modern published text. Adding the Jinmei-yo list (863 additional characters used in personal names) brings the total to 2,999
  • Use frequency-band partitioning, as with Chinese, for sites with extensive content
  • Vertical writing: Japanese supports vertical text (tategaki). If your design uses it, ensure the font includes the necessary OpenType vertical features (vert, vrt2)
  • Recommended fonts: Noto Sans JP, Source Han Sans, and M PLUS are all open-source, with comprehensive coverage and good vertical-writing support
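
To see how the three writing systems mix in real copy, the sketch below counts characters per block using the ranges listed above. The block names and the sample sentence are illustrative choices.

```python
from collections import Counter

# Block boundaries taken from the Unicode Ranges list above.
BLOCKS = {
    "hiragana": range(0x3040, 0x30A0),
    "katakana": range(0x30A0, 0x3100),
    "kanji": range(0x4E00, 0xA000),
    "punctuation": range(0x3000, 0x3040),
}

def script_profile(text: str) -> Counter:
    """Count characters per script block; anything else counts as 'other'."""
    counts = Counter()
    for ch in text:
        if ch.isspace():
            continue
        name = next((n for n, r in BLOCKS.items() if ord(ch) in r), "other")
        counts[name] += 1
    return counts

print(script_profile("フォントを軽量化します。"))
```

Even this short sentence uses Katakana, Hiragana, Kanji, and CJK punctuation, which is why a subset that omits any one of them tends to fall back visibly.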

Korean

Korean uses Hangul, an alphabetic script where letters combine into syllable blocks. In theory the system could be encoded with just the 24 basic Jamo (letters), but in practice Korean fonts ship 11,172 precomposed Hangul syllables, covering every possible combination. Adding Hanja (Chinese characters used in Korean) brings a full Korean font to 2-10 MB. The 2,350-syllable subset defined by the KS X 1001 standard covers ~99% of modern Korean text and is the practical baseline.
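
The 11,172 figure follows from how Unicode lays out precomposed syllables: 19 leading consonants × 21 vowels × 28 final positions (27 final consonants plus "none"), counting doubled and compound jamo as separate entries. A minimal sketch of that arithmetic, with function names chosen here for illustration:

```python
HANGUL_BASE = 0xAC00
LEADS, VOWELS, TAILS = 19, 21, 28  # TAILS includes "no final consonant"

def compose(lead: int, vowel: int, tail: int = 0) -> str:
    """Build a precomposed syllable from zero-based jamo indices."""
    return chr(HANGUL_BASE + (lead * VOWELS + vowel) * TAILS + tail)

def decompose(syllable: str) -> tuple[int, int, int]:
    """Recover the (lead, vowel, tail) indices from a precomposed syllable."""
    offset = ord(syllable) - HANGUL_BASE
    if not 0 <= offset < LEADS * VOWELS * TAILS:
        raise ValueError("not a precomposed Hangul syllable")
    lead, rest = divmod(offset, VOWELS * TAILS)
    vowel, tail = divmod(rest, TAILS)
    return lead, vowel, tail

print(LEADS * VOWELS * TAILS)  # 11172
print(compose(18, 0, 4))       # 한
print(decompose("한"))         # (18, 0, 4)
```

Because the block is a dense grid of every combination, most of it is made up of syllables that rarely or never occur in real text, which is what makes the 2,350-syllable subset practical.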

Subset Tiers

Subset | Glyphs | Coverage
KS X 1001 | 2,350 | ~99% of modern Korean text
Full Hangul Syllables | 11,172 | All possible combinations
Full + Hanja | ~16,000+ | Academic text, classical works

Unicode Ranges

/* Hangul Syllables (precomposed) */
U+AC00-D7AF

/* Hangul Jamo (basic letters) */
U+1100-11FF

/* Hangul Compatibility Jamo */
U+3130-318F

/* Hangul Jamo Extended-A */
U+A960-A97F

/* Hanja (Chinese characters used in Korean) */
U+4E00-9FFF

Tips

  • For modern Korean web content, KS X 1001 (2,350 chars) is sufficient. Most users will never see the missing 8,000+ rare syllables
  • If your content includes proper names, place names, or formal academic text, expand to the full 11,172-syllable set
  • Hanja is rarely needed: modern Korean uses Chinese characters almost exclusively in academic and legal contexts. Skip Hanja unless you specifically need it
  • Use frequency partitioning for very large Korean sites; splitting the syllable set by frequency band can reduce initial download by 70%+
  • Recommended fonts: Noto Sans KR, Pretendard, and Spoqa Han Sans are all open-source, with comprehensive Hangul coverage
  • Test with both formal Korean and casual social media text; vocabulary differs significantly between registers

Universal Best Practices

Always Convert to WOFF2

After subsetting, convert TTF/OTF output to WOFF2. Brotli compression adds another 20-30% reduction on top of the subset savings. WOFF2 has 97%+ browser support.

Use unicode-range in @font-face

For multilingual sites, define multiple @font-face declarations with unicode-range. Browsers fetch only the subsets containing characters actually rendered on the page.

Test with Real Content

Subsets that work for sample text can fail on real content with unexpected characters. Test with actual production text and watch for tofu (□) where glyphs are missing.
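
One way to catch gaps before they show up as tofu is to compare production text against the unicode-range you declared. The sketch below parses simple ranges of the form U+XXXX or U+XXXX-YYYY (wildcard ranges such as U+4?? are not handled) and lists the characters that fall outside them; the sample text and range are illustrative.

```python
def parse_unicode_range(value: str) -> list[range]:
    """Parse 'U+0000-007F, U+20AC' into a list of Python ranges."""
    ranges = []
    for part in value.split(","):
        bounds = part.strip().upper().removeprefix("U+").split("-")
        start = int(bounds[0], 16)
        end = int(bounds[-1], 16)   # same as start for a single code point
        ranges.append(range(start, end + 1))
    return ranges

def missing_characters(text: str, unicode_range: str) -> list[str]:
    """List the characters in text that fall outside the declared ranges."""
    ranges = parse_unicode_range(unicode_range)
    return sorted(
        {ch for ch in text
         if not ch.isspace() and not any(ord(ch) in r for r in ranges)}
    )

sample = "Price: 5 \u20ac for na\u00efve caf\u00e9"
print(missing_characters(sample, "U+0000-007F, U+20AC"))
```

Characters reported here will render in a fallback font, or as tofu if no fallback has the glyph, so they are the ones to add to the subset.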

Verify Licensing Permits Subsetting

OFL fonts allow subsetting explicitly. Many commercial EULAs prohibit modification, including subsetting. See our font modification rights guide.

Subset Your Fonts Now

Browser-based, no installation. Works for every script covered on this page. Outputs ready-to-use WOFF2 with @font-face CSS in one workflow.

Written & Verified by Sarah Mitchell

Typography expert specializing in font design, web typography, and accessibility
