What is font subsetting?

Font subsetting removes unused characters (glyphs) from a font file to reduce its size. A Latin subset typically includes only A-Z, a-z, 0-9, and common punctuation, which can reduce file size by 70-90%.

What Unicode range covers Latin characters?

Basic Latin is U+0000-007F (ASCII), Latin-1 Supplement is U+0080-00FF, Latin Extended-A is U+0100-017F, and Latin Extended-B is U+0180-024F. For most English websites, U+0000-00FF is sufficient.

Will subsetting break my website for non-English users?

If your subset doesn't include characters needed by users (like accented characters for French or German), those characters will display in a fallback font. Always analyze your content's character requirements before subsetting.
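A quick way to do that analysis is to scan your text for characters that fall outside the ranges you plan to keep. The following is a minimal sketch, assuming the U+0000-00FF Latin subset mentioned above; the function name and the sample string are hypothetical.

```python
# Ranges kept by the planned subset, as inclusive (start, end) code points.
# U+0000-00FF is the Latin range discussed above.
LATIN_SUBSET = [(0x0000, 0x00FF)]


def missing_characters(text, ranges):
    """Return the sorted unique characters in `text` not covered by `ranges`."""
    def covered(ch):
        return any(lo <= ord(ch) <= hi for lo, hi in ranges)

    return sorted({ch for ch in text if not covered(ch)})


if __name__ == "__main__":
    sample = "Caf\u00e9 in \u0141\u00f3d\u017a"  # hypothetical page content
    for ch in missing_characters(sample, LATIN_SUBSET):
        # Each character listed here would fall back to another font.
        print(f"U+{ord(ch):04X} {ch}")
```

Running a check like this over exported page content before subsetting shows which characters would land in a fallback font.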

How do I create a Latin font subset?

Use pyftsubset from FontTools: pyftsubset font.ttf --unicodes='U+0000-00FF' --output-file=font-latin.ttf. Then convert to WOFF2 for web use.
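If you script this step, it can help to build the argument list in one place. The sketch below only assembles the command and does not run it; the helper name is hypothetical, while --unicodes, --output-file, --layout-features, and --flavor are existing pyftsubset options.

```python
import shlex


def pyftsubset_args(font_path, unicodes, output_path, flavor=None, layout_features=None):
    """Build the argument list for a pyftsubset call (does not run it)."""
    args = [
        "pyftsubset",
        font_path,
        f"--unicodes={unicodes}",
        f"--output-file={output_path}",
    ]
    if layout_features is not None:
        args.append(f"--layout-features={layout_features}")
    if flavor is not None:
        # --flavor=woff2 writes WOFF2 directly; it needs the brotli module installed.
        args.append(f"--flavor={flavor}")
    return args


if __name__ == "__main__":
    args = pyftsubset_args("font.ttf", "U+0000-00FF", "font-latin.woff2", flavor="woff2")
    print(shlex.join(args))
    # To execute: subprocess.run(args, check=True), with FontTools installed.
```

Passing flavor="woff2" folds the WOFF2 conversion into the same call, assuming the brotli module is available in your environment.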

Should I create multiple subsets for different scripts?

Yes, for multilingual sites. Create separate subsets (Latin, Cyrillic, Greek) and use unicode-range in @font-face to load only the subset needed for each page's content.

What Unicode range covers Cyrillic characters?

Basic Cyrillic is U+0400-04FF, covering Russian, Ukrainian, Bulgarian, and other Slavic languages. Cyrillic Extended-A (U+2DE0-2DFF) and Extended-B (U+A640-A69F) cover historical and minority languages.

How do I create a Cyrillic font subset?

Use pyftsubset: pyftsubset font.ttf --unicodes='U+0400-04FF' --output-file=font-cyrillic.ttf. Include Latin characters too if your site has mixed content.

Should I combine Latin and Cyrillic in one subset?

For multilingual sites, create separate subsets and use unicode-range in @font-face. This loads only the needed character sets. For single-language sites, one combined subset is simpler.

Why do some Cyrillic characters look different?

Cyrillic has language-specific variations. Serbian/Macedonian б, г, д, п, т look different from Russian. Ensure your font supports the specific language variant you need.

What's a typical Cyrillic subset file size?

A well-subsetted Cyrillic + Latin WOFF2 file is typically 20-40KB, compared to 100KB+ for a full font. The exact size depends on the font design and included features.

What Unicode range covers Arabic characters?

Arabic is U+0600-06FF, Arabic Supplement is U+0750-077F, Arabic Extended-A is U+08A0-08FF. For basic Arabic, U+0600-06FF usually suffices.

Why are Arabic fonts complex to subset?

Arabic has contextual shaping, meaning letters change form based on position (initial, medial, final, isolated). Subsetting must preserve OpenType shaping tables (GSUB, GPOS) or text won't render correctly.

How do I subset Arabic fonts safely?

Use FontTools with care: pyftsubset font.ttf --unicodes='U+0600-06FF' --layout-features='*' preserves all OpenType features. Always test RTL rendering after subsetting.

Can I combine Arabic and Latin in one font file?

Yes, and most Arabic fonts include Latin characters. You can subset to both ranges or create separate files with unicode-range. Separate files load faster if only one script is used.

What's a typical Arabic font file size?

Arabic fonts are larger than Latin due to more glyphs and complex shaping tables. A well-optimized Arabic WOFF2 is typically 50-100KB, compared to 20-40KB for Latin-only.

Why are Chinese fonts so large?

Chinese uses tens of thousands of characters (CJK Unified Ideographs). A comprehensive Chinese font supporting the GB 18030 standard typically runs 10-30 MB, making optimization essential.

How do I optimize Chinese fonts for web?

Use Google Fonts' dynamic subsetting (automatically splits into ~100 slices), subset to frequently used characters (~3,000 covers 99% of modern text), or implement on-demand loading with unicode-range.

What Unicode range covers Chinese?

CJK Unified Ideographs: U+4E00-9FFF (~21,000 characters). CJK Extension A-G add more characters for rare/historical use. For modern Simplified Chinese, the base range is usually sufficient.

Should I use Google Fonts for Chinese text?

Google Fonts offers excellent Chinese font optimization. Noto Sans SC and other Chinese fonts are automatically subset into ~100 files, loading only characters used on each page.

Can I create a minimal Chinese subset?

Yes. The most common ~3,000 characters cover 99%+ of modern Chinese text. Use tools like glyphhanger to analyze your content, or subset to the GB 2312 standard (6,763 characters).

Why are Japanese fonts so large?

Japanese uses Kanji (thousands of characters from Chinese), Hiragana, Katakana, and often Latin characters. A comprehensive Japanese font can be 3-20MB, with professional fonts exceeding 50MB.

How do I optimize Japanese fonts for web?

Use dynamic subsetting (Google Fonts does this automatically), subset to JIS Level 1 kanji (~3,000 characters), or use the unicode-range technique to load characters on demand.

What Unicode ranges cover Japanese?

Hiragana: U+3040-309F, Katakana: U+30A0-30FF, CJK Unified Ideographs: U+4E00-9FFF. For basic Japanese, you need all three plus punctuation (U+3000-303F).

Should I use Google Fonts for Japanese text?

Google Fonts offers excellent Japanese font optimization with automatic dynamic subsetting. Fonts like Noto Sans JP are split into 100+ slices, loading only what's needed. This is recommended for most projects.

Can I create a custom Japanese font subset?

Yes, but it's complex. Use FontTools with a character list specific to your content, or tools like glyphhanger that analyze your pages. For dynamic content, dynamic subsetting is usually better.

Why are Korean fonts so large?

Korean uses Hangul, which has thousands of syllable blocks. A full Korean font contains 11,172 precomposed Hangul syllables plus Hanja characters, often resulting in 2-10MB files.

How can I reduce Korean font file size?

Use Google Fonts' Korean subset (loads in slices), subset to only used syllables, or use unicode-range to load character sets on demand. Tools like subfont analyze your content and create optimized subsets.

What Unicode range covers Korean characters?

Hangul Syllables: U+AC00-D7A3, Hangul Jamo: U+1100-11FF, Hangul Compatibility Jamo: U+3130-318F. For modern Korean text, U+AC00-D7A3 covers most needs.

Can I dynamically load Korean font subsets?

Yes, use unicode-range in @font-face with multiple subset files, or use Google Fonts' built-in Korean subsetting. The browser loads only the subsets needed for the page content.

What's the best format for Korean web fonts?

WOFF2 provides the best compression for Korean fonts, often achieving 50-60% size reduction. The compression is lossless, so glyph quality is unchanged; start from the original source font rather than a previously converted file.

Font Subsetting by Language: Latin, Cyrillic, Arabic, Chinese, Japanese & Korean

Why Subset by Language

Modern web fonts ship with broad Unicode coverage by default; many include Latin, Latin Extended, Cyrillic, Greek, and Vietnamese in a single file. For an English-only site, that's 60-80% wasted bandwidth. For a Chinese site, the calculus inverts: you can't fit the full character set in one practical download, so subsetting becomes a delivery architecture problem, not just a size optimization.

Script	Approx. Glyphs	Full Font Size	Subset Strategy
Latin Basic	~95	100-200 KB	Single subset, 30 KB
Cyrillic	~250	200-400 KB	Single subset, 50-80 KB
Arabic	~1,000	300-600 KB	Single subset, 80-150 KB
Chinese (Simplified)	3,500-30,000	3-30 MB	Frequency-band partitioning
Japanese	7,000-15,000	3-20 MB	Joyo + Hiragana/Katakana subset
Korean	11,172 syllables	2-10 MB	KS X 1001 subset (2,350 chars)

The main lever is the CSS unicode-range descriptor inside @font-face. It tells the browser to download a particular subset only when the page actually contains characters in that range. For multilingual sites this turns "serve everything to everyone" into "serve only what's rendered." For CJK sites it makes progressive font loading possible in the first place.
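As a rough illustration of the descriptor in use, the sketch below generates one @font-face rule per subset file. The family name and file paths are hypothetical placeholders; the two ranges are the Latin and Cyrillic ranges used elsewhere on this page.

```python
def font_face_rule(family, url, unicode_range):
    """Return one @font-face rule that limits a font file to a unicode-range."""
    return (
        "@font-face {\n"
        f'  font-family: "{family}";\n'
        f'  src: url("{url}") format("woff2");\n'
        f"  unicode-range: {unicode_range};\n"
        "}"
    )


# Hypothetical subset files mapped to the ranges they contain.
SUBSETS = {
    "/fonts/example-latin.woff2": "U+0000-00FF",
    "/fonts/example-cyrillic.woff2": "U+0400-04FF",
}


def stylesheet(family, subsets):
    """Join one rule per subset file; every rule shares the same family name."""
    return "\n\n".join(font_face_rule(family, url, rng) for url, rng in subsets.items())


if __name__ == "__main__":
    print(stylesheet("Example Sans", SUBSETS))
```

Because every rule uses the same font-family name, the page CSS references a single family and the browser picks the file whose range matches the characters it needs.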

How to Subset Fonts

The workflow is identical across all scripts. The differences are in which presets you select and which Unicode ranges you include; those details follow per script below.

Open the Font Subsetter

Visit our font subsetter tool. Browser-based, no installation, processes fonts entirely in RAM.

Upload your font

Drag and drop TTF, OTF, WOFF, or WOFF2. The tool analyzes the file and reports which scripts the font supports and its current glyph count.

Pick presets or Unicode ranges

Choose from script-specific presets (Latin, Cyrillic, Arabic, CJK) or specify custom unicode-range values directly for fine-grained control.

Add common characters

Numbers, punctuation, and currency symbols are usually needed regardless of script. ASCII digits and punctuation sit in Basic Latin (U+0020-007E), so add that range to a non-Latin subset unless the preset already includes it.

Generate and download

Click subset. The tool produces an optimized TTF/OTF/WOFF2 file with only the requested glyphs. Verify the size reduction matches expectations.

Convert to WOFF2 if not already

After subsetting, convert to WOFF2 for an additional 20-30% reduction via Brotli compression. Use our converter if your subsetter outputs TTF.

Tools you'll use

• Font Subsetter, for the actual subsetting operation
• Unicode Range Generator, for CSS unicode-range descriptors
• Webfont Generator, to wrap subset output in WOFF2 with @font-face CSS
• Font Analyzer, to confirm which scripts a font already covers

Latin

The simplest case. Latin Basic (A-Z, a-z, 0-9, common punctuation, basic symbols) covers English and produces dramatic reductions, often 70-90% smaller than the source font. Latin Extended adds accented characters needed for most European languages (French, German, Spanish, Polish, Czech, Portuguese, Italian, Scandinavian languages).

Coverage Tiers

Subset	Unicode Range	Languages Covered
Basic Latin	U+0000-007F	English (ASCII)
Latin-1 Supplement	U+0080-00FF	Western European (French, German, Spanish, Italian, Portuguese)
Latin Extended-A	U+0100-017F	Central European (Polish, Czech, Hungarian, Croatian), Welsh
Latin Extended-B	U+0180-024F	Romanian, Vietnamese partial
Latin Extended Additional	U+1E00-1EFF	Vietnamese, additional diacritics

Expected Size Reductions

Subset	Size Reduction	Use Case
Basic Latin Only	70-80%	English-only sites
Latin + Extended-A	60-70%	Most European languages
Full Latin Extended	50-60%	All Latin-script languages

Practical default for English-only: include Basic Latin + numbers + punctuation + currency. Skip Latin Extended unless your content has non-English text. Em dashes, the ellipsis (…), and smart quotes (‘ ’ “ ”) are not in Basic Latin or Latin Extended; they live in General Punctuation (U+2000-206F), so add those code points explicitly if your content uses them.

Cyrillic

Cyrillic covers Russian, Ukrainian, Belarusian, Bulgarian, Serbian, Macedonian, and numerous minority languages across Eastern Europe and Central Asia. Single-language subsets work well: Russian alone needs 33 letters, and subsetting to it produces 60-70% size reductions from a typical multi-script font.

Cyrillic Languages

Language	Letters	Unique Characters
Russian	33	Standard Cyrillic base
Ukrainian	33	ґ є і ї (unique to Ukrainian)
Bulgarian	30	Specific letter forms (Bulgarian localization)
Serbian (Cyrillic)	30	ђ ј љ њ ћ џ
Belarusian	32	ў (short u)

Unicode Ranges

/* Basic Cyrillic, covers all major Slavic languages */
U+0400-04FF

/* Cyrillic Supplement */
U+0500-052F

/* Cyrillic Extended-A, historical and minority languages */
U+2DE0-2DFF

/* Cyrillic Extended-B, additional historical chars */
U+A640-A69F

Tips

If you also display English on a Cyrillic site, include Basic Latin (U+0000-007F); most quality Cyrillic fonts already bundle it
Bulgarian forms: Bulgarian uses different glyph shapes for some letters. Quality fonts provide them through the OpenType locl feature for the Bulgarian language tag (BGR), often labeled loclBGR in font sources. Mark the text with lang="bg" so the browser applies it, and check the font's documentation
Ukrainian-specific: ensure ґ є і ї are included. They sit in U+0400-04FF, but some restrictive subsets miss them
Recommended fonts: Roboto, Open Sans, Inter, Noto Sans, and PT Sans all have good multi-language Cyrillic coverage
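The Ukrainian tip can be checked mechanically. The sketch below parses a unicode-range value and lists any required letters it leaves out; it handles single code points and start-end intervals only, and the function names are hypothetical.

```python
def parse_unicode_range(value):
    """Parse 'U+0400-04FF, U+0500-052F' into a list of (start, end) tuples.

    Single code points and start-end intervals are supported; wildcard
    ranges such as 'U+4??' are not handled in this sketch.
    """
    ranges = []
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        if part[:2].upper() != "U+":
            raise ValueError(f"not a unicode-range token: {part!r}")
        start, _, end = part[2:].partition("-")
        lo = int(start, 16)
        hi = int(end, 16) if end else lo
        ranges.append((lo, hi))
    return ranges


def uncovered(chars, unicode_range):
    """Return the characters from `chars` that the range value does not include."""
    ranges = parse_unicode_range(unicode_range)
    return [c for c in chars if not any(lo <= ord(c) <= hi for lo, hi in ranges)]


if __name__ == "__main__":
    ukrainian = "\u0491\u0454\u0456\u0457"  # the four letters named in the tip
    print(uncovered(ukrainian, "U+0400-04FF"))
    print(uncovered(ukrainian, "U+0410-044F"))  # a narrower, hypothetical subset
```

The second call uses a narrower range as an example of a restrictive subset; any letters it prints would need to be added back.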

Arabic

Arabic adds complexity that Latin and Cyrillic don't face: right-to-left (RTL) direction, contextual letter forms (each letter has up to four shapes depending on its position in the word), and mandatory ligatures. A naive subset that drops the contextual positional variants will render text incorrectly. Arabic also covers Persian, Urdu, and other languages with extended character sets.

Unicode Ranges

/* Basic Arabic */
U+0600-06FF

/* Arabic Supplement (additional letters for African / South Asian languages) */
U+0750-077F

/* Arabic Extended-A (Quranic notation, additional letters) */
U+08A0-08FF

/* Arabic Presentation Forms-A (positional variants, KEEP) */
U+FB50-FDFF

/* Arabic Presentation Forms-B (additional positional variants, KEEP) */
U+FE70-FEFF

Critical Subsetting Rules

Don't drop contextual forms

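Arabic letters change shape based on position: isolated, initial, medial, final. In modern OpenType fonts these shapes are selected by GSUB features (init, medi, fina, isol, rlig), so a subset must keep those layout tables and the glyphs they reference. The code points in U+FB50-FDFF and U+FE70-FEFF are legacy encodings of the same positional forms; include them whenever your content or font relies on them. A subset that loses the positional glyphs or the shaping features will render Arabic text in disconnected isolated forms, readable but visually broken.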

Language-Specific Coverage

Modern Standard Arabic: U+0600-06FF covers all standard letters
Persian (Farsi): needs پ چ ژ گ which sit in the basic Arabic range
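Urdu: needs ٹ ڈ ڑ ں ھ ے, which also sit in the basic Arabic range (U+0600-06FF); add Arabic Supplement (U+0750-077F) only for languages that use its additional letters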
Quranic text: include Arabic Extended-A (U+08A0-08FF) for honorifics and Quranic notation
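To see which of the ranges listed above a given text actually touches, a small block lookup is enough. This is a sketch with a hypothetical function name; the block boundaries are the ones given in the Unicode Ranges list for Arabic.

```python
# Block boundaries as listed in the Unicode Ranges section above.
ARABIC_BLOCKS = {
    "Arabic": (0x0600, 0x06FF),
    "Arabic Supplement": (0x0750, 0x077F),
    "Arabic Extended-A": (0x08A0, 0x08FF),
    "Arabic Presentation Forms-A": (0xFB50, 0xFDFF),
    "Arabic Presentation Forms-B": (0xFE70, 0xFEFF),
}


def arabic_blocks_used(text):
    """Return the names of the Arabic blocks that `text` draws characters from."""
    used = set()
    for ch in text:
        for name, (lo, hi) in ARABIC_BLOCKS.items():
            if lo <= ord(ch) <= hi:
                used.add(name)
    return used


if __name__ == "__main__":
    sample = "\u0633\u0644\u0627\u0645"  # hypothetical content sample
    print(sorted(arabic_blocks_used(sample)))
```

Running it over real content for each target language gives a concrete list of ranges to pass to the subsetter.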

Tips

Set dir="rtl" on the relevant HTML elements; subsetting alone doesn't handle direction
Don't mix Arabic with non-Arabic fonts unless the Arabic font has good Latin coverage too; fallback chains often produce mismatched x-heights
Recommended fonts: Noto Naskh Arabic, Cairo, Almarai, IBM Plex Sans Arabic, and Tajawal all support contextual forms and have permissive licenses
Test subsetted Arabic fonts with real RTL content before deploying; visual rendering issues are easy to miss in LTR previews

Chinese

Chinese subsetting is fundamentally different from alphabetic scripts. A complete Chinese font supporting GB 18030 contains 27,000-30,000 ideographs and typically weighs 10-30 MB. You cannot ship that as a single font file for the web. The strategy is frequency-band partitioning: split the character set into ~50-100 chunks of characters that frequently co-occur, serve them as separate files, and use unicode-range in CSS to progressively load only the chunks the page actually needs.

Practical Subset Tiers

Subset	Glyphs	Coverage
Top 500 Hanzi	500	~70% of common text
Top 2,500 Hanzi	2,500	~95% of common text
GB 2312 (Simplified)	6,763	~99% of modern Simplified Chinese
Big5 (Traditional)	13,053	Modern Traditional Chinese (Taiwan, HK)
GB 18030	27,000+	Comprehensive standard, includes minority languages

Unicode Ranges

/* CJK Unified Ideographs (main range) */
U+4E00-9FFF

/* CJK Unified Ideographs Extension A */
U+3400-4DBF

/* CJK Unified Ideographs Extension B (rare characters) */
U+20000-2A6DF

/* CJK Symbols and Punctuation */
U+3000-303F

/* Halfwidth and Fullwidth Forms */
U+FF00-FFEF

Frequency Partitioning Strategy

Google Fonts uses this approach for its CJK fonts (Noto Sans SC, Noto Sans TC). Instead of one massive subset, the font is split into ~100 chunks based on character frequency co-occurrence. Each chunk has a unique unicode-range covering the characters in that frequency band. When a page renders text, the browser only downloads the chunks containing the characters actually used.

For self-hosted Chinese fonts, replicate this approach: use a tool like cn-font-split or FontTools' pyftsubset to generate multiple subset files, then write @font-face declarations referencing each one with appropriate unicode-range values.
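The partitioning step can be approximated with a simple frequency count. The sketch below ranks characters by how often they appear in a sample corpus and cuts the ranking into fixed-size chunks; it uses plain frequency rather than co-occurrence, so it is a simplification and not a description of how Google Fonts builds its slices. Function names and the sample corpus are hypothetical.

```python
from collections import Counter


def frequency_chunks(corpus, chunk_size):
    """Split the distinct non-whitespace characters of `corpus` into chunks,
    most frequent characters first."""
    counts = Counter(ch for ch in corpus if not ch.isspace())
    # Sort by descending frequency, then by code point so the result is stable.
    ordered = sorted(counts, key=lambda ch: (-counts[ch], ord(ch)))
    return [ordered[i:i + chunk_size] for i in range(0, len(ordered), chunk_size)]


def to_unicodes_arg(chunk):
    """Format one chunk as a comma-separated code point list.

    The same list works for pyftsubset's --unicodes option and, in CSS,
    as a unicode-range value.
    """
    return ",".join(f"U+{ord(ch):04X}" for ch in sorted(chunk))


if __name__ == "__main__":
    corpus = "\u7684\u7684\u7684\u4e00\u4e00\u662f"  # tiny hypothetical sample
    for index, chunk in enumerate(frequency_chunks(corpus, chunk_size=2)):
        print(index, to_unicodes_arg(chunk))
```

Each printed line corresponds to one subset file and one @font-face rule; a real corpus and a chunk size in the hundreds would give slices closer to the ~100 described above.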

Tips

Simplified vs Traditional: they share ~50% of characters but produce visibly different glyphs. Choose based on your audience (Mainland China = Simplified; Taiwan/HK = Traditional)
Static-text pages (e.g., a single product page) can be aggressively subset to just the characters present; use a build-time analysis script
Dynamic content (CMSs, user-generated text) must serve the full frequency-partitioned set, not a static subset
Recommended fonts: Noto Sans SC/TC, Source Han Sans, and MiSans, all open-source with comprehensive coverage
Don't forget CJK punctuation (U+3000-303F): Chinese uses different quotation marks (「」『』) and full-width punctuation (，。)
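For the static-page case, a build-time analysis script can be as small as the sketch below: it collects the distinct characters in a page's text and collapses consecutive code points into a compact unicode-range value. The function name is hypothetical, and extracting text from your HTML is assumed to happen elsewhere.

```python
def to_unicode_range(text):
    """Collapse the distinct non-whitespace characters of `text` into a
    CSS unicode-range value, merging consecutive code points into intervals."""
    points = sorted({ord(ch) for ch in text if not ch.isspace()})
    runs = []
    for cp in points:
        if runs and cp == runs[-1][1] + 1:
            runs[-1][1] = cp  # extend the current run
        else:
            runs.append([cp, cp])  # start a new run
    return ", ".join(
        f"U+{lo:04X}" if lo == hi else f"U+{lo:04X}-{hi:04X}" for lo, hi in runs
    )


if __name__ == "__main__":
    page_text = "\u4f60\u597d\uff0c\u4e16\u754c"  # hypothetical page content
    print(to_unicode_range(page_text))
```

The resulting string can be used both as the subsetter's code point list and as the unicode-range descriptor for the generated file.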

Japanese

Japanese mixes three writing systems in everyday text: Kanji (thousands of Chinese-derived characters), Hiragana (46 syllabic characters for native Japanese words and grammar), and Katakana (46 syllabic characters for foreign loanwords). A comprehensive Japanese font ranges from 3-20 MB, with professional fonts exceeding 50 MB. Practical subsetting starts with the Joyo Kanji list, the 2,136 characters taught in Japanese schools.

Subset Tiers

Subset	Glyphs	Use Case
Hiragana + Katakana only	~180	Children's content, transliteration
Joyo Kanji + Kana	~2,500	Modern Japanese text (~99% coverage)
JIS X 0208	6,879	Comprehensive standard, names, place names
JIS X 0213	11,233	Extended standard, classical literature

Unicode Ranges

/* Hiragana */
U+3040-309F

/* Katakana */
U+30A0-30FF

/* CJK Unified Ideographs (Kanji) */
U+4E00-9FFF

/* Half-width Katakana */
U+FF65-FF9F

/* CJK Symbols and Punctuation */
U+3000-303F

/* Fullwidth Forms (numbers, punctuation) */
U+FF00-FF60

Tips

Kana are mandatory: almost all Japanese text uses both Hiragana and Katakana, so never subset out one or the other
Joyo Kanji as baseline: the 2,136-character Joyo list covers 99%+ of modern published text. Adding the Jinmei-yo list (863 additional characters used in personal names) gets you to 2,999
Use frequency-band partitioning like Chinese for sites with extensive content
Vertical writing: Japanese supports vertical text (tategaki). If your design uses it, ensure the font includes the necessary OpenType vertical features (vert, vrt2)
Recommended fonts: Noto Sans JP, Source Han Sans, and M PLUS, all open-source with comprehensive coverage and good vertical-writing support
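When choosing between the tiers above, it helps to know how many distinct kana and kanji your content really uses. The following sketch counts distinct characters per script using the three main ranges from this section; the function name and sample are hypothetical.

```python
# Main ranges from the Unicode Ranges list for Japanese.
JAPANESE_RANGES = {
    "hiragana": (0x3040, 0x309F),
    "katakana": (0x30A0, 0x30FF),
    "kanji": (0x4E00, 0x9FFF),
}


def distinct_by_script(text):
    """Count the distinct characters of `text` that fall in each range."""
    seen = {name: set() for name in JAPANESE_RANGES}
    for ch in text:
        for name, (lo, hi) in JAPANESE_RANGES.items():
            if lo <= ord(ch) <= hi:
                seen[name].add(ch)
                break
    return {name: len(chars) for name, chars in seen.items()}


if __name__ == "__main__":
    sample = "\u65e5\u672c\u8a9e\u306e\u30c6\u30b9\u30c8"  # hypothetical content
    print(distinct_by_script(sample))
```

A distinct-kanji count well below the Joyo total suggests a content-specific subset is viable; a count that keeps growing with new content points toward frequency-band partitioning.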

Korean

Korean uses Hangul, an alphabetic script where letters combine into syllable blocks. Theoretically the system could be encoded with just the 24 Jamo (basic letters), but in practice Korean fonts ship 11,172 precomposed Hangul syllables, covering every possible combination. Adding Hanja (Chinese characters used in Korean) brings a full Korean font to 2-10 MB. The KS X 1001 standard's 2,350 syllable subset covers ~99% of modern Korean text and is the practical baseline.
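The 11,172 figure follows from how Unicode lays out precomposed syllables: 19 leading consonants × 21 vowels × 28 final positions (27 final consonants plus "none"), counting doubled and compound jamo as separate entries. The sketch below shows that arithmetic; the function names are hypothetical.

```python
HANGUL_BASE = 0xAC00  # first precomposed syllable
LEADS, VOWELS, TAILS = 19, 21, 28  # tails include "no final consonant"


def compose(lead, vowel, tail=0):
    """Return the precomposed syllable for the given jamo indices."""
    if not (0 <= lead < LEADS and 0 <= vowel < VOWELS and 0 <= tail < TAILS):
        raise ValueError("jamo index out of range")
    return chr(HANGUL_BASE + (lead * VOWELS + vowel) * TAILS + tail)


def decompose(syllable):
    """Return the (lead, vowel, tail) indices of a precomposed syllable."""
    offset = ord(syllable) - HANGUL_BASE
    if not 0 <= offset < LEADS * VOWELS * TAILS:
        raise ValueError("not a precomposed Hangul syllable")
    lead, rest = divmod(offset, VOWELS * TAILS)
    vowel, tail = divmod(rest, TAILS)
    return lead, vowel, tail


if __name__ == "__main__":
    print(LEADS * VOWELS * TAILS)  # total number of precomposed syllables
    print(f"U+{ord(compose(LEADS - 1, VOWELS - 1, TAILS - 1)):04X}")  # last syllable
```

Because every syllable has its own code point and glyph, subsetting Korean means choosing which of these combinations to ship rather than which letters.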

Subset Tiers

Subset	Glyphs	Coverage
KS X 1001	2,350	~99% of modern Korean text
Full Hangul Syllables	11,172	All possible combinations
Full + Hanja	~16,000+	Academic text, classical works

Unicode Ranges

/* Hangul Syllables (precomposed) */
U+AC00-D7AF

/* Hangul Jamo (basic letters) */
U+1100-11FF

/* Hangul Compatibility Jamo */
U+3130-318F

/* Hangul Jamo Extended-A */
U+A960-A97F

/* Hanja (Chinese characters used in Korean) */
U+4E00-9FFF

Tips

For modern Korean web content, KS X 1001 (2,350 chars) is sufficient. Most users will never see the missing 8,000+ rare syllables
If your content includes proper names, place names, or formal academic text, expand to the full 11,172-syllable set
Hanja is rarely needed: modern Korean rarely uses Chinese characters except in academic/legal contexts. Skip Hanja unless you specifically need it
Use frequency partitioning for very large Korean sites; splitting the syllable set by frequency band can reduce initial download by 70%+
Recommended fonts: Noto Sans KR, Pretendard, and Spoqa Han Sans, all open-source with comprehensive Hangul coverage
Test with both formal Korean and casual social media text; vocabulary differs significantly between registers

Universal Best Practices

Always Convert to WOFF2

After subsetting, convert TTF/OTF output to WOFF2. Brotli compression adds another 20-30% reduction on top of the subset savings. WOFF2 has 97%+ browser support.
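The two savings multiply rather than add. As a worked example under illustrative numbers (a 200 KB source, a 75% subset reduction, and a 25% WOFF2 reduction, all assumptions chosen from the ranges quoted on this page), the sketch below computes the combined effect.

```python
def estimated_size_kb(original_kb, subset_reduction, woff2_reduction):
    """Estimate the size after subsetting, then WOFF2 compression.

    Reductions are fractions (0.75 means 75% smaller) applied in sequence.
    """
    for reduction in (subset_reduction, woff2_reduction):
        if not 0 <= reduction < 1:
            raise ValueError("reductions must be in the range [0, 1)")
    return original_kb * (1 - subset_reduction) * (1 - woff2_reduction)


def total_reduction(subset_reduction, woff2_reduction):
    """Return the combined reduction as a fraction of the original size."""
    return 1 - (1 - subset_reduction) * (1 - woff2_reduction)


if __name__ == "__main__":
    print(estimated_size_kb(200, 0.75, 0.25))  # size in KB after both steps
    print(total_reduction(0.75, 0.25))         # combined fraction saved
```

With these inputs the subset step takes 200 KB to 50 KB and WOFF2 takes that to 37.5 KB, a combined reduction of 81.25% rather than 100%.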

Use unicode-range in @font-face

For multilingual sites, define multiple @font-face declarations with unicode-range. Browsers fetch only the subsets containing characters actually rendered on the page.

Test with Real Content

Subsets that work for sample text can fail on real content with unexpected characters. Test with actual production text and watch for tofu (□) where glyphs are missing.

Verify Licensing Permits Subsetting

OFL fonts allow subsetting explicitly. Many commercial EULAs prohibit modification, including subsetting. See our font modification rights guide.

Subset Your Fonts Now

Browser-based, no installation. Works for every script covered on this page. Outputs ready-to-use WOFF2 with @font-face CSS in one workflow.

Sarah Mitchell
