CJK Font Optimization Guide
Chinese, Japanese, and Korean fonts pose the greatest web font challenge: unoptimized files run 5-20MB, but with the right subsetting and splitting strategies you can serve high-quality CJK typography at 100-500KB. This guide covers every technique from frequency-based subsetting to Google Fonts automatic slicing.
TL;DR - Key Takeaways
- CJK fonts are 5-20MB unoptimized due to 20,000-80,000 glyphs; target 100-500KB with subsetting
- Google Fonts auto-splits CJK into 100+ small unicode-range slices; use it when possible
- Always set the lang attribute correctly; glyph shapes differ between Chinese, Japanese, and Korean
- Self-host for control and privacy; use Google Fonts for optimal automatic splitting
CJK—the collective term for Chinese, Japanese, and Korean scripts—presents a unique web typography challenge with no parallel in Latin or even Arabic web fonts. While a complete Latin character set requires roughly 200-300 glyphs, a standard Chinese font contains over 20,000 glyphs. Japanese fonts add hiragana, katakana, and extensive kanji. Korean's Hangul writing system alone has 11,172 possible syllable blocks. The cumulative result: a single unoptimized CJK font file frequently weighs 5 to 20 megabytes—compared to 15-50 kilobytes for a Latin WOFF2.
Despite this challenge, CJK web typography is entirely achievable with modern tooling. The three core strategies—frequency-based subsetting, unicode-range block splitting, and leveraging Google Fonts' automatic slicing infrastructure—can reduce per-page CJK font data to 100-500KB while maintaining comprehensive character coverage for typical content. Understanding when and how to apply each approach is the difference between a 12-second Chinese site and a sub-second one.
This guide examines each optimization technique in depth with real CSS and command-line examples, compares the trade-offs between self-hosting and CDN delivery, surveys the most widely used CJK fonts with their file size benchmarks, and explains the linguistic differences between Chinese, Japanese, and Korean that affect font choice and the critical importance of the HTML lang attribute.
Whether you are building a Mandarin e-commerce site, a Japanese blog, or a multilingual SaaS product that supports Korean, the techniques here will help you deliver beautiful, performant CJK typography to your users.
The CJK Font Size Challenge
The sheer number of characters in CJK writing systems is the root cause of large font files. Each ideograph or syllable block requires its own set of vector outlines, hinting instructions, and metric data. When you multiply that per-glyph overhead by tens of thousands of glyphs, file sizes balloon to levels that are completely impractical to download in one request.
| Script / Font | Typical Glyph Count | Uncompressed TTF | WOFF2 (compressed) |
|---|---|---|---|
| Latin (e.g., Inter Regular) | ~500 | ~120 KB | ~20 KB |
| Arabic (Noto Naskh Arabic) | ~1,000 | ~250 KB | ~80 KB |
| Korean (Noto Sans KR) | ~11,172 Hangul + hanja | ~8 MB | ~3.5 MB |
| Japanese (Noto Sans JP) | ~17,000 (kana + kanji) | ~12 MB | ~4.5 MB |
| Simplified Chinese (Noto Sans SC) | ~22,000 | ~16 MB | ~6 MB |
The numbers above illustrate why naively serving a full CJK font is untenable. A 6MB WOFF2 file on a 4G connection with 20Mbps throughput takes roughly 2.4 seconds just to download—before rendering begins. On congested networks or slower connections common in developing markets (key audiences for Chinese and Korean web content), that becomes 10-15 seconds.
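The download-time figure above is plain arithmetic. As a rough sketch (the function name is illustrative, and the calculation ignores latency, connection setup, and protocol overhead), it can be reproduced like this:

```python
def transfer_seconds(size_mb: float, throughput_mbps: float) -> float:
    """Ideal transfer time: convert megabytes to megabits, divide by link speed."""
    size_megabits = size_mb * 8  # 1 byte = 8 bits
    return size_megabits / throughput_mbps


if __name__ == "__main__":
    # Full Noto Sans SC WOFF2 (~6 MB) on a 20 Mbps link
    print(f"{transfer_seconds(6, 20):.1f} s")
    # The same file at an assumed effective 4 Mbps on a congested connection
    print(f"{transfer_seconds(6, 4):.1f} s")
```

The second call shows how quickly the ideal figure degrades once effective throughput drops below the nominal link speed.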
Why WOFF2 Compression Helps Less for CJK
WOFF2 uses Brotli compression and typically achieves 40-50% size reduction over TTF for Latin fonts because Latin glyph outlines are relatively similar (lots of curves and straight lines that compress well). CJK glyphs are more structurally diverse—each ideograph has a unique combination of strokes—so WOFF2 achieves only 30-40% reduction on CJK fonts. The baseline TTF is simply larger, and compression cannot overcome glyph count.
This is why subsetting—removing glyphs not needed for your content—is the only reliable way to achieve practical CJK font file sizes.
Subsetting Strategies for CJK
Subsetting removes glyphs from a font file, keeping only the characters your content actually uses. For CJK, there are three main subsetting strategies, each with different trade-offs between file size and character coverage risk.
1. Frequency-Based Subsetting
The most aggressive approach: include only the most frequently used characters in Chinese text. Corpus analysis of billions of Chinese characters from newspapers, social media, and websites has produced well-established frequency lists:
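| Character Count | Set | Coverage |
|---|---|---|
| 3,000 | Most common characters | Covers ~99.2% of Chinese web text |
| 5,000 | Extended common set | Covers ~99.9% of typical content |
| 8,105 | Table of General Standard Chinese Characters (通用规范汉字表) | Official mainland China list of standard simplified characters (GB2312, by comparison, defines 6,763 characters) |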
Using pyftsubset from the fonttools Python library, you can create a frequency-based subset with a Unicode list file:
# Install fonttools (the brotli package is needed for WOFF2 output)
pip install fonttools brotli

# Subset to top 3000 Chinese characters from a Unicode list file
# (top3000-chinese.txt contains one Unicode codepoint per line, e.g. U+4E2D)
pyftsubset NotoSansSC-Regular.ttf \
  --unicodes-file=top3000-chinese.txt \
  --output-file=NotoSansSC-3000-Regular.woff2 \
  --flavor=woff2 \
  --layout-features="*"
# Result: ~300-500KB WOFF2 vs 6MB original

# Subset by specifying a text sample (characters actually used in your content)
pyftsubset NotoSansSC-Regular.ttf \
  --text-file=your-content-sample.txt \
  --output-file=NotoSansSC-content-subset.woff2 \
  --flavor=woff2 \
  --layout-features="*"
Risk: Frequency-based subsetting can cause "tofu" (empty boxes) for rare characters. For user-generated content or search results you do not control, always include a full-font fallback in your CSS font stack or use Google Fonts which covers all characters.
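If no published frequency list fits your content, one can be derived from your own corpus. The following is a minimal sketch, assuming the corpus fits in memory and that only the core CJK Unified Ideographs block matters; the function names are illustrative and not part of fonttools. It ranks characters by frequency, estimates how much of the corpus a top-N subset would cover, and formats the result in the one-codepoint-per-line layout used for the Unicode list file above.

```python
from collections import Counter


def is_cjk_ideograph(ch: str) -> bool:
    """True for characters in the core CJK Unified Ideographs block."""
    return 0x4E00 <= ord(ch) <= 0x9FFF


def top_characters(corpus: str, n: int) -> list[str]:
    """Return the n most frequent CJK ideographs in corpus, most common first."""
    counts = Counter(ch for ch in corpus if is_cjk_ideograph(ch))
    return [ch for ch, _ in counts.most_common(n)]


def coverage(corpus: str, subset: set[str]) -> float:
    """Fraction of ideograph occurrences in corpus that subset can render."""
    ideographs = [ch for ch in corpus if is_cjk_ideograph(ch)]
    if not ideographs:
        return 1.0
    return sum(ch in subset for ch in ideographs) / len(ideographs)


def to_unicodes_lines(chars: list[str]) -> str:
    """Format characters as one U+XXXX code point per line."""
    return "\n".join(f"U+{ord(ch):04X}" for ch in chars)


if __name__ == "__main__":
    sample = "中文字体优化中文字体中文中"  # stand-in for a real corpus
    top = top_characters(sample, 3)
    print(to_unicodes_lines(top))
    print(f"coverage: {coverage(sample, set(top)):.2%}")
```

Measuring coverage against a held-out sample of real content before shipping gives an early warning of how often the tofu risk described above would actually occur.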
2. Unicode Block Subsetting
Split the font by Unicode block rather than character frequency. This approach groups characters by their Unicode range, making it easier to reason about coverage and to create shareable subset files that work across projects.
| Block Name | Range | Chars | Content |
|---|---|---|---|
| CJK Unified Ideographs | U+4E00-9FFF | 20,902 | Core CJK characters |
| Hiragana | U+3040-309F | 96 | Japanese hiragana syllables |
| Katakana | U+30A0-30FF | 96 | Japanese katakana syllables |
| Hangul Syllables | U+AC00-D7AF | 11,172 | Modern Korean syllable blocks |
| CJK Extension A | U+3400-4DBF | 6,592 | Rare/historical CJK ideographs |
# Subset to core CJK Unified Ideographs plus CJK punctuation and fullwidth forms
pyftsubset NotoSansSC-Regular.ttf \
  --unicodes="U+4E00-9FFF,U+3000-303F,U+FF00-FFEF" \
  --output-file=NotoSansSC-core-cjk.woff2 \
  --flavor=woff2

# Subset Japanese: hiragana + katakana + kanji, plus punctuation and fullwidth forms
pyftsubset NotoSansJP-Regular.ttf \
  --unicodes="U+3040-30FF,U+4E00-9FFF,U+FF00-FFEF,U+3000-303F" \
  --output-file=NotoSansJP-web.woff2 \
  --flavor=woff2

# Korean: Hangul syllables + Hangul jamo + compatibility jamo
pyftsubset NotoSansKR-Regular.ttf \
  --unicodes="U+AC00-D7AF,U+1100-11FF,U+3130-318F" \
  --output-file=NotoSansKR-web.woff2 \
  --flavor=woff2
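When several block combinations are needed, the range strings are easy to mistype. A small helper can assemble the command from named blocks instead. This is only a sketch: the `BLOCKS` names and the `pyftsubset_args` function are hypothetical, while the ranges come from the table and commands above and the flags are the same ones used in the shell examples.

```python
import shlex

# Block ranges from the table above, plus the punctuation and
# fullwidth ranges used in the shell examples.
BLOCKS = {
    "cjk_unified": "U+4E00-9FFF",
    "hiragana": "U+3040-309F",
    "katakana": "U+30A0-30FF",
    "hangul_syllables": "U+AC00-D7AF",
    "cjk_ext_a": "U+3400-4DBF",
    "cjk_punctuation": "U+3000-303F",
    "fullwidth_forms": "U+FF00-FFEF",
}


def pyftsubset_args(font: str, output: str, block_names: list[str]) -> list[str]:
    """Build the argument list for a block-based pyftsubset call."""
    unicodes = ",".join(BLOCKS[name] for name in block_names)
    return [
        "pyftsubset",
        font,
        f"--unicodes={unicodes}",
        f"--output-file={output}",
        "--flavor=woff2",
    ]


if __name__ == "__main__":
    args = pyftsubset_args(
        "NotoSansJP-Regular.ttf",
        "NotoSansJP-kana.woff2",
        ["hiragana", "katakana"],
    )
    # Print a shell-ready command; pass `args` to subprocess.run to execute it.
    print(shlex.join(args))
```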
3. Content-Specific Subsetting
The smallest possible subset: only the exact characters that appear in your content. This works for static sites or content that changes infrequently. Use a build tool to analyze your content and generate a character list, then subset the font at build time.
# Extract unique characters from HTML files and create subset
# (glyphhanger is a Node.js tool that automates this)
npx glyphhanger http://localhost:3000 \
--subset=NotoSansSC-Regular.ttf \
--formats=woff2
# Or use Python to write the code points to a file, then pass it to pyftsubset
# (a file avoids command-line length limits for large character sets)
python3 -c "
text = open('content.txt', encoding='utf-8').read()
chars = sorted(set(text))
print('\n'.join(f'U+{ord(c):04X}' for c in chars))
" > content-unicodes.txt
pyftsubset NotoSansSC-Regular.ttf \
  --unicodes-file=content-unicodes.txt \
  --output-file=content-subset.woff2 \
  --flavor=woff2
Best for: Marketing landing pages, blog posts, and documentation sites with controlled content. For e-commerce product descriptions or user-generated content, this approach risks missing characters introduced after build time.
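One way to catch characters introduced after build time is to compare new content against the character set the subset was built from. The sketch below assumes you keep that character set (or the original content) alongside the font; `missing_characters` is an illustrative name, not an existing tool.

```python
def missing_characters(new_text: str, subset_chars: set[str]) -> list[str]:
    """Return non-whitespace characters in new_text that the subset lacks."""
    needed = {ch for ch in new_text if not ch.isspace()}
    return sorted(needed - subset_chars)


if __name__ == "__main__":
    # Characters that were present when the subset was built
    built_from = "欢迎访问我们的网站"
    # Content added after the build
    updated = "欢迎访问我们的新网站"
    for ch in missing_characters(updated, set(built_from)):
        print(f"missing: {ch} (U+{ord(ch):04X})")
```

Running a check like this in CI or in a content-publishing hook turns a silent tofu regression into a visible signal to rebuild the subset.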
Unicode-Range Splitting
Rather than serving one large subset, you can split a CJK font into multiple smaller files and use the CSS unicode-range descriptor to declare which characters each file covers. The browser downloads only the slices containing characters actually present on the current page.
This technique is the foundation of how Google Fonts handles CJK, and you can replicate it with self-hosted fonts. The key insight is that a browser parsing a page with 500 unique Chinese characters needs data for those 500 characters only—not 22,000. By splitting into ~200-character slices, each page typically loads 3-8 small files (50-80KB each) instead of one enormous file.
CSS Unicode-Range Implementation
The following CSS declares a CJK font split into four files by Unicode block range. Each @font-face rule shares the same font-family name, so they appear as a single font family to the rest of your CSS, while the browser fetches only what it needs:
/* Slice 1: Hiragana and Katakana (Japanese phonetic) */
@font-face {
font-family: 'Noto Sans JP';
font-style: normal;
font-weight: 400;
font-display: swap;
src: url('/fonts/noto-sans-jp-kana.woff2') format('woff2');
unicode-range: U+3040-309F, U+30A0-30FF, U+FF00-FFEF;
}
/* Slice 2: Common CJK Ideographs (U+4E00-6FFF) */
@font-face {
font-family: 'Noto Sans JP';
font-style: normal;
font-weight: 400;
font-display: swap;
src: url('/fonts/noto-sans-jp-cjk-1.woff2') format('woff2');
unicode-range: U+4E00-6FFF;
}
/* Slice 3: Common CJK Ideographs (U+7000-9FFF) */
@font-face {
font-family: 'Noto Sans JP';
font-style: normal;
font-weight: 400;
font-display: swap;
src: url('/fonts/noto-sans-jp-cjk-2.woff2') format('woff2');
unicode-range: U+7000-9FFF;
}
/* Slice 4: CJK Extension A and punctuation */
@font-face {
font-family: 'Noto Sans JP';
font-style: normal;
font-weight: 400;
font-display: swap;
src: url('/fonts/noto-sans-jp-ext.woff2') format('woff2');
unicode-range: U+3400-4DBF, U+3000-303F;
}
/* Use the font—browser fetches only relevant slices */
body:lang(ja) {
font-family: 'Noto Sans JP', sans-serif;
}
How the Browser Decides Which Slices to Download
When the browser encounters text using the Noto Sans JP font family, it scans the page's text content and compares each character's Unicode code point against all declared unicode-range values. A network request is initiated only for slices that contain at least one character present in the rendered text.
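A Japanese page whose text uses only kana and kanji in the U+4E00-6FFF range would fetch Slice 1 and Slice 2, while Slices 3 and 4 would never download. In practice this is rare: any page containing 、 or 。 requests Slice 4 (CJK punctuation lives in U+3000-303F), and kanji at or above U+7000 such as 語 (U+8A9E) trigger Slice 3. A page that needs all four slices downloads nearly the whole font, which is why coarse block-level slices save far less than the fine-grained, frequency-ordered slices Google Fonts generates.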
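The matching rule can be simulated offline to predict which slices a given text would request. This is a simplified sketch: it uses the ranges from the four @font-face rules above, the slice labels are illustrative, and it assumes every character on the page is rendered with this font family, whereas a real browser only considers text that actually uses the family.

```python
# unicode-range values from the four @font-face rules above
SLICES = {
    "kana": [(0x3040, 0x309F), (0x30A0, 0x30FF), (0xFF00, 0xFFEF)],
    "cjk-1": [(0x4E00, 0x6FFF)],
    "cjk-2": [(0x7000, 0x9FFF)],
    "ext": [(0x3400, 0x4DBF), (0x3000, 0x303F)],
}


def slices_needed(text: str) -> list[str]:
    """Return the slices whose unicode-range contains at least one character."""
    needed = []
    for name, ranges in SLICES.items():
        if any(lo <= ord(ch) <= hi for ch in text for lo, hi in ranges):
            needed.append(name)
    return needed


if __name__ == "__main__":
    for sample in ("ひらがな", "こんにちは世界", "日本語。"):
        print(sample, "->", slices_needed(sample))
```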
Creating Split Subsets with pyftsubset
#!/bin/bash
# Script to create unicode-range slices from a full CJK font.
# Ranges and file names match the @font-face rules above.
set -euo pipefail

FONT="NotoSansJP-Regular.ttf"

# Kana slice (hiragana, katakana, fullwidth forms)
pyftsubset "$FONT" \
  --unicodes="U+3040-309F,U+30A0-30FF,U+FF00-FFEF" \
  --output-file="noto-sans-jp-kana.woff2" --flavor=woff2

# CJK slice 1: U+4E00-6FFF
pyftsubset "$FONT" \
  --unicodes="U+4E00-6FFF" \
  --output-file="noto-sans-jp-cjk-1.woff2" --flavor=woff2

# CJK slice 2: U+7000-9FFF
pyftsubset "$FONT" \
  --unicodes="U+7000-9FFF" \
  --output-file="noto-sans-jp-cjk-2.woff2" --flavor=woff2

# Extension A and CJK punctuation slice
pyftsubset "$FONT" \
  --unicodes="U+3400-4DBF,U+3000-303F" \
  --output-file="noto-sans-jp-ext.woff2" --flavor=woff2

echo "Slice sizes:"
ls -lh noto-sans-jp-*.woff2
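The unicode-range in each @font-face rule has to match what is actually inside the corresponding file, so it helps to generate the CSS from the same slice definitions rather than maintain it by hand. A minimal sketch, assuming the four-slice layout and the /fonts/ URL prefix from the CSS example earlier (the function names are illustrative):

```python
# (file name, unicode-range) pairs mirroring the four-slice CSS example
SLICES = [
    ("noto-sans-jp-kana.woff2", "U+3040-309F, U+30A0-30FF, U+FF00-FFEF"),
    ("noto-sans-jp-cjk-1.woff2", "U+4E00-6FFF"),
    ("noto-sans-jp-cjk-2.woff2", "U+7000-9FFF"),
    ("noto-sans-jp-ext.woff2", "U+3400-4DBF, U+3000-303F"),
]


def font_face_rule(family: str, filename: str, unicode_range: str,
                   weight: int = 400, base_url: str = "/fonts/") -> str:
    """Render one @font-face rule for a single slice file."""
    return (
        "@font-face {\n"
        f"  font-family: '{family}';\n"
        "  font-style: normal;\n"
        f"  font-weight: {weight};\n"
        "  font-display: swap;\n"
        f"  src: url('{base_url}{filename}') format('woff2');\n"
        f"  unicode-range: {unicode_range};\n"
        "}"
    )


def build_stylesheet(family: str) -> str:
    """Render one rule per slice, all sharing the same family name."""
    return "\n".join(font_face_rule(family, name, rng) for name, rng in SLICES)


if __name__ == "__main__":
    print(build_stylesheet("Noto Sans JP"))
```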
Google Fonts CJK: Automatic Optimization
Google Fonts implements the most sophisticated CJK font optimization available without any manual configuration: it automatically generates 100 to 160 tiny unicode-range slices per CJK font, each containing roughly 100-200 characters. Each slice is a separately downloadable WOFF2 file, and the entire delivery is orchestrated via the CSS the Google Fonts API returns.
Google's slicing algorithm is not simply alphabetical—it is frequency-optimized, placing the most commonly used characters in the first slices so that most pages only need to download a small number of files. The infrastructure is also CDN-distributed globally, with aggressive HTTP/2 multiplexing so the 3-8 slice requests that a typical page triggers are fetched in a single connection round-trip.
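Google's actual slicing algorithm is not documented in this guide, so the following is only a sketch of the general idea under stated assumptions: start from a character list already sorted by frequency, cut it into fixed-size slices, and collapse each slice's code points into a CSS unicode-range value. The function names and the tiny sample list are hypothetical.

```python
def chunk(items: list[str], size: int) -> list[list[str]]:
    """Split a frequency-ordered list into consecutive slices of `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def to_unicode_range(chars: list[str]) -> str:
    """Collapse characters into a unicode-range value, merging adjacent code points."""
    points = sorted({ord(ch) for ch in chars})
    if not points:
        return ""
    runs = []
    start = prev = points[0]
    for cp in points[1:]:
        if cp != prev + 1:
            # Gap found: close the current run and start a new one
            runs.append((start, prev))
            start = cp
        prev = cp
    runs.append((start, prev))
    return ", ".join(
        f"U+{lo:X}" if lo == hi else f"U+{lo:X}-{hi:X}" for lo, hi in runs
    )


if __name__ == "__main__":
    # Hypothetical frequency-ordered list, most common first
    by_frequency = list("的一是不了人我在有他")
    for index, members in enumerate(chunk(by_frequency, 4)):
        print(index, to_unicode_range(members))
```

Because frequent characters are scattered across the code space, frequency-ordered slices produce long lists of individual code points rather than tidy block ranges, which matches the shape of the unicode-range values in the stylesheets Google Fonts serves.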
Using Google Fonts for CJK
Loading a CJK font via Google Fonts is identical to loading any other Google Font:
<!-- HTML link tag -->
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Noto+Sans+SC:wght@400;700&display=swap" rel="stylesheet">

<!-- Or load multiple CJK scripts -->
<link href="https://fonts.googleapis.com/css2?family=Noto+Sans+SC:wght@400;700&family=Noto+Sans+JP:wght@400;700&display=swap" rel="stylesheet">
When the browser fetches the Google Fonts CSS URL, it receives a stylesheet containing approximately 150 @font-face rules. Each rule covers a narrow unicode-range and points to a unique WOFF2 file hosted on Google's CDN. A sample excerpt:
/* [0] - Most common characters (Google Fonts auto-generated) */
@font-face {
font-family: 'Noto Sans SC';
font-style: normal;
font-weight: 400;
font-display: swap;
src: url(https://fonts.gstatic.com/s/notosanssc/v37/k3kCo84MPvpLmixcA63oeAL7Iqp5IZJF9bmaG9-anYmTzY.woff2)
format('woff2');
unicode-range: U+1fa0e, U+1fa13, U+1fa1e, U+1fa2f, U+1fa6c, ...;
}
/* [1] */
@font-face {
font-family: 'Noto Sans SC';
font-style: normal;
font-weight: 400;
font-display: swap;
src: url(https://fonts.gstatic.com/s/notosanssc/v37/k3kCo84MPvpLmixcA63oeAL7Iqp5IZJF9bmaG9-bnYmTzY.woff2)
format('woff2');
unicode-range: U+1f9e, U+1f9f, U+2001, U+20189, ...;
}
/* ... continues for 100+ more slices ... */
Advantages of Google Fonts for CJK
- Automatic 100+ slice splitting without any manual work
- Frequency-optimized slice order; most pages load only 3-8 files
- Global CDN with edge caching close to users (modern browsers partition their HTTP cache per site, so font files are no longer reused across different sites)
- HTTP/2 multiplexing for parallel slice delivery over one connection
- GDPR is not a factor for sites that serve only non-EU audiences, such as China-market sites (gstatic.com is generally reachable there, though reliability varies)
Disadvantages of Google Fonts for CJK
- External dependency: Google Fonts outages affect your typography
- GDPR implications for EU users (IP sent to Google servers)
- No control over exact slicing or font version updates
- Limited font selection vs. commercial CJK font libraries
- Extra DNS lookup and connection overhead for first-time visitors
Self-Host vs CDN Trade-offs
The decision to self-host CJK fonts or use a CDN is more consequential than for Latin fonts due to the complexity of CJK optimization. Self-hosting gives full control but requires significant infrastructure work to match the optimization level Google Fonts provides automatically.
| Factor | Self-Hosted | Google Fonts CDN | Other CDN (jsDelivr, Bunny) |
|---|---|---|---|
| Control | Full | None | Partial |
| CJK Optimization | Manual (complex) | Automatic (best) | Varies by font |
| Privacy (GDPR) | Full compliance | Requires consent | Bunny: GDPR-friendly |
| Setup Complexity | High | Minimal | Low-Medium |
| Cache Hit Rate | Your traffic only | Shared CDN edge cache; browser cache is per-site | Moderate |
| Font Selection | Any licensed font | Google Fonts catalog | Open-source only |
| Bandwidth Cost | Your server pays | Free | Free (open-source) |
Recommendation
For most projects, Google Fonts is the pragmatic choice for CJK due to automatic slicing and zero setup. Choose self-hosting when you need commercial fonts not available on Google Fonts, when GDPR compliance demands no third-party requests, or when your audience is primarily in a region where Google services are unreliable (e.g., mainland China, where you should use a local CDN like jsDelivr China mirror or Alibaba CDN).
CJK Font Choices
The following are the most widely used CJK web fonts, covering the most common use cases across Chinese, Japanese, and Korean content:
| Font Family | Scripts | Weights | Full WOFF2 | License | Notes |
|---|---|---|---|---|---|
| Noto Sans SC | Simplified Chinese | 100-900 | ~6 MB | OFL (free) | Google Fonts, best coverage |
| Noto Sans TC | Traditional Chinese | 100-900 | ~7 MB | OFL (free) | Taiwan/HK market standard |
| Noto Sans JP | Japanese (kana + kanji) | 100-900 | ~4.5 MB | OFL (free) | Most popular JP web font |
| Noto Sans KR | Korean (Hangul + hanja) | 100-900 | ~3.5 MB | OFL (free) | Standard for Korean sites |
| Source Han Sans | SC, TC, JP, KR | 7 weights | ~15 MB (pan-CJK) | OFL (free) | Adobe/Google collaboration; use regional variants |
| M PLUS 1p | Japanese + Latin | 7 weights | ~4 MB | OFL (free) | Modern, clean; popular for UI |
| IBM Plex Sans JP | Japanese + Latin | 6 weights | ~5 MB | OFL (free) | Technical/professional look |
| Kosugi Maru | Japanese + Latin | Regular only | ~3 MB | Apache 2.0 (free) | Rounded, friendly; good for UX copy |
Chinese vs Japanese vs Korean Differences
CJK is often treated as a monolithic category, but Chinese, Japanese, and Korean have significant differences that affect font selection, glyph rendering, and the critical HTML lang attribute. Using the wrong regional font variant for your language is a common mistake that produces subtly wrong typography.
Shared Ideographs, Different Glyphs
The CJK Unified Ideographs Unicode block (U+4E00-9FFF) contains characters shared across Chinese, Japanese, and Korean writing. However, the same Unicode code point may have different preferred glyph forms in each language. The Unicode Consortium documented this as Han Unification, and it means a Japanese user and a Chinese user looking at the same Unicode character may expect to see subtly different stroke forms.
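Example: The character U+9AA8 (骨)
The small corner stroke inside the top box of 骨 sits on the left in Simplified Chinese fonts and on the right in Japanese and Traditional Chinese fonts. Same code point, visibly different glyphs. Serving a Simplified Chinese font to Japanese users produces characters that Japanese readers immediately recognize as wrong. By contrast, 边 (U+8FB9), 邊 (U+908A), and 辺 (U+8FBA) are the Simplified Chinese, Traditional Chinese, and Japanese forms of one character, but each has its own code point, so they are not affected by Han Unification.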
Always use the correct regional font variant: Noto Sans SC for Simplified Chinese, Noto Sans TC for Traditional Chinese, Noto Sans JP for Japanese, and Noto Sans KR for Korean. Never substitute one for another even though they share the same character ranges.
The Critical lang Attribute
The HTML lang attribute tells the browser which language variant to use when rendering shared Unicode characters. Without it, browsers default to their own heuristics—often producing Simplified Chinese glyphs even on Japanese pages, since Simplified Chinese fonts have wider OS distribution.
<!-- HTML: Set language on the root element -->
<html lang="ja"> <!-- Japanese -->
<html lang="zh-Hans"> <!-- Simplified Chinese -->
<html lang="zh-Hant"> <!-- Traditional Chinese -->
<html lang="ko"> <!-- Korean -->
/* CSS: Target specific language with :lang() selector */
:lang(ja) {
font-family: 'Noto Sans JP', 'Hiragino Sans', sans-serif;
}
:lang(zh-Hans) {
font-family: 'Noto Sans SC', 'PingFang SC', sans-serif;
}
:lang(zh-Hant) {
font-family: 'Noto Sans TC', 'PingFang TC', sans-serif;
}
:lang(ko) {
font-family: 'Noto Sans KR', 'Apple SD Gothic Neo', sans-serif;
}
Critical: On multilingual pages with mixed CJK content, use the lang attribute on individual elements to ensure each section uses the correct glyph forms. A page mixing Chinese and Japanese content without per-element lang attributes will render some characters in the wrong regional style.
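When font stacks are chosen in server-side code rather than CSS (for example, when generating inline styles per content section), the same language-to-font mapping can be expressed as a lookup. This is a hedged sketch: `font_stack_for` is a hypothetical helper, the stacks are copied from the CSS above, and the prefix matching only approximates how :lang() matches, so a region-only tag such as zh-CN matches neither zh-Hans nor zh-Hant and falls through to the default.

```python
# Font stacks copied from the :lang() rules above
FONT_STACKS = {
    "ja": "'Noto Sans JP', 'Hiragino Sans', sans-serif",
    "zh-Hans": "'Noto Sans SC', 'PingFang SC', sans-serif",
    "zh-Hant": "'Noto Sans TC', 'PingFang TC', sans-serif",
    "ko": "'Noto Sans KR', 'Apple SD Gothic Neo', sans-serif",
}


def font_stack_for(lang: str, default: str = "sans-serif") -> str:
    """Pick a font stack by case-insensitive language-tag prefix match."""
    tag = lang.lower()
    # Try longer prefixes first so more specific tags win
    for prefix in sorted(FONT_STACKS, key=len, reverse=True):
        p = prefix.lower()
        if tag == p or tag.startswith(p + "-"):
            return FONT_STACKS[prefix]
    return default


if __name__ == "__main__":
    for tag in ("ja", "ja-JP", "zh-Hans-CN", "zh-Hant-TW", "ko-KR", "zh-CN"):
        print(f"{tag:12} -> {font_stack_for(tag)}")
```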
Script-Specific Characteristics
Chinese
- No phonetic syllabary; written entirely in ideographs
- Simplified (mainland) vs Traditional (TW/HK)
- 20,000+ encoded characters, of which roughly 3,000-5,000 are in common use
- Vertical text common in Traditional
- Fullwidth punctuation standard
Japanese
- Three scripts: hiragana, katakana, kanji
- ~2,000 Joyo kanji in daily use
- Latin often mixed in (loanwords)
- Different preferred glyph shapes for shared kanji
- Vertical text widely used in print
Korean
- Hangul is an alphabet written in syllable blocks
- 11,172 possible Hangul syllable blocks
- Hanja (Chinese-origin characters) rarely used in modern text
- Latin commonly mixed in
- More open vertical rhythm than Chinese or Japanese
Performance Benchmarks
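The following benchmarks are based on a typical Chinese-language article page with approximately 800 unique Chinese characters, measured on a simulated 4G connection (20Mbps nominal, 50ms RTT). Font loading time reflects only the CJK font bytes transferred, not total page load. The load times listed correspond to an effective throughput of roughly 3-4 Mbps, well below the nominal link rate, so treat them as approximate.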
| Strategy | Font Data Transferred | Approx Load Time | Character Coverage Risk | Setup Effort |
|---|---|---|---|---|
| No optimization (full font) | ~6 MB | ~12-15 s | None | None |
| Basic block subset (U+4E00-9FFF) | ~2 MB | ~4 s | Low | Medium |
| Unicode-range split (4 slices) | ~500 KB | ~1.2 s | Low | High |
| Google Fonts (auto 100+ slices) | ~200-400 KB | ~0.5-0.8 s | None | Minimal |
| Aggressive content-specific subset | ~80-150 KB | ~0.2-0.3 s | High (static content only) | High |
Additional Performance Techniques
Preloading Priority Slices
<!-- Preload the most common slice -->
<link rel="preload" href="/fonts/noto-sc-common.woff2" as="font" type="font/woff2" crossorigin>
font-display: swap for FOUT
@font-face {
font-family: 'Noto Sans SC';
/* swap = show system font immediately */
font-display: swap;
src: url(...) format('woff2');
}
Written & Verified by
Sarah Mitchell
Product Designer, Font Specialist
