CJK Font Optimization Guide

Chinese, Japanese, and Korean fonts pose the greatest web font challenge: unoptimized files run 5-20MB, but with the right subsetting and splitting strategies you can serve high-quality CJK typography at 100-500KB. This guide covers every technique from frequency-based subsetting to Google Fonts automatic slicing.

TL;DR - Key Takeaways

  • CJK fonts are 5-20MB unoptimized due to 20,000-80,000 glyphs; target 100-500KB with subsetting
  • Google Fonts auto-splits CJK into 100+ small unicode-range slices; use it when possible
  • Always set the lang attribute correctly; glyph shapes differ between Chinese, Japanese, and Korean
  • Self-host for control and privacy; use Google Fonts for optimal automatic splitting

CJK—the collective term for Chinese, Japanese, and Korean scripts—presents a unique web typography challenge with no parallel in Latin or even Arabic web fonts. While a complete Latin character set requires roughly 200-300 glyphs, a standard Chinese font contains over 20,000 glyphs. Japanese fonts add hiragana, katakana, and extensive kanji. Korean's Hangul writing system alone has 11,172 possible syllable blocks. The cumulative result: a single unoptimized CJK font file frequently weighs 5 to 20 megabytes—compared to 15-50 kilobytes for a Latin WOFF2.

Despite this challenge, CJK web typography is entirely achievable with modern tooling. The three core strategies—frequency-based subsetting, unicode-range block splitting, and leveraging Google Fonts' automatic slicing infrastructure—can reduce per-page CJK font data to 100-500KB while maintaining comprehensive character coverage for typical content. Understanding when and how to apply each approach is the difference between a 12-second Chinese site and a sub-second one.

This guide examines each optimization technique in depth with real CSS and command-line examples, compares the trade-offs between self-hosting and CDN delivery, surveys the most widely used CJK fonts with their file size benchmarks, and explains the linguistic differences between Chinese, Japanese, and Korean that affect font choice and the critical importance of the HTML lang attribute.

Whether you are building a Mandarin e-commerce site, a Japanese blog, or a multilingual SaaS product that supports Korean, the techniques here will help you deliver beautiful, performant CJK typography to your users.

The CJK Font Size Challenge

The sheer number of characters in CJK writing systems is the root cause of large font files. Each ideograph or syllable block requires its own set of vector outlines, hinting instructions, and metric data. When you multiply that per-glyph overhead by tens of thousands of glyphs, file sizes balloon to levels that are completely impractical to download in one request.

| Script / Font | Typical Glyph Count | Uncompressed TTF | WOFF2 (compressed) |
| --- | --- | --- | --- |
| Latin (e.g., Inter Regular) | ~500 | ~120 KB | ~20 KB |
| Arabic (Noto Naskh Arabic) | ~1,000 | ~250 KB | ~80 KB |
| Korean (Noto Sans KR) | ~11,172 Hangul + hanja | ~8 MB | ~3.5 MB |
| Japanese (Noto Sans JP) | ~17,000 (kana + kanji) | ~12 MB | ~4.5 MB |
| Simplified Chinese (Noto Sans SC) | ~22,000 | ~16 MB | ~6 MB |

The numbers above illustrate why naively serving a full CJK font is untenable. A 6MB WOFF2 file on a 4G connection with 20Mbps throughput takes roughly 2.4 seconds just to download—before rendering begins. On congested networks or slower connections common in developing markets (key audiences for Chinese and Korean web content), that becomes 10-15 seconds.
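
The arithmetic behind that estimate can be sketched in a few lines. This is an idealized calculation that assumes 1 MB equals 8 megabits and ignores latency, TLS setup, and TCP slow start; the function name `download_seconds` and the 4 Mbps figure are illustrative assumptions.

```python
def download_seconds(size_mb: float, throughput_mbps: float) -> float:
    """Ideal transfer time: convert megabytes to megabits, divide by link rate."""
    return size_mb * 8 / throughput_mbps

# 6 MB WOFF2 over a 20 Mbps link (the figures used in the text)
print(f"{download_seconds(6, 20):.1f} s")

# The same file at an assumed 4 Mbps effective throughput (congested network)
print(f"{download_seconds(6, 4):.1f} s")
```

Real-world times will be higher than this lower bound, because the browser must first discover the font in the CSS before it can request it.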

Why WOFF2 Compression Helps Less for CJK

WOFF2 combines Brotli compression with font-specific preprocessing, and it shrinks Latin fonts proportionally more than CJK fonts. In the table above, Inter drops from ~120 KB to ~20 KB (about 83% smaller), while the CJK fonts shrink by roughly 55-65%. CJK glyphs are more structurally diverse (each ideograph has its own combination of strokes), so there is less redundancy for the compressor to exploit. Even after compression the files remain in the megabyte range: compression cannot overcome glyph count.
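
To compare compression across scripts, the reduction can be computed directly from the sizes in the table above. The sketch below uses those approximate figures as-is; the dictionary and function names are illustrative.

```python
# (uncompressed TTF size, WOFF2 size) in KB, copied from the table above
SIZES_KB = {
    "Inter Regular": (120, 20),
    "Noto Naskh Arabic": (250, 80),
    "Noto Sans KR": (8_000, 3_500),
    "Noto Sans JP": (12_000, 4_500),
    "Noto Sans SC": (16_000, 6_000),
}

def reduction_percent(ttf_kb: float, woff2_kb: float) -> float:
    """Percentage by which the WOFF2 file is smaller than the TTF."""
    return (1 - woff2_kb / ttf_kb) * 100

for name, (ttf, woff2) in SIZES_KB.items():
    print(f"{name}: {reduction_percent(ttf, woff2):.1f}% smaller as WOFF2")
```

Running the same calculation on your own font files is a quick way to check whether compression alone gets you anywhere near a usable size.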

This is why subsetting—removing glyphs not needed for your content—is the only reliable way to achieve practical CJK font file sizes.

Subsetting Strategies for CJK

Subsetting removes glyphs from a font file, keeping only the characters your content actually uses. For CJK, there are three main subsetting strategies, each with different trade-offs between file size and character coverage risk.

1. Frequency-Based Subsetting

The most aggressive approach: include only the most frequently used characters in Chinese text. Corpus analysis of billions of Chinese characters from newspapers, social media, and websites has produced well-established frequency lists:

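  • 3,000 (most common characters): covers ~99.2% of Chinese web text
  • 5,000 (extended common set): covers ~99.9% of typical content
  • 8,105 (Table of General Standard Chinese Characters): the official mainland list of standard simplified characters. The older GB2312 encoding standard defines 6,763 characters.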

Using pyftsubset from the fonttools Python library, you can create a frequency-based subset with a Unicode list file:

# Install fonttools
pip install fonttools brotli

# Subset to top 3000 Chinese characters from a Unicode list file
# (top3000-chinese.txt contains one Unicode codepoint per line, e.g. U+4E2D)
# The output is WOFF2, so give the file a .woff2 extension
pyftsubset NotoSansSC-Regular.ttf \
  --unicodes-file=top3000-chinese.txt \
  --output-file=NotoSansSC-3000-Regular.woff2 \
  --flavor=woff2 \
  --layout-features="*"

# Result: ~300-500KB WOFF2 vs 6MB original

# Subset by specifying a text sample (characters actually used in your content)
pyftsubset NotoSansSC-Regular.ttf \
  --text-file=your-content-sample.txt \
  --output-file=NotoSansSC-content-subset.woff2 \
  --flavor=woff2 \
  --layout-features="*"

Risk: Frequency-based subsetting can cause "tofu" (empty boxes) for rare characters. For user-generated content or search results you do not control, always include a full-font fallback in your CSS font stack or use Google Fonts which covers all characters.
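
If you would rather derive the frequency list from your own corpus than rely on a published one, a minimal sketch might look like the following. The helper names `top_codepoints` and `coverage` are hypothetical; the output format (one `U+XXXX` entry per line) matches what the `--unicodes-file` example above expects.

```python
from collections import Counter

def top_codepoints(corpus, n):
    """Return the n most frequent non-whitespace characters as 'U+XXXX' strings."""
    counts = Counter(ch for ch in corpus if not ch.isspace())
    return [f"U+{ord(ch):04X}" for ch, _ in counts.most_common(n)]

def coverage(corpus, codepoints):
    """Fraction of non-whitespace characters in corpus covered by the list."""
    kept = {chr(int(cp[2:], 16)) for cp in codepoints}
    chars = [ch for ch in corpus if not ch.isspace()]
    return sum(ch in kept for ch in chars) / len(chars) if chars else 1.0

sample = "中中中文文字"          # stand-in for a real corpus
top2 = top_codepoints(sample, 2)
print(top2, round(coverage(sample, top2), 3))

# To feed pyftsubset, write one entry per line:
# open("top3000-chinese.txt", "w").write("\n".join(top_codepoints(corpus, 3000)))
```

Measuring coverage on a held-out sample of your content gives a rough sense of how often the fallback font will be needed.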

2. Unicode Block Subsetting

Split the font by Unicode block rather than character frequency. This approach groups characters by their Unicode range, making it easier to reason about coverage and to create shareable subset files that work across projects.

| Block Name | Range | Chars | Content |
| --- | --- | --- | --- |
| CJK Unified Ideographs | U+4E00-9FFF | 20,902 | Core CJK characters |
| Hiragana | U+3040-309F | 96 | Japanese hiragana syllables |
| Katakana | U+30A0-30FF | 96 | Japanese katakana syllables |
| Hangul Syllables | U+AC00-D7AF | 11,172 | Modern Korean syllable blocks |
| CJK Extension A | U+3400-4DBF | 6,592 | Rare/historical CJK ideographs |

# Chinese: CJK Unified Ideographs + CJK punctuation (U+3000-303F)
# + halfwidth/fullwidth forms (U+FF00-FFEF)
pyftsubset NotoSansSC-Regular.ttf \
  --unicodes="U+4E00-9FFF,U+3000-303F,U+FF00-FFEF" \
  --output-file=NotoSansSC-core-cjk.woff2 \
  --flavor=woff2

# Japanese: hiragana + katakana + every kanji the font has in the
# CJK Unified Ideographs block + fullwidth forms + CJK punctuation
pyftsubset NotoSansJP-Regular.ttf \
  --unicodes="U+3040-30FF,U+4E00-9FFF,U+FF00-FFEF,U+3000-303F" \
  --output-file=NotoSansJP-web.woff2 \
  --flavor=woff2

# Korean: Hangul syllables + Hangul jamo (U+1100-11FF)
# + Hangul compatibility jamo (U+3130-318F)
pyftsubset NotoSansKR-Regular.ttf \
  --unicodes="U+AC00-D7AF,U+1100-11FF,U+3130-318F" \
  --output-file=NotoSansKR-web.woff2 \
  --flavor=woff2
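
Before choosing which blocks to keep, it can help to see how your content is distributed across them. The sketch below classifies characters using the block ranges from the table above; `block_of` and `block_histogram` are illustrative names, and anything outside the listed blocks (Latin, punctuation, and so on) is reported as "Other".

```python
from collections import Counter

# Inclusive code point bounds, copied from the block table above
BLOCKS = [
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF),
    ("Hiragana", 0x3040, 0x309F),
    ("Katakana", 0x30A0, 0x30FF),
    ("Hangul Syllables", 0xAC00, 0xD7AF),
    ("CJK Extension A", 0x3400, 0x4DBF),
]

def block_of(ch):
    """Name of the listed block containing ch, or 'Other'."""
    cp = ord(ch)
    for name, lo, hi in BLOCKS:
        if lo <= cp <= hi:
            return name
    return "Other"

def block_histogram(text):
    """Count the unique non-whitespace characters of text per block."""
    return Counter(block_of(ch) for ch in set(text) if not ch.isspace())

print(block_histogram("日本語のテスト"))
```

A large "Other" count is a hint that the subset also needs Latin or punctuation ranges.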

3. Content-Specific Subsetting

The smallest possible subset: only the exact characters that appear in your content. This works for static sites or content that changes infrequently. Use a build tool to analyze your content and generate a character list, then subset the font at build time.

# Extract unique characters from HTML files and create subset
# (glyphhanger is a Node.js tool that automates this)
npx glyphhanger http://localhost:3000 \
  --subset=NotoSansSC-Regular.ttf \
  --formats=woff2

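# Or use Python to write the code points to a file, then pass it to pyftsubset
# (a file avoids command-line length limits when there are thousands of characters)
python3 -c "
text = open('content.txt', encoding='utf-8').read()
# Skip control characters such as newlines and tabs
chars = sorted(c for c in set(text) if ord(c) >= 0x20)
print('\n'.join(f'U+{ord(c):04X}' for c in chars))
" > content-unicodes.txt

pyftsubset NotoSansSC-Regular.ttf \
  --unicodes-file=content-unicodes.txt \
  --output-file=content-subset.woff2 \
  --flavor=woff2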

Best for: Marketing landing pages, blog posts, and documentation sites with controlled content. For e-commerce product descriptions or user-generated content, this approach risks missing characters introduced after build time.
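
One way to catch characters introduced after build time is to compare new content against the list the subset was built from. The following is a sketch under that assumption; `parse_unicodes` and `missing_characters` are hypothetical helpers, and the parser only handles single `U+XXXX` entries (one per line), not ranges.

```python
def parse_unicodes(lines):
    """Read 'U+XXXX' entries (one per line) into a set of characters."""
    chars = set()
    for line in lines:
        entry = line.split("#", 1)[0].strip()  # drop trailing comments
        if entry.upper().startswith("U+"):
            chars.add(chr(int(entry[2:], 16)))
    return chars

def missing_characters(text, subset_chars):
    """Characters used in text that the subset does not contain."""
    needed = {ch for ch in text if ord(ch) >= 0x20}  # ignore control chars
    return sorted(needed - set(subset_chars))

subset = parse_unicodes(["U+5546", "U+54C1  # second character of the sample"])
print(missing_characters("新商品", subset))
```

Run as a CI step, a non-empty result can fail the build or trigger a re-subset before the page ships with tofu.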

Unicode-Range Splitting

Rather than serving one large subset, you can split a CJK font into multiple smaller files and use the CSS unicode-range descriptor to declare which characters each file covers. The browser downloads only the slices containing characters actually present on the current page.

This technique is the foundation of how Google Fonts handles CJK, and you can replicate it with self-hosted fonts. The key insight is that a browser parsing a page with 500 unique Chinese characters needs data for those 500 characters only—not 22,000. By splitting into ~200-character slices, each page typically loads 3-8 small files (50-80KB each) instead of one enormous file.

CSS Unicode-Range Implementation

The following CSS declares a CJK font split into four files by Unicode block range. Each @font-face rule shares the same font-family name, so they appear as a single font family to the rest of your CSS, while the browser fetches only what it needs:

/* Slice 1: Kana, CJK punctuation, and fullwidth forms
   (needed on almost every Japanese page) */
@font-face {
  font-family: 'Noto Sans JP';
  font-style: normal;
  font-weight: 400;
  font-display: swap;
  src: url('/fonts/noto-sans-jp-kana.woff2') format('woff2');
  unicode-range: U+3040-309F, U+30A0-30FF, U+FF00-FFEF, U+3000-303F;
}

/* Slice 2: CJK Unified Ideographs, first half (U+4E00-6FFF) */
@font-face {
  font-family: 'Noto Sans JP';
  font-style: normal;
  font-weight: 400;
  font-display: swap;
  src: url('/fonts/noto-sans-jp-cjk-1.woff2') format('woff2');
  unicode-range: U+4E00-6FFF;
}

/* Slice 3: CJK Unified Ideographs, second half (U+7000-9FFF) */
@font-face {
  font-family: 'Noto Sans JP';
  font-style: normal;
  font-weight: 400;
  font-display: swap;
  src: url('/fonts/noto-sans-jp-cjk-2.woff2') format('woff2');
  unicode-range: U+7000-9FFF;
}

/* Slice 4: CJK Extension A (rare ideographs) */
@font-face {
  font-family: 'Noto Sans JP';
  font-style: normal;
  font-weight: 400;
  font-display: swap;
  src: url('/fonts/noto-sans-jp-ext.woff2') format('woff2');
  unicode-range: U+3400-4DBF;
}

/* Use the font—browser fetches only relevant slices */
body:lang(ja) {
  font-family: 'Noto Sans JP', sans-serif;
}

How the Browser Decides Which Slices to Download

When the browser encounters text using the Noto Sans JP font family, it scans the page's text content and compares each character's Unicode code point against all declared unicode-range values. A network request is initiated only for slices that contain at least one character present in the rendered text.

A Japanese article using primarily hiragana and common kanji in the U+4E00-6FFF range would fetch Slice 1 and Slice 2—roughly 150KB total—while Slices 3 and 4 would never download. A different article using rarer characters might fetch all four slices but still only 250-300KB of the full multi-megabyte font.
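
The matching step can be approximated in a few lines. This is a simplified model, not browser code: it uses a reduced version of the four slices (punctuation and fullwidth ranges left out for brevity), and the names `SLICES` and `slices_needed` are illustrative.

```python
# Simplified slice table: name -> list of inclusive (low, high) code point ranges
SLICES = {
    "kana":  [(0x3040, 0x309F), (0x30A0, 0x30FF)],
    "cjk-1": [(0x4E00, 0x6FFF)],
    "cjk-2": [(0x7000, 0x9FFF)],
    "ext-a": [(0x3400, 0x4DBF)],
}

def slices_needed(text, slices=SLICES):
    """Names of slices whose ranges contain at least one character of text."""
    needed = []
    for name, ranges in slices.items():
        if any(lo <= ord(ch) <= hi for ch in text for lo, hi in ranges):
            needed.append(name)
    return needed

print(slices_needed("これは日本"))   # kana plus ideographs below U+7000
print(slices_needed("日本語"))       # ideographs on both sides of U+7000
```

Running this over real page text shows a weakness of splitting by code point: related everyday characters can land in different halves of the block, so frequency-ordered slices tend to fetch less.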

Creating Split Subsets with pyftsubset

#!/bin/bash
# Script to create unicode-range slices from a full CJK font.
# Output names match the files referenced by the @font-face rules;
# copy them into the /fonts/ directory your CSS points to.
set -euo pipefail

FONT="NotoSansJP-Regular.ttf"

# Kana slice (also carries fullwidth forms and CJK punctuation)
pyftsubset "$FONT" \
  --unicodes="U+3040-309F,U+30A0-30FF,U+FF00-FFEF,U+3000-303F" \
  --output-file="noto-sans-jp-kana.woff2" --flavor=woff2

# CJK slice 1: U+4E00-6FFF
pyftsubset "$FONT" \
  --unicodes="U+4E00-6FFF" \
  --output-file="noto-sans-jp-cjk-1.woff2" --flavor=woff2

# CJK slice 2: U+7000-9FFF
pyftsubset "$FONT" \
  --unicodes="U+7000-9FFF" \
  --output-file="noto-sans-jp-cjk-2.woff2" --flavor=woff2

# Extension A slice
pyftsubset "$FONT" \
  --unicodes="U+3400-4DBF" \
  --output-file="noto-sans-jp-ext.woff2" --flavor=woff2

echo "Slice sizes:"
ls -lh noto-sans-jp-*.woff2
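
Once the number of slices grows beyond a handful, writing the @font-face rules by hand becomes error-prone. A small generator can emit them from a single slice table instead; the sketch below assumes the slice list is maintained alongside the subsetting script, and `font_face_rule` is an illustrative helper.

```python
def font_face_rule(family, url, ranges, weight=400):
    """Build one @font-face rule for a WOFF2 slice."""
    unicode_range = ", ".join(ranges)
    return (
        "@font-face {\n"
        f"  font-family: '{family}';\n"
        "  font-style: normal;\n"
        f"  font-weight: {weight};\n"
        "  font-display: swap;\n"
        f"  src: url('{url}') format('woff2');\n"
        f"  unicode-range: {unicode_range};\n"
        "}"
    )

# (file URL, unicode ranges) pairs; keep these in sync with the pyftsubset calls
SLICE_TABLE = [
    ("/fonts/noto-sans-jp-cjk-1.woff2", ["U+4E00-6FFF"]),
    ("/fonts/noto-sans-jp-cjk-2.woff2", ["U+7000-9FFF"]),
]

css = "\n\n".join(font_face_rule("Noto Sans JP", url, r) for url, r in SLICE_TABLE)
print(css)
```

Generating both the pyftsubset arguments and the CSS from one table keeps the declared ranges and the actual file contents from drifting apart.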

Google Fonts CJK: Automatic Optimization

Google Fonts implements the most sophisticated CJK font optimization available without any manual configuration: it automatically generates 100 to 160 tiny unicode-range slices per CJK font, each containing roughly 100-200 characters. Each slice is a separately downloadable WOFF2 file, and the entire delivery is orchestrated via the CSS the Google Fonts API returns.

Google's slicing is not a simple sequential split by code point: it is frequency-optimized, grouping commonly used characters together so that most pages need only a small number of files. The slices are served from a globally distributed CDN, and HTTP/2 multiplexing lets the 3-8 slice requests that a typical page triggers download in parallel over a single connection.
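
The general idea of frequency-ordered slicing can be sketched as follows. This is not Google's actual algorithm, which is not described here; it simply cuts a frequency-sorted character list into fixed-size groups and collapses each group into a CSS unicode-range value. Both function names are illustrative.

```python
def slice_by_frequency(chars_by_frequency, size=200):
    """Cut a frequency-sorted character list into consecutive groups."""
    return [
        chars_by_frequency[i:i + size]
        for i in range(0, len(chars_by_frequency), size)
    ]

def to_unicode_range(chars):
    """Collapse characters into a unicode-range value, merging consecutive runs."""
    runs = []
    for cp in sorted({ord(c) for c in chars}):
        if runs and cp == runs[-1][1] + 1:
            runs[-1][1] = cp          # extend the current run
        else:
            runs.append([cp, cp])     # start a new run
    return ", ".join(
        f"U+{lo:04X}" if lo == hi else f"U+{lo:04X}-{hi:04X}" for lo, hi in runs
    )

group = [chr(0x4E00), chr(0x4E01), chr(0x4E02), chr(0x4E2D)]
print(to_unicode_range(group))
```

Because frequent characters are scattered across the code space, frequency-based slices produce long lists of individual code points rather than tidy ranges, which is what the Google Fonts CSS looks like.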

Using Google Fonts for CJK

Loading a CJK font via Google Fonts is identical to loading any other Google Font:

<!-- HTML link tag -->
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Noto+Sans+SC:wght@400;700&display=swap" rel="stylesheet">

<!-- Or load multiple CJK scripts -->
<link href="https://fonts.googleapis.com/css2?family=Noto+Sans+SC:wght@400;700&family=Noto+Sans+JP:wght@400;700&display=swap" rel="stylesheet">

When the browser fetches the Google Fonts CSS URL, it receives a stylesheet containing approximately 150 @font-face rules. Each rule covers a narrow unicode-range and points to a unique WOFF2 file hosted on Google's CDN. A sample excerpt:

/* [0] (Google Fonts auto-generated slice) */
@font-face {
  font-family: 'Noto Sans SC';
  font-style: normal;
  font-weight: 400;
  font-display: swap;
  src: url(https://fonts.gstatic.com/s/notosanssc/v37/k3kCo84MPvpLmixcA63oeAL7Iqp5IZJF9bmaG9-anYmTzY.woff2)
       format('woff2');
  unicode-range: U+1fa0e, U+1fa13, U+1fa1e, U+1fa2f, U+1fa6c, ...;
}
/* [1] */
@font-face {
  font-family: 'Noto Sans SC';
  font-style: normal;
  font-weight: 400;
  font-display: swap;
  src: url(https://fonts.gstatic.com/s/notosanssc/v37/k3kCo84MPvpLmixcA63oeAL7Iqp5IZJF9bmaG9-bnYmTzY.woff2)
       format('woff2');
  unicode-range: U+1f9e, U+1f9f, U+2001, U+20189, ...;
}
/* ... continues for 100+ more slices ... */
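
To inspect a stylesheet like this (for example, to count slices or see which file covers a given character), the rules can be parsed with a couple of regular expressions. This is a rough sketch, not a CSS parser: it assumes simple `U+XXXX` and `U+XXXX-YYYY` tokens, does not handle wildcard ranges such as `U+4??`, and needs the complete CSS rather than the elided excerpt above. The function names are illustrative.

```python
import re

FACE_RE = re.compile(r"@font-face\s*\{(.*?)\}", re.S)
RANGE_RE = re.compile(r"unicode-range:\s*([^;]+);")
URL_RE = re.compile(r"url\(([^)]+)\)")

def parse_range_token(token):
    """'U+4E00-6FFF' -> (0x4E00, 0x6FFF); 'U+3000' -> (0x3000, 0x3000)."""
    lo, _, hi = token.strip()[2:].partition("-")
    return int(lo, 16), int(hi or lo, 16)

def parse_font_faces(css):
    """Return a list of (font URL, [(low, high), ...]) for each @font-face rule."""
    faces = []
    for block in FACE_RE.findall(css):
        url, rng = URL_RE.search(block), RANGE_RE.search(block)
        if not (url and rng):
            continue  # rule without a src or unicode-range
        ranges = [parse_range_token(t) for t in rng.group(1).split(",") if t.strip()]
        faces.append((url.group(1).strip("'\""), ranges))
    return faces

sample = (
    "@font-face { font-family: 'X'; src: url('/a.woff2') format('woff2'); "
    "unicode-range: U+4E00-6FFF, U+3000; }"
)
print(parse_font_faces(sample))
```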

Advantages of Google Fonts for CJK

  • + Automatic 100+ slice splitting without any manual work
  • + Frequency-optimized slicing; most pages load only 3-8 files
  • + Global CDN with excellent cache-hit rates across millions of sites
  • + HTTP/2 multiplexing for parallel slice delivery
  • + No GDPR consent flow needed for sites that serve no EU visitors (availability in mainland China is a separate concern; see the recommendation below)

Disadvantages of Google Fonts for CJK

  • - External dependency: Google Fonts outages affect your typography
  • - GDPR implications for EU users (IP sent to Google servers)
  • - No control over exact slicing or font version updates
  • - Limited font selection vs. commercial CJK font libraries
  • - Extra DNS lookup and connection overhead for first-time visitors

Self-Host vs CDN Trade-offs

The decision to self-host CJK fonts or use a CDN is more consequential than for Latin fonts due to the complexity of CJK optimization. Self-hosting gives full control but requires significant infrastructure work to match the optimization level Google Fonts provides automatically.

| Factor | Self-Hosted | Google Fonts CDN | Other CDN (jsDelivr, Bunny) |
| --- | --- | --- | --- |
| Control | Full | None | Partial |
| CJK Optimization | Manual (complex) | Automatic (best) | Varies by font |
| Privacy (GDPR) | Full compliance | Requires consent | Bunny: GDPR-friendly |
| Setup Complexity | High | Minimal | Low-Medium |
| Cache Hit Rate | Your traffic only | Millions of sites share | Moderate |
| Font Selection | Any licensed font | Google Fonts catalog | Open-source only |
| Bandwidth Cost | Your server pays | Free | Free (open-source) |

Recommendation

For most projects, Google Fonts is the pragmatic choice for CJK due to automatic slicing and zero setup. Choose self-hosting when you need commercial fonts not available on Google Fonts, when GDPR compliance demands no third-party requests, or when your audience is primarily in a region where Google services are unreliable (e.g., mainland China, where you should use a local CDN like jsDelivr China mirror or Alibaba CDN).

CJK Font Choices

The following are the most widely used CJK web fonts, covering the most common use cases across Chinese, Japanese, and Korean content:

| Font Family | Scripts | Weights | Full WOFF2 | License | Notes |
| --- | --- | --- | --- | --- | --- |
| Noto Sans SC | Simplified Chinese | 100-900 | ~6 MB | OFL (free) | Google Fonts, best coverage |
| Noto Sans TC | Traditional Chinese | 100-900 | ~7 MB | OFL (free) | Taiwan/HK market standard |
| Noto Sans JP | Japanese (kana + kanji) | 100-900 | ~4.5 MB | OFL (free) | Most popular JP web font |
| Noto Sans KR | Korean (Hangul + hanja) | 100-900 | ~3.5 MB | OFL (free) | Standard for Korean sites |
| Source Han Sans | SC, TC, JP, KR | 7 weights | ~15 MB (pan-CJK) | OFL (free) | Adobe/Google collaboration; use regional variants |
| M PLUS 1p | Japanese + Latin | 9 weights | ~4 MB | OFL (free) | Modern, clean; popular for UI |
| IBM Plex Sans JP | Japanese + Latin | 6 weights | ~5 MB | OFL (free) | Technical/professional look |
| Kosugi Maru | Japanese + Latin | Regular only | ~3 MB | OFL (free) | Rounded, friendly; good for UX copy |

Chinese vs Japanese vs Korean Differences

CJK is often treated as a monolithic category, but Chinese, Japanese, and Korean have significant differences that affect font selection, glyph rendering, and the critical HTML lang attribute. Using the wrong regional font variant for your language is a common mistake that produces subtly wrong typography.

Shared Ideographs, Different Glyphs

The CJK Unified Ideographs Unicode block (U+4E00-9FFF) contains characters shared across Chinese, Japanese, and Korean writing. However, the same Unicode code point may have different preferred glyph forms in each language. The Unicode Consortium documented this as Han Unification, and it means a Japanese user and a Chinese user looking at the same Unicode character may expect to see subtly different stroke forms.

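Example: The character U+9AA8 (骨)

In fonts designed for Simplified Chinese, the small inner stroke in the upper part of 骨 sits on the left; in Japanese, Korean, and Traditional Chinese fonts it sits on the right. One code point, different expected glyphs. (Forms such as 辺, 邊, and 边 are a different case: they are separately encoded characters at U+8FBA, U+908A, and U+8FB9, not variants of one code point.) Serving a Simplified Chinese font to Japanese users produces incorrectly shaped characters that Japanese readers immediately recognize as wrong.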

Always use the correct regional font variant: Noto Sans SC for Simplified Chinese, Noto Sans TC for Traditional Chinese, Noto Sans JP for Japanese, and Noto Sans KR for Korean. Never substitute one for another even though they share the same character ranges.
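
When evaluating any example of regional variation, it is worth printing the code points first. Characters merged by Han Unification share a single code point and differ only by font, whereas simplified, traditional, and Japanese forms of a character are often encoded separately. A minimal check (the helper name `describe` is illustrative):

```python
def describe(chars):
    """Pair each character with its code point in U+XXXX notation."""
    return [(ch, f"U+{ord(ch):04X}") for ch in chars]

# Three related forms of the same word: each has its own code point
print(describe("辺邊边"))

# A unified ideograph: one code point whose glyph depends on font and lang
print(describe("骨"))
```

If the code points differ, choosing the right font will not convert one form into another; the text itself must use the intended character.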

The Critical lang Attribute

The HTML lang attribute tells the browser which language variant to use when rendering shared Unicode characters. Without it, browsers default to their own heuristics—often producing Simplified Chinese glyphs even on Japanese pages, since Simplified Chinese fonts have wider OS distribution.

<!-- HTML: set the language on the root element (use the one tag that matches your page) -->
<html lang="ja">       <!-- Japanese -->
<html lang="zh-Hans">  <!-- Simplified Chinese -->
<html lang="zh-Hant">  <!-- Traditional Chinese -->
<html lang="ko">       <!-- Korean -->

/* CSS: target a specific language with the :lang() selector */
:lang(ja) {
  font-family: 'Noto Sans JP', 'Hiragino Sans', sans-serif;
}
:lang(zh-Hans) {
  font-family: 'Noto Sans SC', 'PingFang SC', sans-serif;
}
:lang(zh-Hant) {
  font-family: 'Noto Sans TC', 'PingFang TC', sans-serif;
}
:lang(ko) {
  font-family: 'Noto Sans KR', 'Apple SD Gothic Neo', sans-serif;
}

Critical: On multilingual pages with mixed CJK content, use the lang attribute on individual elements to ensure each section uses the correct glyph forms. A page mixing Chinese and Japanese content without per-element lang attributes will render some characters in the wrong regional style.
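
If your templates render content in several languages, the font stack can be chosen server-side from the same language tag you put in the lang attribute. The sketch below mirrors the prefix matching that :lang() performs; the stacks are the ones from the CSS above, and `font_stack_for` is a hypothetical helper.

```python
FONT_STACKS = {
    "ja": "'Noto Sans JP', 'Hiragino Sans', sans-serif",
    "zh-hans": "'Noto Sans SC', 'PingFang SC', sans-serif",
    "zh-hant": "'Noto Sans TC', 'PingFang TC', sans-serif",
    "ko": "'Noto Sans KR', 'Apple SD Gothic Neo', sans-serif",
}

def font_stack_for(lang_tag, default="sans-serif"):
    """Longest-prefix match of a language tag against FONT_STACKS keys."""
    parts = lang_tag.lower().split("-")
    while parts:
        key = "-".join(parts)
        if key in FONT_STACKS:
            return FONT_STACKS[key]
        parts.pop()  # drop the last subtag and try a shorter prefix
    return default

print(font_stack_for("ja-JP"))
print(font_stack_for("zh-Hant-TW"))
print(font_stack_for("zh-CN"))  # no script subtag, so it falls back to the default
```

The last call illustrates a common pitfall: a tag such as zh-CN does not match a zh-Hans rule, so either tag content with the script subtag or add entries for the region-based tags you actually use.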

Script-Specific Characteristics

Chinese

  • No phonetic syllabary; pure ideographs
  • Simplified (mainland) vs Traditional (TW/HK)
  • 20,000+ encoded characters, of which roughly 3,000-5,000 are in common use
  • Vertical text common in Traditional
  • Fullwidth punctuation standard

Japanese

  • Three scripts: hiragana, katakana, kanji
  • ~2,000 Joyo kanji in daily use
  • Latin often mixed in (loanwords)
  • Different preferred glyph shapes for shared kanji
  • Vertical text widely used in print

Korean

  • Hangul is an alphabet written in syllable blocks
  • 11,172 possible Hangul syllable blocks
  • Hanja (Chinese-origin characters) rarely used in modern text
  • Latin commonly mixed in
  • More open vertical rhythm than Chinese or Japanese

Performance Benchmarks

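The following benchmarks are illustrative figures for a typical Chinese-language article page with approximately 800 unique Chinese characters on a simulated 4G connection (20Mbps nominal, 50ms RTT). Font loading time reflects only the CJK font bytes transferred, not total page load. The load times correspond to an effective throughput of roughly 3-4Mbps, well below the nominal link rate, so read them as relative comparisons rather than precise measurements.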

| Strategy | Font Data Transferred | Approx Load Time | Character Coverage Risk | Setup Effort |
| --- | --- | --- | --- | --- |
| No optimization (full font) | ~6 MB | ~12-15 s | None | None |
| Basic block subset (U+4E00-9FFF) | ~2 MB | ~4 s | Low | Medium |
| Unicode-range split (4 slices) | ~500 KB | ~1.2 s | Low | High |
| Google Fonts (auto 100+ slices) | ~200-400 KB | ~0.5-0.8 s | None | Minimal |
| Aggressive content-specific subset | ~80-150 KB | ~0.2-0.3 s | High (static content only) | High |

Additional Performance Techniques

Preloading Priority Slices

<!-- Preload the most common slice -->
<link rel="preload"
  href="/fonts/noto-sc-common.woff2"
  as="font"
  type="font/woff2"
  crossorigin>

font-display: swap for FOUT

@font-face {
  font-family: 'Noto Sans SC';
  /* swap = show system font immediately */
  font-display: swap;
  src: url(...) format('woff2');
}

Written & Verified by

Sarah Mitchell

Product Designer, Font Specialist
