Multilingual Font Setup
Sites serving multiple languages need a delivery strategy that doesn't make every user download every script. The CSS unicode-range descriptor in @font-face tells the browser which Unicode characters a font file covers; the browser then fetches that file only if the page actually contains characters in that range. Combined with the :lang() CSS selector, you can ship per-language font stacks that adapt automatically.
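To make that fetch rule concrete, here is a minimal sketch of the range test in JavaScript (the language of this guide's Twemoji snippet). The helper names parseUnicodeRange and needsFace are hypothetical, and the model is simplified: real browsers also require that some element actually uses the font family before they fetch a face.

```javascript
// Parse a unicode-range value such as "U+0000-00FF, U+0400-04FF, U+4??"
// into [start, end] pairs. "?" wildcards expand to 0 (start) and F (end).
function parseUnicodeRange(value) {
  return value.split(",").map((part) => {
    const token = part.trim().replace(/^U\+/i, "");
    if (token.includes("-")) {
      const [start, end] = token.split("-");
      return [parseInt(start, 16), parseInt(end, 16)];
    }
    if (token.includes("?")) {
      return [
        parseInt(token.replace(/\?/g, "0"), 16),
        parseInt(token.replace(/\?/g, "F"), 16),
      ];
    }
    const cp = parseInt(token, 16);
    return [cp, cp];
  });
}

// True if any character of `text` falls inside one of the face's ranges,
// i.e. the browser would need to download this face for that text.
function needsFace(text, unicodeRange) {
  const ranges = parseUnicodeRange(unicodeRange);
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (ranges.some(([lo, hi]) => cp >= lo && cp <= hi)) return true;
  }
  return false;
}

console.log(needsFace("Hello", "U+0400-04FF"));  // false: no Cyrillic
console.log(needsFace("Привет", "U+0400-04FF")); // true
```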
unicode-range Pattern
/* Latin (Basic Latin + Latin-1 Supplement): small file, needed on almost every page */
@font-face {
font-family: "Inter";
src: url("/fonts/inter-latin.woff2") format("woff2");
unicode-range: U+0000-00FF;
font-display: swap;
}
/* Cyrillic, only fetched when the page has Cyrillic characters */
@font-face {
font-family: "Inter";
src: url("/fonts/inter-cyrillic.woff2") format("woff2");
unicode-range: U+0400-04FF;
font-display: swap;
}
/* Kanji (CJK Unified Ideographs), only fetched on pages with kanji; kana (U+3040-30FF) needs its own range */
@font-face {
font-family: "Noto Sans JP";
src: url("/fonts/noto-jp-kanji.woff2") format("woff2");
unicode-range: U+4E00-9FFF;
font-display: swap;
}
Per-Language Font Stacks with :lang()
The :lang() selector matches elements by their language, which comes from their own lang attribute or one inherited from an ancestor. Use it to apply language-specific font stacks, adjust letter-spacing for scripts that need it, or modify line-height for taller scripts like Thai or Tibetan.
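The matching rule is a prefix match on language subtags. Below is a simplified JavaScript model of it, following the CSS Selectors Level 3 definition (Level 4 adds wildcard ranges); effectiveLang and matchesLang are hypothetical names, and elements are plain objects rather than DOM nodes.

```javascript
// Simplified model of :lang() matching (CSS Selectors Level 3 rule).
// Elements are plain objects: { lang?: string, parent?: element }.

// The language comes from the nearest ancestor-or-self with a lang attribute.
function effectiveLang(element) {
  for (let node = element; node; node = node.parent) {
    if (node.lang !== undefined) return node.lang;
  }
  return ""; // no language information
}

// :lang(X) matches if the language equals X or starts with "X-",
// compared case-insensitively.
function matchesLang(element, range) {
  const tag = effectiveLang(element).toLowerCase();
  const want = range.toLowerCase();
  return tag === want || tag.startsWith(want + "-");
}

const root = { lang: "ja-JP" };
const paragraph = { parent: root };
console.log(matchesLang(paragraph, "ja")); // true (inherited "ja-JP")
console.log(matchesLang(paragraph, "jp")); // false ("JP" is a region subtag)
console.log(matchesLang({ lang: "zh-Hant", parent: root }, "zh")); // true
```

This is why :lang(zh) matches zh-Hans, zh-Hant, and zh-TW alike, while :lang(zh-Hant) does not match content tagged zh-TW: the selector compares tags, it does not infer scripts.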
/* Default font stack */
body {
font-family: "Inter", system-ui, sans-serif;
}
/* Japanese needs different metrics */
:lang(ja) {
font-family: "Noto Sans JP", "Hiragino Sans", system-ui, sans-serif;
line-height: 1.7; /* Taller for kanji clarity */
}
/* Arabic needs RTL support and bigger size */
:lang(ar) {
font-family: "Noto Naskh Arabic", "Cairo", system-ui, sans-serif;
font-size: 1.05em;
direction: rtl; /* the HTML dir attribute is preferred; this is a fallback */
}
/* Thai needs more line-height for tone marks */
:lang(th) {
font-family: "Noto Sans Thai", system-ui, sans-serif;
line-height: 1.85;
}
Pan-Unicode vs Language-Specific
You have two architectural choices for multilingual sites:
- Pan-Unicode fonts (the Noto Sans family) cover all scripts in a unified visual style. They are easier to deploy but trade typographic quality for coverage, and the files are larger.
- Language-specific fonts with unicode-range partitioning. Better typography for each script. Smaller per-language files. More setup work but typically the right answer for production sites.
For most production sites, use language-optimized primary fonts with pan-Unicode fonts (Noto) as the fallback. CJK fonts are commonly tens of times larger than Latin fonts (megabytes versus a few hundred kilobytes, as the table below shows), so splitting them into per-script files via unicode-range is essential for web performance, not optional.
CJK: Chinese, Japanese & Korean Optimization
CJK fonts contain tens of thousands of glyphs (a single OpenType font can hold at most 65,535), compared to ~200 for a basic Latin font. Chinese has tens of thousands of characters in total, though roughly 3,500 cover most everyday text; Japanese combines Hiragana, Katakana, and Kanji; Korean has 11,172 Hangul syllable blocks plus Hanja. Full CJK fonts are 5-20 MB uncompressed, one to two orders of magnitude larger than a typical Latin web font. CJK web typography is fundamentally a delivery architecture problem, not a typography problem.
Why CJK Fonts Are So Large
| Script | Glyphs (approx) | Full Font Size | Recommended Subset |
|---|---|---|---|
| Latin | ~200 | 100-300 KB | Single subset |
| Simplified Chinese | 3,500-30,000+ | 3-30 MB | GB 2312 (6,763) or frequency-banded |
| Traditional Chinese | 5,000-13,000+ | 5-15 MB | Big5 (13,053) or frequency-banded |
| Japanese | 7,000-15,000 | 3-20 MB | Joyo Kanji + Kana (~2,500) |
| Korean | 11,172 + Hanja | 2-10 MB | KS X 1001 (2,350) |
Frequency-Band Partitioning
Google Fonts splits CJK fonts into 100+ small subsets (slices) of ~100-200 characters each, using unicode-range so the browser loads only the slices a page needs. A page with 500 unique Chinese characters might load 3-5 small font files instead of the entire 5 MB font. For static font files, this is about as bandwidth-efficient as delivery gets.
For self-hosted CJK fonts, replicate this approach using cn-font-split or pyftsubset to generate multiple subset files. For per-script Unicode ranges and detailed subsetting strategy, see our font subsetting by language guide.
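The partitioning step can be sketched as follows, under stated assumptions: the frequency list is a 15-character toy (real lists come from corpus statistics), the slice size is arbitrary, and the family name and file names are hypothetical. The subset files themselves would still come from cn-font-split or pyftsubset; this only decides which characters go in which slice and writes the matching @font-face rules.

```javascript
// Split characters (most frequent first) into fixed-size slices.
function makeSlices(charsByFrequency, sliceSize) {
  const slices = [];
  for (let i = 0; i < charsByFrequency.length; i += sliceSize) {
    slices.push(new Set(charsByFrequency.slice(i, i + sliceSize)));
  }
  return slices;
}

// Indices of the slices a page must download for `text`.
function slicesNeeded(text, slices) {
  const needed = new Set();
  for (const ch of new Set(text)) {
    const index = slices.findIndex((slice) => slice.has(ch));
    if (index !== -1) needed.add(index);
  }
  return [...needed].sort((a, b) => a - b);
}

// Collapse a slice's code points into a compact unicode-range value.
function toUnicodeRange(chars) {
  const hex = (n) => n.toString(16).toUpperCase();
  const cps = [...new Set([...chars].map((c) => c.codePointAt(0)))].sort((a, b) => a - b);
  if (cps.length === 0) return "";
  const parts = [];
  let start = cps[0];
  let prev = cps[0];
  for (const cp of cps.slice(1).concat(Infinity)) {
    if (cp === prev + 1) {
      prev = cp; // extend the current run of consecutive code points
      continue;
    }
    parts.push(start === prev ? `U+${hex(start)}` : `U+${hex(start)}-${hex(prev)}`);
    start = prev = cp;
  }
  return parts.join(", ");
}

function fontFaceRule(family, url, chars) {
  return [
    "@font-face {",
    `  font-family: "${family}";`,
    `  src: url("${url}") format("woff2");`,
    `  unicode-range: ${toUnicodeRange(chars)};`,
    "  font-display: swap;",
    "}",
  ].join("\n");
}

const slices = makeSlices([..."的一是不了人我在有他这中大来上"], 5);
console.log(slicesNeeded("我是中国人", slices)); // [ 0, 1, 2 ]
slices.forEach((slice, i) =>
  console.log(fontFaceRule("Noto Sans SC", `/fonts/noto-sc.${i}.woff2`, [...slice].join(""))));
```

With real frequency data, everyday text concentrates in the first few slices, which is what keeps the number of downloaded files small.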
Regional Variants Matter
While Chinese, Japanese, and Korean share many CJK Unified Ideographs, each region has unique characters and different preferred glyph shapes for shared characters. Japanese uses different stroke styles for some kanji. Korean primarily uses Hangul syllables. Always pick the correct regional variant:
- Noto Sans SC: Simplified Chinese (Mainland China)
- Noto Sans TC: Traditional Chinese (Taiwan, Hong Kong)
- Noto Sans JP: Japanese
- Noto Sans KR: Korean
- Source Han Sans / Source Han Serif: Adobe's open-source unified CJK family
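Choosing among these variants from a page's language tag can be sketched with the built-in Intl.Locale API, whose maximize() method fills in the likely script (for example, zh-TW becomes zh-Hant-TW). The mapping covers only the four Noto families listed above, cjkFamilyFor is a hypothetical helper, and it assumes a JavaScript runtime with full ICU data (the default in current Node.js releases and browsers).

```javascript
// Pick a regional Noto CJK family from a BCP 47 language tag.
// maximize() adds likely subtags, e.g. "zh" -> "zh-Hans-CN" and
// "zh-TW" -> "zh-Hant-TW", so Chinese can be split by script.
function cjkFamilyFor(tag) {
  const { language, script } = new Intl.Locale(tag).maximize();
  if (language === "ja") return "Noto Sans JP";
  if (language === "ko") return "Noto Sans KR";
  if (language === "zh") return script === "Hant" ? "Noto Sans TC" : "Noto Sans SC";
  return null; // not a CJK language
}

console.log(cjkFamilyFor("zh-TW")); // Noto Sans TC
console.log(cjkFamilyFor("zh"));    // Noto Sans SC
console.log(cjkFamilyFor("ko-KR")); // Noto Sans KR
console.log(cjkFamilyFor("en"));    // null
```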
Hitting Sub-Second CJK Load
- Use unicode-range splitting to load only needed character subsets
- Preload the first subset covering the most common characters in your content
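- Use WOFF2; it typically saves 30-50% over WOFF, which is especially impactful at CJK file sizes
- Set font-display: swap for immediate text rendering
- Consider a service worker to cache font subsets across pages
- Don't forget CJK punctuation: CJK text uses full-width marks such as 「」『』、。 (CJK Symbols and Punctuation, U+3000-303F) and ，！？ (Halfwidth and Fullwidth Forms, U+FF00-FFEF) rather than ASCII punctuation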
Target: 100-500 KB total CJK font data per page. Achievable with proper unicode-range partitioning even for full content sites.
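Before deciding which subset to preload, it helps to know what a page actually uses. This sketch (cjkProfile is a hypothetical helper) counts the distinct characters in each CJK-related Unicode block, which also shows whether the punctuation blocks need to be in the subset.

```javascript
// Unicode blocks relevant to CJK pages: [name, first code point, last code point].
const CJK_BLOCKS = [
  ["CJK Symbols and Punctuation", 0x3000, 0x303f],
  ["Hiragana", 0x3040, 0x309f],
  ["Katakana", 0x30a0, 0x30ff],
  ["CJK Unified Ideographs", 0x4e00, 0x9fff],
  ["Hangul Syllables", 0xac00, 0xd7af],
  ["Halfwidth and Fullwidth Forms", 0xff00, 0xffef],
];

// Count distinct characters per block; characters outside these blocks are ignored.
function cjkProfile(text) {
  const counts = Object.fromEntries(CJK_BLOCKS.map(([name]) => [name, 0]));
  for (const ch of new Set(text)) {
    const cp = ch.codePointAt(0);
    const block = CJK_BLOCKS.find(([, first, last]) => cp >= first && cp <= last);
    if (block) counts[block[0]] += 1;
  }
  return counts;
}

// "中文，测试。": four ideographs, a full-width comma, and an ideographic full stop.
console.log(cjkProfile("\u4e2d\u6587\uff0c\u6d4b\u8bd5\u3002"));
```

Non-zero punctuation counts suggest those characters belong in the first, preloaded subset.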
Right-to-Left: Arabic & Hebrew
RTL scripts add complexity that LTR scripts don't face: text flows right to left, Arabic letters change shape based on position (isolated, initial, medial, final), and proper joining requires specific OpenType features. Setting direction: rtl handles only the flow; if the font (or a careless subset of it) lacks the contextual-form features, Arabic renders as disconnected letters, technically readable but visually broken.
CSS for RTL
<!-- HTML: set dir on the root or container -->
<html lang="ar" dir="rtl">
<!-- or for mixed-language sites -->
<div lang="ar" dir="rtl">...</div>
/* CSS: use logical properties, not physical */
.card {
margin-inline-start: 1rem; /* not margin-left */
padding-inline-end: 0.5rem; /* not padding-right */
border-inline-start: 2px solid;
}
/* Bidirectional text isolation */
.user-input {
unicode-bidi: isolate; /* prevents bidi conflicts */
}
/* RTL-specific font stack */
:lang(ar) {
font-family: "Noto Naskh Arabic", "Cairo", system-ui, sans-serif;
direction: rtl;
}
Required OpenType Features for Arabic
Arabic fonts must include these OpenType features for proper rendering. Quality fonts (Noto Naskh Arabic, IBM Plex Arabic, Cairo, Amiri, Tajawal) include them all by default, but verify when subsetting:
| Feature | Tag | Purpose |
|---|---|---|
| Contextual Alternates | calt | Letter form changes based on neighbors |
| Initial / Medial / Final / Isolated | init / medi / fina / isol | Position-specific letter shapes |
| Mark / Mark-to-Mark Positioning | mark / mkmk | Diacritic positioning |
| Required Ligatures | rlig | Lam-alif, etc. |
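Checking for these features after subsetting does not require a full font library. The sketch below reads feature tags straight from a font's GSUB table using the OpenType binary layout (table directory, then the FeatureList). It assumes an uncompressed .ttf/.otf file; WOFF/WOFF2 files and font collections are not handled, and featureTags and missingArabicFeatures are hypothetical helper names. mark and mkmk live in the GPOS table, which the same function reads when passed "GPOS".

```javascript
// Read the feature tags listed in a font's GSUB (or GPOS) table.
// `bytes` is a Uint8Array (a Node.js Buffer works) holding an uncompressed
// TrueType/OpenType font.
function featureTags(bytes, tableTag = "GSUB") {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const tagAt = (offset) => String.fromCharCode(...bytes.subarray(offset, offset + 4));
  const numTables = view.getUint16(4); // sfnt header: version (4 bytes), then numTables
  for (let i = 0; i < numTables; i++) {
    const record = 12 + i * 16; // table record: tag, checksum, offset, length
    if (tagAt(record) !== tableTag) continue;
    const table = view.getUint32(record + 8);
    // GSUB/GPOS header: major/minor version, ScriptList offset, FeatureList offset.
    const featureList = table + view.getUint16(table + 6);
    const count = view.getUint16(featureList);
    const tags = new Set();
    for (let j = 0; j < count; j++) {
      tags.add(tagAt(featureList + 2 + j * 6)); // FeatureRecord: 4-byte tag + offset
    }
    return tags;
  }
  return new Set(); // the table is absent
}

// Substitution features from the table above (mark/mkmk are in GPOS).
const ARABIC_GSUB_FEATURES = ["calt", "init", "medi", "fina", "isol", "rlig"];

function missingArabicFeatures(bytes) {
  const present = featureTags(bytes, "GSUB");
  return ARABIC_GSUB_FEATURES.filter((tag) => !present.has(tag));
}
```

In Node.js, missingArabicFeatures(require("node:fs").readFileSync("subset.ttf")) returns an empty array when all six features survived subsetting; the file name is a placeholder.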
Bidirectional (BiDi) Text
Mixed RTL/LTR content (an English word inside Arabic prose, an Arabic name in an English article) is laid out by the Unicode Bidirectional Algorithm. Browsers handle it automatically when the dir attribute is correct. For inline runs whose direction differs from the surrounding text, wrap them in <span dir="ltr"> (or dir="rtl"), or in <bdi> when the direction is unknown, such as user names; <bdo> forces a direction override and is rarely what you want. The CSS unicode-bidi property controls the same behavior from stylesheets. Always test with real mixed-direction content: visual rendering issues are easy to miss in pure-LTR previews.
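Where markup cannot use dir="auto" (for example, when server-rendered text needs an explicit dir), its "first strong character" rule can be approximated in JavaScript. JavaScript regular expressions cannot test the Unicode Bidi_Class property, so this sketch assumes letters in the Arabic, Hebrew, Syriac, and Thaana scripts are strong RTL and every other letter is strong LTR; guessDir is a hypothetical helper, and real bidi classification covers more scripts and cases.

```javascript
// Letters in these scripts are treated as strong right-to-left (an approximation).
const RTL_SCRIPT = /[\p{Script=Arabic}\p{Script=Hebrew}\p{Script=Syriac}\p{Script=Thaana}]/u;
const LETTER = /\p{L}/u;

// Return "rtl" or "ltr" from the first letter in the text; digits, spaces,
// and punctuation are skipped because they are not strongly directional.
function guessDir(text) {
  for (const ch of text) {
    if (!LETTER.test(ch)) continue;
    return RTL_SCRIPT.test(ch) ? "rtl" : "ltr";
  }
  return "ltr"; // no strong character: default direction
}

console.log(guessDir("مرحبا بالعالم")); // rtl
console.log(guessDir("2024 שלום"));     // rtl (digits skipped)
console.log(guessDir("Hello مرحبا"));   // ltr
```

Assign the result with something like el.dir = guessDir(el.textContent); where dir="auto" is available, the browser's own implementation is the better choice.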
File Sizes
Subsetted Arabic WOFF2: typically 50-100 KB covering U+0600-06FF (Basic Arabic) plus common punctuation. Full Arabic with all presentation forms: 150-300 KB. Hebrew is smaller, ~30-60 KB subsetted.
Recommended Fonts
- Arabic body text: Noto Naskh Arabic, IBM Plex Arabic, Cairo, Tajawal
- Arabic display / headings: Cairo, Lalezar, Reem Kufi, Amiri (serif/Naskh)
- Hebrew body: Noto Sans Hebrew, Rubik, Heebo
- Hebrew serif: Frank Ruhl Libre, David Libre
All are available on Google Fonts with permissive licensing. For licensing details, see our open source font licenses guide.
Indic Scripts
Indic scripts (Devanagari, Bengali, Tamil, Telugu, Gujarati, Gurmukhi for Punjabi, Kannada, Malayalam, Odia) are among the most complex in the world for digital typography. They require consonant conjuncts (multiple consonants combining into ligatures), vowel sign reordering (visual position differs from logical order), above- and below-base marks, and contextual letter forms. The shaping engine must apply specific OpenType layout features in the correct order: HarfBuzz in Chrome and Firefox on every platform (and across Linux and Android), Core Text in Safari on macOS and iOS, and DirectWrite for native Windows text.
Required OpenType Features
Indic fonts require these OpenType GSUB/GPOS features for correct rendering:
| Feature | Tag | Purpose |
|---|---|---|
| Half forms | half | Half-letter forms in conjuncts |
| Pre-base substitutions | pres | Glyphs that appear before base |
| Below-base substitutions | blws | Glyphs that appear below base |
| Above-base substitutions | abvs | Glyphs that appear above base |
| Post-base forms | pstf | Glyphs that appear after base |
| Akhand (Devanagari) | akhn | Indivisible Akhand ligatures |
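| Mark / Mark-to-Mark | mark / mkmk | Diacritic / mark positioning |
| Reph forms | rphf | Ra + virama at the start of a syllable, drawn as a mark above it |
| Below-base forms | blwf | Consonants that take a below-base form in conjuncts |
| Conjunct forms | cjct | Full conjunct ligatures |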
Test Strings for Conjuncts
Always test Indic fonts with conjuncts, not just individual characters. A font may render isolated letters perfectly while breaking on conjuncts. For Devanagari:
- Conjuncts: क्ष (ksha), त्र (tra), श्र (shra), ज्ञ (gya), द्व (dva)
- Vowel sign placement: कि (ki), कू (ku), कृ (kri), कै (kai)
- Combined: श्रद्धा (shraddha), विद्यालय (vidyalaya)
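Looking at the code points behind these strings shows what the shaper has to do. In this sketch (describe is a hypothetical helper), a virama (U+094D) between two characters marks a conjunct candidate the font must form, and the vowel sign I (U+093F) is stored after its consonant but drawn before it. The conjunct check is a rough heuristic: it ignores joiner characters such as ZWNJ that suppress conjuncts.

```javascript
const VIRAMA = 0x094d;       // DEVANAGARI SIGN VIRAMA (halant)
const VOWEL_SIGN_I = 0x093f; // stored after the consonant, drawn before it

function describe(str) {
  const cps = [...str].map((ch) => ch.codePointAt(0));
  return {
    codePoints: cps.map((cp) => "U+" + cp.toString(16).toUpperCase().padStart(4, "0")),
    // A virama with characters on both sides: a conjunct candidate.
    hasConjunct: cps.some((cp, i) => cp === VIRAMA && i > 0 && i < cps.length - 1),
    hasPreBaseVowel: cps.includes(VOWEL_SIGN_I),
  };
}

console.log(describe("क्ष")); // code points U+0915 U+094D U+0937; hasConjunct is true
console.log(describe("कि"));  // code points U+0915 U+093F; hasPreBaseVowel is true
```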
Compare rendering across Chrome, Firefox, and Safari: Chrome and Firefox each bundle their own HarfBuzz version and Safari shapes with Core Text, so a cluster that renders correctly in one browser may fail in another.
Recommended Indic Fonts
- Pan-Indic coverage: Google's Noto Sans family (Noto Sans Devanagari, Noto Sans Tamil, Noto Sans Bengali, Noto Sans Telugu, Noto Sans Gujarati, etc.)
- Devanagari specifically: Hind, Mukta, Poppins, Tiro Devanagari Hindi (excellent for body text)
- All major Indic scripts (open source): Lohit fonts cover 9+ scripts
- Commercial fonts: run any commercial Indic font through our font license checker before using it in production
Subsetting Indic Fonts
Subset carefully: removing characters can break conjuncts that depend on specific glyph combinations. Always include the full Unicode block for your script (e.g., U+0900-097F for Devanagari); as long as the layout tables are retained, the subsetter then keeps the conjunct and ligature glyphs those characters can form. Use font tools that preserve OpenType layout tables during subsetting: pyftsubset keeps the common Indic features by default, and --layout-features='*' keeps all of them. Test conjunct rendering after subsetting, since failed conjuncts are often invisible in casual review.
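As a concrete starting point, the sketch below assembles a pyftsubset command for a Devanagari subset. --unicodes, --layout-features, --flavor, and --output-file are standard pyftsubset options; the file names and the extra code points (the ZWNJ/ZWJ joiners that steer conjunct formation, the dotted circle that shapers draw for orphaned marks, and printable ASCII) are assumptions to adjust for your content. devanagariSubsetArgs is a hypothetical helper.

```javascript
// Build the argument list for fontTools' pyftsubset.
function devanagariSubsetArgs(inputFile, outputFile) {
  const unicodes = [
    "U+0900-097F", // Devanagari block
    "U+200C-200D", // ZWNJ and ZWJ, which control conjuncts and half forms
    "U+25CC",      // dotted circle, shown for marks without a base
    "U+0020-007E", // printable ASCII for spaces, digits, and punctuation
  ];
  return [
    inputFile,
    `--unicodes=${unicodes.join(",")}`,
    "--layout-features=*", // keep every OpenType layout feature
    "--flavor=woff2",      // requires the brotli Python package
    `--output-file=${outputFile}`,
  ];
}

const args = devanagariSubsetArgs("NotoSansDevanagari-Regular.ttf", "deva.woff2");
console.log(args);
```

Running it with require("node:child_process").execFileSync("pyftsubset", args, { stdio: "inherit" }) bypasses the shell, so the * in --layout-features=* needs no quoting. Afterwards, re-check the conjunct test strings above in each browser.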
Latin Extended Coverage
English uses Basic Latin (U+0000-007F). Most European languages need additional Unicode blocks for accented and modified letters. Vietnamese is the most demanding Latin-script language, with 134 precomposed accented letters spread across Latin-1 Supplement, Latin Extended-A and -B, and Latin Extended Additional. Budget fonts often skip Extended-A/B coverage, so always verify before deploying for non-English markets.
The Latin Unicode Blocks
| Block | Range | Languages Covered |
|---|---|---|
| Basic Latin | U+0000-007F | English (ASCII) |
| Latin-1 Supplement | U+0080-00FF | French, German, Spanish, Italian, Portuguese, Scandinavian |
| Latin Extended-A | U+0100-017F | Polish, Czech, Hungarian, Croatian, Turkish, Welsh (ŵ, ŷ), Romanian (partial) |
| Latin Extended-B | U+0180-024F | Romanian (ș, ț), Vietnamese (ơ, ư) |
| Latin Extended Additional | U+1E00-1EFF | Vietnamese (most tone-marked letters), Welsh (ẁ, ẃ, ẅ, ỳ), additional diacritics |
| Combining Diacritical Marks | U+0300-036F | Combining accent marks for decomposed text in any language |
Language-Specific Requirements
- Polish: ą, ę, ł, ń, ó, ś, ź, ż (Extended-A)
- Czech: á, č, ď, é, ě, í, ň, ó, ř, š, ť, ú, ů, ý, ž (Extended-A)
- Romanian: ă, â, î, ș, ț (Extended-A and Extended-B)
- Turkish: ç, ğ, ı, İ, ö, ş, ü (Extended-A; note dotless i)
- Hungarian: á, é, í, ó, ö, ő, ú, ü, ű (Extended-A)
- Vietnamese: ă, â, đ, ê, ô, ơ, ư plus 5 tone marks, for 134 precomposed letters in total (mostly in Latin Extended Additional)
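To see which of these blocks a language sample actually needs, map each character to its block. This sketch (blocksNeeded is a hypothetical helper) normalizes to NFC first; without that step, decomposed input such as e + combining circumflex + combining acute lands in Combining Diacritical Marks instead of the precomposed letter's block.

```javascript
const LATIN_BLOCKS = [
  ["Basic Latin", 0x0000, 0x007f],
  ["Latin-1 Supplement", 0x0080, 0x00ff],
  ["Latin Extended-A", 0x0100, 0x017f],
  ["Latin Extended-B", 0x0180, 0x024f],
  ["Combining Diacritical Marks", 0x0300, 0x036f],
  ["Latin Extended Additional", 0x1e00, 0x1eff],
];

// List the blocks used by `sample`, in order of first appearance.
// Characters outside these blocks are reported as "other".
function blocksNeeded(sample, normalize = true) {
  const text = normalize ? sample.normalize("NFC") : sample;
  const needed = new Set();
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    const block = LATIN_BLOCKS.find(([, first, last]) => cp >= first && cp <= last);
    needed.add(block ? block[0] : "other");
  }
  return [...needed];
}

console.log(blocksNeeded("Tiếng Việt"));
// [ 'Basic Latin', 'Latin Extended Additional' ]
console.log(blocksNeeded("Tiếng Việt".normalize("NFD"), false));
// [ 'Basic Latin', 'Combining Diacritical Marks' ]
```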
Checking Font Coverage
Verify a font covers your target languages before deploying:
- Use our font analyzer to inspect Unicode coverage
- Render a test string with all language-specific characters in the font
- Missing glyphs appear as tofu (□) or fall through to a different font (visible style mismatch)
- Vietnamese sites require explicit Vietnamese testing: some fonts claim "European" coverage but skip Latin Extended Additional
Recommended unicode-range
/* Full European language support (descriptor inside an @font-face rule) */
unicode-range:
U+0000-024F, /* Basic Latin + Latin-1 Supplement + Extended-A + Extended-B */
U+0300-036F, /* Combining Diacritical Marks */
U+1E00-1EFF, /* Latin Extended Additional (Vietnamese) */
U+2000-206F, /* General Punctuation */
U+20AC; /* Euro sign */
Emoji Font Support
Emojis look different on different devices because each operating system ships its own emoji font with its own designs: Apple Color Emoji on iOS/macOS, Noto Color Emoji on Android, Segoe UI Emoji on Windows, Samsung Color Emoji on Samsung devices. The Unicode standard defines each emoji's meaning, not its visual design. CSS cannot restyle a platform's built-in emoji font; for a consistent look you must either serve an emoji web font or sidestep platform fonts entirely with an image library such as Twemoji.
Color Font Formats
| Format | Type | Used By |
|---|---|---|
| COLR / CPAL | Vector | Windows, Chrome |
| COLRv1 | Vector + gradients | Chrome 98+, Firefox 107+ |
| CBDT / CBLC | Bitmap | Android, Google |
| sbix | Bitmap | Apple |
| SVG-in-OpenType | Vector | Firefox, Safari |
Cross-Platform Consistency: Twemoji
For consistent emoji appearance across all platforms, use an emoji image library. Twemoji (Twitter's open-source emoji set) is the most popular: it replaces native emoji characters with SVG or PNG images at runtime. OpenMoji is a similar open-source set, and emoji-mart provides ready-made emoji picker components that can render a chosen emoji set.
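Image-based libraries work roughly like the sketch below: split text into grapheme clusters, keep the clusters that contain a pictographic code point, and map each one to an image file named after its code points. The naming rule (lowercase hex joined by hyphens, dropping U+FE0F unless the sequence contains a zero-width joiner) follows Twemoji's file-naming convention as we understand it, so treat it as an assumption. findEmoji and emojiFileName are hypothetical helpers, and Intl.Segmenter requires a modern runtime (Node.js 16+ or a current browser).

```javascript
// Grapheme clusters containing a pictographic code point are treated as emoji.
// Simplification: flags and keycap sequences are not detected.
const PICTOGRAPHIC = /\p{Extended_Pictographic}/u;

// Twemoji-style file name: lowercase hex code points joined by "-",
// with U+FE0F (emoji presentation selector) dropped unless a ZWJ is present.
function emojiFileName(cluster) {
  const s = cluster.includes("\u200d") ? cluster : cluster.replace(/\ufe0f/g, "");
  return [...s].map((ch) => ch.codePointAt(0).toString(16)).join("-");
}

function findEmoji(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return [...segmenter.segment(text)]
    .map(({ segment }) => segment)
    .filter((g) => PICTOGRAPHIC.test(g))
    .map((g) => ({ emoji: g, file: `${emojiFileName(g)}.svg` }));
}

console.log(findEmoji("Deploy done 🎉 ⚠️")); // two matches: 1f389.svg and 26a0.svg
```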
/* Emoji-aware font stack */
body {
font-family:
"Inter",
system-ui,
-apple-system,
"Segoe UI",
Roboto,
"Apple Color Emoji",
"Segoe UI Emoji",
"Noto Color Emoji",
sans-serif;
}
// Or use Twemoji for cross-platform consistency (JavaScript)
import twemoji from 'twemoji';
twemoji.parse(document.body); // Replaces emoji with <img> elements (PNG by default)
Accessibility
Screen readers typically announce emoji by their Unicode name (e.g., "smiling face with open mouth"). When emojis convey meaning beyond decoration, wrap them in a span with role="img" and aria-label for clearer context. For purely decorative emojis, use aria-hidden="true" to skip them.
<!-- Meaningful emoji with aria-label -->
<span role="img" aria-label="warning">⚠️</span> System maintenance scheduled
<!-- Decorative emoji hidden from screen readers -->
<span aria-hidden="true">✨</span> New feature!
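When content comes from a CMS or user input, the same pattern can be applied programmatically. In this sketch, annotateEmoji is a hypothetical helper and `labels` is an assumed, author-supplied map of meaningful emoji to spoken labels; every other emoji is treated as decorative and hidden. It relies on Intl.Segmenter, so it needs Node.js 16.9+ or a current browser.

```javascript
const PICTOGRAPHIC = /\p{Extended_Pictographic}/u;

// Escape text for safe insertion into HTML content and attribute values.
const escapeHtml = (s) =>
  s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;");

// Wrap each emoji: labeled ones get role="img" + aria-label, others aria-hidden.
function annotateEmoji(text, labels) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  let html = "";
  for (const { segment } of segmenter.segment(text)) {
    if (!PICTOGRAPHIC.test(segment)) {
      html += escapeHtml(segment); // ordinary text
    } else if (Object.hasOwn(labels, segment)) {
      html += `<span role="img" aria-label="${escapeHtml(labels[segment])}">${segment}</span>`;
    } else {
      html += `<span aria-hidden="true">${segment}</span>`;
    }
  }
  return html;
}

console.log(annotateEmoji("⚠️ System maintenance scheduled", { "⚠️": "warning" }));
console.log(annotateEmoji("✨ New feature!", {}));
```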
Font Fallback Chains
A font fallback chain is the ordered list of fonts in your font-family property. When the first font cannot render a character, the browser tries the next, and so on. A well-designed chain keeps text readable even if custom fonts fail to load, prevents invisible text or missing characters, and (with metric overrides) minimizes layout shift during font swap.
system-ui and Per-OS UI Fonts
The system-ui generic font family maps to each OS's default UI font: San Francisco on macOS/iOS, Segoe UI on Windows, Roboto on Android. These fonts are highly readable and cost nothing to load, since they are already installed.
/* Modern native UI stack, no custom font load */
font-family: system-ui, sans-serif;
/* Explicit per-OS stack (older browser support) */
font-family:
-apple-system, /* iOS / macOS Safari */
BlinkMacSystemFont, /* macOS Chrome */
"Segoe UI", /* Windows */
Roboto, /* Android */
Oxygen-Sans, /* KDE Linux */
Ubuntu, /* Ubuntu */
Cantarell, /* GNOME */
"Helvetica Neue", /* macOS legacy */
sans-serif;
CSS Metric Overrides: Reducing CLS
When a fallback font swaps to your web font, different metrics cause text reflow and layout shift (CLS). CSS metric override descriptors fix this by aligning the fallback's metrics to the web font:
| Descriptor | Effect |
|---|---|
| size-adjust | Scales the overall font size |
| ascent-override | Sets the ascent metric |
| descent-override | Sets the descent metric |
| line-gap-override | Sets the line-gap metric |
/* Fallback @font-face matched to Inter web font metrics */
@font-face {
font-family: "Fallback for Inter";
src: local("Arial");
size-adjust: 107%;
ascent-override: 90%;
descent-override: 22%;
line-gap-override: 0%;
}
body {
font-family: "Inter", "Fallback for Inter", sans-serif;
}
Next.js automates this with next/font, calculating override values from the actual web font at build time. For projects not using Next.js, tools like fontaine calculate them programmatically.
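The arithmetic behind those override values is small enough to sketch. This version uses a hypothetical fallbackOverrides helper and made-up metrics, and it assumes two things worth checking against the CSS Fonts specification and your tool of choice: size-adjust is the ratio of the two fonts' average character widths (tools differ in how they measure that average), and because size-adjust also scales the override descriptors, each vertical metric is divided by it.

```javascript
// Derive fallback @font-face overrides from font metrics. Real values would be
// read from the fonts' head/hhea/OS/2 tables.
function fallbackOverrides(web, fallback) {
  // Ratio of average character widths, each normalized to its font's em.
  const sizeAdjust =
    (web.avgCharWidth / web.unitsPerEm) / (fallback.avgCharWidth / fallback.unitsPerEm);
  // size-adjust also scales the override descriptors, so divide it back out.
  const scaled = (metric) => Math.abs(metric) / (web.unitsPerEm * sizeAdjust);
  const pct = (n) => `${(n * 100).toFixed(2)}%`;
  return {
    "size-adjust": pct(sizeAdjust),
    "ascent-override": pct(scaled(web.ascender)),
    "descent-override": pct(scaled(web.descender)),
    "line-gap-override": pct(scaled(web.lineGap)),
  };
}

// Made-up metrics for illustration only.
const exampleWeb = { unitsPerEm: 1000, ascender: 900, descender: -200, lineGap: 0, avgCharWidth: 550 };
const exampleFallback = { unitsPerEm: 2048, avgCharWidth: 1024 };
console.log(fallbackOverrides(exampleWeb, exampleFallback));
// size-adjust 110.00%, ascent-override 81.82%, descent-override 18.18%, line-gap-override 0.00%
```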
Multilingual Fallback Chain
For a site serving multiple languages, build a stack that covers each script with a suitable system font, ending with a generic fallback:
font-family:
"Inter", /* Latin */
"Noto Sans JP", /* Japanese */
"Hiragino Sans", /* Japanese fallback */
"Microsoft YaHei", /* Chinese fallback (Windows) */
"Noto Sans Arabic", /* Arabic */
"Noto Sans Devanagari", /* Hindi/Marathi */
system-ui, /* OS default */
sans-serif; /* Final generic */
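The browser walks the chain per character: for a paragraph with mixed scripts, it selects the first font in the chain that can render each character. One consequence is that Han characters shared by Chinese and Japanese render with whichever CJK font comes first (here, Noto Sans JP's Japanese glyph forms), so scope CJK fonts per language with :lang() as shown earlier.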
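Per-character fallback can be modeled in a few lines. This sketch trims the chain to four fonts and gives each one simplified, assumed coverage ranges rather than its real character map; fontFor and runs are hypothetical helpers. It shows Chinese text resolving to the Japanese font because that font appears first.

```javascript
// Simplified coverage for each font in the chain (assumed, not real cmaps).
const chain = [
  { family: "Inter", ranges: [[0x0000, 0x024f]] },
  { family: "Noto Sans JP", ranges: [[0x3000, 0x30ff], [0x4e00, 0x9fff]] },
  { family: "Microsoft YaHei", ranges: [[0x3000, 0x303f], [0x4e00, 0x9fff]] },
  { family: "Noto Sans Arabic", ranges: [[0x0600, 0x06ff]] },
];

// First font in the chain that covers the character, else the generic fallback.
function fontFor(ch) {
  const cp = ch.codePointAt(0);
  const font = chain.find((f) => f.ranges.some(([lo, hi]) => cp >= lo && cp <= hi));
  return font ? font.family : "system-ui / sans-serif";
}

// Group consecutive characters that resolve to the same font into runs.
function runs(text) {
  const result = [];
  for (const ch of text) {
    const family = fontFor(ch);
    const last = result[result.length - 1];
    if (last && last.family === family) last.text += ch;
    else result.push({ family, text: ch });
  }
  return result;
}

console.log(runs("Hi 中文"));
// "Hi " resolves to Inter; "中文" resolves to Noto Sans JP, never reaching Microsoft YaHei
```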
