
Multilingual Font Setup Guide

A complete technical guide to serving the right fonts for every language on your website. Learn how unicode-range splits font loading by script, how :lang() applies per-language stacks, and how to keep multilingual font payloads small without sacrificing typographic quality.

TL;DR - Key Takeaways

  • unicode-range loads only the font subsets containing characters actually present on the page
  • :lang() selectors apply the correct font stack per language without JavaScript
  • A pan-Unicode font (Noto Sans) simplifies setup but trades per-script typographic quality for breadth
  • CJK fonts are far larger than Latin-only fonts (megabytes rather than tens of kilobytes), so always subset them or use Google Fonts slicing


Building a website that serves readers in multiple languages is straightforward once you understand two CSS primitives that most developers overlook: the unicode-range @font-face descriptor and the :lang() pseudo-class selector. Together, they form the backbone of every well-engineered multilingual font system, from small bilingual blogs to large-scale international publishing platforms.

The fundamental challenge of multilingual typography is that different writing systems have vastly different requirements. A Latin font covering English, French, and German needs roughly 200-400 glyphs and weighs 15-40KB as WOFF2. A Japanese font covering all common kanji needs 6,000-10,000 glyphs and can reach 2-20MB uncompressed. Serving both from the same @font-face declaration without splitting would force every visitor—including English-only readers—to download the entire Japanese font. That is not acceptable performance.

The solution is selective loading: declare multiple @font-face rules each covering a specific Unicode range, and let the browser download only the subsets whose characters appear in the rendered page. Pair this with :lang() selectors to switch font families entirely for different language contexts, and you have a system that is both bandwidth-efficient and typographically excellent in every supported language.

This guide walks through the complete technical implementation: how unicode-range works at the browser level, how to write effective :lang() stacks for the major writing systems, performance strategies for sites that mix Latin, CJK, Arabic, and Indic scripts, and a practical checklist for shipping multilingual font support correctly the first time.

Understanding Multilingual Font Requirements

Different writing systems impose fundamentally different demands on web font infrastructure. Understanding these differences determines which loading strategy is appropriate for your site.

Latin and Latin Extended Scripts

Latin-based languages—English, French, German, Spanish, Portuguese, Polish, Czech, Vietnamese, and many others—share a core alphabet augmented with diacritical marks. Basic Latin covers ASCII (U+0020-U+007E). Latin-1 Supplement (U+0080-U+00FF) adds accented characters for Western European languages. Latin Extended-A and -B (U+0100-U+024F) add characters for Central/Eastern European languages. Vietnamese requires the Latin Extended Additional block (U+1E00-U+1EFF) for its stacked tone marks.

A single well-subsetted WOFF2 covering all Latin ranges typically weighs 25-60KB—small enough that splitting by language within Latin is rarely worth the complexity. Use a single Latin font file covering U+0000-U+024F and U+1E00-U+1EFF for broad European coverage.

CJK Scripts: Chinese, Japanese, Korean

The CJK Unified Ideographs block (U+4E00-U+9FFF) holds roughly 21,000 characters shared between Chinese, Japanese, and Korean, though each language uses a different subset with different preferred glyph forms. Simplified and Traditional Chinese use different glyph shapes for many shared ideographs, and rarer characters sit in extension blocks such as CJK Extension A (U+3400-U+4DBF); Japanese adds hiragana (U+3040-U+309F), katakana (U+30A0-U+30FF), and its own kanji selection; Korean uses Hangul syllables (U+AC00-U+D7A3, 11,172 characters).

This glyph count makes CJK fonts categorically different from Latin fonts. A full Japanese font runs to several megabytes even as WOFF2, which is not viable to serve monolithically. The solution is either Google Fonts automatic slicing (which splits CJK fonts into 100+ subsets of roughly 100-200 characters each) or manual subsetting to the specific characters your content uses.

Arabic, Hebrew, and RTL Scripts

Arabic (U+0600-U+06FF) and Hebrew (U+0590-U+05FF) are right-to-left scripts with additional complexity: Arabic letters take different forms depending on their position in a word (initial, medial, final, isolated). This requires OpenType features—calt (contextual alternates), init, medi, fina, isol—to be present in the font and rendered by the browser's text shaping engine.

A well-subsetted Arabic WOFF2 covering the Arabic block plus Arabic Presentation Forms typically weighs 50-120KB. RTL scripts also need correct text direction in addition to font setup: set dir="rtl" in the HTML, which is generally preferred over relying on CSS direction: rtl alone.
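
As a rough illustration of the markup side, a template helper can derive the dir value from the language tag. This is only a sketch: the RTL_LANGUAGES set and the dirFor name are assumptions made for this example, and the set is deliberately incomplete.

```javascript
// Hypothetical, incomplete list of right-to-left primary language subtags.
const RTL_LANGUAGES = new Set(['ar', 'fa', 'he', 'ur']);

// Returns the value to put in the HTML dir attribute for a language tag.
function dirFor(lang) {
  const primary = lang.toLowerCase().split('-')[0];
  return RTL_LANGUAGES.has(primary) ? 'rtl' : 'ltr';
}

console.log(dirFor('ar-EG')); // rtl
console.log(dirFor('ja'));    // ltr
```

Extend the set to match the languages your site actually serves.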

Indic Scripts: Devanagari, Tamil, Bengali, and Others

Indic scripts (Devanagari U+0900-U+097F, Tamil U+0B80-U+0BFF, Bengali U+0980-U+09FF, and others) require complex text shaping involving consonant conjuncts, vowel sign reordering, and mark positioning. The OpenType layout features half, pres, blws, abvs, pstf, and akhn must all be present and correctly implemented. Without them, Devanagari text will render as disconnected letters rather than properly formed words. Indic WOFF2 subsets are generally 40-100KB due to the moderate glyph count combined with complex glyph tables.

When a Single Pan-Unicode Font Makes Sense

Google's Noto Sans family covers virtually every writing system with a single design language. When your site has low traffic in secondary languages or your content management system generates text in many scripts unpredictably, using Noto Sans with Google Fonts automatic subsetting is a pragmatic choice. The trade-off is typographic quality: Noto's Japanese glyphs are functional but not as refined as Noto Sans JP or Hiragino Sans; its Arabic is readable but not as elegant as IBM Plex Arabic. Use language-optimized fonts for your primary content languages and Noto as a fallback for everything else.

Using unicode-range for Efficient Loading

The unicode-range CSS descriptor tells the browser which Unicode code points a @font-face covers. The browser inspects every character in the rendered page and downloads only the font files whose unicode-range intersects with characters actually present. A page containing only English text will not download a CJK font file, even if the CJK @font-face is declared in the stylesheet.

This is distinct from the CSS font-family fallback mechanism. With unicode-range, multiple @font-face rules can share the same family name but cover different character ranges. The browser assembles the complete font virtually from these separate files, selecting the appropriate file per character.

Latin and CJK Split Example

This pattern splits a font family across three files: Latin (small, always needed for UI), Japanese CJK (large, only downloaded if kanji or kana appear), and Korean Hangul (medium, only if Hangul appears). All three share the family name SiteFont.

/* Latin: covers Basic Latin, Latin-1 Supplement,
   Latin Extended-A/B, Latin Extended Additional */
@font-face {
  font-family: 'SiteFont';
  src: url('/fonts/sitefont-latin.woff2') format('woff2');
  font-weight: 400;
  font-style: normal;
  font-display: swap;
  unicode-range: U+0000-00FF, U+0100-024F, U+1E00-1EFF,
                 U+2000-206F, U+2074, U+20AC, U+2122,
                 U+2191, U+2193, U+2212, U+2215, U+FEFF;
}

/* Japanese: hiragana, katakana, and common kanji */
@font-face {
  font-family: 'SiteFont';
  src: url('/fonts/sitefont-japanese.woff2') format('woff2');
  font-weight: 400;
  font-style: normal;
  font-display: swap;
  unicode-range: U+3040-309F,  /* Hiragana */
                 U+30A0-30FF,  /* Katakana */
                 U+4E00-9FFF,  /* CJK Unified Ideographs */
                 U+3400-4DBF,  /* CJK Extension A */
                 U+FF00-FFEF;  /* Halfwidth/Fullwidth Forms */
}

/* Korean: Hangul syllables and Jamo */
@font-face {
  font-family: 'SiteFont';
  src: url('/fonts/sitefont-korean.woff2') format('woff2');
  font-weight: 400;
  font-style: normal;
  font-display: swap;
  unicode-range: U+1100-11FF,  /* Hangul Jamo */
                 U+AC00-D7AF;  /* Hangul Syllables */
}

body {
  font-family: 'SiteFont', system-ui, sans-serif;
}
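
To make the selection logic concrete, here is a minimal JavaScript sketch of the matching step, using abbreviated versions of the three SiteFont ranges above. It is not how any browser is implemented: parseUnicodeRange and filesNeeded are names invented for this example, wildcard ranges such as U+4?? are not handled, and the sketch assumes the declared ranges do not overlap.

```javascript
// Parse a unicode-range value such as "U+0000-00FF, U+2074"
// into [start, end] pairs of numeric code points.
function parseUnicodeRange(descriptor) {
  return descriptor.split(',').map((part) => {
    const [start, end = start] = part.trim().replace(/^U\+/i, '').split('-');
    return [parseInt(start, 16), parseInt(end, 16)];
  });
}

// Abbreviated copies of the ranges declared in the CSS above.
const faces = [
  { file: 'sitefont-latin.woff2', range: 'U+0000-00FF, U+0100-024F, U+1E00-1EFF' },
  { file: 'sitefont-japanese.woff2', range: 'U+3040-309F, U+30A0-30FF, U+4E00-9FFF' },
  { file: 'sitefont-korean.woff2', range: 'U+1100-11FF, U+AC00-D7AF' },
];

// Return the files whose declared range contains at least one
// character of the text, in declaration order of first match.
function filesNeeded(text, fontFaces) {
  const parsed = fontFaces.map((face) => ({
    file: face.file,
    ranges: parseUnicodeRange(face.range),
  }));
  const needed = new Set();
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    for (const face of parsed) {
      if (face.ranges.some(([lo, hi]) => codePoint >= lo && codePoint <= hi)) {
        needed.add(face.file);
      }
    }
  }
  return [...needed];
}

console.log(filesNeeded('Hello', faces));
console.log(filesNeeded('Hello 日本語', faces));
```

The first call lists only the Latin file; the second adds the Japanese file because the text contains kanji.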

Arabic unicode-range Split

Arabic requires its own font file with the correct OpenType shaping tables. The unicode-range must include not just the Arabic block but also Arabic Supplement (U+0750-U+077F) for additional letters used in Persian, Urdu, and other Arabic-script languages, plus the Arabic Presentation Forms blocks that contain contextual glyph variants.

/* Arabic font with full shaping support */
@font-face {
  font-family: 'SiteFont';
  src: url('/fonts/sitefont-arabic.woff2') format('woff2');
  font-weight: 400;
  font-style: normal;
  font-display: swap;
  unicode-range: U+0600-06FF,  /* Arabic */
                 U+0750-077F,  /* Arabic Supplement */
                 U+08A0-08FF,  /* Arabic Extended-A */
                 U+FB50-FDFF,  /* Arabic Presentation Forms-A */
                 U+FE70-FEFF;  /* Arabic Presentation Forms-B */
}

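/* Persian/Farsi-specific additions.
   Uses its own family name: these ranges overlap the Arabic rule above,
   and with a shared name the later rule would take priority for every
   overlapping code point. Apply it with :lang(fa). */
@font-face {
  font-family: 'SiteFont Persian';
  src: url('/fonts/sitefont-persian.woff2') format('woff2');
  font-weight: 400;
  font-style: normal;
  font-display: swap;
  unicode-range: U+0600-06FF, U+0750-077F,
                 U+200C-200D; /* ZWNJ and ZWJ for Persian */
}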

Devanagari (Hindi, Sanskrit, Marathi)

/* Devanagari for Hindi, Sanskrit, Marathi */
@font-face {
  font-family: 'SiteFont';
  src: url('/fonts/sitefont-devanagari.woff2') format('woff2');
  font-weight: 400;
  font-style: normal;
  font-display: swap;
  unicode-range: U+0900-097F,  /* Devanagari */
                 U+A8E0-A8FF,  /* Devanagari Extended */
                 U+1CD0-1CFF,  /* Vedic Extensions */
                 U+200C-200D,  /* ZWNJ, ZWJ (essential for conjuncts) */
                 U+20B9,       /* Indian Rupee sign ₹ */
                 U+25CC;       /* Dotted circle (placeholder) */
}

The ZWNJ (U+200C, Zero Width Non-Joiner) and ZWJ (U+200D, Zero Width Joiner) are invisible control characters critical to correct Indic rendering. After a virama, ZWNJ prevents the conjunct and shows the first consonant with a visible virama; ZWJ also prevents the full conjunct but requests the half-form of the first consonant. Always include them in Devanagari and other Indic unicode-range declarations.
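
Because these characters are invisible, it can help to look at the code points directly. The sketch below builds three variants of the same Devanagari sequence (KA, virama, SSA) and prints their code points; codePoints is a helper name made up for this example.

```javascript
const KA = '\u0915';      // DEVANAGARI LETTER KA
const SSA = '\u0937';     // DEVANAGARI LETTER SSA
const VIRAMA = '\u094D';  // DEVANAGARI SIGN VIRAMA
const ZWNJ = '\u200C';    // ZERO WIDTH NON-JOINER
const ZWJ = '\u200D';     // ZERO WIDTH JOINER

const conjunct = KA + VIRAMA + SSA;               // default conjunct
const withJoiner = KA + VIRAMA + ZWJ + SSA;       // ZWJ after the virama
const withNonJoiner = KA + VIRAMA + ZWNJ + SSA;   // ZWNJ after the virama

// List a string's code points in U+XXXX notation.
function codePoints(text) {
  return Array.from(text, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

console.log(codePoints(conjunct).join(' '));
console.log(codePoints(withJoiner).join(' '));
console.log(codePoints(withNonJoiner).join(' '));
```

If a subset drops U+200C or U+200D, the second and third strings may fall back to another font for those code points, which is one way shaping can break.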

Language-Specific Font Stacks with :lang()

While unicode-range controls which font files load, the :lang() CSS pseudo-class controls which font families apply to specific language contexts. These two mechanisms solve different problems and work best together.

The :lang() selector matches elements whose language is set via the lang HTML attribute—either on the element itself or inherited from an ancestor. It supports language subtags: :lang(zh) matches both lang="zh-Hans" and lang="zh-Hant", while :lang(zh-Hans) matches only Simplified Chinese. Use the most specific subtag your content requires.
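
The prefix behaviour described above can be approximated in a few lines of JavaScript. This is a simplified sketch, not the full matching algorithm browsers use (it ignores wildcard ranges, for example), and langMatches is a name chosen for this illustration.

```javascript
// Approximate :lang() matching: the element's language tag must equal the
// selector's argument or start with it followed by a hyphen.
// The comparison is case-insensitive.
function langMatches(elementLang, selectorRange) {
  const tag = elementLang.toLowerCase();
  const range = selectorRange.toLowerCase();
  return tag === range || tag.startsWith(range + '-');
}

console.log(langMatches('zh-Hans', 'zh'));      // true
console.log(langMatches('zh-Hant', 'zh-Hans')); // false
console.log(langMatches('zh-CN', 'zh-Hans'));   // false
```

The last call shows why a stylesheet that targets Simplified Chinese often lists both :lang(zh-Hans) and :lang(zh-CN): neither tag is a prefix of the other.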

Complete :lang() Font Stack Example

This example shows a real-world :lang() implementation covering Japanese, Simplified Chinese, Traditional Chinese, Korean, Arabic, Persian, Hindi, and Thai. Each stack lists a web font first, then high-quality system fonts that ship with relevant operating systems, then a generic fallback.

/* Japanese */
:lang(ja) {
  font-family: 'Noto Sans JP', 'Hiragino Kaku Gothic ProN',
               'Yu Gothic', 'Meiryo', sans-serif;
  word-break: normal;     /* already wraps between CJK characters; break-all would also split Latin words */
  line-height: 1.8;       /* More space needed for CJK */
}

/* Simplified Chinese */
:lang(zh-Hans),
:lang(zh-CN) {
  font-family: 'Noto Sans SC', 'PingFang SC',
               'Microsoft YaHei', 'Source Han Sans CN',
               sans-serif;
  word-break: normal;
  line-height: 1.8;
}

/* Traditional Chinese */
:lang(zh-Hant),
:lang(zh-TW),
:lang(zh-HK) {
  font-family: 'Noto Sans TC', 'PingFang TC',
               'Microsoft JhengHei', 'Source Han Sans TW',
               sans-serif;
  word-break: normal;
  line-height: 1.8;
}

/* Korean */
:lang(ko) {
  font-family: 'Noto Sans KR', 'Apple SD Gothic Neo',
               'Malgun Gothic', 'Nanum Gothic', sans-serif;
  word-break: keep-all;   /* Korean: break at spaces, not inside words */
  line-height: 1.7;
}

/* Arabic and Arabic-script languages */
:lang(ar) {
  font-family: 'Noto Naskh Arabic', 'IBM Plex Arabic',
               'Cairo', 'Amiri', 'Arial', sans-serif;
  direction: rtl;
  text-align: right;
  line-height: 1.9;  /* Arabic script needs more vertical space */
}

/* Persian/Farsi */
:lang(fa) {
  font-family: 'Vazirmatn', 'Noto Naskh Arabic',
               'Tahoma', sans-serif;
  direction: rtl;
  text-align: right;
  line-height: 1.9;
}

/* Hindi (Devanagari) */
:lang(hi) {
  font-family: 'Noto Sans Devanagari', 'Hind',
               'Mangal', 'Kokila', sans-serif;
  line-height: 1.8;  /* Devanagari marks extend above/below */
}

/* Thai */
:lang(th) {
  font-family: 'Noto Sans Thai', 'Sarabun',
               'Tahoma', 'Leelawadee UI', sans-serif;
  line-height: 1.9;  /* Thai stacks tone marks above letters */
}

Applying lang Attributes in HTML

The :lang() selector only works if the corresponding HTML lang attribute is set. For multilingual content, set the base language on the <html> element and override it on specific sections or inline elements that use a different language.

<!-- Base language set on html element -->
<html lang="en">

<!-- Override for a Japanese paragraph -->
<p lang="ja">日本語のテキストがここに表示されます。</p>

<!-- Mixed content: English page with Arabic quote -->
<blockquote lang="ar" dir="rtl">
  الخط العربي جميل وله تاريخ عريق
</blockquote>

<!-- CMS-generated content with dynamic lang -->
<article lang="{{ page.language }}">
  {{ page.content }}
</article>

Adjusting Typography Per Script

Beyond font-family, :lang() lets you tune spacing and layout properties that differ by script. Letter-spacing that improves Latin readability often looks wrong for Arabic (which uses contextual joining). Line-breaking rules differ between Chinese and Japanese (lines can break between most characters) and Korean (which uses spaces, so lines can break at word boundaries). Line-height needs to increase for scripts with tall stacks like Thai or Tibetan.

/* Latin: letter-spacing is fine */
:lang(en),
:lang(fr),
:lang(de) {
  letter-spacing: 0.01em;
}

/* CJK: never add letter-spacing to CJK text */
:lang(ja),
:lang(zh),
:lang(ko) {
  letter-spacing: 0;
}

/* Arabic: letter-spacing disrupts joining glyphs */
:lang(ar),
:lang(fa),
:lang(ur) {
  letter-spacing: 0;
  font-feature-settings: 'calt' 1, 'clig' 1;
}

/* Devanagari: keep kerning on; required shaping features are applied by default */
:lang(hi),
:lang(mr),
:lang(sa) {
  font-feature-settings: 'kern' 1;
}

Performance Optimization for Multilingual Sites

Multilingual font loading is one of the most significant performance challenges for international websites. A page serving Japanese, Arabic, and Latin content without optimization could easily require 10-30MB of font data. With the right techniques, that drops to under 500KB of actually-needed font data per page.

Strategy 1: Preload the Primary Script Font

Preload only the font file for the primary language of each page. For a Japanese page, preload the Japanese CJK subset. For an Arabic page, preload the Arabic font. Do not preload all language fonts on every page—that defeats the purpose of unicode-range splitting.

<!-- In <head>, preload for a Japanese page -->
<link rel="preload"
      href="/fonts/noto-sans-jp-400.woff2"
      as="font"
      type="font/woff2"
      crossorigin="anonymous">

<!-- Latin preload for English/European pages -->
<link rel="preload"
      href="/fonts/inter-latin-400.woff2"
      as="font"
      type="font/woff2"
      crossorigin="anonymous">
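
On a server-rendered site, the choice between these two tags can be made per page. The following sketch assumes a hypothetical lookup table keyed by primary language subtag; the file paths are the ones from the markup above, while PRIMARY_FONTS and preloadTag are names invented for this example.

```javascript
// Hypothetical mapping from primary language subtag to the one font file
// worth preloading for pages in that language.
const PRIMARY_FONTS = new Map([
  ['ja', '/fonts/noto-sans-jp-400.woff2'],
  ['en', '/fonts/inter-latin-400.woff2'],
]);

// Build a single preload tag for the page language, falling back to the
// Latin file when the language has no dedicated entry.
function preloadTag(pageLang, fallbackLang = 'en') {
  const primary = pageLang.toLowerCase().split('-')[0];
  const href = PRIMARY_FONTS.get(primary) ?? PRIMARY_FONTS.get(fallbackLang);
  return `<link rel="preload" href="${href}" as="font" type="font/woff2" crossorigin="anonymous">`;
}

console.log(preloadTag('ja-JP'));
console.log(preloadTag('fr'));
```

Emitting exactly one tag per page keeps the preload budget aligned with the primary script and leaves every other subset to unicode-range.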

Strategy 2: Use Google Fonts for CJK Auto-Slicing

Google Fonts automatically splits CJK fonts into 100-300 micro-subsets of ~100-200 characters each, each with its own unicode-range. A page with 400 unique kanji characters downloads only the slices that contain those characters, not the entire multi-megabyte font. This is the most bandwidth-efficient approach for CJK and is difficult to replicate with self-hosted fonts.

<!-- Google Fonts handles CJK subsetting automatically.
     display=swap sets font-display: swap on the generated rules. -->

<!-- In HTML <head> -->
<link rel="stylesheet"
      href="https://fonts.googleapis.com/css2?family=Noto+Sans+JP:wght@400;700&family=Noto+Sans+SC:wght@400;700&family=Inter:wght@400;700&display=swap">

/* Or via @import in CSS. The URL must stay on one line:
   a quoted CSS string cannot contain a raw line break. */
@import url('https://fonts.googleapis.com/css2?family=Noto+Sans+JP:wght@400;700&family=Noto+Sans+SC:wght@400;700&family=Inter:wght@400;700&display=swap');
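
Long css2 URLs are easy to mistype, so some teams generate them. Below is a small sketch of such a builder, assuming each family is requested by weight only; googleFontsUrl and its argument shape are inventions for this example rather than part of any Google Fonts tooling.

```javascript
// Build a Google Fonts css2 URL from a list of { name, weights } objects.
// Weights are sorted ascending before being joined with semicolons.
function googleFontsUrl(families, display = 'swap') {
  const params = families.map(({ name, weights }) => {
    const family = name.trim().replace(/\s+/g, '+');
    const sorted = [...weights].sort((a, b) => a - b);
    return `family=${family}:wght@${sorted.join(';')}`;
  });
  params.push(`display=${display}`);
  return `https://fonts.googleapis.com/css2?${params.join('&')}`;
}

console.log(googleFontsUrl([
  { name: 'Noto Sans JP', weights: [400, 700] },
  { name: 'Noto Sans SC', weights: [400, 700] },
  { name: 'Inter', weights: [700, 400] },
]));
```

The output is a single-line URL that can be pasted into either a link tag or an @import rule.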

Strategy 3: font-display: swap for Non-Blocking Render

Always use font-display: swap (or optional for non-critical scripts) on all @font-face declarations. This prevents invisible text (FOIT) by rendering immediately with a system font, then swapping to the web font once loaded. For above-the-fold content, pair with preload links to minimize the swap delay.

/* Primary font: swap (text shown immediately) */
@font-face {
  font-family: 'SiteFont';
  src: url('/fonts/sitefont-latin.woff2') format('woff2');
  font-display: swap;
  unicode-range: U+0000-00FF, U+0100-024F;
}

/* Secondary language font: optional
   (browser may skip if load time is too long) */
@font-face {
  font-family: 'SiteFont';
  src: url('/fonts/sitefont-arabic.woff2') format('woff2');
  font-display: optional;
  unicode-range: U+0600-06FF, U+FB50-FDFF, U+FE70-FEFF;
}

Strategy 4: Self-Hosted Font Subsetting with pyftsubset

For sites with known content, subset fonts to only the characters that actually appear in your content using the pyftsubset tool from the fonttools library. A CJK font subset to the 3,000 most common characters (covering 99%+ of typical web content) weighs 300-800KB instead of 5-20MB.

# Install fonttools (brotli is needed for WOFF2 output)
pip install fonttools brotli

# Subset NotoSansJP to hiragana, katakana, and the whole CJK Unified
# Ideographs block (this range is not limited to the most common kanji)
pyftsubset NotoSansJP-Regular.ttf \
  --unicodes="U+3040-309F,U+30A0-30FF,U+4E00-9FFF" \
  --flavor=woff2 \
  --output-file=NotoSansJP-subset.woff2

# Subset to the characters listed in a text file, for example a
# frequency list of common kanji or the characters in your content
pyftsubset NotoSansJP-Regular.ttf \
  --text-file=content-characters.txt \
  --flavor=woff2 \
  --output-file=NotoSansJP-content.woff2
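
The text file passed to --text-file has to come from somewhere. One possible approach, sketched here in JavaScript, is to collect the distinct characters from your content at build time; uniqueCharacters is a name made up for this example, and how you gather the content strings depends entirely on your CMS or build setup.

```javascript
// Collect the distinct non-whitespace characters in a piece of text,
// sorted by code point, as one string suitable for a subsetting text file.
function uniqueCharacters(text) {
  const seen = new Set();
  for (const ch of text) {
    if (!/\s/u.test(ch)) seen.add(ch);
  }
  return [...seen]
    .sort((a, b) => a.codePointAt(0) - b.codePointAt(0))
    .join('');
}

console.log(uniqueCharacters('日本 日本語'));
```

In a Node build script the result could then be written to content-characters.txt with fs.writeFileSync before pyftsubset runs. Whitespace is skipped here on the assumption that the Latin subset already provides the space glyph.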

Strategy 5: Lazy-Load Non-Primary Scripts

For sites where non-primary language content is below the fold or in collapsible sections, defer font loading until the content is visible. Use an Intersection Observer to add the @font-face rules only when the multilingual section scrolls into view. This prevents any font loading cost for users who never scroll to that section.

// Lazy-load Japanese font when section is visible
const observer = new IntersectionObserver((entries) => {
  entries.forEach(entry => {
    if (entry.isIntersecting) {
      const style = document.createElement('style');
      style.textContent = `
        @font-face {
          font-family: 'SiteFont';
          src: url('/fonts/sitefont-japanese.woff2')
               format('woff2');
          font-display: swap;
          unicode-range: U+3040-30FF, U+4E00-9FFF;
        }
      `;
      document.head.appendChild(style);
      observer.disconnect();
    }
  });
});

const japaneseSection = document.querySelector('[lang="ja"]');
if (japaneseSection) observer.observe(japaneseSection);

Common Font Combinations for Major Languages

This table summarizes recommended web fonts, system font fallbacks, and unicode-range values for the most commonly supported languages. All listed web fonts are available via Google Fonts unless noted otherwise.

Language / Script | Primary Web Font | System Fallback | Key unicode-range
English (Latin) | Inter, Roboto, Source Sans 3 | system-ui, -apple-system | U+0000-00FF
French, German, Spanish | Inter, Lato, Open Sans | system-ui, Arial | U+0000-024F
Polish, Czech, Romanian | Inter, Nunito, Raleway | system-ui, Calibri | U+0100-017F
Vietnamese | Be Vietnam Pro, Nunito | system-ui, Arial | U+1E00-1EFF
Russian / Cyrillic | Inter, Roboto, PT Sans | Arial, Tahoma | U+0400-04FF
Greek | Inter, Roboto, Source Sans 3 | Arial, Helvetica | U+0370-03FF
Japanese | Noto Sans JP, M PLUS Rounded 1c | Hiragino Kaku Gothic, Yu Gothic | U+3040-30FF, U+4E00-9FFF
Simplified Chinese | Noto Sans SC, ZCOOL XiaoWei | PingFang SC, Microsoft YaHei | U+4E00-9FFF, U+3400-4DBF
Traditional Chinese | Noto Sans TC, Noto Serif TC | PingFang TC, Microsoft JhengHei | U+4E00-9FFF, U+F900-FAFF
Korean | Noto Sans KR, Nanum Gothic | Apple SD Gothic Neo, Malgun Gothic | U+AC00-D7AF, U+1100-11FF
Arabic | Noto Naskh Arabic, IBM Plex Arabic, Cairo | Tahoma, Arial | U+0600-06FF, U+FB50-FDFF
Hebrew | Heebo, Rubik, Frank Ruhl Libre | Arial, Tahoma | U+0590-05FF, U+FB1D-FB4F
Hindi (Devanagari) | Noto Sans Devanagari, Hind, Mukta | Mangal, Kokila | U+0900-097F
Thai | Noto Sans Thai, Sarabun, Kanit | Leelawadee UI, Tahoma | U+0E00-0E7F

Implementation Checklist

Follow these steps in order when adding multilingual font support to a new or existing website. Each step builds on the previous one and can be verified independently before moving on.

1. Audit Your Language Requirements

Identify every language and script your site serves. Use analytics to find what languages your actual visitors use. Prioritize fonts for the top 3-5 languages by traffic; add others as secondary concerns. Create a list of Unicode ranges needed per language using the table above as a reference.
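
For the audit itself, a script can report which scripts occur in a sample of your content. The sketch below relies on Unicode property escapes in JavaScript regular expressions; the SCRIPT_TESTS table and the scriptsIn name are assumptions for this example, and the list covers only the scripts discussed in this guide.

```javascript
// One regular expression per script of interest. No "g" flag is used,
// so test() stays stateless between calls.
const SCRIPT_TESTS = {
  Latin: /\p{Script=Latin}/u,
  Han: /\p{Script=Han}/u,
  Hiragana: /\p{Script=Hiragana}/u,
  Katakana: /\p{Script=Katakana}/u,
  Hangul: /\p{Script=Hangul}/u,
  Arabic: /\p{Script=Arabic}/u,
  Hebrew: /\p{Script=Hebrew}/u,
  Devanagari: /\p{Script=Devanagari}/u,
  Thai: /\p{Script=Thai}/u,
};

// Return the names of all listed scripts that appear in the text.
function scriptsIn(text) {
  return Object.keys(SCRIPT_TESTS).filter((name) => SCRIPT_TESTS[name].test(text));
}

console.log(scriptsIn('Hello 日本語のテキスト'));
console.log(scriptsIn('مرحبا'));
```

Running this over exported page content gives a quick list of scripts to match against the table above.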

2. Set lang Attributes Correctly Across Your Site

Ensure the <html lang="..."> attribute reflects each page's primary language. For multilingual pages or embedded foreign-language content, add lang attributes to the specific elements. Without lang attributes, :lang() selectors will not function and screen readers will mispronounce content.
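
When lang values come from a CMS, it can be worth checking them before they reach the markup. A minimal sketch using the standard Intl.getCanonicalLocales function follows; canonicalLang is a wrapper name invented here, and the check only confirms that a tag is structurally valid, not that it describes the content correctly.

```javascript
// Return the canonical form of a language tag, or null when the tag is
// not structurally valid (for example, it uses an underscore).
function canonicalLang(tag) {
  try {
    const [canonical] = Intl.getCanonicalLocales(tag);
    return canonical ?? null;
  } catch {
    return null;
  }
}

console.log(canonicalLang('zh-hans')); // zh-Hans
console.log(canonicalLang('en_US'));   // null
```

A null result is a signal to fix the source data rather than emit an attribute that :lang() rules will never match.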

3. Choose Fonts and Prepare Subsets

For Latin and Cyrillic, select a high-quality variable or static font and subset it to U+0000-024F plus any extended ranges needed. For CJK, use Google Fonts (Noto Sans JP/SC/TC/KR) for automatic slicing, or use pyftsubset to create a frequency-ordered subset if self-hosting. For Arabic and Indic, verify the font has the required OpenType shaping tables before deploying.

4. Write @font-face Declarations with unicode-range

Create @font-face rules for each subset, sharing the same font-family name across all rules for the same typeface family. Include font-display: swap on every declaration. Add crossorigin="anonymous" to every font preload link, including same-origin fonts, because browsers fetch fonts in anonymous CORS mode and a preload without the attribute is fetched twice. Test by temporarily removing individual font files to verify the browser correctly falls through to the next font in the stack.

5. Write :lang() CSS Rules

Add :lang() declarations to your global stylesheet for each supported language. Include font-family with web font first, system fonts second, and generic fallback last. Adjust line-height, word-break, direction, and letter-spacing as appropriate for each script. Group related language rules together and add comments explaining why each typographic adjustment is needed.

6. Add Preload Links for Critical Fonts

Add <link rel="preload"> for the primary language font on each page type. For a server-rendered multilingual application, generate preload links dynamically based on the page's primary language. Do not preload fonts for secondary languages or scripts that appear below the fold—let unicode-range handle on-demand loading.

7. Measure and Verify in Browser DevTools

Open Network DevTools and filter by font. Load an English-only page and verify only Latin font subsets download. Load a Japanese page and verify the Japanese font subset downloads while others do not. Check font loading timing and ensure no render-blocking font requests. Use Lighthouse to verify LCP and CLS scores are not degraded by font loading.
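
The same check can be scripted from the browser console with the Resource Timing API. The helper below is a sketch: summarizeFontRequests is a name made up for this example, it identifies fonts purely by file extension, and transferSize can read as 0 for cached responses or for cross-origin files served without a Timing-Allow-Origin header.

```javascript
// Reduce a list of resource timing entries to the font files among them,
// with the transferred size rounded to whole kilobytes.
function summarizeFontRequests(entries) {
  return entries
    .filter((entry) => /\.(woff2?|ttf|otf)(\?|#|$)/i.test(entry.name))
    .map((entry) => ({
      url: entry.name,
      kilobytes: Math.round(entry.transferSize / 1024),
    }));
}

// In the browser console, after the page has loaded:
//   console.table(summarizeFontRequests(performance.getEntriesByType('resource')));
```

On an English-only page the table should list only Latin subsets; any CJK or Arabic file showing up there points to an overly broad unicode-range or an unnecessary preload.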

8. Test Rendering with Native Speakers

Automated tests cannot catch typographic quality issues. Have native speakers review rendering for each language, checking specifically: conjunct formation in Indic scripts, Arabic letter joining and direction, CJK glyph shapes (Simplified vs Traditional Chinese), Korean syllable boundaries, and Thai tone mark stacking. Small errors in these areas signal an incorrect font or missing OpenType features.


Written & Verified by Sarah Mitchell, Product Designer, Font Specialist
