Multilingual & International Fonts

Web typography for the world's major writing systems: CJK optimization for Chinese/Japanese/Korean, RTL setup for Arabic and Hebrew, Indic script rendering, emoji support across platforms, font fallback chains, and Latin extended coverage, all in one practical reference.

TL;DR by script

  • Multilingual setup: unicode-range + :lang() selectors. Browser only fetches subsets actually needed.
  • CJK: 20,000+ glyphs. Use Google Fonts' frequency-band partitioning or split into 100+ subsets.
  • RTL (Arabic/Hebrew): Set dir="rtl", use logical CSS properties, keep Arabic contextual forms.
  • Indic: Complex shaping (conjuncts, vowel reordering). Need full OpenType GSUB/GPOS tables.
  • Latin Extended: European languages need Extended-A/B; Vietnamese needs Extended Additional.
  • Emoji: Native fonts differ per OS; use Twemoji for cross-platform consistency.
  • Fallback chains: Use size-adjust + ascent-override to eliminate CLS on font swap.

Multilingual Font Setup

Sites serving multiple languages need a delivery strategy that doesn't make every user download every script. The CSS unicode-range descriptor in @font-face tells the browser which Unicode characters a font file covers; the browser then fetches that file only if text styled with that family actually contains characters in the range. Combined with the :lang() CSS selector, this lets you ship per-language font stacks that adapt automatically.

unicode-range Pattern

/* English / Latin Basic, small file, fetched everywhere */
@font-face {
  font-family: "Inter";
  src: url("/fonts/inter-latin.woff2") format("woff2");
  unicode-range: U+0000-00FF;
  font-display: swap;
}

/* Cyrillic, only fetched when the page has Cyrillic characters */
@font-face {
  font-family: "Inter";
  src: url("/fonts/inter-cyrillic.woff2") format("woff2");
  unicode-range: U+0400-04FF;
  font-display: swap;
}

/* Japanese Kanji, only fetched on pages with kanji */
@font-face {
  font-family: "Noto Sans JP";
  src: url("/fonts/noto-jp-kanji.woff2") format("woff2");
  unicode-range: U+4E00-9FFF;
  font-display: swap;
}
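
To make the fetch behaviour concrete, here is a small Python sketch (not browser code) of the matching rule: a file is requested only when the text contains at least one code point inside that face's unicode-range. The FACES table mirrors the three @font-face rules above; faces_needed is a hypothetical helper, and it ignores font-family matching, which a real browser also applies.

```python
# Sketch: which @font-face files would a page's text trigger?
# FACES mirrors the three rules above; faces_needed is an illustrative name.

FACES = {
    "/fonts/inter-latin.woff2": [(0x0000, 0x00FF)],
    "/fonts/inter-cyrillic.woff2": [(0x0400, 0x04FF)],
    "/fonts/noto-jp-kanji.woff2": [(0x4E00, 0x9FFF)],
}


def faces_needed(text: str) -> list[str]:
    """Return the font files whose unicode-range contains a character of text."""
    code_points = {ord(ch) for ch in text}
    needed = []
    for url, ranges in FACES.items():
        if any(lo <= cp <= hi for cp in code_points for lo, hi in ranges):
            needed.append(url)
    return needed


print(faces_needed("Hello"))                 # Latin file only
print(faces_needed("\u041f\u0440\u0438\u0432\u0435\u0442, world"))  # Latin + Cyrillic
```

A page with no Cyrillic or kanji never requests the second and third files, which is the whole point of the partitioning.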

Per-Language Font Stacks with :lang()

The :lang() selector matches elements based on their lang attribute. Use it to apply language-specific font stacks, adjust letter-spacing for scripts that need it, or modify line-height for taller scripts like Thai or Tibetan.

/* Default font stack */
body {
  font-family: "Inter", system-ui, sans-serif;
}

/* Japanese needs different metrics */
:lang(ja) {
  font-family: "Noto Sans JP", "Hiragino Sans", system-ui, sans-serif;
  line-height: 1.7;        /* Taller for kanji clarity */
}

/* Arabic: slightly larger size; set direction with dir="rtl" in the HTML, not here */
:lang(ar) {
  font-family: "Noto Naskh Arabic", "Cairo", system-ui, sans-serif;
  font-size: 1.05em;
}

/* Thai needs more line-height for tone marks */
:lang(th) {
  font-family: "Noto Sans Thai", system-ui, sans-serif;
  line-height: 1.85;
}

Pan-Unicode vs Language-Specific

You have two architectural choices for multilingual sites:

  • Pan-Unicode fonts (Noto Sans family) cover all scripts in a unified visual style. They are easier to deploy but trade typographic quality for coverage, and the files are larger.
  • Language-specific fonts with unicode-range partitioning. Better typography for each script. Smaller per-language files. More setup work but typically the right answer for production sites.

For most production sites, use language-optimized primaries with pan-Unicode (Noto) as the fallback. Full CJK fonts run to several megabytes against a few hundred kilobytes for a Latin font, so splitting them into subsets via unicode-range is essential for web performance, not optional.

CJK: Chinese, Japanese & Korean Optimization

CJK fonts contain 20,000-80,000+ glyphs compared to ~200 for a basic Latin font. The basic CJK Unified Ideographs block alone encodes over 20,000 characters, even though only a few thousand are in everyday use; Japanese adds Hiragana and Katakana to its Kanji; Korean has 11,172 precomposed Hangul syllables plus Hanja. Full CJK fonts are 5-20 MB uncompressed, one to two orders of magnitude larger than a typical Latin web font. On the web, CJK typography is first a delivery-architecture problem and only then a typographic one.

Why CJK Fonts Are So Large

| Script | Glyphs (approx.) | Full Font Size | Recommended Subset |
| --- | --- | --- | --- |
| Latin | ~200 | 100-300 KB | Single subset |
| Simplified Chinese | 3,500-30,000+ | 3-30 MB | GB 2312 (6,763) or frequency-banded |
| Traditional Chinese | 5,000-13,000+ | 5-15 MB | Big5 (13,053) or frequency-banded |
| Japanese | 7,000-15,000 | 3-20 MB | Joyo Kanji + Kana (~2,500) |
| Korean | 11,172 + Hanja | 2-10 MB | KS X 1001 (2,350) |

Frequency-Band Partitioning

Google Fonts splits CJK fonts into 100+ small subsets (slices) of ~100-200 characters each and uses unicode-range so the browser loads only the slices a page needs. Because common characters are grouped together, a page with 500 unique Chinese characters typically loads a handful of small font files instead of the entire 5 MB font. This is among the most bandwidth-efficient approaches available.

For self-hosted CJK fonts, replicate this approach using cn-font-split or pyftsubset to generate multiple subset files. For per-script Unicode ranges and detailed subsetting strategy, see our font subsetting by language guide.
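
As a rough illustration of the idea, the Python sketch below ranks the ideographs in a sample of your own content by frequency, cuts the ranking into fixed-size slices, and prints a unicode-range value for each slice. It is a simplification under stated assumptions: the function names and the slice size of 150 are illustrative, and neither Google Fonts nor cn-font-split is claimed to work exactly this way.

```python
from collections import Counter


def to_unicode_range(code_points: list[int]) -> str:
    """Collapse sorted code points into a CSS unicode-range value."""
    parts = []
    start = prev = code_points[0]
    for cp in code_points[1:]:
        if cp != prev + 1:          # gap found: close the current run
            parts.append((start, prev))
            start = cp
        prev = cp
    parts.append((start, prev))
    return ", ".join(
        f"U+{a:04X}" if a == b else f"U+{a:04X}-{b:04X}" for a, b in parts
    )


def frequency_slices(corpus: str, slice_size: int = 150) -> list[str]:
    """Rank CJK ideographs by frequency and cut the ranking into slices."""
    counts = Counter(ch for ch in corpus if 0x4E00 <= ord(ch) <= 0x9FFF)
    ranked = [ord(ch) for ch, _ in counts.most_common()]
    slices = [ranked[i:i + slice_size] for i in range(0, len(ranked), slice_size)]
    return [to_unicode_range(sorted(s)) for s in slices]


# Tiny demo corpus; a real run would read your site's text.
for i, value in enumerate(frequency_slices("\u7684\u7684\u7684\u4e00\u4e00\u662f", slice_size=2)):
    print(f"slice {i}: unicode-range: {value};")
```

Each printed value can serve both as the character list for one subset file and as the unicode-range of the matching @font-face rule, so the most frequent slice is the natural candidate for preloading.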

Regional Variants Matter

While Chinese, Japanese, and Korean share many CJK Unified Ideographs, each region has unique characters and different preferred glyph shapes for shared characters. Japanese uses different stroke styles for some kanji. Korean primarily uses Hangul syllables. Always pick the correct regional variant:

  • Noto Sans SC: Simplified Chinese (Mainland China)
  • Noto Sans TC: Traditional Chinese (Taiwan); Noto Sans HK covers Hong Kong glyph forms
  • Noto Sans JP: Japanese
  • Noto Sans KR: Korean
  • Source Han Sans / Source Han Serif: Adobe's open-source unified CJK family

Hitting Sub-Second CJK Load

  • Use unicode-range splitting to load only needed character subsets
  • Preload the first subset covering the most common characters in your content
  • Use WOFF2: it saves 30-50% vs WOFF, which matters most at CJK file sizes
  • Set font-display: swap for immediate text rendering
  • Consider a service worker to cache font subsets across pages
  • Don't forget CJK punctuation (U+3000-303F): Chinese and Japanese use 「」『』、。 rather than ASCII punctuation, and fullwidth forms such as ，！？ live in U+FF00-FFEF

Target: 100-500 KB of CJK font data per page. This is achievable with proper unicode-range partitioning, even on content-heavy sites.

Right-to-Left: Arabic & Hebrew

RTL scripts add complexity that LTR scripts don't face: text flows right-to-left, Arabic letters change shape based on position (isolated, initial, medial, final), and proper joining requires specific OpenType features. Setting direction: rtl is not enough on its own: if the font (or a subset of it) lacks the contextual-form features, Arabic renders as disconnected letters, technically readable but visually broken.
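
The four positional shapes can be seen in Unicode itself. The Python sketch below lists the legacy presentation-form code points for one letter (BEH) and shows that each normalizes back to the single letter U+0628. This is only an illustration: modern text stores the base letter, and the shaping engine chooses the shape through the font's OpenType features rather than through these code points.

```python
import unicodedata

# Legacy "presentation form" code points for the letter BEH (U+0628).
# Real text stores only U+0628; the font's init/medi/fina/isol features
# select the matching shape at render time.
BEH_FORMS = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]

for ch in BEH_FORMS:
    base = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> U+{ord(base):04X}")
```

If a subset drops the glyphs or features behind any of these shapes, the letter falls back to its isolated form and the word stops joining.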

CSS for RTL

<!-- HTML: set dir on the root or container -->
<html lang="ar" dir="rtl">
<!-- or for mixed-language sites -->
<div lang="ar" dir="rtl">...</div>

/* CSS: use logical properties, not physical */
.card {
  margin-inline-start: 1rem;  /* not margin-left */
  padding-inline-end: 0.5rem; /* not padding-right */
  border-inline-start: 2px solid;
}

/* Bidirectional text isolation */
.user-input {
  unicode-bidi: isolate;  /* prevents bidi conflicts */
}

/* Arabic font stack; direction comes from dir="rtl" in the HTML above */
:lang(ar) {
  font-family: "Noto Naskh Arabic", "Cairo", system-ui, sans-serif;
}

Required OpenType Features for Arabic

Arabic fonts must include these OpenType features for proper rendering. Quality fonts (Noto Naskh Arabic, IBM Plex Arabic, Cairo, Amiri, Tajawal) include them all by default, but verify when subsetting:

| Feature | Tag | Purpose |
| --- | --- | --- |
| Contextual Alternates | calt | Letter form changes based on neighbors |
| Initial / Medial / Final / Isolated | init / medi / fina / isol | Position-specific letter shapes |
| Mark / Mark-to-Mark Positioning | mark / mkmk | Diacritic positioning |
| Required Ligatures | rlig | Lam-alif, etc. |

Bidirectional (BiDi) Text

Mixed RTL/LTR content (an English word inside Arabic prose, an Arabic name in an English article) is laid out by the Unicode Bidirectional Algorithm. Browsers handle it automatically when the dir attribute is correct. For inline content whose direction differs from its surroundings, wrap it in <bdi> or <span dir="ltr">; reserve <bdo> for the rare case where you need to override the algorithm outright. The CSS unicode-bidi property controls the same behavior from a stylesheet. Always test with real mixed-direction content, because rendering issues are easy to miss in pure-LTR previews.
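
To find strings that deserve a bidi test, a short Python sketch can classify text by the Unicode bidi class of its characters. The helper names are hypothetical; first_strong_direction approximates the first-strong-character heuristic that dir="auto" is based on, without handling isolates or embedded markup.

```python
import unicodedata


def first_strong_direction(text: str) -> str:
    """Return 'rtl' or 'ltr' from the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):   # Hebrew-type or Arabic-type letter
            return "rtl"
        if bidi_class == "L":           # Latin, Cyrillic, CJK, etc.
            return "ltr"
    return "ltr"  # no strong character found; assume the LTR default


def is_mixed_direction(text: str) -> bool:
    """True when text contains both strong LTR and strong RTL characters."""
    classes = {unicodedata.bidirectional(ch) for ch in text}
    return "L" in classes and bool(classes & {"R", "AL"})


sample = "\u0645\u0631\u062d\u0628\u0627 world"  # Arabic word followed by English
print(first_strong_direction(sample), is_mixed_direction(sample))
```

Strings for which is_mixed_direction returns True are the ones worth rendering in a real browser, since digits, punctuation, and neutral characters around the boundary are where surprises occur.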

File Sizes

Subsetted Arabic WOFF2: typically 50-100 KB covering U+0600-06FF (Basic Arabic) plus common punctuation. Full Arabic with all presentation forms: 150-300 KB. Hebrew is smaller, ~30-60 KB subsetted.

Recommended Fonts

  • Arabic body text: Noto Naskh Arabic, IBM Plex Arabic, Cairo, Tajawal
  • Arabic display / headings: Cairo, Lalezar, Reem Kufi, Amiri (serif/Naskh)
  • Hebrew body: Noto Sans Hebrew, Rubik, Heebo
  • Hebrew serif: Frank Ruhl Libre, David Libre

All are available on Google Fonts with permissive licensing. For licensing details, see our open source font licenses guide.

Indic Scripts

Indic scripts (Devanagari, Bengali, Tamil, Telugu, Gujarati, Gurmukhi for Punjabi, Kannada, Malayalam, Odia) are among the most complex in the world for digital typography. They require consonant conjuncts (multiple consonants combining into ligatures), vowel sign reordering (visual position differs from logical order), above/below base marks, and contextual letter forms. The shaping engine (HarfBuzz on Linux, Android, and Chrome; Core Text on macOS/iOS; DirectWrite on Windows) must apply specific OpenType layout features in the correct order.

Required OpenType Features

Indic fonts require these OpenType GSUB/GPOS features for correct rendering:

| Feature | Tag | Purpose |
| --- | --- | --- |
| Half forms | half | Half-letter forms in conjuncts |
| Pre-base substitutions | pres | Glyphs that appear before base |
| Below-base substitutions | blws | Glyphs that appear below base |
| Above-base substitutions | abvs | Glyphs that appear above base |
| Post-base forms | pstf | Glyphs that appear after base |
| Akhand (Devanagari) | akhn | Indivisible Akhand ligatures |
| Mark / Mark-to-Mark | mark / mkmk | Diacritic / mark positioning |

Test Strings for Conjuncts

Always test Indic fonts with conjuncts, not just individual characters. A font may render isolated letters perfectly while breaking on conjuncts. For Devanagari:

  • Conjuncts: क्ष (ksha), त्र (tra), श्र (shra), ज्ञ (gya), द्व (dva)
  • Vowel sign placement: कि (ki), कू (ku), कृ (kri), कै (kai)
  • Combined: श्रद्धा (shraddha), विद्यालय (vidyalaya)

Compare rendering across Chrome, Firefox, and Safari. Chrome and Firefox shape text with HarfBuzz while Safari uses Core Text, so output can differ slightly, and an issue that is absent in one browser may appear in another.
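
To see why these strings exercise the shaping engine, it helps to look at how they are stored. The Python sketch below lists the code points of two of the test strings in logical order; describe is an illustrative helper name.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """List the Unicode name of each code point in logical (stored) order."""
    return [unicodedata.name(ch) for ch in text]


# "ki": the vowel sign is stored after KA but drawn to its left.
print(describe("\u0915\u093f"))
# "ksha": KA + VIRAMA + SSA are three code points shaped into one conjunct.
print(describe("\u0915\u094d\u0937"))
```

The first string shows reordering (stored order differs from visual order) and the second shows conjunct formation, which is exactly what breaks when layout tables or ligature glyphs are missing.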

Recommended Indic Fonts

  • Pan-Indic coverage: Google's Noto Sans family (Noto Sans Devanagari, Noto Sans Tamil, Noto Sans Bengali, Noto Sans Telugu, Noto Sans Gujarati, etc.)
  • Devanagari specifically: Hind, Mukta, Poppins, Tiro Devanagari Hindi (excellent for body text)
  • All major Indic scripts (open source): Lohit fonts cover 9+ scripts
  • Verified for production: use our font license checker on any commercial Indic font

Subsetting Indic Fonts

Subset carefully: removing characters can break conjuncts that depend on specific glyph combinations. Always include the full Unicode block for your script (e.g., U+0900-097F for Devanagari) plus the necessary ligature glyphs. Use font tools that preserve OpenType layout tables during subsetting; pyftsubset keeps a default set of layout features, and passing --layout-features='*' keeps all of them. Test conjunct rendering after subsetting, because failed conjuncts are often invisible in casual review.
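
A minimal Python sketch of such a subsetting call is shown below. It only assembles the pyftsubset command line; the file names are hypothetical, and adding the joiners (U+200C-200D) and the dotted circle (U+25CC) to the Devanagari block is an assumption about common practice rather than a requirement stated above.

```python
import shlex


def pyftsubset_command(font_path: str, out_path: str, unicodes: str) -> list[str]:
    """Assemble a pyftsubset call that keeps every OpenType layout feature."""
    return [
        "pyftsubset",
        font_path,
        f"--unicodes={unicodes}",
        "--layout-features=*",   # keep all GSUB/GPOS features, not just defaults
        "--flavor=woff2",        # WOFF2 output (needs the brotli module)
        f"--output-file={out_path}",
    ]


cmd = pyftsubset_command(
    "NotoSansDevanagari-Regular.ttf",   # hypothetical input file
    "noto-devanagari.woff2",            # hypothetical output file
    "U+0900-097F,U+200C-200D,U+25CC",   # block + joiners + dotted circle
)
print(shlex.join(cmd))  # run it with subprocess.run(cmd, check=True)
```

Building the argument list rather than a shell string avoids quoting problems with the asterisk, and the same function can be reused for other scripts by changing the unicodes argument.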

Latin Extended Coverage

English uses Latin Basic (U+0000-007F). Most European languages need additional Unicode blocks for accented and modified characters. Vietnamese is the most demanding Latin-based orthography, with 134+ unique accented characters, most of them in Latin Extended Additional. Budget fonts often skip Extended-A/B coverage, so always verify before deploying for non-English markets.

The Latin Unicode Blocks

| Block | Range | Languages Covered |
| --- | --- | --- |
| Latin Basic | U+0000-007F | English (ASCII) |
| Latin-1 Supplement | U+0080-00FF | French, German, Spanish, Italian, Portuguese, Scandinavian |
| Latin Extended-A | U+0100-017F | Polish, Czech, Hungarian, Croatian, Turkish, Welsh, Romanian (partial) |
| Latin Extended-B | U+0180-024F | Romanian (full), Vietnamese (partial) |
| Latin Extended Additional | U+1E00-1EFF | Vietnamese (full), additional diacritics |
| Combining Diacritics | U+0300-036F | Combining accent marks (used by all) |

Language-Specific Requirements

  • Polish: ą, ć, ę, ł, ń, ó, ś, ź, ż (Latin-1 for ó, Extended-A for the rest)
  • Czech: á, č, ď, é, ě, í, ň, ó, ř, š, ť, ú, ů, ý, ž (Latin-1 and Extended-A)
  • Romanian: ă, â, î, ș, ț (Latin-1, Extended-A, and Extended-B for the comma-below ș and ț)
  • Turkish: ç, ğ, ı, İ, ö, ş, ü (Latin-1 and Extended-A; note dotless i)
  • Hungarian: á, é, í, ó, ö, ő, ú, ü, ű (Latin-1, plus Extended-A for ő and ű)
  • Vietnamese: ă, â, đ, ê, ô, ơ, ư + 5 tone marks producing 134+ combos (Extended-A/B plus Extended Additional)

Checking Font Coverage

Verify a font covers your target languages before deploying:

  • Use our font analyzer to inspect Unicode coverage
  • Render a test string with all language-specific characters in the font
  • Missing glyphs appear as tofu (□) or fall through to a different font (visible style mismatch)
  • Vietnamese sites require explicit Vietnamese testing: some fonts claim "European" coverage but skip Latin Extended Additional
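
The checks above can be automated with a few lines of Python. The sketch assumes you already have the set of code points the font maps to glyphs (for example, the keys of the cmap returned by a font library such as fontTools); the REQUIRED samples are short illustrative lists, not complete alphabets, and missing_characters is a hypothetical helper.

```python
# Illustrative per-language samples; extend them for real audits.
REQUIRED = {
    "Polish": "ąćęłńóśźż",
    "Turkish": "çğıİöşü",
    "Romanian": "ăâîșț",
    "Vietnamese": "ăâđêôơưạảấầẩẫậ",
}


def missing_characters(cmap: set[int], language: str) -> str:
    """Return the required characters the font has no glyph for."""
    sample = REQUIRED[language]
    # Check both cases so capitals are not forgotten; dict keeps order, drops repeats.
    wanted = dict.fromkeys(sample + sample.upper())
    return "".join(ch for ch in wanted if ord(ch) not in cmap)


# Pretend font that covers only Basic Latin and Latin-1 Supplement.
latin1_only = set(range(0x20, 0x100))
print(missing_characters(latin1_only, "Polish"))
```

An empty result means the sample is fully covered; anything printed is a character that would render as tofu or fall through to another font.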

Recommended unicode-range

/* Full European language support */
unicode-range:
  U+0000-024F,        /* Basic + Latin-1 + Extended-A + Extended-B */
  U+1E00-1EFF,        /* Latin Extended Additional (Vietnamese) */
  U+0300-036F,        /* Combining diacritics */
  U+2000-206F;        /* General punctuation */

Emoji Font Support

Emojis look different on different devices because each operating system ships its own emoji font with unique designs: Apple Color Emoji on iOS/macOS, Noto Color Emoji on Android, Segoe UI Emoji on Windows, Samsung Color Emoji on Samsung devices. The Unicode standard defines emoji meaning but not visual design. There is no CSS property to force a specific emoji appearance, but you can sidestep platform fonts entirely with image libraries like Twemoji.

Color Font Formats

| Format | Type | Used By |
| --- | --- | --- |
| COLR / CPAL | Vector | Windows, Chrome |
| COLRv1 | Vector + gradients | Chrome 98+, Firefox 107+ |
| CBDT / CBLC | Bitmap | Android, Google |
| sbix | Bitmap | Apple |
| SVG-in-OpenType | Vector | Firefox |

Cross-Platform Consistency: Twemoji

For consistent emoji appearance across all platforms, use an emoji image library. Twemoji (Twitter's open-source emoji set) is the most popular; it replaces native emojis with SVG or PNG images at runtime. OpenMoji is a similar open-source option. Libraries like emoji-mart provide ready-made emoji picker components.

/* Emoji-aware font stack */
body {
  font-family:
    "Inter",
    system-ui,
    -apple-system,
    "Segoe UI",
    Roboto,
    "Apple Color Emoji",
    "Segoe UI Emoji",
    "Noto Color Emoji",
    sans-serif;
}

// JavaScript: or use Twemoji for cross-platform consistency
import twemoji from 'twemoji';
twemoji.parse(document.body);  // Replaces emojis with SVG/PNG images

Accessibility

Screen readers typically announce emoji by their Unicode name (e.g., "smiling face with open mouth"). When emojis convey meaning beyond decoration, wrap them in a span with role="img" and aria-label for clearer context. For purely decorative emojis, use aria-hidden="true" to skip them.

<!-- Meaningful emoji with aria-label -->
<span role="img" aria-label="warning">⚠️</span> System maintenance scheduled

<!-- Decorative emoji hidden from screen readers -->
<span aria-hidden="true">✨</span> New feature!

Font Fallback Chains

A font fallback chain is the ordered list of fonts in your font-family property. When the first font cannot render a character, the browser tries the next, and so on. A well-designed chain ensures text remains readable even if custom fonts fail to load, prevents invisible text or missing characters, and (with metric overrides) eliminates layout shift during font swap.

system-ui and Per-OS UI Fonts

The system-ui generic font family maps to each OS's default UI font: San Francisco on macOS/iOS, Segoe UI on Windows, Roboto on Android. You get excellent readability and zero loading time because the font is already installed.

/* Modern native UI stack, no custom font load */
font-family: system-ui, sans-serif;

/* Explicit per-OS stack (older browser support) */
font-family:
  -apple-system,           /* iOS / macOS Safari */
  BlinkMacSystemFont,      /* macOS Chrome */
  "Segoe UI",              /* Windows */
  Roboto,                  /* Android */
  Oxygen-Sans,             /* KDE Linux */
  Ubuntu,                  /* Ubuntu */
  Cantarell,               /* GNOME */
  "Helvetica Neue",        /* macOS legacy */
  sans-serif;

CSS Metric Overrides, Eliminating CLS

When a fallback font swaps to your web font, different metrics cause text reflow and layout shift (CLS). CSS metric override descriptors fix this by aligning the fallback's metrics to the web font:

| Descriptor | Effect |
| --- | --- |
| size-adjust | Scales the overall font size |
| ascent-override | Sets the ascent metric |
| descent-override | Sets the descent metric |
| line-gap-override | Sets the line-gap metric |

/* Fallback @font-face matched to Inter web font metrics */
@font-face {
  font-family: "Fallback for Inter";
  src: local("Arial");
  size-adjust: 107%;
  ascent-override: 90%;
  descent-override: 22%;
  line-gap-override: 0%;
}

body {
  font-family: "Inter", "Fallback for Inter", sans-serif;
}

Next.js automates this with next/font, calculating override values from the actual web font at build time. For projects not using Next.js, tools like fontaine calculate them programmatically.
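
For a sense of the arithmetic such tools perform, here is a hedged Python sketch of one commonly described approach: size-adjust is the ratio of the two fonts' average character widths (each normalized by its units per em), and each vertical override is the web font's metric divided by the adjusted em. The Metrics class, the function name, and the sample numbers are all made up for illustration; they are not real Inter or Arial values, and next/font or fontaine may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    units_per_em: int
    ascent: int
    descent: int           # negative in most fonts; the absolute value is used
    line_gap: int
    avg_char_width: float  # average advance width, in font units


def fallback_overrides(web: Metrics, fallback: Metrics) -> dict[str, str]:
    """Compute @font-face descriptors that make fallback occupy web's space."""
    size_adjust = (web.avg_char_width / web.units_per_em) / (
        fallback.avg_char_width / fallback.units_per_em
    )
    scale = web.units_per_em * size_adjust

    def pct(x: float) -> str:
        return f"{x * 100:.2f}%"

    return {
        "size-adjust": pct(size_adjust),
        "ascent-override": pct(web.ascent / scale),
        "descent-override": pct(abs(web.descent) / scale),
        "line-gap-override": pct(web.line_gap / scale),
    }


# Hypothetical metrics, chosen only to keep the numbers easy to follow.
web_font = Metrics(1000, 950, -250, 0, 550)
local_fallback = Metrics(2048, 1900, -500, 0, 1024)
for name, value in fallback_overrides(web_font, local_fallback).items():
    print(f"  {name}: {value};")
```

The printed lines can be pasted into a fallback @font-face rule like the one above; in practice the input metrics come from the font files' head, hhea, and OS/2 tables.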

Multilingual Fallback Chain

For a site serving multiple languages, build a stack that covers each script with a suitable system font, ending with a generic fallback:

font-family:
  "Inter",                 /* Latin */
  "Noto Sans JP",          /* Japanese */
  "Hiragino Sans",         /* Japanese fallback */
  "Microsoft YaHei",       /* Chinese fallback (Windows) */
  "Noto Sans Arabic",      /* Arabic */
  "Noto Sans Devanagari",  /* Hindi/Marathi */
  system-ui,               /* OS default */
  sans-serif;              /* Final generic */

The browser walks the chain per character: for a paragraph with mixed scripts, it selects the first font in the list that has a glyph for each character.
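
The per-character walk can be modelled in a few lines of Python. This is a simplified sketch, not how any browser is implemented: the COVERAGE table is a hypothetical stand-in for each font's real character map, and real engines also consider grapheme clusters, spaces, and system fallback.

```python
# Hypothetical coverage table: family name -> code point ranges it supports.
COVERAGE = {
    "Inter": [(0x0020, 0x024F)],
    "Noto Sans JP": [(0x3040, 0x30FF), (0x4E00, 0x9FFF)],
    "Noto Sans Arabic": [(0x0600, 0x06FF)],
    "Noto Sans Devanagari": [(0x0900, 0x097F)],
}
CHAIN = ["Inter", "Noto Sans JP", "Noto Sans Arabic", "Noto Sans Devanagari"]


def pick_font(ch: str) -> str:
    """Return the first family in CHAIN that covers ch, else the generic."""
    cp = ord(ch)
    for family in CHAIN:
        if any(lo <= cp <= hi for lo, hi in COVERAGE[family]):
            return family
    return "sans-serif"


def font_runs(text: str) -> list[tuple[str, str]]:
    """Group consecutive characters that resolve to the same family."""
    runs: list[tuple[str, str]] = []
    for ch in text:
        family = pick_font(ch)
        if runs and runs[-1][0] == family:
            runs[-1] = (family, runs[-1][1] + ch)
        else:
            runs.append((family, ch))
    return runs


print(font_runs("Hi \u65e5\u672c"))  # Latin run, then a Japanese run
```

Order matters: a font listed early that happens to include a few glyphs from another script will win those characters, which is one reason to keep the Latin face tightly subsetted.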


Written & Verified by

Sarah Mitchell

Typography expert specializing in font design, web typography, and accessibility
