Latin Extended Font Support

Basic Latin covers only English ASCII. Most European languages and Vietnamese require extended character blocks that many fonts omit entirely.

TL;DR — Key Takeaways

  • Basic Latin (U+0000–007F) covers only English; French é and German ä already live in Latin-1 Supplement (U+0080–00FF)
  • Latin Extended-A (U+0100–017F) covers most Central/Eastern European languages: Polish ą ę ł, Czech ě ř ů, Turkish ğ ı
  • Vietnamese is the most demanding Latin script: 134 letters with diacritics, 90 of them in Latin Extended Additional (U+1EA0–1EF9) and the rest in the earlier Latin blocks
  • Subset with unicode-range: U+0000-024F, U+1E00-1EFF to cover all European languages plus Vietnamese
  • Budget and display fonts routinely omit ogonek (ą, ę), caron (ě, š), and dotless i (ı); always test before deploying

Latin Character Sets Explained

Unicode organises Latin characters into several contiguous blocks, each extending coverage to more languages. Understanding which block covers which languages is the first step to reliable international typography. The blocks build on each other: Latin-1 Supplement extends Basic Latin, Extended-A extends Latin-1, and so on.

A font file does not have to include every block. A font designed for English may ship only U+0020–007E. A font marketed for “European languages” may include through U+017F but skip the Extended Additional block, silently breaking Vietnamese rendering. Always verify coverage against the specific languages your audience uses.

Block Name | Unicode Range | Characters Included | Languages Supported
Basic Latin | U+0000–007F | A–Z, a–z, 0–9, common punctuation, control characters | English, Swahili, Indonesian, Malay, and other non-accented Latin scripts
Latin-1 Supplement | U+0080–00FF | à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ß and capitals | French, Spanish, German, Portuguese, Italian, Dutch, Swedish, Norwegian, Danish, Icelandic
Latin Extended-A | U+0100–017F | ā ă ą ć ĉ ċ č ď đ ē ĕ ė ę ě ĝ ğ ġ ģ ĥ ħ ĩ ī ĭ į ı ĵ ķ ĺ ļ ľ ł ń ņ ň ŋ ō ŏ ő œ ŕ ŗ ř ś ŝ ş š ţ ť ŧ ũ ū ŭ ů ű ų ŵ ŷ ź ż ž | Polish, Czech, Slovak, Slovenian, Croatian, Hungarian, Turkish, Lithuanian, Latvian, Estonian, Welsh
Latin Extended-B | U+0180–024F | ș ț ȁ ȃ ȅ ȇ ȉ ȋ ȍ ȏ ȑ ȓ ȕ ȗ and African language additions, phonetic alphabet extensions | Romanian (ș ț with comma below), African languages, phonetic transcription
Latin Extended Additional | U+1E00–1EFF | Precomposed characters with multiple diacritics: ạ ắ ặ ấ ầ ậ ẹ ế ề ệ ọ ố ồ ộ ợ ụ ứ ừ ự ỵ ỳ ỷ ỹ and all uppercase equivalents | Vietnamese (primary), Welsh (ẃ ẁ ẅ), and scholarly/historical Latin
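
To see which blocks a piece of text actually touches, the ranges in the table can be turned into a small lookup. The following is a minimal Python sketch; the names LATIN_BLOCKS, block_of, and blocks_used are illustrative and not part of any tool mentioned on this page.

```python
# Block boundaries copied from the table above (inclusive code point ranges).
LATIN_BLOCKS = [
    ("Basic Latin", 0x0000, 0x007F),
    ("Latin-1 Supplement", 0x0080, 0x00FF),
    ("Latin Extended-A", 0x0100, 0x017F),
    ("Latin Extended-B", 0x0180, 0x024F),
    ("Latin Extended Additional", 0x1E00, 0x1EFF),
]


def block_of(char: str) -> str:
    """Return the name of the Latin block containing char, or 'Other'."""
    code_point = ord(char)
    for name, start, end in LATIN_BLOCKS:
        if start <= code_point <= end:
            return name
    return "Other"


def blocks_used(text: str) -> list[str]:
    """List the blocks a text touches, in table order, without duplicates."""
    found = {block_of(ch) for ch in text}
    ordered = [name for name, _, _ in LATIN_BLOCKS if name in found]
    if "Other" in found:
        ordered.append("Other")
    return ordered


print(blocks_used("Łódź"))        # Basic Latin, Latin-1 Supplement, Latin Extended-A
print(blocks_used("tiếng Việt"))  # Basic Latin, Latin Extended Additional
```

Running it over a sample of your real content is a quick way to decide which blocks a font must cover before you start comparing fonts.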

Important distinction: Romanian uses two different sets of characters. Legacy Romanian text often uses cedilla-based characters (ş ţ at U+015F and U+0163 in Latin Extended-A), while modern standards require comma-below variants (ș ț at U+0219 and U+021B in Latin Extended-B). Many older fonts include cedilla but not comma-below — a meaningful difference for Romanian readers and search engines.
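
If you maintain legacy Romanian content, the cedilla forms can be converted to comma-below forms before the text reaches the page. This is a minimal Python sketch under the assumption that the input is known to be Romanian; the function name modernise_romanian is hypothetical.

```python
# Legacy cedilla code points mapped to the modern comma-below code points.
CEDILLA_TO_COMMA_BELOW = {
    0x015E: 0x0218,  # S with cedilla (capital) -> S with comma below
    0x015F: 0x0219,  # s with cedilla -> s with comma below
    0x0162: 0x021A,  # T with cedilla (capital) -> T with comma below
    0x0163: 0x021B,  # t with cedilla -> t with comma below
}


def modernise_romanian(text: str) -> str:
    """Replace cedilla s/t with comma-below s/t.

    Apply this to Romanian text only: Turkish legitimately uses
    s with cedilla (U+015F), so a blanket conversion would corrupt it.
    """
    return text.translate(CEDILLA_TO_COMMA_BELOW)


legacy = "Bucure\u015Fti, sfin\u0163ire"  # written with cedilla forms
print(modernise_romanian(legacy))         # now uses U+0219 and U+021B
```

After conversion the text needs a font with U+0218–021B, so check Latin Extended-B coverage before rolling this out.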

Language Coverage Requirements

Each language has specific characters that are non-negotiable. A single missing character does not just break that word: the browser typically renders it in an entirely different font family, disrupting visual consistency. The table below lists the minimum character requirements for each major European language and Vietnamese.

Language | Required Characters | Unicode Block | Example Text
French | é è ê ë à â ô û ù î ï ç œ æ and capitals É È Ê Ë À Â Ô Û Ù Î Ï Ç Œ Æ | U+0080–00FF, plus œ Œ at U+0152–0153 in Latin Extended-A | C'est déjà l'été; où êtes-vous?
German | ä ö ü ß and capitals Ä Ö Ü (ẞ for capital eszett, U+1E9E) | U+0080–00FF | Straße, Öffnungszeiten, überprüfen
Polish | ą ć ę ł ń ó ś ź ż and capitals Ą Ć Ę Ł Ń Ó Ś Ź Ż (9 additional letters) | U+0100–017F | Łódź, żółw, źródło, świętość
Czech | á č ď é ě í ň ó ř š ť ú ů ý ž and capitals Á Č Ď É Ě Í Ň Ó Ř Š Ť Ú Ů Ý Ž | U+0100–017F | Příliš žluťoučký kůň, Dvořák, Škoda
Turkish | ğ ı İ ş ç ö ü Ğ Ş Ç Ö Ü, notably dotless ı (U+0131) and dotted İ (U+0130) | U+0100–017F | Türkçe, İstanbul, Atatürk, şehir
Romanian | ș ț ă â î Ș Ț Ă Â Î; the comma-below variants (U+0219, U+021B) are distinct from cedilla | U+0180–024F for ș ț; ă is in U+0100–017F, â î in U+0080–00FF | București, sfințire, înainte
Vietnamese | ă â đ ê ô ơ ư (base letters) + 5 tone marks (acute, grave, hook above, tilde, dot below) applied to each vowel = 134 letters with diacritics, 90 of them precomposed in U+1EA0–1EF9 | U+1E00–1EFF, plus letters in U+0080–024F | Hà Nội, Thành phố Hồ Chí Minh, tiếng Việt

Hungarian adds ő and ű (double acute accent, U+0151 and U+0171) from Latin Extended-A to its Latin-1 accented vowels. Lithuanian uses ą ę į ų (ogonek below vowels) plus ė (dot above) and ū (macron). Latvian uses ā ē ī ū (macron) and ģ ķ ļ ņ ŗ (cedilla below). All three languages are covered by Latin-1 Supplement plus Latin Extended-A, that is, everything through U+017F.
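
The per-language requirements lend themselves to an automated check. The sketch below is an assumption-laden illustration in Python: REQUIRED holds three rows copied from the table, missing_for_language is a hypothetical helper, and the covered set stands in for whatever code points your font actually maps.

```python
# Required letters per language, copied from the table above (both cases).
REQUIRED = {
    "Polish": "ąćęłńóśźżĄĆĘŁŃÓŚŹŻ",
    # \u0131 is dotless i, \u0130 is dotted capital I;
    # \u015F and \u015E are s with cedilla (small and capital).
    "Turkish": "\u011F\u0131\u0130\u015F\u00E7\u00F6\u00FC\u011E\u015E\u00C7\u00D6\u00DC",
    # \u0219 \u021B \u0218 \u021A are the comma-below forms of s and t.
    "Romanian": "\u0219\u021B\u0103\u00E2\u00EE\u0218\u021A\u0102\u00C2\u00CE",
}


def missing_for_language(language: str, covered: set[int]) -> list[str]:
    """Return the required letters whose code points the font does not cover."""
    return sorted(ch for ch in set(REQUIRED[language]) if ord(ch) not in covered)


# Hypothetical font that stops at the end of Latin-1 Supplement.
covered = set(range(0x0020, 0x0100))
for language in REQUIRED:
    print(language, "".join(missing_for_language(language, covered)))
```

With this hypothetical Latin-1-only font, Polish loses every letter except ó, Turkish loses ğ, ş, ı, and İ, and Romanian loses ă, ș, and ț in both cases.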

Testing Font Character Support

The most reliable test is rendering real-language text in the font and looking for substitutions. When a glyph is missing, the browser falls back to the next font in your stack for that character, potentially changing weight, width, and colour mid-word. If no font in the stack has the glyph, you see tofu: the empty rectangular boxes that stand in for a missing character. Copy the test strings below into your browser's developer tools or a live preview environment.

French test string

C'est déjà l'été où l'œuvre de Ségolène dépasse toute espérance — vœux, cœur, île, Ça va?

German test string

Zwölf Boxkämpfer jagen Viktor quer über den großen Sylter Deich — Ärzte, Öffnungszeiten, Überprüfung, Straße

Polish test string

Pchła śniadła łasicę w źródle. Zażółć gęślą jaźń. Łódź, Ząbki, Świętokrzyskie, Bydgoszcz.

Czech test string

Příliš žluťoučký kůň úpěl ďábelské ódy. Dvořák, Špilberk, Třeboň, Žďár nad Sázavou.

Turkish test string

Pijamalı hasta yağız şoföre çabucak güvendi. İstanbul, Atatürk, Öğrenci, Çalışmak, Güneş.

Vietnamese test string

Tiếng Việt có sáu thanh điệu. Hà Nội, Thành phố Hồ Chí Minh, Đà Nẵng, Huế. Bầu trời xanh, mùa xuân đẹp.

Beyond visual inspection, use programmatic coverage checks. Our Font Analyzer reports which Unicode code points are present in a font file, so you can verify coverage against your target character set before deploying. Third-party tools like Wakamai Fondue and FontDrop also display Unicode coverage tables without requiring font upload to a remote server.
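
A programmatic check can be as small as comparing a test string against the set of code points a font maps. The Python sketch below assumes you already have that set; one way to obtain it is the key set of fontTools' TTFont(path).getBestCmap(), which is not used here so that the example runs with the standard library alone. The name report_missing is hypothetical.

```python
import unicodedata


def report_missing(text: str, covered: set[int]) -> list[str]:
    """Describe each distinct character of text that the font cannot render."""
    # Normalise to NFC first, so "e" + combining acute is checked as precomposed é.
    normalised = unicodedata.normalize("NFC", text)
    missing = sorted({ch for ch in normalised if ord(ch) not in covered})
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in missing]


# Hypothetical font that stops at the end of Latin-1 Supplement.
covered = set(range(0x0020, 0x0100))
for line in report_missing("Łódź, żółw", covered):
    print(line)
# U+0141 LATIN CAPITAL LETTER L WITH STROKE
# U+0142 LATIN SMALL LETTER L WITH STROKE
# U+017A LATIN SMALL LETTER Z WITH ACUTE
# U+017C LATIN SMALL LETTER Z WITH DOT ABOVE
```

Feeding each of the test strings above through a check like this gives a per-language list of gaps without relying on spotting fallback glyphs by eye.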

Testing tip: Use CSS to expose fallback fonts

Set your font stack to font-family: 'YourFont', fantasy; during testing. The fantasy generic is visually distinct — any character falling through will be obviously wrong rather than silently using a similar-looking system font.

Subsetting for Latin Extended

The CSS unicode-range descriptor in @font-face enables selective font loading: the browser only downloads a font file when a character in that range actually appears on the page. This is the foundation of progressive Latin loading — deliver the most common characters first, then extended characters on demand.

Progressive loading: Basic Latin first

/* Tier 1: Basic Latin — loads immediately, always needed */
@font-face {
  font-family: 'Inter';
  font-style: normal;
  font-weight: 400;
  font-display: swap;
  src: url('/fonts/inter-latin-basic.woff2') format('woff2');
  unicode-range: U+0020-007F, U+00A0-00FF; /* Basic + Latin-1 Supplement */
}

/* Tier 2: Latin Extended-A — loads when Polish, Czech, Turkish etc. appear */
@font-face {
  font-family: 'Inter';
  font-style: normal;
  font-weight: 400;
  font-display: swap;
  src: url('/fonts/inter-latin-ext-a.woff2') format('woff2');
  unicode-range: U+0100-017F; /* Latin Extended-A */
}

/* Tier 3: Latin Extended-B — loads for Romanian comma-below ș ț */
@font-face {
  font-family: 'Inter';
  font-style: normal;
  font-weight: 400;
  font-display: swap;
  src: url('/fonts/inter-latin-ext-b.woff2') format('woff2');
  unicode-range: U+0180-024F; /* Latin Extended-B */
}

/* Tier 4: Latin Extended Additional — loads only for Vietnamese */
@font-face {
  font-family: 'Inter';
  font-style: normal;
  font-weight: 400;
  font-display: swap;
  src: url('/fonts/inter-latin-ext-additional.woff2') format('woff2');
  unicode-range: U+1E00-1EFF; /* Latin Extended Additional */
}
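
To reason about which of these files a given page would pull in, the four rules can be modelled in a few lines of Python. This is a simplified sketch, not a description of browser internals: it only matches code points against the ranges above, and the names TIERS and files_requested are illustrative.

```python
# (file name, inclusive code point ranges) mirroring the four @font-face rules above.
TIERS = [
    ("inter-latin-basic.woff2", [(0x0020, 0x007F), (0x00A0, 0x00FF)]),
    ("inter-latin-ext-a.woff2", [(0x0100, 0x017F)]),
    ("inter-latin-ext-b.woff2", [(0x0180, 0x024F)]),
    ("inter-latin-ext-additional.woff2", [(0x1E00, 0x1EFF)]),
]


def files_requested(text: str) -> list[str]:
    """Approximate which tier files are needed to render text."""
    code_points = {ord(ch) for ch in text}
    return [
        name
        for name, ranges in TIERS
        if any(lo <= cp <= hi for cp in code_points for lo, hi in ranges)
    ]


print(files_requested("Hello"))      # tier 1 only
print(files_requested("Łódź"))       # tiers 1 and 2
print(files_requested("București"))  # tiers 1 and 3
print(files_requested("Hà Nội"))     # tiers 1 and 4
```

A page that mixes Polish and Vietnamese names would therefore fetch three files, which is worth weighing against the single-file approach.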

Full European coverage — single unicode-range

If you prefer a single subset file for all European languages (excluding Vietnamese), use this consolidated range. Expect it to add roughly 15–25 KB compared to Basic Latin alone, depending on the font, which is usually well worth the coverage.

/* All European Latin — one file, all accents */
@font-face {
  font-family: 'YourFont';
  font-style: normal;
  font-weight: 400;
  font-display: swap;
  src: url('/fonts/yourfont-latin-all.woff2') format('woff2');
  unicode-range:
    U+0020-007F,  /* Basic Latin */
    U+00A0-00FF,  /* Latin-1 Supplement */
    U+0100-017F,  /* Latin Extended-A */
    U+0180-024F,  /* Latin Extended-B (Romanian ș ț) */
    U+0300-036F,  /* Combining Diacritical Marks (for dynamically composed text) */
    U+2000-206F,  /* General Punctuation (em dash, curly quotes, etc.) */
    U+20A0-20CF;  /* Currency Symbols (€; £ and ¥ are already in Latin-1 Supplement) */
}

Use our Font Subsetter to generate these split files from any TTF or OTF. The tool accepts a unicode-range string directly and outputs a WOFF2 file containing exactly those glyphs. Use our Unicode Range Generator to build custom range strings for any combination of languages.
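
If you want to build a range string yourself, the logic is a run-length compression of sorted code points. The Python function below is a minimal sketch of that idea; unicode_range is a hypothetical name and is not the implementation behind the tools mentioned above.

```python
def unicode_range(chars: str) -> str:
    """Build a CSS unicode-range value covering exactly the given characters."""
    code_points = sorted({ord(ch) for ch in chars})
    if not code_points:
        return ""
    runs = []
    start = prev = code_points[0]
    for cp in code_points[1:]:
        if cp == prev + 1:
            # Still inside a consecutive run; extend it.
            prev = cp
            continue
        runs.append((start, prev))
        start = prev = cp
    runs.append((start, prev))
    return ", ".join(
        f"U+{lo:04X}" if lo == hi else f"U+{lo:04X}-{hi:04X}" for lo, hi in runs
    )


# The nine Polish letters from the coverage table, both cases.
print(unicode_range("ąćęłńóśźżĄĆĘŁŃÓŚŹŻ"))
# U+00D3, U+00F3, U+0104-0107, U+0118-0119, U+0141-0144, U+015A-015B, U+0179-017C
```

Note that the compressed range also covers neighbouring letters such as Ł's run-mates only when they are consecutive in Unicode, so the output stays an exact match for the input set.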

Common Missing Characters

Font designers working primarily for Western European markets routinely omit certain Extended-A and Extended-B characters from their glyphsets. The omissions follow a predictable pattern: characters with complex diacritics that sit below the baseline (which require careful sidebearing and kerning work) are more often skipped than those with marks above the x-height.

ą ę

Ogonek (U+0105, U+0119)

The ogonek is a small hook below the letter, used in Polish (ą, ę), Lithuanian (ą, ę, į, ų), and Old Norse. It attaches below the baseline and requires custom sidebearing adjustments. Budget and display fonts frequently omit all ogonek variants. Polish without ą and ę is hard for native readers to parse: words like będę (I will be) and ją (her) become unrecognisable.

ě š č

Caron / Háček (U+011B, U+0161, U+010D)

The caron (ˇ) appears in Czech (ě, š, č, ž, ř, ď, ť, ň), Slovak, Slovenian, Croatian, and Lithuanian. Its shape changes on lowercase d and t, where it is drawn as an apostrophe-like mark (ď, ť) rather than a wedge. Some fonts include the wedge form but not the apostrophe form, breaking Czech text. The letter ř (r with caron) is used almost exclusively in Czech and is one of the most commonly missing glyphs.

ș ț

Comma Below (U+0219, U+021B)

Romanian requires ș (s with comma below, U+0219) and ț (t with comma below, U+021B), found in Latin Extended-B. These are distinct from ş (s with cedilla, U+015F) and ţ (t with cedilla, U+0163) in Latin Extended-A. The difference is visually subtle but typographically and semantically significant. Many older fonts that include Extended-A ship the cedilla variants but not the comma-below variants, leaving Romanian text rendered in a deprecated form. Use a font that explicitly lists comma-below coverage.

ı İ

Dotless i and Dotted I (U+0131, U+0130)

Turkish has a four-way i/I distinction built from two case pairs: dotted i/İ (U+0069/U+0130) and dotless ı/I (U+0131/U+0049). This is not a stylistic choice: the dotted and dotless forms are separate letters that distinguish words, so ılık (lukewarm) and ilk (first) begin with different letters. Fonts designed without Turkish in mind omit U+0131 and U+0130. Additionally, the dot on uppercase İ sits above the cap height, which requires explicit glyph design work that budget font makers skip.
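
The same pairing affects any code that changes case before text reaches the font, because default case mapping follows English rules. The Python sketch below shows the pitfall and a Turkish-aware workaround; turkish_upper and turkish_lower are hypothetical helpers, not library functions.

```python
def turkish_upper(text: str) -> str:
    """Uppercase with Turkish rules: i -> dotted capital (U+0130), dotless ı -> I."""
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()


def turkish_lower(text: str) -> str:
    """Lowercase with Turkish rules: I -> dotless ı (U+0131), dotted capital -> i."""
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()


# Default mapping pairs I with i, which is wrong for Turkish:
print("ILIK".lower())          # ilik (dots added that do not belong)
print(turkish_lower("ILIK"))   # ılık
print(turkish_upper("istanbul"))  # İSTANBUL, with the dot kept on the capital

# Default lowercasing of U+0130 yields two code points: i + combining dot above.
print(len("\u0130".lower()))   # 2
```

Whichever path your text takes, the font still needs both U+0130 and U+0131, since correctly cased Turkish produces them constantly.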

ő ű

Double Acute Accent (U+0151, U+0171)

Hungarian uses ő (o with double acute, U+0151) and ű (u with double acute, U+0171), which look like two acute accents rather than one. The double acute (˝) is distinct from a diaeresis (¨), and ő and ű are different Unicode characters from ö and ü. Some fonts draw the double-acute letters with diaeresis-like marks as a shortcut, which makes ő indistinguishable from ö on screen; substituting the characters themselves (typing ö where ő is meant) additionally breaks text search and copy-paste. Always verify that ő and ű render with the correct double-acute shape, not a diaeresis.

Why budget fonts skip these characters

Adding below-baseline diacritics (ogonek, comma below) requires extra design work because the mark must fit within the font's descender space and must not collide with ascenders and accents on the following line. The font designer must adjust spacing metrics and sometimes redraw surrounding character shapes. For a typeface targeting Western Europe, this work has no commercial payoff. The result: fonts labelled “Latin Extended” that cover only the above-baseline portion of Extended-A and silently drop Central European languages.

Font Recommendations by Latin Coverage

The table below rates popular web fonts against Latin coverage tiers. Coverage is verified against the fonts' published character maps and Google Fonts coverage data. Checkmarks indicate full, intended support with properly designed glyphs — not fallback or approximated characters.

Font | Latin Basic | Extended-A | Extended-B | Vietnamese | License
Inter |  |  |  |  | OFL (open)
Roboto |  |  | ~ |  | Apache 2.0 (open)
Open Sans |  |  | ~ |  | OFL (open)
Lato |  |  |  | ~ | OFL (open)
Noto Sans |  |  |  |  | OFL (open)
Source Sans 3 |  |  |  |  | OFL (open)
IBM Plex Sans |  |  |  |  | OFL (open)

✓ = full coverage with intentional glyph design. ~ = partial coverage (most characters present but some edge cases missing). ✗ = not covered. Roboto and Open Sans cover Romanian comma-below for common characters but may miss some extended Latin-B additions. Lato's Vietnamese support is partial — basic tone marks work but some stacked diacritic combinations (e.g., ặ, ệ, ộ) fall back to the system font.

Vietnamese: The Most Demanding Latin Script

Vietnamese is written entirely in the Latin script, a legacy of 17th-century Portuguese missionaries who created Chữ Quốc Ngữ (National Language Script). Despite using Latin letters, Vietnamese typography is fundamentally different from European Latin typography because it stacks diacritics: a vowel can carry both a base modification (circumflex, breve, or horn) and a tone mark (acute, grave, hook above, tilde, or dot below), resulting in characters like ậ (a + circumflex + dot below) and ợ (o + horn + dot below).
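
The stacking can be made visible by decomposing a precomposed letter into its base letter and combining marks. This is a small Python sketch using the standard unicodedata module; the helper name components is illustrative.

```python
import unicodedata


def components(char: str) -> list[str]:
    """Name the base letter and combining marks of a precomposed character."""
    return [unicodedata.name(cp) for cp in unicodedata.normalize("NFD", char)]


# U+1EAD (a with circumflex and dot below), U+1EE3 (o with horn and dot below)
for letter in "\u1EAD\u1EE3":
    print(letter, components(letter))
# ậ ['LATIN SMALL LETTER A', 'COMBINING DOT BELOW', 'COMBINING CIRCUMFLEX ACCENT']
# ợ ['LATIN SMALL LETTER O', 'COMBINING HORN', 'COMBINING DOT BELOW']
```

The marks appear in Unicode's canonical order rather than visual order, which is why the dot below is listed before the circumflex. A font that lacks the precomposed glyph has to position both marks on the base letter itself, and that is where poorly prepared fonts fall apart.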

Vietnamese vowel system

a-family: a ă â
e-family: e ê
o-family: o ô ơ
u-family: u ư

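Each vowel in each family can combine with 5 tone marks (plus unmarked/level tone), creating up to 6 forms per vowel. For â, that means: â ấ ầ ẩ ẫ ậ, six distinct characters. The unmarked â sits in Latin-1 Supplement (U+00E2), while its five tone-marked forms sit in U+1EA4–1EAD. The plain vowels i and y take the same five tone marks.

Counting both cases and đ, Vietnamese uses 134 distinct letters with diacritics. Ninety of them are concentrated in U+1EA0–1EF9 inside the Latin Extended Additional block (U+1E00–1EFF); the remaining 44 sit in Latin-1 Supplement, Latin Extended-A, and Latin Extended-B. Latin Extended Additional is among the most frequently omitted blocks in Latin fonts: it adds roughly 8–15 KB to a WOFF2 file, a cost many font designers consider unjustified for a non-European market.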
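
The 134 figure can be reproduced by composing every vowel with every tone mark and counting the results by block. The Python sketch below does that with the standard unicodedata module; BASE_VOWELS, TONE_MARKS, and vietnamese_letters are illustrative names.

```python
import unicodedata

# a ă â e ê i o ô ơ u ư y
BASE_VOWELS = "a\u0103\u00E2e\u00EAio\u00F4\u01A1u\u01B0y"
# level (no mark), grave, acute, hook above, tilde, dot below
TONE_MARKS = ["", "\u0300", "\u0301", "\u0309", "\u0303", "\u0323"]


def vietnamese_letters() -> set[str]:
    """Every non-ASCII letter of the Vietnamese alphabet, in both cases."""
    lower = {
        unicodedata.normalize("NFC", vowel + tone)
        for vowel in BASE_VOWELS
        for tone in TONE_MARKS
    }
    lower.add("\u0111")  # d with stroke
    both_cases = lower | {ch.upper() for ch in lower}
    return {ch for ch in both_cases if ord(ch) > 0x7F}


letters = vietnamese_letters()
in_additional = [ch for ch in letters if 0x1E00 <= ord(ch) <= 0x1EFF]
print(len(letters), len(in_additional))  # 134 90
```

The 44 letters outside Latin Extended Additional are the reason a Vietnamese subset cannot consist of U+1E00–1EFF alone.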

The consequence of rendering Vietnamese without proper font support is severe. Tone-marked vowels either appear as missing-glyph boxes or drop into a fallback font, and text whose tone marks are lost or unreadable becomes phonetically ambiguous and often semantically incorrect. Vietnamese relies on tone marks to distinguish words that are otherwise identical in consonant and vowel structure.

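/* Vietnamese unicode-range: use this for any site with Vietnamese content.
   Tone-marked vowels in U+00C0-00FF (such as à á â ã è é ê) are assumed to come from your Latin-1 subset. */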

@font-face {
  font-family: 'YourFont';
  font-style: normal;
  font-weight: 400;
  font-display: swap;
  src: url('/fonts/yourfont-vietnamese.woff2') format('woff2');
  unicode-range:
    U+0102-0103,  /* Ă ă */
    U+0110-0111,  /* Đ đ */
    U+0128-0129,  /* Ĩ ĩ */
    U+0168-0169,  /* Ũ ũ */
    U+01A0-01A1,  /* Ơ ơ */
    U+01AF-01B0,  /* Ư ư */
    U+1EA0-1EF9,  /* tone-marked Vietnamese letters (90 code points) */
    U+20AB;       /* ₫ Vietnamese Dong sign */
}

For production Vietnamese sites, Inter, Noto Sans, Source Sans 3, and IBM Plex Sans all provide reliable coverage. Test with the sentence Tiếng Việt có sáu thanh điệu: ngang, huyền, sắc, hỏi, ngã, nặng — this string exercises all six tones across multiple vowel families and will immediately reveal any missing glyphs.

Written & Verified by

Sarah Mitchell

Product Designer, Font Specialist
