What is the difference between Latin Basic, Latin-1 Supplement, and Latin Extended-A?

Latin Basic (U+0000-007F) covers English ASCII characters. Latin-1 Supplement (U+0080-00FF) adds accented characters for French, German, Spanish, and Portuguese. Latin Extended-A (U+0100-017F) adds characters for Polish, Czech, Hungarian, Romanian, Turkish, and other Central/Eastern European languages.

Which European languages need Latin Extended characters?

Polish needs ą, ę, ł, ź, ż (Extended-A). Czech needs ě, ř, ů, ď, ť (Extended-A). Romanian needs ș, ț (Extended-B). Turkish needs ğ, ı, İ, ş (Extended-A). Hungarian needs ő, ű (Extended-A). Vietnamese needs the most extensive set, requiring Latin Extended Additional (U+1E00-1EFF).

How do I check if a font supports the characters I need?

Use our Font Analyzer tool to inspect a font's character coverage, or check the font's Unicode coverage in tools like FontDrop or Wakamai Fondue. Create a test string with all special characters for your target languages and render it in the font. Missing characters will show as empty boxes (tofu) or fall through to a different font.

What unicode-range should I use for full European language support?

For most European languages: unicode-range: U+0000-024F, U+1E00-1EFF, U+2000-206F. This covers Latin Basic through Extended-B, Latin Extended Additional (for Vietnamese), and General Punctuation. For maximum coverage, add U+0300-036F (combining diacritical marks) so that text stored as base letters plus separate combining accents also renders in the same font.

Why is Vietnamese the most demanding Latin-based script?

The Vietnamese alphabet has 29 letters: 22 of the basic Latin letters (f, j, w, and z are not used in native words) plus 7 modified letters (ă, â, đ, ê, ô, ơ, ư). Its 12 vowels combine with 5 tone marks, giving 134 accented characters when both cases are counted. Most are in Latin Extended Additional (U+1E00-1EFF), which budget fonts often omit. Always test Vietnamese text specifically when choosing fonts for Southeast Asian markets.

Latin Extended Font Support | Diacritical Marks & Accented Characters

Latin Character Sets Explained

Unicode organises Latin characters into several contiguous blocks, each extending coverage to more languages. Understanding which block covers which languages is the first step to reliable international typography. The blocks build on each other: Latin-1 Supplement extends Basic Latin, Extended-A extends Latin-1, and so on.

A font file does not have to include every block. A font designed for English may ship only U+0020–007E. A font marketed for “European languages” may include through U+017F but skip the Extended Additional block, silently breaking Vietnamese rendering. When a single font cannot cover all required blocks, define font fallback chains to handle gaps gracefully. Always verify coverage against the specific languages your audience uses.

Block Name	Unicode Range	Characters Included	Languages Supported
Basic Latin	U+0000–007F	A–Z, a–z, 0–9, common punctuation, control characters	English, Swahili, Indonesian, Malay, and other non-accented Latin scripts
Latin-1 Supplement	U+0080–00FF	à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ß and capitals	French, Spanish, German, Portuguese, Italian, Dutch, Swedish, Norwegian, Danish, Icelandic
Latin Extended-A	U+0100–017F	ā ă ą ć ĉ ċ č ď đ ē ĕ ė ę ě ĝ ğ ġ ģ ĥ ħ ĩ ī ĭ į ı ĵ ķ ĺ ļ ľ ł ń ņ ň ŋ ō ŏ ő œ ŕ ŗ ř ś ŝ ş š ţ ť ŧ ũ ū ŭ ů ű ų ŵ ŷ ź ż ž	Polish, Czech, Slovak, Slovenian, Croatian, Hungarian, Turkish, Lithuanian, Latvian, Estonian, Welsh
Latin Extended-B	U+0180–024F	ș ț ȁ ȃ ȅ ȇ ȉ ȋ ȍ ȏ ȑ ȓ ȕ ȗ and African language additions, phonetic alphabet extensions	Romanian (ș ț with comma below), African languages, phonetic transcription
Latin Extended Additional	U+1E00–1EFF	Precomposed characters with multiple diacritics: ạ ắ ặ ấ ầ ậ ẹ ế ề ệ ọ ố ồ ộ ợ ụ ứ ừ ự ỵ ỳ ỷ ỹ and all uppercase equivalents	Vietnamese (primary), Welsh (ẃ ẁ ẅ), and scholarly/historical Latin
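
The block boundaries in the table can be turned into a small lookup. The sketch below is illustrative only: the names `LATIN_BLOCKS`, `latin_block`, and `blocks_needed` are made up for this example, and only the five blocks from the table are included.

```python
import unicodedata

# Block names and ranges copied from the table above.
LATIN_BLOCKS = [
    ("Basic Latin", 0x0000, 0x007F),
    ("Latin-1 Supplement", 0x0080, 0x00FF),
    ("Latin Extended-A", 0x0100, 0x017F),
    ("Latin Extended-B", 0x0180, 0x024F),
    ("Latin Extended Additional", 0x1E00, 0x1EFF),
]


def latin_block(ch):
    """Return the name of the Latin block containing ch, or None if outside all five."""
    cp = ord(ch)
    for name, start, end in LATIN_BLOCKS:
        if start <= cp <= end:
            return name
    return None


def blocks_needed(text):
    """List the Latin blocks a piece of text draws from, in table order."""
    # Normalise to NFC so that decomposed input is classified by its precomposed form.
    text = unicodedata.normalize("NFC", text)
    found = {latin_block(ch) for ch in text}
    return [name for name, _, _ in LATIN_BLOCKS if name in found]


if __name__ == "__main__":
    for sample in ("Stra\u00DFe", "\u0141\u00F3d\u017A", "Bucure\u0219ti", "Vi\u1EC7t"):
        print(sample, blocks_needed(sample))
```

Running it on a few words from this article shows why a single word can pull from several blocks at once: Łódź needs Basic Latin, Latin-1 Supplement, and Extended-A together.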

Important distinction: Romanian uses two different sets of characters. Legacy Romanian text often uses cedilla-based characters (ş ţ at U+015F and U+0163 in Latin Extended-A), while modern standards require comma-below variants (ș ț at U+0219 and U+021B in Latin Extended-B). Many older fonts include cedilla but not comma-below, a meaningful difference for Romanian readers and search engines.
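
If you control the content pipeline, legacy cedilla forms can be mapped to the comma-below forms before the text reaches the page. This is a minimal sketch, and `modernize_romanian` is a hypothetical helper name. It should only be applied to text known to be Romanian, because ş (U+015F) is the correct letter in Turkish.

```python
# Legacy cedilla code points mapped to the comma-below code points named above.
_CEDILLA_TO_COMMA_BELOW = str.maketrans({
    "\u015F": "\u0219",  # s with cedilla -> s with comma below
    "\u0163": "\u021B",  # t with cedilla -> t with comma below
    "\u015E": "\u0218",  # capital S with cedilla -> capital S with comma below
    "\u0162": "\u021A",  # capital T with cedilla -> capital T with comma below
})


def modernize_romanian(text):
    """Replace cedilla s/t with comma-below s/t. Only use on Romanian text."""
    return text.translate(_CEDILLA_TO_COMMA_BELOW)


if __name__ == "__main__":
    legacy = "Bucure\u015Fti, sfin\u0163ire"
    print(modernize_romanian(legacy))
```

After the conversion the text depends on Latin Extended-B coverage, so check that the font actually contains U+0218 to U+021B before switching.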

Language Coverage Requirements

Each language has specific characters that are non-negotiable. A single missing character does not just break one word: the browser typically renders that character from an entirely different font family, disrupting visual consistency. The table below lists the minimum character requirements for each major European language and Vietnamese.

Language	Required Characters	Unicode Block	Example Text
French	é è ê ë à â ô û ù î ï ç œ æ and capitals É È Ê Ë À Â Ô Û Ù Î Ï Ç Œ Æ	U+0080–00FF, plus U+0152–0153 (Œ œ) in U+0100–017F	C'est déjà l'été; où êtes-vous?
German	ä ö ü ß and capitals Ä Ö Ü (ẞ for capital eszett, U+1E9E)	U+0080–00FF	Straße, Öffnungszeiten, überprüfen
Polish	ą ć ę ł ń ó ś ź ż and capitals Ą Ć Ę Ł Ń Ó Ś Ź Ż (9 additional letters)	U+0100–017F, plus Ó ó (U+00D3, U+00F3) in U+0080–00FF	Łódź, żółw, źródło, świętość
Czech	á č ď é ě í ň ó ř š ť ú ů ý ž and capitals Á Č Ď É Ě Í Ň Ó Ř Š Ť Ú Ů Ý Ž	U+0080–00FF (á é í ó ú ý) and U+0100–017F (č ď ě ň ř š ť ů ž)	Příliš žluťoučký kůň, Dvořák, Škoda
Turkish	ğ ı İ ş ç ö ü Ğ Ş Ç Ö Ü, notably dotless ı (U+0131) and dotted İ (U+0130)	U+0080–00FF (ç ö ü) and U+0100–017F (ğ ı İ ş)	Türkçe, İstanbul, Atatürk, şehir
Romanian	ș ț ă â î Ș Ț Ă Â Î. Comma-below variants (U+0219, U+021B) are distinct from cedilla	U+0080–00FF (â î), U+0100–017F (ă), and U+0180–024F (ș ț)	București, sfințire, înainte
Vietnamese	ă â đ ê ô ơ ư (modified letters) + 5 tone marks (acute, grave, hook above, tilde, dot below) applied to each of the 12 vowels = 134 precomposed characters, counting both cases	U+1EA0–1EF9 (90 characters), with the remainder in U+0080–024F	Hà Nội, Thành phố Hồ Chí Minh, tiếng Việt

Hungarian needs ő and ű (double acute accent, U+0151 and U+0171) from Extended-A on top of its Latin-1 accents. Lithuanian uses ą ę į ų (ogonek below vowels) plus ė (dot above) and ū (macron). Latvian uses ā ē ī ū (macron) and ģ ķ ļ ņ ŗ (cedilla below). All three languages are covered by Latin-1 Supplement plus Latin Extended-A, that is, through U+017F.

Testing Font Character Support

The most reliable test is rendering real-language text in the font and looking for substitutions. A missing glyph triggers a fallback to the next font in your stack, potentially changing weight, width, and colour mid-word; if no font in the stack has the glyph, the browser shows tofu (empty rectangular boxes). Copy the test strings below into your browser's developer tools or a live preview environment.

French test string

C'est déjà l'été où l'œuvre de Ségolène dépasse toute espérance — vœux, cœur, île, Ça va?

German test string

Zwölf Boxkämpfer jagen Viktor quer über den großen Sylter Deich — Ärzte, Öffnungszeiten, Überprüfung, Straße

Polish test string

Pchła śniadła łasicę w źródle. Zażółć gęślą jaźń. Łódź, Ząbki, Świętokrzyskie, Bydgoszcz.

Czech test string

Příliš žluťoučký kůň úpěl ďábelské ódy. Dvořák, Špilberk, Třeboň, Žďár nad Sázavou.

Turkish test string

Pijamalı hasta yağız şoföre çabucak güvendi. İstanbul, Atatürk, Öğrenci, Çalışmak, Güneş.

Vietnamese test string

Tiếng Việt có sáu thanh điệu. Hà Nội, Thành phố Hồ Chí Minh, Đà Nẵng, Huế. Bầu trời xanh, mùa xuân đẹp.

Beyond visual inspection, use programmatic coverage checks. Our Font Analyzer reports which Unicode code points are present in a font file, so you can verify coverage against your target character set before deploying. Third-party tools like Wakamai Fondue and FontDrop also display Unicode coverage tables without requiring font upload to a remote server.
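
A programmatic check boils down to comparing the characters in your test string with the set of code points the font maps. The sketch below assumes you already have that set; `missing_characters` is a hypothetical helper, and the `latin1_only` set stands in for a real font. In practice the set could come from the font's cmap table, for example the keys returned by `TTFont(path).getBestCmap()` in the fontTools library.

```python
import unicodedata


def missing_characters(text, supported_codepoints):
    """Map each character of text that the font lacks to its Unicode name."""
    missing = {}
    # Normalise to NFC so accented letters are checked in precomposed form.
    for ch in unicodedata.normalize("NFC", text):
        if ch.isspace():
            continue
        if ord(ch) not in supported_codepoints:
            missing[ch] = unicodedata.name(ch, "UNKNOWN")
    return missing


if __name__ == "__main__":
    # Stand-in for a font that ships only Basic Latin and Latin-1 Supplement.
    latin1_only = set(range(0x0020, 0x0100))
    polish = "Za\u017C\u00F3\u0142\u0107 g\u0119\u015Bl\u0105 ja\u017A\u0144"
    for ch, name in missing_characters(polish, latin1_only).items():
        print(f"U+{ord(ch):04X} {ch} {name}")
```

An empty result means every character in the sample is mapped by the font. It does not prove the glyphs are well drawn, so visual testing is still needed.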

Testing tip: Use CSS to expose fallback fonts

Set your font stack to font-family: 'YourFont', fantasy; during testing. The fantasy generic is visually distinct. Any character falling through will be obviously wrong rather than silently using a similar-looking system font.

Subsetting for Latin Extended

The CSS unicode-range descriptor in @font-face enables selective font loading: the browser only downloads a font file when a character in that range actually appears on the page. This is the foundation of progressive Latin loading: deliver the most common characters first, then extended characters on demand.

Progressive loading: Basic Latin first

/* Tier 1: Basic Latin — loads immediately, always needed */
@font-face {
  font-family: 'Inter';
  font-style: normal;
  font-weight: 400;
  font-display: swap;
  src: url('/fonts/inter-latin-basic.woff2') format('woff2');
  unicode-range: U+0020-007F, U+00A0-00FF; /* Basic + Latin-1 Supplement */
}

/* Tier 2: Latin Extended-A — loads when Polish, Czech, Turkish etc. appear */
@font-face {
  font-family: 'Inter';
  font-style: normal;
  font-weight: 400;
  font-display: swap;
  src: url('/fonts/inter-latin-ext-a.woff2') format('woff2');
  unicode-range: U+0100-017F; /* Latin Extended-A */
}

/* Tier 3: Latin Extended-B — loads for Romanian comma-below ș ț */
@font-face {
  font-family: 'Inter';
  font-style: normal;
  font-weight: 400;
  font-display: swap;
  src: url('/fonts/inter-latin-ext-b.woff2') format('woff2');
  unicode-range: U+0180-024F; /* Latin Extended-B */
}

/* Tier 4: Latin Extended Additional — loads only for Vietnamese */
@font-face {
  font-family: 'Inter';
  font-style: normal;
  font-weight: 400;
  font-display: swap;
  src: url('/fonts/inter-latin-ext-additional.woff2') format('woff2');
  unicode-range: U+1E00-1EFF; /* Latin Extended Additional */
}
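
To predict which of the four tier files a given page would request, you can replay the browser's matching rule offline. The following is a rough sketch under stated assumptions: `parse_unicode_range` and `files_requested` are hypothetical helpers, the parser handles only single code points and `start-end` ranges (not the `?` wildcard form), and the file names are the ones used in the rules above.

```python
import re

# Tier file names and ranges copied from the @font-face rules above.
TIERS = {
    "inter-latin-basic.woff2": "U+0020-007F, U+00A0-00FF",
    "inter-latin-ext-a.woff2": "U+0100-017F",
    "inter-latin-ext-b.woff2": "U+0180-024F",
    "inter-latin-ext-additional.woff2": "U+1E00-1EFF",
}

_TOKEN = re.compile(r"U\+([0-9A-F]{1,6})(?:-([0-9A-F]{1,6}))?", re.IGNORECASE)


def parse_unicode_range(descriptor):
    """Turn 'U+0020-007F, U+00A0' into a list of (start, end) integer pairs."""
    ranges = []
    for token in descriptor.split(","):
        match = _TOKEN.fullmatch(token.strip())
        if match is None:
            raise ValueError(f"unsupported unicode-range token: {token!r}")
        start = int(match.group(1), 16)
        end = int(match.group(2), 16) if match.group(2) else start
        ranges.append((start, end))
    return ranges


def files_requested(text):
    """Return the tier files whose range contains at least one character of text."""
    needed = []
    for filename, descriptor in TIERS.items():
        ranges = parse_unicode_range(descriptor)
        if any(lo <= ord(ch) <= hi for ch in text for lo, hi in ranges):
            needed.append(filename)
    return needed


if __name__ == "__main__":
    for sample in ("Hello", "c\u0153ur", "Bucure\u0219ti", "Vi\u1EC7t"):
        print(sample, files_requested(sample))
```

Note that the French word cœur triggers the Extended-A file because œ is U+0153, so a French-only site may still download Tier 2.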

Full European coverage with a single unicode-range

If you prefer a single subset file for all European languages (excluding Vietnamese), use this consolidated range. It adds roughly 15–25 KB compared to Basic Latin alone, well worth the coverage.

/* All European Latin — one file, all accents */
@font-face {
  font-family: 'YourFont';
  font-style: normal;
  font-weight: 400;
  font-display: swap;
  src: url('/fonts/yourfont-latin-all.woff2') format('woff2');
  unicode-range:
    U+0020-007F,  /* Basic Latin */
    U+00A0-00FF,  /* Latin-1 Supplement */
    U+0100-017F,  /* Latin Extended-A */
    U+0180-024F,  /* Latin Extended-B (Romanian ș ț) */
    U+0300-036F,  /* Combining Diacritical Marks (for dynamically composed text) */
    U+2000-206F,  /* General Punctuation (em dash, curly quotes, etc.) */
    U+20A0-20CF;  /* Currency Symbols (€; £ and ¥ are already in Latin-1) */
}

Use our Font Subsetter to generate these split files from any TTF or OTF. The tool accepts a unicode-range string directly and outputs a WOFF2 file containing exactly those glyphs. Use our Unicode Range Generator to build custom range strings for any combination of languages.
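
If you want to derive a range string from the characters your content actually uses, the core step is collapsing sorted code points into contiguous runs. This is a minimal sketch of that step; `to_unicode_range` is a hypothetical helper and is not how any particular tool is implemented.

```python
def _format_run(start, end):
    """Format one run of code points in unicode-range syntax."""
    if start == end:
        return f"U+{start:04X}"
    return f"U+{start:04X}-{end:04X}"


def to_unicode_range(chars):
    """Collapse the code points of chars into a comma-separated unicode-range value."""
    codepoints = sorted({ord(ch) for ch in chars})
    if not codepoints:
        return ""
    runs = []
    start = prev = codepoints[0]
    for cp in codepoints[1:]:
        if cp == prev + 1:
            prev = cp  # still inside the current contiguous run
            continue
        runs.append(_format_run(start, prev))
        start = prev = cp
    runs.append(_format_run(start, prev))
    return ", ".join(runs)


if __name__ == "__main__":
    # Romanian letters beyond Basic Latin, both cases.
    print(to_unicode_range("\u0103\u00E2\u00EE\u0219\u021B\u0102\u00C2\u00CE\u0218\u021A"))
```

A range built this way is as tight as possible, which keeps the subset small but means new content with an unexpected character will fall back to another font. Whole-block ranges like the ones above are the safer default.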

Common Missing Characters

Font designers working primarily for Western European markets routinely omit certain Extended-A and Extended-B characters from their glyphsets. The omissions follow a predictable pattern: characters with complex diacritics that sit below the baseline (which require careful sidebearing and kerning work) are more often skipped than those with marks above the x-height.

ą ę

Ogonek (U+0105, U+0119)

The ogonek is a small hook below the letter, used in Polish (ą, ę), Lithuanian (ą, ę, į, ų), and Old Norse. It attaches below the baseline and requires custom sidebearing adjustments. Budget and display fonts frequently omit all ogonek variants. Polish without ą ę is immediately illegible to native readers, as words like będę (I will be) and ją (her) are rendered unrecognisable.

ě š č

Caron / Háček (U+011B, U+0161, U+010D)

The caron (ˇ) appears in Czech (ě, š, č, ž, ř, ď, ť, ň), Slovak, Slovenian, Croatian, and Lithuanian. On lowercase ‘d’ and ‘t’ it is drawn as an apostrophe-like mark to the right of the stem (ď, ť) rather than as a wedge above. Some fonts draw the wedge form on these letters instead, which looks wrong in Czech and Slovak text. The letter ř (r with caron) is almost exclusive to Czech and is one of the most commonly missing glyphs.

ș ț

Comma Below (U+0219, U+021B)

Romanian requires ș (s with comma below, U+0219) and ț (t with comma below, U+021B), found in Latin Extended-B. These are distinct from ş (s with cedilla, U+015F) and ţ (t with cedilla, U+0163) in Latin Extended-A. The difference is visually subtle but typographically and semantically significant. Many older fonts that include Extended-A ship the cedilla variants but not the comma-below variants, leaving Romanian text rendered in a deprecated form. Use a font that explicitly lists comma-below coverage.

ı İ

Dotless i and Dotted I (U+0131, U+0130)

Turkish has a four-way i/I distinction made of two case pairs: dotted i/İ (U+0069/U+0130) and dotless ı/I (U+0131/U+0049). This is not a stylistic choice: the two vowels are different letters, and ılık (lukewarm) and ilk (first) are different words. Fonts designed without Turkish in mind may omit U+0131 and U+0130. Additionally, the uppercase İ has a dot that must be positioned above the capital height, which requires explicit glyph design work that budget font makers often skip.
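
The same distinction matters when text is upper- or lower-cased before display, because default case conversion in many languages, Python included, is not locale-aware and pairs i with I. The sketch below is one hedged way to handle Turkish casing by hand; `turkish_upper` and `turkish_lower` are hypothetical helpers, not standard library functions.

```python
# Turkish case pairs: dotted i <-> İ (U+0130), dotless ı (U+0131) <-> I.
_TR_UPPER = str.maketrans({"i": "\u0130", "\u0131": "I"})
_TR_LOWER = str.maketrans({"\u0130": "i", "I": "\u0131"})


def turkish_upper(text):
    """Uppercase text using the Turkish i/I pairing."""
    # Swap the four special letters first, then let str.upper handle the rest.
    return text.translate(_TR_UPPER).upper()


def turkish_lower(text):
    """Lowercase text using the Turkish i/I pairing."""
    return text.translate(_TR_LOWER).lower()


if __name__ == "__main__":
    print("ilk".upper())            # default casing loses the dot
    print(turkish_upper("ilk"))     # Turkish casing keeps it
    print(turkish_lower("ILIK"))    # dotless capital I lowers to dotless ı
```

Either way, the result contains U+0130 or U+0131, so the font still has to include both glyphs.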

ő ű

Double Acute Accent (U+0151, U+0171)

Hungarian uses ő (o with double acute, U+0151) and ű (u with double acute, U+0171), which look like two acute accents rather than one. The double acute (˝) is a different Unicode character from the diaeresis (¨). A font that draws ő with a diaeresis-shaped mark makes it indistinguishable from ö, which is a different letter in Hungarian. Substituting ö or õ in the text itself, a workaround from legacy encodings, is worse because it also breaks text search and copy-paste. Always verify that ő and ű render with the correct double-acute shape, not a diaeresis.

Why budget fonts skip these characters

Adding below-baseline diacritics (ogonek, comma below) requires extra design work because the mark must attach cleanly to the base letter and must not collide with the line below. The font designer must adjust spacing metrics and sometimes redraw surrounding character shapes. For a typeface targeting Western Europe, this work has no commercial payoff. The result: fonts labelled “Latin Extended” that cover only the above-baseline portion of Extended-A and silently drop Central European languages.

Font Recommendations by Latin Coverage

The table below rates popular web fonts against Latin coverage tiers. Coverage is verified against the fonts' published character maps and Google Fonts coverage data. Checkmarks indicate full, intended support with properly designed glyphs, not fallback or approximated characters.

Font	Latin Basic	Extended-A	Extended-B	Vietnamese	License
Inter	✓	✓	✓	✓	OFL (open)
Roboto	✓	✓	~	✓	Apache 2.0 (open)
Open Sans	✓	✓	~	✓	OFL (open)
Lato	✓	✓	✗	~	OFL (open)
Noto Sans	✓	✓	✓	✓	OFL (open)
Source Sans 3	✓	✓	✓	✓	OFL (open)
IBM Plex Sans	✓	✓	✓	✓	OFL (open)

✓ = full coverage with intentional glyph design. ~ = partial coverage (most characters present but some edge cases missing). ✗ = not covered. Roboto and Open Sans cover Romanian comma-below for common characters but may miss some extended Latin-B additions. Lato's Vietnamese support is partial: basic tone marks work but some stacked diacritic combinations (e.g., ặ, ệ, ộ) fall back to the system font.

Vietnamese: The Most Demanding Latin Script

Vietnamese is written entirely in the Latin script, a legacy of the 17th-century Portuguese and other European missionaries who developed Chữ Quốc Ngữ (National Language Script). Despite using Latin letters, Vietnamese typography is fundamentally different from European Latin typography because it stacks diacritics: a vowel can carry both a base modification (circumflex, breve, or horn) and a tone mark (acute, grave, hook above, tilde, or dot below), resulting in characters like ậ (a + circumflex + dot below) and ợ (o + horn + dot below).
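
The stacking can be inspected directly by decomposing a precomposed character with Unicode normalisation. A small sketch follows, with `describe_stack` as a hypothetical helper name. The marks come back in Unicode canonical order, which is not necessarily the visual top-to-bottom order.

```python
import unicodedata


def describe_stack(ch):
    """Return the base letter of ch and the names of the combining marks on it."""
    decomposed = unicodedata.normalize("NFD", ch)
    base, marks = decomposed[0], decomposed[1:]
    return base, [unicodedata.name(mark) for mark in marks]


if __name__ == "__main__":
    for ch in "\u1EAD\u1EE3\u1EBF":
        base, marks = describe_stack(ch)
        print(f"U+{ord(ch):04X} {ch}: {base} + {', '.join(marks)}")
```

A font therefore has two ways to draw such a letter: a dedicated precomposed glyph, or a base glyph plus correctly positioned combining marks. Precomposed glyphs are what the Latin Extended Additional block provides.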

Vietnamese vowel system

a ă â

a-family

e ê

e-family

o ô ơ

o-family

u ư

u-family

i y

i and y (no modified forms)

Each vowel, including i and y, can combine with 5 tone marks (plus the unmarked level tone), creating 6 forms per vowel. For â, that means: â ấ ầ ẩ ẫ ậ. The unmarked â sits in Latin-1 Supplement (U+00E2); the five tone-marked forms are in U+1E00–1EFF.

Unicode encodes 134 distinct precomposed Vietnamese characters when both cases are counted. Of these, 90 sit in the Latin Extended Additional block (U+1EA0–1EF9); the rest are spread across Latin-1 Supplement, Extended-A, and Extended-B. Latin Extended Additional is the single most frequently omitted block in Latin fonts. It adds roughly 8–15 KB to a WOFF2 file, a cost many font designers consider unjustified for a non-European market.
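
The character count can be reproduced by composing every vowel with every tone mark and sorting the results by block. The sketch below assumes the 12 vowels a ă â e ê i o ô ơ u ư y and the 5 combining tone marks; `vietnamese_letters` and `block_of` are hypothetical helper names.

```python
import unicodedata
from collections import Counter

BASE_VOWELS = unicodedata.normalize("NFC", "a\u0103\u00E2e\u00EAio\u00F4\u01A1u\u01B0y")
TONE_MARKS = "\u0300\u0301\u0303\u0309\u0323"  # grave, acute, tilde, hook above, dot below
MODIFIED_LETTERS = "\u0103\u00E2\u0111\u00EA\u00F4\u01A1\u01B0"  # ă â đ ê ô ơ ư

BLOCKS = [
    ("Latin-1 Supplement", 0x0080, 0x00FF),
    ("Latin Extended-A", 0x0100, 0x017F),
    ("Latin Extended-B", 0x0180, 0x024F),
    ("Latin Extended Additional", 0x1E00, 0x1EFF),
]


def block_of(ch):
    """Return the block name for ch, or 'other' if it is outside the listed blocks."""
    for name, start, end in BLOCKS:
        if start <= ord(ch) <= end:
            return name
    return "other"


def vietnamese_letters():
    """Build the set of accented Vietnamese letters in precomposed (NFC) form."""
    letters = set()
    for vowel in BASE_VOWELS:
        for cased in (vowel, vowel.upper()):
            for tone in TONE_MARKS:
                letters.add(unicodedata.normalize("NFC", cased + tone))
    for letter in MODIFIED_LETTERS:
        letters.update((letter, letter.upper()))
    return letters


if __name__ == "__main__":
    letters = vietnamese_letters()
    print(len(letters), "letters")
    for block, count in Counter(block_of(ch) for ch in letters).most_common():
        print(f"{count:3d}  {block}")
```

The per-block breakdown is the practical takeaway: a font needs all four blocks, not just Latin Extended Additional, to set Vietnamese without fallback.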

The consequence of rendering Vietnamese without proper font support is severe. Tone-marked vowels either fall back to a different font mid-word or appear as tofu, and if the marks are dropped altogether the text becomes ambiguous and often semantically wrong. Vietnamese relies on tone marks to distinguish words that are otherwise identical in consonant and vowel structure.

/* Vietnamese unicode-range: load alongside a Basic Latin + Latin-1 face, which supplies à á â ã è é ê ì í ò ó ô õ ù ú ý */

@font-face {
  font-family: 'YourFont';
  font-style: normal;
  font-weight: 400;
  font-display: swap;
  src: url('/fonts/yourfont-vietnamese.woff2') format('woff2');
  unicode-range:
    U+0102-0103,  /* ă Ă */
    U+0110-0111,  /* đ Đ */
    U+0128-0129,  /* ĩ Ĩ */
    U+0168-0169,  /* ũ Ũ */
    U+01A0-01A1,  /* ơ Ơ */
    U+01AF-01B0,  /* ư Ư */
    U+1EA0-1EF9,  /* Vietnamese letters in Latin Extended Additional */
    U+20AB;       /* ₫ Vietnamese Dong sign */
}

For production Vietnamese sites, Inter, Noto Sans, Source Sans 3, and IBM Plex Sans all provide reliable coverage. Test with the sentence Tiếng Việt có sáu thanh điệu: ngang, huyền, sắc, hỏi, ngã, nặng, which exercises all six tones across multiple vowel families and will immediately reveal any missing glyphs.

Written & Verified by

Sarah Mitchell

Typography expert specializing in font design, web typography, and accessibility
