# Font Discipline for PPTX Exports

Companion to `layout-discipline.md`. The rail / cursor primitives in that file catch geometric drift; this file catches the typography drift that geometry can't see: variable-font traps, missing CJK slots, fake italic on Han characters. These are the bugs that pass `verify_layout.py` and still look wrong.

Read this when:

- The audit table has 🟡 entries about italic / em / font fallback.
- PowerPoint silently swaps to Calibri / Arial / Microsoft JhengHei / Georgia after you specified a different family.
- `unzip pptx | grep typeface` shows a face that isn't in your design system.

## Layer 1: Font mapping in the export script

Walk each CSS class used by the source HTML and confirm the export script maps it to the **same** font family.

⚠️ **Trap:** the visual category your eye reads is not always the class's semantic category. Editorial decks routinely bind `.lead`, `.callout`, or `.q-big` to a serif face, not the sans-serif you'd guess from "lead". Open the HTML's CSS, read the `font-family` declaration for each class, and copy the literal family name into the export's font table. Don't rely on visual intuition; rely on grep.

> **Coverage gap for Latin-slot scripts (Cyrillic / Greek / Vietnamese).**
> Russian / Ukrainian / Greek runs go through `<a:latin>`, not `<a:ea>` or
> `<a:cs>`: they use the Latin slot. Many display fonts (Playfair Display,
> Source Serif 4) ship with weak or missing Cyrillic / Greek glyphs, and most
> drop Vietnamese Extended diacritics (ếẫốỗ). PowerPoint silently falls
> back to Calibri / Times New Roman per missing glyph, producing
> mid-paragraph face shifts that look like a styling bug.
>
> When mapping a CSS class to a Latin font, check the font actually
> covers your scripts:
>
> ```bash
> # macOS / Linux (fontconfig): list the code point ranges a font covers.
> # fc-match returns the closest match, so confirm the file path is the face you meant.
> fc-query -f '%{charset}\n' "$(fc-match -f '%{file}\n' 'Playfair Display')" | head
> ```
>
> ```powershell
> # Windows: PowerShell + System.Drawing lists the registered families
> Add-Type -AssemblyName System.Drawing
> (New-Object System.Drawing.Text.InstalledFontCollection).Families |
>     Where-Object { $_.Name -like '*Playfair*' }
> # Coverage detail (Unicode ranges) is best read in fontforge:
> #   File → Open → pick the .ttf / .otf → Element → Font Info → OS/2 → Unicode Ranges.
> ```
>
> Cross-platform fallback: open the font in fontforge → Element → Font Info → OS/2 → Unicode Ranges.
>
> If coverage is missing, either swap to a face that has it (e.g.
> Inter / IBM Plex Sans for Cyrillic; Be Vietnam Pro for Vietnamese) or
> set a different `<a:latin>` typeface per language run.

## Layer 2: Font presence on the rendering machine

PowerPoint uses the OS font cache. If the family name in your XML isn't installed, PowerPoint silently falls back. Check:

```bash
fc-list | grep -i "noto serif"              # Linux / WSL
mdfind "kMDItemFSName == '*NotoSerif*'"     # macOS
```

```powershell
# Windows (PowerShell)
Get-ChildItem -Path "$env:WINDIR\Fonts","$env:LOCALAPPDATA\Microsoft\Windows\Fonts" `
  -Filter "*NotoSerif*" -ErrorAction SilentlyContinue
```

Install missing families:

```bash
# macOS (Homebrew)
brew install --cask \
  font-noto-serif-tc \
  font-playfair-display \
  font-source-serif-4 \
  font-ibm-plex-mono
```

The `verify_layout.py` script can't see this; it only checks geometry. A standalone font audit step is required.

## Layer 3: Variable fonts vs. static families ← most common trap

Modern fonts often ship as a **single variable file** containing all weights (`NotoSerifTC[wght].ttf`).
This looks elegant, but PowerPoint on macOS and Windows has spotty support:

- macOS reports the variable font's family name as its **default static instance**, usually ExtraLight or Regular.
- PowerPoint asks the OS for "Noto Serif TC, weight 700"; the OS reports the family as `Noto Serif TC ExtraLight`; PowerPoint can't match → falls back to a system serif.

Diagnose:

```bash
ls -la ~/Library/Fonts/ | grep -i NotoSerif
```

| What you see                           | Verdict                             |
| -------------------------------------- | ----------------------------------- |
| One `*[wght].ttf` file                 | Variable. PowerPoint may not match. |
| Multiple `*-Regular.otf`, `*-Bold.otf` | Static family. Safe.                |

Fix by using the static family equivalent:

| Don't use (variable)        | Use instead (static)                                   |
| --------------------------- | ------------------------------------------------------ |
| `Noto Serif TC` (variable)  | `Noto Serif CJK TC`                                    |
| `Source Serif 4` (variable) | `Source Serif Pro` / `Source Serif 4` static instances |
| `Inter` (variable)          | Per-weight `Inter Regular` / `Inter Bold`              |

After fixing the export, re-run `extract_pptx.py` and confirm the `font` field matches the static name.

## Layer 4: PPTX XML's three-language slots

PowerPoint chooses a typeface per run by language script. Each run can declare three:

| Attribute   | Used for                                                 |
| ----------- | -------------------------------------------------------- |
| `<a:latin>` | Latin script (a-z, A-Z, digits)                          |
| `<a:ea>`    | East Asian (CJK): **Chinese / Japanese / Korean go here** |
| `<a:cs>`    | Complex script (Arabic, Hebrew, Thai)                    |

Audit a file:

```bash
unzip -o /path/to/deck.pptx -d /tmp/audit
grep -h -oE 'typeface="[^"]+"' /tmp/audit/ppt/slides/slide*.xml | sort -u
```

Expected output: only the design-system fonts. If you see `Microsoft JhengHei`, `Calibri`, `Arial`, `Georgia`, `Consolas`, something has fallen back.

**Common defect:** export script writes `<a:latin>` only. Chinese runs have no `<a:ea>` directive → PowerPoint picks the OS default (Microsoft JhengHei on Windows, Hiragino Sans on Mac).
Result: Chinese characters in the wrong serif/sans family.

Fix: when adding a run with mixed-language content, set all three attributes that apply.

```python
from pptx.oxml.ns import qn

# Child order required inside <a:rPr> from the font slots onward.
_SLOT_ORDER = ('a:latin', 'a:ea', 'a:cs', 'a:sym', 'a:hlinkClick',
               'a:hlinkMouseOver', 'a:rtl', 'a:extLst')


def set_run_fonts(run, latin: str | None = None, ea: str | None = None,
                  cs: str | None = None):
    """Set the Latin / East Asian / complex-script typeface slots on a run."""
    rPr = run._r.get_or_add_rPr()
    for tag, face in (('a:latin', latin), ('a:ea', ea), ('a:cs', cs)):
        if not face:
            continue  # leave slots the caller did not ask for untouched
        el = rPr.find(qn(tag))
        if el is None:
            el = rPr.makeelement(qn(tag), {})
            # Insert before any later sibling so children stay in schema order;
            # falls back to append when none of the successors is present.
            successors = _SLOT_ORDER[_SLOT_ORDER.index(tag) + 1:]
            rPr.insert_element_before(el, *successors)
        el.set('typeface', face)
```

PptxGenJS sets all three by default; raw XML injection or python-pptx without an explicit `ea` slot does not.

## Layer 5: Italic + script interaction

🚨 **`italic=True` is a Latin-script feature.** Apply it only to runs whose characters belong to scripts where italic is part of the writing tradition (Latin, Cyrillic, Greek). For everything else (CJK, Arabic, Hebrew, Devanagari, Thai, Khmer) PowerPoint synthesizes a fake slant that looks mechanically deformed.

The chain of failures, using CJK as the canonical example:

1. The `<a:latin>` slot has Playfair Display Italic (a Latin-only font).
2. The CJK characters in the run have no glyph in Playfair → PowerPoint substitutes a system CJK font.
3. The substituted CJK font is forced into `italic=True` → since no real CJK italic exists, PowerPoint synthesizes a slant → characters look mechanically deformed.

The same pattern triggers for Arabic, Hebrew, Devanagari, and Thai: none of these scripts has an italic tradition, and faking it produces a slant that's visually broken.

**Rule:** italic only applies to runs whose primary script supports it (Latin / Cyrillic / Greek). Indicate emphasis on other scripts via:

- color tone (`COLOR_INK_60` for muted, full ink for emphasis)
- weight contrast (Regular 400 vs.
Bold 700)
- a script-native italic variant **only if one actually ships** (most don't)

Practical implementation:

```python
from pptx.util import Pt

# Unicode ranges where italic should be suppressed.
# Principle: include scripts whose writing tradition has no italic style.
# Synthesized italic on these scripts produces a fake slant that looks
# mechanically deformed.
NO_ITALIC_RANGES = (
    (0x3400, 0x9FFF),  # CJK Extension A + CJK Unified Ideographs
    (0xF900, 0xFAFF),  # CJK Compatibility Ideographs
    (0x3040, 0x30FF),  # Hiragana + Katakana
    (0xAC00, 0xD7AF),  # Hangul Syllables
    (0x0590, 0x05FF),  # Hebrew
    (0x0600, 0x06FF),  # Arabic
    (0x0750, 0x077F),  # Arabic Supplement
    # Indic scripts: none have an italic tradition; PowerPoint synthesizes
    # a fake slant on all of them. Add new ranges here when the deck mixes
    # in additional scripts (e.g. Sinhala U+0D80–U+0DFF).
    (0x0900, 0x097F),  # Devanagari (Hindi, Marathi, Sanskrit)
    (0x0980, 0x09FF),  # Bengali
    (0x0A00, 0x0A7F),  # Gurmukhi (Punjabi)
    (0x0A80, 0x0AFF),  # Gujarati
    (0x0B00, 0x0B7F),  # Oriya
    (0x0B80, 0x0BFF),  # Tamil
    (0x0C00, 0x0C7F),  # Telugu
    (0x0C80, 0x0CFF),  # Kannada
    (0x0D00, 0x0D7F),  # Malayalam
    # Southeast Asian
    (0x0E00, 0x0E7F),  # Thai
    (0x0E80, 0x0EFF),  # Lao
    (0x1780, 0x17FF),  # Khmer
)


def has_no_italic_script(text: str) -> bool:
    """Return True if any character falls in a script without an italic tradition."""
    return any(
        any(lo <= ord(c) <= hi for lo, hi in NO_ITALIC_RANGES)
        for c in text
    )


def add_run_with_italic_safety(p, text, *, latin_face: str, ea_face: str,
                               cs_face: str | None, size_pt: int,
                               italic: bool, **kwargs):
    """Drop italic if the run contains characters from scripts without italic tradition.

    Args:
        latin_face: Font for Latin / Cyrillic / Greek runs (a:latin slot).
        ea_face: Font for CJK runs (a:ea slot).
        cs_face: Font for complex scripts (Arabic, Hebrew, Devanagari, Thai,
            etc.; a:cs slot). Pass None when the run contains no
            complex-script characters; set_run_fonts skips the slot.
""" r = p.add_run() r.text = text r.font.size = Pt(size_pt) r.font.italic = italic and not has_no_italic_script(text) set_run_fonts(r, latin=latin_face, ea=ea_face, cs=cs_face) return r ``` For mixed-script runs (e.g. `"In 2026 ι–‹ε§‹"`), split into multiple runs at language boundaries so the italic attribute can apply to the Latin run only. ## Beyond CJK β€” other scripts The five layers above are written in CJK examples because that's the most common pairing in Open Design today, but the same machinery applies to other scripts. Quick reference: | Script family | XML slot | Italic OK? | Most common defect | Recommended faces | | ------------------------ | ---------- | ---------- | ----------------------------------------------------------------------------------- | ------------------------------------------------ | | Latin (en, de, es, vi…) | `a:latin` | βœ… | Vietnamese Extended diacritics dropped β†’ fallback Calibri mid-paragraph | Be Vietnam Pro, IBM Plex Sans, Source Sans 3 | | Cyrillic (ru, uk, bg) | `a:latin` | βœ… | Display fonts (Playfair, Source Serif) lack Cyrillic β†’ fallback Calibri | Inter, IBM Plex Sans, Roboto | | Greek (el) | `a:latin` | βœ… | Same as Cyrillic β€” display faces missing Greek β†’ fallback | Inter, IBM Plex Sans | | CJK (zh, ja, ko) | `a:ea` | ❌ | Variable-font trap (Layer 3); missing `a:ea` slot β†’ fallback Microsoft JhengHei | Noto Sans CJK *, Source Han Sans, IBM Plex Sans JP | | Arabic / Hebrew / Persian | `a:cs` | ❌ | `` not set β†’ text direction breaks; kashida changes width | Noto Naskh Arabic, IBM Plex Sans Arabic, Amiri | | Devanagari / Bengali | `a:cs` | ❌ | PowerPoint defaults to Mangal/Vrinda (low fidelity); cluster shaping bumps line height | Noto Sans Devanagari, Mukta, Hind | | Thai / Lao / Khmer | `a:cs` | ❌ | No inter-word spaces β†’ PowerPoint's break engine produces poor wraps; tone marks bump line height | Noto Sans Thai, Sarabun, Noto Sans Khmer | For RTL scripts (Arabic / Hebrew / Persian), set both `` and 
`<a:rtl>` on the run's `rPr`. Right-alignment, bidi text flow, and chrome / footer mirroring are out of scope for `verify_layout.py` today and need manual review; see the Tier 2 follow-up note in the audit checklist.

> **RTL discipline scope.** Full RTL support is roughly 15–20% of the
> font + layout discipline surface area: Unicode TR9 bidi resolution,
> chrome / footer / page-number mirroring, kashida (Arabic
> elongation) interaction with line-fill, and right-anchored
> alignment. This skill covers the typeface + slot mechanics only;
> bidi and mirroring are flagged for a Tier 2 `rtl-discipline.md`
> follow-up when fa / ar / he usage volume justifies the investment.

## Line height per script

The `Cursor.take(gap=Inches(0.12))` default suits 14pt Latin body copy. Other scripts need more vertical headroom because of stacked diacritics, matras, or tone marks:

| Script                                    | Recommended `gap` at 14pt body |
| ----------------------------------------- | ------------------------------ |
| Latin (no Vietnamese Extended)            | `Inches(0.12)` (default)       |
| Latin (with Vietnamese Extended ếẫỗ)      | `Inches(0.14)`                 |
| CJK                                       | `Inches(0.14–0.16)`            |
| Devanagari / Bengali (matras / conjuncts) | `Inches(0.16–0.18)`            |
| Thai / Lao / Khmer (tone marks above)     | `Inches(0.16–0.18)`            |
| Arabic / Hebrew                           | `Inches(0.13)`                 |

When the deck mixes scripts, take the max. Line breathing-room is visual, and an under-spaced Thai run in an otherwise Latin deck reads as "the Thai slide is broken".

> **Source for these numbers.** Measured against Noto Sans / Noto
> Serif / IBM Plex line-height at 14pt body with full diacritic stacks
> (e.g. Devanagari conjuncts ष्ट्र, Thai 4-mark sequences, stacked
> Vietnamese ỗ). Adjust downward for condensed faces (Inter Condensed,
> Noto Sans Condensed) and upward for display sizes ≥ 24pt where
> diacritic ratios grow.

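
The "take the max" rule from the table can be sketched as a small lookup. This is a minimal illustration, not part of the existing tooling: `GAP_BY_RANGE`, `DEFAULT_GAP_IN`, and `gap_inches_for` are hypothetical names, the Unicode ranges are a rough approximation of each script family, and where the table gives a range the upper bound is used as an assumption.

```python
# Hypothetical helper: map code point ranges to the recommended gap (inches)
# from the table above. Upper bounds are assumed where the table gives a range.
GAP_BY_RANGE = (
    ((0x1EA0, 0x1EF9), 0.14),  # Vietnamese stacked diacritics (Latin Extended Additional)
    ((0x3040, 0x30FF), 0.16),  # Hiragana + Katakana
    ((0x3400, 0x9FFF), 0.16),  # CJK ideographs
    ((0xAC00, 0xD7AF), 0.16),  # Hangul syllables
    ((0x0590, 0x05FF), 0.13),  # Hebrew
    ((0x0600, 0x06FF), 0.13),  # Arabic
    ((0x0900, 0x09FF), 0.18),  # Devanagari + Bengali
    ((0x0E00, 0x0EFF), 0.18),  # Thai + Lao
    ((0x1780, 0x17FF), 0.18),  # Khmer
)
DEFAULT_GAP_IN = 0.12  # 14pt Latin body copy


def gap_inches_for(text: str) -> float:
    """Return the largest recommended gap among the scripts present in text."""
    gap = DEFAULT_GAP_IN
    for ch in text:
        cp = ord(ch)
        for (lo, hi), candidate in GAP_BY_RANGE:
            if lo <= cp <= hi:
                gap = max(gap, candidate)
                break  # ranges do not overlap, so the first hit is the only hit
    return gap
```

The result would be wrapped as `Inches(gap_inches_for(text))` before being passed as the `gap` argument. Returning a plain float keeps the sketch free of any python-pptx dependency, and the values still need the condensed-face and display-size adjustments described above.
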
## Audit checklist

After re-export, confirm all five layers:

- [ ] Layer 1: Each CSS class in the HTML maps to the intended family in the export script's font table.
- [ ] Layer 2: All declared families exist on the rendering machine (`fc-list | grep`).
- [ ] Layer 3: No variable-font filename pretending to be a static family. `~/Library/Fonts/` shows multi-file static families for every face used.
- [ ] Layer 4: `unzip + grep typeface` returns only the design-system fonts. No `Microsoft JhengHei` / `Calibri` / `Arial` / `Georgia` / `Consolas` residue.
- [ ] Layer 5: No run from a no-italic script (CJK / Arabic / Hebrew / Devanagari / Thai) has `italic=True` set with a Latin italic face in the `<a:latin>` slot.
- [ ] **Beyond CJK:** RTL slides set `rtl="1"` on the paragraph's `pPr`. Verify with:

  ```bash
  unzip -o deck.pptx -d /tmp/audit
  grep -h '