open-design/skills/pptx-html-fidelity-audit/references/font-discipline.md
Zakaria a46764fb1b
first-commit
2026-05-04 14:58:14 -04:00


Font Discipline for PPTX Exports

Companion to layout-discipline.md. The rail / cursor primitives in that file catch geometric drift; this file catches the typography drift that geometry can't see — variable-font traps, missing CJK slots, fake italic on Han characters. These are the bugs that pass verify_layout.py and still look wrong.

Read this when:

  • The audit table has 🟡 entries about italic / em / font fallback.
  • PowerPoint silently swaps to Calibri / Arial / Microsoft JhengHei / Georgia after you specified a different family.
  • unzip pptx | grep typeface shows a face that isn't in your design system.

Layer 1 — Font mapping in the export script

Walk each CSS class used by the source HTML and confirm the export script maps it to the same font family.

⚠️ Trap: the visual category your eye reads is not always the class's semantic category. Editorial decks routinely bind .lead, .callout, or .q-big to a serif face, not the sans-serif you'd guess from "lead". Open the HTML's CSS, read the font-family declaration for each class, and copy the literal family name into the export's font table.

Don't rely on visual intuition; rely on grep.
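The "rely on grep" step can also be scripted. The following is a minimal sketch, assuming the deck's CSS uses simple `.class { ... }` rules and that the export script keeps a plain class-to-family dict; the names `css_font_families`, `font_table_mismatches`, and `font_table` are hypothetical and not part of the export script.

```python
import re

# Matches simple `.class { ... }` rules only; grouped selectors and nested
# at-rules are not handled by this sketch.
RULE_RE = re.compile(r"\.([\w-]+)\s*\{([^}]*)\}")
FAMILY_RE = re.compile(r"font-family\s*:\s*([^;]+)")


def css_font_families(css):
    """Map each class to the first family named in its font-family declaration."""
    out = {}
    for cls, body in RULE_RE.findall(css):
        m = FAMILY_RE.search(body)
        if m:
            out[cls] = m.group(1).split(",")[0].strip().strip("'\"")
    return out


def font_table_mismatches(css, font_table):
    """Return {class: (css_family, export_family)} for every disagreement."""
    return {
        cls: (family, font_table.get(cls))
        for cls, family in css_font_families(css).items()
        if font_table.get(cls) != family
    }
```

An empty result means every class that declares a font-family maps to the same literal name in the export table; classes that inherit their family from a parent rule still need a manual read.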

Coverage gap for Latin-slot scripts (Cyrillic / Greek / Vietnamese). Russian, Ukrainian, and Greek runs use the Latin slot: they go through <a:latin>, not <a:ea>. Many display fonts (Playfair Display, Source Serif 4) ship with weak or missing Cyrillic / Greek glyphs, and most drop Vietnamese Extended diacritics (ếẫỡỗ). PowerPoint silently falls back to Calibri / Times New Roman for each missing glyph, producing mid-paragraph face shifts that look like a styling bug.

When mapping a CSS class to a Latin font, check the font actually covers your scripts:

# macOS / Linux (requires fontconfig): print the Unicode ranges the font covers.
# fc-match returns the closest match, so confirm the family is installed first (Layer 2).
fc-query -f '%{charset}\n' "$(fc-match -f '%{file}\n' 'Playfair Display')" | head
# Windows (PowerShell): confirm the family is registered. This lists names only, not coverage.
Add-Type -AssemblyName System.Drawing
(New-Object System.Drawing.Text.InstalledFontCollection).Families |
  Where-Object { $_.Name -like '*Playfair*' }

Cross-platform fallback for coverage detail: open the .ttf / .otf in fontforge → Element → Font Info → OS/2 → Unicode Ranges.

If coverage is missing, either swap to a face that has it (e.g. Inter / IBM Plex Sans for Cyrillic; Be Vietnam Pro for Vietnamese) or set a different <a:latin> per language run.
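The coverage check can be run against the deck's actual text rather than by reading range tables. This is a sketch under stated assumptions: `missing_codepoints` and `font_codepoints` are hypothetical helper names, and `font_codepoints` relies on the third-party fontTools package (`pip install fonttools`), which the skill itself does not mention.

```python
def missing_codepoints(text, covered):
    """Return the distinct non-space characters in `text` that are not in `covered`."""
    missing = []
    for ch in text:
        if ch.isspace() or ord(ch) in covered or ch in missing:
            continue
        missing.append(ch)
    return missing


def font_codepoints(font_path):
    """Read the set of code points a .ttf / .otf file maps to glyphs."""
    from fontTools.ttLib import TTFont  # third-party, imported lazily

    font = TTFont(font_path)
    try:
        return set(font.getBestCmap() or {})
    finally:
        font.close()
```

Typical use would be `missing_codepoints(slide_text, font_codepoints(path_to_font))`; any character it returns is a candidate for a per-glyph fallback. Text should be NFC-normalized first so precomposed Vietnamese letters are compared as single code points.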

Layer 2 — Font presence on the rendering machine

PowerPoint uses the OS font cache. If the family name in your XML isn't installed, PowerPoint silently falls back. Check:

fc-list | grep -i "noto serif"            # Linux / WSL
mdfind "kMDItemFSName == '*NotoSerif*'"   # macOS
# Windows (PowerShell)
Get-ChildItem -Path "$env:WINDIR\Fonts","$env:LOCALAPPDATA\Microsoft\Windows\Fonts" `
  -Filter "*NotoSerif*" -ErrorAction SilentlyContinue

Install missing families:

brew install --cask \
  font-noto-serif-tc \
  font-playfair-display \
  font-source-serif-4 \
  font-ibm-plex-mono

The verify_layout.py script can't see this — it only checks geometry. A standalone font audit step is required.
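One possible shape for that standalone step is sketched below. The function names are hypothetical; `installed_families_fontconfig` assumes `fc-list` is on the PATH (Linux / WSL, or macOS with fontconfig installed) and should be swapped for the platform commands above elsewhere.

```python
import shutil
import subprocess


def missing_families(required, installed):
    """Case-insensitive check of required family names against installed ones."""
    have = {name.strip().lower() for name in installed}
    return [fam for fam in required if fam.lower() not in have]


def installed_families_fontconfig():
    """List family names reported by `fc-list : family`."""
    if shutil.which("fc-list") is None:
        raise RuntimeError("fc-list not found; use the platform commands instead")
    out = subprocess.run(
        ["fc-list", ":", "family"], capture_output=True, text=True, check=True
    ).stdout
    # One line can carry several localized names separated by commas.
    return {n.strip() for line in out.splitlines() for n in line.split(",") if n.strip()}
```

Running `missing_families(design_system_fonts, installed_families_fontconfig())` before the export gives a list to install; an empty list means every declared family resolves by name on this machine.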

Layer 3 — Variable fonts vs. static families ← most common trap

Modern fonts often ship as a single variable file containing all weights (NotoSerifTC[wght].ttf). This looks elegant, but PowerPoint on both macOS and Windows supports it unreliably:

  • macOS reports the variable font's family name as its default static instance — usually ExtraLight or Regular.
  • PowerPoint asks the OS for "Noto Serif TC, weight 700"; the OS reports the family as Noto Serif TC ExtraLight; PowerPoint can't match → falls back to a system serif.

Diagnose:

ls -la ~/Library/Fonts/ | grep -i NotoSerif

| What you see | Verdict |
| --- | --- |
| One `*[wght].ttf` file | Variable. PowerPoint may not match. |
| Multiple `*-Regular.otf`, `*-Bold.otf` | Static family. Safe. |

Fix by using the static family equivalent:

| Don't use (variable) | Use instead (static) |
| --- | --- |
| Noto Serif TC (variable) | Noto Serif CJK TC |
| Source Serif 4 (variable) | Source Serif Pro / Source Serif 4 static instances |
| Inter (variable) | Per-weight Inter Regular / Inter Bold |

After fixing the export, re-run extract_pptx.py and confirm the font field matches the static name.
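The diagnosis step (one `[wght]` file versus several per-weight files) can be approximated from filenames. This is a heuristic sketch with hypothetical names; it assumes the Google Fonts naming conventions (`Family[wght].ttf` and `Family-VariableFont_wght.ttf`). A more reliable test, if fontTools is available, is whether the file contains an `fvar` table.

```python
import re

# Axis suffix such as [wght] or [opsz,wght] in the filename.
AXIS_SUFFIX_RE = re.compile(r"\[[A-Za-z0-9,]+\]")


def looks_variable(filename):
    """Guess from the filename alone whether a font file is a variable font."""
    return bool(AXIS_SUFFIX_RE.search(filename)) or "variablefont" in filename.lower()


def classify_fonts(filenames, needle):
    """Split the files whose name contains `needle` into variable and static lists."""
    hits = [f for f in filenames if needle.lower() in f.lower()]
    return {
        "variable": sorted(f for f in hits if looks_variable(f)),
        "static": sorted(f for f in hits if not looks_variable(f)),
    }
```

For example, `classify_fonts(os.listdir(os.path.expanduser("~/Library/Fonts")), "NotoSerif")` mirrors the `ls | grep` command above; a non-empty "variable" list with an empty "static" list is the trap case.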

Layer 4 — PPTX XML's three-language slots

PowerPoint chooses a typeface per run by language script. Each run can declare three:

| Element | Used for |
| --- | --- |
| `<a:latin typeface=…>` | Latin script (a-z, A-Z, digits); Cyrillic, Greek, and Vietnamese also use this slot (see Layer 1) |
| `<a:ea typeface=…>` | East Asian (CJK): Chinese / Japanese / Korean |
| `<a:cs typeface=…>` | Complex script (Arabic, Hebrew, Thai) |

Audit a file:

unzip -o /path/to/deck.pptx -d /tmp/audit
grep -h -oE 'typeface="[^"]+"' /tmp/audit/ppt/slides/slide*.xml | sort -u

Expected output: only the design-system fonts. If you see Microsoft JhengHei, Calibri, Arial, Georgia, Consolas, something has fallen back.
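The same audit can run from Python without unpacking the deck, which is convenient inside a test suite. A minimal sketch, assuming the design-system fonts are supplied as a set; `unexpected_typefaces` is a hypothetical name, not an existing script in this skill.

```python
import re
import zipfile

TYPEFACE_RE = re.compile(r'typeface="([^"]+)"')
SLIDE_RE = re.compile(r"ppt/slides/slide\d+\.xml$")


def unexpected_typefaces(pptx_file, allowed):
    """Return {typeface: [slide paths]} for faces outside the allowed set.

    Theme references such as +mn-lt are reported too, because they resolve
    to whatever the theme declares rather than to a design-system family.
    """
    found = {}
    with zipfile.ZipFile(pptx_file) as zf:
        for name in sorted(zf.namelist()):
            if not SLIDE_RE.match(name):
                continue
            xml = zf.read(name).decode("utf-8")
            for face in sorted(set(TYPEFACE_RE.findall(xml))):
                if face not in allowed:
                    found.setdefault(face, []).append(name)
    return found
```

An empty dict corresponds to the "only the design-system fonts" outcome; each key in a non-empty result names a fallback face and the slides where it appears.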

Common defect: export script writes <a:latin> only. Chinese runs have no <a:ea> directive → PowerPoint picks the OS default (Microsoft JhengHei on Windows, Hiragino Sans on Mac). Result: Chinese characters in the wrong serif/sans family.

Fix: when adding a run with mixed-language content, set every slot that applies. The slots are child elements of the run's rPr, not attributes.

from pptx.oxml.ns import qn

def set_run_fonts(run, latin: str | None = None, ea: str | None = None, cs: str | None = None):
    """Set the a:latin / a:ea / a:cs typeface on a run; slots passed as None are left untouched."""
    rPr = run._r.get_or_add_rPr()
    # Same order as the schema (latin, ea, cs). Missing elements are appended,
    # so on a fresh rPr they land in that order.
    for tag, face in (('a:latin', latin), ('a:ea', ea), ('a:cs', cs)):
        if not face:
            continue
        el = rPr.find(qn(tag))
        if el is None:
            el = rPr.makeelement(qn(tag), {})
            rPr.append(el)
        el.set('typeface', face)

PptxGenJS sets all three by default; raw XML injection, or python-pptx without an explicit ea slot, does not.

Layer 5 — Italic + script interaction

🚨 italic=True is a Latin-script feature. Apply it only to runs whose characters belong to scripts where italic is part of the writing tradition (Latin, Cyrillic, Greek). For everything else — CJK, Arabic, Hebrew, Devanagari, Thai, Khmer — PowerPoint synthesizes a slanted bitmap that looks mechanically deformed. The chain of failures, using CJK as the canonical example:

  1. <a:latin> slot has Playfair Display Italic (a Latin-only font).
  2. The CJK characters in the run have no glyph in Playfair → PowerPoint substitutes a system CJK font.
  3. The substituted CJK font is forced into italic=True → since no real CJK italic exists, PowerPoint synthesizes a slanted bitmap → characters look mechanically deformed.

The same pattern triggers for Arabic, Hebrew, Devanagari, and Thai — none of these scripts has an italic tradition, and faking it produces a slant that's visually broken.

Rule: italic only applies to runs whose primary script supports it (Latin / Cyrillic / Greek). Indicate emphasis on other scripts via:

  • color tone (COLOR_INK_60 for muted, full ink for emphasis)
  • weight contrast (Regular 400 vs. Bold 700)
  • a script-native italic variant only if one actually ships — most don't

Practical implementation:

from pptx.util import Pt  # used by add_run_with_italic_safety below

# Unicode ranges where italic should be suppressed.
# Principle: include scripts whose writing tradition has no italic style.
# Synthesized italic on these scripts produces a slanted bitmap that looks
# mechanically deformed.
NO_ITALIC_RANGES = (
    (0x3400, 0x9FFF),    # CJK Unified Ideographs
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0x3040, 0x30FF),    # Hiragana + Katakana
    (0xAC00, 0xD7AF),    # Hangul Syllables
    (0x0590, 0x05FF),    # Hebrew
    (0x0600, 0x06FF),    # Arabic
    (0x0750, 0x077F),    # Arabic Supplement
    # Indic scripts — none have an italic tradition; PowerPoint synthesizes
    # a fake slant on all of them. Add new ranges here when the deck mixes
    # in additional scripts (e.g. Sinhala U+0D80–U+0DFF).
    (0x0900, 0x097F),    # Devanagari (Hindi, Marathi, Sanskrit)
    (0x0980, 0x09FF),    # Bengali
    (0x0A00, 0x0A7F),    # Gurmukhi (Punjabi)
    (0x0A80, 0x0AFF),    # Gujarati
    (0x0B00, 0x0B7F),    # Oriya
    (0x0B80, 0x0BFF),    # Tamil
    (0x0C00, 0x0C7F),    # Telugu
    (0x0C80, 0x0CFF),    # Kannada
    (0x0D00, 0x0D7F),    # Malayalam
    # Southeast Asian
    (0x0E00, 0x0E7F),    # Thai
    (0x0E80, 0x0EFF),    # Lao
    (0x1780, 0x17FF),    # Khmer
)


def has_no_italic_script(text: str) -> bool:
    return any(
        any(lo <= ord(c) <= hi for lo, hi in NO_ITALIC_RANGES)
        for c in text
    )


def add_run_with_italic_safety(p, text, *, latin_face: str, ea_face: str,
                               cs_face: str | None, size_pt: int,
                               italic: bool, **kwargs):
    """Drop italic if the run contains characters from scripts without italic tradition.

    Args:
        latin_face: Font for Latin / Cyrillic / Greek runs (a:latin slot).
        ea_face: Font for CJK runs (a:ea slot).
        cs_face: Font for complex scripts — Arabic, Hebrew, Devanagari,
            Thai, etc. (a:cs slot). Pass None when the run contains no
            complex-script characters; set_run_fonts skips the slot.
    """
    r = p.add_run()
    r.text = text
    r.font.size = Pt(size_pt)
    r.font.italic = italic and not has_no_italic_script(text)
    set_run_fonts(r, latin=latin_face, ea=ea_face, cs=cs_face)
    return r

For mixed-script runs (e.g. "In <em>2026</em> 開始"), split into multiple runs at language boundaries so the italic attribute can apply to the Latin run only.
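Splitting at script boundaries can be sketched with `itertools.groupby`. The helper names are hypothetical, and the range tuple is an abbreviated CJK-only copy of NO_ITALIC_RANGES so the example runs on its own; in the export script the full table would be used instead.

```python
from itertools import groupby

# Abbreviated copy of NO_ITALIC_RANGES (kana, CJK ideographs, Hangul,
# CJK compatibility) so this sketch is self-contained.
_NO_ITALIC = ((0x3040, 0x30FF), (0x3400, 0x9FFF), (0xAC00, 0xD7AF), (0xF900, 0xFAFF))


def _no_italic(ch):
    return any(lo <= ord(ch) <= hi for lo, hi in _NO_ITALIC)


def split_for_italic(text):
    """Split text into (segment, italic_allowed) pairs at script boundaries."""
    return [
        ("".join(chars), not blocked)
        for blocked, chars in groupby(text, key=_no_italic)
    ]
```

Each pair can then be passed to add_run_with_italic_safety with `italic=wants_italic and italic_allowed`. Spaces, digits, and punctuation count as italic-capable here, so they stay attached to the neighbouring Latin segment; a deck with heavy CJK punctuation may want to extend the ranges.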

Beyond CJK — other scripts

The five layers above use CJK examples because that's the most common pairing in Open Design today, but the same machinery applies to other scripts. Quick reference:

| Script family | XML slot | Italic OK? | Most common defect | Recommended faces |
| --- | --- | --- | --- | --- |
| Latin (en, de, es, vi…) | a:latin | Yes | Vietnamese Extended diacritics dropped → fallback Calibri mid-paragraph | Be Vietnam Pro, IBM Plex Sans, Source Sans 3 |
| Cyrillic (ru, uk, bg) | a:latin | Yes | Display fonts (Playfair, Source Serif) lack Cyrillic → fallback Calibri | Inter, IBM Plex Sans, Roboto |
| Greek (el) | a:latin | Yes | Same as Cyrillic: display faces missing Greek → fallback | Inter, IBM Plex Sans |
| CJK (zh, ja, ko) | a:ea | No | Variable-font trap (Layer 3); missing a:ea slot → fallback Microsoft JhengHei | Noto Sans CJK *, Source Han Sans, IBM Plex Sans JP |
| Arabic / Hebrew / Persian | a:cs | No | `<a:rtl val="1"/>` not set → text direction breaks; kashida changes width | Noto Naskh Arabic, IBM Plex Sans Arabic, Amiri |
| Devanagari / Bengali | a:cs | No | PowerPoint defaults to Mangal/Vrinda (low fidelity); cluster shaping bumps line height | Noto Sans Devanagari, Mukta, Hind |
| Thai / Lao / Khmer | a:cs | No | No inter-word spaces → PowerPoint's break engine produces poor wraps; tone marks bump line height | Noto Sans Thai, Sarabun, Noto Sans Khmer |

For RTL scripts (Arabic / Hebrew / Persian), set both <a:cs typeface=…> and <a:rtl val="1"/> on the run's rPr. Right-alignment, bidi text flow, and chrome / footer mirroring are out of scope for verify_layout.py today and need manual review — see the Tier 2 follow-up note in the audit checklist.
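Setting the RTL flag on a run can be sketched in the same style as set_run_fonts. The helper name `ensure_run_rtl` is hypothetical. The sketch assumes, based on the DrawingML schema order as I understand it, that `a:rtl` belongs after the font slots and before `a:extLst`; it uses only element methods that both lxml (python-pptx) and the standard library provide.

```python
import xml.etree.ElementTree as ET

A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"


def ensure_run_rtl(rPr):
    """Add <a:rtl val="1"/> to a run's rPr element, or update an existing one."""
    tag = f"{{{A_NS}}}rtl"
    el = rPr.find(tag)
    if el is None:
        el = rPr.makeelement(tag, {})
        ext = rPr.find(f"{{{A_NS}}}extLst")
        if ext is None:
            rPr.append(el)
        else:
            # Keep a:extLst last by inserting a:rtl just before it.
            rPr.insert(list(rPr).index(ext), el)
    el.set("val", "1")
    return el
```

With python-pptx this would be called as `ensure_run_rtl(run._r.get_or_add_rPr())` after set_run_fonts, so the `a:cs` slot and the direction flag are written together.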

RTL discipline scope. Full RTL support is roughly 15–20% of the font + layout discipline surface area: Unicode TR9 bidi resolution, chrome / footer / page-number mirroring, kashida (Arabic elongation) interaction with line-fill, and right-anchored alignment. This skill covers the typeface + slot mechanics only; bidi and mirroring are flagged for a Tier 2 rtl-discipline.md follow-up when fa / ar / he usage volume justifies the investment.

Line height per script

The Cursor.take(gap=Inches(0.12)) default suits 14pt Latin body copy. Other scripts need more vertical headroom because of stacked diacritics, matras, or tone marks:

| Script | Recommended gap at 14pt body |
| --- | --- |
| Latin (no Vietnamese Extended) | Inches(0.12) (default) |
| Latin (with Vietnamese Extended ếẫỗ) | Inches(0.14) |
| CJK | Inches(0.14–0.16) |
| Devanagari / Bengali (matras / conjuncts) | Inches(0.16–0.18) |
| Thai / Lao / Khmer (tone marks above) | Inches(0.16–0.18) |
| Arabic / Hebrew | Inches(0.13) |

When the deck mixes scripts, take the max. Line breathing-room is judged visually: an under-spaced Thai run in an otherwise Latin deck reads as "the Thai slide is broken".

Source for these numbers. Measured against Noto Sans / Noto Serif / IBM Plex line-height at 14pt body with full diacritic stacks (e.g. Devanagari conjuncts ष्ट्र, Thai 4-mark sequences ก़ํ้, stacked Vietnamese ỗ). Adjust downward for condensed faces (Inter Condensed, Noto Sans Condensed) and upward for display sizes ≥ 24pt where diacritic ratios grow.
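The "take the max" rule can be written down as a small lookup. This is a sketch with hypothetical names (`GAP_INCHES`, `deck_gap_inches`); where the table gives a range, the upper bound is used, which is an assumption made here rather than a rule from the table.

```python
# Gaps in inches at 14pt body, copied from the table above. For ranges the
# upper bound is taken (an assumption of this sketch).
GAP_INCHES = {
    "latin": 0.12,
    "vietnamese": 0.14,
    "cjk": 0.16,
    "devanagari": 0.18,
    "bengali": 0.18,
    "thai": 0.18,
    "lao": 0.18,
    "khmer": 0.18,
    "arabic": 0.13,
    "hebrew": 0.13,
}


def deck_gap_inches(scripts):
    """Return the largest recommended gap among the scripts used in the deck."""
    scripts = list(scripts)
    unknown = [s for s in scripts if s not in GAP_INCHES]
    if unknown:
        raise KeyError(f"no gap recorded for: {unknown}")
    # An empty deck falls back to the Latin default.
    return max((GAP_INCHES[s] for s in scripts), default=GAP_INCHES["latin"])
```

The result would feed the cursor as `Inches(deck_gap_inches(["latin", "thai"]))`; raising on unknown scripts is deliberate, so a new script forces a measured entry instead of silently inheriting the Latin default.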

Audit checklist

After re-export, confirm all five layers:

  • Layer 1: Each CSS class in the HTML maps to the intended family in the export script's font table.

  • Layer 2: All declared families exist on the rendering machine (fc-list | grep).

  • Layer 3: No variable-font filename pretending to be a static family. ~/Library/Fonts/ shows multi-file static families for every face used.

  • Layer 4: unzip + grep typeface returns only the design-system fonts. No Microsoft JhengHei / Calibri / Arial / Georgia / Consolas residue.

  • Layer 5: No run from a no-italic script (CJK / Arabic / Hebrew / Devanagari / Thai) has italic=True set with a Latin italic face in the <a:latin> slot.

  • Beyond CJK: RTL slides set <a:rtl val="1"/> on each run's rPr, as described above (paragraph-level direction is the separate rtl="1" attribute on <a:pPr>). Verify with:

    ```bash
    unzip -o deck.pptx -d /tmp/audit
    grep -h '<a:rtl' /tmp/audit/ppt/slides/*.xml | sort -u
    # Expect a hit for every fa / ar / he slide; empty output on
    # an RTL deck means the directionality wasn't propagated.
    ```
    
    Also confirm that Cursor `gap` is bumped per the line-height table above when the
    deck includes Vietnamese, CJK, Devanagari, Thai, or Khmer content.
    

If all five pass and the user still reports "the type looks wrong", ask for a screenshot pointing at the specific glyph or word — the remaining bugs are usually license-restricted fonts not embedded into the file (see SKILL.md Step 5 verification).
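The Layer 5 item in the checklist can also be checked mechanically on the exported slide XML. A minimal sketch, assuming italic is written directly on the run as `i="1"` (italic inherited from a layout or master is not detected); `italic_violations` is a hypothetical name and the ranges are an abbreviated copy of NO_ITALIC_RANGES.

```python
import xml.etree.ElementTree as ET

A = "{http://schemas.openxmlformats.org/drawingml/2006/main}"

# Abbreviated copy of NO_ITALIC_RANGES: Hebrew, Arabic, Devanagari, Thai,
# kana, CJK ideographs, Hangul. Extend to match the full table.
RANGES = (
    (0x0590, 0x05FF), (0x0600, 0x06FF), (0x0900, 0x097F), (0x0E00, 0x0E7F),
    (0x3040, 0x30FF), (0x3400, 0x9FFF), (0xAC00, 0xD7AF),
)


def italic_violations(slide_xml):
    """Return the text of every italic run containing a no-italic character."""
    bad = []
    for run in ET.fromstring(slide_xml).iter(f"{A}r"):
        rPr = run.find(f"{A}rPr")
        text = run.findtext(f"{A}t") or ""
        italic = rPr is not None and rPr.get("i") in ("1", "true")
        if italic and any(lo <= ord(c) <= hi for c in text for lo, hi in RANGES):
            bad.append(text)
    return bad
```

Feeding it each `ppt/slides/slide*.xml` from the unzipped deck should return empty lists on a clean export; any returned string is a run to split or de-italicize.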