# Font Discipline for PPTX Exports

Companion to `layout-discipline.md`. The rail / cursor primitives in that
file catch geometric drift; this file catches the typography drift that
geometry can't see: variable-font traps, missing CJK slots, fake italic
on Han characters. These are the bugs that pass `verify_layout.py` and
still look wrong.

Read this when:

- The audit table has 🟡 entries about italic / em / font fallback.
- PowerPoint silently swaps to Calibri / Arial / Microsoft JhengHei /
  Georgia after you specified a different family.
- `unzip pptx | grep typeface` shows a face that isn't in your design system.

## Layer 1 — Font mapping in the export script

Walk each CSS class used by the source HTML and confirm the export
script maps it to the **same** font family.

⚠️ **Trap:** the visual category your eye reads is not always the
class's semantic category. Editorial decks routinely bind `.lead`,
`.callout`, or `.q-big` to a serif face, not the sans-serif you'd guess
from "lead". Open the HTML's CSS, read the `font-family` declaration
for each class, and copy the literal family name into the export's
font table.

Don't rely on visual intuition; rely on grep.

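The class-by-class comparison can be scripted. The sketch below is one possible approach, not the export script's actual code: `css_font_table` and `mapping_mismatches` are hypothetical helper names, the regex handles only flat `selector { ... }` rules (it is not a full CSS parser), and the export font table is assumed to be a plain dict keyed by class name.

```python
import re
from typing import Dict, Optional, Tuple

# Flat `selectors { declarations }` rules only; a sketch, not a full CSS parser.
RULE_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")
FAMILY_RE = re.compile(r"font-family\s*:\s*([^;]+)")
CLASS_RE = re.compile(r"\.([A-Za-z_][\w-]*)")


def css_font_table(css: str) -> Dict[str, str]:
    """Map each class name to the first family in its font-family stack."""
    table: Dict[str, str] = {}
    for selectors, body in RULE_RE.findall(css):
        match = FAMILY_RE.search(body)
        if match is None:
            continue  # rule declares no font-family
        first_family = match.group(1).split(",")[0].strip().strip("\"'")
        for class_name in CLASS_RE.findall(selectors):
            table[class_name] = first_family
    return table


def mapping_mismatches(
    css: str, export_fonts: Dict[str, str]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {class: (css_family, export_family)} wherever the two disagree."""
    return {
        name: (family, export_fonts.get(name))
        for name, family in css_font_table(css).items()
        if export_fonts.get(name) != family
    }
```

An empty result from `mapping_mismatches` means every class that declares a `font-family` in the CSS has the same literal family in the export table; a `None` on the export side means the class is missing from the table altogether.
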
> **Coverage gap for Latin-slot scripts (Cyrillic / Greek / Vietnamese).**
> Russian / Ukrainian / Greek runs go through `<a:latin>`, not `<a:ea>`:
> they use the Latin slot. Many display fonts (Playfair Display, Source
> Serif 4) ship with weak or missing Cyrillic / Greek glyphs, and most
> drop Vietnamese Extended diacritics (ếẫỡỗ). PowerPoint silently falls
> back to Calibri / Times New Roman per missing glyph, producing
> mid-paragraph face shifts that look like a styling bug.
>
> When mapping a CSS class to a Latin font, check the font actually
> covers your scripts:
>
> ```bash
> # macOS / Linux (fontconfig): print the Unicode code point ranges the font covers.
> # fc-match returns the closest match even when the family is not installed,
> # so confirm the resolved file really is the face you asked for.
> fc-query -f '%{charset}\n' "$(fc-match -f '%{file}\n' 'Playfair Display')" | head
> ```
>
> ```powershell
> # Windows: list the installed families that match, via System.Drawing.
> Add-Type -AssemblyName System.Drawing
> (New-Object System.Drawing.Text.InstalledFontCollection).Families |
>     Where-Object { $_.Name -like 'Playfair*' }
> # This confirms presence only; for coverage detail (Unicode ranges) use fontforge.
> ```
>
> Cross-platform fallback: open the font in fontforge (File → Open → pick the .ttf / .otf) → Element → Font Info → OS/2 → Unicode Ranges.
>
> If coverage is missing, either swap to a face that has it (e.g.
> Inter / IBM Plex Sans for Cyrillic; Be Vietnam Pro for Vietnamese) or
> set a different `<a:latin>` per language run.

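The same coverage question can be asked per string instead of per font. A minimal sketch, assuming the font's covered code points are already available as a set of integers; one way to obtain that set (an assumption, not part of this skill) is the keys of fontTools' `TTFont(path).getBestCmap()`. `missing_codepoints` is a hypothetical helper name.

```python
from typing import List, Set


def missing_codepoints(text: str, covered: Set[int]) -> List[str]:
    """Return the distinct non-whitespace characters in `text` with no glyph in `covered`."""
    seen: Set[str] = set()
    missing: List[str] = []
    for ch in text:
        if ch.isspace() or ch in seen:
            continue
        seen.add(ch)
        if ord(ch) not in covered:
            missing.append(ch)
    return missing


# Stand-in coverage for illustration: Basic Latin + Latin-1 only,
# i.e. no Cyrillic, no Greek, no Vietnamese Extended.
latin_only = set(range(0x20, 0x100))
print(missing_codepoints("Привет", latin_only))
```

Any non-empty result for a run's text means PowerPoint will substitute another face for those characters, which is the mid-paragraph face shift described above.
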
## Layer 2 — Font presence on the rendering machine

PowerPoint uses the OS font cache. If the family name in your XML isn't
installed, PowerPoint silently falls back. Check:

```bash
fc-list | grep -i "noto serif"              # Linux / WSL
mdfind "kMDItemFSName == '*NotoSerif*'"     # macOS
```

```powershell
# Windows (PowerShell)
Get-ChildItem -Path "$env:WINDIR\Fonts","$env:LOCALAPPDATA\Microsoft\Windows\Fonts" `
    -Filter "*NotoSerif*" -ErrorAction SilentlyContinue
```

Install missing families (macOS, Homebrew):

```bash
brew install --cask \
  font-noto-serif-tc \
  font-playfair-display \
  font-source-serif-4 \
  font-ibm-plex-mono
```

The `verify_layout.py` script can't see this because it only checks
geometry. A standalone font audit step is required.

## Layer 3 — Variable fonts vs. static families ← most common trap

Modern fonts often ship as a **single variable file** containing all
weights (`NotoSerifTC[wght].ttf`). It looks elegant, but PowerPoint on
macOS and Windows has spotty support:

- macOS reports the variable font's family name as its **default static
  instance**, usually ExtraLight or Regular.
- PowerPoint asks the OS for "Noto Serif TC, weight 700"; the OS
  reports the family as `Noto Serif TC ExtraLight`; PowerPoint can't
  match → falls back to a system serif.

Diagnose:

```bash
ls -la ~/Library/Fonts/ | grep -i NotoSerif
```

| What you see                           | Verdict                             |
| -------------------------------------- | ----------------------------------- |
| One `*[wght].ttf` file                 | Variable. PowerPoint may not match. |
| Multiple `*-Regular.otf`, `*-Bold.otf` | Static family. Safe.                |

Fix by using the static family equivalent:

| Don't use (variable)        | Use instead (static)                                   |
| --------------------------- | ------------------------------------------------------ |
| `Noto Serif TC` (variable)  | `Noto Serif CJK TC`                                    |
| `Source Serif 4` (variable) | `Source Serif Pro` / `Source Serif 4` static instances |
| `Inter` (variable)          | Per-weight `Inter Regular` / `Inter Bold`              |

After fixing the export, re-run `extract_pptx.py` and confirm the
`font` field matches the static name.

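The filename check in the diagnosis table can be automated as part of a standalone font audit. The sketch below is a naming heuristic only: it assumes variable fonts carry bracketed axis tags (`[wght]`) or a `VariableFont` marker in the filename, which is a common convention but not guaranteed. `classify_font_files` is a hypothetical helper, not something shipped with this skill.

```python
import re
from pathlib import Path
from typing import Dict, List

# Naming heuristic (assumption): axis tags in brackets, e.g. NotoSerifTC[wght].ttf,
# or a "VariableFont" marker, e.g. Inter-VariableFont_opsz,wght.ttf.
VARIABLE_NAME_RE = re.compile(r"\[[A-Za-z0-9,]+\]|VariableFont", re.IGNORECASE)
FONT_SUFFIXES = {".ttf", ".otf", ".ttc"}


def classify_font_files(font_dir, name_fragment: str) -> Dict[str, List[str]]:
    """Group font files whose name contains `name_fragment` into variable vs. static."""
    result: Dict[str, List[str]] = {"variable": [], "static": []}
    for path in sorted(Path(font_dir).iterdir()):
        if path.suffix.lower() not in FONT_SUFFIXES:
            continue  # not a font file
        if name_fragment.lower() not in path.name.lower():
            continue  # different family
        kind = "variable" if VARIABLE_NAME_RE.search(path.name) else "static"
        result[kind].append(path.name)
    return result
```

Pointed at `~/Library/Fonts` with a fragment such as `"NotoSerif"`, a non-empty `"variable"` list with an empty `"static"` list is the case the table marks as risky. A file that is variable but does not follow either naming convention will be misreported as static, so treat the result as a first pass.
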
## Layer 4 — PPTX XML's three-language slots

PowerPoint chooses a typeface per run by language script. Each run can
declare three:

| Element                | Used for                                                       |
| ---------------------- | -------------------------------------------------------------- |
| `<a:latin typeface=…>` | Latin-slot scripts (a-z, A-Z, digits; also Cyrillic and Greek) |
| `<a:ea typeface=…>`    | East Asian (CJK): **Chinese / Japanese / Korean go here**      |
| `<a:cs typeface=…>`    | Complex script (Arabic, Hebrew, Thai)                          |

Audit a file:

```bash
unzip -o /path/to/deck.pptx -d /tmp/audit
grep -h -oE 'typeface="[^"]+"' /tmp/audit/ppt/slides/slide*.xml | sort -u
```

Expected output: only the design-system fonts. If you see
`Microsoft JhengHei`, `Calibri`, `Arial`, `Georgia`, or `Consolas`,
something has fallen back.

**Common defect:** export script writes `<a:latin>` only. Chinese runs
have no `<a:ea>` directive → PowerPoint picks the OS default
(Microsoft JhengHei on Windows, Hiragino Sans on Mac). Result: Chinese
characters in the wrong serif/sans family.

Fix: when adding a run with mixed-language content, set all three
slots that apply.

```python
from pptx.oxml.ns import qn


def set_run_fonts(run, latin: str | None = None, ea: str | None = None, cs: str | None = None):
    """Set the typeface of each language slot on a run; a None slot is left untouched."""
    rPr = run._r.get_or_add_rPr()
    # New elements are appended in latin → ea → cs order, the order the slots
    # are expected to appear in within rPr.
    if latin:
        el = rPr.find(qn('a:latin'))
        if el is None:
            el = rPr.makeelement(qn('a:latin'), {})
            rPr.append(el)
        el.set('typeface', latin)
    if ea:
        el = rPr.find(qn('a:ea'))
        if el is None:
            el = rPr.makeelement(qn('a:ea'), {})
            rPr.append(el)
        el.set('typeface', ea)
    if cs:
        el = rPr.find(qn('a:cs'))
        if el is None:
            el = rPr.makeelement(qn('a:cs'), {})
            rPr.append(el)
        el.set('typeface', cs)
```

PptxGenJS sets all three by default; raw XML injection or python-pptx
without an explicit `ea` slot does not.

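The `unzip` + `grep` audit above can also run in Python without unpacking to disk, which makes it easier to wire into an export script. This is a sketch under stated assumptions: `stray_typefaces` is a hypothetical helper, the allowed set is whatever your design system declares, and only `ppt/slides/slideN.xml` parts are scanned (layouts, masters, and the theme are ignored).

```python
import re
import zipfile
from typing import Dict, List, Set

TYPEFACE_RE = re.compile(r'typeface="([^"]+)"')
SLIDE_RE = re.compile(r"ppt/slides/slide\d+\.xml")


def stray_typefaces(pptx_path, allowed: Set[str]) -> Dict[str, List[str]]:
    """Map each typeface outside `allowed` to the slide parts that declare it."""
    stray: Dict[str, List[str]] = {}
    with zipfile.ZipFile(pptx_path) as archive:
        for name in sorted(archive.namelist()):
            if not SLIDE_RE.fullmatch(name):
                continue  # skip layouts, masters, theme, media
            xml = archive.read(name).decode("utf-8")
            for face in sorted(set(TYPEFACE_RE.findall(xml))):
                if face not in allowed:
                    stray.setdefault(face, []).append(name)
    return stray
```

An empty dict corresponds to the "only the design-system fonts" outcome. Theme placeholders such as `+mn-lt` will show up as stray faces unless you add them to `allowed`.
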
## Layer 5 — Italic + script interaction

🚨 **`italic=True` is a Latin-script feature.** Apply it only to runs
whose characters belong to scripts where italic is part of the writing
tradition (Latin, Cyrillic, Greek). For everything else (CJK, Arabic,
Hebrew, Devanagari, Thai, Khmer) PowerPoint synthesizes a slant that
looks mechanically deformed. The chain of failures, using CJK as the
canonical example:

1. `<a:latin>` slot has Playfair Display Italic (a Latin-only font).
2. The CJK characters in the run have no glyph in Playfair → PowerPoint
   substitutes a system CJK font.
3. The substituted CJK font is forced into `italic=True` → since no
   real CJK italic exists, PowerPoint synthesizes a slant →
   characters look mechanically deformed.

The same pattern triggers for Arabic, Hebrew, Devanagari, and Thai:
none of these scripts has an italic tradition, and faking it produces
a slant that's visually broken.

**Rule:** italic only applies to runs whose primary script supports it
(Latin / Cyrillic / Greek). Indicate emphasis on other scripts via:

- color tone (`COLOR_INK_60` for muted, full ink for emphasis)
- weight contrast (Regular 400 vs. Bold 700)
- a script-native italic variant **only if one actually ships** (most
  don't)

Practical implementation:

```python
from pptx.util import Pt

# Unicode ranges where italic should be suppressed.
# Principle: include scripts whose writing tradition has no italic style.
# Synthesized italic on these scripts produces a slant that looks
# mechanically deformed.
NO_ITALIC_RANGES = (
    (0x3400, 0x9FFF),  # CJK Unified Ideographs (incl. Extension A)
    (0xF900, 0xFAFF),  # CJK Compatibility Ideographs
    (0x3040, 0x30FF),  # Hiragana + Katakana
    (0xAC00, 0xD7AF),  # Hangul Syllables
    (0x0590, 0x05FF),  # Hebrew
    (0x0600, 0x06FF),  # Arabic
    (0x0750, 0x077F),  # Arabic Supplement
    # Indic scripts: none have an italic tradition; PowerPoint synthesizes
    # a fake slant on all of them. Add new ranges here when the deck mixes
    # in additional scripts (e.g. Sinhala U+0D80–U+0DFF).
    (0x0900, 0x097F),  # Devanagari (Hindi, Marathi, Sanskrit)
    (0x0980, 0x09FF),  # Bengali
    (0x0A00, 0x0A7F),  # Gurmukhi (Punjabi)
    (0x0A80, 0x0AFF),  # Gujarati
    (0x0B00, 0x0B7F),  # Oriya
    (0x0B80, 0x0BFF),  # Tamil
    (0x0C00, 0x0C7F),  # Telugu
    (0x0C80, 0x0CFF),  # Kannada
    (0x0D00, 0x0D7F),  # Malayalam
    # Southeast Asian
    (0x0E00, 0x0E7F),  # Thai
    (0x0E80, 0x0EFF),  # Lao
    (0x1780, 0x17FF),  # Khmer
)


def has_no_italic_script(text: str) -> bool:
    """True if any character falls in a range where italic must be suppressed."""
    return any(
        any(lo <= ord(c) <= hi for lo, hi in NO_ITALIC_RANGES)
        for c in text
    )


def add_run_with_italic_safety(p, text, *, latin_face: str, ea_face: str,
                               cs_face: str | None, size_pt: int,
                               italic: bool, **kwargs):
    """Drop italic if the run contains characters from scripts without italic tradition.

    Args:
        latin_face: Font for Latin / Cyrillic / Greek runs (a:latin slot).
        ea_face: Font for CJK runs (a:ea slot).
        cs_face: Font for complex scripts such as Arabic, Hebrew, Devanagari,
            Thai, etc. (a:cs slot). Pass None when the run contains no
            complex-script characters; set_run_fonts skips the slot.
    """
    r = p.add_run()
    r.text = text
    r.font.size = Pt(size_pt)
    r.font.italic = italic and not has_no_italic_script(text)
    # set_run_fonts is the helper defined in Layer 4.
    set_run_fonts(r, latin=latin_face, ea=ea_face, cs=cs_face)
    return r
```

For mixed-script runs (e.g. `"In <em>2026</em> 開始"`), split into
multiple runs at language boundaries so the italic attribute can apply
to the Latin run only.

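Splitting at script boundaries can be sketched with `itertools.groupby`. This is an illustration, not the export script's implementation: `split_for_italic` is a hypothetical helper, and the range table here is an abbreviated stand-in for the full `NO_ITALIC_RANGES` above so that the block runs on its own.

```python
from itertools import groupby
from typing import List, Tuple

# Abbreviated stand-in for the full range table above; extend it to match.
NO_ITALIC_RANGES = (
    (0x3040, 0x30FF),  # Hiragana + Katakana
    (0x3400, 0x9FFF),  # CJK ideographs
    (0xAC00, 0xD7AF),  # Hangul Syllables
    (0x0590, 0x06FF),  # Hebrew + Arabic (adjacent blocks)
    (0x0900, 0x097F),  # Devanagari
    (0x0E00, 0x0E7F),  # Thai
)


def italic_allowed(ch: str) -> bool:
    """True when the character lies outside every no-italic range."""
    return not any(lo <= ord(ch) <= hi for lo, hi in NO_ITALIC_RANGES)


def split_for_italic(text: str) -> List[Tuple[str, bool]]:
    """Split text into (segment, italic_allowed) pieces at script boundaries."""
    return [
        ("".join(chars), allowed)
        for allowed, chars in groupby(text, key=italic_allowed)
    ]
```

Each segment can then be added as its own run, passing `italic=requested and allowed`. Spaces and ASCII punctuation count as italic-allowed here, so a space between two CJK words becomes its own short segment; merging such neutral characters into the neighbouring run is a refinement this sketch leaves out.
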
## Beyond CJK — other scripts

The five layers above are written in CJK examples because that's the
most common pairing in Open Design today, but the same machinery
applies to other scripts. Quick reference:

| Script family | XML slot | Italic OK? | Most common defect | Recommended faces |
| ------------- | -------- | ---------- | ------------------ | ----------------- |
| Latin (en, de, es, vi…) | `a:latin` | ✅ | Vietnamese Extended diacritics dropped → fallback Calibri mid-paragraph | Be Vietnam Pro, IBM Plex Sans, Source Sans 3 |
| Cyrillic (ru, uk, bg) | `a:latin` | ✅ | Display fonts (Playfair, Source Serif) lack Cyrillic → fallback Calibri | Inter, IBM Plex Sans, Roboto |
| Greek (el) | `a:latin` | ✅ | Same as Cyrillic: display faces missing Greek → fallback | Inter, IBM Plex Sans |
| CJK (zh, ja, ko) | `a:ea` | ❌ | Variable-font trap (Layer 3); missing `a:ea` slot → fallback Microsoft JhengHei | Noto Sans CJK *, Source Han Sans, IBM Plex Sans JP |
| Arabic / Hebrew / Persian | `a:cs` | ❌ | `<a:rtl val="1"/>` not set → text direction breaks; kashida changes width | Noto Naskh Arabic, IBM Plex Sans Arabic, Amiri |
| Devanagari / Bengali | `a:cs` | ❌ | PowerPoint defaults to Mangal/Vrinda (low fidelity); cluster shaping bumps line height | Noto Sans Devanagari, Mukta, Hind |
| Thai / Lao / Khmer | `a:cs` | ❌ | No inter-word spaces → PowerPoint's break engine produces poor wraps; tone marks bump line height | Noto Sans Thai, Sarabun, Noto Sans Khmer |

For RTL scripts (Arabic / Hebrew / Persian), set both `<a:cs typeface=…>`
and `<a:rtl val="1"/>` on the run's `rPr`. Right-alignment, bidi text
flow, and chrome / footer mirroring are out of scope for `verify_layout.py`
today and need manual review; see the RTL discipline scope note below.

> **RTL discipline scope.** Full RTL support is roughly 15–20% of the
> font + layout discipline surface area: Unicode TR9 bidi resolution,
> chrome / footer / page-number mirroring, kashida (Arabic
> elongation) interaction with line-fill, and right-anchored
> alignment. This skill covers the typeface + slot mechanics only;
> bidi and mirroring are flagged for a Tier 2 `rtl-discipline.md`
> follow-up when fa / ar / he usage volume justifies the investment.

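A coarse check for the RTL slot mechanics can be sketched as follows. It flags slide XML that contains Hebrew or Arabic characters but no RTL marker at all. `slides_missing_rtl` is a hypothetical helper; accepting a paragraph-level `rtl="1"` attribute alongside the run-level `<a:rtl>` element is an assumption that goes beyond what the text above specifies, and the check works per slide, not per run.

```python
import re
from typing import Dict, List

# Hebrew, Arabic, Arabic Supplement (the RTL ranges used elsewhere in this file).
RTL_CHAR_RE = re.compile("[\u0590-\u05FF\u0600-\u06FF\u0750-\u077F]")
# Run-level <a:rtl .../> element, or (assumption) a paragraph-level rtl="1" attribute.
RTL_MARK_RE = re.compile(r'<a:rtl\b|<a:pPr\b[^>]*\brtl="1"')


def slides_missing_rtl(slides: Dict[str, str]) -> List[str]:
    """Return names of slide XML parts that contain RTL text but no RTL marker."""
    return sorted(
        name
        for name, xml in slides.items()
        if RTL_CHAR_RE.search(xml) and not RTL_MARK_RE.search(xml)
    )
```

The input is a mapping of part name to XML text, for example read from the `.pptx` archive with `zipfile`. A slide that passes this check can still have individual runs without the marker, so it complements manual review and does not replace it.
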
## Line height per script

The `Cursor.take(gap=Inches(0.12))` default suits 14pt Latin body copy.
Other scripts need more vertical headroom because of stacked diacritics,
matras, or tone marks:

| Script                                    | Recommended `gap` at 14pt body   |
| ----------------------------------------- | -------------------------------- |
| Latin (no Vietnamese Extended)            | `Inches(0.12)` (default)         |
| Latin (with Vietnamese Extended ếẫỗ)      | `Inches(0.14)`                   |
| CJK                                       | `Inches(0.14)` to `Inches(0.16)` |
| Devanagari / Bengali (matras / conjuncts) | `Inches(0.16)` to `Inches(0.18)` |
| Thai / Lao / Khmer (tone marks above)     | `Inches(0.16)` to `Inches(0.18)` |
| Arabic / Hebrew                           | `Inches(0.13)`                   |

When the deck mixes scripts, take the max. Line breathing-room is
visual: an under-spaced Thai run in an otherwise Latin deck reads as
"the Thai slide is broken".

> **Source for these numbers.** Measured against Noto Sans / Noto
> Serif / IBM Plex line-height at 14pt body with full diacritic stacks
> (e.g. Devanagari conjuncts ष्ट्र, Thai 4-mark sequences ก़ํ้, stacked
> Vietnamese ỗ). Adjust downward for condensed faces (Inter Condensed,
> Noto Sans Condensed) and upward for display sizes ≥ 24pt where
> diacritic ratios grow.

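The "take the max" rule is simple enough to encode directly. In this sketch the script keys and the `deck_gap_inches` helper are hypothetical names, and where the table gives a range the upper bound is used, which is an assumption and not a recommendation from the measurements.

```python
from typing import Iterable

# Gap in inches at 14pt body, from the table above.
# Ranges are collapsed to their upper bound (assumption).
GAP_INCHES = {
    "latin": 0.12,
    "latin_vietnamese": 0.14,
    "cjk": 0.16,
    "devanagari_bengali": 0.18,
    "thai_lao_khmer": 0.18,
    "arabic_hebrew": 0.13,
}


def deck_gap_inches(scripts: Iterable[str]) -> float:
    """Return the largest recommended gap among the scripts a deck uses."""
    used = set(scripts) or {"latin"}  # an empty deck falls back to the Latin default
    unknown = used - set(GAP_INCHES)
    if unknown:
        raise KeyError(f"no gap recommendation for: {sorted(unknown)}")
    return max(GAP_INCHES[script] for script in used)
```

The result is a plain float; wrap it in `Inches(...)` before passing it as the `gap` argument to `Cursor.take`.
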
## Audit checklist

After re-export, confirm all five layers:

- [ ] Layer 1: Each CSS class in the HTML maps to the intended family
  in the export script's font table.
- [ ] Layer 2: All declared families exist on the rendering machine
  (`fc-list | grep`).
- [ ] Layer 3: No variable-font filename pretending to be a static
  family. `~/Library/Fonts/` shows multi-file static families for
  every face used.
- [ ] Layer 4: `unzip + grep typeface` returns only the design-system
  fonts. No `Microsoft JhengHei` / `Calibri` / `Arial` / `Georgia`
  / `Consolas` residue.
- [ ] Layer 5: No run from a no-italic script (CJK / Arabic / Hebrew /
  Devanagari / Thai) has `italic=True` set with a Latin italic
  face in the `<a:latin>` slot.
- [ ] **Beyond CJK:** RTL slides set `<a:rtl val="1"/>` on the
  run's `rPr`. Verify with:

  ```bash
  unzip -o deck.pptx -d /tmp/audit
  # -l lists each slide file that contains a match.
  grep -l '<a:rtl' /tmp/audit/ppt/slides/*.xml
  # Expect every fa / ar / he slide to be listed; empty output on
  # an RTL deck means the directionality wasn't propagated.
  ```

- [ ] **Line height:** Cursor `gap` is bumped per the line-height table
  above when the deck includes Vietnamese, Devanagari, Thai, or Khmer
  content.

If all five pass and the user still reports "the type looks wrong",
ask for a screenshot pointing at the specific glyph or word. The
remaining bugs are usually license-restricted fonts not embedded into
the file (see `SKILL.md` Step 5 verification).