first-commit
ci / Validate workspace (push) Has been cancelled
landing-page-ci / Validate landing page (push) Has been cancelled
landing-page-deploy / Deploy landing page (push) Has been cancelled
github-metrics / Generate repository metrics SVG (push) Has been cancelled
refresh-contributors-wall / Refresh contributors wall cache bust (push) Waiting to run
@@ -0,0 +1,254 @@
|
||||
---
|
||||
name: pptx-html-fidelity-audit
|
||||
description: Audit a python-pptx export against its source HTML deck, identify layout/content drift (footer overflow, cropped content, missing italic/em, lost styling, off-rhythm spacing), and re-export with strict footer-rail + cursor-flow layout discipline. Use this skill whenever the user has a .pptx that was generated from an HTML slide deck and asks to compare/audit/verify/fix the export, including phrases like "compare ppt with html", "fidelity audit", "fix the pptx", "ppt is cut off", "footer overlap", "italic missing in pptx", "re-export the deck", "pptx-html-fidelity-audit", or any case where an HTML → python-pptx export needs verification or repair. Also trigger when the user shows you a deck.html and a deck.pptx side by side and is debugging visual differences.
|
||||
triggers:
|
||||
- "pptx fidelity"
|
||||
- "pptx audit"
|
||||
- "ppt 跑掉"
|
||||
- "字型不對"
|
||||
- "footer overlap"
|
||||
- "verify pptx"
|
||||
- "html to pptx"
|
||||
od:
|
||||
mode: utility
|
||||
scenario: engineering
|
||||
---
|
||||
|
||||
# PPTX ↔ HTML Fidelity Audit
|
||||
|
||||
A repeatable workflow for catching the ways a `python-pptx` export silently drifts from its HTML source — and fixing them with a layout discipline that prevents the same regressions on the next pass.
|
||||
|
||||
## When this skill applies
|
||||
|
||||
The user has:
|
||||
|
||||
- A source HTML slide deck (typically a single-file deck with `<section class="slide">` blocks):
|
||||
|
||||
```html
|
||||
<section class="slide light">
|
||||
<div class="chrome">2026 · Q2 review</div>
|
||||
<span class="kicker">Pillar 03</span>
|
||||
<h2 class="h-xl">Shipping <em>velocity</em> doubled</h2>
|
||||
<p class="lead">…</p>
|
||||
<div class="foot">page 5 / 14</div>
|
||||
</section>
|
||||
```
|
||||
|
||||
- A PPTX file generated from that deck via python-pptx (or similar).
|
||||
- A suspicion (or visible evidence) that the PPTX doesn't match the HTML — text bleeding into the footer, italic words gone flat, hero slides not centered, sections cropped, tag styling lost.
|
||||
|
||||
If the user only has *one* of those two artifacts, this skill doesn't apply yet — first generate the missing one, or ask the user to provide it.
|
||||
|
||||
## Why this is hard (and why a skill helps)
|
||||
|
||||
PPTX is a fixed-canvas, absolute-positioned medium. HTML is a fluid, flow-based medium. A naive python-pptx export pins each block at hand-picked `(top, left)` coordinates, which works for the *first slide it was tested on* and silently fails for every other slide whose content has a different intrinsic height. The result is a handful of recurring drift modes:
|
||||
|
||||
1. **Footer overflow** — content's `top + height` crosses into the footer row.
|
||||
2. **Off-canvas content** — bottom of last block exceeds `7.5"` (16:9 canvas).
|
||||
3. **Italic loss** — `<em>` in HTML never gets `run.font.italic = True`.
|
||||
4. **Hero slides not centered** — vertical-stack slides use `MARGIN_TOP` instead of computing center.
|
||||
5. **Box bounds intruding** — the text fits, but the *shape's bounding box* is oversized and visually crosses the rail.
|
||||
6. **Tag/styling loss** — colored chrome rows, kicker uppercase tracking, mono-vs-serif assignments quietly fall back to defaults.
|
||||
|
||||
Every one of these is a *layout discipline* problem, not a content problem. Once you adopt the discipline, they stop happening.
|
||||
|
||||
---
|
||||
|
||||
## Workflow
|
||||
|
||||
The audit is five steps. Don't skip any of them — the discipline only works if the audit produces a real list of issues to drive the re-export. A fix-without-audit pass tends to leave half the issues alive.
|
||||
|
||||
### Step 1 — Extract ground truth from the PPTX
|
||||
|
||||
Run `scripts/extract_pptx.py <path-to.pptx> > pptx_dump.json`. The script walks every shape on every slide and dumps text, position (`top` / `left`), size (`width` / `height`), and per-run typography (font name, size pt, bold, italic, color). This is the *actual* state of the export — don't trust the export script's intent, trust the dump.
|
||||
|
||||
For 14-slide decks, the dump is ~30–60 KB and human-readable.
|
||||
|
||||
### Step 2 — Walk the HTML structure
|
||||
|
||||
Read the source HTML and enumerate `<section class="slide">` blocks. For each, note:
|
||||
|
||||
- The slide's theme (`light` / `dark` / `hero light` / `hero dark`).
|
||||
- The `chrome` row text (top metadata).
|
||||
- The `kicker` (small uppercase eyebrow above the headline).
|
||||
- The headline (h-hero / h-xl / etc.) and any sub-head.
|
||||
- The body copy and any structured blocks (pipeline steps, cards, pillars, observation cards).
|
||||
- The `foot` row (bottom metadata).
|
||||
- Any `<em>` or italic-styled spans — italic is the silent regression.
|
||||
|
||||
Map each HTML slide to a PPTX slide index. For decks following the convention "slide 1 = cover, slide N = closing", the mapping is positional.
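
The walk above can be sketched with the standard-library HTML parser. This is a minimal illustration, not the skill's bundled tooling: `SlideWalker` and `walk_slides` are hypothetical names, and the sketch only records each slide's theme classes and its `<em>` / `<i>` text, assuming slides are `<section>` elements whose class list contains `slide`.

```python
from html.parser import HTMLParser


class SlideWalker(HTMLParser):
    """Collect one record per <section class="slide ..."> block (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.slides = []     # one dict per slide, in document order
        self._depth = 0      # <section> nesting depth inside the current slide
        self._em_depth = 0   # > 0 while inside <em> or <i>

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "section":
            if self._depth == 0 and "slide" in classes:
                # Everything except "slide" is treated as the theme, e.g. "hero dark".
                theme = " ".join(c for c in classes if c != "slide")
                self.slides.append({"theme": theme, "em": []})
                self._depth = 1
            elif self._depth:
                self._depth += 1
        elif tag in ("em", "i") and self._depth:
            self._em_depth += 1

    def handle_endtag(self, tag):
        if tag == "section" and self._depth:
            self._depth -= 1
        elif tag in ("em", "i") and self._em_depth:
            self._em_depth -= 1

    def handle_data(self, data):
        # Record italic text so the audit can check it against the PPTX dump.
        if self._depth and self._em_depth and data.strip():
            self.slides[-1]["em"].append(data.strip())


def walk_slides(html):
    """Return a list of {"theme": str, "em": [str, ...]} dicts, one per slide."""
    walker = SlideWalker()
    walker.feed(html)
    walker.close()
    return walker.slides
```

With a positional mapping, the list index of each record plus one is the PPTX slide number; the `em` list is the set of strings that should come back with `italic: true` in the dump.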
|
||||
|
||||
### Step 3 — Build the audit table
|
||||
|
||||
For each slide, walk shapes from the dump and check against expected layout rules. Use this exact table format — the severity column is what drives the fix priority:
|
||||
|
||||
```
|
||||
| Slide | Issue | Severity |
|---|---|---|
| 1 cover | meta-row bottom at 6.95" overlaps the footer (6.7") | 🔴 |
| 5 checklist | row B step description bottom at 7.2" cuts into the footer | 🔴 |
| 8 3E | closing paragraph sits directly on the footer's top edge | 🔴 |
| 9 on-day | step description bottom just touches the footer, no safety margin | 🟠 |
| multiple | em (Playfair italic) not preserved | 🟡 |
|
||||
```
|
||||
|
||||
Severity rubric:
|
||||
|
||||
- 🔴 **critical** — content cropped, text invisible, footer overlap, off-canvas. Must fix.
|
||||
- 🟠 **high** — content visible but visual hierarchy broken, no breathing room, hero not centered. Should fix.
|
||||
- 🟡 **medium** — italic/em missing, font fallback wrong, color drift. Fix in this pass.
|
||||
- 🟢 **low** — minor spacing/alignment, sub-pixel offsets. Note but don't block.
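
The geometric half of the rubric can be applied mechanically. The sketch below is an assumption-laden illustration: `classify_bottom` is a hypothetical helper, values are in inches, the rail value matches the default used in Step 4, and the `SAFETY_GAP` threshold for "no breathing room" is invented for the example rather than taken from the skill. Typography severities (🟡) still need a human or the font audit.

```python
CONTENT_MAX_Y = 6.70   # inches; default content rail from Step 4
SAFETY_GAP = 0.10      # inches; hypothetical "breathing room" threshold


def classify_bottom(top_in, height_in):
    """Return a severity marker for one content shape, or None if it is clean."""
    bottom = top_in + height_in
    if bottom > CONTENT_MAX_Y:
        return "🔴"   # crosses the rail: footer overlap or off-canvas
    if CONTENT_MAX_Y - bottom < SAFETY_GAP:
        return "🟠"   # fits, but sits on the rail with no safety distance
    return None
```

Footer and page-number shapes are expected below the rail, so they should be filtered out before calling a check like this.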
|
||||
|
||||
After the table, write a short root-cause section: roughly 90% of the issues usually come from 2–3 systemic causes (e.g. "no footer rail enforced", "hero stacks pinned to MARGIN_TOP instead of centered", "italic never propagated"). Naming the systemic causes makes the re-export script much smaller and more correct.
|
||||
|
||||
### Step 4 — Re-export with footer-rail + cursor-flow layout discipline
|
||||
|
||||
This is the load-bearing technique. See `references/layout-discipline.md` for the full rules; the summary:
|
||||
|
||||
**Define the rails up front, once, for the whole deck:**
|
||||
|
||||
```python
from pptx.util import Inches

CANVAS_W = Inches(13.333)       # 16:9
CANVAS_H = Inches(7.5)
MARGIN_X = Inches(0.6)
MARGIN_TOP = Inches(0.5)
CONTENT_MAX_Y = Inches(6.70)    # NOTHING in content area may cross this
FOOTER_TOP = Inches(6.85)       # footer row pinned here, edge-to-edge
```
|
||||
|
||||
> **Customizing the rails.** The defaults above suit a 16:9 canvas with a slim footer. If your design system uses a wider footer or a 4:3 canvas, override these constants in your export script and pass the same values to `verify_layout.py` via `--content-max-y` / `--canvas-h` / `--canvas-w`. See `references/layout-discipline.md` §1 for the full constant table.
|
||||
|
||||
|
||||
**Use a cursor for content blocks instead of pinning each block at an absolute y:**
|
||||
|
||||
```python
class Cursor:
    """Advances down the slide; refuses to cross the footer rail."""

    def __init__(self, y_start, cap=CONTENT_MAX_Y):
        self.y = y_start
        self.cap = cap

    def take(self, h, gap=Inches(0.12)):
        # Default gap: 0.12" is about 8.6pt, a bit over half a 14pt line;
        # tighten/loosen per design system.
        top = self.y
        # The trailing gap counts toward the rail, so the check is conservative.
        self.y = top + h + gap
        if self.y > self.cap:
            raise OverflowError(
                f"cursor at {self.y} exceeds footer rail {self.cap}; "
                f"reduce block height or split slide"
            )
        return top
```
|
||||
|
||||
For each slide, instantiate `Cursor(MARGIN_TOP)` and `take(height)` each block in reading order. The slide refuses to render if any block would cross the rail, so overflows become loud build errors instead of silent visual bugs.
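
A standalone sketch of that flow, under stated assumptions: `inches()` is a stand-in for `pptx.util.Inches` so the example runs without python-pptx, the `Cursor` is restated with the same contract as the one above, and `stack_blocks` plus the block heights are hypothetical.

```python
EMU_PER_INCH = 914_400  # PPTX stores lengths in English Metric Units


def inches(value):
    """Stand-in for pptx.util.Inches so this sketch runs without python-pptx."""
    return round(value * EMU_PER_INCH)


class Cursor:
    """Same contract as the Cursor above, restated so the sketch is standalone."""

    def __init__(self, y_start, cap):
        self.y = y_start
        self.cap = cap

    def take(self, h, gap=inches(0.12)):
        top = self.y
        self.y = top + h + gap
        if self.y > self.cap:
            raise OverflowError(f"cursor at {self.y} exceeds footer rail {self.cap}")
        return top


def stack_blocks(heights_in, y_start_in=0.5, cap_in=6.70):
    """Return the top (in EMU) of each block, taken in reading order."""
    cursor = Cursor(inches(y_start_in), cap=inches(cap_in))
    return [cursor.take(inches(h)) for h in heights_in]


# Kicker, headline, body: hypothetical heights in inches.
tops = stack_blocks([0.4, 1.2, 2.0])

# Adding a fourth 2.5" block pushes the cursor past the rail.
try:
    stack_blocks([0.4, 1.2, 2.0, 2.5])
    overflowed = False
except OverflowError:
    overflowed = True
```

Each returned top would be passed as the `top` argument when adding the corresponding shape; the overflow case is the build error that tells you to shrink a block or split the slide.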
|
||||
|
||||
**Hero (vertically-centered) slides use a budget instead of a cursor:**
|
||||
|
||||
```python
def hero_layout(blocks):
    """blocks = list of (height, gap_after) tuples in reading order."""
    total = sum(h + g for h, g in blocks)
    # Floor division keeps the offset an integer EMU value; true division
    # would yield a float, and EMU positions are expected to be integers.
    y_start = (CANVAS_H - total) // 2
    return Cursor(y_start)
```
|
||||
|
||||
That single change kills "hero slide content sticks to top" — the most common hero defect.
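
A worked example of the centering arithmetic, as a sketch only: `hero_start` and `inches` are hypothetical helpers working in raw EMU (914,400 per inch) so the example runs without python-pptx, and the block heights are invented. The last block's `gap_after` is passed as 0 so the trailing gap does not shift the stack upward.

```python
EMU_PER_INCH = 914_400
CANVAS_H = round(7.5 * EMU_PER_INCH)  # 16:9 canvas height in EMU


def inches(value):
    """Stand-in for pptx.util.Inches."""
    return round(value * EMU_PER_INCH)


def hero_start(blocks, canvas_h=CANVAS_H):
    """Top of a vertically centered stack; blocks are (height, gap_after) in EMU."""
    total = sum(h + g for h, g in blocks)
    return (canvas_h - total) // 2


# Headline 1.2", sub-head 0.6", meta row 0.4"; gaps 0.2" and 0.15".
blocks = [
    (inches(1.2), inches(0.2)),
    (inches(0.6), inches(0.15)),
    (inches(0.4), 0),
]
y_start = hero_start(blocks)  # stack is 2.55" tall, so it starts at 2.475"
```

The stack totals 2.55", leaving 4.95" of free canvas, half of which (2.475") goes above the first block.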
|
||||
|
||||
**Tighten box height to fit text + minimal padding.** PowerPoint reveals shape bounds when they overlap (selection halos, Z-order conflicts), and an oversized box can visually cross the footer rail even when the text inside doesn't. Compute box height from text metrics + ~0.05" pad, not from generous wrappers.
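
One rough way to derive a box height from text metrics is sketched below. Everything here is an assumption for illustration: `box_height_in` is a hypothetical helper, the 0.5 em average glyph width only approximates Latin text, and the 1.2 line-spacing factor is a common default rather than a value from this skill. Real metrics require the font file.

```python
import math


def box_height_in(text, size_pt, box_width_in, *, line_spacing=1.2,
                  avg_char_em=0.5, pad_in=0.05):
    """Estimate the box height (inches) needed for wrapped text plus padding."""
    # Characters per line: box width in points / assumed average glyph width.
    chars_per_line = max(1, int(box_width_in * 72 / (size_pt * avg_char_em)))
    # Each explicit line wraps independently; an empty line still takes one row.
    lines = sum(
        max(1, math.ceil(len(part) / chars_per_line))
        for part in text.split("\n")
    )
    # 72 points per inch converts the line stack back to inches.
    return lines * size_pt * line_spacing / 72 + pad_in
```

For CJK text the average glyph width is closer to 1 em, so `avg_char_em` would need to be raised accordingly.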
|
||||
|
||||
**Preserve italic / em explicitly:**
|
||||
|
||||
```python
from pptx.util import Pt


def add_run(p, text, font, size_pt, italic=False, bold=False, color=None):
    r = p.add_run()
    r.text = text
    r.font.name = font
    r.font.size = Pt(size_pt)
    r.font.italic = italic
    r.font.bold = bold
    if color:
        r.font.color.rgb = color
    return r
```
|
||||
|
||||
When walking HTML, detect `<em>` / `<i>` / inline style `font-style: italic` and pass `italic=True`. Use the EN serif face (Playfair Display, Source Serif, or fallback Georgia) for italic display copy — the CJK serif typically has no italic and looks broken if you try to italicize it.
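
The detection step can be sketched with the standard-library parser. `RunSplitter` and `split_runs` are hypothetical names; the sketch assumes well-nested inline markup and only recognizes `<em>`, `<i>`, and an inline `font-style: italic` declaration (italic applied through a CSS class is not detected).

```python
import re
from html.parser import HTMLParser

ITALIC_STYLE = re.compile(r"font-style\s*:\s*italic", re.IGNORECASE)
VOID_TAGS = {"br", "img", "hr", "wbr", "input", "meta", "link"}


class RunSplitter(HTMLParser):
    """Turn inline HTML into (text, italic) pairs (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.runs = []
        self._stack = []  # one bool per open element: does it switch italic on?

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never get a closing tag
        style = dict(attrs).get("style") or ""
        self._stack.append(tag in ("em", "i") or bool(ITALIC_STYLE.search(style)))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        if data:
            # Italic if any enclosing element turned it on.
            self.runs.append((data, any(self._stack)))


def split_runs(fragment):
    """Return [(text, italic), ...] for an inline HTML fragment."""
    parser = RunSplitter()
    parser.feed(fragment)
    parser.close()
    return parser.runs
```

Each pair maps onto one `add_run(...)` call, with the second element passed as `italic`.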
|
||||
|
||||
For deeper font issues that the layout rails can't catch — variable-font traps where PowerPoint silently swaps to Calibri / Microsoft JhengHei, missing `<a:ea>` slot causing CJK runs to fall back, fake-italic on Han characters — read `references/font-discipline.md`. The five layers there cover everything `verify_layout.py` can't see.
|
||||
|
||||
### Step 5 — Verify post-export
|
||||
|
||||
After writing the new `.pptx`, run `scripts/verify_layout.py <path-to.pptx>`. The script:
|
||||
|
||||
- Walks every shape on every slide.
|
||||
- Asserts `top + height ≤ CONTENT_MAX_Y` for content shapes (footer/page-number shapes are allowed below the rail).
|
||||
- Asserts `top + height ≤ CANVAS_H` for all shapes (no off-canvas).
|
||||
- Asserts `left + width ≤ CANVAS_W` and `left ≥ 0`.
|
||||
- Reports violations as a single block: slide index, shape name, observed bottom, rail.
|
||||
|
||||
Zero violations is the gate for "this re-export is shippable". Don't claim the audit is fixed without running the verifier — the human eye misses 1–2 mm overflow at zoom-out, the script doesn't.
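
The checks listed above amount to a few comparisons per shape. The sketch below is not the bundled `verify_layout.py`; it assumes a hypothetical list of shape dicts in inches (`slide`, `name`, `left`, `top`, `width`, `height`, optional `is_footer`) and uses the default rails from Step 4.

```python
EPS = 1e-6  # tolerance for float noise in inch values


def find_violations(shapes, *, content_max_y=6.70, canvas_w=13.333, canvas_h=7.5):
    """Return (slide, name, rule, observed, limit) tuples for every broken rule."""
    violations = []
    for s in shapes:
        bottom = s["top"] + s["height"]
        right = s["left"] + s["width"]
        where = (s["slide"], s["name"])
        # Footer / page-number shapes are allowed below the content rail.
        if not s.get("is_footer", False) and bottom > content_max_y + EPS:
            violations.append((*where, "content rail", round(bottom, 3), content_max_y))
        if bottom > canvas_h + EPS:
            violations.append((*where, "canvas bottom", round(bottom, 3), canvas_h))
        if s["left"] < -EPS or right > canvas_w + EPS:
            violations.append((*where, "canvas width", round(right, 3), canvas_w))
    return violations
```

An empty return value corresponds to the "zero violations" gate; a CI wrapper would exit nonzero whenever the list is non-empty.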
|
||||
|
||||
---
|
||||
|
||||
## Output to the user
|
||||
|
||||
After Step 5 passes, report:
|
||||
|
||||
1. **Audit table** — the table from Step 3.
|
||||
2. **Root causes** — 1-paragraph systemic explanation.
|
||||
3. **Fix list** — terse list of what was changed and why (e.g. "hero slides switched to budget centering", "all content blocks routed through Cursor", "em runs explicitly italic").
|
||||
4. **Verification** — "0 rail violations across N slides, file size X KB".
|
||||
5. **Path** — absolute path to the re-exported `.pptx`.
|
||||
|
||||
The user is reading for two reasons: confirming the visible bugs are fixed, and trusting the systemic fix is right. Cover both.
|
||||
|
||||
---
|
||||
|
||||
## Bundled resources
|
||||
|
||||
- `scripts/extract_pptx.py` — dump every shape on every slide as JSON. Run before the audit. **Important:** also run on the *original* export to compare, and on the *re-exported* one to confirm.
|
||||
- `scripts/verify_layout.py` — post-export rail checker. Returns nonzero exit code on violations so it slots into a CI pipeline if needed.
|
||||
- `references/layout-discipline.md` — the full footer-rail + cursor-flow rule set with code snippets for each common slide type (hero, content, pipeline, two-column, observation grid).
|
||||
- `references/font-discipline.md` — five-layer font audit: mapping, presence, variable-vs-static traps, the three XML language slots (`latin` / `ea` / `cs`), CJK + Latin italic interaction.
|
||||
- `references/audit-table-template.md` — copy-pasteable table template with severity legend.
|
||||
|
||||
Read the references when:
|
||||
|
||||
- The deck has slide types beyond what the SKILL.md covers (multi-column dashboards, embedded images, charts) → `layout-discipline.md`.
|
||||
- The audit shows 🟡 typography issues — italic missing, CJK falling back, unexpected `Calibri` / `Microsoft JhengHei` in the XML → `font-discipline.md`.
|
||||
- You want to drop the audit table directly into a report or markdown deliverable → `audit-table-template.md`.
|
||||
|
||||
---
|
||||
|
||||
## Anti-patterns to avoid
|
||||
|
||||
- **Patching individual slides without naming the systemic cause.** If you fix slide 5 by lowering its block by 0.2", you'll be back fixing slide 9, 11, and 14 next. Find the rule that produced all four problems.
|
||||
- **Trusting the original export script's intent.** Always run the extractor against the actual file. Drift between intent and reality is the bug.
|
||||
- **Skipping verification because "it looked fine in PowerPoint preview".** Preview anti-aliasing hides 1–2 mm overflows. The script doesn't.
|
||||
- **Italicizing scripts that have no italic tradition.** CJK, Arabic, Hebrew, Devanagari, Thai, and Khmer all produce a synthesized slant when forced into `italic=True`, and the result looks mechanically deformed. Italicize *only* runs whose primary script supports italic — Latin, Cyrillic, Greek. See `references/font-discipline.md` Layer 5 for the implementation pattern.
|
||||
- **Using `MARGIN_TOP` for hero slides.** Hero slides need *budget centering*, not top-anchored. This is the most common hero defect and the cheapest to fix.
|
||||
|
||||
---
|
||||
|
||||
## Why geometry-based verification, not visual diff
|
||||
|
||||
An earlier iteration of this skill leaned on visual diffing: render the
.pptx through Keynote → PDF → PNG, screenshot the HTML through Chrome
headless, stitch them side-by-side with `magick`. It worked, but with
three sharp drawbacks:

- **Platform lock-in.** Keynote AppleScript is macOS-only; `magick` and
  font-discovery commands vary across OSes; CI pipelines on Linux can't
  reproduce the chain.
- **Imprecision.** A 1–2 mm overflow gets anti-aliased away in a PNG
  preview. The human eye misses it; the script catches it as a hard
  numeric violation.
- **Setup cost.** Every contributor needs the full graphics toolchain
  installed before they can audit. Geometry checks need only
  `python-pptx`.

Geometry-based verification gives up one thing the visual diff is good
at: catching cases where shape positions are correct but the rendered
glyph looks wrong (font fallback, kerning bugs, missing weight). When
that case appears, fall back to a manual screenshot review; the
five-layer audit in `references/font-discipline.md` covers most of the
underlying causes.
|
||||
@@ -0,0 +1,58 @@
|
||||
# Audit Table Template
|
||||
|
||||
Drop-in markdown template for the Step-3 audit deliverable. Keep the column order and severity legend stable across audits — readers learn to scan for 🔴 first.
|
||||
|
||||
## Template
|
||||
|
||||
```markdown
|
||||
**Fidelity audit · `<deck-name>` · <date>**
|
||||
|
||||
| Slide | Issue | Severity |
|---|---|---|
| 1 cover | meta-row bottom at 6.95" overlaps the footer (6.7") | 🔴 |
| 2 principle | meta-row overlaps the footer | 🔴 |
| 5 checklist | row B step description bottom at 7.2" cuts into the footer | 🔴 |
| 8 3E | closing paragraph sits directly on the footer's top edge | 🔴 |
| 9 on-day | step description bottom just touches the footer, no safety margin | 🟠 |
| 10 obs | row 2 obs-card bottom at 6.95" cuts into the footer | 🔴 |
| 11 P&D | Note paragraph bottom at 7.34" sits entirely beneath the footer | 🔴 |
| 13 deliv. | pipeline description bottom at 7.05" cuts into the footer | 🔴 |
| 14 closing | meta-row bottom at 7.24" pushed past the footer | 🔴 |
| multiple | em (Playfair italic) and special type-size contrast not preserved | 🟡 |
|
||||
|
||||
**Root causes**
|
||||
|
||||
1. **No footer rail enforced.** Content blocks pinned at hand-picked y-coordinates; the script had no `CONTENT_MAX_Y` invariant, so `top + height` silently crossed `6.7"` whenever the content was taller than the test slide.
|
||||
2. **Hero slides anchored at `MARGIN_TOP`.** Vertical centering was done by eye; cover and chapter-intro slides drift down as block heights vary.
|
||||
3. **Italic propagation skipped.** `<em>` spans in HTML mapped to plain runs; the EN serif italic identity was lost across all hero slides.
|
||||
|
||||
**Fix plan**
|
||||
|
||||
- Introduce `CONTENT_MAX_Y = 6.70"` and `FOOTER_TOP = 6.85"` as module-level constants.
|
||||
- Route all content blocks through a `Cursor` that refuses to cross the rail.
|
||||
- Switch hero slides to `hero_layout(blocks)` — compute total stack height, center on canvas.
|
||||
- Tighten `desc_h` (pipeline `0.85"`, checklist `0.65"`) to fit text + 0.05" pad.
|
||||
- Add `italic=True` path in `add_run()` that swaps to EN serif for italic Latin runs; skip italic for CJK.
|
||||
- Add post-export `verify_layout.py` step; require zero rail violations.
|
||||
```
|
||||
|
||||
## Severity legend (reproduce inline in reports)
|
||||
|
||||
```markdown
|
||||
- 🔴 **critical** — content cropped, text invisible, footer overlap, off-canvas. Must fix.
|
||||
- 🟠 **high** — content visible but visual hierarchy broken, no breathing room. Should fix.
|
||||
- 🟡 **medium** — italic/em missing, font fallback wrong, color drift. Fix in this pass.
|
||||
- 🟢 **low** — minor spacing/alignment, sub-pixel offsets. Note but don't block.
|
||||
```
|
||||
|
||||
## Verification footer (append after re-export)
|
||||
|
||||
```markdown
|
||||
**Verification**
|
||||
|
||||
- ✅ 0 rail violations across 14 slides
|
||||
- ✅ All shapes within canvas (`top + height ≤ 7.5"`, `left + width ≤ 13.333"`)
|
||||
- ✅ Italic preserved on all `<em>` runs (EN serif), skipped on CJK runs
|
||||
- ✅ Hero slides centered (cover, 03 act-i, 06 act-ii, 11 act-iii, 13 closing)
|
||||
- File: `<absolute-path>.pptx` · 54.7 KB
|
||||
```
|
||||
@@ -0,0 +1,363 @@
|
||||
# Font Discipline for PPTX Exports
|
||||
|
||||
Companion to `layout-discipline.md`. The rail / cursor primitives in that
|
||||
file catch geometric drift; this file catches the typography drift that
|
||||
geometry can't see — variable-font traps, missing CJK slots, fake italic
|
||||
on Han characters. These are the bugs that pass `verify_layout.py` and
|
||||
still look wrong.
|
||||
|
||||
Read this when:
|
||||
|
||||
- The audit table has 🟡 entries about italic / em / font fallback.
|
||||
- PowerPoint silently swaps to Calibri / Arial / Microsoft JhengHei /
|
||||
Georgia after you specified a different family.
|
||||
- The `typeface` grep from Layer 4 (unzip the `.pptx`, then grep the slide XML) shows a face that isn't in your design system.
|
||||
|
||||
## Layer 1 — Font mapping in the export script
|
||||
|
||||
Walk each CSS class used by the source HTML and confirm the export
|
||||
script maps it to the **same** font family.
|
||||
|
||||
⚠️ **Trap:** the visual category your eye reads is not always the
|
||||
class's semantic category. Editorial decks routinely bind `.lead`,
|
||||
`.callout`, or `.q-big` to a serif face, not the sans-serif you'd guess
|
||||
from "lead". Open the HTML's CSS, read the `font-family` declaration
|
||||
for each class, and copy the literal family name into the export's
|
||||
font table.
|
||||
|
||||
Don't rely on visual intuition; rely on grep.
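
A minimal sketch of that lookup in Python, assuming the deck's CSS uses flat rules with class selectors (no nested at-rules, no `var()` indirection). `font_families` is a hypothetical helper; it maps each class to the first family in its `font-family` stack.

```python
import re

RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")            # selector list + declaration body
FAMILY = re.compile(r"font-family\s*:\s*([^;]+)", re.IGNORECASE)


def font_families(css):
    """Map each simple class selector to the first family it declares."""
    table = {}
    for selectors, body in RULE.findall(css):
        match = FAMILY.search(body)
        if not match:
            continue
        # First entry of the stack, with surrounding quotes removed.
        first = match.group(1).split(",")[0].strip().strip("\"'")
        for selector in selectors.split(","):
            selector = selector.strip()
            if selector.startswith("."):
                table[selector[1:]] = first
    return table
```

Comparing this table against the export script's font table class by class surfaces mismatches such as `.lead` being bound to a serif in the CSS but a sans in the export.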
|
||||
|
||||
> **Coverage gap for Latin-slot scripts (Cyrillic / Greek / Vietnamese).**
|
||||
> Russian / Ukrainian / Greek runs go through `<a:latin>`, not `<a:ea>` —
|
||||
> they use the Latin slot. Many display fonts (Playfair Display, Source
|
||||
> Serif 4) ship with weak or missing Cyrillic / Greek glyphs, and most
|
||||
> drop Vietnamese Extended diacritics (ếẫỡỗ). PowerPoint silently falls
|
||||
> back to Calibri / Times New Roman per missing glyph, producing
|
||||
> mid-paragraph face shifts that look like a styling bug.
|
||||
>
|
||||
> When mapping a CSS class to a Latin font, check the font actually
|
||||
> covers your scripts:
|
||||
>
|
||||
> ```bash
|
||||
> # macOS / Linux: list the unicode blocks a font supports
|
||||
> fc-query -f '%{charset}\n' "$(fc-match -f '%{file}\n' 'Playfair Display')" | head
|
||||
> ```
|
||||
>
|
||||
> ```powershell
> # Windows: list installed families whose name matches.
> # This confirms the family is registered; it does not report glyph coverage.
> Add-Type -AssemblyName System.Drawing
> (New-Object System.Drawing.Text.InstalledFontCollection).Families |
>     Where-Object { $_.Name -like '*Playfair*' }
> ```
|
||||
>
|
||||
> Cross-platform fallback: open the font in fontforge → Element → Font Info → OS/2 → Unicode Ranges.
|
||||
>
|
||||
> If coverage is missing, either swap to a face that has it (e.g.
|
||||
> Inter / IBM Plex Sans for Cyrillic; Be Vietnam Pro for Vietnamese) or
|
||||
> set a different `<a:latin>` per language run.
|
||||
|
||||
## Layer 2 — Font presence on the rendering machine
|
||||
|
||||
PowerPoint uses the OS font cache. If the family name in your XML isn't
|
||||
installed, PowerPoint silently falls back. Check:
|
||||
|
||||
```bash
|
||||
fc-list | grep -i "noto serif" # Linux / WSL
|
||||
mdfind "kMDItemFSName == '*NotoSerif*'" # macOS
|
||||
```
|
||||
|
||||
```powershell
|
||||
# Windows (PowerShell)
|
||||
Get-ChildItem -Path "$env:WINDIR\Fonts","$env:LOCALAPPDATA\Microsoft\Windows\Fonts" `
|
||||
-Filter "*NotoSerif*" -ErrorAction SilentlyContinue
|
||||
```
|
||||
|
||||
Install missing families (macOS with Homebrew shown):
|
||||
|
||||
```bash
|
||||
brew install --cask \
|
||||
font-noto-serif-tc \
|
||||
font-playfair-display \
|
||||
font-source-serif-4 \
|
||||
font-ibm-plex-mono
|
||||
```
|
||||
|
||||
The `verify_layout.py` script can't see this — it only checks
|
||||
geometry. A standalone font audit step is required.
|
||||
|
||||
## Layer 3 — Variable fonts vs. static families ← most common trap
|
||||
|
||||
Modern fonts often ship as a **single variable file** containing all
|
||||
weights (`NotoSerifTC[wght].ttf`). Looks elegant, but PowerPoint Mac /
|
||||
Windows have spotty support:
|
||||
|
||||
- macOS reports the variable font's family name as its **default static
|
||||
instance** — usually ExtraLight or Regular.
|
||||
- PowerPoint asks the OS for "Noto Serif TC, weight 700"; the OS
|
||||
reports the family as `Noto Serif TC ExtraLight`; PowerPoint can't
|
||||
match → falls back to a system serif.
|
||||
|
||||
Diagnose:
|
||||
|
||||
```bash
|
||||
ls -la ~/Library/Fonts/ | grep -i NotoSerif
|
||||
```
|
||||
|
||||
| What you see                           | Verdict                                 |
| -------------------------------------- | --------------------------------------- |
| One `*[wght].ttf` file                 | Variable. PowerPoint may not match.     |
| Multiple `*-Regular.otf`, `*-Bold.otf` | Static family. Safe.                    |
|
||||
|
||||
Fix by using the static family equivalent:
|
||||
|
||||
| Don't use (variable)        | Use instead (static)                                   |
| --------------------------- | ------------------------------------------------------ |
| `Noto Serif TC` (variable)  | `Noto Serif CJK TC`                                    |
| `Source Serif 4` (variable) | `Source Serif Pro` / `Source Serif 4` static instances |
| `Inter` (variable)          | Per-weight `Inter Regular` / `Inter Bold`              |
|
||||
|
||||
After fixing the export, re-run `extract_pptx.py` and confirm the
|
||||
`font` field matches the static name.
|
||||
|
||||
## Layer 4 — PPTX XML's three-language slots
|
||||
|
||||
PowerPoint chooses a typeface per run by language script. Each run can
|
||||
declare three:
|
||||
|
||||
| Element                 | Used for                                                           |
| ----------------------- | ------------------------------------------------------------------ |
| `<a:latin typeface=…>`  | Latin script (a-z, A-Z, digits); Cyrillic and Greek also go here   |
| `<a:ea typeface=…>`     | East Asian (CJK): **Chinese / Japanese / Korean go here**          |
| `<a:cs typeface=…>`     | Complex script (Arabic, Hebrew, Thai)                              |
|
||||
|
||||
Audit a file:
|
||||
|
||||
```bash
|
||||
unzip -o /path/to/deck.pptx -d /tmp/audit
|
||||
grep -h -oE 'typeface="[^"]+"' /tmp/audit/ppt/slides/slide*.xml | sort -u
|
||||
```
|
||||
|
||||
Expected output: only the design-system fonts. If you see
|
||||
`Microsoft JhengHei`, `Calibri`, `Arial`, `Georgia`, `Consolas`,
|
||||
something has fallen back.
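
The same audit can be run from Python with only the standard library, which helps on machines without `unzip` / `grep`. This is a sketch: `typefaces_in` and `unexpected_faces` are hypothetical helpers, and the allow-list is whatever your design system defines. Theme references such as `+mn-lt` may also appear in the result and can be added to the allow-list if they are intentional.

```python
import re
import zipfile

TYPEFACE = re.compile(r'typeface="([^"]+)"')
SLIDE_XML = re.compile(r"ppt/slides/slide\d+\.xml$")


def typefaces_in(pptx_path):
    """Return the set of typeface values used in the slide XML parts."""
    faces = set()
    with zipfile.ZipFile(pptx_path) as archive:
        for name in archive.namelist():
            if SLIDE_XML.match(name):
                xml = archive.read(name).decode("utf-8")
                faces.update(TYPEFACE.findall(xml))
    return faces


def unexpected_faces(pptx_path, allowed):
    """Typefaces present in the deck but absent from the allow-list, sorted."""
    return sorted(typefaces_in(pptx_path) - set(allowed))
```

An empty list from `unexpected_faces` is the Python equivalent of the grep showing only design-system fonts.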
|
||||
|
||||
**Common defect:** export script writes `<a:latin>` only. Chinese runs
|
||||
have no `<a:ea>` directive → PowerPoint picks the OS default
|
||||
(Microsoft JhengHei on Windows, Hiragino Sans on Mac). Result: Chinese
|
||||
characters in the wrong serif/sans family.
|
||||
|
||||
Fix: when adding a run with mixed-language content, set every slot that
applies (`a:latin`, `a:ea`, `a:cs`).
|
||||
|
||||
```python
from pptx.oxml.ns import qn


def set_run_fonts(run, latin: str | None = None, ea: str | None = None, cs: str | None = None):
    """Set the a:latin / a:ea / a:cs typeface slots on a run (annotations need Python 3.10+)."""
    rPr = run._r.get_or_add_rPr()
    # Missing slots are created in latin, ea, cs order, matching the original code.
    for tag, typeface in (('a:latin', latin), ('a:ea', ea), ('a:cs', cs)):
        if not typeface:
            continue  # slot not requested; leave whatever is there
        el = rPr.find(qn(tag))
        if el is None:
            el = rPr.makeelement(qn(tag), {})
            rPr.append(el)
        el.set('typeface', typeface)
```
|
||||
|
||||
PptxGenJS sets all three by default; raw XML injection or python-pptx
|
||||
without explicit `ea` slot does not.
|
||||
|
||||
## Layer 5 — Italic + script interaction
|
||||
|
||||
🚨 **`italic=True` is a Latin-script feature.** Apply it only to runs
|
||||
whose characters belong to scripts where italic is part of the writing
|
||||
tradition (Latin, Cyrillic, Greek). For everything else — CJK, Arabic,
|
||||
Hebrew, Devanagari, Thai, Khmer — PowerPoint synthesizes a slanted
|
||||
bitmap that looks mechanically deformed. The chain of failures, using
|
||||
CJK as the canonical example:
|
||||
|
||||
1. `<a:latin>` slot has Playfair Display Italic (a Latin-only font).
|
||||
2. The CJK characters in the run have no glyph in Playfair → PowerPoint
|
||||
substitutes a system CJK font.
|
||||
3. The substituted CJK font is forced into `italic=True` → since no
|
||||
real CJK italic exists, PowerPoint synthesizes a slanted bitmap →
|
||||
characters look mechanically deformed.
|
||||
|
||||
The same pattern triggers for Arabic, Hebrew, Devanagari, and Thai —
|
||||
none of these scripts has an italic tradition, and faking it produces
|
||||
a slant that's visually broken.
|
||||
|
||||
**Rule:** italic only applies to runs whose primary script supports it
|
||||
(Latin / Cyrillic / Greek). Indicate emphasis on other scripts via:
|
||||
|
||||
- color tone (`COLOR_INK_60` for muted, full ink for emphasis)
|
||||
- weight contrast (Regular 400 vs. Bold 700)
|
||||
- a script-native italic variant **only if one actually ships** — most
|
||||
don't
|
||||
|
||||
Practical implementation:
|
||||
|
||||
```python
from pptx.util import Pt

# Unicode ranges where italic should be suppressed.
# Principle: include scripts whose writing tradition has no italic style.
# Synthesized italic on these scripts produces a slanted bitmap that looks
# mechanically deformed.
NO_ITALIC_RANGES = (
    (0x3400, 0x9FFF),  # CJK Unified Ideographs
    (0xF900, 0xFAFF),  # CJK Compatibility Ideographs
    (0x3040, 0x30FF),  # Hiragana + Katakana
    (0xAC00, 0xD7AF),  # Hangul Syllables
    (0x0590, 0x05FF),  # Hebrew
    (0x0600, 0x06FF),  # Arabic
    (0x0750, 0x077F),  # Arabic Supplement
    # Indic scripts: none have an italic tradition; PowerPoint synthesizes
    # a fake slant on all of them. Add new ranges here when the deck mixes
    # in additional scripts (e.g. Sinhala U+0D80–U+0DFF).
    (0x0900, 0x097F),  # Devanagari (Hindi, Marathi, Sanskrit)
    (0x0980, 0x09FF),  # Bengali
    (0x0A00, 0x0A7F),  # Gurmukhi (Punjabi)
    (0x0A80, 0x0AFF),  # Gujarati
    (0x0B00, 0x0B7F),  # Oriya
    (0x0B80, 0x0BFF),  # Tamil
    (0x0C00, 0x0C7F),  # Telugu
    (0x0C80, 0x0CFF),  # Kannada
    (0x0D00, 0x0D7F),  # Malayalam
    # Southeast Asian
    (0x0E00, 0x0E7F),  # Thai
    (0x0E80, 0x0EFF),  # Lao
    (0x1780, 0x17FF),  # Khmer
)


def has_no_italic_script(text: str) -> bool:
    """True if any character falls in a range where italic must be suppressed."""
    return any(
        any(lo <= ord(c) <= hi for lo, hi in NO_ITALIC_RANGES)
        for c in text
    )


def add_run_with_italic_safety(p, text, *, latin_face: str, ea_face: str,
                               cs_face: str | None, size_pt: int,
                               italic: bool, **kwargs):
    """Drop italic if the run contains characters from scripts without italic tradition.

    Args:
        latin_face: Font for Latin / Cyrillic / Greek runs (a:latin slot).
        ea_face: Font for CJK runs (a:ea slot).
        cs_face: Font for complex scripts (Arabic, Hebrew, Devanagari,
            Thai, etc.; a:cs slot). Pass None when the run contains no
            complex-script characters; set_run_fonts skips the slot.
        **kwargs: accepted for call-site compatibility; unused here.
    """
    r = p.add_run()
    r.text = text
    r.font.size = Pt(size_pt)
    r.font.italic = italic and not has_no_italic_script(text)
    # set_run_fonts is the Layer 4 helper defined earlier in this file.
    set_run_fonts(r, latin=latin_face, ea=ea_face, cs=cs_face)
    return r
```
|
||||
|
||||
For mixed-script runs (e.g. `"In <em>2026</em> 開始"`), split into
|
||||
multiple runs at language boundaries so the italic attribute can apply
|
||||
to the Latin run only.
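
Splitting at script boundaries can be sketched as below. `italic_ok` and `split_by_script` are hypothetical helpers, and the range table is a shortened copy of `NO_ITALIC_RANGES` so the example stands alone. Neutral characters such as spaces, digits, and ASCII punctuation fall outside the ranges and therefore stay with the italic-capable segment.

```python
from itertools import groupby

# Shortened copy of the NO_ITALIC_RANGES table, enough for a CJK example.
NO_ITALIC_RANGES = (
    (0x3040, 0x30FF),  # Hiragana + Katakana
    (0x3400, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul Syllables
)


def italic_ok(ch):
    """True when the character lies outside every no-italic range."""
    code_point = ord(ch)
    return not any(lo <= code_point <= hi for lo, hi in NO_ITALIC_RANGES)


def split_by_script(text):
    """Split text into (segment, italic_ok) pairs at script boundaries."""
    return [("".join(chars), ok) for ok, chars in groupby(text, key=italic_ok)]


segments = split_by_script("In 2026 開始")
# → [("In 2026 ", True), ("開始", False)]
```

For text that came from an `<em>` span, emit one run per segment and pass `italic=True` only where the second element is `True`.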
|
||||
|
||||
## Beyond CJK — other scripts
|
||||
|
||||
The five layers above are written in CJK examples because that's the
|
||||
most common pairing in Open Design today, but the same machinery
|
||||
applies to other scripts. Quick reference:
|
||||
|
||||
| Script family | XML slot | Italic OK? | Most common defect | Recommended faces |
| --- | --- | --- | --- | --- |
| Latin (en, de, es, vi…) | `a:latin` | ✅ | Vietnamese Extended diacritics dropped → fallback Calibri mid-paragraph | Be Vietnam Pro, IBM Plex Sans, Source Sans 3 |
| Cyrillic (ru, uk, bg) | `a:latin` | ✅ | Display fonts (Playfair, Source Serif) lack Cyrillic → fallback Calibri | Inter, IBM Plex Sans, Roboto |
| Greek (el) | `a:latin` | ✅ | Same as Cyrillic: display faces missing Greek → fallback | Inter, IBM Plex Sans |
| CJK (zh, ja, ko) | `a:ea` | ❌ | Variable-font trap (Layer 3); missing `a:ea` slot → fallback Microsoft JhengHei | Noto Sans CJK *, Source Han Sans, IBM Plex Sans JP |
| Arabic / Hebrew / Persian | `a:cs` | ❌ | `<a:rtl val="1"/>` not set → text direction breaks; kashida changes width | Noto Naskh Arabic, IBM Plex Sans Arabic, Amiri |
| Devanagari / Bengali | `a:cs` | ❌ | PowerPoint defaults to Mangal/Vrinda (low fidelity); cluster shaping bumps line height | Noto Sans Devanagari, Mukta, Hind |
| Thai / Lao / Khmer | `a:cs` | ❌ | No inter-word spaces → PowerPoint's break engine produces poor wraps; tone marks bump line height | Noto Sans Thai, Sarabun, Noto Sans Khmer |
|
||||
|
||||
For RTL scripts (Arabic / Hebrew / Persian), set both `<a:cs typeface=…>`
and `<a:rtl val="1"/>` on the run's `rPr`. Right-alignment, bidi text
flow, and chrome / footer mirroring are out of scope for `verify_layout.py`
today and need manual review; see the RTL discipline scope note below.
|
||||
|
||||
> **RTL discipline scope.** Full RTL support is roughly 15–20% of the
|
||||
> font + layout discipline surface area: Unicode TR9 bidi resolution,
|
||||
> chrome / footer / page-number mirroring, kashida (Arabic
|
||||
> elongation) interaction with line-fill, and right-anchored
|
||||
> alignment. This skill covers the typeface + slot mechanics only;
|
||||
> bidi and mirroring are flagged for a Tier 2 `rtl-discipline.md`
|
||||
> follow-up when fa / ar / he usage volume justifies the investment.
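
The run-level part of the RTL rule can be checked mechanically. The sketch below uses only the standard library and scans one slide's XML for Arabic or Hebrew runs whose `rPr` lacks `<a:rtl>`. The function names and the sample XML are illustrative assumptions, and the character test covers only the basic Hebrew and Arabic blocks.

```python
import xml.etree.ElementTree as ET

A = "http://schemas.openxmlformats.org/drawingml/2006/main"
NS = {"a": A}


def is_rtl_text(s):
    """Hebrew (U+0590-05FF) or Arabic (U+0600-06FF) characters present."""
    return any(0x0590 <= ord(c) <= 0x06FF for c in s)


def runs_missing_rtl(slide_xml):
    """Return the text of every RTL-script run that has no enabled <a:rtl>."""
    root = ET.fromstring(slide_xml)
    missing = []
    for run in root.iter(f"{{{A}}}r"):
        text = "".join(t.text or "" for t in run.findall("a:t", NS))
        if not is_rtl_text(text):
            continue
        rpr = run.find("a:rPr", NS)
        rtl = rpr.find("a:rtl", NS) if rpr is not None else None
        if rtl is None or rtl.get("val", "1") not in ("1", "true"):
            missing.append(text)
    return missing


# Hand-written sample: one correct Arabic run, one Hebrew run without a:rtl.
SAMPLE = (
    f'<a:p xmlns:a="{A}">'
    '<a:r><a:rPr lang="ar-SA"><a:cs typeface="Noto Naskh Arabic"/>'
    '<a:rtl val="1"/></a:rPr><a:t>\u0645\u0631\u062d\u0628\u0627</a:t></a:r>'
    '<a:r><a:rPr lang="he-IL"><a:cs typeface="Noto Sans Hebrew"/></a:rPr>'
    '<a:t>\u05e9\u05dc\u05d5\u05dd</a:t></a:r>'
    '<a:r><a:rPr lang="en-US"/><a:t>Hello</a:t></a:r>'
    "</a:p>"
)
missing = runs_missing_rtl(SAMPLE)
print(missing)
```

Feed it the contents of each `ppt/slides/slideN.xml` from the unzipped deck. It says nothing about paragraph alignment or mirroring, which still need manual review.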
|
||||
|
||||
## Line height per script
|
||||
|
||||
The `Cursor.take(gap=Inches(0.12))` default suits 14pt Latin body copy.
|
||||
Other scripts need more vertical headroom because of stacked diacritics,
|
||||
matras, or tone marks:
|
||||
|
||||
| Script | Recommended `gap` at 14pt body |
| --- | --- |
| Latin (no Vietnamese Extended) | `Inches(0.12)` (default) |
| Latin (with Vietnamese Extended ếẫỗ) | `Inches(0.14)` |
| CJK | `Inches(0.14)`–`Inches(0.16)` |
| Devanagari / Bengali (matras / conjuncts) | `Inches(0.16)`–`Inches(0.18)` |
| Thai / Lao / Khmer (tone marks above) | `Inches(0.16)`–`Inches(0.18)` |
| Arabic / Hebrew | `Inches(0.13)` |
|
||||
|
||||
When the deck mixes scripts, take the max. Line breathing room is judged
visually: an under-spaced Thai run in an otherwise Latin deck reads as
"the Thai slide is broken".
|
||||
|
||||
> **Source for these numbers.** Measured against Noto Sans / Noto
|
||||
> Serif / IBM Plex line-height at 14pt body with full diacritic stacks
|
||||
> (e.g. Devanagari conjuncts ष्ट्र, Thai 4-mark sequences ก़ํ้, stacked
|
||||
> Vietnamese ỗ). Adjust downward for condensed faces (Inter Condensed,
|
||||
> Noto Sans Condensed) and upward for display sizes ≥ 24pt where
|
||||
> diacritic ratios grow.
|
||||
|
||||
## Audit checklist
|
||||
|
||||
After re-export, confirm all five layers:
|
||||
|
||||
- [ ] Layer 1: Each CSS class in the HTML maps to the intended family
|
||||
in the export script's font table.
|
||||
- [ ] Layer 2: All declared families exist on the rendering machine
|
||||
(`fc-list | grep`).
|
||||
- [ ] Layer 3: No variable-font filename pretending to be a static
|
||||
family. `~/Library/Fonts/` shows multi-file static families for
|
||||
every face used.
|
||||
- [ ] Layer 4: `unzip + grep typeface` returns only the design-system
|
||||
fonts. No `Microsoft JhengHei` / `Calibri` / `Arial` / `Georgia`
|
||||
/ `Consolas` residue.
|
||||
- [ ] Layer 5: No run from a no-italic script (CJK / Arabic / Hebrew /
|
||||
Devanagari / Thai) has `italic=True` set with a Latin italic
|
||||
face in the `<a:latin>` slot.
|
||||
- [ ] **Beyond CJK:** RTL slides set `<a:rtl val="1"/>` on the run's `rPr`. Verify with:
|
||||
|
||||
```bash
|
||||
unzip -o deck.pptx -d /tmp/audit
|
||||
grep -h '<a:rtl' /tmp/audit/ppt/slides/*.xml | sort -u
|
||||
# Expect a hit for every fa / ar / he slide; empty output on
|
||||
# an RTL deck means the directionality wasn't propagated.
|
||||
```
|
||||
|
||||
- [ ] **Beyond CJK:** Cursor `gap` is bumped per the line-height table above when the
      deck includes CJK, Vietnamese, Devanagari, Thai, or Khmer content.
|
||||
|
||||
If all five pass and the user still reports "the type looks wrong",
|
||||
ask for a screenshot pointing at the specific glyph or word — the
|
||||
remaining bugs are usually license-restricted fonts not embedded into
|
||||
the file (see `SKILL.md` Step 5 verification).
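
The Layer 4 check (`unzip + grep typeface`) can also be scripted. This is a hedged sketch using only `zipfile` and `re`; the function name, the regular expression, and the in-memory stand-in archive are assumptions for illustration, while the residue list is the one from the checklist.

```python
import io
import re
import zipfile

TYPEFACE_RE = re.compile(r'<a:(latin|ea|cs)\b[^>]*?\btypeface="([^"]*)"')
RESIDUE = {"Microsoft JhengHei", "Calibri", "Arial", "Georgia", "Consolas"}


def typefaces_in_pptx(pptx_file):
    """Return {(slot, typeface)} found in ppt/slides/*.xml (path or file object)."""
    found = set()
    with zipfile.ZipFile(pptx_file) as zf:
        for name in zf.namelist():
            if name.startswith("ppt/slides/") and name.endswith(".xml"):
                found.update(TYPEFACE_RE.findall(zf.read(name).decode("utf-8")))
    return found


# Tiny in-memory archive standing in for deck.pptx.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "ppt/slides/slide1.xml",
        '<a:rPr><a:latin typeface="Playfair Display"/>'
        '<a:ea typeface="Noto Serif TC"/></a:rPr>'
        '<a:rPr><a:latin typeface="Calibri"/></a:rPr>',
    )
buf.seek(0)

found = typefaces_in_pptx(buf)
residue_hits = sorted(face for _, face in found if face in RESIDUE)
print(residue_hits)
```

An empty `residue_hits` list corresponds to the checklist item passing. Theme and master XML are not scanned here, so fonts inherited from the theme need a separate look.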
|
||||
@@ -0,0 +1,371 @@
|
||||
# Footer-Rail + Cursor-Flow Layout Discipline
|
||||
|
||||
The full rule set referenced from `SKILL.md` Step 4. Read this when the deck has slide types beyond simple title-+-body or when you're building the re-export script from scratch.
|
||||
|
||||
> **How to use this file.** Skim §1-3 once to internalize the rules
|
||||
> (constants, `Cursor`, hero budget centering). Then jump to the slide-type
|
||||
> snippet that matches what you're building — pipeline, two-column,
|
||||
> observation grid, etc. — and adapt. The file is meant to be navigated,
|
||||
> not read end-to-end.
|
||||
|
||||
## 1. Constants — define once at the top of the export script
|
||||
|
||||
```python
from pptx.util import Inches, Pt, Emu
from pptx.dml.color import RGBColor

# Canvas (16:9). Override only if the deck explicitly targets 4:3 or 1:1.
CANVAS_W = Inches(13.333)
CANVAS_H = Inches(7.5)

# Margins
MARGIN_X      = Inches(0.6)    # left / right symmetric
MARGIN_TOP    = Inches(0.5)    # below the chrome row
CONTENT_LEFT  = MARGIN_X
CONTENT_RIGHT = CANVAS_W - MARGIN_X
CONTENT_W     = CONTENT_RIGHT - CONTENT_LEFT

# Vertical rails: the load-bearing pair
CHROME_TOP    = Inches(0.32)   # top metadata row
CHROME_H      = Inches(0.20)
CONTENT_TOP   = MARGIN_TOP     # cursor starts here on content slides
CONTENT_MAX_Y = Inches(6.70)   # NOTHING in content area may cross
FOOTER_TOP    = Inches(6.85)   # foot row pinned here
FOOTER_H      = Inches(0.22)

# Theme colors: derive from the HTML :root block, do not invent
COLOR_INK      = RGBColor(0x0a, 0x1f, 0x3d)   # dark-theme background, light-theme text
COLOR_PAPER    = RGBColor(0xf1, 0xf3, 0xf5)   # light-theme background, dark-theme text
COLOR_INK_60   = RGBColor(0x68, 0x77, 0x8e)   # ink at 60% opacity (precomputed)
COLOR_PAPER_60 = RGBColor(0x9b, 0xa0, 0xa6)   # paper at 60% opacity (precomputed)

# Typography stacks. EN italic uses serif-en; CJK never italicizes.
FONT_SERIF_EN = "Playfair Display"
FONT_SERIF_FB = "Source Serif 4"
FONT_SERIF_ZH = "Noto Serif TC"
FONT_SANS_ZH  = "Noto Sans TC"
FONT_MONO     = "IBM Plex Mono"
```
|
||||
|
||||
## 2. The Cursor primitive
|
||||
|
||||
Used on all non-hero slides. The cursor advances down the slide and refuses to cross `CONTENT_MAX_Y`.
|
||||
|
||||
```python
class Cursor:
    def __init__(self, y_start=CONTENT_TOP, cap=CONTENT_MAX_Y):
        self.y = y_start
        self.cap = cap
        self.history = []   # list of (top, height, label) for debugging

    def take(self, h, gap=Inches(0.12), label=""):
        top = self.y
        # The new y includes the trailing gap, so the gap counts against the rail.
        self.y = top + h + gap
        self.history.append((top, h, label))
        if self.y > self.cap:
            raise OverflowError(
                f"Cursor exceeded rail at '{label}': "
                f"y={self.y} cap={self.cap}; "
                f"history={self.history}"
            )
        return top

    def remaining(self):
        return self.cap - self.y
```
|
||||
|
||||
Usage:
|
||||
|
||||
```python
c = Cursor()
add_kicker(slide,   top=c.take(Inches(0.18), label="kicker"))
add_h_xl(slide,     top=c.take(Inches(1.0),  label="h-xl"))
add_lead(slide,     top=c.take(Inches(0.8),  label="lead"))
add_pipeline(slide, top=c.take(Inches(2.6),  label="pipeline"))
```
|
||||
|
||||
> **Per-script `gap` tuning.** The default `Inches(0.12)` matches 14pt
|
||||
> Latin body copy. Decks that include CJK, Devanagari, Thai, or
|
||||
> Khmer need more breathing room — line clusters and stacked tone
|
||||
> marks bump the rendered line height. Pass an explicit `gap=` per
|
||||
> block, or override the `Cursor` default at the top of your export.
|
||||
> The full per-script table is in
|
||||
> [`font-discipline.md` § Line height per script](font-discipline.md).
|
||||
>
|
||||
> **Detecting the highest-demand script in a mixed deck.** A deck
|
||||
> can mix `en` slides with `th` slides — locale alone isn't the
|
||||
> signal. Scan each slide's text against the Unicode ranges in
|
||||
> `font-discipline.md` Layer 5's `NO_ITALIC_RANGES` (extend with the
|
||||
> Vietnamese Extended block U+1E00–U+1EFF for ếẫỗ), record the
|
||||
> per-slide max-gap, and instantiate the slide's `Cursor` with that
|
||||
> value. For a uniform deck-wide setting, take the max across all
|
||||
> slides.
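
The per-slide scan described in the note above could look like the following sketch. Gap values are the upper ends of the per-script table in `font-discipline.md`, expressed as plain inches. The table layout, the function names, and the exact Unicode block boundaries are assumptions for illustration.

```python
DEFAULT_GAP_IN = 0.12  # 14pt Latin body copy

# (first code point, last code point, gap in inches); assumed block boundaries.
SCRIPT_GAPS_IN = [
    (0x1E00, 0x1EFF, 0.14),  # Latin Extended Additional (Vietnamese)
    (0x0590, 0x06FF, 0.13),  # Hebrew, Arabic
    (0x3040, 0x30FF, 0.16),  # hiragana, katakana
    (0x4E00, 0x9FFF, 0.16),  # CJK ideographs
    (0xAC00, 0xD7AF, 0.16),  # Hangul syllables
    (0x0900, 0x09FF, 0.18),  # Devanagari, Bengali
    (0x0E00, 0x0EFF, 0.18),  # Thai, Lao
    (0x1780, 0x17FF, 0.18),  # Khmer
]


def gap_for_text(text):
    """Largest gap demanded by any character in the text."""
    gap = DEFAULT_GAP_IN
    for ch in text:
        cp = ord(ch)
        for lo, hi, script_gap in SCRIPT_GAPS_IN:
            if lo <= cp <= hi:
                gap = max(gap, script_gap)
    return gap


def gap_for_slide(texts):
    """Per-slide max across all text blocks on the slide."""
    return max((gap_for_text(t) for t in texts), default=DEFAULT_GAP_IN)


slide_texts = ["Quarterly review", "\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35"]
print(gap_for_slide(slide_texts))
```

Wrap the result in `Inches(...)` before handing it to `Cursor.take(gap=...)`. For a deck-wide setting, take the max of `gap_for_slide` over all slides.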
|
||||
|
||||
If a slide raises `OverflowError`, fix one of three things:
|
||||
|
||||
1. **Reduce block height** — the box was generously sized; tighten to actual text height.
|
||||
2. **Reduce gap** — the inter-block gap is excessive; trim from `0.18"` to `0.10"`.
|
||||
3. **Split the slide** — the content genuinely doesn't fit; this is a design problem, not a layout problem.
|
||||
|
||||
Don't "solve" it by raising `CONTENT_MAX_Y`. The rail exists for a reason — content that crosses it will overlap the footer at full-screen presentation.
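
To see which block trips the rail before deciding between the three fixes, a dry run with heights only is enough. The sketch below re-implements the cursor with plain integer EMU so it runs without python-pptx. `inches`, `first_overflow`, and the sample block heights are hypothetical.

```python
EMU_PER_INCH = 914400


def inches(x):
    """Stand-in for pptx.util.Inches: inches to integer EMU."""
    return int(round(x * EMU_PER_INCH))


CONTENT_TOP = inches(0.5)
CONTENT_MAX_Y = inches(6.70)


class Cursor:
    """Same logic as the Cursor primitive above, minus the python-pptx types."""

    def __init__(self, y_start=CONTENT_TOP, cap=CONTENT_MAX_Y):
        self.y = y_start
        self.cap = cap
        self.history = []

    def take(self, h, gap=inches(0.12), label=""):
        top = self.y
        self.y = top + h + gap
        self.history.append((top, h, label))
        if self.y > self.cap:
            raise OverflowError(f"Cursor exceeded rail at '{label}'")
        return top


def first_overflow(blocks):
    """Return (label, overshoot in inches) for the first failing block, else None."""
    c = Cursor()
    for label, h_in in blocks:
        try:
            c.take(inches(h_in), label=label)
        except OverflowError:
            return label, round((c.y - c.cap) / EMU_PER_INCH, 2)
    return None


result = first_overflow([
    ("kicker", 0.18), ("h-xl", 1.0), ("lead", 0.8),
    ("pipeline", 2.6), ("note", 1.6),
])
print(result)
```

The overshoot tells you how much height or gap has to go. If it is larger than what tightening can recover, the slide needs splitting.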
|
||||
|
||||
## 3. Hero slides — budget centering, not cursor flow
|
||||
|
||||
Hero slides (cover, chapter intros, big-quote pages) are vertically centered. The cursor model would put them at the top with empty space below — visually wrong.
|
||||
|
||||
```python
def hero_layout(blocks):
    """
    blocks: list of (height, gap_after) tuples in top-to-bottom reading order.
    Returns a Cursor whose y_start is computed so the stack is centered.
    """
    total_h = sum(h + g for h, g in blocks)
    # Floor division keeps the offset an integer EMU value.
    y_start = (CANVAS_H - total_h) // 2
    # Cap at the content rail so a too-tall stack raises before it reaches
    # the footer row pinned at FOOTER_TOP.
    return Cursor(y_start=y_start, cap=CONTENT_MAX_Y)
```
|
||||
|
||||
Hero usage:
|
||||
|
||||
```python
# Plan the stack first.
HERO_BLOCKS = [
    (Inches(0.18), Inches(0.30)),   # kicker
    (Inches(1.50), Inches(0.20)),   # h-hero
    (Inches(0.45), Inches(0.40)),   # h-sub
    (Inches(0.70), Inches(0.30)),   # lead
    (Inches(0.20), Inches(0.00)),   # meta-row
]
c = hero_layout(HERO_BLOCKS)
for (h, g), block_fn in zip(HERO_BLOCKS, [k_kicker, k_hero, k_sub, k_lead, k_meta]):
    block_fn(slide, top=c.take(h, gap=g))
```
|
||||
|
||||
The pattern reads as: "list each block's actual height, then center the entire stack". One source of truth, no manual `MARGIN_TOP`.
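
As a worked check of the budget arithmetic, here is the same hero stack in plain inches. `center_stack` is a hypothetical helper written for this illustration; it mirrors the centering step and returns each block's top.

```python
CANVAS_H_IN = 7.5
CONTENT_MAX_Y_IN = 6.70

# (height, gap_after) in inches, same values as HERO_BLOCKS above.
HERO_BLOCKS_IN = [
    (0.18, 0.30),  # kicker
    (1.50, 0.20),  # h-hero
    (0.45, 0.40),  # h-sub
    (0.70, 0.30),  # lead
    (0.20, 0.00),  # meta-row
]


def center_stack(blocks_in, canvas_h=CANVAS_H_IN):
    """Return (list of block tops, y after the last block), all in inches."""
    total = sum(h + g for h, g in blocks_in)
    y = (canvas_h - total) / 2
    tops = []
    for h, g in blocks_in:
        tops.append(round(y, 3))
        y += h + g
    return tops, round(y, 3)


tops, end_y = center_stack(HERO_BLOCKS_IN)
print(tops, end_y)
```

The stack totals 4.23", so it starts about 1.635" from the top and ends about 5.865" down, which leaves it clear of the 6.70" content rail.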
|
||||
|
||||
## 4. Footer is always pinned, never advanced
|
||||
|
||||
Don't route the footer through the cursor — it has its own rail.
|
||||
|
||||
```python
def add_footer(slide, left_text, right_text, theme="dark"):
    color = COLOR_PAPER_60 if theme == "dark" else COLOR_INK_60
    half_w = CONTENT_W // 2   # floor division keeps EMU values integral
    add_text(slide,
             left=CONTENT_LEFT, top=FOOTER_TOP,
             width=half_w, height=FOOTER_H,
             text=left_text, font=FONT_MONO, size_pt=9,
             color=color, align="left", letter_spacing=2.0)
    add_text(slide,
             left=CANVAS_W // 2, top=FOOTER_TOP,
             width=half_w, height=FOOTER_H,
             text=right_text, font=FONT_MONO, size_pt=9,
             color=color, align="right", letter_spacing=2.0)
```
|
||||
|
||||
`add_chrome` is the same idea pinned at `CHROME_TOP`. Both rails sit *outside* the content area, so they never collide with the cursor.
|
||||
|
||||
## 5. Box height ≠ text height — but tight is better than loose
|
||||
|
||||
PowerPoint draws shape bounds visibly when:
|
||||
|
||||
- Two shapes overlap (selection halos in editor, faint anti-alias seam in presentation mode).
|
||||
- A shape with a fill or border crosses the rail.
|
||||
- Z-order conflicts cause one shape to clip another.
|
||||
|
||||
So even when the *text* fits within the content area, an oversized *box* can intrude. Tighten box height to:
|
||||
|
||||
```
box_h = (n_lines * line_height_pt + 2 * pad_pt) / 72
```
|
||||
|
||||
where `pad_pt` is 2–4 pt (≈ 0.03–0.05"). For multi-line text frames, set `text_frame.word_wrap = True` and don't pad vertically — let the text frame's intrinsic metrics size itself.
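
A small worked example of the formula. The 1.2 line-spacing multiplier is an assumption for illustration, not a value from this document; substitute the line height your faces actually render at.

```python
def estimate_box_height_in(n_lines, font_pt, line_spacing=1.2, pad_pt=3.0):
    """Box height in inches for n_lines of text at font_pt.

    line_spacing is an assumed multiplier (line height = font size * 1.2);
    pad_pt sits in the 2-4 pt range described above.
    """
    line_height_pt = font_pt * line_spacing
    return (n_lines * line_height_pt + 2 * pad_pt) / 72


two_line_lead = estimate_box_height_in(n_lines=2, font_pt=14)
print(f"{two_line_lead:.2f} in")
```

Two lines of 14pt body come out at roughly 0.55", noticeably tighter than a rounded-up 0.8" box.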
|
||||
|
||||
For headline blocks with a known line count, you can also set:
|
||||
|
||||
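```python
from pptx.enum.text import MSO_AUTO_SIZE

tf = shape.text_frame
tf.auto_size = MSO_AUTO_SIZE.SHAPE_TO_FIT_TEXT
```

This only records the autofit setting in the file. python-pptx does not re-measure the shape, so `shape.height` still holds the value you passed until PowerPoint opens the deck and resizes it. Keep feeding the cursor the estimated `box_h` from the formula above.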
|
||||
|
||||
## 6. Italic preservation — only EN serif, never CJK
|
||||
|
||||
The single most common silent regression. HTML `<em>`, `<i>`, and inline `font-style: italic` should all map to `run.font.italic = True`. But:
|
||||
|
||||
- **EN/Latin display copy** (Playfair Display, Source Serif) has a real italic. Use it.
|
||||
- **CJK display copy** (Noto Serif TC, Source Han Serif) has no italic. Synthesizing one produces a mechanically slanted faux italic that looks broken. Skip italic for CJK runs even if the HTML had `<em>` around the CJK text.
|
||||
- **EN body copy** can use sans italic if the body family supports it; if not, swap to serif italic for the duration of the run.
|
||||
|
||||
```python
def add_run(p, text, *, font, size_pt, italic=False, bold=False, color=None):
    r = p.add_run()
    r.text = text
    # If italic is requested, force an EN serif that supports it.
    if italic:
        r.font.name = FONT_SERIF_EN if not _is_cjk(text) else font
        r.font.italic = not _is_cjk(text)
    else:
        r.font.name = font
        r.font.italic = False
    r.font.size = Pt(size_pt)
    r.font.bold = bool(bold)
    if color is not None:
        r.font.color.rgb = color
    return r


def _is_cjk(s):
    # CJK Unified Ideographs plus hiragana / katakana.
    return any('\u4e00' <= c <= '\u9fff' or '\u3040' <= c <= '\u30ff' for c in s)
```
|
||||
|
||||
When walking HTML, detect italic spans:
|
||||
|
||||
```python
from html.parser import HTMLParser


class ItalicSpans(HTMLParser):
    def __init__(self):
        super().__init__()
        self.runs = []     # list of (text, italic_bool)
        self._buf = []
        # One entry per open em / i / span: did that tag switch italic on?
        # Tracking every span keeps a plain </span> from ending an outer <em>.
        self._stack = []

    @property
    def _italic(self):
        return any(self._stack)

    def handle_starttag(self, tag, attrs):
        if tag in ("em", "i"):
            self._flush()
            self._stack.append(True)
        elif tag == "span":
            style = dict(attrs).get("style") or ""
            self._flush()
            self._stack.append("italic" in style)

    def handle_endtag(self, tag):
        if tag in ("em", "i", "span") and self._stack:
            self._flush()
            self._stack.pop()

    def handle_data(self, data):
        self._buf.append(data)

    def close(self):
        super().close()
        self._flush()      # emit text that follows the last tag

    def _flush(self):
        if self._buf:
            self.runs.append(("".join(self._buf), self._italic))
            self._buf = []
```

Call `feed(html)` and then `close()`; `close()` flushes the trailing text into `runs`.
|
||||
|
||||
## 7. Slide-type recipes
|
||||
|
||||
### 7.1 Cover / hero with vertical center
|
||||
|
||||
```python
def slide_cover(prs, *, title, subtitle, lead, meta, chrome_l, chrome_r):
    slide = prs.slides.add_slide(blank_layout)
    paint_bg(slide, COLOR_INK)
    add_chrome(slide, chrome_l, chrome_r, theme="dark")

    blocks = [
        (Inches(0.18), Inches(0.32)),   # kicker
        (Inches(1.50), Inches(0.18)),   # h-hero
        (Inches(0.45), Inches(0.36)),   # h-sub
        (Inches(0.70), Inches(0.30)),   # lead
        (Inches(0.20), Inches(0.00)),   # meta
    ]
    c = hero_layout(blocks)
    add_kicker(slide,   top=c.take(*blocks[0]), text="SOP · Coach Edition")
    add_h_hero(slide,   top=c.take(*blocks[1]), text=title)
    add_h_sub(slide,    top=c.take(*blocks[2]), text=subtitle)
    add_lead(slide,     top=c.take(*blocks[3]), text=lead)
    add_meta_row(slide, top=c.take(*blocks[4]), items=meta)

    add_footer(slide, "主責教練 SOP", "— 2026 —", theme="dark")
```
|
||||
|
||||
### 7.2 Content with pipeline (4–5 step horizontal flow)
|
||||
|
||||
```python
def slide_pipeline(prs, *, kicker, headline, intro, label, steps):
    slide = prs.slides.add_slide(blank_layout)
    paint_bg(slide, COLOR_PAPER)
    add_chrome(slide, "On-Day · Coach Actions", "08 / 14", theme="light")

    c = Cursor()
    add_kicker(slide, top=c.take(Inches(0.18), label="kicker"), text=kicker)
    add_h_xl(slide,   top=c.take(Inches(0.95), label="h-xl"),   text=headline)
    add_lead(slide,   top=c.take(Inches(0.65), label="lead"),   text=intro)
    add_pipeline(slide,
                 top=c.take(Inches(2.30), label="pipeline"),
                 section_label=label,
                 steps=steps,
                 n_cols=len(steps))

    add_footer(slide, "Page 08 · 教練當天行動", "Witness, don't intervene", theme="light")
```
|
||||
|
||||
`add_pipeline` internally lays out N step cards across `CONTENT_W` with `step_h` derived from the longest step's text height. Don't fix `step_h` to a constant — let it grow to fit, and let the cursor's overflow guard catch problems.
|
||||
|
||||
### 7.3 Two-column comparison / concern cards
|
||||
|
||||
```python
def slide_two_col(prs, *, kicker, headline, intro, left, right):
    slide = prs.slides.add_slide(blank_layout)
    paint_bg(slide, COLOR_INK)
    add_chrome(slide, "First-Time Caveats · 首辦提醒", "05 / 14", theme="dark")

    c = Cursor()
    add_kicker(slide, top=c.take(Inches(0.18)), text=kicker)
    add_h_xl(slide,   top=c.take(Inches(0.95)), text=headline)
    add_lead(slide,   top=c.take(Inches(0.55)), text=intro)
    pair_top = c.take(Inches(3.00), label="pair")
    # Floor division keeps the column width an integer EMU value.
    col_w = (CONTENT_W - Inches(0.4)) // 2
    add_concern_card(slide, left=CONTENT_LEFT, top=pair_top,
                     w=col_w, h=Inches(2.9), data=left)
    add_concern_card(slide, left=CONTENT_LEFT + col_w + Inches(0.4), top=pair_top,
                     w=col_w, h=Inches(2.9), data=right)

    add_footer(slide, "Page 05 · 首次辦理特別提醒", "典禮 ≠ 領導日", theme="dark")
```
|
||||
|
||||
Notice the pattern: `c.take(Inches(3.00), label="pair")` reserves 3.0" of vertical space for *the whole pair row*; then the two columns are placed side-by-side at that `top`. The cursor doesn't know about columns, only about row heights.
|
||||
|
||||
### 7.4 Observation grid (3 × 2 cards)
|
||||
|
||||
```python
def slide_obs_grid(prs, *, kicker, headline, intro, cards):
    assert len(cards) == 6
    slide = prs.slides.add_slide(blank_layout)
    paint_bg(slide, COLOR_PAPER)
    add_chrome(slide, "Observation · 觀察筆記", "09 / 14", theme="light")

    c = Cursor()
    add_kicker(slide, top=c.take(Inches(0.18)), text=kicker)
    add_h_xl(slide,   top=c.take(Inches(0.95)), text=headline)
    add_lead(slide,   top=c.take(Inches(0.55)), text=intro)
    grid_top = c.take(Inches(2.40), label="3x2 grid")

    # Two 0.3" gutters between three columns; floor division keeps EMU integral.
    col_w = (CONTENT_W - Inches(0.6)) // 3
    row_h = Inches(1.10)
    for i, card in enumerate(cards):
        col = i % 3
        row = i // 3
        x = CONTENT_LEFT + col * (col_w + Inches(0.3))
        y = grid_top + row * (row_h + Inches(0.20))
        add_obs_card(slide, left=x, top=y, w=col_w, h=row_h, data=card)

    add_footer(slide, "Page 09 · 觀察筆記六項指標", "記錄用 · 不當場評分", theme="light")
```
|
||||
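
The column arithmetic in the two-column and grid recipes follows one pattern: subtract the gutters, divide by the column count, then step by column width plus gutter. A sketch with plain integer EMU follows; `column_lefts` and the round canvas figure of 12192000 EMU are assumptions for illustration.

```python
EMU_PER_INCH = 914400

CONTENT_LEFT = 548640                     # 0.6"
CONTENT_W = 12192000 - 2 * CONTENT_LEFT   # assumed 16:9 canvas width in EMU


def column_lefts(content_left, content_w, n_cols, gutter):
    """Return (col_w, [left of each column]) in integer EMU."""
    col_w = (content_w - gutter * (n_cols - 1)) // n_cols
    lefts = [content_left + i * (col_w + gutter) for i in range(n_cols)]
    return col_w, lefts


# Three columns with 0.3" gutters, as in the observation grid.
grid_w, grid_lefts = column_lefts(CONTENT_LEFT, CONTENT_W, 3, 274320)
# Two columns with a 0.4" gutter, as in the concern-card pair.
pair_w, pair_lefts = column_lefts(CONTENT_LEFT, CONTENT_W, 2, 365760)

print(grid_w / EMU_PER_INCH, [x / EMU_PER_INCH for x in grid_lefts])
```

Because the width is floored, the last column's right edge never passes `CONTENT_LEFT + CONTENT_W`, which keeps the off-canvas check in `verify_layout.py` quiet.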
|
||||
## 8. Common pitfalls and how the discipline catches them
|
||||
|
||||
| Pitfall | How the discipline catches it |
|---|---|
| Hero slide stuck to top | `hero_layout(blocks)` budgets total height and centers automatically |
| Last content block crosses footer | `Cursor.take()` raises `OverflowError` before render |
| Box bounds intrude on rail | tighten `box_h` to text height + 0.05" pad; verifier flags violations |
| Italic gone flat | `add_run(..., italic=True)` swaps to EN serif; CJK skipped |
| Footer text overlaps content | footer pinned at `FOOTER_TOP`, never routed through cursor |
| Chrome row drifts down on long titles | chrome pinned at `CHROME_TOP`, never advanced |
| Off-canvas content | `verify_layout.py` asserts `top + height ≤ CANVAS_H` |
| Mixed font fallback | always pass `font=FONT_*` constant; never let python-pptx pick |
|
||||
@@ -0,0 +1,2 @@
|
||||
__pycache__/
|
||||
*.pyc
|
||||
+134
@@ -0,0 +1,134 @@
|
||||
#!/usr/bin/env python3
"""
Extract every shape on every slide of a .pptx into a JSON dump.

Usage:
    python extract_pptx.py <path/to/deck.pptx>              # prints to stdout
    python extract_pptx.py <path/to/deck.pptx> -o dump.json

The dump captures the *actual* state of the export: text content, position,
size, and per-run typography (font name, size, bold, italic, color). Use this
as the ground truth for the fidelity audit; do not trust the export script's
intent.

Coordinates are reported in inches (rounded to 3 decimals) so they're
human-readable when comparing against rails like CONTENT_MAX_Y = 6.70".
"""
from __future__ import annotations

import argparse
import json
import sys
from pathlib import Path

try:
    from pptx import Presentation
    from pptx.util import Emu
except ImportError:
    sys.stderr.write(
        "python-pptx is required. Install with: pip install python-pptx\n"
    )
    sys.exit(2)
|
||||
|
||||
|
||||
def emu_to_in(emu: int | None) -> float | None:
    if emu is None:
        return None
    return round(emu / 914400, 3)


def color_repr(color) -> str | None:
    """Best-effort color extraction. Returns hex string or None."""
    if color is None:
        return None
    try:
        # ColorFormat.type may be None when no explicit color is set.
        if color.type is None:
            return None
        rgb = color.rgb
        if rgb is None:
            return None
        return f"#{str(rgb).lower()}"
    except (AttributeError, ValueError, TypeError):
        return None
|
||||
|
||||
|
||||
def extract_runs(text_frame) -> list[dict]:
    runs = []
    for para in text_frame.paragraphs:
        for run in para.runs:
            font = run.font
            runs.append({
                "text": run.text,
                "font": font.name,
                "size_pt": float(font.size.pt) if font.size is not None else None,
                "bold": bool(font.bold) if font.bold is not None else None,
                "italic": bool(font.italic) if font.italic is not None else None,
                # Color is independent of font name/size: a run can inherit
                # font from the theme yet set its own color. Color drift is
                # one of the things this audit needs to catch, so don't gate
                # the extraction on unrelated font attributes.
                "color": color_repr(font.color),
            })
    return runs
|
||||
|
||||
|
||||
def extract_shape(shape) -> dict:
    data = {
        "name": shape.name,
        "shape_type": str(shape.shape_type) if shape.shape_type is not None else None,
        "left_in": emu_to_in(shape.left),
        "top_in": emu_to_in(shape.top),
        "width_in": emu_to_in(shape.width),
        "height_in": emu_to_in(shape.height),
    }
    # All four values are needed: bottom uses top + height, right uses left + width.
    if None not in (shape.left, shape.top, shape.width, shape.height):
        data["bottom_in"] = emu_to_in(shape.top + shape.height)
        data["right_in"] = emu_to_in(shape.left + shape.width)
    if shape.has_text_frame:
        tf = shape.text_frame
        data["text"] = tf.text
        data["runs"] = extract_runs(tf)
    return data
|
||||
|
||||
|
||||
def extract_pptx(path: Path) -> dict:
    prs = Presentation(str(path))
    canvas = {
        "width_in": emu_to_in(prs.slide_width),
        "height_in": emu_to_in(prs.slide_height),
    }
    slides = []
    for i, slide in enumerate(prs.slides, 1):
        shapes = [extract_shape(s) for s in slide.shapes]
        slides.append({"index": i, "shapes": shapes})
    return {
        "source": str(path),
        "canvas": canvas,
        "slide_count": len(slides),
        "slides": slides,
    }
|
||||
|
||||
|
||||
def main() -> int:
    ap = argparse.ArgumentParser(description=__doc__.split("\n\n")[0])
    ap.add_argument("path", type=Path, help=".pptx file to extract")
    ap.add_argument("-o", "--output", type=Path, help="write JSON to this path; default stdout")
    args = ap.parse_args()

    if not args.path.exists():
        ap.error(f"file not found: {args.path}")

    data = extract_pptx(args.path)
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    if args.output:
        args.output.write_text(payload, encoding="utf-8")
        sys.stderr.write(f"wrote {args.output} ({len(payload)} bytes, {data['slide_count']} slides)\n")
    else:
        sys.stdout.write(payload)
        sys.stdout.write("\n")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
|
||||
+144
@@ -0,0 +1,144 @@
|
||||
#!/usr/bin/env python3
"""
Verify a re-exported .pptx against footer-rail + canvas-bound invariants.

Usage:
    python verify_layout.py <path/to/deck.pptx>
    python verify_layout.py <path/to/deck.pptx> --content-max-y 6.70 --canvas-h 7.5

Exits 0 on no violations, 1 on any violation. Prints a single block of
violations sorted by slide index, one per line:

    slide 5  shape 'desc-row-B-1' bottom 7.214" crosses footer rail 6.70"
    slide 11 shape 'note-paragraph' bottom 7.542" exceeds canvas 7.50"

Use this as the gate for "this re-export is shippable". Don't claim the audit
is fixed without running this script: the human eye misses 1–2 mm overflow
at zoom-out, the script doesn't.

Footer / chrome shapes are exempt from the content rail. Two heuristics
identify them, in this order:

  1. **By name**: any shape whose name contains "footer", "foot", "chrome",
     "page", or "pagination" (case-insensitive). Use semantic names in your
     export script if you can.
  2. **By position**: any shape whose `top` is at or below the footer-zone
     threshold (default `--footer-zone-top 6.80`). This catches python-pptx's
     auto-generated names like "TextBox 3" when the export script didn't name
     them. The threshold sits ~0.05" above FOOTER_TOP (6.85") so rows pinned
     exactly at FOOTER_TOP are still recognized.
"""
from __future__ import annotations

import argparse
import sys
from pathlib import Path

try:
    from pptx import Presentation
except ImportError:
    sys.stderr.write(
        "python-pptx is required. Install with: pip install python-pptx\n"
    )
    sys.exit(2)
|
||||
|
||||
|
||||
FOOTER_NAME_HINTS = ("footer", "foot", "chrome", "page", "pagination")
EPS_IN = 0.005  # ignore sub-pixel overflows (~0.13mm)


def is_footer_by_name(name: str) -> bool:
    n = (name or "").lower()
    return any(hint in n for hint in FOOTER_NAME_HINTS)


def emu_to_in(emu: int | None) -> float:
    return (emu or 0) / 914400
|
||||
|
||||
|
||||
def verify(path: Path, content_max_y: float, canvas_w: float, canvas_h: float,
           footer_zone_top: float) -> list[str]:
    prs = Presentation(str(path))
    violations: list[str] = []

    actual_w = emu_to_in(prs.slide_width)
    actual_h = emu_to_in(prs.slide_height)
    if abs(actual_w - canvas_w) > EPS_IN or abs(actual_h - canvas_h) > EPS_IN:
        violations.append(
            f"canvas mismatch: file is {actual_w:.3f}\" x {actual_h:.3f}\", "
            f"expected {canvas_w}\" x {canvas_h}\""
        )

    for i, slide in enumerate(prs.slides, 1):
        for shape in slide.shapes:
            if shape.top is None or shape.height is None:
                continue
            top = emu_to_in(shape.top)
            left = emu_to_in(shape.left)
            bottom = top + emu_to_in(shape.height)
            right = left + emu_to_in(shape.width)
            name = shape.name or "<unnamed>"

            # Off-canvas (hard fail for any shape).
            if bottom > canvas_h + EPS_IN:
                violations.append(
                    f"slide {i:<2} shape '{name}' bottom {bottom:.3f}\" "
                    f"exceeds canvas {canvas_h}\""
                )
            if right > canvas_w + EPS_IN:
                violations.append(
                    f"slide {i:<2} shape '{name}' right {right:.3f}\" "
                    f"exceeds canvas width {canvas_w}\""
                )
            if top < -EPS_IN:
                violations.append(
                    f"slide {i:<2} shape '{name}' top {top:.3f}\" is negative"
                )
            if left < -EPS_IN:
                violations.append(
                    f"slide {i:<2} shape '{name}' left {left:.3f}\" is negative"
                )

            # Footer rail (only enforced on content shapes).
            # Shape is exempt if (a) named like a footer, or
            # (b) pinned at-or-below the footer zone threshold.
            if is_footer_by_name(name) or top >= footer_zone_top - EPS_IN:
                continue
            if bottom > content_max_y + EPS_IN:
                violations.append(
                    f"slide {i:<2} shape '{name}' bottom {bottom:.3f}\" "
                    f"crosses footer rail {content_max_y}\""
                )

    return violations
|
||||
|
||||
|
||||
def main() -> int:
    ap = argparse.ArgumentParser(description=__doc__.split("\n\n")[0])
    ap.add_argument("path", type=Path, help=".pptx file to verify")
    ap.add_argument("--content-max-y", type=float, default=6.70,
                    help="content rail in inches; nothing in content area may cross (default 6.70)")
    ap.add_argument("--canvas-w", type=float, default=13.333,
                    help="expected canvas width in inches (default 13.333 = 16:9)")
    ap.add_argument("--canvas-h", type=float, default=7.5,
                    help="expected canvas height in inches (default 7.5 = 16:9)")
    ap.add_argument("--footer-zone-top", type=float, default=6.80,
                    help="any shape with top >= this is treated as footer/chrome "
                         "(default 6.80; sits 0.05\" above the typical FOOTER_TOP=6.85\")")
    args = ap.parse_args()

    if not args.path.exists():
        ap.error(f"file not found: {args.path}")

    violations = verify(args.path, args.content_max_y, args.canvas_w, args.canvas_h,
                        args.footer_zone_top)
    if violations:
        sys.stderr.write("\n".join(violations) + "\n")
        sys.stderr.write(f"\n{len(violations)} violation(s) found in {args.path}\n")
        return 1
    sys.stderr.write(f"OK: 0 violations across all slides in {args.path}\n")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
|
||||