first-commit
Zakaria
2026-05-04 14:58:14 -04:00
commit a46764fb1b
1210 changed files with 233231 additions and 0 deletions
@@ -0,0 +1,254 @@
---
name: pptx-html-fidelity-audit
description: Audit a python-pptx export against its source HTML deck, identify layout/content drift (footer overflow, cropped content, missing italic/em, lost styling, off-rhythm spacing), and re-export with strict footer-rail + cursor-flow layout discipline. Use this skill whenever the user has a .pptx that was generated from an HTML slide deck and asks to compare/audit/verify/fix the export — including phrases like "compare ppt with html", "fidelity audit", "fix the pptx", "ppt is cut off", "footer overlap", "italic missing in pptx", "re-export the deck", "pptx-html-fidelity-audit", or any case where a python-pptx → HTML round-trip needs verification or repair. Also trigger when the user shows you a deck.html and a deck.pptx side by side and is debugging visual differences.
triggers:
- "pptx fidelity"
- "pptx audit"
- "ppt 跑掉"
- "字型不對"
- "footer overlap"
- "verify pptx"
- "html to pptx"
od:
  mode: utility
  scenario: engineering
---
# PPTX ↔ HTML Fidelity Audit
A repeatable workflow for catching the ways a `python-pptx` export silently drifts from its HTML source — and fixing them with a layout discipline that prevents the same regressions on the next pass.
## When this skill applies
The user has:
- A source HTML slide deck (typically a single-file deck with `<section class="slide">` blocks):
```html
<section class="slide light">
  <div class="chrome">2026 · Q2 review</div>
  <span class="kicker">Pillar 03</span>
  <h2 class="h-xl">Shipping <em>velocity</em> doubled</h2>
  <p class="lead">…</p>
  <div class="foot">page 5 / 14</div>
</section>
```
- A PPTX file generated from that deck via python-pptx (or similar).
- A suspicion (or visible evidence) that the PPTX doesn't match the HTML — text bleeding into the footer, italic words gone flat, hero slides not centered, sections cropped, tag styling lost.
If the user only has *one* of those two artifacts, this skill doesn't apply yet — first generate the missing one, or ask the user to provide it.
## Why this is hard (and why a skill helps)
PPTX is a fixed-canvas, absolute-positioned medium. HTML is a fluid, flow-based medium. A naive python-pptx export pins each block at hand-picked `(top, left)` coordinates, which works for the *first slide it was tested on* and silently fails for every other slide whose content has a different intrinsic height. This produces the most common drift modes:
1. **Footer overflow** — content's `top + height` crosses into the footer row.
2. **Off-canvas content** — bottom of last block exceeds `7.5"` (16:9 canvas).
3. **Italic loss** — `<em>` in HTML never gets `run.font.italic = True`.
4. **Hero slides not centered** — vertical-stack slides use `MARGIN_TOP` instead of computing center.
5. **Box bounds intruding** — the text fits, but the *shape's bounding box* is oversized and visually crosses the rail.
6. **Tag/styling loss** — colored chrome rows, kicker uppercase tracking, mono-vs-serif assignments quietly fall back to defaults.
Every one of these is a *layout discipline* problem, not a content problem. Once you adopt the discipline, they stop happening.
---
## Workflow
The audit is five steps. Don't skip any of them — the discipline only works if the audit produces a real list of issues to drive the re-export. A fix-without-audit pass tends to leave half the issues alive.
### Step 1 — Extract ground truth from the PPTX
Run `scripts/extract_pptx.py <path-to.pptx> > pptx_dump.json`. The script walks every shape on every slide and dumps text, position (`top` / `left`), size (`width` / `height`), and per-run typography (font name, size pt, bold, italic, color). This is the *actual* state of the export — don't trust the export script's intent, trust the dump.
For 14-slide decks, the dump is ~30-60 KB and human-readable.
### Step 2 — Walk the HTML structure
Read the source HTML and enumerate `<section class="slide">` blocks. For each, note:
- The slide's theme (`light` / `dark` / `hero light` / `hero dark`).
- The `chrome` row text (top metadata).
- The `kicker` (small uppercase eyebrow above the headline).
- The headline (h-hero / h-xl / etc.) and any sub-head.
- The body copy and any structured blocks (pipeline steps, cards, pillars, observation cards).
- The `foot` row (bottom metadata).
- Any `<em>` or italic-styled spans — italic is the silent regression.
Map each HTML slide to a PPTX slide index. For decks following the convention "slide 1 = cover, slide N = closing", the mapping is positional.
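One way to do this enumeration mechanically is with the standard-library HTML parser. The sketch below assumes the deck follows the `<section class="slide …">` convention shown earlier. `SlideWalker`, `walk_slides`, and the shape of the returned dicts are illustrative names, not part of the bundled scripts, and italic set through inline `style` attributes is not detected here.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta", "wbr"}


class SlideWalker(HTMLParser):
    """Collect theme, per-block text, and italic spans for each slide section."""

    def __init__(self):
        super().__init__()
        self.slides = []   # one dict per <section class="slide ...">
        self._stack = []   # (tag, classes) of elements open inside the current slide

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "section" and "slide" in classes:
            theme = " ".join(c for c in classes if c != "slide")
            self.slides.append({"theme": theme, "text": {}, "italic": []})
            self._stack = [(tag, classes)]
        elif self._stack and tag not in VOID_TAGS:
            self._stack.append((tag, classes))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; unmatched closers are ignored.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if not text or not self._stack:
            return
        slide = self.slides[-1]
        # Key the text by the block directly under <section>: first class, else tag name.
        tag, classes = self._stack[1] if len(self._stack) > 1 else self._stack[0]
        key = classes[0] if classes else tag
        slide["text"][key] = (slide["text"].get(key, "") + " " + text).strip()
        if any(t in ("em", "i") for t, _ in self._stack):
            slide["italic"].append(text)


def walk_slides(html):
    """Return one summary dict per slide, in document order."""
    walker = SlideWalker()
    walker.feed(html)
    walker.close()
    return walker.slides
```

The list index of each returned dict gives the positional mapping to PPTX slide numbers, and a non-empty `italic` list marks the slides where italic runs must show up in the dump.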
### Step 3 — Build the audit table
For each slide, walk shapes from the dump and check against expected layout rules. Use this exact table format — the severity column is what drives the fix priority:
```
| Slide | Issue | Severity |
|---|---|---|
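| 1 cover | meta-row bottom at 6.95" overlaps the footer (6.7") | 🔴 |
| 5 checklist | row B step description bottom at 7.2" cuts into the footer | 🔴 |
| 8 3E | closing paragraph sits directly on the footer's top edge | 🔴 |
| 9 on-day | step description bottom just touches the footer, no safety margin | 🟠 |
| multiple | em (Playfair italic) not preserved | 🟡 |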
```
Severity rubric:
- 🔴 **critical** — content cropped, text invisible, footer overlap, off-canvas. Must fix.
- 🟠 **high** — content visible but visual hierarchy broken, no breathing room, hero not centered. Should fix.
- 🟡 **medium** — italic/em missing, font fallback wrong, color drift. Fix in this pass.
- 🟢 **low** — minor spacing/alignment, sub-pixel offsets. Note but don't block.
After the table, write a short root-cause section: roughly 90% of the issues usually come from 2-3 systemic causes (e.g. "no footer rail enforced", "hero stacks pinned to MARGIN_TOP instead of centered", "italic never propagated"). Naming the systemic causes makes the re-export script much smaller and more correct.
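As a rough illustration of how dump entries could be turned into table rows, here is a minimal sketch. The record fields (`slide`, `name`, `top_in`, `height_in`, `is_footer`) are assumed for the example and may not match what `extract_pptx.py` actually emits. The 0.05" safety threshold for 🟠 is also an assumption.

```python
CONTENT_MAX_Y_IN = 6.70   # footer rail, inches
CANVAS_H_IN = 7.5         # 16:9 canvas height, inches
SAFETY_IN = 0.05          # hypothetical "breathing room" threshold for 🟠


def classify(shape):
    """Return (issue, severity) for one content shape, or None if it is clean."""
    bottom = shape["top_in"] + shape["height_in"]
    name = shape["name"]
    if bottom > CANVAS_H_IN:
        return f'{name} bottom at {bottom:.2f}" runs off the canvas', "🔴"
    if bottom > CONTENT_MAX_Y_IN:
        return f'{name} bottom at {bottom:.2f}" crosses the footer rail', "🔴"
    if bottom > CONTENT_MAX_Y_IN - SAFETY_IN:
        return f'{name} bottom at {bottom:.2f}" touches the rail, no safety margin', "🟠"
    return None


def audit_table(shapes):
    """Build the Step 3 markdown table from a list of shape records."""
    rows = ["| Slide | Issue | Severity |", "|---|---|---|"]
    for shape in shapes:
        if shape.get("is_footer"):
            continue  # footer shapes are allowed below the rail
        verdict = classify(shape)
        if verdict:
            issue, severity = verdict
            rows.append(f"| {shape['slide']} | {issue} | {severity} |")
    return "\n".join(rows)
```

Geometry only covers the 🔴 and 🟠 rows. Typography findings (🟡) still come from comparing per-run italic and font fields against the HTML.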
### Step 4 — Re-export with footer-rail + cursor-flow layout discipline
This is the load-bearing technique. See `references/layout-discipline.md` for the full rules; the summary:
**Define the rails up front, once, for the whole deck:**
```python
from pptx.util import Inches
CANVAS_W = Inches(13.333) # 16:9
CANVAS_H = Inches(7.5)
MARGIN_X = Inches(0.6)
MARGIN_TOP = Inches(0.5)
CONTENT_MAX_Y = Inches(6.70) # NOTHING in content area may cross this
FOOTER_TOP = Inches(6.85) # footer row pinned here, edge-to-edge
```
> **Customizing the rails.** The defaults above suit a 16:9 canvas with a slim footer. If your design system uses a wider footer or a 4:3 canvas, override these constants in your export script and pass the same values to `verify_layout.py` via `--content-max-y` / `--canvas-h` / `--canvas-w`. See `references/layout-discipline.md` §1 for the full constant table.
**Use a cursor for content blocks instead of pinning each block at an absolute y:**
```python
class Cursor:
    """Advances down the slide; refuses to cross the footer rail."""

    def __init__(self, y_start, cap=CONTENT_MAX_Y):
        self.y = y_start
        self.cap = cap

    def take(self, h, gap=Inches(0.12)):
        # gap: 0.12" is about 8.6pt, a bit over half a line at 14pt;
        # tighten/loosen per design system.
        top = self.y
        self.y = top + h + gap
        # The trailing gap counts against the rail too, so this check is
        # slightly stricter than `top + h <= cap`.
        if self.y > self.cap:
            raise OverflowError(
                f"cursor at {self.y} exceeds footer rail {self.cap}; "
                f"reduce block height or split slide"
            )
        return top
For each slide, instantiate `Cursor(MARGIN_TOP)` and `take(height)` each block in reading order. The slide refuses to render if any block would cross the rail, so overflows become loud build errors instead of silent visual bugs.
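To make the "loud build error" concrete, the sketch below replays the cursor over a list of blocks and starts a new slide when a block would cross the rail. This is one possible way to act on the "split slide" hint, not something the bundled scripts are known to do. The `inches` helper stands in for `pptx.util.Inches` (914,400 EMU per inch) so the example runs without python-pptx, and `paginate` is a hypothetical name.

```python
EMU_PER_INCH = 914_400


def inches(x):
    """Stand-in for pptx.util.Inches: convert inches to whole EMU."""
    return int(round(x * EMU_PER_INCH))


MARGIN_TOP = inches(0.5)
CONTENT_MAX_Y = inches(6.70)


class Cursor:
    """Compact copy of the cursor: advances down, refuses to cross the rail."""

    def __init__(self, y_start, cap=CONTENT_MAX_Y):
        self.y = y_start
        self.cap = cap

    def take(self, h, gap=inches(0.12)):
        top = self.y
        self.y = top + h + gap
        if self.y > self.cap:
            raise OverflowError(f"cursor at {self.y} exceeds footer rail {self.cap}")
        return top


def paginate(blocks):
    """blocks: list of (label, height_emu). Returns a list of slides,
    each a list of (label, top_emu), splitting whenever the rail is hit."""
    slides, current, cursor = [], [], Cursor(MARGIN_TOP)
    for label, height in blocks:
        try:
            top = cursor.take(height)
        except OverflowError:
            if not current:
                raise  # a single block is taller than the content area
            slides.append(current)
            current, cursor = [], Cursor(MARGIN_TOP)
            top = cursor.take(height)  # re-raises if the block cannot fit alone
        current.append((label, top))
    if current:
        slides.append(current)
    return slides
```

A block that cannot fit even on an empty slide still raises, which keeps the original guarantee: nothing is ever placed below the rail silently.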
**Hero (vertically-centered) slides use a budget instead of a cursor:**
```python
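def hero_layout(blocks):
    """blocks = list of (height, gap_after) tuples in reading order."""
    total = sum(h + g for h, g in blocks)
    # Integer division keeps the offset in whole EMU; python-pptx
    # positions are integers, and `/` would produce a float.
    y_start = (CANVAS_H - total) // 2
    return Cursor(y_start)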
```
That single change kills "hero slide content sticks to top" — the most common hero defect.
**Tighten box height to fit text + minimal padding.** PowerPoint reveals shape bounds when they overlap (selection halos, Z-order conflicts), and an oversized box can visually cross the footer rail even when the text inside doesn't. Compute box height from text metrics + ~0.05" pad, not from generous wrappers.
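A minimal sketch of that height computation follows. It uses a crude average-character-width heuristic rather than real font metrics, so treat the result as an estimate. `estimate_box_height_in`, the 1.2 line-height factor, and the 0.5 em average character width are all assumptions for illustration. Full-width CJK text would need an `avg_char_em` closer to 1.0.

```python
import math


def estimate_box_height_in(text, font_pt, box_width_in,
                           line_height=1.2, avg_char_em=0.5, pad_in=0.05):
    """Estimate a text box height in inches: wrapped lines x line height + pad."""
    # 72 points per inch; each character is assumed to be avg_char_em wide.
    chars_per_line = max(1, int(box_width_in * 72 / (font_pt * avg_char_em)))
    lines = sum(
        max(1, math.ceil(len(paragraph) / chars_per_line))
        for paragraph in text.split("\n")
    )
    return lines * font_pt * line_height / 72 + pad_in
```

The returned value can be wrapped in `Inches(...)` and passed to `Cursor.take`, so an underestimate surfaces as a visible clip and an overestimate as an early overflow error.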
**Preserve italic / em explicitly:**
```python
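from pptx.util import Pt

def add_run(p, text, font, size_pt, italic=False, bold=False, color=None):
    """Append a run to paragraph `p` with every style attribute set explicitly."""
    r = p.add_run()
    r.text = text
    r.font.name = font
    r.font.size = Pt(size_pt)
    r.font.italic = italic
    r.font.bold = bold
    if color:
        r.font.color.rgb = color  # expects a pptx.dml.color.RGBColor
    return r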
```
When walking HTML, detect `<em>` / `<i>` / inline style `font-style: italic` and pass `italic=True`. Use the EN serif face (Playfair Display, Source Serif, or fallback Georgia) for italic display copy — the CJK serif typically has no italic and looks broken if you try to italicize it.
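The detection step can be sketched with the standard-library parser: split an inline HTML fragment into `(text, italic)` pairs and feed each pair to `add_run`. `RunSplitter` and `split_runs` are illustrative names. The sketch treats italic as inherited by nested elements and does not handle `font-style: normal` resets or class-based italic rules from a stylesheet.

```python
import re
from html.parser import HTMLParser

ITALIC_STYLE = re.compile(r"font-style\s*:\s*italic", re.IGNORECASE)
VOID_TAGS = {"br", "hr", "img", "wbr"}


class RunSplitter(HTMLParser):
    """Split an inline fragment into (text, italic) runs."""

    def __init__(self):
        super().__init__()
        self.runs = []      # (text, italic) in reading order
        self._italic = []   # one bool per open element: is italic in effect?

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = dict(attrs).get("style") or ""
        own = tag in ("em", "i") or bool(ITALIC_STYLE.search(style))
        inherited = bool(self._italic and self._italic[-1])
        self._italic.append(own or inherited)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._italic:
            self._italic.pop()

    def handle_data(self, data):
        if data:
            self.runs.append((data, bool(self._italic and self._italic[-1])))


def split_runs(fragment):
    """Return [(text, italic), ...] for an inline HTML fragment."""
    parser = RunSplitter()
    parser.feed(fragment)
    parser.close()
    return parser.runs
```

Each pair then maps onto one call such as `add_run(p, text, font, size_pt, italic=italic)`, which keeps the italic decision next to the text it applies to.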
For deeper font issues that the layout rails can't catch — variable-font traps where PowerPoint silently swaps to Calibri / Microsoft JhengHei, missing `<a:ea>` slot causing CJK runs to fall back, fake-italic on Han characters — read `references/font-discipline.md`. The five layers there cover everything `verify_layout.py` can't see.
### Step 5 — Verify post-export
After writing the new `.pptx`, run `scripts/verify_layout.py <path-to.pptx>`. The script:
- Walks every shape on every slide.
- Asserts `top + height ≤ CONTENT_MAX_Y` for content shapes (footer/page-number shapes are allowed below the rail).
- Asserts `top + height ≤ CANVAS_H` for all shapes (no off-canvas).
- Asserts `left + width ≤ CANVAS_W` and `left ≥ 0`.
- Reports violations as a single block: slide index, shape name, observed bottom, rail.
Zero violations is the gate for "this re-export is shippable". Don't claim the audit is fixed without running the verifier: the human eye misses a 1-2 mm overflow at zoom-out; the script doesn't.
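For readers who want to see what such a check involves, here is a standard-library sketch that reads shape geometry straight from the slide XML. It is not the bundled `verify_layout.py`. It only looks at `p:sp` shapes with explicit `a:xfrm` geometry, and it assumes footer shapes can be recognized by a name starting with `foot`, which is a hypothetical convention. The small `tol_in` tolerance is also an assumption, added so rounding of the 13.333" width does not produce false positives.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
}
EMU_PER_INCH = 914_400
SLIDE_RE = re.compile(r"ppt/slides/slide(\d+)\.xml")


def find_violations(pptx, content_max_y_in=6.70, canvas_h_in=7.5,
                    canvas_w_in=13.333, tol_in=0.001):
    """Return (slide_no, shape_name, rule, value_in) tuples for rail breaches."""
    violations = []
    with zipfile.ZipFile(pptx) as z:
        slides = sorted(
            (int(m.group(1)), m.group(0))
            for m in map(SLIDE_RE.fullmatch, z.namelist()) if m
        )
        for slide_no, member in slides:
            root = ET.fromstring(z.read(member))
            for sp in root.iter("{%s}sp" % NS["p"]):
                off = sp.find("p:spPr/a:xfrm/a:off", NS)
                ext = sp.find("p:spPr/a:xfrm/a:ext", NS)
                if off is None or ext is None:
                    continue  # geometry inherited from a layout placeholder
                name_el = sp.find("p:nvSpPr/p:cNvPr", NS)
                name = name_el.get("name", "") if name_el is not None else ""
                left = int(off.get("x")) / EMU_PER_INCH
                right = left + int(ext.get("cx")) / EMU_PER_INCH
                bottom = (int(off.get("y")) + int(ext.get("cy"))) / EMU_PER_INCH
                is_footer = name.lower().startswith("foot")  # assumed naming
                if bottom > canvas_h_in + tol_in:
                    violations.append((slide_no, name, "off-canvas", round(bottom, 3)))
                elif not is_footer and bottom > content_max_y_in + tol_in:
                    violations.append((slide_no, name, "footer-rail", round(bottom, 3)))
                if left < -tol_in or right > canvas_w_in + tol_in:
                    violations.append((slide_no, name, "horizontal", round(right, 3)))
    return violations
```

A wrapper that calls `sys.exit(1)` when the returned list is non-empty would give the same pass/fail gate in CI.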
---
## Output to the user
After Step 5 passes, report:
1. **Audit table** — the table from Step 3.
2. **Root causes** — 1-paragraph systemic explanation.
3. **Fix list** — terse list of what was changed and why (e.g. "hero slides switched to budget centering", "all content blocks routed through Cursor", "em runs explicitly italic").
4. **Verification** — "0 rail violations across N slides, file size X KB".
5. **Path** — absolute path to the re-exported `.pptx`.
The user is reading for two reasons: confirming the visible bugs are fixed, and trusting the systemic fix is right. Cover both.
---
## Bundled resources
- `scripts/extract_pptx.py` — dump every shape on every slide as JSON. Run before the audit. **Important:** also run on the *original* export to compare, and on the *re-exported* one to confirm.
- `scripts/verify_layout.py` — post-export rail checker. Returns nonzero exit code on violations so it slots into a CI pipeline if needed.
- `references/layout-discipline.md` — the full footer-rail + cursor-flow rule set with code snippets for each common slide type (hero, content, pipeline, two-column, observation grid).
- `references/font-discipline.md` — five-layer font audit: mapping, presence, variable-vs-static traps, the three XML language slots (`latin` / `ea` / `cs`), CJK + Latin italic interaction.
- `references/audit-table-template.md` — copy-pasteable table template with severity legend.
Read the references when:
- The deck has slide types beyond what the SKILL.md covers (multi-column dashboards, embedded images, charts) → `layout-discipline.md`.
- The audit shows 🟡 typography issues — italic missing, CJK falling back, unexpected `Calibri` / `Microsoft JhengHei` in the XML → `font-discipline.md`.
- You want to drop the audit table directly into a report or markdown deliverable → `audit-table-template.md`.
---
## Anti-patterns to avoid
- **Patching individual slides without naming the systemic cause.** If you fix slide 5 by lowering its block by 0.2", you'll be back fixing slide 9, 11, and 14 next. Find the rule that produced all four problems.
- **Trusting the original export script's intent.** Always run the extractor against the actual file. Drift between intent and reality is the bug.
- **Skipping verification because "it looked fine in PowerPoint preview".** Preview anti-aliasing hides 1-2 mm overflows. The script doesn't.
- **Italicizing scripts that have no italic tradition.** CJK, Arabic, Hebrew, Devanagari, Thai, and Khmer all produce a synthesized slant when forced into `italic=True`, and the result looks mechanically deformed. Italicize *only* runs whose primary script supports italic — Latin, Cyrillic, Greek. See `references/font-discipline.md` Layer 5 for the implementation pattern.
- **Using `MARGIN_TOP` for hero slides.** Hero slides need *budget centering*, not top-anchored. This is the most common hero defect and the cheapest to fix.
---
## Why geometry-based verification, not visual diff
An earlier iteration of this skill leaned on visual diffing — render the
.pptx through Keynote → PDF → PNG, screenshot the HTML through Chrome
headless, stitch them side-by-side with `magick`. It worked, but with
three sharp drawbacks:
- **Platform lock-in.** Keynote AppleScript is macOS-only; `magick` and
font-discovery commands vary across OSes; CI pipelines on Linux can't
reproduce the chain.
- **Imprecision.** A 1-2 mm overflow gets anti-aliased away in a PNG
preview. The human eye misses it; the script catches it as a hard
numeric violation.
- **Setup cost.** Every contributor needs the full graphics toolchain
installed before they can audit. Geometry checks need only
`python-pptx`.
Geometry-based verification gives up one thing the visual diff is good
at: catching cases where shape positions are correct but the rendered
glyph looks wrong (font fallback, kerning bugs, missing weight). When
that case appears, fall back to a manual screenshot review — the
five-layer audit in `references/font-discipline.md` covers most of the
underlying causes.
@@ -0,0 +1,58 @@
# Audit Table Template
Drop-in markdown template for the Step-3 audit deliverable. Keep the column order and severity legend stable across audits — readers learn to scan for 🔴 first.
## Template
```markdown
**Fidelity audit · `<deck-name>` · <date>**
| Slide | Issue | Severity |
|---|---|---|
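| 1 cover | meta-row bottom at 6.95" overlaps the footer (6.7") | 🔴 |
| 2 principle | meta-row overlaps the footer | 🔴 |
| 5 checklist | row B step description bottom at 7.2" cuts into the footer | 🔴 |
| 8 3E | closing paragraph sits directly on the footer's top edge | 🔴 |
| 9 on-day | step description bottom just touches the footer, no safety margin | 🟠 |
| 10 obs | row 2 obs-card bottom at 6.95" cuts into the footer | 🔴 |
| 11 P&D | Note paragraph bottom at 7.34" falls entirely below the footer | 🔴 |
| 13 deliv. | pipeline description bottom at 7.05" cuts into the footer | 🔴 |
| 14 closing | meta-row bottom at 7.24" extends past the footer | 🔴 |
| multiple | em (Playfair italic) and special type-size contrast not preserved | 🟡 |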
**Root causes**
1. **No footer rail enforced.** Content blocks pinned at hand-picked y-coordinates; the script had no `CONTENT_MAX_Y` invariant, so `top + height` silently crossed `6.7"` whenever the content was taller than the test slide.
2. **Hero slides anchored at `MARGIN_TOP`.** Vertical centering was done by eye; cover and chapter-intro slides drift down as block heights vary.
3. **Italic propagation skipped.** `<em>` spans in HTML mapped to plain runs; the EN serif italic identity was lost across all hero slides.
**Fix plan**
- Introduce `CONTENT_MAX_Y = 6.70"` and `FOOTER_TOP = 6.85"` as module-level constants.
- Route all content blocks through a `Cursor` that refuses to cross the rail.
- Switch hero slides to `hero_layout(blocks)` — compute total stack height, center on canvas.
- Tighten `desc_h` (pipeline `0.85"`, checklist `0.65"`) to fit text + 0.05" pad.
- Add `italic=True` path in `add_run()` that swaps to EN serif for italic Latin runs; skip italic for CJK.
- Add post-export `verify_layout.py` step; require zero rail violations.
```
## Severity legend (reproduce inline in reports)
```markdown
- 🔴 **critical** — content cropped, text invisible, footer overlap, off-canvas. Must fix.
- 🟠 **high** — content visible but visual hierarchy broken, no breathing room. Should fix.
- 🟡 **medium** — italic/em missing, font fallback wrong, color drift. Fix in this pass.
- 🟢 **low** — minor spacing/alignment, sub-pixel offsets. Note but don't block.
```
## Verification footer (append after re-export)
```markdown
**Verification**
- ✅ 0 rail violations across 14 slides
- ✅ All shapes within canvas (`top + height ≤ 7.5"`, `left + width ≤ 13.333"`)
- ✅ Italic preserved on all `<em>` runs (EN serif), skipped on CJK runs
- ✅ Hero slides centered (cover, 03 act-i, 06 act-ii, 11 act-iii, 13 closing)
- File: `<absolute-path>.pptx` · 54.7 KB
```
@@ -0,0 +1,363 @@
# Font Discipline for PPTX Exports
Companion to `layout-discipline.md`. The rail / cursor primitives in that
file catch geometric drift; this file catches the typography drift that
geometry can't see — variable-font traps, missing CJK slots, fake italic
on Han characters. These are the bugs that pass `verify_layout.py` and
still look wrong.
Read this when:
- The audit table has 🟡 entries about italic / em / font fallback.
- PowerPoint silently swaps to Calibri / Arial / Microsoft JhengHei /
Georgia after you specified a different family.
- `unzip pptx | grep typeface` shows a face that isn't in your design system.
## Layer 1 — Font mapping in the export script
Walk each CSS class used by the source HTML and confirm the export
script maps it to the **same** font family.
⚠️ **Trap:** the visual category your eye reads is not always the
class's semantic category. Editorial decks routinely bind `.lead`,
`.callout`, or `.q-big` to a serif face, not the sans-serif you'd guess
from "lead". Open the HTML's CSS, read the `font-family` declaration
for each class, and copy the literal family name into the export's
font table.
Don't rely on visual intuition; rely on grep.
> **Coverage gap for Latin-slot scripts (Cyrillic / Greek / Vietnamese).**
> Russian / Ukrainian / Greek runs go through `<a:latin>`, not `<a:ea>` —
> they use the Latin slot. Many display fonts (Playfair Display, Source
> Serif 4) ship with weak or missing Cyrillic / Greek glyphs, and most
> drop Vietnamese Extended diacritics (ếẫỡỗ). PowerPoint silently falls
> back to Calibri / Times New Roman per missing glyph, producing
> mid-paragraph face shifts that look like a styling bug.
>
> When mapping a CSS class to a Latin font, check the font actually
> covers your scripts:
>
> ```bash
> # macOS / Linux: list the unicode blocks a font supports
> fc-query -f '%{charset}\n' "$(fc-match -f '%{file}\n' 'Playfair Display')" | head
> ```
>
> ```powershell
> # Windows: PowerShell + System.Drawing lists the registered font families
> Add-Type -AssemblyName System.Drawing
> (New-Object System.Drawing.Text.InstalledFontCollection).Families |
>   Where-Object { $_.Name -like '*Playfair*' }
> # This confirms the family is installed, not which scripts it covers.
> # Coverage detail (Unicode ranges) is best read in fontforge:
> # File → Open → pick the .ttf / .otf → Element → Font Info → OS/2 → Unicode Ranges.
> ```
>
> Cross-platform fallback: open the font in fontforge → Element → Font Info → OS/2 → Unicode Ranges.
>
> If coverage is missing, either swap to a face that has it (e.g.
> Inter / IBM Plex Sans for Cyrillic; Be Vietnam Pro for Vietnamese) or
> set a different `<a:latin>` per language run.
## Layer 2 — Font presence on the rendering machine
PowerPoint uses the OS font cache. If the family name in your XML isn't
installed, PowerPoint silently falls back. Check:
```bash
fc-list | grep -i "noto serif" # Linux / WSL
mdfind "kMDItemFSName == '*NotoSerif*'" # macOS
```
```powershell
# Windows (PowerShell)
Get-ChildItem -Path "$env:WINDIR\Fonts","$env:LOCALAPPDATA\Microsoft\Windows\Fonts" `
-Filter "*NotoSerif*" -ErrorAction SilentlyContinue
```
Install missing families:
```bash
brew install --cask \
font-noto-serif-tc \
font-playfair-display \
font-source-serif-4 \
font-ibm-plex-mono
```
The `verify_layout.py` script can't see this — it only checks
geometry. A standalone font audit step is required.
## Layer 3 — Variable fonts vs. static families ← most common trap
Modern fonts often ship as a **single variable file** containing all
weights (`NotoSerifTC[wght].ttf`). Looks elegant, but PowerPoint Mac /
Windows have spotty support:
- macOS reports the variable font's family name as its **default static
instance** — usually ExtraLight or Regular.
- PowerPoint asks the OS for "Noto Serif TC, weight 700"; the OS
reports the family as `Noto Serif TC ExtraLight`; PowerPoint can't
match → falls back to a system serif.
Diagnose:
```bash
ls -la ~/Library/Fonts/ | grep -i NotoSerif
```
| What you see | Verdict |
| -------------------------------------- | --------------------------------------- |
| One `*[wght].ttf` file | Variable. PowerPoint may not match. |
| Multiple `*-Regular.otf`, `*-Bold.otf` | Static family. Safe. |
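The same diagnosis can be scripted across a whole font directory. The sketch below classifies files by name only, assuming the two common naming patterns for variable fonts (a bracketed axis list such as `[wght]`, or `VariableFont` in the name). `classify_fonts` is an illustrative helper. A filename is just a hint: the reliable signal is whether the font file contains an `fvar` table.

```python
import re
from collections import defaultdict

# Filename hints for variable fonts: "Name[wght].ttf" or "Name-VariableFont_wght.ttf".
VARIABLE_MARK = re.compile(r"\[[^\]]+\]|VariableFont", re.IGNORECASE)
FONT_EXTENSIONS = (".ttf", ".otf", ".ttc")


def classify_fonts(filenames):
    """Group font filenames by family prefix into variable vs. static files."""
    families = defaultdict(lambda: {"variable": [], "static": []})
    for name in filenames:
        if not name.lower().endswith(FONT_EXTENSIONS):
            continue
        stem = name.rsplit(".", 1)[0]
        if VARIABLE_MARK.search(stem):
            family = re.split(r"[\[_-]", stem, maxsplit=1)[0]
            families[family]["variable"].append(name)
        else:
            family = stem.split("-", 1)[0]
            families[family]["static"].append(name)
    return dict(families)
```

On macOS this could be called as `classify_fonts(os.listdir(os.path.expanduser("~/Library/Fonts")))`. Any family with a non-empty `variable` list and an empty `static` list is a candidate for the fallback trap described above.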
Fix by using the static family equivalent:
| Don't use (variable) | Use instead (static) |
| --------------------------- | --------------------------------- |
| `Noto Serif TC` (variable) | `Noto Serif CJK TC` |
| `Source Serif 4` (variable) | `Source Serif Pro` / `Source Serif 4` static instances |
| `Inter` (variable) | Per-weight `Inter Regular` / `Inter Bold` |
After fixing the export, re-run `extract_pptx.py` and confirm the
`font` field matches the static name.
## Layer 4 — PPTX XML's three-language slots
PowerPoint chooses a typeface per run by language script. Each run can
declare three:
| Attribute | Used for |
| ----------------------- | -------------------------------- |
| `<a:latin typeface=…>` | Latin script (a-z, A-Z, digits) |
| `<a:ea typeface=…>` | East Asian (CJK) — **Chinese / Japanese / Korean go here** |
| `<a:cs typeface=…>` | Complex script (Arabic, Hebrew, Thai) |
Audit a file:
```bash
unzip -o /path/to/deck.pptx -d /tmp/audit
grep -h -oE 'typeface="[^"]+"' /tmp/audit/ppt/slides/slide*.xml | sort -u
```
Expected output: only the design-system fonts. If you see
`Microsoft JhengHei`, `Calibri`, `Arial`, `Georgia`, `Consolas`,
something has fallen back.
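On machines without `unzip` and `grep` (a stock Windows shell, for example), the same audit can be done with the Python standard library. This is a sketch, and `typefaces_in` is an illustrative name. It only scans `ppt/slides/slide*.xml`, like the shell version, so faces declared in layouts, masters, or the theme are not reported.

```python
import re
import zipfile

TYPEFACE_RE = re.compile(rb'typeface="([^"]+)"')
SLIDE_RE = re.compile(r"ppt/slides/slide\d+\.xml")


def typefaces_in(pptx, allowed):
    """Return (all typefaces found, typefaces not in `allowed`), both sorted."""
    found = set()
    with zipfile.ZipFile(pptx) as z:
        for member in z.namelist():
            if SLIDE_RE.fullmatch(member):
                found.update(
                    match.decode("utf-8")
                    for match in TYPEFACE_RE.findall(z.read(member))
                )
    unexpected = found - set(allowed)
    return sorted(found), sorted(unexpected)
```

Passing the design-system font list as `allowed` turns the visual scan of the output into a check that can fail a build when the second list is non-empty.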
**Common defect:** export script writes `<a:latin>` only. Chinese runs
have no `<a:ea>` directive → PowerPoint picks the OS default
(Microsoft JhengHei on Windows, Hiragino Sans on Mac). Result: Chinese
characters in the wrong serif/sans family.
Fix: when adding a run with mixed-language content, set all three
attributes that apply.
```python
from pptx.oxml.ns import qn

def set_run_fonts(run, latin: str | None = None, ea: str | None = None, cs: str | None = None):
    """Set the latin / ea / cs typeface slots on a run; None leaves a slot untouched."""
    rPr = run._r.get_or_add_rPr()
    if latin:
        el = rPr.find(qn('a:latin'))
        if el is None:
            el = rPr.makeelement(qn('a:latin'), {})
            rPr.append(el)
        el.set('typeface', latin)
    if ea:
        el = rPr.find(qn('a:ea'))
        if el is None:
            el = rPr.makeelement(qn('a:ea'), {})
            rPr.append(el)
        el.set('typeface', ea)
    if cs:
        el = rPr.find(qn('a:cs'))
        if el is None:
            el = rPr.makeelement(qn('a:cs'), {})
            rPr.append(el)
        el.set('typeface', cs)
```
PptxGenJS sets all three by default; raw XML injection or python-pptx
without explicit `ea` slot does not.
## Layer 5 — Italic + script interaction
🚨 **`italic=True` is a Latin-script feature.** Apply it only to runs
whose characters belong to scripts where italic is part of the writing
tradition (Latin, Cyrillic, Greek). For everything else — CJK, Arabic,
Hebrew, Devanagari, Thai, Khmer — PowerPoint synthesizes a slanted
bitmap that looks mechanically deformed. The chain of failures, using
CJK as the canonical example:
1. `<a:latin>` slot has Playfair Display Italic (a Latin-only font).
2. The CJK characters in the run have no glyph in Playfair → PowerPoint
substitutes a system CJK font.
3. The substituted CJK font is forced into `italic=True` → since no
real CJK italic exists, PowerPoint synthesizes a slanted bitmap →
characters look mechanically deformed.
The same pattern triggers for Arabic, Hebrew, Devanagari, and Thai —
none of these scripts has an italic tradition, and faking it produces
a slant that's visually broken.
**Rule:** italic only applies to runs whose primary script supports it
(Latin / Cyrillic / Greek). Indicate emphasis on other scripts via:
- color tone (`COLOR_INK_60` for muted, full ink for emphasis)
- weight contrast (Regular 400 vs. Bold 700)
- a script-native italic variant **only if one actually ships** — most
don't
Practical implementation:
```python
from pptx.util import Pt

# Unicode ranges where italic should be suppressed.
# Principle: include scripts whose writing tradition has no italic style.
# Synthesized italic on these scripts produces a slanted bitmap that looks
# mechanically deformed.
NO_ITALIC_RANGES = (
    (0x3400, 0x9FFF),  # CJK Unified Ideographs (incl. Extension A)
    (0xF900, 0xFAFF),  # CJK Compatibility Ideographs
    (0x3040, 0x30FF),  # Hiragana + Katakana
    (0xAC00, 0xD7AF),  # Hangul Syllables
    (0x0590, 0x05FF),  # Hebrew
    (0x0600, 0x06FF),  # Arabic
    (0x0750, 0x077F),  # Arabic Supplement
    # Indic scripts: none have an italic tradition; PowerPoint synthesizes
    # a fake slant on all of them. Add new ranges here when the deck mixes
    # in additional scripts (e.g. Sinhala U+0D80-U+0DFF).
    (0x0900, 0x097F),  # Devanagari (Hindi, Marathi, Sanskrit)
    (0x0980, 0x09FF),  # Bengali
    (0x0A00, 0x0A7F),  # Gurmukhi (Punjabi)
    (0x0A80, 0x0AFF),  # Gujarati
    (0x0B00, 0x0B7F),  # Oriya
    (0x0B80, 0x0BFF),  # Tamil
    (0x0C00, 0x0C7F),  # Telugu
    (0x0C80, 0x0CFF),  # Kannada
    (0x0D00, 0x0D7F),  # Malayalam
    # Southeast Asian
    (0x0E00, 0x0E7F),  # Thai
    (0x0E80, 0x0EFF),  # Lao
    (0x1780, 0x17FF),  # Khmer
)


def has_no_italic_script(text: str) -> bool:
    """True if any character falls in a script range without an italic tradition."""
    return any(
        any(lo <= ord(c) <= hi for lo, hi in NO_ITALIC_RANGES)
        for c in text
    )


def add_run_with_italic_safety(p, text, *, latin_face: str, ea_face: str,
                               cs_face: str | None, size_pt: int,
                               italic: bool, **kwargs):
    """Drop italic if the run contains characters from scripts without italic tradition.

    Args:
        latin_face: Font for Latin / Cyrillic / Greek runs (a:latin slot).
        ea_face: Font for CJK runs (a:ea slot).
        cs_face: Font for complex scripts (Arabic, Hebrew, Devanagari,
            Thai, etc.; a:cs slot). Pass None when the run contains no
            complex-script characters; set_run_fonts skips the slot.
    """
    r = p.add_run()
    r.text = text
    r.font.size = Pt(size_pt)
    r.font.italic = italic and not has_no_italic_script(text)
    # set_run_fonts is the helper defined in Layer 4.
    set_run_fonts(r, latin=latin_face, ea=ea_face, cs=cs_face)
    return r
```
For mixed-script runs (e.g. `"In <em>2026</em> 開始"`), split into
multiple runs at language boundaries so the italic attribute can apply
to the Latin run only.
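A minimal sketch of that splitting step is shown here. `split_by_script` is a hypothetical helper, and the range table is a CJK-only subset kept short for the example. A real exporter would reuse the full `NO_ITALIC_RANGES`. Whitespace and punctuation are attached to the preceding segment so that a space between two words does not create a run of its own.

```python
# CJK-only subset for brevity; the full table lives in NO_ITALIC_RANGES.
NO_ITALIC_SUBSET = (
    (0x3040, 0x30FF),  # Hiragana + Katakana
    (0x3400, 0x9FFF),  # CJK Unified Ideographs (incl. Extension A)
    (0xAC00, 0xD7AF),  # Hangul Syllables
    (0xF900, 0xFAFF),  # CJK Compatibility Ideographs
)


def _no_italic(ch):
    return any(lo <= ord(ch) <= hi for lo, hi in NO_ITALIC_SUBSET)


def split_by_script(text):
    """Split text into (segment, italic_allowed) pairs at script boundaries."""
    segments = []  # mutable [text, italic_allowed] pairs
    for ch in text:
        if ch.isalnum():
            italic_ok = not _no_italic(ch)
        elif segments:
            italic_ok = segments[-1][1]  # neutral char sticks to the previous segment
        else:
            italic_ok = True             # leading neutral char: treat as Latin
        if segments and segments[-1][1] == italic_ok:
            segments[-1][0] += ch
        else:
            segments.append([ch, italic_ok])
    return [(segment, ok) for segment, ok in segments]
```

Each pair can then become its own run, with the italic flag computed as `wanted_italic and italic_allowed`, so emphasis survives on the Latin part without slanting the Han characters.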
## Beyond CJK — other scripts
The five layers above are written in CJK examples because that's the
most common pairing in Open Design today, but the same machinery
applies to other scripts. Quick reference:
| Script family | XML slot | Italic OK? | Most common defect | Recommended faces |
| ------------------------ | ---------- | ---------- | ----------------------------------------------------------------------------------- | ------------------------------------------------ |
| Latin (en, de, es, vi…) | `a:latin` | ✅ | Vietnamese Extended diacritics dropped → fallback Calibri mid-paragraph | Be Vietnam Pro, IBM Plex Sans, Source Sans 3 |
| Cyrillic (ru, uk, bg) | `a:latin` | ✅ | Display fonts (Playfair, Source Serif) lack Cyrillic → fallback Calibri | Inter, IBM Plex Sans, Roboto |
| Greek (el) | `a:latin` | ✅ | Same as Cyrillic — display faces missing Greek → fallback | Inter, IBM Plex Sans |
| CJK (zh, ja, ko) | `a:ea` | ❌ | Variable-font trap (Layer 3); missing `a:ea` slot → fallback Microsoft JhengHei | Noto Sans CJK *, Source Han Sans, IBM Plex Sans JP |
| Arabic / Hebrew / Persian | `a:cs` | ❌ | `<a:rtl val="1"/>` not set → text direction breaks; kashida changes width | Noto Naskh Arabic, IBM Plex Sans Arabic, Amiri |
| Devanagari / Bengali | `a:cs` | ❌ | PowerPoint defaults to Mangal/Vrinda (low fidelity); cluster shaping bumps line height | Noto Sans Devanagari, Mukta, Hind |
| Thai / Lao / Khmer | `a:cs` | ❌ | No inter-word spaces → PowerPoint's break engine produces poor wraps; tone marks bump line height | Noto Sans Thai, Sarabun, Noto Sans Khmer |
For RTL scripts (Arabic / Hebrew / Persian), set both `<a:cs typeface=…>`
and `<a:rtl val="1"/>` on the run's `rPr`. Right-alignment, bidi text
flow, and chrome / footer mirroring are out of scope for `verify_layout.py`
today and need manual review — see the Tier 2 follow-up note in the
audit checklist.
> **RTL discipline scope.** Full RTL support is roughly 15-20% of the
> font + layout discipline surface area: Unicode TR9 bidi resolution,
> chrome / footer / page-number mirroring, kashida (Arabic
> elongation) interaction with line-fill, and right-anchored
> alignment. This skill covers the typeface + slot mechanics only;
> bidi and mirroring are flagged for a Tier 2 `rtl-discipline.md`
> follow-up when fa / ar / he usage volume justifies the investment.
## Line height per script
The `Cursor.take(gap=Inches(0.12))` default suits 14pt Latin body copy.
Other scripts need more vertical headroom because of stacked diacritics,
matras, or tone marks:
| Script | Recommended `gap` at 14pt body |
| ---------------------------------------- | ------------------------------ |
| Latin (no Vietnamese Extended) | `Inches(0.12)` (default) |
| Latin (with Vietnamese Extended ếẫỗ) | `Inches(0.14)` |
| CJK                                      | `Inches(0.14)` to `Inches(0.16)` |
| Devanagari / Bengali (matras / conjuncts)| `Inches(0.16)` to `Inches(0.18)` |
| Thai / Lao / Khmer (tone marks above)    | `Inches(0.16)` to `Inches(0.18)` |
| Arabic / Hebrew | `Inches(0.13)` |
When the deck mixes scripts, take the max. Line breathing-room is
visual: an under-spaced Thai run in an otherwise Latin deck reads as
"the Thai slide is broken".
> **Source for these numbers.** Measured against Noto Sans / Noto
> Serif / IBM Plex line-height at 14pt body with full diacritic stacks
> (e.g. Devanagari conjuncts ष्ट्र, Thai 4-mark sequences ก़ํ้, stacked
> Vietnamese ỗ). Adjust downward for condensed faces (Inter Condensed,
> Noto Sans Condensed) and upward for display sizes ≥ 24pt where
> diacritic ratios grow.
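The "take the max" rule can be sketched as a lookup over the characters that actually appear in the deck. `gap_in_for` and the range table are illustrative. Where the table gives a range, the sketch picks the upper end as a conservative assumption, and the values are plain inches meant to be wrapped in `Inches(...)` by the caller.

```python
DEFAULT_GAP_IN = 0.12  # 14pt Latin body copy

# (first code point, last code point, gap in inches)
GAP_IN_BY_RANGE = (
    (0x1EA0, 0x1EF9, 0.14),  # Vietnamese letters in Latin Extended Additional
    (0x0590, 0x06FF, 0.13),  # Hebrew + Arabic
    (0x3040, 0x30FF, 0.16),  # Hiragana + Katakana
    (0x3400, 0x9FFF, 0.16),  # CJK ideographs
    (0xAC00, 0xD7AF, 0.16),  # Hangul syllables
    (0x0900, 0x09FF, 0.18),  # Devanagari + Bengali
    (0x0E00, 0x0EFF, 0.18),  # Thai + Lao
    (0x1780, 0x17FF, 0.18),  # Khmer
)


def gap_in_for(texts):
    """Return the largest recommended gap (inches) for the scripts present."""
    gap = DEFAULT_GAP_IN
    for text in texts:
        for ch in text:
            code_point = ord(ch)
            for lo, hi, candidate in GAP_IN_BY_RANGE:
                if lo <= code_point <= hi:
                    gap = max(gap, candidate)
    return gap
```

Computing the gap once per deck, rather than per slide, keeps the vertical rhythm consistent between slides that mix scripts and slides that do not.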
## Audit checklist
After re-export, confirm all five layers:
- [ ] Layer 1: Each CSS class in the HTML maps to the intended family
in the export script's font table.
- [ ] Layer 2: All declared families exist on the rendering machine
(`fc-list | grep`).
- [ ] Layer 3: No variable-font filename pretending to be a static
family. `~/Library/Fonts/` shows multi-file static families for
every face used.
- [ ] Layer 4: `unzip + grep typeface` returns only the design-system
fonts. No `Microsoft JhengHei` / `Calibri` / `Arial` / `Georgia`
/ `Consolas` residue.
- [ ] Layer 5: No run from a no-italic script (CJK / Arabic / Hebrew /
Devanagari / Thai) has `italic=True` set with a Latin italic
face in the `<a:latin>` slot.
- [ ] **Beyond CJK:** RTL slides set `<a:rtl val="1"/>` on each RTL
run's `rPr`. Verify with:
```bash
unzip -o deck.pptx -d /tmp/audit
grep -h '<a:rtl' /tmp/audit/ppt/slides/*.xml | sort -u
# Expect a hit for every fa / ar / he slide; empty output on
# an RTL deck means the directionality wasn't propagated.
```
- [ ] Cursor `gap` is bumped per the line-height table above when the
deck includes Vietnamese, Devanagari, Thai, or Khmer content.
If all five pass and the user still reports "the type looks wrong",
ask for a screenshot pointing at the specific glyph or word — the
remaining bugs are usually license-restricted fonts not embedded into
the file (see `SKILL.md` Step 5 verification).
@@ -0,0 +1,371 @@
# Footer-Rail + Cursor-Flow Layout Discipline
The full rule set referenced from `SKILL.md` Step 4. Read this when the deck has slide types beyond simple title-+-body or when you're building the re-export script from scratch.
> **How to use this file.** Skim §1-3 once to internalize the rules
> (constants, `Cursor`, hero budget centering). Then jump to the slide-type
> snippet that matches what you're building — pipeline, two-column,
> observation grid, etc. — and adapt. The file is meant to be navigated,
> not read end-to-end.
## 1. Constants — define once at the top of the export script
```python
from pptx.util import Inches, Pt, Emu
from pptx.dml.color import RGBColor
# Canvas (16:9). Override only if the deck explicitly targets 4:3 or 1:1.
CANVAS_W = Inches(13.333)
CANVAS_H = Inches(7.5)
# Margins
MARGIN_X = Inches(0.6) # left / right symmetric
MARGIN_TOP = Inches(0.5) # below the chrome row
CONTENT_LEFT = MARGIN_X
CONTENT_RIGHT = CANVAS_W - MARGIN_X
CONTENT_W = CONTENT_RIGHT - CONTENT_LEFT
# Vertical rails — the load-bearing pair
CHROME_TOP = Inches(0.32) # top metadata row
CHROME_H = Inches(0.20)
CONTENT_TOP = MARGIN_TOP # cursor starts here on content slides
CONTENT_MAX_Y = Inches(6.70) # NOTHING in content area may cross
FOOTER_TOP = Inches(6.85) # foot row pinned here
FOOTER_H = Inches(0.22)
# Theme colors — derive from the HTML :root block, do not invent
COLOR_INK = RGBColor(0x0a, 0x1f, 0x3d) # dark theme background / light text color
COLOR_PAPER = RGBColor(0xf1, 0xf3, 0xf5) # light theme background / dark text color
COLOR_INK_60 = RGBColor(0x68, 0x77, 0x8e) # 60 % opacity ink (precomputed)
COLOR_PAPER_60 = RGBColor(0x9b, 0xa0, 0xa6) # 60 % opacity paper
# Typography stacks. EN italic uses serif-en; CJK never italicizes.
FONT_SERIF_EN = "Playfair Display"
FONT_SERIF_FB = "Source Serif 4"
FONT_SERIF_ZH = "Noto Serif TC"
FONT_SANS_ZH = "Noto Sans TC"
FONT_MONO = "IBM Plex Mono"
```
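Because every later rule depends on the rails being ordered correctly, it can help to assert that ordering once at import time. This is a sketch under assumptions: `inches` and `check_rails` are hypothetical helpers, and the values are plain integers in EMU (914400 per inch, the unit python-pptx uses) so the check runs without python-pptx installed.

```python
EMU_PER_INCH = 914400  # the length unit python-pptx uses for shape geometry


def inches(value):
    """Convert inches to whole EMU."""
    return int(value * EMU_PER_INCH)


def check_rails(*, canvas_h, content_top, content_max_y, footer_top, footer_h):
    """Return a list of ordering problems; an empty list means the rails are sane."""
    problems = []
    if not content_top < content_max_y:
        problems.append("CONTENT_TOP must sit above CONTENT_MAX_Y")
    if not content_max_y < footer_top:
        problems.append("CONTENT_MAX_Y must sit above FOOTER_TOP")
    if footer_top + footer_h > canvas_h:
        problems.append("footer row runs off the canvas")
    return problems


# The values from the constants block above pass the check.
assert check_rails(
    canvas_h=inches(7.5),
    content_top=inches(0.5),
    content_max_y=inches(6.70),
    footer_top=inches(6.85),
    footer_h=inches(0.22),
) == []
```

If a deck overrides the canvas (4:3 or 1:1), rerunning the check catches a footer rail that was not moved along with it.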
## 2. The Cursor primitive
Used on all non-hero slides. The cursor advances down the slide and refuses to cross `CONTENT_MAX_Y`.
```python
class Cursor:
    def __init__(self, y_start=CONTENT_TOP, cap=CONTENT_MAX_Y):
        self.y = y_start
        self.cap = cap
        self.history = []  # list of (top, height, label) for debugging

    def take(self, h, gap=Inches(0.12), label=""):
        top = self.y
        self.y = top + h + gap
        self.history.append((top, h, label))
        if self.y > self.cap:
            raise OverflowError(
                f"Cursor exceeded rail at '{label}': "
                f"y={self.y} cap={self.cap}; "
                f"history={self.history}"
            )
        return top

    def remaining(self):
        return self.cap - self.y
Usage:
```python
c = Cursor()
add_kicker(slide, top=c.take(Inches(0.18), label="kicker"))
add_h_xl(slide, top=c.take(Inches(1.0), label="h-xl"))
add_lead(slide, top=c.take(Inches(0.8), label="lead"))
add_pipeline(slide, top=c.take(Inches(2.6), label="pipeline"))
```
> **Per-script `gap` tuning.** The default `Inches(0.12)` matches 14pt
> Latin body copy. Decks that include CJK, Devanagari, Thai, or
> Khmer need more breathing room — line clusters and stacked tone
> marks bump the rendered line height. Pass an explicit `gap=` per
> block, or override the `Cursor` default at the top of your export.
> The full per-script table is in
> [`font-discipline.md` § Line height per script](font-discipline.md).
>
> **Detecting the highest-demand script in a mixed deck.** A deck
> can mix `en` slides with `th` slides — locale alone isn't the
> signal. Scan each slide's text against the Unicode ranges in
> `font-discipline.md` Layer 5's `NO_ITALIC_RANGES` (extend with the
> Vietnamese Extended block U+1E00–U+1EFF for ế, ẫ, ỗ), record the
> per-slide max-gap, and instantiate the slide's `Cursor` with that
> value. For a uniform deck-wide setting, take the max across all
> slides.
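The per-slide scan described above could look like the sketch below. The Unicode block boundaries are standard, but the gap values are placeholders chosen for illustration only; the real numbers belong to the per-script table in `font-discipline.md`. `SCRIPT_GAPS`, `slide_gap_in`, and `deck_gap_in` are hypothetical names.

```python
import unicodedata

DEFAULT_GAP_IN = 0.12  # matches the Cursor default for Latin body copy

# (script, first code point, last code point, gap in inches).
# Gap values are PLACEHOLDERS; copy the real ones from font-discipline.md.
SCRIPT_GAPS = [
    ("thai",       0x0E00, 0x0E7F, 0.20),
    ("khmer",      0x1780, 0x17FF, 0.20),
    ("devanagari", 0x0900, 0x097F, 0.18),
    ("vietnamese", 0x1E00, 0x1EFF, 0.16),
    ("cjk",        0x4E00, 0x9FFF, 0.16),
]


def slide_gap_in(text):
    """Largest gap demanded by any script present in one slide's text."""
    # NFC composes base letter + combining marks, so decomposed Vietnamese
    # input still lands in the U+1E00-U+1EFF block.
    text = unicodedata.normalize("NFC", text)
    gap = DEFAULT_GAP_IN
    for ch in text:
        cp = ord(ch)
        for _script, lo, hi, script_gap in SCRIPT_GAPS:
            if lo <= cp <= hi:
                gap = max(gap, script_gap)
    return gap


def deck_gap_in(slide_texts):
    """Uniform deck-wide setting: the max across all slides."""
    return max((slide_gap_in(t) for t in slide_texts), default=DEFAULT_GAP_IN)
```

Feed `slide_gap_in` the concatenated text of one slide and pass the result (converted with `Inches`) as the `gap=` for that slide's blocks.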
If a slide raises `OverflowError`, fix one of three things:
1. **Reduce block height** — the box was generously sized; tighten to actual text height.
2. **Reduce gap** — the inter-block gap is excessive; trim from `0.18"` to `0.10"`.
3. **Split the slide** — the content genuinely doesn't fit; this is a design problem, not a layout problem.
Don't "solve" it by raising `CONTENT_MAX_Y`. The rail exists for a reason — content that crosses it will overlap the footer at full-screen presentation.
## 3. Hero slides — budget centering, not cursor flow
Hero slides (cover, chapter intros, big-quote pages) are vertically centered. The cursor model would put them at the top with empty space below — visually wrong.
```python
def hero_layout(blocks):
    """
    blocks: list of (height, gap_after) tuples in top-to-bottom reading order.
    Returns a Cursor whose y_start is computed so the stack is centered.
    """
    total_h = sum(h + g for h, g in blocks)
    # Integer division keeps y_start in whole EMU, which shape geometry expects.
    y_start = (CANVAS_H - total_h) // 2
    # Cap at the content rail so an over-tall stack raises before it reaches
    # the footer pinned at FOOTER_TOP.
    return Cursor(y_start=y_start, cap=CONTENT_MAX_Y)
```
Hero usage:
```python
# Plan the stack first.
HERO_BLOCKS = [
    (Inches(0.18), Inches(0.30)),  # kicker
    (Inches(1.50), Inches(0.20)),  # h-hero
    (Inches(0.45), Inches(0.40)),  # h-sub
    (Inches(0.70), Inches(0.30)),  # lead
    (Inches(0.20), Inches(0.00)),  # meta-row
]
c = hero_layout(HERO_BLOCKS)
for (h, g), block_fn in zip(HERO_BLOCKS, [k_kicker, k_hero, k_sub, k_lead, k_meta]):
    block_fn(slide, top=c.take(h, gap=g))
```
The pattern reads as: "list each block's actual height, then center the entire stack". One source of truth, no manual `MARGIN_TOP`.
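To make the arithmetic concrete: the five blocks listed above add up to 4.23" of height plus gaps, so on a 7.5" canvas the stack starts at roughly (7.5 − 4.23) / 2 ≈ 1.635". The sketch below reproduces that calculation with plain integers in EMU so it runs without python-pptx; `inches`, `centered_start`, and `block_tops` are hypothetical helpers, not part of the export script.

```python
EMU_PER_INCH = 914400


def inches(value):
    """Convert inches to whole EMU."""
    return int(value * EMU_PER_INCH)


CANVAS_H = inches(7.5)


def centered_start(blocks, canvas_h=CANVAS_H):
    """Top of the first block when the whole stack is vertically centered."""
    total = sum(h + g for h, g in blocks)
    return (canvas_h - total) // 2


def block_tops(blocks, canvas_h=CANVAS_H):
    """Top coordinate of each block, in reading order."""
    y = centered_start(blocks, canvas_h)
    tops = []
    for h, g in blocks:
        tops.append(y)
        y += h + g
    return tops


HERO_BLOCKS = [
    (inches(0.18), inches(0.30)),  # kicker
    (inches(1.50), inches(0.20)),  # h-hero
    (inches(0.45), inches(0.40)),  # h-sub
    (inches(0.70), inches(0.30)),  # lead
    (inches(0.20), inches(0.00)),  # meta-row
]
start_in = centered_start(HERO_BLOCKS) / EMU_PER_INCH  # about 1.635
```

Changing any block height in the list shifts every top automatically, which is the point of keeping the list as the single source of truth.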
## 4. Footer is always pinned, never advanced
Don't route the footer through the cursor — it has its own rail.
```python
def add_footer(slide, left_text, right_text, theme="dark"):
    color = COLOR_PAPER_60 if theme == "dark" else COLOR_INK_60
    # Integer division keeps widths and offsets in whole EMU.
    half_w = CONTENT_W // 2
    add_text(slide,
             left=CONTENT_LEFT, top=FOOTER_TOP,
             width=half_w, height=FOOTER_H,
             text=left_text, font=FONT_MONO, size_pt=9,
             color=color, align="left", letter_spacing=2.0)
    add_text(slide,
             left=CANVAS_W // 2, top=FOOTER_TOP,
             width=half_w, height=FOOTER_H,
             text=right_text, font=FONT_MONO, size_pt=9,
             color=color, align="right", letter_spacing=2.0)
```
`add_chrome` is the same idea pinned at `CHROME_TOP`. Both rails sit *outside* the content area, so they never collide with the cursor.
## 5. Box height ≠ text height — but tight is better than loose
PowerPoint draws shape bounds visibly when:
- Two shapes overlap (selection halos in editor, faint anti-alias seam in presentation mode).
- A shape with a fill or border crosses the rail.
- Z-order conflicts cause one shape to clip another.
So even when the *text* fits within the content area, an oversized *box* can intrude. Tighten box height to:
```
box_h = (n_lines * line_height_pt + 2 * pad_pt) / 72
```
where `pad_pt` is 2–4 pt (≈ 0.03–0.05"). For multi-line text frames, set `text_frame.word_wrap = True` and don't pad vertically; let the text frame's intrinsic metrics size the block.
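The formula translates directly into a helper. This is a sketch, assuming `line_height_pt` is the font size times a line-spacing factor; the `line_spacing=1.2` default is an assumption, not a value taken from the deck, and `box_height_in` is a hypothetical name.

```python
def box_height_in(n_lines, font_size_pt, line_spacing=1.2, pad_pt=3.0):
    """Tight box height in inches for a block with a known line count.

    line_spacing is an assumed multiplier; match it to the deck's CSS
    line-height for the class being exported.
    """
    line_height_pt = font_size_pt * line_spacing
    return (n_lines * line_height_pt + 2 * pad_pt) / 72.0
```

For example, a two-line headline at 30 pt with a line spacing of 1.0 and 3 pt of padding needs (2 × 30 + 6) / 72 ≈ 0.92". Pass the result through `Inches(...)` before handing it to the cursor.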
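For headline blocks with a known line count, you can also set:
```python
from pptx.enum.text import MSO_AUTO_SIZE

tf = shape.text_frame
tf.auto_size = MSO_AUTO_SIZE.SHAPE_TO_FIT_TEXT
```
Be aware that this only records the auto-fit setting in the file. python-pptx has no text layout engine, so `shape.height` keeps the value you passed in until PowerPoint opens the deck and resizes the shape. Keep feeding the cursor an estimated height from the formula above, and treat auto-size as a safety net rather than a measurement.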
## 6. Italic preservation — only EN serif, never CJK
The single most common silent regression. HTML `<em>`, `<i>`, and inline `font-style: italic` should all map to `run.font.italic = True`. But:
- **EN/Latin display copy** (Playfair Display, Source Serif) has a real italic. Use it.
- **CJK display copy** (Noto Serif TC, Source Han Serif) has no italic. Synthesizing produces a slanted bitmap that looks broken. Skip italic for CJK runs even if the HTML had `<em>` around the CJK text.
- **EN body copy** can use sans italic if the body family supports it; if not, swap to serif italic for the duration of the run.
```python
def add_run(p, text, *, font, size_pt, italic=False, bold=False, color=None):
    r = p.add_run()
    r.text = text
    # If italic is requested, force an EN serif that supports it.
    if italic:
        r.font.name = FONT_SERIF_EN if not _is_cjk(text) else font
        r.font.italic = not _is_cjk(text)
    else:
        r.font.name = font
        r.font.italic = False
    r.font.size = Pt(size_pt)
    r.font.bold = bool(bold)
    if color is not None:
        r.font.color.rgb = color
    return r


def _is_cjk(s):
    return any('\u4e00' <= c <= '\u9fff' or '\u3040' <= c <= '\u30ff' for c in s)
```
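Note that `_is_cjk` treats a run as CJK when any single character is CJK, so a mixed run loses italic on its Latin part too. One possible refinement, sketched here under the same two Unicode ranges, is to split the text into script-homogeneous segments before creating runs. `is_cjk_char` and `split_by_script` are hypothetical helpers, not part of the export script.

```python
from itertools import groupby


def is_cjk_char(c):
    """Same ranges as _is_cjk: CJK Unified Ideographs plus Hiragana/Katakana."""
    return '\u4e00' <= c <= '\u9fff' or '\u3040' <= c <= '\u30ff'


def split_by_script(text):
    """Split text into (segment, is_cjk) pieces, preserving order.

    Spaces and punctuation count as non-CJK, so they stay attached to the
    neighbouring Latin segment or form a short segment of their own.
    """
    return [
        ("".join(chars), cjk)
        for cjk, chars in groupby(text, key=is_cjk_char)
    ]
```

Each segment can then go through `add_run` separately, with `italic=True` honoured only on the non-CJK pieces.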
When walking HTML, detect italic spans:
```python
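from html.parser import HTMLParser


class ItalicSpans(HTMLParser):
    """Collect (text, italic) runs from an HTML fragment.

    Call feed(html) and then close(); close() emits the trailing text.
    """

    ITALIC_TAGS = ("em", "i")

    def __init__(self):
        super().__init__()
        self.runs = []    # list of (text, italic_bool)
        self._buf = []
        self._stack = []  # one bool per open em / i / span: does it italicize?
        self._italic = False

    def handle_starttag(self, tag, attrs):
        if tag in self.ITALIC_TAGS:
            self._push(True)
        elif tag == "span":
            # A valueless attribute is reported as None, hence the `or ""`.
            style = dict(attrs).get("style") or ""
            self._push("italic" in style)

    def handle_endtag(self, tag):
        # Pop the tag's own flag so a plain </span> inside <em> keeps italic on.
        if tag in self.ITALIC_TAGS + ("span",) and self._stack:
            self._flush()
            self._stack.pop()
            self._italic = any(self._stack)

    def handle_data(self, data):
        self._buf.append(data)

    def close(self):
        super().close()
        self._flush()  # text after the last tag would otherwise be dropped

    def _push(self, italic):
        self._flush()
        self._stack.append(italic)
        self._italic = any(self._stack)

    def _flush(self):
        if self._buf:
            self.runs.append(("".join(self._buf), self._italic))
            self._buf = []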
```
## 7. Slide-type recipes
### 7.1 Cover / hero with vertical center
```python
def slide_cover(prs, *, title, subtitle, lead, meta, chrome_l, chrome_r):
    slide = prs.slides.add_slide(blank_layout)
    paint_bg(slide, COLOR_INK)
    add_chrome(slide, chrome_l, chrome_r, theme="dark")
    blocks = [
        (Inches(0.18), Inches(0.32)),  # kicker
        (Inches(1.50), Inches(0.18)),  # h-hero
        (Inches(0.45), Inches(0.36)),  # h-sub
        (Inches(0.70), Inches(0.30)),  # lead
        (Inches(0.20), Inches(0.00)),  # meta
    ]
    c = hero_layout(blocks)
    add_kicker(slide, top=c.take(*blocks[0]), text="SOP · Coach Edition")
    add_h_hero(slide, top=c.take(*blocks[1]), text=title)
    add_h_sub(slide, top=c.take(*blocks[2]), text=subtitle)
    add_lead(slide, top=c.take(*blocks[3]), text=lead)
    add_meta_row(slide, top=c.take(*blocks[4]), items=meta)
    add_footer(slide, "主責教練 SOP", "— 2026 —", theme="dark")
```
### 7.2 Content with pipeline (4–5 step horizontal flow)
```python
def slide_pipeline(prs, *, kicker, headline, intro, label, steps):
    slide = prs.slides.add_slide(blank_layout)
    paint_bg(slide, COLOR_PAPER)
    add_chrome(slide, "On-Day · Coach Actions", "08 / 14", theme="light")
    c = Cursor()
    add_kicker(slide, top=c.take(Inches(0.18), label="kicker"), text=kicker)
    add_h_xl(slide, top=c.take(Inches(0.95), label="h-xl"), text=headline)
    add_lead(slide, top=c.take(Inches(0.65), label="lead"), text=intro)
    add_pipeline(slide,
                 top=c.take(Inches(2.30), label="pipeline"),
                 section_label=label,
                 steps=steps,
                 n_cols=len(steps))
    add_footer(slide, "Page 08 · 教練當天行動", "Witness, don't intervene", theme="light")
```
`add_pipeline` internally lays out N step cards across `CONTENT_W` with `step_h` derived from the longest step's text height. Don't fix `step_h` to a constant — let it grow to fit, and let the cursor's overflow guard catch problems.
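Since python-pptx cannot measure rendered text, "derived from the longest step's text height" has to rest on an estimate. The sketch below is one crude heuristic, not the skill's actual `add_pipeline` logic: it counts wrapped lines from a characters-per-line budget that you would calibrate per font and card width. `estimate_lines` and `step_height_in` are hypothetical names.

```python
import math


def estimate_lines(text, chars_per_line):
    """Rough wrapped-line count: each hard newline starts a new paragraph,
    and every paragraph takes at least one line."""
    return sum(
        max(1, math.ceil(len(paragraph) / chars_per_line))
        for paragraph in text.split("\n")
    )


def step_height_in(steps, chars_per_line, line_height_pt, pad_pt=4.0):
    """Card height in inches, sized to the step with the most lines."""
    lines = max(estimate_lines(step, chars_per_line) for step in steps)
    return (lines * line_height_pt + 2 * pad_pt) / 72.0
```

CJK glyphs are roughly twice as wide as Latin ones, so a CJK deck needs a smaller `chars_per_line`. Whatever the estimate, reserve the resulting height through `c.take(...)` so the overflow guard still has the final word.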
### 7.3 Two-column comparison / concern cards
```python
def slide_two_col(prs, *, kicker, headline, intro, left, right):
    slide = prs.slides.add_slide(blank_layout)
    paint_bg(slide, COLOR_INK)
    add_chrome(slide, "First-Time Caveats · 首辦提醒", "05 / 14", theme="dark")
    c = Cursor()
    add_kicker(slide, top=c.take(Inches(0.18)), text=kicker)
    add_h_xl(slide, top=c.take(Inches(0.95)), text=headline)
    add_lead(slide, top=c.take(Inches(0.55)), text=intro)
    pair_top = c.take(Inches(3.00), label="pair")
    gutter = Inches(0.4)
    # Integer division keeps the column width in whole EMU.
    col_w = (CONTENT_W - gutter) // 2
    add_concern_card(slide, left=CONTENT_LEFT, top=pair_top,
                     w=col_w, h=Inches(2.9), data=left)
    add_concern_card(slide, left=CONTENT_LEFT + col_w + gutter, top=pair_top,
                     w=col_w, h=Inches(2.9), data=right)
    add_footer(slide, "Page 05 · 首次辦理特別提醒", "典禮 ≠ 領導日", theme="dark")
```
Notice the pattern: `c.take(Inches(3.00), label="pair")` reserves 3.0" of vertical space for *the whole pair row*; then the two columns are placed side-by-side at that `top`. The cursor doesn't know about columns, only about row heights.
### 7.4 Observation grid (3 × 2 cards)
```python
def slide_obs_grid(prs, *, kicker, headline, intro, cards):
    assert len(cards) == 6
    slide = prs.slides.add_slide(blank_layout)
    paint_bg(slide, COLOR_PAPER)
    add_chrome(slide, "Observation · 觀察筆記", "09 / 14", theme="light")
    c = Cursor()
    add_kicker(slide, top=c.take(Inches(0.18)), text=kicker)
    add_h_xl(slide, top=c.take(Inches(0.95)), text=headline)
    add_lead(slide, top=c.take(Inches(0.55)), text=intro)
    # 2 rows x 1.10" + one 0.20" row gutter = 2.40" reserved for the grid.
    grid_top = c.take(Inches(2.40), label="3x2 grid")
    # Two 0.3" column gutters; integer division keeps col_w in whole EMU.
    col_w = (CONTENT_W - Inches(0.6)) // 3
    row_h = Inches(1.10)
    for i, card in enumerate(cards):
        col = i % 3
        row = i // 3
        x = CONTENT_LEFT + col * (col_w + Inches(0.3))
        y = grid_top + row * (row_h + Inches(0.20))
        add_obs_card(slide, left=x, top=y, w=col_w, h=row_h, data=card)
    add_footer(slide, "Page 09 · 觀察筆記六項指標", "記錄用 · 不當場評分", theme="light")
```
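Recipes 7.3 and 7.4 share one idea: the cursor reserves a row, then cells are placed inside it. That geometry can be factored into a small helper so the reserved height and the cell positions come from the same numbers. This is a sketch using plain integers in EMU; `inches`, `grid_height`, and `grid_cells` are hypothetical names, not functions from the export script.

```python
EMU_PER_INCH = 914400


def inches(value):
    """Convert inches to whole EMU."""
    return int(value * EMU_PER_INCH)


def grid_height(n_rows, row_h, gutter_y):
    """Total height to reserve with the cursor for the whole grid."""
    return n_rows * row_h + (n_rows - 1) * gutter_y


def grid_cells(*, left, top, width, n_cols, n_rows, row_h, gutter_x, gutter_y):
    """Return (x, y, w, h) for each cell in row-major order."""
    # Integer division keeps col_w whole and guarantees the last column
    # never extends past left + width.
    col_w = (width - gutter_x * (n_cols - 1)) // n_cols
    cells = []
    for i in range(n_cols * n_rows):
        row, col = divmod(i, n_cols)
        x = left + col * (col_w + gutter_x)
        y = top + row * (row_h + gutter_y)
        cells.append((x, y, col_w, row_h))
    return cells
```

In a recipe you would call `c.take(grid_height(...), label="grid")` once and pass the returned top into `grid_cells`, so a change in row count updates the reservation and the layout together.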
## 8. Common pitfalls and how the discipline catches them
| Pitfall | How the discipline catches it |
|---|---|
| Hero slide stuck to top | `hero_layout(blocks)` budgets total height and centers automatically |
| Last content block crosses footer | `Cursor.take()` raises `OverflowError` before render |
| Box bounds intrude on rail | tighten `box_h` to text height + 0.05" pad; verifier flags violations |
| Italic gone flat | `add_run(..., italic=True)` swaps to EN serif; CJK skipped |
| Footer text overlaps content | footer pinned at `FOOTER_TOP`, never routed through cursor |
| Chrome row drifts down on long titles | chrome pinned at `CHROME_TOP`, never advanced |
| Off-canvas content | `verify_layout.py` asserts `top + height ≤ CANVAS_H` |
| Mixed font fallback | always pass `font=FONT_*` constant; never let python-pptx pick |
@@ -0,0 +1,2 @@
__pycache__/
*.pyc
+134
@@ -0,0 +1,134 @@
#!/usr/bin/env python3
"""
Extract every shape on every slide of a .pptx into a JSON dump.

Usage:
    python extract_pptx.py <path/to/deck.pptx>            # prints to stdout
    python extract_pptx.py <path/to/deck.pptx> -o dump.json

The dump captures the *actual* state of the export: text content, position,
size, and per-run typography (font name, size, bold, italic, color). Use this
as the ground truth for the fidelity audit; do not trust the export script's
intent.

Coordinates are reported in inches (rounded to 3 decimals) so they're
human-readable when comparing against rails like CONTENT_MAX_Y = 6.70".
"""
from __future__ import annotations
import argparse
import json
import sys
from pathlib import Path
try:
    from pptx import Presentation
    from pptx.util import Emu
except ImportError:
    sys.stderr.write(
        "python-pptx is required. Install with: pip install python-pptx\n"
    )
    sys.exit(2)


def emu_to_in(emu: int | None) -> float | None:
    if emu is None:
        return None
    return round(emu / 914400, 3)


def color_repr(color) -> str | None:
    """Best-effort color extraction. Returns hex string or None."""
    if color is None:
        return None
    try:
        # ColorFormat.type may be None when no explicit color is set.
        if color.type is None:
            return None
        rgb = color.rgb
        if rgb is None:
            return None
        return f"#{str(rgb).lower()}"
    except (AttributeError, ValueError, TypeError):
        return None


def extract_runs(text_frame) -> list[dict]:
    runs = []
    for para in text_frame.paragraphs:
        for run in para.runs:
            font = run.font
            runs.append({
                "text": run.text,
                "font": font.name,
                "size_pt": float(font.size.pt) if font.size is not None else None,
                "bold": bool(font.bold) if font.bold is not None else None,
                "italic": bool(font.italic) if font.italic is not None else None,
                # Color is independent of font name/size: a run can inherit
                # font from the theme yet set its own color. Color drift is
                # one of the things this audit needs to catch, so don't gate
                # the extraction on unrelated font attributes.
                "color": color_repr(font.color),
            })
    return runs
def extract_shape(shape) -> dict:
    data = {
        "name": shape.name,
        "shape_type": str(shape.shape_type) if shape.shape_type is not None else None,
        "left_in": emu_to_in(shape.left),
        "top_in": emu_to_in(shape.top),
        "width_in": emu_to_in(shape.width),
        "height_in": emu_to_in(shape.height),
    }
    # Derived edges need all four values; width was previously unchecked and
    # a shape with no explicit width would raise TypeError on the addition.
    if None not in (shape.left, shape.top, shape.width, shape.height):
        data["bottom_in"] = emu_to_in(shape.top + shape.height)
        data["right_in"] = emu_to_in(shape.left + shape.width)
    if shape.has_text_frame:
        tf = shape.text_frame
        data["text"] = tf.text
        data["runs"] = extract_runs(tf)
    return data
def extract_pptx(path: Path) -> dict:
    prs = Presentation(str(path))
    canvas = {
        "width_in": emu_to_in(prs.slide_width),
        "height_in": emu_to_in(prs.slide_height),
    }
    slides = []
    for i, slide in enumerate(prs.slides, 1):
        shapes = [extract_shape(s) for s in slide.shapes]
        slides.append({"index": i, "shapes": shapes})
    return {
        "source": str(path),
        "canvas": canvas,
        "slide_count": len(slides),
        "slides": slides,
    }


def main() -> int:
    ap = argparse.ArgumentParser(description=__doc__.split("\n\n")[0])
    ap.add_argument("path", type=Path, help=".pptx file to extract")
    ap.add_argument("-o", "--output", type=Path, help="write JSON to this path; default stdout")
    args = ap.parse_args()
    if not args.path.exists():
        ap.error(f"file not found: {args.path}")
    data = extract_pptx(args.path)
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    if args.output:
        args.output.write_text(payload, encoding="utf-8")
        sys.stderr.write(f"wrote {args.output} ({len(payload)} bytes, {data['slide_count']} slides)\n")
    else:
        sys.stdout.write(payload)
        sys.stdout.write("\n")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
+144
@@ -0,0 +1,144 @@
#!/usr/bin/env python3
"""
Verify a re-exported .pptx against footer-rail + canvas-bound invariants.

Usage:
    python verify_layout.py <path/to/deck.pptx>
    python verify_layout.py <path/to/deck.pptx> --content-max-y 6.70 --canvas-h 7.5

Exits 0 on no violations, 1 on any violation. Prints a single block of
violations sorted by slide index, one per line:

    slide 5  shape 'desc-row-B-1' bottom 7.214" crosses footer rail 6.7"
    slide 11 shape 'note-paragraph' bottom 7.642" exceeds canvas 7.5"

Use this as the gate for "this re-export is shippable". Don't claim the audit
is fixed without running this script: the human eye misses 1–2 mm overflow
at zoom-out, the script doesn't.

Footer / chrome shapes are exempt from the content rail. Two heuristics
identify them, in this order:

  1. **By name**: any shape whose name contains "footer", "foot", "chrome",
     "page", or "pagination" (case-insensitive). Use semantic names in your
     export script if you can.
  2. **By position**: any shape whose `top` is at or below the footer-zone
     threshold (default `--footer-zone-top 6.80`). This catches python-pptx's
     auto-generated names like "TextBox 3" when the export script didn't name
     them. The threshold sits 0.05" above FOOTER_TOP (6.85") so footer rows
     pinned exactly at FOOTER_TOP are still recognized.
"""
from __future__ import annotations
import argparse
import sys
from pathlib import Path
try:
    from pptx import Presentation
except ImportError:
    sys.stderr.write(
        "python-pptx is required. Install with: pip install python-pptx\n"
    )
    sys.exit(2)

FOOTER_NAME_HINTS = ("footer", "foot", "chrome", "page", "pagination")
EPS_IN = 0.005  # ignore sub-pixel overflows (~0.13mm)


def is_footer_by_name(name: str) -> bool:
    n = (name or "").lower()
    return any(hint in n for hint in FOOTER_NAME_HINTS)


def emu_to_in(emu: int | None) -> float:
    return (emu or 0) / 914400
def verify(path: Path, content_max_y: float, canvas_w: float, canvas_h: float,
           footer_zone_top: float) -> list[str]:
    prs = Presentation(str(path))
    violations: list[str] = []
    actual_w = emu_to_in(prs.slide_width)
    actual_h = emu_to_in(prs.slide_height)
    if abs(actual_w - canvas_w) > EPS_IN or abs(actual_h - canvas_h) > EPS_IN:
        violations.append(
            f"canvas mismatch: file is {actual_w:.3f}\" x {actual_h:.3f}\", "
            f"expected {canvas_w}\" x {canvas_h}\""
        )
    for i, slide in enumerate(prs.slides, 1):
        for shape in slide.shapes:
            if shape.top is None or shape.height is None:
                continue
            top = emu_to_in(shape.top)
            left = emu_to_in(shape.left)
            bottom = top + emu_to_in(shape.height)
            right = left + emu_to_in(shape.width)
            name = shape.name or "<unnamed>"
            # Off-canvas (hard fail for any shape).
            if bottom > canvas_h + EPS_IN:
                violations.append(
                    f"slide {i:<2} shape '{name}' bottom {bottom:.3f}\" "
                    f"exceeds canvas {canvas_h}\""
                )
            if right > canvas_w + EPS_IN:
                violations.append(
                    f"slide {i:<2} shape '{name}' right {right:.3f}\" "
                    f"exceeds canvas width {canvas_w}\""
                )
            if top < -EPS_IN:
                violations.append(
                    f"slide {i:<2} shape '{name}' top {top:.3f}\" is negative"
                )
            if left < -EPS_IN:
                violations.append(
                    f"slide {i:<2} shape '{name}' left {left:.3f}\" is negative"
                )
            # Footer rail (only enforced on content shapes).
            # Shape is exempt if (a) named like a footer, or
            # (b) pinned at-or-below the footer zone threshold.
            if is_footer_by_name(name) or top >= footer_zone_top - EPS_IN:
                continue
            if bottom > content_max_y + EPS_IN:
                violations.append(
                    f"slide {i:<2} shape '{name}' bottom {bottom:.3f}\" "
                    f"crosses footer rail {content_max_y}\""
                )
    return violations
def main() -> int:
    ap = argparse.ArgumentParser(description=__doc__.split("\n\n")[0])
    ap.add_argument("path", type=Path, help=".pptx file to verify")
    ap.add_argument("--content-max-y", type=float, default=6.70,
                    help="content rail in inches; nothing in content area may cross (default 6.70)")
    ap.add_argument("--canvas-w", type=float, default=13.333,
                    help="expected canvas width in inches (default 13.333 = 16:9)")
    ap.add_argument("--canvas-h", type=float, default=7.5,
                    help="expected canvas height in inches (default 7.5 = 16:9)")
    ap.add_argument("--footer-zone-top", type=float, default=6.80,
                    help="any shape with top >= this is treated as footer/chrome "
                         "(default 6.80; sits 0.05\" above the typical FOOTER_TOP=6.85\")")
    args = ap.parse_args()
    if not args.path.exists():
        ap.error(f"file not found: {args.path}")
    violations = verify(args.path, args.content_max_y, args.canvas_w, args.canvas_h,
                        args.footer_zone_top)
    if violations:
        sys.stderr.write("\n".join(violations) + "\n")
        sys.stderr.write(f"\n{len(violations)} violation(s) found in {args.path}\n")
        return 1
    sys.stderr.write(f"OK: 0 violations across all slides in {args.path}\n")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())