# Text-to-Speech

Generate speech audio locally using Kokoro-82M (no API key, runs on CPU).

## Voice Selection

Match voice to content. Default is `af_heart`.

| Content type  | Voice                 | Why                        |
| ------------- | --------------------- | -------------------------- |
| Product demo  | `af_heart`/`af_nova`  | Warm, professional         |
| Tutorial      | `am_adam`/`bf_emma`   | Neutral, easy to follow    |
| Marketing     | `af_sky`/`am_michael` | Energetic or authoritative |
| Documentation | `bf_emma`/`bm_george` | Clear British English      |
| Casual        | `af_heart`/`af_sky`   | Approachable, natural      |

Run `npx hyperframes tts --list` for all 54 voices (8 languages).
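The table above can be folded into a small lookup for scripts that choose a voice automatically. This is only a sketch: `pick_voice` and its content-type keywords are hypothetical names, and taking the first voice from each row is an arbitrary choice.

```bash
# Hypothetical helper: return the first voice listed in the table for a content type.
pick_voice() {
  case "$1" in
    demo)          echo "af_heart" ;;
    tutorial)      echo "am_adam" ;;
    marketing)     echo "af_sky" ;;
    documentation) echo "bf_emma" ;;
    casual)        echo "af_heart" ;;
    *)             echo "af_heart" ;;  # fall back to the documented default
  esac
}

# Print the command rather than running it; drop the leading echo to synthesize.
voice=$(pick_voice tutorial)
echo npx hyperframes tts script.txt --voice "$voice" --output narration.wav
```

Swap in the second voice of a row (for example `af_nova` for demos) if it suits the material better.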

## Multilingual Phonemization

Kokoro voice IDs encode language in the first letter: `a`=American English, `b`=British English, `e`=Spanish, `f`=French, `h`=Hindi, `i`=Italian, `j`=Japanese, `p`=Brazilian Portuguese, `z`=Mandarin. The CLI auto-detects the phonemizer locale from that prefix, so you don't need to pass `--lang` when the voice's language matches the text.

```bash
npx hyperframes tts "La reunión empieza a las nueve" --voice ef_dora --output es.wav
npx hyperframes tts "今日はいい天気ですね" --voice jf_alpha --output ja.wav
```

Use `--lang` only to override auto-detection (e.g. stylized accents):

```bash
npx hyperframes tts "Hello there" --voice af_heart --lang fr-fr --output accented.wav
```

Valid `--lang` codes: `en-us`, `en-gb`, `es`, `fr-fr`, `hi`, `it`, `pt-br`, `ja`, `zh`.
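To make the prefix rule concrete, the sketch below maps a voice ID to the `--lang` code it would presumably resolve to. The pairing of each prefix with a code is inferred from the two lists in this section, not taken from the CLI source, and `voice_locale` is a hypothetical name.

```bash
# Hypothetical helper: infer the phonemizer locale from a voice ID's first letter.
# The prefix-to-code pairing is an assumption based on the lists in this section.
voice_locale() {
  case "$1" in
    a*) echo "en-us" ;;
    b*) echo "en-gb" ;;
    e*) echo "es" ;;
    f*) echo "fr-fr" ;;
    h*) echo "hi" ;;
    i*) echo "it" ;;
    j*) echo "ja" ;;
    p*) echo "pt-br" ;;
    z*) echo "zh" ;;
    *)  echo "unknown voice prefix: $1" >&2; return 1 ;;
  esac
}

voice_locale ef_dora    # prints: es
voice_locale jf_alpha   # prints: ja
```

A script can compare this value with the language of its text to decide whether an explicit `--lang` override is needed at all.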

Non-English phonemization requires `espeak-ng` installed system-wide (`brew install espeak-ng` on macOS, `apt-get install espeak-ng` on Debian/Ubuntu).
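A preflight check can catch a missing `espeak-ng` before synthesis starts. The sketch below assumes that any voice whose ID does not start with `a` or `b` counts as non-English; `needs_espeak` and `preflight` are hypothetical names.

```bash
# Return 0 when the voice is not English (prefix other than a or b).
needs_espeak() {
  case "$1" in
    a*|b*) return 1 ;;
    *)     return 0 ;;
  esac
}

# Hypothetical preflight: fail only if the voice needs espeak-ng and it is not on PATH.
preflight() {
  if needs_espeak "$1" && ! command -v espeak-ng >/dev/null 2>&1; then
    echo "espeak-ng not found; install it before using voice $1" >&2
    return 1
  fi
}
```

Calling `preflight ef_dora` before the `tts` command turns a late phonemizer failure into an early, readable error.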

## Speed Tuning

- **0.7-0.8**: Tutorial, complex content
- **1.0**: Natural pace (default)
- **1.1-1.2**: Intros, upbeat content
- **1.5+**: Rarely appropriate
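The ranges above can be reduced to one value per content kind. In this sketch, `pick_speed` is a hypothetical helper and the specific values are picks from within each listed range. The CLI flag that accepts the speed is not shown in this document, so the sketch only returns the number.

```bash
# Hypothetical helper: pick one speed value from the ranges listed above.
pick_speed() {
  case "$1" in
    tutorial) echo "0.8" ;;  # upper end of the 0.7-0.8 range
    intro)    echo "1.1" ;;  # lower end of the 1.1-1.2 range
    *)        echo "1.0" ;;  # natural pace, the documented default
  esac
}

pick_speed tutorial   # prints: 0.8
```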

## Usage

```bash
npx hyperframes tts "Your script here" --voice af_nova --output narration.wav
npx hyperframes tts script.txt --voice bf_emma --output narration.wav
```
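When a project has several script files, the file-based form can be looped. The sketch below is a dry run: `plan_narration` is a hypothetical helper that only prints one command per `.txt` file, naming each output after its script. It does not quote filenames, so it assumes paths without spaces.

```bash
# Hypothetical dry run: print one tts command per .txt script.
# Each output is named after its script, e.g. intro.txt -> intro.wav.
plan_narration() {
  voice=$1
  shift
  for script in "$@"; do
    echo "npx hyperframes tts $script --voice $voice --output ${script%.txt}.wav"
  done
}

plan_narration af_nova intro.txt outro.txt
```

Piping the printed lines to `sh` would run them once the plan looks right.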

In compositions:

```html
<audio
  id="narration"
  data-start="0"
  data-duration="auto"
  data-track-index="2"
  src="narration.wav"
  data-volume="1"
></audio>
```

## TTS + Captions Workflow

```bash
npx hyperframes tts script.txt --voice af_heart --output narration.wav
npx hyperframes transcribe narration.wav  # → transcript.json with word-level timestamps
```
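The two commands can be chained so that transcription only runs when synthesis succeeds. This is a sketch: `narrate_and_transcribe` and the `HF` override variable are hypothetical, and only the `tts` and `transcribe` invocations shown above are used.

```bash
# HF is a hypothetical override for the CLI prefix; set HF=echo for a dry run.
HF=${HF:-npx hyperframes}

# Args: script file, voice ID, output WAV. Stops if synthesis fails.
narrate_and_transcribe() {
  $HF tts "$1" --voice "$2" --output "$3" || return 1
  $HF transcribe "$3"
}
```

Invoke it as `narrate_and_transcribe script.txt af_heart narration.wav`.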

## Requirements

- Python 3.8+ with `kokoro-onnx` and `soundfile`
- Model downloads on first use (~311 MB + ~27 MB voices, cached in `~/.cache/hyperframes/tts/`)
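The Python requirements can be checked up front with a short script. In this sketch, `version_ok` and `check_tts_env` are hypothetical helpers, and the import name `kokoro_onnx` for the `kokoro-onnx` package is an assumption worth confirming against your installed version.

```bash
# True when a dotted version string is at least 3.8.
version_ok() {
  major=${1%%.*}
  rest=${1#*.}
  minor=${rest%%.*}
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 8 ]; }
}

# Hypothetical environment check; the import names below are assumptions.
check_tts_env() {
  ver=$(python3 -c 'import platform; print(platform.python_version())') || return 1
  version_ok "$ver" || { echo "Python $ver is older than 3.8" >&2; return 1; }
  python3 -c 'import kokoro_onnx, soundfile' 2>/dev/null ||
    { echo "missing kokoro-onnx or soundfile" >&2; return 1; }
}
```

Run `check_tts_env` once before the first `tts` call; it does not touch the model cache.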