Text-to-Speech
The Arc XP Audio API can generate spoken audio from text input using text-to-speech (TTS). This is useful for producing narrated versions of articles, accessibility audio, or any content where you need synthesized speech.
Prerequisites
- An Arc XP API token (see the Developer Center).
- At least one voice configured in your organization's TTS settings (see step 1).
1. List Available Voices
The Audio API comes with some preset voices. If you'd like to configure voices yourself, first browse the voices available to your organization:
```bash
curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  https://api.[org].arcpublishing.com/audiocenter/api/editorial/v1/settings/voices
```

The response includes metadata for each voice:
```json
{
  "voices": [
    {
      "id": "voice_abc123",
      "name": "Rachel",
      "use_case": "narration",
      "gender": "female",
      "accent": "american",
      "age": "young",
      "description": "A clear, warm voice ideal for news narration.",
      "preview_url": "https://...",
      "supported_languages": ["en", "es", "fr"]
    }
  ]
}
```

Note the id field: you'll need it when generating speech.
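If you script voice selection, a small filter over this response is enough. The sketch below assumes the response shape shown above (a top-level voices array whose entries carry id, use_case, and supported_languages); pick_voice_ids and its filtering criteria are hypothetical helpers, not part of the API.

```python
import json


def pick_voice_ids(voices_response, language, use_case=None):
    """Return IDs of voices that support `language` and, optionally, match `use_case`.

    Hypothetical helper. Assumes the /settings/voices response shape shown above.
    Accepts either the raw JSON string or an already-decoded dict.
    """
    data = json.loads(voices_response) if isinstance(voices_response, str) else voices_response
    matches = []
    for voice in data.get("voices", []):
        # Skip voices that do not list the requested language code.
        if language not in voice.get("supported_languages", []):
            continue
        # Optionally narrow by use case (for example "narration").
        if use_case is not None and voice.get("use_case") != use_case:
            continue
        matches.append(voice["id"])
    return matches


sample = """{"voices": [{"id": "voice_abc123", "name": "Rachel", "use_case": "narration",
  "gender": "female", "accent": "american", "age": "young",
  "supported_languages": ["en", "es", "fr"]}]}"""

print(pick_voice_ids(sample, "es", use_case="narration"))  # ['voice_abc123']
print(pick_voice_ids(sample, "de"))  # []
```

The returned ID is what goes into the voice_id field of the preview and clip requests.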
2. Preview a Voice
Before creating a full audio clip, you can generate a short preview (~10 seconds) to audition a voice. No audio clip record is created.
```bash
curl -X POST https://api.[org].arcpublishing.com/audiocenter/api/editorial/v1/tts/preview \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "input_text": "This is a short preview of what this voice sounds like.",
    "voice_id": "voice_abc123"
  }'
```

The response contains a uri pointing to the generated preview audio file:
```json
{
  "uri": "https://..."
}
```

3. Generate a Full Audio Clip
Once you've chosen a voice, create an audio clip with TTS in a single request. This creates the clip record and starts speech synthesis:
```bash
curl -X POST https://api.[org].arcpublishing.com/audiocenter/api/editorial/v1/clips/tts \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Article Narration: Breaking News Story",
    "description": "TTS narration of the breaking news article.",
    "tags": ["narration", "news"],
    "input_text": "The full text of the article you want narrated goes here. It can be up to 30,000 characters.",
    "voice_id": "voice_abc123"
  }'
```

The API returns 202 Accepted with the new clip's ID and a Location header:
```json
{
  "content_id": "abc123def456"
}
```

Use the returned content_id as the {audio_id} placeholder in later clip endpoints, including publish.
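The same request can be built from Python with only the standard library. This is a minimal sketch, assuming the endpoint paths and fields shown in the curl examples; the helper names are hypothetical, and the org parameter stands in for the [org] part of the URL. It builds the request without sending it, and derives the publish URL from the returned content_id.

```python
import json
import urllib.request


def base_url(org):
    # Mirrors the https://api.[org].arcpublishing.com/... prefix used in the curl examples.
    return f"https://api.{org}.arcpublishing.com/audiocenter/api/editorial/v1"


def build_create_tts_clip_request(org, token, title, input_text, voice_id, **optional):
    """Build (but do not send) a POST /clips/tts request.

    `optional` may carry description, tags, or circulation, per the fields reference.
    """
    payload = {"title": title, "input_text": input_text, "voice_id": voice_id, **optional}
    return urllib.request.Request(
        f"{base_url(org)}/clips/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def publish_url_for(org, create_response_body):
    """Turn the 202 body ({"content_id": ...}) into the publish endpoint URL."""
    content_id = json.loads(create_response_body)["content_id"]
    return f"{base_url(org)}/clips/{content_id}/publish"


# "myorg" is a hypothetical organization ID used only for illustration.
req = build_create_tts_clip_request(
    "myorg", "YOUR_API_TOKEN", "Article Narration", "The full text goes here...",
    "voice_abc123", tags=["narration", "news"],
)
print(req.get_method(), req.full_url)
print(publish_url_for("myorg", '{"content_id": "abc123def456"}'))
```

To actually send it, pass the request to urllib.request.urlopen(req); a successful call should come back with status 202 and a body containing content_id.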
Monitoring Progress with SSE
TTS generation is asynchronous. To stream real-time progress updates as Server-Sent Events (SSE), include the Accept: text/event-stream header in the create request:
```bash
curl -X POST https://api.[org].arcpublishing.com/audiocenter/api/editorial/v1/clips/tts \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Accept: text/event-stream" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Article Narration",
    "input_text": "The full text goes here...",
    "voice_id": "voice_abc123"
  }'
```

The SSE stream emits progress notifications such as tts_started, tts_generating_speech, encoding_started, and encoding_complete (or tts_failed / encoding_failed).
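Consuming the stream from a client means splitting it into events. The sketch below implements basic Server-Sent Events framing (field lines, comment lines starting with a colon, and a blank line ending each event) and stops at the first terminal notification named above. It assumes the notification name arrives in the SSE event: field; if your stream carries it inside data: instead, adjust wait_for_terminal. The function names are hypothetical.

```python
# Terminal notifications listed in the documentation above.
SUCCESS = {"encoding_complete"}
FAILURE = {"tts_failed", "encoding_failed"}


def parse_sse(lines):
    """Yield (event_type, data) pairs from an iterable of Server-Sent Events lines."""
    event_type, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line ends the current event; dispatch it if it carried data.
            if data_lines:
                yield event_type, "\n".join(data_lines)
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)


def wait_for_terminal(lines):
    """Return the first terminal notification name seen, or None if the stream ends first."""
    for event_type, _data in parse_sse(lines):
        # Assumption: the notification name is carried in the SSE "event:" field.
        if event_type in SUCCESS or event_type in FAILURE:
            return event_type
    return None


stream = [
    "event: tts_started\n", "data: {}\n", "\n",
    ": keep-alive\n", "\n",
    "event: encoding_complete\n", "data: {}\n", "\n",
]
print(wait_for_terminal(stream))  # encoding_complete
```

Because an HTTP response returned by urllib.request.urlopen can be iterated line by line, it could be fed in as (raw.decode("utf-8") for raw in response).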
4. Publish the Clip
Once the clip reaches the READY state, publish it to make it available for delivery:
```bash
curl -X POST https://api.[org].arcpublishing.com/audiocenter/api/editorial/v1/clips/{audio_id}/publish \
  -H "Authorization: Bearer YOUR_API_TOKEN"
```

Pronunciation Dictionaries
If your content includes names, technical terms, or brand names that the TTS engine mispronounces, you can configure a pronunciation dictionary at the organization level.
Pronunciation rules use IPA (International Phonetic Alphabet) phonemes to define how specific words should be pronounced.
Configure Pronunciation Rules
```bash
curl -X PATCH https://api.[org].arcpublishing.com/audiocenter/api/editorial/v1/settings/ \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "tts_settings": {
      "pronunciation_dictionary": {
        "rules": [
          { "grapheme": ["Arc XP", "ArcXP"], "phoneme": "ɑːrk ɛks piː" },
          { "grapheme": ["GIF"], "phoneme": "ɡɪf" }
        ]
      }
    }
  }'
```

Each rule maps one or more grapheme strings (the text as written) to a phoneme (how it should be spoken). The dictionary is applied automatically to all future TTS operations for your organization.
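Building this body from a simple spelling-to-phoneme mapping keeps the dictionary easy to maintain. A minimal sketch, assuming the payload shape shown above; the helper names are hypothetical, and spellings that share a phoneme are merged into one rule, the way Arc XP and ArcXP are in the example.

```python
import json


def pronunciation_rules(spellings):
    """Turn {spelling: IPA phoneme} into rule objects.

    Spellings that share the same phoneme are merged into a single rule,
    preserving the order in which they first appear.
    """
    grouped = {}
    for grapheme, phoneme in spellings.items():
        grouped.setdefault(phoneme, []).append(grapheme)
    return [{"grapheme": graphemes, "phoneme": phoneme} for phoneme, graphemes in grouped.items()]


def pronunciation_settings_body(spellings):
    """Wrap the rules in the tts_settings structure used by PATCH /settings/."""
    return {"tts_settings": {"pronunciation_dictionary": {"rules": pronunciation_rules(spellings)}}}


body = pronunciation_settings_body({
    "Arc XP": "ɑːrk ɛks piː",
    "ArcXP": "ɑːrk ɛks piː",
    "GIF": "ɡɪf",
})
# json.dumps escapes the IPA symbols as \u sequences by default, which is equally valid JSON.
print(json.dumps(body, indent=2))
```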
Configure Voices
You can also set which voices are available to your organization through the settings endpoint:
```bash
curl -X PATCH https://api.[org].arcpublishing.com/audiocenter/api/editorial/v1/settings/ \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "tts_settings": {
      "voices": [
        { "voice_id": "voice_abc123" },
        { "voice_id": "voice_def456" }
      ]
    }
  }'
```

TTS Fields Reference
Create TTS Clip (POST /clips/tts)
| Field | Required | Description |
|---|---|---|
title | Yes | Clip title (3–500 characters). |
input_text | Yes | The text to synthesize (up to 30,000 characters). |
voice_id | Yes | ID of the voice to use (from /settings/voices). |
description | No | Clip description (up to 4,000 characters). |
tags | No | Tags for organization and filtering. |
circulation | No | Site ownership details. |
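A client-side pre-check can catch obvious mistakes before a request is sent. This sketch encodes only the limits listed in the table above; the function name is hypothetical, Python's len() counts Unicode code points (which may differ from how the API counts characters), and the server's validation remains authoritative.

```python
def validate_tts_clip(payload):
    """Return a list of problems with a POST /clips/tts body (empty if none were found)."""
    problems = []
    # Required fields per the table above.
    for field in ("title", "input_text", "voice_id"):
        if not payload.get(field):
            problems.append(f"{field} is required")
    title = payload.get("title") or ""
    if title and not 3 <= len(title) <= 500:
        problems.append("title must be 3-500 characters")
    if len(payload.get("input_text") or "") > 30_000:
        problems.append("input_text exceeds 30,000 characters")
    if len(payload.get("description") or "") > 4_000:
        problems.append("description exceeds 4,000 characters")
    return problems


ok = {"title": "Article Narration", "input_text": "Hello", "voice_id": "voice_abc123"}
print(validate_tts_clip(ok))  # []
print(validate_tts_clip({**ok, "title": "ab"}))  # ['title must be 3-500 characters']
```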
Preview Voice (POST /tts/preview)
| Field | Required | Description |
|---|---|---|
input_text | Yes | Sample text to synthesize (up to 500 characters, ~10 seconds). |
voice_id | Yes | ID of the voice to audition. |
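To audition a voice on real content, you might trim an article down to the preview limit. A minimal sketch under that assumption: the function names are hypothetical, and cutting at the last space before 500 characters (so the sample does not end mid-word) is an illustrative choice, not API behavior.

```python
def preview_text(article_text, limit=500):
    """Trim article text to at most `limit` characters for POST /tts/preview."""
    text = " ".join(article_text.split())  # collapse runs of whitespace and newlines
    if len(text) <= limit:
        return text
    # Cut at the last space at or before the limit so the sample ends on a whole word.
    cut = text.rfind(" ", 0, limit + 1)
    # Fall back to a hard cut when there is no space in range (one very long token).
    return text[:cut] if cut > 0 else text[:limit]


def preview_payload(article_text, voice_id):
    """Build a request body for POST /tts/preview from an article excerpt."""
    return {"input_text": preview_text(article_text), "voice_id": voice_id}


print(preview_payload("The full text of the article\ngoes here.", "voice_abc123"))
```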