Text-to-Speech

The Arc XP Audio API can generate spoken audio from text input using text-to-speech (TTS). This is useful for producing narrated versions of articles, accessibility audio, or any content where you need synthesized speech.

Prerequisites

  • An Arc XP API token (see the Developer Center).
  • At least one voice configured in your organization’s TTS settings (see step 1 to list voices, or Configure Voices to change them).

1. List Available Voices

The Audio API comes with preset voices already configured. To choose different ones, first browse the voices available to your organization:

curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  https://api.[org].arcpublishing.com/audiocenter/api/editorial/v1/settings/voices

The response includes metadata for each voice:

{
  "voices": [
    {
      "id": "voice_abc123",
      "name": "Rachel",
      "use_case": "narration",
      "gender": "female",
      "accent": "american",
      "age": "young",
      "description": "A clear, warm voice ideal for news narration.",
      "preview_url": "https://...",
      "supported_languages": ["en", "es", "fr"]
    }
  ]
}

Note the id field — you’ll need it when generating speech.
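
As a minimal sketch of picking a voice from this response in Python, assuming the response has the shape shown above: the helper name voices_for_language is hypothetical and not part of the API, and the sample data only mirrors the example response.

```python
import json

# Sample payload mirroring the example response above (not real API output).
SAMPLE_RESPONSE = """
{
  "voices": [
    {"id": "voice_abc123", "name": "Rachel", "use_case": "narration",
     "supported_languages": ["en", "es", "fr"]}
  ]
}
"""


def voices_for_language(response: dict, language: str) -> list[str]:
    """Return the ids of voices whose supported_languages include `language`."""
    return [
        voice["id"]
        for voice in response.get("voices", [])
        if language in voice.get("supported_languages", [])
    ]


voices = json.loads(SAMPLE_RESPONSE)
print(voices_for_language(voices, "es"))
```

The returned ids can be passed as voice_id in the preview and clip requests.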

2. Preview a Voice

Before creating a full audio clip, you can generate a short preview (~10 seconds) to audition a voice. No audio clip record is created.

curl -X POST https://api.[org].arcpublishing.com/audiocenter/api/editorial/v1/tts/preview \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "input_text": "This is a short preview of what this voice sounds like.",
    "voice_id": "voice_abc123"
  }'

The response contains a uri pointing to the generated preview audio file:

{
  "uri": "https://..."
}
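
Preview text is limited to 500 characters (see the fields reference). The Python sketch below shows one way to cut a longer article down to a preview-sized sample before sending it. The helper name preview_payload is hypothetical, and trimming at a word boundary is a choice made for this example, not an API requirement.

```python
import json

PREVIEW_MAX_CHARS = 500  # limit stated in the fields reference for /tts/preview


def preview_payload(text: str, voice_id: str) -> str:
    """Build a JSON body for the preview endpoint, trimming long text at a word boundary."""
    sample = " ".join(text.split())  # collapse newlines and repeated whitespace
    if len(sample) > PREVIEW_MAX_CHARS:
        # Cut at the limit, then drop the trailing (possibly partial) word.
        sample = sample[:PREVIEW_MAX_CHARS].rsplit(" ", 1)[0]
    return json.dumps({"input_text": sample, "voice_id": voice_id}, ensure_ascii=False)


body = preview_payload("word " * 200, "voice_abc123")
print(len(json.loads(body)["input_text"]))
```

The resulting string can be used as the -d body of the preview request.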

3. Generate a Full Audio Clip

Once you’ve chosen a voice, create an audio clip with TTS in a single request. This creates the clip record and starts speech synthesis:

curl -X POST https://api.[org].arcpublishing.com/audiocenter/api/editorial/v1/clips/tts \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Article Narration: Breaking News Story",
    "description": "TTS narration of the breaking news article.",
    "tags": ["narration", "news"],
    "input_text": "The full text of the article you want narrated goes here. It can be up to 30,000 characters.",
    "voice_id": "voice_abc123"
  }'

The API returns 202 Accepted with the new clip’s ID and a Location header:

{
  "content_id": "abc123def456"
}

Use the returned content_id as the {audio_id} placeholder in later clip endpoints, including publish.
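
Because the request is accepted before synthesis finishes, it can help to check the payload locally before sending it. The Python sketch below applies the limits listed in the fields reference. The function name validate_tts_clip is hypothetical, and it counts Python string length, which is an assumption about how the API counts characters.

```python
def validate_tts_clip(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload passes these checks."""
    problems = []
    # Required fields per the fields reference.
    for field in ("title", "input_text", "voice_id"):
        if not payload.get(field):
            problems.append(f"{field} is required")
    title = payload.get("title") or ""
    if title and not 3 <= len(title) <= 500:
        problems.append("title must be 3-500 characters")
    if len(payload.get("input_text") or "") > 30_000:
        problems.append("input_text exceeds 30,000 characters")
    if len(payload.get("description") or "") > 4_000:
        problems.append("description exceeds 4,000 characters")
    return problems


clip = {
    "title": "Article Narration",
    "input_text": "The full text goes here...",
    "voice_id": "voice_abc123",
}
print(validate_tts_clip(clip))
```

These checks only mirror the documented limits; the API remains the authority on what it accepts.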

Monitoring Progress with SSE

TTS generation is asynchronous. To stream real-time progress updates, include the Accept: text/event-stream header in the create request:

curl -X POST https://api.[org].arcpublishing.com/audiocenter/api/editorial/v1/clips/tts \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Accept: text/event-stream" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Article Narration",
    "input_text": "The full text goes here...",
    "voice_id": "voice_abc123"
  }'

The SSE stream emits progress notifications such as tts_started, tts_generating_speech, encoding_started, and encoding_complete (or tts_failed / encoding_failed).
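
The Python sketch below shows one way to interpret such a stream once you have its lines. It is a simplified parser for the standard SSE line format and rests on an assumption this page does not confirm: that each notification name arrives in the SSE event field. If the names are carried inside the data payload instead, the lookup would need to change. The function names parse_sse and outcome are hypothetical.

```python
from typing import Iterable, Iterator

FAILURE_EVENTS = {"tts_failed", "encoding_failed"}
FINAL_SUCCESS_EVENT = "encoding_complete"


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one {'event': ..., 'data': ...} dict per message (a blank line ends a message)."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:  # blank line: emit the buffered message, then reset
            if data or event != "message":
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        else:
            field, _, value = line.partition(":")
            value = value[1:] if value.startswith(" ") else value
            if field == "event":
                event = value  # ASSUMPTION: notification name is sent here
            elif field == "data":
                data.append(value)


def outcome(lines: Iterable[str]) -> str:
    """Return 'complete', 'failed', or 'incomplete' for a captured stream."""
    for message in parse_sse(lines):
        if message["event"] in FAILURE_EVENTS:
            return "failed"
        if message["event"] == FINAL_SUCCESS_EVENT:
            return "complete"
    return "incomplete"


# Hand-written sample stream, not captured API output.
SAMPLE = [
    "event: tts_started\n", "data: {}\n", "\n",
    "event: encoding_complete\n", "data: {}\n", "\n",
]
print(outcome(SAMPLE))
```

Both functions accept any iterable of text lines, so the same code can run against a saved capture or a live response read line by line.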

4. Publish the Clip

Once the clip reaches the READY state, publish it to make it available for delivery:

curl -X POST https://api.[org].arcpublishing.com/audiocenter/api/editorial/v1/clips/{audio_id}/publish \
  -H "Authorization: Bearer YOUR_API_TOKEN"
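
The same call can be sketched in Python with the standard library. The example builds the request without sending it, so you can inspect the URL first. The helper name publish_request and the organization name "myorg" are placeholders introduced for illustration. The content_id is the one returned when the clip was created.

```python
import urllib.parse
import urllib.request

# URL pattern taken from the curl examples; {org} replaces the [org] placeholder.
BASE = "https://api.{org}.arcpublishing.com/audiocenter/api/editorial/v1"


def publish_request(org: str, content_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that publishes a clip."""
    safe_id = urllib.parse.quote(content_id, safe="")
    url = f"{BASE.format(org=org)}/clips/{safe_id}/publish"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


req = publish_request("myorg", "abc123def456", "YOUR_API_TOKEN")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```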

Pronunciation Dictionaries

If your content includes names, technical terms, or brand names that the TTS engine mispronounces, you can configure a pronunciation dictionary at the organization level.

Pronunciation rules use IPA (International Phonetic Alphabet) phonemes to define how specific words should be pronounced.

Configure Pronunciation Rules

curl -X PATCH https://api.[org].arcpublishing.com/audiocenter/api/editorial/v1/settings/ \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "tts_settings": {
      "pronunciation_dictionary": {
        "rules": [
          {
            "grapheme": ["Arc XP", "ArcXP"],
            "phoneme": "ɑːrk ɛks piː"
          },
          {
            "grapheme": ["GIF"],
            "phoneme": "ɡɪf"
          }
        ]
      }
    }
  }'

Each rule maps one or more grapheme strings (the text as written) to a phoneme (how it should be spoken). The dictionary is applied automatically to all future TTS operations for your organization.
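
IPA symbols are non-ASCII, so the request body has to survive encoding intact. The Python sketch below builds the same payload structure from a plain mapping and encodes it explicitly as UTF-8. The helper name pronunciation_payload is hypothetical, and using UTF-8 for the body is an assumption based on common JSON practice rather than something this page states.

```python
import json


def pronunciation_payload(rules: dict[str, list[str]]) -> bytes:
    """Turn a phoneme -> graphemes mapping into the PATCH body, encoded as UTF-8."""
    body = {
        "tts_settings": {
            "pronunciation_dictionary": {
                "rules": [
                    {"grapheme": graphemes, "phoneme": phoneme}
                    for phoneme, graphemes in rules.items()
                ]
            }
        }
    }
    # ensure_ascii=False keeps the IPA symbols readable instead of \u escapes.
    return json.dumps(body, ensure_ascii=False).encode("utf-8")


payload = pronunciation_payload({"ɑːrk ɛks piː": ["Arc XP", "ArcXP"]})
print(payload.decode("utf-8"))
```

Printing the decoded payload before sending is a quick way to confirm that the phonemes were not mangled along the way.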

Configure Voices

You can also set which voices are available to your organization through the settings endpoint:

curl -X PATCH https://api.[org].arcpublishing.com/audiocenter/api/editorial/v1/settings/ \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "tts_settings": {
      "voices": [
        { "voice_id": "voice_abc123" },
        { "voice_id": "voice_def456" }
      ]
    }
  }'

TTS Fields Reference

Create TTS Clip (POST /clips/tts)

Field        Required  Description
title        Yes       Clip title (3–500 characters).
input_text   Yes       The text to synthesize (up to 30,000 characters).
voice_id     Yes       ID of the voice to use (from /settings/voices).
description  No        Clip description (up to 4,000 characters).
tags         No        Tags for organization and filtering.
circulation  No        Site ownership details.

Preview Voice (POST /tts/preview)

Field        Required  Description
input_text   Yes       Sample text to synthesize (up to 500 characters, ~10 seconds).
voice_id     Yes       ID of the voice to audition.