Content Recommendations API Developer Guide

Introduction

This guide covers Arc XP’s Content Recommendations API, which delivers personalized content recommendations for your audience by learning from user behavior and your content catalog.

It is built to work with any Customer Data Platform (CDP) and any Content Management System (CMS), and to deliver content recommendations on any content surface.

The Content Recommendations API is a personalization backend that connects your content catalog and audience behavior to a machine learning engine.

It provides two APIs:

API	Purpose	Base Path
Collector API	Send user behavior events and content updates	`/collector/v1`
Recommendations API	Retrieve personalized content recommendations	`/recommend/v1`

The Collector API accepts behavioral signals (page views, clicks, engagement) and content lifecycle events (publish, delete) from your applications (via your CDP) and CMS.

The Recommendations API is a read-only API that returns ranked, personalized content IDs for a given user. You can then fetch that content from your content management system, including Arc XP.

Each tenant (organization) operates with a fully isolated recommendation model. Your content, your audience data, and your trained model are never shared with other tenants.

Base URL

Both APIs are served from:

https://{org}-config-prod.api.arc-cdn.net/

Replace {org} with your organization identifier. The /collector/v1 and /recommend/v1 base paths are appended to this host.
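As a minimal sketch of how the host and base paths combine, the helper below joins the two. The function name build_url and the "acme" organization identifier are illustrative assumptions, not part of the API.

```python
# Host template taken from the Base URL section of this guide.
BASE_URL_TEMPLATE = "https://{org}-config-prod.api.arc-cdn.net/"


def build_url(org: str, path: str) -> str:
    """Join the org-specific host with an API path such as /collector/v1/events."""
    base = BASE_URL_TEMPLATE.format(org=org)
    # Normalize slashes so exactly one separates host and path.
    return base.rstrip("/") + "/" + path.lstrip("/")


# "acme" is a placeholder organization identifier.
events_url = build_url("acme", "/collector/v1/events")
recommend_url = build_url("acme", "/recommend/v1/recommendations")
```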

How it works

Content Recommendations API flow

You send data in. Your CMS sends content updates via webhooks (for Arc XP customers, this is accomplished via IFX), and your web and mobile apps send user behavior events (page views, clicks, engagement) through a data collection partner, such as a CDP.
The Content Recommendations API learns. The ML engine trains a personalization model on your content catalog and your audience’s behavior.
You fetch recommendations out. Your applications call the Recommendations API to get ranked content for a specific user.

Authentication

Both the Collector API and Recommendations API require a Delivery API token. These are separate from Developer Center tokens.

See Provisioning tokens through Delivery API for the full provisioning flow, including how to generate a bootstrap Developer Center token, create Delivery API tokens via POST /v1/access/keys, and assign them to key collections.

Critical requirements

These rules are non-negotiable. Violating them will corrupt your recommendation model, leak sensitive data, or silently produce bad results.

Never send PII as the user identifier. user_id must be an anonymized token — not an email, name, phone number, or any directly identifying value. Hash or pseudonymize before sending. Arc XP does not sanitize user IDs on ingestion.
Keep the content catalog in sync. Only send published content via the Content endpoint, and send an action: "delete" payload the moment an item is unpublished or deleted in your CMS. Stale content in the model leads to recommendations pointing at dead pages.
Use the same identifier for content and events. The item_id on every content item and the item_id on every interaction event must refer to the same identifier. This is how the system connects user behavior to your content catalog — if they don’t match, the event cannot contribute to recommendations. Your item_id must also be:
- Stable. The same piece of content must always produce the same ID. Do not include query parameters (fbclid, UTM tags, session tokens), cache-busters, or any other transient values.
- A CMS ID or URL slug, not a full URL. Full URLs are fragile — protocol changes, domain changes, and tracking parameters all break the join.
- 256 characters or fewer.
Arc XP customers: use the content’s ArcID.
Do not send synthetic or test traffic against real site_id values. Load tests, QA scripts, and bot traffic poison the training signal and degrade recommendation quality for real users. Scope any test activity to a dedicated, non-production site_id.
Exclude internal employee traffic. Events from your own staff — editors QA-ing articles, newsroom staff browsing stories, engineers exercising the site — skew the behavioral signal away from real reader interests and degrade recommendation quality. Filter employee sessions out before sending to the Events endpoint.
Exclude bot and crawler traffic. Search crawlers, scrapers, and other automated agents do not represent real reader interest, and their access patterns (exhaustive crawls, repeated hits, no engagement depth) distort the training signal. Filter known bots out before sending to the Events endpoint.
There is no separate Sandbox or Production instance for the Content Recommendations API — you get a single instance. Anything you send is training the one model that serves your live traffic. Plan your testing, seeding, and rollout accordingly, and use a dedicated test site_id to keep experimental data out of your production catalog.
Handle Delivery API tokens carefully. Never commit them to source control. Always provision the Recommendations API token and the Collector API token from separate key collections, so that a token exposed in one context cannot be used against the other API. Tokens may be used in client-side code under this separation.
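Two of the rules above lend themselves to a small pre-send check: keeping item_id stable and filtering out staff and bot traffic. The sketch below is one possible approach under stated assumptions. The helper names, the user-agent markers, and the is_employee flag are hypothetical, and deriving a slug from a URL is only a fallback for sites that cannot use a CMS ID.

```python
from urllib.parse import urlsplit

MAX_ITEM_ID_LENGTH = 256  # limit stated in this guide

# Hypothetical markers; replace with your own bot-detection source.
BOT_MARKERS = ("bot", "crawler", "spider")


def slug_from_url(url: str) -> str:
    """Derive a stable item_id by dropping scheme, host, query string, and fragment."""
    path = urlsplit(url).path.strip("/")
    if not path:
        raise ValueError("URL has no path to use as an item_id")
    if len(path) > MAX_ITEM_ID_LENGTH:
        raise ValueError("item_id exceeds 256 characters")
    return path


def should_forward(user_agent: str, is_employee: bool) -> bool:
    """Return False for traffic this guide says to exclude (staff and known bots)."""
    ua = user_agent.lower()
    return not is_employee and not any(marker in ua for marker in BOT_MARKERS)
```

A substring match on the user agent is deliberately crude. A CDP-level bot filter or an IP allowlist for office networks would likely be more reliable in practice.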

Quick Start (5 minutes)

The shortest path from zero to your first recommendation:

Generate Delivery API tokens — Follow Provisioning tokens through Delivery API to create one token for the Collector API and a separate token for the Recommendations API, each from its own key collection. Pass the appropriate token in the X-Api-Key header on each request below.
Send content — POST /collector/v1/content with an action: "publish" payload for each item in your catalog. This seeds the model with what’s available to recommend.
```
POST /collector/v1/content
{ "action": "publish", "item_id": "ARTICLE-001", "site_id": "my-site", "type": "article", "timestamp": "...", "title": "..." }
```
Send events — POST /collector/v1/events as users interact with that content. At least a few events per user are needed before personalization kicks in.
```
POST /collector/v1/events
{ "user_id": "user-1", "item_id": "ARTICLE-001", "event_type": "page_view", "timestamp": "..." }
```
Fetch recommendations — GET /recommend/v1/recommendations?site_id=my-site&user_id=user-1 returns a ranked list of item_ids you can then hydrate from your CMS.

Once this loop is in place, the remaining sections of this guide cover the full field reference, supported event types, and the item_id-anchored “more like this” mode.
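The Quick Start steps can be sketched as three request builders. This is an illustration under assumptions, not a reference client: the organization "acme", the token placeholders, the helper names, and the Content-Type header are not specified by this guide. The requests are built but not sent.

```python
import json
import urllib.request
from urllib.parse import urlencode

HOST = "https://{org}-config-prod.api.arc-cdn.net"  # from the Base URL section


def collector_post(org, path, token, payload):
    """Build (without sending) a POST to a Collector API endpoint."""
    return urllib.request.Request(
        HOST.format(org=org) + path,
        data=json.dumps(payload).encode("utf-8"),
        # Content-Type is an assumption; the guide only specifies X-Api-Key.
        headers={"X-Api-Key": token, "Content-Type": "application/json"},
        method="POST",
    )


def recommendations_get(org, token, site_id, user_id):
    """Build (without sending) a GET for personalized recommendations."""
    query = urlencode({"site_id": site_id, "user_id": user_id})
    return urllib.request.Request(
        HOST.format(org=org) + "/recommend/v1/recommendations?" + query,
        headers={"X-Api-Key": token},
        method="GET",
    )


# Placeholder org, tokens, and sample values.
content_req = collector_post("acme", "/collector/v1/content", "COLLECTOR_TOKEN", {
    "action": "publish", "item_id": "ARTICLE-001", "site_id": "my-site",
    "type": "article", "timestamp": "2026-04-08T09:00:00Z", "title": "Sample title",
})
event_req = collector_post("acme", "/collector/v1/events", "COLLECTOR_TOKEN", {
    "user_id": "user-1", "item_id": "ARTICLE-001",
    "event_type": "page_view", "timestamp": "2026-04-08T10:15:00Z",
})
recs_req = recommendations_get("acme", "RECOMMEND_TOKEN", "my-site", "user-1")
```

Passing each request to urllib.request.urlopen would send it. That step is left out so the sketch has no network side effects.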

Collector API

Base path: /collector/v1

The Collector API accepts two types of data: user behavior events and content lifecycle events. Both are processed asynchronously — the API responds immediately with 202 Accepted and processes data in the background.

It exposes two endpoints: the Events endpoint for user interactions and the Content endpoint for content lifecycle updates.

Every request must include your Collector API Delivery API token in the X-Api-Key header.

Events endpoint

Send user interaction events to train the recommendation model. The more behavioral data you send, the better the recommendations become.

Endpoint: POST /collector/v1/events

Response: 202 Accepted (no body)

```
{
    "user_id": "abc-123",
    "item_id": "ZSGXFR2KNFCMPN3VHPWQR3BGCE",
    "event_type": "page_view",
    "timestamp": "2026-03-27T14:30:00+00:00",
    "session_id": null
}
```

Fields

Field	Type	Required	Description
`user_id`	string	Yes	Anonymized identifier of the user who triggered the event. Can be an authenticated user ID or an ephemeral session ID for anonymous users, but it must be anonymized before sending to Arc XP.
`item_id`	string	Yes	The content item the user interacted with. Must match an `item_id` previously sent via the content endpoint.
`event_type`	string	Yes	The type of interaction. See Supported Event Types.
`timestamp`	string (ISO 8601)	Yes	When the interaction occurred. Must include timezone information (e.g., `+00:00` or `Z`).
`session_id`	string	No	Session identifier for grouping interactions within a single user visit.

Supported Event Types

Event Type	Description
`page_view`	User viewed a content detail page, such as an article page.
`click`	User clicked on a content link, including from recommendations modules.
`share`	User shared content, for example by clicking a share-to-Facebook button.
`article_save`	User saved the article to read later, such as via bookmarking.
`search`	User performed an on-site search. Requires topic or keyword extraction to be relevant.
`deepest_scroll`	The maximum depth that the user consumed a given piece of content, expressed as a number (0.0–1.0). Do not send multiple observations for the same piece of content within the same user session.
`engaged_read`	An organization-defined metric indicating that a reader was highly engaged with a piece of content, for example by leaving a comment or spending a substantial amount of time on the page.
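The field and event-type tables above can be enforced client-side before anything is sent. The following is a minimal sketch: build_event is a hypothetical helper name, and the checks cover only what this guide states (a supported event_type and a timezone-aware timestamp).

```python
from datetime import datetime, timezone
from typing import Optional

# Event types listed under Supported Event Types.
EVENT_TYPES = {
    "page_view", "click", "share", "article_save",
    "search", "deepest_scroll", "engaged_read",
}


def build_event(user_id: str, item_id: str, event_type: str,
                occurred_at: datetime, session_id: Optional[str] = None) -> dict:
    """Return an event payload, rejecting unsupported types and naive timestamps."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unsupported event_type: {event_type!r}")
    if occurred_at.utcoffset() is None:
        raise ValueError("timestamp must include timezone information")
    event = {
        "user_id": user_id,  # must already be anonymized
        "item_id": item_id,
        "event_type": event_type,
        "timestamp": occurred_at.isoformat(),
    }
    if session_id is not None:
        event["session_id"] = session_id
    return event


example = build_event("u_314", "a_202", "page_view",
                      datetime(2026, 4, 8, 10, 15, tzinfo=timezone.utc))
```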

Example: Page View

```
{
    "user_id": "u_314",
    "item_id": "a_202",
    "event_type": "page_view",
    "timestamp": "2026-04-08T10:15:00Z"
}
```

Batch ingestion

For high-volume or backfill scenarios, send multiple events in a single request via POST /collector/v1/events/batch. The request body is a JSON array of the same event envelope shown above, and the response is the same 202 Accepted. See POST /collector/v1/events/batch in the API reference for the full schema.
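For a backfill, events typically need to be split into batches before posting. The sketch below shows one way to do that. The batch size of 100 is an assumption, since this guide does not state a maximum, so check the API reference for the actual limit.

```python
import json
from typing import Iterable, Iterator, List

# Assumed batch size; the guide does not state a maximum.
DEFAULT_BATCH_SIZE = 100


def chunk_events(events: Iterable[dict],
                 batch_size: int = DEFAULT_BATCH_SIZE) -> Iterator[List[dict]]:
    """Yield lists of at most batch_size events, preserving input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[dict] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final partial batch
        yield batch


def batch_body(batch: List[dict]) -> bytes:
    """Serialize one batch as the JSON array body for /collector/v1/events/batch."""
    return json.dumps(batch).encode("utf-8")
```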

Content endpoint

Keep your content catalog in sync with the Content Recommendations API. Content is sent as webhook payloads from your CMS whenever an item is published, updated, or deleted.

Endpoint: POST /collector/v1/content

Response: 202 Accepted (no body)

The request body uses a discriminated union on the action field — either "publish" (create or update) or "delete".

Publishing or Updating Content

Send this payload when a content item is first published or when it is updated.

```
{
    "action": "publish",
    "item_id": "a_202",
    "site_id": "acme",
    "type": "article",
    "timestamp": "2026-04-08T09:00:00Z",
    "title": "Breaking: Major Policy Change Announced",
    "categories": ["Politics", "Government"],
    "tags": ["policy", "congress", "legislation"],
    "author": "Jane Reporter",
    "is_premium": false,
    "metadata": {}
}
```

Fields

Field	Type	Required	Description
`action`	`"publish"`	Yes	Discriminator indicating a create/update operation.
`item_id`	string	Yes	Unique identifier for this content item, typically from your CMS.
`site_id`	string	Yes	Identifies which website or property this content belongs to. Used to partition recommendations by site.
`type`	string	Yes	Content type, such as `"article"` or `"podcast"`.
`timestamp`	string (ISO 8601)	Yes	Publication date. Must include timezone.
`title`	string	Yes	Display title of the content.
`categories`	list of strings	No	High-level taxonomy labels (e.g., `"Politics"`, `"Sports"`). Defaults to empty list.
`tags`	list of strings	No	Detailed keywords for the content. Defaults to empty list.
`author`	string	No	Content creator name.
`is_premium`	boolean	No	Whether this content is behind a paywall. Defaults to `false`. Used for subscription-tier filtering in recommendations.
`metadata`	object	No	Flexible key-value pairs for tenant-specific fields. Defaults to empty object.

Deleting Content

Send this payload when content should be removed from recommendations.

```
{
    "action": "delete",
    "item_id": "a_202",
    "site_id": "acme"
}
```

Fields

Field	Type	Required	Description
`action`	`"delete"`	Yes	Discriminator indicating a delete operation.
`item_id`	string	Yes	The item to remove.
`site_id`	string	Yes	The site the item belongs to.

Deletes are soft: the item is marked as deleted and excluded from future recommendations.
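The two payload shapes for the Content endpoint can be produced by a pair of small builders. This is a sketch with hypothetical function names. The defaults mirror the field tables above, and the timezone check reflects the requirement that timestamp include timezone information.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def publish_payload(item_id: str, site_id: str, content_type: str,
                    published_at: datetime, title: str,
                    categories: Optional[List[str]] = None,
                    tags: Optional[List[str]] = None,
                    author: Optional[str] = None,
                    is_premium: bool = False,
                    metadata: Optional[Dict[str, Any]] = None) -> dict:
    """Build the action: "publish" payload used for both create and update."""
    if published_at.utcoffset() is None:
        raise ValueError("timestamp must include timezone information")
    payload = {
        "action": "publish",
        "item_id": item_id,
        "site_id": site_id,
        "type": content_type,
        "timestamp": published_at.isoformat(),
        "title": title,
        "categories": list(categories or []),
        "tags": list(tags or []),
        "is_premium": is_premium,
        "metadata": dict(metadata or {}),
    }
    if author is not None:
        payload["author"] = author
    return payload


def delete_payload(item_id: str, site_id: str) -> dict:
    """Build the action: "delete" payload for unpublished or removed items."""
    return {"action": "delete", "item_id": item_id, "site_id": site_id}
```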

Recommendations API

Base path: /recommend/v1

The Recommendations API returns personalized, ranked content for a given user.

Every request must include your Recommendations API Delivery API token in the X-Api-Key header.

Fetching Recommendations

Endpoint: GET /recommend/v1/recommendations

Query Parameters

Parameter	Type	Required	Default	Description
`site_id`	string	Yes	—	Scopes recommendations to a specific website or property. Must match `site_id` values used in content ingestion.
`user_id`	string	Yes	—	The user to personalize for. See Identifying Users for anonymous user handling.
`item_id`	string	No	`null`	Anchor item for “more like this” recommendations. When provided, the response contains items similar to this one rather than general personalized recommendations.

Response

```
{
    "recommendations": [
        {
            "item_id": "a_202",
            "score": 0.934
        },
        {
            "item_id": "a_911",
            "score": 0.891
        },
        {
            "item_id": "a_107",
            "score": 0.847
        }
    ]
}
```

Fields

Field	Type	Description
`recommendations`	array	Ordered list of recommended items, ranked by relevance.
`recommendations[].item_id`	string	The CMS item identifier. Use this to look up full content details from your CMS.
`recommendations[].score`	number	Relevance score for the item. In the sample response above, items are listed in descending score order.

Example: Basic Personalized Recommendations

```
GET /recommend/v1/recommendations?site_id=acme&user_id=u_314&num_results=10
```

Example: “More Like This” Recommendations

```
GET /recommend/v1/recommendations?site_id=acme&user_id=u_314&item_id=a_202&num_results=5
```
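The request path and response handling for both modes can be sketched as follows. The helper names are hypothetical, and only the parameters documented in the Query Parameters table are included. The parser assumes the response shape shown under Response.

```python
import json
from typing import List, Optional
from urllib.parse import urlencode


def recommendations_path(site_id: str, user_id: str,
                         item_id: Optional[str] = None) -> str:
    """Build the path and query string; item_id switches to "more like this"."""
    params = {"site_id": site_id, "user_id": user_id}
    if item_id is not None:
        params["item_id"] = item_id
    return "/recommend/v1/recommendations?" + urlencode(params)


def ranked_item_ids(response_body: str) -> List[str]:
    """Extract item_ids from a response body, keeping the ranked order."""
    data = json.loads(response_body)
    return [rec["item_id"] for rec in data.get("recommendations", [])]


sample = '{"recommendations": [{"item_id": "a_202", "score": 0.934}, {"item_id": "a_911", "score": 0.891}]}'
ids = ranked_item_ids(sample)
```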

Personalization and Filtering

The Content Recommendations API applies multiple layers of intelligence to produce relevant recommendations:

ML Personalization — The recommendation model learns from your audience’s behavior (page views, clicks, engagement) and your content metadata (categories, tags, authors, recency). Each user receives a uniquely ranked set of results based on their interaction history.
Site Partitioning — The site_id parameter ensures recommendations are scoped to a specific website. Content published to one site will not appear in another site’s recommendations.
Filters — Optional section and content_type query parameters scope a request to an editorial section, content type, or both. Filtering happens before ranking, so non-matching items are never scored and never compete for a slot — more efficient than excluding them in client code. See Content Recommendations API filters for the full reference.

Cold-Start Behavior

The Content Recommendations API handles two cold-start scenarios automatically:

New or anonymous users — When a user has no interaction history, the system returns popularity-based recommendations drawn from your content catalog. Results reflect what is trending among your broader audience, weighted by recency and content metadata. As the user accumulates interactions, recommendations progressively become more personalized.

New content — Freshly published content with no engagement data is still eligible for recommendations. The model uses content metadata (categories, tags, author, recency) to place new items in front of relevant audiences immediately.

No special handling is required on your part for either scenario. The API response shape is identical whether results are fully personalized or cold-started.

Integration Guide

Identifying Users

The user_id parameter is required for both sending events and fetching recommendations. It is your responsibility to provide a consistent, anonymized identifier for each user.
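One common way to get a consistent, anonymized identifier is a keyed hash of your internal user ID. The sketch below uses HMAC-SHA256 from the Python standard library. This is an assumption about approach: the guide only requires that the value be hashed or pseudonymized, and the function name and key handling here are illustrative.

```python
import hashlib
import hmac


def pseudonymize_user_id(raw_user_id: str, secret_key: bytes) -> str:
    """Return a stable, non-reversible token for a user ID using HMAC-SHA256."""
    digest = hmac.new(secret_key, raw_user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Placeholder key; in practice, load it from a secret store and keep it unchanged,
# because rotating the key changes every user's token.
KEY = b"replace-with-a-secret-key"
token = pseudonymize_user_id("reader@example.com", KEY)
```

Using the same key for events and for recommendation requests keeps a user's history attached to one identifier. The output is 64 hexadecimal characters.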

Content Sync from Your CMS

The Content Recommendations API stays in sync with your content catalog through webhooks. Configure your CMS to send POST /collector/v1/content requests whenever content is published, updated, or deleted. This provides near-real-time sync.

Arc XP CMS customers: Contact your Technical Account Manager for access to the IFX Recipe, which wires up the content webhook end-to-end without custom code.

Other CMS customers: Build a direct integration from your CMS to the Content endpoint. At minimum, your CMS (or an intermediary service) must:

Listen for publish, update, and delete events in your CMS.
Transform each event into the appropriate action: "publish" or action: "delete" payload described under Content endpoint.
POST the payload to /collector/v1/content with your Delivery API token in the X-Api-Key header.
Handle retries on transport-level failures (connection errors, 5xx responses). Do not retry on 202 Accepted — that means the payload was accepted for async processing.

Whichever path you take, the goal is the same: every publish, update, and unpublish in your CMS must reach the Content endpoint, ideally within seconds.
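The retry rule in the steps above can be sketched as a small wrapper. It assumes a caller-supplied send function that performs one POST and returns the HTTP status code, or raises ConnectionError on a transport failure. The attempt count and backoff delays are illustrative choices, not values from this guide.

```python
import time
from typing import Callable, Optional


def post_with_retry(send: Callable[[], int], max_attempts: int = 4,
                    base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Retry on connection errors and 5xx; return any other status immediately."""
    for attempt in range(1, max_attempts + 1):
        status: Optional[int]
        try:
            status = send()
        except ConnectionError:
            status = None  # transport-level failure, eligible for retry
        if status is not None and status < 500:
            # 202 means accepted for async processing; 4xx is not retried here.
            return status
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    raise RuntimeError(f"gave up after {max_attempts} attempts")
```

The sleep parameter is injectable so the backoff can be tested without waiting.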

Displaying Recommendations

The Recommendations API returns item_id values and scores.

To display recommendations to users:

Call GET /recommend/v1/recommendations with the appropriate parameters.
Use the returned item_id values to fetch full content details (title, thumbnail, URL, etc.) from your CMS.
Display items in the order returned — the list is already ranked by relevance.
Send a click event back to the Events endpoint when a user clicks on a recommendation. This feedback loop improves future recommendations.
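The display steps above can be sketched as two helpers: one that hydrates recommendations in ranked order, and one that builds the click event to send back. The cms_lookup dictionary stands in for whatever CMS fetch you use, and both function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, List


def hydrate_in_order(recommendations: List[dict],
                     cms_lookup: Dict[str, dict]) -> List[dict]:
    """Return CMS records in the ranked order, skipping IDs the CMS cannot resolve."""
    return [cms_lookup[rec["item_id"]]
            for rec in recommendations
            if rec["item_id"] in cms_lookup]


def click_event(user_id: str, item_id: str, clicked_at: datetime) -> dict:
    """Build the click event for the Events endpoint."""
    return {
        "user_id": user_id,  # must already be anonymized
        "item_id": item_id,
        "event_type": "click",
        "timestamp": clicked_at.isoformat(),
    }


recs = [{"item_id": "a_202", "score": 0.934}, {"item_id": "a_911", "score": 0.891}]
cms = {"a_911": {"title": "Second story"}, "a_202": {"title": "First story"}}
cards = hydrate_in_order(recs, cms)
```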

Onboarding for good recommendations on day one

The Quick Start gets the plumbing working. This section covers how to avoid launching with a cold, low-signal model that produces generic results for your first real users.

A freshly provisioned model knows nothing about your catalog or your audience. If you flip on recommendations in a live surface the moment the Quick Start completes, readers will see popularity-based cold-start results — not personalization. The goal of onboarding is to front-load as much content and behavioral signal as possible before recommendations are user-visible.

Bulk-load your existing catalog

Before going live, send an action: "publish" payload for every published item in your catalog — not just new publishes going forward.

Pull your full content inventory from your CMS and stream it through POST /collector/v1/content.
Include accurate categories, tags, author, and timestamp values on every item. Metadata is what carries new content through cold-start, so sparse metadata here will hurt you later.
Once the backfill is done, wire up your live CMS webhook (or the IFX recipe for Arc XP customers) so ongoing publishes, updates, and deletes stay in sync automatically.

Backfill historical behavioral events

Events are the other half of the training signal. If your CDP or analytics system retains historical user interactions, replay them through POST /collector/v1/events before launch.

Target at least 30–90 days of historical events where available. More is better, but quality matters more than absolute volume, so send the richer event types (click, article_save, deepest_scroll, engaged_read) in addition to page_view events.
Preserve the original timestamp on each event rather than stamping everything with the backfill time. The model uses recency as a signal.
Keep the same anonymization scheme for historical user_ids as you’ll use in production, so users who exist in both the backfill and live traffic are recognized as the same person.
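A backfill transform along these lines preserves the original event time and reuses the production anonymization function. The input record layout (uid, content_id, kind, epoch_seconds) is a hypothetical analytics export format, so adapt the field names to whatever your CDP provides.

```python
from datetime import datetime, timezone
from typing import Callable


def to_backfill_event(record: dict, anonymize: Callable[[str], str]) -> dict:
    """Convert one historical record into an event, keeping its original timestamp."""
    # Use the time the interaction happened, not the time of the backfill.
    occurred_at = datetime.fromtimestamp(record["epoch_seconds"], tz=timezone.utc)
    return {
        "user_id": anonymize(record["uid"]),  # same scheme as live traffic
        "item_id": record["content_id"],
        "event_type": record["kind"],
        "timestamp": occurred_at.isoformat(),
    }


historical = {"uid": "42", "content_id": "a_202", "kind": "page_view",
              "epoch_seconds": 1700000000}
event = to_backfill_event(historical, anonymize=lambda uid: "anon-" + uid)
```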

Shadow-deploy the event collector first

Turn on event collection in your live applications well before you expose recommendations to users.

Instrument page views, clicks, and engagement events against the real production site_id as soon as you’re confident in your payload shape and anonymization.
Let the collector run for a meaningful window — typically a few weeks — so the model trains on real traffic patterns rather than only historical backfill.
During this window, you can call GET /recommend/v1/recommendations internally (QA builds, staging surfaces, internal dashboards) to spot-check relevance without any reader-facing risk.

Validate before cutting over

Before exposing recommendations on a user-facing surface:

Sample recommendation responses for a diverse set of user_ids — new users, heavy readers, anonymous sessions — and manually review whether results look on-topic for each user’s history.
Confirm the returned item_ids resolve cleanly in your CMS and aren’t pointing at deleted, unpublished, or non-existent content that your delivery layer would reject.
Consider a gradual rollout (a percentage of traffic, a single surface, or a single site) rather than flipping recommendations on everywhere at once. This gives you a real-world quality signal with a small blast radius.
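Two of the checks above are easy to automate: finding returned item_ids that do not resolve, and assigning users to a percentage rollout. The sketch below is one possible approach. The hash-based bucketing is an assumption, not a feature of the API, and both function names are hypothetical.

```python
import hashlib
from typing import Iterable, List, Set


def unresolved_item_ids(item_ids: Iterable[str], known_ids: Set[str]) -> List[str]:
    """Return recommended IDs that the CMS does not recognize, in original order."""
    return [item_id for item_id in item_ids if item_id not in known_ids]


def in_rollout(user_id: str, percent: int) -> bool:
    """Deterministically place a user in or out of a percentage rollout."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return bucket < percent
```

Because the bucket is derived from the user_id, a given user stays in the same group across sessions as the percentage increases.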

Troubleshooting

Use this section when recommendations aren’t behaving as expected. Most issues trace back to catalog sync, event volume, or cold-start behavior being misread as a bug.

No recommendations returned

If the response comes back with an empty recommendations array, the model has nothing to rank for that user and site.

Check content ingestion first. Query your CMS integration or webhook logs to confirm POST /collector/v1/content calls are succeeding. A model with no catalog cannot return anything.
Confirm site_id matches. The site_id on the recommendations request must exactly match the site_id used during content ingestion. A typo silently partitions your catalog into an empty sub-model.
Check for over-aggressive deletes. If your CMS sends action: "delete" for items that are still live, the model excludes them from results. Audit your unpublish / delete pipeline.

Recommendations look low-relevance or generic

If the API returns results but they feel random or generic, the model likely doesn’t have enough behavioral signal yet.

Check event volume. The model improves with more interactions. If you’re only sending page_view events — or only sending them for a small fraction of your traffic — personalization quality will be weak. Add richer event types (click, article_save, deepest_scroll, engaged_read) where appropriate.
Verify user_id consistency. If the same person shows up under different user_id values across sessions, the model can’t accumulate history on them. Confirm your anonymization scheme produces stable IDs per user.
Check content metadata quality. Missing categories, tags, or author values reduce what the model can reason about, especially for new content that has no engagement signal yet.

New user or new content looks “cold”

Cold-start is expected behavior, not a bug.

New or anonymous users receive popularity-based recommendations until they accumulate interaction history. As events stream in, results progressively personalize.
Newly published content is still eligible immediately — the model uses metadata (categories, tags, recency) to place it in front of relevant audiences even with zero engagement data.
If you’re evaluating the API with a brand-new tenant, expect the first wave of results to look generic. Seed the model with historical content and a representative volume of events before drawing conclusions about relevance.