Item ID Guidance

Audience: Tenant integrators wiring a CMS to the For You ingestion API.

The choice of item identifier is the single most consequential decision a tenant makes during onboarding. Recommendation quality depends on a model that has learned which items each user interacted with, and that learning is keyed on the item ID you supplied at ingest time. Change the ID, and every interaction already recorded against the old ID becomes orphaned — the model will not know that those impressions belong to the new record, and your training continuity restarts from zero.

Get this right at onboarding and you will rarely think about it again. Get it wrong, and every CMS edit that reissues an ID quietly degrades your model.

TL;DR

  • Pick one ID source per content piece, on Day 1, and never change it. Treat it as a primary key, not a label.
  • Prefer the CMS’s internal record ID over the canonical URL unless your CMS guarantees URL stability across moves, renames, and re-categorisation.
  • Treat the API’s length ceiling as a hard cap, not a target. The API rejects IDs over 256 characters; aim for 64 characters or fewer to keep logs and dashboards readable.
  • Never reuse a deleted item’s ID. If content needs to return, mint a new ID. The recommendation engine cannot tell “the old one came back” from “a different item appeared at the same key” — it will fold both histories together.

What “item ID” means in this system

When you POST to /collector/v1/content, the value you send as item_id flows through to the recommendation engine as the canonical key for that piece of content. The same identifier is what comes back as recommendations[].item_id when you call the recommendations endpoint, and it is the join key used to attach the display-ready card payload (title, author, URL, thumbnail, etc.) to each recommendation.

Three places see the same value:

  1. Your CMS — the source of truth for the content piece.
  2. Our ingestion API (item_id field on content events).
  3. The recommendation engine (the catalogue key against which user interactions are recorded, and the key by which each recommendation’s card is looked up from the stored Item document).

For recommendations to mean what you expect, all three must use exactly the same string for the same piece of content, every time. A mismatch does not just orphan interaction history; the affected recommendation also comes back with card: null, because the lookup against the Item document misses.
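A front-end can guard against lookup misses by separating renderable cards from the IDs whose card came back null. The sketch below is illustrative only: the top-level "recommendations" key is inferred from the recommendations[].item_id notation, and the card fields (title, url) and sample IDs are assumptions, not a documented schema.

```python
from typing import Any


def renderable(response: dict[str, Any]) -> tuple[list[dict[str, Any]], list[str]]:
    """Split a recommendations response into cards to render and IDs whose card is null."""
    cards: list[dict[str, Any]] = []
    missed: list[str] = []
    for rec in response.get("recommendations", []):
        card = rec.get("card")
        if card is None:
            # A null card means the Item lookup missed; worth logging as a likely ID mismatch.
            missed.append(rec["item_id"])
        else:
            cards.append({**card, "item_id": rec["item_id"]})
    return cards, missed


# Hypothetical response body, for illustration only.
sample = {
    "recommendations": [
        {"item_id": "48213", "card": {"title": "Budget vote", "url": "/politics/budget-vote"}},
        {"item_id": "48213-v2", "card": None},
    ]
}

cards, missed = renderable(sample)
print(cards)
print(missed)
```

Tracking the missed IDs rather than silently dropping them makes an ID mismatch between the CMS and the ingestion API visible early.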

Stability requirements

Immutability

The ID must remain identical for the entire lifetime of the content piece. Once you have published an item with item_id = "abc123" and a single user interaction has been recorded against it, you should treat that string as permanent. Republishing the same article with a different ID is treated as a new item with zero history.

This is a customer-side responsibility. The platform does not detect “this new item is really the same as that old one.” If your CMS workflow can mint new IDs on edits, on locale changes, on category moves, or on re-publishes, that workflow is incompatible with stable recommendations as-is and needs to be reviewed.

IDs are compared byte-for-byte. abc123 and ABC123 are two different items, as are abc-123 and abc_123. Pick a casing and punctuation convention up front and apply it consistently — silently changing case on a republish has the same effect as changing the ID outright.
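To make the byte-for-byte rule concrete, the sketch below compares the UTF-8 encodings of a few ID pairs. It also shows one consequence that follows from byte comparison of UTF-8 strings: two visually identical strings in different Unicode normalisation forms do not match. The canonical_id helper is a hypothetical convention, not something the platform applies; if you adopt one, apply it when an ID is first minted and never retroactively.

```python
import unicodedata


def same_item(a: str, b: str) -> bool:
    """Mimic a byte-for-byte comparison of two IDs via their UTF-8 encodings."""
    return a.encode("utf-8") == b.encode("utf-8")


print(same_item("abc123", "ABC123"))    # case differs
print(same_item("abc-123", "abc_123"))  # punctuation differs

# The same visible text in composed (NFC) and decomposed (NFD) form.
composed = unicodedata.normalize("NFC", "café")
decomposed = unicodedata.normalize("NFD", "café")
print(same_item(composed, decomposed))


def canonical_id(raw: str) -> str:
    """Hypothetical minting convention: NFC, trimmed, lower-case."""
    return unicodedata.normalize("NFC", raw).strip().lower()


print(same_item(canonical_id(composed), canonical_id(decomposed)))
```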

Character set

The platform does not enforce a character-set restriction on item_id. Any UTF-8 string of an allowed length is accepted by the API. You are responsible for the format you choose. We recommend:

  • ASCII alphanumerics, plus -, _, ., /, :.
  • Avoid whitespace, control characters, and characters that need URL-encoding in path segments — IDs surface in logs, dashboards, and debug tooling.
  • Avoid characters that may collide with downstream serialisation (newlines, tabs, NUL bytes).
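Because the API itself applies no character-set filter, a client-side check is the only place the recommended format can be enforced. A minimal sketch, assuming you adopt exactly the recommended set above (the function name is illustrative):

```python
import re

# ASCII alphanumerics plus - _ . / : (the recommended set; the API does not enforce it).
RECOMMENDED = re.compile(r"[A-Za-z0-9\-_./:]+")


def uses_recommended_charset(item_id: str) -> bool:
    """Return True if every character of item_id is in the recommended set."""
    return RECOMMENDED.fullmatch(item_id) is not None


for candidate in ["post:48213", "news/2024/budget-vote_v1.2", "budget vote", "id\n"]:
    print(repr(candidate), uses_recommended_charset(candidate))
```

Using fullmatch rather than match with a trailing $ matters here: $ would accept an ID that ends in a newline.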

Length

The API enforces a length bound of 1–256 characters at request time. Requests with an item_id outside that range are rejected with an HTTP 422 response.

Treat that bound as authoritative: 1–256 characters, target ≤ 64. Longer IDs work but make operational tooling harder to read, and risk truncation in third-party exports.
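A pre-flight length check lets you catch out-of-range IDs before the request is sent. The sketch below is a client-side helper, not platform code. It counts Unicode code points with len(); whether the API counts code points or bytes is not stated here, so treat that as an assumption and stay well under the ceiling if your IDs contain non-ASCII characters.

```python
API_MAX = 256         # hard cap enforced by the API
READABLE_TARGET = 64  # recommended target for readable logs and dashboards


def check_length(item_id: str) -> str:
    """Classify an ID as 'reject' (outside 1-256), 'warn' (over 64), or 'ok'."""
    n = len(item_id)  # assumption: the API's "characters" are code points
    if n < 1 or n > API_MAX:
        return "reject"
    if n > READABLE_TARGET:
        return "warn"
    return "ok"


for candidate in ["", "48213", "a" * 65, "a" * 257]:
    print(len(candidate), check_length(candidate))
```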

Choosing your ID source: use the CMS internal ID

Use the primary key your CMS assigns to the record (e.g. WordPress post_id, ArcXP _id, Drupal nid). CMSes treat their PKs as immutable, so this ID survives the edits that most often break recommendation continuity: slug rewrites, SEO URL changes, re-categorisation, and moves between sections.

Do not use the canonical URL or its slug as item_id. URLs are content, not keys — editorial workflows routinely rewrite them (“better SEO,” “fix typo,” re-categorisation from /news/foo to /politics/foo), and each rewrite orphans every interaction recorded against the old URL.

Watch-outs when using the CMS PK:

  • Tied to one CMS. A platform migration may reset PKs and orphan all history. If you ever change CMSes, plan an ID-mapping step at migration time so the old PKs continue to resolve to the same items under the new system.
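The ID-mapping step for a CMS migration can be as simple as a lookup table consulted before every ingestion call. In this sketch the mapping, the key formats, and the function name are all hypothetical; the idea is that migrated records keep sending their original item_id, while records created after the migration use the new CMS's primary key.

```python
def item_id_for(new_cms_pk: str, legacy_ids: dict[str, str]) -> str:
    """Return the item_id to send for a record in the new CMS."""
    if new_cms_pk in legacy_ids:
        # Migrated record: keep the ID its interaction history is keyed on.
        return legacy_ids[new_cms_pk]
    if new_cms_pk in set(legacy_ids.values()):
        # A new record whose PK equals an old item_id would merge two items.
        raise ValueError(f"new PK {new_cms_pk!r} collides with a legacy item_id")
    return new_cms_pk


# Hypothetical mapping built at migration time: new CMS PK -> original item_id.
legacy_ids = {"n-9001": "48213", "n-9002": "48214"}

print(item_id_for("n-9001", legacy_ids))  # migrated record
print(item_id_for("n-9500", legacy_ids))  # created after the migration
```

The collision check guards against the case where the new CMS happens to issue a primary key that is identical to an ID already in use for a different, older item.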

Republish behaviour

The ingestion path is idempotent on (tenant_id, site_id, item_id). Sending a content event for an existing item updates the stored record in place; sending an event for a new combination creates a new item.

Same item_id on republish:

  • Item record is updated (title, body, categories, etc. refresh).
  • The recommendation engine’s catalogue entry is upserted under the same key.
  • Interaction history is preserved. Existing model knowledge carries forward.
  • The next recommendations response surfaces the refreshed fields on the card payload for that item.

New item_id on republish (do not do this):

  • A new item appears in your catalogue.
  • The recommendation engine learns about a brand-new item with zero interactions.
  • The old item still exists with all its history but is no longer being surfaced by your CMS — it sits in the catalogue as a dead entry.
  • Recommendations regress for the duration of training catch-up.
  • If the old ID is still recommended (e.g. via an editorial pin) before the new document syncs, its card will be null in the response and front-ends will skip rendering it.

If your CMS’s republish workflow assigns a new ID — whether due to versioning, “unpublish then republish” actions, or import from another source — that workflow needs to either preserve the original ID or be replaced before the content reaches our ingestion endpoint.
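The difference between the two republish paths can be seen in a toy in-memory model of an upsert keyed on (tenant_id, site_id, item_id). This is a simplified illustration of the behaviour described above, not the platform's implementation; the class, method names, and sample values are invented for the example.

```python
from dataclasses import dataclass, field

Key = tuple[str, str, str]  # (tenant_id, site_id, item_id)


@dataclass
class Catalogue:
    items: dict[Key, dict] = field(default_factory=dict)
    interactions: dict[Key, int] = field(default_factory=dict)

    def ingest(self, tenant_id: str, site_id: str, item_id: str, fields: dict) -> str:
        """Upsert an item; interaction history stays attached to the key."""
        key = (tenant_id, site_id, item_id)
        outcome = "updated" if key in self.items else "created"
        self.items[key] = fields
        self.interactions.setdefault(key, 0)
        return outcome

    def record_interaction(self, tenant_id: str, site_id: str, item_id: str) -> None:
        key = (tenant_id, site_id, item_id)
        self.interactions[key] = self.interactions.get(key, 0) + 1


cat = Catalogue()
cat.ingest("t1", "news", "48213", {"title": "Budget vote"})
for _ in range(3):
    cat.record_interaction("t1", "news", "48213")

# Republish under the same ID: record refreshed, history kept.
same = cat.ingest("t1", "news", "48213", {"title": "Budget vote (updated)"})
# Republish under a new ID: a second item with no history.
fresh = cat.ingest("t1", "news", "48213-v2", {"title": "Budget vote (updated)"})

print(same, cat.interactions[("t1", "news", "48213")])
print(fresh, cat.interactions[("t1", "news", "48213-v2")])
```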

Delete semantics and the takedown pattern

Why delete-and-recreate is destructive

The recommendation engine learns from interaction sequences over time. The training signal that makes recommendations useful (“users who read A, B, and C went on to read D”) is keyed on the exact item-ID strings recorded against past interactions.

Deleting an item and recreating it under the same ID after a delay is not equivalent to leaving it untouched. The interaction history still references the original ID, but in the meantime the model has been retrained on a catalogue without it; the new item appears as a fresh, untrained entry that will not surface in recommendations until enough new interactions accumulate.

Worse, delete-and-recreate with a different ID orphans every interaction ever recorded against the original ID. There is no retroactive “rebind.”

For voluntary takedowns (article retired, content out of date):

  1. Soft-delete the item by sending a delete action through the content ingestion endpoint. The catalogue retains the record and the training history is preserved.
  2. The recommendations endpoint will exclude soft-deleted items from results.
  3. Do not reuse the ID if the content is later restored under a new editorial workflow — mint a new ID for the new item.
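On the client side, the takedown steps above can be backed by a small registry that remembers retired IDs and refuses to publish under them again. The sketch below is an assumption-heavy illustration: the "action" field and its "upsert" and "delete" values are hypothetical placeholders, since the exact shape of the delete event is not specified here, and the registry class is invented for the example.

```python
class IdRegistry:
    """Track live and retired item IDs so that a retired ID is never reused."""

    def __init__(self) -> None:
        self.live: set[str] = set()
        self.retired: set[str] = set()

    def publish(self, item_id: str) -> dict:
        """Build a content event, refusing IDs that were taken down earlier."""
        if item_id in self.retired:
            raise ValueError(f"{item_id!r} was retired; mint a new ID instead")
        self.live.add(item_id)
        # Hypothetical payload shape; field names are not from the API reference.
        return {"action": "upsert", "item_id": item_id}

    def take_down(self, item_id: str) -> dict:
        """Build a soft-delete event and mark the ID as permanently retired."""
        self.live.discard(item_id)
        self.retired.add(item_id)
        return {"action": "delete", "item_id": item_id}


registry = IdRegistry()
registry.publish("48213")
print(registry.take_down("48213"))

try:
    registry.publish("48213")
except ValueError as err:
    print(err)

# Restored content goes out under a freshly minted ID.
print(registry.publish("51007"))
```

In practice the retired set would need to live in durable storage alongside the CMS, so the guard survives restarts and redeployments.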

API surface, in one place

Field | Where | Constraint
item_id | POST /collector/v1/content request | 1–256 chars; UTF-8; no charset filter
recommendations[].item_id | GET /recommendations/v1/recommendations response | string echoed verbatim from what was ingested
recommendations[].card | GET /recommendations/v1/recommendations response | nested display payload looked up by (tenant_id, site_id, item_id); null on lookup miss

See also