Content Sources with Outbound Feeds

This document describes the Content Sources that are available out-of-the-box with Outbound Feeds (OBF). OBF content sources can be broken down by which Arc service they talk to. Most talk to Content-API, but the Hierarchy content source talks to Site Service.

For details on content source caching, see PageBuilder Content Cache | Outbound Feeds.

Content Sources

  • Collections - Gets content from the /content/v4/collection endpoint, which does not include content_elements or streams (video). The content source then uses the IDs from the collection to call /content/v4/ids and fetch the missing data. It is limited to 20 records per request.
  • Feeds-Content-API - The default content source, used by most resolvers to get content from the /content/v4/search/published endpoint. It is limited to 100 records per request.
  • Feeds-Content-API-by-Day (plus -by-Day2 and -by-Day3) - These three content sources are intended for use with the sitemap-index-by-day feature. They make multiple calls to the /content/v4/search/published endpoint to get all of the content for a single day. Because Fusion times out calls to content sources after 5 seconds, the practical limit is around 1,500 records.
  • Single-Content - Gets a single story, video, or gallery using either the _id or website_url.
  • Site-Service-Hierarchy - The only content source that does not talk to Content-API. As the name suggests, it uses Site Service. Because the data returned by Site Service is not in the same format as Content-API data, this content source can only be used by the Sitemap Section Front Index block.
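The per-request and per-day limits above imply a rough request budget. The following is a minimal sketch of that arithmetic, assuming the by-day content sources page through results at the same 100-records-per-request limit quoted for the search endpoint. The function names are hypothetical and are not part of OBF.

```python
import math

SEARCH_PAGE_SIZE = 100      # per-request limit quoted above for the search endpoint
PRACTICAL_DAY_LIMIT = 1500  # approximate single-day ceiling quoted above


def requests_needed(total_records: int, page_size: int = SEARCH_PAGE_SIZE) -> int:
    """Return how many paged requests are needed to fetch total_records items."""
    if total_records < 0 or page_size <= 0:
        raise ValueError("total_records must be >= 0 and page_size must be > 0")
    return math.ceil(total_records / page_size)


def page_offsets(total_records: int, page_size: int = SEARCH_PAGE_SIZE) -> list:
    """Return the starting offset of each page (0, 100, 200, ...)."""
    return list(range(0, total_records, page_size))
```

Under these assumptions, a day with 1,500 records needs about 15 sequential requests, all of which must finish inside the 5-second Fusion timeout.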

Included Data

To improve performance and reduce the size of the data returned from Content-API, each content source uses the _sourceInclude and _sourceExclude parameters to limit which ANS fields are returned. Each content source starts with the following list of ANS fields:

ANSFields = [
"canonical_url",
"canonical_website",
"content_elements", // feeds-content-api-by-day does not include content_elements
"created_date",
"credits",
"description",
"display_date",
"duration",
"first_publish_date",
"headlines",
"last_updated_date",
"promo_image",
"promo_items",
"publish_date",
"streams",
"subheadlines",
"subtitles",
"subtype",
"taxonomy.primary_section",
"taxonomy.seo_keywords",
"taxonomy.tags",
"type",
"video_type",
]

Before each call to Content-API, the calling website’s ID is added to the _sourceInclude list, like this:

"websites.{{arc-site}}"

By leaving out taxonomy.sections and including only the relevant website’s entry in websites (its section and website_url), the response size is greatly reduced, especially for clients with many websites. Before the content source returns the data, a transform moves websites.{{arc-site}}.website_section to taxonomy.sections and websites.{{arc-site}}.website_url to website_url, so each item has a website_url and exactly one section in taxonomy.sections. If your query includes or excludes specific sections, you might need to add taxonomy.sections to Source-Include to get the correct results.
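The transform described above can be pictured with a short sketch. This is an illustration only, not the OBF implementation. The function name is hypothetical, and leaving the original websites entry in place is an assumption.

```python
from copy import deepcopy
from typing import Any, Dict


def flatten_website_data(item: Dict[str, Any], arc_site: str) -> Dict[str, Any]:
    """Copy one website's section and URL to the top-level ANS locations."""
    result = deepcopy(item)  # never mutate the caller's data
    site_data = result.get("websites", {}).get(arc_site, {})

    section = site_data.get("website_section")
    if section is not None:
        # taxonomy.sections ends up holding exactly one section
        result.setdefault("taxonomy", {})["sections"] = [section]

    website_url = site_data.get("website_url")
    if website_url is not None:
        result["website_url"] = website_url

    # Assumption: the original websites entry is left untouched here.
    return result
```

For example, an item whose websites.demo entry holds a website_section and a website_url comes back with both values available at taxonomy.sections and website_url.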

Values entered in the resolver’s Source-Include field are added to the default list; they do not replace it. Values entered in the resolver’s Source-Exclude field are removed from the list. If a value is not present in the list, it is added to _sourceExclude instead. For example, taxonomy is not in the list, but taxonomy.primary_section is, so excluding taxonomy adds it to _sourceExclude.
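The include and exclude rules above can be sketched as a small merge function. This is a hedged illustration of the described behavior, not the actual OBF code. The function name and the returned dictionary shape are assumptions.

```python
from typing import Dict, Iterable, List


def build_source_lists(
    defaults: Iterable[str],
    arc_site: str,
    include: Iterable[str] = (),
    exclude: Iterable[str] = (),
) -> Dict[str, List[str]]:
    """Merge resolver Source-Include / Source-Exclude values with the defaults."""
    # Start from the default list plus the calling website's data
    includes = list(defaults) + [f"websites.{arc_site}"]

    # Source-Include values extend the list; they never replace it
    for field in include:
        if field not in includes:
            includes.append(field)

    # Source-Exclude values are removed when present, otherwise passed
    # through as explicit excludes
    excludes: List[str] = []
    for field in exclude:
        if field in includes:
            includes.remove(field)
        else:
            excludes.append(field)

    return {"_sourceInclude": includes, "_sourceExclude": excludes}
```

With defaults of headlines, taxonomy.primary_section, and content_elements, excluding content_elements removes it from the include list, while excluding taxonomy adds taxonomy to the exclude list because only taxonomy.primary_section is in the defaults.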

Collections

To power a feed with a Collection, choose the collections content source. It requires the _id or content_alias of the desired collection.

The Collections content source works with all OBF feed types except Sitemap Section Front Index.

Collections Content Source Configurable Parameters

1. _id

The Collection _id. If you use the _id, you don’t need to populate the content_alias.

2. content_alias

Instead of using the _id, you can populate the content_alias. This is NOT the name of the Collection. You must add content_aliases using the gear icon in the Edit Collection Detail screen.

3. from

This is used for pagination to indicate which article you want to start the feed on.

4. size

1 - 20. A Collection returns a maximum of 20 articles. The default is 20.

5. ANS Fields to Include

A comma separated list of ANS fields to add to the default list of ANS fields (see Included Data above).

6. ANS Fields to Exclude

A comma-separated list of ANS fields to remove from the default list of ANS fields (see Included Data above). The Content-API IDs endpoint used by this content source does not support the _sourceExclude parameter. If you exclude content_elements, the second call (to the IDs endpoint) is skipped and the collection response is returned as is, ignoring any Include and Exclude field values.
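The two-call flow of the Collections content source, including the shortcut when content_elements is excluded, might look roughly like the sketch below. The two fetch callables are hypothetical stand-ins for the collection and IDs requests, and the assumed response shape (a content_elements list of items that each carry an _id) is an illustration rather than a documented contract.

```python
from typing import Any, Callable, Dict, Iterable, List

Item = Dict[str, Any]


def load_collection(
    fetch_collection: Callable[[], Dict[str, Any]],
    fetch_by_ids: Callable[[List[str]], List[Item]],
    excluded_fields: Iterable[str] = (),
) -> List[Item]:
    """Return collection items, enriching them through a second call when needed."""
    collection = fetch_collection()                  # first call: the collection itself
    items = collection.get("content_elements", [])   # assumed response shape

    # Excluding content_elements skips the second call entirely
    if "content_elements" in set(excluded_fields):
        return items

    ids = [item["_id"] for item in items]
    return fetch_by_ids(ids)                         # second call: full ANS by ID
```

Passing the two requests in as callables keeps the sketch independent of any particular HTTP client.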

View Feed Output

You should be able to see the published feed using the following URL formats. This example assumes you have a Collection of top stories and created the resolver with the regex ^/arc/outboundfeeds/top-stories/?$.

You can use the internal URL if you are logged in via OKTA:

  • sandbox: https://outboundfeeds-sandbox.{ORG}.arcpublishing.com/pf/arc/outboundfeeds/top-stories?_website={websitename}&outputType=xml
  • prod: https://outboundfeeds.{ORG}.arcpublishing.com/pf/arc/outboundfeeds/top-stories?_website={websitename}&outputType=xml

You can use the public URLs:

  • sandbox: https://{ORG}-{WEBSITE}-sandbox.web.arc-cdn.net/arc/outboundfeeds/top-stories?outputType=xml
  • prod: https://{ORG}-{WEBSITE}-prod.web.arc-cdn.net/arc/outboundfeeds/top-stories?outputType=xml (if your site isn’t live yet)
  • prod: https://www.example.com/arc/outboundfeeds/top-stories?outputType=xml (if your site is live)
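The internal and public URL formats above follow a fixed pattern, which the sketch below reproduces. The helper names and the sample org and website values are hypothetical. The host and path shapes are copied from the list above.

```python
from urllib.parse import urlencode


def internal_feed_url(org: str, website: str, path: str,
                      sandbox: bool = False, output_type: str = "xml") -> str:
    """Build the internal (login required) feed URL."""
    host = f"outboundfeeds{'-sandbox' if sandbox else ''}.{org}.arcpublishing.com"
    query = urlencode({"_website": website, "outputType": output_type})
    return f"https://{host}/pf{path}?{query}"


def public_feed_url(org: str, website: str, path: str,
                    sandbox: bool = False, output_type: str = "xml") -> str:
    """Build the public arc-cdn.net feed URL used before a site is live."""
    env = "sandbox" if sandbox else "prod"
    host = f"{org}-{website}-{env}.web.arc-cdn.net"
    return f"https://{host}{path}?{urlencode({'outputType': output_type})}"


if __name__ == "__main__":
    print(internal_feed_url("myorg", "mysite", "/arc/outboundfeeds/top-stories", sandbox=True))
    print(public_feed_url("myorg", "mysite", "/arc/outboundfeeds/top-stories"))
```

The same helpers would apply to the other feed paths in this document by changing the path argument.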

Feeds-Content-API

This is the default content source, used to set up most of the pre-configured resolvers. It is documented in the Setup and Resolver documents.

Feeds Content API by Day

This is a special content source meant to be used with the Sitemap Index By Day block. It is limited to returning content for a single day and consists of three separate content sources:

  • feeds-content-api-by-day
  • feeds-content-api-by-day2
  • feeds-content-api-by-day3

All three content sources are used together with the Sitemap Index by Day block. The three content sources are identical except that each has a different cache TTL (time to live) value.

  • feeds-content-api-by-day TTL - 5 minutes
  • feeds-content-api-by-day2 TTL - 1 hour
  • feeds-content-api-by-day3 TTL - 1 day

See Outbound Feed Content Source TTLs for the complete list of TTLs for each content source.

The content source requires a Date Field and a Date Range and uses them to limit the results to a single date. For complete instructions on setting up these resolvers, see These Instructions.

1. Date Field - Enter one of the five ANS date fields. This is the field that will be used in the query.

  • created_date
  • display_date
  • first_publish_date
  • last_updated_date
  • publish_date

2. Date Range - This must be a valid date in the format YYYY-MM-DD or the word latest, optionally followed by a hyphen and an integer, (-\d)?. Typically this is passed as a URL pattern.

3. Include Terms - A valid JSON array of ElasticSearch query terms. More details on custom queries can be found Here.

4. Exclude Terms - A valid JSON array of ElasticSearch query terms. More details on custom queries can be found Here.

5. Exclude Sections - A comma-separated list of sections to exclude.

6. Source Include - A comma-separated list of ANS fields to add to the default list of ANS fields (see Included Data above).

7. Source Exclude - A comma-separated list of ANS fields to exclude from the default list of ANS fields (see Included Data above).
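The Date Range format described in item 2 can be checked with a small parser. This is a sketch under assumptions: the function name is hypothetical, the numeric suffix is allowed to have more than one digit, and the meaning of the suffix is not interpreted here.

```python
import re
from datetime import date
from typing import Optional, Tuple

# "latest" or YYYY-MM-DD, with an optional "-<integer>" suffix
DATE_RANGE_RE = re.compile(r"(latest|\d{4}-\d{2}-\d{2})(?:-(\d+))?")


def parse_date_range(value: str) -> Tuple[str, Optional[int]]:
    """Split a Date Range value into its day part and optional integer suffix."""
    match = DATE_RANGE_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a valid Date Range: {value!r}")

    day, suffix = match.group(1), match.group(2)
    if day != "latest":
        date.fromisoformat(day)  # raises ValueError for impossible dates

    return day, int(suffix) if suffix is not None else None
```

Validating the value before it reaches the query avoids spending a Content-API call on a malformed URL segment.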

View Feed Output

You should be able to see the published feed using the following URL formats. This example assumes you have a Sitemap by Day resolver with the regex ^/arc/outboundfeeds/sitemap/(latest(-\d*)?|\d\d\d\d-\d\d-\d\d(-\d*)?)/?$.

You can use the internal URL if you are logged in via OKTA:

  • sandbox: https://outboundfeeds-sandbox.{ORG}.arcpublishing.com/pf/arc/outboundfeeds/sitemap/YYYY-MM-DD?_website={websitename}&outputType=xml
  • prod: https://outboundfeeds.{ORG}.arcpublishing.com/pf/arc/outboundfeeds/sitemap/YYYY-MM-DD?_website={websitename}&outputType=xml

You can use the public URLs:

  • sandbox: https://{ORG}-{WEBSITE}-sandbox.web.arc-cdn.net/arc/outboundfeeds/sitemap/YYYY-MM-DD?outputType=xml
  • prod: https://{ORG}-{WEBSITE}-prod.web.arc-cdn.net/arc/outboundfeeds/sitemap/YYYY-MM-DD?outputType=xml (if your site isn’t live yet)
  • prod: https://www.example.com/arc/outboundfeeds/sitemap/YYYY-MM-DD?outputType=xml (if your site is live)

Single Content

This content source is used to retrieve one piece of content by its _id or website_url. This content source is only supported by the ANS feed.

Single Content Source Configurable Parameters

1. _id

This is the ANS _id field, typically 26 random letters and numbers like XG4EARCYLBGKTMEIHDLCFTD5CU.

2. website_url

This is the URL for a specific website’s version of the content.

View Feed Output

You can pass an _id in the request URL. This example assumes you have an ANS feed and created the resolver with the regex ^/arc/outboundfeeds/article/(.*)/$, with _id set to pattern 1. Another approach is to add a required parameter called _id and set _id to use the parameter.

You can also pass a URL in the request. This example assumes you have an ANS feed and created the resolver with the regex ^/arc/outboundfeeds/article(/.*)/$, with website_url set to pattern 1. Notice that this regex is a little different: it must include the leading slash, since all website_urls start with a slash. If your URLs end in slashes, you must put the trailing slash inside the parentheses. You cannot use a parameter for the URL, because parameters get URL-encoded and will not work with Content-API.
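To see what each resolver regex captures as pattern 1, the patterns can be tried against sample request paths. The sketch below uses Python's re module purely for illustration (for these simple patterns it behaves the same as a JavaScript regex). The sample article path and the third pattern, which moves the trailing slash inside the parentheses, are assumptions built from the description above.

```python
import re
from typing import Optional

ID_PATTERN = re.compile(r"^/arc/outboundfeeds/article/(.*)/$")
URL_PATTERN = re.compile(r"^/arc/outboundfeeds/article(/.*)/$")
# Variant for sites whose website_urls end in a slash
URL_PATTERN_TRAILING_SLASH = re.compile(r"^/arc/outboundfeeds/article(/.*/)$")


def pattern_1(pattern: "re.Pattern[str]", path: str) -> Optional[str]:
    """Return the first capture group for a request path, or None if no match."""
    match = pattern.match(path)
    return match.group(1) if match else None


if __name__ == "__main__":
    by_id = "/arc/outboundfeeds/article/XG4EARCYLBGKTMEIHDLCFTD5CU/"
    by_url = "/arc/outboundfeeds/article/sports/2021/03/15/example-story/"
    print(pattern_1(ID_PATTERN, by_id))                    # the bare _id
    print(pattern_1(URL_PATTERN, by_url))                  # leading slash, no trailing slash
    print(pattern_1(URL_PATTERN_TRAILING_SLASH, by_url))   # leading and trailing slash
```

The difference between the last two patterns is only where the final slash lands: outside the group it is dropped from website_url, inside the group it is kept.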

Site Service Hierarchy

To retrieve a list of sections from Site Service hierarchy, select a hierarchy name. To return a single section use a sectionId.

Site Service Hierarchy Content Source Configurable Parameters

The Site Service Hierarchy content source only applies to the Sitemap Section, Sitemap Section Index or ANS feeds.

1. hierarchy

This is the name of the Site Service hierarchy, for example default.

2. sectionId

This is the category name (section name), for example /sports (notice it should start with a slash).

View Feed Output

You should be able to see the published feed using the following URL formats. This example assumes you have a Sitemap Section feed and created the resolver with the regex ^/arc/outboundfeeds/sitemap-section/?$.

You can use the internal URL if you are logged in via OKTA:

  • sandbox: https://outboundfeeds-sandbox.{ORG}.arcpublishing.com/pf/arc/outboundfeeds/sitemap-section?_website={websitename}&outputType=xml
  • prod: https://outboundfeeds.{ORG}.arcpublishing.com/pf/arc/outboundfeeds/sitemap-section?_website={websitename}&outputType=xml

You can use the public URLs:

  • sandbox: https://{ORG}-{WEBSITE}-sandbox.web.arc-cdn.net/arc/outboundfeeds/sitemap-section?outputType=xml
  • prod: https://{ORG}-{WEBSITE}-prod.web.arc-cdn.net/arc/outboundfeeds/sitemap-section?outputType=xml (if your site isn’t live yet)
  • prod: https://www.example.com/arc/outboundfeeds/sitemap-section?outputType=xml (if your site is live)

Additional Information

More About Content Sources

How To Create A Collection In WebSked

Overview Of Site Service

Managing Hierarchies

Steps To Create And Manage Outbound Feeds.

Using Jmespath To Map To CustomFields ANS Values

More details on Resolvers

Regex Debugger