Content Sources with Outbound Feeds
This document describes the Content Sources that are available out-of-the-box with Outbound Feeds (OBF). OBF content sources can be broken down by which Arc service they talk to. Most talk to Content-API, but the Hierarchy content source talks to Site Service.
For details on how the content sources are cached, see PageBuilder Content Cache | Outbound Feeds.
Content Sources
- Collections - Used to get content from the /content/v4/collection endpoint, which does not include content_elements or streams (video). The content source uses the IDs from the collection to call /content/v4/ids to get the missing data. It is limited to 20 records per request.
- Feeds-Content-API - This is the default content source, used by most resolvers to get content from the /content/v4/search/published endpoint. It is limited to 100 records per request.
- Feeds-Content-API-by-Day (plus the by-Day2 and by-Day3 variants) - These three content sources are intended to be used with the sitemap-index-by-day feature. They make multiple calls to the /content/v4/search/published endpoint to get all of the content for a single day. Because Fusion times out calls to content sources after 5 seconds, the practical limit is around 1,500 records.
- Single-Content - Used to get a single story, video, or gallery using either the _id or the website_url.
- Site-Service-Hierarchy - This is the only content source that does not talk to Content-API. As the name suggests, it uses Site-Service. Because the data returned by Site-Service is not in the same format as Content-API data, this content source can only be used by the Sitemap Section Front Index block.
Included Data
To improve performance and reduce the size of the data returned from Content-API, each content source uses the _sourceInclude and _sourceExclude parameters to limit which ANS fields are returned. Each content source starts with the following list of ANS fields:
ANSFields = [
  "canonical_url",
  "canonical_website",
  "content_elements", // feeds-content-api-by-day does not include content_elements
  "created_date",
  "credits",
  "description",
  "display_date",
  "duration",
  "first_publish_date",
  "headlines",
  "last_updated_date",
  "promo_image",
  "promo_items",
  "publish_date",
  "streams",
  "subheadlines",
  "subtitles",
  "subtype",
  "taxonomy.primary_section",
  "taxonomy.seo_keywords",
  "taxonomy.tags",
  "type",
  "video_type",
]
Before each call to Content-API, the calling website ID is added to the _sourceIncludes list like:
"websites.{{arc-site}}"
By not including taxonomy.sections and only including the relevant website’s websites data (website_section and website_url), we greatly reduce the response size, especially for clients with many websites. Before the data is returned from the content source, a transform moves websites.{{arc-site}}.website_section to taxonomy.sections and websites.{{arc-site}}.website_url to website_url, so the data will have a website_url and one section in taxonomy.sections. If you need to query on specific sections, both included and excluded, you might need to add taxonomy.sections to Source-Include to get the correct results from your query.
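The transform described above can be sketched roughly as follows. The function name apply_website_transform, the sample story, and the site name demo-site are all hypothetical. Removing the site entry from websites after copying it is an assumption based on the word "move"; the real transform is not shown in this document.

```python
from copy import deepcopy
from typing import Any, Dict


def apply_website_transform(item: Dict[str, Any], site: str) -> Dict[str, Any]:
    """Return a copy of item with the site's section and URL moved to top-level fields."""
    out = deepcopy(item)  # leave the caller's data untouched
    site_data = out.get("websites", {}).pop(site, {})
    if "website_section" in site_data:
        # The result carries exactly one section in taxonomy.sections.
        out.setdefault("taxonomy", {})["sections"] = [site_data["website_section"]]
    if "website_url" in site_data:
        out["website_url"] = site_data["website_url"]
    return out


# Hypothetical ANS fragment, for illustration only.
story = {
    "_id": "XG4EARCYLBGKTMEIHDLCFTD5CU",
    "websites": {
        "demo-site": {
            "website_section": {"_id": "/sports", "name": "Sports"},
            "website_url": "/sports/2020/01/15/example-story/",
        }
    },
}

result = apply_website_transform(story, "demo-site")
print(result["website_url"])
print(result["taxonomy"]["sections"])
```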
Using the resolver’s Source-Include field adds those values to the list; it does not replace the original list. Using the resolver’s Source-Exclude field removes those values from the list. If a value is not present in the list, it is added to _sourceExcludes instead. For example, taxonomy is not in the list, but taxonomy.primary_section is.
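As a rough illustration of these rules, the sketch below merges a default list with the resolver's Source-Include and Source-Exclude values. The helper names build_source_lists and split_csv are hypothetical, the default list is abbreviated, and processing includes before excludes is an assumption; the document does not state the order.

```python
from typing import List, Sequence, Tuple

# Abbreviated stand-in for the default ANS field list shown earlier.
DEFAULT_FIELDS = ["canonical_url", "headlines", "taxonomy.primary_section",
                  "taxonomy.tags", "type"]


def split_csv(value: str) -> List[str]:
    """Split a comma-separated field list, dropping blanks and extra spaces."""
    return [part.strip() for part in value.split(",") if part.strip()]


def build_source_lists(defaults: Sequence[str], site: str,
                       include_csv: str = "",
                       exclude_csv: str = "") -> Tuple[List[str], List[str]]:
    """Return (includes, excludes) following the rules described above."""
    includes = list(defaults)
    for field in split_csv(include_csv):
        if field not in includes:       # Source-Include adds to the list
            includes.append(field)
    excludes: List[str] = []
    for field in split_csv(exclude_csv):
        if field in includes:           # Source-Exclude removes listed fields
            includes.remove(field)
        else:                           # unlisted fields go to the exclude list
            excludes.append(field)
    includes.append(f"websites.{site}")  # the calling website's data is always added
    return includes, excludes


includes, excludes = build_source_lists(
    DEFAULT_FIELDS, "demo-site",
    include_csv="taxonomy.sections",
    exclude_csv="headlines, taxonomy",
)
print(includes)
print(excludes)
```

Here headlines is dropped from the include list because it was in the defaults, while taxonomy ends up in the exclude list because only taxonomy.primary_section and taxonomy.tags were present.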
Collections
To use a Collection to power a feed, choose the collections content source. The collections content source requires the _id or content_alias of the desired collection.
The Collections content source works with all OBF feed types except Sitemap Section Front Index.
Collections Content Source Configurable Parameters
1. _id - The Collection _id. If you use the _id, you don’t need to populate the content_alias.
2. content_alias - Instead of using the _id, you can populate the content_alias. This is NOT the name of the Collection. You must add content_aliases using the gear icon in the Edit Collection Detail screen.
3. from - Used for pagination to indicate which article the feed should start on.
4. size - 1 - 20. A Collection returns a maximum of 20 articles. The default is 20.
5. ANS Fields to Include - A comma-separated list of ANS fields to add to the default list of ANS fields (see Included Data above).
6. ANS Fields to Exclude - A comma-separated list of ANS fields to remove from the default list of ANS fields (see Included Data above). The Content-API IDS endpoint used does not support the _sourceExcludes parameter. If you exclude content_elements, the second call to the IDS endpoint is skipped and the content source returns the collections response, ignoring any Include and Exclude field values.
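The two-step lookup behind the Collections content source, including the content_elements shortcut from item 6, might look something like this sketch. The callables fetch_collection and fetch_by_ids are hypothetical stand-ins for the calls to /content/v4/collection and /content/v4/ids. Clamping size and preserving collection order are assumptions, not documented behavior.

```python
from typing import Callable, Dict, Iterable, List

MAX_COLLECTION_SIZE = 20  # a Collection returns at most 20 articles


def clamp_size(size: int) -> int:
    """Keep size within 1..20 (clamping rather than rejecting is an assumption)."""
    return max(1, min(size, MAX_COLLECTION_SIZE))


def load_collection(fetch_collection: Callable[[], List[Dict]],
                    fetch_by_ids: Callable[[List[str]], List[Dict]],
                    excluded_fields: Iterable[str] = ()) -> List[Dict]:
    """Fetch the collection, then fill in the missing data by ID."""
    items = fetch_collection()
    if "content_elements" in set(excluded_fields):
        # Per item 6 above: the second call is skipped entirely.
        return items
    ids = [item["_id"] for item in items]
    by_id = {doc["_id"]: doc for doc in fetch_by_ids(ids)}
    # Keep the collection's ordering (an assumption of this sketch).
    return [by_id[i] for i in ids if i in by_id]


# Stand-in data so the sketch runs without any network access.
_stubs = [{"_id": "A"}, {"_id": "B"}]
_full = [{"_id": "B", "content_elements": ["b"]},
         {"_id": "A", "content_elements": ["a"]}]

enriched = load_collection(lambda: _stubs, lambda ids: _full)
stubs_only = load_collection(lambda: _stubs, lambda ids: _full,
                             excluded_fields=["content_elements"])
print([doc["_id"] for doc in enriched])
print(stubs_only)
```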
View Feed Output
You should be able to see the published feed using the following URL formats. This example assumes you have a Collection of top stories and you created the resolver’s regex as ^/arc/outboundfeeds/top-stories/?$
You can use the internal URLs if you are logged in via Okta:
- sandbox: https://outboundfeeds-sandbox.{ORG}.arcpublishing.com/pf/arc/outboundfeeds/top-stories?_website={websitename}&outputType=xml
- prod: https://outboundfeeds.{ORG}.arcpublishing.com/pf/arc/outboundfeeds/top-stories?_website={websitename}&outputType=xml
You can use the public URLs:
- sandbox: https://{ORG}-{WEBSITE}-sandbox.web.arc-cdn.net/arc/outboundfeeds/top-stories?outputType=xml
- prod (if your site isn’t live yet): https://{ORG}-{WEBSITE}-prod.web.arc-cdn.net/arc/outboundfeeds/top-stories?outputType=xml
- prod (if your site is live): https://www.example.com/arc/outboundfeeds/top-stories?outputType=xml
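The URL formats above follow a regular pattern, so a small helper can assemble them. This is only a convenience sketch: the function names internal_url and public_url, and the sample values myorg and mysite, are hypothetical.

```python
from urllib.parse import urlencode


def internal_url(org: str, website: str, path: str,
                 sandbox: bool = False, output_type: str = "xml") -> str:
    """Build the internal (Okta-protected) URL for a feed path."""
    host = f"outboundfeeds{'-sandbox' if sandbox else ''}.{org}.arcpublishing.com"
    query = urlencode({"_website": website, "outputType": output_type})
    return f"https://{host}/pf{path}?{query}"


def public_url(org: str, website: str, path: str,
               sandbox: bool = False, output_type: str = "xml") -> str:
    """Build the public arc-cdn.net URL for a feed path."""
    env = "sandbox" if sandbox else "prod"
    query = urlencode({"outputType": output_type})
    return f"https://{org}-{website}-{env}.web.arc-cdn.net{path}?{query}"


feed_path = "/arc/outboundfeeds/top-stories"
print(internal_url("myorg", "mysite", feed_path, sandbox=True))
print(public_url("myorg", "mysite", feed_path))
```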
Feeds-Content-API
This is the default content source used to set up most of the pre-configured resolvers. It is documented in the Setup and Resolver documents.
Feeds Content API by Day
This is a special content source meant to be used with the Sitemap Index by Day block, and it is limited to returning content for a single day. It consists of three separate content sources:
- feeds-content-api-by-day
- feeds-content-api-by-day2
- feeds-content-api-by-day3
All three content sources are used together with the Sitemap Index by Day block. The three content sources are identical except that each has a different cache TTL (Time To Live) value.
- feeds-content-api-by-day TTL - 5 minutes
- feeds-content-api-by-day2 TTL - 1 hour
- feeds-content-api-by-day3 TTL - 1 day
See Outbound Feed Content Source TTLs for the complete list of TTLs for each content source.
The content source requires a Date Field and a Date Range and uses them to limit the results to a single date. For complete instructions on setting up these resolvers, see These Instructions.
1. Date Field - Enter one of the five ANS date fields. This is the field that will be used in the query.
   - created_date
   - display_date
   - first_publish_date
   - last_updated_date
   - publish_date
2. Date Range - This must be a valid date in the format YYYY-MM-DD or the word latest, with an optional integer at the end: (-\d)? Typically this is passed as a URL pattern.
3. Include Terms - A valid JSON array of Elasticsearch query terms. More details on custom queries can be found Here.
4. Exclude Terms - A valid JSON array of Elasticsearch query terms. More details on custom queries can be found Here.
5. Exclude Sections - A comma-separated list of sections to exclude.
6. Source Include - A comma-separated list of ANS fields to add to the default list of ANS fields (see Included Data above).
7. Source Exclude - A comma-separated list of ANS fields to exclude from the default list of ANS fields (see Included Data above).
View Feed Output
You should be able to see the published feed using the following URL formats. This example assumes you have a Sitemap with Day resolver whose regex is ^/arc/outboundfeeds/sitemap/(latest(-\d*)?|\d\d\d\d-\d\d-\d\d(-\d*)?)/?$
You can use the internal URLs if you are logged in via Okta:
- sandbox: https://outboundfeeds-sandbox.{ORG}.arcpublishing.com/pf/arc/outboundfeeds/sitemap/YYYY-MM-DD?_website={websitename}&outputType=xml
- prod: https://outboundfeeds.{ORG}.arcpublishing.com/pf/arc/outboundfeeds/sitemap/YYYY-MM-DD?_website={websitename}&outputType=xml
You can use the public URLs:
- sandbox: https://{ORG}-{WEBSITE}-sandbox.web.arc-cdn.net/arc/outboundfeeds/sitemap/YYYY-MM-DD?outputType=xml
- prod (if your site isn’t live yet): https://{ORG}-{WEBSITE}-prod.web.arc-cdn.net/arc/outboundfeeds/sitemap/YYYY-MM-DD?outputType=xml
- prod (if your site is live): https://www.example.com/arc/outboundfeeds/sitemap/YYYY-MM-DD?outputType=xml
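To see how the resolver regex in this section yields a Date Range value, the sketch below runs it with Python's re module and then splits the captured value into its date (or latest) part and the optional trailing integer. RESOLVER_RE is copied from the example above. DATE_RANGE_RE and both function names are hypothetical helpers written for this illustration, and the sketch makes no claim about what the content source does with the integer.

```python
import re
from datetime import date
from typing import Optional, Tuple

# Resolver regex copied from the example above.
RESOLVER_RE = re.compile(
    r"^/arc/outboundfeeds/sitemap/(latest(-\d*)?|\d\d\d\d-\d\d-\d\d(-\d*)?)/?$"
)
# Hypothetical helper pattern for splitting the captured value.
DATE_RANGE_RE = re.compile(r"^(latest|\d{4}-\d{2}-\d{2})(?:-(\d+))?$")


def date_range_from_path(path: str) -> Optional[str]:
    """Return pattern 1 of the resolver regex, or None if the path does not match."""
    match = RESOLVER_RE.match(path)
    return match.group(1) if match else None


def split_date_range(value: str) -> Optional[Tuple[str, Optional[int]]]:
    """Split 'YYYY-MM-DD[-N]' or 'latest[-N]' into (base, N); None if invalid."""
    match = DATE_RANGE_RE.match(value)
    if not match:
        return None
    base, suffix = match.group(1), match.group(2)
    if base != "latest":
        try:
            date.fromisoformat(base)  # rejects impossible dates such as 2020-13-45
        except ValueError:
            return None
    return base, (int(suffix) if suffix is not None else None)


print(date_range_from_path("/arc/outboundfeeds/sitemap/2020-01-15"))
print(date_range_from_path("/arc/outboundfeeds/sitemap/latest-2/"))
print(split_date_range("2020-01-15-3"))
print(split_date_range("latest"))
```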
Single Content
This content source is used to retrieve one piece of content by its _id or website_url. This content source is only supported by the ANS feed.
Single Content Source Configurable Parameters
1. _id - This is the ANS _id field, typically 26 random letters and numbers like XG4EARCYLBGKTMEIHDLCFTD5CU.
2. website_url - This is the URL for a specific website’s version of the content.
View Feed Output
You can pass an _id in the request through the resolver’s regex. This example assumes you have an ANS feed, you created the resolver’s regex as ^/arc/outboundfeeds/article/(.*)/$ and you set _id to pattern 1. Another approach is to add a required parameter called _id and set _id to use the parameter.
You can also pass a URL in the request through the resolver’s regex. This example assumes you have an ANS feed, you created the resolver’s regex as ^/arc/outboundfeeds/article(/.*)/$ and you set website_url to pattern 1. Notice this regex is a little different: it must include the leading slash, since all website_urls start with a slash. If your URLs end in slashes, you must put the trailing slash inside the parentheses. You cannot use a parameter for the URL, because parameters get URL-encoded and will not work with Content-API.
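The difference between the two regexes is easiest to see by running them. The sketch below uses Python's re module to show what pattern 1 captures in each case. ID_RE and URL_RE are copied from the text above. URL_TRAILING_RE is one reading of "put the trailing slash inside the parentheses" and should be treated as an assumption. The helper pattern_1 and the sample paths are hypothetical.

```python
import re
from typing import Optional, Pattern

ID_RE = re.compile(r"^/arc/outboundfeeds/article/(.*)/$")
URL_RE = re.compile(r"^/arc/outboundfeeds/article(/.*)/$")
# Assumed variant for sites whose website_urls end in a slash.
URL_TRAILING_RE = re.compile(r"^/arc/outboundfeeds/article(/.*/)$")


def pattern_1(regex: Pattern[str], path: str) -> Optional[str]:
    """Return the first capture group, which the resolver maps to a parameter."""
    match = regex.match(path)
    return match.group(1) if match else None


id_path = "/arc/outboundfeeds/article/XG4EARCYLBGKTMEIHDLCFTD5CU/"
url_path = "/arc/outboundfeeds/article/sports/2020/01/15/example-story/"

print(pattern_1(ID_RE, id_path))             # the bare _id, no slashes
print(pattern_1(URL_RE, url_path))           # leading slash kept, trailing slash dropped
print(pattern_1(URL_TRAILING_RE, url_path))  # both slashes kept
```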
Site Service Hierarchy
To retrieve a list of sections from a Site Service hierarchy, select a hierarchy name. To return a single section, use a sectionId.
Site Service Hierarchy Content Source Configurable Parameters
The Site Service Hierarchy content source only applies to the Sitemap Section, Sitemap Section Index or ANS feeds.
1. hierarchy - This is the name of the Site Service hierarchy, for example default.
2. sectionId - This is the category name (section name), for example /sports (notice it should start with a slash).
View Feed Output
You should be able to see the published feed using the following URL formats. This example assumes you have a Sitemap Section feed and you created the resolver’s regex as ^/arc/outboundfeeds/sitemap-section/?$
You can use the internal URLs if you are logged in via Okta:
- sandbox: https://outboundfeeds-sandbox.{ORG}.arcpublishing.com/pf/arc/outboundfeeds/sitemap-section?_website={websitename}&outputType=xml
- prod: https://outboundfeeds.{ORG}.arcpublishing.com/pf/arc/outboundfeeds/sitemap-section?_website={websitename}&outputType=xml
You can use the public URLs:
- sandbox: https://{ORG}-{WEBSITE}-sandbox.web.arc-cdn.net/arc/outboundfeeds/sitemap-section?outputType=xml
- prod (if your site isn’t live yet): https://{ORG}-{WEBSITE}-prod.web.arc-cdn.net/arc/outboundfeeds/sitemap-section?outputType=xml
- prod (if your site is live): https://www.example.com/arc/outboundfeeds/sitemap-section?outputType=xml
Additional Information
How To Create A Collection In WebSked
Steps To Create And Manage Outbound Feeds.
Using Jmespath To Map To CustomFields ANS Values
More details on Resolvers
Regex Debugger