Content Source Blocks in Outbound Feeds
Content Source blocks are used to call external APIs, such as Content API or Site-Service API. A content source can also be used to call a third-party service such as Chartbeat or Google Analytics.
feeds-content-source-utils
We include a package of helper functions used by all content sources. Its main purpose is to hold the defaultANSFields list and the transform function. To reduce the size of the Content-API response, the _sourceInclude and _sourceExclude parameters are used to include only the fields needed by Outbound Feeds (OBF). Each time a feed is generated, the response from the content source and the Fusion OBF code are passed to a lambda. AWS Lambda enforces a hard 6 MB limit on this payload; exceeding it produces a “Content too large” error.
Besides the obvious benefit of removing unused fields, articles for clients with multiple websites can carry data for every website. If an article is circulated to three websites, the websites object will contain keys for all three. To eliminate that extra data, each request asks only for the calling website's data: the calling website is added to the _sourceInclude as websites.${key['arc-site']}. Similarly, taxonomy.sections holds every section for every website, so taxonomy.sections is not part of the default list. The transform function copies the websites.{{arc-site}} data into the expected places (taxonomy.sections).
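The two steps above can be sketched roughly as follows. This is not the package's actual code: buildSourceInclude, copyWebsiteSections and exampleFields are hypothetical names, and the website_section property on each websites entry is an assumption about the ANS shape.

```javascript
// Hypothetical stand-in for defaultANSFields; the real list lives in the package.
const exampleFields = ['headlines', 'taxonomy.tags'];

// Build the _sourceInclude value, adding only the calling website's data.
function buildSourceInclude(fields, arcSite) {
  return [...fields, `websites.${arcSite}`].join(',');
}

// Copy the calling website's section into taxonomy.sections.
// Assumption: websites[arcSite].website_section holds that site's section.
function copyWebsiteSections(story, arcSite) {
  const site = (story.websites || {})[arcSite];
  if (!site || !site.website_section) return story; // nothing to copy
  return {
    ...story,
    taxonomy: { ...(story.taxonomy || {}), sections: [site.website_section] },
  };
}

console.log(buildSourceInclude(exampleFields, 'demo'));
```

Returning a new object instead of mutating the story keeps the sketch easy to test; the real transform may well modify the response in place.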
feeds-source-content-api-block
The feeds-source-content-api-block calls Content-API’s /content/v4/search/published endpoint using a resolve function and returns a result set. It allows searching by sections, author, seo_keywords, tags and tag-slugs.
DSL format
It uses the Elasticsearch DSL format for queries since you must use that format when searching by sections. It starts with the following object.
// basic ES query
const body = {
  query: {
    bool: {
      must: [],
      must_not: [],
    },
  },
}
Next, it adds query terms to the must array. First it checks whether the resolver passed an Include-Terms value. If not, it checks whether the siteProperty feedDefaultQuery (from blocks.json) exists. If neither is set, it falls back to the following hard-coded value.
[
  { "term": { "type": "story" } },
  { "range": { "last_updated_date": { "gte": "now-2d", "lte": "now" } } }
]
The value is pushed into the must array.
The only way to populate the must_not array is to pass a value in the resolver’s Exclude-Terms parameter.
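The lookup order for the must terms, and the single way to fill must_not, might look like the sketch below. buildBaseQuery is a hypothetical helper, and it assumes that Include-Terms, Exclude-Terms and feedDefaultQuery arrive as JSON strings holding an array of terms; the block itself may parse or push them differently.

```javascript
// Hard-coded fallback, copied from the text above.
const DEFAULT_TERMS = [
  { term: { type: 'story' } },
  { range: { last_updated_date: { gte: 'now-2d', lte: 'now' } } },
];

// Hypothetical helper: Include-Terms wins, then feedDefaultQuery, then the default.
// Assumption: all three inputs are JSON strings containing an array of terms.
function buildBaseQuery(key, siteProperties = {}) {
  const body = { query: { bool: { must: [], must_not: [] } } };
  const include = key['Include-Terms'] || siteProperties.feedDefaultQuery;
  const terms = include ? JSON.parse(include) : DEFAULT_TERMS;
  body.query.bool.must.push(...terms);
  if (key['Exclude-Terms']) {
    body.query.bool.must_not.push(...JSON.parse(key['Exclude-Terms']));
  }
  return body;
}

console.log(JSON.stringify(buildBaseQuery({}), null, 2));
```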
If any of the resolver’s Author, SEO_keywords, Tags or Tags-slug parameters are populated, the appropriate term is pushed (added) to the must array.
The section and Exclude-Sections parameters work a little differently. To search for the “/sports” section the following object is pushed into the must array:
{
  nested: {
    path: 'taxonomy.sections',
    query: {
      bool: {
        must: [
          { term: { 'taxonomy.sections._website': key['arc-site'] } },
          { terms: { 'taxonomy.sections._id': ['/sports'] } },
        ],
      },
    },
  },
}
To exclude the /opinion section the following object is pushed into the must_not array:
{
  nested: {
    path: 'taxonomy.sections',
    query: {
      bool: {
        must: [
          { terms: { 'taxonomy.sections._id': ['/opinion'] } },
        ],
      },
    },
  },
}
If you only want to exclude sections, the nested query in the must block does not contain a taxonomy.sections._id term, but it still needs the taxonomy.sections._website term.
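Putting the include and exclude cases together, a minimal sketch could look like this. addSectionFilters is a hypothetical helper, not a function from the block; it only illustrates that the website term is always present while the _id term is added only when sections are being included.

```javascript
// Hypothetical helper: adds the nested section clauses to an ES query body.
function addSectionFilters(body, arcSite, sections = [], excludeSections = []) {
  // The website term is needed even when only excluding sections.
  const must = [{ term: { 'taxonomy.sections._website': arcSite } }];
  if (sections.length) {
    must.push({ terms: { 'taxonomy.sections._id': sections } });
  }
  if (sections.length || excludeSections.length) {
    body.query.bool.must.push({
      nested: { path: 'taxonomy.sections', query: { bool: { must } } },
    });
  }
  if (excludeSections.length) {
    body.query.bool.must_not.push({
      nested: {
        path: 'taxonomy.sections',
        query: {
          bool: { must: [{ terms: { 'taxonomy.sections._id': excludeSections } }] },
        },
      },
    });
  }
  return body;
}

const emptyBody = () => ({ query: { bool: { must: [], must_not: [] } } });
console.log(JSON.stringify(addSectionFilters(emptyBody(), 'demo', [], ['/opinion']), null, 2));
```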
feeds-source-content-api-by-day-block
The feeds-source-content-api-by-day-block uses a fetch to make multiple calls to Content-API and collect all of the content for a specific day, which works around the 100-record limit of a single call. There are three content sources intended for use with sitemaps; they differ only in their TTL value. In practice it can handle around 1,500 records before Fusion times out the call. It helps that it excludes content_elements, so the amount of data is even smaller. Its code is based on feeds-source-content-api-block, but most of the search filters have been removed.
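The multi-call approach can be pictured as a simple paging loop. In this sketch fetchPage(from, size) is a hypothetical stand-in for the real Content-API request, the { content_elements, count } response shape is an assumption, and the maxRecords cap is only an illustrative guard based on the roughly 1,500-record figure above.

```javascript
// Hypothetical paging loop; not the block's actual implementation.
// Assumption: fetchPage(from, size) resolves to { content_elements: [...], count }.
async function fetchAllForDay(fetchPage, pageSize = 100, maxRecords = 1500) {
  const all = [];
  let from = 0;
  let total = Infinity; // unknown until the first response arrives
  while (from < total && all.length < maxRecords) {
    const page = await fetchPage(from, pageSize);
    const items = page.content_elements || [];
    if (items.length === 0) break; // nothing left to read
    all.push(...items);
    if (typeof page.count === 'number') total = page.count;
    from += pageSize;
  }
  return all.slice(0, maxRecords);
}

// Usage with an in-memory fake instead of a network call.
const fakeStories = Array.from({ length: 250 }, (_, i) => ({ _id: `story-${i}` }));
const fakeFetchPage = async (from, size) => ({
  content_elements: fakeStories.slice(from, from + size),
  count: fakeStories.length,
});
fetchAllForDay(fakeFetchPage).then((stories) => console.log(stories.length));
```

Passing the page fetcher in as an argument keeps the loop independent of how the request is actually made.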
feeds-source-collections-block
The feeds-source-collections-block uses a fetch. It first calls the collections endpoint to get the collection, which does not contain the articles' content_elements or the videos' streams. It uses this initial call to get the _ids and their order from the collections response. It then uses Content-API’s _ids endpoint to get the full content of each _id. The collections endpoint will only return 20 records at a time. Content-API’s _ids endpoint also does not return content_elements unless that field is listed in the included_fields parameter (note that this parameter differs from the one used by the /search endpoint). The included_fields parameter is populated with the defaultANSFields value from @wpmedia/feeds-content-source-utils.
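The two-step lookup can be sketched with a few small helpers. All three function names are hypothetical. The website and ids query parameter names are assumptions (only included_fields is named in the text), and re-sorting assumes the second call may not return items in collection order.

```javascript
// Step 1: read the ids, in order, from the collections response.
function idsFromCollection(collection) {
  return (collection.content_elements || []).map((item) => item._id);
}

// Step 2: build the query string for the ids call.
// Assumption: the parameters are named website, ids and included_fields.
function buildIdsQuery(arcSite, ids, includedFields) {
  const params = new URLSearchParams({
    website: arcSite,
    ids: ids.join(','),
    included_fields: includedFields.join(','),
  });
  return params.toString();
}

// Step 3: put the full stories back into collection order.
function orderByIds(ids, stories) {
  const byId = new Map(stories.map((story) => [story._id, story]));
  return ids.map((id) => byId.get(id)).filter(Boolean);
}

const exampleIds = idsFromCollection({ content_elements: [{ _id: 'B' }, { _id: 'A' }] });
console.log(orderByIds(exampleIds, [{ _id: 'A' }, { _id: 'B' }]).map((s) => s._id));
```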
site-hierarchy-content-block
This block is part of the themes repo: https://github.com/WPMedia/arc-themes-blocks/tree/stable/blocks/site-hierarchy-content-block
CONTENT_BASE
The content source uses the CONTENT_BASE environment variable that is set in environment/index.js. It should look like https://api.ORG.arcpublishing.com, where ORG is your organization name in Arc. For example, if your org name is demo, you can use:
demo - this will use your production environment
sandbox.demo - this will use your sandbox environment
When running locally, this value should be set in the .env file in the root of the outboundfeeds repo.
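As a quick illustration of the two forms, the hypothetical helper below builds the value from an org name. contentBase is not part of the repo; it only mirrors the pattern described above.

```javascript
// Hypothetical helper: builds a CONTENT_BASE value from an org name.
function contentBase(org, sandbox = false) {
  const host = sandbox ? `sandbox.${org}` : org;
  return `https://api.${host}.arcpublishing.com`;
}

console.log(contentBase('demo'));       // production environment
console.log(contentBase('demo', true)); // sandbox environment
```

In a local .env file the resulting string would be the value assigned to CONTENT_BASE.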
exports
Each content source must have either a resolve or a fetch function, which is called by Fusion. In addition, all parameters must be exported under the params key:
export default {
  resolve,
  transform,
  schemaName: 'feeds',
  params: {
    Section: 'text',
    Author: 'text',
    Keywords: 'text',
  },
}