Content Source Blocks in Outbound Feeds

Content Source blocks are used to call external APIs, such as Content API or Site-Service API. A content source can also be used to call a third-party service such as Chartbeat or Google Analytics.

feeds-content-source-utils

A package of helper functions is shared by all content sources. Its main purpose is to hold the defaultANSFields list and the transform function.

To reduce the size of the Content-API response, the _sourceInclude and _sourceExclude parameters are used so that only the fields needed by Outbound Feeds (OBF) are returned. Each time a feed is generated, the response from the content source and the Fusion OBF code are passed to a Lambda. AWS imposes a hard limit of 6 MB on that payload; exceeding it generates a “Content too large” error.

Besides removing unused fields, trimming the response also removes data that belongs to other websites. Articles for clients with multiple websites can carry data for every site: if an article is circulated to three websites, the websites object contains a key for each of the three. Each request only needs the calling website’s data, so the calling website is added to _sourceInclude as websites.${key['arc-site']}. Likewise, taxonomy.sections holds every section for every website, so it is not part of the default list. Instead, the transform function copies the websites.{{arc-site}} data into the places where it is expected (taxonomy.sections).
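The field trimming and the transform step described above could be sketched roughly as follows. This is an illustration rather than the package's actual source: the contents of defaultANSFields shown here, the helper name buildSourceInclude, the (data, query) signature of transform, and the website_section property on each websites entry are all assumptions.

```javascript
// Hypothetical subset of the field list; the real defaultANSFields is longer.
const defaultANSFields = ['_id', 'headlines', 'display_date', 'taxonomy.tags']

// Build the _sourceInclude value so that only the calling website's data
// is requested from Content-API.
const buildSourceInclude = (key) =>
  [...defaultANSFields, `websites.${key['arc-site']}`].join(',')

// Copy the calling website's section into taxonomy.sections.
// Assumes each websites entry carries a website_section object.
const transform = (data, query) => {
  const site = query['arc-site']
  const items = (data && data.content_elements) || []
  items.forEach((item) => {
    const section = item.websites?.[site]?.website_section
    if (section) {
      item.taxonomy = item.taxonomy || {}
      item.taxonomy.sections = [section]
    }
  })
  return data
}
```

With this shape, a request for the demo site asks only for websites.demo, and each returned item ends up with a taxonomy.sections array holding that site's section.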

feeds-source-content-api-block

The Feeds-Source-Content-Api-Block calls Content-API’s /content/v4/search/published endpoint using a resolve function and returns a result set. It allows searching by sections, author, seo_keywords, tags and tag slugs.

DSL format

Queries use the Elasticsearch DSL format, because that format is required when searching by sections. The query starts with the following object.

  // basic ES query
  const body = {
    query: {
      bool: {
        must: [],
        must_not: [],
      },
    },
  }

Next, it adds query terms to the must array. It first checks whether the resolver was passed an Include-Terms value. If not, it checks whether the feedDefaultQuery site property (from blocks.json) exists. Otherwise it uses the following hard-coded value.

[
  { "term": { "type": "story" } },
  { "range": { "last_updated_date": { "gte": "now-2d", "lte": "now" } } }
]

The value is pushed into the must array.

The only way to populate the must_not array is to pass a value from the resolver’s Exclude-Terms parameter.

If any of the resolver’s Author, SEO_keywords, Tags or Tags-slug parameters are populated, the appropriate term is pushed onto the must array.
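The order of precedence described above can be sketched as follows. The function name buildBody, the idea that Include-Terms, Exclude-Terms and feedDefaultQuery arrive as JSON strings, and the credits.by._id and taxonomy.seo_keywords field names are assumptions made for illustration, not a copy of the block's code.

```javascript
// Hard-coded fallback used when no other include terms are supplied.
const DEFAULT_TERMS = [
  { term: { type: 'story' } },
  { range: { last_updated_date: { gte: 'now-2d', lte: 'now' } } },
]

// key: resolver parameters; feedDefaultQuery: optional site property.
// Both are assumed to carry JSON-encoded arrays of ES terms.
const buildBody = (key, feedDefaultQuery) => {
  const body = { query: { bool: { must: [], must_not: [] } } }

  // Include-Terms wins, then the site property, then the default.
  let include = DEFAULT_TERMS
  if (key['Include-Terms']) include = JSON.parse(key['Include-Terms'])
  else if (feedDefaultQuery) include = JSON.parse(feedDefaultQuery)
  body.query.bool.must.push(...include)

  // must_not is only populated from Exclude-Terms.
  if (key['Exclude-Terms']) {
    body.query.bool.must_not.push(...JSON.parse(key['Exclude-Terms']))
  }

  // Optional filters; the ANS field names here are assumed.
  if (key.Author) body.query.bool.must.push({ term: { 'credits.by._id': key.Author } })
  if (key.Keywords) {
    body.query.bool.must.push({ terms: { 'taxonomy.seo_keywords': key.Keywords.split(',') } })
  }
  return body
}
```

Calling buildBody({}) yields the two default terms in must and an empty must_not array.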

The Section and Exclude-Sections parameters work a little differently. To search for the “/sports” section, the following object is pushed into the must array:

{
  nested: {
    path: 'taxonomy.sections',
    query: {
      bool: {
        must: [
          {
            term: { 'taxonomy.sections._website': key['arc-site'] },
          },
          {
            terms: { 'taxonomy.sections._id': ['/sports'] },
          },
        ],
      },
    },
  },
}

To exclude the /opinion section the following object is pushed into the must_not array:

{
  nested: {
    path: 'taxonomy.sections',
    query: {
      bool: {
        must: [
          {
            terms: { 'taxonomy.sections._id': ['/opinion'] },
          },
        ],
      },
    },
  },
}

If you only want to exclude sections, the nested query in the must array does not contain a taxonomy.sections._id term, but it must still contain the taxonomy.sections._website term.
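A small helper can make the include and exclude cases easier to compare. This is a sketch under assumptions: the names nestedSections and addSectionFilters are hypothetical, sections are assumed to arrive as arrays, and skipping the nested query when neither list is populated is a guess about the block's behaviour.

```javascript
// Wrap a list of terms in a nested taxonomy.sections query.
const nestedSections = (must) => ({
  nested: { path: 'taxonomy.sections', query: { bool: { must } } },
})

// body: the basic ES query; site: the calling website (key['arc-site']).
const addSectionFilters = (body, site, include = [], exclude = []) => {
  if (include.length === 0 && exclude.length === 0) return body

  // The _website term is always present; the _id term only when including.
  const must = [{ term: { 'taxonomy.sections._website': site } }]
  if (include.length) must.push({ terms: { 'taxonomy.sections._id': include } })
  body.query.bool.must.push(nestedSections(must))

  if (exclude.length) {
    body.query.bool.must_not.push(
      nestedSections([{ terms: { 'taxonomy.sections._id': exclude } }]),
    )
  }
  return body
}
```

For an exclude-only request, the nested query pushed into must carries just the _website term, while must_not receives the excluded section ids.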

feeds-source-content-api-by-day-block

The Feeds-Source-Content-Api-By-Day block uses a fetch function to make multiple calls to Content-API and retrieve all of the content for a specific day, which works around the 100-record limit of a single request. There are three versions of this content source, intended for use with sitemaps; they differ only in their TTL value. In practice, the block can handle around 1,500 records before Fusion times out the call. It helps that content_elements is excluded, which keeps the amount of data smaller. Its code is based on feeds-source-content-api-block, but most of the search filters have been removed.
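The paging approach can be sketched as a loop that keeps requesting pages of 100 until the day's content is exhausted. The names fetchAllForDay and fetchPage are hypothetical, and the response shape (content_elements plus a count of total matches) and the maxRecords cap are assumptions; the real block makes the HTTP calls itself.

```javascript
const PAGE_SIZE = 100 // per-request record limit

// fetchPage(from, size) is a hypothetical function that resolves to
// { content_elements: [...], count: <total number of matches> }.
const fetchAllForDay = async (fetchPage, maxRecords = 1500) => {
  const all = []
  let total = Infinity
  while (all.length < Math.min(total, maxRecords)) {
    const page = await fetchPage(all.length, PAGE_SIZE)
    total = page.count
    if (!page.content_elements.length) break // nothing more to read
    all.push(...page.content_elements)
  }
  return all
}
```

Passing the page fetcher in as an argument keeps the loop easy to test with a stub and leaves the request details (website, date range, excluded fields) to the caller.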

feeds-source-collections-block

The feeds-source-collections-block uses a fetch function. It first calls the collections endpoint to get the collection, which does not contain the articles’ content_elements or the videos’ streams. From that response it takes the _ids and their order, then calls Content-API’s _ids endpoint to get the full content of each _id. The collections endpoint only returns 20 records at a time. Content-API’s _ids endpoint also omits content_elements unless that field is listed in the included_fields parameter (note that this parameter differs from the one used by the /search endpoint). The included_fields parameter is populated with the defaultANSFields value from @wpmedia/feeds-content-source-utils.
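Because the order comes from the collections response while the full content comes from the _ids response, the two have to be merged. One way to do that is sketched below; the helper name orderByCollection is hypothetical, and it assumes both responses expose their items as arrays of objects with an _id property.

```javascript
// Return the full items from the _ids response in the order given by
// the collection.
const orderByCollection = (collectionItems, idsItems) => {
  const byId = new Map(idsItems.map((item) => [item._id, item]))
  return collectionItems
    .map(({ _id }) => byId.get(_id))
    .filter(Boolean) // drop ids the _ids endpoint did not return
}
```

Using a Map keeps the merge linear in the number of items, which is more than enough for the 20 records a single collections call returns.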

site-hierarchy-content-block

This block is part of the Themes blocks repository.

CONTENT_BASE

The content source uses the CONTENT_BASE environment variable, which is set in environment/index.js. It should look like this:

https://api.{ORG}.arcpublishing.com

where ORG is your organization name in Arc. For example, if your org name is demo, use one of the following values for {ORG}:

demo - this will use your production environment
sandbox.demo - this will use your sandbox environment

When running locally this value should be set in your .env file in the root of the outboundfeeds repository.
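As a quick check of the two forms, the value can be built from the org name as in the sketch below. The helper contentBase is hypothetical and only illustrates the URL pattern; in practice the value is simply written into environment/index.js or the .env file.

```javascript
// Build a CONTENT_BASE value from an org name (hypothetical helper).
const contentBase = (org, sandbox = false) =>
  `https://api.${sandbox ? 'sandbox.' : ''}${org}.arcpublishing.com`

console.log(contentBase('demo')) // https://api.demo.arcpublishing.com
console.log(contentBase('demo', true)) // https://api.sandbox.demo.arcpublishing.com
```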

exports

Each content source must have either a resolve or a fetch function that will be called by Fusion. In addition, all parameters must be exported under the params key:

export default {
  resolve,
  transform,
  schemaName: 'feeds',
  params: {
    Section: 'text',
    Author: 'text',
    Keywords: 'text',
  },
}
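For context, a minimal resolve function that could sit behind such an export is sketched here. It is an assumption-laden illustration: the fixed query body, the size value, and the use of body and website as query-string parameters are placeholders, and it assumes Fusion resolves the returned path against CONTENT_BASE.

```javascript
// Hypothetical minimal resolver: returns the request path for the
// Content-API search endpoint.
const resolve = (key) => {
  const body = {
    query: { bool: { must: [{ term: { type: 'story' } }], must_not: [] } },
  }
  const params = new URLSearchParams({
    body: JSON.stringify(body), // ES DSL query, URL-encoded
    website: key['arc-site'],
    size: '100',
  })
  return `/content/v4/search/published?${params.toString()}`
}
```

URLSearchParams takes care of encoding the JSON body, so the resolver does not need to call encodeURIComponent itself.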