
How to Create a Custom Query with Outbound Feeds

Outbound Feeds leverages the feeds-content-api content source, which uses the Elasticsearch Query DSL JSON format to query the /publish endpoint of the Content Operations API. The feeds-content-api content source returns the 100 most recently updated stories from the last two days when using its default settings. This page details the default query configuration for feeds-content-api and the resolver fields you can modify to create custom queries.

Default query

The default query for the feeds-content-api content source is an array that contains two objects:

[{"term":{"type": "story"}},{"range":{"last_updated_date":{"gte":"now-2d","lte":"now"}}}]

Breaking down the array, the first object is as follows:

{
"term": {
"type": "story"
}
}

The first object includes content only if the ANS field type equals “story”.

The second object is as follows:

{
"range": {
"last_updated_date": {
"gte": "now-2d",
"lte": "now"
}
}
}

The second object includes content only if the ANS field last_updated_date falls between two days ago (now-2d) and now.

Custom queries

If you want to change the query used by the feeds-content-api content source, you must enter a valid JSON array of terms into the Include-Terms and Exclude-Terms fields of the resolver.

The Include-Terms value becomes the query's must clause and the Exclude-Terms value becomes the query's must_not clause. If you leave the Include-Terms field blank, the feeds-content-api content source uses the default query.
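
As a rough sketch of how the two resolver fields map onto an Elasticsearch bool query, the following Python combines them into must and must_not clauses. The function name build_bool_query and the exact outer wrapper are assumptions made for illustration; the content source's actual implementation may differ.

```python
import json

# Default Include-Terms value, as documented above.
DEFAULT_INCLUDE_TERMS = [
    {"term": {"type": "story"}},
    {"range": {"last_updated_date": {"gte": "now-2d", "lte": "now"}}},
]


def build_bool_query(include_terms: str = "", exclude_terms: str = "") -> dict:
    """Hypothetical helper: combine the two resolver fields into a bool query."""
    # A blank Include-Terms field falls back to the default query.
    must = json.loads(include_terms) if include_terms.strip() else DEFAULT_INCLUDE_TERMS
    must_not = json.loads(exclude_terms) if exclude_terms.strip() else []
    for name, value in (("Include-Terms", must), ("Exclude-Terms", must_not)):
        if not isinstance(value, list):
            raise ValueError(f"{name} must be a JSON array")
    query = {"query": {"bool": {"must": must}}}
    if must_not:
        query["query"]["bool"]["must_not"] = must_not
    return query


if __name__ == "__main__":
    # Default query plus one exclusion.
    result = build_bool_query(exclude_terms='[{"term":{"type":"collection"}}]')
    print(json.dumps(result, indent=2))
```

Parsing the field values with json.loads before use also catches malformed JSON early, which is the most common mistake when editing these fields by hand.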

Examples of custom queries

This section lists the steps to create custom queries taking the default query as a starting point and using different terms and parameters.

If you want to create a feed containing stories or galleries published in the last week, take the following steps:

  1. Change the first query from term to terms and use an array to hold both “story” and “gallery”.

  2. Change last_updated_date to publish_date and “2d” to “1w”.

    Valid time units to use after “now-” include: y=year, M=month, w=week, d=day, h=hour

    Example:

    [{"terms":{"type": ["story", "gallery"]}},{"range":{"publish_date":{"gte":"now-1w","lte":"now"}}}]
  3. Add more conditions to the query by appending additional objects to the array, separated by commas.

    The following example limits a feed to stories created by your staff:

    [{"term":{"type": "story"}},{"range":{"last_updated_date":{"gte":"now-2d","lte":"now"}}},{"term":{"source.source_type":"staff"}}]

If you want to limit a feed to only stories with a featured image, use an “exists” query to make sure the content has a “promo_items.basic.url” value. Field names are case-sensitive, so write the field name in lowercase.

[{"term":{"type": "story"}},{"range":{"last_updated_date":{"gte":"now-2d","lte":"now"}}},{"exists":{"field":"promo_items.basic.url"}}]
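
Hand-editing nested JSON is error-prone, so one option is to assemble the terms in a script and serialize them. The following is a minimal Python sketch: the helper name include_terms and its parameters are hypothetical, and the field names are taken from the examples above.

```python
import json


def include_terms(types, date_field="last_updated_date", window="2d", extra=None):
    """Hypothetical helper: return an Include-Terms string for the resolver."""
    # Use "term" for a single type and "terms" for several types.
    if len(types) == 1:
        type_clause = {"term": {"type": types[0]}}
    else:
        type_clause = {"terms": {"type": list(types)}}
    clauses = [
        type_clause,
        {"range": {date_field: {"gte": f"now-{window}", "lte": "now"}}},
    ]
    # Any extra clauses are appended to the same array.
    clauses.extend(extra or [])
    # Compact separators keep the value on one line for pasting into the field.
    return json.dumps(clauses, separators=(",", ":"))


# Stories or galleries published in the last week.
print(include_terms(["story", "gallery"], date_field="publish_date", window="1w"))

# Staff-created stories that have a featured image.
print(include_terms(
    ["story"],
    extra=[
        {"term": {"source.source_type": "staff"}},
        {"exists": {"field": "promo_items.basic.url"}},
    ],
))
```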

Sections

Section queries are different from other ANS fields. You cannot use taxonomy.sections with a term statement as you can with other fields. Because of this, the content source has a Sections field that takes a list of sections to include and an Exclude-Sections field that takes a list of sections to exclude. You can use taxonomy.primary_section in queries, but a piece of content can have only one primary_section.

Analyzed fields

Elasticsearch indexes some fields as “analyzed fields”, where each word in the field is tracked individually. For example, if you store the phrase “Chicago-Cubs” in an analyzed field (subtype, for example), Elasticsearch keeps track of each word separately. The same is true if the words are separated by spaces. This enables you to search for either of the lowercase values “chicago” or “cubs” and get a match on that record.

If you want to search for exactly “Chicago-Cubs”, you must use either the match_phrase or the simple_query_string query. The simple_query_string query supports operators such as “+” (AND) and “|” (OR). If you want to search for the phrase “Chicago-Cubs” in subtype, you must use the following query:

[{"match_phrase":{"subtype": "chicago-cubs"}}]

If you want to match on any of several phrases, like “Chicago Cubs”, “Chicago Bears”, or “Chicago Bulls”, you must add escaped quotes (\") around each phrase in the query string:

[{"simple_query_string":{"query": "\"chicago cubs\" | \"chicago bears\" | \"chicago bulls\"", "fields":["subtype"]}}]
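
The escaped quotes are easy to get wrong by hand. As a sketch, Python's json module can produce the escaping for you. The helper name phrase_query is hypothetical, and the phrases are lowercased only to mirror the examples above.

```python
import json


def phrase_query(phrases, field="subtype"):
    """Hypothetical helper: OR together exact phrases in a simple_query_string."""
    # Wrap each phrase in double quotes so it is matched as a phrase,
    # then join the phrases with the OR operator "|".
    query = " | ".join(f'"{p.lower()}"' for p in phrases)
    clause = [{"simple_query_string": {"query": query, "fields": [field]}}]
    # json.dumps escapes the inner double quotes as \" automatically.
    return json.dumps(clause)


print(phrase_query(["Chicago Cubs", "Chicago Bears", "Chicago Bulls"]))
```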

Some analyzed fields, like taxonomy.tags.text, have a raw version that isn’t analyzed. If you want to search for the phrase “Chicago-Cubs” in taxonomy.tags.text, you can use a term query against the raw version, like the following:

[{"term":{"taxonomy.tags.text.raw":"chicago-cubs"}}]
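
To illustrate why an exact-value lookup for the whole phrase does not match an analyzed field, the following Python sketch approximates analysis by lowercasing the text and splitting it on non-alphanumeric characters. This is a deliberate simplification used only for illustration, not Elasticsearch's actual analyzer, and the function name analyze is hypothetical.

```python
import re


def analyze(text):
    """Rough stand-in for an analyzer: lowercase, then split into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


tokens = analyze("Chicago-Cubs")
print(tokens)                    # ['chicago', 'cubs']
print("cubs" in tokens)          # True: individual words are indexed
print("chicago-cubs" in tokens)  # False: the whole phrase is not a single token
```

A raw (non-analyzed) version of a field skips this splitting step, which is why a term query for the full value can match there.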

Exclude-Terms

To exclude content based on a field, use the Exclude-Terms field. It uses exactly the same format as Include-Terms, but its value goes into the must_not clause of the query. You can populate only the Exclude-Terms field to add exclusions to the default query, or you can combine it with a custom query in Include-Terms. If you want to exclude sponsored content while using the default query, enter the following in Exclude-Terms:

[{"term":{"sponsored": true}}]

You might want to include all types of content (story, video, gallery, and audio) in a feed except collections. Enter the following in Exclude-Terms to exclude “collections”:

[{"term":{"type":"collection"}}]

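As with the Include-Terms field, if you use multiple terms in the Exclude-Terms field, they are logically AND’d together as conditions that the content must not match, so content that matches any one of them is excluded. With the following value in the Exclude-Terms field, content is excluded if its content_code is “premium” or its language is “es”: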

[{"term":{"content_restrictions.content_code": "premium"}},{"term":{"language": "es"}}]
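
To make the exclusion logic concrete, the following Python sketch mimics how a must_not clause treats term queries. It is a simplified model for illustration, not Elasticsearch itself; the functions and sample documents are hypothetical.

```python
def get_path(doc, dotted):
    """Follow a dotted field path such as 'content_restrictions.content_code'."""
    for key in dotted.split("."):
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc


def is_excluded(doc, exclude_terms):
    """Return True if the document matches any term clause in exclude_terms."""
    for clause in exclude_terms:
        for field, value in clause["term"].items():
            if get_path(doc, field) == value:
                return True
    return False


exclude = [
    {"term": {"content_restrictions.content_code": "premium"}},
    {"term": {"language": "es"}},
]

samples = {
    "free English story": {"language": "en", "content_restrictions": {"content_code": "free"}},
    "premium English story": {"language": "en", "content_restrictions": {"content_code": "premium"}},
    "free Spanish story": {"language": "es", "content_restrictions": {"content_code": "free"}},
}

for name, doc in samples.items():
    print(name, "->", "excluded" if is_excluded(doc, exclude) else "kept")
```

In this model only the free English story is kept, because each of the other two matches at least one of the excluded terms.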