Searching for Highly Paginated Documents
In this guide, we’ll go over how to use Content API most effectively when you need to query through many pages of content.
Overview
Content API is used in Arc to deliver your content to the web. For most organizations, this means it is used in PageBuilder Content Sources so that content can be rendered correctly for your end users.
Content API has two main functions:
- Accessing a single piece of content in order to render a single content page
- Searching for content that matches specific authors, sections, or tags in order to build a feed of related content
Deeply Paginated Content
Your use case for building a feed or searching for content may require accessing many pages of results. While Content API can support this, as of Elasticsearch 7, there is a hard limit of 10,000 documents in a single result set. This change was implemented in Content API in Spring 2021.
If you need to paginate through more than 10,000 documents, we recommend splitting your query so that each request covers a single day, or a small number of days, at a time.
For example, if your original query is for all documents in a particular section:
q = type:story AND websites.the-herald.website_section._id:/politics
We recommend adding a filter for a particular date range that will keep your result set under 10,000. Depending on your organization, the number of days that makes sense may vary.
q = type:story AND websites.the-herald.website_section._id:/politics AND display_date:[2021-01-01+TO+2021-01-05]
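As a rough sketch of how the date filter above could be generated for a longer period, the Python below splits a date range into fixed-size windows and appends a display_date clause to the base query for each one. The function names (date_windows, windowed_queries) and the 5-day window size are illustrative assumptions, not part of Content API, and the sketch assumes both bounds of the range are inclusive; check that assumption against your own results before relying on it.

```python
from datetime import date, timedelta

# Base query taken from the example in this guide.
BASE_QUERY = "type:story AND websites.the-herald.website_section._id:/politics"


def date_windows(start, end, days_per_window):
    """Yield (window_start, window_end) pairs that cover start..end without gaps."""
    if days_per_window < 1:
        raise ValueError("days_per_window must be at least 1")
    current = start
    while current <= end:
        # The last window is clipped so it never runs past the requested end date.
        window_end = min(current + timedelta(days=days_per_window - 1), end)
        yield current, window_end
        current = window_end + timedelta(days=1)


def windowed_queries(base_query, start, end, days_per_window):
    """Return one query string per date window (hypothetical helper)."""
    return [
        f"{base_query} AND display_date:[{ws.isoformat()}+TO+{we.isoformat()}]"
        for ws, we in date_windows(start, end, days_per_window)
    ]


queries = windowed_queries(BASE_QUERY, date(2021, 1, 1), date(2021, 1, 12), 5)
for q in queries:
    print("q = " + q)
```

If a window still returns close to 10,000 documents, reduce the number of days per window for that period.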
By making a series of smaller queries, you can access all of the results and perform any required aggregation in the downstream system.
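A minimal sketch of that downstream aggregation might look like the following. It assumes you supply your own fetch_page function that calls Content API for one page of one query; fetch_page, collect_all, the page size, and the idea that the limit applies to offset plus page size are all assumptions made for illustration, and the in-memory data stands in for real responses.

```python
MAX_RESULT_WINDOW = 10_000  # limit described in this guide


def collect_all(queries, fetch_page, page_size=100):
    """Page through every query and merge the results into one list.

    fetch_page(query, offset, size) is a caller-supplied (hypothetical) function
    that returns a list of documents for a single page.
    """
    results = []
    for query in queries:
        offset = 0
        while True:
            # Assumption: offset + size may not exceed the limit. This check is
            # conservative and stops rather than risk silently missing documents.
            if offset + page_size > MAX_RESULT_WINDOW:
                raise RuntimeError(f"Result set too large, narrow the date range: {query}")
            page = fetch_page(query, offset, page_size)
            results.extend(page)
            if len(page) < page_size:
                break  # a short page means this query is exhausted
            offset += page_size
    return results


# Stand-in data so the sketch runs without a network call.
fake_index = {
    "window-1": ["story-a", "story-b", "story-c"],
    "window-2": ["story-d", "story-e"],
}


def fake_fetch_page(query, offset, size):
    return fake_index[query][offset:offset + size]


all_stories = collect_all(["window-1", "window-2"], fake_fetch_page, page_size=2)
print(len(all_stories))
```

Raising an error when a window is too large is a deliberate choice here: it signals that the date range for that query should be narrowed.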