How to Request Tracing in Content Sources with PageBuilder Engine

Request Tracing allows developers to be able to see all the HTTP requests their content sources are making. For simple content sources, this isn’t hard to do this manually. However, with complex contents sources, or ones using external packages, it can be difficult or impossible to manually trace. Additionally, Engine provides caching for content sources, but there was no way for a developer to be able to check and see if the cache was used. Request tracing provides a solution for both of these problems enabling developers to monitor and understand caching and network behavior while their pages render or content sources execute individually.

Previously, this was only available locally using a Custom Debugging Image. It is now integrated into Engine and can be used in all environments. There are a few differences when using it locally:

You no longer need to have a different FUSION_RELEASE version set in your .env file.
Content sources will not be slower, but the order is no longer guaranteed for non-global content source calls. The global content source (when provided) will still be the first one.
Calls will no longer show up twice due to having 2 passes for the server-side render.
The structure of the data in the JSON file is different. We’ll talk more about this in the Using Request Tracing Locally section.
Headers are added to each HTTP call made within a content source. We’ll talk more about this in the Using Request Tracing Without Logs section.

Request tracing uses the experimental async_hooks functionality in node. Turning it on does have a performance penalty so we are making it opt-in while we evaluate the impact.

Engine encodes invalid header values (like non-English characters) using the encodeURI function.

Using request tracing without logs

Request tracing is off by default in production environments. In order to enable it, you will need to set FUSION_ENABLE_TRACING=true in your environment file.

In production environments, no logs will be generated for request tracing. Instead, we will be relying on headers that are injected into each HTTP request. These headers are also injected locally and on non-production environments, but those environments have logs which will contain the same information. The following headers are injected:

Header	Description
`x-amzn-trace-id`	A string in the format `Root=${rootId};Self=${traceId}`. `rootId` is a UUID indicating which request called the content source. Currently, it is generated at Engine, but in the future we will be using it to tie CDN requests to content sources. `traceId` is a UUID indicating which content source made the HTTP call. The same `traceId` will be set for all HTTP calls made within the same content source call.
`x-fusion-source`	The content source filename.
`x-fusion-query`	The JSON stringified query the content source was called with.
`x-fusion-deployment`	The deployment id.
`x-arc-org`	The id of the environment that made the request.
`x-arc-site`	The id of the site that made the request.
`x-fusion-cache`	The name provided to a CachedCall Function. Only available on calls to cache proxy.

The rootId is also included other content related logs, like content size, content latency, cache backoff, etc. You can use a rootId to tie together data in your forwarded logs and the endpoints called by your content sources.

Using request tracing locally

Request tracing is an opt in feature. In order to enable it, you will need to set FUSION_ENABLE_TRACING=true in your .env file.

If you haven’t set DEBUG variable in your .env file before, you need to adjust this to make these event logs appear in your command line output. You can either set DEBUG=arc.fusion.content.tracing.* to only see the request tracing logs or set DEBUG=arc.fusion.content.*,arc.fusion.render.error to see both the request tracing logs and the server-side rendering error logs.

When enabled, you’ll start seeing these objects printed in your command line. They describe different event and calls to PageBuilder cache service or your network calls (wether Arc APIs or external APIs):

Collect calls in `calls.json` file

In order to generate the calls.json file locally, you’ll need to add FUSION_GENERATE_TRACING_JSON=true in your .env file.

To generate data, go to any page/template (either in Editor or published). Once it is loaded, you can go to http://localhost/pf/dist/calls.json also works if you were following the custom debugging image guide) to see the call data. If you have a custom JSON parser, you can also directly get the file by going to .fusion/dist/calls.json. A few things to keep in mind:

The data is always appended to the file. The file can become quite large if you are using a lot of content sources.
The file is deleted each time PageBuilder Engine restarts. If you want to clear it sooner, you’ll need to delete the file manually.
If you want to clear your content cache, you will need to restart the content-cache docker container or restart PageBuilder Engine. This will not clear the calls.json file.

The data in calls.json will be an array of call objects. Each call object consists of the following:

Attribute	Description
`source`	The content source filename.
`query`	The query the content source was called with.
`rootId`	UUID indicating which request called the content source. All the content sources called within the same server-side render will have the same `rootId`. Each content source on the client side will have a different `rootId`. In the future, the `rootId` will be used to tie content sources and CDN requests together.
`duration`	The amount of time the call took (in milliseconds).
`calls`	An array of objects showing the HTTP calls made within a content source. This includes the calls made to the cache proxy. Each one will have the `url`, `method`, `cache` (the name value a CachedCall Function was called with), `statusCode`, and `duration` (in milliseconds). The `cache` value is only available during a call to cache proxy. If it matches the `source` name, it is usually the parent content source’s cache. Otherwise it is the cache for a HTTP call made within the fetch content source. If there is no `statusCode` or `duration`, that means the request was terminated. This is usually due to a content source taking more than 5 seconds.

Parsing `calls.json` to a nice table view in command line

Engine generates calls.json in json format to allow customer developers to programmatically process, build their own tooling to monitor, investigate the content source network activity. But in a lot of cases, you just want to view this activity in a simple easy to read format, like a list or table view to quickly take a glance of what happened.

You can use jq and tty-table command line tools with the sample command below to print the calls from each content source execution in a nice table format in your command line.

Make sure you install jq, and tty-table from your package managers (i.e: brew install jq and npm install -g tty-table). Then, after making some page renders or content source executions, run the following command in your project root:

(echo "#,Source,Call,Method,Cache,Status,Duration,URL" && jq -r '
  to_entries[] as $top |
  $top.value.calls[] as $call |
  [
    if (( ($top.value.calls | index($call)) + 1 ) > 1) then "" else ($top.key + 1) end,
    if (( ($top.value.calls | index($call)) + 1 ) > 1) then "⠀⠀↳" else $top.value.source end,
    (( ($top.value.calls | index($call)) + 1 ) | tostring),
    $call.method,
    $call.cache,
    $call.statusCode,
    $call.duration,
    $call.url
  ] | @csv
' .fusion/dist/calls.json) | tty-table --delimiter ',' --options-align=left --options-compact=true

Here is a sample output this command will print out.

┌───┬────────┬──────┬────────┬───────────┬────────┬────────────────────┬──────────────────────────────────────────────────────────────────────┐
│ # │ Source │ Call │ Method │ Cache     │ Status │ Duration           │ URL                                                                  │
│ 1 │ city   │ 1    │ GET    │ city      │ 200    │ 91.716459          │ http://cache-proxy/cache?key=2:0803:8f685b...1194b41ee4e7            │
│ 2 │ city   │ 1    │ GET    │ city      │ 404    │ 71.46229199999999  │ http://cache-proxy/cache?key=2:0803:8f685b...1194b41ee4e7            │
│   │   ↳    │ 2    │ GET    │ get-story │ 200    │ 10.363208          │ http://cache-proxy/cache?key=2:0803:f37fba...5c2a2f5464a4            │
│   │   ↳    │ 3    │ GET    │ get-time  │ 200    │ 3.2725             │ http://cache-proxy/cache?key=2:0803:2df5ce...2a36c2cb4f5a            │
│   │   ↳    │ 4    │ PUT    │ city      │ 201    │ 1.930458           │ http://cache-proxy/cache?key=2:0803:8f685b...1194b41ee4e7&ttl=259200 │
│ 3 │ city   │ 1    │ GET    │ city      │ 404    │ 84.59975           │ http://cache-proxy/cache?key=2:0803:8f685b...1194b41ee4e7            │
│   │   ↳    │ 2    │ GET    │ get-story │ 200    │ 10.557209          │ http://cache-proxy/cache?key=2:0803:f37fba...5c2a2f5464a4            │
│   │   ↳    │ 3    │ GET    │ get-time  │ 200    │ 3.3147089999999997 │ http://cache-proxy/cache?key=2:0803:2df5ce...2a36c2cb4f5a            │
│   │   ↳    │ 4    │ PUT    │ city      │ 201    │ 2.0747079999999998 │ http://cache-proxy/cache?key=2:0803:8f685b...1194b41ee4e7&ttl=259200 │
│ 4 │ city   │ 1    │ GET    │ city      │ 404    │ 70.877292          │ http://cache-proxy/cache?key=2:0803:8f685b...1194b41ee4e7            │
│   │   ↳    │ 2    │ GET    │ get-story │ 404    │ 11.359874999999999 │ http://cache-proxy/cache?key=2:0803:f37fba...5c2a2f5464a4            │
│   │   ↳    │ 3    │ GET    │           │ 200    │ 2092.765585        │ https://jsonplaceholder.typicode.com/posts/1                         │
│   │   ↳    │ 4    │ PUT    │ get-story │ 201    │ 2.677375           │ http://cache-proxy/cache?key=2:0803:f37fba...c2a2f5464a4&ttl=259200  │
│   │   ↳    │ 5    │ GET    │ get-time  │ 404    │ 5.2153339999999995 │ http://cache-proxy/cache?key=2:0803:2df5ce...a36c2cb4f5a             │
│   │   ↳    │ 6    │ GET    │           │ 200    │ 2544.083917        │ https://timeapi.io/api/Time/current/zone?timeZone=America/New_York   │
│   │   ↳    │ 7    │ PUT    │ get-time  │ 201    │ 3.656084           │ http://cache-proxy/cache?key=2:0803:2df5c...c2a36c2cb4f5a&ttl=259200 │
│   │   ↳    │ 8    │ PUT    │ city      │ 201    │ 2.610375           │ http://cache-proxy/cache?key=2:0803:8f685...51194b41ee4e7&ttl=259200 │
│ 5 │ city   │ 1    │ GET    │ city      │ 404    │ 77.35270899999999  │ http://cache-proxy/cache?key=2:0803:8f685...51194b41ee4e7            │
│   │   ↳    │ 2    │ GET    │ get-story │ 200    │ 10.524             │ http://cache-proxy/cache?key=2:0803:f37fb...f5c2a2f5464a4            │
│   │   ↳    │ 3    │ GET    │ get-time  │ 200    │ 3.4125419999999997 │ http://cache-proxy/cache?key=2:0803:2df5c...c2a36c2cb4f5a            │
│   │   ↳    │ 4    │ PUT    │ city      │ 201    │ 2.03225            │ http://cache-proxy/cache?key=2:0803:8f685...51194b41ee4e7&ttl=259200 │
└───┴────────┴──────┴────────┴───────────┴────────┴────────────────────┴──────────────────────────────────────────────────────────────────────┘

This table columns can be customized with modifying the jq command which is easy to read. You can also ask an AI tool to modify the command with providing a sample object from calls.json

While you’re testing and developing your content sources, calls.json file may get long and the output of this command may get really long. You can always flush the call.json with removing it rm .fusion/dist/calls.json occasionally.

Using Request Tracing through Forwarded Logs

Request tracing is off by default in non-production environments. In order to enable it, you will need to set FUSION_ENABLE_TRACING=true in your environment file.

In non-production environments, the request tracing logs will be available in the forwarded logs for engine (not compiler or resolver). We only generate the calls.json file locally, so you will need to parse your logs in order to get the same data in a deployed environment. If you do not have log forwarding enabled, you can request it through a support ticket or only do your request tracing locally. The data for each content source call is split across three types of log statements: one or more start objects, one or more end objects, and one total object. If you see a start but not an end, that means the lambda was terminated, usually due to the content source time limit being hit. If the time limit wasn’t hit, the total will contain duration for the entire content source call.

The start object will have the following:

Attribute	Description
`name`	This will always be `arc.fusion.content.tracing.start`.
`source`	The content source filename.
`query`	The JSON stringified query the content source was called with.
`rootId`	UUID indicating which request called the content source. All the content sources called within the same server-side render will have the same `rootId`. Each content source on the client side will have a different `rootId`. In the future, the `rootId` will be used to tie content sources and CDN requests together.
`traceId`	UUID indicating which content source made the HTTP call. All HTTP calls made within the same content source call will have the same `traceId`.
`url`	The URL used for the HTTP call.
`method`	The method used for the HTTP call.
`value`	This will always be `1`.
`cache`	The name provided to a CachedCall Function. Only available on calls to cache proxy.

The end object will have the following:

Attribute	Description
`name`	This will always be `arc.fusion.content.tracing.end`.
`source`	The content source filename.
`query`	The JSON stringified query the content source was called with.
`rootId`	UUID indicating which request called the content source. All the content sources called within the same server-side render will have the same `rootId`. Each content source on the client side will have a different `rootId`. In the future, the `rootId` will be used to tie content sources and CDN requests together.
`traceId`	UUID indicating which content source made the HTTP call. All HTTP calls made within the same content source call will have the same `traceId`.
`URL`	The URL used for the HTTP call.
`method`	The method used for the HTTP call.
`status`	The status code for the HTTP call’s response.
`value`	The amount of time the call took (in milliseconds).
`x-amz-request-id`	The value of the `x-amz-request-id` header in the HTTP call’s response. Only included when the response has this header set.
`cache`	The name provided to a CachedCall Function. Only available on calls to cache proxy.

The total object will have the following:

Attribute	Description
`name`	This will always be `arc.fusion.content.tracing.total`.
`source`	The content source filename.
`query`	The JSON stringified query the content source was called with.
`rootId`	UUID indicating which request called the content source. All the content sources called within the same server-side render will have the same `rootId`. Each content source on the client side will have a different `rootId`. In the future, the `rootId` will be used to tie content sources and CDN requests together.
`traceId`	UUID indicating which content source made the HTTP call. All HTTP calls made within the same content source call will have the same `traceId`.
`value`	The amount of time the content source took (in milliseconds).