How to Request Tracing in Content Sources with PageBuilder Engine
Request Tracing allows developers to be able to see all the HTTP requests their content sources are making. For simple content sources, this isnβt hard to do this manually. However, with complex contents sources, or ones using external packages, it can be difficult or impossible to manually trace. Additionally, Engine provides caching for content sources, but there was no way for a developer to be able to check and see if the cache was used. Request tracing provides a solution for both of these problems enabling developers to monitor and understand caching and network behavior while their pages render or content sources execute individually.
Previously, this was only available locally using a Custom Debugging Image. It is now integrated into Engine and can be used in all environments. There are a few differences when using it locally:
-
You no longer need to have a different
FUSION_RELEASE
version set in your.env
file. -
Content sources will not be slower, but the order is no longer guaranteed for non-global content source calls. The global content source (when provided) will still be the first one.
-
Calls will no longer show up twice due to having 2 passes for the server-side render.
-
The structure of the data in the JSON file is different. Weβll talk more about this in the
Using Request Tracing Locally
section. -
Headers are added to each HTTP call made within a content source. Weβll talk more about this in the
Using Request Tracing Without Logs
section.
Request tracing uses the experimental async_hooks
functionality in node. Turning it on does have a performance penalty so we are making it opt-in while we evaluate the impact.
Engine encodes invalid header values (like non-English characters) using the encodeURI
function.
Using request tracing without logs
Request tracing is off by default in production environments. In order to enable it, you will need to set FUSION_ENABLE_TRACING=true
in your environment file.
In production environments, no logs will be generated for request tracing. Instead, we will be relying on headers that are injected into each HTTP request. These headers are also injected locally and on non-production environments, but those environments have logs which will contain the same information. The following headers are injected:
Header | Description |
---|---|
x-amzn-trace-id | A string in the format Root=${rootId};Self=${traceId} . rootId is a UUID indicating which request called the content source. Currently, it is generated at Engine, but in the future we will be using it to tie CDN requests to content sources. traceId is a UUID indicating which content source made the HTTP call. The same traceId will be set for all HTTP calls made within the same content source call. |
x-fusion-source | The content source filename. |
x-fusion-query | The JSON stringified query the content source was called with. |
x-fusion-deployment | The deployment id. |
x-arc-org | The id of the environment that made the request. |
x-arc-site | The id of the site that made the request. |
x-fusion-cache | The name provided to a CachedCall Function. Only available on calls to cache proxy. |
The rootId
is also included other content related logs, like content size, content latency, cache backoff, etc. You can use a rootId
to tie together data in your forwarded logs and the endpoints called by your content sources.
Using request tracing locally
Request tracing is an opt in feature. In order to enable it, you will need to set FUSION_ENABLE_TRACING=true
in your .env
file.
If you havenβt set DEBUG
variable in your .env
file before, you need to adjust this to make these event logs appear in your command line output. You can either set DEBUG=arc.fusion.content.tracing.*
to only see the request tracing logs or set DEBUG=arc.fusion.content.*,arc.fusion.render.error
to see both the request tracing logs and the server-side rendering error logs.
When enabled, youβll start seeing these objects printed in your command line. They describe different event and calls to PageBuilder cache service or your network calls (wether Arc APIs or external APIs):
Collect calls in calls.json
file
In order to generate the calls.json file locally, youβll need to add FUSION_GENERATE_TRACING_JSON=true
in your .env
file.
To generate data, go to any page/template (either in Editor or published). Once it is loaded, you can go to http://localhost/pf/dist/calls.json
also works if you were following the custom debugging image guide) to see the call data. If you have a custom JSON parser, you can also directly get the file by going to .fusion/dist/calls.json
. A few things to keep in mind:
-
The data is always appended to the file. The file can become quite large if you are using a lot of content sources.
-
The file is deleted each time PageBuilder Engine restarts. If you want to clear it sooner, youβll need to delete the file manually.
-
If you want to clear your content cache, you will need to restart the content-cache docker container or restart PageBuilder Engine. This will not clear the calls.json file.
The data in calls.json will be an array of call objects. Each call object consists of the following:
Attribute | Description |
---|---|
source | The content source filename. |
query | The query the content source was called with. |
rootId | UUID indicating which request called the content source. All the content sources called within the same server-side render will have the same rootId . Each content source on the client side will have a different rootId . In the future, the rootId will be used to tie content sources and CDN requests together. |
duration | The amount of time the call took (in milliseconds). |
calls | An array of objects showing the HTTP calls made within a content source. This includes the calls made to the cache proxy. Each one will have the url , method , cache (the name value a CachedCall Function was called with), statusCode , and duration (in milliseconds). The cache value is only available during a call to cache proxy. If it matches the source name, it is usually the parent content sourceβs cache. Otherwise it is the cache for a HTTP call made within the fetch content source. If there is no statusCode or duration , that means the request was terminated. This is usually due to a content source taking more than 5 seconds. |
Parsing calls.json
to a nice table view in command line
Engine generates calls.json in json format to allow customer developers to programmatically process, build their own tooling to monitor, investigate the content source network activity. But in a lot of cases, you just want to view this activity in a simple easy to read format, like a list or table view to quickly take a glance of what happened.
You can use jq
and tty-table
command line tools with the sample command below to print the calls from each content source execution in a nice table format in your command line.
Make sure you install jq
, and tty-table
from your package managers (i.e: brew install jq
and npm install -g tty-table
). Then, after making some page renders or content source executions, run the following command in your project root:
(echo "#,Source,Call,Method,Cache,Status,Duration,URL" && jq -r ' to_entries[] as $top | $top.value.calls[] as $call | [ if (( ($top.value.calls | index($call)) + 1 ) > 1) then "" else ($top.key + 1) end, if (( ($top.value.calls | index($call)) + 1 ) > 1) then "β β β³" else $top.value.source end, (( ($top.value.calls | index($call)) + 1 ) | tostring), $call.method, $call.cache, $call.statusCode, $call.duration, $call.url ] | @csv' .fusion/dist/calls.json) | tty-table --delimiter ',' --options-align=left --options-compact=true
Here is a sample output this command will print out.
βββββ¬βββββββββ¬βββββββ¬βββββββββ¬ββββββββββββ¬βββββββββ¬βββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ # β Source β Call β Method β Cache β Status β Duration β URL ββ 1 β city β 1 β GET β city β 200 β 91.716459 β http://cache-proxy/cache?key=2:0803:8f685b...1194b41ee4e7 ββ 2 β city β 1 β GET β city β 404 β 71.46229199999999 β http://cache-proxy/cache?key=2:0803:8f685b...1194b41ee4e7 ββ β β³ β 2 β GET β get-story β 200 β 10.363208 β http://cache-proxy/cache?key=2:0803:f37fba...5c2a2f5464a4 ββ β β³ β 3 β GET β get-time β 200 β 3.2725 β http://cache-proxy/cache?key=2:0803:2df5ce...2a36c2cb4f5a ββ β β³ β 4 β PUT β city β 201 β 1.930458 β http://cache-proxy/cache?key=2:0803:8f685b...1194b41ee4e7&ttl=259200 ββ 3 β city β 1 β GET β city β 404 β 84.59975 β http://cache-proxy/cache?key=2:0803:8f685b...1194b41ee4e7 ββ β β³ β 2 β GET β get-story β 200 β 10.557209 β http://cache-proxy/cache?key=2:0803:f37fba...5c2a2f5464a4 ββ β β³ β 3 β GET β get-time β 200 β 3.3147089999999997 β http://cache-proxy/cache?key=2:0803:2df5ce...2a36c2cb4f5a ββ β β³ β 4 β PUT β city β 201 β 2.0747079999999998 β http://cache-proxy/cache?key=2:0803:8f685b...1194b41ee4e7&ttl=259200 ββ 4 β city β 1 β GET β city β 404 β 70.877292 β http://cache-proxy/cache?key=2:0803:8f685b...1194b41ee4e7 ββ β β³ β 2 β GET β get-story β 404 β 11.359874999999999 β http://cache-proxy/cache?key=2:0803:f37fba...5c2a2f5464a4 ββ β β³ β 3 β GET β β 200 β 2092.765585 β https://jsonplaceholder.typicode.com/posts/1 ββ β β³ β 4 β PUT β get-story β 201 β 2.677375 β http://cache-proxy/cache?key=2:0803:f37fba...c2a2f5464a4&ttl=259200 ββ β β³ β 5 β GET β get-time β 404 β 5.2153339999999995 β http://cache-proxy/cache?key=2:0803:2df5ce...a36c2cb4f5a ββ β β³ β 6 β GET β β 200 β 2544.083917 β https://timeapi.io/api/Time/current/zone?timeZone=America/New_York ββ β β³ β 7 β PUT β get-time β 201 β 3.656084 β http://cache-proxy/cache?key=2:0803:2df5c...c2a36c2cb4f5a&ttl=259200 ββ β β³ β 8 β PUT β city β 201 β 2.610375 β http://cache-proxy/cache?key=2:0803:8f685...51194b41ee4e7&ttl=259200 ββ 5 β city β 1 β GET β city β 404 β 77.35270899999999 β http://cache-proxy/cache?key=2:0803:8f685...51194b41ee4e7 ββ β β³ β 2 β GET β get-story β 200 β 10.524 β http://cache-proxy/cache?key=2:0803:f37fb...f5c2a2f5464a4 ββ β β³ β 3 β GET β get-time β 200 β 3.4125419999999997 β http://cache-proxy/cache?key=2:0803:2df5c...c2a36c2cb4f5a ββ β β³ β 4 β PUT β city β 201 β 2.03225 β http://cache-proxy/cache?key=2:0803:8f685...51194b41ee4e7&ttl=259200 ββββββ΄βββββββββ΄βββββββ΄βββββββββ΄ββββββββββββ΄βββββββββ΄βββββββββββββββββββββ΄βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
This table columns can be customized with modifying the jq
command which is easy to read. You can also ask an AI tool to modify the command with providing a sample object from calls.json
While youβre testing and developing your content sources, calls.json
file may get long and the output of this command may get really long. You can always flush the call.json with removing it rm .fusion/dist/calls.json
occasionally.
Using Request Tracing through Forwarded Logs
Request tracing is off by default in non-production environments. In order to enable it, you will need to set FUSION_ENABLE_TRACING=true
in your environment file.
In non-production environments, the request tracing logs will be available in the forwarded logs for engine (not compiler or resolver). We only generate the calls.json file locally, so you will need to parse your logs in order to get the same data in a deployed environment. If you do not have log forwarding enabled, you can request it through a support ticket or only do your request tracing locally. The data for each content source call is split across three types of log statements: one or more start objects, one or more end objects, and one total object. If you see a start but not an end, that means the lambda was terminated, usually due to the content source time limit being hit. If the time limit wasnβt hit, the total will contain duration for the entire content source call.
The start object will have the following:
Attribute | Description |
---|---|
name | This will always be arc.fusion.content.tracing.start . |
source | The content source filename. |
query | The JSON stringified query the content source was called with. |
rootId | UUID indicating which request called the content source. All the content sources called within the same server-side render will have the same rootId . Each content source on the client side will have a different rootId . In the future, the rootId will be used to tie content sources and CDN requests together. |
traceId | UUID indicating which content source made the HTTP call. All HTTP calls made within the same content source call will have the same traceId . |
url | The URL used for the HTTP call. |
method | The method used for the HTTP call. |
value | This will always be 1 . |
cache | The name provided to a CachedCall Function. Only available on calls to cache proxy. |
The end object will have the following:
Attribute | Description |
---|---|
name | This will always be arc.fusion.content.tracing.end . |
source | The content source filename. |
query | The JSON stringified query the content source was called with. |
rootId | UUID indicating which request called the content source. All the content sources called within the same server-side render will have the same rootId . Each content source on the client side will have a different rootId . In the future, the rootId will be used to tie content sources and CDN requests together. |
traceId | UUID indicating which content source made the HTTP call. All HTTP calls made within the same content source call will have the same traceId . |
URL | The URL used for the HTTP call. |
method | The method used for the HTTP call. |
status | The status code for the HTTP callβs response. |
value | The amount of time the call took (in milliseconds). |
x-amz-request-id | The value of the x-amz-request-id header in the HTTP callβs response. Only included when the response has this header set. |
cache | The name provided to a CachedCall Function. Only available on calls to cache proxy. |
The total object will have the following:
Attribute | Description |
---|---|
name | This will always be arc.fusion.content.tracing.total . |
source | The content source filename. |
query | The JSON stringified query the content source was called with. |
rootId | UUID indicating which request called the content source. All the content sources called within the same server-side render will have the same rootId . Each content source on the client side will have a different rootId . In the future, the rootId will be used to tie content sources and CDN requests together. |
traceId | UUID indicating which content source made the HTTP call. All HTTP calls made within the same content source call will have the same traceId . |
value | The amount of time the content source took (in milliseconds). |