How to transform data to migrate between Arc XP organizations or environments

To transform and prepare data for migration between Arc XP organizations or environments, you can run the arc2arc_primer Python scripts locally. The scripts show the minimum required fields that need to be changed in an object’s ANS schema in order to be transferred through an API from one Arc XP organization or environment to another.

The scripts modify individual objects one at a time in your Arc XP organizations. The resulting transformations are either:

  • A dry-run, where no new object is created but you can view the new object’s ANS.
  • A live process that creates a new object in the target organization.

Both options report information to the terminal at the completion of the script, showing the changes that were made and items that may need further transformation.

The scripts are written as a functional story: reading the code answers the question “What do I need to modify or be aware of when trying to move an object from one Arc XP organization to another?”

The scripts are called a primer because the stories within the code are specific lessons that walk you through the transformation process.

Visit the arc2arc_primer scripts repository:

arc2arc_primer


Prerequisites

  • You must belong to two different Arc XP organizations (see What is the difference between an Arc XP organization, display name, and ID?) or just one organization if you want to move data from the Production to the Sandbox environment.

  • The organization from where content originates must have Migration Center active and enabled. As an alternative, you must modify the scripts to replace the Migration Center requests with requests to the correct Arc XP API for the type of ANS you want to move.

  • You need two Arc XP bearer tokens (see Using Developer Access Tokens to access the Arc XP APIs), one for each organization’s Production environment, or one each for the Production and Sandbox environments of a single organization.
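As a rough illustration of why two tokens are needed, the sketch below pairs each side of a transfer with its own API host and bearer token. The host pattern (`api.<org>.arcpublishing.com`, with a `sandbox.` prefix for Sandbox) and the helper names `arc_api_base` and `arc_auth_headers` are assumptions made for this example. They are not taken from the primer, so check them against your organization's settings.

```python
def arc_api_base(org: str, sandbox: bool = False) -> str:
    """Build an API host for an organization (assumed URL pattern)."""
    env = "sandbox." if sandbox else ""
    return f"https://api.{env}{org}.arcpublishing.com"


def arc_auth_headers(token: str) -> dict:
    """Return request headers carrying one bearer token."""
    return {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}


# One token per side of the transfer; the token values are placeholders.
source = {
    "base": arc_api_base("devtraining"),
    "headers": arc_auth_headers("SOURCE_TOKEN"),
}
target = {
    "base": arc_api_base("devtraining", sandbox=True),
    "headers": arc_auth_headers("TARGET_TOKEN"),
}
print(source["base"], "->", target["base"])
```

Keeping the host and headers together per side makes it harder to send a request to one environment with the other environment's token.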

The Python scripts in the primer each target a specific kind of Arc XP object. Depending on which object and script you want to run, you also need to have specific organization data ready to pass into script arguments, including:

  • A website id from the target organization.

  • A section id from the website.

  • The arc id of an object in the source organization.

You must create a virtual environment, installing the libraries in the provided requirements.txt file. You can run each script from the terminal command line or from within your Python IDE that you set up with access to the virtual environment.

The scripts rely on three kinds of libraries: built-in Python modules, external libraries installed through the requirements.txt file, and custom helper libraries that are included in the primer.

  • argparse - to set up command line arguments so each script can accept user information and send it to the script to use

  • dataclasses - provides the dataclass decorator, used to create classes that only collect contextual information and expose it neatly, to ingest Migration Center data, and to hold some metrics reported after the data load is complete

  • jmespath - search utility to search through JSON data and build subsets of the data, used for the transformation of the ANS, and for assembling some metrics for reporting after the data load is complete

  • requests - used to contact Arc XP and collect ANS or send ANS for the extract and load processes

  • arc_endpoints - custom helper library containing a collection of Arc XP API endpoints; used for aspects of the extract and load processes

  • arc_id - custom helper library that generates an arc id value; used for aspects of the transformation processes

  • dist_ref_id - custom helper library that locates an Arc XP distributor or geographic restriction in the target organization that matches the one in the source object’s data, or attempts to generate a new matching distributor or geographic restriction in the target organization if none is found

Finally, fork the primer scripts when you identify areas that need to be modified to work with your organization’s data. The primer scripts are written for the most generic implementation of what is possible in Arc XP. If parts of a script don’t work for your situation as presented, analyze the causes and change your forked version as needed.

Primer scripts

The primer scripts do the work of a very simple Extract, Transform, Load (ETL) pipeline. As such, they focus only on what is necessary to locate data, change data, and load data. At the completion of a run, each script displays a report in the terminal showing some aspects of the data changed in the transformation, along with objects within the data that you might subsequently need to locate and transfer to the target organization.

Each script has a main class named Arc2ArcObject or Arc2SandboxObject, where Object stands for the type of object the script handles. The class has multiple methods where the ANS transformations are written. Each method operates either on a group of ANS properties whose transformations are similar or on a single ANS field whose transformation is more complex. The class also has methods to extract source data from the origin organization, validate the transformed ANS, and load the transformed data to the target organization, as well as a method that runs the other methods in sequence and returns display information to the terminal when the process is complete.
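The class layout described above can be sketched roughly as follows. This is a hypothetical stand-in, not code from the primer: the class name `Arc2ArcSketch`, every method name, and the stubbed extract, validate, and load steps are assumptions chosen only to show the shape (one method per group of properties, plus one method that runs them in order).

```python
from dataclasses import dataclass, field


@dataclass
class Arc2ArcSketch:
    """Hypothetical stand-in for a primer class; all names are illustrative."""
    from_org: str
    to_org: str
    arc_id: str
    dry_run: bool = True
    ans: dict = field(default_factory=dict)
    report: dict = field(default_factory=dict)

    def fetch_source_ans(self):
        # A real script would request the ANS from the source organization here.
        self.ans = {"_id": self.arc_id, "type": "story", "owner": {"id": self.from_org}}
        return self

    def transform_owner(self):
        # One method per group of related ANS properties.
        self.ans["owner"]["id"] = self.to_org
        self.report["owner"] = {self.from_org: self.to_org}
        return self

    def validate(self):
        # Placeholder check; a real script would validate against the ANS schema.
        self.report["valid"] = "_id" in self.ans and "type" in self.ans
        return self

    def load(self):
        # A real script would send the ANS to the target organization here.
        # With dry_run set, nothing is written.
        self.report["loaded"] = False if self.dry_run else bool(self.report.get("valid"))
        return self

    def run(self):
        # Apply the steps in a fixed order and return what the terminal would show.
        self.fetch_source_ans().transform_owner().validate().load()
        return {"report": self.report, "ans": self.ans}


result = Arc2ArcSketch("devtraining", "cetest", "EXAMPLEARCID").run()
print(result)
```

Returning `self` from each step keeps the sequencing method short and makes the order of transformations easy to read and rearrange in a fork.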

Each script accepts command line arguments and flags, which are passed into the class object and determine aspects of the transformation process. One of these flags is --dry-run, which, when True, lets the script apply the transformations and build the return metrics without saving any changes to the target organization. Some scripts that operate on more than one object at a time also have a --test-run flag, which accepts a number n and halts the process after n iterations. To test the behavior of these multiple-object scripts, pass in both --dry-run 1 (True) and --test-run 5 (n=5 for five iterations).
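A minimal sketch of how such flags might be wired up with argparse is shown below. The parser, the `process` helper, and the convention that a --test-run value of 0 means "no limit" are assumptions for illustration. The primer's own argument handling may differ, so read each script's comments for the authoritative list.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare a subset of primer-style arguments (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of primer-style arguments")
    parser.add_argument("--from-org", required=True)
    parser.add_argument("--to-org", required=True)
    parser.add_argument("--dry-run", type=int, choices=[0, 1], default=0)
    parser.add_argument("--test-run", type=int, default=0)  # assumed: 0 means no limit
    return parser


def process(items, dry_run: bool, test_run: int = 0):
    """Handle items one by one, stopping after test_run iterations if set."""
    handled = []
    for count, item in enumerate(items, start=1):
        # In a dry run the transformation still happens; only the write is skipped.
        handled.append((item, "skipped write" if dry_run else "written"))
        if test_run and count >= test_run:
            break
    return handled


# Passing a list to parse_args stands in for typing the flags on the command line.
args = build_parser().parse_args(
    ["--from-org", "devtraining", "--to-org", "cetest", "--dry-run", "1", "--test-run", "5"]
)
results = process(range(20), bool(args.dry_run), args.test_run)
print(len(results))
```

Combining the two flags this way limits a trial run to a handful of objects while guaranteeing that none of them is written to the target organization.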

You can read about the command line arguments each script uses in the script code and comments. Each script includes an example of how you can call it from the command line, for example:

python arc2arc/09_transform_author.py --from-org devtraining --to-org cetest --author-id KilgoreTrout --from-token token --to-token token --dry-run 1

A successful run of a primer script creates one new object in the target organization, except for the few scripts that operate on multiple objects at once. The original object remains in the source organization and is not deleted. In most cases, you can run an object through the script multiple times, and subsequent runs cause the same object in the target organization to be updated. One notable exception is the video transform script, which does not allow you to update a video if it already exists in the target organization.

Video transfer script error

A story in the source organization (left) and the same story in the target organization after being run through the Arc2Arc Primer script. At completion, the story exists in the target organization, but objects referenced in the new story still need to be ingested.

Script transformations and post-transform information

Most scripts employ a series of similar steps to transform the ANS so that you can ingest the ANS into a new environment.

  • Updating the ANS version value to 0.10.9, the most recent version value, with the exception of Video ANS

  • Replacing ANS values that reference the source organization ID with the target organization ID

  • Removing ANS keys that will fail validation (for example, some Photo Center IPTC-related fields)

  • Updating Photo Center ANS IDs (for image, gallery, and lightbox objects) when the target organization is different from the source organization

  • Updating Distributor IDs and Geographic Restriction IDs in all cases

  • Updating Website and Section IDs when the target organization is different from the source organization and the script arguments are provided

  • Rewriting object references to ANS reference syntax when necessary
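Several of the steps listed above can be sketched as a single function over an ANS dictionary. This is a simplified illustration under assumptions, not the primer's code: the field names used here (`version`, `owner.id`, `canonical_website`, `websites`, and the section reference shape) and the `drop_keys` parameter are chosen for the example, and `hypothetical_bad_key` stands in for any key that would fail validation.

```python
import copy

TARGET_ANS_VERSION = "0.10.9"


def transform_ans(ans: dict, from_org: str, to_org: str,
                  website: str = "", section: str = "",
                  drop_keys: tuple = ()) -> dict:
    """Return a transformed copy of an ANS dict; the input is left untouched."""
    out = copy.deepcopy(ans)

    # Update the ANS version, except for video ANS.
    if out.get("type") != "video":
        out["version"] = TARGET_ANS_VERSION

    # Replace the source organization ID with the target organization ID.
    if out.get("owner", {}).get("id") == from_org:
        out["owner"]["id"] = to_org

    # Remove keys that are expected to fail validation.
    for key in drop_keys:
        out.pop(key, None)

    # Update website and section only when moving between organizations
    # and only when the arguments were provided.
    if from_org != to_org and website:
        out["canonical_website"] = website
        if section:
            out["websites"] = {website: {"website_section": {
                "type": "reference",
                "referent": {"id": section, "type": "section", "website": website},
            }}}
    return out


source_ans = {
    "type": "story",
    "version": "0.10.7",
    "owner": {"id": "devtraining"},
    "canonical_website": "old-site",
    "hypothetical_bad_key": 1,
}
result = transform_ans(source_ans, "devtraining", "cetest",
                       website="new-site", section="/news",
                       drop_keys=("hypothetical_bad_key",))
print(result)
```

Working on a deep copy keeps the source ANS available for comparison, which is useful when building a report of what changed.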

In addition, most scripts display information in the terminal at completion. The first item displayed is a references object, which shows relevant document references in the target and source objects. Relevant document references are those that need to be ingested separately to support the complete rendering of the parent object, or those that may have changed between the source object and the target object. The references displayed in the terminal, if produced at all, differ in content from script to script. For example, the scripts that transform from a Production environment to a Sandbox environment don’t show website and section information in references because the Production-to-Sandbox scripts assume no changes there.

Script results

Because some properties of the references object list items whose IDs change between the source and target organizations, these properties carry an additional value that pairs the two IDs: the key is the ID from the source organization, and the value is the ID in the target organization.

In the previous image, the references.galleries property shows a gallery in the devtraining organization (VIWRUMHQTRHQNMN5SKLXXATILY) and the ID it will have when it is moved to the cetest organization (LDDAPQ3OAXXSTZ5RZOYGM3Z4D4). The cetest gallery was not ingested. However, within the cetest story, the reference to the gallery was rewritten to use this new Arc ID. After the gallery is ingested separately, using a script from the primer, its ID is the same as the one shown in the references object. The primer script for ingesting galleries to new organizations uses the same code to set the gallery’s new ID on cetest that the story ingestion script used to set the gallery’s new ID in the story’s reference.
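To make the source-to-target pairing concrete, the following sketch applies such a mapping to gallery references inside a story. The exact shape of the references object and of the story ANS is assumed for illustration. The function `rewrite_gallery_refs` is hypothetical, and `EXAMPLEVIDEOID` is a placeholder.

```python
def rewrite_gallery_refs(node, id_map: dict):
    """Walk an ANS-like structure in place and swap gallery IDs found in id_map."""
    if isinstance(node, dict):
        referent = node.get("referent")
        if (node.get("type") == "reference"
                and isinstance(referent, dict)
                and referent.get("type") == "gallery"
                and referent.get("id") in id_map):
            referent["id"] = id_map[referent["id"]]
        for value in node.values():
            rewrite_gallery_refs(value, id_map)
    elif isinstance(node, list):
        for item in node:
            rewrite_gallery_refs(item, id_map)
    return node


# Assumed shape: source gallery ID as the key, target gallery ID as the value.
references = {"galleries": {"VIWRUMHQTRHQNMN5SKLXXATILY": "LDDAPQ3OAXXSTZ5RZOYGM3Z4D4"}}

story = {"content_elements": [
    {"type": "reference",
     "referent": {"type": "gallery", "id": "VIWRUMHQTRHQNMN5SKLXXATILY"}},
    {"type": "reference",
     "referent": {"type": "video", "id": "EXAMPLEVIDEOID"}},
]}

rewrite_gallery_refs(story, references["galleries"])
print(story["content_elements"][0]["referent"]["id"])
```

Only the gallery reference is touched here. The video reference keeps its original ID.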

The references.videos property shows two video IDs, which are or will be the same on both the source and target organizations, after the video is ingested separately. The script from the primer for ingesting videos to new organizations does not rewrite the video’s ID, just as the script for ingesting a story did not rewrite the video ID in the story’s reference.
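For the story script and the gallery script to agree on a gallery's new ID without consulting each other, the new ID has to be derived deterministically from values both scripts already have. The sketch below shows one way that could work, hashing the source ID together with the target organization name. It is an assumption for illustration and not necessarily how the primer's arc_id helper computes its values, so it is not expected to reproduce the IDs shown in this article.

```python
import base64
import hashlib


def derive_arc_id(source_id: str, target_org: str) -> str:
    """Derive a repeatable 26-character ID from a source ID and target org name."""
    digest = hashlib.blake2b(digest_size=16)
    digest.update(source_id.encode("utf-8"))
    digest.update(target_org.encode("utf-8"))
    # 16 bytes encode to 26 base32 characters plus padding, which is stripped.
    return base64.b32encode(digest.digest()).decode("ascii").rstrip("=")


# Two independent calls, as if made from the story script and the gallery script.
story_side = derive_arc_id("VIWRUMHQTRHQNMN5SKLXXATILY", "cetest")
gallery_side = derive_arc_id("VIWRUMHQTRHQNMN5SKLXXATILY", "cetest")
print(story_side == gallery_side)
```

Because the result depends only on the inputs, the reference written into the story stays valid whenever the gallery is ingested later. Video IDs need no such step because they are carried over unchanged.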

The converted ANS of the target object appears in the terminal as well, after the references. When a script in the primer operates on multiple items, rather than one single conversion, there is neither a references object nor converted ANS for display in the terminal.

Primer script inventory

This section contains a list of primer scripts along with a description of what that script does.

  • 01_transform_story.py - Transform one story ANS using its Arc ID, from one Arc XP organization to a second Arc XP organization, in the Production environment.

  • 02_transform_story_to_sandbox.py - Transform one story ANS using its Arc ID, from an Arc XP organization’s Production environment to its Sandbox environment.

  • 03_transform_video.py - Transform one video ANS using its Arc ID, from one Arc XP organization to a second Arc XP organization, in the Production environment.

  • 04_transform_video_to_sandbox.py - Transform one video ANS using its Arc ID, from an Arc XP organization’s Production environment to its Sandbox environment.

  • 05_transform_gallery.py - Transform one gallery ANS using its Arc ID, from one Arc XP organization to a second Arc XP organization, in the Production environment. During this process, the gallery Arc ID changes to a new value. It is not possible to have Photo Center Arc IDs that are the same between different organizations. This restriction applies only to Photo Center objects.

  • 06_transform_gallery_to_sandbox.py - Transform one gallery ANS using its Arc ID, from an Arc XP organization’s Production environment to its Sandbox environment.

  • 07_transform_image.py - Transform one image ANS using its Arc ID, from one Arc XP organization to a second Arc XP organization, in the Production environment. During this process, the image Arc ID changes to a new value. It is not possible to have Photo Center Arc IDs that are the same between different organizations. This restriction applies only to Photo Center objects.

  • 08_transform_image_to_sandbox.py - Transform one image ANS using its Arc ID, from an Arc XP organization’s Production environment to its Sandbox environment.

  • 09_transform_author.py - Transform one author object using its author ID, from one Arc XP organization to a second Arc XP organization, in the Production environment.

  • 10_transform_authors_all.py - Transform all author objects from one Arc XP organization to a second Arc XP organization, in the Production environment.

  • 11_transform_redirects_all.py - Transform all document redirects from one Arc XP organization to a second Arc XP organization, in the Production environment. This script works well for story document redirects, but is potentially problematic for video or gallery document redirects.

  • 12_transform_lightbox.py - Transform one lightbox using its lightbox ID from one Arc XP organization’s Production environment to a target organization’s Production environment.

  • 13_transform_collection.py - Transform one collection using its ANS ID from one Arc XP organization’s Production environment to a target organization’s Production environment.

  • arc_endpoints.py - Access methods that extend the Arc XP APIs so they can be more easily used from within the primer scripts.

  • arcid.py - Create a new Arc ID, used from within the transformation scripts.

  • dist_ref_id.py - Create new distributors and geographic restrictions, used from within the transformation scripts.