Syncing Content Between Stages

This is not a beginner tutorial. Incorrect use of the following features could cause irreparable damage to your project, resulting in permanent data loss.

What is NOT a good use-case for Stage Syncing

Stages are not meant to be used as a replacement for content versioning (a feature that is in active development on our roadmap). The ideal use case for syncing content is when new content has been added to the Master stage and developers want to test it against the modified schema of a child stage. In rare cases, when content is changed in a child stage, content syncing can also work in reverse, syncing those changes back to Master.

Understanding the parts

In terms of a traditional database, your content is a collection of entries, or nodes, in a master/backing database. Your stages are replications of that content. When you want to sync stages, the moving parts of your project are:

  • The individual content entries and any model changes to the schema fields.
  • Images, which are handled via our CDN and therefore need special treatment to keep IDs consistent.
  • Relationships between the individual nodes, which are themselves nodes.

The Plan of Action

We'll handle our sync in a three-step process. Each step is composed of two parts. First we will query our source endpoint for content entries. Then we will post the results to our destination endpoint using an upsert mutation, which gives us the benefit of being able to update existing records or insert new ones.

One of the primary reasons we work via our own mutation API instead of using the import API for everything is that it gives us the opportunity to edit our data "in transition", or "in-flight", between our two endpoints. If we are writing content that needs a sensible default (as with a newly modified "required" field), or anything else we might come across, this approach lets us handle those cases as they come up.

We will use the import API for syncing our assets because there's an inherent round-trip involved. The physical assets are stored in a "per-stage" bucket at our CDN. When a stage is deleted, the bucket is destroyed and the physical assets backing the images are gone. To sync across stages, we first need to check if an asset exists in our destination stage. If the asset exists, we can simply update the content. If it does not, we need to upload the asset to the correct destination bucket at the CDN, return the data, associate the new CDN asset ID with the existing asset ID and then import that into our destination stage.
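
To make that round-trip concrete, here's a rough sketch of the decision flow in code. The helper names (findAssetInDest, uploadToDestBucket, importAssetRecord) are hypothetical placeholders for utilities we'll build later, not part of any API:

// Hypothetical sketch of the asset round-trip described above.
// None of these helpers exist yet; they stand in for utilities
// built later in this tutorial.
const syncAsset = async (sourceAsset) => {
  // 1. Does an asset with this ID already exist in the destination stage?
  const existing = await findAssetInDest(sourceAsset.id);

  if (existing) {
    // 2a. Asset exists: only the content fields need updating.
    return importAssetRecord({ ...sourceAsset, handle: existing.handle });
  }

  // 2b. Asset is missing: upload the physical file to the destination
  //     bucket at the CDN, then import the record with the new handle
  //     so the new CDN asset stays associated with the existing asset ID.
  const uploaded = await uploadToDestBucket(sourceAsset.url);
  return importAssetRecord({ ...sourceAsset, handle: uploaded.handle });
};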

Getting Started

Most likely you are here because of your own existing project, and not just looking for needless pain points to handle. If you are that sort of person, though, you can follow along with this tutorial by creating a new project from a template, being sure to choose one of the trial plans that allows for multiple stages.

Creating Tokens

We are going to need to create some import, export and permanent auth tokens as we build out our utilities. You can create your tokens from the settings panel. I've created tokens for both import and export from our various stages to support content manipulation in either direction; you can decide if this is necessary for your configuration. See the video below for guidance.

Creating a Clone of our Stage

We also need to create a clone of our master stage to sync to and from. You can achieve this by clicking on the stage icon below your project icon in the sidebar. See the video below for guidance.

Fetching our Data

We now need to add all of these stages, tokens and endpoints to our environment variables. Fortunately, we can query this data directly from GraphCMS with little effort.

First, navigate to the built-in API Explorer. The reason for using the built-in API Explorer is that we will be taking advantage of the Management API, which requires the Management API Token, and that is not something we currently make easily accessible (though it's buried in your browser's local state).

Next, we'll use the following query:

{
  viewer {
    # You need to use your own Project ID here.
    project(id: "13f2202a1e9040409da278b2f99fbbb1") {
      stages {
        name
        endpoint
        assetConfig {
          apiKey
        }
        permanentAuthTokens {
          name
          token
        }
        systemTokens {
          name
          token
        }
      }
    }
  }
}

Your result will be similar to the following:

Note: real tokens do not exist in the truncated format eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9... shown below; they have been shortened for readability. Also note that the tokens shown here have been revoked, and you should never store your tokens where others might be able to see them, including public or private repositories.

{
  "data": {
    "viewer": {
      "project": {
        "stages": [
          {
            "name": "master",            "endpoint": "https://api-euwest.graphcms.com/v1/ck1rvohw10be401df2et44uhy/master",
            "assetConfig": {
              "apiKey": "AHo0DbFvQ7OVXZEtn12H2z"
            },
            "permanentAuthTokens": [
              {
                "name": "Stage Sync PAT",
                "token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9..."
              }
            ],
            "systemTokens": [
              {
                "name": "Import System Token",
                "token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9..."
              },
              {
                "name": "Export System Token",
                "token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9..."
              }
            ]
          },
          {
            "name": "delta",            "endpoint": "https://api-euwest.graphcms.com/v1/ck1rvohw10be401df2et44uhy/delta",
            "assetConfig": {
              "apiKey": "AXd3cyl2KQi2aJVwb9IgQz"
            },
            "permanentAuthTokens": [
              {
                "name": "Delta PAT",
                "token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9..."
              }
            ],
            "systemTokens": [
              {
                "name": "Delta Import Token",
                "token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9..."
              },
              {
                "name": "Delta Export Token",
                "token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9..."
              }
            ]
          }
        ]
      }
    }
  }
}

Mapping the Response to our .env File

Once we've fetched our data, we'll store the results in environment variables so that we can use them securely throughout our import/export scripts. Using the JSON response object above, we'd map the following values to a .env file.

# Project API ID
GCMS_PROJECT_ID=13f2202a1e9040409da278b2f99fbbb1

# Source API keys

GCMS_SOURCE_STAGE_NAME=delta

GCMS_PAT_SOURCE=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9... # Delta PAT

GCMS_SYSTEM_SOURCE_EXPORT=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9... # Delta Export Token
GCMS_SYSTEM_SOURCE_IMPORT=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9... # Delta Import Token

GCMS_FILESTACK_SOURCE=AXd3cyl2KQi2aJVwb9IgQz # apiKey from delta

# Destination API Keys

GCMS_DEST_STAGE_NAME=master

GCMS_PAT_DEST=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9... # Stage Sync PAT

GCMS_SYSTEM_DEST_IMPORT=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9... # Import System Token
GCMS_SYSTEM_DEST_EXPORT=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9... # Export System Token

GCMS_FILESTACK_DEST=AHo0DbFvQ7OVXZEtn12H2z # apiKey from master
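
Since every script depends on these variables being present, a small sanity check at startup can save debugging time later. This is an optional convenience, not part of the repository:

// Optional: fail fast if any expected variable is missing from .env
require('dotenv').config();

const required = [
  'GCMS_PROJECT_ID',
  'GCMS_PAT_SOURCE',
  'GCMS_PAT_DEST',
  'GCMS_SYSTEM_SOURCE_EXPORT',
  'GCMS_SYSTEM_DEST_IMPORT',
];

const missing = required.filter((key) => !process.env[key]);
if (missing.length) {
  throw new Error(`Missing environment variables: ${missing.join(', ')}`);
}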

Changing Content + Gotchas

As mentioned above, syncing stages should not be seen as a hack for content versioning; that feature is in development and will be landing soon. However, sometimes you update content in the master stage and want to see how it behaves in a modified schema, or you have situations where you needed to transform (split, reduce, change type, etc.) data in the clone and now need to sync that back to the master. In either case, it's important to be aware of the gotchas.

Gotchas

Required Fields Gotcha

When preparing to sync content in either direction, we need to be aware of the gotchas. It's currently not possible to sync newly created fields with required constraints, and that makes sense: if a new field is added to an existing set of nodes, and that field requires content but doesn't have any, it will cause an error. If you need to add a required field to the schema, first add the field as non-required, sync the schema, update the content, and then change the field back to required.
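
While the field is temporarily non-required, the optional transform argument to buildBatch (shown later) can fill in a sensible default in flight, so the field can be switched back to required once the sync is done. A minimal sketch, assuming the new field is called twitterHandle:

// A minimal sketch: ensure a newly added field always has a value while
// the content is "in flight" between stages. The field name `twitterHandle`
// and the default value are assumptions for illustration.
const withDefaults = (entry) => ({
  ...entry,
  twitterHandle: entry.twitterHandle || 'unknown',
});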

Syncing Assets Gotcha

Each stage has its own asset handling API. Because we work with FileStack as our image CDN, each stage has its own API key for its own image bucket. The images have their own unique handles/IDs in FileStack, and then have an additional associated ID that is persisted in GraphCMS.

Setting up the Files

Because we will be reading from and writing to various endpoints, we are going to create a fetch utility library that will help us throughout our scripting. The file will look something like this:

import axios from 'axios';
import dotenv from 'dotenv';

dotenv.config();
// Be sure to use your own project ID here.
const PROJECT_ID = '13f2202a1e9040409da278b2f99fbbb1';

const API = (stage = 'master') =>
  `https://api-euwest.graphcms.com/v1/${PROJECT_ID}/${stage}`;

// Add your own stage names here.
const SOURCE_STAGE_API = API('delta');
const DEST_STAGE_API = API('master');

const axiosCreate = (url, key) =>
  axios.create({
    baseURL: url,
    headers: {
      Authorization: `Bearer ${process.env[key]}`,
      'Content-Type': 'application/json',
    },
  });

const sourceAxios = axiosCreate(SOURCE_STAGE_API, 'GCMS_PAT_SOURCE');
const destAxios = axiosCreate(DEST_STAGE_API, 'GCMS_PAT_DEST');

const destAxiosFileStack = axiosCreate(
  `https://www.filestackapi.com/api`,
  'GCMS_FILESTACK_DEST'
);

const sourceAxiosFileStack = axiosCreate(
  `https://www.filestackapi.com/api`,
  'GCMS_FILESTACK_SOURCE'
);

const destAxiosImport = axiosCreate(
  DEST_STAGE_API + '/import',
  'GCMS_SYSTEM_DEST_IMPORT'
);

const destAxiosExport = axiosCreate(
  DEST_STAGE_API + '/export',
  'GCMS_SYSTEM_DEST_EXPORT'
);

const sourceAxiosImport = axiosCreate(
  SOURCE_STAGE_API + '/import',
  'GCMS_SYSTEM_SOURCE_IMPORT'
);

const sourceAxiosExport = axiosCreate(
  SOURCE_STAGE_API + '/export',
  'GCMS_SYSTEM_SOURCE_EXPORT'
);

export {
  sourceAxios,
  destAxios,
  destAxiosFileStack,
  sourceAxiosFileStack,
  destAxiosImport,
  destAxiosExport,
  sourceAxiosImport,
  sourceAxiosExport,
};

Here we've created a handful of constructors that let us work with our endpoints and consume our various authorization tokens through the dotenv library. The rough pattern is a fetch utility created with axios that is passed both the endpoint URL and the token for the relevant service.
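
As a quick usage sketch, any of these instances can POST a GraphQL document as a plain JSON payload. The import path and query below are assumptions for illustration:

import { sourceAxios } from './fetchUtils'; // path assumed for illustration

// Post a GraphQL query to the source stage and unwrap the response.
const listHotels = async () => {
  const { data } = await sourceAxios.post('', {
    query: '{ hotels(first: 5) { id name } }',
  });
  return data.data.hotels;
};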

Writing our Queries

For each of our content models we need read queries and upsert mutations. The demo project includes three models: Destination, Hotel and Review.

There are a number of ways to get a listing of the fields for each of these models when writing our queries. We could build a programmatic method that introspects our models and generates the queries for us. However, assuming we know which fields need to be updated in a model, hand-crafted queries let us save bits over the wire and transform the content on the fly, so that's what we'll do.

For each of our models we will write a query and a mutation.

Query Sample

query ReadHotels($first: Int, $skip: Int) {
  pages: hotels(first: $first, skip: $skip, orderBy: createdAt_DESC) {
    id
    updatedAt
    name
  }
}

Mutation Sample

mutation UpsertHotel($id: ID!, $status: Status, $name: String!) {
  upsertHotel(
    where: { id: $id }
    create: { status: $status, name: $name }
    update: { status: $status, name: $name }
  ) {
    id
  }
}
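
Conceptually, syncing a page of entries is just feeding what we read from the source into the mutation variables at the destination. Here's a minimal hand-rolled sketch of that loop; the helper libraries introduced below take care of pagination, batching and error handling for us, and the import paths here are assumptions for illustration:

// A hand-rolled sketch of the read -> upsert loop for one page of Hotels.
// `readHotelsQuery` and `upsertHotelMutation` are the GraphQL documents
// shown above, loaded as strings by the build setup.
import { sourceAxios, destAxios } from './fetchUtils';
import readHotelsQuery from './queries/readHotelsQuery.graphql';
import upsertHotelMutation from './queries/upsertHotelMutation.graphql';

const syncHotelPage = async (first = 10, skip = 0) => {
  // Read a page of hotels from the source stage.
  const { data } = await sourceAxios.post('', {
    query: readHotelsQuery,
    variables: { first, skip },
  });

  // Upsert each entry into the destination stage. In practice the read
  // query would also select any fields the mutation writes (such as `status`).
  for (const hotel of data.data.pages) {
    await destAxios.post('', {
      query: upsertHotelMutation,
      variables: { id: hotel.id, name: hotel.name, status: hotel.status },
    });
  }
};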

Synchronize the Stage Schemas

For our example, I added an additional field to the Hotel model definition in the Delta schema. I've added a field for a Twitter handle, as well as some corresponding data. Now we need to sync that back up to Master.

Additionally, I've added a new asset to the Delta stage which will need to be added back to Master as well.

When we return to where we cloned our stage, we'll now see that there's a notification of the schema differences between our two stages.

Pending schema changes

Press the "sync" button below the stage displaying the number of differences. You will be asked to confirm the sync and then the schema's will be reconciled.

This must happen **BEFORE** syncing any content related to the differing schemas, otherwise the import scripts will error.

Let us Turn to Code

The majority of the code base is composed of helper utilities that handle features such as pagination, batch processing, fetching with auth tokens, compiling to ES5 and error handling. You can have a thorough look through the code repository on GitHub, and you're encouraged to do so and to modify the code for your individual needs.
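
To give a feel for what a helper like buildPagination does, here's a simplified sketch of the idea. It is an illustration only, not the repository's actual implementation; the import path is assumed:

import { sourceAxios } from './fetchUtils'; // path assumed for illustration

// Page through the source stage with first/skip and hand each page
// to a write function (e.g. the writer returned by buildBatch).
const buildPaginationSketch = (readQuery, writePage, pageSize = 50) => async () => {
  let skip = 0;
  while (true) {
    const { data } = await sourceAxios.post('', {
      query: readQuery,
      variables: { first: pageSize, skip },
    });
    const page = data.data.pages; // the `pages` alias from our read queries
    if (!page.length) break; // nothing left to sync
    await writePage(page);
    skip += pageSize;
  }
};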

The file we want to look at in particular is the primary orchestration layer of our batching processes. The general structure of our file is:

  1. Import helpers
  2. Import our queries
  3. Construct our processing functions as a product of querying a batch of entries, looping through them and writing them into the destination stage.
  4. Run our processing functions serially (to guard against rate limiting) and catch/report errors.

import { buildPagination } from './utils/buildPagination';
import { buildAssetBatch } from './utils/buildAssetBatch';
import { buildBatch } from './utils/buildBatch';
import { buildIO } from './utils/buildIO';
import { reportErrors } from './utils/errors';

// Hotels
import readHotelsQuery from './queries/readHotelsQuery.graphql';
import upsertHotelMutation from './queries/upsertHotelMutation.graphql';

// Destination
import readDestinationsQuery from './queries/readDestinationsQuery.graphql';
import upsertDestinationMutation from './queries/upsertDestinationMutation.graphql';

// Reviews
import readReviewsQuery from './queries/readReviewsQuery.graphql';
import upsertReviewMutation from './queries/upsertReviewMutation.graphql';

// Assets
import readAssetQuery from './queries/readAsset.graphql';
import readDestAssetQuery from './queries/readDestAsset.graphql';
import upsertAssetQuery from './queries/upsertAsset.graphql';

// Relationships
import { importBatch } from './utils/importBatch';

// Destinations
const writeDestination = buildBatch(upsertDestinationMutation);
const processDestinations = buildPagination(
  readDestinationsQuery,
  writeDestination
);

// Hotels

/*
  Optionally pass a transformation function
  that accepts the expected payload shape and
  returns the desired shape.
*/
const writeHotel = buildBatch(upsertHotelMutation /* transform function */);
const processHotels = buildPagination(readHotelsQuery, writeHotel);

// Reviews
const writeReview = buildBatch(upsertReviewMutation);
const processReviews = buildPagination(readReviewsQuery, writeReview);

// Assets
const writeAssets = buildAssetBatch(readDestAssetQuery, upsertAssetQuery);
const processAssets = buildPagination(readAssetQuery, writeAssets);

// Relations
const importRelations = importBatch({ valueType: 'relations' });
const processRelations = buildIO('relations', importRelations);

const run = async () => {
  try {
    // Sync Assets
    console.log('Processing Assets...');
    await processAssets();
    reportErrors();

    // Sync Hotels
    console.log('Processing Hotels...');
    await processHotels();
    reportErrors();

    // Sync Destinations
    console.log('Processing Destinations...');
    await processDestinations();
    reportErrors();

    // Sync Reviews
    console.log('Processing Reviews...');
    await processReviews();
    reportErrors();

    // Sync Relations
    console.log('Processing Relations...');
    await processRelations();
    reportErrors();
  } catch (error) {
    console.log(error);
  }
};

try {
  console.log('Get Ready!');
  run();
} catch (e) {
  console.log('Server Error!', e);
}

To use this example, you need to add the .env file from above into a clone of the Github repo.

In summary, syncing content between stages is little more than querying the data from one stage, transforming it if needed and writing it into a different stage. The tricky part is syncing the assets, which first need to be duplicated in an external CDN before the new CDN ID can be matched with the existing asset ID in GraphCMS.

Again, this is an advanced topic. If you have any questions please reach out to us in our community Slack and we'll be happy to help!