Phrase (Memsource) integration

Phrase Integration Overview

We send various types of entities to Phrase, including:

  • Page
  • Block
  • Menu
  • All Menus
  • FAQ
  • Page Tags
  • Story Lists
  • Stories
  • Resource Filters
  • Media Types
  • Others

Each entity type may require specific identifiers, such as:

  • block_type_key (for page blocks) – Used to retrieve and match the correct block in the CMS for translation.
  • ID (for menus, menu items, and various other entities).

Creating a Project

To create a Phrase project, we send a request to:

POST /api2/v2/projects/applyTemplate/{template}
{
    "name": "Project Name",
    "sourceLang": "en",
    "targetLangs": ["fr", "de", "es"],
    "note": "Project description or notes"
}
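
As a minimal sketch of how this request could be assembled, the Python snippet below builds the POST request without sending it. The base URL is a placeholder, the function name is hypothetical, and authentication is deliberately left out because it is not described here; only the path and body fields come from the example above.

```python
import json
import urllib.request

# Assumption: placeholder host; replace with the base URL of your Phrase instance.
BASE_URL = "https://tms.example.invalid"


def build_create_project_request(template, name, source_lang, target_langs, note=""):
    """Build (but do not send) the project-creation request shown above."""
    body = {
        "name": name,
        "sourceLang": source_lang,
        "targetLangs": list(target_langs),
        "note": note,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/api2/v2/projects/applyTemplate/{template}",
        data=json.dumps(body).encode("utf-8"),
        # Authentication headers are omitted in this sketch.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_project_request(
    "TEMPLATE_ID", "Project Name", "en", ["fr", "de", "es"],
    "Project description or notes",
)
print(request.get_method(), request.full_url)
```

Building the request separately from sending it keeps the payload easy to inspect and test before any call reaches Phrase.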

Storing Project Data

Once the project is created, its UID is stored in the database. For certain cases (e.g., pages), we also store the current state of the page with this project.

Uploading Translations to Phrase

After creating a project, we generate a JSON file containing the texts in the base language and send it to Phrase.

POST /api2/v1/projects/{project_uid}/jobs

This request uploads the translation jobs to the project.
Once completed, we store the project jobs, workflow step, and status.

For most entities (all except pages), the JSON has the following structure:

{
  "id": ENTITY_ID,
  "translations": {
    "title": TEXT,
    "description": MARKDOWN
  }
}
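
The structure above could be produced along these lines. This is only an illustrative Python sketch: the function name and the sample values are hypothetical, and it does not apply the Markdown placeholder substitution described later in this document.

```python
import json


def build_entity_payload(entity_id, title, description_markdown):
    """Return the source-file structure used for non-page entities."""
    return {
        "id": entity_id,
        "translations": {
            "title": title,
            "description": description_markdown,
        },
    }


# Hypothetical sample entity.
payload = build_entity_payload(42, "Shipping FAQ", "Orders ship within **two** days.")

# Serialize to the JSON text that would be uploaded as the job's source file.
source_file = json.dumps(payload, ensure_ascii=False, indent=2)
print(source_file)
```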

Parameters Sent to Phrase for Page Entity

When the page entity is sent to Phrase for translation, the following JSON structure is used. This includes various attributes and child entities such as blocks and translations.

Main Structure for Page Entity

The main structure of the request includes the page ID, screenshot, page URL, blocks, and translations:

{
  "id": "PAGE_ID",
  "screenshot": {
    "screenshotUri": "SCREENSHOT_URL"
  },
  "page_url": {
    "screenshotUri": "PAGE_URL"
  },
  "blocks": [
    {
      "id": "BLOCK_ID",
      "block_config_key": {
        "BLOCK_CONFIG_KEY": ""
      },
      "values": [
        {
          "BLOCK_TYPE_FIELD_KEY": "BLOCK_TYPE_FIELD_VALUE"
        }
      ],
      "block_extras": [{
        "id": "EXTRA_BLOCK_ID",
        "values": [
          {
            "EXTRA_BLOCK_TYPE_FIELD_KEY": "EXTRA_BLOCK_TYPE_FIELD_VALUE"
          }
        ]
      }],
      "children": [
        {
          "id": "CHILD_BLOCK_ID",
          "block_config_key":  {
            "CHILD_BLOCK_CONFIG_KEY": ""
          },
          "values": [
            {
              "BLOCK_TYPE_FIELD_KEY": "BLOCK_TYPE_FIELD_VALUE"
            }
          ],
          "block_extras": {
            "child_extra_key": "child_extra_value"
          }
        }
      ]
    }
  ],
  "translations": {
    "title": {
      "text": "Translated Title",
      "screenshotUri": "No strict character limit, but try to keep length the same as English",
      "maxlen": null
    },
    "subtitle": {
      "text": "Translated Subtitle",
      "screenshotUri": "Max length = 140 characters (Strict limit!) Careful, subtitles can be divided into several segments and spaces between segments also count as characters.",
      "maxlen": 140
    },
    "content": {
      "text": "Translated Content",
      "screenshotUri": null,
      "maxlen": null
    },
    "meta_title": {
      "text": "Translated Meta Title",
      "screenshotUri": "Ideal length is 50-60 characters (not a strict limit)",
      "maxlen": null
    },
    "meta_description": {
      "text": "Translated Meta Description",
      "screenshotUri": "Ideal length is 155-160 characters (not a strict limit). Careful, they can be divided into several segments and spaces between segments count as a character",
      "maxlen": null
    },
    "location": {
      "text": "Translated Location",
      "screenshotUri": null,
      "maxlen": null
    }
  }
}

Breakdown of Parameters

The JSON structure contains several key elements representing the page entity, including metadata, translation data, and blocks with their respective values and extras. Below is a breakdown of each key component:

1. id (string)

  • Description: Unique identifier for the page.

2. screenshot (object)

  • Description: Contains context information related to the screenshot for the page.
    • screenshotUri (string): The URL or path to the screenshot associated with the page.

3. page_url (object)

  • Description: Contains the URL of the page.
    • screenshotUri (string): The public URL for the page.

4. blocks (array)

  • Description: A list of page blocks with additional fields and child blocks. Each block contains values and extra fields.
  • block_config_key (object): Contains the configuration key(s) for the block.
  • values (array): Contains the field values for the block.
  • block_extras (array): Extra information related to the block.
  • children (array): List of child blocks, which can be nested inside the parent block.

5. translations (object)

  • Description: Contains the translations for various fields related to the page. Each translation includes the translated text, a context note, and the maximum allowed length for the field.
    • text (string): The translated text content for the field.
    • screenshotUri (string): A context note or description for the field, such as character limits.
    • maxlen (integer or null): The maximum character length allowed for the field. If no limit is defined, this field is null.
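
To illustrate how maxlen might be used, here is a small Python sketch that reports fields whose text exceeds the limit. The function name is hypothetical, and it assumes the limit is counted in characters as returned by Python's len(); Phrase or the CMS may count differently.

```python
def find_overlong_fields(translations):
    """Return {field: (actual_length, maxlen)} for fields that exceed maxlen."""
    overlong = {}
    for field, entry in translations.items():
        maxlen = entry.get("maxlen")
        text = entry.get("text") or ""
        # A null (None) maxlen means no limit is defined for the field.
        if maxlen is not None and len(text) > maxlen:
            overlong[field] = (len(text), maxlen)
    return overlong


# Hypothetical sample: the subtitle is 150 characters against a limit of 140.
sample_translations = {
    "title": {"text": "A title", "screenshotUri": None, "maxlen": None},
    "subtitle": {"text": "x" * 150, "screenshotUri": None, "maxlen": 140},
}
print(find_overlong_fields(sample_translations))
```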

Notes:

  • The translations section contains text for various fields like title, subtitle, content, and meta_description. Each field may also include a description (screenshotUri) explaining the expected length or any special notes for translation.
  • The blocks field contains structured data for each block in the page, including child blocks, configuration keys, field values, and extra information. Each block can have its own set of values and additional details.
  • The maxlen attribute in the translations section provides information on character length restrictions for fields like subtitle and meta_description. If no length limit is specified, the field will be null.
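
Because blocks can nest through children, code that reads the structure usually has to walk it recursively. The Python sketch below shows one way to do that; the function name and the sample field keys ("heading", "body") are hypothetical and only the id, values, and children keys come from the structure above.

```python
def iter_blocks(blocks, parent_id=None):
    """Yield (block_id, parent_id, values) for every block, depth-first."""
    for block in blocks:
        yield block["id"], parent_id, block.get("values", [])
        # Recurse into child blocks, remembering which block owns them.
        yield from iter_blocks(block.get("children", []), parent_id=block["id"])


# Hypothetical page with one block and one child block.
page = {
    "id": "PAGE_ID",
    "blocks": [
        {
            "id": "BLOCK_ID",
            "values": [{"heading": "Welcome"}],
            "children": [
                {"id": "CHILD_BLOCK_ID", "values": [{"body": "Hello"}]},
            ],
        }
    ],
}
flat_blocks = list(iter_blocks(page["blocks"]))
for block_id, parent_id, values in flat_blocks:
    print(block_id, parent_id, values)
```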

Markdown Formatting for Phrase Translation

When sending Markdown content to Phrase, certain elements need to be replaced to prevent translation issues and maintain Markdown integrity after translation. The replacements applied by the PHP methods are listed below.

Replaced Markdown Elements

| Original Markdown | Replacement for Phrase | Purpose |
|---|---|---|
| Two spaces before a newline ("  \n") | <newlinespaced> | Prevents Phrase from altering intentional line breaks. |
| === at the start of a line | <h1lined> | Ensures H1 underline syntax remains intact. |
| --- at the start of a line | <h2lined> | Ensures H2 underline syntax remains intact. |
| Non-breaking space (\xc2\xa0) | " " (regular space) | Avoids encoding issues. |
| Line break (\n) | <newline> | Ensures line breaks are preserved. |
| ##### (H5 header) | <h5> | Prevents Markdown from being altered in translation. |
| #### (H4 header) | <h4> | Same as above. |
| ### (H3 header) | <h3> | Same as above. |
| ## (H2 header) | <h2> | Same as above. |
| # (H1 header) | <h1> | Same as above. |
| - (Unordered list) | <ul> | Prevents Phrase from breaking list structure. |
| **** (Bold-Italic emphasis) | "" (removed) | Prevents errors in interpretation. |
| ** (Bold) | <b> | Avoids translation breaking the bold syntax. |
| _ (Italic) | <it> | Prevents single _ from being misinterpreted. |
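
The replacements listed above could be sketched in Python as follows. This is not the PHP implementation, only an illustration under stated assumptions: line-anchored patterns (===, ---, and the list marker) are applied first while line boundaries still exist, the setext underlines are assumed to be exactly three characters followed by a newline, and the list marker is assumed to be "- " with a trailing space. Links and images are not handled here.

```python
import re

# Assumption: line-anchored patterns run before newlines are replaced.
LINE_PATTERNS = [
    (re.compile(r"^===\n", re.MULTILINE), "<h1lined>"),
    (re.compile(r"^---\n", re.MULTILINE), "<h2lined>"),
    (re.compile(r"^- ", re.MULTILINE), "<ul>"),
]

# Literal replacements; longer tokens come before their shorter prefixes.
LITERAL_REPLACEMENTS = [
    ("  \n", "<newlinespaced>"),
    ("\u00a0", " "),  # non-breaking space (bytes \xc2\xa0 in UTF-8)
    ("\n", "<newline>"),
    ("#####", "<h5>"),
    ("####", "<h4>"),
    ("###", "<h3>"),
    ("##", "<h2>"),
    ("#", "<h1>"),
    ("****", ""),
    ("**", "<b>"),
    ("_", "<it>"),
]


def markdown_to_placeholders(markdown):
    """Replace Markdown syntax with placeholder tags before sending to Phrase."""
    for pattern, placeholder in LINE_PATTERNS:
        markdown = pattern.sub(placeholder, markdown)
    for token, placeholder in LITERAL_REPLACEMENTS:
        markdown = markdown.replace(token, placeholder)
    return markdown


sample = "## Title\n\nSome **bold** and _italic_ text.  \n- first\n- second"
print(markdown_to_placeholders(sample))
```

The ordering matters: replacing "##" before "#####" would split an H5 marker into several shorter tags, which is why the longest tokens are listed first.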

Handling Links and Images

Links and images require special handling to avoid breaking their syntax in Markdown.

| Original Markdown | Replacement for Phrase | Purpose |
|---|---|---|
| [Title](https://example.com) | [https://example.com]<urlquote>Title<urlquoteclose> | Prevents Markdown syntax from being altered. |
| ![Alt Text](https://example.com/image.png) | <img>[https://example.com/image.png]<urlquote>Alt Text<urlquoteclose> | Ensures image syntax remains intact after translation. |

These replacements ensure that the translated text remains valid Markdown and doesn't break formatting when re-integrated into the CMS.
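
The link and image rewriting could look roughly like the Python sketch below. The regular expression and function name are assumptions made for illustration; the sketch handles only simple inline links without titles or nested brackets, and the actual PHP code may match links differently.

```python
import re

# Optional "!" (image), then [text], then (url) without spaces or ")".
LINK_OR_IMAGE = re.compile(r"(!?)\[([^\]]*)\]\(([^)\s]+)\)")


def protect_links(markdown):
    """Rewrite links and images into the placeholder form sent to Phrase."""
    def _replace(match):
        bang, text, url = match.groups()
        prefix = "<img>" if bang else ""
        # The URL moves into the brackets; the translatable text is wrapped in tags.
        return f"{prefix}[{url}]<urlquote>{text}<urlquoteclose>"

    return LINK_OR_IMAGE.sub(_replace, markdown)


print(protect_links("[Title](https://example.com)"))
print(protect_links("![Alt Text](https://example.com/image.png)"))
```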

Replacing Formatted Elements from Phrase Back to Markdown

When translated text is returned from Phrase, it contains special placeholder tags used to maintain Markdown formatting during translation. This process replaces those placeholders back to valid Markdown syntax before storing the content in the CMS.


Replacement Logic

Convert Special Tags Back to Markdown

The function restores the original Markdown syntax using a predefined set of replacements:

| Placeholder | Replaced With |
|---|---|
| <h5> | ##### (H5 heading) |
| <h4> | #### (H4 heading) |
| <h3> | ### (H3 heading) |
| <h2> | ## (H2 heading) |
| <h1> | # (H1 heading) |
| <ul> | - (Unordered list) |
| <b> | ** (Bold text) |
| <it> | _ (Italic text) |
| <newline> | \n (New line) |
| <newlinespaced> | "  \n" (Double space + new line for Markdown line breaks) |
| <urlquote> | ( (Start of a URL in a link) |
| <urlquoteclose> | ) (End of a URL in a link) |
| <img> | ! (Image syntax in Markdown) |
| <h1lined> | ===\n (Alternative H1 heading) |
| <h2lined> | ---\n (Alternative H2 heading) |
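
A minimal Python sketch of this reverse mapping is shown below. It is an illustration, not the PHP function itself; the names are hypothetical, and it assumes the list marker is restored as "- " with a trailing space. Note that after this step a protected link reads [url](text), which is why links need the separate correction described next.

```python
# Mapping taken from the table above; "- " with a trailing space is an assumption.
PLACEHOLDER_TO_MARKDOWN = {
    "<h5>": "#####",
    "<h4>": "####",
    "<h3>": "###",
    "<h2>": "##",
    "<h1>": "#",
    "<ul>": "- ",
    "<b>": "**",
    "<it>": "_",
    "<newline>": "\n",
    "<newlinespaced>": "  \n",
    "<urlquote>": "(",
    "<urlquoteclose>": ")",
    "<img>": "!",
    "<h1lined>": "===\n",
    "<h2lined>": "---\n",
}


def restore_markdown(text):
    """Replace placeholder tags in translated text with Markdown syntax."""
    for placeholder, markdown in PLACEHOLDER_TO_MARKDOWN.items():
        text = text.replace(placeholder, markdown)
    return text


translated = "<h2> Titre<newline><newline>Du <b>gras<b> et <it>italique<it>.<newlinespaced><ul>un<newline><ul>deux"
print(restore_markdown(translated))
```

No placeholder is a substring of another (for example, "<h1>" does not occur inside "<h1lined>"), so the order of these literal replacements does not affect the result.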

Handling Links

Phrase might alter the format of Markdown links. The function corrects links to restore their original structure:

  • It finds link placeholders formatted as:
[translated_text](url)
  • Then, it swaps the URL and text if necessary to maintain correct Markdown formatting.
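
The swap could be sketched in Python as follows. The heuristic for deciding when a swap is necessary is an assumption: here the bracketed part is treated as a URL when it starts with http://, https://, or /, and the pair is swapped only if the parenthesized part does not also look like a URL. The function and pattern names are hypothetical.

```python
import re

BRACKET_PAREN = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def _looks_like_url(value):
    # Assumption: a simple prefix check is enough to recognize URLs.
    return value.startswith(("http://", "https://", "/"))


def fix_links(markdown):
    """Swap [url](text) back to [text](url) where the parts are reversed."""
    def _swap(match):
        first, second = match.group(1), match.group(2)
        if _looks_like_url(first) and not _looks_like_url(second):
            return f"[{second}]({first})"
        return match.group(0)  # already in [text](url) form

    return BRACKET_PAREN.sub(_swap, markdown)


print(fix_links("[https://example.com](Titre)"))
print(fix_links("![https://example.com/image.png](Texte alt)"))
```

Link text that itself contains a closing parenthesis would not be matched correctly by this pattern, so a production version would need a stricter parser.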

Conclusion

In summary, the restore function:

  • Replaces placeholders with valid Markdown symbols.
  • Uses a regex to fix links, ensuring the [Title](URL) format is preserved.
  • Returns the corrected Markdown content, ready to be stored in the CMS.

Handling Webhooks from Phrase

Our CMS subscribes to Phrase webhooks to receive updates when a project job status changes.

When a webhook is received, we:

  1. Find the corresponding project in our database.
  2. Check the workflow step, which can be either Translation or Revision.
  3. If the job status is COMPLETED_BY_LINGUIST and the workflow step is Revision, we process the webhook and request the translated job file from Phrase:
GET /api2/v1/projects/{project_uid}/jobs/{job_uid}/targetFile?format=ORIGINAL
  4. The translation is then stored for the respective entity.
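
The decision logic in the steps above can be sketched in Python as shown below. The function names and arguments are hypothetical: the sketch takes already-extracted values rather than parsing a webhook payload, since the payload format is not described here, and it returns the request path instead of performing the download.

```python
def target_file_path(project_uid, job_uid):
    """Path of the request that downloads the translated job file."""
    return f"/api2/v1/projects/{project_uid}/jobs/{job_uid}/targetFile?format=ORIGINAL"


def handle_job_status_change(known_project_uids, project_uid, job_uid, status, workflow_step):
    """Return the path to fetch the translated file from, or None to ignore the event."""
    # Step 1: the project must exist in our database.
    if project_uid not in known_project_uids:
        return None
    # Step 2: only the Translation and Revision workflow steps are expected.
    if workflow_step not in ("Translation", "Revision"):
        return None
    # Step 3: fetch the file only when Revision has been completed.
    if status == "COMPLETED_BY_LINGUIST" and workflow_step == "Revision":
        return target_file_path(project_uid, job_uid)
    return None


known = {"PROJECT_UID"}
print(handle_job_status_change(known, "PROJECT_UID", "JOB_UID", "COMPLETED_BY_LINGUIST", "Revision"))
```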

Storing Translations

For most entities, we locate the entity using its ID from the translation file and store the translation for the corresponding locale.

Pages require additional processing because they contain multiple blocks. Each block is identified by a block ID and a block type key, allowing us to correctly map translations to the CMS structure.