Transforms

A building block is a specification — it defines a data model that data can conform to. Transforms complement that by defining a reusable conversion library: for data that conforms to this building block, here is how to convert it into another format, encoding, or building block representation. Clients and tools can discover these transforms from the building block register and use them as ready-made adapters without having to implement the conversion logic themselves.

Typical conversions include encoding translations (e.g. XML to JSON), schema or structural transformations, semantic uplift to RDF, and vocabulary or terminology mappings.

Transforms are declared in a transforms.yaml file in the building block directory. During postprocessing, example snippets that match a transform’s declared input media types are automatically run through it — this demonstrates the transform works and gives clients a concrete preview of the output. The transform library itself, however, is the primary artifact; the snippet outputs are illustrative.

transforms.yaml structure

transforms:
  - id: my-transform           # required; alphanumeric and dashes only
    description: What it does  # optional; Markdown accepted
    type: jq                   # required; see supported types below
    inputs:
      mediaTypes:
        - application/json     # media types this transform accepts
    outputs:
      mediaTypes:
        - application/json     # media types this transform produces
    code: |                    # inline code/script
      .foo = "bar"

The transform code can be declared inline with code, or referenced from a separate file with ref:

  - id: my-transform
    type: jq
    ref: transforms/my-script.jq

Input and output media types can be given as plain strings (application/json) or as objects when a file extension is needed for the output:

    outputs:
      mediaTypes:
        - mimeType: text/csv
          defaultExtension: csv

Common short-form aliases such as json, xml, or turtle are also accepted and will be normalized to their canonical MIME types.
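
The two accepted forms and the alias handling can be pictured with a short Python sketch. The alias table and the function name normalize_media_type are assumptions made for illustration; the postprocessor's actual alias list and internal representation may differ.

```python
# Hypothetical alias table: only the three aliases named above are listed.
ALIASES = {
    "json": "application/json",
    "xml": "application/xml",
    "turtle": "text/turtle",
}


def normalize_media_type(entry):
    """Return a dict with 'mimeType' and, when given, 'defaultExtension'."""
    # A plain string is shorthand for an object that only sets mimeType.
    if isinstance(entry, str):
        entry = {"mimeType": entry}
    mime = entry["mimeType"].strip()
    normalized = {"mimeType": ALIASES.get(mime.lower(), mime)}
    if "defaultExtension" in entry:
        normalized["defaultExtension"] = entry["defaultExtension"]
    return normalized


print(normalize_media_type("turtle"))
print(normalize_media_type({"mimeType": "text/csv", "defaultExtension": "csv"}))
```

Reducing every entry to the object form first means later steps only have to deal with one shape.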


Supported transform types

jq

Applies a jq expression to JSON input.

  • Default inputs: application/json
  • Default outputs: application/json

  - id: add-type
    type: jq
    code: |
      .type = "ex:MyFeature"

sparql-construct

Runs a SPARQL CONSTRUCT query on RDF input, producing an RDF graph.

  • Default inputs: application/ld+json, text/turtle
  • Default outputs: text/turtle

  - id: to-geosparql
    type: sparql-construct
    ref: transforms/to-geosparql.sparql

sparql-update

Runs a SPARQL UPDATE statement that modifies the RDF graph in place.

  • Default inputs: application/ld+json, text/turtle
  • Default outputs: same as input

  - id: remap-predicates
    type: sparql-update
    ref: transforms/remap.sparql

shacl-af-rule

Applies SHACL Advanced Features rules (SPARQL-based) to an RDF graph.

  • Default inputs: application/ld+json, text/turtle
  • Default outputs: text/turtle

  - id: infer-types
    type: shacl-af-rule
    ref: transforms/infer-types.shacl.ttl

xslt

Applies an XSLT stylesheet to XML input.

  • Default inputs: application/xml
  • Default outputs: application/xml

  - id: normalise-xml
    type: xslt
    ref: transforms/normalise.xslt

json-ld-frame

Applies a JSON-LD frame to JSON-LD or RDF input.

  • Default inputs: application/ld+json, text/turtle
  • Default outputs: application/ld+json

  - id: frame-feature
    type: json-ld-frame
    ref: transforms/frame.jsonld

semantic-uplift

Applies a semantic uplift mapping (as used by the OGC NA tools) to JSON input, producing RDF.

  • Default inputs: application/json
  • Default outputs: text/turtle

  - id: uplift
    type: semantic-uplift
    ref: transforms/uplift.yaml

python

Runs a Python code snippet. The snippet receives input_data (a string) and must assign its result to output_data.

  - id: uppercase-keys
    type: python
    inputs:
      mediaTypes: [ application/json ]
    outputs:
      mediaTypes: [ application/json ]
    code: |
      import json
      data = json.loads(input_data)
      output_data = json.dumps({k.upper(): v for k, v in data.items()}, indent=2)

With dependencies:

  - id: to-csv
    type: python
    inputs:
      mediaTypes: [ application/json ]
    outputs:
      mediaTypes:
        - mimeType: text/csv
          defaultExtension: csv
    metadata:
      dependencies:
        pip: pandas>=1.5
        python: ">=3.10"   # optional; the transform is skipped if the runtime does not match
    code: |
      import json, pandas as pd
      data = json.loads(input_data)
      output_data = pd.DataFrame(data if isinstance(data, list) else [data]).to_csv(index=False)

pip accepts any specifier that pip install understands, including GitHub URLs. If python is set to a PEP 440 version specifier, the transform is silently skipped when the runtime does not meet the requirement.

The snippet can be adapted into a standalone script by reading from stdin and printing to stdout — input_data is just a string variable, and output_data is whatever string you assign.
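
One way to do that adaptation, shown here for the uppercase-keys snippet above, is to wrap the snippet body in a function and connect it to a pair of text streams. The names run_snippet and main are illustrative and not part of the tooling.

```python
import io
import json
import sys


def run_snippet(input_data: str) -> str:
    # Body of the uppercase-keys snippet, unchanged apart from the return.
    data = json.loads(input_data)
    output_data = json.dumps({k.upper(): v for k, v in data.items()}, indent=2)
    return output_data


def main(stdin=sys.stdin, stdout=sys.stdout) -> None:
    """Read the whole input stream, run the snippet, write the result."""
    stdout.write(run_snippet(stdin.read()))


# Demonstration with in-memory streams. A real script would call main()
# with no arguments so that it reads stdin and writes stdout.
out = io.StringIO()
main(io.StringIO('{"name": "example"}'), out)
print(out.getvalue())
```

Keeping the snippet body in its own function leaves it identical to what is declared in transforms.yaml, so the same code can be tested outside the postprocessor.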


node

Runs a Node.js code snippet. The snippet receives inputData (a string) and must assign its result to outputData.

  - id: add-metadata
    type: node
    inputs:
      mediaTypes: [ application/json ]
    outputs:
      mediaTypes: [ application/json ]
    code: |
      const data = JSON.parse(inputData);
      data.generatedBy = 'my-transform';
      outputData = JSON.stringify(data, null, 2);

With dependencies:

  - id: to-csv
    type: node
    inputs:
      mediaTypes: [ application/json ]
    outputs:
      mediaTypes:
        - mimeType: text/csv
          defaultExtension: csv
    metadata:
      dependencies:
        npm: json2csv
        node: ">=18"   # optional; the transform is skipped if the runtime does not match
    code: |
      const { Parser } = require('json2csv');
      // inputData is always a string, so parse it before checking for an array
      const data = JSON.parse(inputData);
      const rows = Array.isArray(data) ? data : [data];
      outputData = new Parser().parse(rows);

npm accepts any package name or specifier that npm install understands. If node is set to a semver range, the transform is silently skipped when the runtime does not meet the requirement.


Unknown transform types

Declaring a transform with a type not listed above is valid — it will be included in the building block register for other tools or systems that support it, and skipped during postprocessing unless a matching transform plugin is declared.
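
The rule above amounts to a simple membership check. The sketch below is only an illustration of that rule: BUILT_IN_TYPES mirrors the types listed in this document, while plan_transform and its return values are hypothetical names, not the postprocessor's API.

```python
# Types documented in the "Supported transform types" section.
BUILT_IN_TYPES = {
    "jq", "sparql-construct", "sparql-update", "shacl-af-rule",
    "xslt", "json-ld-frame", "semantic-uplift", "python", "node",
}


def plan_transform(transform: dict, plugin_types: set) -> str:
    """Return 'run' or 'skip' for one transform during postprocessing."""
    transform_type = transform["type"]
    if transform_type in BUILT_IN_TYPES or transform_type in plugin_types:
        return "run"
    # Unknown types stay in the register; they are just not executed here.
    return "skip"


print(plan_transform({"id": "a", "type": "jq"}, set()))
print(plan_transform({"id": "b", "type": "jinja2"}, set()))
print(plan_transform({"id": "b", "type": "jinja2"}, {"jinja2"}))
```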


Transform plugins

You can add support for custom transform types by declaring transform plugins in a transform-plugins.yml file at the root of your building blocks repository:

plugins:
  - pip: git+https://github.com/example/my-bblocks-plugin.git
    modules:
      - my_bblocks_plugin

Each plugin entry installs one or more pip packages and scans the listed Python modules for transformer classes. A transformer class is recognised by duck typing — it needs:

  • transform_types: a non-empty list of type name strings
  • transform(metadata): a callable that accepts a metadata object and returns a string or bytes (or None to produce no output), or raises an exception on failure
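
As a rough illustration of the duck-typing check, the following sketch scans a module for classes that meet both requirements. The function find_transformer_classes is a hypothetical helper written for this example; the postprocessor's own discovery code may apply different or additional rules.

```python
import inspect
import types


def find_transformer_classes(module):
    """Return classes in `module` that look like transformers (duck typing)."""
    found = []
    for _, cls in inspect.getmembers(module, inspect.isclass):
        type_names = getattr(cls, "transform_types", None)
        has_types = (
            isinstance(type_names, list)
            and len(type_names) > 0
            and all(isinstance(name, str) for name in type_names)
        )
        if has_types and callable(getattr(cls, "transform", None)):
            found.append(cls)
    return found


# Demonstration with a throwaway in-memory module.
demo = types.ModuleType("demo_plugin")


class Good:
    transform_types = ["my-type"]

    def transform(self, metadata):
        return "ok"


class MissingTypes:
    def transform(self, metadata):
        return "ignored"


demo.Good = Good
demo.MissingTypes = MissingTypes
print([cls.__name__ for cls in find_transformer_classes(demo)])  # ['Good']
```

No base class or registration call is involved, which is why a plugin does not need to import anything from the postprocessor.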

Each plugin runs in its own isolated virtualenv (created automatically under the postprocessing sandbox), so dependency conflicts between plugins, or between a plugin and the postprocessor itself, are not a concern.

The pip field accepts any specifier that pip install understands, including version constraints, GitHub URLs, and local paths. It can be a single string, or a list when multiple packages are needed.

The postprocessor automatically derives a human-facing URL from the pip specifier (PyPI page for package names, repository URL for git+https:// references). You can override this with an explicit url field:

plugins:
  - pip: git+https://github.com/example/my-bblocks-plugin.git
    url: https://github.com/example/my-bblocks-plugin
    modules:
      - my_bblocks_plugin
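
The URL derivation described above might look roughly like the following sketch. The rules here (strip the git+ prefix and a trailing .git, or link a package name to its PyPI project page) are a guess at reasonable behaviour, and derive_plugin_url is a hypothetical name; the postprocessor's actual logic is not documented in this section.

```python
import re


def derive_plugin_url(pip_spec, explicit_url=None):
    """Guess a human-facing URL for a pip specifier (illustrative rules only)."""
    if explicit_url:
        return explicit_url  # an explicit url field always wins
    if pip_spec.startswith("git+https://"):
        url = pip_spec[len("git+"):].split("#", 1)[0]  # drop any #fragment
        head, sep, tail = url.rpartition("@")
        if sep and "/" not in tail:  # drop a trailing @ref such as @v1.0
            url = head
        return url[:-4] if url.endswith(".git") else url
    # Plain requirement: keep only the leading distribution name.
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", pip_spec.strip())
    if match:
        return f"https://pypi.org/project/{match.group(0)}/"
    return None


print(derive_plugin_url("git+https://github.com/example/my-bblocks-plugin.git"))
print(derive_plugin_url("pandas>=1.5"))
```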

Plugin metadata (types, class names, pip reference, and URL) is included in register.json under transformPlugins, allowing viewers and tooling to attribute each transform type to its plugin.

The metadata object

The metadata argument passed to transform() is a plain namespace with the following attributes:

| Attribute | Type | Description |
|---|---|---|
| type | str | The transform type identifier (e.g. jinja2) |
| transform_content | str | The code or script declared in transforms.yaml (code or ref) |
| input_data | str | The example snippet text |
| source_mime_type | str | MIME type of the input snippet |
| target_mime_type | str | MIME type of the declared output |
| metadata | dict | Extra metadata from the transform declaration (keys starting with _ excluded) |
| sandbox_dir | None | Always None in the plugin subprocess context |
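
For local testing it can be handy to build such an object by hand. The sketch below uses types.SimpleNamespace as a stand-in; that choice, the sample values, and the small transformer class are assumptions made for illustration, and the object the postprocessor actually passes may be a different type with the same attributes.

```python
import json
from types import SimpleNamespace


class MyTransformer:
    transform_types = ["my-type"]

    def transform(self, metadata):
        data = json.loads(metadata.input_data)
        return f"{metadata.type}: {sorted(data)}"


# Hand-built stand-in for the object the postprocessor passes in.
metadata = SimpleNamespace(
    type="my-type",
    transform_content=".foo",
    input_data='{"foo": 1, "bar": 2}',
    source_mime_type="application/json",
    target_mime_type="text/plain",
    metadata={},
    sandbox_dir=None,
)

print(MyTransformer().transform(metadata))  # my-type: ['bar', 'foo']
```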

Return value and error handling

Return a str or bytes to produce output. Return None to produce no output (not an error). Raise any exception to signal failure — the exception message becomes the transform’s stderr output.
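
Seen from the calling side, these three outcomes could be handled along the following lines. This is a sketch of the contract only: run_transformer is a hypothetical helper, and the UTF-8 encoding of str results is an assumption rather than documented behaviour.

```python
def run_transformer(transformer, metadata):
    """Call transform() and return (output_bytes_or_None, stderr_or_None)."""
    try:
        result = transformer.transform(metadata)
    except Exception as exc:  # any exception signals failure
        return None, str(exc)
    if result is None:
        return None, None  # no output, but not an error
    if isinstance(result, str):
        result = result.encode("utf-8")  # assumed encoding
    return result, None


class Failing:
    transform_types = ["always-fails"]

    def transform(self, metadata):
        raise ValueError("unsupported input")


class Silent:
    transform_types = ["no-output"]

    def transform(self, metadata):
        return None


print(run_transformer(Failing(), None))  # (None, 'unsupported input')
print(run_transformer(Silent(), None))   # (None, None)
```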

Transformer class attributes

| Attribute | Required | Description |
|---|---|---|
| transform_types | yes | List of type name strings this class handles |
| default_inputs | no | Default input media types (used when inputs is not declared in transforms.yaml) |
| default_outputs | no | Default output media types (used when outputs is not declared in transforms.yaml) |
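
The fallback from declared media types to class defaults can be sketched as below. The helper effective_media_types is hypothetical, and the declaration dict is assumed to have the same shape as an entry in transforms.yaml after YAML parsing.

```python
def effective_media_types(declaration, transformer_cls, direction):
    """Resolve media types for direction 'inputs' or 'outputs'."""
    declared = (declaration.get(direction) or {}).get("mediaTypes")
    if declared:
        return list(declared)  # an explicit declaration wins
    # Otherwise fall back to default_inputs / default_outputs, if present.
    return list(getattr(transformer_cls, f"default_{direction}", []))


class MyTransformer:
    transform_types = ["my-type"]
    default_inputs = ["application/json"]
    default_outputs = ["text/plain"]


declaration = {"id": "demo", "type": "my-type",
               "outputs": {"mediaTypes": ["text/csv"]}}
print(effective_media_types(declaration, MyTransformer, "inputs"))
print(effective_media_types(declaration, MyTransformer, "outputs"))
```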

Example plugin

The following skeleton shows the minimal structure. metadata.transform_content carries the user-supplied code or script from transforms.yaml, so the transform logic is data-driven rather than hard-coded in the plugin.

# my_bblocks_plugin/__init__.py
import json

class MyTransformer:
    transform_types = ['my-type']
    default_inputs = ['application/json']
    default_outputs = ['text/plain']

    def transform(self, metadata):
        data = json.loads(metadata.input_data)
        # metadata.transform_content holds the code/expression from transforms.yaml
        # return a string or bytes, or raise on error
        return str(data)

A real-world example is the bblocks-jinja2-transform-plugin, which adds a jinja2 transform type that renders Jinja2 templates against JSON input.