Transforms

A building block is a specification — it defines a data model that data can conform to. Transforms complement that by defining a reusable conversion library: for data that conforms to this building block, here is how to convert it into another format, encoding, or building block representation. Clients and tools can discover these transforms from the building block register and use them as ready-made adapters without having to implement the conversion logic themselves.

Typical conversions include encoding translations (e.g. XML to JSON), schema or structural transformations, semantic uplift to RDF, and vocabulary or terminology mappings.

Transforms are declared in a transforms.yaml file in the building block directory. During postprocessing, example snippets that match a transform’s declared input media types are automatically run through it — this demonstrates the transform works and gives clients a concrete preview of the output. The transform library itself, however, is the primary artifact; the snippet outputs are illustrative.

transforms.yaml structure

transforms:
  - id: my-transform           # required; alphanumeric and dashes only
    description: What it does  # optional; Markdown accepted
    type: jq                   # required; see supported types below
    inputs:
      mediaTypes:
        - application/json     # media types this transform accepts
    outputs:
      mediaTypes:
        - application/json     # media types this transform produces
    code: |                    # inline code/script
      .foo = "bar"

The transform code can be declared inline with code, or referenced from a separate file with ref:

  - id: my-transform
    type: jq
    ref: transforms/my-script.jq
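The rules above can be checked mechanically once transforms.yaml is parsed. Here is a minimal sketch that validates one parsed entry. The helper name check_transform is hypothetical. "Alphanumeric" is read as ASCII letters and digits. Requiring exactly one of code and ref is an assumption: the text only says the code is given one way or the other.

```python
import re

# Hypothetical sketch: validate one parsed entry of the transforms list.
# ASCII-only ids and "exactly one of code/ref" are assumptions.
ID_PATTERN = re.compile(r"[A-Za-z0-9-]+")


def check_transform(entry):
    """Return a list of problems found in a single transform entry."""
    problems = []
    entry_id = entry.get("id")
    if not isinstance(entry_id, str) or not ID_PATTERN.fullmatch(entry_id):
        problems.append("id is required and may only contain letters, digits, and dashes")
    if not entry.get("type"):
        problems.append("type is required")
    # Inline code and a ref to a separate file are alternatives
    if ("code" in entry) == ("ref" in entry):
        problems.append("declare exactly one of code or ref")
    return problems
```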

Input and output media types can be given as plain strings (application/json) or as objects when a file extension is needed for the output:

    outputs:
      mediaTypes:
        - mimeType: text/csv
          defaultExtension: csv

Common short-form aliases such as json, xml, or turtle are also accepted and will be normalized to their canonical MIME types.
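Both accepted forms can be thought of as normalizing to a single shape. The following hypothetical sketch illustrates this; it is not the postprocessor's own code. The helper name is invented. The alias table covers only the three aliases named above, and the assumption is that xml maps to application/xml, as used by the xslt defaults below.

```python
# Hypothetical sketch: turn a mediaTypes entry (plain string, alias, or
# object with mimeType/defaultExtension) into one dict shape.
# The alias table is an assumption covering only json, xml, and turtle.
ALIASES = {
    "json": "application/json",
    "xml": "application/xml",
    "turtle": "text/turtle",
}


def normalize_media_type(entry):
    """Return {'mimeType': ..., 'defaultExtension': ...} for one entry."""
    if isinstance(entry, str):
        mime_type, extension = entry, None
    else:
        mime_type = entry["mimeType"]
        extension = entry.get("defaultExtension")
    # Expand short-form aliases to their canonical MIME type
    mime_type = ALIASES.get(mime_type, mime_type)
    return {"mimeType": mime_type, "defaultExtension": extension}


entries = ["json", "application/ld+json", {"mimeType": "text/csv", "defaultExtension": "csv"}]
normalized = [normalize_media_type(e) for e in entries]
```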


Supported transform types

jq

Applies a jq expression to JSON input.

  • Default inputs: application/json
  • Default outputs: application/json

  - id: add-type
    type: jq
    code: |
      .type = "ex:MyFeature"

sparql-construct

Runs a SPARQL CONSTRUCT query on RDF input, producing an RDF graph.

  • Default inputs: application/ld+json, text/turtle
  • Default outputs: text/turtle

  - id: to-geosparql
    type: sparql-construct
    ref: transforms/to-geosparql.sparql

sparql-update

Runs a SPARQL UPDATE statement on an RDF graph in-place.

  • Default inputs: application/ld+json, text/turtle
  • Default outputs: same as input

  - id: remap-predicates
    type: sparql-update
    ref: transforms/remap.sparql

shacl-af-rule

Applies SHACL Advanced Features rules (SPARQL-based) to an RDF graph.

  • Default inputs: application/ld+json, text/turtle
  • Default outputs: text/turtle

  - id: infer-types
    type: shacl-af-rule
    ref: transforms/infer-types.shacl.ttl

xslt

Applies an XSLT stylesheet to XML input.

  • Default inputs: application/xml
  • Default outputs: application/xml

  - id: normalise-xml
    type: xslt
    ref: transforms/normalise.xslt

json-ld-frame

Applies a JSON-LD frame to JSON-LD or RDF input.

  • Default inputs: application/ld+json, text/turtle
  • Default outputs: application/ld+json

  - id: frame-feature
    type: json-ld-frame
    ref: transforms/frame.jsonld

semantic-uplift

Applies a semantic uplift mapping (as used by the OGC NA tools) to JSON input, producing RDF.

  • Default inputs: application/json
  • Default outputs: text/turtle

  - id: uplift
    type: semantic-uplift
    ref: transforms/uplift.yaml

python

Runs a Python code snippet. The snippet receives input_data (a string) and must assign its result to output_data. A transform_metadata namespace is also available with the following attributes:

Attribute          Description
source_mime_type   MIME type of the input snippet
target_mime_type   MIME type of the declared output
metadata           Extra metadata from the transform declaration (keys starting with _ excluded). Supports both attribute access (transform_metadata.metadata.mode) and dict-style access (transform_metadata.metadata['mode'], for k in transform_metadata.metadata, etc.)
context            Transform context namespace

  - id: uppercase-keys
    type: python
    inputs:
      mediaTypes: [ application/json ]
    outputs:
      mediaTypes: [ application/json ]
    code: |
      import json
      data = json.loads(input_data)
      output_data = json.dumps({k.upper(): v for k, v in data.items()}, indent=2)

With dependencies:

  - id: to-csv
    type: python
    inputs:
      mediaTypes: [ application/json ]
    outputs:
      mediaTypes:
        - mimeType: text/csv
          defaultExtension: csv
    metadata:
      dependencies:
        pip: pandas>=1.5
        python: ">=3.10"   # optional; skipped if not met
    code: |
      import json, pandas as pd
      data = json.loads(input_data)
      output_data = pd.DataFrame(data if isinstance(data, list) else [data]).to_csv(index=False)

pip accepts any specifier that pip install understands, including GitHub URLs. If python is set to a PEP 440 version specifier, the transform is silently skipped when the runtime does not meet the requirement.

The snippet can be adapted into a standalone script by reading from stdin and printing to stdout — input_data is just a string variable, and output_data is whatever string you assign.
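Here is a minimal sketch of that adaptation for the uppercase-keys example above. The snippet body is wrapped in a function so it can be exercised directly, and main reads the snippet from stdin and writes the result to stdout. The function names are chosen for illustration only.

```python
import json
import sys


def uppercase_keys(input_data):
    # Same body as the uppercase-keys snippet, with output_data returned
    data = json.loads(input_data)
    output_data = json.dumps({k.upper(): v for k, v in data.items()}, indent=2)
    return output_data


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Standalone mode: the snippet arrives on stdin, the result goes to stdout
    stdout.write(uppercase_keys(stdin.read()))
    stdout.write("\n")
```

Calling main() from an `if __name__ == "__main__":` guard lets the file run as, for example, `python uppercase_keys.py < example.json` (file names hypothetical).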

Binary data: input_data may be a bytes object when the input comes from a binary-producing transform in a chain. Assign bytes to output_data to produce binary output; the postprocessor will detect this and open the output file in binary mode. print() calls are captured and do not interfere with the output.
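A sketch of a snippet that accepts either str or bytes input and produces binary output by gzip-compressing the JSON. The input_data value at the top is a hypothetical stand-in for the injected variable, so the block runs on its own; drop that line inside transforms.yaml. Such a transform would declare a binary output media type, for example application/gzip with defaultExtension gz; that choice is an assumption.

```python
import gzip
import json

# Hypothetical stand-in for the injected variable; remove inside transforms.yaml
input_data = b'{"id": 1, "name": "example"}'

# input_data may be str or bytes, depending on the upstream transform in a chain
text = input_data.decode("utf-8") if isinstance(input_data, bytes) else input_data
data = json.loads(text)

# Assigning bytes makes the output binary; the output file is opened in binary mode
output_data = gzip.compress(json.dumps(data).encode("utf-8"))
```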

Python transforms can also call transforms from other building blocks using the get_transformer() builtin.

node

Runs a Node.js code snippet. The snippet receives inputData (a string) and must assign its result to outputData. A transformMetadata object is also available with the following properties:

Property         Description
sourceMimeType   MIME type of the input snippet
targetMimeType   MIME type of the declared output
metadata         Extra metadata from the transform declaration (keys starting with _ excluded)
context          Transform context object (snake_case keys)

  - id: add-metadata
    type: node
    inputs:
      mediaTypes: [ application/json ]
    outputs:
      mediaTypes: [ application/json ]
    code: |
      const data = JSON.parse(inputData);
      data.generatedBy = 'my-transform';
      outputData = JSON.stringify(data, null, 2);

With dependencies:

  - id: to-csv
    type: node
    inputs:
      mediaTypes: [ application/json ]
    outputs:
      mediaTypes:
        - mimeType: text/csv
          defaultExtension: csv
    metadata:
      dependencies:
        npm: json2csv
        node: ">=18"   # optional; skipped if not met
    code: |
      const { Parser } = require('json2csv');
      const data = JSON.parse(inputData);
      const rows = Array.isArray(data) ? data : [data];
      outputData = new Parser().parse(rows);

npm accepts any package name or specifier that npm install understands. If node is set to a semver range, the transform is silently skipped when the runtime does not meet the requirement.

Binary data: inputData may be a Buffer when the input comes from a binary-producing transform in a chain. Assign a Buffer to outputData to produce binary output; assigning a string produces text output. console.log() calls are captured and do not interfere with the output.

Node transforms can also call transforms from other building blocks using the getTransformer() function.


get_transformer / getTransformer

Python and Node transforms can call any transform defined in any building block — including transforms of a different type — using a built-in composition helper. This lets you build complex pipelines by reusing transforms across building blocks without duplicating logic.

The callable returned by the helper accepts the content to transform plus optional parameters, runs the target transform in a sub-process, and returns the result.

Python: get_transformer(bblock_id, transform_id)

get_transformer is injected as a built-in into every Python snippet. Call it to obtain a callable for a specific transform, then invoke that callable with the data you want to transform.

# In a python transform
# Get a callable for another building block's transform
convert = get_transformer('ogc.example.other-bblock', 'my-jq-transform')

result_str = convert(input_data)
output_data = result_str

Callable signature:

callable(content, source_mime_type=None, extra_metadata=None)

Parameter          Type           Description
content            str or bytes   The input data to transform
source_mime_type   str | None     Optional MIME type hint passed to the target transform
extra_metadata     dict | None    Optional dict merged into the target transform’s metadata (caller values take precedence over the target’s declared metadata)

The callable returns a str (or bytes for binary outputs), or None if the target transform produced no output.
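Because the target may return text, bytes, or nothing, a calling transform can branch on the result. Here is a minimal sketch. get_transformer only exists inside the postprocessor, so a hypothetical stub stands in for it here, together with a stand-in input_data. The stub echoes its input and returns None for empty input. Passing the input through unchanged when the target produces nothing is a design choice for this illustration, not documented behaviour.

```python
# Hypothetical stand-ins so the sketch runs outside the postprocessor; inside a
# python transform, get_transformer and input_data are injected instead.
def get_transformer(bblock_id, transform_id):
    def convert(content, source_mime_type=None, extra_metadata=None):
        return content or None  # echo the content; None when there is nothing
    return convert


input_data = '{"id": 1}'

convert = get_transformer('ogc.example.other-bblock', 'my-jq-transform')
result = convert(input_data, source_mime_type='application/json')

if result is None:
    # The target produced no output: pass the input through unchanged
    output_data = input_data
else:
    # str or bytes; assigning bytes yields a binary output file
    output_data = result
```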

Node: getTransformer(bblockId, transformId)

getTransformer is injected into every Node snippet. The returned callable accepts the content and an options object.

// In a node transform
const convert = getTransformer('ogc.example.other-bblock', 'my-python-transform');

const data = JSON.parse(inputData);
const result = convert(JSON.stringify(data), { sourceMimeType: 'application/json' });
outputData = result;

Callable signature:

callable(content, opts?)

Parameter             Type               Description
content               string or Buffer   The input data to transform
opts.sourceMimeType   string             Optional MIME type hint passed to the target transform
opts.extraMetadata    object             Optional object merged into the target transform’s metadata

The callable returns a string (or Buffer for binary outputs), or null if the target transform produced no output.

Supported target types

Both helpers can call transforms of the following types: python, node, jq, xslt, json-ld-frame.

SPARQL, SHACL-AF, and semantic-uplift transforms are not supported as targets.

Cross-type chaining

Calls can be nested arbitrarily deep and can cross language boundaries. For example, a Python transform can call a jq transform defined in another building block, or a Node transform can call a Python transform, which in turn calls an XSLT transform. Only Python and Node transforms can initiate calls, because they are the only types with the helper injected. Between those two the composition is symmetric: either can call the other.

Cycle detection

If a transform is already executing in the current call chain, calling it again via get_transformer / getTransformer raises a RuntimeError (Python) or throws an Error (Node) immediately. Cycle detection works across process and language boundaries.
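The rule can be pictured as a check against the chain of transforms currently executing. The sketch below is a single-process illustration with made-up names (CallChain, call). It does not describe the postprocessor's actual mechanism, which also spans subprocesses and languages.

```python
class CallChain:
    """Tracks the (bblock_id, transform_id) pairs currently executing."""

    def __init__(self):
        self._active = []

    def call(self, bblock_id, transform_id, run):
        key = (bblock_id, transform_id)
        if key in self._active:
            # Same transform already on the stack: fail immediately
            chain = " -> ".join(f"{b}:{t}" for b, t in self._active + [key])
            raise RuntimeError(f"Transform cycle detected: {chain}")
        self._active.append(key)
        try:
            return run()
        finally:
            # Pop even on failure so later sibling calls are not flagged
            self._active.pop()


chain = CallChain()


def self_calling():
    # A transform that (wrongly) calls itself through the chain
    return chain.call("ogc.example.a", "loop", self_calling)
```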

Metadata scoping

Each transform always receives its own declared metadata from transforms.yaml; the caller’s metadata is not inherited. If you need to pass values from the calling transform into the target, use extra_metadata (extraMetadata in Node):

# Python: forward the caller's metadata to the sub-transform
convert = get_transformer('other.bblock', 'some-transform')
result = convert(input_data, extra_metadata=transform_metadata.metadata)

// Node: same idea
const convert = getTransformer('other.bblock', 'some-transform');
const result = convert(inputData, { extraMetadata: transformMetadata.metadata });

extra_metadata / extraMetadata is merged on top of the target’s own declared metadata, so the target’s keys take lower priority than what the caller explicitly passes.
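In dictionary terms, the target's metadata works like a caller overlay on top of its declared values. Here is a minimal sketch with a hypothetical helper name. It assumes a shallow, top-level merge; the text does not say whether nested values are merged deeply.

```python
def effective_metadata(declared, extra_metadata=None):
    # Start from the target's own declared metadata, then overlay the caller's
    # values (shallow merge, an assumption), so the caller wins on conflicts
    merged = dict(declared)
    merged.update(extra_metadata or {})
    return merged


declared = {"mode": "strict", "indent": 2}  # from the target's transforms.yaml
caller = {"mode": "lenient"}                # passed via extra_metadata
merged = effective_metadata(declared, caller)
```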

_nested_transform metadata flag

The target transform’s metadata will contain _nested_transform: true when invoked via get_transformer / getTransformer. This lets a transform behave differently when called as a sub-transform versus running as a top-level postprocessing step.


Output profile validation

A transform’s outputs can be validated against one or more building blocks by declaring them as profiles. During postprocessing, every output file produced by the transform is validated against each declared profile using the same validators that run on regular test resources (JSON Schema, JSON-LD context, and SHACL).

Profiles are declared under outputs.profiles as a list of building block identifiers, using the bblocks:// URI scheme:

transforms:
  - id: to-geojson-feature
    type: jq
    inputs:
      mediaTypes: [ application/json ]
    outputs:
      mediaTypes: [ application/geo+json ]
      profiles:
        - bblocks://ogc.geo.features.feature

Both locally defined building blocks and building blocks imported from other registers are supported.

What gets produced

For each declared profile, postprocessing creates a subdirectory named after the profile identifier alongside the transform outputs and writes:

  • A .validation_{passed|failed}.txt text report for each output file
  • Semantic uplift side-outputs (.jsonld, .ttl) when the profile includes a JSON-LD context
  • A consolidated _report.json covering all validated outputs for that profile

The per-snippet transform result in register.json gains a profilesValidation map keyed by profile identifier:

"profilesValidation": {
  "ogc.geo.features.feature": {
    "result": true,
    "report": "build/tests/my.bblock/transforms/ogc.geo.features.feature/_report.json",
    "upliftedFiles": {
      "jsonld": "build/tests/my.bblock/transforms/ogc.geo.features.feature/output.jsonld",
      "ttl":    "build/tests/my.bblock/transforms/ogc.geo.features.feature/output.ttl"
    }
  }
}
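Because register.json is plain JSON, tooling can pick failures out of this map directly. The sketch below assumes the caller has already located a snippet's profilesValidation object; only that object's shape is shown here, not the surrounding register.json structure. The helper name and the second, failing profile are hypothetical.

```python
# Hypothetical helper: given one profilesValidation map, return the
# identifiers of profiles whose validation did not pass.
def failed_profiles(profiles_validation):
    return sorted(
        profile_id
        for profile_id, outcome in profiles_validation.items()
        if not outcome.get("result", False)
    )


# Sample map: the first entry mirrors the example above, the second is made up
profiles_validation = {
    "ogc.geo.features.feature": {
        "result": True,
        "report": "build/tests/my.bblock/transforms/ogc.geo.features.feature/_report.json",
    },
    "ogc.example.other-profile": {
        "result": False,
        "report": "build/tests/my.bblock/transforms/ogc.example.other-profile/_report.json",
    },
}

for profile_id in failed_profiles(profiles_validation):
    print(f"{profile_id}: see {profiles_validation[profile_id]['report']}")
```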

Transform context

All executable transform types (Python, Node, and plugins) receive a transform context with metadata about the building block, example, and postprocessing run. In Python snippets it is transform_metadata.context (a SimpleNamespace); in Node snippets it is transformMetadata.context (a plain object); in plugins it is metadata.ctx (a SimpleNamespace). All fields use snake_case.

Most transforms only need a handful of these fields; the full set is listed here for reference.

Building block:

Field                   Type         Description
bblock_id               str          Building block identifier
bblock_name             str | None   Human-readable name
bblock_version          str | None   Version string
bblock_tags             list         Tags declared in bblock.json
bblock_files_path       str          Absolute path to the building block source directory
bblock_annotated_path   str          Absolute path to the annotated output directory
bblock_metadata         dict         Full building block metadata snapshot at transform time
source_schema_path      str | None   Relative path to the source schema file, or URL if declared as a remote reference
annotated_schema_path   str | None   Relative path to the annotated schema, if generated
jsonld_context_path     str | None   Relative path to the generated JSON-LD context, if present
shacl_shapes_paths      list         Relative paths or URLs of SHACL shapes (local files are relativized to CWD; remote references are preserved as URLs)

Example and snippet:

Field           Type   Description
example_index   int    Zero-based index of the current example
example         dict   Full example object (title, prefixes, base-output-filename, etc.); snippets excluded
snippet_index   int    Zero-based index of the current snippet within the example
snippet         dict   Full snippet object (language, url, ref, json-path, prefixes, etc.); code excluded (use input_data)

Note: When json-path is set on a snippet, snippet['full-code'] contains the complete content of the referenced file before path extraction. This is useful when the transform needs context beyond the extracted value.
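For example, a transform that receives only an extracted geometry could recover an identifier from the enclosing document. The values at the top are hypothetical stand-ins for the injected variables, and the json-path expression is made up. The sketch also assumes full-code holds the file content as a string.

```python
import json
from types import SimpleNamespace

# Hypothetical stand-ins for the injected variables (values are made up)
transform_metadata = SimpleNamespace(context=SimpleNamespace(snippet={
    "json-path": "$.geometry",
    "full-code": '{"id": 7, "geometry": {"type": "Point", "coordinates": [1, 2]}}',
}))
input_data = '{"type": "Point", "coordinates": [1, 2]}'

snippet = transform_metadata.context.snippet
geometry = json.loads(input_data)  # the extracted value only
# When json-path was applied, read the enclosing document for extra context
if snippet.get("json-path") and snippet.get("full-code"):
    feature_id = json.loads(snippet["full-code"]).get("id")
else:
    feature_id = None
output_data = json.dumps({"id": feature_id, "geometry": geometry})
```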

Note: snippet['shacl-closure'] contains the merged list of SHACL closure entries from both the building block’s shaclClosures (bblock.json) and the snippet’s own shacl-closure (examples.yaml), deduplicated. Entries may be URLs or paths relative to the building block source directory. To resolve a relative path:

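import os

# In a Python transform (in a plugin, use metadata.ctx instead)
context = transform_metadata.context
closures = context.snippet.get('shacl-closure') or []
resolved = [
    # os.path.join keeps bblock_files_path as-is when it is absolute
    c if c.startswith('http') else os.path.join(context.working_dir, context.bblock_files_path, c)
    for c in closures
]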

Output:

Field         Type   Description
output_file   str    Absolute path where this transform’s output will be written
output_dir    str    Absolute path to the transform output directory for this building block
working_dir   str    Working directory at postprocessing time

Register and configuration:

Field                    Type         Description
base_url                 str | None   Base URL for generated output
github_base_url          str | None   GitHub repository base URL (e.g. https://github.com/org/repo/)
git_repository           str | None   Git remote URL
id_prefix                str          Building block identifier prefix from bblocks-config.yaml
imported_register_urls   list         Register import URLs from bblocks-config.yaml
transform_plugins        list         Active transform plugins

Note: bblock_metadata reflects the state at transform time — fields populated after the transforms step (such as shaclShapes URLs and documentation) will not be present yet.
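As an example of using a few of these fields, a Python transform might stamp provenance into its JSON output. The transform_metadata and input_data values are hypothetical stand-ins with made-up values, so the sketch runs outside the postprocessor. The generatedFrom key is also an invented name.

```python
import json
from types import SimpleNamespace

# Hypothetical stand-ins for the injected variables (values are made up)
transform_metadata = SimpleNamespace(
    context=SimpleNamespace(bblock_id="ogc.example.my-bblock", example_index=0, snippet_index=1)
)
input_data = '{"id": 1}'

ctx = transform_metadata.context
data = json.loads(input_data)
# Record which building block, example, and snippet produced this output
data["generatedFrom"] = {
    "bblock": ctx.bblock_id,
    "example": ctx.example_index,
    "snippet": ctx.snippet_index,
}
output_data = json.dumps(data, indent=2)
```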


Transform plugins

Declaring a transform type that is not listed under Supported transform types is valid. The transform is still included in the building block register for other tools or systems that support it, but it is skipped during postprocessing unless a matching transform plugin is declared in bblocks-config.yaml.

You can add support for custom transform types by declaring transform plugins in bblocks-config.yaml:

plugins:
  transforms:
    - pip: git+https://github.com/example/my-bblocks-plugin.git
      modules:
        - my_bblocks_plugin

Note: The legacy transform-plugins.yml file is still accepted but deprecated. Move its contents to the plugins.transforms key in bblocks-config.yaml.

Each plugin entry installs one or more pip packages and scans the listed Python modules for transformer classes. A transformer class is recognised by duck typing — it needs:

  • transform_types: a non-empty list of type name strings
  • transform(metadata): a callable that accepts a metadata object and returns a string or bytes, or raises an exception on failure
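The duck-typing test described above can be pictured as follows. This is an illustrative sketch with made-up names (find_transformers, demo_plugin), not the postprocessor's own discovery code.

```python
import inspect
import types


def find_transformers(module):
    """Return the classes in module that pass the duck-typing test."""
    found = []
    for _name, cls in inspect.getmembers(module, inspect.isclass):
        type_names = getattr(cls, "transform_types", None)
        # transform_types: a non-empty list of strings
        has_types = (
            isinstance(type_names, list)
            and len(type_names) > 0
            and all(isinstance(t, str) for t in type_names)
        )
        # transform: something callable (normally a method)
        has_transform = callable(getattr(cls, "transform", None))
        if has_types and has_transform:
            found.append(cls)
    return found


class Good:
    transform_types = ["my-type"]

    def transform(self, metadata):
        return "ok"


class EmptyTypes:
    transform_types = []  # rejected: the list must be non-empty

    def transform(self, metadata):
        return "ok"


# A throwaway module holding both candidates
demo = types.ModuleType("demo_plugin")
demo.Good = Good
demo.EmptyTypes = EmptyTypes
```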

Each plugin runs in its own isolated virtualenv (created automatically under the postprocessing sandbox), so dependency conflicts between plugins, or between a plugin and the postprocessor itself, are not a concern.

pip accepts any specifier that pip install understands, including version constraints, GitHub URLs, and local paths. It can be a string or a list when multiple packages are needed.

The postprocessor automatically derives a human-facing URL from the pip specifier (PyPI page for package names, repository URL for git+https:// references). You can override this with an explicit url field:

plugins:
  transforms:
    - pip: git+https://github.com/example/my-bblocks-plugin.git
      url: https://github.com/example/my-bblocks-plugin
      modules:
        - my_bblocks_plugin

Plugin metadata (types, class names, pip reference, and URL) is included in register.json under transformPlugins, allowing viewers and tooling to attribute each transform type to its plugin.

The metadata object

The metadata argument passed to transform() is a plain namespace with the following attributes:

Attribute           Type               Description
type                str                The transform type identifier (e.g. jinja2)
transform_content   str                The code or script declared in transforms.yaml (code or ref)
input_data          str                The example snippet text
source_mime_type    str                MIME type of the input snippet
target_mime_type    str                MIME type of the declared output
metadata            namespace / dict   Extra metadata from the transform declaration (keys starting with _ excluded). Supports both attribute access and dict-style access
sandbox_dir         None               Always None in the plugin subprocess context
ctx                 SimpleNamespace    Transform context

Return value and error handling

Return a str or bytes to produce output. Return None to produce no output (not an error). Raise any exception to signal failure — the full traceback becomes the transform’s stderr output.
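Here is a sketch of a plugin that exercises all three outcomes. JsonMemberTransformer and the json-member type are hypothetical. transform_content is treated as the name of a top-level JSON member to extract. The stand-in metadata objects carry only the attributes this class reads.

```python
import json
from types import SimpleNamespace


class JsonMemberTransformer:
    """Hypothetical plugin: transform_content names a top-level JSON member."""

    transform_types = ["json-member"]
    default_inputs = ["application/json"]
    default_outputs = ["application/json"]

    def transform(self, metadata):
        data = json.loads(metadata.input_data)
        if not isinstance(data, dict):
            # Raising signals failure; the traceback becomes the transform's stderr
            raise ValueError("expected a JSON object at the top level")
        key = metadata.transform_content.strip()  # code: | blocks end with a newline
        if key not in data:
            # None means "no output", which is not treated as an error
            return None
        return json.dumps(data[key], indent=2)


plugin = JsonMemberTransformer()
found = plugin.transform(SimpleNamespace(input_data='{"name": "x"}', transform_content="name\n"))
missing = plugin.transform(SimpleNamespace(input_data='{"id": 1}', transform_content="name"))
```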

Any output written to stdout or stderr during transform() (e.g. print() calls) is captured and logged at DEBUG level. To see it, run the postprocessor with --log-level DEBUG.

Transformer class attributes

Attribute         Required   Description
transform_types   yes        List of type name strings this class handles
default_inputs    no         Default input media types (used when inputs is not declared in transforms.yaml)
default_outputs   no         Default output media types (used when outputs is not declared in transforms.yaml)

Example plugin

The following skeleton shows the minimal structure. metadata.transform_content carries the user-supplied code or script from transforms.yaml, so the transform logic is data-driven rather than hard-coded in the plugin.

# my_bblocks_plugin/__init__.py
import json

class MyTransformer:
    transform_types = ['my-type']
    default_inputs = ['application/json']
    default_outputs = ['text/plain']

    def transform(self, metadata):
        data = json.loads(metadata.input_data)
        # metadata.transform_content holds the code/expression from transforms.yaml
        # return a string or bytes, or raise on error
        return str(data)

A real-world example is the bblocks-jinja2-transform-plugin, which adds a jinja2 transform type that renders Jinja2 templates against JSON input.
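Before you wire a transformer like the skeleton above into bblocks-config.yaml, you can call it directly with a stand-in metadata object. Here is a minimal sketch using types.SimpleNamespace. The attribute values are made up, and the class is repeated so the block runs on its own.

```python
import json
from types import SimpleNamespace


class MyTransformer:  # copy of the skeleton above
    transform_types = ['my-type']
    default_inputs = ['application/json']
    default_outputs = ['text/plain']

    def transform(self, metadata):
        data = json.loads(metadata.input_data)
        return str(data)


# Stand-in for the metadata object the postprocessor passes to transform()
metadata = SimpleNamespace(
    type='my-type',
    transform_content='',
    input_data='{"a": 1}',
    source_mime_type='application/json',
    target_mime_type='text/plain',
    metadata={},
    sandbox_dir=None,
    ctx=SimpleNamespace(),
)
result = MyTransformer().transform(metadata)
```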