I. Keywords
The following are keywords to be used by search engines and document catalogues.
metadata, GeoDCAT, 19115, Records, STAC, Provenance, Building Blocks
II. Preface
II.A. Acknowledgements
OGC thanks the following organisations for material support for this code sprint:
[University of NSW School of the Built Environment, Geospatial Research Innovation Development (UNSW GRID)](https://www.unsw.edu.au/arts-design-architecture/our-schools/built-environment/our-research/clusters-groups/grid) for the venue and logistics.
[AURIN](https://aurin.org.au/) and [SURROUND Australia](https://surroundaustralia.com/) for catering.
II.B. Abstract
The Metadata Code Sprint was planned to have a primary focus on the following group of tools, APIs and encodings:
OGC GeoDCAT (under development) — a spatio-temporal profile of the W3C DCAT Recommendation that also provides guidance about its use and further specialization. OGC GeoDCAT is inspired by the GeoDCAT-AP specification but defines just the internationally relevant concepts to allow wider application. The key areas to consider concern the expression of place and time, such as the GeoDCAT-AP properties-for-location.
OGC API – Records provides a way to browse or search a curated collection of records of geospatial resources, known as a catalog. A record makes a resource discoverable by providing summary information (e.g. metadata) about the geospatial resource.
ISO 19115 Standards define the schema required for describing geographic information and services by means of metadata. Of particular interest will be recent developments on JSON encodings provided by ISO 19115-4.
STAC provides a common structure for describing and cataloging spatiotemporal assets. A spatiotemporal asset is any file that represents information about the earth captured in a certain space and time.
Other metadata standards, such as CDIF, EML, and Local Contexts, may be included where their use has important implications for the utility of geospatial metadata.
Particular focus will be placed on the use of OGC modular “building blocks for location” that address both simple and highly complex use cases.
III. Security considerations
No security considerations have been made for this document.
IV. Submitting Organizations
The following organizations submitted this Document to the Open Geospatial Consortium (OGC):
- OGC
- ISO TC211
V. Submitters
All questions regarding this document should be directed to the editors or the contributors:
Table — Submitters
Name | Organization | Role |
---|---|---|
Rob Atkinson | OGC | Editor |
Byron Cochrane | Openworks | Editor |
David Stolarz | ASPRS | Contributor |
Peter Parslow | ISO TC 211 | Contributor |
Ivana Ivanova | Curtin University | Contributor |
Panagiotis (Peter) A. Vretanos | CubeWerx Inc. | Contributor |
Christin Henzen | TU Dresden | Contributor |
Prof. Matt Duckham | RMIT | Contributor |
1. Terms, definitions and abbreviated terms
This document uses the terms defined in OGC Policy Directive 49, which is based on the ISO/IEC Directives, Part 2, Rules for the structure and drafting of International Standards. In particular, the word “shall” (not “must”) is the verb form used to indicate a requirement to be strictly followed to conform to this document and OGC documents do not use the equivalent phrases in the ISO/IEC Directives, Part 2.
This document also uses terms defined in the OGC Standard for Modular specifications (OGC 08-131r3), also known as the ‘ModSpec’. The definitions of terms such as standard, specification, requirement, and conformance test are provided in the ModSpec.
For the purposes of this document, the following additional terms and definitions apply.
An Application Programming Interface (API) is a standard set of documented and supported functions and procedures that expose the capabilities or data of an operating system, application, or service to other applications (adapted from ISO/IEC TR 13066-2:2016).
coordinate reference system — a coordinate system that is related to the real world by a datum (source: ISO 19111).
directed acyclic graph (DAG) — a type of graph in mathematics and computer science that represents a conceptual or mathematical model of a data pipeline, embodying a series of activities in a specific arrangement in which data flows through a finite set of nodes connected by edges. Notably, these graphs lack a designated start or end node and prevent data from looping back to its point of origin. Their popularity in data engineering stems from their clear depiction of data lineage and their suitability for functional approaches, ensuring idempotency when restarting pipelines without side effects. https://www.ssp.sh/brain/dag/
Docker — an open-source project that automates the deployment of applications as movable, independent containers.
F-UJI — a web service to programmatically assess the FAIRness of research data objects at the dataset level, based on the FAIRsFAIR Data Object Assessment Metrics. https://fair-impact.eu/
GeoJSON — a format for encoding collections of simple geographical features along with their non-spatial attributes using JSON.
JSON-LD — a JSON-based serialization for Linked Data.
OpenAPI Document — a document (or set of documents) that defines or describes an API. An OpenAPI definition uses and conforms to the OpenAPI Specification (https://www.openapis.org).
ontology — a formal representation of phenomena of a universe of discourse with an underlying vocabulary, including definitions and axioms that make the intended meaning explicit and describe phenomena and their interrelationships.
profile — a set of one or more base standards or subsets of base standards, and, where applicable, the identification of chosen clauses, classes, options and parameters of those base standards, that are necessary for accomplishing a particular function. Note: A profile is derived from base standards so that by definition, conformance to a profile is conformance to the base standards from which it is derived.
Provenance (PROV) — a family of W3C documents that defines a model, corresponding serializations, and other supporting definitions to enable the interoperable interchange of provenance information in heterogeneous environments.
a geometry manipulation framework for multidisciplinary design optimization
schema — a formal description of a model.
Semantic Web — the Web of data with meaning. Note: The association of meaning allows data and information to be understood and processed by automated tools as well as by people.
Web API — an API using an architectural style that is founded on the technologies of the Web (source: OGC API — Features — Part 1: Core).
1.15. Abbreviated terms
Abbreviation | Term |
---|---|
API | Application Programming Interface |
AURIN | Australian Urban Research Infrastructure Network |
BFO | Basic Formal Ontology |
CARE | Collective benefit, Authority to control, Responsibility, Ethics |
CITE | Compliance Interoperability & Testing Evaluation |
CRS | Coordinate Reference System |
DAG | Directed Acyclic Graph |
DCAT | Data Catalog Vocabulary |
DOI | Digital Object Identifier |
DWG | Domain Working Group |
EDR | Environmental Data Retrieval |
FAIR | Findability, Accessibility, Interoperability, and Reusability |
GeoDCAT | a spatio-temporal profile of the W3C DCAT Recommendation |
GIS | Geographic Information System |
GN | GeoNetwork |
ICSM | Intergovernmental Committee on Surveying and Mapping |
ISO | International Organization for Standardization |
JSON | JavaScript Object Notation |
NCRIS | National Collaborative Research Infrastructure Strategy |
NFDI | National Research Data Infrastructure |
OGC | Open Geospatial Consortium |
ORCID | Open Researcher and Contributor ID |
OWL | Web Ontology Language |
OWS | OGC Web Services |
PROV | Provenance family of documents |
PROV-O | PROV Ontology |
RDF | Resource Description Framework |
REST | Representational State Transfer |
SHACL | Shapes Constraint Language |
STAC | SpatioTemporal Asset Catalog |
SWG | Standards Working Group |
TC 211 | ISO Technical Committee 211 — Geographic information/Geomatics |
TEAM | Test, Evaluation, And Measurement Engine |
UML | Unified Modeling Language |
URI | Uniform Resource Identifier |
URL | Uniform Resource Locator |
W3C | World Wide Web Consortium |
WG | Working Group |
XSLT | Extensible Stylesheet Language Transformations |
2. Introduction
2.1. Summary
OGC Code Sprints experiment with emerging ideas in the context of geospatial Standards and help improve interoperability of existing Standards by experimenting with new extensions or profiles. They are also used for building proofs-of-concept to support standards development activities and the enhancement of software products. The nature of the activities is influenced by whether a code sprint is ‘generic’ or ‘focused’. All OGC working groups are invited and encouraged to set up a thread in generic code sprints, whereas focused code sprints are tailored to a specific set of standards (typically limited to three standards).
“Metadata” is a ubiquitous concept with a long and varied tradition of handling in many places: catalogs of dataset descriptions, metadata embedded in data objects, documented in specifications for such data, attached to deployments of services that serve or utilise data, and so on. Each technology and each domain of application has its own legacy and natural strengths and weaknesses. This code sprint examined some of the common patterns and overarching needs in an endeavour to improve and extend various OGC metadata approaches.
This paper presents the high-level architecture of the code sprint and describes each of the standards and software packages that were deployed in support of the code sprint. The paper also discusses the results and presents a set of conclusions and recommendations. The recommendations identify ideas for future work, some of which may be more appropriate for testbeds, pilots, or other types of OGC initiatives. Therefore, the reader is encouraged to consider the recommended future work within the context of all OGC Standards development, collaborative solutions, and innovation activities.
The idea is to determine and demonstrate where and when different standards (particularly STAC, OGC API — Records, ISO 19115, and GeoDCAT) can be used to provide the most relevant information to the right users at the right time. The goal is to demonstrate how the same concepts may be handled by different options, whilst supporting existing communities of practice to extend and profile their chosen approaches.
OGC will provide an open source framework for testing mappings between standards, using test case examples covering Records, STAC and DCAT. Code sprint participants would be able to extend this to XML-based metadata standards such as ISO 19115, in a way that is extensible to the various profiles in use. This could leverage skills in any or all of coding, data modelling, metadata standards, or application domain requirements.
Potential scope from an ISO TC 211 perspective includes canvassing for views on what could be useful in regard to an upcoming revision of ISO 19115-1:2014, including the potential for a formal document on how to map ISO 19115-1 to DCAT, or perhaps a new version of ISO 19115-1 restructured using the DCAT terms and structure, with the intention of aligning the encoding of ISO 19115-1 with one or more encodings of DCAT. Similarly, is there user demand for an RDF/XML and/or OWL “encoding” of ISO 19115-1 in an official or recommended form?
More specifically, in terms of a specific code sprint activity on ISO 19115, potential foci could look to define and exercise the “mapping”. This could be done in several ways, and the GeoDCAT Building Blocks can be used to execute, test, and publish these mappings. Five options are worthy of consideration (a sketch of the mapping-table approach, option 4, follows the list):
1. XML→JSON using existing libraries and JSON-LD uplift (often this needs an intermediate transform);
2. 19115 UML → JSON schema and OWL using ShapeChange, and JSON-LD uplift (bonus marks to make ShapeChange generate the JSON-LD mapping), plus transforms from 19115 OWL to GeoDCAT;
3. use of existing transform languages and libraries designed for relational→RDF mapping, such as R2RML;
4. custom scripts taking a mapping table and generating or incorporating custom translation code per element;
5. taking a 19115 → OGC Records mapping and using the Records→DCAT mapping under development.

Note that option 4 could be used to generate the transforms for options 1, 2 or 5. Option 5 would be the most OGC API friendly solution.
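As a hedged sketch of option 4 (the namespaces, XPaths and the sample record below are invented for illustration and do not reproduce the actual ISO 19115-3 layout), a mapping table can drive an element-by-element translation to DCAT RDF:

```python
from lxml import etree
from rdflib import RDF, Graph, Literal, Namespace, URIRef

DCT = Namespace("http://purl.org/dc/terms/")
DCAT = Namespace("http://www.w3.org/ns/dcat#")

# Illustrative namespace; the real ISO 19115-3 namespaces are versioned.
NS = {"iso": "http://example.org/iso19115-like"}

# The mapping table: source element (XPath) -> target DCAT/DC property.
MAPPING = [
    (".//iso:title", DCT.title),
    (".//iso:abstract", DCT.description),
]

def iso_to_dcat(xml_bytes, dataset_uri):
    """Apply the mapping table to one metadata record, yielding an RDF graph."""
    root = etree.fromstring(xml_bytes)
    g = Graph()
    subject = URIRef(dataset_uri)
    g.add((subject, RDF.type, DCAT.Dataset))
    for xpath, prop in MAPPING:
        for element in root.xpath(xpath, namespaces=NS):
            g.add((subject, prop, Literal(element.text)))
    return g

sample = b"""<record xmlns:iso="http://example.org/iso19115-like">
  <iso:title>Sample dataset</iso:title>
  <iso:abstract>A test record.</iso:abstract>
</record>"""
print(iso_to_dcat(sample, "https://example.org/dataset/1").serialize(format="turtle"))
```

The same mapping table could equally be used to generate XSLT or JSON-LD artifacts, which is why option 4 can feed options 1, 2 or 5.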
Similarly, ISO/TC 211 Working Group 7 (Information Communities) is developing a JSON implementation of the ISO 19115 metadata standard: ISO 19115-4, Geographic information — Metadata — Part 4: JSON schema implementation of metadata fundamentals, and is looking for help creating a GeoJSON schema.
The approach: ISO 19115-1 and ISO 19157-1 (data quality) are in scope, and the encoding will be in GeoJSON. Automated generation from UML to a GeoJSON schema is possible, but creates an overly complex JSON schema which may not be fit for purpose. A comprehensive ‘real life’ XML dataset has therefore been created and converted to a preferred GeoJSON encoding. The GeoJSON example dataset is a subset (profile) of ISO 19115-1 and ISO 19157-1, but is considered to cover the essential part and to constitute the requirements for the GeoJSON schema. Development is pending resource allocation, and will ultimately result in the generation of a GeoJSON schema based on the dataset and the UML model.
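Purely as an illustration of the target shape (the property names below are assumptions, not the draft ISO 19115-4 schema), such a GeoJSON metadata record, and a minimal JSON Schema check against it, might look like this:

```python
import jsonschema

# Hypothetical record: a GeoJSON Feature whose properties carry a subset of
# ISO 19115-1 (identification) and ISO 19157-1 (data quality) information.
record = {
    "type": "Feature",
    "id": "example-metadata-001",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[150.5, -34.1], [151.4, -34.1], [151.4, -33.5],
                         [150.5, -33.5], [150.5, -34.1]]],
    },
    "properties": {
        "title": "Sample topographic dataset",
        "abstract": "Illustrative ISO 19115-1 subset encoded as GeoJSON.",
        "dateInfo": [{"date": "2024-11-19", "dateType": "creation"}],
        "dataQuality": [{"measure": "completenessOmission", "result": 0.98}],
    },
}

# A minimal hand-written schema standing in for the future 19115-4 schema.
schema = {
    "type": "object",
    "required": ["type", "geometry", "properties"],
    "properties": {
        "type": {"const": "Feature"},
        "properties": {"type": "object", "required": ["title", "abstract"]},
    },
}

jsonschema.validate(instance=record, schema=schema)  # raises on failure
print("record is valid against the illustrative schema")
```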
Specific case studies with engagement in the first session included: #17 TITLE NEEDED?
Another goal under consideration is to port the GeoNetwork 4 Formatter API, together with its supporting XSLTs, to GeoNetwork 5. These XSLTs take the underlying ISO XML documents and output country-specific DCAT-AP RDF/XML files, and already work in GeoNetwork 4. Porting them would give the records infrastructure access to all of GeoNetwork’s output formats (more than just DCAT).
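The pipeline shape is illustrated below with a trivial stand-in stylesheet (not one of the real GeoNetwork formatters). Note that lxml implements XSLT 1.0 only; the GeoNetwork DCAT-AP formatters are XSLT 2, which would require a processor such as Saxon (see the recommendation on XSLT 2 support in Clause 6.2):

```python
from lxml import etree

# A trivial stand-in stylesheet: ISO-like XML in, RDF/XML out.
STYLESHEET = b"""<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <xsl:template match="/metadata">
    <rdf:RDF>
      <dcat:Dataset>
        <dct:title><xsl:value-of select="title"/></dct:title>
      </dcat:Dataset>
    </rdf:RDF>
  </xsl:template>
</xsl:stylesheet>"""

transform = etree.XSLT(etree.fromstring(STYLESHEET))
record = etree.fromstring(b"<metadata><title>Sample dataset</title></metadata>")
print(str(transform(record)))  # serialized RDF/XML
```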
XAI — OGC API Processes for RAG/LLM using GeoDCAT and PROV (#18). The LangFlow toolkit uses Apache Airflow to define and run workflows using the LangChain toolkit. If such a workflow uses or generates geospatial information sources, then wrapping it in OGC API — Processes, and capturing details of the source, output, and training data usage using the PROV model, makes sense as a way to integrate it into spatial systems.
Potential code sprint outputs:
- GeoDCAT profile for PROV using LangChain workflow examples — extending and/or maturing the [existing Records-PROV Building Block (work in progress)](https://ogcincubator.github.io/geodcat-ogcapi-records/bblock/ogc.geo.geodcat.geodcat-records-prov); note that for a start this can simply demonstrate use of the generic PROV pattern, supported with real workflow examples
- Define a new Building Block PROV-AI profile for AI defining types of activities, using TrainingML
- GeoDCAT+PROV-AI profile (a richer version of the first, with a full description of activity and training set specifics)
- Code for generic AirFlow components to capture provenance — wrappers for existing components?
- Code for LangFlow to capture specific provenance details
- OGC API — Processes profile for including PROV (draft to be available for review and testing)
- Code for AirFlow to generate an OGC API — Processes interface to export output and provenance
- STAC and OGC API — Records extensions/profiles for PROV-AI — building on the STAC-PROV extension
- Examples and mappings for STAC as a profile of OGC API — Records mapped to DCAT — linking GeoDCAT-PROV to STAC-PROV and Records-PROV
- Code for AirFlow to generate STAC or Records metadata traces for generated objects
- Code for AirFlow to import STAC or Records metadata traces for referenced data objects and attach them to the generated provenance trace
- Extend existing code for pygeoapi to deliver Records with the PROV profile
- Extend any OGC Records or STAC client to display provenance information
- Extend an OGC Records or STAC editor to display and manage provenance information

These are all fairly small individually, but publishing the profiles as Building Blocks, or testing existing ones with these scenarios, ensures an output that can be built upon systematically, so any combination of the tasks has value in progressing GeoDCAT and OGC APIs as an interoperability framework for XGeoAI: “Explainable AI for Geospatial”.
2.2. Motivating Use Case: Data Format and Validation Service For Interoperability Between Analytics and Data
The following concept was submitted by AURIN, as an e-Research infrastructure provider.
_As systems become more complicated, and a system-of-systems approach becomes the norm rather than the dream, it is important that datasets and containerised analytics are able to declare, trust, and verify that data meets certain assertions and standards. AURIN has been contemplating an idea for building a Data Format and Validation Registry, first internally and then externally, that would mix persistent identifiers, human-readable data assertions, and machine-readable and executable assertions on data formats to add an automated “trust, with verification” layer to their data/analytics ecosystem.
The basic sketch would be to have a minted persistent identifier (similar to a DOI or an ORCID) for a description that links to a human-readable and machine-readable page describing what the data format is and what assertions data would have to meet to be considered valid. A preliminary prototype of the service could have a standard DOI-type link that resolves to a page with human-readable metadata about the data format, together with validation scripts or attached assertions in a validation language such as Great Expectations or similar.
The challenge, beyond vetting the approach, is to think about what metadata should be provided on the other end of that “format persistent identifier” link that would allow a service to independently and automatically verify that a dataset complies with the standard it claims to adhere to._
This challenge is open-ended to an extent; however, the framework articulated during the code sprint (Clause 4.2.2) provides a basis for designing such a metadata capability in a scalable, sustainable fashion using fit-for-purpose standardised components. The completed exercises using Provenance as such a component highlight the feasibility of combining semantically rich components with OGC and ISO “base” standards.
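A hedged sketch of the “trust, with verification” flow follows. The registry contents, PID and assertion kinds are invented; a production service might attach Great Expectations suites, JSON Schema, or SHACL shapes rather than the simple checks shown:

```python
import csv, io

# Stand-in for dereferencing a format persistent identifier over HTTP.
FORMAT_REGISTRY = {
    "https://pid.example.org/format/poi-csv-v1": {
        "documentation": "https://pid.example.org/format/poi-csv-v1/doc",
        "assertions": [
            {"kind": "required-columns", "columns": ["id", "name", "lat", "lon"]},
            {"kind": "numeric-range", "column": "lat", "min": -90, "max": 90},
        ],
    },
}

def validate(dataset_csv, format_pid):
    """Return a list of assertion failures; an empty list means valid."""
    descriptor = FORMAT_REGISTRY[format_pid]
    rows = list(csv.DictReader(io.StringIO(dataset_csv)))
    failures = []
    for assertion in descriptor["assertions"]:
        if assertion["kind"] == "required-columns":
            missing = set(assertion["columns"]) - set(rows[0].keys())
            if missing:
                failures.append("missing columns: %s" % sorted(missing))
        elif assertion["kind"] == "numeric-range":
            for row in rows:
                value = float(row[assertion["column"]])
                if not assertion["min"] <= value <= assertion["max"]:
                    failures.append("%s=%s out of range" % (assertion["column"], value))
    return failures

print(validate("id,name,lat,lon\n1,UNSW,-33.917,151.231\n",
               "https://pid.example.org/format/poi-csv-v1"))  # -> []
```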
3. High-Level Architecture
3.1. Overview
As illustrated in Figure 1, the sprint architecture is somewhat loose: the aim was to identify what aspects of metadata standardisation the community considers necessary, but also to explore the commonalities and principles that can be discovered from current activities.
It was not assumed that specific standards could be developed or tested thoroughly. However, the general principle of extending multiple metadata standards with extensions based on common models was prioritised, with a focus on Provenance. Provenance represents a complex challenge with a well-known conceptual model, so testing whether a solution to this complex challenge can be successfully re-used provides clear direction for the problem of extending metadata standards for any specific aspect.
The key methodology was the testing of examples based on prior work to model a reusable JSON schema for PROV-O with a ready-to-use JSON-LD semantic mapping, thus encapsulating a complex problem to make implementation feasible. Some code sprint activities tested the feasibility of re-using such an extension.
Figure 1 — High Level Overview of the Sprint Architecture
Figure 2 illustrates some of the key metadata standards discussed and how they might relate as extensions, profiles, or mappings.
Figure 2 — Relationships between metadata standards, extensions and profiles
The rest of this section describes the software deployed, and standards implemented during the code sprint or in support of the code sprint.
3.2. Approved OGC Standards
3.2.1. OGC API — Features
The OGC API — Features Standard offers the capability to create, manage, and query spatial data on the Web. The Standard specifies requirements and recommendations for Web APIs that are designed to facilitate the sharing of feature data. The specification is a multi-part standard. Part 1, labelled the Core, describes the mandatory capabilities that every implementing service has to support and is restricted to read-access to spatial data that is referenced to the World Geodetic System 1984 (WGS 84) Coordinate Reference System (CRS) (OGC 17-069r4). Part 2 enables the use of different CRSs, in addition to the WGS 84 (OGC 18-058r1). Additional capabilities that address specific needs will be specified in additional parts. Envisaged future capabilities include, for example, support for creating and modifying data, more complex data models, and richer queries.
3.2.2. OGC API — Processes
The OGC API — Processes Standard supports the wrapping of computational tasks into executable processes that can be offered by a server through a Web API and be invoked by a client application (OGC 18-062r2). The Standard enables the execution of computing processes and the retrieval of metadata describing the purpose and functionality of the processes. Typically, these processes execute well-defined algorithms that ingest vector and/or coverage data to produce new datasets.
The OGC API — Processes — Part 2: Deploy, Replace, Undeploy candidate Standard extends the core capabilities specified in the OGC API — Processes — Part 1: Core (OGC 18-062r2) with the ability to dynamically add, modify and/or delete individual processes using an implementation (endpoint) of the OGC API — Processes Standard.
3.3. Approved ISO Standards
ISO 19115-1:2014 Geographic information — Metadata — Part 1: Fundamentals
Note: many in Europe still use ISO 19115:2003
ISO 19115-3 Geographic information — Metadata — Part 3: XML schema implementation for fundamental concepts
Note: many in Europe still use the XML encoding of ISO 19115:2003 which is in ISO/TS 19139:2007
3.4. Candidate OGC Standards
3.4.1. OGC API — Records
The OGC API — Records candidate Standard provides discovery and access to metadata records that describe resources such as features, coverages, tiles / maps, models, assets, datasets, services, or widgets (OGC 20-004). The candidate Standard enables the discovery of geospatial resources by standardizing the way collections of descriptive information about the resources (metadata) are exposed. The candidate Standard also enables the discovery and sharing of related resources that may be referenced from geospatial resources or their metadata by standardizing the way all kinds of records are exposed and managed.
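For orientation, a catalog record is a GeoJSON Feature. The record below is illustrative and follows the draft core record schema of OGC 20-004; since the Standard is still a draft, field names may change:

```python
import json

# Field names follow the draft core record schema (OGC 20-004); ".." marks an
# open-ended interval. All identifiers and URLs are invented.
record = {
    "id": "unsw-placenames-sample",
    "type": "Feature",
    "time": {"interval": ["2024-01-01T00:00:00Z", ".."]},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[112.9, -43.7], [153.6, -43.7], [153.6, -10.7],
                         [112.9, -10.7], [112.9, -43.7]]],
    },
    "properties": {
        "type": "dataset",
        "title": "Australian placenames (sample)",
        "description": "Sample gazetteer records used during the code sprint.",
        "keywords": ["placenames", "gazetteer", "Australia"],
    },
    "links": [
        {"rel": "describes", "type": "application/geo+json",
         "href": "https://example.org/collections/placenames/items"}
    ],
}
print(json.dumps(record, indent=2))
```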
3.5. Candidate ISO Standards
The roadmap for the ISO 19115-4 JSON encoding was discussed. The code sprint illustrated how modularity for such an encoding could be approached.
ISO/CD 19157-3 Geographic information — Data quality — Part 3: Data quality measures register was also discussed, in particular its draft implementation on the OGC Development Server.
ISO/TC 211 plans to revise ISO 19115-1:2014 were discussed, particularly the plan for ISO/TC 211 to publish a DCAT mapping.
3.7. Software Projects and Products
3.7.1. OSGeo pygeoapi
pygeoapi is a Python server implementation of the OGC API suite of Standards. The project emerged as part of the next generation OGC API efforts in 2018 and provides the capability for organizations to deploy a RESTful OGC API endpoint using OpenAPI, GeoJSON, and HTML. pygeoapi is open source and released under an MIT license. pygeoapi is an official OSGeo Project as well as an OGC Reference Implementation. pygeoapi supports numerous OGC API Standards. The official documentation provides an overview of all supported standards.
4. Results
4.1. Overview
The code sprint included multiple software applications and experimented with several standards. This section presents the key results from the code sprint.
4.2. Metadata standards alignment
4.2.1. Metadata in motion
The workshop discussion agreed on the following principles which help inform metadata interoperability requirements:
Metadata is best captured at source
Systems therefore have a need to ingest and pass on metadata from source to output, inline or by reference
Metadata should preserve canonical references
Schemas must be available for the metadata elements being passed
Descriptions of those schemas are necessary, ideally as machine-readable semantics
This pattern of encapsulating metadata and references is well known as a “provenance chain”
Provenance chains will form a DAG (Directed Acyclic Graph), as the same nodes may appear in multiple places
Schemas and related artifacts for a DAG pattern are not trivial, hence it is critical to encapsulate a good design and make it available for reuse (see the sketch below)
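The following minimal sketch (all identifiers invented) illustrates why a provenance chain is a DAG rather than a tree: the same source entity feeds two activities whose outputs are merged, so identifier-based references are needed to avoid duplicating nodes in a purely nested encoding:

```python
# "source.csv" is used by two activities whose outputs feed one merge step,
# so it would appear twice in a nested tree; identifiers let each node occur
# once and be referenced wherever needed.
provenance = [
    {"id": "source.csv", "type": "Entity"},
    {"id": "clean", "type": "Activity",
     "used": ["source.csv"], "generated": ["clean.csv"]},
    {"id": "geocode", "type": "Activity",
     "used": ["source.csv"], "generated": ["points.geojson"]},
    {"id": "merge", "type": "Activity",
     "used": ["clean.csv", "points.geojson"], "generated": ["final.gpkg"]},
]

def ancestors(entity_id, graph):
    """Walk the DAG upwards from an entity to every contributing entity."""
    found = set()
    for node in graph:
        if node.get("type") == "Activity" and entity_id in node.get("generated", []):
            for used in node["used"]:
                found |= {used} | ancestors(used, graph)
    return found

print(ancestors("final.gpkg", provenance))
# -> {'clean.csv', 'points.geojson', 'source.csv'}
```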
In line with these principles, several activities focussed on the potential reusability of a PROV model using a formalised JSON schema. See Clause 5.5.
4.2.2. A framework for developing metadata standards
It is commonly understood that profiles are necessary to implement a general metadata standard in the context of a specific application domain or community of practice.
The description and management of such profiles is however highly variable in implementation practice.
One key pattern is the “Core + extensions” model — characterised by the OGC ModSpec (OGC 08-131r3), and implemented in various forms including:
STAC extensions
DCAT profiles published for use in EU Portals
ISO 19115 Parts 2, 3, etc.
In addition, metadata elements defined by different profiles may be combined (composition).
Finally, metadata specifications may be modelled and composed, but bundled into a single form for implementation convenience — such as JSON Schema for OpenAPI 3.0 (OpenAPI 3.1 does not require this, as it follows more modern JSON Schema). A sketch of this composition-and-bundling pattern is shown below.
And, since many systems have their own technology base, and metadata describing re-use means that legacy systems will be the source of much metadata, no single solution will hold universal sway; mappings between standards will therefore be a common concern.
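As a hedged sketch (the schema names and content are invented), the following shows the “core + extensions” composition via allOf, and a naive bundler that inlines $ref targets to produce the single self-contained schema document that OpenAPI 3.0 tooling expects:

```python
# Core record schema plus a provenance extension, composed into a profile.
CORE = {"type": "object", "required": ["title"],
        "properties": {"title": {"type": "string"}}}
PROV_EXTENSION = {"type": "object",
                  "properties": {"provenance": {"type": "object"}}}

SCHEMAS = {"core.json": CORE, "prov.json": PROV_EXTENSION}

profile = {"allOf": [{"$ref": "core.json"}, {"$ref": "prov.json"}]}

def bundle(schema):
    """Recursively replace $ref pointers with the referenced schema."""
    if isinstance(schema, dict):
        if "$ref" in schema:
            return bundle(SCHEMAS[schema["$ref"]])
        return {key: bundle(value) for key, value in schema.items()}
    if isinstance(schema, list):
        return [bundle(item) for item in schema]
    return schema

bundled = bundle(profile)  # one document, no external references
print(bundled)
```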
This results in a simple architectural framework for understanding the scope of different metadata standardisation activities.
Figure 3 — Metadata Standardisation Patterns
Note that mappings themselves relate to the different ways metadata standards are defined, in that mappings for core and extensions, or for restricted profiles, can be composed into a complete mapping and bundled into some executable form.
The bundling of JSON-LD contexts supported by the OGC Building Block Register represents evidence that these patterns are inevitable, complex, but feasible if a level of standardisation is imposed on the standards themselves. This may be in the form of FAIR machine readable descriptions of available standards, noting the requirement to transparently reference the authoritative standard being described.
4.2.3. Key Recommendations
Explore alignment of metadata standards and development approaches, using the framework to identify common approaches to different aspects where possible.
Publish mappings between metadata standards in a FAIR way.
4.4. Candidate OGC Standards
4.4.1. OGC API — Records and Provenance
Experimentation related to OGC API — Records included extension of pygeoapi to enable support for including Provenance details in Records. (Note that a previous code sprint explored capture of provenance in OGC API — Processes.)
The provenance model used is based on the W3C provenance model and a [draft “Building Block” that defines a JSON schema and a mapping directly to PROV using JSON-LD](https://ogcincubator.github.io/bblock-prov-schema/bblock/ogc.ogc-utils.prov).
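As a minimal illustration of this schema-plus-context approach, the sketch below applies a hand-written JSON-LD context (a stand-in for the context published with the draft Building Block, not a copy of it) to a plain JSON fragment, and uses rdflib (version 6 or later, which bundles JSON-LD support) to obtain W3C PROV triples:

```python
import json
from rdflib import Graph  # rdflib 6+ includes a JSON-LD parser

# A hand-written stand-in for the Building Block's published context.
CONTEXT = {
    "@vocab": "http://www.w3.org/ns/prov#",
    "id": "@id",
    "type": "@type",
    "used": {"@type": "@id"},
    "generated": {"@type": "@id"},
}

activity = {
    "@context": CONTEXT,
    "id": "urn:example:run-42",
    "type": "Activity",
    "used": "urn:example:osmdata.shp",
    "generated": "urn:example:answer-1",
    "endedAtTime": "2024-11-19T05:07:34Z",
}

g = Graph().parse(data=json.dumps(activity), format="json-ld")
print(g.serialize(format="turtle"))  # yields prov:Activity, prov:used, etc.
```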
Several experiments were undertaken: generation of a provenance chain from AI-based workflows was successfully demonstrated, as was rendering of such content in Records “landing pages”. A detailed summary is presented in Clause 4.6.1.
4.4.2. OGC API — Records and GeoDCAT
Some challenges were highlighted, with reference to mapping the JSON schema of OGC API Records (RecordJSON) to DCAT, including similar element names with different semantics and the need to model the structure of Records.
Even though little was done to progress testing this mapping further, a significant finding is that no alternative approach to machine-readable mappings was identified by any participant; the consensus was that some machine-readable FAIR mapping is necessary.
TBD- feedback from Conterra — was any progress made?
4.5. Common Patterns
4.5.1. XAI — Explainable AI
Two teams explored the capture of provenance from “GeoAI” workflows.
Figure 4 shows the target architecture for a workflow based on the LangFlow toolkit, which builds on the open source Apache AirFlow tools.
Within the limited time available in the code sprint, it was shown that generating a provenance trace conforming to the proposed standardised schema was achievable. It was not feasible within the same limited time to learn and implement OGC API — Processes; however, another participant has demonstrated this is feasible as part of work on DGGS APIs.
Figure 4 — Architecture to generate OGC API Records with Provenance from AI processes.
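As a hedged illustration of this wrapper pattern (the team's actual code is not reproduced here), the sketch below executes an arbitrary workflow step, a stand-in for an LLM call, and records a PROV-style Activity with its inputs, outputs and timing, shaped like the trace in Listing 1:

```python
import datetime, json

def run_with_provenance(step, step_id, inputs):
    """Execute a workflow step and return (outputs, PROV-style trace)."""
    started = datetime.datetime.now(datetime.timezone.utc).isoformat()
    outputs = step(inputs)
    ended = datetime.datetime.now(datetime.timezone.utc).isoformat()
    trace = {
        "prov:type": "prov:Activity",
        "id": step_id,
        "startedAtTime": started,
        "endedAtTime": ended,
        "used": [{"id": name, "type": "Entity", "data": value}
                 for name, value in inputs.items()],
        "generated": [{"id": name, "type": "Entity", "data": value}
                      for name, value in outputs.items()],
    }
    return outputs, trace

# Stand-in for an LLM call; any callable taking and returning dicts works.
outputs, trace = run_with_provenance(
    lambda inp: {"answer": inp["question"].upper()},
    "urn:example:llm-step",
    {"question": "How far away is the closest hospital from UNSW Village?"})
print(json.dumps(trace, indent=2))
```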
An example of the output of one such workflow is here:
{
"prov:type": "prov:Activity",
"generated": {
"id": "output",
"type": "Entity",
"AgentType": "SoftwareAgent",
"response": [
{
"id": "LLM Generated Code",
"type": "Entity",
"wasGeneratedBy": "gemini-1.5-pro-001",
"data": "gdf.to_crs(epsg=7856).set_index('name').loc['UNSW Village'].geometry.distance(gdf.to_crs(epsg=7856)[gdf.amenity == 'hospital'].geometry).min()"
},
{
"id": "Code Output",
"type": "Entity",
"data": "511.8048618048641"
},
{
"id": "Final Output",
"type": "Entity",
"wasGeneratedBy": "gemini-1.5-flash-001",
"data": "The closest hospital to UNSW Village is approximately 512 meters away."
}
]
},
"startedAtTime": "2024-11-19T05:07:22.927913Z",
"endedAtTime": "2024-11-19T05:07:34.304708Z",
"used": [
{
"id": "file",
"type": "Entity",
"AgentType": "Person",
"data": [
{
"id": "osmdata.shp",
"type": "Entity",
"records": 3544
}
]
},
{
"id": "user_input",
"type": "Entity",
"input": "How far away is the closest hospital from UNSW village"
}
]
}
Listing 1
Note that further work is required to characterise the ad-hoc “data” as persisted datasets described using OGC API — Records schemas. The main challenge in this case is the establishment of persistent identifiers; however, the schema supports such ad-hoc inline content options.
4.5.2. Semantic spatial data enrichment
One team explored transparent mechanisms for annotating spatial data with metadata information as part of a semantic enrichment pipeline.
The team selected the creation of an up-to-date placenames knowledge graph for Australia as its case study, using the placenames ontology and the 2017 FSDF placenames project as inspiration. The approach can be thought of as an update of the Australian FSDF placenames project. Using RML (RDF Mapping Language), the team created RML mappings for each state from JSON and CSV gazetteer data, following rules set out in the placenames ontology.
The results demonstrated the potential for RML to support transparent uplift of data with standards-based metadata as an integral part of the semantic data enrichment pipeline. The key advantage of RML as a tool for this purpose is that it enables the definition of sophisticated and standards-based logic for enrichment, separated from the mechanism that performs the enrichment.
A fragment of the RML generated for enriching the South Australian GeoJSON gazetteer data with metadata is below. Similar mappings were generated for all Australian states and territories.
@prefix rml: <http://semweb.mmlab.be/ns/rml#>.
@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix fsdf: <http://linked.data.gov.au/def/fsdf/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
...
<#SA_GazetteerSites2020_DS>
rml:source "./SA/SA_sampleGazetteerSites_GDA2020.geojson" ;
rml:referenceFormulation ql:JSONPath ;
rml:iterator "$".
<#SA_GeoFeatures_GazetteerSites2020_DS>
rml:source "./SA/SA_sampleGazetteerSites_GDA2020.geojson" ;
rml:referenceFormulation ql:JSONPath ;
rml:iterator "$.features[*]".
## CSV with Geometry in WKT
<#SA_SitesSource> a rml:LogicalSource ;
rml:source "./SA/sites.csv" ;
rml:referenceFormulation ql:CSV ;
rml:iterator "$" .
## CSV with metadata for SA
<#SA_SitesMetaDataSource> a rml:LogicalSource ;
rml:source "./meta_data_SA.csv" ;
rml:referenceFormulation ql:CSV ;
rml:iterator "$" .
################################## MetaData Mapping ##################################
<#MetaDataMapping> a rr:TriplesMap;
rml:logicalSource <#SA_SitesMetaDataSource>;
rr:subjectMap [
rr:template "http://example.com/{State}/MetaData/ds_{DatasetNumber}";
rr:class dcat:Dataset;
];
rr:predicateObjectMap [
rr:predicate dcterms:identifier;
rr:objectMap [
rml:reference "DatasetNumber"
]
] ;
rr:predicateObjectMap [
rr:predicate dcterms:title;
rr:objectMap [
rml:reference "Title"
]
] ;
rr:predicateObjectMap [
rr:predicate dcterms:description;
rr:objectMap [
rml:reference "Description"
]
];
rr:predicateObjectMap [
rr:predicate fsdf:hasCustodian;
rr:objectMap [
rr:parentTriplesMap <#MetaDataCustodianMapping> ;
]
];
rr:predicateObjectMap [
rr:predicate dcterms:issued;
rr:objectMap [
rml:reference "DatasetAcquiredOn";
rr:datatype xsd:date
]
];
rr:predicateObjectMap [
rr:predicate dcterms:license;
rr:objectMap [
rml:reference "licence";
rr:datatype xsd:string
]
];
rr:predicateObjectMap [
rr:predicate dcterms:publisher;
rr:objectMap [
rml:reference "Publisher";
rr:datatype xsd:string
]
];
...
Listing 2
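For illustration, such a mapping could be executed with an RML processor. The sketch below assumes the Python morph-kgc engine and an invented mapping file name; the sprint team's actual tooling may have differed, and the Java RMLMapper CLI is an equivalent alternative:

```python
import morph_kgc

# Configuration pointing at the (hypothetical) saved mapping file.
CONFIG = """
[SA_Gazetteer]
mappings: ./sa_gazetteer.rml.ttl
"""

graph = morph_kgc.materialize(CONFIG)  # returns an rdflib Graph
graph.serialize(destination="sa_gazetteer.ttl", format="turtle")
print(len(graph), "triples generated")
```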
4.6. Software Projects and Products
4.6.1. OSGeo pygeoapi
4.6.1.1. Extension of Records with PROV using pygeoapi templates
pygeoapi’s template system was extended to support provenance traces.
Provenance metadata naturally forms a directed acyclic graph (DAG) of arbitrary depth and detail.
A simple catalog test case was established using the provenance model and the schema from https://ogcincubator.github.io/bblock-prov-schema/bblock/ogc.ogc-utils.prov.
A generalised provenance rendering template was then referenced when rendering a standardised property (provenance). The template is configured to understand the high-level abstract typing of Entity, Activity and Agent, and recursively nests content (https://github.com/MarkusWilhelmJahn/pygeoapi/blob/master/CODESPRINT/prov_recursive.html).
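The linked template is not reproduced here, but the following minimal Jinja2 sketch (pygeoapi templates are Jinja2-based) shows the recursive idea with invented property names and an illustrative node structure: a macro calls itself to walk nested provenance nodes to arbitrary depth:

```python
import jinja2

# A macro that calls itself to render nested provenance nodes as nested lists.
TEMPLATE = """
{%- macro render(node) -%}
<li>{{ node.get("type", "?") }}: {{ node.get("id", "") }}
{%- for key in ("used", "generated") -%}
  {%- if node.get(key) -%}
  <ul>{% for child in node[key] %}{{ render(child) }}{% endfor %}</ul>
  {%- endif -%}
{%- endfor -%}
</li>
{%- endmacro -%}
<ul>{{ render(prov) }}</ul>
"""

prov = {"id": "run-42", "type": "Activity",
        "used": [{"id": "osmdata.shp", "type": "Entity"}],
        "generated": [{"id": "answer-1", "type": "Entity"}]}

print(jinja2.Template(TEMPLATE).render(prov=prov))
```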
The sprint template successfully presented such a provenance graph, as shown:
Figure 5 — Rendering an embedded, standard provenance schema in OGC API Record
This has three significant implications for considering metadata standardisation:
A rich provenance model can be handled with current reference implementation software and hence there is no substantial technical barrier
Well-known metadata extensions can be supported using well-known attachment properties, which will be necessary for each object type (record, feature etc)
Well-known schemas will be required to allow predictable behaviour, given the inherent complexity of some types of metadata.
It is also worth noting that a scalable solution will probably require more dynamic UI filtering, together with the ability to load referenced objects that are not directly embedded in the results.
5. Discussion
5.1. ISO 19115-4
The draft project plan for ISO 19115-4 (JSON encoding) and revisions to 19115-1 was presented and feedback solicited.
The issue of modular implementation was discussed. Of particular relevance to the code sprint was the statement “Recommendation is to use PROV as a vocabulary for encoding ‘lineage’ within a DCAT context”. The code sprint undertook several experiments to validate the feasibility of this from the perspective of re-use of a draft JSON schema that is compatible with GeoJSON Features and OGC API — Records, and supports a FAIR mapping to DCAT.
5.2. Identification of common patterns and challenges
As a result of discussion of the various metadata standards being examined by participants during the code sprint, a “framework” for characterising the common patterns observed was developed. This is described in detail in Clause 4.2.2, since the consensus reached around this understanding represents a significant outcome of the code sprint, and a theory that can be systematically tested and operationalised in future efforts to support standards application and alignment.
5.3. ISO 19115 DCAT mapping
Publish a conceptual mapping from ISO 19115-1:2014 to DCAT, preferably OGC GeoDCAT
Recommendation is to use PROV as a vocabulary for encoding ‘lineage’ within a DCAT context
This should be as lossless as possible
It should state preferred choices where there are several possibilities
As well as a ‘concept to concept’ mapping, there need to be some instructions on restructuring e.g. when to put resource constraints on the DCAT dataset and when to put them on the DCAT distribution.
Consider publishing this mapping in a machine readable way, perhaps on the OGC incubator
On the last point, it was noted that OGC has developed an approach to publish mappings from JSON to RDF models — by mapping JSON schemas to JSON-LD, and via the URI identifiers to RDF models.
Work is underway to support mapping OGC API Records and STAC to GeoDCAT — this can be applied to ISO 19115-4 JSON encoding, and can be co-designed with it to minimise the friction between schema patterns and conceptual mappings.
5.4. FAIR metadata mappings
Examples from the mapping of OGC API Records to DCAT were shown, highlighting the additional challenges when schemas use labels for elements with different semantics to the target conceptual model.
For example, it was shown that the “language” element in OGC API — Records refers to the language of the record, whereas DCAT uses the Dublin Core property “dct:language”, which references the language of the resource described by the record.
It is thus essential that metadata standards have explicit mappings published, with good examples, to meet FAIR requirements; investment in machine-readable mappings will be required to allow these nuanced relationships to be clearly identified.
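The following sketch shows how a machine-readable mapping can capture this nuance. Both statements use dct:language, but the JSON-LD structure (illustrative, not the official Records mapping) attaches the record's language to the dcat:CatalogRecord node and the resource's language to the dcat:Dataset node it describes:

```python
import json
from rdflib import Graph  # rdflib 6+ includes a JSON-LD parser

doc = {
    "@context": {
        "dct": "http://purl.org/dc/terms/",
        "dcat": "http://www.w3.org/ns/dcat#",
        "language": {"@id": "dct:language", "@type": "@id"},
        "primaryTopic": {"@id": "http://xmlns.com/foaf/0.1/primaryTopic",
                         "@type": "@id"},
    },
    "@id": "urn:example:record-1",
    "@type": "dcat:CatalogRecord",
    # language of the metadata record itself:
    "language": "http://publications.europa.eu/resource/authority/language/ENG",
    "primaryTopic": {
        "@id": "urn:example:dataset-1",
        "@type": "dcat:Dataset",
        # language of the resource the record describes:
        "language": "http://publications.europa.eu/resource/authority/language/FRA",
    },
}

g = Graph().parse(data=json.dumps(doc), format="json-ld")
print(g.serialize(format="turtle"))
```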
The following quote from one participant neatly sums this up: “A formal mapping is indeed very welcome! Our customers often start doing their own mapping validation exercise. Which is not very productive.”
5.5. Describing Provenance with OGC API Records
OGC has been exploring the use of reusable building blocks to enable elements of standards to be re-used, extended, and modified in a FAIR (Findable, Accessible, Interoperable, Reusable) way. A focus of this code sprint was the extension of various standards to include provenance information.
A draft JSON schema based on the W3C Provenance Vocabulary (PROV-O) was available for testing.
The following research questions were considered:
Can machine-readable provenance chains be created to enable scientific data re-creation?
Can the generation of such machine-readable provenance chains be automated in scientific workflows?
Can such provenance chains be exploited with current technology?
6. Conclusions
6.1. Overview
The code sprint provided a significant opportunity to highlight the heterogeneous nature of metadata standards and the perspectives of multiple stakeholders. As a result, it was possible to articulate a coherent framework for understanding the scope of different activities and to propose an architecture to guide ongoing work in metadata standards.
In addition, progress was made on a number of specific activities, validating that the proposed framework can be progressed incrementally, without it being necessary for all participants to fully engage with all the diverse aspects of metadata standardisation. These activities bore out the general principles of improving the FAIR-ness of metadata standards whilst supporting multiple existing and emerging communities of practice.
6.2. Future Work
The sprint participants made the following recommendations regarding future work:
Focus on FAIR publication of mappings between metadata standards
Support for XSLT2 transformations in metadata publishing
Ongoing activities are planned as part of research projects to complete the implementation of the XAI approach using GeoDCAT, OGC API — Processes, OGC API — Records, etc.
Annex A
(informative)
Revision History
Table — Revision History
Date | Release | Author | Primary clauses modified | Description |
---|---|---|---|---|
2024-11-19 | 0.1 | D. Stolarz | all | capture scope discussions |
2024-12-04 | 0.2 | R. Atkinson | all | Initial version |
2025-01-06 | 0.3 | R. Atkinson | all | Reformat to OGC Discussion Paper |
Bibliography
[1] Mark Burgoyne, David Blodgett, Charles Heazel, Chris Little: OGC 19-086r6, OGC API — Environmental Data Retrieval Standard. Open Geospatial Consortium (2023). http://www.opengis.net/doc/IS/ogcapi-edr-1/1.1.0.
[2] Clemens Portele, Panagiotis (Peter) A. Vretanos, Charles Heazel: OGC 17-069r4, OGC API — Features — Part 1: Core corrigendum. Open Geospatial Consortium (2022). http://www.opengis.net/doc/IS/ogcapi-features-1/1.0.1.
[3] Clemens Portele, Panagiotis (Peter) A. Vretanos: OGC 18-058r1, OGC API — Features — Part 2: Coordinate Reference Systems by Reference corrigendum. Open Geospatial Consortium (2022). http://www.opengis.net/doc/IS/ogcapi-features-2/1.0.1.
[4] Benjamin Pross, Panagiotis (Peter) A. Vretanos: OGC 18-062r2, OGC API — Processes — Part 1: Core. Open Geospatial Consortium (2021). http://www.opengis.net/doc/IS/ogcapi-processes-1/1.0.0.
[5] Joan Masó, Jérôme Jacovella-St-Louis: OGC 20-057, OGC API — Tiles — Part 1: Core. Open Geospatial Consortium (2022). http://www.opengis.net/doc/IS/ogcapi-tiles-1/1.0.0.
[6] Steve Liang, Tania Khalafbeigi, Hylke van der Schaaf: OGC 18-088, OGC SensorThings API Part 1: Sensing Version 1.1. Open Geospatial Consortium (2021). http://www.opengis.net/doc/is/sensorthings/1.1.0.
[7] ISO: ISO 19135-1:2015, Geographic information — Procedures for item registration — Part 1: Fundamentals. International Organization for Standardization, Geneva (2015). https://www.iso.org/standard/54721.html.
[8] ISO: ISO 19157-1:2023, Geographic information — Data quality — Part 1: General requirements. International Organization for Standardization, Geneva (2023). https://www.iso.org/standard/78900.html.
[9] ISO: ISO/DIS 19157-3, Geographic information — Data quality — Part 3: Data quality measures register. International Organization for Standardization, Geneva. https://www.iso.org/standard/87032.html.
[10] OGC: OGC API — Discrete Global Grid Systems — Part 1: Core (draft), Open Geospatial Consortium. https://docs.ogc.org/DRAFTS/21-038.html
[11] OGC: OGC API — Records — Part 1: Core (draft), Open Geospatial Consortium. https://docs.ogc.org/DRAFTS/20-004.html
[12] OGC: OGC Features and Geometries JSON — Part 1: Core (draft), Open Geospatial Consortium. https://docs.ogc.org/DRAFTS/21-045r1.html