The Machine Learning Sailor Ontology

Release: December 7th, 2023

Latest version:
http://w3id.org/mlso
Revision:
1.0.0
Authors:
Anastasia Dimou
Ioannis Dasoulas
Download serialization:
JSON-LD RDF/XML N-Triples TTL
License:
http://creativecommons.org/licenses/by/2.0/

Ontology Specification Draft

Abstract

An ontology for describing machine learning datasets, tasks, pipelines, experiments, software and publications. The ontology extends ML-Schema, DCAT, FaBiO and SDO.

Introduction

The Machine Learning Sailor Ontology (MLSO) is an ontology that formally represents machine learning datasets, along with their features and characteristics, machine learning tasks, their implementations, experiments and their executions, relevant software and publications, all complemented with rich metadata. The ontology extends ML-Schema, DCAT, FaBiO and SDO, complemented by 8 taxonomies formulated as controlled SKOS vocabularies.

The ontology was developed in a data-centric manner, with the goal of curating data from diverse, publicly available machine learning sources. To this end, we examined machine-learning-related data and metadata from online repositories such as OpenML, Kaggle and Papers with Code.

We favored the reuse of existing ontologies and standards while developing this ontology. We build upon and extend ML-Schema, using it as a basis to describe machine learning datasets and pipelines, and combine it with the Data Catalog Vocabulary (DCAT), the W3C Recommendation designed to describe datasets and data catalogs on the Web. We also combine ML-Schema with the Software Description Ontology (SDO) to represent machine learning software and its characteristics, and with the FRBR-aligned Bibliographic Ontology (FaBiO) to describe machine learning scientific publications and other publishable entities.

See MLSO's GitHub repository and the Turtle files for the ontology and the taxonomies.

Namespace declarations

Table 1: Namespaces used in the document
[Ontology NS Prefix]  <http://w3id.org/mlso/>
adms                  <http://www.w3.org/ns/adms#>
dcat                  <http://www.w3.org/ns/dcat>
dcat1                 <http://www.w3.org/ns/dcat#>
dcterms               <http://purl.org/dc/terms/>
edamontology          <http://edamontology.org/>
fabio                 <http://purl.org/spar/fabio/>
foaf                  <http://xmlns.com/foaf/0.1/>
mls                   <http://www.w3.org/ns/mls#>
mlso                  <http://w3id.org/mlso>
ov                    <http://open.vocab.org/terms/>
owl                   <http://www.w3.org/2002/07/owl#>
prov                  <http://www.w3.org/ns/prov#>
rdf                   <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
rdfs                  <http://www.w3.org/2000/01/rdf-schema#>
schema                <http://schema.org/>
sdo                   <https://w3id.org/okn/o/sd#>
skos                  <http://www.w3.org/2004/02/skos/core#>
xml                   <http://www.w3.org/XML/1998/namespace>
xsd                   <http://www.w3.org/2001/XMLSchema#>
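
For readers working with the Turtle serialization, a subset of the prefixes in Table 1 could be declared as follows. This is a sketch rather than the ontology's own header. It assumes that `mlso` is bound to the slash-terminated ontology namespace and `dcat` to the hash-terminated DCAT namespace, because those are the forms that the term IRIs in the cross-reference section resolve against.

```turtle
# Prefix bindings chosen so that prefixed names expand to the IRIs
# listed in the cross-reference section (an assumption of this sketch).
@prefix mlso:    <http://w3id.org/mlso/> .
@prefix mls:     <http://www.w3.org/ns/mls#> .
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix fabio:   <http://purl.org/spar/fabio/> .
@prefix sdo:     <https://w3id.org/okn/o/sd#> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .

# With these bindings, mlso:hasModality expands to
# <http://w3id.org/mlso/hasModality>, the IRI given for that property.
```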

The Machine Learning Sailor Ontology: Overview

This ontology has the following classes and properties.

Classes

Object Properties

Data Properties

Cross-reference for The Machine Learning Sailor Ontology classes, object properties and data properties

This section provides details for each class and property defined by The Machine Learning Sailor Ontology.

Classes

Agent c

IRI: http://xmlns.com/foaf/0.1/Agent

Is defined by
http://xmlns.com/foaf/0.1/

Algorithm c

IRI: http://www.w3.org/ns/mls#Algorithm

Is defined by
http://www.w3.org/ns/mls
is in domain of
has algorithm type op, has learning method type op

Catalog c

IRI: http://www.w3.org/ns/dcat#Catalog

Is defined by
http://www.w3.org/ns/dcat
has super-classes
Dataset c

Concept c

IRI: http://www.w3.org/2004/02/skos/core#Concept

Is defined by
http://www.w3.org/2004/02/skos/core#
is in range of
has algorithm type op, has data characteristic type op, has evaluation measure type op, has evaluation procedure type op, has learning method type op, has task type op, has type op, related to field op

Data c

IRI: http://www.w3.org/ns/mls#Data

Is defined by
http://www.w3.org/ns/mls
has super-classes
has sub-classes
Distribution c
is in domain of
has data loader location dp, has format op, has modality op, has variant op
is in range of
has variant op, trained on op

Data Characteristic c

IRI: http://www.w3.org/ns/mls#DataCharacteristic

Is defined by
http://www.w3.org/ns/mls
has sub-classes
HyperParameter Characteristic c
is in domain of
has data characteristic type op

Data Modality c

IRI: http://w3id.org/mlso/DataModality

Data modality refers to the different types or forms of data that exist, and it is often used to describe the way information is represented or encoded in a dataset.
has super-classes
information entity c
is in range of
has modality op

Data Service c

IRI: http://www.w3.org/ns/dcat#DataService

Is defined by
http://www.w3.org/ns/dcat
has super-classes
information entity c

Dataset c

IRI: http://www.w3.org/ns/mls#Dataset

has sub-classes
Catalog c

Dataset Characteristic c

IRI: http://www.w3.org/ns/mls#DatasetCharacteristic

Dataset Characteristic is a quality or property that distinguishes one dataset from another.
Is defined by
http://www.w3.org/ns/mls

Distribution c

IRI: http://www.w3.org/ns/dcat#Distribution

Is defined by
http://www.w3.org/ns/dcat
has super-classes
Data c
is in domain of
has MD5 dp, has default target feature op, has id feature op, ignores feature op

Evaluation Measure c

IRI: http://www.w3.org/ns/mls#EvaluationMeasure

Is defined by
http://www.w3.org/ns/mls
is in domain of
has evaluation measure type op

Evaluation Procedure c

IRI: http://www.w3.org/ns/mls#EvaluationProcedure

Is defined by
http://www.w3.org/ns/mls
is in domain of
has evaluation procedure type op

Evaluation Specification c

IRI: http://www.w3.org/ns/mls#EvaluationSpecification

Is defined by
http://www.w3.org/ns/mls

Experiment c

IRI: http://www.w3.org/ns/mls#Experiment

Is defined by
http://w3id.org/mlso

Feature c

IRI: http://www.w3.org/ns/mls#Feature

Is defined by
http://www.w3.org/ns/mls
has super-classes
is in range of
has default target feature op, has id feature op, ignores feature op

Feature Characteristic c

IRI: http://www.w3.org/ns/mls#FeatureCharacteristic

Feature Characteristic is a quality or property that distinguishes one dataset feature from another.
Is defined by
http://www.w3.org/ns/mls

Format c

IRI: http://edamontology.org/format_1950

Is defined by
http://edamontology.org/
has super-classes
information entity c
is in range of
has format op

HyperParameter c

IRI: http://www.w3.org/ns/mls#HyperParameter

Is defined by
http://www.w3.org/ns/mls

HyperParameter Characteristic c

IRI: http://w3id.org/mlso/HyperParameterCharacteristic

HyperParameter Characteristic is a quality or property that distinguishes one hyper-parameter from another.
has super-classes
Data Characteristic c
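
The subclass relations listed in the class entries above (Catalog under Dataset, Distribution under Data, HyperParameter Characteristic under Data Characteristic) correspond to axioms of the following shape. The Turtle below is a sketch derived from the IRIs given in this document, not an excerpt from the ontology file, and the `ex:` namespace is hypothetical.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix mlso: <http://w3id.org/mlso/> .
@prefix mls:  <http://www.w3.org/ns/mls#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ex:   <http://example.org/> .   # hypothetical instance namespace

# Subclass axioms as stated in the class entries.
dcat:Catalog                      rdfs:subClassOf mls:Dataset .
dcat:Distribution                 rdfs:subClassOf mls:Data .
mlso:HyperParameterCharacteristic rdfs:subClassOf mls:DataCharacteristic .

# An instance typed with the subclass ...
ex:someDistribution a dcat:Distribution .
# ... is also an mls:Data under RDFS reasoning:
#   ex:someDistribution a mls:Data .
```

One consequence is that properties whose domain is Data, such as has format and has modality, can be used on a Distribution without a domain mismatch.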

HyperParameter Setting c

IRI: http://www.w3.org/ns/mls#HyperParameterSetting

Is defined by
http://www.w3.org/ns/mls

Implementation c

IRI: http://www.w3.org/ns/mls#Implementation

Is defined by
http://www.w3.org/ns/mls
is in range of
has related implementation op

Implementation Characteristic c

IRI: http://www.w3.org/ns/mls#ImplementationCharacteristic

Is defined by
http://www.w3.org/ns/mls

Location c

IRI: http://www.w3.org/ns/prov#Location

Is defined by
http://www.w3.org/ns/prov#

Model c

IRI: http://www.w3.org/ns/mls#Model

Is defined by
http://www.w3.org/ns/mls
is in domain of
trained on op

Model Characteristic c

IRI: http://www.w3.org/ns/mls#ModelCharacteristic

Is defined by
http://www.w3.org/ns/mls

Model Evaluation c

IRI: http://www.w3.org/ns/mls#ModelEvaluation

Is defined by
http://www.w3.org/ns/mls

Process c

IRI: http://www.w3.org/ns/mls#Process

Is defined by
http://www.w3.org/ns/mls

Quality c

IRI: http://www.w3.org/ns/mls#Quality

Is defined by
http://www.w3.org/ns/mls

Run c

IRI: http://www.w3.org/ns/mls#Run

Is defined by
http://www.w3.org/ns/mls

Scientific Work c

IRI: http://w3id.org/mlso/ScientificWork

has super-classes
information entity c
is in domain of
bibliographic citation dp, scientific reference of op
is in range of
has scientific reference op
is same as
work
is also defined as
named individual

Software c

IRI: http://www.w3.org/ns/mls#Software

Is defined by
http://www.w3.org/ns/mls
is in range of
has related software op

Source Code c

IRI: https://w3id.org/okn/o/sd#SourceCode

Is defined by
https://w3id.org/okn/o/sd
has super-classes
information entity c

Study c

IRI: http://www.w3.org/ns/mls#Study

Is defined by
http://www.w3.org/ns/mls

Task c

IRI: http://www.w3.org/ns/mls#Task

Is defined by
http://www.w3.org/ns/mls
is in domain of
has train test split indices dp

Object Properties

access service op

IRI: http://www.w3.org/ns/dcat#accessService

Is defined by
http://www.w3.org/ns/dcat

achieves op

IRI: http://www.w3.org/ns/mls#achieves

Is defined by
http://www.w3.org/ns/mls

at location op

IRI: http://www.w3.org/ns/prov#atLocation

Is defined by
http://www.w3.org/ns/prov

dataset op

IRI: http://www.w3.org/ns/dcat#dataset

Is defined by
http://www.w3.org/ns/dcat

defined on op

IRI: http://www.w3.org/ns/mls#definedOn

Is defined by
http://www.w3.org/ns/mls

defines op

IRI: http://www.w3.org/ns/mls#defines

Is defined by
http://www.w3.org/ns/mls

distribution op

IRI: http://www.w3.org/ns/dcat#distribution

Is defined by
http://www.w3.org/ns/dcat

executes op

IRI: http://www.w3.org/ns/mls#executes

Is defined by
http://www.w3.org/ns/mls

has algorithm type op

IRI: http://w3id.org/mlso/hasAlgorithmType

A relation between an algorithm and the category of algorithms that it belongs to.
has super-properties
has type op
has domain
Algorithm c
has range
Concept c
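
As an illustration of how this property might be used in instance data, the Turtle sketch below links an algorithm to a SKOS concept. The `ex:` namespace and both individuals are hypothetical. In practice the concept would come from one of the MLSO taxonomies, whose IRIs are not listed in this document.

```turtle
@prefix mlso: <http://w3id.org/mlso/> .
@prefix mls:  <http://www.w3.org/ns/mls#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/> .   # hypothetical instance namespace

# A hypothetical algorithm categorised by a hypothetical concept.
ex:randomForest a mls:Algorithm ;
    mlso:hasAlgorithmType ex:ensembleMethods .

ex:ensembleMethods a skos:Concept ;
    skos:prefLabel "Ensemble methods"@en .

# Because has algorithm type is a sub-property of has type,
# an RDFS reasoner would also infer:
#   ex:randomForest mlso:hasType ex:ensembleMethods .
```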

has data characteristic type op

IRI: http://w3id.org/mlso/hasDataCharacteristicType

A relation between a data characteristic and the category of data characteristics that it belongs to.
has super-properties
has type op
has domain
Data Characteristic c
has range
Concept c

has default target feature op

IRI: http://w3id.org/mlso/hasDefaultTargetFeature

A relation between a dataset and a feature that is the default target feature.
has domain
Distribution c
has range
Feature c

has evaluation measure type op

IRI: http://w3id.org/mlso/hasEvaluationMeasureType

A relation between an evaluation measure and the category of evaluation measures that it belongs to.
has super-properties
has type op
has domain
Evaluation Measure c
has range
Concept c

has evaluation procedure type op

IRI: http://w3id.org/mlso/hasEvaluationProcedureType

A relation between an evaluation procedure and the category of evaluation procedures that it belongs to.
has super-properties
has type op
has domain
Evaluation Procedure c
has range
Concept c

has format op

IRI: http://w3id.org/mlso/hasFormat

A relation between a data file and its format.
has domain
Data c
has range
Format c

has hyper parameter op

IRI: http://www.w3.org/ns/mls#hasHyperParameter

Is defined by
http://www.w3.org/ns/mls

has id feature op

IRI: http://w3id.org/mlso/hasIdFeature

A relation between a dataset and a feature that is used for identifying the different instances of the dataset.
has domain
Distribution c
has range
Feature c

has input op

IRI: http://www.w3.org/ns/mls#hasInput

Is defined by
http://www.w3.org/ns/mls

has learning method type op

IRI: http://w3id.org/mlso/hasLearningMethodType

A relation between an algorithm and the category of learning methods that it belongs to.
has super-properties
has type op
has domain
Algorithm c
has range
Concept c

has modality op

IRI: http://w3id.org/mlso/hasModality

A relation between a data entity and its data modality.
has domain
Data c
has range
Data Modality c
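
A minimal Turtle sketch of has modality, shown together with has format since both take a Data entity as subject. All `ex:` individuals are hypothetical, and typing `ex:csv` directly as an instance of the Format class is an assumption about how format values are modelled.

```turtle
@prefix mlso: <http://w3id.org/mlso/> .
@prefix mls:  <http://www.w3.org/ns/mls#> .
@prefix ex:   <http://example.org/> .   # hypothetical instance namespace

# A hypothetical data entity with a modality and a format.
ex:salesTable a mls:Data ;
    mlso:hasModality ex:tabular ;
    mlso:hasFormat   ex:csv .

# Range of has modality: Data Modality.
ex:tabular a mlso:DataModality .

# Range of has format: the Format class, whose IRI is
# http://edamontology.org/format_1950.
ex:csv a <http://edamontology.org/format_1950> .
```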

has output op

IRI: http://www.w3.org/ns/mls#hasOutput

Is defined by
http://www.w3.org/ns/mls

has part op

IRI: http://www.w3.org/ns/mls#hasPart

Is defined by
http://www.w3.org/ns/mls

has quality op

IRI: http://www.w3.org/ns/mls#hasQuality

Is defined by
http://www.w3.org/ns/mls

has related implementation op

IRI: http://w3id.org/mlso/hasRelatedImplementation

A relation between an entity and a machine learning implementation that leverages, references or is related to this entity in some way.
has domain
information entity c
has range
Implementation c

has related software op

IRI: http://w3id.org/mlso/hasRelatedSoftware

A relation between an entity and a software implementation that leverages, references or is related to this entity in some way.
has domain
information entity c
has range
Software c

has scientific reference op

IRI: http://w3id.org/mlso/hasScientificReference

A relation between an entity and a scientific work that references it.
has range
Scientific Work c
is inverse of
scientific reference of op
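
The sketch below shows one way this property could connect an entity to a Scientific Work, using the bibliographic citation data property that has Scientific Work as its domain. The individuals and the citation string are placeholders, not real data.

```turtle
@prefix mlso:    <http://w3id.org/mlso/> .
@prefix mls:     <http://www.w3.org/ns/mls#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.org/> .   # hypothetical instance namespace

# A hypothetical dataset referenced by a hypothetical paper.
ex:dataset1 a mls:Dataset ;
    mlso:hasScientificReference ex:paper1 .

ex:paper1 a mlso:ScientificWork ;
    dcterms:bibliographicCitation "Doe, J. (2023). A placeholder citation." .

# Since scientific reference of is declared as the inverse property,
# an OWL reasoner would also infer:
#   ex:paper1 mlso:scientificReferenceOf ex:dataset1 .
```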

has task type op

IRI: http://w3id.org/mlso/hasTaskType

A relation between a task and the category of tasks that it belongs to.
has super-properties
has type op
has range
Concept c

has type op

IRI: http://w3id.org/mlso/hasType

has variant op

IRI: http://w3id.org/mlso/hasVariant

A relation between two data entities that share significant similarities but differ in certain characteristics or aspects.
has domain
Data c
has range
Data c
is inverse of
is variant of op
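
To make the inverse relation concrete, the Turtle sketch below states the inverse axiom as described in this entry and applies it to two hypothetical data entities. Only the first instance triple needs to be asserted. The second follows under OWL reasoning.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix mlso: <http://w3id.org/mlso/> .
@prefix mls:  <http://www.w3.org/ns/mls#> .
@prefix ex:   <http://example.org/> .   # hypothetical instance namespace

# Inverse axiom, as listed under "is inverse of".
mlso:hasVariant owl:inverseOf mlso:isVariantOf .

# Asserted: a hypothetical original and its variant.
ex:originalData a mls:Data ;
    mlso:hasVariant ex:resampledData .

ex:resampledData a mls:Data .

# Inferred by an OWL reasoner from the inverse axiom:
#   ex:resampledData mlso:isVariantOf ex:originalData .
```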

ignores feature op

IRI: http://w3id.org/mlso/ignoresFeature

A relation between a dataset and a feature that is to be ignored when processing the dataset.
has domain
Distribution c
has range
Feature c
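
The three Distribution-to-Feature properties (has default target feature, has id feature and ignores feature) could be combined on a single distribution as sketched below. The distribution and its features are hypothetical names chosen for illustration.

```turtle
@prefix mlso: <http://w3id.org/mlso/> .
@prefix mls:  <http://www.w3.org/ns/mls#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ex:   <http://example.org/> .   # hypothetical instance namespace

# A hypothetical distribution annotated with feature roles.
ex:customersCsv a dcat:Distribution ;
    mlso:hasDefaultTargetFeature ex:churned ;      # feature to predict by default
    mlso:hasIdFeature            ex:customerId ;   # identifies each instance
    mlso:ignoresFeature          ex:freeTextNotes . # skipped during processing

ex:churned       a mls:Feature .
ex:customerId    a mls:Feature .
ex:freeTextNotes a mls:Feature .
```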

implements op

IRI: http://www.w3.org/ns/mls#implements

Is defined by
http://www.w3.org/ns/mls

is variant of op

IRI: http://w3id.org/mlso/isVariantOf

is inverse of
has variant op

publisher op

IRI: http://purl.org/dc/terms/publisher

Is defined by
http://purl.org/dc/terms

realizes op

IRI: http://www.w3.org/ns/mls#realizes

Is defined by
http://www.w3.org/ns/mls

related to field op

IRI: http://w3id.org/mlso/relatedToField

A relation between an instance and the machine learning field that it is associated with.
has range
Concept c

scientific reference of op

IRI: http://w3id.org/mlso/scientificReferenceOf

has domain
Scientific Work c
is inverse of
has scientific reference op

serves dataset op

IRI: http://www.w3.org/ns/dcat#servesDataset

Is defined by
http://www.w3.org/ns/dcat

specified by op

IRI: http://www.w3.org/ns/mls#specifiedBy

Is defined by
http://www.w3.org/ns/mls

status op

IRI: http://www.w3.org/ns/adms#status

Is defined by
http://www.w3.org/ns/adms

trained on op

IRI: http://w3id.org/mlso/trainedOn

A relation between a machine learning model and the data that it was trained on.
has domain
Model c
has range
Data c
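
A short Turtle sketch of trained on, which also illustrates what the declared domain and range imply. The individuals are hypothetical and deliberately left untyped to show the inference.

```turtle
@prefix mlso: <http://w3id.org/mlso/> .
@prefix ex:   <http://example.org/> .   # hypothetical instance namespace

# A single asserted triple between two untyped, hypothetical resources.
ex:classifierV1 mlso:trainedOn ex:trainingSplit .

# Given the domain (Model) and range (Data) of trained on,
# an RDFS reasoner would infer the following types:
#   ex:classifierV1  a <http://www.w3.org/ns/mls#Model> .
#   ex:trainingSplit a <http://www.w3.org/ns/mls#Data> .
```

Domain and range act as inference rules here rather than validation constraints, so a triple with an unexpected subject is not rejected. The subject is simply classified as a Model.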

Data Properties

abstract dp

IRI: http://purl.org/spar/fabio/abstract

Is defined by
http://purl.org/spar/fabio

bibliographic citation dp

IRI: http://purl.org/dc/terms/bibliographicCitation

has domain
Scientific Work c
is also defined as
annotation property

code repository dp

IRI: http://schema.org/codeRepository

Is defined by
http://schema.org/

download URL dp

IRI: http://www.w3.org/ns/dcat#downloadURL

Is defined by
https://www.w3.org/ns/dcat

has ar xiv id dp

IRI: http://purl.org/spar/fabio/hasArXivId

Is defined by
http://purl.org/spar/fabio/
has super-properties
identifier dp

has cache format dp

IRI: http://w3id.org/mlso/hasCacheFormat

has range
literal

has data loader location dp

IRI: http://w3id.org/mlso/hasDataLoaderLocation

A relation between a data entity and an online location where its data loader can be found.
has domain
Data c
has range
anyURI

has MD5 dp

IRI: http://w3id.org/mlso/hasMD5

A relation between a dataset distribution and its MD5 hash.
has domain
Distribution c
has range
literal
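
The Turtle sketch below attaches both has MD5 and has data loader location to one hypothetical distribution. The latter applies because Distribution is a sub-class of Data. The URL is a placeholder, the `anyURI` range is assumed to mean `xsd:anyURI`, and the hash shown is the MD5 of empty input, used only so that the value is well-formed.

```turtle
@prefix mlso: <http://w3id.org/mlso/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/> .   # hypothetical instance namespace

ex:distribution1 a dcat:Distribution ;
    # Plain literal, matching the "literal" range of has MD5.
    mlso:hasMD5 "d41d8cd98f00b204e9800998ecf8427e" ;
    # Typed literal for the anyURI range (placeholder location).
    mlso:hasDataLoaderLocation "http://example.org/loaders/distribution1.py"^^xsd:anyURI .
```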

has number of references dp

IRI: http://w3id.org/mlso/hasNumberOfReferences

A relation between an entity and the number of times it is referenced in scientific publications.
has range
int

has predictions location dp

IRI: http://w3id.org/mlso/hasPredictionsLocation

A relation between a machine learning experiment execution and the URI where its predictions can be found.
has range
anyURI

has train test split indices dp

IRI: http://w3id.org/mlso/hasTrainTestSplitIndices

A relation between a machine learning task and the dataset split it uses on the dataset that it is defined on.
has domain
Task c
has range
literal

hasVersion dp

IRI: http://purl.org/dc/terms/hasVersion

Is defined by
http://purl.org/dc/terms

identifier dp

IRI: http://purl.org/dc/terms/identifier

Is defined by
http://purl.org/dc/terms
has sub-properties
has ar xiv id dp

issued dp

IRI: http://purl.org/dc/terms/issued

Is defined by
http://purl.org/dc/terms

keyword dp

IRI: http://www.w3.org/ns/dcat#keyword

Is defined by
http://www.w3.org/ns/dcat

landing page dp

IRI: http://www.w3.org/ns/dcat#landingPage

Is defined by
http://www.w3.org/ns/dcat

licence dp

IRI: http://purl.org/dc/terms/licence

Is defined by
http://purl.org/dc/terms

software requirements dp

IRI: https://w3id.org/okn/o/sd#softwareRequirements

Is defined by
https://w3id.org/okn/o/sd#

Legend

c: Classes
op: Object Properties
dp: Data Properties

Acknowledgments

The authors would like to thank Silvio Peroni for developing LODE, a Live OWL Documentation Environment, which is used for representing the Cross Referencing Section of this document, and Daniel Garijo for developing Widoco, the program used to create the template used in this documentation.