How to use MIME types for RDF typed literals

Jakob

RDF allows typed literals to specify the datatype of a string value. This is usually done with XML Schema datatypes such as xsd:integer and xsd:date, e.g.

<https://example.org/> dc:created "1999-12-17"^^xsd:date.

How can typed literals in RDF be used to specify a data type defined by (or extending) the IANA MIME types registry? I'd like to do something like this:

<https://example.org/>
  dc:description "I love cookies!" ;
  dc:description "I <em>love</em> cookies!"^^<text/html> ;
  dc:description "I *love* cookies!"^^<text/x-markdown> ;
  dc:description "I \\emph{love} cookies!"^^<application/x-tex> .

But plain MIME types are not valid datatype IRIs. Does an official URI namespace exist for MIME types, and have such URIs been used for RDF typed literals?

IS4

There is no official way to use MIME types as RDF (or XML Schema) datatypes, because it is ambiguous what such a thing would mean ‒ MIME types describe a sequence of bytes, while RDF literals are always a sequence of Unicode characters. You'd have to define how the lexical value should be converted to a byte sequence and then interpreted, and for non-textual formats you would likely have to start from xsd:base64Binary or xsd:hexBinary. In addition, some of your examples are only fragments, not documents that are valid on their own, so let's look at other options first.
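
As a rough illustration of the conversion problem described above, the following Python sketch shows that the same characters yield different byte sequences depending on the charset, and how the bytes could be wrapped in an xsd:base64Binary literal. The helper names to_base64_literal and from_base64_literal are hypothetical, and the choice of UTF-8 as the default is an assumption, not something any RDF specification prescribes for this purpose.

```python
import base64


def to_base64_literal(text: str, charset: str = "utf-8") -> str:
    """Encode text with an explicit charset and wrap the resulting bytes
    as an xsd:base64Binary literal in Turtle syntax (hypothetical helper)."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f'"{payload}"^^xsd:base64Binary'


def from_base64_literal(literal: str, charset: str = "utf-8") -> str:
    """Reverse of to_base64_literal; the charset must be known out of band."""
    lexical = literal.split('"')[1]  # the part between the quotes
    return base64.b64decode(lexical).decode(charset)


text = "I *love* cookies!"

# The same characters map to different bytes under different charsets,
# which is why a MIME type alone does not determine the byte sequence.
assert text.encode("utf-8") != text.encode("utf-16-le")

print(to_base64_literal(text))
```

Note that the charset has to travel separately from the literal here, which is exactly the gap a charset parameter on the MIME type is meant to fill.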

Non-MIME datatypes

I'd recommend first looking for concrete identifiers for the formats you want to support, but even then you will likely have several options:

  • rdf:XMLLiteral, rdf:HTML, and rdf:JSON are official and should be used for valid literals in these languages.
  • Extra Types! is an existing vocabulary for formats and fragments. For your example, you could easily use xtypes:Fragment-HTML, xtypes:Fragment-Markdown, or xtypes:Fragment-LaTeX. What might be a bit ambiguous is what exactly a "fragment" means here. I assume it means that something like '<tag attr="a">'^^xtypes:Fragment-XML is valid, while '<tag attr="a">'^^rdf:XMLLiteral is not (it must be self-contained, akin to application/xml-external-parsed-entity).
  • All of these formats are also present as entities in DBpedia, so you could use a URI like http://dbpedia.org/resource/Markdown, but these are not explicitly defined as datatypes, so some processors might have a hard time trying to find their definition.
  • Another nice vocabulary is Unique URIs for File Formats from W3C, but it only contains RDF serialization formats and does not explicitly define them as datatypes either.
  • Some data formats might have existing YAML tags, which are very similar to XML Schema datatypes and are representable using URIs. I was, however, unable to find any examples other than tag:yaml.org,2002:yaml for YAML.
  • There is also a remote possibility of deriving URIs from the PUBLIC identifiers that were used for SGML notations. For TeX, the URI would be urn:publicid:%2B:ISBN+0-201-13448-9;Knuth:NOTATION+The+TeXbook:EN, but such identifiers are no longer being produced (you can find a collection of them here).
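
The derivation of a urn:publicid: URI from an SGML public identifier can be sketched as follows, using the transcription rules of RFC 3151 as I understand them. The function name fpi_to_urn is hypothetical, and the TeX public identifier used below is an assumption reconstructed backwards from the URN quoted above, so treat it as illustrative only.

```python
import re

# RFC 3151 transcription table: "//" and "::" are structural separators,
# every other listed character is replaced or percent-encoded.
_TRANSCRIPTION = {
    "//": ":", "::": ";", " ": "+",
    "+": "%2B", ":": "%3A", "/": "%2F", ";": "%3B",
    "'": "%27", "?": "%3F", "#": "%23", "%": "%25",
}

# Two-character separators come first so they win over single "/" or ":".
_TOKEN = re.compile(r"//|::|[ +:/;'?#%]")


def fpi_to_urn(fpi: str) -> str:
    """Turn a formal public identifier into a urn:publicid: URI (sketch)."""
    # Strip leading/trailing whitespace and collapse internal runs.
    normalized = " ".join(fpi.split())
    # A single regex pass avoids re-encoding characters produced earlier.
    return "urn:publicid:" + _TOKEN.sub(
        lambda m: _TRANSCRIPTION[m.group()], normalized
    )


# Assumed public identifier for the TeX notation (not given in the text).
print(fpi_to_urn("+//ISBN 0-201-13448-9::Knuth//NOTATION The TeXbook//EN"))
```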

However, I would not recommend using any URI scheme other than http(s), since at least humans should be able to find out what the URI means through HTTP.

URIs for MIME types

If you want to have URIs for MIME types (but not necessarily used as datatypes), you could use something like uri4uri to arrive at RDF descriptions of MIME types, for example https://w3id.org/uri4uri/mime/text/markdown (but note that the charset parameter is required for Markdown, so it should be https://w3id.org/uri4uri/mime/text/markdown;charset=utf-8 ‒ parameters are supported too!).

You could also refer to the IANA registration document such as https://www.iana.org/assignments/media-types/text/markdown, but that's just a document and not all MIME types have those. This URL pattern could also be used for non-standard MIME types such as https://www.iana.org/assignments/media-types/text/yaml but these will not be resolvable unless officially registered.

Use language tags

Another option I could come up with is to (ab)use language tags instead of datatypes for this purpose, such as zxx-Latn-x-md for Markdown or zxx-Latn-x-tex for TeX. This is absolutely not standardized (except for zxx, which is usable for, among other things, source code, and Latn for texts using the Latin alphabet), and I would not recommend using it for literals that should be parsed ‒ think of it as affecting the presentation of the text, such as picking a syntax highlighter.
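
A consumer of such tags might pick a highlighter roughly as in the sketch below. The function names and the HIGHLIGHTERS mapping are hypothetical, and the md and tex subtags are only the ad hoc private-use convention suggested above, not registered values.

```python
from typing import List, Optional

# Hypothetical mapping from private-use subtags to highlighter names.
HIGHLIGHTERS = {"md": "markdown", "tex": "latex"}


def private_use_subtags(tag: str) -> List[str]:
    """Return the subtags following the "x" singleton, lowercased."""
    parts = tag.lower().split("-")
    if "x" in parts:
        return parts[parts.index("x") + 1:]
    return []


def pick_highlighter(tag: str) -> Optional[str]:
    """Return a highlighter name for the tag, or None if nothing matches."""
    for subtag in private_use_subtags(tag):
        if subtag in HIGHLIGHTERS:
            return HIGHLIGHTERS[subtag]
    return None


print(pick_highlighter("zxx-Latn-x-md"))
print(pick_highlighter("en"))
```

Returning None for unknown tags keeps the literal usable as plain text, in line with treating the tag as a presentation hint only.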

Use data: URIs

The only standardized way to combine a text and MIME type is using the data URI scheme, but you won't get a literal:

<https://example.org/>
  dc:description <data:text/markdown;charset=utf-8,I%20*love*%20cookies!> .

<data:text/markdown;charset=utf-8,I%20*love*%20cookies!>
  a <https://w3id.org/uri4uri/mime/text/markdown;charset=utf-8> ;
  rdf:value "I *love* cookies!" .
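
Such data: URIs can be built and read back with a few lines of Python, sketched here under some assumptions: the helper names make_data_uri and parse_data_uri are hypothetical, only the percent-encoded (non-base64) form is handled, and the characters * and ! are left unescaped purely to match the URI shown above.

```python
from urllib.parse import quote, unquote_to_bytes


def make_data_uri(text: str, mime_type: str, charset: str = "utf-8") -> str:
    """Build a percent-encoded data: URI for the given text (sketch)."""
    # Encode to bytes first so the charset parameter matches the payload.
    encoded = quote(text.encode(charset), safe="*!")
    return f"data:{mime_type};charset={charset},{encoded}"


def parse_data_uri(uri: str):
    """Split a non-base64 data: URI into (mime_type, charset, text)."""
    if not uri.startswith("data:"):
        raise ValueError("not a data: URI")
    header, _, payload = uri[len("data:"):].partition(",")
    mime_type, *params = header.split(";")
    charset = "US-ASCII"  # default charset when none is given
    for param in params:
        if param.lower().startswith("charset="):
            charset = param.split("=", 1)[1]
    return mime_type, charset, unquote_to_bytes(payload).decode(charset)


uri = make_data_uri("I *love* cookies!", "text/markdown")
print(uri)
print(parse_data_uri(uri))
```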
