DataCite Metadata Schema
Documentation for the Publication and Citation of Research Data
Members of the Metadata Working Group
Madeleine de Smaele, TU Delft (co‐chair of working group)
Joan Starr, California Digital Library (co‐chair of working group)
Jan Ashton, British Library
Amy Barton, Purdue University Library
Tina Bradford, NRC/CISTI (New)
Anne Ciolek‐Figiel, Inist‐CNRS
Stefanie Dietiker, ETH Zurich (New)
Jannean Elliott, DOE/OSTI
Berrit Genat, TIB
Karoline Harzenetter, GESIS
Barbara Hirschmann, ETH Zurich (Departing)
Stefan Jakobsson, SND (New)
Jean‐Yves Mailloux, NRC/CISTI (Departing)
Lars Holm Nielsen, CERN (Departing)
Mohamed Yahia, Inist‐CNRS
Frauke Ziedorn, TIB (On leave, Metadata Supervisor)
resources
mainly a schema
The DataCite Metadata Schema is a list of core metadata properties chosen for an accurate and consistent identification of a resource for citation and retrieval purposes, along with recommended use instructions.
DataCite Metadata Working Group. (2016). DataCite Metadata Schema for the Publication and Citation of Research Data. Version 4.0. DataCite e.V. http://doi.org/10.5438/0013
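As a rough illustration of how a handful of core properties can yield a consistent citation, the sketch below reassembles the citation above from its parts. The `CitationRecord` class, its field names, and `format_citation` are hypothetical names chosen for this example; they are not taken from the schema or from any DataCite tool.

```python
from dataclasses import dataclass


@dataclass
class CitationRecord:
    """Hypothetical container for the properties used in the citation above."""
    creator: str
    publication_year: int
    title: str
    version: str
    publisher: str
    identifier: str  # the DOI expressed as a resolvable URL


def format_citation(rec: CitationRecord) -> str:
    """Join the properties in the order used by the citation above."""
    parts = [
        rec.creator,
        f"({rec.publication_year})",
        rec.title,
        f"Version {rec.version}",
        rec.publisher,
    ]
    # Strip any trailing period first so names such as "DataCite e.V."
    # do not end up with a doubled full stop.
    sentence = " ".join(part.rstrip(".") + "." for part in parts)
    return f"{sentence} {rec.identifier}"


record = CitationRecord(
    creator="DataCite Metadata Working Group",
    publication_year=2016,
    title="DataCite Metadata Schema for the Publication and Citation of Research Data",
    version="4.0",
    publisher="DataCite e.V.",
    identifier="http://doi.org/10.5438/0013",
)
print(format_citation(record))
```

Keeping the properties separate until the last moment means the same record could be rendered in other citation styles without re-entering the metadata.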
DOI
DataCite does not allocate DOIs directly; this activity is undertaken by many of DataCite’s members, who act as DOI allocating agents. DataCite members enable data owners, stewards, or archives to assign persistent identifiers to research data. The list below provides details and contact information for all of DataCite’s members.
INIST CNRS, CERN, CDL...
source code
The DataCite assets server.
This repository holds the official metadata schemas from DataCite as required by the DataCite Metadata Store.
Uses the middleman static site generator
Using Schema.org for DOI Registration
A post on the DataCite blog: https://doi.org/10.5438/0000-00CC
Three weeks ago we started assigning DOIs to every post on this blog "https://blog.datacite.or" (Fenner, 2016c). The process we implemented uses a new command line utility and integrates well with our publishing workflow, with (almost) no extra effort compared to how we published blog posts before.
Given that DataCite is a DOI registration agency, we obviously are careful about following best practices for assigning DOIs. DataCite focusses on DOIs for research data, but many of the general principles can also apply to blog posts. And we have learned a few things already.
Using schema.org metadata embedded in landing pages
Our initial implementation collected the metadata required for DOI registration in a way that is specific to a particular type of blogging software, so-called static site generators. While popular, this leaves out a large number of blogs, for example every blog hosted by Wordpress, by far the most popular blogging platform. We have now relaunched our blog to collect metadata differently, generic enough to work for any blog, but also well aligned with best practices for DOIs.
Our practice is that every DOI should resolve to a landing page, and that landing page should provide both human- and machine-readable metadata.
Machine-readable metadata can be embedded into web pages in a number of ways. Traditionally this was done using HTML meta tags; more recent approaches to embedding metadata in HTML include microdata, microformats and RDFa. An alternative approach is to embed the metadata using JSON and a script tag. The latter approach is easier to implement, as all metadata are in a single place, and the JSON can be embedded dynamically via a script.
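To make the script-tag approach concrete, here is a minimal sketch that serializes a metadata dictionary into a single script element. It assumes the JSON-LD flavour of schema.org markup (the `application/ld+json` type); the helper name `jsonld_script_tag` and the choice of fields are illustrative only and do not reproduce the markup this blog actually emits.

```python
import json


def jsonld_script_tag(metadata: dict) -> str:
    """Serialize metadata as JSON inside one script element."""
    payload = json.dumps(metadata, indent=2, ensure_ascii=False)
    # "\/" is a valid JSON escape for "/", so this keeps a stray "</script>"
    # inside a string value from closing the element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Illustrative values only: the field selection is an assumption.
example = {
    "@context": "http://schema.org",
    "@type": "BlogPosting",
    "@id": "https://doi.org/10.5438/0000-00CC",
    "headline": "Using Schema.org for DOI Registration",
}

print(jsonld_script_tag(example))
```

Because all metadata live in one dictionary, the same function can be called from a template or a build script regardless of the blogging software in use.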
As for the vocabulary, the DataCite Metadata Schema has never been widely used for metadata embedded in web pages. Dublin Core Metadata (“Dublin Core Metadata Element Set, Version 1.1,” 2012) are often used for metadata in HTML meta tags. Schema.org is an initiative started in 2011 with many of the same goals as Dublin Core, namely to create, maintain, and promote schemas for structured data on the Internet.
DOI minting workflow
Publishing a blog post with embedded schema.org metadata, which is then used to mint a DOI and register DOI metadata, changes the DOI minting workflow for this blog. Although the publication workflow of a blog is much simpler than for peer-reviewed content, there are still three distinct phases:
- post is drafted by author
- post is shared for feedback with staff (and possibly others)
- post is published
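The step in which embedded schema.org metadata is read back from a landing page, before it is used for DOI registration, might look roughly like the sketch below. It assumes the metadata sit in a `script` element of type `application/ld+json`; the class `JsonLdExtractor`, the function `extract_jsonld`, and the sample page are hypothetical and are not the command line utility mentioned above.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every JSON-LD script element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        # Script content may arrive in several pieces, so buffer it.
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._chunks)))
            self._in_jsonld = False


def extract_jsonld(html: str) -> list:
    """Return all JSON-LD documents embedded in an HTML page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents


# Hypothetical landing page, reduced to the parts that matter here.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "http://schema.org",
 "@type": "BlogPosting",
 "@id": "https://doi.org/10.5438/0000-00CC",
 "headline": "Using Schema.org for DOI Registration"}
</script>
</head><body><p>Post text.</p></body></html>
"""

for doc in extract_jsonld(SAMPLE_HTML):
    print(doc["@type"], doc["@id"])
```

Reading the metadata from the rendered page rather than from the blog's source files is what would make such a step independent of the blogging platform.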
Blog posts in JATS XML
Blog posts are web pages and the landing page for the DOI also contains the fulltext of the post. But there are good reasons to make a blog post also available in downloadable form, most importantly to facilitate reuse, and for archiving. Journal Article Tag Suite (JATS) is an XML standard for tagging journal articles, used by the PubMed Central full-text archive of biomedical literature and by an increasing number of scholarly publishers.
JATS is an appropriate format for the posts on this blog, and starting this week all of our posts are also available in JATS XML format. You can see the download URL in the schema.org markup (the JATS for this post is here). We will add a more visible link to all posts once some minor tagging issues are resolved. We will also start registering the download URL with the DataCite MDS as media, making the JATS XML available to DOI content negotiation, and thus to direct download. This should facilitate reuse by others, e.g. aggregation of content from multiple sources and display of content in different formats. This blog uses the Creative Commons Attribution license, allowing the copying, redistribution and remixing of the material in any medium or format for any purpose.