Tuesday, February 14, 2017

format file 2 import into HAL (XML TEI) ZENODO (JSON) via API. Converter, serialization, Metadata from marc, marcXML; YAML...

Une introduction en français sur les formats:

Implementations below are written in different languages:



web service for simple schema


import to HAL


download HAL XML-TEI schema




chapter book
example of a chapter book
example of an export via the TEI button

import to ZENODO


download schema JSON ZENODO

(863 lines)
"description": "Describe information needed for deposit module.",
  "title": "Zenodo Deposit Schema v1.0.0",
  "required": [

(432 lines)
"$schema": "http://json-schema.org/draft-04/schema#",
  "additionalProperties": false,
  "description": "Describe information needed for deposit module.",
  "id": "http://zenodo.org/schemas/deposits/records/legacyjson.json",
  "properties": {
    "$schema": {
      "type": "string"

example of a chapter book
example of an export via the JSON button
(not exactly the same of the initial JSON)

API Documentation for developers:

Create a new deposit and obtain a deposit ID:

curl -i -H "Content-Type: application/json" -X POST --data '{"metadata":{"access_right": "open","creators": [{"affiliation": "Brain Catalogue", "name": "Toro, Roberto"}],"description": "Brain MRI","keywords": ["MRI", "Brain"],"license": "cc-by-nc-4.0", "title": "Brain MRI", "upload_type": "dataset"}}' https://zenodo.org/api/deposit/depositions/?access_token=$token |tee zenodo.json

JSON Schema zenodo and invenio

zenodo is based on invenio.
Invenio uses JSON Schema to describe formats of managed entities such as records.

Based on JSON Schema, we can generate forms permitting users and curators to enter records.  There are two use cases: (1) deposition by end users such as physicists; (2) editing by power users such as curators and librarians.
This talk will show work-in-progress for both these scenarios.  We can discuss pros/cons of using available tools such as JSON Editor:
Invenio uses https://github.com/jdorn/json-editor

interesting slides
  • good/bad performances of "JSON Editor"
  • BibEdit

export from catalogue SUDOC (and Brise-ES)


Somes codes XML JSON

generate code.json / zenodo.json metadata files for github? (comments @2014)
Seems like it would be a straight-forward exercise to serialize some json-ld from an R DESCRIPTION file (and potentially other sources) to provide more metadata to zenodo (and potentially other sites if this becomes a more standard schema). Not sure if this package is the right home for it.
Ref. https://github.com/ropensci/zenodo/issues/3

Minimal metadata schemas for science software and code, in JSON and XML

Matthew B. Jones, Carl Boettiger, Abby Cabunoc Mayes, Arfon Smith, Peter Slaughter, Kyle Niemeyer, Yolanda Gil, Martin Fenner, Krzysztof Nowak, Mark Hahnel, Luke Coy, Alice Allen, Mercè Crosas, Ashley Sands, Neil Chue Hong, Patricia Cruse, Dan Katz, Carole Goble. 2016. CodeMeta: an exchange schema for software metadata. KNB Data Repository. doi:10.5063/schema/codemeta-1.0
(193 lines)

CodeMeta contributors are creating a minimal metadata schema for science software and code, in JSON and XML. The goal of CodeMeta is to create a concept vocabulary that can be used to standardize the exchange of software metadata across repositories and organizations. 
CodeMeta started by comparing the software metadata used across multiple repositories, which resulted in the CodeMeta Metadata Crosswalk.

That crosswalk was then used to generate a set of software metadata concepts, which were arranged into a JSON-LD context for serialization (see codemeta.jsonld, or an example CodeMeta document).

This is an extension of the work done by @arfon, @hubgit, @kaythaney and others on Code as a Research Object / fidgit. Code as a research object is a Mozilla Science Lab (@MozillaScience) project working with community members to explore how we can better integrate code and scientific software into the scholarly workflow. Out of this came fidgit - a proof of concept integration between Github and figshare, providing a Digital Object Identifier (DOI) for the code which allows for persistent reference linking.

With codemeta, we want to formalize the schema used to map between the different services (Github, figshare, Zenodo) to help others plug into existing systems. Having a standard software metadata interoperability schema will allow other data archivers and libraries join in. This will help keep science on the web shareable and interoperable!


This repository contains the software implementation for our paper A Novel Approach to Higgs Coupling Measurements (Cranmer, Kreiss, Lopez-Val, Plehn), arXiv:1401.0080 [hep-ph]. It contains tools to apply the discussed methods to new models and contains a Makefile to recreate the plots in the paper.

Serializer Serialization


The Serializer component is meant to be used to turn objects into a specific format (XML, JSON, YAML, ...) and the other way around.

php framework

Serializers allow complex data such as querysets and model instances to be converted to native Python datatypes that can then be easily rendered into JSON, XML or other content types. Serializers also provide deserialization, allowing parsed data to be converted back into complex types, after first validating the incoming data.


brief history

In the late 1990s, a push to provide an alternative to the standard serialization protocols started: XML was used to produce a human readable text-based encoding. Such an encoding can be useful for persistent objects that may be read and understood by humans, or communicated to other systems regardless of programming language. It has the disadvantage of losing the more compact, byte-stream-based encoding, but by this point larger storage and transmission capacities made file size less of a concern than in the early days of computing. Binary XML had been proposed as a compromise which was not readable by plain-text editors, but was more compact than regular XML. In the 2000s, XML was often used for asynchronous transfer of structured data between client and server in Ajax web applications.

JSON is a more lightweight plain-text alternative to XML which is also commonly used for client-server communication in web applications. JSON is based on JavaScript syntax, but is supported in other programming languages as well.

Another alternative, YAML, is similar to JSON and includes features that make it more powerful for serialization, more "human friendly," and potentially more compact. These features include a notion of tagging data types, support for non-hierarchical data structures, the option to structure data with indentation, and multiple forms of scalar data quoting.

Many institutions, such as archives and libraries, attempt to future proof their backup archives—in particular, database dumps—by storing them in some relatively human-readable serialized format.

google gears

Le transfert de fichier texte avec l'apparition de l'internet a laissé place à des protocoles client/Serveur gérant le transfert de données sous forme de classes. Les anciens clients avaient des cookies dont la taille et l'origine étaient limités. Les objets sont l'évolution des cookies et peuvent ou non être sauvegardés dans l'espace de travail du navigateur web.

Google Gears est un plug in AJAX pour navigateur web. Il permet de façon transparente de sauvegarder des données localement dans une base de données SQLite durant une connexion internet. Ces données pourront être utilisées en mode non connecté. Il est fourni par défaut avec Google Chrome. Les services web en ligne Google Reader et Remember the Milk sont compatibles Google Gears

Zenodo serializer

bibtex, marcxml, json, oai, datacite.


  1. I just want to know about file conversion and found this post is perfect one ,Thanks for sharing the informative post
    Also Check out the : https://www.credosystemz.com/training-in-chennai/best-data-science-training-in-chennai/

  2. The blog is so interactive and Informative , you should write more blogs like this Data Science Online course

  3. Very nice and useful article .. keep sharing this type of articles
    word document properties

  4. This comment has been removed by the author.

  5. Awesome article…..truly appreciated thanks keep sharing convert word to html

  6. Very nice and quite informative article.

  7. If you are thinking to appear in Checkpoint exam then I will suggest you to download Checkpoint braindump to successfully pass this certification. I have had a successful experience with this dumps material. I aced my exam by the first attempt with Pass4sure Checkpoint dumps and wish you the same.

  8. In case you confront any kind of technical hitches and unable to fix the problem by the own, then give single ring on +1 800-684-5649 epson printer service. These whole procedures would be taken just only a plenty of 10 minutes. So, stop being worrying and keep continue read this article as here you will get the desired steps at an ease manner. After performing the above tasks, you should be now good to go for the print job.
    epson printer customer service