Tuesday, February 14, 2017

File formats to import into HAL (XML-TEI) and Zenodo (JSON) via API: converters, serialization, metadata from MARC and MARCXML, YAML...


An introduction to these formats (in French):
http://sametmax.com/yaml-xml-json-csv-ini-quest-ce-que-cest-et-a-quoi-ca-sert/


JSON
http://stephane-mottin.blogspot.fr/2017/01/datacite-inist-cern-metadata-schema.html
JSON Schema implementations, written in many different languages:
http://json-schema.org/implementations.html
https://github.com/jdorn/json-editor


XML
http://stephane-mottin.blogspot.fr/2017/01/moissonnage-oai-pm-structure-ead-puis.html
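The linked post is about OAI-PMH harvesting. As a minimal sketch, assuming a standard OAI-PMH 2.0 ListRecords response in the oai_dc format, the Python standard library is enough to pull out record identifiers, Dublin Core titles, and the resumptionToken that drives paging. The sample response and the function name parse_list_records are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and oai_dc specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, heavily trimmed ListRecords response (real ones carry more elements).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>First record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def parse_list_records(xml_text):
    """Return (records, resumption_token) for one ListRecords page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        title = rec.findtext(".//dc:title", namespaces=NS)
        records.append({"identifier": ident, "title": title})
    # A missing or empty token means the harvest is complete.
    token = root.findtext(".//oai:resumptionToken", namespaces=NS) or None
    return records, token
```

In a real harvest loop, the returned token is sent back as the resumptionToken argument of the next ListRecords request until it comes back empty.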

converters: CSV-XML, CSV-JSON, XML-JSON

a web service for simple schemas:

http://codebeautify.org/csv-to-xml-json#
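For simple, flat schemas like the ones this web service handles, the conversions can also be done locally. The sketch below is a minimal Python version, assuming one record per CSV row and column headers that are valid XML element names; the function names and sample data are hypothetical. Nested structures (several creators, keywords lists) would need an explicit mapping instead.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample: one metadata record per row; headers become keys/tags.
SAMPLE_CSV = """title,creator,year
Brain MRI,"Toro, Roberto",2017
"""


def csv_to_json(text):
    """CSV text -> JSON array of objects (all values stay strings)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return json.dumps(rows, ensure_ascii=False, indent=2)


def csv_to_xml(text, root_tag="records", row_tag="record"):
    """CSV text -> flat XML; assumes the headers are valid XML names."""
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(text)):
        item = ET.SubElement(root, row_tag)
        for key, value in row.items():
            ET.SubElement(item, key).text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_json(text):
    """Flat <records><record>...</record></records> XML -> JSON array."""
    root = ET.fromstring(text)
    rows = [{child.tag: child.text or "" for child in item} for item in root]
    return json.dumps(rows, ensure_ascii=False, indent=2)
```

For the sample, csv_to_xml returns `<records><record><title>Brain MRI</title><creator>Toro, Roberto</creator><year>2017</year></record></records>`, and xml_to_json turns that back into the same JSON as csv_to_json.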

import into HAL

http://stephane-mottin.blogspot.fr/2017/01/api-v3-hal-import.html

download the HAL XML-TEI schema

"all"
https://api.archives-ouvertes.fr/documents/all.xml

https://github.com/CCSDForge/HAL/tree/master/Sword

article
https://github.com/CCSDForge/HAL/blob/master/Sword/ART.xml

book chapter
https://github.com/CCSDForge/HAL/blob/master/Sword/COUV.xml
example of a book chapter
https://hal.archives-ouvertes.fr/hal-01071717v1
example of an export via the TEI button
https://hal.archives-ouvertes.fr/hal-01071717v1/tei
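To read metadata back out of a TEI export like the one above, a namespace-aware parser is needed, because every element lives in the TEI namespace. This is a minimal sketch, assuming titles and authors sit in a titleStmt (a generic TEI convention); the sample is a hypothetical, heavily trimmed fragment, not an actual HAL export, and tei_titles_and_authors is a hypothetical name.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical, heavily trimmed TEI fragment; real HAL exports are much richer.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><listBibl><biblFull>
    <titleStmt>
      <title xml:lang="en">A book chapter</title>
      <author><persName><forename>Jane</forename><surname>Doe</surname></persName></author>
    </titleStmt>
  </biblFull></listBibl></body></text>
</TEI>"""


def tei_titles_and_authors(xml_text):
    """Collect titles (keyed by xml:lang) and 'Surname, Forename' authors."""
    root = ET.fromstring(xml_text)
    titles, authors = {}, []
    for stmt in root.iterfind(".//tei:titleStmt", TEI_NS):
        for title in stmt.iterfind("tei:title", TEI_NS):
            titles[title.get(XML_LANG, "")] = title.text
        for pers in stmt.iterfind("tei:author/tei:persName", TEI_NS):
            forename = pers.findtext("tei:forename", default="", namespaces=TEI_NS)
            surname = pers.findtext("tei:surname", default="", namespaces=TEI_NS)
            authors.append(f"{surname}, {forename}")
    return titles, authors
```

Keying titles by language matters for HAL, where a record can carry the same title in several languages.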

import into Zenodo

http://stephane-mottin.blogspot.fr/2017/01/zenodo-github-research-data-repository.html

download the Zenodo JSON schema


zenodo/zenodo/modules/deposit/jsonschemas/deposits/records/record-v1.0.0.json
(863 lines)
  "description": "Describe information needed for deposit module.",
  "title": "Zenodo Deposit Schema v1.0.0",
  "required": [
    "_deposit"
  ],

zenodo/zenodo/modules/deposit/jsonschemas/deposits/records/legacyrecord.json
(432 lines)
  "$schema": "http://json-schema.org/draft-04/schema#",
  "additionalProperties": false,
  "description": "Describe information needed for deposit module.",
  "id": "http://zenodo.org/schemas/deposits/records/legacyjson.json",
  "properties": {
    "$schema": {
      "type": "string"
    },
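To see what keywords like "required" and "additionalProperties": false do in practice, here is a toy validator for a small subset of JSON Schema draft-04 (type, required, properties, additionalProperties, items). It is a hand-rolled illustration, not a replacement for the implementations listed in the JSON section, and the schema below is hypothetical: it only borrows the two keywords shown in the excerpts above.

```python
# Map a subset of JSON Schema type names to Python types.
# Assumes "type" is a single string (draft-04 also allows a list).
TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "boolean": bool,
    "integer": int,
    "number": (int, float),
}


def validate(instance, schema, path="$"):
    """Return a list of error strings; an empty list means the instance passed."""
    errors = []
    expected = schema.get("type")
    if expected:
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_bool = isinstance(instance, bool) and expected in ("integer", "number")
        if not isinstance(instance, TYPES[expected]) or is_bool:
            return [f"{path}: expected {expected}"]
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required '{key}'")
        props = schema.get("properties", {})
        for key, value in instance.items():
            if key in props:
                errors.extend(validate(value, props[key], f"{path}.{key}"))
            elif schema.get("additionalProperties") is False:
                errors.append(f"{path}: unexpected property '{key}'")
    if isinstance(instance, list) and "items" in schema:
        for i, item in enumerate(instance):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    return errors


# Hypothetical schema reusing "required" and "additionalProperties" from above.
SCHEMA = {
    "type": "object",
    "required": ["_deposit"],
    "additionalProperties": False,
    "properties": {
        "$schema": {"type": "string"},
        "_deposit": {"type": "object"},
        "title": {"type": "string"},
        "keywords": {"type": "array", "items": {"type": "string"}},
    },
}
```

For example, validate({"title": 3, "foo": 1}, SCHEMA) reports the missing "_deposit", the wrongly typed title, and the unexpected "foo", which is exactly what "additionalProperties": false forbids.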

example of a book chapter
example of an export via the JSON button
(not exactly the same as the initial JSON)

API documentation for developers:


Create a new deposit and obtain a deposit ID:

# $token must hold a Zenodo personal access token; the JSON response, including the new deposit ID, is saved to zenodo.json
curl -H "Content-Type: application/json" -X POST --data '{"metadata":{"access_right": "open","creators": [{"affiliation": "Brain Catalogue", "name": "Toro, Roberto"}],"description": "Brain MRI","keywords": ["MRI", "Brain"],"license": "cc-by-nc-4.0", "title": "Brain MRI", "upload_type": "dataset"}}' "https://zenodo.org/api/deposit/depositions?access_token=$token" | tee zenodo.json
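The same deposit-creation call can be prepared from Python with only the standard library. This sketch builds the POST request (endpoint, access_token query parameter, JSON body with a "metadata" object) as in the curl call above but does not send it; "YOUR-TOKEN" and build_deposit_request are hypothetical placeholders.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://zenodo.org/api/deposit/depositions"


def build_deposit_request(token, metadata):
    """Build (but do not send) the POST that creates a new Zenodo deposit."""
    query = urllib.parse.urlencode({"access_token": token})
    body = json.dumps({"metadata": metadata}).encode("utf-8")
    return urllib.request.Request(
        f"{API_URL}?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


metadata = {
    "access_right": "open",
    "creators": [{"affiliation": "Brain Catalogue", "name": "Toro, Roberto"}],
    "description": "Brain MRI",
    "keywords": ["MRI", "Brain"],
    "license": "cc-by-nc-4.0",
    "title": "Brain MRI",
    "upload_type": "dataset",
}
req = build_deposit_request("YOUR-TOKEN", metadata)  # placeholder token
# Sending it requires network access and a valid token:
#   with urllib.request.urlopen(req) as resp:
#       deposit = json.load(resp)
#       print(deposit["id"])
```

Building the request separately from sending it makes the payload easy to inspect (or validate against the schema) before anything reaches the server.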

JSON Schema in Zenodo and Invenio

Zenodo is built on Invenio.
Invenio uses JSON Schema to describe formats of managed entities such as records.

Based on JSON Schema, we can generate forms permitting users and curators to enter records. There are two use cases: (1) deposition by end users such as physicists; (2) editing by power users such as curators and librarians.
This talk will show work in progress for both of these scenarios. We can discuss the pros and cons of using available tools such as JSON Editor:
https://indico.cern.ch/event/407109/
Invenio uses https://github.com/jdorn/json-editor
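The idea of generating a form from a schema can be sketched in a few lines: walk the top-level properties, pick an input type from each property's "type", and mark required fields. This is a toy illustration under those assumptions, not how JSON Editor works internally (it also recurses into nested objects and arrays); the type mapping and function name are hypothetical.

```python
import html

# Hypothetical mapping from JSON Schema types to HTML input types.
INPUT_TYPES = {"string": "text", "integer": "number", "number": "number", "boolean": "checkbox"}


def schema_to_form(schema):
    """Render one HTML input per top-level property of an object schema."""
    required = set(schema.get("required", []))
    lines = ["<form>"]
    for name, prop in schema.get("properties", {}).items():
        label = html.escape(prop.get("title", name))
        field = html.escape(name)
        # Unknown or nested types fall back to a plain text input.
        input_type = INPUT_TYPES.get(prop.get("type"), "text")
        req = " required" if name in required else ""
        lines.append(f'  <label>{label} <input name="{field}" type="{input_type}"{req}></label>')
    lines.append("</form>")
    return "\n".join(lines)


FORM_SCHEMA = {
    "type": "object",
    "required": ["title"],
    "properties": {
        "title": {"type": "string", "title": "Title"},
        "year": {"type": "integer"},
    },
}
```

Calling schema_to_form(FORM_SCHEMA) yields a form with a required text input for "title" and a number input for "year"; the schema stays the single source of truth for both validation and data entry.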

interesting slides
  • good and bad performance of JSON Editor
  • BibEdit
http://slides.com/neumann/json-based-record-editing#/
Ref.
https://github.com/inveniosoftware/invenio/issues/2854

export from the SUDOC catalogue (and Brise-ES)

http://stephane-mottin.blogspot.com/2017/01/importance-du-catalogage-librairie.html
http://stephane-mottin.blogspot.fr/2012/06/catalogage-sudoc-abes-unimarc-des.html
http://stephane-mottin.blogspot.fr/2011/10/sudoc-export-et-interoperabilite.html
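Metadata exported from a catalogue as MARCXML can be mapped to deposit fields. This sketch assumes UNIMARC field codes, which SUDOC uses (200$a for the title proper, 700$a/$b for the primary author's surname and forename); check the actual export before relying on them, since MARC21 uses different tags (245, 100). The sample record, the namespace-stripping approach, and the Zenodo-style target keys are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed MARCXML record using UNIMARC tags.
SAMPLE_MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="200" ind1="1" ind2=" ">
    <subfield code="a">Book title</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2="1">
    <subfield code="a">Dupont</subfield>
    <subfield code="b">Marie</subfield>
  </datafield>
</record>"""


def local(tag):
    """Strip any '{namespace}' prefix so namespaced and bare MARCXML both work."""
    return tag.rsplit("}", 1)[-1]


def subfields(record, tag):
    """Yield a {code: value} dict for every datafield with the given tag."""
    for field in record.iter():
        if local(field.tag) == "datafield" and field.get("tag") == tag:
            yield {sf.get("code"): sf.text for sf in field if local(sf.tag) == "subfield"}


def unimarc_to_zenodo(xml_text):
    """Map a few UNIMARC fields to Zenodo-style deposit metadata keys."""
    record = ET.fromstring(xml_text)
    titles = [f.get("a") for f in subfields(record, "200")]
    creators = [{"name": f"{f.get('a', '')}, {f.get('b', '')}"} for f in subfields(record, "700")]
    return {"title": titles[0] if titles else None, "creators": creators}
```

The resulting dict has the same shape as the "metadata" object in the Zenodo deposit example, so it could be merged with the remaining required fields before upload.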

some code: XML and JSON

generate code.json / zenodo.json metadata files for GitHub? (comments from 2014)
Seems like it would be a straight-forward exercise to serialize some json-ld from an R DESCRIPTION file (and potentially other sources) to provide more metadata to zenodo (and potentially other sites if this becomes a more standard schema). Not sure if this package is the right home for it.
Ref. https://github.com/ropensci/zenodo/issues/3
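The "straightforward exercise" from that issue can be sketched as follows: parse the R DESCRIPTION file (a "Key: value" format where indented lines continue the previous value) and emit a minimal .zenodo.json dictionary. The field mapping below is an assumption, not something the issue specifies, and it skips Authors@R and License, which would need real parsing and license-ID mapping.

```python
import json

# Hypothetical DESCRIPTION file for an R package.
DESCRIPTION = """Package: mypkg
Title: Tools for Brain MRI
Version: 0.1.0
Description: Helpers to read and summarise
    brain MRI datasets.
"""


def parse_dcf(text):
    """Parse 'Key: value' lines; indented lines continue the previous value."""
    fields, key = {}, None
    for line in text.splitlines():
        if line[:1].isspace() and key:
            fields[key] += " " + line.strip()
        elif ":" in line:
            key, value = line.split(":", 1)
            key = key.strip()
            fields[key] = value.strip()
    return fields


def description_to_zenodo(text):
    """Build a minimal .zenodo.json dict; the mapping is an assumption."""
    d = parse_dcf(text)
    return {
        "title": f"{d['Package']}: {d['Title']}",
        "description": d.get("Description", ""),
        "version": d.get("Version"),
        "upload_type": "software",
    }
```

Writing the result next to the package sources is then one call: `with open(".zenodo.json", "w") as f: json.dump(description_to_zenodo(DESCRIPTION), f, indent=2)`.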

Minimal metadata schemas for science software and code, in JSON and XML

1/
Matthew B. Jones, Carl Boettiger, Abby Cabunoc Mayes, Arfon Smith, Peter Slaughter, Kyle Niemeyer, Yolanda Gil, Martin Fenner, Krzysztof Nowak, Mark Hahnel, Luke Coy, Alice Allen, Mercè Crosas, Ashley Sands, Neil Chue Hong, Patricia Cruse, Dan Katz, Carole Goble. 2016. CodeMeta: an exchange schema for software metadata. KNB Data Repository. doi:10.5063/schema/codemeta-1.0
https://raw.githubusercontent.com/codemeta/codemeta/1.0/codemeta.jsonld
https://github.com/codemeta/codemeta/blob/master/codemeta.jsonld
(193 lines)

CodeMeta contributors are creating a minimal metadata schema for science software and code, in JSON and XML. The goal of CodeMeta is to create a concept vocabulary that can be used to standardize the exchange of software metadata across repositories and organizations. 
CodeMeta started by comparing the software metadata used across multiple repositories, which resulted in the CodeMeta Metadata Crosswalk.
https://github.com/codemeta/codemeta/blob/master/crosswalk.csv

That crosswalk was then used to generate a set of software metadata concepts, which were arranged into a JSON-LD context for serialization (see codemeta.jsonld, or an example CodeMeta document).

This is an extension of the work done by @arfon, @hubgit, @kaythaney and others on Code as a Research Object / fidgit. Code as a Research Object is a Mozilla Science Lab (@MozillaScience) project working with community members to explore how we can better integrate code and scientific software into the scholarly workflow. Out of this came fidgit, a proof-of-concept integration between GitHub and figshare, providing a Digital Object Identifier (DOI) for the code, which allows for persistent reference linking.

With CodeMeta, we want to formalize the schema used to map between the different services (GitHub, figshare, Zenodo) to help others plug into existing systems. Having a standard software metadata interoperability schema will allow other data archivers and libraries to join in. This will help keep science on the web shareable and interoperable!
https://github.com/codemeta/codemeta
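The point of a crosswalk table is that any two services can be mapped by picking two of its columns. This sketch assumes a hypothetical three-column excerpt (codemeta, github, zenodo); the real crosswalk.csv has different column names and many more rows, so the pairs below are illustrative only.

```python
import csv
import io

# Hypothetical three-column crosswalk excerpt (not the real crosswalk.csv).
CROSSWALK_CSV = """codemeta,github,zenodo
name,name,title
description,description,description
codeRepository,html_url,
"""


def load_crosswalk(text, source, target):
    """Return {source field: target field} for rows where both columns are filled."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(text)):
        if row[source] and row[target]:
            mapping[row[source]] = row[target]
    return mapping


def convert(record, mapping):
    """Rename the keys of record, dropping fields with no equivalent."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}


# Hypothetical GitHub repository metadata.
github_repo = {
    "name": "mypkg",
    "description": "Tools for Brain MRI",
    "html_url": "https://github.com/example/mypkg",
    "stargazers_count": 3,
}
```

With this excerpt, convert(github_repo, load_crosswalk(CROSSWALK_CSV, "github", "codemeta")) keeps name, description and codeRepository, while the GitHub-to-Zenodo mapping drops the repository URL because that cell is empty.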

JSON-LD
http://www.arfon.org/json-ld-for-software-discovery-reuse-and-credit
http://json-ld.org/
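What JSON-LD adds over plain JSON is the @context, which maps short keys to global IRIs so that two documents using different key names can still be recognized as describing the same property. The toy function below shows only that term-to-IRI step under simplifying assumptions: it is NOT a conforming JSON-LD processor (no nested contexts, no compact IRIs, and values are not wrapped in @value objects as real expansion does). The sample document is hypothetical.

```python
def expand_terms(doc):
    """Map each short term to the IRI the document's @context assigns to it."""
    context = doc.get("@context", {})
    vocab = context.get("@vocab")
    expanded = {}
    for key, value in doc.items():
        if key == "@context":
            continue
        if key.startswith("@"):          # keywords such as @id/@type are kept verbatim
            expanded[key] = value
            continue
        definition = context.get(key)
        if isinstance(definition, dict):  # expanded term definition {"@id": ...}
            iri = definition.get("@id")
        elif isinstance(definition, str):
            iri = definition
        else:
            iri = vocab + key if vocab else None
        if iri is not None:               # unmapped terms are dropped, as in JSON-LD
            expanded[iri] = value
    return expanded


# Hypothetical software description using schema.org terms.
doc = {
    "@context": {
        "@vocab": "http://schema.org/",
        "repo": {"@id": "http://schema.org/codeRepository"},
    },
    "@type": "SoftwareSourceCode",
    "name": "mypkg",
    "repo": "https://github.com/example/mypkg",
}
```

Here "repo" and a plain "codeRepository" key would both expand to http://schema.org/codeRepository, which is what makes metadata from different sources comparable.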

2/
This repository contains the software implementation for our paper A Novel Approach to Higgs Coupling Measurements (Cranmer, Kreiss, Lopez-Val, Plehn), arXiv:1401.0080 [hep-ph]. It contains tools to apply the discussed methods to new models and contains a Makefile to recreate the plots in the paper.
https://github.com/lnielsen/decouple/blob/master/.zenodo.json


Serializers and serialization

https://en.wikipedia.org/wiki/Serialization

The Serializer component is meant to be used to turn objects into a specific format (XML, JSON, YAML, ...) and the other way around.

frameworks: Symfony (PHP) and Django REST framework (Python)
http://symfony.com/doc/current/components/serializer.html
http://www.django-rest-framework.org/api-guide/serializers/

Serializers allow complex data such as querysets and model instances to be converted to native Python datatypes that can then be easily rendered into JSON, XML or other content types. Serializers also provide deserialization, allowing parsed data to be converted back into complex types, after first validating the incoming data.
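The serialize / validate / deserialize cycle described above can be shown without any framework. This sketch uses a dataclass as the "complex type" and JSON as the content type; CreatorSerializer is a hypothetical class in the spirit of those frameworks, not the API of Symfony or Django REST framework.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class Creator:
    """A plain domain object, standing in for a model instance."""
    name: str
    affiliation: str = ""


class CreatorSerializer:
    """Hypothetical serializer: object <-> JSON, validating on the way in."""

    required = ("name",)

    @staticmethod
    def serialize(obj):
        """Object -> JSON text."""
        return json.dumps(asdict(obj), ensure_ascii=False)

    @classmethod
    def deserialize(cls, text):
        """JSON text -> object, validating the parsed data first."""
        data = json.loads(text)
        known = {f.name for f in fields(Creator)}
        unknown = sorted(set(data) - known)
        if unknown:
            raise ValueError(f"unknown fields: {unknown}")
        missing = [name for name in cls.required if not data.get(name)]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        return Creator(**data)
```

A round trip such as CreatorSerializer.deserialize(CreatorSerializer.serialize(Creator("Toro, Roberto", "Brain Catalogue"))) returns an equal object, while input lacking "name" raises ValueError before any object is built.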

https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats

brief history

In the late 1990s, a push to provide an alternative to the standard serialization protocols started: XML was used to produce a human readable text-based encoding. Such an encoding can be useful for persistent objects that may be read and understood by humans, or communicated to other systems regardless of programming language. It has the disadvantage of losing the more compact, byte-stream-based encoding, but by this point larger storage and transmission capacities made file size less of a concern than in the early days of computing. Binary XML had been proposed as a compromise which was not readable by plain-text editors, but was more compact than regular XML. In the 2000s, XML was often used for asynchronous transfer of structured data between client and server in Ajax web applications.

JSON is a more lightweight plain-text alternative to XML which is also commonly used for client-server communication in web applications. JSON is based on JavaScript syntax, but is supported in other programming languages as well.

Another alternative, YAML, is similar to JSON and includes features that make it more powerful for serialization, more "human friendly," and potentially more compact. These features include a notion of tagging data types, support for non-hierarchical data structures, the option to structure data with indentation, and multiple forms of scalar data quoting.


Many institutions, such as archives and libraries, attempt to future-proof their backup archives (in particular, database dumps) by storing them in some relatively human-readable serialized format.

Google Gears

With the arrival of the Internet, plain-text file transfer gave way to client/server protocols that handle data transfer in the form of classes. Older clients had cookies whose size and origin were restricted. Objects are the evolution of cookies and may or may not be saved in the web browser's workspace.

Google Gears is an AJAX plug-in for web browsers. It transparently saves data locally in an SQLite database while an Internet connection is available, so that the data can later be used offline. It ships by default with Google Chrome. The online web services Google Reader and Remember the Milk are Google Gears compatible.
https://fr.wikipedia.org/wiki/S%C3%A9rialisation

Zenodo serializers


export formats: BibTeX, MARCXML, JSON, OAI, DataCite.
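As an illustration of what one of these serializers does, here is a minimal BibTeX writer for a record shaped like the deposit metadata in the curl example. It is a hypothetical sketch, not Zenodo's own serializer: it emits a bare @misc entry, reads the year from a publication_date string, omits empty fields, and does not escape BibTeX special characters.

```python
def to_bibtex(record, key):
    """Serialize a Zenodo-style metadata dict as a minimal BibTeX @misc entry."""
    authors = " and ".join(c["name"] for c in record.get("creators", []))
    fields = {
        "author": authors,
        "title": record.get("title", ""),
        "year": record.get("publication_date", "")[:4],  # assumes ISO YYYY-MM-DD
        "doi": record.get("doi", ""),
    }
    # Skip empty fields so the entry stays valid and compact.
    body = ",\n".join(f"  {k} = {{{v}}}" for k, v in fields.items() if v)
    return f"@misc{{{key},\n{body}\n}}"


# Hypothetical record.
record = {
    "creators": [{"name": "Toro, Roberto"}],
    "title": "Brain MRI",
    "publication_date": "2017-02-14",
}
```

Because creator names are already stored as "Surname, Forename", joining them with " and " gives an author list BibTeX can parse directly.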

