
Tuesday, February 14, 2017

file formats to import into HAL (XML-TEI) and ZENODO (JSON) via API. Converters, serialization, metadata from MARC, MARCXML, YAML...


An introduction to these formats (in French):
http://sametmax.com/yaml-xml-json-csv-ini-quest-ce-que-cest-et-a-quoi-ca-sert/


JSON
http://stephane-mottin.blogspot.fr/2017/01/datacite-inist-cern-metadata-schema.html
Implementations in many different languages are listed here (a PHP validation sketch follows below):
http://json-schema.org/implementations.html
https://github.com/jdorn/json-editor
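As a PHP illustration, a minimal sketch of validating a document against a JSON Schema with one of the PHP implementations listed there (justinrainbow/json-schema, assumed installed via Composer; the file names are placeholders):

<?php
// Minimal sketch: validate data.json against schema.json with the
// justinrainbow/json-schema library (assumed installed via Composer).
require 'vendor/autoload.php';

$data = json_decode(file_get_contents('data.json'));

$validator = new JsonSchema\Validator();
$validator->validate($data, (object) ['$ref' => 'file://' . realpath('schema.json')]);

if ($validator->isValid()) {
    echo "The JSON document validates against the schema.\n";
} else {
    foreach ($validator->getErrors() as $error) {
        printf("[%s] %s\n", $error['property'], $error['message']);
    }
}
?>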


XML
http://stephane-mottin.blogspot.fr/2017/01/moissonnage-oai-pm-structure-ead-puis.html

converters CSV-XML, CSV-JSON, XML-JSON

a web service for simple schemas:

http://codebeautify.org/csv-to-xml-json#
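As an alternative to the web service, a minimal PHP sketch of the same conversions for a simple flat schema (the file names and column layout are placeholders):

<?php
// CSV -> JSON and CSV -> XML for a simple flat schema.
// "records.csv" (first line = column names) is a placeholder file name.
$rows = [];
if (($fh = fopen('records.csv', 'r')) !== false) {
    $header = fgetcsv($fh);
    while (($line = fgetcsv($fh)) !== false) {
        $rows[] = array_combine($header, $line);
    }
    fclose($fh);
}

// CSV -> JSON
file_put_contents('records.json', json_encode($rows, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE));

// CSV -> XML
$xml = new SimpleXMLElement('<records/>');
foreach ($rows as $row) {
    $record = $xml->addChild('record');
    foreach ($row as $field => $value) {
        $record->addChild($field, htmlspecialchars($value));
    }
}
$xml->asXML('records.xml');
?>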

import to HAL

http://stephane-mottin.blogspot.fr/2017/01/api-v3-hal-import.html

download HAL XML-TEI schema

"all"
https://api.archives-ouvertes.fr/documents/all.xml

https://github.com/CCSDForge/HAL/tree/master/Sword

article
https://github.com/CCSDForge/HAL/blob/master/Sword/ART.xml

book chapter
https://github.com/CCSDForge/HAL/blob/master/Sword/COUV.xml
example of a book chapter
https://hal.archives-ouvertes.fr/hal-01071717v1
example of an export via the TEI button
https://hal.archives-ouvertes.fr/hal-01071717v1/tei
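To fetch that TEI export programmatically, a minimal PHP sketch (assumes allow_url_fopen is enabled; the output file name is arbitrary):

<?php
// Download the TEI export of the example record and check it is well-formed XML.
$url = 'https://hal.archives-ouvertes.fr/hal-01071717v1/tei';
$tei = file_get_contents($url);
if ($tei !== false) {
    file_put_contents('hal-01071717.tei.xml', $tei);
    $doc = new SimpleXMLElement($tei);   // throws if the XML is not well formed
    echo $doc->getName(), "\n";          // prints the root element name
}
?>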

import to ZENODO

http://stephane-mottin.blogspot.fr/2017/01/zenodo-github-research-data-repository.html

download the ZENODO JSON schemas


zenodo/zenodo/modules/deposit/jsonschemas/deposits/records/record-v1.0.0.json
(863 lines)
"description": "Describe information needed for deposit module.",
  "title": "Zenodo Deposit Schema v1.0.0",
  "required": [
    "_deposit"
  ],

zenodo/zenodo/modules/deposit/jsonschemas/deposits/records/legacyrecord.json
(432 lines)
"$schema": "http://json-schema.org/draft-04/schema#",
  "additionalProperties": false,
  "description": "Describe information needed for deposit module.",
  "id": "http://zenodo.org/schemas/deposits/records/legacyjson.json",
  "properties": {
    "$schema": {
      "type": "string"
    },

example of a book chapter
example of an export via the JSON button
(not exactly the same as the initial JSON)

API Documentation for developers:


Create a new deposit and obtain a deposit ID:

curl -i -H "Content-Type: application/json" -X POST --data '{"metadata":{"access_right": "open","creators": [{"affiliation": "Brain Catalogue", "name": "Toro, Roberto"}],"description": "Brain MRI","keywords": ["MRI", "Brain"],"license": "cc-by-nc-4.0", "title": "Brain MRI", "upload_type": "dataset"}}' https://zenodo.org/api/deposit/depositions/?access_token=$token |tee zenodo.json
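The same request from PHP with the cURL extension, as a minimal sketch (the endpoint and metadata are those of the curl command above; reading the token from a ZENODO_TOKEN environment variable is just an assumption of this example):

<?php
// Create a new Zenodo deposit with the same metadata as the curl example above.
$token = getenv('ZENODO_TOKEN');   // assumption: access token stored in an environment variable
$metadata = ['metadata' => [
    'access_right' => 'open',
    'creators'     => [['affiliation' => 'Brain Catalogue', 'name' => 'Toro, Roberto']],
    'description'  => 'Brain MRI',
    'keywords'     => ['MRI', 'Brain'],
    'license'      => 'cc-by-nc-4.0',
    'title'        => 'Brain MRI',
    'upload_type'  => 'dataset',
]];

$ch = curl_init('https://zenodo.org/api/deposit/depositions/?access_token=' . $token);
curl_setopt_array($ch, [
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => json_encode($metadata),
    CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
    CURLOPT_RETURNTRANSFER => true,
]);
$response = curl_exec($ch);
curl_close($ch);

file_put_contents('zenodo.json', $response);   // same output file as the "| tee zenodo.json" above
$deposit = json_decode($response, true);
echo isset($deposit['id']) ? "deposit id: {$deposit['id']}\n" : "no deposit id returned\n";
?>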

JSON Schema zenodo and invenio

zenodo is based on invenio.
Invenio uses JSON Schema to describe formats of managed entities such as records.

Based on JSON Schema, we can generate forms permitting users and curators to enter records.  There are two use cases: (1) deposition by end users such as physicists; (2) editing by power users such as curators and librarians.
This talk will show work-in-progress for both these scenarios.  We can discuss pros/cons of using available tools such as JSON Editor:
https://indico.cern.ch/event/407109/
Invenio uses https://github.com/jdorn/json-editor

interesting slides
  • strengths and weaknesses of "JSON Editor"
  • BibEdit
http://slides.com/neumann/json-based-record-editing#/
Ref.
https://github.com/inveniosoftware/invenio/issues/2854

export from catalogue SUDOC (and Brise-ES)

http://stephane-mottin.blogspot.com/2017/01/importance-du-catalogage-librairie.html
http://stephane-mottin.blogspot.fr/2012/06/catalogage-sudoc-abes-unimarc-des.html
http://stephane-mottin.blogspot.fr/2011/10/sudoc-export-et-interoperabilite.html

some code (XML, JSON)

generate code.json / zenodo.json metadata files for github? (comments @2014)
Seems like it would be a straight-forward exercise to serialize some json-ld from an R DESCRIPTION file (and potentially other sources) to provide more metadata to zenodo (and potentially other sites if this becomes a more standard schema). Not sure if this package is the right home for it.
Ref. https://github.com/ropensci/zenodo/issues/3

Minimal metadata schemas for science software and code, in JSON and XML

1/
Matthew B. Jones, Carl Boettiger, Abby Cabunoc Mayes, Arfon Smith, Peter Slaughter, Kyle Niemeyer, Yolanda Gil, Martin Fenner, Krzysztof Nowak, Mark Hahnel, Luke Coy, Alice Allen, Mercè Crosas, Ashley Sands, Neil Chue Hong, Patricia Cruse, Dan Katz, Carole Goble. 2016. CodeMeta: an exchange schema for software metadata. KNB Data Repository. doi:10.5063/schema/codemeta-1.0
https://raw.githubusercontent.com/codemeta/codemeta/1.0/codemeta.jsonld
https://github.com/codemeta/codemeta/blob/master/codemeta.jsonld
(193 lines)

CodeMeta contributors are creating a minimal metadata schema for science software and code, in JSON and XML. The goal of CodeMeta is to create a concept vocabulary that can be used to standardize the exchange of software metadata across repositories and organizations. 
CodeMeta started by comparing the software metadata used across multiple repositories, which resulted in the CodeMeta Metadata Crosswalk.
https://github.com/codemeta/codemeta/blob/master/crosswalk.csv

That crosswalk was then used to generate a set of software metadata concepts, which were arranged into a JSON-LD context for serialization (see codemeta.jsonld, or an example CodeMeta document).

This is an extension of the work done by @arfon, @hubgit, @kaythaney and others on Code as a Research Object / fidgit. Code as a research object is a Mozilla Science Lab (@MozillaScience) project working with community members to explore how we can better integrate code and scientific software into the scholarly workflow. Out of this came fidgit - a proof of concept integration between Github and figshare, providing a Digital Object Identifier (DOI) for the code which allows for persistent reference linking.

With codemeta, we want to formalize the schema used to map between the different services (Github, figshare, Zenodo) to help others plug into existing systems. Having a standard software metadata interoperability schema will allow other data archivers and libraries to join in. This will help keep science on the web shareable and interoperable!
https://github.com/codemeta/codemeta

json-LD
http://www.arfon.org/json-ld-for-software-discovery-reuse-and-credit
http://json-ld.org/

2/
This repository contains the software implementation for our paper A Novel Approach to Higgs Coupling Measurements (Cranmer, Kreiss, Lopez-Val, Plehn), arXiv:1401.0080 [hep-ph]. It contains tools to apply the discussed methods to new models and contains a Makefile to recreate the plots in the paper.
https://github.com/lnielsen/decouple/blob/master/.zenodo.json
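For orientation, a minimal PHP sketch that writes such a .zenodo.json file, reusing the metadata keys of the deposit example above (a hypothetical file, not the content of the decouple repository):

<?php
// Hypothetical .zenodo.json with the same metadata keys as the deposit example above.
$zenodo = [
    'title'        => 'My software package',
    'description'  => 'Software implementation accompanying a paper.',
    'creators'     => [['name' => 'Doe, Jane', 'affiliation' => 'Some Lab']],
    'keywords'     => ['physics', 'software'],
    'license'      => 'cc-by-nc-4.0',
    'upload_type'  => 'software',
    'access_right' => 'open',
];
file_put_contents('.zenodo.json', json_encode($zenodo, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES));
?>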


Serializer Serialization

https://en.wikipedia.org/wiki/Serialization

The Serializer component is meant to be used to turn objects into a specific format (XML, JSON, YAML, ...) and the other way around.

PHP and Python frameworks
http://symfony.com/doc/current/components/serializer.html
http://www.django-rest-framework.org/api-guide/serializers/

Serializers allow complex data such as querysets and model instances to be converted to native Python datatypes that can then be easily rendered into JSON, XML or other content types. Serializers also provide deserialization, allowing parsed data to be converted back into complex types, after first validating the incoming data.
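A minimal sketch with the Symfony Serializer component linked above (assumes symfony/serializer and symfony/property-access installed via Composer; the Article class and its fields are made up):

<?php
// Turn a PHP object into JSON/XML and back with the Symfony Serializer component.
require 'vendor/autoload.php';

use Symfony\Component\Serializer\Serializer;
use Symfony\Component\Serializer\Encoder\JsonEncoder;
use Symfony\Component\Serializer\Encoder\XmlEncoder;
use Symfony\Component\Serializer\Normalizer\ObjectNormalizer;

class Article
{
    public $title   = 'Brain MRI';
    public $creator = 'Toro, Roberto';
}

$serializer = new Serializer([new ObjectNormalizer()], [new JsonEncoder(), new XmlEncoder()]);

echo $serializer->serialize(new Article(), 'json'), "\n";   // object -> JSON
echo $serializer->serialize(new Article(), 'xml'), "\n";    // object -> XML

// JSON -> object (deserialization)
$article = $serializer->deserialize('{"title":"Brain MRI","creator":"Toro, Roberto"}', Article::class, 'json');
echo $article->title, "\n";
?>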

https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats

brief history

In the late 1990s, a push to provide an alternative to the standard serialization protocols started: XML was used to produce a human readable text-based encoding. Such an encoding can be useful for persistent objects that may be read and understood by humans, or communicated to other systems regardless of programming language. It has the disadvantage of losing the more compact, byte-stream-based encoding, but by this point larger storage and transmission capacities made file size less of a concern than in the early days of computing. Binary XML had been proposed as a compromise which was not readable by plain-text editors, but was more compact than regular XML. In the 2000s, XML was often used for asynchronous transfer of structured data between client and server in Ajax web applications.

JSON is a more lightweight plain-text alternative to XML which is also commonly used for client-server communication in web applications. JSON is based on JavaScript syntax, but is supported in other programming languages as well.

Another alternative, YAML, is similar to JSON and includes features that make it more powerful for serialization, more "human friendly," and potentially more compact. These features include a notion of tagging data types, support for non-hierarchical data structures, the option to structure data with indentation, and multiple forms of scalar data quoting.


Many institutions, such as archives and libraries, attempt to future proof their backup archives—in particular, database dumps—by storing them in some relatively human-readable serialized format.

google gears

With the arrival of the internet, plain text file transfer gave way to client/server protocols that transfer data in the form of classes. Older clients relied on cookies, whose size and origin were limited. Objects are the evolution of cookies and may or may not be saved in the web browser's working space.

Google Gears is an AJAX plug-in for web browsers. It transparently saves data locally in a SQLite database while an internet connection is available; the data can then be used offline. It ships by default with Google Chrome. The online web services Google Reader and Remember the Milk are compatible with Google Gears.
https://fr.wikipedia.org/wiki/S%C3%A9rialisation

Zenodo serializer


Zenodo ships serializers for: BibTeX, MARCXML, JSON, OAI, DataCite.


Saturday, January 28, 2017

character encoding conversion to UTF-8; converters (in PHP) of letters in different formats, e.g. from the eight-bit byte sequence displayed as "Ã©" to the letter "é"



premi&#xC3;&#xA8;re &#xC3;&#xA9;cole
appears instead of
première école

Looks like you originally had a UTF-8 file which has been interpreted as an 8-bit encoding (e.g. ISO-8859-15) and entity-encoded.

ISO-8859-15
http://en.wikipedia.org/wiki/ISO/IEC_8859-15
ISO 8859-15 encodes what it refers to as "Latin alphabet no. 9". This character set is used throughout the Americas, Western Europe, Oceania, and much of Africa.
Each character is encoded as a single eight-bit code value. 

See below for a list of encoding systems.

I say this because the sequence C3A9 looks like a pretty plausible UTF-8 encoding sequence.
http://en.wikipedia.org/wiki/UTF-8#Description
The W3C recommends UTF-8 as the default encoding in XML and HTML.

You will need to entity-decode it first; then you'll have a UTF-8 encoding again.

iconv

You could then use something like iconv to convert to an encoding of your choosing.
iconv
http://www.gnu.org/savannah-checkouts/gnu/libiconv/documentation/libiconv-1.13/iconv.1.html
The iconv program converts text from one encoding to another encoding.
https://en.wikipedia.org/wiki/Iconv

iconv: Unix, Mac OS X, Linux
https://developer.apple.com/legacy/library/documentation/Darwin/Reference/ManPages/man1/iconv.1.html
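PHP exposes the same conversion through its built-in iconv() function; a minimal sketch (the file names are placeholders):

<?php
// Convert a file from ISO-8859-15 to UTF-8 with PHP's iconv() function.
$latin = file_get_contents('input-latin9.txt');     // placeholder file name
$utf8  = iconv('ISO-8859-15', 'UTF-8', $latin);
file_put_contents('output-utf8.txt', $utf8);
// mb_convert_encoding($latin, 'UTF-8', 'ISO-8859-15') would do the same job.
?>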

iconv-lite-js

Convert character encodings in pure javascript:
https://github.com/ashtuchkin/iconv-lite

example

To work through your example:

  • Ã© would be decoded as the byte sequence 0xC3 0xA9
  • 0xC3A9 = 11000011 10101001 in binary (first octet, second octet)
  • the leading 110 in the first octet tells us this could be interpreted as a UTF-8 two-byte sequence
  • since the second octet starts with 10, we are indeed looking at something we can interpret as UTF-8
  • to decode it, we take the last 5 bits of the first octet and the last 6 bits of the second octet...
  • so, interpreted as UTF-8, it's
    00011101001
    = 0xE9
    = é (LATIN SMALL LETTER E WITH ACUTE,
    http://www.fileformat.info/info/unicode/char/e9/index.htm)
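The same arithmetic written out as a small PHP sketch (purely illustrative; the decode/recode script below does the real work with library functions):

<?php
// Decode the two-byte UTF-8 sequence 0xC3 0xA9 by hand, as in the list above.
$first  = 0xC3;                               // 11000011
$second = 0xA9;                               // 10101001
$codepoint = (($first & 0x1F) << 6)           // keep the last 5 bits of the first octet
           |  ($second & 0x3F);               // keep the last 6 bits of the second octet
printf("U+%04X\n", $codepoint);               // prints U+00E9
echo html_entity_decode('&#' . $codepoint . ';', ENT_QUOTES, 'UTF-8'), "\n";   // prints é
?>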


You mention wanting to handle this with PHP; something like this might do it for you:

 //to load from a file, use
 //$file=file_get_contents("/path/to/filename.txt");
 //example below uses a literal string to demonstrate technique...

 $file="&Précédent is a French word";
 $utf8=html_entity_decode($file);
 $iso8859=utf8_decode($utf8);

 //$utf8 contains "Précédent is a French word" in UTF-8
 //$iso8859 contains "Précédent is a French word" in ISO-8859

Ref.
http://stackoverflow.com/questions/4177783/xc3-xa9-and-other-codes

run on Mac OS X (PHP is pre-installed)

1/ create this text file (a PHP script that decodes and re-encodes):
<?php
 echo "decode-recode*";
 $file="&Pr&#xC3;&#xA9;c&#xC3;&#xA9;dent is a French word";
 $utf8a=html_entity_decode($file);
 $iso8859a=utf8_decode($utf8a);
 echo $utf8a, "//";
 echo $iso8859a, "//";
?>

2/ save it as "decode1.php"

3/ use the application "Terminal":

php /path/YYY/decode1.php


4/ you will get:

decode-recode*&Précédent is a French word//&Précédent is a French word//

list of encoding systems

in the terminal, iconv --list gives:

ANSI_X3.4-1968 ANSI_X3.4-1986 ASCII CP367 IBM367 ISO-IR-6 ISO646-US ISO_646.IRV:1991 US US-ASCII CSASCII
UTF-8 UTF8
UTF-8-MAC UTF8-MAC
ISO-10646-UCS-2 UCS-2 CSUNICODE
UCS-2BE UNICODE-1-1 UNICODEBIG CSUNICODE11
UCS-2LE UNICODELITTLE
ISO-10646-UCS-4 UCS-4 CSUCS4
UCS-4BE
UCS-4LE
UTF-16
UTF-16BE
UTF-16LE
UTF-32
UTF-32BE
UTF-32LE
UNICODE-1-1-UTF-7 UTF-7 CSUNICODE11UTF7
UCS-2-INTERNAL
UCS-2-SWAPPED
UCS-4-INTERNAL
UCS-4-SWAPPED
C99
JAVA
CP819 IBM819 ISO-8859-1 ISO-IR-100 ISO8859-1 ISO_8859-1 ISO_8859-1:1987 L1 LATIN1 CSISOLATIN1
ISO-8859-2 ISO-IR-101 ISO8859-2 ISO_8859-2 ISO_8859-2:1987 L2 LATIN2 CSISOLATIN2
ISO-8859-3 ISO-IR-109 ISO8859-3 ISO_8859-3 ISO_8859-3:1988 L3 LATIN3 CSISOLATIN3
ISO-8859-4 ISO-IR-110 ISO8859-4 ISO_8859-4 ISO_8859-4:1988 L4 LATIN4 CSISOLATIN4
CYRILLIC ISO-8859-5 ISO-IR-144 ISO8859-5 ISO_8859-5 ISO_8859-5:1988 CSISOLATINCYRILLIC
ARABIC ASMO-708 ECMA-114 ISO-8859-6 ISO-IR-127 ISO8859-6 ISO_8859-6 ISO_8859-6:1987 CSISOLATINARABIC
ECMA-118 ELOT_928 GREEK GREEK8 ISO-8859-7 ISO-IR-126 ISO8859-7 ISO_8859-7 ISO_8859-7:1987 ISO_8859-7:2003 CSISOLATINGREEK
HEBREW ISO-8859-8 ISO-IR-138 ISO8859-8 ISO_8859-8 ISO_8859-8:1988 CSISOLATINHEBREW
ISO-8859-9 ISO-IR-148 ISO8859-9 ISO_8859-9 ISO_8859-9:1989 L5 LATIN5 CSISOLATIN5
ISO-8859-10 ISO-IR-157 ISO8859-10 ISO_8859-10 ISO_8859-10:1992 L6 LATIN6 CSISOLATIN6
ISO-8859-11 ISO8859-11 ISO_8859-11
ISO-8859-13 ISO-IR-179 ISO8859-13 ISO_8859-13 L7 LATIN7
ISO-8859-14 ISO-CELTIC ISO-IR-199 ISO8859-14 ISO_8859-14 ISO_8859-14:1998 L8 LATIN8
ISO-8859-15 ISO-IR-203 ISO8859-15 ISO_8859-15 ISO_8859-15:1998 LATIN-9
ISO-8859-16 ISO-IR-226 ISO8859-16 ISO_8859-16 ISO_8859-16:2001 L10 LATIN10
KOI8-R CSKOI8R
KOI8-U
KOI8-RU
CP1250 MS-EE WINDOWS-1250
CP1251 MS-CYRL WINDOWS-1251
CP1252 MS-ANSI WINDOWS-1252
CP1253 MS-GREEK WINDOWS-1253
CP1254 MS-TURK WINDOWS-1254
CP1255 MS-HEBR WINDOWS-1255
CP1256 MS-ARAB WINDOWS-1256
CP1257 WINBALTRIM WINDOWS-1257
CP1258 WINDOWS-1258
850 CP850 IBM850 CSPC850MULTILINGUAL
862 CP862 IBM862 CSPC862LATINHEBREW
866 CP866 IBM866 CSIBM866
MAC MACINTOSH MACROMAN CSMACINTOSH
MACCENTRALEUROPE
MACICELAND
MACCROATIAN
MACROMANIA
MACCYRILLIC
MACUKRAINE
MACGREEK
MACTURKISH
MACHEBREW
MACARABIC
MACTHAI
HP-ROMAN8 R8 ROMAN8 CSHPROMAN8
NEXTSTEP
ARMSCII-8
GEORGIAN-ACADEMY
GEORGIAN-PS
KOI8-T
CP154 CYRILLIC-ASIAN PT154 PTCP154 CSPTCP154
MULELAO-1
CP1133 IBM-CP1133
ISO-IR-166 TIS-620 TIS620 TIS620-0 TIS620.2529-1 TIS620.2533-0 TIS620.2533-1
CP874 WINDOWS-874
VISCII VISCII1.1-1 CSVISCII
TCVN TCVN-5712 TCVN5712-1 TCVN5712-1:1993
ISO-IR-14 ISO646-JP JIS_C6220-1969-RO JP CSISO14JISC6220RO
JISX0201-1976 JIS_X0201 X0201 CSHALFWIDTHKATAKANA
ISO-IR-87 JIS0208 JIS_C6226-1983 JIS_X0208 JIS_X0208-1983 JIS_X0208-1990 X0208 CSISO87JISX0208
ISO-IR-159 JIS_X0212 JIS_X0212-1990 JIS_X0212.1990-0 X0212 CSISO159JISX02121990
CN GB_1988-80 ISO-IR-57 ISO646-CN CSISO57GB1988
CHINESE GB_2312-80 ISO-IR-58 CSISO58GB231280
CN-GB-ISOIR165 ISO-IR-165
ISO-IR-149 KOREAN KSC_5601 KS_C_5601-1987 KS_C_5601-1989 CSKSC56011987
EUC-JP EUCJP EXTENDED_UNIX_CODE_PACKED_FORMAT_FOR_JAPANESE CSEUCPKDFMTJAPANESE
MS_KANJI SHIFT-JIS SHIFT_JIS SJIS CSSHIFTJIS
CP932
ISO-2022-JP CSISO2022JP
ISO-2022-JP-1
ISO-2022-JP-2 CSISO2022JP2
CN-GB EUC-CN EUCCN GB2312 CSGB2312
GBK
CP936 MS936 WINDOWS-936
GB18030
ISO-2022-CN CSISO2022CN
ISO-2022-CN-EXT
HZ HZ-GB-2312
EUC-TW EUCTW CSEUCTW
BIG-5 BIG-FIVE BIG5 BIGFIVE CN-BIG5 CSBIG5
CP950
BIG5-HKSCS:1999
BIG5-HKSCS:2001
BIG5-HKSCS BIG5-HKSCS:2004 BIG5HKSCS
EUC-KR EUCKR CSEUCKR
CP949 UHC
CP1361 JOHAB
ISO-2022-KR CSISO2022KR
CP856
CP922
CP943
CP1046
CP1124
CP1129
CP1161 IBM-1161 IBM1161 CSIBM1161
CP1162 IBM-1162 IBM1162 CSIBM1162
CP1163 IBM-1163 IBM1163 CSIBM1163
DEC-KANJI
DEC-HANYU
437 CP437 IBM437 CSPC8CODEPAGE437
CP737
CP775 IBM775 CSPC775BALTIC
852 CP852 IBM852 CSPCP852
CP853
855 CP855 IBM855 CSIBM855
857 CP857 IBM857 CSIBM857
CP858
860 CP860 IBM860 CSIBM860
861 CP-IS CP861 IBM861 CSIBM861
863 CP863 IBM863 CSIBM863
CP864 IBM864 CSIBM864
865 CP865 IBM865 CSIBM865
869 CP-GR CP869 IBM869 CSIBM869
CP1125
EUC-JISX0213
SHIFT_JISX0213
ISO-2022-JP-3
BIG5-2003
ISO-IR-230 TDS565
ATARI ATARIST
RISCOS-LATIN1

Example

converts input from the old 8-bit encoding ISO-8859-9 (Latin-5) to Unicode UTF-8

For example, use the application "terminal" :

iconv -f ISO8859-9 -t UTF-8 NameFile

and the inverse:
iconv -f UTF-8 -t ISO8859-9 NameFile

Thursday, January 5, 2017

Open Journal Systems from the Public Knowledge Project


https://en.wikipedia.org/wiki/Open_Journal_Systems

In December 2016, Open Journal Systems (OJS) reached 10,000 published journals and 420,000 articles.
Evolution of the number of journals published with OJS:
https://pkp.sfu.ca/ojs/ojs-usage/ojs-map/

Open Journal Systems (OJS) was designed to facilitate the development of open-access, peer-reviewed publishing, providing the technical infrastructure not only for the online presentation of journal articles but also for the entire editorial workflow, including article submission, peer review and indexing. OJS connects people fulfilling different roles, such as journal manager, editor, reviewer, author, reader, etc. It has a module that supports subscription journals.

The software has a "plugin" architecture, similar to other community-driven projects such as WordPress, which allows new features to be integrated easily without having to modify the core code base. Plugins have made OJS a basis for tools that facilitate indexing in Google Scholar and PubMed Central, a feed plugin providing RSS/Atom web syndication, a COUNTER plugin enabling COUNTER statistics and reporting, and more. OJS is also LOCKSS-compliant
(https://en.wikipedia.org/wiki/LOCKSS),
which helps ensure permanent access to and archiving of journal content.
Under the auspices of Stanford University, the LOCKSS project ("Lots Of Copies Keep Stuff Safe") is a peer-to-peer network that develops and supports an open-source system allowing libraries to collect, preserve and give their readers access to content published on the web.

To improve reader engagement, PKP has developed a series of reading tools that give access to related studies, media reports, government policies, etc. in open-access databases.

Originally released in 2001, OJS is currently at version 3.0. OJS is written in PHP, uses either MariaDB (née MySQL) or PostgreSQL, and can be hosted on a Unix-like or Windows web server.

OJS 3.X
System Requirements
To run OJS 3.x, your web server will need:
  • PHP 5.3.7 or later with MySQL or PostgreSQL support
  • A database server: MySQL 4.1 or later OR PostgreSQL 9.1.5 or later
  • UNIX-like OS recommended (such as Linux, FreeBSD, Solaris, Mac OS X, etc.)

Current Production Release (November 25, 2016)

OJS has been translated into many languages, including French (32 languages in total).

https://github.com/pkp/ojs
https://pkp.sfu.ca/wiki/index.php?title=Developer_Documentation#Getting_Started
The OJS hosting service is offered for a fee by PKP | PS (Publishing Services), as well as by a variety of third-party commercial and non-commercial service providers not affiliated with PKP.
https://pkpservices.sfu.ca/content/plans-pricing