
Tuesday, February 14, 2017

file formats to import into HAL (XML-TEI) and ZENODO (JSON) via API. Converters, serialization, metadata from MARC, MARCXML, YAML...


An introduction to these formats (in French):
http://sametmax.com/yaml-xml-json-csv-ini-quest-ce-que-cest-et-a-quoi-ca-sert/


JSON
http://stephane-mottin.blogspot.fr/2017/01/datacite-inist-cern-metadata-schema.html
Implementations in many different languages are listed here (a PHP validation sketch follows below):
http://json-schema.org/implementations.html
https://github.com/jdorn/json-editor
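As a PHP illustration, a minimal sketch of validating a document against a JSON Schema with one of the PHP implementations listed there (justinrainbow/json-schema, assumed installed via Composer; the file names are placeholders):

<?php
// Minimal sketch: validate data.json against schema.json with the
// justinrainbow/json-schema library (assumed installed via Composer).
require 'vendor/autoload.php';

$data = json_decode(file_get_contents('data.json'));

$validator = new JsonSchema\Validator();
$validator->validate($data, (object) ['$ref' => 'file://' . realpath('schema.json')]);

if ($validator->isValid()) {
    echo "The JSON document validates against the schema.\n";
} else {
    foreach ($validator->getErrors() as $error) {
        printf("[%s] %s\n", $error['property'], $error['message']);
    }
}
?>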


XML
http://stephane-mottin.blogspot.fr/2017/01/moissonnage-oai-pm-structure-ead-puis.html

converters CSV-XML, CSV-JSON, XML-JSON

a web service for simple schemas:

http://codebeautify.org/csv-to-xml-json#
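As an alternative to the web service, a minimal PHP sketch of the same conversions for a simple flat schema (the file names and column layout are placeholders):

<?php
// CSV -> JSON and CSV -> XML for a simple flat schema.
// "records.csv" (first line = column names) is a placeholder file name.
$rows = [];
if (($fh = fopen('records.csv', 'r')) !== false) {
    $header = fgetcsv($fh);
    while (($line = fgetcsv($fh)) !== false) {
        $rows[] = array_combine($header, $line);
    }
    fclose($fh);
}

// CSV -> JSON
file_put_contents('records.json', json_encode($rows, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE));

// CSV -> XML
$xml = new SimpleXMLElement('<records/>');
foreach ($rows as $row) {
    $record = $xml->addChild('record');
    foreach ($row as $field => $value) {
        $record->addChild($field, htmlspecialchars($value));
    }
}
$xml->asXML('records.xml');
?>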

import to HAL

http://stephane-mottin.blogspot.fr/2017/01/api-v3-hal-import.html

download HAL XML-TEI schema

"all"
https://api.archives-ouvertes.fr/documents/all.xml

https://github.com/CCSDForge/HAL/tree/master/Sword

article
https://github.com/CCSDForge/HAL/blob/master/Sword/ART.xml

book chapter
https://github.com/CCSDForge/HAL/blob/master/Sword/COUV.xml
example of a book chapter
https://hal.archives-ouvertes.fr/hal-01071717v1
example of an export via the TEI button
https://hal.archives-ouvertes.fr/hal-01071717v1/tei
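To fetch that TEI export programmatically, a minimal PHP sketch (assumes allow_url_fopen is enabled; the output file name is arbitrary):

<?php
// Download the TEI export of the example record and check it is well-formed XML.
$url = 'https://hal.archives-ouvertes.fr/hal-01071717v1/tei';
$tei = file_get_contents($url);
if ($tei !== false) {
    file_put_contents('hal-01071717.tei.xml', $tei);
    $doc = new SimpleXMLElement($tei);   // throws if the XML is not well formed
    echo $doc->getName(), "\n";          // prints the root element name
}
?>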

import to ZENODO

http://stephane-mottin.blogspot.fr/2017/01/zenodo-github-research-data-repository.html

download the ZENODO JSON schemas


zenodo/zenodo/modules/deposit/jsonschemas/deposits/records/record-v1.0.0.json
(863 lines)
"description": "Describe information needed for deposit module.",
  "title": "Zenodo Deposit Schema v1.0.0",
  "required": [
    "_deposit"
  ],

zenodo/zenodo/modules/deposit/jsonschemas/deposits/records/legacyrecord.json
(432 lines)
"$schema": "http://json-schema.org/draft-04/schema#",
  "additionalProperties": false,
  "description": "Describe information needed for deposit module.",
  "id": "http://zenodo.org/schemas/deposits/records/legacyjson.json",
  "properties": {
    "$schema": {
      "type": "string"
    },

example of a book chapter
example of an export via the JSON button
(not exactly the same as the initial JSON)

API Documentation for developers:


Create a new deposit and obtain a deposit ID:

curl -i -H "Content-Type: application/json" -X POST --data '{"metadata":{"access_right": "open","creators": [{"affiliation": "Brain Catalogue", "name": "Toro, Roberto"}],"description": "Brain MRI","keywords": ["MRI", "Brain"],"license": "cc-by-nc-4.0", "title": "Brain MRI", "upload_type": "dataset"}}' https://zenodo.org/api/deposit/depositions/?access_token=$token |tee zenodo.json
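The same request from PHP with the cURL extension, as a minimal sketch (the endpoint and metadata are those of the curl command above; reading the token from a ZENODO_TOKEN environment variable is just an assumption of this example):

<?php
// Create a new Zenodo deposit with the same metadata as the curl example above.
$token = getenv('ZENODO_TOKEN');   // assumption: access token stored in an environment variable
$metadata = ['metadata' => [
    'access_right' => 'open',
    'creators'     => [['affiliation' => 'Brain Catalogue', 'name' => 'Toro, Roberto']],
    'description'  => 'Brain MRI',
    'keywords'     => ['MRI', 'Brain'],
    'license'      => 'cc-by-nc-4.0',
    'title'        => 'Brain MRI',
    'upload_type'  => 'dataset',
]];

$ch = curl_init('https://zenodo.org/api/deposit/depositions/?access_token=' . $token);
curl_setopt_array($ch, [
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => json_encode($metadata),
    CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
    CURLOPT_RETURNTRANSFER => true,
]);
$response = curl_exec($ch);
curl_close($ch);

file_put_contents('zenodo.json', $response);   // same output file as the "| tee zenodo.json" above
$deposit = json_decode($response, true);
echo isset($deposit['id']) ? "deposit id: {$deposit['id']}\n" : "no deposit id returned\n";
?>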

JSON Schema zenodo and invenio

zenodo is based on invenio.
Invenio uses JSON Schema to describe formats of managed entities such as records.

Based on JSON Schema, we can generate forms permitting users and curators to enter records.  There are two use cases: (1) deposition by end users such as physicists; (2) editing by power users such as curators and librarians.
This talk will show work-in-progress for both these scenarios.  We can discuss pros/cons of using available tools such as JSON Editor:
https://indico.cern.ch/event/407109/
Invenio uses https://github.com/jdorn/json-editor

interesting slides
  • strengths and weaknesses of "JSON Editor"
  • BibEdit
http://slides.com/neumann/json-based-record-editing#/
Ref.
https://github.com/inveniosoftware/invenio/issues/2854

export from catalogue SUDOC (and Brise-ES)

http://stephane-mottin.blogspot.com/2017/01/importance-du-catalogage-librairie.html
http://stephane-mottin.blogspot.fr/2012/06/catalogage-sudoc-abes-unimarc-des.html
http://stephane-mottin.blogspot.fr/2011/10/sudoc-export-et-interoperabilite.html

some code (XML, JSON)

generate code.json / zenodo.json metadata files for github? (comments @2014)
Seems like it would be a straight-forward exercise to serialize some json-ld from an R DESCRIPTION file (and potentially other sources) to provide more metadata to zenodo (and potentially other sites if this becomes a more standard schema). Not sure if this package is the right home for it.
Ref. https://github.com/ropensci/zenodo/issues/3

Minimal metadata schemas for science software and code, in JSON and XML

1/
Matthew B. Jones, Carl Boettiger, Abby Cabunoc Mayes, Arfon Smith, Peter Slaughter, Kyle Niemeyer, Yolanda Gil, Martin Fenner, Krzysztof Nowak, Mark Hahnel, Luke Coy, Alice Allen, Mercè Crosas, Ashley Sands, Neil Chue Hong, Patricia Cruse, Dan Katz, Carole Goble. 2016. CodeMeta: an exchange schema for software metadata. KNB Data Repository. doi:10.5063/schema/codemeta-1.0
https://raw.githubusercontent.com/codemeta/codemeta/1.0/codemeta.jsonld
https://github.com/codemeta/codemeta/blob/master/codemeta.jsonld
(193 lines)

CodeMeta contributors are creating a minimal metadata schema for science software and code, in JSON and XML. The goal of CodeMeta is to create a concept vocabulary that can be used to standardize the exchange of software metadata across repositories and organizations. 
CodeMeta started by comparing the software metadata used across multiple repositories, which resulted in the CodeMeta Metadata Crosswalk.
https://github.com/codemeta/codemeta/blob/master/crosswalk.csv

That crosswalk was then used to generate a set of software metadata concepts, which were arranged into a JSON-LD context for serialization (see codemeta.jsonld, or an example CodeMeta document).

This is an extension of the work done by @arfon, @hubgit, @kaythaney and others on Code as a Research Object / fidgit. Code as a research object is a Mozilla Science Lab (@MozillaScience) project working with community members to explore how we can better integrate code and scientific software into the scholarly workflow. Out of this came fidgit - a proof of concept integration between Github and figshare, providing a Digital Object Identifier (DOI) for the code which allows for persistent reference linking.

With codemeta, we want to formalize the schema used to map between the different services (Github, figshare, Zenodo) to help others plug into existing systems. Having a standard software metadata interoperability schema will allow other data archivers and libraries to join in. This will help keep science on the web shareable and interoperable!
https://github.com/codemeta/codemeta

json-LD
http://www.arfon.org/json-ld-for-software-discovery-reuse-and-credit
http://json-ld.org/

2/
This repository contains the software implementation for our paper A Novel Approach to Higgs Coupling Measurements (Cranmer, Kreiss, Lopez-Val, Plehn), arXiv:1401.0080 [hep-ph]. It contains tools to apply the discussed methods to new models and contains a Makefile to recreate the plots in the paper.
https://github.com/lnielsen/decouple/blob/master/.zenodo.json
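For orientation, a minimal PHP sketch that writes such a .zenodo.json file, reusing the metadata keys of the deposit example above (a hypothetical file, not the content of the decouple repository):

<?php
// Hypothetical .zenodo.json with the same metadata keys as the deposit example above.
$zenodo = [
    'title'        => 'My software package',
    'description'  => 'Software implementation accompanying a paper.',
    'creators'     => [['name' => 'Doe, Jane', 'affiliation' => 'Some Lab']],
    'keywords'     => ['physics', 'software'],
    'license'      => 'cc-by-nc-4.0',
    'upload_type'  => 'software',
    'access_right' => 'open',
];
file_put_contents('.zenodo.json', json_encode($zenodo, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES));
?>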


Serializer Serialization

https://en.wikipedia.org/wiki/Serialization

The Serializer component is meant to be used to turn objects into a specific format (XML, JSON, YAML, ...) and the other way around.

PHP and Python frameworks
http://symfony.com/doc/current/components/serializer.html
http://www.django-rest-framework.org/api-guide/serializers/

Serializers allow complex data such as querysets and model instances to be converted to native Python datatypes that can then be easily rendered into JSON, XML or other content types. Serializers also provide deserialization, allowing parsed data to be converted back into complex types, after first validating the incoming data.
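A minimal sketch with the Symfony Serializer component linked above (assumes symfony/serializer and symfony/property-access installed via Composer; the Article class and its fields are made up):

<?php
// Turn a PHP object into JSON/XML and back with the Symfony Serializer component.
require 'vendor/autoload.php';

use Symfony\Component\Serializer\Serializer;
use Symfony\Component\Serializer\Encoder\JsonEncoder;
use Symfony\Component\Serializer\Encoder\XmlEncoder;
use Symfony\Component\Serializer\Normalizer\ObjectNormalizer;

class Article
{
    public $title   = 'Brain MRI';
    public $creator = 'Toro, Roberto';
}

$serializer = new Serializer([new ObjectNormalizer()], [new JsonEncoder(), new XmlEncoder()]);

echo $serializer->serialize(new Article(), 'json'), "\n";   // object -> JSON
echo $serializer->serialize(new Article(), 'xml'), "\n";    // object -> XML

// JSON -> object (deserialization)
$article = $serializer->deserialize('{"title":"Brain MRI","creator":"Toro, Roberto"}', Article::class, 'json');
echo $article->title, "\n";
?>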

https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats

brief history

In the late 1990s, a push to provide an alternative to the standard serialization protocols started: XML was used to produce a human readable text-based encoding. Such an encoding can be useful for persistent objects that may be read and understood by humans, or communicated to other systems regardless of programming language. It has the disadvantage of losing the more compact, byte-stream-based encoding, but by this point larger storage and transmission capacities made file size less of a concern than in the early days of computing. Binary XML had been proposed as a compromise which was not readable by plain-text editors, but was more compact than regular XML. In the 2000s, XML was often used for asynchronous transfer of structured data between client and server in Ajax web applications.

JSON is a more lightweight plain-text alternative to XML which is also commonly used for client-server communication in web applications. JSON is based on JavaScript syntax, but is supported in other programming languages as well.

Another alternative, YAML, is similar to JSON and includes features that make it more powerful for serialization, more "human friendly," and potentially more compact. These features include a notion of tagging data types, support for non-hierarchical data structures, the option to structure data with indentation, and multiple forms of scalar data quoting.


Many institutions, such as archives and libraries, attempt to future proof their backup archives—in particular, database dumps—by storing them in some relatively human-readable serialized format.

google gears

With the arrival of the internet, plain text file transfer gave way to client/server protocols that transfer data in the form of classes. Older clients relied on cookies, whose size and origin were limited. Objects are the evolution of cookies and may or may not be saved in the web browser's working space.

Google Gears is an AJAX plug-in for web browsers. It transparently saves data locally in a SQLite database while an internet connection is available; the data can then be used offline. It ships by default with Google Chrome. The online web services Google Reader and Remember the Milk are compatible with Google Gears.
https://fr.wikipedia.org/wiki/S%C3%A9rialisation

Zenodo serializer


Zenodo ships serializers for: BibTeX, MARCXML, JSON, OAI, DataCite.


Saturday, January 28, 2017

character encoding conversion to UTF-8; converters (in PHP) of letters in different formats, e.g. from the eight-bit byte sequence displayed as "Ã©" to the letter "é"



premi&#xC3;&#xA8;re &#xC3;&#xA9;cole
appears instead of
première école

Looks like you originally had a UTF-8 file which has been interpreted as an 8-bit encoding (e.g. ISO-8859-15) and entity-encoded.

ISO-8859-15
http://en.wikipedia.org/wiki/ISO/IEC_8859-15
ISO 8859-15 encodes what it refers to as "Latin alphabet no. 9". This character set is used throughout the Americas, Western Europe, Oceania, and much of Africa.
Each character is encoded as a single eight-bit code value. 

See below for a list of encoding systems.

I say this because the sequence C3A9 looks like a pretty plausible UTF-8 encoding sequence.
http://en.wikipedia.org/wiki/UTF-8#Description
The W3C recommends UTF-8 as the default encoding in XML and HTML.

You will need to entity-decode it first; then you'll have a UTF-8 encoding again.

iconv

You could then use something like iconv to convert to an encoding of your choosing.
iconv
http://www.gnu.org/savannah-checkouts/gnu/libiconv/documentation/libiconv-1.13/iconv.1.html
The iconv program converts text from one encoding to another encoding.
https://en.wikipedia.org/wiki/Iconv

iconv: Unix, Mac OS X, Linux
https://developer.apple.com/legacy/library/documentation/Darwin/Reference/ManPages/man1/iconv.1.html
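PHP exposes the same conversion through its built-in iconv() function; a minimal sketch (the file names are placeholders):

<?php
// Convert a file from ISO-8859-15 to UTF-8 with PHP's iconv() function.
$latin = file_get_contents('input-latin9.txt');     // placeholder file name
$utf8  = iconv('ISO-8859-15', 'UTF-8', $latin);
file_put_contents('output-utf8.txt', $utf8);
// mb_convert_encoding($latin, 'UTF-8', 'ISO-8859-15') would do the same job.
?>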

iconv-lite-js

Convert character encodings in pure javascript:
https://github.com/ashtuchkin/iconv-lite

example

To work through your example:

  • Ã© would be decoded as the byte sequence 0xC3 0xA9
  • 0xC3A9 = 11000011 10101001 in binary (first octet, second octet)
  • the leading 110 in the first octet tells us this could be interpreted as a UTF-8 two-byte sequence
  • since the second octet starts with 10, we are indeed looking at something we can interpret as UTF-8
  • to decode it, we take the last 5 bits of the first octet and the last 6 bits of the second octet...
  • so, interpreted as UTF-8, it's
    00011101001
    = 0xE9
    = é (LATIN SMALL LETTER E WITH ACUTE,
    http://www.fileformat.info/info/unicode/char/e9/index.htm)
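The same arithmetic written out as a small PHP sketch (purely illustrative; the decode/recode script below does the real work with library functions):

<?php
// Decode the two-byte UTF-8 sequence 0xC3 0xA9 by hand, as in the list above.
$first  = 0xC3;                               // 11000011
$second = 0xA9;                               // 10101001
$codepoint = (($first & 0x1F) << 6)           // keep the last 5 bits of the first octet
           |  ($second & 0x3F);               // keep the last 6 bits of the second octet
printf("U+%04X\n", $codepoint);               // prints U+00E9
echo html_entity_decode('&#' . $codepoint . ';', ENT_QUOTES, 'UTF-8'), "\n";   // prints é
?>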


You mention wanting to handle this with PHP; something like this might do it for you:

 //to load from a file, use
 //$file=file_get_contents("/path/to/filename.txt");
 //example below uses a literal string to demonstrate technique...

 $file="&Précédent is a French word";
 $utf8=html_entity_decode($file);
 $iso8859=utf8_decode($utf8);

 //$utf8 contains "Précédent is a French word" in UTF-8
 //$iso8859 contains "Précédent is a French word" in ISO-8859

Ref.
http://stackoverflow.com/questions/4177783/xc3-xa9-and-other-codes

run on Mac OS X (PHP is pre-installed)

1/ create this text file (a PHP script that decodes and re-encodes):
<?php
 echo "decode-recode*";
 $file="&Pr&#xC3;&#xA9;c&#xC3;&#xA9;dent is a French word";
 $utf8a=html_entity_decode($file);
 $iso8859a=utf8_decode($utf8a);
 echo $utf8a, "//";
 echo $iso8859a, "//";
?>

2/ save it as "decode1.php"

3/ use the application "Terminal":

php /path/YYY/decode1.php


4/ you will get:

decode-recode*&Précédent is a French word//&Précédent is a French word//

list of encoding systems

in the terminal, iconv --list gives:

ANSI_X3.4-1968 ANSI_X3.4-1986 ASCII CP367 IBM367 ISO-IR-6 ISO646-US ISO_646.IRV:1991 US US-ASCII CSASCII
UTF-8 UTF8
UTF-8-MAC UTF8-MAC
ISO-10646-UCS-2 UCS-2 CSUNICODE
UCS-2BE UNICODE-1-1 UNICODEBIG CSUNICODE11
UCS-2LE UNICODELITTLE
ISO-10646-UCS-4 UCS-4 CSUCS4
UCS-4BE
UCS-4LE
UTF-16
UTF-16BE
UTF-16LE
UTF-32
UTF-32BE
UTF-32LE
UNICODE-1-1-UTF-7 UTF-7 CSUNICODE11UTF7
UCS-2-INTERNAL
UCS-2-SWAPPED
UCS-4-INTERNAL
UCS-4-SWAPPED
C99
JAVA
CP819 IBM819 ISO-8859-1 ISO-IR-100 ISO8859-1 ISO_8859-1 ISO_8859-1:1987 L1 LATIN1 CSISOLATIN1
ISO-8859-2 ISO-IR-101 ISO8859-2 ISO_8859-2 ISO_8859-2:1987 L2 LATIN2 CSISOLATIN2
ISO-8859-3 ISO-IR-109 ISO8859-3 ISO_8859-3 ISO_8859-3:1988 L3 LATIN3 CSISOLATIN3
ISO-8859-4 ISO-IR-110 ISO8859-4 ISO_8859-4 ISO_8859-4:1988 L4 LATIN4 CSISOLATIN4
CYRILLIC ISO-8859-5 ISO-IR-144 ISO8859-5 ISO_8859-5 ISO_8859-5:1988 CSISOLATINCYRILLIC
ARABIC ASMO-708 ECMA-114 ISO-8859-6 ISO-IR-127 ISO8859-6 ISO_8859-6 ISO_8859-6:1987 CSISOLATINARABIC
ECMA-118 ELOT_928 GREEK GREEK8 ISO-8859-7 ISO-IR-126 ISO8859-7 ISO_8859-7 ISO_8859-7:1987 ISO_8859-7:2003 CSISOLATINGREEK
HEBREW ISO-8859-8 ISO-IR-138 ISO8859-8 ISO_8859-8 ISO_8859-8:1988 CSISOLATINHEBREW
ISO-8859-9 ISO-IR-148 ISO8859-9 ISO_8859-9 ISO_8859-9:1989 L5 LATIN5 CSISOLATIN5
ISO-8859-10 ISO-IR-157 ISO8859-10 ISO_8859-10 ISO_8859-10:1992 L6 LATIN6 CSISOLATIN6
ISO-8859-11 ISO8859-11 ISO_8859-11
ISO-8859-13 ISO-IR-179 ISO8859-13 ISO_8859-13 L7 LATIN7
ISO-8859-14 ISO-CELTIC ISO-IR-199 ISO8859-14 ISO_8859-14 ISO_8859-14:1998 L8 LATIN8
ISO-8859-15 ISO-IR-203 ISO8859-15 ISO_8859-15 ISO_8859-15:1998 LATIN-9
ISO-8859-16 ISO-IR-226 ISO8859-16 ISO_8859-16 ISO_8859-16:2001 L10 LATIN10
KOI8-R CSKOI8R
KOI8-U
KOI8-RU
CP1250 MS-EE WINDOWS-1250
CP1251 MS-CYRL WINDOWS-1251
CP1252 MS-ANSI WINDOWS-1252
CP1253 MS-GREEK WINDOWS-1253
CP1254 MS-TURK WINDOWS-1254
CP1255 MS-HEBR WINDOWS-1255
CP1256 MS-ARAB WINDOWS-1256
CP1257 WINBALTRIM WINDOWS-1257
CP1258 WINDOWS-1258
850 CP850 IBM850 CSPC850MULTILINGUAL
862 CP862 IBM862 CSPC862LATINHEBREW
866 CP866 IBM866 CSIBM866
MAC MACINTOSH MACROMAN CSMACINTOSH
MACCENTRALEUROPE
MACICELAND
MACCROATIAN
MACROMANIA
MACCYRILLIC
MACUKRAINE
MACGREEK
MACTURKISH
MACHEBREW
MACARABIC
MACTHAI
HP-ROMAN8 R8 ROMAN8 CSHPROMAN8
NEXTSTEP
ARMSCII-8
GEORGIAN-ACADEMY
GEORGIAN-PS
KOI8-T
CP154 CYRILLIC-ASIAN PT154 PTCP154 CSPTCP154
MULELAO-1
CP1133 IBM-CP1133
ISO-IR-166 TIS-620 TIS620 TIS620-0 TIS620.2529-1 TIS620.2533-0 TIS620.2533-1
CP874 WINDOWS-874
VISCII VISCII1.1-1 CSVISCII
TCVN TCVN-5712 TCVN5712-1 TCVN5712-1:1993
ISO-IR-14 ISO646-JP JIS_C6220-1969-RO JP CSISO14JISC6220RO
JISX0201-1976 JIS_X0201 X0201 CSHALFWIDTHKATAKANA
ISO-IR-87 JIS0208 JIS_C6226-1983 JIS_X0208 JIS_X0208-1983 JIS_X0208-1990 X0208 CSISO87JISX0208
ISO-IR-159 JIS_X0212 JIS_X0212-1990 JIS_X0212.1990-0 X0212 CSISO159JISX02121990
CN GB_1988-80 ISO-IR-57 ISO646-CN CSISO57GB1988
CHINESE GB_2312-80 ISO-IR-58 CSISO58GB231280
CN-GB-ISOIR165 ISO-IR-165
ISO-IR-149 KOREAN KSC_5601 KS_C_5601-1987 KS_C_5601-1989 CSKSC56011987
EUC-JP EUCJP EXTENDED_UNIX_CODE_PACKED_FORMAT_FOR_JAPANESE CSEUCPKDFMTJAPANESE
MS_KANJI SHIFT-JIS SHIFT_JIS SJIS CSSHIFTJIS
CP932
ISO-2022-JP CSISO2022JP
ISO-2022-JP-1
ISO-2022-JP-2 CSISO2022JP2
CN-GB EUC-CN EUCCN GB2312 CSGB2312
GBK
CP936 MS936 WINDOWS-936
GB18030
ISO-2022-CN CSISO2022CN
ISO-2022-CN-EXT
HZ HZ-GB-2312
EUC-TW EUCTW CSEUCTW
BIG-5 BIG-FIVE BIG5 BIGFIVE CN-BIG5 CSBIG5
CP950
BIG5-HKSCS:1999
BIG5-HKSCS:2001
BIG5-HKSCS BIG5-HKSCS:2004 BIG5HKSCS
EUC-KR EUCKR CSEUCKR
CP949 UHC
CP1361 JOHAB
ISO-2022-KR CSISO2022KR
CP856
CP922
CP943
CP1046
CP1124
CP1129
CP1161 IBM-1161 IBM1161 CSIBM1161
CP1162 IBM-1162 IBM1162 CSIBM1162
CP1163 IBM-1163 IBM1163 CSIBM1163
DEC-KANJI
DEC-HANYU
437 CP437 IBM437 CSPC8CODEPAGE437
CP737
CP775 IBM775 CSPC775BALTIC
852 CP852 IBM852 CSPCP852
CP853
855 CP855 IBM855 CSIBM855
857 CP857 IBM857 CSIBM857
CP858
860 CP860 IBM860 CSIBM860
861 CP-IS CP861 IBM861 CSIBM861
863 CP863 IBM863 CSIBM863
CP864 IBM864 CSIBM864
865 CP865 IBM865 CSIBM865
869 CP-GR CP869 IBM869 CSIBM869
CP1125
EUC-JISX0213
SHIFT_JISX0213
ISO-2022-JP-3
BIG5-2003
ISO-IR-230 TDS565
ATARI ATARIST
RISCOS-LATIN1

Example

converts input from the old 8-bit encoding ISO-8859-9 (Latin-5) to Unicode UTF-8

For example, use the application "terminal" :

iconv -f ISO8859-9 -t UTF-8 NameFile

and the inverse:
iconv -f UTF-8 -t ISO8859-9 NameFile

Thursday, January 5, 2017

Open Journal Systems from the Public Knowledge Project


https://en.wikipedia.org/wiki/Open_Journal_Systems

In December 2016, Open Journal Systems (OJS) reached 10,000 published journals and 420,000 articles.
Evolution of the number of journals published with OJS:
https://pkp.sfu.ca/ojs/ojs-usage/ojs-map/

Open Journal Systems (OJS) was designed to facilitate the development of open-access, peer-reviewed publishing, providing the technical infrastructure not only for the online presentation of journal articles but also for the entire editorial workflow, including article submission, peer review and indexing. OJS connects people fulfilling different roles, such as journal manager, editor, reviewer, author, reader, etc. It has a module that supports subscription journals.

The software has a "plugin" architecture, similar to other community-driven projects such as WordPress, which allows new features to be integrated easily without having to modify the core code base. Plugins have made OJS a basis for tools that facilitate indexing in Google Scholar and PubMed Central, a feed plugin providing RSS/Atom web syndication, a COUNTER plugin enabling COUNTER statistics and reporting, and more. OJS is also LOCKSS-compliant
(https://en.wikipedia.org/wiki/LOCKSS),
which helps ensure permanent access to and archiving of journal content.
Under the auspices of Stanford University, the LOCKSS project ("Lots Of Copies Keep Stuff Safe") is a peer-to-peer network that develops and supports an open-source system allowing libraries to collect, preserve and give their readers access to content published on the web.

To improve reader engagement, PKP has developed a series of reading tools that give access to related studies, media reports, government policies, etc. in open-access databases.

Originally released in 2001, OJS is currently at version 3.0. OJS is written in PHP, uses either MariaDB (née MySQL) or PostgreSQL, and can be hosted on a Unix-like or Windows web server.

OJS 3.X
System Requirements
To run OJS 3.x, your web server will need:
  • PHP 5.3.7 or later with MySQL or PostgreSQL support
  • A database server: MySQL 4.1 or later OR PostgreSQL 9.1.5 or later
  • UNIX-like OS recommended (such as Linux, FreeBSD, Solaris, Mac OS X, etc.)

Current Production Release (November 25, 2016)

OJS has been translated into many languages, including French (32 languages in total).

https://github.com/pkp/ojs
https://pkp.sfu.ca/wiki/index.php?title=Developer_Documentation#Getting_Started
The OJS hosting service is offered for a fee by PKP | PS (Publishing Services), as well as by a variety of third-party commercial and non-commercial service providers not affiliated with PKP.
https://pkpservices.sfu.ca/content/plans-pricing