Tuesday, February 14, 2017

PDF; PDF/A ISO; Dublin Core metadata schema in PDF; InDesign; PDF software on macOS Sierra

The Portable Document Format, commonly abbreviated PDF, is a page description language created by Adobe Systems in 1993 as an evolution of the PostScript format.

PDF can also be interactive. It is possible to embed text fields, notes, corrections, drop-down menus, choices, calculations, etc. (PDF forms).
Date     version
2001, PDF 1.4 / Acrobat 5.0
2003, PDF 1.5 / Acrobat 6.0
2005, PDF 1.6 / Acrobat 7.0
2006, PDF 1.7 / Acrobat 8.0
2008, PDF 1.7, Adobe Extension Level 3, Acrobat 9.0
2009, PDF 1.7, Adobe Extension Level 5, Acrobat 9.1

The open format "ISO 32000-1:2008 PDF" was published by the International Organization for Standardization (ISO) on 1 July 2008.
PDF is now an ISO standard, titled
"Document management -- Portable document format -- Part 1: PDF 1.7"

Four subsets of the PDF format have also been standardized by ISO:
  • PDF/A-1 (PDF for Archive, defined by ISO 19005-1),
  • PDF/X (PDF for eXchange),
  • PDF/VT (PDF for Volume Transactional Output),
  • PDF/E-1 (PDF for Engineering).
In addition, another subset, PDF/UA (PDF for Universal Access), was proposed as an ISO standard; it has since been published as ISO 14289-1.

See the very good article (in English)

Metadata and PDF

How do I add the full dublin core metadata set to a document and save it in PDF/A?
According to the PDF Association page "PDF/A Metadata XMP, RDF & Dublin Core", Acrobat supports the 15 basic Dublin Core elements.
XMP is a more recent development. It was introduced with PDF 1.4 (Acrobat 5).
XMP is based on RDF (Resource Description Framework). RDF is a W3C standard for XML-based metadata (for more information, see https://www.w3.org/RDF/). XMP can be linked with XObjects on document pages; XObjects are images and other repeating objects. In addition, XMP can be linked with fonts and ICC profiles.

Dublin Core (dc) Schema

Property | Value Type | Category | Description
dc:contributor | bag ProperName | External | Contributors to the resource (other than the authors).
dc:coverage | Text | External | The extent or scope of the resource.
dc:creator | seq ProperName | External | The authors of the resource (listed in order of precedence, if significant).
dc:date | seq Date | External | Date(s) that something interesting happened to the resource.
dc:description | Lang Alt | External | A textual description of the content of the resource. Multiple values may be present for different languages.
dc:format | MIMEType | Internal | The file format used when saving the resource. Tools and applications should set this property to the save format of the data. It may include appropriate qualifiers.
dc:identifier | Text | External | Unique identifier of the resource.
dc:language | bag Locale | Internal | An unordered array specifying the languages used in the resource.
dc:publisher | bag ProperName | External | Publishers.
dc:relation | bag Text | External | Relationships to other documents.
dc:rights | Lang Alt | External | Informal rights statement, selected by language.
dc:source | Text | External | Unique identifier of the work from which this resource was derived.
dc:subject | bag Text | External | An unordered array of descriptive phrases or keywords that specify the topic of the content of the resource.
dc:title | Lang Alt | External | The title of the document, or the name given to the resource. Typically, it will be a name by which the resource is formally known.
dc:type | bag open Choice | External | A document type; for example, novel, poem, or working paper.
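
To make the value types in the table concrete, here is a minimal sketch in Python (standard library only) that serializes three Dublin Core properties the way XMP expects them: dc:title as a Lang Alt (rdf:Alt with xml:lang), dc:creator as an ordered rdf:Seq, and dc:subject as an unordered rdf:Bag. The helper names `add_array` and `build_dc_xmp` are invented for this illustration, and the output is only the x:xmpmeta element, without the xpacket wrapper a real tool would add around it.

```python
import xml.etree.ElementTree as ET

NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Ask ElementTree to emit the conventional prefixes instead of ns0, ns1, ...
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, local):
    """Build a namespace-qualified tag name such as {uri}title."""
    return "{%s}%s" % (NS[prefix], local)


def add_array(parent, prop, container, values, lang=None):
    """Append dc:<prop> holding an rdf:Bag, rdf:Seq or rdf:Alt of rdf:li items."""
    array = ET.SubElement(ET.SubElement(parent, q("dc", prop)), q("rdf", container))
    for value in values:
        li = ET.SubElement(array, q("rdf", "li"))
        li.text = value
        if lang is not None:
            li.set(XML_LANG, lang)


def build_dc_xmp(title, creators, subjects):
    """Return an x:xmpmeta element (as text) with three Dublin Core properties."""
    xmpmeta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(xmpmeta, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})
    add_array(desc, "title", "Alt", [title], lang="x-default")  # Lang Alt
    add_array(desc, "creator", "Seq", creators)                 # seq ProperName
    add_array(desc, "subject", "Bag", subjects)                 # bag Text
    return ET.tostring(xmpmeta, encoding="unicode")
```

Calling `build_dc_xmp("Report", ["A. Author"], ["pdf", "metadata"])` returns XML text that could be saved as an .xmp file; whether a given application then displays every property is a separate question, as the forum discussion further down shows.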

How to get metadata into PDF?

There is a range of solutions that can be used to automatically or manually add metadata to PDF. This overview shows some of the possibilities:

Info Dictionary

  • Adobe Acrobat Standard/Professional
  • Other 3rd party PDF viewers
  • Acrobat SDK
  • Adobe PDFLibrary
  • hundreds of other programs
  • libraries and tools

XMP

  • Adobe Acrobat Standard/Professional
  • PDF Enhancer from Apago
  • PdfLicenseManager
  • Acrobat SDK
  • Adobe PDFLibrary
  • Adobe XMPToolkit
  • iText
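
Whatever tool writes the metadata, it can be useful to check what actually ended up in the file. The following is a rough sketch, not a PDF parser: it assumes the XMP metadata stream is stored uncompressed (which is commonly the case, and which, as far as I know, PDF/A-1 requires) and simply searches the raw bytes for the xpacket markers. The function name `extract_xmp_packets` is made up for this example.

```python
import re

# An XMP packet is delimited by <?xpacket begin=...?> and <?xpacket end=...?>.
XPACKET = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)


def extract_xmp_packets(pdf_bytes):
    """Return the text of every uncompressed XMP packet found in the bytes."""
    return [
        match.group(1).decode("utf-8", errors="replace").strip()
        for match in XPACKET.finditer(pdf_bytes)
    ]
```

Typical use would be `extract_xmp_packets(open("document.pdf", "rb").read())`, where `document.pdf` is a placeholder file name. An empty list means either that there is no XMP or that the stream is compressed, in which case a real PDF library is needed.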

Related Resources

Adobe Acrobat
XMP eXtensible Metadata Platform
  • PdfLicenseManager
  • PDF Enhancer
  • iText
    iText is a software development toolkit that lets users integrate PDF functionality into their applications, processes, or products. It is a well-documented and versatile PDF engine.

According to the PDF Association page "PDF/A Metadata XMP, RDF & Dublin Core", Acrobat supports the 15 basic Dublin Core elements.
But when we tested the process of appending metadata, only a very limited number of dc properties were available.
I then defined an XMP file with all Dublin Core properties, but when I append it to a document, I cannot assign individual values to the dc properties, because they do not appear in any dialog box.

Metadata is a specialized issue within PDF whose potential value generally goes unappreciated.  As you have already discovered, Acrobat itself just scratches the surface, allowing easy access to just a few of the Dublin Core categories.  There are a number of vendors offering software specifically designed for editing PDF metadata, and perhaps one of their offerings will do what you want.
It is important to note in your case that PDF/A imposes particular limitations on metadata. Back in 2012, Dave Merchant wrote in these forums that "PDF/A only permits a subset of the XMP schema, if the document has anything else in it the standards check will fail." At that time he expressed hope the situation would improve, but I don't know that it has. Obviously, you will want to confirm what categories of metadata the most recent PDF/A permits, and whether the software you contemplate using to certify your PDFs can handle that version. You may find it simpler to fit your customizations within what PDF/A permits.


We recently got an asset manager that allows us to add custom fields to the advanced XMP information. Is there a way to get inDesign to see these custom fields? Specifically for metadata caption uses.
The File Info panel is actually controlled by the XMP specifications. They are common across Adobe products. A search of "XMP" in the communities list turned up this link:
The quick answer is that InDesign XMP SDK kit, for which Steve references the forum, includes a "generic" panel intended to allow non-programmers to make a custom panel showing XMP data in categories not envisaged by InDesign's programmers -- only fitting, given that XMP's first name is "eXtensible."  I'm no programmer, but I've gotten one to work, and I include screen grabs from IDCS4 showing the panel I added to ID's document info for inserting bibliographic information in categories not available in Adobe's stock implementation.  Note that I made this in IDCS4, but it ported easily to IDCS6, which is as late as I go -- I assume the IDCC XMP SDK continues to include a generic panel despite significant changes in how panels operate.


The PDF grab shows that the category both Acrobat and InDesign call "Keywords" corresponds to the category "Subject" under "dc" ("Dublin Core"): "Keyword" does not appear in the actual XMP data.  The categories are defined in schemas, and while you are free to invent your own it is often wiser to use an existing standard: thus the 3rd grab shows I drew on <http://prismstandard.org/namespaces/basic/2.0/> to add fields for holding bibliographic data. 
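
The point that "Keywords" in the user interface is stored as dc:subject can be checked by reading the XMP directly. A small sketch, assuming you already have the XMP as XML text (for example exported from the File Info dialog); the function name `keywords_from_xmp` is invented for this illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def keywords_from_xmp(xmp_xml):
    """Return the items of the dc:subject bag, i.e. what the UI calls Keywords."""
    root = ET.fromstring(xmp_xml)
    return [li.text for li in root.findall(".//dc:subject/rdf:Bag/rdf:li", NS)]
```

If the XMP contains no dc:subject property, the function returns an empty list, even when the application shows a "Keywords" field.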
 Custom XMP panels must be able to access the definitions

Still, I managed to make a custom XMP panel for IDCS4 for adding PRISM publication metadata to the journal articles I work on (using the generic models in the XMP SDK 4.1). The PRISM metadata is fairly esoteric, but I've also been trying to convince the editors I work with that we should be embedding basic metadata, including Keywords, in PDFs to be made available electronically. Thus I was more than a little surprised to read in one of Carl's posts:

> One search engine for journal articles looks as PDF Doc Info
> metadata, most search engines avoid all forms of metadata.

That would seem to fly in the face of WWW folk wisdom (for example, at InDesign Secrets), so I must be missing something.
On PRISM and XMP it may be worth mentioning that the Nature Group and Elsevier now routinely embed bibliographic metadata in PDFs using XMP; also, the citation-management program Mendeley was recently modified to access it.  Meanwhile, CrossRef, registrar of Digital Object Identifiers, has made available a tool (pdfmark) that, given a DOI, can go out and fetch the PRISM bibliographic metadata from CrossRef and insert it as XMP into a PDF using the very categories Carl lists (plus prism:copyright and prism:doi).  Putting accurate citation metadata inside the PDF seems a reasonable response to Google Scholar’s well-known difficulties in getting these details right.

On the larger issue, it is depressing to learn how the abuse of Keywords caused Google to ignore them in most contexts.  Still, I'll probably urge the editors I work with to provide them -- after all, some journals printed them on the page back when print was all there was.  They might still prove useful in walled-gardens today, and they may become effective again for large-scale search engines (and they're a lot easier to add now instead of 10 years hence). 
https://forums.adobe.com/thread/637815 (@2010)

PDF/A only permits a subset of the XMP schema, if the document has anything else in it the standards check will fail. It's not Acrobat's fault, it's the way the ISO standard was written.
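
Before running a full PDF/A check, one can get a first impression of which schemas a document's XMP uses. The sketch below lists the namespaces of all properties found in rdf:Description elements and reports those missing from an allow-list. The allow-list here is purely illustrative and an assumption of this example, not the normative list from the PDF/A standard; the function name `unexpected_namespaces` is likewise invented.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Illustrative allow-list only (an assumption, NOT the normative PDF/A list).
ALLOWED = {
    "http://purl.org/dc/elements/1.1/",   # Dublin Core
    "http://ns.adobe.com/xap/1.0/",       # XMP basic
    "http://ns.adobe.com/pdf/1.3/",       # Adobe PDF
    "http://www.aiim.org/pdfa/ns/id/",    # PDF/A identification
}


def namespace_of(qualified_name):
    """Return the URI part of an ElementTree name like {uri}local, or None."""
    if isinstance(qualified_name, str) and qualified_name.startswith("{"):
        return qualified_name[1:].split("}")[0]
    return None


def unexpected_namespaces(xmp_xml, allowed=ALLOWED):
    """List property namespaces in the XMP that are not in the allow-list."""
    root = ET.fromstring(xmp_xml)
    found = set()
    for desc in root.iter("{%s}Description" % RDF):
        # Properties can be written as child elements or as attributes.
        names = [child.tag for child in desc] + list(desc.attrib)
        for name in names:
            uri = namespace_of(name)
            if uri is not None:
                found.add(uri)
    found.discard(RDF)  # rdf:about and friends are not properties
    return sorted(found - set(allowed))
```

A non-empty result does not prove that a PDF/A check will fail, and an empty one does not prove it will pass; it only shows where to look first.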


Metadata, XMP, InDesign and other Adobe apps

On a Mac, the Library folder contains two folders under Adobe/XMP:

1/ Metadata Templates
/Users/YourName/Library/Application Support/Adobe/XMP/Metadata Templates/
This is where your saved metadata presets are stored (created with the "Save" button).

2/ Custom File Info Panels

Be careful to select this one:
/Users/YourName/Library/Application Support/Adobe/XMP/
(the other one exists but is empty)

XMP FileInfo SDK

XMP Metadata UI SDK
Adobe Creative Cloud applications can be modified to display custom metadata UI, either to adapt the existing UI to your own workflow or to interact with custom metadata. The XMP Metadata UI SDK provides documentation and samples on how to create such custom metadata UI. Starting with the 2014 release, the extensibility mechanism has changed.

Current SDK:
This version of the SDK can be used with Creative Cloud 2014 applications or later. For the 2014 release, Photoshop CC, Illustrator CC and InDesign CC support this SDK. It offers a simple and easy to use mechanism to extend the metadata UI (also known as FileInfo dialog) without the need to write or compile code.
XMP Metadata UI SDK CC 2014
Customers who need more flexibility and functionality for custom metadata UI should use the Adobe Extension SDK, which lets you write HTML5-based panels for Creative Cloud applications. The following link is an example of an extension panel that can interact with XMP and is meant as a starting point for developers.

Adobe Extensions SDK XMP sample panel
Previous SDKs: 
These versions of the SDK can be used with Creative Suite and Creative Cloud applications older than the 2014 version. 
For CC applications (before 2014) all custom FileInfo panels should always be installed into the user-specific location (as mentioned in the documentation). The shared location isn't used anymore. 

XMP FileInfo SDK CS6 and CC
XMP FileInfo SDK 5.1 (for CS5.x)
XMP FileInfo SDK 4.4.2 (for CS4)
XMP Custom File Info for pre-CS4 products (ZIP, 240K)

The File Info dialog is a standard dialog that many Adobe applications use to provide access to file metadata information. Some Adobe applications, however, including Adobe Bridge and audio-video applications such as Adobe Premiere® Pro, do not use the File Info dialog to display metadata. Instead, they have their own metadata palettes that rely on an XML representation of XMP properties.

The Generic Panel offers only limited support for properties defined in XML; it can display only simple XMP properties and comma-separated array lists, with limited layout capabilities. For more complex workflows, it is recommended that you create a custom panel using Flex components.

This SDK is an independent companion to the general XMP Toolkit SDK (also available at the same location) which provides an API for working with XMP metadata programmatically. The XMP FileInfo SDK is used to extend CS6 applications, whereas the XMP Toolkit SDK is used to build XMP support into other products and applications. The API is not required for building custom panels.

The File Info dialog is based on Adobe Flex™ technology. To extend it, you can use Adobe Flash Builder, a sophisticated development environment which is available as a standalone application or as an Eclipse plug-in.


I have built a Custom XMP InfoPanel that shows up in CS6 Bridge just fine! But for ease of data entry we want it to show up in the Bridge MetaData panel. How is this done with the latest best practices?

read the chapter "3 The Generic Panel" on p. 38 in the XMP FileInfo SDK Programmer's guide, which comes with the XMP FileInfo SDK.

Create your XML file, for example:
<?xml version='1.0' encoding='UTF-8'?>
<xmp_schema prefix='mad' namespace='http://ns.deank.com/johnmadison/1.0/' label='Label=JohnMadison'>
  <xmp_property name='status' category='external' label='status=Status' type='text'/>
  <xmp_property name='notes' category='external' label='notes=Notes' type='text'/>
</xmp_schema>

Save it at the path "/Library/Application Support/Adobe/XMP/Custom File Info Panels/4.0/custom/JohnMadison.xml"
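
If you maintain several such definition files, generating them from a list of fields avoids typos. This is a hedged sketch that reproduces only the elements and attributes visible in the snippet above (xmp_schema, xmp_property, prefix, namespace, label, name, category, type); the SDK may expect more than that, so check the result against the Programmer's Guide. The function name `build_panel_schema` is invented here.

```python
import xml.etree.ElementTree as ET


def build_panel_schema(prefix, namespace, schema_label, properties):
    """Return generic-panel XML text for a list of (name, label) text fields."""
    schema = ET.Element("xmp_schema", {
        "prefix": prefix,
        "namespace": namespace,
        "label": schema_label,
    })
    for name, label in properties:
        ET.SubElement(schema, "xmp_property", {
            "name": name,
            "category": "external",
            # Same "key=Display text" label convention as in the snippet above.
            "label": "%s=%s" % (name, label),
            "type": "text",
        })
    body = ET.tostring(schema, encoding="unicode")
    return "<?xml version='1.0' encoding='UTF-8'?>\n" + body
```

For the example above, the call would be `build_panel_schema("mad", "http://ns.deank.com/johnmadison/1.0/", "Label=JohnMadison", [("status", "Status"), ("notes", "Notes")])`, with the returned text written to the path given above.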

Document structuring, cross-references

Table of contents

Example: automating cross-references

I built a document of more than 200 pages in InDesign. Certain keywords or key phrases in the document must point to appendices or articles within the same document. For example, "fresh tomatoes" links via a hyperlink to an article on preparing fresh tomatoes at the end of the document. The problem is that every instance of "fresh tomatoes" in the document must point to that same article. Creating the hyperlinks one by one is extremely tedious because there are so many of them...
So my question is:
Is there a way to automate creating hyperlinks on a group of words throughout an entire document?


PDF viewer/editor


Preview (Aperçu in French)

The Preview app makes it easy to view and edit PDF files and common image files (JPEG, TIFF, and PNG, among others).
Included with Mac OS X.

Acrobat Pro DC

Plug-ins


Converts untagged PDF files into tagged Adobe PDF files. Tagged PDF is compatible with screen readers and is required by them for accessibility. In addition, tagged Adobe PDF files can be reflowed with the Reflow plug-in and saved in various output formats (XML, HTML, and RTF, for example) for reuse. For more details on creating accessible PDF files, see http://www.adobe.com/fr/accessibility.


Catalog lets you create a full-text index of your Adobe PDF documents or document sets. Once the index is built, the command for searching across multiple documents lets you go through the whole set quickly.


Integrated into the Adobe Acrobat DC dialog "Create PDF -> From File", this plug-in opens PostScript files as Adobe PDF.

It also opens Microsoft Office files and other documents using PDFMaker to create Adobe PDF documents.

You can drag these files and drop them onto the Adobe Acrobat DC application icon or window.


Filters for saving a tagged Adobe PDF document as XML or plain text. The structure that the authoring program embeds in a tagged Adobe PDF file determines the content generated by the SaveAsXML plug-in.



Still later guests might be interested in CrossRef’s “experimental” Java tool, PDFMark, for putting PRISM bibliographic metadata into a PDF.
PDFMark goes out to the web and reads the metadata in from the DOI (at CrossRef). CrossRef released PDFMark in December 2009.
Then in February, 2010, the bibliography program Mendeley released a new version that added the ability to extract bibliographical metadata from PDFs, noting that both Nature and Elsevier now routinely insert this (PRISM) information in the journals they publish in PDF format.
Those wishing to add such metadata further back in the production process might want to look into adding a custom metadata panel in InDesign by modifying the “Generic” panel that comes with XMP FileInfo SDK. Note that there are two versions, for CS4 vs. earlier. I gather Acrobat 9 falls in the category pre-CS4 (and of course Acrobat 9 is the version announced to be included in CS5).
David W. Goodrich


“pdfmark” is an experimental open source tool that allows you to add Crossref metadata to a PDF. You can add metadata to a PDF by passing the tool a pre-generated XMP file, or you can apply Crossref bibliographic metadata by passing the command a Crossref DOI as an argument. If you pass it a Crossref DOI, it will automatically lookup the metadata for that DOI using the Crossref OpenURL API, generate XMP from said metadata and apply it to the PDF.

Note that pdfmark is non-destructive. It will always generate a new PDF with the XMP added to it. Having said this, pdfmark does not re-linearize the resulting file. To re-linearize the PDF you can simply use Ghostscript’s pdfopt command or any similar tool (e.g. Acrobat Pro).

“pdfmark” is open source.
We have released it in order to encourage publishers and other content producers to start adding embedded bibliographic metadata to their PDFs.

We are assuming that you are at least technical enough to know whether you have a recent version of Java installed on your system and that you are comfortable with the command line.

If you had a PDF of Allen Renear and Carole Palmer’s Science article, “Strategic Reading, Ontologies, and the Future of Scientific Publishing” and said PDF file was named “renear_palmer.pdf”, simply invoking the following would add the relevant metadata to the PDF:

java -jar pdfmark.jar -d 10.1126/science.1157784 renear_palmer.pdf
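
For more than a handful of files, the same invocation can be scripted. The sketch below only wraps the command line shown above (the `-d` option followed by a DOI and the PDF file name); the function names and the default jar location are assumptions made for this example, and actually running it requires Java and pdfmark.jar on your machine.

```python
import subprocess


def pdfmark_command(doi, pdf_path, jar="pdfmark.jar"):
    """Build the argument list for: java -jar pdfmark.jar -d <doi> <pdf>."""
    return ["java", "-jar", jar, "-d", doi, pdf_path]


def run_pdfmark(doi, pdf_path, jar="pdfmark.jar"):
    """Run pdfmark for one file; raises CalledProcessError if it fails."""
    return subprocess.run(pdfmark_command(doi, pdf_path, jar), check=True)
```

A batch run is then a loop over (DOI, file) pairs, for example `for doi, path in jobs: run_pdfmark(doi, path)`, where `jobs` is a list you supply.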



Latest commit on 24 Mar 2014

pdfExtract and Cermine

Since the retirement of the project pdfExtract, we recommend that you use the excellent Cermine instead.

Pdf-extract is an open source set of tools and libraries for identifying and extracting semantically significant regions of a scholarly journal article (or conference proceeding) PDF.

CERMINE is a Java library and a web service (cermine.ceon.pl) for extracting metadata and content from PDF files containing academic publications. CERMINE is written in Java at Centre for Open Science at Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw.

How to cite CERMINE:

Dominika Tkaczyk, Pawel Szostek, Mateusz Fedoryszak, Piotr Jan Dendek and Lukasz Bolikowski.
CERMINE: automatic extraction of structured metadata from scientific literature.
In International Journal on Document Analysis and Recognition (IJDAR), 2015,
vol. 18, no. 4, pp. 317-335, doi: 10.1007/s10032-015-0249-8.
DOI of CERMINE release 1.8:

There are three ways of using CERMINE, depending on the user's needs:

  • standalone application -- use this if you need to process larger amounts of data locally on your laptop or server
  • Maven dependency -- lets you use CERMINE's API in your own Java/Scala code
  • web application -- for demonstration purposes and only small amounts of data (fewer than 50 files)
