MARCXML, Enhanced MARCXML, JSON. What is happening here?
INSPIRE exposes an API for querying most aspects of its holdings and returns responses in MARCXML, Enhanced MARCXML, or JSON.
For bulk download, INSPIRE also takes snapshots in JSON and Enhanced MARCXML formats.
It is worth mentioning that the INSPIRE API can return the data of specific MARC fields only, and that the result can be downloaded directly in JSON or MARCXML.
For example, the request http://inspirehep.net/record/1757461?of=xm&ot=100,200
returns only the MARCXML fields with tags 100 and 200.
The JSON API operates similarly, with named fields instead of MARC tags. (So it is possible that MARC tags and JSON fields are not always mapped correctly.) Since the field names are evolving, a comprehensive list is currently best found in the source: https://github.com/inspirehep/invenio/blob/prod/modules/bibfield/etc/atlantis.cfg
MARCXML is the native format used to store metadata in INSPIRE. All bibliographic metadata that can be hand-curated is stored in MARCXML. Links from authors in papers to the corresponding authors in HepNames, author disambiguation, and links between references are available in the Enhanced MARCXML format, which is based on the original MARCXML but adds subfields that express relations across records.
For more information on the additional relations see the detailed MARCXML description in: records markup.
To get an Enhanced MARCXML record via the API, replace of=xm with of=xme. So the request http://inspirehep.net/record/1757461?of=xme
returns the full Enhanced MARCXML record.
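The requests shown so far differ only in their query parameters, so they can be put together with a small helper. This is a minimal sketch, not an official client: the function name `record_url` is made up for illustration, and reading `of` as the output format and `ot` as the list of requested tags is my interpretation of the example URLs above. It only builds the URL and does not contact the server.

```python
from urllib.parse import urlencode

# Host and path taken from the example URLs in this text.
BASE_URL = "http://inspirehep.net/record"


def record_url(recid, output_format, tags=None):
    """Build a record URL with `of` (format) and an optional `ot` (tags) parameter."""
    params = {"of": output_format}
    if tags:
        # Several tags go into one comma-separated value, e.g. ot=100,200
        params["ot"] = ",".join(str(tag) for tag in tags)
    # safe="," keeps the comma literal instead of encoding it as %2C
    return f"{BASE_URL}/{recid}?{urlencode(params, safe=',')}"


print(record_url(1757461, "xm", tags=[100, 200]))
print(record_url(1757461, "xme"))
```

The first call reproduces the tag-restricted MARCXML request and the second the Enhanced MARCXML request for the same record.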
If we compare the full of=xm and of=xme records, we see additional information under tag 100:
XM
XME
But this does not work together with the ot parameter. The request http://inspirehep.net/record/1757461?of=xme&ot=100
returns the same data as of=xm.
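One way to check this kind of difference without reading the XML by eye is to collect the subfield codes under a tag in both responses and subtract the sets. The sketch below uses only the standard library. The two sample records are invented for illustration (including the subfield code "x"), so they say nothing about which subfields INSPIRE really adds; the helper names are hypothetical as well.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a '{namespace}' prefix so namespaced and plain MARCXML both work."""
    return tag.rsplit("}", 1)[-1]


def subfield_codes(marcxml, tag):
    """Return the set of subfield codes found in all datafields with the given tag."""
    codes = set()
    for elem in ET.fromstring(marcxml).iter():
        if local_name(elem.tag) == "datafield" and elem.get("tag") == tag:
            for child in elem:
                if local_name(child.tag) == "subfield":
                    codes.add(child.get("code"))
    return codes


# Invented sample data standing in for the of=xm and of=xme responses.
XM_SAMPLE = """<record>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">Doe, J.</subfield>
  </datafield>
</record>"""

XME_SAMPLE = """<record>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">Doe, J.</subfield>
    <subfield code="x">12345</subfield>
  </datafield>
</record>"""

extra = subfield_codes(XME_SAMPLE, "100") - subfield_codes(XM_SAMPLE, "100")
print(sorted(extra))  # ['x']
```

Run against real responses, an empty result for the of=xme&ot=100 request would confirm that the enhanced subfields are dropped when ot is used.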
To summarize:
- There is an API, and data is available in JSON, MARCXML and Enhanced MARCXML.
- There are dumps for JSON and Enhanced MARCXML.
- The JSON API returns more data than the JSON dump contains.
- Enhanced MARCXML has additional custom fields that might coincide with this mapping.
- The mapping between JSON and MARC is described here.
Note also that we are dealing not with plain MARC but with MARCXML customized into Enhanced MARCXML, with (possibly correct) mappings. There are many MARC/MARCXML-to-JSON converters in Python, Go, Scala, Perl, and other languages. All of them simply read MARC/MARCXML and link MARC codes to some JSON fields, so I suppose there is no universal converter that covers all custom mappings.
- Python converter: pymarc
- Go converters: marc21 and marctools
- Scala converter: scala-marc
- Perl converter: Catmandu
To me, the variety of these tools across languages suggests that they all simply read XML and convert XML tags (MARC codes) to JSON fields according to some mapping.
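To make that idea concrete, here is a minimal sketch of such a mapping-driven conversion using only the Python standard library. It is not how pymarc or any of the tools listed above work internally. The mapping table, the JSON field names, and the sample record are all invented for illustration, and the function also reports subfields that the mapping does not know about.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical mapping from (MARC tag, subfield code) to a JSON field name.
MAPPING = {
    ("100", "a"): "first_author",
    ("245", "a"): "title",
}

# Invented sample record; subfield "x" stands in for a custom, unmapped subfield.
SAMPLE = """<record>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">Doe, J.</subfield>
    <subfield code="x">12345</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">An example title</subfield>
  </datafield>
</record>"""


def local_name(tag):
    """Drop a '{namespace}' prefix so namespaced and plain MARCXML both work."""
    return tag.rsplit("}", 1)[-1]


def marcxml_to_json(marcxml, mapping):
    """Convert mapped subfields to JSON fields; collect the (tag, code) pairs that are not mapped."""
    record, unmapped = {}, []
    for field in ET.fromstring(marcxml).iter():
        if local_name(field.tag) != "datafield":
            continue
        for sub in field:
            if local_name(sub.tag) != "subfield":
                continue
            key = (field.get("tag"), sub.get("code"))
            if key in mapping:
                # A field may repeat, so every JSON value is a list.
                record.setdefault(mapping[key], []).append(sub.text)
            else:
                unmapped.append(key)
    return record, unmapped


record, unmapped = marcxml_to_json(SAMPLE, MAPPING)
print(json.dumps(record, indent=2))
print("not covered by the mapping:", unmapped)
```

The second return value is the part that matters for Enhanced MARCXML: any custom subfield missing from the mapping is silently lost unless the converter reports it.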
So what do we want to do, and what do we need to do?