fiduswriter / biblatex-csl-converter

A set of JavaScript converters: bib(la)tex => json, json => csl, and json => biblatex

License: GNU Lesser General Public License v3.0

JavaScript 5.77% HTML 0.52% TypeScript 93.71%
bibtex biblatex csl converter citations

biblatex-csl-converter's Introduction

biblatex-csl-converter

A set of JavaScript converters: biblatex => json, json => csl, and json => biblatex

Usage:

import {BibLatexParser} from "biblatex-csl-converter"

// synchronous:
let parser = new BibLatexParser(input, {processUnexpected: true, processUnknown: true})
let bib = parser.parse()

// asynchronous:
let parser = new BibLatexParser(input, {processUnexpected: true, processUnknown: true})
parser.parseAsync().then((bib) => { ... })

Try demo here.

FAQ

Q: Why do you use a different json as internal format and not just the json format of CSL? Wouldn't that save you one conversion step?

A: Unfortunately, the CSL json cannot hold all the information we import from biblatex, so if we used the json of CSL internally, we would lose information that we may want to export in biblatex later on.

Q: Do you import all information from the imported bibtex/biblatex files?

A: We only keep the information found in any of the required or optional fields defined in the BibLatex documentation. Other fields are removed upon import.

Q: How do I see if there are errors when parsing the BiBTeX/BibLatex file?

A: There is an array of errors that were encountered while parsing the file that can be found at parser.errors after you get the parser.output. There is also parser.warnings for less serious issues.
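
A minimal sketch of how the two arrays might be inspected once parsing is done. The parser object below is a hand-written stand-in that only carries the errors and warnings arrays described above; the keys type, entry and field_name mirror an error object quoted in one of the issues further down. reportProblems is a hypothetical helper and not part of the library.

```javascript
// Hypothetical helper: formats parser.errors and parser.warnings into
// printable lines. It only assumes that each item has a `type` key.
function reportProblems(parser) {
  const format = (level) => (item) => {
    const where = [item.entry, item.field_name].filter(Boolean).join(".")
    return `${level}: ${item.type}${where ? ` (${where})` : ""}`
  }
  return [
    ...(parser.errors || []).map(format("error")),
    ...(parser.warnings || []).map(format("warning")),
  ]
}

// Stand-in for a parser that has already produced its output.
// The warning type is made up for illustration.
const parserStub = {
  errors: [{ type: "unknown_date", entry: "bentley_academic_2011", field_name: "date" }],
  warnings: [{ type: "unknown_type" }],
}

console.log(reportProblems(parserStub).join("\n"))
```

Keeping errors and warnings in one list with a level prefix makes it easy to print everything, or to filter on the prefix when only errors matter.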

Q: I need access to the raw/non-processed contents of certain fields. What do I do?

A: The fields in their almost raw form can be found under entry.raw_fields[FIELD_NAME].

Q: What if I need to process fields that don't follow the biblatex definition?

A: You can initialize the parser with a config object like this: new BibLatexParser(inputString, {processUnexpected: true, processUnknown: {collaborator: 'l_name'}}). The processUnexpected setting will enable parsing of fields that are known, but shouldn't be in the bibliography entry due to its type. The processUnknown will allow parsing of fields that are entirely unknown. You can either set it to true, or you can set it to an object containing descriptions for the field types these unknown fields should be processed as. If a field is not specified, it will be processed as a literal field (f_literal). These fields will be available under entry.unexpected_fields[FIELD_NAME] and entry.unknown_fields[FIELD_NAME] respectively.
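
The lookup rule for unknown fields described above can be sketched as follows. unknownFieldType is a hypothetical function written only to illustrate the rule (configured type if present, otherwise f_literal); the parser's actual internals may differ.

```javascript
// Hypothetical illustration of the rule described above, not library code.
function unknownFieldType(fieldName, processUnknown) {
  if (!processUnknown) return null // unknown fields are not processed at all
  if (processUnknown === true) return "f_literal" // no per-field descriptions given
  return processUnknown[fieldName] || "f_literal" // fall back to a literal field
}

const config = { processUnexpected: true, processUnknown: { collaborator: "l_name" } }

console.log(unknownFieldType("collaborator", config.processUnknown))
console.log(unknownFieldType("adsurl", config.processUnknown))
```

With this config, collaborator would be treated as a name list, while any other unknown field such as adsurl would fall back to a literal.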

Q: I use variables in my biblatex files. Will your converter read them?

A: Yes, but in order for the converter to be able to create a string, the variables need to be defined. Undefined variables can also be handled by the biblatex importer/exporter, but when exporting to CSL, they just print out the variable name in an HTML tag that is not supported by citeproc (and an error is thrown).

Q: How do I run the demo locally?

A: http-server is handy. Do a global install of http-server with npm install http-server -g and run http-server docs.

Q: I want to include this on my website and I don't use npm packages, etc. Is there a file I can just add to the header of my webpage?

A: Yes, you can download such a file here.

Upgrading

  • From 1.x to 2.x: Note that the API for the asynchronous parser has changed.

You need to change instances of this:

let parser = new BibLatexParser(input, {processUnexpected: true, processUnknown: true, async: true})
parser.parse().then((bib) => { ... })

to

let parser = new BibLatexParser(input, {processUnexpected: true, processUnknown: true})
parser.parseAsync().then((bib) => { ... })

biblatex-csl-converter's People

Contributors

dependabot[bot], derdrodt, ggrossetie, johanneswilm, milahu, rcpeters, retorquere

biblatex-csl-converter's Issues

Immediately consecutive runs with the parser yield differing output

I have no idea what's going on. If I uncomment this line, a run of the tests will replace the "expected" output (which will then be read in immediately, and the test will predictably pass). But if you then re-comment that line and immediately run the test suite again, some of the tests fail. This means the parser result isn't deterministic, but damned if I know how this can be.

percentage sign in bibfields

I guess that in LaTeX the percentage sign traditionally means that only comments follow on that line. The parser also assumes that, which is why it's having problems with

SLACcitation   = "%%CITATION = ARXIV:1403.3399;%%"

Should this be accepted, @retorquere ?
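
One possible way to accept such values, sketched under the assumption that a % should only start a comment when it sits outside both {...} groups and "..." strings, and that delimiters do not span lines. stripLineComment is a hypothetical helper, not the parser's current behaviour.

```javascript
// Sketch: cut a line at the first unescaped % found outside braces and quotes.
function stripLineComment(line) {
  let depth = 0 // current {...} nesting level
  let inQuotes = false // inside a "..." value at brace depth 0
  for (let i = 0; i < line.length; i++) {
    const ch = line[i]
    if (ch === "\\") {
      i++ // skip the escaped character, e.g. \% or \{
      continue
    }
    if (ch === "{") depth++
    else if (ch === "}") depth = Math.max(0, depth - 1)
    else if (ch === '"' && depth === 0) inQuotes = !inQuotes
    else if (ch === "%" && depth === 0 && !inQuotes) return line.slice(0, i)
  }
  return line
}

console.log(stripLineComment('SLACcitation   = "%%CITATION = ARXIV:1403.3399;%%"'))
console.log(stripLineComment("year = {2001}, % a comment"))
```

The first line is returned unchanged because every % is inside the quoted value; the second loses only the trailing comment.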

token_mismatch error

On this entry:

@article{Dassler2001,
  author = {D{\"a}\ss{}ler, Klaus},
  doi = {10.1175/1520-0493(1987)115<1606:GARSPP>2.0.CO;2},
  journal = {Title, with comma},
  keywords = {Vulcanian eruptions,breadcrust,plinian},
  timestamp = {2015-02-24 12:14:36 +0100},
  title = {The Physical: Violent Volcanology of the 1600 Eruption of {{Huaynaputina}}, Southern {{Peru}}},
  volume = {62},
  year = {2001}
}

\vphantom\{ should be ignored

In fact \vphantom<latex construct> should be ignored, but for BBT purposes, \vphantom\{ will do. BBT generates \vphantom\{ to compensate for a bug in an older biber version which would break on title = {Aesthetic {{Judgment}}\}}, but not on title = {Aesthetic {{Judgment}}\vphantom\{\}} (which render to the same).
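
A minimal sketch of the narrow case, assuming it is enough to drop the literal sequence \vphantom\{ from a field value before further processing. stripVphantomBrace is a hypothetical helper; it does not handle \vphantom followed by other LaTeX constructs.

```javascript
// Sketch: remove every literal "\vphantom\{" and leave the rest untouched.
function stripVphantomBrace(value) {
  return value.replace(/\\vphantom\\\{/g, "")
}

// String.raw keeps the backslashes exactly as they appear in the .bib file.
const bbtTitle = String.raw`Aesthetic {{Judgment}}\vphantom\{\}`

console.log(stripVphantomBrace(bbtTitle))
```

After the call, the value is the same as the title without the workaround, so both forms go through the rest of the parser identically.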

distinguish between warnings and errors

Errors should only be things that mean that a field or entry is ignored entirely.
Warnings should merely notify when there is something unexpected (for example an unknown bib entry type is imported, so it is saved as a misc).

How about warnings/errors when unknown/unexpected fields are encountered? What if the user specifies he wants these fields? Should there still be an error? A warning? Or maybe a third level (info notification)?

@retorquere You spoke earlier about warnings and errors. Is this the distinction you were thinking about?

Non-ascii citekeys

The parser currently flags citation keys like VlčekVácha2014HistoryLatin as errors. I think this must actually be supported by biblatex, because I've had users request generation of keys like this.

Parsing JabRef group structure

JabRef has a fairly... special format for storing hierarchical structure that I used to parse in the BBT parser. The format isn't especially difficult (but also not especially pretty). Where would be the best entry point to start parsing it? I need that info to replicate (where possible) the hierarchy in Zotero during import.

Error parsing reference

I get this error:

lib/import/literal-parser.js:19
        this.string = string;
                      ^

ReferenceError: string is not defined

when parsing

@Article{Abu-Zeid_1986,
Title = {Determination of the Thickness and Refractive Index of {Cu$_2$O} Thin Film Using Thermal and Optical
Interferometry},
Author = {Abu-Zeid, M. E. and Rakhshani, A. E. and Al-Jassar, A. A. and Youssef, Y. A.},
Journal = {Physica Status Solidi (a)},
Year = {1986},
Pages = {613--620},
Volume = {93},
Doi = {10.1002/pssa.2210930226}
}

Unknown date?

I'm getting this error: { type: 'unknown_date', entry: 'bentley_academic_2011', field_name: 'date' } when I parse the reference below. I think the field_name should probably be year rather than date, and the format in this field is sometimes used as the origdate. Again, not saying it's nice, but this is what people give me, and keeping in mind Postel's law...

EDIT: same goes for another entry that has year = {1875 [2004]}

@article{bentley_academic_2011,
  abstract = {State planning has been a defining means for modern subjects to regulate the passage of time. In practice, it is the focus of multiple conflicts and doubts, which planners attempt to mediate. In this paper, I address the regimes of time that planning both promotes and encounters, and tease out what these imply for anthropology. Using ethnography of Norwegian and Swedish planning offices and their encounters with participatory planning, I question recent claims that there has been an evacuation of the near future or a retreat of administrative intervention. I also suggest that recent anthropological concerns with time have been confined by their attempts to characterize the changing timescapes of specific modal shifts, such as from the modern to the neoliberal. Instead, in my ethnography, I focus not on tracking epochal breaks in time, but on demonstrating how time is manipulated, and how multiple temporalities are performed in ongoing projects of democratic planning.},
  annotation = {This is the child note},
  author = {Abram, Simone},
  doi = {10.1111/1467-9655.12097},
  issn = {1467-9655},
  journaltitle = {Journal of the Royal Anthropological Institute},
  keywords = {måske},
  langid = {english},
  pages = {129--147},
  rights = {© Royal Anthropological Institute 2014},
  shortjournal = {J R Anthropol Inst},
  shorttitle = {6~{{The}} Time It Takes},
  timestamp = {2015-02-24 12:14:36 +0100},
  title = {6~{{The}} Time It Takes: Temporalities of Planning},
  url = {http://onlinelibrary.wiley.com.ep.fjernadgang.kb.dk/doi/10.1111/1467-9655.12097/abstract},
  urldate = {2014-12-27},
  volume = {20},
  xref = {welker_andreas_1999},
  year = {[2014]}
}
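
A lenient reading of such year values could look like the sketch below. It only separates the plain year from the bracketed one; which of the two should count as the original date is an open question in this issue, so the hypothetical splitBracketedYear leaves that decision to the caller.

```javascript
// Sketch: split values like "[2014]" or "1875 [2004]" into their two parts.
// Returns null when the value matches neither form.
function splitBracketedYear(value) {
  const match = /^\s*(\d{4})?\s*(?:\[(\d{4})\])?\s*$/.exec(value)
  if (!match || (!match[1] && !match[2])) return null
  return {
    plain: match[1] ? Number(match[1]) : null,
    bracketed: match[2] ? Number(match[2]) : null,
  }
}

console.log(splitBracketedYear("[2014]"))
console.log(splitBracketedYear("1875 [2004]"))
```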

Name particles

Right now,

author = {Aubignac, abbé d', François Hédelin},

parses to

[{"lastName":"Aubignac abbé d'","firstName":"François Hédelin"}]

I didn't do much better with my own parser, but do keep in mind that CSL will expect to see abbé d reported as a dropping or non-dropping particle rather than part of a name part. The BBT name generator, which generates Bib(La)TeX names from how they're stored in Zotero (which is why you see lastName rather than family in my name reports, sorry), is pretty involved and uses the citeproc name parser for part of the work.

Name parsing leaves leading space in name

author = {abbé d' Aubignac, François Hédelin}

parses to

[{"lastName":"abbé d' Aubignac","firstName":" François Hédelin"}]

rather than

[{"lastName":"abbé d' Aubignac","firstName":"François Hédelin"}]
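
Until the parser trims name parts itself, a caller could post-process them along these lines. trimNameParts is a hypothetical helper that works on the plain objects shown above (lastName/firstName here; family/given would work the same way).

```javascript
// Sketch: return a copy of a name object with every string part trimmed.
function trimNameParts(name) {
  return Object.fromEntries(
    Object.entries(name).map(([key, value]) => [
      key,
      typeof value === "string" ? value.trim() : value,
    ])
  )
}

const parsedName = { lastName: "abbé d' Aubignac", firstName: " François Hédelin" }

console.log(JSON.stringify([trimNameParts(parsedName)]))
```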

Date ranges?

I have an entry that has Year = {1982--1983}. What to do with these?
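
One option would be to map such a value to a CSL date range, where date-parts holds a start and an end entry. The sketch below assumes four-digit years separated by one or two hyphens; yearRangeToDateParts is a hypothetical helper, not existing converter code.

```javascript
// Sketch: turn "1982--1983" (or a single year) into CSL-style date-parts.
// Returns null for anything else.
function yearRangeToDateParts(value) {
  const range = /^\s*(\d{4})\s*-{1,2}\s*(\d{4})\s*$/.exec(value)
  if (range) {
    return { "date-parts": [[Number(range[1])], [Number(range[2])]] }
  }
  const single = /^\s*(\d{4})\s*$/.exec(value)
  return single ? { "date-parts": [[Number(single[1])]] } : null
}

console.log(JSON.stringify(yearRangeToDateParts("1982--1983")))
```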

\enquote is not parsed

@article{1e, author = "01e",  title = {Blah  \emph{emph} and \enquote{enquote}.},}

parses to

title: 'Blah <i>emph</i> and qnquote<span class="nocase">enquote</span>.'

The parser doesn't resolve LaTeX commands

If I parse this:

@InCollection{Madelung_1998_LB_10681727_56,
Title = {Cuprous oxide ({Cu$_2$O}) crystal structure, lattice parameters},
Author = {Madelung, O. and others},
Booktitle = {{L}andolt-{B}\"ornstein},
Publisher = {Springer-Verlag},
Year = {1998},
Editor = {Madelung, O. and R\"ossler, U. and Schulz, M.},
Series = {SpringerMaterials - The Landolt-B\"ornstein Database},
Volume = {III/41c},
Doi = {10.1007/10681727_56},
File = {Madelung_1998_LB_10681727_56.pdf:CopperOxides\Madelung_1998_LB_10681727_56.pdf:PDF},
Owner = {Francesco},
Timestamp = {2010.02.22}
}

and put it through the CSL Exporter, I get this:

{ title: 'Cuprous oxide (<span class="nocase">Cu_2O</span>) crystal structure, lattice parameters',
     author: [ { family: 'Madelung', given: 'O.' }, { literal: 'others' } ],
     'container-title': '<span class="nocase">L</span>andolt-<span class="nocase">B</span>rornstein',
     publisher: [ 'Springer-Verlag' ],
     editor:
      [ { family: 'Madelung', given: 'O.' },
        { family: 'R\\"ossler', given: 'U.' },
        { family: 'Schulz', given: 'M.' } ],
     'collection-title': 'SpringerMaterials - The Landolt-B\\"ornstein Database',
     volume: 'III/41c',
     DOI: '10.1007/10681727_56',
     issued: { 'date-parts': [ 1998 ] },
     type: 'entry',
     id: '0' }

(note the \ commands, and Cu$_2$O should have become Cu2O)

information request: pages field

When parsing the pages field I get this structure:

[[[{"type":"text","text":"300"}],[{"type":"text","text":"301"}]]]

does that mean "a pages field can occur multiple times, and can be a range"? I didn't know the pages field could occur multiple times.
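
For reading the structure, the sketch below assumes one plausible interpretation: the outer array is a list of ranges, each range holds a start and optionally an end, and each endpoint is an array of text nodes. Under that assumption the example is a single range rather than a repeated field. pagesToString is a hypothetical helper.

```javascript
// Sketch: flatten the nested pages structure into a string such as "300-301".
// Assumes: pages = [range, ...], range = [start] or [start, end],
// start/end = array of nodes with a `text` key.
function pagesToString(pages) {
  const nodeText = (nodes) => nodes.map((node) => node.text || "").join("")
  return pages.map((range) => range.map(nodeText).join("-")).join(", ")
}

const pagesField = [[[{ type: "text", text: "300" }], [{ type: "text", text: "301" }]]]

console.log(pagesToString(pagesField))
```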

Reference hangs parser

When I try to parse these references, the parser hangs

@article{bentley_academic_2011,
  abstract = {State planning has been a defining means for modern subjects to regulate the passage of time. In practice, it is the focus of multiple conflicts and doubts, which planners attempt to mediate. In this paper, I address the regimes of time that planning both promotes and encounters, and tease out what these imply for anthropology. Using ethnography of Norwegian and Swedish planning offices and their encounters with participatory planning, I question recent claims that there has been an evacuation of the near future or a retreat of administrative intervention. I also suggest that recent anthropological concerns with time have been confined by their attempts to characterize the changing timescapes of specific modal shifts, such as from the modern to the neoliberal. Instead, in my ethnography, I focus not on tracking epochal breaks in time, but on demonstrating how time is manipulated, and how multiple temporalities are performed in ongoing projects of democratic planning.},
  annotation = {This is the child note},
  author = {Abram, Simone},
  doi = {10.1111/1467-9655.12097},
  issn = {1467-9655},
  journaltitle = {Journal of the Royal Anthropological Institute},
  keywords = {måske},
  langid = {english},
  pages = {129--147},
  rights = {© Royal Anthropological Institute 2014},
  shortjournal = {J R Anthropol Inst},
  shorttitle = {6~{{The}} Time It Takes},
  timestamp = {2015-02-24 12:14:36 +0100},
  title = {6~{{The}} Time It Takes: Temporalities of Planning},
  url = {http://onlinelibrary.wiley.com.ep.fjernadgang.kb.dk/doi/10.1111/1467-9655.12097/abstract},
  urldate = {2014-12-27},
  volume = {20},
  xref = {welker_andreas_1999},
  year = {[2014]}
}

@article{pollard_bicycle_2007,
  author = {Pollard, A. Mark and Bray, Peter},
  journaltitle = {Annu. Rev. Anthropol.},
  pages = {245--59},
  shortjournal = {Annu Rev Anthr.},
  timestamp = {2015-02-24 12:14:36 +0100},
  title = {A Bicycle Made for Two? {{The}} Integration of Scientific Techniques into Archaeological Interpretation},
  url = {http://www.annualreviews.org/doi/pdf/10.1146/annurev.anthro.36.081406.094354},
  volume = {36},
  year = {[2007]}
}

@jurisdiction{kartinyeri,
  author = {Kaetinyeri v Commonwealth},
  date = {1998-04-01},
  institution = {{HCA}},
  journaltitle = {CLR},
  keywords = {au},
  pages = {337},
  shorttitle = {Kartinyeri},
  timestamp = {2015-02-24 12:14:36 +0100},
  title = {Kartinyeri v {{Commonwealth}}},
  translator = {Kartinyeri},
  volume = {195}
}

Error in entry key stops parsing all further entries

I have in my test files one entry that looks like this

@Article{Dobrovol'skii_1963,
  Title                    = {Use of the {H}all current for investigation of carrier scattering in semiconductors},
  Author                   = {Dobrovol'skii, V. N. and Gritsenko, Yu. I.},
  Journal                  = {Soviet Physics - Solid State},
  Year                     = {1963},
  Note                     = {Original paper in Russian: Fizika Tverdogo Tela, 4 (1962), pp. 2760--2769},
  Pages                    = {2025--2031},
  Volume                   = {4},

  Comment                  = {SBB},
  File                     = {Dobrovol'skii_1963.pdf:CopperOxides\\Dobrovol'skii_1963.pdf:PDF},
  Owner                    = {Francesco},
  Timestamp                = {2009.04.14}
}

Clearly broken, but it stops parsing the whole file, which makes it harder to provide feedback.

Unparsed fields from my test suite

I've just run through all my test files, and this is the list of fields that the parser flags as errors. Some of these clearly are bogus, but these are all references from the field.

  • collaborator
  • keywords
  • owner
  • timestamp
  • comment
  • numpages
  • adsnote
  • adsurl
  • refid
  • nationality
  • yearfiled
  • day
  • dayfiled
  • monthfiled
  • review
  • options
  • langid
  • rights
  • xref
  • crossref
  • biburl
  • description
  • ee
  • groups
  • interhash
  • intrahash
  • bibsource
  • director
  • scriptwriter
  • pmid
  • affiliation
  • author-email
  • doc-delivery-number
  • funding-acknowledgement
  • funding-text
  • journal-iso
  • keywords-plus
  • number-of-cited-references
  • subject-category
  • times-cited
  • usere
  • presort
  • related
  • relatedtype

handling of key fields

Some fields are defined as holding string keys. If I understand the biblatex instructions correctly, the fields can either contain one of the known options, or a random string. What I cannot figure out is if this is true for all these fields, or more for some than others. Also, for some fields there doesn't seem to be a good list specifying which are the "known" keys.

The fields that are defined as key fields are:

pagination, pubstate, type, authortype, bookpagination, editortype, editoratype, editorbtype, editorctype, origlanguage.

language is defined as a "list of keys".

For these it's unclear to me what the "known options" are:

authortype, origlanguage, language.

There is also a field langid, which only seems to permit the languages listed in table 2, page 26 of http://tug.ctan.org/macros/latex/exptl/biblatex/doc/biblatex.pdf .

What is not clear to me: Are the languages for langid the same ones that are the "known keys" for language and origlanguage? Is langid just one value, while language allows multiple? Do I just choose one of them?

Any idea, @retorquere ?

line endings in test files

On some machines the importer produces output with \r\n line endings, on others it produces \n. The line endings are part of the test output files, so this goes beyond having common line handling across contributor OSs (which should be standardized through ce02d94 ).
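
A possible stopgap on the test side is to normalize both the importer output and the stored fixture before comparing them. normalizeLineEndings is a hypothetical helper for the test runner, not part of the importer.

```javascript
// Sketch: convert Windows (\r\n) and old Mac (\r) line endings to \n.
function normalizeLineEndings(text) {
  return text.replace(/\r\n?/g, "\n")
}

const fromWindows = "line one\r\nline two\r\n"
const fromUnix = "line one\nline two\n"

console.log(normalizeLineEndings(fromWindows) === normalizeLineEndings(fromUnix))
```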

TypeError: Cannot read property 'replace' of undefined

The script below errors out with TypeError: Cannot read property 'replace' of undefined when I feed it the attached file.
fw.txt

var BibTeXParser, data, fs, parseReferences;

BibTeXParser = require('biblatex-csl-converter').BibLatexParser;

fs = require('fs');

parseReferences = function(input) {
  var parser, references;
  parser = new BibTeXParser(input, {
    rawFields: true,
    processUnexpected: true,
    processUnknown: true
  });

  /* this must be called before requesting warnings or errors -- this really, really weirds me out */
  references = parser.output;

  /* references is an array-ish object */
  references.length = Object.keys(references).length;

  /* relies on side effect of calling '.output' */
  return {
    references: references,
    groups: parser.groups,
    errors: parser.errors,
    warnings: parser.warnings
  };
};

data = fs.readFileSync("test/fixtures/import/Maintain the JabRef group and subgroup structure when importing a BibTeX db #97.bib", 'utf8');

console.log(parseReferences());

7a76ad7 breaks on this reference

This is the error:

/Users/emile/zotero/biblatex-csl-converter/lib/import/biblatex.js:405
            var openBraces = (theValue.match(/\{/g) || []).length,
                                       ^

TypeError: theValue.match is not a function

@article{BachmannDybalskiNaaijkens14,
  archivePrefix = {arXiv},
  author = {Bachmann, Sven and Dybalski, Wojciech and Naaijkens, Pieter},
  date = {2014-12-09},
  eprint = {1412.2970},
  eprinttype = {arxiv},
  keywords = {Lieb-Robinson bounds,Mathematical Physics,Quantum Physics},
  primaryClass = {math-ph, physics:quant-ph},
  timestamp = {2015-02-24 12:14:36 +0100},
  title = {Lieb-{{Robinson}} Bounds, {{Arveson}} Spectrum and {{Haag}}-{{Ruelle}} Scattering Theory for Gapped Quantum Spin Systems},
  url = {http://arxiv.org/abs/1412.2970},
  urldate = {2015-02-07}
}

separate annotation fields?

Right now we have defined optional and required fields for each bibtype.

However, the fields keywords, abstract and annotation are not really listed as optional fields in the biblatex manual, as these are fields that allow for annotation functionality for any kind of entry type.

Biblatex also allows for annotation fields per field by using the +an suffix (for example title+an), and we don't handle those at all right now.

I wonder whether it would be better to handle all of these as a third kind of field, "annotation fields", which are defined only once for all entry types and which additionally include, automatically, the +an-suffixed variant of every field defined for a given entry type.

On the other hand, the rules for how such annotations are structured seem complex and I'm not sure they are being used much in real-world applications. So maybe that is a bit too much? Do you have any experience with these, @retorquere ?

BibLatexParser output has no "length" field

The references are returned in .output as an object with string-formatted numeric keys ('0' rather than 0), and the output object doesn't have a length field, so I have to loop through the object until I find an undefined to know I've had them all.
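
As a workaround on the caller's side, the keyed object can be turned into a real array. The sketch below assumes only what is described above: an object whose keys are numeric strings. outputToArray is a hypothetical helper, and the entries in the stand-in object are placeholders.

```javascript
// Sketch: convert { "0": entry, "1": entry, ... } into [entry, entry, ...],
// ordered by numeric key.
function outputToArray(output) {
  return Object.keys(output)
    .sort((a, b) => Number(a) - Number(b))
    .map((key) => output[key])
}

// Stand-in for parser output; the entry contents are placeholders.
const outputStub = { "0": { title: "First" }, "1": { title: "Second" }, "2": { title: "Third" } }

const references = outputToArray(outputStub)
console.log(references.length)
```

Unlike setting a length property on the object itself, this leaves the parser's output untouched and gives the caller the usual array methods.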

Date error on "date = {1723~}"

date = {1723~} means "approximately 1723" and is a supported EDTF date. I can highly recommend edtf.js for EDTF date parsing.

\_ not parsed correctly

title = {bugreports\_{{ApostropheOnParticle}}}

is translated to

bugreports<sub><span class="nocase">ApostropheOnParticle</span></sub>

rather than

bugreports_<span class="nocase">ApostropheOnParticle</span>
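
The distinction can be sketched as a single pass that checks for the escaped form first. This is only an illustration of the ordering issue: convertUnderscores is a hypothetical helper that handles \_ and a bare _ followed by one character or one {group}, and nothing else (no case protection, no math mode).

```javascript
// Sketch: "\_" is a literal underscore; a bare "_" subscripts what follows.
// The escaped alternative comes first in the pattern, so it wins.
function convertUnderscores(text) {
  return text.replace(/\\_|_(\{[^{}]*\}|.)/g, (match, sub) => {
    if (match === "\\_") return "_"
    // A {group} is longer than one character; strip its braces.
    const inner = sub.length > 1 ? sub.slice(1, -1) : sub
    return `<sub>${inner}</sub>`
  })
}

console.log(convertUnderscores(String.raw`bugreports\_{{ApostropheOnParticle}}`))
console.log(convertUnderscores("Cu_2O"))
```

The first call keeps the underscore as plain text and leaves the braces for later processing; the second produces a subscript for the 2 only.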

"collaborator" field is not parsed

@incollection{robinson2007theoriesglobalization,
  title = {{Theories of globalization}},
  booktitle = {{The Blackwell companion to globalization}},
  publisher = {{John Wiley \& Sons}},
  author = {Robinson, William I},
  collaborator = {Ritzer, George},
  year = {2007},
  keywords = {orphaned},
  pages = {125-143},
  annote = {Extracted Annotations (zondag 17 november 2013 15:45:35) "What is the relationship between globalization and the nation-state? Is the nation-state being undermined?" (Robinson 2007:127) Relatedly, to what extent is the relationship between social structure and territoriality being redefined by globalization? Is there a deterritorialization of social relations under globalization? What is the relationship between the local and the global? How are space and time being reconfigured? (note on p.127) "Relatedly, to what extent is the relationship between social structure and territoriality being redefined by globalization? Is there a deterritorialization of social" (Robinson 2007:127) "relations under globalization? What is the relationship between the local and the global? How are space and time being reconfigured?" (Robinson 2007:128) "THE NETWORK SOCIETY Manuel Castells' groundbreaking trilogy, The Rise of the Network Society (1996, 1997, 1998), exemplifies a 'technologistic' approach to globalization" (Robinson 2007:132) "'communication decisively shapes culture because we do not see . . . reality as it 'is' but as our languages are'. He adds, 'we are not living in a global village, but in customized cottages, globally produced and locally distributed' (1996: 370)." (Robinson 2007:133)},
  file = {Robinson_2007_Theories of globalization.pdf:Better BibTeX.015/F64PVIN4/Robinson_2007_Theories of globalization.pdf:application/pdf}
}

Case-protection should only be applied to (English) title fields

In this entry, case-protection is applied to the note field. Case protection is only relevant for title fields (and then technically, only for English references).

@article{sasson_increasing_2013,
  title = {Increasing cardiopulmonary resuscitation provision in communities with low bystander cardiopulmonary resuscitation rates: a science advisory from the American Heart Association for healthcare providers, policymakers, public health departments, and community leaders},
  volume = {127},
  issn = {1524-4539},
  shorttitle = {Increasing cardiopulmonary resuscitation provision in communities with low bystander cardiopulmonary resuscitation rates},
  doi = {10.1161/CIR.0b013e318288b4dd},
  language = {eng},
  number = {12},
  journal = {Circulation},
  author = {Sasson, Comilla and Meischke, Hendrika and Abella, Benjamin S and Berg, Robert A and Bobrow, Bentley J and Chan, Paul S and Root, Elisabeth Dowling and Heisler, Michele and Levy, Jerrold H and Link, Mark and Masoudi, Frederick and Ong, Marcus and Sayre, Michael R and Rumsfeld, John S and Rea, Thomas D and {American Heart Association Council on Quality of Care and Outcomes Research} and {Emergency Cardiovascular Care Committee} and {Council on Cardiopulmonary, Critical Care, Perioperative and Resuscitation} and {Council on Clinical Cardiology} and {Council on Cardiovascular Surgery and Anesthesia}},
  month = {mar},
  year = {2013},
  note = {{PMID:} 23439512},
  keywords = {Administrative Personnel, American Heart Association, Cardiopulmonary Resuscitation, Community Health Services, Health Personnel, Heart Arrest, Humans, Leadership, Public Health, United States},
  pages = {1342--1350}
}

languages for langid and for languages field differ and look pretty random

These fields correspond to predefined values that exist for either field. So maybe there is not a lot we can do about this, but it will seem odd to any end user. I think it would be interesting to hear why biblatex doesn't add translation strings for some more of these, for example.

For language, these are defined by the translation strings that are predefined by biblatex:

['catalan', 'croatian', 'czech', 'danish',
'dutch', 'english', 'american', 'finnish', 'french', 'german', 'greek',
'italian', 'latin', 'norwegian', 'polish', 'portuguese', 'brazilian', 'russian',
'slovene', 'spanish', 'swedish']

(Note the absence of Chinese, among others, and the odd labeling for US English: "american").

Any custom value will be accepted in the language field, but these will be printed outright: Only the above will be translated to other languages automatically.

For langid the options are defined through support from babel/polyglossia and we currently have:

["acadian", "afrikaans", "arabic", "basque", "bulgarian", "catalan", "pinyin", "croatian", "czech", "danish", "dutch", "australian", "canadian", "newzealand", "ukenglish", "usenglish", "estonian", "finnish", "french", "canadien", "ngerman", "naustrian", "greek", "hebrew", "hungarian", "icelandic", "italian", "japanese", "latin", "latvian", "lithuanian", "magyar", "mongolian", "norsk", "nynorsk", "farsi", "polish", "portuguese", "brazilian", "romanian", "russian", "serbian", "serbianc", "slovak", "slovene", "spanish", "swedish", "thai", "turkish", "ukrainian", "vietnamese"]

We also support various aliases of these. I assume that there is an actual language difference between Cyrillic Serbian and Serbian, and not just two different packages. The langid field does not allow custom input, but the above list covers all our test files.

endnote support

EndNote apparently doesn't add citation keys:

@article{
	  author = {Huang, Y. and Mucke, L.},
	  title = {Alzheimer mechanisms and therapeutic strategies},
	  journal = {Cell},
	  volume = {148},
	  number = {6},
	  pages = {1204-22},
	  note = {Huang, Yadong
	Mucke, Lennart
	AG011385/AG/NIA NIH HHS/
	AG022074/AG/NIA NIH HHS/
	Cell. 2012 Mar 16;148(6):1204-22.},
	  abstract = {There are still no effective treatments to prevent, halt, or reverse Alzheimer's disease, but research advances over the past three decades could change this gloomy picture. Genetic studies demonstrate that the disease has multiple causes. Interdisciplinary approaches combining biochemistry, molecular and cell biology, and transgenic modeling have revealed some of its molecular mechanisms. Progress in chemistry, radiology, and systems biology is beginning to provide useful biomarkers, and the emergence of personalized medicine is poised to transform pharmaceutical development and clinical trials. However, investigative and drug development efforts should be diversified to fully address the multifactoriality of the disease.},
	  year = {2012}
	}

Parser doesn't recognize variables

This fails to import:

@inproceedings{test_citation1,
  abstract = {Abstract here},
  author = {Doe, J. and Smith, R.},
  booktitle = IEEE_J_PROC,
  doi = {10.1109/JPROC.2016.2526118},
  keywords = {Keyword 1,Keyword 2},
  month = feb,
  pages = {300--301},
  timestamp = {2015-02-24 12:14:36 +0100},
  title = {Test {{Title With}} 100~{{$\mu$J}}, 200\,{{$\mu$J Energy}}, and $\pm$0.1\% {{Accuracy}}, 0.2\,Mm$^2$ {{Size}}, and -50\,{{dB Attenuation}}},
  year = {2016}
}

Parser lower-cases citation keys

While they may (or may not, no idea) be semantically the same, people will want to have the citation keys they provided kept in BBT.

{\emph{Thunnus thynnus}} should not get nocase

Currently,

title = {{{Atlantic}} bluefin tuna ({\emph{Thunnus thynnus}})}

imports to

title: '<span class="nocase">Atlantic</span> bluefin tuna (<span class="nocase"><i>Thunnus thynnus</i></span>)'

but as you'll see if you compile the example below, {\emph{...}} doesn't case-protect in BibLaTeX, so semantically it should be translated to just

title: '<span class="nocase">Atlantic</span> bluefin tuna (<i>Thunnus thynnus</i>)'

\documentclass{article}
\usepackage[
    backend=biber,
    natbib=true,
    url=false, 
    doi=true,
    eprint=false,
    style=apa,
]{biblatex}

\usepackage{filecontents}
\begin{filecontents}{\jobname.bib}

@String{pub-FRED = "Freds Publishing"}
@String{pub-FRED:adr = "London, UK"}

@Book{cit1,
  author = {R. A. Bert},
  title = {{{Atlantic}} bluefin tuna ({\emph{Thunnus thynnus}})},
  publisher = pub-FRED,
  address = pub-FRED:adr
}
\end{filecontents}

\addbibresource{\jobname.bib}

\begin{document}

\cite{cit1}

\printbibliography

\end{document}

\LaTeX is not parsed

I don't really know what \LaTeX should be parsed to. Perhaps just LaTeX, or a combination like L$^A$T$_E$X. But right now it parses to łatex.

BibLaTeX output

I'm not entirely sure yet whether I'll also use biblatex-csl-converter for BibLaTeX output (see below), but is biblatex-csl-converter going to allow encoding unicode as latex constructs, or will it just trust biblatex to handle unicode?

The reasons why I might stick with my own BibLaTeX production:

  • I need to do BibTeX in addition to BibLaTeX
  • For BibTeX I need to be able to re-code unicode characters into LaTeX constructs
  • I'd have to convert Zotero to bibDB, might as well just produce biblatex at that point.

"keywords" field not parsed

On this entry, the keywords field is not parsed

@article{sasson_increasing_2013,
  title = {Increasing cardiopulmonary resuscitation provision in communities with low bystander cardiopulmonary resuscitation rates: a science advisory from the American Heart Association for healthcare providers, policymakers, public health departments, and community leaders},
  volume = {127},
  issn = {1524-4539},
  shorttitle = {Increasing cardiopulmonary resuscitation provision in communities with low bystander cardiopulmonary resuscitation rates},
  doi = {10.1161/CIR.0b013e318288b4dd},
  language = {eng},
  number = {12},
  journal = {Circulation},
  author = {Sasson, Comilla and Meischke, Hendrika and Abella, Benjamin S and Berg, Robert A and Bobrow, Bentley J and Chan, Paul S and Root, Elisabeth Dowling and Heisler, Michele and Levy, Jerrold H and Link, Mark and Masoudi, Frederick and Ong, Marcus and Sayre, Michael R and Rumsfeld, John S and Rea, Thomas D and {American Heart Association Council on Quality of Care and Outcomes Research} and {Emergency Cardiovascular Care Committee} and {Council on Cardiopulmonary, Critical Care, Perioperative and Resuscitation} and {Council on Clinical Cardiology} and {Council on Cardiovascular Surgery and Anesthesia}},
  month = {mar},
  year = {2013},
  note = {{PMID:} 23439512},
  keywords = {Administrative Personnel, American Heart Association, Cardiopulmonary Resuscitation, Community Health Services, Health Personnel, Heart Arrest, Humans, Leadership, Public Health, United States},
  pages = {1342--1350}
}

escaped braces ignored, text cut short

({{Liquid}}+liquid) Equilibrium of {{\{water+phenol+(1-butanol, or 2-butanol, or tert-butanol)\}}} Systems

is parsed to

(<span class="nocase">Liquid</span>+liquid) Equilibrium of <span class="nocase">water+phenol+(1-butanol, or 2-butanol, or tert-butanol)</span>

rather than (something like)

(Liquid+liquid) equilibrium of <span class="nocase">{water+phenol+(1-butanol, or 2-butanol, or tert-butanol)}</span> systems

Part of title interpreted as variable?

When I run this script:

var BibTeXParser, data, fs, parseReferences;

BibTeXParser = require('biblatex-csl-converter').BibLatexParser;

parseReferences = function(input) {
  var parser, references;
  parser = new BibTeXParser(input, {
    rawFields: true,
    processUnexpected: true,
    processUnknown: true
  });

  references = parser.output;

  return {
    references: references,
    groups: parser.groups,
    errors: parser.errors,
    warnings: parser.warnings
  };
};

data = `
@InProceedings{test_citation1,
  Title                    = {{T}est {T}itle {W}ith 100~$\mu\${J}, 200\,{\mbox{$\mu$}}{J} {E}nergy, and $\pm$0.1\% {A}ccuracy, 0.2\,mm$^2$ {S}ize, and $-$50\,d{B} {A}ttenuation},
}
`

console.log(JSON.stringify(parseReferences(data), null, 2));

the last part of the title comes out as

{
            "type": "variable",
            "attrs": {
              "variable": " {A}ccuracy, 0.2,mm$^2$ {S}ize, and $-$50,d{B} {A}ttenuation"
            }
          }
