
hfmark / tektonika_copyedit


Scripts for automated processing of .docx files into a LaTeX template for Tektonika



docx parsing for Tektonika

dependencies:

python 3.n (preferably 3.8+)
numpy
biblib

A conda environment is a nice way to set this up.

You will also need to have pandoc (pandoc.org/) installed for the initial conversion of the .docx file.

general steps for the conversion process

convert .docx article file to latex
- pandoc file.docx -f docx -t latex --wrap=none -s -o file_pandoc.tex
copy-paste bibliography from docx into anystyle.io (or run the anystyle gem locally if you're into that) and output as bibtex, save as a .bib file
- [anystyle.io -> file_anystyle.bib]
fix anystyle bibtex file year fields and keys, make a new .bib file
- (set input filenames manually in the script)
- fix_bibtex.py -> file_init.bib
manually correct any non-ascii keys in bib file, if there are any
- (these will be printed to stdout so we know they need to be fixed, usually for non-ascii characters)
- (feels like there should be a way around this but I don't know it)
parse the pandoc output tex file to a better tex format
- (set the input filenames manually in the script)
- parse_pandoc_file.py -> file_init.tex
run bibtex and pdflatex, look at the output and figure out what needs fixing
- pdflatex file_init.tex -> file_init.pdf
- bibtex file_init.aux
- pdflatex file_init.tex
- pdflatex file_init.tex
- (running at least twice gives inline references a chance to sort themselves out)
manually link figure files at the right sizes, adjust placement of automated \includegraphics as needed
- pandoc does not extract image files from word so they will need to be uploaded separately
manually adjust for extra bits of inline citations (in red), inline citations for multiple papers by the same authors (hopefully in red), and year-only citations (in red)
add extra hyphenation rules for words latex doesn't know if columns are overfull
manually add authors, affiliations, short title, and other header metadata in place of the default placeholders
look at junk file and manually reformat/place tables in text where they belong (because I do not understand longtable)
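
The non-ascii key check in the steps above can be sketched in a few lines. This is only an illustration of the idea, not the code in fix_bibtex.py: the function name `non_ascii_keys` and the regular expression are assumptions, and the pattern only handles the common `@type{key,` entry layout.

```python
import re

# Assumed pattern: matches the start of a BibTeX entry such as
# "@article{smith2020," and captures the citation key.
ENTRY_KEY = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def non_ascii_keys(bibtex_text):
    """Return the citation keys that contain any non-ASCII character."""
    keys = ENTRY_KEY.findall(bibtex_text)
    # str.isascii() needs python 3.7+
    return [key for key in keys if not key.isascii()]


sample = """@article{müller2019,
  title = {A},
}
@article{smith2020,
  title = {B},
}
"""

# Print the keys that would need fixing by hand
for key in non_ascii_keys(sample):
    print("needs fixing:", key)
```

Keys flagged this way still have to be corrected manually in the .bib file (and wherever they are cited in the .tex file).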

TODO:

make sure catch for supplemental figures/tables works for in-text references
figure out parsing for author metadata
figure out longtable/table parsing?
parse extra bits of citations, like 'e.g.,' wherever possible
more user-friendly startup (i.e., input filenames as arguments, rather than editing scripts)
- related: complete workflow that runs all scripts in sequence automatically
- and maybe make this all install as a package?
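
One possible shape for the friendlier startup, as a rough sketch only: take the .docx filename as an argument and build the external commands from the steps above. Nothing like this exists in the repository yet; `parse_args` and `build_commands` are hypothetical names, the output files are assumed to follow the file_pandoc.tex / file_init.tex naming used above, and fix_bibtex.py and parse_pandoc_file.py are left out because they currently need filenames set inside the scripts.

```python
import argparse
import shlex
from pathlib import Path


def parse_args(argv=None):
    """Hypothetical command line: a single positional .docx filename."""
    parser = argparse.ArgumentParser(description="docx parsing for Tektonika")
    parser.add_argument("docx", help="input article, e.g. file.docx")
    return parser.parse_args(argv)


def build_commands(docx_path):
    """Return the pandoc/pdflatex/bibtex commands for one article, in order."""
    stem = Path(docx_path).stem          # "file.docx" -> "file"
    pandoc_tex = f"{stem}_pandoc.tex"
    init_tex = f"{stem}_init.tex"        # assumed to be written by parse_pandoc_file.py
    return [
        ["pandoc", docx_path, "-f", "docx", "-t", "latex",
         "--wrap=none", "-s", "-o", pandoc_tex],
        ["pdflatex", init_tex],
        ["bibtex", f"{stem}_init.aux"],
        ["pdflatex", init_tex],
        ["pdflatex", init_tex],
    ]


# Dry run: print the commands instead of executing them
args = parse_args(["file.docx"])
for cmd in build_commands(args.docx):
    print(shlex.join(cmd))               # shlex.join needs python 3.8+
```

Each command could be passed to `subprocess.run(cmd, check=True)` once the two python scripts accept filenames; the bibliography and tex parsing steps would have to run between the pandoc call and the first pdflatex call.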
