This project forked from pycontribs/pypandoc

Thin wrapper for "pandoc" (MIT)

Home Page: http://pypi.python.org/pypi/pypandoc/

License: Other

pypandoc's Introduction

pypandoc

pypandoc provides a thin wrapper for pandoc, a universal document converter.

Installation

pypandoc uses pandoc, so it needs an available installation of pandoc. For some common cases (wheels, conda packages), pypandoc already includes pandoc (and pandoc-citeproc) in its prebuilt package.

If pandoc is already installed (i.e. pandoc is on the PATH), pypandoc uses whichever copy has the higher version number; if both versions are the same, it uses the already installed one. You can point to a specific version by setting the environment variable PYPANDOC_PANDOC to the full path of the pandoc binary (PYPANDOC_PANDOC=/home/x/whatever/pandoc or PYPANDOC_PANDOC=c:\pandoc\pandoc.exe). If this environment variable is set, it is the only place pypandoc searches for pandoc.
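
The selection rules above can be restated as a small function. This is a hypothetical sketch written from the description in this README, not pypandoc's actual code; the name choose_pandoc and the (path, version tuple) inputs are assumptions made for illustration.

```python
import os
from typing import Mapping, Optional, Tuple

# (path, version) pair, e.g. ("/usr/bin/pandoc", (1, 17, 2))
Candidate = Tuple[str, Tuple[int, ...]]


def choose_pandoc(installed: Optional[Candidate],
                  bundled: Optional[Candidate],
                  env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Hypothetical re-statement of the lookup rules; not pypandoc's own code."""
    env = os.environ if env is None else env
    override = env.get("PYPANDOC_PANDOC")
    if override:
        # The environment variable is the only candidate considered
        # (this sketch does not check that the file exists).
        return override
    candidates = [c for c in (installed, bundled) if c is not None]
    if not candidates:
        return None
    # max() returns the first maximal item, so listing the installed copy
    # first makes it win when both version numbers are equal.
    path, _version = max(candidates, key=lambda c: c[1])
    return path
```

In practice you would simply set the variable in your shell (e.g. export PYPANDOC_PANDOC=/home/x/whatever/pandoc) before starting Python.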

To use pandoc filters, you must have the relevant filter installed on your machine.

Installing via pip

Install via pip install pypandoc

Prebuilt wheels for Windows and Mac OS X include pandoc. If there is no prebuilt binary available, you have to install pandoc yourself.

If you use Linux and have your own wheelhouse, you can build a wheel which includes pandoc with python setup.py download_pandoc; python setup.py bdist_wheel. Be aware that this works only on 64-bit Intel systems, because pandoc is downloaded only from the official source.

Installing via conda

Install via conda install -c https://conda.anaconda.org/janschulz pypandoc.

You can also add the channel to your conda config via conda config --add channels https://conda.anaconda.org/janschulz. This makes it possible to use conda install pypandoc directly and also lets you update via conda update pypandoc.

Conda packages include pandoc and are available for py2.7, py3.4 and py3.5, for Windows (32bit and 64bit), Mac OS X (64bit) and Linux (64bit).

Installing pandoc

pandoc is available for many different platforms:

  • Ubuntu/Debian: sudo apt-get install pandoc
  • Fedora/Red Hat: sudo yum install pandoc
  • Arch: sudo pacman -S pandoc
  • Mac OS X with Homebrew: brew install pandoc pandoc-citeproc Caskroom/cask/mactex
  • Machine with Haskell: cabal install pandoc
  • Windows: There is an installer available on the pandoc website
  • FreeBSD port
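
pypandoc looks for an existing installation on the PATH. After installing pandoc with one of the commands above, a quick stdlib-only check can confirm which binary will be found. This helper is not part of pypandoc; the name pandoc_version_line is hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def pandoc_version_line(name: str = "pandoc") -> Optional[str]:
    """Return the first line of `<name> --version`, or None if not on PATH."""
    exe = shutil.which(name)  # full path of the executable, or None
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True,
                            text=True, check=True)
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    print(pandoc_version_line() or "pandoc not found on PATH")
```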

Usage

The basic invocation looks like this: pypandoc.convert('input', 'output format'). pypandoc tries to infer the type of the input automatically: if it's a file, pypandoc loads it; if you pass a string, you can specify its input format with the format parameter. The example below should clarify the usage:

import pypandoc

output = pypandoc.convert('somefile.md', 'rst')

# alternatively you could just pass some string to it and define its format
output = pypandoc.convert('# some title', 'rst', format='md')
# output == 'some title\r\n==========\r\n\r\n'

If you pass in a string (and not a filename), convert expects this string to be unicode or utf-8 encoded bytes. convert will always return a unicode string.
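
The input handling described above (an existing file path is loaded; otherwise the argument is document text given as str or UTF-8 bytes) can be sketched with the standard library. This is a hypothetical illustration, not pypandoc's implementation; the function name read_source is invented here.

```python
import os
from typing import Union


def read_source(source: Union[str, bytes]) -> str:
    """Hypothetical sketch: return the document text to hand to pandoc."""
    if isinstance(source, str) and os.path.isfile(source):
        # An existing path is treated as a file to load.
        with open(source, encoding="utf-8") as fh:
            return fh.read()
    if isinstance(source, bytes):
        # Bytes must be UTF-8; anything else raises UnicodeDecodeError.
        return source.decode("utf-8")
    # Any other str is taken to be the document itself.
    return source
```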

You can also let pandoc write the output directly to a file. For some output formats (e.g. odt, docx, epub, epub3, pdf) this is the only way to convert. In that case convert() returns an empty string.

import pypandoc

output = pypandoc.convert('somefile.md', 'docx', outputfile="somefile.docx")
assert output == ""
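
If you build conversions dynamically, a pre-flight check can catch a missing outputfile before pandoc runs. This is a hypothetical helper, not part of pypandoc, and its format set contains only the examples named above, so it is not an exhaustive list of file-only formats.

```python
import re
from typing import Optional

# Only the formats named above; pandoc may have other file-only outputs.
FILE_ONLY_FORMATS = frozenset({"odt", "docx", "epub", "epub3", "pdf"})


def check_output_target(to: str, outputfile: Optional[str]) -> None:
    """Hypothetical check: raise if a file-only format has no outputfile."""
    # Strip pandoc extension suffixes such as "+smart" or "-raw_html".
    base = re.split(r"[+-]", to, maxsplit=1)[0].lower()
    if base in FILE_ONLY_FORMATS and not outputfile:
        raise ValueError(f"output format {to!r} requires outputfile=...")
```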

In addition to format, you can pass extra_args, which gives easy access to pandoc's various command-line options.

output = pypandoc.convert(
    '<h1>Primary Heading</h1>',
    'md', format='html',
    extra_args=['--atx-headers'])
# output == '# Primary Heading\r\n'
output = pypandoc.convert(
    '# Primary Heading',
    'html', format='md',
    extra_args=['--base-header-level=2'])
# output == '<h2 id="primary-heading">Primary Heading</h2>\r\n'

pypandoc also supports pandoc filters, passed through the filters argument:

import pypandoc

filename = 'somefile.md'  # any Markdown input file
filters = ['pandoc-citeproc']
pdoc_args = ['--mathjax',
             '--smart']
output = pypandoc.convert(source=filename,
                          to='html5',
                          format='md',
                          extra_args=pdoc_args,
                          filters=filters)

Please pass any filters in as a list and not a string.
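
To see why a list matters, consider how such keyword arguments could be turned into a pandoc command line. The sketch below is hypothetical (build_pandoc_argv is not a pypandoc function, and pypandoc's real argument order may differ), but --to, --from and --filter are genuine pandoc options. Iterating over a bare string yields single characters, so filters='pandoc-citeproc' would become --filter=p, --filter=a, and so on.

```python
from typing import List, Optional, Sequence


def build_pandoc_argv(to: str, fmt: Optional[str] = None,
                      extra_args: Sequence[str] = (),
                      filters: Optional[Sequence[str]] = None) -> List[str]:
    """Hypothetical mapping of keyword arguments onto pandoc's argv."""
    for name, value in (("extra_args", extra_args), ("filters", filters)):
        if isinstance(value, str):
            # A bare string would be iterated character by character.
            raise TypeError(f"{name} must be a list of strings, not a string")
    argv = ["pandoc", "--to=" + to]
    if fmt is not None:
        argv.append("--from=" + fmt)
    argv.extend(extra_args)
    for flt in filters or ():
        argv.append("--filter=" + flt)
    return argv
```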

Please refer to pandoc -h and the official documentation for further details.

Dealing with Formatting Arguments

Pandoc supports custom formatting through the -V parameter. To use it through pypandoc, use code such as this:

output = pypandoc.convert('demo.md', 'pdf', outputfile='demo.pdf',
  extra_args=['-V', 'geometry:margin=1.5cm'])

Note that -V and its argument must be separate elements of the list, as above; otherwise it won't work. This is because subprocess.Popen passes each list element to pandoc as one separate argument, without any shell-style splitting.
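
If you already have pandoc options written as a single shell-style string, Python's standard shlex.split can turn it into the separate list elements that subprocess expects. The helper name split_pandoc_options is hypothetical; shlex.split is the real standard-library function.

```python
import shlex
from typing import List


def split_pandoc_options(options: str) -> List[str]:
    """Split a shell-style option string into one list element per argument."""
    return shlex.split(options)


# Wrong: flag and value glued into a single argv element.
glued = ["-V geometry:margin=1.5cm"]
# Right: two separate elements, as in the example above.
separate = split_pandoc_options("-V geometry:margin=1.5cm")
print(separate)  # ['-V', 'geometry:margin=1.5cm']
```

The result can be passed directly as extra_args. Quoting works as in a POSIX shell, so '-V title="My Report"' becomes ['-V', 'title=My Report'].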

Getting Pandoc Version

It can sometimes be useful to check which pandoc version is available on your system, so pypandoc provides a utility for this. Example:

version = pypandoc.get_pandoc_version()
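
Assuming get_pandoc_version() returns a dotted string such as '1.17.2', compare versions numerically rather than as strings: as strings, '1.9' sorts after '1.17'. The helpers below are hypothetical and use only the standard library.

```python
import re
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '1.17.2' into (1, 17, 2)."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def at_least(version: str, minimum: str) -> bool:
    """True if `version` is numerically >= `minimum`."""
    return version_tuple(version) >= version_tuple(minimum)
```

For example, at_least(pypandoc.get_pandoc_version(), '1.17') could guard code that relies on newer pandoc options.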

Related

pydocverter is a client for a service called Docverter, which offers pandoc as a service (plus some extra goodies). It has the same API as pypandoc, so you can easily write code that uses one and falls back to the other. E.g.:

try:
    import pypandoc as converter
except ImportError:
    import pydocverter as converter

converter.convert('somefile.md', 'rst')

See pyandoc for an alternative implementation of a pandoc wrapper from Kenneth Reitz. This one hasn't been active in a while though.

Contributing

Contributions are welcome. When opening a PR, please keep the following guidelines in mind:

  1. Before implementing, please open an issue for discussion.
  2. Make sure you have tests for the new logic.
  3. Make sure your code passes flake8 pypandoc.py tests.py
  4. Add yourself to the contributors list in README.md. If you are already listed, update the description of your contributions.

Note that for citeproc tests to pass you'll need to have pandoc-citeproc installed. If you installed a prebuilt wheel or conda package, it is already included.

Contributors

  • Valentin Haenel - String conversion fix
  • Daniel Sanchez - Automatic parsing of input/output formats
  • Thomas G. - Python 3 support
  • Ben Jao Ming - Fail gracefully if pandoc is missing
  • Ross Crawford-d'Heureuse - Encode input in UTF-8 and add Django example
  • Michael Chow - Decode output in UTF-8
  • Janusz Skonieczny - Support Windows newlines and allow encoding to be specified.
  • gabeos - Fix help parsing
  • Marc Abramowitz - Make setup.py fail hard if pandoc is missing, Travis, Dockerfile, PyPI badge, Tox, PEP-8, improved documentation
  • Daniel L. - Add extra_args example to README
  • Amy Guy - Exception handling for unicode errors
  • Florian Eßer - Allow Markdown extensions in output format
  • Philipp Wendler - Allow Markdown extensions in input format
  • Jan Schulz - Handling output to a file, Travis to work on newer version of Pandoc, return code checking, get_pandoc_version. Helped to fix the Travis build.
  • Aaron Gonzales - Added better filter handling
  • David Lukes - Enabled input from non-plain-text files and made sure tests clean up template files correctly if they fail
  • valholl - Set up licensing information correctly and include examples to distribution version
  • Cyrille Rossant - Fixed bug by trimming out stars in the list of pandoc formats. Helped to fix the Travis build.
  • Paul Osborne - Don't require pandoc to install pypandoc.
  • Felix Yan - Added installation instructions for Arch Linux.

License

pypandoc is available under the MIT license. See LICENSE for more details. pandoc itself is available under the GPL2 license.
