This project forked from otherjoel/blogger2kirby

Python script for moving Blogger blogs (with images and comments) to Kirby CMS

License: The Unlicense

blogger2kirby

Python script for moving Blogger blogs (with images and comments) to Kirby CMS.

Blogger allows you to export your blog as a single large XML file. My script takes this file, parses out just the blog posts and comments, and creates a folder and a text file in Markdown format for each post.

  • Any images are downloaded and given a unique filename in the post's folder
  • Image links are converted to Kirby's image tag format, for example (image: image01.jpg)
  • Comments are appended to the end of each post, with comment author names/links and timestamps
  • Tags are preserved in the post's metadata
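As a rough illustration of the image handling described in the list above, here is a minimal sketch that swaps each <img> tag for a Kirby image tag and records which URL should be saved under which filename. It is not the script's actual code: the function name kirbyize_images, the regular expression (the script itself relies on BeautifulSoup for HTML parsing), the image01/image02 numbering scheme and the .jpg fallback extension are all assumptions made for this example, and the download step (done with requests in the real script) is left out.

```python
import os
import re
from urllib.parse import urlparse

# Simplified pattern for <img ... src="..."> tags. A real HTML parser is
# more robust; a regex keeps this sketch dependency-free.
IMG_TAG = re.compile(r'<img\b[^>]*?\bsrc=["\']([^"\']+)["\'][^>]*>', re.IGNORECASE)


def kirbyize_images(html):
    """Replace <img> tags with Kirby image tags.

    Returns the rewritten text and a dict mapping each original image URL
    to the (hypothetical) local filename it should be saved under.
    """
    downloads = {}

    def replace(match):
        url = match.group(1)
        if url not in downloads:
            # Keep the original extension, falling back to .jpg (assumption).
            ext = os.path.splitext(urlparse(url).path)[1].lower() or ".jpg"
            downloads[url] = "image{:02d}{}".format(len(downloads) + 1, ext)
        return "(image: {})".format(downloads[url])

    return IMG_TAG.sub(replace, html), downloads
```

Each URL in the returned dict would then be fetched and written into the post's folder under the mapped filename.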

The resulting folders can simply be dropped into the content folder of a Kirby-based site, and boom: the blog has moved.
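The first step, picking the posts and comments out of the export file, might look roughly like the sketch below. It uses the standard library's ElementTree rather than lxml purely so the example has no dependencies. It also assumes, based on how Blogger exports are commonly structured, that every Atom entry carries a "kind" category whose term ends in #post or #comment, and that any other category on an entry is a tag. The function names are made up for this example.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
# Assumed scheme URI that Blogger uses to mark what kind of entry this is.
KIND_SCHEME = "http://schemas.google.com/g/2005#kind"


def entry_kind(entry):
    """Return 'post', 'comment', etc., or None if the entry has no kind."""
    for cat in entry.findall(ATOM + "category"):
        if cat.get("scheme") == KIND_SCHEME:
            return cat.get("term", "").rsplit("#", 1)[-1]
    return None


def entry_tags(entry):
    """Return the terms of every category that is not the kind marker."""
    return [
        cat.get("term")
        for cat in entry.findall(ATOM + "category")
        if cat.get("scheme") != KIND_SCHEME
    ]


def split_feed(xml_text):
    """Split an Atom export into a list of posts and a list of comments."""
    root = ET.fromstring(xml_text)
    posts, comments = [], []
    for entry in root.findall(ATOM + "entry"):
        kind = entry_kind(entry)
        if kind == "post":
            posts.append(entry)
        elif kind == "comment":
            comments.append(entry)
        # Settings, templates and other entry kinds are ignored.
    return posts, comments
```

Everything else in the export (blog settings, template data) falls through the filter, which is what "parses out just the blog posts and comments" amounts to.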

Requirements

The script is written in Python 3. I couldn't use Python 2 because of problems with Unicode data.

Pandoc (version 1.13.1 or later) is also required for the HTML to Markdown conversion process.
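For reference, calling Pandoc from Python for this kind of conversion can be sketched as follows. This is only an illustration of the idea, not the script's own code: the function name html_to_markdown is hypothetical, and the real script may pass different options or a different Markdown variant to Pandoc.

```python
import shutil
import subprocess


def html_to_markdown(html, pandoc="pandoc"):
    """Convert an HTML fragment to Markdown by piping it through Pandoc."""
    if shutil.which(pandoc) is None:
        raise FileNotFoundError("pandoc executable not found on PATH")
    result = subprocess.run(
        [pandoc, "--from=html", "--to=markdown"],
        input=html,           # HTML is sent on standard input
        capture_output=True,  # Markdown comes back on standard output
        text=True,
        check=True,           # raise CalledProcessError if Pandoc fails
    )
    return result.stdout
```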

The script also requires these libraries:

  • lxml, for parsing XML
  • BeautifulSoup, for parsing HTML
  • python-rfc3339, for parsing Atom timestamps in RFC 3339 format (https://github.com/tonyg/python-rfc3339). Note: at this time Python has no native support for parsing strings in this format. There is an open issue for this in Python's bug tracker, and the latest comment on that page identifies the above library as the best one for the job.
  • requests, for downloading images over HTTP
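To show what the timestamp parsing involves, here is a small sketch. Note that it does not use python-rfc3339: on Python 3.7 and later, the standard library's datetime.fromisoformat accepts timestamps of the shape Atom feeds typically carry, so the example leans on that instead to stay dependency-free. It is not a complete RFC 3339 parser, and the function name and sample timestamp are made up for illustration.

```python
from datetime import datetime


def parse_atom_timestamp(text):
    """Parse a timestamp such as 2014-11-02T09:30:00.001-06:00.

    A trailing 'Z' (UTC) is rewritten as +00:00 so that the call also
    works on Python versions before 3.11, which reject the 'Z' suffix.
    """
    text = text.strip()
    if text.endswith(("Z", "z")):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


if __name__ == "__main__":
    stamp = parse_atom_timestamp("2014-11-02T09:30:00.001-06:00")
    print(stamp.strftime("%Y%m%d"))
```

The result is a timezone-aware datetime, which can be used both for the post's date metadata and for the date prefix of its folder name.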

On my Mac running Yosemite, the simplest way to get all these prerequisites was to install Homebrew, then run the following commands:

brew install python3
pip3 install git+https://github.com/tonyg/python-rfc3339.git
pip3 install lxml
pip3 install beautifulsoup4
pip3 install requests

brew install pandoc

Usage

Place your Blogger XML file in the same folder as the script, and name it blog.xml. Then run python3 blogger2kirby.py.

You can also run chmod u+x blogger2kirby.py to make it executable and then just run it as ./blogger2kirby.py, assuming your python3 lives in /usr/local/bin/python3 (if you installed it with Homebrew, that's where it would be).

You will see a lot of messages fly by about the posts being parsed out.

Afterwards there will be a folder named out in the current folder, containing one folder for each post, named in the format YYYYMMDD-post-slug. The slug is the filename from the post's original Blogger URI without the .html extension, which allows for easy redirects.
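The naming scheme can be sketched like this. The function name post_folder_name and the example URL are hypothetical, and the sketch assumes the date prefix comes from the post's publication date.

```python
import posixpath
from datetime import datetime
from urllib.parse import urlparse


def post_folder_name(published, post_url):
    """Build a YYYYMMDD-post-slug folder name from a date and a Blogger URL."""
    # Last path segment of the URL, e.g. "my-first-post.html"
    filename = posixpath.basename(urlparse(post_url).path)
    # Drop the .html extension to get the slug
    slug = filename[:-len(".html")] if filename.endswith(".html") else filename
    return "{}-{}".format(published.strftime("%Y%m%d"), slug)


if __name__ == "__main__":
    print(post_folder_name(
        datetime(2014, 11, 2),
        "http://example.blogspot.com/2014/11/my-first-post.html",
    ))
```

Because the slug matches the old filename, a redirect rule only needs to strip the year/month path and the .html extension to map an old Blogger URL to the new page.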

Acknowledgements

This is my first Python script so I'm sure it's very rough in places.

I had googled and stack-overflowed my way about halfway through it when I came across this gist by Lars Kellogg-Stedman: https://gist.github.com/larsks/4022537. His is much better written, but its output is formatted for a different blogging platform, and it doesn't download images or attempt to retain comments. I adopted one of the markdownify functions from that script, and it was his code that put me on to using lxml instead of the included ElementTree library.

This post was very helpful in understanding the Unicode string processing problems I was encountering in Python 2: Solving Unicode Problems in Python 2.7

Contributors

  • otherjoel
