gautamdhameja / medium-2-md

A CLI tool that converts exported Medium posts (html) to Jekyll/Hugo compatible markdown with front matter.

Home Page: https://www.npmjs.com/package/medium-2-md

License: Apache License 2.0

JavaScript 100.00%
medium markdown html html-to-markdown jekyll hugo frontmatter

medium-2-md's Introduction

medium-2-md

A CLI tool that converts Medium posts (html) into Jekyll/Hugo compatible markdown files. It also downloads images and adds yaml front matter to the converted markdown files. It works with exported Medium posts (local html files) and converts them to markdown with a single command. It is useful when you want to migrate your blog from Medium to Jekyll or Hugo (or a similar tool that supports markdown content).

Steps to use

Convert local Medium exports

  1. Export and extract your Medium posts from your Medium account.
    1. Go to https://medium.com/me/settings/security and click on Download your information. Click the export button. This will allow you to download a medium-export.zip archive containing all your Medium content.
    2. Extract the .zip archive downloaded in the previous step. It will have a sub-directory called posts.
    3. Copy the path of this posts directory.
  2. Install node.js and medium-2-md on your system.
    1. Download and Install node.js - https://nodejs.org/en/download/.
    2. Install medium-2-md - npm i -g medium-2-md.
  3. Run the following command to convert all your Medium posts (html) to markdown files:
medium-2-md convertLocal '<path of the posts directory>' -dfi

That's it. The output markdown files will be stored in a sub-directory called md_<a big number> in the input posts directory itself. (By the way, that big number is coming from the Date.now() JavaScript function, added to differentiate multiple output folders.)
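The default output location described above can be sketched as follows. This is a minimal illustration, not the tool's actual source; the helper name `defaultOutputDir` is hypothetical.

```javascript
const path = require('path');

// Hypothetical helper: builds <posts dir>/md_<timestamp>, where the timestamp
// defaults to Date.now() (milliseconds since the Unix epoch).
function defaultOutputDir(postsDir, now = Date.now()) {
  return path.join(postsDir, `md_${now}`);
}

// Each run gets a fresh directory name because the timestamp changes.
console.log(defaultOutputDir('/home/user/Desktop/posts'));
```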

The converted markdown files include front matter containing title, description, published date and canonical URL of the original Medium post/story. The images from the Medium posts are downloaded in a sub-directory called img inside the output directory.
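As a rough illustration of the front matter described above, the sketch below assembles a YAML block by hand. The key names (`title`, `description`, `date`, `canonical_url`) and the function name are assumptions; the tool itself uses js-yaml and may choose different keys.

```javascript
// Hypothetical sketch: key names are assumed, not taken from the tool.
// JSON.stringify produces double-quoted strings, which are also valid YAML scalars.
function buildFrontMatter({ title, description, date, canonicalUrl }) {
  return [
    '---',
    `title: ${JSON.stringify(title)}`,
    `description: ${JSON.stringify(description)}`,
    `date: ${JSON.stringify(date)}`,
    `canonical_url: ${JSON.stringify(canonicalUrl)}`,
    '---',
    '',
  ].join('\n');
}

console.log(buildFrontMatter({
  title: 'Hello',
  description: 'A first post',
  date: '2019-11-09',
  canonicalUrl: 'https://medium.com/@user/hello-abc123',
}));
```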

Optional flags

The convertLocal command supports the following optional flags:

  1. -f or --frontMatter: Add the front matter on top of the markdown files.
  2. -i or --images: Download images to a local img sub-directory.
  3. -op or --path: Custom path for saving markdown files.
  4. -ip or --img-path: Custom path for downloading images.
  5. -d or --drafts: Convert the drafts too.

Example: Convert from local - front matter and images but no drafts

medium-2-md convertLocal '/home/user/Desktop/posts' -fi

Example: Convert from local - default output and images path

medium-2-md convertLocal '/home/user/Desktop/posts' -dfi

Example: Convert from local - with custom output and images path

medium-2-md convertLocal '/home/user/Desktop/posts' -dfi --path '/home/user/Desktop/md' --img-path '/home/user/Downloads/img'

Note: The flags do not support any defaults. You need to add them in order to get the respective results (drafts, images and/or front matter inclusion).

Custom Output and Image Paths

When the -op or --path flag is used, the output markdown files are written to the given path instead of the default location. If the custom path is invalid or does not exist, the output files are written to the default path.

When the -i or --images flag is used together with --img-path, the images are downloaded into the directory at the given path. If this directory does not already exist, the images are downloaded to the default path. The image elements in the converted markdown files link to their respective local paths.
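The fallback behaviour for custom paths can be sketched like this. It is an assumption about how such a check might look, not the tool's code; `resolveOutputDir` is a hypothetical name.

```javascript
const fs = require('fs');

// Hypothetical helper: use the custom path only if it is an existing
// directory; otherwise fall back to the default path.
function resolveOutputDir(customPath, defaultPath) {
  const usable =
    typeof customPath === 'string' &&
    fs.existsSync(customPath) &&
    fs.statSync(customPath).isDirectory();
  return usable ? customPath : defaultPath;
}
```

The same check would apply to both the markdown output path and the image path, each with its own default.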

Dependencies

This package uses:

  1. turndown - to convert html into markdown.
  2. cheerio - to select and extract relevant html attributes from Medium posts' html files.
  3. commander - to enable command line interface.
  4. js-yaml - to add yaml front matter to markdown files.
  5. node-fetch - to download images.

medium-2-md's People

Contributors

dependabot[bot], djensen47, gautamdhameja, jozefcipa, mabhub

medium-2-md's Issues

Get parameters are breaking file name generation

I just copied a URL from Medium that contains a (weird) GET parameter:

https://medium.com/@benlaurie_18378/how-to-ruin-a-perfectly-good-container-d33250fca595?source=---------2------------------

When trying to archive it:

$ medium-2-md convertUrl 'https://medium.com/@benlaurie_18378/how-to-ruin-a-perfectly-good-container-d33250fca595?source=---------2------------------' -o .

The file will be called .md (and therefore hidden on *NIX systems).

PS: Maybe it could be fixed together with #4
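One possible fix, sketched under the assumption that the file name is meant to be the trailing article ID: parse the URL first, so that the query string never reaches the name-generation step. The function name `fileNameFromUrl` is hypothetical.

```javascript
// Hypothetical helper: derives "<article id>.md" from a Medium post URL.
// URL#pathname excludes the "?source=..." query string and any "#hash".
function fileNameFromUrl(postUrl) {
  const { pathname } = new URL(postUrl);
  const lastSegment = pathname.split('/').filter(Boolean).pop();
  if (!lastSegment) {
    throw new Error(`No post path found in URL: ${postUrl}`);
  }
  // The article ID is the part after the last hyphen of the slug.
  const id = lastSegment.split('-').pop();
  return `${id}.md`;
}
```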

Error on converting

Hi, I am getting an error. Any idea? I tried with single quotes around the directory (and with \\ instead of \) and got the same error. My files are in the posts folder. The arrows are pointing at the async keyword; I would assume there should be a comma after that, but maybe the error output just did not print one?

C:\Users\danzen>medium-2-md convertLocal E:\Downloads\medium-export\posts -dfi
C:\Users\danzen\AppData\Roaming\npm\node_modules\medium-2-md\lib\workflow.js:26
            fs.readdirSync(inputPath).forEach(async file => {
                                              ^^^^^
SyntaxError: missing ) after argument list
    at createScript (vm.js:56:10)
    at Object.runInThisContext (vm.js:97:10)
    at Module._compile (module.js:542:28)
    at Object.Module._extensions..js (module.js:579:10)
    at Module.load (module.js:487:32)
    at tryModuleLoad (module.js:446:12)
    at Function.Module._load (module.js:438:3)
    at Module.require (module.js:497:17)
    at require (internal/module.js:20:19)
    at Object.<anonymous> (C:\Users\danzen\AppData\Roaming\npm\node_modules\medium-2-md\index.js:5:18)
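A SyntaxError at the async keyword is what a Node.js release older than 7.6 reports, because async functions are only parsed from 7.6.0 onwards, and the internal file names in the trace above are consistent with such an older release. Assuming that is the cause here, a small check like the following sketch (the function name is hypothetical) shows whether the installed runtime is new enough:

```javascript
// Reports whether a Node.js version string such as "v6.11.0" belongs to a
// release that can parse async functions (supported from 7.6.0).
function supportsAsyncFunctions(version) {
  const [major, minor] = version.replace(/^v/, '').split('.').map(Number);
  return major > 7 || (major === 7 && minor >= 6);
}

if (!supportsAsyncFunctions(process.version)) {
  console.error(`Node.js ${process.version} cannot parse async functions; upgrade to 7.6 or later.`);
}
```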

Add support for "page bundles"?

Hi,

I am interested in using this to export my Medium blog to a Hugo blog.
I really want to use Hugo's page bundle system. Would it be possible to add some support so that the exported md/images go into separate folders?
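For illustration, a Hugo leaf bundle keeps each post in its own directory with an index.md and its images alongside. The sketch below only computes such a layout from an exported file name; `pageBundlePaths` and its parameters are hypothetical and not an existing feature of the tool.

```javascript
const path = require('path');

// Hypothetical helper: maps one exported post and its image file names to a
// leaf-bundle layout: <contentDir>/<slug>/index.md plus images next to it.
function pageBundlePaths(contentDir, htmlFileName, imageNames) {
  const slug = path.basename(htmlFileName, '.html');
  const bundleDir = path.join(contentDir, slug);
  return {
    markdown: path.join(bundleDir, 'index.md'),
    images: imageNames.map((name) => path.join(bundleDir, name)),
  };
}

console.log(pageBundlePaths('content/posts', '2019-11-09_Something-6b1e95c9f09.html', ['cover.png']));
```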

Parameter order on macOS

Hello, Thanks for your work on this cli tool!

It seems that adding the flags after the directory (like in the examples) does not work on macOS.
medium-2-md covertLocal ./medium-export/posts -dfi does not work for me
while
medium-2-md convertLocal -dfi ./medium-export/posts does

Tested on macOS Catalina v10.15.2

Trying to get in touch regarding a security issue

Hey there!

I'd like to report a security issue but cannot find contact instructions on your repository.

If not a hassle, might you kindly add a SECURITY.md file with an email, or another contact method? GitHub recommends this best practice to ensure security issues are responsibly disclosed, and it would serve as a simple instruction for security researchers in the future.

Thank you for your consideration, and I look forward to hearing from you!

(cc @huntr-helper)

Convert from url is not working anymore.

It seems the HTML/CSS class and attributes names have been changed in Medium posts and the reader is not able to extract the article content from the html body.

Error converting file

Error converting file: 2016-06-03_Payette-Brewing--Idaho-Will-Be-Known-for-More-Than-Potatoes-9ac8792a37f5.html. Skipping.
internal/streams/legacy.js:59
      throw er; // Unhandled stream error in pipe.
      ^

Error: socket hang up
    at createHangUpError (_http_client.js:329:15)
    at TLSSocket.socketOnEnd (_http_client.js:421:23)
    at emitNone (events.js:110:20)
    at TLSSocket.emit (events.js:207:7)
    at endReadableNT (_stream_readable.js:1057:12)
    at _combinedTickCallback (internal/process/next_tick.js:138:11)
    at process._tickCallback (internal/process/next_tick.js:180:9)
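The "Unhandled stream error in pipe" line suggests that a failed download stream had no error handler attached. One way to contain such failures, sketched here with Node's built-in stream.pipeline (the helper name `saveStream` is hypothetical and this is not the tool's code), is to report the error as a result so that a single broken image does not abort the whole run:

```javascript
const fs = require('fs');
const { pipeline } = require('stream');

// Hypothetical helper: writes any readable stream, such as an HTTP response
// body, to destPath. A stream failure is reported in the resolved result
// instead of surfacing as an unhandled 'error' event.
function saveStream(readable, destPath) {
  return new Promise((resolve) => {
    pipeline(readable, fs.createWriteStream(destPath), (err) => {
      resolve(err ? { ok: false, error: err.message } : { ok: true });
    });
  });
}
```

A caller could then log failed images and continue with the next file.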

P.S. Thanks for making such a great tool!

including speaking url in file name

It would be nice if filenames were derived from the 'speaking' URLs instead of the article identifier.

So, for example if I want to archive an article with

$ medium-2-md convertUrl https://medium.com/@watzon/doing-crystal-2-501daaf8390d -o . -fi

the result is 501daaf8390d.md. Looking at its filename, I can no longer tell what the article was about.

Maybe something like watzon_-_doing-crystal-2-501daaf8390d.md would be more speaking and recognizable?

What do you think @gautamdhameja?
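The naming scheme proposed in this issue could be sketched as follows. The function name is hypothetical, and the `_-_` separator is simply the one suggested above.

```javascript
// Hypothetical helper: builds "<author>_-_<slug>.md" from a Medium post URL,
// falling back to "<slug>.md" when the URL has no author segment.
function speakingFileName(postUrl) {
  const segments = new URL(postUrl).pathname.split('/').filter(Boolean);
  const slug = segments.pop();
  if (!slug) {
    throw new Error(`No post path found in URL: ${postUrl}`);
  }
  const author = (segments.pop() || '').replace(/^@/, '');
  return author ? `${author}_-_${slug}.md` : `${slug}.md`;
}
```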

all images are 13kb and not valid

Using docker to run...

docker run --rm -it -v $PWD:/src node:latest /bin/bash -c 'npm install -g mediumexporter && npm i -g medium-2-md && mkdir /src/medium/output && medium-2-md convertLocal /src/medium/posts --path /src/medium/output -dfi'

output looks good:

drew@drews-MBP drewkhoury-website % docker run --rm -it -v $PWD:/src node:latest /bin/bash -c 'npm install -g mediumexporter && npm i -g medium-2-md && mkdir /src/medium/output && medium-2-md convertLocal /src/medium/posts --path /src/medium/output -dfi'
npm WARN deprecated [email protected]: Please upgrade  to version 7 or higher.  Older versions may use Math.random() in certain circumstances, which is known to be problematic.  See https://v8.dev/blog/math-random for details.
npm WARN deprecated [email protected]: this library is no longer supported
npm WARN deprecated [email protected]: request has been deprecated, see https://github.com/request/request/issues/3142

added 50 packages, and audited 51 packages in 2s

2 packages are looking for funding
  run `npm fund` for details

4 moderate severity vulnerabilities

To address all issues, run:
  npm audit fix

Run `npm audit` for details.
npm notice
npm notice New patch version of npm available! 8.1.2 -> 8.1.4
npm notice Changelog: https://github.com/npm/cli/releases/tag/v8.1.4
npm notice Run npm install -g npm@8.1.4 to update!
npm notice

added 27 packages, and audited 28 packages in 2s

11 packages are looking for funding
  run `npm fund` for details

found 0 vulnerabilities
Completed: 2019-11-09_Something-6b1e95c9f09.md
Completed: 2020-01-05_Teaching-DevOps-in-one-afternoon-e85f02ef036b.md
Completed: 2021-08-27_AWS-2021-Highlights-b16b6c59b4fe.md
...
Output path: /src/medium/output
...
Completed: 2019-11-23_One-DevOps-Please---Part-2-57aff9ad8595.md
Completed: draft_What-is-Good-Software-Delivery--48929b586c4d.md

the md files look great, but the images seem broken.

(Screenshot attached: Screen Shot 2021-11-23 at 11 35 08 AM)
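Files that all have the same size may not be images at all (for example, the same error page saved under every name), though that is only a guess about this report. A quick way to check is to compare the first bytes of a file against well-known image signatures; the sketch below covers PNG, JPEG and GIF, and the function name is hypothetical.

```javascript
// Returns 'png', 'jpeg' or 'gif' when the buffer starts with that format's
// signature bytes, or null when it matches none of them.
function sniffImageType(buf) {
  const pngSignature = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
  if (buf.length >= 8 && buf.subarray(0, 8).equals(pngSignature)) return 'png';
  if (buf.length >= 3 && buf[0] === 0xff && buf[1] === 0xd8 && buf[2] === 0xff) return 'jpeg';
  if (buf.length >= 6 && ['GIF87a', 'GIF89a'].includes(buf.toString('latin1', 0, 6))) return 'gif';
  return null;
}

// Usage: sniffImageType(require('fs').readFileSync('img/some-file.png'))
```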

Image Captions

Exported Medium files have Image captions under a figcaption tag.

<figcaption class="imageCaption">Some Caption Goes Here </figcaption>

At present, the conversion renders this caption as part of all the other text in the post. I'm aware of the markdown limitation for ![](http://image.url). But it would be nice to have ![Some Caption Goes Here](http://image.url)
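The requested output could be produced along these lines. This is a dependency-free, regex-based sketch that assumes a simple `<figure>` snippet with one `<img>` and one `<figcaption>`; it is not how the tool currently works, and a real implementation would more likely hook into its HTML-to-markdown converter.

```javascript
// Hypothetical helper: turns a <figure> snippet into ![caption](src),
// using the figcaption text as the image alt text.
function figureToMarkdown(figureHtml) {
  const src = (figureHtml.match(/<img[^>]*\ssrc="([^"]+)"/i) || [])[1];
  if (!src) return null;
  const rawCaption =
    (figureHtml.match(/<figcaption[^>]*>([\s\S]*?)<\/figcaption>/i) || [])[1] || '';
  const alt = rawCaption
    .replace(/<[^>]+>/g, '')           // drop nested tags such as <a> or <em>
    .replace(/\s+/g, ' ')              // collapse whitespace
    .trim()
    .replace(/([\[\]])/g, '\\$1');     // escape brackets inside the alt text
  return `![${alt}](${src})`;
}
```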

Error converting file when using --images parameter (for stories under publications)

Hi, first of all, thanks for this cool tool 👍

When I try to convert with the command:

medium-2-md convertLocal posts/ -i

this story: Lucky 7 new tools and plugins for Android developers & designers

I got only

Error converting file: 2018-09-25_Lucky-7-new-tools-and-plugins-for-Android-developers---designers-1545e5c59f27.html. Skipping.

Interestingly, my story 50 Android Studio Tips, Tricks & Resources you should be familiar with, as an Android Developer is converted successfully with all images and GIFs.

Completed: 2016-11-07_50-Android-Studio-Tips--Tricks---Resources-you-should-be-familiar-with--as-an-Android-Developer-af86e7cf56d2.md

The only difference I noticed is that the tips and tricks story is not under any publication, whereas the other stories are.

Please let me know where the problem could be.

Cheers 🍻
