
rlewkowicz / article-parser


This project is forked from extractus/article-extractor.


A poorly made and poorly maintained fork of article-parser that uses puppeteer

Home Page: http://secondpage.io

License: MIT License



article-parser

Extract the main article, main image, and metadata from a URL.


Usage

npm install article-parser

Then:

const { extract } = require('article-parser');

const url = 'https://goo.gl/MV8Tkh';

extract(url).then((article) => {
  console.log(article);
}).catch((err) => {
  console.error(err);
});

APIs

configure(Object conf)

{
  fetchOptions: Object,
  wordsPerMinute: Number,
  htmlRules: Object,
  SoundCloudKey: String,
  YouTubeKey: String,
  EmbedlyKey: String
}
  • fetchOptions: Object, a simplified subset of node-fetch options. Only headers, timeout and agent are supported.
  • wordsPerMinute: Number, default 300, used to estimate reading time.
  • htmlRules: Object, options used to clean HTML with sanitize-html.
  • SoundCloudKey: String, a SoundCloud API key, used to get audio duration.
  • YouTubeKey: String, a YouTube API key, used to get video duration.
  • EmbedlyKey: String, an Embedly API key, used to extract with the Embedly API.

The default configuration should work for most cases.
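A minimal sketch of how configure() and getConfig() might behave, assuming a shallow merge of a partial config over defaults. Only `wordsPerMinute: 300` and the headers/timeout/agent restriction come from the documentation above; the other defaults, the rule of ignoring unknown keys, and the hypothetical `estimateReadingTime` formula are illustrative assumptions, not the library's actual code.

```javascript
// Hypothetical defaults; only wordsPerMinute = 300 is documented.
const DEFAULTS = {
  fetchOptions: {},
  wordsPerMinute: 300,
  htmlRules: {},
  SoundCloudKey: '',
  YouTubeKey: '',
  EmbedlyKey: '',
};

// Per the docs, only these node-fetch options are honored.
const ALLOWED_FETCH_OPTIONS = ['headers', 'timeout', 'agent'];

let current = { ...DEFAULTS };

function configure(conf = {}) {
  const next = { ...current };
  for (const key of Object.keys(conf)) {
    if (!(key in DEFAULTS)) continue; // ignore unknown keys (assumption)
    next[key] = conf[key];
  }
  // Keep only the supported fetch options.
  const fetchOptions = {};
  for (const name of ALLOWED_FETCH_OPTIONS) {
    if (next.fetchOptions && name in next.fetchOptions) {
      fetchOptions[name] = next.fetchOptions[name];
    }
  }
  next.fetchOptions = fetchOptions;
  current = next;
  return getConfig();
}

// Return a copy so callers cannot mutate the stored configuration.
function getConfig() {
  return { ...current, fetchOptions: { ...current.fetchOptions } };
}

// Hypothetical formula: reading time in seconds, rounded up.
function estimateReadingTime(text, wordsPerMinute = getConfig().wordsPerMinute) {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return Math.ceil((words / wordsPerMinute) * 60);
}

const conf = configure({ wordsPerMinute: 200, fetchOptions: { timeout: 5000, redirect: 'manual' } });
console.log(conf.fetchOptions); // { timeout: 5000 }
console.log(estimateReadingTime('word '.repeat(400))); // 120
```

Returning a copy from getConfig() keeps the stored configuration changeable only through configure().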

extract(String url)

Extract article data from the specified URL.

const { extract } = require('article-parser');

const url = 'https://www.youtube.com/watch?v=tRGJj59G1x4';

extract(url).then((article) => {
  console.log(article);
}).catch((err) => {
  console.error(err);
});
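For a YouTube URL like the one above, the sample output lists several equivalent URL forms in its `canonicals` field. A minimal sketch of deriving those four forms from a video URL, assuming the video ID is the `v` query parameter of a `/watch` URL, the path of a `youtu.be` link, or the segment after `/v/` or `/embed/`. `youtubeVideoId` and `youtubeCanonicals` are hypothetical helpers, not part of article-parser.

```javascript
// Hypothetical: extract the video ID from common YouTube URL shapes.
function youtubeVideoId(input) {
  const url = new URL(input);
  const host = url.hostname.replace(/^www\./, '');
  if (host === 'youtu.be') {
    return url.pathname.slice(1) || null;
  }
  if (host === 'youtube.com' || host === 'm.youtube.com') {
    if (url.pathname === '/watch') return url.searchParams.get('v');
    const match = url.pathname.match(/^\/(?:v|embed)\/([^/?#]+)/);
    return match ? match[1] : null;
  }
  return null; // not a recognized YouTube URL
}

// Mirror the four forms shown in the sample `canonicals` array.
function youtubeCanonicals(input) {
  const id = youtubeVideoId(input);
  if (!id) return [];
  return [
    `https://www.youtube.com/watch?v=${id}`,
    `https://youtu.be/${id}`,
    `https://www.youtube.com/v/${id}`,
    `https://www.youtube.com/embed/${id}`,
  ];
}

console.log(youtubeCanonicals('https://youtu.be/tRGJj59G1x4'));
```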

The resolved article object looks something like this:

{
  title: 'Zato ESB - Test demo hosted on company server',
  alias: 'zato-esb-test-demo-hosted-on-company-server-1500021746537-PAQXw8IYcU',
  url: 'https://www.youtube.com/watch?v=tRGJj59G1x4',
  canonicals:
   [ 'https://www.youtube.com/watch?v=tRGJj59G1x4',
     'https://youtu.be/tRGJj59G1x4',
     'https://www.youtube.com/v/tRGJj59G1x4',
     'https://www.youtube.com/embed/tRGJj59G1x4' ],
  description: 'Our sample: https://github.com/greenglobal/zato-demo Zato homepage: https://zato.io Tutorial: "Zato - a powerful Python-based ESB solution for your SOA" http...',
  content: '<iframe src="https://www.youtube.com/embed/tRGJj59G1x4?feature=oembed" frameborder="0" allowfullscreen></iframe>',
  image: 'https://i.ytimg.com/vi/tRGJj59G1x4/hqdefault.jpg',
  author: 'Dong Nguyen',
  source: 'YouTube',
  domain: 'youtube.com',
  publishedTime: '',
  duration: 292
}
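The `alias` in the sample appears to combine a slug of the title, a millisecond timestamp, and a random suffix. A minimal sketch of building an alias in that shape; the exact scheme is an assumption inferred from the sample value, and `slugify`, `randomSuffix`, and `makeAlias` are hypothetical names.

```javascript
const ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';

// Lowercase, strip accents, and collapse non-alphanumeric runs into '-'.
function slugify(title) {
  return title
    .normalize('NFKD')
    .replace(/[\u0300-\u036f]/g, '') // remove combining accent marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
}

// Random alphanumeric suffix (Math.random is fine for IDs, not for secrets).
function randomSuffix(length = 10) {
  let out = '';
  for (let i = 0; i < length; i++) {
    out += ALPHABET[Math.floor(Math.random() * ALPHABET.length)];
  }
  return out;
}

// Hypothetical alias format: slug-timestamp-suffix.
function makeAlias(title, timestamp = Date.now(), suffix = randomSuffix()) {
  return [slugify(title), timestamp, suffix].filter((part) => part !== '').join('-');
}

console.log(makeAlias('Zato ESB - Test demo hosted on company server', 1500021746537, 'PAQXw8IYcU'));
// zato-esb-test-demo-hosted-on-company-server-1500021746537-PAQXw8IYcU
```

Passing the timestamp and suffix explicitly makes the output reproducible; in normal use both defaults vary per call, so two articles with the same title still get distinct aliases.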

extractWithEmbedly(String url [, String EmbedlyKey])

Extract article data from the specified URL using the Embedly Extract API.

The second parameter is optional: if you have already set your Embedly key with the configure() method, you can omit it here.

const { extractWithEmbedly } = require('article-parser');

const url = 'https://goo.gl/MV8Tkh';

extractWithEmbedly(url).then((article) => {
  console.log(article);
}).catch((err) => {
  console.error(err);
});
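The optional second argument implies a simple precedence rule: an explicit key wins, otherwise the key set via configure() is used. A minimal sketch of that rule plus building the request URL, assuming Embedly's Extract endpoint is `https://api.embed.ly/1/extract` with `key` and `url` query parameters. `resolveEmbedlyKey` and `buildEmbedlyExtractUrl` are hypothetical names, and the network request itself is omitted.

```javascript
// Assumed Embedly Extract endpoint; check Embedly's documentation.
const EMBEDLY_EXTRACT_ENDPOINT = 'https://api.embed.ly/1/extract';

// An explicit key overrides the configured one (assumed precedence).
function resolveEmbedlyKey(explicitKey, config = {}) {
  const key = explicitKey || config.EmbedlyKey;
  if (!key) {
    throw new Error('No Embedly key: pass one or set EmbedlyKey via configure()');
  }
  return key;
}

// URLSearchParams handles percent-encoding of the article URL.
function buildEmbedlyExtractUrl(articleUrl, explicitKey, config) {
  const requestUrl = new URL(EMBEDLY_EXTRACT_ENDPOINT);
  requestUrl.searchParams.set('key', resolveEmbedlyKey(explicitKey, config));
  requestUrl.searchParams.set('url', articleUrl);
  return requestUrl.toString();
}

console.log(buildEmbedlyExtractUrl('https://goo.gl/MV8Tkh', undefined, { EmbedlyKey: 'demo-key' }));
// https://api.embed.ly/1/extract?key=demo-key&url=https%3A%2F%2Fgoo.gl%2FMV8Tkh
```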

getConfig()

Return the current configuration.

Test

git clone https://github.com/ndaidong/article-parser.git
cd article-parser
npm install
npm test

License

The MIT License (MIT)

Contributors: ndaidong, alanhoff, simsim0709
