transitive-bullshit / text-summarization

Automagically generates summaries from html or text.

text-summarization's Introduction

text-summarization

Intro

This module powers the text summarization in Automagical, which was acquired by Verblio in 2018.

It aims to provide the most powerful and comprehensive text summarization available on NPM.

Features

  • Uses a variety of metrics to generate quality extractive text summaries
  • Handles html or text-based content
  • Utilizes html structure as a signal of text importance
  • Includes basic abstractive shortening of extracted sentences
  • Usable as a node module or CLI
  • Thoroughly tested and used in production
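To illustrate how html structure can act as a signal of importance, here is a minimal sketch that boosts a sentence according to the tags enclosing it. The tag weights and the `nodeScore` helper are hypothetical and are not taken from this module; they only show the idea.

```javascript
// Hypothetical tag weights, chosen only for illustration.
const TAG_BOOSTS = new Map([
  ['h1', 1.0],
  ['h2', 0.8],
  ['h3', 0.6],
  ['strong', 0.5],
  ['em', 0.3]
])

// Return the largest boost among the tags enclosing a sentence,
// or 0 when none of them is a boosted tag.
function nodeScore (enclosingTags) {
  let best = 0
  for (const tag of enclosingTags) {
    const boost = TAG_BOOSTS.get(tag.toLowerCase()) ?? 0
    if (boost > best) best = boost
  }
  return best
}

console.log(nodeScore(['body', 'p', 'strong'])) // 0.5
console.log(nodeScore(['body', 'div', 'p'])) // 0
```

Taking the maximum means a sentence nested inside several emphasized tags is boosted once rather than repeatedly; summing the weights would be an equally plausible choice.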

Install

This package can be used either as a CLI or as a Node.js module.

npm install --save text-summarization

Usage

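const fs = require('fs')
const summarize = require('text-summarization')

// CommonJS modules don't support top-level await, so wrap the call in an async function
async function main () {
  // pass 'utf8' so the file is read as a string rather than a Buffer
  const html = fs.readFileSync('fixtures/automagical-1.html', 'utf8')

  const summary = await summarize({ html })
  console.log(JSON.stringify(summary, null, 2))
}

main().catch(console.error)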

which outputs:

{
  "extractive": [
    "Why you should drop everything and try Automagical",
    "Video content is significantly more engaging than text content",
    "Go from blog post → video in 5 minutes.",
    "Our builder is exceptionally easy to use.",
    "For the cost of 1 highly produced video, you can get a year's worth of videos from Automagical."
  ]
}

CLI

npm install -g text-summarization

This installs a summarize binary globally.

  Usage: summarize [options] <file>

  Options:
    -V, --version              output the version number
    -n, --num-sentences <n>    number of sentences (defaults to variable length)
    -t, --title <title>        title
    -c, --content-type <type>  sets content type to html or text
    -d, --detailed             print detailed info for top sentences
    -D, --detailedAll          print detailed info for all sentences
    -m, --media                resolve <a> links using iframely and return best matching media
    -P, --no-pretty-print      disable pretty-printing output
    -h, --help                 output usage information

Metrics

  • TF-IDF overlap as the base measure of relative sentence importance
  • html node boosts for tags like <h1> and <strong>
  • listicle boosts for numbered list items like "2) second item"
  • penalties for poor readability or very long sentences
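One common way to turn TF-IDF into a per-sentence importance score is to compare each sentence's TF-IDF vector with the TF-IDF vector of the whole text using cosine similarity. The sketch below does that, treating each sentence as a "document" when computing IDF. This is an assumption about what "tfidf overlap" means here, not this module's implementation, and `tfidfScores` is a hypothetical name.

```javascript
// Count term occurrences and weight each count by its IDF.
function tfidfVector (tokens, idf) {
  const counts = new Map()
  for (const t of tokens) counts.set(t, (counts.get(t) || 0) + 1)
  return new Map([...counts].map(([t, c]) => [t, c * idf.get(t)]))
}

// Cosine similarity between two sparse vectors stored as Maps.
function cosine (a, b) {
  let dot = 0
  for (const [t, w] of a) dot += w * (b.get(t) || 0)
  const norm = (v) => Math.sqrt([...v.values()].reduce((sum, w) => sum + w * w, 0))
  const denom = norm(a) * norm(b)
  return denom === 0 ? 0 : dot / denom
}

// Score tokenized sentences by TF-IDF cosine similarity to the whole text.
// IDF treats each sentence as a document: idf(t) = ln(1 + n / df(t)).
function tfidfScores (sentences) {
  const n = sentences.length
  const df = new Map()
  for (const tokens of sentences) {
    for (const t of new Set(tokens)) df.set(t, (df.get(t) || 0) + 1)
  }
  const idf = new Map([...df].map(([t, d]) => [t, Math.log(1 + n / d)]))
  const docVec = tfidfVector(sentences.flat(), idf)
  return sentences.map((tokens) => cosine(tfidfVector(tokens, idf), docVec))
}

const scores = tfidfScores([
  ['video', 'content', 'engaging', 'text', 'content'],
  ['builder', 'easy'],
  ['cost', 'video', 'year', 'videos', 'automagical']
])
console.log(scores)
```

Scores fall between 0 and 1; sentences whose heavily weighted terms dominate the overall text score closer to 1, and an empty sentence scores 0.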

Here's an example of a sentence's internal structure after normalization, processing, and scoring:

{
  "index": 8,
  "sentence": {
    "original": "4. For the cost of 1 highly produced video, you can get a year's worth of videos from Automagical.",
    "listItem": 4,
    "actual": "For the cost of 1 highly produced video, you can get a year's worth of videos from Automagical.",
    "normalized": "for the cost of 1 highly produced video you can get a years worth of videos from automagical",
    "tokenized": [
      "cost",
      "highly",
      "produced",
      "video",
      "years",
      "worth",
      "videos",
      "automagical"
    ]
  },
  "liScore": 1,
  "nodeScore": 0.7,
  "readabilityPenalty": 0,
  "tfidfScore": 0.8019447657605553,
  "score": 5.601944765760555
}
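The `sentence` fields above can be approximated with a short normalization pass. This sketch assumes that a leading `N.` or `N)` marker denotes a list item, strips punctuation, and drops purely numeric tokens plus a tiny hypothetical stopword list chosen so that it reproduces the example; the module's real stopword list and rules may differ. It handles ASCII text only.

```javascript
// A deliberately small, hypothetical stopword list for illustration only.
const STOPWORDS = new Set(['a', 'an', 'and', 'can', 'for', 'from', 'get', 'of', 'the', 'to', 'you'])

// Matches a leading list marker such as "4. " or "2) " and captures the number.
const LIST_MARKER = /^\s*(\d+)[.)]\s+/

function processSentence (original) {
  const match = original.match(LIST_MARKER)
  const listItem = match ? Number(match[1]) : null
  const actual = match ? original.slice(match[0].length) : original.trim()

  // Lowercase, drop apostrophes so "year's" becomes "years",
  // replace any other non-alphanumeric characters with spaces, then collapse spaces.
  const normalized = actual
    .toLowerCase()
    .replace(/['’]/g, '')
    .replace(/[^a-z0-9\s]/g, ' ')
    .replace(/\s+/g, ' ')
    .trim()

  // Keep content words: skip empty strings, stopwords, and purely numeric tokens.
  const tokenized = normalized
    .split(' ')
    .filter((w) => w && !STOPWORDS.has(w) && !/^\d+$/.test(w))

  return { original, listItem, actual, normalized, tokenized }
}

const result = processSentence("4. For the cost of 1 highly produced video, you can get a year's worth of videos from Automagical.")
console.log(JSON.stringify(result, null, 2))
```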

Iframely

This module can optionally use Iframely to fetch social previews for any external links in the source html, adding the resulting images and summary text to the pool of candidate sentences.

To enable this, set the IFRAMELY_BASE_URL and IFRAMELY_API_KEY environment variables.
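A minimal sketch of that configuration check, assuming the feature is treated as enabled only when both variables are set. `getIframelyConfig` is a hypothetical helper for illustration, not part of this module's API.

```javascript
// Hypothetical helper: returns the Iframely settings, or null when either variable is missing.
function getIframelyConfig (env = process.env) {
  const baseUrl = env.IFRAMELY_BASE_URL
  const apiKey = env.IFRAMELY_API_KEY
  if (!baseUrl || !apiKey) return null // feature disabled
  return { baseUrl, apiKey }
}

const config = getIframelyConfig()
console.log(config ? 'Iframely enabled' : 'Iframely disabled: set IFRAMELY_BASE_URL and IFRAMELY_API_KEY')
```

Accepting `env` as a parameter keeps the check easy to test without mutating `process.env`.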

License

MIT © Travis Fischer

Support my OSS work by following me on Twitter.

text-summarization's People

Contributors

mayrmartin, transitive-bullshit

text-summarization's Issues

Text Summarization Example

Hi @transitive-bullshit , I was having trouble using the text-summarization package for summarizing text files instead of HTML. So, I thought I would reach out to you. Could you please give an example of how exactly to use this package on a text document? I would really appreciate the help! 😊
