Summarizes text and websites and optionally saves the data to a local file

License: MIT License

Go 100.00%
summarizer readability html-parsing parser

go-summarizer

This is a Go library for summarizing text and websites and optionally saving the result to a local file.

Installing

go get github.com/ktodorov/go-summarizer

Creating a Summarizer instance

From text

var unsummarizedText = "unsummarized text"
var s = CreateFromText(unsummarizedText)

From website URL

var urlToSummarize = "http://testurl.test/"
var s = CreateFromURL(urlToSummarize)

Supported methods

Summarize

var customNewsStoryURL = `https://techcrunch.com/2017/01/14/spacex-successfully-returns-to-launch-with-iridium-1-next-falcon-9-mission/`

var s = CreateFromURL(customNewsStoryURL)
summary, err := s.Summarize()
if err != nil {
	fmt.Println("Error occurred: ", err.Error())
	return
}

fmt.Println(summary)

Output*:

SpaceX successfully returns to launch with Iridium-1 NEXT Falcon 9 mission

It’s a huge victory for SpaceX, which has had to delay its launch schedule since the explosion. The launch also resulted in a successful recovery of the Falcon 9 rocket’s first stage, which marks the seventh time SpaceX has succeed in landing this stage back for potential later re-use It’s also a green light for SpaceX in terms of the company pursuing its aggressive launch schedule, which is something the private launch provider needs to do in order to continue locking in new contracts and working towards its goal of decreasing the cost of launches even further still. In 2016, SpaceX completed only 8 of a planned 20 launches, due to the September 1 explosion that halted all new launches for four months SpaceX also had to push back its timelines for test launches of its Dragon crew capsule as a result of the September incident It also sets the stage for SpaceX’s future goals of providing missions to Mars, with a target initial date for those aspirations still set for 2024. All satellites were successfully deployed as of 11:13 AM PT / 2:12 PM PT, signalling a successful mission for the space company’s first flight back.

*Note that the title of the web page is printed first, if there is one.

GetSummaryInfo

var s = CreateFromText("first sentence. second sentence")
s.Summarize()
summaryInfo, err := s.GetSummaryInfo()
if err != nil {
	fmt.Println("Error occurred: ", err.Error())
}

fmt.Println(summaryInfo)

Output:

Summary info:
- Original length: 31 symbols
- Summary length: 14 symbols
- Summary ratio: 54.84%

IsSummarized

var s = CreateFromText("first sentence. second sentence")
// fmt.Println already inserts a space between its arguments
fmt.Println("Before summarizing:", s.IsSummarized())
s.Summarize()
fmt.Println("After summarizing:", s.IsSummarized())

Output:

Before summarizing: false
After summarizing: true

StoreToFile

var s = CreateFromText("first sentence. second sentence")
s.Summarize()
stored, err := s.StoreToFile("some/path/to/file.txt")
if err != nil {
	fmt.Println("Error occurred: ", err.Error())
}

fmt.Println(stored)

Output:

true

*Currently supported file types: txt and pdf
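
Since only txt and pdf are supported, it may be useful to check a path before calling StoreToFile. The following is a minimal sketch, assuming the file type is determined by the path's extension, which this README does not state. `hasSupportedExtension` is a hypothetical helper and not part of the library.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// hasSupportedExtension is a hypothetical helper (not part of go-summarizer).
// It reports whether path ends in one of the file types listed above.
// Assumption: the library picks the output format from the extension.
func hasSupportedExtension(path string) bool {
	switch strings.ToLower(filepath.Ext(path)) {
	case ".txt", ".pdf":
		return true
	}
	return false
}

func main() {
	paths := []string{"some/path/to/file.txt", "report.PDF", "notes.docx"}
	for _, p := range paths {
		fmt.Println(p, hasSupportedExtension(p))
	}
}
```

The check is case-insensitive by choice; whether the library itself treats `.PDF` and `.pdf` the same is not known.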

go-summarizer's Issues

Segmentation fault with particular URL

I tried using your library with the following URL:

http://www.capital.bg/politika_i_ikonomika/bulgaria/2017/02/17/2919580_zashto_izstinaha_trubite_na_toplofikaciia_sofiia/

And it crashed with a segfault panic:

$ ./go-summarizer "http://www.capital.bg/politika_i_ikonomika/bulgaria/2017/02/17/2919580_zashto_izstinaha_trubite_na_toplofikaciia_sofiia/"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0xa5377]

goroutine 1 [running]:
panic(0x286e00, 0xc4200100c0)
	/usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/ktodorov/go-summarizer/helpers.replaceBrs(0xc420243420, 0x2d2fec, 0x4, 0xc420243420)
	/Users/iron4o/gopath/src/github.com/ktodorov/go-summarizer/helpers/htmlHelper.go:275 +0x77
github.com/ktodorov/go-summarizer/helpers.getMainInfoFromHTML(0xc42021e000, 0x1ac00, 0xc42021e000, 0x1ac00, 0x0, 0x0, 0x0, 0x28, 0x2a3380, 0xc420132380, ...)
	/Users/iron4o/gopath/src/github.com/ktodorov/go-summarizer/helpers/htmlHelper.go:369 +0x134
github.com/ktodorov/go-summarizer/helpers.ExtractMainInfoFromURL(0x7fff5fbffa18, 0x78, 0x12, 0x0, 0x0, 0x0, 0x0, 0xa0, 0x2fa1ed, 0x2fa1ec, ...)
	/Users/iron4o/gopath/src/github.com/ktodorov/go-summarizer/helpers/urlHelper.go:30 +0xf7
github.com/ktodorov/go-summarizer.(*Summarizer).GetMainTextFromURL(0xc42004bed8, 0x90, 0xa0, 0xc4200ab0e0, 0x0)
	/Users/iron4o/gopath/src/github.com/ktodorov/go-summarizer/summarizer.go:72 +0x66
github.com/ktodorov/go-summarizer.(*Summarizer).Summarize(0xc42004bed8, 0x10e38, 0xa0, 0xc42009b658, 0x1)
	/Users/iron4o/gopath/src/github.com/ktodorov/go-summarizer/summarizer.go:44 +0x1f3
main.main()
	/Users/iron4o/playfield/go-summarizer/main.go:15 +0xb6
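
Until the nil pointer dereference in replaceBrs is fixed, a caller can keep the process alive by recovering from the panic and treating it as an ordinary error. This is a generic Go sketch, not a fix for the underlying bug. `summarizeSafely` is a hypothetical wrapper, and the `crashing` function below merely stands in for a Summarize call that panics.

```go
package main

import "fmt"

// summarizeSafely is a hypothetical wrapper (not part of go-summarizer).
// It runs fn and converts any panic into a returned error.
func summarizeSafely(fn func() (string, error)) (summary string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("summarizer panicked: %v", r)
		}
	}()
	return fn()
}

func main() {
	// Stand-in for a Summarize call that hits a nil pointer dereference.
	crashing := func() (string, error) {
		var p *struct{ text string }
		return p.text, nil
	}

	_, err := summarizeSafely(crashing)
	fmt.Println("Error occurred:", err)
}
```

Assuming Summarize returns a string and an error, as the README examples suggest, the method value could be passed directly: `summary, err := summarizeSafely(s.Summarize)`.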
