
Add More Stopword Lists (python-rake issue, CLOSED)

fabianvf commented on September 1, 2024
Add More Stopword Lists

from python-rake.

Comments (8)

fabianvf commented on September 1, 2024

Hmm, I wonder if there's a good way to pull those down and cache them when they're requested, rather than adding them all to the repository. Or, more generally, to add the ability to pull a stopword list from a URL...
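One way the download-and-cache idea could look is sketched below. This is only an illustration, not anything python-rake provides: `fetch_text`, `cached_stopword_text`, the cache directory layout, and the UTF-8 encoding are all assumptions made for the example.

```python
import hashlib
import urllib.request
from pathlib import Path


def fetch_text(url):
    """Download a URL and decode the body as UTF-8 (an assumed encoding)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def cached_stopword_text(url, cache_dir, fetch=fetch_text):
    """Return the raw text at `url`, downloading it only on the first request.

    Hypothetical helper: the name and signature are not part of python-rake.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so that any URL maps to a safe, fixed-length file name.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    cache_file = cache_dir / (digest + ".txt")
    if cache_file.exists():
        return cache_file.read_text(encoding="utf-8")
    text = fetch(url)
    cache_file.write_text(text, encoding="utf-8")
    return text
```

Passing the downloader in as `fetch` keeps the caching logic independent of the network, so it can be tried out with a stand-in function before pointing it at a real list.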


jkterry1 commented on September 1, 2024


fabianvf commented on September 1, 2024

Well, if you went the URL route, I'd thought you'd provide a URL and a separator regex, like so:

RAKE.load_stopwords('http://example.com/beststopwords', re.compile('super-cool-regex'))

That way it wouldn't matter how the list was formatted, so long as it was a list of some kind. I just feel like it would be convenient, especially if you were hacking or prototyping and wanted to experiment with different stoplists without having to download and format them manually.
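A rough sketch of the URL-plus-regex idea, for illustration only. `RAKE.load_stopwords` above is a proposal rather than an existing function, and the helper names `split_stopwords` and `load_stopwords_from_url` below are likewise hypothetical; the sketch also assumes the page is plain UTF-8 text rather than HTML.

```python
import re
import urllib.request


def split_stopwords(text, separator):
    """Split raw text into lowercase stopwords using a compiled separator regex."""
    words = (piece.strip().lower() for piece in separator.split(text))
    # Drop empty pieces left by leading or trailing separators,
    # and de-duplicate while keeping first-seen order.
    return list(dict.fromkeys(word for word in words if word))


def load_stopwords_from_url(url, separator):
    """Hypothetical helper: download `url` and split its body with `separator`."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")  # assumed encoding
    return split_stopwords(text, separator)


# The splitting step works on any string, so it can be tried without a network:
print(split_stopwords("a, The;\nof a ", re.compile(r"[,;\s]+")))
# → ['a', 'the', 'of']
```

With a separator such as `re.compile(r"[,;\s]+")`, comma-separated, semicolon-separated, and one-word-per-line lists all reduce to the same result.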


jkterry1 commented on September 1, 2024

Interesting. You may be right that it's a useful feature and I'm just not seeing it, but I've never seen a data scientist who wanted to do that. Also, for the vast majority of sites it would take more than a regex; you'd need to play around in Beautiful Soup or something too. The way I've seen everyone do it, because it's always been the fastest, is to copy and paste the list into IPython and run a quick for loop.
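For comparison, the copy-and-paste approach might look something like this in an IPython session. The pasted text here is made-up sample data, and skipping `#` comment lines is an assumption about how such lists are sometimes annotated.

```python
# Text pasted straight from a web page: one word per line,
# with stray blank lines, comments, and inconsistent casing.
pasted = """
a
about
# comment lines and blanks are skipped
above

After
"""

stopwords = []
for line in pasted.splitlines():
    word = line.strip().lower()
    if word and not word.startswith("#"):
        stopwords.append(word)

print(stopwords)
# → ['a', 'about', 'above', 'after']
```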


fabianvf commented on September 1, 2024

It looks like this project has amassed a large collection of stopword lists from a variety of sources. Do you think we could leverage this work?
https://github.com/igorbrigadir/stopwords


jkterry1 commented on September 1, 2024

For posterity's sake:

Hi Justin,

Thanks for asking.
Yes you can use our stopword lists if you credit 'ranks.nl'

Does your script work with HTML documents, or only with text without markup?

If HTML, I'm curious whether you've had a chance to test the results from the Page Analyzer tool on ranks.nl?
It is basically a tool for Automatic Keyword Extraction from Individual HTML Documents.

Kind regards,
Damian Doyle
Ranks NL

On Tue, Aug 1, 2017 at 10:02 PM, Justin Terry wrote:
Hello, I'm working on an MIT licensed open source natural language processing tool in python: https://github.com/fabianvf/python-rake

Can I include your stop word lists into the package by default if I credit you?

--
Thank you for your time,
Justin Terry


jkterry1 commented on September 1, 2024

@fabianvf please close this. I fixed it in my last PR, which you merged, and forgot to mention it.


jkterry1 commented on September 1, 2024

Never mind, apparently I can now.
