Comments (8)
hmm, wonder if there's a good way we can pull those down and cache them if they're requested, rather than adding them all to the repository. Or just generally adding the ability to pull a stopwords list from a url...
from python-rake.
Well, if you went the URL route, I'd think you'd provide a URL and a separator regex, something like
RAKE.load_stopwords('http://example.com/beststopwords', re.compile('super-cool-regex'))
so it wouldn't matter how they formatted it so long as it was a list of some kind. Just feel like it would be convenient, especially if you were just hacking/prototyping and wanted to experiment with different stoplists, without requiring you to download/format them manually.
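To make the idea concrete, here is a rough sketch of what such a loader could look like, including the on-disk caching suggested earlier in the thread. It is not part of python-rake: `parse_stopwords`, `load_stopwords_from_url`, and the `.stopwords_cache` directory are hypothetical names made up for illustration, and it assumes the URL serves plain UTF-8 text rather than an HTML page.

```python
import hashlib
import re
import urllib.request
from pathlib import Path


def parse_stopwords(text, separator=re.compile(r"\s+")):
    """Split raw text on the separator regex; drop blanks and duplicates."""
    seen = set()
    words = []
    for token in separator.split(text):
        word = token.strip()
        if word and word not in seen:
            seen.add(word)
            words.append(word)
    return words


def load_stopwords_from_url(url, separator=re.compile(r"\s+"),
                            cache_dir=".stopwords_cache"):
    """Hypothetical helper: fetch a stoplist once, cache the raw text, parse it."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    # Hash the URL so that any URL maps to a filesystem-safe cache file name.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    cached_file = cache / (digest + ".txt")
    if cached_file.exists():
        text = cached_file.read_text(encoding="utf-8")
    else:
        # Assumption: the response body is plain UTF-8 text, not HTML.
        with urllib.request.urlopen(url) as response:
            text = response.read().decode("utf-8")
        cached_file.write_text(text, encoding="utf-8")
    return parse_stopwords(text, separator)
```

Because the raw text is cached rather than the parsed list, you could retry the same URL with a different separator regex without downloading it again.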
Interesting. You may be right that it's a useful feature and I'm just not seeing it, but I've never seen a data scientist who wanted to do that. It would also take more than just a regex for the vast majority of sites; you'd have to play around in Beautiful Soup or something too. The way I've seen everyone do it, because it's always been the fastest, is to copy and paste the list into IPython and run a quick for loop.
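For anyone reading later, the copy-and-paste workflow looks roughly like this. The pasted text and the `#` comment convention are made up for illustration; real stoplists vary in format.

```python
# Text pasted straight from a web page into an IPython session.
pasted = """
a
about
# comment line copied along with the list
above
"""

stopwords = []
for line in pasted.splitlines():
    line = line.strip()
    # Skip blank lines and (assumed) comment lines.
    if line and not line.startswith("#"):
        stopwords.append(line)
```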
It looks like this project has amassed a large collection of stopword lists from a variety of sources. Do you think we could leverage this work?
https://github.com/igorbrigadir/stopwords
For posterity's sake:
Hi Justin,
Thanks for asking.
Yes you can use our stopword lists if you credit 'ranks.nl'
Does your script work with HTML documents or only with text without markup?
If HTML, I'm curious whether you've had a chance to test the results from the Page Analyzer tool on ranks.nl?
It is basically a tool for Automatic Keyword Extraction from Individual HTML Documents.
Kind regards,
Damian Doyle
Ranks NL
On Tue, Aug 1, 2017 at 10:02 PM, Justin Terry [email protected] wrote:
Hello, I'm working on an MIT licensed open source natural language processing tool in python: https://github.com/fabianvf/python-rake
Can I include your stop word lists into the package by default if I credit you?
--
Thank you for your time,
Justin Terry
@fabianvf please close this, I fixed this in my last PR that you merged and forgot to mention it.
Never mind, apparently I can now.