Comments (7)
I'm not sure. Where did you see that "it cannot be used in any commercial application as the license of the data only allows it to be used for educational/research purposes"?
from python-wordsegment.
I've wanted to use the trillion corpus before... so I remember it from then.
You link to https://catalog.ldc.upenn.edu/LDC2006T13, which under "License" contains a link to https://catalog.ldc.upenn.edu/license/web-1t-5-gram-version-1.pdf
Then look at sections 1.1 and 1.2 (I can't copy and paste the text).
Hmm, that looks conclusive. I don't think I got it from LDC though. I thought I got it from the book publisher but I can't find that data now. And likely the publisher would have inherited the same restriction.
Perhaps I should make that clearer on the landing page.
How do you want to use it?
There's also corpus data at http://storage.googleapis.com/books/ngrams/books/datasetsv2.html
That one also mentions Creative Commons, which is not fully compatible with Apache I believe (w.r.t. commercial use). It's fun for experimentation though.
When I click the link it says: "You are free to: Adapt — remix, transform, and build upon the material
for any purpose, even commercially."
Oh my bad! I see that this is a different version. Great! I will have a go myself :)