sszuev / fasttext_java
This project forked from ivanhk/fasttext_java
Java port of c++ version of facebook fasttext
License: Other
Hello,
I want to use your version of fastText in my project, but first I want to check whether this version handles OOV words.
Thanks in advance for confirming how your code deals with OOV words during the test phase.
Regards
Hi, trying to generate embeddings for a medium-sized corpus such as http://mattmahoney.net/dc/enwik8.zip does not work with less than 12 GB of memory. Aside from that, the speed is 9 words/sec, which would mean thousands of hours of training, while the C++ implementation does it in 10 min.
Have you ever tried to train on such a big corpus?
Hi there,
I ran into a compile error:
FastText.java:860: error: cannot find symbol
map.forEach((f, i) -> res.put(dict.getLabel(i), f));
^
symbol: method forEach((f,i)->res[...]), f))
location: variable map of type TreeMultimap<Float,Integer>
Any hint? Thanks!
which was changed in pull request #8
at least, there is a duplicated #toString() call; that is not optimal
Hi. Thanks for your work on keeping the Java version of fastText up to date.
I have an issue.
When I train a model in supervised mode using pretrained vectors (wiki-news-300d-1M.vec) and then use it to classify sentences, its result is quite different from that of the original Facebook implementation. This happens because you don't add words to the dictionary in this loop in FastText.java (lines 1371-1386):
for (int i = 0; i < n; i++) {
    String line = in.readLine();
    String word;
    String[] array;
    if (StringUtils.isEmpty(line) || (array = line.split(" ")).length == 0 || StringUtils.isEmpty(word = array[0])) {
        throw new IllegalArgumentException("Wrong line: " + line);
    }
    List<Float> numbers = Arrays.stream(array).skip(1).limit(dim + 1).map(Float::parseFloat).collect(Collectors.toList());
    if (numbers.size() < dim)
        throw new IllegalArgumentException("Wrong numbers in the line: " + numbers.size() + ". Expected " + dim);
    words.add(word);
    for (int j = 0; j < numbers.size(); j++) {
        mat.set(i, j, numbers.get(j));
    }
}
Could you please fix it?
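For illustration, here is a minimal sketch of the kind of change this issue asks for: each word read from the pretrained vectors file is registered in a dictionary while its vector is stored. Everything here is an assumption made for the example and is not taken from the project: the class PretrainedVectorsSketch, the SimpleDictionary stand-in, the load method, and the expectation that the input starts with a header line of the form "n dim" and contains no repeated words.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the project's code.
final class PretrainedVectorsSketch {

    // Hypothetical stand-in for the project's dictionary: maps a word to an id.
    static final class SimpleDictionary {
        private final Map<String, Integer> ids = new LinkedHashMap<>();

        // Registers the word if it is new and returns its id.
        int add(String word) {
            Integer id = ids.get(word);
            if (id == null) {
                id = ids.size();
                ids.put(word, id);
            }
            return id;
        }

        // Returns the id of the word, or -1 if the word is unknown.
        int getId(String word) {
            return ids.getOrDefault(word, -1);
        }

        int size() {
            return ids.size();
        }
    }

    // Reads "n dim" from the first line, then n lines of "word v1 ... vdim".
    // Every word is added to the dictionary; row i holds the vector of line i.
    static float[][] load(Reader source, SimpleDictionary dict) {
        try (BufferedReader in = new BufferedReader(source)) {
            String header = in.readLine();
            if (header == null) {
                throw new IllegalArgumentException("Empty input");
            }
            String[] head = header.trim().split("\\s+");
            if (head.length != 2) {
                throw new IllegalArgumentException("Wrong header: " + header);
            }
            int n = Integer.parseInt(head[0]);
            int dim = Integer.parseInt(head[1]);
            float[][] mat = new float[n][dim];
            for (int i = 0; i < n; i++) {
                String line = in.readLine();
                if (line == null || line.trim().isEmpty()) {
                    throw new IllegalArgumentException("Wrong line: " + line);
                }
                String[] parts = line.trim().split("\\s+");
                if (parts.length < dim + 1) {
                    throw new IllegalArgumentException("Wrong numbers in the line: "
                            + (parts.length - 1) + ". Expected " + dim);
                }
                // The step the issue reports as missing: register the word.
                dict.add(parts[0]);
                for (int j = 0; j < dim; j++) {
                    mat[i][j] = Float.parseFloat(parts[j + 1]);
                }
            }
            return mat;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this shape, a word from the pretrained file can be looked up through the dictionary after loading, which is what the classification step would need in order to find its vector.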
Hi there,
Is there a roadmap for a non-snapshot release to JitPack or another Maven repo?
Cheers
George
On line 76 of src/main/java/cc/fasttext/extra/ExraMain.java, org.apache.hadoop.hdfs does not exist.
I have a model trained with the C++ version of fastText, and I use it for prediction in the Java version. But the two results are very different, and the result of the Java version is obviously wrong.