centre-for-humanities-computing / tweetopic
Blazing fast topic modelling for short texts.
Home Page: https://centre-for-humanities-computing.github.io/tweetopic/
License: MIT License
Empty components cannot be displayed because of pyLDAvis's validation steps.
Indices do not align with the indices of the topics in the model.
Setting the random seed for numpy.random is not enough to make model fitting reproducible. My experiments indicate that one has to set the seed for both NumPy and Numba (njit) to make DMM model fitting reproducible.
Based on the Numba documentation (https://numba.readthedocs.io/en/stable/reference/pysupported.html#pysupported-random), I added the following code, which seems to solve the issue:
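`import numpy as np
from numba import njit

@njit
def seed(a):
    # Inside jitted code this seeds Numba's own generator, not NumPy's
    np.random.seed(a)

np.random.seed(1234)  # seed NumPy's global generator
seed(1234)  # seed the generator used by Numba-compiled code`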
I recommend that this should at least be documented for other users.
I observed this behavior in training DMM models. I have not examined whether it also applies to BTM.
This article could help: https://doi.org/10.1016/j.patrec.2017.03.009
As discussed in the ML team meeting last week, there is a dependency clash with topic-wizard related to numpy versions.
Running tweetopic and then visualising the results using topic-wizard should reproduce the kind of output described in the README.md.
In a virtual environment, installing tweetopic via pip and then installing topic-wizard results in the following error:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tweetopic 0.2.0 requires numpy<1.23.0,>=1.22.4, but you have numpy 1.24.2 which is incompatible.
tweetopic 0.2.0 requires scikit-learn<1.2.0,>=1.1.1, but you have scikit-learn 1.2.1 which is incompatible.
numba 0.56.4 requires numpy<1.24,>=1.18, but you have numpy 1.24.2 which is incompatible.
This means that the workflow outlined in README.md is not possible.
(The same conflict occurs in reverse if topic-wizard is installed before tweetopic.)
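To make the resolver output above concrete, here is a minimal sketch that checks the installed versions against the pinned ranges quoted in the error. It uses only the standard library. The helper names (`parse_version`, `satisfies`) are hypothetical, and the parsing handles only plain numeric versions with `<`, `<=`, `>`, `>=` and `==` clauses, not the full version-specifier grammar that pip implements.

```python
import operator

# Comparison symbols supported by this sketch; two-character symbols come
# first so that "<=" is not mistaken for "<".
_OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
}


def parse_version(text):
    """Turn '1.24.2' into (1, 24, 2). Numeric release segments only."""
    return tuple(int(part) for part in text.strip().split("."))


def _pad(a, b):
    """Right-pad the shorter tuple with zeros so 1.24 compares as 1.24.0."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def satisfies(version, spec):
    """Return True if `version` meets every comma-separated clause in `spec`."""
    installed = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol, compare in _OPS.items():
            if clause.startswith(symbol):
                required = parse_version(clause[len(symbol):])
                left, right = _pad(installed, required)
                if not compare(left, right):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


# (requiring package, dependency, installed version, required range),
# copied from the pip error message.
checks = [
    ("tweetopic 0.2.0", "numpy", "1.24.2", "<1.23.0,>=1.22.4"),
    ("tweetopic 0.2.0", "scikit-learn", "1.2.1", "<1.2.0,>=1.1.1"),
    ("numba 0.56.4", "numpy", "1.24.2", "<1.24,>=1.18"),
]

for requirer, dependency, installed, spec in checks:
    status = "ok" if satisfies(installed, spec) else "conflict"
    print(f"{requirer}: {dependency} {installed} vs {spec} -> {status}")
```

Under these pins, a version such as numpy 1.22.4 would satisfy both the tweetopic and the numba ranges. Whether it also satisfies topic-wizard's own requirement is not shown in the error output.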
Not a possible solution, but I'm wondering why exactly the two packages require different versions of numpy in the first place. Have there been breaking changes in numpy between minor versions which mean that the two packages cannot use the same dependencies? I haven't had a chance to dig into the code, so might be missing something!
The error has been reproduced on Ubuntu and macOS, with Python versions 3.9, 3.10, and 3.11.
python3.9 -m venv .env
source .env/bin/activate
pip install --upgrade pip
pip install tweetopic
pip install topic-wizard