Comments (5)
Thank you for the support.
The best place to start is this notebook: https://colab.research.google.com/github/neuml/txtai/blob/master/examples/01_Introducing_txtai.ipynb
The index in the notebook above uses sentence-transformers. This link has a list of all the sentence transformer models available: https://huggingface.co/models?search=sentence-transformers
The following is an example modification using a multilingual model:
# Create embeddings model, backed by sentence-transformers & transformers
embeddings = Embeddings({"method": "transformers", "path": "sentence-transformers/xlm-r-100langs-bert-base-nli-stsb-mean-tokens"})
If this doesn't work well, another model to try: sentence-transformers/LaBSE
Then change the sections text below to a couple of examples in the target language you want to experiment with:
sections = ["US tops 5 million confirmed virus cases",
"Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg",
"Beijing mobilises invasion craft along coast as Taiwan tensions escalate",
"The National Park Service warns against sacrificing slower friends in a bear attack",
"Maine man wins $1M from $25 lottery ticket",
"Make huge profits without work, earn up to $100,000 a day"]
Finally, change the queries to the target language as well:
for query in ("feel good story", "climate change", "health", "war", "wildlife", "asia", "north america", "dishonest junk"):
I would just iterate over a couple of different models until you find one that works well. Let me know how it works out.
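One way to make the "try several models" step less ad hoc is to score each candidate on a handful of queries whose correct section you already know. The sketch below is only an illustration: `rank_models`, `txtai_matcher`, and the hand-labeled `(query, expected_section_index)` pairs are hypothetical names and inputs, not part of txtai. The adapter also assumes that `similarity()` returns `(index, score)` tuples sorted best first, which may not hold for every txtai release.

```python
def rank_models(model_paths, sections, labeled_queries, make_matcher):
    """Score each candidate model by how many queries it matches correctly.

    labeled_queries: (query, expected_section_index) pairs written by hand.
    make_matcher: callable that loads a model path and returns
                  match(query, sections) -> index of the best section.
    Returns (accuracy, model_path) pairs, best first.
    """
    ranking = []
    for path in model_paths:
        match = make_matcher(path)  # load each model only once
        hits = sum(1 for query, expected in labeled_queries
                   if match(query, sections) == expected)
        ranking.append((hits / len(labeled_queries), path))
    # Highest accuracy first; ties keep their original order
    ranking.sort(key=lambda pair: pair[0], reverse=True)
    return ranking


def txtai_matcher(path):
    """Hypothetical adapter around txtai, imported lazily so that
    rank_models can be exercised without txtai installed."""
    from txtai.embeddings import Embeddings

    embeddings = Embeddings({"method": "transformers", "path": path})

    def match(query, sections):
        # Assumption: similarity() returns (index, score) tuples, best first.
        # If your release returns raw scores, take the argmax instead.
        return embeddings.similarity(query, sections)[0][0]

    return match
```

Calling `rank_models` with the model paths mentioned in this thread, your translated `sections`, a few labeled queries, and `txtai_matcher` would then give a rough ordering of the candidates for your language.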
from txtai.
I appreciate your feedback and concern. I will try your recommendation as soon as I'm available and let you know.
Hello again. I have now tried the two transformer models you suggested, both for Turkish. The first one worked in some cases, but it is not efficient. The second one didn't work. I need Turkish language support for transformers.
Another one to try: sentence-transformers/distilbert-multilingual-nli-stsb-quora-ranking
Otherwise you can also try any of the generic Turkish transformer models: https://huggingface.co/models?search=turkish
Those multilingual models are the ones that should support multiple languages. I suspect a model trained specifically for Turkish on an NLI/STSB-like task would work best.
txtai uses the sentence-transformers library to build transformer-based sentence embeddings. I would suggest trying as many of the models as you can to see if any of them work at an acceptable level for your task.
Issue should now be resolved. Tokenization can be disabled by setting the config option:
Embeddings({"method": "transformers", "path": "/path/to/model", "tokenize": False})