bhoov / exbert
A Visual Analysis Tool to Explore Learned Representations in Transformer Models
Home Page: http://exbert.net
License: Apache License 2.0
Hi all,
I am training a transformer model to predict chemical reactions from molecules in string representation.
Does your project support plug-and-play use of models not trained on natural language?
If not, any pointers on how I should proceed?
Awesome tool! Can't wait to dig into it more.
The options for selecting corpus search on the demo page aren't clickable (when I try, I get a red circle with a slash through it). I'm using Chrome.
Am I missing something?
Hi, thanks for the corpus creating scripts. They're very helpful.
I got an index error with BPE/spacy tokenization as shown below:
Extracting embeddings into /felicity/workspace/exbert/server/data/mt/deepak/embeddings/embeddings.hdf5
['deep', '##ak', 'i', 'don', '’', 't', 'feel', 'like', 'doing', 'my', 'meditation', 'today', '.']
['deepak', 'i', 'do', 'n’t', 'feel', 'like', 'doing', 'my', 'meditation', 'today', '.']
11
0
Deepak I don’t feel like doing my meditation today.
Traceback (most recent call last):
File "/felicity/workspace/exbert/server/data/processing/create_corpus.py", line 19, in <module>
create_hdf5.main(unique_sent_pckl, args.outdir, args.force)
File "/felicity/workspace/exbert/server/data/processing/create_hdf5.py", line 221, in main
sentences_to_hdf5(embedding_extractor, str(embedding_outpath), sentences, clear_file=force)
File "/felicity/workspace/exbert/server/data/processing/create_hdf5.py", line 179, in sentences_to_hdf5
b_pos = combine_tokens_meta(b_tokens, s_tokens, s_pos)
File "/felicity/workspace/exbert/server/utils/token_processing.py", line 121, in combine_tokens_meta
meta_list.append(spacy_meta[j])
IndexError: list index out of range
Is this IndexError designed to be raised under certain circumstances? If not, how can I solve it?
Thank you very much.
Originally posted by @felicitywang in #4 (comment)
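For illustration, here is a minimal, hypothetical sketch (not the actual `combine_tokens_meta` implementation) of how two tokenizers that disagree on word boundaries — as in the 13-token BPE list vs the 11-token spaCy list above — can drive an index past the end of the shorter list:

```python
bpe_tokens   = ['deep', '##ak', 'i', 'don', '’', 't', 'feel']
spacy_tokens = ['deepak', 'i', 'do', 'n’t', 'feel']

def naive_align(bpe, spacy):
    """Map each BPE token back to a spaCy token by advancing the spaCy
    index whenever a BPE token is not a '##' continuation piece."""
    meta = []
    j = -1
    for tok in bpe:
        if not tok.startswith('##'):
            j += 1                 # assumes both tokenizers agree on word boundaries
        meta.append(spacy[j])      # IndexError once j runs past len(spacy) - 1
    return meta

try:
    naive_align(bpe_tokens, spacy_tokens)
except IndexError as e:
    print('IndexError:', e)        # "don ’ t" vs "do n’t" desynchronizes the indices
```

Under this toy model, BERT splitting "don't" into three pieces while spaCy splits it into two is enough to exhaust the spaCy list before the BPE list ends.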
Today I was trying to generate embeddings with the create_corpus.py script and the respective hdf5/faiss scripts. Initially things were working, but in an attempt to figure out how to increase the maximum input text length, I ended up re-pulling the repo.
What ensued, after multiple attempts at creating brand-new environments, was this error regarding from_pretrained and transformer_details from the hdf5 script. There may be an issue with the faiss script too, but I haven't tried it.
Any idea what's causing this? Could it be a dependency issue from one of the packages that didn't have a pinned version? I've attached the environment package information below.
exbert_dependencies.txt
Is there any example of the format of the txt file? I always get an error in "create_faiss.py":
Creating Embedding faiss files in /home/ubuntu/will/exbert/server/data/processing/embeddings from /home/ubuntu/will/exbert/server/data/processing/embeddings/embeddings.hdf5
Traceback (most recent call last):
File "create_faiss.py", line 145, in <module>
main(args.directory)
File "create_faiss.py", line 114, in main
embedding_ce = CorpusEmbeddings(str(embedding_hdf5))
File "/home/ubuntu/will/exbert/server/data/processing/corpus_embeddings.py", line 304, in __init__
self.embedding_dim = self.embeddings['0001'].shape[-1]
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "/home/ubuntu/anaconda3/envs/exbert/lib/python3.6/site-packages/h5py/_hl/group.py", line 264, in __getitem__
oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object '0001' doesn't exist)"
I have no idea why the KeyError happened. Can someone tell me how I can solve the problem? Thanks very much.
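For what it's worth, the KeyError means the hdf5 file simply contains no object named '0001'. A quick, hedged way to see which chunk datasets actually got written is to list the file's keys with h5py — the snippet below builds a toy file for illustration (the real layout produced by the corpus scripts may differ slightly); point the read step at your own embeddings.hdf5 instead:

```python
import os
import tempfile

import h5py

# Build a toy embeddings file: one dataset per chunk, with zero-padded
# names ('0000', '0001', ...) as the corpus scripts appear to expect.
path = os.path.join(tempfile.mkdtemp(), "embeddings.hdf5")
with h5py.File(path, "w") as f:
    f.create_dataset("0000", data=[[0.0, 1.0, 2.0]])

with h5py.File(path, "r") as f:
    keys = sorted(f.keys())
    print(keys)  # a file built from a tiny corpus may contain only '0000'
```

If your file only has '0000' (or is empty), the input text was probably too small or the extraction step failed before writing later chunks.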
Hi, thanks for your project.
I have a simple question that I didn't find an answer to in the README. I want to use bert-base-uncased and bert-large-uncased with my own weights (trained on a GLUE task) and visualize what you showed in the demo on the right-hand side (which tokens the model chooses to pay attention to). Could you please provide the recommended steps?
Hey
I can't install it using Node version 19, as it is only supported on Node version 9.
It would be a great help if you could provide a solution so that I can integrate it into another React.js project using Node version 19. I need it for an essential project.
Hej,
thank you for your excellent work!
Could you please tell a little more about the process of getting embeddings for the reference corpus? It was not clear from the paper how your models 'processed this corpus' and gave you embeddings. Also, maybe you can point me to the code where I can see this process happening?
My interpretation is the following: to get both token and context embeddings, you are basically:
Thank you.
This site can’t be reached: exbert.net took too long to respond.
First, thanks for the work you've done on this, it's really neat.
I've recently tried to run the project with the Wizard of Oz corpus and intend to run it with my own text data, but have run into several issues.
Side note: while figuring all of the below out, I successfully created my embeddings for the woz corpus.
Package versioning
First off is a package versioning issue. The environment.yml specifies our required packages, and more specifically pins connexion=1.5.3. If I use conda to build my environment and then install the required spacy package you mention, getting the flask app to build fails with the error outlined here: spec-first/connexion#1149
According to that issue, and after inspecting package versions, the werkzeug <-> connexion integration is the culprit. I tried to get around this in several ways:
1. Removing the connexion version restriction gets me past the initial error, but I then run into a problem with line 24 of main.py: app = connexion.FlaskApp(__name__, static_folder='client/dist', specification_dir='.'). In this case the static_folder argument is no longer a part of werkzeug (which is now v1.0.1).
2. Installing connexion and then pinning werkzeug=0.16.1. This also does not make it through running python server/main.py, for the reason mentioned in 1. Commenting out static_folder bypasses the issue, but obviously this isn't preferable, and I'm sure there is a reason why you've included it there.
3. Installing connexion=1.5.3 and werkzeug=0.16.1, with line 24 of main.py set to its original values. This starts the server. ✅
Accessing the web server
Initially I had Node.js v14 installed, but recompiling the frontend kept failing. I was finally able to run npm install by downgrading to Node v10.16.3, due to some dependency issues regarding node-gyp; I found the suggestion via nodejs/node-gyp#1906.
After this, npm run build fails. I've attached my terminal output as well as the associated log in case there are any suggestions:
npm_install_and_build.log
2020-05-14T21_34_31_166Z-debug.log
Both before and after recompiling the frontend, I am running into my final issue, which was touched upon to some extent in issue #6.
This is what I'm seeing:
So, long story short, do you have any suggestions? Additionally, what should replace the demo lines referenced in the logs?
Thanks so much!
The attention block doesn't load when running locally.
terminal log
Initializing reference map for embedding vector...
Initializing reference map for embedding vector...
AFTER SETUP
Initiating app
* Serving Flask app "main" (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: off
* Running on http://0.0.0.0:5555/ (Press CTRL+C to quit)
127.0.0.1 - - [22/Oct/2019 23:49:59] "GET / HTTP/1.1" 302 -
127.0.0.1 - - [22/Oct/2019 23:49:59] "GET /client/exBERT.html HTTP/1.1" 200 -
127.0.0.1 - - [22/Oct/2019 23:49:59] "GET /client/main.css HTTP/1.1" 200 -
127.0.0.1 - - [22/Oct/2019 23:49:59] "GET /client/img/exBERT.png HTTP/1.1" 200 -
127.0.0.1 - - [22/Oct/2019 23:49:59] "GET /client/main.js HTTP/1.1" 200 -
127.0.0.1 - - [22/Oct/2019 23:49:59] "GET /client/vendor.js HTTP/1.1" 200 -
127.0.0.1 - - [22/Oct/2019 23:49:59] "GET /client/05ca9c06114e79436ea9b5c8d4a7869c.ttf HTTP/1.1" 200 -
127.0.0.1 - - [22/Oct/2019 23:49:59] "GET /client/4171e41154ba857f85c536f167d581ba.ttf HTTP/1.1" 200 -
127.0.0.1 - - [22/Oct/2019 23:49:59] "GET /client/demo/354992f2ee236604c874a3a627e4042bc68586f8.json HTTP/1.1" 404 -
127.0.0.1 - - [22/Oct/2019 23:49:59] "GET /client/img/exBERT_favicon.png HTTP/1.1" 200 -
127.0.0.1 - - [22/Oct/2019 23:50:07] "GET /client/exBERT.html?sent-a-input=hello+world HTTP/1.1" 200 -
127.0.0.1 - - [22/Oct/2019 23:50:07] "GET /client/demo/354992f2ee236604c874a3a627e4042bc68586f8.json HTTP/1.1" 404 -
Does running locally require compiling the frontend? There are some npm dependencies I can't resolve on either Linux (Ubuntu 18.04) or macOS:
npm WARN optional Skipping failed optional dependency /chokidar/fsevents:
npm WARN notsup Not compatible with your operating system or architecture: [email protected]
npm WARN [email protected] requires a peer of ava@0.* but none was installed.
npm WARN [email protected] requires a peer of ts-node@^3.0.0 || ^4.0.1 || ^5.0.0 || ^6.0.0 || ^7.0.0 but none was installed.
npm WARN [email protected] requires a peer of tslint@^4.0.0 || ^5.0.0 but none was installed.
Thank you.
Running the Makefile, I still have to run pip install -e server/spacyface after the Makefile completes; otherwise I get an error about spacy not being present.
This happens even though the Makefile's output shows that the spacy installation was successful.
Hi,
Is it possible for you to provide some example data files? If I run the backend, it prompts errors such as exbert/server/data/woz/embeddings/combined.hdf5', errno = 2, error message = 'No such file or directory'. I tried to use the data tools to create such embeddings, but I encountered multiple errors during that process as well.
Thank you.
Traceback (most recent call last):
File "create_hdf5.py", line 9, in <module>
from utils.token_processing import (
ModuleNotFoundError: No module named 'utils.token_processing'
I got this error while preparing the data for the server. I installed utils with "pip install utils", but still got the problem. I was wondering if someone can help me.
Thanks so much!!
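For context, the `utils` here is exBERT's own server/utils package rather than the `utils` distribution on PyPI, so pip install utils cannot fix this. A hedged sketch of the usual workaround — putting the server/ directory on the import path before running the data scripts (the exbert/server path below is a hypothetical checkout location):

```python
import sys
from pathlib import Path

# 'utils.token_processing' lives at server/utils/token_processing.py in the
# repo; it resolves only when server/ itself is on sys.path (equivalently,
# run the scripts from inside the server/ directory).
server_dir = Path("exbert/server")        # hypothetical path to your checkout
sys.path.insert(0, str(server_dir))
# After this, `from utils.token_processing import ...` should import the
# repo's module instead of failing with ModuleNotFoundError.
print(sys.path[0])
```

Uninstalling the unrelated PyPI package (`pip uninstall utils`) may also help, since it can shadow the repo's package.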
Hi,
Thank you so much for this awesome project! I'm trying to visualize the attention weights of the model I trained (RoBERTa model with a multi-task final layer, fine-tuned using HuggingFace) and I'm wondering if that is supported by exBERT.
Also, I'm wondering if there will be any modification of the code needed if I trained my model using HuggingFace version later than v2.8. Thanks a lot!
Hi, I ran 'python server/main.py' and got this error after opening the server address:
Traceback (most recent call last):
  File "/home/stefano-spindola/anaconda3/envs/exbert/lib/python3.7/site-packages/uvicorn/protocols/http/h11_impl.py", line 369, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/home/stefano-spindola/anaconda3/envs/exbert/lib/python3.7/site-packages/uvicorn/middleware/proxy_headers.py", line 59, in __call__
    return await self.app(scope, receive, send)
  File "/home/stefano-spindola/anaconda3/envs/exbert/lib/python3.7/site-packages/fastapi/applications.py", line 199, in __call__
    await super().__call__(scope, receive, send)
  File "/home/stefano-spindola/anaconda3/envs/exbert/lib/python3.7/site-packages/starlette/applications.py", line 112, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/stefano-spindola/anaconda3/envs/exbert/lib/python3.7/site-packages/starlette/middleware/errors.py", line 181, in __call__
    raise exc from None
  File "/home/stefano-spindola/anaconda3/envs/exbert/lib/python3.7/site-packages/starlette/middleware/errors.py", line 159, in __call__
    await self.app(scope, receive, _send)
  File "/home/stefano-spindola/anaconda3/envs/exbert/lib/python3.7/site-packages/starlette/middleware/cors.py", line 78, in __call__
    await self.app(scope, receive, send)
  File "/home/stefano-spindola/anaconda3/envs/exbert/lib/python3.7/site-packages/starlette/exceptions.py", line 82, in __call__
    raise exc from None
  File "/home/stefano-spindola/anaconda3/envs/exbert/lib/python3.7/site-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/home/stefano-spindola/anaconda3/envs/exbert/lib/python3.7/site-packages/starlette/routing.py", line 580, in __call__
    await route.handle(scope, receive, send)
  File "/home/stefano-spindola/anaconda3/envs/exbert/lib/python3.7/site-packages/starlette/routing.py", line 241, in handle
    await self.app(scope, receive, send)
  File "/home/stefano-spindola/anaconda3/envs/exbert/lib/python3.7/site-packages/starlette/routing.py", line 52, in app
    response = await func(request)
  File "/home/stefano-spindola/anaconda3/envs/exbert/lib/python3.7/site-packages/fastapi/routing.py", line 202, in app
    dependant=dependant, values=values, is_coroutine=is_coroutine
  File "/home/stefano-spindola/anaconda3/envs/exbert/lib/python3.7/site-packages/fastapi/routing.py", line 148, in run_endpoint_function
    return await dependant.call(**values)
  File "/home/stefano-spindola/Documentos/rag-test/exbert/server/main.py", line 140, in get_model_details
    deets = aconf.from_pretrained(model)
  File "/home/stefano-spindola/Documentos/rag-test/exbert/server/main.py", line 93, in from_pretrained
    return get_details(model_name)
  File "/home/stefano-spindola/Documentos/rag-test/exbert/server/transformer_details.py", line 31, in get_details
    return ModelDetails(mname)
  File "/home/stefano-spindola/Documentos/rag-test/exbert/server/transformer_details.py", line 44, in __init__
    self.model, self.aligner = get_model_tok(self.mname)
  File "/home/stefano-spindola/Documentos/rag-test/exbert/server/utils/f.py", line 74, in helper
    memo[x] = f(*x)
  File "/home/stefano-spindola/Documentos/rag-test/exbert/server/transformer_details.py", line 36, in get_model_tok
    tok = auto_aligner(mname, config=conf)
  File "/home/stefano-spindola/Documentos/rag-test/exbert/server/spacyface/spacyface/aligner.py", line 263, in auto_aligner
    return MakeAligner(tok_class, english).from_pretrained(pretrained_name_or_path, config=config)
  File "/home/stefano-spindola/anaconda3/envs/exbert/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 911, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "/home/stefano-spindola/anaconda3/envs/exbert/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 1065, in _from_pretrained
    "Unable to load vocabulary from file. "
OSError: Unable to load vocabulary from file. Please check that the provided vocabulary is accessible and not corrupted.
How could I fix this?
I wanted to try the Corpus View on the live demo page https://exbert.net/, but when I click the buttons I get:
POST https://exbert.net/api/k-nearest-embeddings 500 (Internal Server Error)
Hi! I hope you are doing well.
I'm having problems running server/main.py after installing with the default Makefile.
The traceback is:
The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "server/main.py", line 20, in <module>
    from data_processing import from_model, from_base_dir
  File "/home/jose/Codes/Work/GUANE/AVIGAIL/exBERT/exbert/server/data_processing/__init__.py", line 1, in <module>
    from .corpus_data_wrapper import CorpusDataWrapper
  File "/home/jose/Codes/Work/GUANE/AVIGAIL/exBERT/exbert/server/data_processing/corpus_data_wrapper.py", line 6, in <module>
    from spacyface.simple_spacy_token import SimpleSpacyToken
  File "/home/jose/miniconda3/envs/exbert/lib/python3.7/site-packages/spacyface/__init__.py", line 1, in <module>
    from .aligner import (
  File "/home/jose/miniconda3/envs/exbert/lib/python3.7/site-packages/spacyface/aligner.py", line 8, in <module>
    from transformers import (
  File "<frozen importlib._bootstrap>", line 1032, in _handle_fromlist
  File "/home/jose/miniconda3/envs/exbert/lib/python3.7/site-packages/transformers/file_utils.py", line 2271, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/home/jose/miniconda3/envs/exbert/lib/python3.7/site-packages/transformers/file_utils.py", line 2285, in _get_module
    ) from e
RuntimeError: Failed to import transformers.models.bert because of the following error (look up to see its traceback):
Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
module 'torch' has no attribute 'BoolTensor'
Thanks in advance for any help!
Regards
Whenever I try to use the commands in https://github.com/bhoov/exbert/blob/master/server/data_processing/README.md to create a corpus with a text file as input (P.S. I have tried all types of text files: big, small, English, non-English), the following error always comes up while creating the hdf5 file:
"TypeError: Can't implicitly convert non-string objects to strings"
Thanks for the library, this is very useful.
I want to use this to analyze attention for question-answering models, like the BertForQuestionAnswering models from Huggingface. They are (automatically) cast into AutoModelWithLMHead, which is great, so it mostly works out of the box.
However, I want to use [SEP] tokens in the middle of my sentence, and the spacyface aligner breaks each one down into something like [ SE ##P ]. I wrote a post-processing function to patch these back together during tokenization, but that doesn't work: I never see any attention on the [SEP] token (which cannot be right), and it also gets broken down again (probably elsewhere in the code).
For example, the token display shows: What is the effect of [SEP] [ SE ##P ] and what [SEP] — i.e. a mid-sentence [SEP] gets split into the pieces [ SE ##P ].
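Conceptually, the fix is to match registered special tokens before any subword splitting — which is what HuggingFace tokenizers do for tokens in their special-tokens map. The toy pre-tokenizer below (not exBERT's actual aligner, and the SPECIAL set is an assumption) illustrates the idea:

```python
import re

SPECIAL = {"[SEP]", "[CLS]", "[MASK]"}

def split_keeping_specials(text):
    """Toy pre-tokenizer: split on whitespace but pass bracketed special
    tokens through whole, instead of letting a subword tokenizer shred
    '[SEP]' into '[', 'SE', '##P', ']'."""
    out = []
    for piece in text.split():
        if piece in SPECIAL:
            out.append(piece)                          # keep special tokens intact
        else:
            out.extend(re.findall(r"\w+|\W", piece))   # crude word/punctuation split
    return out

print(split_keeping_specials("what is the effect of [SEP] here"))
```

A patch along these lines would have to live inside the aligner itself, since post-processing after tokenization happens too late: the attention has already been computed over the shredded pieces.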