
bhoov / exbert

575 stars · 51 forks · 13.39 MB

A Visual Analysis Tool to Explore Learned Representations in Transformer Models

Home Page: http://exbert.net

License: Apache License 2.0

CSS 0.15% HTML 0.45% TypeScript 3.41% JavaScript 0.47% Python 91.27% Makefile 0.04% Shell 0.20% Dockerfile 0.08% Jupyter Notebook 3.66% SCSS 0.27%

exbert's Introduction

Hi there 👋

I research foundation models from the unique perspectives of visualization and dynamical systems.

Talk to me about

  • 🧠 Hopfield Networks
  • 🧩 Understandable latent spaces
  • 🦾 AI Frameworks/libraries
  • 🧱 Full stack development

I try to make all of my work+research public and open source.

I am affiliated with the Visual AI Lab at IBM Research & PoloClub at GA Tech.

I work closely with Hendrik Strobelt, Dmitry Krotov, Polo Chau.

exbert's People

Contributors

bhoov · dependabot[bot] · hendrikstrobelt · wrosko


exbert's Issues

Problem running locally

Hi! I hope you are doing well.
I'm having problems running server/main.py after installing with the default Makefile.
The traceback is:
The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "server/main.py", line 20, in <module>
    from data_processing import from_model, from_base_dir
  File "/home/jose/Codes/Work/GUANE/AVIGAIL/exBERT/exbert/server/data_processing/__init__.py", line 1, in <module>
    from .corpus_data_wrapper import CorpusDataWrapper
  File "/home/jose/Codes/Work/GUANE/AVIGAIL/exBERT/exbert/server/data_processing/corpus_data_wrapper.py", line 6, in <module>
    from spacyface.simple_spacy_token import SimpleSpacyToken
  File "/home/jose/miniconda3/envs/exbert/lib/python3.7/site-packages/spacyface/__init__.py", line 1, in <module>
    from .aligner import (
  File "/home/jose/miniconda3/envs/exbert/lib/python3.7/site-packages/spacyface/aligner.py", line 8, in <module>
    from transformers import (
  File "<frozen importlib._bootstrap>", line 1032, in _handle_fromlist
  File "/home/jose/miniconda3/envs/exbert/lib/python3.7/site-packages/transformers/file_utils.py", line 2271, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/home/jose/miniconda3/envs/exbert/lib/python3.7/site-packages/transformers/file_utils.py", line 2285, in _get_module
    ) from e
RuntimeError: Failed to import transformers.models.bert because of the following error (look up to see its traceback): Failed to import transformers.modeling_utils because of the following error (look up to see its traceback): module 'torch' has no attribute 'BoolTensor'

Thanks in advance for any help!
Regards
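The final error, `module 'torch' has no attribute 'BoolTensor'`, usually indicates a torch build older than the one the pinned transformers release expects (boolean tensors appear to have landed in PyTorch 1.2; treat that cutoff as an assumption worth verifying against the PyTorch release notes). A stdlib sketch for comparing torch version strings without importing torch:

```python
def parse_version(v):
    # Compare only the numeric major.minor prefix; local build tags
    # such as "+cpu" are stripped before the int() conversion.
    return tuple(int(p.split("+")[0]) for p in v.split(".")[:2])

# Assumed cutoff: torch.BoolTensor first shipped in the 1.2 series.
MINIMUM = (1, 2)

print(parse_version("1.1.0") >= MINIMUM)      # too old for BoolTensor
print(parse_version("1.4.0+cpu") >= MINIMUM)  # new enough
```

If the installed version fails this check, reinstalling inside the exbert conda env with a newer torch (while keeping it compatible with the pinned transformers release) is a reasonable first step.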

Running locally with custom models

Hi,

Thank you so much for this awesome project! I'm trying to visualize the attention weights of the model I trained (RoBERTa model with a multi-task final layer, fine-tuned using HuggingFace) and I'm wondering if that is supported by exBERT.
Also, I'm wondering if there will be any modification of the code needed if I trained my model using HuggingFace version later than v2.8. Thanks a lot!

Error while processing the text

Traceback (most recent call last):
  File "create_hdf5.py", line 9, in <module>
    from utils.token_processing import (
ModuleNotFoundError: No module named 'utils.token_processing'

I got this error while preparing the data for the server. I installed utils with `pip install utils`, but the problem persists. I was wondering if someone could help me.
Thanks so much!
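One likely cause: `utils` here is not the PyPI `utils` package but a local module inside exbert's `server/` directory, so `pip install utils` pulls in an unrelated project. The scripts resolve imports like `utils.token_processing` relative to `server/`, which therefore must be on `sys.path` (either run the script from inside `server/`, or add the path explicitly). A sketch of the explicit approach; the `server` location is an assumption about where the repo was cloned:

```python
import os
import sys

# Put the repo's server/ directory ahead of any PyPI `utils` package
# so `from utils.token_processing import ...` resolves to the local
# module rather than the installed distribution.
server_dir = os.path.abspath("server")
if server_dir not in sys.path:
    sys.path.insert(0, server_dir)

print(sys.path[0])
```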

Error while doing the data processing

Is there any example of the format of the txt file? I always get an error in the file "create_faiss.py".

Creating Embedding faiss files in /home/ubuntu/will/exbert/server/data/processing/embeddings from /home/ubuntu/will/exbert/server/data/processing/embeddings/embeddings.hdf5
Traceback (most recent call last):
  File "create_faiss.py", line 145, in <module>
    main(args.directory)
  File "create_faiss.py", line 114, in main
    embedding_ce = CorpusEmbeddings(str(embedding_hdf5))
  File "/home/ubuntu/will/exbert/server/data/processing/corpus_embeddings.py", line 304, in __init__
    self.embedding_dim = self.embeddings['0001'].shape[-1]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/ubuntu/anaconda3/envs/exbert/lib/python3.6/site-packages/h5py/_hl/group.py", line 264, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object '0001' doesn't exist)"  

I have no idea why this KeyError happened. Could someone tell me how I can solve the problem? Thanks very much.
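Judging from the traceback, the corpus scripts store one HDF5 group per sentence under zero-padded keys (`'0001'`, `'0002'`, ...), and create_faiss.py assumes at least the first group exists. The KeyError therefore usually means the earlier create_hdf5.py step produced an empty or partial embeddings.hdf5, e.g. because no sentences survived preprocessing. The key format below is inferred from the `'0001'` in the error, not taken from the code:

```python
def sentence_key(i):
    # Zero-padded, 1-based group names matching the '0001' in the
    # KeyError message (assumed naming convention).
    return f"{i:04d}"

expected = [sentence_key(i) for i in range(1, 4)]
print(expected)  # ['0001', '0002', '0003']
```

A quick sanity check before rerunning create_faiss.py is `sorted(h5py.File("embeddings.hdf5", "r").keys())`; an empty list points at the hdf5 creation step rather than the faiss step.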

spaCy - BPE Alignment sometimes faulty, raises index errors

Hi, thanks for the corpus creating scripts. They're very helpful.

I got an index error with BPE/spacy tokenization as shown below:

Extracting embeddings into /felicity/workspace/exbert/server/data/mt/deepak/embeddings/embeddings.hdf5
['deep', '##ak', 'i', 'don', '’', 't', 'feel', 'like', 'doing', 'my', 'meditation', 'today', '.']
['deepak', 'i', 'do', 'n’t', 'feel', 'like', 'doing', 'my', 'meditation', 'today', '.']
11
0
Deepak I don’t feel like doing my meditation today.
Traceback (most recent call last):
  File "/felicity/workspace/exbert/server/data/processing/create_corpus.py", line 19, in <module>
    create_hdf5.main(unique_sent_pckl, args.outdir, args.force)
  File "/felicity/workspace/exbert/server/data/processing/create_hdf5.py", line 221, in main
    sentences_to_hdf5(embedding_extractor, str(embedding_outpath), sentences, clear_file=force)
  File "/felicity/workspace/exbert/server/data/processing/create_hdf5.py", line 179, in sentences_to_hdf5
    b_pos = combine_tokens_meta(b_tokens, s_tokens, s_pos)
  File "/felicity/workspace/exbert/server/utils/token_processing.py", line 121, in combine_tokens_meta
    meta_list.append(spacy_meta[j])
IndexError: list index out of range

Is the IndexError designed to be raised under certain circumstances? If not, how can I solve it?

Thank you very much.

Originally posted by @felicitywang in #4 (comment)
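The two token lists in the log show the root cause: WordPiece and spaCy disagree on how many words the sentence contains ("don’t" becomes three BPE words `don`/`’`/`t` but two spaCy tokens `do`/`n’t`), so indexing the spaCy metadata by the BPE word index runs off the end. A stdlib sketch of the invariant being violated, using the tokens from the log above:

```python
def bpe_word_count(bpe_tokens):
    # Pieces starting with '##' continue the previous word, so words
    # are counted by the pieces that begin one.
    return sum(1 for t in bpe_tokens if not t.startswith("##"))

bpe = ['deep', '##ak', 'i', 'don', '’', 't', 'feel', 'like',
       'doing', 'my', 'meditation', 'today', '.']
spacy_toks = ['deepak', 'i', 'do', 'n’t', 'feel', 'like', 'doing',
              'my', 'meditation', 'today', '.']

# 12 BPE words vs 11 spaCy tokens: the aligner's one-to-one word
# assumption fails, which is exactly where the IndexError comes from.
print(bpe_word_count(bpe), len(spacy_toks))
```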

Server got error after open on browser

Hi, I ran `python server/main.py` and got this error after opening the server address:

Traceback (most recent call last):
  File "/home/stefano-spindola/anaconda3/envs/exbert/lib/python3.7/site-packages/uvicorn/protocols/http/h11_impl.py", line 369, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/home/stefano-spindola/anaconda3/envs/exbert/lib/python3.7/site-packages/uvicorn/middleware/proxy_headers.py", line 59, in __call__
    return await self.app(scope, receive, send)
  File "/home/stefano-spindola/anaconda3/envs/exbert/lib/python3.7/site-packages/fastapi/applications.py", line 199, in __call__
    await super().__call__(scope, receive, send)
  File "/home/stefano-spindola/anaconda3/envs/exbert/lib/python3.7/site-packages/starlette/applications.py", line 112, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/stefano-spindola/anaconda3/envs/exbert/lib/python3.7/site-packages/starlette/middleware/errors.py", line 181, in __call__
    raise exc from None
  File "/home/stefano-spindola/anaconda3/envs/exbert/lib/python3.7/site-packages/starlette/middleware/errors.py", line 159, in __call__
    await self.app(scope, receive, _send)
  File "/home/stefano-spindola/anaconda3/envs/exbert/lib/python3.7/site-packages/starlette/middleware/cors.py", line 78, in __call__
    await self.app(scope, receive, send)
  File "/home/stefano-spindola/anaconda3/envs/exbert/lib/python3.7/site-packages/starlette/exceptions.py", line 82, in __call__
    raise exc from None
  File "/home/stefano-spindola/anaconda3/envs/exbert/lib/python3.7/site-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/home/stefano-spindola/anaconda3/envs/exbert/lib/python3.7/site-packages/starlette/routing.py", line 580, in __call__
    await route.handle(scope, receive, send)
  File "/home/stefano-spindola/anaconda3/envs/exbert/lib/python3.7/site-packages/starlette/routing.py", line 241, in handle
    await self.app(scope, receive, send)
  File "/home/stefano-spindola/anaconda3/envs/exbert/lib/python3.7/site-packages/starlette/routing.py", line 52, in app
    response = await func(request)
  File "/home/stefano-spindola/anaconda3/envs/exbert/lib/python3.7/site-packages/fastapi/routing.py", line 202, in app
    dependant=dependant, values=values, is_coroutine=is_coroutine
  File "/home/stefano-spindola/anaconda3/envs/exbert/lib/python3.7/site-packages/fastapi/routing.py", line 148, in run_endpoint_function
    return await dependant.call(**values)
  File "/home/stefano-spindola/Documentos/rag-test/exbert/server/main.py", line 140, in get_model_details
    deets = aconf.from_pretrained(model)
  File "/home/stefano-spindola/Documentos/rag-test/exbert/server/main.py", line 93, in from_pretrained
    return get_details(model_name)
  File "/home/stefano-spindola/Documentos/rag-test/exbert/server/transformer_details.py", line 31, in get_details
    return ModelDetails(mname)
  File "/home/stefano-spindola/Documentos/rag-test/exbert/server/transformer_details.py", line 44, in __init__
    self.model, self.aligner = get_model_tok(self.mname)
  File "/home/stefano-spindola/Documentos/rag-test/exbert/server/utils/f.py", line 74, in helper
    memo[x] = f(*x)
  File "/home/stefano-spindola/Documentos/rag-test/exbert/server/transformer_details.py", line 36, in get_model_tok
    tok = auto_aligner(mname, config=conf)
  File "/home/stefano-spindola/Documentos/rag-test/exbert/server/spacyface/spacyface/aligner.py", line 263, in auto_aligner
    return MakeAligner(tok_class, english).from_pretrained(pretrained_name_or_path, config=config)
  File "/home/stefano-spindola/anaconda3/envs/exbert/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 911, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "/home/stefano-spindola/anaconda3/envs/exbert/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 1065, in _from_pretrained
    "Unable to load vocabulary from file. "
OSError: Unable to load vocabulary from file. Please check that the provided vocabulary is accessible and not corrupted.

How could I fix this?

corpus explorer unavailable?

Awesome tool! Can't wait to dig into it more.

The options for selecting corpus search on the demo page aren't available to click on (when I do so, I get a red circle with a slash through it). I'm using chrome.

Am I missing something?

Not able to install exbert using node version 19

Hey
I can't install it using node version 19 as it is only supported for node version 9.
It would be a great help if you can provide a solution so that I can integrate it in another React.js project with node version 19. I need it for an essential project.

Makefile doesn't work for spacy

Running the Makefile still requires me to run pip install -e server/spacyface after the makefile completes the run. Otherwise I get an error about spacy not being present.

This happens even though the output of the makefile does show that the spacy installation was successful.

Inability to launch responsive server possibly due to package version issues

First, thanks for the work you've done on this, it's really neat.
I've recently tried to implement the project with the Wizard of Oz corpus and am intending on implementing with my own text data, but have run into several issues.

Side note: while figuring all of the below out I successfully created my embeddings for woz corpus.

Package versioning
First off is a package versioning issue. The environment.yml specifies the required packages, and more specifically pins connexion=1.5.3. If I use conda to build my environment and then install the required spacy package you mention, getting the flask app to build fails with the error outlined here: spec-first/connexion#1149

According to that issue, and after inspecting versions of packages, the werkzeug <-> connexion integration is the culprit. I was able to get around this in several ways.

  1. by removing the connexion version restriction I get past the initial error, but then run into a problem associated with line 24 of main.py: app = connexion.FlaskApp(__name__, static_folder='client/dist', specification_dir='.'). In this case the static_folder command is no longer a part of werkzeug (which is now v1.0.1).
  2. I then left connexion unpinned and pinned werkzeug=0.16.1. This also does not make it through running python server/main.py, for the reason mentioned in 1. Commenting out static_folder can bypass the issue, but obviously this isn't preferable/I'm sure there is a reason why you've included it there.
  3. I finally set connexion=1.5.3 and werkzeug=0.16.1, made sure that line 24 of main.py was set to its original values, and this starts the server. ✅
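For reference, the pin combination from step 3 expressed as an environment.yml fragment (other entries omitted; these pins reflect what worked in this report, not an officially tested configuration):

```yaml
# environment.yml (fragment): pins reported to start the server
dependencies:
  - connexion=1.5.3
  - werkzeug=0.16.1
```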

Accessing the web server
Initially I had Node.js v14 installed, but recompiling the frontend kept failing. Finally, I was able to run npm install by downgrading to Node v10.16.3 due to some dependency issues regarding node-gyp. I found the suggestion via: nodejs/node-gyp#1906

After this, the npm run build fails, I've attached my terminal outputs as well as the associated log if there are any suggestions:
npm_install_and_build.log
2020-05-14T21_34_31_166Z-debug.log

Both before and after recompiling the frontend I am running into my final issue, which was touched upon to some extent in another issue #6
This is what I'm seeing (screenshots attached):

So long story short, do you have any suggestions? Additionally, what should replace those demo lines that are referenced in the logs?

Thanks so much!

loading weights with default model

Hi, thanks for your project.

I had a simple question, I didn't find answer to this query in the README. I want to use bert-base-uncased and bert-large-uncased with my weights (trained on a GLUE task) and want to visualize what you showed in demo on RHS (which token the model chooses to pay attention to). Could you please provide the recommended steps ?

How to avoid splitting [SEP]

Thanks for the library, this is very useful.

I want to use this to analyze attention for question answering models, like BertForQuestionAnswering models from Huggingface, and they are (automatically) cast into AutoModelWithLMHead, which is great, so it mostly works out of the box.

However, I want to use the [SEP] tokens in the middle of my sentence, and the spacyface aligner breaks it down to something like [ SE ##P ] . I wrote a post-processing function to patch these together during the tokenization, but that doesn't work. I never see any attention on the [SEP] token (which is not true). Plus it also breaks it down again (probably happening elsewhere in the code).

What
is
the
effect
of
[SEP]
[
SE
##P
]
and
what
[SEP]
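One way to keep a literal `[SEP]` atomic is to treat it as a reserved symbol before any word-level tokenization: split the text around the marker and tokenize only the spans in between. This is a standalone sketch of that idea, not a patch to the spacyface aligner (a real fix would also need the downstream BPE step to respect the reserved token):

```python
import re

SPECIAL = "[SEP]"

def split_protecting_special(text):
    # Split on the literal marker first, word-tokenize each remaining
    # chunk, then reinsert the marker between chunks.
    parts = []
    for chunk in re.split(re.escape(SPECIAL), text):
        parts.extend(chunk.split())
        parts.append(SPECIAL)
    return parts[:-1]  # the loop appends one trailing marker too many

print(split_protecting_special("What is the effect of [SEP] and what [SEP]"))
```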

Processing Reference Corpus [details]

Hi,

thank you for your excellent work!
Could you please tell us a little more about the process of getting embeddings for the reference corpus? It was not clear from the paper how your models 'processed this corpus' to produce embeddings. Also, could you point me to the code where this process happens?

My interpretation is the following: to get both token and context embeddings, you are basically:

  1. Running the GPT model (for example).
  2. Forcing every next input to be the word from the ground-truth text (otherwise the next input would be the predicted word, which is not necessarily correct and would affect the embeddings).
  3. Saving the concatenated heads, or the heads passed through the linear layer?
    The sentence 'The model then processes this corpus' from the paper was not fully clear to me: how is this processing conducted?

Thank you.

Attention block doesn't load when running locally

The attention block doesn't load when running locally.

(screenshot attached)

terminal log

Initializing reference map for embedding vector...
Initializing reference map for embedding vector...
AFTER SETUP
Initiating app
 * Serving Flask app "main" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://0.0.0.0:5555/ (Press CTRL+C to quit)
127.0.0.1 - - [22/Oct/2019 23:49:59] "GET / HTTP/1.1" 302 -
127.0.0.1 - - [22/Oct/2019 23:49:59] "GET /client/exBERT.html HTTP/1.1" 200 -
127.0.0.1 - - [22/Oct/2019 23:49:59] "GET /client/main.css HTTP/1.1" 200 -
127.0.0.1 - - [22/Oct/2019 23:49:59] "GET /client/img/exBERT.png HTTP/1.1" 200 -
127.0.0.1 - - [22/Oct/2019 23:49:59] "GET /client/main.js HTTP/1.1" 200 -
127.0.0.1 - - [22/Oct/2019 23:49:59] "GET /client/vendor.js HTTP/1.1" 200 -
127.0.0.1 - - [22/Oct/2019 23:49:59] "GET /client/05ca9c06114e79436ea9b5c8d4a7869c.ttf HTTP/1.1" 200 -
127.0.0.1 - - [22/Oct/2019 23:49:59] "GET /client/4171e41154ba857f85c536f167d581ba.ttf HTTP/1.1" 200 -
127.0.0.1 - - [22/Oct/2019 23:49:59] "GET /client/demo/354992f2ee236604c874a3a627e4042bc68586f8.json HTTP/1.1" 404 -
127.0.0.1 - - [22/Oct/2019 23:49:59] "GET /client/img/exBERT_favicon.png HTTP/1.1" 200 -
127.0.0.1 - - [22/Oct/2019 23:50:07] "GET /client/exBERT.html?sent-a-input=hello+world HTTP/1.1" 200 -
127.0.0.1 - - [22/Oct/2019 23:50:07] "GET /client/demo/354992f2ee236604c874a3a627e4042bc68586f8.json HTTP/1.1" 404 -

Does running locally require compiling the frontend? There are some npm dependencies that I can't resolve on either Linux (Ubuntu 18.04) or macOS:

npm WARN optional Skipping failed optional dependency /chokidar/fsevents:
npm WARN notsup Not compatible with your operating system or architecture: [email protected]
npm WARN [email protected] requires a peer of ava@0.* but none was installed.
npm WARN [email protected] requires a peer of ts-node@^3.0.0 || ^4.0.1 || ^5.0.0 || ^6.0.0 || ^7.0.0 but none was installed.
npm WARN [email protected] requires a peer of tslint@^4.0.0 || ^5.0.0 but none was installed.

Thank you.

Environment not able to recognize `from_pretrained` from `transformer_details`

Today I was trying to generate embeddings with the create_corpus.py script and the respective hdf5/faiss scripts. Initially things were working but in an attempt to try and figure out how to increase the max input length of text, I ended up re-pulling the repo.

What ensued, after multiple attempts to create brand new environments, was this error regarding `from_pretrained` and `transformer_details` from the hdf5 script (screenshot attached). There may also be an issue with the faiss script, but I haven't tried it.

Any idea what's causing this? Could it be a dependency issue from one of the packages that didn't have a specified version indicated? I've attached the environment package information below
exbert_dependencies.txt

provide data files?

Hi,

Is it possible that you provide some example data files? If I run the backend, it prompts errors such as exbert/server/data/woz/embeddings/combined.hdf5', errno = 2, error message = 'No such file or directory'. I tried to use the data tools to create such embeddings, but I encountered multiple errors during that process as well.

Thank you.
