Missing files/directories for running (protein_bert issue, closed)

skr3178 commented on July 17, 2024

Missing files/directories for running

from protein_bert.

Comments (4)

ddofer commented on July 17, 2024

Hi, you can download the PhosphoSite data from the PhosphoSitePlus website:

https://www.phosphosite.org/staticDownloads

nadavbra commented on July 17, 2024

These are huge files (about 1 TB, if I remember correctly), so it's very difficult to share them. You can create the dataset from scratch if you want (the README explains how to create the UniRef dataset).
What are you trying to do? Are you trying to replicate a specific analysis?

skr3178 commented on July 17, 2024

I was trying to get the model to run first, and neither notebook could be run in its default state.
These are some of the issues I encountered when running the model:

Environment and Python version compatibility (TensorFlow and scikit-learn in particular): after quite a few trials I found the configuration below to work, and I am posting it for anyone who needs it in the future.
load_pretrained_model() has an issue where, even after typing "yes", it does not load the .pkl file.
The workaround I found was to download the .pkl file manually using wget and rename it to "default.pkl". In addition, "DEFAULT_LOCAL_MODEL_DUMP_DIR" in existing_model_loading.py has to be updated to the directory containing the renamed "default.pkl".
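
The copy-and-rename part of this workaround can be sketched in Python instead of doing it by hand. This is only a minimal sketch: the helper name `stage_model_dump` and the example paths are hypothetical and not part of protein_bert, and it assumes the .pkl file has already been downloaded (for example with wget).

```python
import shutil
from pathlib import Path


def stage_model_dump(downloaded_file, target_dir, target_name="default.pkl"):
    """Copy a manually downloaded model dump into target_dir as target_name.

    Hypothetical helper, not part of protein_bert. Returns the new file path.
    """
    source = Path(downloaded_file).expanduser()
    if not source.is_file():
        raise FileNotFoundError(f"Downloaded model dump not found: {source}")
    if source.stat().st_size == 0:
        # An empty file usually means the download did not complete.
        raise ValueError(f"Downloaded model dump is empty: {source}")

    destination_dir = Path(target_dir).expanduser()
    destination_dir.mkdir(parents=True, exist_ok=True)

    destination = destination_dir / target_name
    shutil.copyfile(source, destination)
    return destination


# Example usage (both paths are placeholders):
# staged = stage_model_dump("~/Downloads/model.pkl", "~/proteinbert_models")
# print("Set DEFAULT_LOCAL_MODEL_DUMP_DIR to:", staged.parent)
```

The directory printed at the end is the value that "DEFAULT_LOCAL_MODEL_DUMP_DIR" would need to point to.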
The environment was created with Python 3.8.10. Below is the environment.yml file:

name: prob7
channels:
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=2_gnu
  - anyio=3.6.2=pyhd8ed1ab_0
  - argon2-cffi=21.3.0=pyhd8ed1ab_0
  - argon2-cffi-bindings=21.2.0=py38h0a891b7_3
  - asttokens=2.2.1=pyhd8ed1ab_0
  - attrs=22.2.0=pyh71513ae_0
  - babel=2.11.0=pyhd8ed1ab_0
  - backcall=0.2.0=pyh9f0ad1d_0
  - backports=1.1=pyhd3eb1b0_0
  - backports.functools_lru_cache=1.6.4=pyhd8ed1ab_0
  - beautifulsoup4=4.11.2=pyha770c72_0
  - bleach=6.0.0=pyhd8ed1ab_0
  - brotlipy=0.7.0=py38h0a891b7_1005
  - ca-certificates=2023.01.10=h06a4308_0
  - certifi=2022.12.7=py38h06a4308_0
  - cffi=1.15.1=py38h74dc2b5_0
  - comm=0.1.2=pyhd8ed1ab_0
  - cryptography=39.0.0=py38h1724139_0
  - cudatoolkit=11.2.2=hbe64b41_11
  - cudnn=8.1.0.77=h90431f1_0
  - debugpy=1.6.6=py38h8dc9893_0
  - decorator=5.1.1=pyhd8ed1ab_0
  - defusedxml=0.7.1=pyhd8ed1ab_0
  - entrypoints=0.4=pyhd8ed1ab_0
  - executing=1.2.0=pyhd8ed1ab_0
  - flit-core=3.8.0=pyhd8ed1ab_0
  - idna=3.4=pyhd8ed1ab_0
  - importlib-metadata=6.0.0=pyha770c72_0
  - importlib_metadata=6.0.0=hd8ed1ab_0
  - importlib_resources=5.10.2=pyhd8ed1ab_0
  - ipykernel=6.21.0=pyh210e3f2_0
  - ipython=8.9.0=pyh41d4057_0
  - ipython_genutils=0.2.0=py_1
  - jedi=0.18.2=pyhd8ed1ab_0
  - jinja2=3.1.2=pyhd8ed1ab_1
  - json5=0.9.6=pyhd3eb1b0_0
  - jsonschema=4.17.3=pyhd8ed1ab_0
  - jupyter_client=8.0.2=pyhd8ed1ab_0
  - jupyter_core=5.2.0=py38h578d9bd_0
  - jupyter_events=0.6.3=pyhd8ed1ab_0
  - jupyter_server=2.2.0=pyhd8ed1ab_0
  - jupyter_server_terminals=0.4.4=pyhd8ed1ab_1
  - jupyterlab=3.5.3=pyhd8ed1ab_0
  - jupyterlab_pygments=0.2.2=pyhd8ed1ab_0
  - jupyterlab_server=2.19.0=pyhd8ed1ab_0
  - ld_impl_linux-64=2.38=h1181459_1
  - libffi=3.3=he6710b0_2
  - libgcc-ng=12.2.0=h65d4601_19
  - libgomp=12.2.0=h65d4601_19
  - libsodium=1.0.18=h36c2ea0_1
  - libstdcxx-ng=12.2.0=h46fd767_19
  - markupsafe=2.1.2=py38h1de0b5d_0
  - matplotlib-inline=0.1.6=pyhd8ed1ab_0
  - mistune=2.0.4=pyhd8ed1ab_0
  - nb_conda=2.2.1=py38h06a4308_1
  - nb_conda_kernels=2.3.1=py38h06a4308_0
  - nbclassic=0.4.8=pyhd8ed1ab_0
  - nbclient=0.7.2=pyhd8ed1ab_0
  - nbconvert=7.2.9=pyhd8ed1ab_0
  - nbconvert-core=7.2.9=pyhd8ed1ab_0
  - nbconvert-pandoc=7.2.9=pyhd8ed1ab_0
  - nbformat=5.7.3=pyhd8ed1ab_0
  - ncurses=6.4=h6a678d5_0
  - nest-asyncio=1.5.6=pyhd8ed1ab_0
  - notebook=6.5.2=pyha770c72_1
  - notebook-shim=0.2.2=pyhd8ed1ab_0
  - openssl=1.1.1s=h7f8727e_0
  - packaging=23.0=pyhd8ed1ab_0
  - pandoc=2.19.2=ha770c72_0
  - pandocfilters=1.5.0=pyhd8ed1ab_0
  - parso=0.8.3=pyhd8ed1ab_0
  - pexpect=4.8.0=py38h32f6830_1
  - pickleshare=0.7.5=py38h32f6830_1002
  - pip=22.3.1=py38h06a4308_0
  - pkgutil-resolve-name=1.3.10=pyhd8ed1ab_0
  - platformdirs=2.6.2=pyhd8ed1ab_0
  - prometheus_client=0.16.0=pyhd8ed1ab_0
  - prompt-toolkit=3.0.36=pyha770c72_0
  - psutil=5.9.4=py38h0a891b7_0
  - ptyprocess=0.7.0=pyhd3deb0d_0
  - pure_eval=0.2.2=pyhd8ed1ab_0
  - pycparser=2.21=pyhd8ed1ab_0
  - pygments=2.14.0=pyhd8ed1ab_0
  - pyopenssl=23.0.0=pyhd8ed1ab_0
  - pyrsistent=0.19.3=py38h1de0b5d_0
  - pysocks=1.7.1=py38h578d9bd_5
  - python=3.8.10=h12debd9_8
  - python-dateutil=2.8.2=pyhd8ed1ab_0
  - python-fastjsonschema=2.16.2=pyhd8ed1ab_0
  - python-json-logger=2.0.4=pyhd8ed1ab_0
  - python_abi=3.8=2_cp38
  - pytz=2022.7.1=pyhd8ed1ab_0
  - pyyaml=6.0=py38h0a891b7_5
  - pyzmq=25.0.0=py38he24dcef_0
  - readline=8.2=h5eee18b_0
  - requests=2.28.2=pyhd8ed1ab_0
  - rfc3339-validator=0.1.4=pyhd8ed1ab_0
  - rfc3986-validator=0.1.1=pyh9f0ad1d_0
  - send2trash=1.8.0=pyhd8ed1ab_0
  - setuptools=65.6.3=py38h06a4308_0
  - six=1.16.0=pyh6c4a22f_0
  - sniffio=1.3.0=pyhd8ed1ab_0
  - soupsieve=2.3.2.post1=pyhd8ed1ab_0
  - sqlite=3.40.1=h5082296_0
  - stack_data=0.6.2=pyhd8ed1ab_0
  - terminado=0.17.1=pyh41d4057_0
  - tinycss2=1.2.1=pyhd8ed1ab_0
  - tk=8.6.12=h1ccaba5_0
  - tomli=2.0.1=pyhd8ed1ab_0
  - tornado=6.2=py38h0a891b7_1
  - traitlets=5.9.0=pyhd8ed1ab_0
  - typing-extensions=4.4.0=hd8ed1ab_0
  - typing_extensions=4.4.0=pyha770c72_0
  - urllib3=1.26.14=pyhd8ed1ab_0
  - wcwidth=0.2.6=pyhd8ed1ab_0
  - webencodings=0.5.1=py_1
  - websocket-client=1.5.0=pyhd8ed1ab_0
  - wheel=0.37.1=pyhd3eb1b0_0
  - xz=5.2.10=h5eee18b_1
  - yaml=0.2.5=h7f98852_2
  - zeromq=4.3.4=h9c3ff4c_1
  - zipp=3.12.0=pyhd8ed1ab_0
  - zlib=1.2.13=h5eee18b_0
  - pip:
    - absl-py==1.4.0
    - astunparse==1.6.3
    - cachetools==5.3.0
    - charset-normalizer==3.0.1
    - cycler==0.11.0
    - flatbuffers==1.12
    - gast==0.4.0
    - google-auth==2.16.0
    - google-auth-oauthlib==0.4.6
    - google-pasta==0.2.0
    - grpcio==1.51.1
    - h5py==3.8.0
    - joblib==1.2.0
    - keras==2.9.0
    - keras-preprocessing==1.1.2
    - kiwisolver==1.4.4
    - libclang==15.0.6.1
    - lxml==4.9.2
    - markdown==3.4.1
    - matplotlib==3.2.2
    - numpy==1.24.1
    - oauthlib==3.2.2
    - opt-einsum==3.3.0
    - pandas==1.3.5
    - protobuf==3.19.6
    - pyasn1==0.4.8
    - pyasn1-modules==0.2.8
    - pyparsing==3.0.9
    - requests-oauthlib==1.3.1
    - rsa==4.9
    - scikit-learn==1.0.2
    - scipy==1.10.0
    - tensorboard==2.9.1
    - tensorboard-data-server==0.6.1
    - tensorboard-plugin-wit==1.8.1
    - tensorflow==2.9.2
    - tensorflow-estimator==2.9.0
    - tensorflow-io-gcs-filesystem==0.30.0
    - termcolor==2.2.0
    - threadpoolctl==3.1.0
    - werkzeug==2.2.2
    - wrapt==1.14.1
prefix: /home/skr/anaconda3/envs/prob7
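
To check whether an existing environment matches the key pins above before running the notebooks, something like the following may help. It is a rough sketch rather than an official check: the function name `find_mismatches` is made up for illustration, and the package list is just a subset of the pip section above.

```python
import sys
from importlib import metadata

# Pins copied from the pip section of the environment.yml above.
EXPECTED = {
    "tensorflow": "2.9.2",
    "keras": "2.9.0",
    "scikit-learn": "1.0.2",
    "numpy": "1.24.1",
    "pandas": "1.3.5",
    "h5py": "3.8.0",
}


def find_mismatches(expected, get_version=metadata.version):
    """Return {package: (wanted, installed)} for every package that differs.

    installed is None when the package is not installed at all.
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "(the environment above uses 3.8.10)")
    for package, (wanted, installed) in find_mismatches(EXPECTED).items():
        print(f"{package}: expected {wanted}, found {installed or 'not installed'}")
```

A mismatch does not necessarily mean the model will fail to run; these are only the versions that worked in this particular setup.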

skr3178 commented on July 17, 2024

@nadavbra I couldn't find the file "PhosphositePTM.train.csv".
There was a comment in the commits about the file being too large (50 MB).
How can I access it?
