neuronets / trained-models
Trained TensorFlow models for 3D image processing
Home Page: https://neuronets.dev/trained-models
@gaiborjosue See here for a cleaner way of adding models to the osf-storage without mixing up git-annex and DataLad. Let me know if you need more clarity about how all these tools work. I suggest we stay away from `git annex addurl` as we have done in the past.
@Hoda1394 and @kaczmarj - renamed this repo, so it would allow us to track models that others have published as well. let's include a license file for every model that's included.
Originally posted by @satra in #4 (comment)
@satra Can you please advise about the type of license file to add?
I tried to push a model to OSF after adding it with `git annex addurl`, and I got an error. Here is what I did, while in the root path of trained_models:

```shell
mkdir -p UCL/SynthSeg/2.0.0/conventional/weights  # -p creates the intermediate directories
git annex addurl --relaxed --file UCL/SynthSeg/2.0.0/conventional/weights/synthseg_2.0.h5 "https://www.dropbox.com/s/nu8ap1iicmute3y/synthseg_2.0.h5?dl=1"
datalad get UCL/SynthSeg/2.0.0/conventional/weights/synthseg_2.0.h5
datalad push --to origin
```
I am getting this error:

```
[ERROR ] KeyError('bytesize') (KeyError)
```

and the push gets aborted. I tested downloading the model file with this link and it works, so the model file itself seems to be OK.
The original Dropbox link ends with `dl=0`; changing it to `dl=1` makes it possible to get the file with `curl` or `wget`, and the file works fine when used for inference.
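If it helps automation, the `dl=0` to `dl=1` rewrite can be scripted before registering the URL; a minimal sketch (not part of the repo tooling):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def dropbox_direct_url(url: str) -> str:
    """Rewrite a Dropbox share link so it returns the file itself (dl=1)."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query["dl"] = "1"  # dl=0 serves a preview page; dl=1 serves the file
    return urlunsplit(parts._replace(query=urlencode(query)))
```

This leaves the rest of the URL untouched, so the rewritten link can be handed directly to `git annex addurl` or `datalad download-url`.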
Referring to bullet point 3 in add_model_instructions.md
- We should point to files on the upstream repo and not the forked repo.
Personal observations/experiences (will be haphazard):
actions-cool/issues-helper@v3 is not allowed to be used in neuronets/trained-models. Actions in this workflow must be: within a repository owned by neuronets.
See https://github.com/neuronets/trained-models/actions/runs/6612609799
@hvgazula Check if this has to do with GH_TOKEN or anything else. #85
This will update instructions and templates for adding a new model to the repo.
Hello,
I have added a step to the upcoming issue-form workflow that retrieves the file extension and uses it to download the weights and sample dataset correctly into a local file with `datalad download-url`.
The step for the above-mentioned is the following: https://github.com/gaiborjosue/trained-models-fork/actions/runs/6422584380/workflow#L332
The Python script used to retrieve the information is the following: https://github.com/gaiborjosue/trained-models-fork/blob/master/.github/workflows/getFileExtension.py
The reason for adding such a thing to the workflow is that users can submit their weights in multiple formats (.h5, .pth, etc.) and sample datasets (.nii, .nii.gz, etc.), and there is currently no way for us to know that extension.
However, the above-attached Python script, which uses `requests` to extract info from the URL metadata, currently only works with Google direct-download URLs. GitHub raw URLs don't provide that information in the metadata, and the same is true of OneDrive download URLs.
My question is the following: is there a better way to extract metadata from the URLs so that, regardless of the source, we still get back the file extension? Or should we ask users to use only Google Drive to submit the weights and sample dataset URLs?
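One host-agnostic option (an assumption, not a guarantee: some hosts, like GitHub raw, may still omit the header) is to prefer the `Content-Disposition` filename when the response provides one, and fall back to the URL path otherwise. A sketch:

```python
import re
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit

def extension_from_headers(headers: dict, url: str) -> str:
    """Best-effort extension: Content-Disposition filename first, URL path second."""
    disposition = headers.get("Content-Disposition", "")
    match = re.search(r"filename\*?=(?:UTF-8'')?\"?([^\";]+)\"?", disposition)
    name = unquote(match.group(1)) if match else urlsplit(url).path
    return "".join(PurePosixPath(name).suffixes)  # keeps double extensions like .nii.gz
```

The headers dict would come from a `HEAD` request (e.g. `requests.head(url, allow_redirects=True).headers`); when both sources fail, the workflow could fall back to asking the user for the extension in the issue form.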
Thanks!
in the nobrainer Dockerfiles this is executed: https://github.com/neuronets/nobrainer/blob/4059376723379f19adc49ea1d67e6ebaacdd3fea/docker/cpu.Dockerfile#L14
but it results in not being able to fetch these two objects:

```
#12 77.77 get(error): DDIG/SynthStrip/1.0.0/weights/synthstrip.1.pt (file) [not available; (Note that these git remotes have annex-ignore set: origin)]
#12 77.77 get(error): neuronets/kwyk/0.4.1/bwn_multi/weights/saved_model.pb (file) ['MD5E-s195234--50d0e6fbeb3d84be1dbfd241726ee2e8.pb'
#12 77.77 'MD5E-s195234--50d0e6fbeb3d84be1dbfd241726ee2e8.pb'
#12 77.77 'MD5E-s195234--50d0e6fbeb3d84be1dbfd241726ee2e8.pb']
```
i think there should be a test in this repo that is similar.
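A minimal CI check along those lines could simply try to fetch every annexed file on each pull request (a sketch; the install steps and the assumption that all remotes are reachable from the runner are mine):

```yaml
name: check-annex-availability
on: [pull_request]
jobs:
  get-all:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: sudo apt-get update && sudo apt-get install -y git-annex
      - run: pip install datalad
      - name: Try to fetch every annexed file
        run: datalad get -r .   # fails the job if any object is unavailable
```

This would have caught both `get(error)` cases above before they surfaced in the nobrainer Docker build.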
currently missing info on how to run the model.
trained-models/.github/workflows/new_model.yml (lines 44 to 47 in 8cdfa99)
fix: checkout the repo with a `fetch-depth` of 0. For more info, please refer to https://github.com/actions/checkout/blob/main/README.md
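In checkout terms, the fix could look like this (a sketch; the action version is an assumption):

```yaml
- uses: actions/checkout@v4
  with:
    fetch-depth: 0   # 0 fetches all history for all branches and tags, not a shallow clone
```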
users should be able to run the docker/singularity-ized forms of these models. currently the readme does not cover how to do this, nor does it point to the docker container.
users have to navigate to the release to understand how to load the model and do transfer learning. this should become a Colab tutorial.
there should also be a tutorial inference example with the pre-trained model + nibabel (i.e. the docker code).
`add_model.yml` and `get_model_data.yml` don't seem to be working as expected. We should probably rewrite them or think of cleaner/better ways to integrate them into the workflow.
change_label.yml runs every time a comment or reply is added to an issue. We should constrain this behavior to only issues that are specific to adding models and not all issues. Creating labels is one way to minimize unnecessary runs.
For example, we should change
trained-models/.github/workflows/change_label.yml (lines 4 to 6 in b49f52d)

```yaml
on:
  issues:
    types: [labeled]
```

and put the onus of labeling/commenting on the user post-failure.

TODO: I will be adding the UniverSeg model to this repo within the DDIG folder.
Project webpage: https://universeg.csail.mit.edu
Paper: http://arxiv.org/abs/2304.06131
Model weights: https://github.com/JJGO/UniverSeg/releases/tag/weights
https://github.com/balbasty/nitorch
@balbasty - posting this here. @Hoda1394 can guide as to what is needed.
Note that the path binding (`--bind`) to `/output` in the singularity calls is never used in the test command. See below for why this works. For more info, please refer to the outputs of the successful workflow runs on a simplified version of `new_model.yml`.
Please refer to https://github.com/neuroneural/Vox2Cortex_fork for all the tools needed to integrate this model.
trained-models/DeepCSR/deepcsr/1.0/spec.yaml (lines 22 to 41 in e6d42db)
@gaiborjosue We should remove some of these fields as they will be repeated in the model card.
Hello @hvgazula, nice. Now we need to add the env variables and secrets. These should be related to EC2 and a GitHub token with issue and workflow permissions.
Originally posted by @gaiborjosue in #83 (comment)
There is a mismatch between the hardware supported by the CUDA versions referenced in the Dockerfiles and the hardware we have at our disposal (for testing purposes). Requesting said resources (which are outside gablab reserved resources) is taking a lot of time.
Building a zoo that can hold different models from different environments means we should have the requisite range of hardware (gpu cards) to support such diverse environments. However, the range of cards we have is limiting and thus impacts our ability to test all models. Am I missing any other simple solution to overcome this problem?
Hello,
While building a singularity image from Docker Hub, I get this error:
The command I used was: `singularity build pialnn.sif docker://edwardjosue2005/pialnn`
@hvgazula tested building the image on his end, and it did work:
I already changed singularity's cache directory to om2.
@gaiborjosue quick question: looking at the add-model instructions, you said the path to data should be relative to the repo structure. But at inference time, in the zoo command line, the data can be anywhere on the system. In that case, a relative path will raise an error, won't it?
https://github.com/neuronets/trained-models/blob/master/add_model_instructions.md#8-test-command
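One way to accept both cases (a sketch of mine, not the zoo's actual API) is to pass absolute paths through untouched and resolve relative ones against the repo root:

```python
from pathlib import Path

def resolve_data_path(user_path: str, repo_root: str) -> Path:
    """Absolute paths pass through; relative paths resolve against the repo root."""
    path = Path(user_path)
    return path if path.is_absolute() else Path(repo_root) / path
```

With this, the spec can keep repo-relative paths while command-line users point anywhere on the system.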
Hello,
I am considering adding a field "Test-command" in the upcoming issue-form workflow to add a new model to this repo.
This is because to test the Python scripts, we need to know how to run those scripts. If we use spec.yaml for that, it contains placeholders for the path, such as "{infile[0]}"
So, I suggest adding a new input text field so that users input how to run their predict.py file (it can be called whatever name) without any flags containing the paths.
For example, the user would input:
`python predict.py --length 15 --height 50`
This will enable us to manually add the paths to the weights and sample dataset while testing, following our directory structure as suggested by @hvgazula.
For example, we would add the following paths:
`python predict.py --length 15 --height 50 --weights ./Model/best_model.pth --sampleDataset ./Model/test.nii.gz`
This way, the user only gives the barebone test command to the script, and we worry about adding paths automatically.
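Appending the managed paths to the barebone command can be done safely with `shlex` (a sketch; the flag names follow the example above and would really have to come from each model's interface):

```python
import shlex

def with_data_paths(test_command: str, weights: str, sample: str) -> str:
    """Append repo-managed weight/data paths to a user-supplied test command."""
    parts = shlex.split(test_command)  # tokenize like a POSIX shell would
    parts += ["--weights", weights, "--sampleDataset", sample]
    return shlex.join(parts)           # re-quote anything that needs it
```

Using `shlex` rather than plain string concatenation keeps paths with spaces or shell metacharacters from breaking the generated command.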
Thanks.
Maybe we should replace it with a separate action.yml which is then used by new_model.yml? @gaiborjosue What do you think?
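A composite action could factor the shared steps out of `new_model.yml` (a sketch; the file layout, input names, and step body are assumptions):

```yaml
# .github/actions/add-model/action.yml (hypothetical path)
name: add-model
description: Shared steps for adding a model, callable from new_model.yml
inputs:
  weights-url:
    required: true
runs:
  using: composite
  steps:
    - name: Download the weights into the dataset
      run: datalad download-url "${{ inputs.weights-url }}"
      shell: bash   # composite run steps must declare a shell
```

`new_model.yml` would then reference it with `uses: ./.github/actions/add-model`.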
This will enable users to add their model by only entering URLs for their folders/files, and the workflow will add their model to this repository.
If the user adds a model and the action fails, it will assign the issue a label "failed," and the user can update the URLs and change the label to rerun the action. Also, if the user already has a model in the zoo and wants to update it following the recommendations to the user specified in the form, there is a second issue that will update specific files/folders of the model.
The workflow will also open a Pull request, create a branch, and link it to the opened issue. When merged, the issue will close.
Pending tasks for the workflow actions/steps:
Update Model Workflow:
Add Model Workflow:
These updates can be seen in my forked repository: https://github.com/gaiborjosue/trained-models-fork/tree/master/.github
Also, these updates will overwrite/disable the previous workflow of adding a model directly through Pull Request #67 since adding/updating through an issue form is an enhancement to the pipeline.
I reckon this could be important because the workflow will accumulate unnecessary files even when adding a model fails.
Hello @satra and @hvgazula, I hope you are doing well.
Following your suggestion of activating environments at startup in a singularity image, I can't find a working solution. I tried Neurodocker's recipe-file generation, but when building the singularity image I got the following:

```
FATAL: You must be the root user, however you can use --remote or --fakeroot to build from a Singularity recipe file
```
Also, referring to this issue ReproNim/neurodocker#354 and ReproNim/neurodocker#346 the general conclusion seems to be to install everything in base.
In our case, we are not building a singularity image from scratch; we simply build a sif image from a Docker image available on the hub.
Your help and guidance on this issue would be greatly appreciated.
Thank you!
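Following the install-everything-in-base conclusion, the Docker image itself can avoid activation hooks entirely, so the converted sif behaves the same as the Docker image (a sketch; the base image and packages are assumptions):

```dockerfile
FROM python:3.10-slim
# Install into the base interpreter: there is nothing to "activate" at startup,
# so `singularity build model.sif docker://...` produces a working image as-is.
RUN pip install --no-cache-dir nibabel
```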
Pretty much all surface reconstruction models from William work with multiple files at once. I am on the lookout for models that work with only one file at inference time. This is to enable testing out the workflow.
This is the model that needs to be integrated https://github.com/neuroneural/PialNN_fork
Please confirm the final version to be used throughout the workflow, or should they be different because they serve different purposes?
The docker image build-and-push workflow fails at `tar -xf CBSI.tar.gz`.
Is it possible to use the braingen model for transfer learning for a segmentation problem, similar to how brainy was used for transfer learning in the AMS paper?
In the following line, `docker push ${{ needs.build-docker.outputs.IMAGENAME }}` should be replaced with `docker push ${{ needs.build-docker.outputs.IMAGENAME }}:${{ steps.modelVersion.outputs.model_version }}`; otherwise the default image gets pushed to the hub. For example, see below. I have a test image and tagged it as `aws`. Docker created a new image instead of renaming the existing one, meaning that when you run `docker push test` it is equivalent to `docker push test:latest` and not `docker push test:aws`.
Directly downloading models from Google Drive URLs is not always possible. For example, if the file is too big, Google serves a warning page, and the HTML is downloaded instead of the actual file. This is not a problem for small files. A workaround is to extract the direct-download URL from the JavaScript elements of that page, but this is a hassle.
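For small files, the share link can at least be normalized to a direct-download form before handing it to `datalad download-url` (a sketch; large files will still hit the warning page, which needs the confirm-token dance or a tool built for it):

```python
import re

def gdrive_direct_url(share_url: str) -> str:
    """Turn a Google Drive share link into the uc?export=download form."""
    match = re.search(r"/file/d/([\w-]+)", share_url) or re.search(r"[?&]id=([\w-]+)", share_url)
    if not match:
        raise ValueError(f"not a recognized Google Drive URL: {share_url}")
    return f"https://drive.google.com/uc?export=download&id={match.group(1)}"
```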
When running `datalad save -m "Added model X"` to add a new model to trained_models, as specified in https://github.com/gaiborjosue/trained-models/blob/master/add_model_instructions.md, I keep getting this error:

```
datalad.runner.exception.CommandError: CommandError: 'git --git-dir=/dev/null config -z -l --show-origin' failed with exitcode 129 [err: 'error: unknown option `show-origin'']
```

I have already checked that I have the latest git, git-annex, and datalad versions.
The saved model and predict.py scripts can be found at https://github.com/neuroneural/topofit_fork.
The utility script should have a template (yml/config/dict) for the model card which can be used in 2 ways. One for displaying during the PR creation (for merging model) and the other for converting the values entered in this template for creating the model_card.md
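The dual use (PR display and `model_card.md`) suggests one template dict plus a single renderer (a sketch; the field names are assumptions, not an agreed schema):

```python
def render_model_card(fields: dict) -> str:
    """Render the shared template to markdown, for either the PR body or model_card.md."""
    lines = [f"# {fields['name']}"]            # 'name' is required in this sketch
    for key, value in fields.items():
        if key != "name":
            lines.append(f"- **{key}**: {value}")
    return "\n".join(lines)
```

The same dict could be validated at issue-submission time, so a malformed template fails before the PR is opened.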
Please refer to https://github.com/neuroneural/corticalflow_fork for all the tools needed to add this model.
Originally posted by hvgazula July 20, 2023
How do we ensure the integrity of the `train/predict/{any other}.py` scripts uploaded by the users? Are the scripts really doing their job? Do we test this in `trained-models` or `nobrainer-zoo`?
axial and coronal are swapped.
Primarily around clinical imaging, thus I guess operating on DICOMs, but overall it sounds related -- check out e.g. https://mhub.ai/models.html. Maybe the same portal could/should be reused for this effort too?
when this repo is cloned, it would be nice to be able to download all of the pre-trained models easily. for example...
```shell
git clone https://github.com/neuronets/nobrainer-models
cd nobrainer-models
get-models ???
```
i tried using git-annex in the branch https://github.com/neuronets/nobrainer-models/tree/add/gitannex but when i clone the repository, the location of https://github.com/neuronets/nobrainer-models/blob/add/gitannex/sig/ams/meningioma_T1wc_128iso_v1.h5 cannot be found. it is available online at https://dl.dropbox.com/s/whbeot2wriab9v2/meningioma_T1wc_128iso_v1.h5.
when i run `git-annex whereis` on my laptop, i get the correct remote:

```
$ git-annex whereis
whereis sig/ams/meningioma_T1wc_128iso_v1.h5 (2 copies)
  00000000-0000-0000-0000-000000000001 -- web
  af2fc714-62b2-48ea-9a90-b62db6ff2aa7 -- jakub@dash:/code/nobrainer-models [here]
  web: https://dl.dropbox.com/s/whbeot2wriab9v2/meningioma_T1wc_128iso_v1.h5
ok
```
@yarikoptic is there any chance you can help me out with this? how can i clone this repository and have git-annex know the correct remote urls?