
lhackel-tub / configilm


A Library for configurable combination of pre-configured and possibly pre-trained Image and Language Models

Home Page: https://lhackel-tub.github.io/ConfigILM/

License: MIT License

Python 100.00%


configilm's Issues

Rename ILMTypes to Image as well

The ILMTypes are currently named Vision... instead of Image..., even though everything else is named Image.... They should be renamed accordingly.

Tests are very slow

There are a lot of tests, and each of them is very small. Optimizing the tests could improve development speed.

Rename to ConfigILM

The current naming suggests that any Vision-related task is included. However, the Library is specific to Images. Therefore the library should be renamed to ConfigILM - including all references for vision within.

Add benchmark models from the literature to be able to test fast against new methods

If the library implemented some benchmark models using the configurations, end users could quickly test against existing methods. In addition, these benchmarks would serve as examples of how the library can be used.

Possible starting points could be:

Add a simple training script without logging

The current Scripts use WandB for logging. However, this requires a WandB account and Internet access to work properly. To showcase how to use the framework, logging is not necessary.

Extend Docstrings

Documentation (docstrings) should be extended/created for

  • ConfigILM.get_timm_model

  • ConfigILM.ILMType

  • ConfigILM.ILMConfiguration

  • ConfigILM.ConfigILM.__init__

  • ConfigILM.ConfigILM._check_input

  • ConfigILM.ConfigILM.forward

  • ConfigILM.util.huggingface_tokenize_and_pad

  • ConfigILM.extra.BEN_lmdb_utils (all)

  • ConfigILM.extra.CustomTorchClasses (all)

  • ConfigILM.extra.RSVQAxBEN_DataModule_LMDB_Encoder (all)

  • ConfigILM.extra.BEN_DataModule_LMDB_Encoder (all)

Examples for pure Pytorch

Currently, the example scripts are written only in PyTorch Lightning. Example scripts (one per use case) should be added in pure PyTorch as well.

can't load full RSVQALR dataset

The RSVQALR dataset cannot be loaded because a KeyError is raised in _get_question_answers when loading the answers. The problem is that inactive answers are still part of the answer list before they are filtered, so the question_id key is accessed for inactive elements, which do not have this attribute.
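
A minimal sketch of one possible fix direction: filter out inactive answers before reading question_id. The record layout (dicts with "active", "question_id" and "answer" keys) and the helper name index_active_answers are assumptions made for illustration; this is not the library's actual _get_question_answers.

```python
def index_active_answers(answers):
    """Map question_id -> answer text, skipping inactive entries.

    Assumed layout: every entry is a dict with an "active" flag, and
    inactive entries may lack the "question_id" key entirely.
    """
    indexed = {}
    for entry in answers:
        # Filter first, so the key lookup below never sees inactive entries.
        if not entry.get("active", False):
            continue
        indexed[entry["question_id"]] = entry["answer"]
    return indexed


# Hypothetical sample data, not taken from the real dataset.
answers = [
    {"id": 0, "active": True, "question_id": 10, "answer": "yes"},
    {"id": 1, "active": False},  # inactive: no question_id key
    {"id": 2, "active": True, "question_id": 11, "answer": "3"},
]
active_index = index_active_answers(answers)
```

The point of the sketch is only the ordering: the activity check happens before any access to question_id.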

Training is very slow

The current version uses a pytorch/huggingface (or maybe other dependencies) combination that is very slow, as shown in this and this issue.

The current workaround would be to use version 0.3.0 and manually update the packages (e.g., timm) after installing ConfigILM until a fix is given in the dependencies or otherwise known.

psutil not a requirement

psutil is not listed as an installation requirement, but it is used in the RSVQAxBEN data module. The import fails unless psutil is installed separately.
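
Besides declaring psutil as a requirement, the import could be guarded so that the module still loads without it. The sketch below assumes psutil is only needed to query the CPU count; that purpose and the helper name available_cpu_count are hypothetical, since the issue does not say what the data module uses psutil for.

```python
import os

try:
    import psutil  # optional third-party dependency
except ImportError:
    psutil = None


def available_cpu_count():
    """Return the number of logical CPUs, with or without psutil."""
    if psutil is not None:
        # psutil.cpu_count() may return None if the count is undetermined.
        count = psutil.cpu_count(logical=True)
        if count:
            return count
    # Standard-library fallback; os.cpu_count() may also return None.
    return os.cpu_count() or 1


num_cpus = available_cpu_count()
```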

API Documentation

API documentation, including the docstring descriptions, should be added to the guide so that users can look up e.g. parameter names for specific parts of the library.

optional omitting of ground truth label entries

BENLMDBReader should offer an option to omit specified classes from the returned ground truth label vectors (and shrink the vectors accordingly). This can be helpful when working with subsets of BigEarthNet that do not contain all the classes.
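
The requested behaviour could look roughly like the following sketch. The function drop_classes and the class names are placeholders invented for this example; they are not part of BENLMDBReader and are not the real BigEarthNet class names.

```python
def drop_classes(label_vector, class_names, omitted):
    """Return (reduced_vector, kept_class_names) without the omitted classes."""
    omitted = set(omitted)
    unknown = omitted - set(class_names)
    if unknown:
        raise ValueError(f"unknown classes: {sorted(unknown)}")
    # Indices of the classes that remain, in their original order.
    keep = [i for i, name in enumerate(class_names) if name not in omitted]
    return [label_vector[i] for i in keep], [class_names[i] for i in keep]


# Placeholder class names and a multi-hot label vector.
class_names = ["forest", "water", "urban", "pasture"]
labels = [1, 0, 1, 1]
reduced, kept = drop_classes(labels, class_names, omitted=["water", "pasture"])
```

Returning the kept class names alongside the reduced vector keeps the mapping from vector position to class explicit after the reduction.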

Choose More Intuitive Parameter Names For Loading Pretrained Models

The parameters load_hf_if_available and load_timm_if_available of the configilm.ILMConfiguration class are switches for loading a pretrained huggingface or timm model according to this docstring. Their purpose could be made clearer and more explicit by renaming them to something like load_pretrained_hf_if_available and load_pretrained_timm_if_available respectively.

This is actually done in a less user-facing function here.
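
If the parameters were renamed, the old names could keep working for a transition period. The sketch below shows one way to do that with a DeprecationWarning. ExampleConfig is a hypothetical stand-in, not configilm.ILMConfiguration, and the default values are assumptions.

```python
import warnings


class ExampleConfig:
    """Hypothetical stand-in that accepts both old and new parameter names."""

    _RENAMED = {
        "load_hf_if_available": "load_pretrained_hf_if_available",
        "load_timm_if_available": "load_pretrained_timm_if_available",
    }

    def __init__(self, load_pretrained_hf_if_available=True,
                 load_pretrained_timm_if_available=True, **legacy):
        values = {
            "load_pretrained_hf_if_available": load_pretrained_hf_if_available,
            "load_pretrained_timm_if_available": load_pretrained_timm_if_available,
        }
        for old_name, value in legacy.items():
            if old_name not in self._RENAMED:
                raise TypeError(f"unexpected keyword argument {old_name!r}")
            new_name = self._RENAMED[old_name]
            # Warn the caller, then apply the value under the new name.
            warnings.warn(f"{old_name} is deprecated, use {new_name}",
                          DeprecationWarning, stacklevel=2)
            values[new_name] = value
        self.load_pretrained_hf_if_available = values["load_pretrained_hf_if_available"]
        self.load_pretrained_timm_if_available = values["load_pretrained_timm_if_available"]


# Using the old name still works but emits a DeprecationWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    config = ExampleConfig(load_hf_if_available=False)
```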

Classes in DataModules

The provided DataModules do not expose a way to change the number of classes in the datasets. They are therefore unusable when a different number of classes is required or wanted.

Create an abstract class that all datamodules inherit from

The datamodules contain a lot of redundant code. To make them more flexible and less duplicated, an abstract class (based on the Lightning datamodule) should be implemented that all datamodules inherit from.
The tests should also use this class to test all basic functionality, so that only the additional, datamodule-specific functionality remains in the individual test classes.
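
The idea can be sketched with the standard-library abc module. To stay self-contained, the example does not derive from the Lightning datamodule as the issue proposes; BaseDataModule, ToyDataModule, build_dataset and batches are hypothetical names.

```python
from abc import ABC, abstractmethod


class BaseDataModule(ABC):
    """Hypothetical shared base holding logic common to all datamodules."""

    def __init__(self, batch_size=16):
        self.batch_size = batch_size
        self.train_ds = None
        self.val_ds = None

    @abstractmethod
    def build_dataset(self, split):
        """Return the dataset for one split; the only per-dataset part."""

    def setup(self):
        # Shared setup logic lives here once instead of in every subclass.
        self.train_ds = self.build_dataset("train")
        self.val_ds = self.build_dataset("val")

    def batches(self, dataset):
        # Simple stand-in for a DataLoader: yield consecutive chunks.
        for start in range(0, len(dataset), self.batch_size):
            yield dataset[start:start + self.batch_size]


class ToyDataModule(BaseDataModule):
    """Concrete subclass that only supplies the dataset-specific part."""

    def build_dataset(self, split):
        return list(range(10 if split == "train" else 4))


dm = ToyDataModule(batch_size=4)
dm.setup()
train_batches = list(dm.batches(dm.train_ds))
```

With this split, generic tests can target the base-class behaviour (setup, batching) once, and each subclass test only needs to cover build_dataset.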

Multi-Dim Fusion

The current fusion implementation only supports fusion types with the same input and output dimensions. However, there are some fusion types (e.g. MUTAN, as in this implementation) where the fusion output dimension can differ from the input dimension.
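
To illustrate what a differing output dimension means, here is a deliberately simple fusion that projects both inputs to out_dim and multiplies them element-wise. This is not MUTAN and not the library's fusion code; it uses plain Python lists instead of tensors, and ProjectedProductFusion is a name made up for this sketch.

```python
import random


def random_matrix(in_dim, out_dim, rng):
    """Random weight matrix with out_dim rows and in_dim columns."""
    return [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]


def project(weight, vector):
    """Matrix-vector product: maps a vector of length in_dim to out_dim."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weight]


class ProjectedProductFusion:
    """Project both inputs to out_dim, then multiply element-wise."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.in_dim = in_dim
        self.out_dim = out_dim
        self.w_a = random_matrix(in_dim, out_dim, rng)
        self.w_b = random_matrix(in_dim, out_dim, rng)

    def __call__(self, a, b):
        if len(a) != self.in_dim or len(b) != self.in_dim:
            raise ValueError("both inputs must have length in_dim")
        pa, pb = project(self.w_a, a), project(self.w_b, b)
        return [x * y for x, y in zip(pa, pb)]


fusion = ProjectedProductFusion(in_dim=4, out_dim=2)
fused = fusion([1.0, 0.0, 2.0, -1.0], [0.5, 0.5, 0.5, 0.5])
```

Any code that consumes the fused vector has to read its size from the fusion (here out_dim) rather than assume it equals the input size.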

Update to lightning

Right now, the library uses pytorch_lightning, which was renamed to lightning. To reflect this change, the dependency should be updated.

wheel size reduction

The current state of the library cannot be published because there are too many examples in the mock data. Only up to 100MB can be published, so the number of examples in the mock data has to be reduced significantly.

RSVQA quantization

The datasets as implemented treat every answer as its own class. However, in the paper the datasets use quantization (II.B, page 5), so that e.g. area buckets and number buckets are created, which results in drastically fewer answer classes. A flag should be added to the datasets to allow the same quantization approach.
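
A rough sketch of such bucketing for integer answers follows. The bucket edges, the label wording and the function name quantize are assumptions for illustration and should be checked against the paper before use.

```python
from bisect import bisect_left

# Assumed inclusive upper bucket edges; verify against the paper.
COUNT_EDGES = [0, 10, 100, 1000]


def quantize(value, edges=COUNT_EDGES, unit=""):
    """Map a non-negative integer to a bucket label."""
    if value < 0:
        raise ValueError("value must be non-negative")
    # Index of the first edge that is >= value.
    idx = bisect_left(edges, value)
    if idx == 0:
        return f"{edges[0]}{unit}"
    if idx == len(edges):
        return f"more than {edges[-1]}{unit}"
    return f"between {edges[idx - 1] + 1} and {edges[idx]}{unit}"


raw_answers = [0, 3, 7, 11, 42, 250, 1000, 5000]
answer_classes = sorted({quantize(v) for v in raw_answers})
```

In this example the eight distinct raw answers collapse into five classes, which is the effect the issue asks for.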

Document Design Ideas of Library

The documentation seems to explain features only by example so far. I would like to see a more abstract documentation page which answers questions like:

  • What are the main user-facing classes of this library? (Answer as far as I understand: ILMConfiguration and ConfigILM)
  • What are the roles of these classes?

example_scripts and baselines

The folder example_scripts should be renamed to just scripts, and the scripts from the baselines folder should also be moved to scripts.

Reduce Dependencies

Currently, installing the library always installs every dependency. This should not be the case: some dependencies should be optional, hence the name extra for the subpackage.

Option to return patch-names for BENDataSet

For visualization of image retrieval results using BEN, it is necessary to assign patch-names to samples returned from the BENDataSet.__getitem__() method. A convenient extension to this class would therefore be to (optionally) return the patch-names (key) in addition to image and labels.
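
The optional return value could be controlled by a constructor flag, as in the sketch below. ToyPatchDataset and the return_patchname flag are hypothetical; this is not the actual BENDataSet implementation, and the sample data is made up.

```python
class ToyPatchDataset:
    """Hypothetical stand-in for a dataset keyed by patch name."""

    def __init__(self, samples, return_patchname=False):
        # samples: dict mapping patch name -> (image, labels)
        self.samples = samples
        self.patch_names = sorted(samples)
        self.return_patchname = return_patchname

    def __len__(self):
        return len(self.patch_names)

    def __getitem__(self, idx):
        key = self.patch_names[idx]
        image, labels = self.samples[key]
        if self.return_patchname:
            # Append the key so retrieval results can be traced to a patch.
            return image, labels, key
        return image, labels


samples = {"patch_b": ([[0.2]], [0, 1]), "patch_a": ([[0.1]], [1, 0])}
plain = ToyPatchDataset(samples)[0]
named = ToyPatchDataset(samples, return_patchname=True)[0]
```

Keeping the flag off by default leaves the existing two-element return value unchanged for current users.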
