data-gradients's Introduction

DataGradients

DataGradients is an open-source, Python-based library for computer vision dataset analysis.

Extract valuable insights from your datasets and get comprehensive reports effortlessly.

🔍 Detect Common Data Issues

  • Corrupted data
  • Labeling errors
  • Underlying biases, and more.

💡 Extract Insights for Better Model Design

  • Informed decisions based on data characteristics.
  • Object size and location distributions.
  • High frequency details.

🎯 Reduce Guesswork for Hyperparameters

  • Define the correct NMS and filtering parameters.
  • Identify class distribution issues.
  • Calibrate metrics for your unique dataset.

🛠 Capabilities

Non-exhaustive list of supported features.

  • General Image Metrics: Explore key attributes like resolution, color distribution, and average brightness.
  • Class Overview: Get a snapshot of class distributions, most frequent classes, and unlabelled images.
  • Positional Heatmaps: Visualize where objects tend to appear within your images.
  • Bounding Box & Mask Details: Delve into dimensions, area coverages, and resolutions of objects.
  • Class Frequencies Deep Dive: Dive deeper into class distributions, understanding anomalies and rare classes.
  • Detailed Object Counts: Examine the granularity of components per image, identifying patterns and outliers.
  • And many more!

📘 Deep Dive into Data Profiling

Puzzled by some dataset challenges while using DataGradients? We've got you covered.
Enrich your understanding with this 🎓 free online course. Dive into dataset profiling, confront its complexities, and harness the full potential of DataGradients.

Example of pages from the Report


Example of specific features

Check out the pre-computed dataset analysis for a deeper dive into reports.

Installation

You can install DataGradients with pip:

pip install data-gradients

Quick Start

Prerequisites

  • Dataset: Includes a Train set and a Validation or a Test set.
  • Dataset Iterable: A method to iterate over your Dataset providing images and labels. Can be any of the following:
    • PyTorch Dataloader
    • PyTorch Dataset
    • Generator that yields image/label pairs
    • Any other iterable you use for model training/validation
  • One of:
    • Class Names: Either the list of all class names in the dataset OR dictionary mapping of class_id -> class_name.
    • Number of classes: Indicate how many unique classes are in your dataset. Ensure this number is greater than the highest class index (e.g., if your highest class index is 9, the number of classes should be at least 10).

Please ensure all the points above are checked before you proceed with DataGradients.
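For the generator option in the list above, a minimal sketch might look like the following. The `samples` list and the `iterate_samples` name are hypothetical stand-ins for your own data loading, and plain nested lists are used in place of real image arrays to keep the sketch dependency-free.

```python
from typing import Iterator, List, Tuple

Image = List[List[int]]  # stand-in for an H x W image array
Label = int              # stand-in for a classification label


def iterate_samples(samples: List[Tuple[Image, Label]]) -> Iterator[Tuple[Image, Label]]:
    """Yield one (image, label) pair at a time."""
    for image, label in samples:
        yield image, label


# Two tiny 2x2 "images" with their class indices (made-up data).
samples = [
    ([[0, 255], [255, 0]], 0),
    ([[10, 20], [30, 40]], 1),
]

train_data = iterate_samples(samples)
```

Keep in mind that a generator can only be consumed once, so a fresh one has to be created for each pass over the data.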

Example

from torchvision.datasets import CocoDetection

train_data = CocoDetection(...)
val_data = CocoDetection(...)
class_names = ["person", "bicycle", "car", "motorcycle", ...]
# OR
# class_names = {0: "person", 1:"bicycle", 2:"car", 3: "motorcycle", ...}
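The two accepted forms of `class_names` and the "number of classes" rule from the prerequisites can be related with a short sketch. The helper names are hypothetical, and the sketch assumes that a position in the list form corresponds to the class index.

```python
from typing import Dict, List, Union


def to_class_mapping(class_names: Union[List[str], Dict[int, str]]) -> Dict[int, str]:
    """Return a {class_id: class_name} dict for either accepted form."""
    if isinstance(class_names, dict):
        return dict(class_names)
    # Assumption: list position i is the name of class index i.
    return dict(enumerate(class_names))


def min_number_of_classes(class_ids: List[int]) -> int:
    """Smallest valid class count: one more than the highest class index."""
    return max(class_ids) + 1


names_as_list = ["person", "bicycle", "car", "motorcycle"]
names_as_dict = {0: "person", 1: "bicycle", 2: "car", 3: "motorcycle"}

assert to_class_mapping(names_as_list) == to_class_mapping(names_as_dict)
print(min_number_of_classes([0, 3, 9]))  # 10
```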

Good to Know - DataGradients will try to find out how the dataset returns images and labels.

  • If something cannot be automatically determined, you will be asked to provide some extra information through a text input.
  • In some extreme cases, the process will crash and invite you to implement a custom dataset extractor.

Heads up - DataGradients provides a few out-of-the-box dataset/dataloader implementations. You can find more dataset implementations in PyTorch or SuperGradients.

Dataset Analysis

You are now ready to go. Choose the relevant analyzer for your task and run it over your datasets!

Image Classification

from data_gradients.managers.classification_manager import ClassificationAnalysisManager 

train_data = ...  # Your dataset iterable (torch dataset/dataloader/...)
val_data = ...    # Your dataset iterable (torch dataset/dataloader/...)
class_names = ... # [<class-1>, <class-2>, ...]

analyzer = ClassificationAnalysisManager(
    report_title="Testing Data-Gradients Classification",
    train_data=train_data,
    val_data=val_data,
    class_names=class_names,
)

analyzer.run()

Object Detection

from data_gradients.managers.detection_manager import DetectionAnalysisManager

train_data = ...  # Your dataset iterable (torch dataset/dataloader/...)
val_data = ...    # Your dataset iterable (torch dataset/dataloader/...)
class_names = ... # [<class-1>, <class-2>, ...]

analyzer = DetectionAnalysisManager(
    report_title="Testing Data-Gradients Object Detection",
    train_data=train_data,
    val_data=val_data,
    class_names=class_names,
)

analyzer.run()

Semantic Segmentation

from data_gradients.managers.segmentation_manager import SegmentationAnalysisManager 

train_data = ...  # Your dataset iterable (torch dataset/dataloader/...)
val_data = ...    # Your dataset iterable (torch dataset/dataloader/...)
class_names = ... # [<class-1>, <class-2>, ...]

analyzer = SegmentationAnalysisManager(
    report_title="Testing Data-Gradients Segmentation",
    train_data=train_data,
    val_data=val_data,
    class_names=class_names,
)

analyzer.run()

Example

You can test the segmentation analysis tool in the following example which does not require you to download any additional data.

Report

Once the analysis is done, the path to your PDF report will be printed. You can find examples of pre-computed dataset analysis reports here.

Feature Configuration

The feature configuration allows you to run the analysis on a subset of features or adjust the parameters of existing features. If you are interested in customizing this configuration, you can check out the documentation on that topic.

Dataset Extractors

Ensuring Comprehensive Dataset Compatibility

DataGradients is adept at automatic dataset inference; however, certain specificities, such as nested annotation structures or unique annotation formats, may necessitate a tailored approach.

To address this, DataGradients offers extractors tailored for enhancing compatibility with diverse dataset formats.

For an in-depth understanding and implementation details, we encourage a thorough review of the Dataset Extractors Documentation.
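To illustrate the kind of nested annotation structure mentioned above, here is a minimal sketch of two plain functions that pull the image and a flat label list out of a sample. The sample layout and the function names are hypothetical; how such functions are handed to DataGradients is covered in the Dataset Extractors Documentation and is not shown here.

```python
from typing import Any, Dict, List

# Hypothetical sample layout: annotations nested two levels deep.
sample: Dict[str, Any] = {
    "image": [[0, 1], [2, 3]],
    "annotations": {
        "objects": [
            {"class_id": 2, "bbox": [10, 20, 50, 60]},
            {"class_id": 0, "bbox": [5, 5, 15, 25]},
        ]
    },
}


def extract_image(sample: Dict[str, Any]) -> Any:
    """Return only the image part of a sample."""
    return sample["image"]


def extract_labels(sample: Dict[str, Any]) -> List[List[float]]:
    """Flatten nested annotations into rows of [class_id, x1, y1, x2, y2]."""
    return [[obj["class_id"], *obj["bbox"]] for obj in sample["annotations"]["objects"]]


print(extract_labels(sample))
# [[2, 10, 20, 50, 60], [0, 5, 5, 15, 25]]
```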

Pre-computed Dataset Analysis

Example notebook on Colab

Detection

Common Datasets

Roboflow 100 Datasets

Segmentation

Community

Click here to join our Discord Community

License

This project is released under the Apache 2.0 license.

data-gradients's People

Contributors

bloodaxe, louis-dupont, natanbagrov, ofrimasad, ranrubin, ranzilberstein, rotemy-x10, shanibenbaruch, shaydeci, tomerkeren42

data-gradients's Issues

Image channel description not working in datagradients

💡 Your Question

What should I enter as a response here? Or is it a bug? I've tried all possible answers and none of them work.

Please describe your image channels?

Image Shape: (256, 256)

Enter the channel format representing your image:

RGB : Red, Green, Blue
BGR : Blue, Green, Red
G : Grayscale
LAB : Luminance, A and B color channels

ADDITIONAL CHANNELS?
If your image contains channels other than the standard ones listed above (e.g., Depth, Heat), prefix them with 'O'.
For instance:

ORGBO: Can represent (Heat, Red, Green, Blue, Depth).
OBGR: Can represent (Alpha, Blue, Green, Red).
GO: Can represent (Gray, Depth).

IMPORTANT: Make sure that your answer represents all the image channels.
Enter your response >>> RGB : Red, Green, Blue
RGB : Red, Green, Blue is not a valid input! Please check the instruction and try again.

Enter your response >>> rgb
rgb is not a valid input! Please check the instruction and try again.

Enter your response >>> RGB
RGB is not a valid input! Please check the instruction and try again.

Enter your response >>> RGB
RGB is not a valid input! Please check the instruction and try again.

Enter your response >>> RGB : Red, Green, Blue
RGB : Red, Green, Blue is not a valid input! Please check the instruction and try again.
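The format described in the prompt can be read as a string of channel groups, where each group covers a fixed number of channels. The sketch below is not DataGradients' validation code; it is a hypothetical parser (`count_channels` and `GROUPS` are illustrative names) that only counts how many channels a response describes, assuming upper-case input. Note that the reported image shape, (256, 256), has no explicit channel axis, so a three-channel answer such as RGB may not describe it; whether that mismatch caused the rejections above is not confirmed here.

```python
from typing import Tuple

# Channel groups named in the prompt and how many channels each covers.
# "O" marks one additional (non-standard) channel.
GROUPS: Tuple[Tuple[str, int], ...] = (
    ("RGB", 3),
    ("BGR", 3),
    ("LAB", 3),
    ("G", 1),
    ("O", 1),
)


def count_channels(channel_format: str) -> int:
    """Count the channels a format string describes; raise ValueError if unparsable."""
    if not channel_format:
        raise ValueError("empty channel format")
    position, total = 0, 0
    while position < len(channel_format):
        for name, size in GROUPS:
            if channel_format.startswith(name, position):
                position += len(name)
                total += size
                break
        else:
            # No group matched at this position.
            raise ValueError(f"unrecognized channel at position {position}: {channel_format!r}")
    return total


for example in ("RGB", "ORGBO", "OBGR", "GO", "G"):
    print(example, count_channels(example))
```

Under these assumptions, `ORGBO` describes five channels, `OBGR` four, `GO` two, and `G` one, which matches the examples given in the prompt.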

Versions

No response

Detection TinyCOCO Example Fails to Run

🐛 Describe the bug

Hi, I am trying to run the detection_tinycoco.ipynb notebook from examples in Google Colab, with a fresh install, but it fails at this step:

from data_gradients.managers.detection_manager import DetectionAnalysisManager
from data_gradients.datasets.detection.coco_detection_dataset import COCODetectionDataset

Error:

Downloading: "https://download.pytorch.org/models/mobilenet_v3_small-047dcff4.pth" to /root/.cache/torch/hub/checkpoints/mobilenet_v3_small-047dcff4.pth
100%|██████████| 9.83M/9.83M [00:00<00:00, 24.1MB/s]
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
[<ipython-input-2-5657470712ca>](https://localhost:8080/#) in <cell line: 2>()
      1 from data_gradients.managers.detection_manager import DetectionAnalysisManager
----> 2 from data_gradients.datasets.detection.coco_detection_dataset import COCODetectionDataset

2 frames
[/usr/local/lib/python3.10/dist-packages/data_gradients/datasets/segmentation/voc_segmentation_dataset.py](https://localhost:8080/#) in <module>
      2 from typing import Union
      3 
----> 4 from data_gradients.datasets.download.voc import download_VOC
      5 from data_gradients.datasets.segmentation.voc_format_segmentation_dataset import VOCFormatSegmentationDataset
      6 

ModuleNotFoundError: No module named 'data_gradients.datasets.download'

---------------------------------------------------------------------------
NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.

To view examples of installing some common dependencies, click the
"Open Examples" button below.
---------------------------------------------------------------------------

I have tried the DG_Demo.ipynb notebook as well and the same issue happened there.
However, the classification notebook under examples works fine.

Versions

Collecting environment information...
PyTorch version: 2.0.1+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.2 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: 14.0.0-1ubuntu1.1
CMake version: version 3.27.4
Libc version: glibc-2.35

Python version: 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.120+-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Address sizes:                   46 bits physical, 48 bits virtual
Byte Order:                      Little Endian
CPU(s):                          2
On-line CPU(s) list:             0,1
Vendor ID:                       GenuineIntel
Model name:                      Intel(R) Xeon(R) CPU @ 2.20GHz
CPU family:                      6
Model:                           79
Thread(s) per core:              2
Core(s) per socket:              1
Socket(s):                       1
Stepping:                        0
BogoMIPS:                        4399.99
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat md_clear arch_capabilities
Hypervisor vendor:               KVM
Virtualization type:             full
L1d cache:                       32 KiB (1 instance)
L1i cache:                       32 KiB (1 instance)
L2 cache:                        256 KiB (1 instance)
L3 cache:                        55 MiB (1 instance)
NUMA node(s):                    1
NUMA node0 CPU(s):               0,1
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Mitigation; PTE Inversion
Vulnerability Mds:               Vulnerable; SMT Host state unknown
Vulnerability Meltdown:          Vulnerable
Vulnerability Mmio stale data:   Vulnerable
Vulnerability Retbleed:          Vulnerable
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
Vulnerability Spectre v2:        Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Not affected
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Vulnerable

Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] torch==2.0.1+cu118
[pip3] torchaudio==2.0.2+cu118
[pip3] torchdata==0.6.1
[pip3] torchsummary==1.5.1
[pip3] torchtext==0.15.2
[pip3] torchvision==0.15.2+cu118
[pip3] triton==2.0.0
[conda] Could not collect

I am using data-gradients for object-detection data analysis but end up with the following error. Is this caused by a different number of objects in each image? I am using YOLO-format label files.

💡 Your Question

File "analyse_dataset.py", line 27, in <module>
analyzer.run()
File "/usr/local/lib/python3.8/dist-packages/data_gradients/managers/abstract_manager.py", line 226, in run
self.execute()
File "/usr/local/lib/python3.8/dist-packages/data_gradients/managers/abstract_manager.py", line 114, in execute
for i, (train_batch, val_batch) in enumerate(datasets_tqdm):
File "/usr/local/lib/python3.8/dist-packages/tqdm/std.py", line 1195, in __iter__
for obj in iterable:
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 635, in __next__
data = self._next_data()
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 679, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 61, in fetch
return self.collate_fn(data)
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/collate.py", line 265, in default_collate
return collate(batch, collate_fn_map=default_collate_fn_map)
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/collate.py", line 143, in collate
return [collate(samples, collate_fn_map=collate_fn_map) for samples in transposed] # Backwards compatibility.
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/collate.py", line 143, in <listcomp>
return [collate(samples, collate_fn_map=collate_fn_map) for samples in transposed] # Backwards compatibility.
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/collate.py", line 120, in collate
return collate_fn_map[elem_type](batch, collate_fn_map=collate_fn_map)
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/collate.py", line 172, in collate_numpy_array_fn
return collate([torch.as_tensor(b) for b in batch], collate_fn_map=collate_fn_map)
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/collate.py", line 120, in collate
return collate_fn_map[elem_type](batch, collate_fn_map=collate_fn_map)
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/collate.py", line 163, in collate_tensor_fn
return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [1, 5] at entry 0 and [4, 5] at entry 2
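The last line of the traceback shows `torch.stack` receiving label tensors of shape [1, 5] and [4, 5], meaning images with different numbers of objects, which the default collation cannot stack. One common workaround is to pad every sample's labels to the same number of rows before batching. The sketch below shows only that padding step in plain Python; `pad_detection_labels`, the pad value of -1, and the row layout are assumptions, and the source does not state which label layout DataGradients expects.

```python
from typing import List, Sequence

Row = Sequence[float]  # assumed YOLO-style row: [class_id, cx, cy, w, h]


def pad_detection_labels(batch: List[List[Row]], pad_value: float = -1.0) -> List[List[List[float]]]:
    """Pad every sample's label rows to the largest box count in the batch."""
    max_boxes = max((len(labels) for labels in batch), default=0)
    padded = []
    for labels in batch:
        rows = [list(row) for row in labels]
        # Append filler rows so each sample ends up with max_boxes rows.
        rows.extend([[pad_value] * 5 for _ in range(max_boxes - len(rows))])
        padded.append(rows)
    return padded


batch = [
    [[0, 0.5, 0.5, 0.2, 0.2]],      # 1 object  -> shape [1, 5]
    [[1, 0.1, 0.1, 0.1, 0.1]] * 4,  # 4 objects -> shape [4, 5]
]
padded = pad_detection_labels(batch)
print([len(rows) for rows in padded])  # [4, 4]
```

With a PyTorch DataLoader, logic like this would typically live in a function passed through the `collate_fn` argument. Since the prerequisites also list a plain PyTorch Dataset as an accepted iterable, passing the dataset itself may be another way to avoid batching samples of unequal size.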

Versions

[pip3] numpy==1.22.2
[pip3] pytorch-quantization==2.1.2
[pip3] torch==1.14.0a0+44dac51
[pip3] torch-tensorrt==1.4.0.dev0
[pip3] torchtext==0.13.0a0+fae8e8c
[pip3] torchvision==0.15.0a0
[pip3] triton==2.0.0

data-gradients with yolo-nas

💡 Your Question

How can I use data-gradients with YOLO-NAS to show me a report of accuracy, recall, precision, etc.?

[import problems] Can't import detection manager

💡 Your Question

Hey, I have a problem importing DetectionAnalysisManager after installing data-gradients. The package is present in my site-packages. I tried different PYTHONPATH values, but it doesn't help, and I get ModuleNotFoundError again.

from data_gradients.managers.detection_manager import DetectionAnalysisManager

And I get this error:

Traceback (most recent call last):
  File "/home/user/data-gradients/data_gradients.py", line 1, in <module>
    from data_gradients.managers.detection_manager import DetectionAnalysisManager
  File "/home/user/data-gradients/data_gradients.py", line 1, in <module>
    from data_gradients.managers.detection_manager import DetectionAnalysisManager
ModuleNotFoundError: No module named 'data_gradients.managers'; 'data_gradients' is not a package

python - 3.9.17. data-gradients - 0.1.4

How can this be resolved?
Thanks

Versions

No response
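The traceback shows that the script being run is itself named `data_gradients.py`, so `import data_gradients` resolves to that script instead of the installed package. That is what "'data_gradients' is not a package" indicates, and renaming the script is the usual remedy. A minimal sketch for checking whether a name resolves to a package or to a single module, using only the standard library (`is_package` is a hypothetical helper name):

```python
import importlib.util


def is_package(name: str) -> bool:
    """True if `name` resolves to a package (something that can contain submodules)."""
    spec = importlib.util.find_spec(name)
    return spec is not None and spec.submodule_search_locations is not None


print(is_package("json"))  # True: json is a package in the standard library
print(is_package("os"))    # False: os is a single module
```

Run from the directory that contains the shadowing script, `is_package("data_gradients")` would be expected to return False until the script is renamed.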

which version of pytorch and torchvision is required?

💡 Your Question

I got this error:
cannot import name 'vit_b_16' from 'torchvision.models'

for this:
from data_gradients.managers.segmentation_manager import SegmentationAnalysisManager

Is there a required version of torch/torchvision?

Versions

No response
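An import error for `vit_b_16` usually points to a torchvision release that predates the Vision Transformer models. The sketch below compares a version string against a minimum; the threshold of 0.12 is an assumption about when `vit_b_16` first shipped and should be verified against the torchvision release notes, and the helper names are hypothetical.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '0.15.2+cu118' into (0, 15, 2), ignoring local and pre-release suffixes."""
    release = re.match(r"\d+(?:\.\d+)*", version)
    if release is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return tuple(int(part) for part in release.group().split("."))


# Assumption: vit_b_16 first shipped in torchvision 0.12; verify in the release notes.
MIN_TORCHVISION_FOR_VIT = (0, 12)


def has_vit(torchvision_version: str) -> bool:
    """True if the given torchvision version is at least the assumed minimum."""
    return parse_version(torchvision_version) >= MIN_TORCHVISION_FOR_VIT


print(has_vit("0.15.2+cu118"))  # True
print(has_vit("0.11.3"))        # False
```

In a real environment, the installed version string is available as `torchvision.__version__`.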
