shiva-sankaran / lineex

Data Extraction from Scientific Line Charts

License: Apache License 2.0

Python 99.59% Shell 0.41%
computer-vision data-extraction keypoint-detectors line-charts

lineex's Introduction

LineEX: Data Extraction from Scientific Line Charts

This repo contains the code and models for the LineEX system, described in the paper cited below, which extracts data from scientific line charts. We adapt existing vision transformers and pose detection methods and report significant performance gains over existing state-of-the-art baselines. We also propose a new loss function and compare its effectiveness against existing loss functions.

The LineEX pipeline consists of three modular stages, which can be used independently of each other:

  • Keypoint Extraction
  • Chart Element Detection and Text Extraction
  • Keypoint Grouping, Legend Mapping and Datapoint Scaling
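As a rough illustration of what the datapoint scaling step involves, the sketch below converts keypoints from pixel coordinates to data values. It is not the repository's implementation: it assumes linear axes and two reference ticks per axis whose pixel positions and numeric labels are already known, and `make_axis_scaler` and `scale_keypoints` are hypothetical names introduced only for this example.

```python
from typing import Callable, List, Tuple


def make_axis_scaler(pixel_a: float, value_a: float,
                     pixel_b: float, value_b: float) -> Callable[[float], float]:
    """Return a function mapping a pixel coordinate to a data value.

    Assumes a linear axis described by two reference ticks.
    """
    if pixel_a == pixel_b:
        raise ValueError("reference ticks must have distinct pixel positions")
    slope = (value_b - value_a) / (pixel_b - pixel_a)
    return lambda pixel: value_a + slope * (pixel - pixel_a)


def scale_keypoints(keypoints: List[Tuple[float, float]],
                    x_ticks: Tuple[float, float, float, float],
                    y_ticks: Tuple[float, float, float, float]) -> List[Tuple[float, float]]:
    """Convert (x, y) pixel keypoints to data coordinates, one axis at a time."""
    to_x = make_axis_scaler(*x_ticks)
    to_y = make_axis_scaler(*y_ticks)
    return [(to_x(px), to_y(py)) for px, py in keypoints]


# Hypothetical chart: x ticks "0" at pixel 50 and "10" at pixel 450;
# y ticks "0" at pixel 300 and "100" at pixel 100 (image y grows downward).
points = scale_keypoints(
    keypoints=[(50, 300), (250, 200), (450, 100)],
    x_ticks=(50, 0.0, 450, 10.0),
    y_ticks=(300, 0.0, 100, 100.0),
)
print(points)
```

Because image y coordinates grow downward, the y-axis slope comes out negative, which is why the reference ticks are passed explicitly instead of assuming an orientation. Logarithmic axes would need a different mapping.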

Usage

Clone this repository:

git clone https://github.com/Shiva-sankaran/LineEX.git
cd LineEX

Install the dependencies:

conda env create -f environment.yml
conda activate LineEX

Download weights and data

Weights and data will be placed in the correct folders.

Set the corresponding DATA_flag (True/False) to download a particular dataset.

chmod +x download.sh
./download.sh -T False -V False  -L True  # To download only the test data 

UPDATE: Dataset moved to here.

UPDATE: Weights can be found here

Testing

Each module can be used separately, or the entire pipeline can be called at once to extract the desired information. Output is stored in the corresponding directory.

Overall

python pipeline.py --input_path sample_input/

Keypoint detection

cd modules/KP_detection
python run.py

Chart element detection

cd modules/CE_detection
python run.py

Evaluation

Refer to the paper for more information about the metrics.

Overall

The overall metric is essentially the metric for grouping and legend mapping.

cd modules/Grouping_legend_mapping
python eval.py

Keypoint detection

cd modules/KP_detection
python eval.py

Chart element detection

cd modules/CE_detection
python run.py

Training

Keypoint Extraction

cd modules/KP_detection
python -m torch.distributed.launch --nproc_per_node=3 --node_rank=0 train.py --vit_arch xcit_small_12_p16 --batch_size 42 --input_size 288 384 --hidden_dim 384 --vit_dim 384 --num_workers 24 --vit_weights https://dl.fbaipublicfiles.com/xcit/xcit_small_12_p16_384_dist.pth --alpha 0.99

Chart Element Detection and Text Extraction

cd modules/CE_detection
python -m torch.distributed.launch train.py --coco_path path_to_data

TBA

Need to change data paths

Citation

Shivasankaran, V. P., Muhammad Yusuf Hassan, and Mayank Singh. "LineEX: Data Extraction from Scientific Line Charts." 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). IEEE, 2023.

lineex's People

Contributors

mayank4490, md-hassan, shiva-sankaran

lineex's Issues

Grouping_legend_mapping weight files

Hello, could you please provide the two weight files modules/Grouping_legend_mapping/ckpts/mlp_ckpt.t7 and modules/Grouping_legend_mapping/ckpts/ckpt_30.t7? Thank you.

Unrecognized argument --input_path and attribute error when predicting on a new test image

I tried to detect the objects in a random line chart with "python pipeline.py --input_path = sample_input/", but it resulted in an unrecognized arguments error (screenshot attached).
I then ran the command as "python pipeline.py". The sample image files were processed, but parsing the new diagram that I had added to the sample_input folder resulted in an attribute error. The diagram I used looked similar to the diagrams used for training the model.
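One possible cause of the "unrecognized arguments" message is the spacing around `=`: the shell splits the command into three separate tokens. The sketch below reproduces that behaviour with a stand-in `argparse` parser. It assumes the script defines an `--input_path` option with a default, which is not confirmed here; the parser is hypothetical and is not taken from `pipeline.py`.

```python
import argparse

# Hypothetical stand-in parser; the real pipeline.py may define its options differently.
parser = argparse.ArgumentParser()
parser.add_argument("--input_path", default="sample_input/")

# Tokens the shell produces for: --input_path = sample_input/
spaced = ["--input_path", "=", "sample_input/"]
args, leftover = parser.parse_known_args(spaced)
# The option takes "=" as its value and the directory is left unrecognized.
print(args.input_path, leftover)

# Either of these forms passes the directory as the option's value.
for argv in (["--input_path", "sample_input/"], ["--input_path=sample_input/"]):
    print(parser.parse_args(argv).input_path)
```

With `parse_args` instead of `parse_known_args`, the leftover token would make the parser exit with an "unrecognized arguments" error. If the script does not define the option at all, every token would be reported as unrecognized.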

Output variation with Input

Hi @Shiva-sankaran,

I used the pretrained weights and data and reproduced the output given in this repository, but the values in the generated JSON output differ widely from what the input image shows.

How can I check those values against the input image?

Unable to download weights on Google Colab

I am getting the following error:

Cannot retrieve the public link of the file. You may need to change the permission to 'Anyone with the link', or have had many accesses. You may still be able to access the file from the browser: https://drive.google.com/uc?id=176BjH_6W-HRoU9RvsysSNW27GVIHPlHV

Thanks
