vbhavank / idar

This project forked from stevenlujpl/idar

Interactive Data Analyzer and Reviewer for Machine Learning Systems - Python 3

Interactive Data Analyzer and Reviewer (IDAR) for Machine Learning Systems

IDAR is a tool written in Python, JavaScript, HTML5, and CSS.

It provides functionality to help analyze ambiguous subjects/labels. It takes a raw CSV file exported from Zooniverse as input and sorts the records into "gold standard" and "ambiguous" categories. Records in the "gold standard" category are saved as gold_standard.csv in the specified output directory, and records in the "ambiguous" category are used to construct a static HTML page for further analysis.

It currently supports only CSV files exported from Zooniverse, but it can be extended to support other formats. The rules that determine whether an image goes into the "gold standard" or the "ambiguous" category can be customized. To adapt the tool for other projects, see the Extension and Adaptation section.
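
As an illustration of what a customizable categorization rule could look like, here is a minimal sketch. It is not IDAR's actual rule: the `categorize` function, the `(subject_id, label)` input shape, and the `min_votes` and `min_agreement` thresholds are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def categorize(records, min_votes=3, min_agreement=1.0):
    """Split subjects into gold-standard and ambiguous groups.

    records: iterable of (subject_id, label) pairs, one per classification.
    A subject counts as "gold standard" when it has at least min_votes
    labels and the most common label reaches the min_agreement fraction.
    Both thresholds are illustrative assumptions, not IDAR's real rule.
    """
    votes = defaultdict(list)
    for subject_id, label in records:
        votes[subject_id].append(label)

    gold, ambiguous = {}, {}
    for subject_id, labels in votes.items():
        top_label, top_count = Counter(labels).most_common(1)[0]
        agreement = top_count / len(labels)
        if len(labels) >= min_votes and agreement >= min_agreement:
            gold[subject_id] = top_label
        else:
            # Keep every label so the disagreement can be reviewed later.
            ambiguous[subject_id] = sorted(labels)
    return gold, ambiguous


records = [
    ("s1", "crater"), ("s1", "crater"), ("s1", "crater"),
    ("s2", "crater"), ("s2", "dune"), ("s2", "crater"),
]
gold, ambiguous = categorize(records)
print(gold)       # s1 has unanimous labels, so it lands here
print(ambiguous)  # s2 has conflicting labels, so it lands here
```

Lowering `min_agreement` (for example to 0.6) would move subjects with a clear majority into the gold standard group, which is the kind of rule change the paragraph above refers to.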

Dependencies

  • Python dependencies:

    • numpy
    • Cheetah, an open source template engine and code-generation tool.
  • HTML5/Javascript dependencies:

    • A browser that supports HTML5 Web Storage and Download Attribute.
    • Google Chrome is recommended. The JavaScript/HTML/CSS code should work in recent versions of most browsers (e.g. Chrome, Safari, Firefox, IE). However, it has only been tested in Google Chrome.
    • Be sure to enable third-party cookies in your browser. The HTML/JavaScript code uses web storage, which is disabled when cookies are disabled.
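
A quick way to confirm that the Python dependencies are importable before running the tool is sketched below. The helper name `missing_modules` is made up for this example; `numpy` and `Cheetah` are the import names of the two packages listed above.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Check the two Python dependencies listed in this section.
missing = missing_modules(["numpy", "Cheetah"])
if missing:
    print("Missing Python dependencies: " + ", ".join(missing))
else:
    print("All Python dependencies found.")
```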

Usage

ALA Usage

Use the following command to generate a static HTML page for analyzing "ambiguous" subjects. The command also generates a "gold standard" CSV file.

python ambiguous_label_analyzer.py image_dir input_csv template_dir output_dir

Here, image_dir is the directory that contains all of the images; input_csv is the CSV file exported from Zooniverse; template_dir is the directory that contains the HTML template files (.tmpl); and output_dir is the output directory that will contain the static "ambiguous" HTML page and the "gold standard" CSV file.

For example:

python ambiguous_label_analyzer.py /Users/youlu/Desktop/PDS_image_classification/salience_experiments/cropped_images/ ../test/input/mars-landmarks-classifications-2018-04-24.csv templates/ ../test/output/
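
Because the command takes several paths in a fixed order, it can help to check them before running it. The sketch below is a hypothetical pre-flight helper and not part of IDAR; the function name `check_ala_inputs` and the example paths are assumptions.

```python
import os


def check_ala_inputs(image_dir, input_csv, template_dir):
    """Return a list of problems found with the ALA input paths."""
    problems = []
    if not os.path.isdir(image_dir):
        problems.append("image_dir is not a directory: " + image_dir)
    if not os.path.isfile(input_csv):
        problems.append("input_csv is not a file: " + input_csv)
    if not os.path.isdir(template_dir):
        problems.append("template_dir is not a directory: " + template_dir)
    elif not any(name.endswith(".tmpl") for name in os.listdir(template_dir)):
        problems.append("template_dir contains no .tmpl files: " + template_dir)
    return problems


# Placeholder paths; replace them with the arguments you plan to pass.
for problem in check_ala_inputs("images/", "classifications.csv", "templates/"):
    print(problem)
```

An empty result means the three input paths look usable; output_dir is left unchecked here because whether the tool creates it is not stated.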

OIA Usage

You can also generate an HTML page to review outliers (given feature vectors in your chosen representation).

python outlier_image_analyzer.py image_dir feature_file template_dir output_dir

For example:

python outlier_image_analyzer.py ~/Research/DEMUD/results/mislabeled/landmarks-v2/v2/5 ~/Research/DEMUD/DEMUD-github/scripts/cnn_feat_extraction/feats/v2fc6-class5.csv templates/ class5

In this case, the feature vectors were extracted for class 5 from the AlexNet CNN. See

https://github.com/wkiri/DEMUD/tree/master/scripts/cnn_feat_extraction

for instructions on how to extract these feature vectors.
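
To give a feel for what reviewing outliers from feature vectors means, the sketch below ranks images by their Euclidean distance from the mean feature vector. This is a deliberately simple stand-in: it is neither DEMUD nor necessarily what outlier_image_analyzer.py does, and the `rank_outliers` name and the in-memory input format are assumptions.

```python
import math


def rank_outliers(features):
    """Rank items from most to least distant from the mean feature vector.

    features: dict mapping an image name to a list of floats (all the same
    length). Returns a list of (name, distance) pairs, farthest first.
    """
    names = list(features)
    dim = len(features[names[0]])
    mean = [sum(features[n][i] for n in names) / len(names) for i in range(dim)]
    distances = {
        n: math.sqrt(sum((x - m) ** 2 for x, m in zip(features[n], mean)))
        for n in names
    }
    return sorted(distances.items(), key=lambda item: item[1], reverse=True)


# Toy two-dimensional features; real CNN features have many more dimensions.
features = {
    "a.jpg": [0.0, 0.0],
    "b.jpg": [0.1, 0.0],
    "c.jpg": [0.0, 0.1],
    "d.jpg": [5.0, 5.0],
}
for name, distance in rank_outliers(features):
    print(name, round(distance, 3))  # d.jpg is listed first
```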

EAT Usage

You can also generate an HTML page to review mis-classified images.

python error_analyzer.py image_dir label_file pred_file template_dir output_dir classmap_file -n=integer

For example:

python error_analyzer.py ~/PDS_image_classification/images/ ~/COSMIC/working_dir/eat_v1.1.0_2019_3_14/labels-val.txt ~/COSMIC/working_dir/eat_v1.1.0_2019_3_14/preds-val.txt templates/ ~/COSMIC/working_dir/eat_v1.1.0_2019_3_14/output ~/COSMIC/working_dir/eat_v1.1.0_2019_3_14/classmap.txt -n=200

Note that the classmap_file and -n arguments are optional.
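
The core idea behind reviewing mis-classified images is a comparison of true labels against predictions. The sketch below shows that comparison on in-memory data. The `find_errors` name and the dict-based input are assumptions; the actual formats of label_file and pred_file are not described here.

```python
def find_errors(labels, preds):
    """Return (image, true_label, predicted_label) for every mismatch.

    labels and preds: dicts mapping an image name to a class id.
    Images that have no prediction are skipped.
    """
    return [
        (image, true, preds[image])
        for image, true in sorted(labels.items())
        if image in preds and preds[image] != true
    ]


labels = {"img1.jpg": 0, "img2.jpg": 5, "img3.jpg": 2}
preds = {"img1.jpg": 0, "img2.jpg": 3, "img3.jpg": 2}
print(find_errors(labels, preds))  # only img2.jpg is mis-classified
```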

Logging

The tool does not provide a parameter for saving a log file. Instead, the Python code uses print statements to write to stdout, so you can redirect stdout to a file for logging purposes, as in the following example:

python ambiguous_label_analyzer.py /Users/youlu/Desktop/PDS_image_classification/salience_experiments/cropped_images/ ../test/input/mars-landmarks-classifications-2018-04-24.csv templates/ ../test/output/ > ../test/log.txt
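
The same redirection can be done from inside Python when print-based code is called from another script. The sketch below uses the standard library's `contextlib.redirect_stdout`; `run_and_capture` and `noisy_step` are hypothetical names, and `noisy_step` merely stands in for any function that reports progress with print.

```python
import contextlib
import io


def run_and_capture(func, *args, **kwargs):
    """Call func and return (result, everything it printed to stdout)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = func(*args, **kwargs)
    return result, buffer.getvalue()


def noisy_step(n):
    # Stand-in for a function that reports progress with print().
    print("processing", n, "records")
    return n * 2


result, log_text = run_and_capture(noisy_step, 3)
print(log_text, end="")  # the captured text could be written to a log file
```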

Design

[Figure: class diagram and flow diagram]

Extension and Adaptation

The tool is modularized so that it is relatively easy to adapt to other projects. To use it for another project, you need to modify the code, specifically the modules highlighted in green in the diagram above.

Copyright

Copyright (c) 2019, California Institute of Technology ("Caltech"). U.S. Government sponsorship acknowledged. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  • Neither the name of Caltech nor its operating division, the Jet Propulsion Laboratory, nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
