flo-schu / peek

Peek is short for photography enhanced environmental knowledge. It contains algorithms for detecting organisms in photographs.

License: GNU General Public License v3.0

Shell 14.85% Python 85.15%

peek's People

Contributors: flo-schu

peek's Issues

add features for annotation app

  • add extra labels in keymap: s - sediment daphnia, w - water shedding,
  • remove path to csv file
  • can rectangular tags also be reproduced? --> no: fix

click tags

  • what happens with new tags? --> why are they not saved
  • apply filter when creating new tag
  • clickable tags
  • remove margin attribute of Annotations class

future

create optimizer for fitting the detector to annotated tags

Optimization Problem

What can the target function look like?

Regression

My dataset contains x,y coordinates and a label, indicating "Daphnia+", "Culex", "unidentified" or "?".

I need a classifier that also returns x,y coordinates and a label. In the easiest case, this classifier filters labels and names all the objects "Daphnia".

This prediction set can then be compared with the test set. Metrics can be:

  • N Daphnia
  • How many tags were found and correctly labelled within a margin of error. This could work with a loop like the following:
import numpy as np

train = np.array(ground_truth)  # x,y coordinates of the annotated tags

# make sure all labels can be matched, i.e. all relevant Daphnia+ labels --> Daphnia

for point in prediction_points:
    # point has x,y coordinates
    offset = np.abs(train - point.xy).sum(axis=1)

    # get the annotated tag with the minimum offset
    candidate = np.argmin(offset)

    # test if the offset falls within the margin of detection; matches should be very close
    if offset[candidate] < 2:
        match = ground_truth[candidate]
        true_positive_detects += 1
    else:
        false_positive_detects += 1
        continue  # no match found, so there is no label to compare against

    if point.label == match.label:
        if point.label == "Daphnia":
            true_positive_classifications += 1
        else:
            true_negative_classifications += 1
    else:
        if point.label == "Daphnia":
            false_positive_classifications += 1
        else:
            false_negative_classifications += 1

This function iterates over each point in the prediction and tries to find a corresponding annotated tag. Success is measured as detection accuracy. If a match is found, it is also measured whether the label was correct.
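From counters like the ones in the loop, standard detection metrics follow directly. A minimal sketch (treating undetected annotations as false negatives, which the loop does not count explicitly; the example numbers are made up):

```python
def detection_metrics(tp, fp, fn):
    """Compute precision, recall and F1 score from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# example: 80 correct detections, 10 spurious detections, 20 missed annotations
precision, recall, f1 = detection_metrics(tp=80, fp=10, fn=20)
```

Reporting precision and recall separately keeps spurious detections and missed annotations distinguishable, which a single accuracy number would hide.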

alternative: ML approach

I could use a logistic regression classification scheme, where I feed several predictors to the regression, such as:

  • number of clusters
  • size of the central cluster
  • average size of non-central clusters
  • xcenter
  • ycenter
  • color of the central cluster
  • length of the major axis
  • length of the minor axis
  • angle of the major axis
  • ...

and then for training and testing I can probably use a standard ML approach.

The benefit of logistic regression is that I get a probability of detection.
In a second step, I could manually label the ones with a low probability.

Also, for this approach I already have some scripts in peek.

If I'm not mistaken, I can just take the tag database (or combine the databases from the tagging) for predictors and results.
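A minimal sketch of this approach with scikit-learn. The feature matrix is a random stand-in for the cluster predictors listed above (in practice it would come from the combined tag database), so the point here is only the shape of the workflow and the per-tag probabilities:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# hypothetical feature matrix: rows = tags, columns = predictors such as
# number of clusters, size of central cluster, axis lengths, ...
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = rng.integers(0, 2, size=200)  # 1 = "Daphnia", 0 = other

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)

# the key benefit: per-tag detection probabilities instead of hard labels
proba = clf.predict_proba(X_test)[:, 1]

# tags with probabilities close to 0.5 could be queued for manual labelling
uncertain = np.where(np.abs(proba - 0.5) < 0.1)[0]
```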

Resources:

https://scikit-learn.org/stable/auto_examples/calibration/plot_compare_calibration.html#sphx-glr-auto-examples-calibration-plot-compare-calibration-py

https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html#sphx-glr-auto-examples-classification-plot-classifier-comparison-py

https://scikit-learn.org/stable/auto_examples/model_selection/plot_underfitting_overfitting.html#sphx-glr-auto-examples-model-selection-plot-underfitting-overfitting-py

classifier options

Steps:

  • combine tag databases
  • filter out rows that should not be used for training
  • transform data if needed
  • train classifier
  • evaluate classifier (metrics, build report --> .md file, hyperparameter optimization)
    tools exist: https://scikit-learn.org/stable/modules/cross_validation.html
  • test classifier
  • if needed calculate other metrics for predictors (based on threshold slice and original image)
  • repeat with different classifier
  • model comparison
  • use for predictions
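The transform/train/evaluate steps above can lean on the scikit-learn cross-validation tools linked in the list. A sketch, assuming `X` and `y` would come from the combined tag databases (random placeholders here):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))   # placeholder predictors
y = rng.integers(0, 2, size=100)

# transform + classifier in one pipeline, so the scaling is refit per fold
model = make_pipeline(StandardScaler(), LogisticRegression())
scores = cross_val_score(model, X, y, cv=5)

# mean accuracy and spread go into the report; repeat with other classifiers
print(scores.mean(), scores.std())
```

Swapping the last pipeline step for another estimator covers the "repeat with different classifier" and "model comparison" steps without changing the evaluation code.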

optimize structure

I do a lot of double bookkeeping, which is very error-prone.

When I modify a Tag, I also have to modify the database, since I want to keep the database in pandas (csv) format for easy export/import and compatibility.

Tag values should always be derived only from the database and not the other way round.

  • creating a Tag adds one line to the database
  • modification of a tag updates the respective entry in the database
  • a call to the tag fetches the appropriate entries (label, ... ) or derived entries (contours, etc)
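The single-source-of-truth idea above could look like this (a sketch with hypothetical column names, not the actual peek API): the Tag holds no data of its own, and every attribute access goes through the pandas database.

```python
import pandas as pd

class Tag:
    """A thin view onto one row of the tag database."""

    def __init__(self, db: pd.DataFrame, tag_id: int):
        self.db = db
        self.id = tag_id

    @property
    def label(self):
        # always read from the database, never from a cached copy
        return self.db.loc[self.id, "label"]

    @label.setter
    def label(self, value):
        # modifying the tag updates the respective database entry
        self.db.loc[self.id, "label"] = value

# creating a tag adds one line to the database
db = pd.DataFrame({"x": [10], "y": [20], "label": ["Daphnia"]})
tag = Tag(db, 0)
tag.label = "Culex"   # the csv-backed database stays the single source of truth
```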

avoid saving all tags as images

Why would this be necessary?

Currently, when tags are imported, their slices and contours are saved to disk.
For the sliders to be useful, the set of imported images should be large, which consumes a lot of disk space (up to 10 MB per image).

Instead, the tagging could be moved completely inside the annotation process (also because it is fast). Rather than saving the whole image, the non-white pixels could be saved as a flattened array separated by whitespace. From this, contours can be extracted with ease; together with the stored box margin, the tag is easy to reconstruct.

  • non white pixels should be saved in detector to Tagger
  • removed axis titles (re-do)
  • in show_tag replace method extrafileobjects
  • in manual_tag replace part where fileobjects are saved
  • draw contour on slice is not used any longer
  • saving tags to database saves the wrong image!!
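The flattened-pixel storage described above could work like this (a sketch with numpy; the whiteness threshold and function names are assumptions):

```python
import numpy as np

def flatten_tag(slice_, threshold=250):
    """Store only the non-white pixel coordinates of a tag slice as a flat array."""
    # a pixel counts as non-white if any channel falls below the threshold
    ys, xs = np.where((slice_ < threshold).any(axis=-1))
    return np.stack([ys, xs]).ravel()

def restore_mask(flat, shape):
    """Rebuild a binary mask (e.g. for contour extraction) from the flat array."""
    ys, xs = flat.reshape(2, -1)
    mask = np.zeros(shape, dtype=bool)
    mask[ys, xs] = True
    return mask

slice_ = np.full((4, 4, 3), 255, dtype=np.uint8)
slice_[1, 2] = (30, 40, 50)                     # one dark pixel
flat = flatten_tag(slice_)
mask = restore_mask(flat, slice_.shape[:2])
```

For the csv database, `" ".join(map(str, flat))` gives the whitespace-separated string form, which is tiny compared to storing the image slice itself.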

improve annotation app and selection algorithm

Object selection

Selection of candidate tags is the first step of the object detection pipeline.

Issues

annotation app

Potential improvements

  • save current detector settings in CSV metadata
  • consider using netCDF because of metadata availability. CSV exports and txt metadata can always be generated from it. Nevertheless, I like the raw-text component of CSV, but .nc is probably also more efficient. PyTables may also be an option (https://www.pytables.org/usersguide/introduction.html) --> supports nested table cells (virtually ideal for adding the slices in the table). Viewers are: Panoply (HDF5, netCDF), ViTables (HDF5). Note that netCDF is built on HDF5.
  • remove passing of image files and make everything reproducible from csv file
    • original image slice --> is already in Tag

stronger integration between classes Detector, Tag, Annotation, Tagger

Annotations should be usable to postprocess tags, particularly since there is already the option of filtering tags.

Actually, this may not be such a good idea. Annotations is designed to annotate a single image. Instead, it is probably better to convert between csv annotations and Tagger class tags, which are then compatible with the Detector class.

  • Alternatively CSV style tags can be made compatible with Detector. This approach was taken.
  • Eventually Tagger class can go and will be replaced with a list of Tag instances

write tests

Write tests for the annotation class. See the test method on how to do that. A mini image would be enough for testing, and the various annotation scenarios can also be probed.
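The mini-image idea could be sketched like this (the detector call is a hypothetical stand-in, since the tests would go through the real annotation class):

```python
import numpy as np

def mini_image():
    """A tiny synthetic image is enough to probe the annotation scenarios."""
    img = np.full((8, 8, 3), 255, dtype=np.uint8)
    img[3:5, 3:5] = (20, 20, 20)   # one dark 2x2 "organism" on white background
    return img

def test_detects_single_object():
    img = mini_image()
    # hypothetical stand-in for the real detector call on the mini image
    mask = (img < 250).any(axis=-1)
    assert mask.sum() == 4

test_detects_single_object()   # runs under pytest or directly
```

Because the image is constructed in code, the expected coordinates and sizes are known exactly, so each annotation scenario can assert against them.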

motion analysis should analyze all pictures in one script and take automated decisions on whether a combination of images was successful.

analyze replicate images and take some decisions in analysis script

  • Particularly, images from a moving camera should be kicked out based on the criterion "percent pixels moved" (this is already computed in the motion detector) --> the tag.csv file should then record the reason for disqualification as a value
  • this will also drastically speed up the combination of result dataframes
  • this can be checked in save_new_tags, i.e. whether the image is blurred by camera shake
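The disqualification criterion could be sketched like this (threshold and column names are assumptions; `percent_pixels_moved` stands in for the value the motion detector already computes):

```python
import pandas as pd

MAX_MOVED = 5.0   # hypothetical threshold in percent

def disqualify_shaky(tags: pd.DataFrame) -> pd.DataFrame:
    """Mark images where the camera moved and record the reason in the tag table."""
    shaky = tags["percent_pixels_moved"] > MAX_MOVED
    tags.loc[shaky, "disqualified"] = "camera moved"
    # dropping these rows early also speeds up combining the result dataframes
    return tags[~shaky]

tags = pd.DataFrame({"image": ["a", "b"], "percent_pixels_moved": [1.2, 9.7]})
kept = disqualify_shaky(tags)
```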
