flo-schu / peek

Peek is short for photography enhanced environmental knowledge. It contains algorithms for detecting organisms in photographs.

License: GNU General Public License v3.0

Shell 14.85% Python 85.15%

peek's People

Contributors: flo-schu

peek's Issues

add features for annotation app

  • add extra labels in keymap: s - sediment daphnia, w - water shedding,
  • remove path to csv file
  • can rectangular tags also be reproduced? --> no: fix

click tags

  • what happens with new tags? --> why are they not saved
  • apply filter when creating new tag
  • clickable tags
  • remove margin attribute of Annotations class

future

create optimizer for fitting the detector to annotated tags

Optimization Problem

What can the target function look like?

Regression

My dataset contains x,y coordinates and a label, indicating "Daphnia+", "Culex", "unidentified" or "?".

I need a classifier that also returns x,y coordinates and a label. In the easiest case, this classifier filters labels and names all the objects "Daphnia".

This prediction set can then be compared with the test set. Metrics can be:

  • N Daphnia
  • How many tags were found and correctly labelled within a margin of error. This could work with a loop like the following:
import numpy as np

train = np.array(ground_truth)  # x,y coordinates of the annotated tags

# make sure all labels can be matched, i.e. all relevant Daphnia+ labels --> Daphnia

for point in prediction_points:
    # point has x,y coordinates
    offset = np.abs(train - point.xy).sum(axis=1)

    # get the annotated tag with the minimum offset
    candidate = np.argmin(offset)

    # test if the offset falls within the margin of detection; matches should be very close
    if offset[candidate] < 2:
        match = ground_truth[candidate]
        true_positive_detects += 1
    else:
        false_positive_detects += 1
        continue  # no match found, so there is no label to compare against

    if point.label == match.label:
        if point.label == "Daphnia":
            true_positive_classifications += 1
        else:
            true_negative_classifications += 1
    else:
        if point.label == "Daphnia":
            false_positive_classifications += 1
        else:
            false_negative_classifications += 1

This function iterates over each point in the prediction and tries to find a corresponding annotated tag. Success is measured as detection accuracy. If a match is found, it is also measured whether the label was correct.
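From counters like the ones in the loop, standard detection metrics follow directly. A minimal sketch (treating undetected annotations as false negatives, which the loop does not count explicitly; the example numbers are made up):

```python
def detection_metrics(tp, fp, fn):
    """Compute precision, recall and F1 score from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# example: 80 correct detections, 10 spurious detections, 20 missed annotations
precision, recall, f1 = detection_metrics(tp=80, fp=10, fn=20)
```

Reporting precision and recall separately keeps spurious detections and missed annotations distinguishable, which a single accuracy number would hide.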

alternative: ML approach

I could use a logistic regression classification scheme, where I feed several predictors to the regression, such as:

  • number of clusters
  • size of the central cluster
  • average size of non-central clusters
  • xcenter
  • ycenter
  • color of the central cluster
  • length of the major axis
  • length of the minor axis
  • angle of the major axis
  • ...

and then for training and testing I can probably use a standard ML approach.

The benefit of logistic regression is that I get a probability of detection.
In a second step, I could manually label the ones with a low probability.

Also, for this approach I already have some scripts in peek.

If I'm not mistaken, I can just take the tag database (or combine the databases from the tagging) for predictors and results.
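A minimal sketch of this approach with scikit-learn. The feature matrix is a random stand-in for the cluster predictors listed above (in practice it would come from the combined tag database), so the point here is only the shape of the workflow and the per-tag probabilities:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# hypothetical feature matrix: rows = tags, columns = predictors such as
# number of clusters, size of central cluster, axis lengths, ...
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = rng.integers(0, 2, size=200)  # 1 = "Daphnia", 0 = other

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)

# the key benefit: per-tag detection probabilities instead of hard labels
proba = clf.predict_proba(X_test)[:, 1]

# tags with probabilities close to 0.5 could be queued for manual labelling
uncertain = np.where(np.abs(proba - 0.5) < 0.1)[0]
```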

Resources:

https://scikit-learn.org/stable/auto_examples/calibration/plot_compare_calibration.html#sphx-glr-auto-examples-calibration-plot-compare-calibration-py

https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html#sphx-glr-auto-examples-classification-plot-classifier-comparison-py

https://scikit-learn.org/stable/auto_examples/model_selection/plot_underfitting_overfitting.html#sphx-glr-auto-examples-model-selection-plot-underfitting-overfitting-py

classifier options

Steps:

  • combine tag databases
  • filter out rows that should not be used for training
  • transform data if needed
  • train classifier
  • evaluate classifier (metrics, build report --> .md file, hyperparameter optimization)
    tools exist: https://scikit-learn.org/stable/modules/cross_validation.html
  • test classifier
  • if needed calculate other metrics for predictors (based on threshold slice and original image)
  • repeat with different classifier
  • model comparison
  • use for predictions
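The transform/train/evaluate steps above can lean on the scikit-learn cross-validation tools linked in the list. A sketch, assuming `X` and `y` would come from the combined tag databases (random placeholders here):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))   # placeholder predictors
y = rng.integers(0, 2, size=100)

# transform + classifier in one pipeline, so the scaling is refit per fold
model = make_pipeline(StandardScaler(), LogisticRegression())
scores = cross_val_score(model, X, y, cv=5)

# mean accuracy and spread go into the report; repeat with other classifiers
print(scores.mean(), scores.std())
```

Swapping the last pipeline step for another estimator covers the "repeat with different classifier" and "model comparison" steps without changing the evaluation code.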

optimize structure

I do a lot of double bookkeeping, which is very error-prone.

When I modify a Tag, I also have to modify the database, since I want to keep the database in pandas (csv) format for easy export/import and compatibility.

Tag values should always be derived only from the database and not the other way round.

  • creating a Tag adds one line to the database
  • modification of a tag updates the respective entry in the database
  • a call to the tag fetches the appropriate entries (label, ... ) or derived entries (contours, etc)
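The single-source-of-truth idea above could look like this (a sketch with hypothetical column names, not the actual peek API): the Tag holds no data of its own, and every attribute access goes through the pandas database.

```python
import pandas as pd

class Tag:
    """A thin view onto one row of the tag database."""

    def __init__(self, db: pd.DataFrame, tag_id: int):
        self.db = db
        self.id = tag_id

    @property
    def label(self):
        # always read from the database, never from a cached copy
        return self.db.loc[self.id, "label"]

    @label.setter
    def label(self, value):
        # modifying the tag updates the respective database entry
        self.db.loc[self.id, "label"] = value

# creating a tag adds one line to the database
db = pd.DataFrame({"x": [10], "y": [20], "label": ["Daphnia"]})
tag = Tag(db, 0)
tag.label = "Culex"   # the csv-backed database stays the single source of truth
```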

avoid saving all tags as images

Why would this be necessary?

Currently, when tags are imported, their slices and contours are saved to disk.
For the sliders to be useful, the set of imported images should be large, which consumes a lot of disk space (up to 10 MB per image).

Instead, the tagging could be moved completely inside the annotation process (also because it is fast). Rather than saving the whole image, the non-white pixels could be saved as a flattened array separated by whitespace. From this, contours can be extracted with ease; together with the stored box margin, the tag is easy to reconstruct.

  • non white pixels should be saved in detector to Tagger
  • removed axis titles (re-do)
  • in show_tag replace method extrafileobjects
  • in manual_tag replace part where fileobjects are saved
  • draw contour on slice is not used any longer
  • saving tags to database saves the wrong image!!
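The flattened-pixel storage described above could work like this (a sketch with numpy; the whiteness threshold and function names are assumptions):

```python
import numpy as np

def flatten_tag(slice_, threshold=250):
    """Store only the non-white pixel coordinates of a tag slice as a flat array."""
    # a pixel counts as non-white if any channel falls below the threshold
    ys, xs = np.where((slice_ < threshold).any(axis=-1))
    return np.stack([ys, xs]).ravel()

def restore_mask(flat, shape):
    """Rebuild a binary mask (e.g. for contour extraction) from the flat array."""
    ys, xs = flat.reshape(2, -1)
    mask = np.zeros(shape, dtype=bool)
    mask[ys, xs] = True
    return mask

slice_ = np.full((4, 4, 3), 255, dtype=np.uint8)
slice_[1, 2] = (30, 40, 50)                     # one dark pixel
flat = flatten_tag(slice_)
mask = restore_mask(flat, slice_.shape[:2])
```

For the csv database, `" ".join(map(str, flat))` gives the whitespace-separated string form, which is tiny compared to storing the image slice itself.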

improve annotation app and selection algorithm

Object selection

Selection of candidate tags is the first step of the object detection pipeline.

Issues

annotation app

Potential improvements

  • save current detector settings in CSV metadata
  • consider using netCDF because of metadata availability. CSV exports and txt metadata can always be generated from it. Nevertheless, I like the raw-text component of CSV, but .nc is probably also more efficient. PyTables may also be an option (https://www.pytables.org/usersguide/introduction.html) --> supports nested table cells (virtually ideal for adding the slices in the table). Viewers are: Panoply (HDF5, netCDF), ViTables (HDF5). Note that netCDF is built on HDF5.
  • remove passing of image files and make everything reproducible from csv file
    • original image slice --> is already in Tag

stronger integration between classes Detector, Tag, Annotation, Tagger

Annotations should be usable to postprocess tags, particularly since there is already the option of filtering tags.

Actually, this may not be such a good idea. Annotations is designed to annotate a single image. Instead, it is probably better to convert between csv annotations and Tagger class tags, which are then compatible with the Detector class.

  • Alternatively CSV style tags can be made compatible with Detector. This approach was taken.
  • Eventually Tagger class can go and will be replaced with a list of Tag instances

write tests

Write tests for the annotation class. See the test method on how to do that. A mini image would be enough for testing, and the various annotation scenarios can also be probed.
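The mini-image idea could be sketched like this (the detector call is a hypothetical stand-in, since the tests would go through the real annotation class):

```python
import numpy as np

def mini_image():
    """A tiny synthetic image is enough to probe the annotation scenarios."""
    img = np.full((8, 8, 3), 255, dtype=np.uint8)
    img[3:5, 3:5] = (20, 20, 20)   # one dark 2x2 "organism" on white background
    return img

def test_detects_single_object():
    img = mini_image()
    # hypothetical stand-in for the real detector call on the mini image
    mask = (img < 250).any(axis=-1)
    assert mask.sum() == 4

test_detects_single_object()   # runs under pytest or directly
```

Because the image is constructed in code, the expected coordinates and sizes are known exactly, so each annotation scenario can assert against them.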

motion analysis should analyze all pictures in one script and take automated decisions on whether a combination of images was successful.

analyze replicate images and take some decisions in analysis script

  • Particularly, images from a moving camera should be kicked out based on the criterion "percent pixels moved" (this is already computed in the motion detector) --> the tag.csv file should then record the reason for disqualification as a value
  • this will also drastically speed up the combination of result dataframes
  • this can be checked in save_new_tags, i.e. whether the image is blurred by camera shake
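The disqualification criterion could be sketched like this (threshold and column names are assumptions; `percent_pixels_moved` stands in for the value the motion detector already computes):

```python
import pandas as pd

MAX_MOVED = 5.0   # hypothetical threshold in percent

def disqualify_shaky(tags: pd.DataFrame) -> pd.DataFrame:
    """Mark images where the camera moved and record the reason in the tag table."""
    shaky = tags["percent_pixels_moved"] > MAX_MOVED
    tags.loc[shaky, "disqualified"] = "camera moved"
    # dropping these rows early also speeds up combining the result dataframes
    return tags[~shaky]

tags = pd.DataFrame({"image": ["a", "b"], "percent_pixels_moved": [1.2, 9.7]})
kept = disqualify_shaky(tags)
```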
