MatthewJA / blobmatch

Radio-radio cross-identification

License: BSD 3-Clause "New" or "Revised" License

Jupyter Notebook 42.29% TeX 39.08% Python 18.63%

blobmatch's Introduction

Blobmatch: Machine learning for cross-identification of radio surveys
James Gardner, Cheng Soon Ong, Matthew Alger
README

Abstract:
Success in radio-radio survey cross-identification means determining which real, physical objects we are looking at. The most naive measure of whether two sources (or blobs) belong to the same object is their separation on the sky. Using this separation, we train a logistic regression classifier on the TGSS (TIFR GMRT Sky Survey Alternative Data Release 1) and NVSS (NRAO VLA Sky Survey) radio surveys. We then use its predictions to partition a patch of the sky into objects by transitively grouping any chain of predicted matches. Although the classifier successfully learns the importance of separation, we find that this naive partitioning fails to convincingly identify objects in the sky.

Current build found at:
https://github.com/MatthewJA/blobmatch

Directory structure:
blobmatch/
	source/
		(all .ipynb notebooks, manual_labels.csv)
	report/
		pics/
			(all plots as .pdf saved by above notebooks, also cut-out comparison)
		main.tex
		report.pdf
	project/
		(non-plot outputs of notebooks except sky_matches.csv and sky_catalogue.csv, also defunct scripts and plots)
	README.txt
	LICENSE
	.gitignore

---
Guide to replicating results (please follow exactly)

Blobmatch uses Python 3.6.8 in Jupyter Notebook and has the following requirements:
astropy==3.2.1
ipython==5.5.0
jupyter==1.0.0
jupyter-client==5.2.2
jupyter-console==6.0.0
jupyter-core==4.4.0
matplotlib==3.0.3
numpy==1.16.2
pandas==0.24.2
scikit-learn==0.21.3
sklearn==0.0
torch==1.1.0
torchvision==0.3.0
tqdm==4.33.0

Download the TGSS and NVSS radio source catalogues from the links below and decompress them (they are gzipped):
https://github.com/MatthewJA/blobmatch/releases/download/v0.1/TGSSADR1_7sigma_catalog.tsv.gz
https://github.com/MatthewJA/blobmatch/releases/download/v0.1/CATALOG.FIT.gz
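Any gzip tool will do. For readers who prefer to stay in Python, here is a minimal standard-library sketch; the helper name `gunzip` is hypothetical and not part of the repository.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(path):
    """Decompress a .gz file next to itself, dropping the .gz suffix.

    Returns the path of the decompressed file.
    """
    src = Path(path)
    if src.suffix != ".gz":
        raise ValueError("expected a .gz file: {}".format(src))
    dst = src.with_suffix("")  # e.g. CATALOG.FIT.gz -> CATALOG.FIT
    with gzip.open(str(src), "rb") as fin, open(str(dst), "wb") as fout:
        # Stream in chunks so a large catalogue never sits fully in memory.
        shutil.copyfileobj(fin, fout)
    return dst
```

For example, `gunzip("CATALOG.FIT.gz")` writes CATALOG.FIT into the same directory as the archive.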

Open Jupyter Notebook in a directory containing the extracted catalogues and all of the source code (the contents of the source/ folder)

Run feature_vectors.ipynb, executing all cells from top to bottom
(constructs feature vectors from the source catalogues in a patch and labels them based on positional matching)
(warning: creating the combined catalogue takes a few minutes, because appending to a pandas DataFrame is slow)
feature_vectors.ipynb requires the above TGSSADR1_7sigma_catalog.tsv and CATALOG.FIT to be present in cwd
feature_vectors.ipynb will save patch_catalogue.csv of combined match feature vectors with attached labels,
as well as tgss.csv and nvss.csv of feature vectors of individual sources
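The notebook's own matching code is not reproduced here. As a rough sketch of what labelling by positional matching can look like, assuming each catalogue has been reduced to (name, RA, Dec) tuples in degrees, the following computes great-circle separations with the haversine formula and labels a pair as a match when it falls within a separation cut. The function names and the `max_sep_arcsec` parameter are hypothetical; with the pinned astropy, `SkyCoord.search_around_sky` is the more idiomatic and much faster route.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle distance between two sky positions, all in degrees.

    Uses the haversine formula, which stays accurate at the small
    separations relevant for cross-matching.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    hav = (math.sin((dec2 - dec1) / 2) ** 2
           + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp against rounding pushing hav slightly above 1.
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, hav))))


def label_pairs(tgss, nvss, max_sep_arcsec):
    """Yield (tgss_name, nvss_name, separation_arcsec, label) for every pair.

    tgss and nvss are lists of (name, ra_deg, dec_deg) tuples. label is 1
    when the pair lies within max_sep_arcsec, else 0. This O(N*M) loop is
    only meant for a small patch of sky.
    """
    for t_name, t_ra, t_dec in tgss:
        for n_name, n_ra, n_dec in nvss:
            sep = 3600.0 * angular_separation_deg(t_ra, t_dec, n_ra, n_dec)
            yield t_name, n_name, sep, int(sep <= max_sep_arcsec)
```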

Make sure manual_labels.csv (found in the source/ folder) is present in cwd; these manual labels were made from cut-outs taken from:
http://tgssadr.strw.leidenuniv.nl/hips/
http://alasky.u-strasbg.fr/NVSS/intensity/

Run torch_logistic_regression.ipynb, executing all cells from top to bottom
(performs logistic regression using PyTorch and partitions the sky into physical objects)
torch_logistic_regression.ipynb requires patch_catalogue.csv (as above) and manual_labels.csv to be present in cwd
torch_logistic_regression.ipynb will save weights.csv, predictions.csv, objects.csv, multi_objects.csv,
torch_lr_losses.pdf, torch_lr_weights.pdf, torch_lr_predictions.pdf, and torch_lr_partition.pdf

This completes the main-line results using logistic regression; the following notebooks are auxiliary.

Run sklearn_logistic_regression.ipynb, executing all cells from top to bottom
(performs logistic regression and random forest using sklearn)
sklearn_logistic_regression.ipynb requires patch_catalogue.csv and manual_labels.csv to be present in cwd
sklearn_logistic_regression.ipynb will save sklearn_lr.pdf and sklearn_rf.pdf

Run score_feature_vectors.ipynb, executing all cells from top to bottom
(scores the match feature vectors in the patch against various metrics and finds each individual source's best match)
score_feature_vectors.ipynb requires patch_catalogue.csv, tgss.csv, and nvss.csv to be present in cwd
score_feature_vectors.ipynb will save tgss_sorted.csv, nvss_sorted.csv,
hist_patch_cat_score_naive.pdf, hist_patch_cat_score_separation.pdf,
hist_patch_cat_score_spectral.pdf, and hist_patch_cat_score_combo.pdf
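Finding each source's best match amounts to an arg-max over its candidate pairs. A minimal sketch, assuming the scored pairs are available as (tgss_name, nvss_name, score) tuples where a higher score means a more likely match (for a separation-based score, where smaller is better, negate it first); the function name is hypothetical:

```python
def best_matches(pairs):
    """Return the highest-scoring partner for every source on each side.

    pairs is an iterable of (tgss_name, nvss_name, score) tuples, where a
    higher score is assumed to mean a more likely match.
    Returns two dicts: tgss_name -> (nvss_name, score) and
    nvss_name -> (tgss_name, score).
    """
    best_for_tgss = {}
    best_for_nvss = {}
    for t, n, score in pairs:
        # Keep the partner only if it beats the best seen so far.
        if t not in best_for_tgss or score > best_for_tgss[t][1]:
            best_for_tgss[t] = (n, score)
        if n not in best_for_nvss or score > best_for_nvss[n][1]:
            best_for_nvss[n] = (t, score)
    return best_for_tgss, best_for_nvss
```

Sorting the resulting dicts by score then gives a ranked list of TGSS and NVSS sources.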

Run sky_positional_matching.ipynb, executing all cells from top to bottom
(constructs a catalogue of primitive feature vectors over the entire sky, using all sources in both catalogues, and performs positional matching)
(warning: this takes much longer, at least 30 minutes)
sky_positional_matching.ipynb requires TGSSADR1_7sigma_catalog.tsv and CATALOG.FIT to be present in cwd
sky_positional_matching.ipynb will save sky_matches.csv, sky_catalogue.csv, hist_angle.pdf, and hist_alpha.pdf
---

blobmatch's People

Forkers

daccordeon

blobmatch's Issues

Seminar title and abstract

@daccordeon to get @MatthewJA feedback by Friday 5pm.

Generate .csv catalogue of all matches

Include:

names
separation distance
flux densities (in Jy, not Jy/beam)
alpha value, spectral index
uniqueness flag

Fix NVSS naming

Extend scoring to something other than just position

Remove non-unique filter and determine effect on plots

Project timeline

Could @daccordeon please add a file containing dates and items to be completed, leading all the way to the end of the project?

Cross-identify TGSS and NVSS catalogues with positional matching

Obtain the TGSS and NVSS catalogues.
- TGSS: http://tgssadr.strw.leidenuniv.nl/doku.php
- NVSS: ftp://nvss.cv.nrao.edu/pub/nvss/CATALOG/ or mirrored https://heasarc.gsfc.nasa.gov/W3Browse/all/nvss.html
Following Section 2.3 of https://arxiv.org/pdf/1609.01308.pdf, match TGSS and NVSS.
Plot a histogram of the angular separation between matches.
Using Equation 3 of the above paper, compute the spectral index α and reproduce Figure 2.
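For a source detected in both surveys, the two-point spectral index follows from the ratio of flux densities. The sketch below assumes the convention S ∝ ν^α and nominal frequencies of 150 MHz (TGSS) and 1400 MHz (NVSS); check both against Equation 3 of the cited paper before relying on the numbers. The function and constant names are hypothetical.

```python
import math

# Nominal survey frequencies in MHz (assumed values, not taken from the paper).
NU_TGSS_MHZ = 150.0
NU_NVSS_MHZ = 1400.0


def spectral_index(s_tgss, s_nvss, nu_tgss=NU_TGSS_MHZ, nu_nvss=NU_NVSS_MHZ):
    """Two-point spectral index alpha, assuming S proportional to nu**alpha.

    Both flux densities must be in the same units (e.g. both in mJy), and
    both frequencies in the same units.
    """
    if s_tgss <= 0 or s_nvss <= 0:
        raise ValueError("flux densities must be positive")
    return math.log(s_nvss / s_tgss) / math.log(nu_nvss / nu_tgss)
```

Because α depends only on ratios, the units cancel, but both values must be of the same kind: mixing peak (Jy/beam) and integrated (Jy) flux densities would bias α, since the two surveys have different beam sizes.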

Sort the sky

Once you have a score, sort all TGSS sources, and all NVSS sources.

Implement logistic regression in pytorch

Reading on flux density

Section 2.1: https://www.cv.nrao.edu/~sransom/web/Ch2.html#S1

Add a README

Add a README.md file in the project root with a brief project description.

requirements.txt containing all import versions

pip3 freeze > requirements.txt
only include packages you actually use
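`pip3 freeze` lists everything installed in the environment, not just what the notebooks import. A minimal sketch of trimming its output to an explicit allow-list; the function name is hypothetical, and the allow-list must use distribution names (e.g. `scikit-learn`, not the import name `sklearn`):

```python
def filter_freeze(freeze_output, used):
    """Keep only the pinned lines of `pip3 freeze` output for packages in `used`.

    Names are compared case-insensitively with '_' treated as '-', a rough
    approximation of pip's name normalisation. Lines without '==' (such as
    editable installs) are dropped.
    """
    def norm(name):
        return name.strip().lower().replace("_", "-")

    wanted = {norm(name) for name in used}
    kept = []
    for line in freeze_output.splitlines():
        name, sep, _version = line.partition("==")
        if sep and norm(name) in wanted:
            kept.append(line.strip())
    return "".join(line + "\n" for line in kept)
```

For example, feed it the text from `subprocess.run(["pip3", "freeze"], stdout=subprocess.PIPE, universal_newlines=True).stdout` and write the result to requirements.txt.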

Score pairs based on spectral index

#11

Increase number of pairs by raising the patch size

Flesh out inputs and outputs of predictors

Write one or two pages precisely describing the inputs and outputs of the predictors.

Only to be done after having a sensible experiment from positional matching.

Fix logistic regression

add a bias
random weights
don't do vote models (committee)
accuracy threshold is 70%
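The repository does this in PyTorch; the dependency-free sketch below only illustrates the points in this issue: a learned bias term, small random initial weights, and a single model rather than a committee. It fits by batch gradient descent on the mean log loss. The function names, learning rate and epoch count are hypothetical choices. The issue's "70%" is read here as a minimum acceptable accuracy; if it was instead meant as a decision threshold, pass `threshold=0.7` to `accuracy`.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=500, seed=0):
    """Fit weights w and bias b by batch gradient descent on the mean log loss.

    X is a list of feature lists, y a list of 0/1 labels. Weights start at
    small random values rather than zeros; the bias is learned separately.
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    w = [rng.gauss(0.0, 0.01) for _ in range(n_features)]
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(X, y, w, b, threshold=0.5):
    """Fraction of samples whose thresholded prediction equals the label."""
    preds = [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold)
             for xi in X]
    return sum(p == yi for p, yi in zip(preds, y)) / len(y)
```

On separation alone, the learned weight should come out negative: larger separations make a match less likely.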

Put existing plots into report and add some descriptions

Label some pairs

Score NVSS x TGSS pairs

Develop a function that takes an NVSS and a TGSS object and scores how related they are.

Derived features in catalogue

cosine of angle of displacement vector
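The issue does not say which direction the angle is measured from. Assuming it is the angle of the TGSS-to-NVSS offset measured from the +RA axis, in a flat-sky approximation that holds for the small separations involved, a minimal sketch (function name hypothetical):

```python
import math


def displacement_cosine(ra_t, dec_t, ra_n, dec_n):
    """Cosine of the angle of the TGSS -> NVSS offset, measured from the +RA axis.

    Flat-sky approximation: the RA offset is scaled by cos(dec) so both
    components are true angular distances. Inputs are in degrees.
    Returns 0.0 for coincident positions, where the angle is undefined.
    """
    dec0 = math.radians(0.5 * (dec_t + dec_n))
    d_ra = (ra_n - ra_t + 180.0) % 360.0 - 180.0  # wrap RA difference into [-180, 180)
    dx = d_ra * math.cos(dec0)
    dy = dec_n - dec_t
    norm = math.hypot(dx, dy)
    return dx / norm if norm > 0 else 0.0
```

Note that the cosine alone cannot tell a northward offset from a southward one; pairing it with the sine would keep the full direction.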

Describe all features in report

Use the NVSS and TGSS catalogue papers for reference.

Define loss, cost, score, etc.

Artefact construction

To do logistic regression, you only need to run feature_vectors.ipynb and logistic_regression.ipynb; all other files are auxiliary.

Write about positional matching

A few pages about the data and what positional matching does.

Might as well get started on your report.

Give Matthew some pairs to label

As a CSV file with TGSS, NVSS columns.
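A minimal sketch with the standard `csv` module; the function name and the choice to pass an already-open file are assumptions, and only the two columns named in the issue are written:

```python
import csv
import io


def write_label_sheet(pairs, stream):
    """Write (tgss_name, nvss_name) pairs as a CSV with TGSS and NVSS columns."""
    writer = csv.writer(stream)
    writer.writerow(["TGSS", "NVSS"])
    for tgss_name, nvss_name in pairs:
        writer.writerow([tgss_name, nvss_name])


if __name__ == "__main__":
    # Demonstrate on an in-memory buffer instead of a real file.
    buf = io.StringIO()
    write_label_sheet([("T1", "N1")], buf)
    print(buf.getvalue(), end="")
```

When writing to disk, open the file with `newline=""` as the `csv` documentation recommends, e.g. `with open("pairs_to_label.csv", "w", newline="") as f: write_label_sheet(pairs, f)` (file name hypothetical).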

Start your report

Make a neural network

Don't spend too much time

Labelled histograms of logistic regression

Find multi-component objects in NVSS and TGSS

Using #31

Write .py script to do all of positional matching and catalogue making

Read "When Models Meet Data"

Chapter 8 of mml-book.com

And flesh out bits of napkin.tex if needed.

Use logistic regression scorer to partition your patch of sky into physical objects

Each object will have some number of TGSS and NVSS components.
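Transitively grouping chains of predicted matches is a connected-components problem. One standard way to compute it is union-find, sketched below under the assumption that predicted matches arrive as (tgss_name, nvss_name) pairs; this is not necessarily how the notebook does it, and the function name is hypothetical. Sources with no predicted match do not appear in the output, and any object with more than one TGSS or NVSS component is a multi-component candidate.

```python
def partition(matches):
    """Group sources into objects by transitively chaining predicted matches.

    matches is an iterable of (tgss_name, nvss_name) pairs predicted to be
    the same object. Returns a sorted list of objects, each a tuple
    (tgss_components, nvss_components) of sorted name lists.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for t, n in matches:
        # Tag each name with its survey so identical strings never collide.
        union(("TGSS", t), ("NVSS", n))

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), []).append(node)

    objects = []
    for members in groups.values():
        tgss = sorted(name for survey, name in members if survey == "TGSS")
        nvss = sorted(name for survey, name in members if survey == "NVSS")
        objects.append((tgss, nvss))
    return sorted(objects)
```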

Cheng's pretty snowflake picture
