This project forked from polymathicai/astroclip

License: MIT License

AstroCLIP

Multimodal contrastive pretraining for astronomical data

This project aims to demonstrate that contrastive pre-training across two astronomical data modalities (multi-band imaging and optical spectra) yields a meaningful embedding space that captures physical information about galaxies and is shared between both modalities.
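To make the idea of contrastive pre-training concrete, here is a minimal NumPy sketch of a CLIP-style symmetric contrastive (InfoNCE) loss. It is an illustration under stated assumptions, not the training code of this project: the function names, the temperature value of 0.1, and the random stand-in embeddings are all hypothetical, and the README does not specify which loss the project actually uses.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def symmetric_contrastive_loss(image_emb, spectrum_emb, temperature=0.1):
    """CLIP-style loss; row i of each array is assumed to describe the same galaxy."""
    img = l2_normalize(image_emb)
    spec = l2_normalize(spectrum_emb)
    logits = img @ spec.T / temperature  # (N, N) scaled cosine similarities
    idx = np.arange(len(logits))
    # Matching pairs sit on the diagonal; treat them as the "correct class".
    loss_img_to_spec = -log_softmax(logits)[idx, idx].mean()
    loss_spec_to_img = -log_softmax(logits.T)[idx, idx].mean()
    return 0.5 * (loss_img_to_spec + loss_spec_to_img)


# Random stand-in embeddings (assumption: real ones would come from the encoders).
rng = np.random.default_rng(0)
spectra = rng.normal(size=(8, 16))
aligned_images = spectra + 0.01 * rng.normal(size=(8, 16))
unrelated_images = rng.normal(size=(8, 16))

loss_aligned = symmetric_contrastive_loss(aligned_images, spectra)
loss_unrelated = symmetric_contrastive_loss(unrelated_images, spectra)
print(f"aligned: {loss_aligned:.4f}, unrelated: {loss_unrelated:.4f}")
```

Minimizing a loss of this kind pulls the image and spectrum embeddings of the same galaxy together and pushes mismatched pairs apart, which is why the aligned pairs in the toy data score lower than the unrelated ones.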

Results

For a longer description of our results, we encourage you to read our NeurIPS 2023 AI4Science submission (still under review). The main takeaways are:

  • Both image and spectra encoders are able to extract meaningful physical information from the input data.
  • The embeddings of both images and spectra are well aligned, allowing us to retrieve spectra that correspond to a given image, and vice-versa.
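Cross-modal retrieval of the kind described in the second point can be sketched as a nearest-neighbour search by cosine similarity in the shared embedding space. The snippet below is a hedged illustration only: `retrieve_top_k` is a hypothetical helper, and the embeddings are synthetic stand-ins, not outputs of the trained encoders.

```python
import numpy as np


def retrieve_top_k(query_emb, candidate_emb, k=3):
    """Return, for each query row, the indices of the k most cosine-similar candidates."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    c = candidate_emb / np.linalg.norm(candidate_emb, axis=1, keepdims=True)
    similarity = q @ c.T  # (n_queries, n_candidates)
    # argsort is ascending, so negate to rank the most similar candidates first.
    return np.argsort(-similarity, axis=1)[:, :k]


# Synthetic stand-in data: spectrum embeddings plus slightly perturbed "image" embeddings.
rng = np.random.default_rng(0)
spectrum_emb = rng.normal(size=(100, 32))
image_emb = spectrum_emb + 0.05 * rng.normal(size=(100, 32))

# Query with images, retrieve spectra; swap the arguments for the reverse direction.
top_k = retrieve_top_k(image_emb, spectrum_emb, k=3)
top1_accuracy = float(np.mean(top_k[:, 0] == np.arange(100)))
print(f"top-1 retrieval accuracy on toy data: {top1_accuracy:.2f}")
```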

The notebook used to generate the plots in the paper can be found here.

Below is a visualization of the learned embeddings, obtained by taking the first two PCA components of the spectra and image embeddings. As one can see, images and spectra discover similar main factors of variation.

[Figure: emb_pca, first two PCA components of the spectra and image embeddings]
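A projection onto the first two PCA components can be computed with a plain SVD, as in the sketch below. This is not the notebook's code: the helper names are hypothetical, the embeddings are synthetic, and fitting a single PCA basis on both modalities together is an assumption made here so that the two point clouds share the same axes.

```python
import numpy as np


def fit_pca(x, n_components=2):
    """Return the data mean and the leading principal axes (as rows) of x."""
    mean = x.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(x, mean, components):
    """Project the rows of x onto the given principal axes."""
    return (x - mean) @ components.T


# Synthetic stand-in embeddings driven by two shared latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2)) * np.array([5.0, 2.0])
mixing = rng.normal(size=(2, 16))
spectrum_emb = latent @ mixing + 0.1 * rng.normal(size=(200, 16))
image_emb = latent @ mixing + 0.1 * rng.normal(size=(200, 16))

# Fit one basis on both modalities, then project each modality separately.
mean, components = fit_pca(np.vstack([spectrum_emb, image_emb]), n_components=2)
spectrum_2d = project(spectrum_emb, mean, components)
image_2d = project(image_emb, mean, components)
```

The two resulting `(200, 2)` arrays can then be drawn as scatter plots, colored by a physical property of interest.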

Visualizing the structure of the latent space by UMAP dimensionality reduction further highlights some of its information content. Below is an example of a UMAP of the spectra embeddings:

[Figure: UMAP of the spectra embeddings]

Products: Datasets and Trained Models

Dataset

As part of this project, we compile and make available a combined dataset of DESI Legacy Survey g, r, z images and DESI Early Data Release spectra. These images are a subset of the ssl-legacysurvey sample compiled by @georgestein from the Legacy Survey DR9. Scripts used to match these datasets are available here.

For convenience, we provide a Hugging Face Datasets loading script that automatically downloads the required data and prepares the dataset on your computer.

from datasets import load_dataset

# This downloads about 60 GB of data
dset = load_dataset('astroclip/datasets/legacy_survey.py')

To get started with this dataset, for example to predict redshift from the spectra, take a look at this notebook.
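As a toy illustration of the kind of task mentioned above (regressing redshift from per-galaxy features), the sketch below runs a k-nearest-neighbour regression on synthetic data. It does not load the dataset and does not reproduce the notebook: the helper `knn_regress`, the feature layout, and all numbers are assumptions chosen only to keep the example self-contained.

```python
import numpy as np


def knn_regress(train_x, train_y, query_x, k=5):
    """Predict each query target as the mean target of its k nearest training rows."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    d2 = ((query_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return train_y[nearest].mean(axis=1)


# Synthetic stand-in: one feature tracks "redshift", seven are uninformative.
rng = np.random.default_rng(0)
redshift = rng.uniform(0.0, 0.6, size=500)
noisy_signal = redshift + 0.01 * rng.normal(size=500)
distractors = 0.02 * rng.normal(size=(500, 7))
features = np.column_stack([noisy_signal, distractors])

# Simple hold-out split: first 400 rows for training, last 100 for evaluation.
train_x, test_x = features[:400], features[400:]
train_y, test_y = redshift[:400], redshift[400:]

pred = knn_regress(train_x, train_y, test_x, k=5)
rmse = float(np.sqrt(np.mean((pred - test_y) ** 2)))
# Baseline: always predict the mean training redshift.
baseline_rmse = float(np.sqrt(np.mean((train_y.mean() - test_y) ** 2)))
print(f"kNN RMSE: {rmse:.3f}, mean-baseline RMSE: {baseline_rmse:.3f}")
```

With real data, `features` would be replaced by spectra (or embeddings derived from them) and `redshift` by the catalog values, whatever those fields are called in the dataset.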

Training scripts and model weights

[Coming soon]

Requirements

This repo should need only basic PyTorch and Hugging Face dependencies. The following should install everything required (when run from this repository):

pip install .

Contributors

eiffl, golkar, lhparker1
