
astroml_figures's Introduction

AstroML: Machine Learning for Astronomy

AstroML is a Python module for machine learning and data mining built on numpy, scipy, scikit-learn, and matplotlib, and distributed under the BSD license. It contains a growing library of statistical and machine learning routines for analyzing astronomical data in python, loaders for several open astronomical datasets, and a large suite of examples of analyzing and visualizing astronomical datasets.

This project was started in 2012 by Jake VanderPlas to accompany the book Statistics, Data Mining, and Machine Learning in Astronomy by Zeljko Ivezic, Andrew Connolly, Jacob VanderPlas, and Alex Gray.

Installation

Before installation, make sure your system meets the prerequisites listed in Dependencies, below.

Core

To install the core astroML package in your home directory, use:

pip install astroML

A conda package for astroML is also available on either the conda-forge or the astropy conda channel:

conda install -c astropy astroML

The core package is pure python, so installation should be straightforward on most systems. To install from source, use:

python setup.py install

You can specify an arbitrary directory for installation using:

python setup.py install --prefix='/some/path'

To install system-wide on Linux/Unix systems:

python setup.py build
sudo python setup.py install

Dependencies

There are two levels of dependencies in astroML. Core dependencies are required for the core astroML package. Optional dependencies are required to run some (but not all) of the example scripts. Individual example scripts will list their optional dependencies at the top of the file.

Core Dependencies

The core astroML package requires the following (some of the functionality might work with older versions):

Optional Dependencies

Several of the example scripts require specialized or upgraded packages. These requirements are listed at the top of the particular scripts.

  • HEALPy provides an interface to the HEALPix pixelization scheme, as well as fast spherical harmonic transforms.

Development

This package is designed to be a repository for well-written astronomy code, and submissions of new routines are encouraged. After installing the version-control system Git, you can check out the latest sources from GitHub using:

git clone https://github.com/astroML/astroML.git

or if you have write privileges:

git clone git@github.com:astroML/astroML.git

Contribution

We strongly encourage contributions of useful astronomy-related code: for astroML to be a relevant tool for the python/astronomy community, it will need to grow with the field of research. There are a few guidelines for contribution:

General

Any contribution should be done through the GitHub pull request system (for more information, see the help page). Code submitted to astroML should conform to a BSD-style license and follow the PEP8 style guide.

Documentation and Examples

All submitted code should be documented following the Numpy Documentation Guide. This is a unified documentation style used by many packages in the scipy universe.

In addition, it is highly recommended to create example scripts that show the usefulness of the method on an astronomical dataset (preferably making use of the loaders in astroML.datasets). These example scripts are in the examples subdirectory of the main source repository.

Authors

Package Author

Maintainer

Contributors

  • Alex Conley
  • Andreas Kopecky
  • Andrew Connolly
  • Asif Imran
  • Benjamin Alan Weaver
  • Brigitta Sipőcz
  • Chris Desira
  • Daniel Andreasen
  • Dino Bektešević
  • Edward Betts
  • Hans Moritz Günther
  • Hugo van Kemenade
  • Jake Vanderplas
  • Jeremy Blow
  • Jonathan Sick
  • Joris van Vugt
  • Juanjo Bazán
  • Julian Taylor
  • Lars Buitinck
  • Michael Radigan
  • Morgan Fouesneau
  • Nicholas Hunt-Walker
  • Ole Streicher
  • Pey Lian Lim
  • Rodrigo Nemmen
  • Ross Fadely
  • Vlad Skripniuk
  • Zlatan Vasović
  • Engineero
  • stonebig

astroml_figures's People

Contributors

bsipocz, connolly2, ivezic, jakevdp, stephenportillo, suberlak

astroml_figures's Issues

CNN cartoon issues due to M51 picture

The M51 picture for the CNN cartoon brings up two low-priority issues: one needs documentation only, the other needs a solution:

  • The JPEG file requires Pillow as a dependency. Maybe the best solution is to convert it to PNG (the only concern is making sure the resulting image is the same as what went into the book).

  • The current test and PDF-generating mechanism (which relies on extracting the code to a temporary "somefile.py") does not work with the current solution for the file path of the image. Running the script directly works, so users shouldn't be affected by it.
    Copying a workaround from the astroML pickle_results mechanism is probably the easiest solution here.

Change default for use_latex to False

While we had it set to True to generate the figures for the book, this default regularly causes issues for users working with the figure files.

Therefore I think changing the default to False has more benefits. Adding a comment to all the code files noting that True was used for the book should provide the necessary information for reproducibility.
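A minimal sketch of what the proposed default could look like, using only matplotlib's rc settings. The helper name configure_text and its parameters are hypothetical stand-ins for the text setup call in the figure scripts, not the actual astroML function.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def configure_text(usetex=False, fontsize=8):
    """Hypothetical helper: set the font size and toggle LaTeX text rendering."""
    plt.rc("font", size=fontsize, family="serif")
    plt.rc("text", usetex=usetex)


# Proposed default: no working LaTeX installation is needed.
configure_text()

# NOTE: the figures in the book were generated with usetex=True.
# Uncomment the next line to reproduce the published typography.
# configure_text(usetex=True)

print(matplotlib.rcParams["text.usetex"])
```

With this arrangement a user without LaTeX can run a figure script unchanged, and the comment records the setting needed for reproducibility.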

Figure 10.3 inconsistencies

I just noticed that, somehow, the top 4 and the bottom 4 panels in the 2nd edition of the web version of fig. 10.3 are swapped (and now the caption is wrong). The two printed versions and the 1st edition of the web version are fine. See https://www.astroml.org/book_figures/chapter10/fig_FFT_aliasing.html

Note: we should also check the notebook version of this figure.

Figure 4.2

Hi,
upon running the code for Book Figure 4.2 on Ubuntu, Python returned an error: 'GMM' object has no attribute 'eval', raised by logprob, responsibilities = M_best.eval(x).
To solve the problem, I replaced M_best.eval(x) (line 85) with:
M_best.score_samples(x.reshape((-1, 1)))
and M_best.predict_proba(x) (line 110) with:
p = M_best.predict_proba(x.reshape((-1, 1)))

I'm using scikit-learn 0.17

(was astroML/astroML#82, more discussion is on that issue)
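For readers on later scikit-learn releases, where the GMM class was replaced by GaussianMixture, the same quantities can be obtained roughly as follows. This is a sketch on synthetic data (the mixture parameters and the variable names other than M_best are illustrative), not the code of the figure script. Note that GaussianMixture.score_samples returns only the log-probabilities, so the responsibilities come from predict_proba.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic 1D sample drawn from two Gaussians (illustrative values only)
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-1.0, 0.5, 300), rng.normal(2.0, 1.0, 700)])

# scikit-learn estimators expect a 2D array of shape (n_samples, n_features)
X = x.reshape(-1, 1)

M_best = GaussianMixture(n_components=2, random_state=0).fit(X)

logprob = M_best.score_samples(X)            # log p(x), shape (n_samples,)
responsibilities = M_best.predict_proba(X)   # shape (n_samples, n_components)

pdf = np.exp(logprob)
# Contribution of each component to the total density at every point
pdf_individual = responsibilities * pdf[:, np.newaxis]

print(pdf.shape, pdf_individual.shape)
```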

Wrong equation reference in Figure 5.9

In the comment in Figure 5.9, there is an incorrect reference in the equations.
For the probability p(b), instead of "eqn. 5.70", it should be "eqn. 5.71".
For the Gaussian approximation, the equation is not "eqn. 5.71".

Figure 9.12: sklearn.tree.DecisionTreeClassifier incompatibility

I am using:

sklearn.__version__: 0.16.1
astroML.__version__: 0.3

File "fig_rrlyrae_treevis.py", line 242, in <module>
random_state=0, criterion='entropy')
TypeError: __init__() got an unexpected keyword argument 'compute_importances'

I added these lines to fix my fork:

# In scikit-learn 0.14+, setting compute_importances=True is no longer required.
try:
    # version < 0.14
    clf = DecisionTreeClassifier(compute_importances=True,
                                 random_state=0, criterion='entropy')
except TypeError:
    # version 0.14+: the keyword is not accepted
    clf = DecisionTreeClassifier(random_state=0, criterion='entropy')

see also: astroML/astroML#77

(was astroML/astroML#78)

RuntimeError triggered by pymc3 for figure 5.24

At the time of opening this issue I suspect this is a local issue on my laptop, but in either case having the issue open doesn't hurt.

I have now run into pymc3 issues a few times with PyCharm, mostly when examples are embedded in notebooks, but this now consistently appears on the command line, too. I only see the error using Python 3.8, while it works as expected with identical numpy and pymc3 versions on Python 3.7.

python book_figures/chapter5/fig_model_comparison_mcmc.py 
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [M1_log_sigma, M1_mu]
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [M1_log_sigma, M1_mu]
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/spawn.py", line 125, in _main
    prepare(preparation_data)
  File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/runpy.py", line 262, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/runpy.py", line 95, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/bsipocz/munka/devel/worktrees/astroML_figures/giant_figure_generating_branch_ed2/book_figures/chapter5/fig_model_comparison_mcmc.py", line 87, in <module>
    trace1 = pm.sample(draws=2500, tune=100)
  File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pymc3/sampling.py", line 469, in sample
    trace = _mp_sample(**sample_args)
  File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pymc3/sampling.py", line 1053, in _mp_sample
    sampler = ps.ParallelSampler(
  File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pymc3/parallel_sampling.py", line 355, in __init__
    self._samplers = [
  File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pymc3/parallel_sampling.py", line 356, in <listcomp>
    ProcessAdapter(
  File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pymc3/parallel_sampling.py", line 242, in __init__
    self._process.start()
  File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/context.py", line 283, in _Popen
    return Popen(process_obj)
  File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.


Figure 6.17

In figure 6.17, we should use the correlation from the full data rather than the mean of bootstrap samples as the best estimate.

(was: astroML/astroML#76)
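The distinction can be sketched as follows: the correlation coefficient computed on the full sample serves as the point estimate, and the bootstrap resamples are used only to estimate its scatter. The data here are synthetic and the choice of Pearson's coefficient is an assumption made for illustration; the figure script may use different statistics.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic correlated sample (illustrative, not the data used in the book)
n = 1000
x = rng.normal(size=n)
y = 0.6 * x + 0.8 * rng.normal(size=n)


def pearson_r(a, b):
    """Pearson correlation coefficient of two 1D arrays."""
    return np.corrcoef(a, b)[0, 1]


# Best estimate: the statistic evaluated on the full data set
r_full = pearson_r(x, y)

# Bootstrap: resample (x, y) pairs with replacement to estimate the scatter
n_boot = 500
indices = rng.integers(0, n, size=(n_boot, n))
r_boot = np.array([pearson_r(x[i], y[i]) for i in indices])
r_err = r_boot.std(ddof=1)

print(f"r = {r_full:.3f} +/- {r_err:.3f} (bootstrap mean {r_boot.mean():.3f})")
```

The bootstrap mean is printed only for comparison; it is not used as the estimate.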

Avoid hacky way of setting up GaussianMixture dataset

Some of the current examples hack GaussianMixture() to set up the input dataset. In more recent versions of scikit-learn, sampling from a non-fitted GaussianMixture is not really a supported feature (see the discussion in a comment on scikit-learn/scikit-learn#7822).

So while it is possible to hack around it, we should look into other ways to generate the input dataset for these user-facing examples.

Examples include book_figures/chapter6/fig_GMM_nclusters.py
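One possible alternative, sketched here under the assumption that the examples only need points drawn from a known mixture, is to sample directly with numpy and leave GaussianMixture for the fitting step. The function name sample_gaussian_mixture and the mixture parameters are hypothetical.

```python
import numpy as np


def sample_gaussian_mixture(n_samples, weights, means, covs, rng):
    """Draw points from a Gaussian mixture without any fitted estimator.

    Returns the points, shape (n_samples, n_features), and the index of
    the component each point was drawn from.
    """
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)

    # Pick a component for every sample according to the mixture weights
    labels = rng.choice(len(weights), size=n_samples, p=weights / weights.sum())

    X = np.empty((n_samples, means.shape[1]))
    for k in range(len(weights)):
        mask = labels == k
        X[mask] = rng.multivariate_normal(means[k], covs[k], size=int(mask.sum()))
    return X, labels


rng = np.random.default_rng(0)
X, labels = sample_gaussian_mixture(
    500,
    weights=[0.3, 0.5, 0.2],
    means=[[0.0, 0.0], [4.0, 4.0], [-3.0, 5.0]],
    covs=[np.eye(2), 0.5 * np.eye(2), [[1.0, 0.8], [0.8, 1.0]]],
    rng=rng,
)
print(X.shape, np.bincount(labels))
```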

Fig 3.19 is double Weibull

cross ref from https://github.com/astroML/text_errata:

Page 104: Figure 3.19 shows the positive part of a double Weibull distribution, not a Weibull distribution. In this case it means that the values on the y axis are half of what they should be. To get a Weibull distribution in scipy, use exponweib with a=1 rather than dweibull.
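The factor of two can be checked numerically with scipy. In this sketch the shape and scale values are arbitrary; weibull_min is included only as an independent cross-check of the exponweib result.

```python
import numpy as np
from scipy import stats

x = np.linspace(0.01, 3.0, 200)  # positive half of the axis
k, lam = 2.0, 1.0                # shape and scale (illustrative values)

# Double Weibull: symmetric about loc, so each side carries half the mass
pdf_double = stats.dweibull(k, 0, lam).pdf(x)

# Weibull via the exponentiated Weibull with exponent a=1
pdf_weibull = stats.exponweib(1, k, 0, lam).pdf(x)

# On x > 0 the double Weibull pdf is half of the Weibull pdf
print(np.allclose(pdf_weibull, 2 * pdf_double))
print(np.allclose(pdf_weibull, stats.weibull_min(k, 0, lam).pdf(x)))
```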

Plot regression with newer sklearn.decomposition.PCA

The PCA projection in book_figures/chapter7/fig_S_manifold_PCA.py changes depending on the sklearn version being used (y range should be flipped).

Investigate the cause of it, and report upstream if it looks like a bug.
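One candidate cause worth checking, stated here as an assumption rather than a diagnosis, is the sign ambiguity of principal components: a component and its negative describe the same axis, so different solvers or versions may return either one. The sketch below uses plain numpy (not sklearn.decomposition.PCA) and a hypothetical helper pca_project to show how a fixed sign convention makes the projection independent of that choice.

```python
import numpy as np


def pca_project(X, n_components=2):
    """PCA via SVD with a deterministic sign convention (hypothetical helper).

    Each component is flipped, if needed, so that its largest-magnitude
    loading is positive.
    """
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]

    # Locate the largest-magnitude loading of every component and read its sign
    idx = np.argmax(np.abs(components), axis=1)
    signs = np.sign(components[np.arange(n_components), idx])
    components = components * signs[:, np.newaxis]

    return Xc @ components.T, components


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([3.0, 2.0, 1.0])

proj, comps = pca_project(X)
# Reordering the rows leaves the covariance unchanged; with the sign
# convention applied, the components agree as well.
proj_rev, comps_rev = pca_project(X[::-1])
print(np.allclose(comps, comps_rev))
```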

Python 3.7 compatibility: issues with pymc (at least 10 figures)

pymc has a method called await. Since async and await are reserved keywords in Python 3.7, pymc is not even importable, which makes at least the following figures incompatible with Python 3.7 as well:

  • book_figures/chapter5/fig_cauchy_mcmc.py
  • book_figures/chapter5/fig_signal_background.py
  • book_figures/chapter5/fig_model_comparison_mcmc.py
    - [ ] book_figures/chapter1/fig_moving_objects_multicolor.py (this was never problematic; not sure how it ended up on this list)
  • book_figures/chapter10/fig_matchedfilt_chirp2.py
  • book_figures/chapter10/fig_matchedfilt_chirp.py
  • book_figures/chapter10/fig_arrival_time.py
  • book_figures/chapter10/fig_matchedfilt_burst.py
  • book_figures/chapter5/fig_gaussgauss_mcmc.py
  • book_figures/chapter8/fig_outlier_rejection.py
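The import failure can be reproduced without pymc using only the standard library. The one-line function definitions below are stand-ins for illustration, not pymc source code.

```python
import keyword
import sys


def is_valid_name(name):
    """True if `name` can be used as a function or method name."""
    return name.isidentifier() and not keyword.iskeyword(name)


def compiles(source):
    """True if `source` parses on the running interpreter."""
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError:
        return False
    return True


print(sys.version_info[:2])
print(is_valid_name("await"))             # False on Python 3.7 and later
print(compiles("def await(self): pass"))  # False on Python 3.7 and later
print(compiles("def wait(self): pass"))   # True
```

Because the SyntaxError is raised while the module is being parsed, no workaround inside the figure scripts can help; the affected package itself has to rename the method.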

Setting up CI

Some sort of CI testing here would be useful. Preferably we would also have a cron job that runs regularly to double check that nothing has been broken.

chapter 9 "Star/Quasar Classification ROC Curves" example trains classifiers on the whole data set rather than the train split

In fig_star_quasar_ROC.py, inside compute_results(), classifiers are trained on X rather than X_train. This means that the classifiers have seen the test set during training, which is bad practice.

The way to fix this would be to change line 90 from

model.fit(X, y)

to

model.fit(X_train, y_train)

Additionally, the figure fig_star_quasar_ROC_1.png needs to be updated as it is the result of the execution of the script.
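The intended pattern, fitting on the training split and computing the ROC curve on the held-out split only, is sketched below. The synthetic data, LogisticRegression, and scikit-learn's train_test_split are stand-ins chosen for a self-contained example; the figure script uses its own data, classifiers, and splitting code.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic two-class data (stand-in for the star/quasar colors)
X, y = make_classification(n_samples=1000, n_features=4, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression()
model.fit(X_train, y_train)  # train on the training split only

# Evaluate on data the model has never seen
y_prob = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
auc = roc_auc_score(y_test, y_prob)

print(f"test AUC = {auc:.3f}")
```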
