astroml / gatspy Goto Github PK

General tools for Astronomical Time Series in Python

License: BSD 2-Clause "Simplified" License

Makefile 0.17% Python 99.83%

gatspy's Introduction

AstroML: Machine Learning for Astronomy

AstroML is a Python module for machine learning and data mining built on numpy, scipy, scikit-learn, and matplotlib, and distributed under the BSD license. It contains a growing library of statistical and machine learning routines for analyzing astronomical data in python, loaders for several open astronomical datasets, and a large suite of examples of analyzing and visualizing astronomical datasets.

This project was started in 2012 by Jake VanderPlas to accompany the book Statistics, Data Mining, and Machine Learning in Astronomy by Zeljko Ivezic, Andrew Connolly, Jacob VanderPlas, and Alex Gray.

Important Links

HTML documentation: https://www.astroML.org
Core source-code repository: https://github.com/astroML/astroML
Figure source-code repository: https://github.com/astroML/astroML-figures
Issue Tracker: https://github.com/astroML/astroML/issues
Mailing List: https://groups.google.com/forum/#!forum/astroml-general

Installation

Before installation, make sure your system meets the prerequisites listed in Dependencies, listed below.

Core

To install the core astroML package in your home directory, use:

pip install astroML

A conda package for astroML is also available either on the conda-forge or on the astropy conda channels:

conda install -c astropy astroML

The core package is pure python, so installation should be straightforward on most systems. To install from source, use:

python setup.py install

You can specify an arbitrary directory for installation using:

python setup.py install --prefix='/some/path'

To install system-wide on Linux/Unix systems:

python setup.py build
sudo python setup.py install

Dependencies

There are two levels of dependencies in astroML. Core dependencies are required for the core astroML package. Optional dependencies are required to run some (but not all) of the example scripts. Individual example scripts will list their optional dependencies at the top of the file.

Core Dependencies

The core astroML package requires the following (some of the functionality might work with older versions):

Python version 3.6+
Numpy >= 1.13
Scipy >= 0.19
Scikit-learn >= 0.18
Matplotlib >= 3.0
AstroPy >= 3.0

Optional Dependencies

Several of the example scripts require specialized or upgraded packages. These requirements are listed at the top of the particular scripts

HEALPy provides an interface to the HEALPix pixelization scheme, as well as fast spherical harmonic transforms.

Development

This package is designed to be a repository for well-written astronomy code, and submissions of new routines are encouraged. After installing the version-control system Git, you can check out the latest sources from GitHub using:

git clone git://github.com/astroML/astroML.git

or if you have write privileges:

git clone [email protected]:astroML/astroML.git

Contribution

We strongly encourage contributions of useful astronomy-related code: for astroML to be a relevant tool for the python/astronomy community, it will need to grow with the field of research. There are a few guidelines for contribution:

General

Any contribution should be done through the github pull request system (for more information, see the help page Code submitted to astroML should conform to a BSD-style license, and follow the PEP8 style guide.

Documentation and Examples

All submitted code should be documented following the Numpy Documentation Guide. This is a unified documentation style used by many packages in the scipy universe.

In addition, it is highly recommended to create example scripts that show the usefulness of the method on an astronomical dataset (preferably making use of the loaders in astroML.datasets). These example scripts are in the examples subdirectory of the main source repository.

Authors

Package Author

Jake Vanderplas https://github.com/jakevdp http://jakevdp.github.com

Maintainer

Brigitta Sipocz https://github.com/bsipocz

Contributors

Alex Conley
Andreas Kopecky
Andrew Connolly
Asif Imran
Benjamin Alan Weaver
Brigitta Sipőcz
Chris Desira
Daniel Andreasen
Dino Bektešević
Edward Betts
Hans Moritz Günther
Hugo van Kemenade
Jake Vanderplas
Jeremy Blow
Jonathan Sick
Joris van Vugt
Juanjo Bazán
Julian Taylor
Lars Buitinck
Michael Radigan
Morgan Fouesneau
Nicholas Hunt-Walker
Ole Streicher
Pey Lian Lim
Rodrigo Nemmen
Ross Fadely
Vlad Skripniuk
Zlatan Vasović
Engineero
stonebig

gatspy's People

Contributors

Stargazers

Watchers

gatspy's Issues

Inconsistent treatment of DC component

I am attempting to compute a spectral window by taking the power spectrum of ones sampled as the time series. I expect a value of 1 at frequency=0, but usually I get zero power at zero frequency... but not always! I set fit_offset=False and center_data=False to try to retain the the DC value. Here's an example:

import numpy as np
import matplotlib.pyplot as plt
import gatspy.periodic as gp

time = np.linspace(0,9,1e5)
nyq=1./(2.*np.median(np.diff(time)))
deltaf=(1./time[-1])/4.5
freq,sw=gp.lomb_scargle_fast.lomb_scargle_fast(time,np.ones(time.size)/time.size,
                                               f0=0.,df=deltaf,Nf=nyq/deltaf,fit_offset=False,
                                               center_data=False)

plt.plot(freq,sw,label='oversample by 4.5')

deltaf=(1./time[-1])/5.
freq,sw=gp.lomb_scargle_fast.lomb_scargle_fast(time,np.ones(time.size)/time.size,
                                               f0=0.,df=deltaf,Nf=nyq/deltaf,fit_offset=False,
                                               center_data=False)
plt.plot(freq,sw,label='oversample by 5')
plt.title('why this different behavior at zero frequency?')
plt.legend()
plt.xlim(0,.5)
plt.xlabel('frequency')
plt.ylabel('spectral window')
plt.show()

This seems to only affect the power at exactly frequency = 0, and the result appears to be sensitive to some precise relationship between the frequency sampling and the time sampling. Am I correct to expect power=1 for frequency=0? Is the "spectral window" a well defined quantity for the Lomb-Scargle periodogram (though I expect it to reproduce the Fourier transform result for the evenly-sampled example above)?

Optimizer issue

We look at performing iterations of a Gaussian distribution of data across each data points error bars. The optimizer is used to examine the best period of the L-S model. A histogram plot is created of the no. of iterations against the value for the optimized period. We observe a large spike at a particular value (1763 days) for no apparent reason.

The optimizer is compared to a 'MAX' method whereby the best period is selected through examination of the max power of the L-S model fit.

The results highlight that often, over 50% of the optimizer chosen periods fall onto the same exact value as another iteration (up to ~ 10 sig figs) however, for the MAX method only a small percentage exhibit such behaviour (heavily influenced by oversampling fact size).

We see no reason for such a difference to arise and can only conclude that the optimizer for whatever reason does not seem to work for this particular data.
Note that the optimizer does appear to work for other objects that we have examined.

Believe code is implemented correctly, although mistakes/mis-understandings may be present.

Code located in Word Doc; Data in Text document (Download to same file location to run properly)

Thanks.

Gatspy_Issue.docx

Gatspy_issue_Data.txt

Multiband Base model should ``ravel()`` inputs in ``fit()`` method

Currently the API fails as documented.

Also need updated examples of this.

Better docs of grid search parameters

I need to better document how to use the grid period search effectively (see #3)

Add normalization of PSD to documentation

I'm currently poking around in some data using gatspy, and wanted to take a stab at calculating some false-alarm probabilities. It took me a while of reading the gatspy source code to figure out that the score method returns the PSD in the least-squares normalization (?). Perhaps this is obvious to anyone with more experience with Lomb-Scargle periodograms than me, but for newcomers, it might be useful to explicitly mention that in the docs?

I'm happy to submit a PR, but I wanted to make sure that I'd understood the correctly what it returns before I do that. :)

Error when fetch_rrlyrae(), (HTTP Error 404: Not Found)

My python version: 3.7.4
Gatspy 0.3 installed via pip

When trying to use the simplest rrlyrae = fetch_rrlyrae(), it shows the error:

HTTP Error 404: Not Found

The whole error is as follows:

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
<ipython-input-11-b9aee16fde99> in <module>
      1 from gatspy.datasets import fetch_rrlyrae
----> 2 rrlyrae = fetch_rrlyrae()

/usr/local/anaconda3/lib/python3.7/site-packages/gatspy/datasets/rrlyrae.py in fetch_rrlyrae(partial, **kwargs)
    387     else:
    388         return RRLyraeLC('table1.tar.gz',
--> 389                          cache_kwargs=kwargs)
    390 
    391 

/usr/local/anaconda3/lib/python3.7/site-packages/gatspy/datasets/rrlyrae.py in __init__(self, tablename, dirname, cache_kwargs)
     89         self.dirname = dirname
     90         self.cache_kwargs = cache_kwargs
---> 91         self._load_data()
     92 
     93     def _load_data(self):

/usr/local/anaconda3/lib/python3.7/site-packages/gatspy/datasets/rrlyrae.py in _load_data(self)
     93     def _load_data(self):
     94         filename = _get_download_or_cache(self.tablename,
---> 95                                           **(self.cache_kwargs or {}))
     96         self.data = tarfile.open(filename)
     97         self._metadata = None

/usr/local/anaconda3/lib/python3.7/site-packages/gatspy/datasets/rrlyrae.py in _get_download_or_cache(filename, data_home, url, force_download)
     43 
     44     if force_download or not os.path.exists(save_loc):
---> 45         fhandle = urlopen(src_url)
     46         with open(save_loc, 'wb') as cache:
     47             cache.write(fhandle.read())

/usr/local/anaconda3/lib/python3.7/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    220     else:
    221         opener = _opener
--> 222     return opener.open(url, data, timeout)
    223 
    224 def install_opener(opener):

/usr/local/anaconda3/lib/python3.7/urllib/request.py in open(self, fullurl, data, timeout)
    529         for processor in self.process_response.get(protocol, []):
    530             meth = getattr(processor, meth_name)
--> 531             response = meth(req, response)
    532 
    533         return response

/usr/local/anaconda3/lib/python3.7/urllib/request.py in http_response(self, request, response)
    639         if not (200 <= code < 300):
    640             response = self.parent.error(
--> 641                 'http', request, response, code, msg, hdrs)
    642 
    643         return response

/usr/local/anaconda3/lib/python3.7/urllib/request.py in error(self, proto, *args)
    561             http_err = 0
    562         args = (dict, proto, meth_name) + args
--> 563         result = self._call_chain(*args)
    564         if result:
    565             return result

/usr/local/anaconda3/lib/python3.7/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
    501         for handler in handlers:
    502             func = getattr(handler, meth_name)
--> 503             result = func(*args)
    504             if result is not None:
    505                 return result

/usr/local/anaconda3/lib/python3.7/urllib/request.py in http_error_302(self, req, fp, code, msg, headers)
    753         fp.close()
    754 
--> 755         return self.parent.open(new, timeout=req.timeout)
    756 
    757     http_error_301 = http_error_303 = http_error_307 = http_error_302

/usr/local/anaconda3/lib/python3.7/urllib/request.py in open(self, fullurl, data, timeout)
    529         for processor in self.process_response.get(protocol, []):
    530             meth = getattr(processor, meth_name)
--> 531             response = meth(req, response)
    532 
    533         return response

/usr/local/anaconda3/lib/python3.7/urllib/request.py in http_response(self, request, response)
    639         if not (200 <= code < 300):
    640             response = self.parent.error(
--> 641                 'http', request, response, code, msg, hdrs)
    642 
    643         return response

/usr/local/anaconda3/lib/python3.7/urllib/request.py in error(self, proto, *args)
    567         if http_err:
    568             args = (dict, 'default', 'http_error_default') + orig_args
--> 569             return self._call_chain(*args)
    570 
    571 # XXX probably also want an abstract factory that knows when it makes

/usr/local/anaconda3/lib/python3.7/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
    501         for handler in handlers:
    502             func = getattr(handler, meth_name)
--> 503             result = func(*args)
    504             if result is not None:
    505                 return result

/usr/local/anaconda3/lib/python3.7/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
    647 class HTTPDefaultErrorHandler(BaseHandler):
    648     def http_error_default(self, req, fp, code, msg, hdrs):
--> 649         raise HTTPError(req.full_url, code, msg, hdrs, fp)
    650 
    651 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 404: Not Found

Soften `supersmoother` requirement?

Related to #13, but slightly different: is there a not-too-hacky way to make gatspy.periodic be imported successfully without supersmoother being installed? It seems like if we just want to use, say, LombScargle, we should be able to do so without installing supersmoother. Maybe we could move the ImportError into the SuperSmoother class somewhere? Not a huge issue as supersmoother is quite light-weight, but would make things a little cleaner for some users.

LombScargleFast.find_best_periods does not agree with LombScargleFast.score

import numpy as np
from astroML.time_series import lomb_scargle_bootstrap
from gatspy.periodic import LombScargleFast

print np.__version__ #1.9.1
print gatspy.__version__ #0.2.1
print astroML.__version__ #0.3

time=np.array([ 2456627.79138722,  2456706.70756269,  2456707.71879721,
                2456710.7089872 ,  2456714.6781531 ,  2457003.78622566,
                2457004.78832308,  2457008.79132022,  2457011.77312267,
                2457013.77502331])
rv=np.array([ 30356.93030223,   6381.7927134 ,   6121.9466685 ,   5884.45430149,
              5641.85369914,  11957.91925097,  12068.74753209,  10966.23226051,
              11870.72687516,   9647.03955393])
e=np.array([ 80.0980572 ,  80.33726674,  80.98308236,  82.29154533,
             91.65632576,  87.24849974,  81.73203085,  82.35611381,
             82.30579078,  79.94037756])

print 'N={}'.format(time.size) #10
print 'T.max()-T.min(): {:.0f}'.format(time.max()-time.min()) #386
print 'stdev(RV): {:.0f}'.format(rv.std()) #6.9 km/s
print 'mean error: {:.0f}'.format(e.mean()) #83 m/s

model = LombScargleFast().fit(time,rv,e)

model.optimizer.period_range=(1.2,3000)
fbpP,fbpS=model.find_best_periods(n_periods=6, return_scores=True)

periods=np.arange(1.2,3000,.1)
scores=model.score(periods)
autoP,autoS=model.periodogram_auto(nyquist_factor=100)

print ('Do scores returned by model.find_best_periods '
       'match those from model.score?')
try:
    assert (model.score(fbpP)==fbpS).all()
    print '   Yes'
except AssertionError:
    print '   No'

print ('Do scores returned by model.periodogram '
       'match those from model.score?')
try:
    assert (model.score(fbpP)==model.periodogram(fbpP)).all()
    print '   Yes'
except AssertionError:
    print '   No'

print ('Do the periods returned by model.find_best_periods have ranking as '
       'those from model.score?')
try:
    assert (model.score(fbpP).argsort()==fbpS.argsort()).all()
    print '   Yes'
except AssertionError:
    print '   No'


D=lomb_scargle_bootstrap(time, rv, e, 1/periods, generalized=True,
                         N_bootstraps=1000, random_state=0)
sig1 = np.percentile(D,[99])
D=lomb_scargle_bootstrap(time, rv, e, 1/autoP, generalized=True,
                         N_bootstraps=1000, random_state=0)
sig1_auto = np.percentile(D,[99])
D=lomb_scargle_bootstrap(time, rv, e, 1/fbpP, generalized=True,
                         N_bootstraps=1000, random_state=0)
sig1_fbpP = np.percentile(D,[99])

plt.plot(autoP,autoS,'k')
plt.axhline(sig1_auto,color='k',linestyle='--')
plt.plot(periods,scores,'r')
plt.axhline(sig1,color='r',linestyle='--')
plt.axhline(sig1_fbpP,color='g',linestyle='--')
plt.ylim(0,2)
plt.xlim(.9,3000)
plt.semilogx()
for i,x,y in zip(fbpS.size-fbpS.argsort(),fbpP, model.score(fbpP)):
    plt.annotate(str(i),(x,y),(x,1.5),
                 arrowprops=dict(facecolor='black', shrink=0.05,
                                 width=.5,headwidth=.5,frac=.1))
plt.axvline(1.2,color='k',linestyle=':')
plt.xlabel('P (days)')
plt.ylabel('LS Power')

handles=[plt.plot([-1,-1],[-1,-2],'k--',label='99% CI')[0],
         plt.plot([-1,-1],[-1,-2],'k',label='Auto P')[0],
         plt.plot([-1,-1],[-1,-2],'k',label='Manual P')[0],
         plt.plot([-1,-1],[-1,-2],'k:',label='Min. physical P')[0]]
plt.legend(loc='upper left',frameon=0)
plt.subplots_adjust(.11,.12,.96,.96)


"""
I would think that at a bare minimum the relative ordering of the periods
from find_best_period would match those from .score and .periodogram.

Thought not demonstrated here, I'm also somewhat unnerved that the scores
returned by.score are a function of the input periods at least at the 3% level.
Which is higher than I'd expect from something like variations due to machine
precision alone.

This, combined with the issue above, makes be uncomfortable quiting 
significance of peaks deriving from both find_best_periods and a manual search
of scores. My initial interest in find_best_periods was that it seemed to offer
an efficient way to account for the frequency gridding seen in peaks located 
using say autoS.argsorted()[::-1][:6] alone. In a large sample of stars this
simplistic approach will see long (~300-2000 day) periods pile up at 
~950,620, 500, etc. days.
""

How to return higher Fourier parameters with LombScargle()?

Sorry for this simple question, but how to return the Fourier parameters after it has been fitted with LombScargle()?

Moving the multiband LS upstream

If I understand the status correctly, it's only the multiband LS that is unique in gatspy, the rest had been moved upstream to astropy a couple of years ago.

@jakevdp - how would you feel to move the multi band as well and maybe archive this repo? It would make somewhat easier to communicate to the community that all your astroML & gatspy implementations has been moved, and receive the maintenance and occasional fixes and improvements in astropy.timeseries rather than at multiple places?

I volunteer to do the actual moving bit, but it's one of those cases where it feels better to ask for permission and opinion first rather than forgiveness after the fact.

Please declare "astroML" and "supersmoother" as dependencies

... in setup.py, so that pip install gatspy "just works".

Conda package for simplified bundling into other projects

Hello.

I just released a package https://github.com/robdmc/pandashells that depends on gatspy. Since many of the tools in Pandashells depend on matplotlib and friends, I really want to make a conda package out of it. I spent a little time trying to figure out how to include pip packages in conda, but I think that may be a fools errand.

Have you considered creating/maintaining a conda package out of gatspy? I see one version on anacoda.org, but it doesn't seem to be maintained by you.

Brani's website with RR Lyrae data is gone

It looks like Brani's UW website is gone: this is now a broken link
http://www.astro.washington.edu/users/bsesar/S82_RRLyr/

Referenced here:
https://github.com/astroML/gatspy/blob/master/gatspy/datasets/rrlyrae.py

cannot import name 'datasets'

Any idea why this is the case?

Incorrect periodogram with multiple terms

The code snippet below leads to a periodogram with incorrrect values (less than zero, greater than one). An image showing the periodogram is a little lower down and the data used in the example is here.

from __future__ import print_function, division

from astropy.table import Table

from gatspy.periodic import LombScargleFast, LombScargle

import matplotlib.pyplot as plt
import numpy as np

%matplotlib inline

# use seaborn for plot styles
import seaborn; seaborn.set()

s = Table.read('tres_10c.csv')
t = s['BJD_TDB']
mag = s['Source-Sky_C10']
dmag = s['mag_error']

model = LombScargle(Nterms=5).fit(t, mag, dmag)
periods, power = model.periodogram_auto(nyquist_factor=100, oversampling=5)

Tests do not run when being offline at build time

I'm packaging all astroML related packages for Fedora (astroML, astroML-addons and supersmoother already done). For gatspy I haven't found a way to run the tests without an internet connection, I get something like

ERROR: gatspy.datasets.tests.test_download_data.test_downloads
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/builddir/build/BUILD/gatspy-0.2.1/gatspy/datasets/tests/test_download_data.py", line 19, in test_downloads
    data = downloader()
  File "/builddir/build/BUILD/gatspy-0.2.1/gatspy/datasets/rrlyrae.py", line 381, in fetch_rrlyrae
    cache_kwargs=kwargs)
  File "/builddir/build/BUILD/gatspy-0.2.1/gatspy/datasets/rrlyrae.py", line 82, in __init__
    self._load_data()
  File "/builddir/build/BUILD/gatspy-0.2.1/gatspy/datasets/rrlyrae.py", line 86, in _load_data
    **(self.cache_kwargs or {}))
  File "/builddir/build/BUILD/gatspy-0.2.1/gatspy/datasets/rrlyrae.py", line 37, in _get_download_or_cache
    buf = download_with_progress_bar(src_url)
  File "/usr/lib/python2.7/site-packages/astroML/datasets/tools/download.py", line 41, in download_with_progress_bar
    fhandle = urlopen(data_url)
  File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 431, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 449, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1229, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1199, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno -3] Temporary failure in name resolution>

Is there any possibility to run tests without network? Is required in Fedora Packaging Guidelines: https://fedoraproject.org/wiki/Packaging:Guidelines#Build_time_network_access

For now I'll build the package without running tests, they run fine on local machines with network access.

Best Regards
Christian

SuperSmoother algorithm cannot be used

Even after I have installed supersmoother via pip and imported it, it cannot be simply used and shows warning:

ImportError: Package supersmoother is required. Use pip install supersmoother to install

optimizer.py: LinearScanOptimzer.find_best_periods is sensitive to time scale

I found a minor issue with the number of steps required to compute the best period. It seems that the number of steps scales linearly with the scale of the time units. A simple workaround is to just use the MJD time units to prevent the excess calculations.

Example from IPython notebook (below):

With the units of time values in MJD, max(times) - min(times) may be ~3e3 units (days) and require ~70e3 steps to compute the best period.
With the units of time values in Unix time, max(times) - min(times) may be ~300e6 units (seconds) and require ~7e9 steps to compute the best period. (In the example ipynb, this example overflows 8GB of RAM.)

Example IPython notebook (taken from gatspy/examples, tests added to end): http://nbviewer.ipython.org/gist/stharrold/874c9f588c02b661b50b

Relevant gatspy code: https://github.com/astroML/gatspy/blob/master/gatspy/periodic/optimizer.py#L49-L63

RRLyrae website down

Hello

It seems this website is now down :

http://www.mpia.de/~bsesar/S82_RRLyr/RRLyr_ugriz_templates.tar.gz

creating crash when requested. The template seems to be here :

http://www.astro.caltech.edu/~bsesar/S82_RRLyr/RRLyr_ugriz_templates.tar.gz

or in SIMBAD.

Is it possible to have an update?

Thanks

optimizer sometimes searches outside the given period range

import numpy as np
from gatspy.periodic import LombScargleFast

np.random.seed(0)
t = 100 * np.random.rand(100)
dy = 0.1
y = np.sin(t) + dy * np.random.randn(100)

model = LombScargleFast().fit(t, y, dy)
model.optimizer.period_range = (6.3, 10)
print(model.best_period)
#6.2859733116463481

This happens when the low-frequency end of the grid is near a peak. It gets worse when the frequency becomes very small compared to the data range.

astroml / gatspy Goto Github PK

gatspy's Introduction

AstroML: Machine Learning for Astronomy

Important Links

Installation

Core

Dependencies

Core Dependencies

Optional Dependencies

Development

Contribution

General

Documentation and Examples

Authors

Package Author

Maintainer

Contributors

gatspy's People

Contributors

Stargazers

Watchers

Forkers

gatspy's Issues

My python version: 3.7.4 Gatspy 0.3 installed via pip

Recommend Projects

Recommend Topics

Recommend Org

Jobs

My python version: 3.7.4
Gatspy 0.3 installed via pip