gaa-uam / scikit-fda

Functional Data Analysis Python package

Home Page: https://fda.readthedocs.io

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%
alignment classification clustering curves dimensionality-reduction functional-data-analysis functions machine-learning python python3 registration regression scikits smoothing statistics visualization

scikit-fda's People

Contributors

allcontributors[bot], alvaro-castillo, amandaher, clej, davidgarciafer, ddelval, dserna4, e105d104u125, ego-thales, elenapetrunina, eliegoudout, half-adder, hzzhyj, jdtuck, jiduque, jltorrecilla, lena123315, m5signorini, manso92, mcarbajo, mellamansanchez, opintosant, pablomm, pcuestas, pedrog99, pedrorponga, quentin62, rafa9811, saumya-ranjan, vnmabus

scikit-fda's Issues

K-means tolerance

This tolerance is not the same as the one sklearn uses. Sklearn uses an L2 norm, while you are using something similar to an L-infinity norm.

Originally posted by @vnmabus in #93
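
To make the difference concrete, the sketch below compares the two stopping criteria on the same pair of consecutive centroid sets. It is only an illustration, not the code discussed in #93: the helper names `l2_shift` and `linf_shift`, the tolerance, and the centroid values are all hypothetical, and each centroid is represented as a plain list of values on a grid.

```python
import math


def l2_shift(old, new):
    """L2 (Frobenius) norm of the change between two sets of centroids."""
    return math.sqrt(sum(
        (a - b) ** 2
        for c_old, c_new in zip(old, new)
        for a, b in zip(c_old, c_new)
    ))


def linf_shift(old, new):
    """Largest absolute change of any centroid value (L-infinity-like)."""
    return max(
        abs(a - b)
        for c_old, c_new in zip(old, new)
        for a, b in zip(c_old, c_new)
    )


# Hypothetical centroids: 2 clusters, each discretized at 4 points.
old_centroids = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
new_centroids = [[0.03, 0.03, 0.03, 0.03], [1.03, 1.03, 1.03, 1.03]]

tol = 0.05  # hypothetical tolerance
l2 = l2_shift(old_centroids, new_centroids)
linf = linf_shift(old_centroids, new_centroids)
converged_l2 = l2 < tol
converged_linf = linf < tol
```

With these values the L-infinity-like criterion declares convergence while the L2 criterion does not, because the L2 norm accumulates the small changes over every grid point and every cluster.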

Problem with pandas methods

After updating the neighbors branch (#112), I get an error on this line:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

Traceback (most recent call last):
  File "plot_k_neighbors_classification.py", line 63, in <module>
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)
  File "/Users/pablomm/anaconda3/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 2124, in train_test_split
    safe_indexing(a, test)) for a in arrays))
  File "/Users/pablomm/anaconda3/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 2124, in <genexpr>
    safe_indexing(a, test)) for a in arrays))
  File "/Users/pablomm/anaconda3/lib/python3.6/site-packages/sklearn/utils/__init__.py", line 219, in safe_indexing
    return X.take(indices, axis=0)
TypeError: take() got an unexpected keyword argument 'axis'

It seems that the sklearn API detects that FData has a take method and uses it to slice the samples, but it passes the axis argument, which FData.take does not accept.

In fact, the take method does not work properly even without the axis parameter.

>>> import skfda
>>> a = skfda.datasets.make_sinusoidal_process()
>>> a.take([1,2,3])

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/pablomm/anaconda3/lib/python3.6/site-packages/scikit_fda-0.2.3-py3.6-macosx-10.7-x86_64.egg/skfda/representation/_functional_data.py", line 1288, in take
    return self._from_sequence(result, dtype=self.dtype)
  File "/Users/pablomm/anaconda3/lib/python3.6/site-packages/scikit_fda-0.2.3-py3.6-macosx-10.7-x86_64.egg/skfda/representation/_functional_data.py", line 1224, in _from_sequence
    return cls(scalars, dtype=dtype)
TypeError: __init__() got an unexpected keyword argument 'dtype'
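
One possible direction is for take to accept the keyword argument that array-like consumers pass. The class below is a hypothetical stand-in, not the skfda implementation: `SampleContainer` and its attributes are invented for illustration, and it only shows a signature that tolerates the `axis=0` call seen in the first traceback.

```python
class SampleContainer:
    """Hypothetical stand-in for a container of functional observations."""

    def __init__(self, samples):
        self.samples = list(samples)

    def __len__(self):
        return len(self.samples)

    def take(self, indices, axis=0):
        """Return a new container holding the samples at ``indices``.

        Only ``axis=0`` (the sample axis) is meaningful here, so any other
        value is rejected explicitly instead of failing with a TypeError.
        """
        if axis != 0:
            raise ValueError("only axis=0 (samples) is supported")
        return type(self)(self.samples[i] for i in indices)


container = SampleContainer(["f0", "f1", "f2", "f3"])
subset = container.take([1, 2, 3], axis=0)
```

Building the result with `type(self)(...)` avoids passing extra keyword arguments, such as `dtype`, that the constructor may not accept.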

Matplotlib deprecation warning - mpldatacursor

After updating Matplotlib, I get the following warning when running the example plot_clustering.py.

/Users/pablomm/anaconda3/lib/python3.6/site-packages/mpldatacursor-0.6.2-py3.6.egg/mpldatacursor/convenience.py:160: MatplotlibDeprecationWarning:
The iterable function was deprecated in Matplotlib 3.1 and will be removed in 3.3. Use np.iterable instead.

if not cbook.iterable(axes):

The warning comes from the mpldatacursor package. There is an open issue in its repository, but the package seems to be no longer actively maintained (the last commit is from 2016).

Perhaps we should change its use to another package, such as mplcursors, which offers the same functionality, but is currently active.

Version information

  • OS: macOS
  • Python version: 3.6.5
  • Matplotlib: 3.1.1
  • Mpldatacursor: 0.6.2
  • scikit-fda version: develop

Typo in documentation

There is a small typo in the reference to the FDA book, in the readme and in the documentation of the source code: the publisher is written as "Springler" instead of "Springer". The corrected reference is:

Ramsay, J., Silverman, B. W. (2005). Functional Data Analysis. Springer.

Retrieve coefficients for function reconstruction

Hey there!

I fitted a KNN FDataGrid to my input data. It looks pretty good so far, and I would now like to "export" it so I can represent the function as numerical values (preferably in a NumPy array).

I saw that you offer some bases that can be used to "export" the underlying representation. Could you elaborate on which basis should be used when?
My data represents a demand/supply curve. I tried the BSpline basis, but it only constructs something close to a sine wave, which doesn't really represent my data.

Here is an image of the graph itself:
[image: grafik]

Is there some way to get the raw representation instead of transforming it to another basis?

Linear differential operator is not being built properly.

>>> from skfda.misc import LinearDifferentialOperator as Ldf
>>> Ldf(weights=[3, 4, 5])
LinearDifferentialOperator(
    nderiv=2,
    bwtlist=[
    FDataBasis(
        basis=Constant(domain_range=[array([0, 1])], nbasis=1),
        coefficients=[[3]],
        dataset_label=None,
        axes_labels=None,
        extrapolation=None,
        keepdims=False),
    FDataBasis(
        basis=Constant(domain_range=[array([0, 1])], nbasis=1),
        coefficients=[[4]],
        dataset_label=None,
        axes_labels=None,
        extrapolation=None,
        keepdims=False)]

Originally posted by @pablomm in #139

Missing examples

I am opening this issue to list missing examples that should be in the documentation:

  • A good FDataBasis tutorial, presenting each basis and comparing them.
  • UEA multivariate datasets
  • A numpy ufuncs + FDataGrid example
  • Feature selection examples
  • Examples on Pandas integration

ImportError running test cases

Hi developer,

Sorry if my question is more a request for help than a contribution to the package's development. I am trying to run your fda package locally, but I could not build it successfully; I run into:
ImportError: dynamic module does not define module export function (PyInit_optimum_reparam_extension), and the error:
import optimum_reparam_extension.
I hope this package provides functionality similar to the fda package in R. I am brand new to GitHub, so all I did after downloading the source code with the "git" command was install all the requirements in requirements.txt and build the project by running "python3 setup.py ". For , I ran "build", "install", "bdist" and "check". In case it helps, I am using Python 3.6.4 on macOS Mojave 10.14. It would be great to get your help on what might be going wrong. I have no idea whether this "optimum_reparam_extension" file should already be in your package or should be generated when I build it myself. Many thanks!

Allow labeling of functional observations

A user has told us about the need to preserve labels for the observations. We should consider adding them or, probably even better, allowing an xarray to be used internally.

mpldatacursor

Is this using global state? Because you are not passing any parameter to link this with the figure.

Originally posted by @vnmabus in #93

Numpy Future Warning

Since NumPy 1.16, a warning is raised in basis.py:

/home/travis/build/GAA-UAM/fda/fda/basis.py:917:
FutureWarning: arrays to stack must be passed as a "sequence" type such as list or tuple. Support
for non-sequence iterables such as generators is deprecated as of NumPy 1.16 and will raise an
error in the future.
self.knots[:-1]))

Unify scalar and functional regressors

It was mentioned to me in a meeting that it makes no sense to differentiate between scalar and functional regressors. First, because they are not scalar, but multivariate, and second, because there is no confusion: the output of predict depends on the input to fit.

I forgot to mention this on the review of #112, but it should not be difficult to change. Can you do it @pablomm ?

FDataGrid indexing

Current __getitem__ implementation does not support indexing properly.

Fourier period

I found in the original fda R code that the period of the Fourier basis is diff(domain_range) by default, but we set it to 1 by default. Is this an issue or an improvement?

Registration and interpolation to-do's

To-do list not covered in #9:

Registration

  • Tests of registration.py
  • Doctests of registration.py
  • Review documentation of registration.py
  • Add sample generators to make examples of registration.py
  • Examples of registration.py

Interpolation

  • Write documentation of grid_interpolation.py
  • Review documentation of grid_interpolation.py
  • Tests of interpolation
  • Examples of interpolation
  • Inherit from ABC

Extrapolation

  • Consider more appropriate names for the extrapolation types
  • Think how to unify the extrapolation in the evaluate methods of FDataBasis and FDataGrid
  • Decide if the default extrapolation of each type of basis should be a class attribute or an attribute of the instance

Warning fetching tecator

A warning is generated when fetch_tecator is called.

fda.datasets.fetch_tecator()
Using TensorFlow backend.
/Users/pablomm/anaconda3/lib/python3.6/site-packages/rdata-0.2.1-py3.6.egg/rdata/conversion/_conversion.py:197: UserWarning: Unknown encoding. Assumed ASCII.
warnings.warn(f"Unknown encoding. Assumed ASCII.")

Error while importing skfda

The installation goes through, but on importing skfda I run into the error ImportError: cannot import name 'OutlierMixin'
[image: Capture]

Remove `shape` property of FDataGrid

ndim is going to change to return 1, for Pandas compatibility. Thus, it will no longer coincide with the number of dimensions of shape, which are those of the data matrix. As the shape can be obtained directly from the matrix, I propose removing this property.

Travis doctest

After #96, Travis is not running the doctests (or they are not being shown in the Travis log).

Lp-distance matrix

Description
The documentation for skfda.misc.metrics.lp_distance says that the function calculates the distance between all possible pairs of samples from two FDataGrid objects. Currently, it only calculates the distance between the n-th sample of the first FDataGrid and the n-th sample of the second one.

Possible solution
Let fd1 be the first FDataGrid, with samples [f_11, ..., f_1n], and fd2 the second FDataGrid, with samples [f_21, ..., f_2m]. The function should return an n x m matrix whose (i, j) entry is d(f_1i, f_2j).

References
This is the behavior of metric.lp in fda.usc.
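
The proposed n x m result can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the skfda or fda.usc code: the function name `lp_distance_matrix` is hypothetical, both collections are assumed to share one evaluation grid, and the integral is approximated with the trapezoidal rule.

```python
def lp_distance_matrix(grid, samples1, samples2, p=2):
    """Pairwise Lp distances between two collections of sampled functions.

    ``grid`` holds the shared evaluation points; each sample is a list of
    function values on that grid. The integral is approximated with the
    trapezoidal rule.
    """
    def lp_norm(values):
        powered = [abs(v) ** p for v in values]
        integral = sum(
            (grid[k + 1] - grid[k]) * (powered[k] + powered[k + 1]) / 2
            for k in range(len(grid) - 1)
        )
        return integral ** (1 / p)

    # Entry (i, j) is the distance between sample i of the first collection
    # and sample j of the second one.
    return [
        [lp_norm([a - b for a, b in zip(f1, f2)]) for f2 in samples2]
        for f1 in samples1
    ]


grid = [0.0, 0.5, 1.0]
fd1 = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]                    # n = 2 samples
fd2 = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]   # m = 3 samples
distances = lp_distance_matrix(grid, fd1, fd2)
```

Here `distances` has 2 rows and 3 columns, one entry per pair of samples, instead of one value per matching index.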

Missing methods and functions from fda

I am opening this issue to discuss functionality from the fda package that is missing here. I will only list the missing parts. The full comparison is in the wiki.

General functionality

  • Arithmetic operators for FDataBasis. Sum and subtraction are easy if the basis is the same. Otherwise, medium.
  • Multidimensional support for FDataBasis. Hard.
  • Canonical correlation analysis for FDataBasis.
  • Center method for FData (subtract the mean). Easy.
  • Correlation and autocorrelations. Easy.
  • Exponential basis.
  • Polygonal basis.
  • Power basis.
  • Density?
  • Derivatives.
  • Conversion from degrees of freedom to lambda smoothing parameter and vice-versa.
  • Monotone functional data.
  • Positive functional data.
  • Exponentiation of FDataBasis.
  • Intensity of Poisson process.
  • Standard deviation.
  • Sum. Easy.
  • Better representation functions. "Rich" repr for IPython.
  • Taylor representation of a B-Spline.
  • Permutation t-test for two groups of functional data objects.
  • Rotation of PCA/CCA with VARIMAX Criterion.

Datasets

  • Continuously Stirred Tank Reactor (CSTR) Ordinary Differential Equations (ODEs).
  • Obtain data from the Human Mortality Database.

Plotting

  • Plot cycles.
  • Principal differential analysis plots (Stability Analysis).
  • Phase-plane plot.
  • Plot Functional Canonical Correlation Weight Functions.
  • Plot PCA.
  • Plot functional parameter objects with confidence limits.
  • Plot real data + Functional data.
  • Plot the results of the registration of a set of curves.
  • Plot Principal Component Scores.

Dimensionality reduction

  • PCA.
  • Principal differential analysis.

Registering

  • Register Functional data objects using a continuous criterion.

Regression

  • Functional linear regression with scalar response.
  • Functional concurrent linear regression with functional response.
  • Fully functional linear regression with functional response.
  • Winsorized regression.

Enhancement in sample_labels

The current sample_labels argument of plot, used to group samples by color, is very restrictive: it only accepts labels of the form 0, 1, 2, ..., n_classes - 1, without skipping any number.

This can be fixed by using LabelEncoder internally.

To Reproduce
Code to reproduce the behavior:

from skfda.datasets import make_sinusoidal_process
fd = make_sinusoidal_process(n_samples=5)

fd.plot(sample_labels=[0, 0, 2, 2, 2])  # Not valid
fd.plot(sample_labels=[-1, -1, 0, 1, 1]) # Not valid

Result

ValueError: sample_labels must contain at least an occurence of numbers between 0 and number of distint sample labels.
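
The remapping that the plot method would need can be sketched without any dependency. The helper `encode_labels` below is hypothetical and only mirrors the kind of mapping that sklearn's LabelEncoder produces (sorted unique labels mapped to consecutive integers); it is not the code used in skfda.

```python
def encode_labels(labels):
    """Map arbitrary labels to consecutive integers 0..n_classes-1."""
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in labels], classes


# The two label lists rejected in the report above.
codes_a, classes_a = encode_labels([0, 0, 2, 2, 2])
codes_b, classes_b = encode_labels([-1, -1, 0, 1, 1])
```

After encoding, both inputs become valid for a routine that expects labels without gaps, and `classes_a` / `classes_b` keep the original labels for use in a legend.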

Bug in plot function

When an FData object with a one-dimensional domain is plotted after another plot, a Matplotlib deprecation warning is raised, but the result is correct.

>>> import matplotlib.pyplot as plt; import fda
>>> a = fda.datasets.make_multimodal_samples(ndim_domain=1)
>>> a.plot()
(<Figure size 640x480 with 1 Axes>, [<matplotlib.axes._subplots.AxesSubplot object at 0x1c1133a3c8>])
>>> a.plot()
/Users/pablomm/anaconda3/lib/python3.6/site-packages/matplotlib/cbook/deprecation.py:107: MatplotlibDeprecationWarning: Adding an axes using the same arguments as a previous axes currently reuses the earlier instance.  In a future version, a new instance will always be created and returned.  Meanwhile, this warning can be suppressed, and the future behavior ensured, by passing a unique label to each axes instance.
  warnings.warn(message, mplDeprecation, stacklevel=1)
(<Figure size 640x480 with 1 Axes>, [<matplotlib.axes._subplots.AxesSubplot object at 0x1c1133a3c8>])
>>> plt.show()

In the case of surfaces, the result is an empty drawing, because the second call creates a second axes on top of the previous one but draws on the first axes.

>>> a = fda.datasets.make_multimodal_samples(ndim_domain=2)
>>> a.plot()
(<Figure size 640x480 with 1 Axes>, [<matplotlib.axes._subplots.Axes3DSubplot object at 0x1c22094400>])
>>> a.plot()
(<Figure size 640x480 with 2 Axes>, [<matplotlib.axes._subplots.Axes3DSubplot object at 0x1c22094400>, <matplotlib.axes._subplots.Axes3DSubplot object at 0x1c22106400>])
>>> plt.show()

Pandas integration

Allow functional data objects to be used as Pandas columns. This is useful for treating functional data as an atomic unit while allowing it to be mixed with univariate/multivariate data in the datasets.

Similar functionality has been implemented for R in tidyfun.

Smoothing in several dimensions

The current smoothers only work for one-dimensional functions. It should be reasonably easy to extend them to several dimensions.

Fourier basis in representation example

# We can also see the effect of changing the basis.
# For example, in the Fourier basis the functions start and end at the same
# points, so this basis is clearly non suitable for the Growth dataset.
fd_basis = fd.to_basis(
    basis.Fourier(domain_range=fd.domain_range[0], nbasis=7)
)
fd_basis.plot()

I don't know if this sentence is completely correct. The functions will start and end at the same point if the period of the basis is the same as the length of the domain range, but we can set the period to 2*|domain_range| to avoid this problem.

period = 2 * (fd.domain_range[0][1] - fd.domain_range[0][0])
fd_basis = fd.to_basis(
    basis.Fourier(domain_range=fd.domain_range[0], nbasis=7, period=period)
)
fd_basis.plot()

Result:
[figure: Figure_1]
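
The effect of the period on the end points can be checked numerically without skfda. The sketch below evaluates an unnormalized Fourier basis (a constant plus sine/cosine pairs) at both ends of a domain; the function `fourier_basis` and the domain range (1.0, 18.0) are assumptions made for illustration, and the library's own basis may be normalized differently.

```python
import math


def fourier_basis(t, period, n_pairs=3):
    """Evaluate an unnormalized Fourier basis at ``t``.

    Returns [1, sin(w t), cos(w t), sin(2 w t), cos(2 w t), ...] with
    w = 2 * pi / period, i.e. 2 * n_pairs + 1 functions.
    """
    omega = 2 * math.pi / period
    values = [1.0]
    for k in range(1, n_pairs + 1):
        values.append(math.sin(k * omega * t))
        values.append(math.cos(k * omega * t))
    return values


start, end = 1.0, 18.0  # hypothetical domain range
length = end - start

# Period equal to the domain length: every basis function repeats at the ends.
same_ends = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(fourier_basis(start, length), fourier_basis(end, length))
)

# Period twice the domain length: the end values are free to differ.
doubled_same_ends = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(fourier_basis(start, 2 * length),
                    fourier_basis(end, 2 * length))
)
```

With the period equal to the domain length, any linear combination of these functions takes the same value at both ends; with the doubled period, the odd harmonics change sign across the domain, so the fitted curve is no longer forced to close.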

Documentation

I open this issue to unify all pending tasks (in my opinion) with respect to documentation.

  • Write readme
  • Write index page in readthedocs, linking to the different modules and to the github repository
  • Provide a project description in pypi
  • Make missing examples listed in #117
  • Fix pdf documentation (#70)
  • Show documentation of methods (#91)
  • Make explicit imports in doctests
  • Remove __author__ and __email__ variables in modules
  • Update skfda/__init__.py docstring, shown when doing import skfda; help(skfda)
  • Write docstring of top level modules (e.g. help(skfda.preprocessing))
  • Integrate PEP 257 checker (?)

FDataGrid.to_basis and basis range

Currently, the range of the basis used in FDataGrid.to_basis is assigned when the basis object is created, and defaults to the [0, 1] interval. To prevent confusion, it would be useful if the basis range, when not specifically set at creation time, were set inside the to_basis method to the domain range of the FDataGrid object.
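
The proposed behavior could look roughly like the sketch below. The classes `Basis` and `Grid` are hypothetical stand-ins, not the skfda classes, and using `None` to mean "range not set by the user" is an assumption of this sketch.

```python
class Basis:
    """Hypothetical basis whose domain range may be left unset."""

    def __init__(self, nbasis, domain_range=None):
        self.nbasis = nbasis
        self.domain_range = domain_range  # None means "not set by the user"


class Grid:
    """Hypothetical stand-in for a discretized functional data object."""

    def __init__(self, domain_range):
        self.domain_range = domain_range

    def to_basis(self, basis):
        # Fill in the range only if the user did not choose one explicitly.
        if basis.domain_range is None:
            basis = Basis(basis.nbasis, domain_range=self.domain_range)
        return basis  # a real conversion would also fit the coefficients


grid = Grid(domain_range=(1.0, 18.0))
inherited = grid.to_basis(Basis(nbasis=7))
explicit = grid.to_basis(Basis(nbasis=7, domain_range=(0.0, 1.0)))
```

Creating a new basis instead of mutating the argument keeps the caller's object unchanged, so the same unset basis could be reused with data defined on a different range.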

Domain range on basis multiplication

The original fda R package returns a value error when you try to multiply two bases with different domain ranges.

Would it be possible to intersect the domain ranges to perform the multiplication?
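
Intersecting two one-dimensional domain ranges is straightforward; the open design question is what to do when they do not overlap. The helper below is a hypothetical sketch, not skfda code, and it chooses to raise an error for an empty or degenerate intersection.

```python
def intersect_domain_ranges(range1, range2):
    """Return the overlap of two closed intervals, or raise if it is empty."""
    lower = max(range1[0], range2[0])
    upper = min(range1[1], range2[1])
    if lower >= upper:
        raise ValueError(f"domain ranges {range1} and {range2} do not overlap")
    return (lower, upper)


common = intersect_domain_ranges((0.0, 2.0), (1.0, 3.0))
```

With this choice, multiplying bases on overlapping ranges would be defined on the common part, while disjoint ranges would still produce a value error, as in the R package.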
