gaa-uam / scikit-fda
Functional Data Analysis Python package
Home Page: https://fda.readthedocs.io
License: BSD 3-Clause "New" or "Revised" License
After updating the neighbors branch (#112), I get an error on this line:
Traceback (most recent call last):
File "plot_k_neighbors_classification.py", line 63, in <module>
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)
File "/Users/pablomm/anaconda3/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 2124, in train_test_split
safe_indexing(a, test)) for a in arrays))
File "/Users/pablomm/anaconda3/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 2124, in <genexpr>
safe_indexing(a, test)) for a in arrays))
File "/Users/pablomm/anaconda3/lib/python3.6/site-packages/sklearn/utils/__init__.py", line 219, in safe_indexing
return X.take(indices, axis=0)
TypeError: take() got an unexpected keyword argument 'axis'
It seems that the sklearn API detects that FData has a take
method and uses it to slice the samples, passing the axis argument.
In fact, the take method does not work properly even without the axis parameter.
>>> import skfda
>>> a = skfda.datasets.make_sinusoidal_process()
>>> a.take([1,2,3])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/pablomm/anaconda3/lib/python3.6/site-packages/scikit_fda-0.2.3-py3.6-macosx-10.7-x86_64.egg/skfda/representation/_functional_data.py", line 1288, in take
return self._from_sequence(result, dtype=self.dtype)
File "/Users/pablomm/anaconda3/lib/python3.6/site-packages/scikit_fda-0.2.3-py3.6-macosx-10.7-x86_64.egg/skfda/representation/_functional_data.py", line 1224, in _from_sequence
return cls(scalars, dtype=dtype)
TypeError: __init__() got an unexpected keyword argument 'dtype'
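As the first traceback shows, scikit-learn's safe_indexing calls X.take(indices, axis=0), so a container used with train_test_split needs a take method that accepts axis. A minimal sketch of such a method follows. GridSamples is a hypothetical stand-in, not the skfda class, and it assumes the samples are stored row-wise in a NumPy array:

```python
import numpy as np


class GridSamples:
    """Hypothetical container of discretised curves (not the skfda FData class)."""

    def __init__(self, data_matrix):
        # One row per sample, one column per evaluation point.
        self.data_matrix = np.asarray(data_matrix)

    def __len__(self):
        return len(self.data_matrix)

    def take(self, indices, axis=0):
        # Only sample-wise selection is meaningful here, so other axes are rejected.
        if axis != 0:
            raise ValueError("only axis=0 (samples) is supported")
        return GridSamples(self.data_matrix.take(indices, axis=0))


samples = GridSamples(np.arange(12).reshape(4, 3))
subset = samples.take([1, 3], axis=0)
```

Accepting axis with a default of 0 keeps both calls working, with and without the keyword.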
After updating matplotlib, I get the following warning when running the example plot_clustering.py.
/Users/pablomm/anaconda3/lib/python3.6/site-packages/mpldatacursor-0.6.2-py3.6.egg/mpldatacursor/convenience.py:160: MatplotlibDeprecationWarning:
The iterable function was deprecated in Matplotlib 3.1 and will be removed in 3.3. Use np.iterable instead.
if not cbook.iterable(axes):
It is due to the mpldatacursor package. There is an open issue in its repository, but the package no longer seems to be actively maintained (the last commit is from 2016).
Perhaps we should switch to another package, such as mplcursors, which offers the same functionality and is actively maintained.
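Independently of which cursor package is used, the replacement that the warning itself recommends (np.iterable instead of cbook.iterable) can be sketched without matplotlib. The helper name as_axes_list is hypothetical and only mirrors the check quoted in the warning:

```python
import numpy as np


def as_axes_list(axes):
    """Hypothetical helper: wrap a single object in a list, keep iterables as lists."""
    # np.iterable returns True for lists, tuples and arrays, False for single objects.
    return list(axes) if np.iterable(axes) else [axes]


single = as_axes_list(3)
several = as_axes_list([1, 2])
```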
Version information
The random state in the magnitude-shape plot tests is not fixed, which makes the tests fail.
The surface boxplot example does not fix the random seed when generating its datasets either.
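A minimal sketch of what fixing the seed means in practice, assuming the data is generated with NumPy. The function make_noisy_curves and its parameters are hypothetical and are not part of the package:

```python
import numpy as np


def make_noisy_curves(n_samples, n_points, random_state=None):
    """Hypothetical generator of noisy sine curves for a test or example."""
    # A seeded RandomState makes every run produce the same noise.
    rng = np.random.RandomState(random_state)
    t = np.linspace(0, 1, n_points)
    noise = rng.normal(scale=0.1, size=(n_samples, n_points))
    return np.sin(2 * np.pi * t) + noise


first = make_noisy_curves(5, 20, random_state=0)
second = make_noisy_curves(5, 20, random_state=0)
```

With the same random_state, both calls return identical arrays, so assertions on the generated data become deterministic.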
The second parameter of the function fpca is not used.
https://github.com/GAA-UAM/fda/blob/0b13e96fbf5012dca863ea48a5559cc24d3fbd31/fda/math.py#L408-L432
Maybe the default value of n should return all the principal components.
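The proposed default can be illustrated with a generic PCA on discretised curves. This is only a sketch under stated assumptions: principal_components and n_components are hypothetical names, and the code is not the fpca function linked above:

```python
import numpy as np


def principal_components(data_matrix, n_components=None):
    """Hypothetical PCA on discretised curves (rows are samples)."""
    data = np.asarray(data_matrix, dtype=float)
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    if n_components is None:
        # Default suggested in the issue: keep every component.
        n_components = vt.shape[0]
    return vt[:n_components]


curves = np.array([
    [0.0, 1.0, 2.0],
    [1.0, 3.0, 2.0],
    [2.0, 0.0, 1.0],
    [4.0, 1.0, 0.0],
])
all_components = principal_components(curves)
two_components = principal_components(curves, n_components=2)
```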
Small typo in the reference to the fda book, in the README and in the documentation of the source code.
Ramsay, J., Silverman, B. W. (2005). Functional Data Analysis.
Springler → Springer.
Hey there!
I fitted a KNN FDataGrid to my input data. It actually looks pretty good so far, and I would now like to "export" it so I can represent the function as numerical values (preferably in a numpy array).
I saw that you offer some bases that can be used to "export" the underlying representation. Could you elaborate on which basis should be used when?
My data represents a demand/supply curve. I tried the BSpline one, but it only constructs something close to a sine wave, which doesn't really represent my data.
Here is an image of the graph itself:
Is there some way to get the raw representation instead of transforming it to another basis?
>>> from skfda.misc import LinearDifferentialOperator as Ldf
>>> Ldf(weights=[3, 4, 5])
LinearDifferentialOperator(
nderiv=2,
bwtlist=[
FDataBasis(
basis=Constant(domain_range=[array([0, 1])], nbasis=1),
coefficients=[[3]],
dataset_label=None,
axes_labels=None,
extrapolation=None,
keepdims=False),
FDataBasis(
basis=Constant(domain_range=[array([0, 1])], nbasis=1),
coefficients=[[4]],
dataset_label=None,
axes_labels=None,
extrapolation=None,
keepdims=False)]
I open this issue to list missing examples that should be in the documentation:
Hi developer,
Sorry if my question is more a request for help than a contribution to the package development. I am trying to run your fda package locally, but I cannot build it successfully. I run into:
ImportError: dynamic module does not define module export function (PyInit_optimum_reparam_extension), and error:
import optimum_reparam_extension.
I hope this package provides functionality similar to the fda package in R. I am brand new to GitHub, so all I did after downloading the source code with git was install all the requirements in requirements.txt and build the project by running "python3 setup.py ". For , I ran "build", "install", "bdist" and "check". In case it helps, I am using Python 3.6.4 on macOS Mojave 10.14. It would be great to get your help on what might be going wrong. I have no idea whether this "optimum_reparam_extension" file should already be in your package or should be generated when I build it myself. Many thanks!
The naming should be consistent between these properties.
A user has told us about the need to preserve labels for the observations. We should consider the possibility of adding them or, probably even better, allowing an xarray to be used internally.
The following example crashes due to the use of a numpy array to store the colors, added in #66:
https://github.com/GAA-UAM/fda/blob/940caa755e40cf26395e22a772806cba3ff18fd7/examples/plot_pairwise_alignment.py#L126-L133
The LocalOutlierFactor estimator was added in the private module skfda._neighbors.outlier.LocalOutlierFactor, without references in the documentation, until the theoretical aspects of this method are clarified in the context of FDA (see #164).
The example written for this class can be found at https://gist.github.com/pablomm/eb93c469473ea76baed7e3e72578de68
Shouldn't it make a copy of the coefficients too?
Since numpy 1.16 a warning is raised in basis.py:
/home/travis/build/GAA-UAM/fda/fda/basis.py:917:
FutureWarning: arrays to stack must be passed as a "sequence" type such as list or tuple. Support
for non-sequence iterables such as generators is deprecated as of NumPy 1.16 and will raise an
error in the future.
self.knots[:-1]))
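The warning goes away when the generator is materialised as a list or tuple before stacking. A small sketch with made-up knot values (the actual expression at basis.py:917 is not reproduced here):

```python
import numpy as np

knots = np.array([0.0, 0.25, 0.5, 1.0])

# A generator of arrays: the kind of input the warning complains about.
pieces = (knots[i:i + 2] for i in range(0, len(knots), 2))

# Converting it to a list gives a "sequence" type that np.hstack accepts.
stacked = np.hstack(list(pieces))
```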
It was mentioned in a meeting that it makes no sense to differentiate between scalar and functional regressors. First, because they are not scalar but multivariate, and second, because there is no confusion: the output of predict depends on the input to fit.
I forgot to mention this on the review of #112, but it should not be difficult to change. Can you do it @pablomm ?
In the branch feature/fdata I created the class FData to unify the API of FDataBasis and FDataGrid. It is just a suggestion of a possible API. I would like to discuss which methods are most appropriate for the common class, and the interface of those methods.
In particular, the API of evaluate and the parameter keepdims.
https://github.com/GAA-UAM/fda/blob/859bb4055bd9a325fb1321ddb6b4d761013489ad/fda/fdata.py#L47-L50
The current __getitem__ implementation does not support indexing properly.
I find in the original fda R code that the period of the Fourier basis is diff(domain_range) by default, but we set it to 1 by default. Is this an issue or an improvement?
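To make the two defaults concrete, here is a sketch of an unnormalised Fourier basis whose period falls back to the length of the domain when it is not given. The function fourier_basis and its signature are hypothetical and do not reproduce the package's Fourier class:

```python
import numpy as np


def fourier_basis(t, n_basis, domain_range=(0.0, 1.0), period=None):
    """Hypothetical, unnormalised Fourier basis evaluated at the points t."""
    if period is None:
        # Default used by the R code, per the issue: the length of the domain.
        period = domain_range[1] - domain_range[0]
    t = np.asarray(t, dtype=float)
    omega = 2 * np.pi / period
    rows = [np.ones_like(t)]  # constant function
    k = 1
    while len(rows) < n_basis:
        rows.append(np.sin(k * omega * t))
        if len(rows) < n_basis:
            rows.append(np.cos(k * omega * t))
        k += 1
    return np.vstack(rows)


t = np.linspace(0, 4, 9)
values = fourier_basis(t, n_basis=3, domain_range=(0, 4))
```

With the domain [0, 4], the default period is 4, so every basis function takes the same value at both ends of the domain. A fixed period of 1 would instead repeat four times over the same interval.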
If I create a BSpline in this way
BSpline(nbasis=6, order=4, knots=[0, 0.3, 0.3, 1])
it throws an error, but I think it should take the domain_range from the first and last knot.
https://github.com/GAA-UAM/fda/blob/9893dccf16a5742399dfa5c9af0ffdc32697b668/fda/basis.py#L549-L566
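The suggested behaviour amounts to a small fallback rule. In the sketch below, infer_domain_range is a hypothetical helper and not part of the BSpline class:

```python
def infer_domain_range(knots, domain_range=None):
    """Hypothetical helper: use the first and last knot when no range is given."""
    if domain_range is not None:
        return tuple(domain_range)
    if len(knots) < 2:
        raise ValueError("at least two knots are needed to define a range")
    # Knots may be repeated, so take the extremes rather than relying on order.
    return (min(knots), max(knots))


inferred = infer_domain_range([0, 0.3, 0.3, 1])
explicit = infer_domain_range([0, 1], domain_range=[0, 2])
```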
Make class documentation show the complete documentation for each method, as in Scikit-learn.
To-do list not covered in #9:
Registration
Interpolation
Extrapolation
evaluate methods of FDataBasis and FDataGrid
The pdf documentation fails. We should fix it.
A warning is generated when fetch_tecator is called.
fda.datasets.fetch_tecator()
Using TensorFlow backend.
/Users/pablomm/anaconda3/lib/python3.6/site-packages/rdata-0.2.1-py3.6.egg/rdata/conversion/_conversion.py:197: UserWarning: Unknown encoding. Assumed ASCII.
warnings.warn(f"Unknown encoding. Assumed ASCII.")
ndim is going to change to return 1, for Pandas compatibility. Thus, it will no longer coincide with the number of dimensions of shape, which are those of the data matrix. As the shape can be obtained directly from the matrix, I propose to remove this method.
After #96, Travis is not running the doctests (or they are not being shown in the Travis log).
The first plot of a multivariate functional object is messed up. Subsequent plots are ok.
Matplotlib is currently imported, but it is not marked as a dependency in setup.py
This function checks for ndim_domain and ndim_image, but these are attributes of FData objects, not of Basis objects.
https://github.com/GAA-UAM/fda/blob/92c984cb42a505661716c26f4cd56a2f1e69e26f/fda/basis.py#L138-L155
Cython should be a dependency. Here it says how to do that.
Description
In the documentation for skfda.misc.metrics.lp_distance it is said that the function calculates the distance between all possible pairs of samples from two FDataGrid objects. At the moment, it only calculates the distance between the n-th sample of the first FDataGrid and the n-th sample of the second one.
Possible solution
Let fd1 be the first FDataGrid with samples [f_11, ..., f_1n] and fd2 the second FDataGrid with samples [f_21, ..., f_2m]. The function should return an n x m matrix whose component (i, j) is d(f_1i, f_2j).
References
The functionality described here is the one followed by metric.lp in fda.usc.
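The pairwise behaviour proposed above can be sketched on plain arrays. The assumptions are that both sets of curves are discretised on the same grid and that the integral is approximated with the trapezoid rule. The function pairwise_lp_distance is hypothetical and is not the package's lp_distance:

```python
import numpy as np


def pairwise_lp_distance(values1, values2, grid, p=2):
    """Hypothetical n x m matrix of Lp distances between two sets of curves."""
    values1 = np.asarray(values1, dtype=float)  # shape (n, n_points)
    values2 = np.asarray(values2, dtype=float)  # shape (m, n_points)
    grid = np.asarray(grid, dtype=float)        # shape (n_points,)
    # |f_1i(t) - f_2j(t)|^p for every pair (i, j): shape (n, m, n_points).
    diff = np.abs(values1[:, None, :] - values2[None, :, :]) ** p
    # Trapezoid rule along the grid axis.
    widths = np.diff(grid)
    integral = np.sum(widths * (diff[..., :-1] + diff[..., 1:]) / 2.0, axis=-1)
    return integral ** (1.0 / p)


grid = np.linspace(0, 1, 11)
fd1_values = np.array([np.zeros(11), np.ones(11)])                      # n = 2
fd2_values = np.array([np.ones(11), 2 * np.ones(11), 3 * np.ones(11)])  # m = 3
distances = pairwise_lp_distance(fd1_values, fd2_values, grid)
```

Broadcasting over a new axis builds all n x m differences at once. The current behaviour described in the issue corresponds to the special case n = m, keeping only the entries (i, i).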
I open this issue to discuss missing functionality from the fda package. I will only put here the missing parts. The wiki has the full comparison.
FDataBasis. Sum and subtraction easy if the basis is the same. Otherwise, medium.
FDataBasis. Hard.
FDataBasis.
FData (subtract the mean). Easy.
FDataBasis.
Surface plots are plotted one on top of another. Thus, they do not intersect where they should.
The current sample_labels argument of plot, used to group samples by color, is very restrictive: it only accepts labels of the form 0, 1, 2, ..., n_classes - 1, without skipping any number.
This can be fixed by using LabelEncoder internally.
To Reproduce
Code to reproduce the behavior:
from skfda.datasets import make_sinusoidal_process
fd = make_sinusoidal_process(n_samples=5)
fd.plot(sample_labels=[0, 0, 2, 2, 2]) # Not valid
fd.plot(sample_labels=[-1, -1, 0, 1, 1]) # Not valid
Result
ValueError: sample_labels must contain at least an occurence of numbers between 0 and number of distint sample labels.
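The encoding step proposed above can be sketched with NumPy alone. np.unique with return_inverse=True yields the same kind of 0..n_classes-1 codes that sklearn's LabelEncoder produces. The helper encode_labels is hypothetical:

```python
import numpy as np


def encode_labels(sample_labels):
    """Hypothetical helper: map arbitrary labels to consecutive integer codes."""
    # classes holds the sorted distinct labels; encoded indexes into classes.
    classes, encoded = np.unique(sample_labels, return_inverse=True)
    return classes, encoded


classes_a, codes_a = encode_labels([0, 0, 2, 2, 2])
classes_b, codes_b = encode_labels([-1, -1, 0, 1, 1])
```

The codes can then be used to pick one color per class, while classes keeps the original labels available for a legend.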
When an FData object with a one-dimensional domain is plotted after another plot, a matplotlib deprecation warning is raised, but the result is correct.
>>> import matplotlib.pyplot as plt; import fda
>>> a = fda.datasets.make_multimodal_samples(ndim_domain=1)
>>> a.plot()
(<Figure size 640x480 with 1 Axes>, [<matplotlib.axes._subplots.AxesSubplot object at 0x1c1133a3c8>])
>>> a.plot()
/Users/pablomm/anaconda3/lib/python3.6/site-packages/matplotlib/cbook/deprecation.py:107: MatplotlibDeprecationWarning: Adding an axes using the same arguments as a previous axes currently reuses the earlier instance. In a future version, a new instance will always be created and returned. Meanwhile, this warning can be suppressed, and the future behavior ensured, by passing a unique label to each axes instance.
warnings.warn(message, mplDeprecation, stacklevel=1)
(<Figure size 640x480 with 1 Axes>, [<matplotlib.axes._subplots.AxesSubplot object at 0x1c1133a3c8>])
>>> plt.plot()
In case of surfaces, the result is an empty drawing, because the second call creates a second axes on top of the previous one, but draws on the first ax.
>>> a = fda.datasets.make_multimodal_samples(ndim_domain=2)
>>> a.plot()
(<Figure size 640x480 with 1 Axes>, [<matplotlib.axes._subplots.Axes3DSubplot object at 0x1c22094400>])
>>> a.plot()
(<Figure size 640x480 with 2 Axes>, [<matplotlib.axes._subplots.Axes3DSubplot object at 0x1c22094400>, <matplotlib.axes._subplots.Axes3DSubplot object at 0x1c22106400>])
>>> plt.show()
Allow functional data objects to be used as Pandas columns. This is useful to treat functional data as an atomic unit, while allowing it to be mixed with univariate/multivariate data in the datasets.
Similar functionality has been implemented for tidyfun in R.
The FDataBasis method cov returns the variance instead of the covariance.
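The difference between the two quantities is easiest to see on discretised curves rather than on basis coefficients. In the sketch below, sample_covariance is a hypothetical function, not the cov method of the package. It returns the full covariance surface, of which the pointwise variance is only the diagonal:

```python
import numpy as np


def sample_covariance(data_matrix):
    """Hypothetical covariance surface of discretised curves (rows are samples)."""
    data = np.asarray(data_matrix, dtype=float)
    centered = data - data.mean(axis=0)
    # Entry (s, t) estimates Cov(X(s), X(t)); the diagonal is the pointwise variance.
    return centered.T @ centered / (len(data) - 1)


curves = np.array([
    [0.0, 1.0, 2.0],
    [1.0, 3.0, 2.0],
    [2.0, 0.0, 1.0],
    [4.0, 1.0, 0.0],
])
covariance = sample_covariance(curves)
variance = np.diag(covariance)
```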
The current smoothers only work for one-dimensional functions. It should be reasonably easy to extend them to several dimensions.
Magic constants are used in the discretisation of FDataBasis objects, for instance in FDataBasis.to_grid or in the registration and regression methods. We should think about how to manage them, with an enum or something similar.
https://github.com/GAA-UAM/fda/blob/0b13e96fbf5012dca863ea48a5559cc24d3fbd31/fda/basis.py#L1411
https://github.com/GAA-UAM/fda/blob/62c5f518ab2e13b9a4cb9903ecf69f61dcf798cd/fda/basis.py#L1459
scikit-fda/examples/plot_representation.py
Lines 97 to 104 in ee8b316
2*|domain_range|
to avoid this problem.
period = 2 * ( fd.domain_range[0][1] - fd.domain_range[0][0])
fd_basis = fd.to_basis(
basis.Fourier(domain_range=fd.domain_range[0], nbasis=7, period=period)
)
fd_basis.plot()
I open this issue to unify all pending tasks (in my opinion) with respect to documentation.
__author__ and __email__ variables in modules
skfda/__init__.py docstring, shown when doing import skfda; help(skfda)
help(skfda.preprocessing))
Currently, the range of the basis in FDataGrid.to_basis is assigned when the basis object is created, defaulting to the [0, 1] interval. To prevent confusion, it would be useful if the basis range, when not specifically set at creation time, were set inside the to_basis method to the domain range of the FDataGrid object.
I have an error related with the windows build.
The original fda R package raises a value error when you try to multiply two bases with different domain ranges.
Would it be possible to intersect the domain ranges to perform the multiplication?
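The intersection itself is straightforward for one-dimensional domains. The helper intersect_domain_ranges below is hypothetical and only sketches the rule, including the case where the ranges do not overlap:

```python
def intersect_domain_ranges(range1, range2):
    """Hypothetical helper: common interval of two one-dimensional domain ranges."""
    lower = max(range1[0], range2[0])
    upper = min(range1[1], range2[1])
    if lower >= upper:
        # Nothing in common: multiplication is undefined, as in the R package.
        raise ValueError("the domain ranges do not overlap")
    return (lower, upper)


common = intersect_domain_ranges((0, 1), (0.5, 2))
```

Keeping the error for disjoint ranges preserves the R behaviour in the only case where no sensible product exists.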