Comments (7)
The Latent ODE code is designed to handle datasets where time series have different lengths and each time series is measured at different times. It can be used with multivariate time series. Different dimensions of the data might be measured at different times as well.
All the datasets are loaded in lib/parse_datasets.py using torch.utils.data.DataLoader.
Easy case: all time series share observation times
This approach will be the easiest to use in Jupyter notebooks. A good example is the Periodic_1d dataset (use input parameter --dataset periodic).
Data format
- a dataset in the form of a tensor [B x T x D], where B is the number of time series in the dataset, T is the number of time points in each time series, and D is the data dimensionality.
- a vector [T] with observation times. Observation times can be irregular.
For example, in periodic dataset, observation times are generated here; dataset is generated here.
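As a minimal sketch of this format (not the repository's own generation code), the snippet below builds a toy dataset of sine waves that all share one irregular time grid. The function name `make_toy_dataset`, its parameters, and the sine-wave signal are assumptions made purely for illustration.

```python
import numpy as np

def make_toy_dataset(n_series=8, n_tp=100, n_dims=1, max_t=5.0, seed=0):
    """Return (data, time_steps): data is [B x T x D], time_steps is [T]."""
    rng = np.random.default_rng(seed)
    # Irregular but shared observation times: sorted uniform samples in [0, max_t].
    time_steps = np.sort(rng.uniform(0.0, max_t, size=n_tp))
    # One random frequency and phase per series and per dimension.
    freq = rng.uniform(0.5, 2.0, size=(n_series, 1, n_dims))
    phase = rng.uniform(0.0, 2 * np.pi, size=(n_series, 1, n_dims))
    # Broadcast [B, 1, D] against [1, T, 1] to obtain [B, T, D].
    data = np.sin(freq * time_steps[None, :, None] + phase)
    return data, time_steps

data, time_steps = make_toy_dataset()
print(data.shape, time_steps.shape)
```

The only property that matters for the easy case is that every series is evaluated on the same `time_steps` vector, regular or not.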
Steps
- Pass the dataset through DataLoader like here. In the code, train_y is the dataset; time_steps_extrap are the shared time points.
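The step above can be sketched roughly as follows. This is a generic PyTorch example, not the repository's collate function: `collate_with_times` and the dictionary keys `"data"` and `"time_steps"` are hypothetical names, and the tensors are random stand-ins for `train_y` and `time_steps_extrap`.

```python
from functools import partial

import torch
from torch.utils.data import DataLoader

def collate_with_times(batch, time_steps):
    """Stack a list of [T x D] series into [B x T x D] and attach the shared times."""
    return {"data": torch.stack(batch), "time_steps": time_steps}

# Toy stand-ins for train_y ([B x T x D]) and time_steps_extrap ([T]).
train_y = torch.randn(20, 100, 1)
time_steps_extrap = torch.linspace(0.0, 5.0, 100)

loader = DataLoader(
    train_y,  # a tensor works as a dataset: indexing yields one [T x D] series
    batch_size=4,
    shuffle=False,
    collate_fn=partial(collate_with_times, time_steps=time_steps_extrap),
)

first_batch = next(iter(loader))
print(first_batch["data"].shape, first_batch["time_steps"].shape)
```

Binding the shared time vector into the collate function keeps the dataset itself a plain tensor, which is convenient in a notebook.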
Optional
Here we assumed that time series share the time points. You can use the input parameter --sample-tp to randomize the time points that are used in each batch. For example, if the dataset has 100 time points and we use --sample-tp 30, each batch will have 30 randomly sampled time points out of the original set of 100.
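To illustrate what sampling time points per batch amounts to, here is a small sketch under the assumption that a batch is a [B x T x D] array with a shared [T] time vector. The helper `subsample_timepoints` is hypothetical and is not the code path behind --sample-tp.

```python
import numpy as np

def subsample_timepoints(data, time_steps, n_tp_to_sample, rng):
    """Keep n_tp_to_sample randomly chosen time points, preserving time order."""
    n_tp = time_steps.shape[0]
    # Sample indices without replacement, then sort so time stays increasing.
    idx = np.sort(rng.choice(n_tp, size=n_tp_to_sample, replace=False))
    return data[:, idx, :], time_steps[idx]

rng = np.random.default_rng(0)
data = np.zeros((16, 100, 2))            # [B x T x D]
time_steps = np.linspace(0.0, 5.0, 100)  # [T]
sub_data, sub_times = subsample_timepoints(data, time_steps, 30, rng)
print(sub_data.shape, sub_times.shape)
```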
Time series have different observation times and/or have different length
This case is useful for real data. Good examples are the PhysioNet class (--dataset physionet) and the PersonActivity class (--dataset activity).
To use Latent ODE on your dataset, I recommend formatting your data as a list of records, as described below. You can use the collate function for DataLoader from here
Data format
Since each time series has a different length, we represent the dataset as a list of records: [record1, record2, record3, ...]. Each record represents one time series (e.g. one patient in Physionet).
Each record has the following format:
(record_id, observation_times, values, mask, labels)
- record_id: an id of this string
- observation_times: a 1-dimensional numpy array containing T time values
- values: a (T, D) numpy array containing observed D-dimensional values at T time points
- mask: a (T, D) tensor containing 1 where values were observed and 0 otherwise. Useful if different dimensions are observed at different times. If all dimensions are observed at the same time, fill the mask with ones.
- labels: a list of labels for the current patient, if labels are available. Otherwise None.
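A minimal sketch of building one such record from raw observations is shown below. The helper `make_record` and the input layout (a list of `(time, dim_index, value)` triples) are assumptions for illustration; only the output tuple follows the format described above.

```python
import numpy as np

def make_record(record_id, observations, n_dims, labels=None):
    """observations: list of (time, dim_index, value) triples for one time series."""
    # Unique sorted observation times define the T rows of the record.
    unique_times = sorted({t for t, _, _ in observations})
    row_of = {t: i for i, t in enumerate(unique_times)}
    times = np.array(unique_times, dtype=np.float32)
    values = np.zeros((len(unique_times), n_dims), dtype=np.float32)
    mask = np.zeros_like(values)
    for t, d, v in observations:
        values[row_of[t], d] = v
        mask[row_of[t], d] = 1.0  # 1 where a value was actually observed
    return (record_id, times, values, mask, labels)

# Dimension 0 is observed at t=0.0 and t=0.7; dimension 1 at t=0.0 and t=2.3.
obs = [(0.0, 0, 1.5), (0.0, 1, -0.2), (0.7, 0, 1.1), (2.3, 1, 0.4)]
record = make_record("patient_001", obs, n_dims=2, labels=None)
record_id, times, values, mask, labels = record
print(times.shape, values.shape, mask)
```

Unobserved entries are stored as zeros in `values`; the mask is what tells them apart from real zero measurements.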
Pipeline of the Physionet dataset
To use it on your dataset, you only need to replace step 1 to produce the list of records.
1. The Physionet class loads the dataset from files (each patient is stored in its own file) and outputs the list of records (format described above) like so.
2. The Physionet class is called in lib/parse_datasets.py to get a list of records. The list of records is then split into train/test here.
3. The DataLoader takes the list of records and collates them into batches like so. The function that collates records into batches is here.
4. During training, the model calls the data loader to get a new batch.
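The collation step can be sketched as below. This is a simplified illustration that places every record on the union of all observation times in the batch; `collate_records` and the returned dictionary keys are hypothetical names, labels are ignored, and the repository's collate function may differ in its details.

```python
import numpy as np

def collate_records(records):
    """Merge records with different observation times onto one shared time grid."""
    n_dims = records[0][2].shape[1]
    # Sorted union of all observation times that occur in the batch.
    all_times = np.unique(np.concatenate([r[1] for r in records]))
    values = np.zeros((len(records), len(all_times), n_dims), dtype=np.float32)
    mask = np.zeros_like(values)
    for b, (_, times, vals, msk, _) in enumerate(records):
        # Every record time is present in all_times, so searchsorted
        # returns the exact position of each observation on the shared grid.
        idx = np.searchsorted(all_times, times)
        values[b, idx] = vals
        mask[b, idx] = msk
    return {"time_steps": all_times, "data": values, "mask": mask}

rec_a = ("a", np.array([0.0, 1.0, 2.5]), np.ones((3, 2)), np.ones((3, 2)), None)
rec_b = ("b", np.array([1.0, 4.0]), 2 * np.ones((2, 2)), np.ones((2, 2)), None)
batch = collate_records([rec_a, rec_b])
print(batch["time_steps"], batch["data"].shape)
```

Time points at which a given series has no observation keep a mask of zero, so the model can ignore them.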
from latent_ode.
Concerning the labels, we support two tasks right now:
- Binary classification per time series (see Physionet).
- Multi-class classification per time point (see PersonActivity).
Your case seems to be multi-class classification per time series, which is the blend of the two set-ups that we have now.
You can do the following:
1. Convert the labels to one-hot encoding. For each time series, labels will be a binary vector [C], where C is the number of classes. This solves the problem of semantic meaning that you mentioned.
2. Use the collate function for Physionet from here, since it creates labels per time series. Set N_labels = C, where C is the number of classes. Note that we used normalization for the Physionet dataset here. You can try with and without normalization and see if it helps the training in your case.
3. In lib/parse_datasets.py, set classif_per_tp to False for your dataset like so. This signals to the model that classification should be run per time series rather than per time point.
4. Lastly, we need to modify the loss. The function for computing the classification loss is called here if you are using the Latent ODE model, or here for ODE-RNN. Make sure to use multi-class CE loss instead of binary CE loss. The multi-class CE loss compute_multiclass_CE_loss expects the number of time points to be the third dimension. You can hack it by adding a third dimension of size 1 to label_predictions just before you call compute_multiclass_CE_loss.
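The label conversion and the extra time dimension can be sketched as follows. This is only an illustration with random tensors: the shapes follow the description above, and the plain `torch.nn.functional.cross_entropy` call is a stand-in for, not a copy of, compute_multiclass_CE_loss. The sizes (50 series, 14 classes) are arbitrary assumptions.

```python
import torch
import torch.nn.functional as F

n_traj_samples, n_traj, n_classes = 1, 50, 14

# Integer class index per time series, converted to one-hot vectors of size [C].
class_idx = torch.randint(0, n_classes, (n_traj,))
labels = F.one_hot(class_idx, num_classes=n_classes).float()  # [n_traj, C]

# Hypothetical classifier output: one logit vector per sampled trajectory and series.
label_predictions = torch.randn(n_traj_samples, n_traj, n_classes)

# Add a time dimension of size 1 as the third dimension.
label_predictions = label_predictions.unsqueeze(2)  # [n_traj_samples, n_traj, 1, C]
labels_tp = labels.unsqueeze(1)                     # [n_traj, 1, C]

# Plain multi-class cross-entropy on the flattened tensors.
logits = label_predictions.reshape(-1, n_classes)
targets = labels_tp.argmax(-1).repeat(n_traj_samples, 1).reshape(-1)
loss = F.cross_entropy(logits, targets)
print(label_predictions.shape, loss.item())
```

Any mask passed alongside the predictions would need the same treatment, so that it describes labels per series rather than observed data values.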
I would like to echo what Andrew said.
I can't seem to find a clean way to use my own dataset with the provided code.
From my digging, it looks like I need to create a file for my dataset, similar to person_activity.py.
I think the main source of my confusion is the sheer number of arguments and references to other files. It has made it difficult to trace down what code I need to write and how I need to adapt it to work with latent_ode.
If I can figure out how to make a simple model that runs on pandas, I might try to make a pull request.
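I think that the way I would try to format it is using fewer dictionaries and more canonical ML/NN types:
# simple classify framework
# (LatentODE and data_loader are placeholders for a simplified interface)
import torch
from torch import nn

model = LatentODE()
optim = torch.optim.Adam(model.parameters())
criterion = nn.CrossEntropyLoss()
for i, (x, y) in enumerate(data_loader):
    optim.zero_grad()
    logits = model(x)
    loss = criterion(logits, y)
    loss.backward()
    optim.step()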
Thank you for providing latent_ode!
It is a fantastic concept and paper. I'm just having a hard time understanding how all of the files come together.
Thank you for this, this clarifies a ton!
You mention that labels is a list.
I have a list of astronomical objects that each time curve could be. (Not classifying per time point).
Since the classification target (astronomical object class) does not have any semantic meaning, it wouldn't make much sense to do regression on it.
Do I make a list of the labels and then just have the labels in the record tuple be an index?
Does the shape of labels need to be the length of the time points?
Thanks again!
Thanks so much
you rock!
1. C == 14 like this: [0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]. I have made my dataset produce records with binary classification labels using the record tuple form specified: (id, t, vals, mask, labels).
2. I changed N_labels in physionet to 14. I'm using the physionet collate fn and I think I'm using it correctly.
3. I've changed classif_per_tp to False in parse_datasets.
4. I'm getting stuck on this part. I unsqueezed the third dimension, but I'm getting the following error:
File "latent_pandas.py", line 274, in <module>
train_res = model.compute_all_losses(batch_dict, n_traj_samples = 1, kl_coef = kl_coef)
File "/mnt/c/Users/***/home/fermi/cosmoNODE/cosmoNODE/latent_ode/lib/base_models.py", line 322, in compute_all_losses
mask = batch_dict["mask_predicted_data"])
File "/mnt/c/Users/***/home/fermi/cosmoNODE/cosmoNODE/latent_ode/lib/likelihood_eval.py", line 118, in compute_multiclass_CE_loss
pred_mask = pred_mask.reshape(n_traj_samples * n_traj * n_tp, n_dims)
RuntimeError: shape '[50, 14]' is invalid for input of size 4391800
For context, I've duplicated run_models into latent_pandas. label_predictions is of size [1, 50, 1, 14]. (I set n_traj_samples to 1, because I didn't know exactly what it did.)
This might be a cryptic error message, but I'm not exactly sure what I need to change.
If it's helpful here is my dataloader:
import pandas as pd
import torch

import cosmoNODE


class FluxNet(object):
    def __init__(self):
        self.df = pd.read_csv(cosmoNODE.__path__[0] + '/../demos/data/training_set.csv')
        self.meta = pd.read_csv(cosmoNODE.__path__[0] + '/../demos/data/training_set_metadata.csv')
        self.merged = pd.merge(self.df, self.meta, on='object_id')
        self.mins = self.merged.min()
        self.maxes = self.merged.max()
        self.params = self.merged.columns.drop(['object_id', 'mjd', 'target'])
        self.classes = sorted(self.merged['target'].unique())
        self.num_classes = len(self.classes)
        self.labels = []
        self.groups = self.merged.groupby('object_id')
        self.curves = []
        self.get_curves()
        self.length = len(self.curves)
        print('fluxnet loaded')

    def get_curves(self):
        # Build one (id, times, values, mask, label) record per light curve.
        for i, group in enumerate(self.groups):
            object_id = group[0]
            data = group[1]
            times = torch.tensor(data['mjd'].values, dtype=torch.float)
            values = torch.tensor(data.drop(['mjd', 'target'], axis=1).fillna(0).values, dtype=torch.float)
            mask = torch.ones(values.shape, dtype=torch.float)
            target = data['target'].iloc[0]
            # label = labels.index(data['target'].iloc[0])
            # one_hot is a helper defined elsewhere in my code (not shown here)
            label = one_hot(self.classes, target)
            self.labels.append(label)
            record = (object_id, times, values, mask, label)
            self.curves.append(record)

    def __getitem__(self, index):
        return self.curves[index]

    def __len__(self):
        # number of light curves in the dataset
        return self.length

    def get_label(self, index):
        return self.labels[index]
Thanks for your help
Thank you for your amazing paper and this code repository. I went through your code and your explanation for this issue, but I was unable to come to a conclusion on how to use your code for a generic Multivariate Time Series Classification with each feature observed at all times.
Could you please guide me on how to run your code given the following type of data (Multivariate Time Series Classification)?
Data Description:
Input: a Tensor [N * T * C] where:
N: Number of examples in Train / Test set. ex: 1000
T: Time Dimension. ex: 5/50/100 time steps.
C: Input Features for each observation. ex: 5 features.
Output: 0 or 1.
Thank you for your time.