location-prediction's Introduction

Context-aware next location prediction

This repository contains the implementation of the paper "Context-aware multi-head self-attentional neural network model for next location prediction" by:

Ye Hong, Yatao Zhang, Konrad Schindler, Martin Raubal
| MIE, ETH Zurich | FRS, Singapore-ETH Centre | PRS, ETH Zurich |

[Figure: flowchart]

Requirements and dependencies

This code has been tested on

  • Python 3.9.12, trackintel 1.2.4, gensim 4.1.2, PyTorch 1.12.1, transformers 4.16.2, cudatoolkit 11.3, GeForce RTX 3090

To create a virtual environment and install the required dependencies, run the following in your working folder:

    git clone https://github.com/mie-lab/location-prediction.git
    cd location-prediction
    conda env create -f environment.yml
    conda activate loc-pred
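After activating the environment, it can help to compare the installed package versions with the tested ones listed above. The following is a minimal sketch using only the standard library. The distribution names are assumptions (for example, PyTorch is looked up as `torch`), and `compare_versions` is a hypothetical helper, not part of this repository.

```python
from importlib import metadata

# Versions listed in the README; the distribution names are assumptions
# (for example, PyTorch is published as "torch").
TESTED_VERSIONS = {
    "trackintel": "1.2.4",
    "gensim": "4.1.2",
    "torch": "1.12.1",
    "transformers": "4.16.2",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def compare_versions(tested, lookup=installed_version):
    """Map each package name to (tested, installed, matches)."""
    report = {}
    for name, wanted in tested.items():
        found = lookup(name)
        # Some builds append a local tag such as "+cu113", so only the part
        # before "+" is compared.
        matches = found is not None and found.split("+")[0] == wanted
        report[name] = (wanted, found, matches)
    return report


if __name__ == "__main__":
    for name, (wanted, found, ok) in compare_versions(TESTED_VERSIONS).items():
        status = "ok" if ok else "differs"
        print(f"{name}: tested {wanted}, installed {found} ({status})")
```

A differing version does not necessarily break the code; the check only shows where the environment departs from the tested setup.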

Folder structure

The respective code files are stored in separate modules:

  • /preprocessing/*. Functions for preprocessing the dataset. These scripts should be run before training a model. poi.py includes POI preprocessing and embedding methods (LDA and TF-IDF).
  • /models/*. Implementation of the Transformer model.
  • /baselines/*. (Non-ML) Baseline methods that we implemented to compare with the proposed model. The methods include persistent forecast, most frequent forecast and Markov models.
  • /config/*. Hyperparameter settings are saved in the .yml files under the respective dataset folder under config/. For example, /config/geolife/transformer.yml contains hyperparameter settings of the transformer model for the geolife dataset.
  • /utils/*. Helper functions that are used for model training.
  • /analysis/*. Analysis functions for getting dataset properties and visualizing the training results of the model. entropy.py includes functions to calculate the random, uncorrelated and real entropy. stats.py includes functions to calculate the mobility motifs.
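To make the three baseline ideas named under /baselines/ concrete, here is a minimal sketch of a persistent forecast, a most frequent forecast and a first-order Markov forecast on a single visit sequence. It is an illustration under simple assumptions and is not the code in /baselines/: the function names, the fallback rule in the Markov case and the example data are all hypothetical.

```python
from collections import Counter, defaultdict


def persistent_forecast(history):
    """Predict that the next location equals the most recent one."""
    return history[-1]


def most_frequent_forecast(history):
    """Predict the location visited most often in the history."""
    return Counter(history).most_common(1)[0][0]


def markov_forecast(history):
    """First-order Markov: most frequent successor of the current location."""
    transitions = defaultdict(Counter)
    for current, following in zip(history, history[1:]):
        transitions[current][following] += 1
    successors = transitions.get(history[-1])
    if not successors:
        # Assumed fallback: the current location has no observed successor.
        return most_frequent_forecast(history)
    return successors.most_common(1)[0][0]


# Hypothetical visit sequence of one user, oldest first.
visits = ["home", "work", "gym", "home", "work", "gym", "home", "home", "work"]
print(persistent_forecast(visits))     # work
print(most_frequent_forecast(visits))  # home
print(markov_forecast(visits))         # gym
```

The three methods disagree on this sequence, which shows how much each one relies on the current location versus the overall visit counts.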

The main entry points for training a model are:

  • main.py for starting the deep learning model training.
  • main_individual.py for starting the training of individual models.

Model variations

The repo contains different model variations, which can be controlled as follows:

  • Individual vs. collective model - Chosen by running main_individual.py or main.py, respectively. Config files for individual models have the prefix ind_.
  • Including different contexts - Whether to include a specific context is controlled in the config files through the if_embed_user, if_embed_poi, if_embed_time, and if_embed_duration parameters.
  • Including different previous days - The number of previous days of history to consider is controlled through the previous_day parameter in each config file.
  • Including separate previous days - Individual previous days are selected through the day_selection parameter in each config file. default includes all days; a specific selection is passed as a list, e.g., [0, 1, 7] to include only the current day, the previous day and the day one week before.
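The interplay of previous_day and day_selection can be sketched as a filter over historical records. This is only an illustration of the semantics described above, not the repository's dataloader: the `(days_ago, location_id)` record layout, the `select_history` function and the inclusive window boundary are assumptions.

```python
def select_history(records, previous_day, day_selection="default"):
    """Keep records inside the history window and on the selected days.

    Each record is a hypothetical (days_ago, location_id) pair, where
    days_ago == 0 means the current day.
    """
    # Assumption: the window includes day 0 up to and including previous_day.
    in_window = [r for r in records if 0 <= r[0] <= previous_day]
    if day_selection == "default":
        return in_window
    wanted = set(day_selection)
    return [r for r in in_window if r[0] in wanted]


records = [(0, 5), (1, 3), (2, 5), (7, 8), (9, 2)]
print(select_history(records, previous_day=7))
print(select_history(records, previous_day=7, day_selection=[0, 1, 7]))
```

With a 7-day window the record from 9 days ago is always dropped, and the list [0, 1, 7] further removes the record from 2 days ago.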

Reproducing models on the Geolife dataset

To run the whole pipeline on the Geolife dataset, follow the steps below:

1. Install dependencies

Download the repo, and install the dependencies listed under Requirements and dependencies.

2. Download Geolife

Download the Geolife GPS tracking dataset from here. Create a new folder in the repo root and name it data. Unzip and copy the Geolife Data folder into data/. The file structure should look like data/Data/000/....

Create a file paths.json in the repo root, and define your working directories by writing:

{
    "raw_geolife": "./data/Data"
}
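Before starting the preprocessing, it may be worth checking that paths.json parses and that the folders it names exist. The sketch below uses only the standard library; `load_paths` and `missing_directories` are hypothetical helpers and are not part of the repository.

```python
import json
from pathlib import Path


def load_paths(config_file="paths.json"):
    """Read paths.json and return its entries as a dict of Path objects."""
    with open(config_file, encoding="utf-8") as handle:
        raw = json.load(handle)
    return {key: Path(value) for key, value in raw.items()}


def missing_directories(paths):
    """Return the keys whose directory does not exist."""
    return [key for key, folder in paths.items() if not folder.is_dir()]
```

Calling `missing_directories(load_paths())` from the repo root returns an empty list when every configured folder, such as ./data/Data, is in place.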

3. Preprocess the dataset

Run

    python preprocessing/geolife.py 20

to execute the preprocessing script for the Geolife dataset. The process takes 15-30 min. dataSet_geolife.csv, sp_time_temp_geolife.csv and valid_ids_geolife.pk will be created under the data/ folder, and geolife_slide_filtered.csv will be created under the data/quality/ folder.

4. Run the proposed transformer model

Run

    python main.py config/geolife/transformer.yml

to start the training process. The dataloader will create intermediate data files and save them under the data/temp/ folder. The configuration of the current run, the network parameters and the performance indicators will be stored under the outputs/ folder.

5. Get dataset statistics

Run

    python analysis/stats.py

to generate the mobility entropy plot, the basic statistics of the Geolife dataset, and the tracking quality plot.
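For readers unfamiliar with the mobility entropy measures, the sketch below computes two of them from a visit sequence using their standard definitions: the random entropy log2(N) for N distinct locations, and the uncorrelated entropy, which is the Shannon entropy of the visit frequencies. It is not the code in analysis/entropy.py, the function names are hypothetical, and the real entropy (which also accounts for visit order) is left out.

```python
import math
from collections import Counter


def random_entropy(visits):
    """log2 of the number of distinct locations visited."""
    return math.log2(len(set(visits)))


def uncorrelated_entropy(visits):
    """Shannon entropy of the visit frequencies, ignoring visit order."""
    counts = Counter(visits)
    total = len(visits)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical visit sequence: one location is visited twice as often.
visits = ["home", "home", "work", "gym"]
print(random_entropy(visits))        # log2(3), about 1.585
print(uncorrelated_entropy(visits))  # 1.5
```

The uncorrelated entropy is never larger than the random entropy, and the two coincide only when all locations are visited equally often.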

Reproducing models on check-in datasets

To run the whole pipeline on Gowalla or Foursquare New York City (NYC) datasets, follow the steps below:

1. Switch branch and install dependencies

Switch to the lbsn branch. Download the repo, and install the dependencies listed under Requirements and dependencies.

2. Download the datasets

Download the Gowalla dataset from here or the Foursquare NYC dataset from here. Create a new folder in the repo root and name it data. For Gowalla, unzip the download and copy the Gowalla_totalCheckins.txt file into a new folder data/gowalla, so that the file structure looks like data/gowalla/Gowalla_totalCheckins.txt. For Foursquare, unzip the download and copy the dataset_TSMC2014_NYC.txt file into a new folder data/tsmc2014, so that the file structure looks like data/tsmc2014/dataset_TSMC2014_NYC.txt.

Create a file paths.json in the repo root, and define your working directories by writing:

{
    "raw_gowalla": "./data/gowalla"
}

or

{
    "raw_foursquare": "./data/tsmc2014"
}

3. Preprocess the dataset

Run

    python preprocessing/gowalla.py

or

    python preprocessing/foursquare.py

to execute the preprocessing script for the respective dataset. dataSet_*.csv, locations_*.csv, sp_time_temp_*.csv and valid_ids_*.pk will be created under the data/ folder.

4. Run the proposed transformer model

Run

    python main.py config/gowalla/transformer.yml

or

    python main.py config/foursquare/transformer.yml

to start the training process. The dataloader will create intermediate data files and save them under the data/temp/ folder. The configuration of the current run, the network parameters and the performance indicators will be stored under the outputs/ folder.

Citation

If you find this code useful for your work or use it in your project, please consider citing:

@article{hong_context_2023,
  title   = {Context-aware multi-head self-attentional neural network model for next location prediction},
  journal = {Transportation Research Part C: Emerging Technologies},
  author  = {Hong, Ye and Zhang, Yatao and Schindler, Konrad and Raubal, Martin},
  year    = {2023},
  volume  = {156},
  pages   = {104315},
  doi     = {10.1016/j.trc.2023.104315}
}

Contact

If you have any questions, please open an issue or let me know:

location-prediction's Issues

error in running code in vs code

Hello Hongy,
I tried to run your code in VS Code but encountered an error. I would appreciate it if you could guide me.
When I run the geolife.py file in VS Code, I get this error:

Exception has occurred: IndexError
index out of range in self
File "C:\Users\Desktop\13khordad1402\location-prediction-main\models\embed.py", line 158, in forward
emb = self.emb_loc(src)
File "C:\Users\Desktop\13khordad1402\location-prediction-main\models\mobtcast.py", line 58, in forward
emb = self.Embedding(src, context_dict)
File "C:\Users\Desktop\13khordad1402\location-prediction-main\utils\train.py", line 261, in train
logits, pred_geoms = model(x, x_dict, device)
File "C:\Users\Desktop\13khordad1402\location-prediction-main\utils\train.py", line 188, in trainNet
globaliter = train(
File "C:\Users\Desktop\13khordad1402\location-prediction-main\utils\utils.py", line 52, in get_trainedNets
best_model, performance = trainNet(
File "C:\Users\Desktop\13khordad1402\location-prediction-main\main.py", line 34, in single_run
model, perf = get_trainedNets(
File "C:\Users\Desktop\13khordad1402\location-prediction-main\main.py", line 98, in
res_single = single_run(
IndexError: index out of range in self

Is it OK to have different dictionaries for different buffers?

Hi Ye Hong,

Thank you for your great work. I have a question that I hope you can enlighten me about.

In the context vectors calculation in poi.py, you specify 11 buffer distances and calculate the land use context under each distance. This results in different dictionaries for each buffer. Do you think it is okay that the context vectors are derived with different dictionaries?

My initial thought is that it is not right, and we should use one dictionary to calculate all the context vectors. However, when it comes to how you actually use the context vectors, it seems that it doesn't really matter, since the aggregation of the context vectors is done through a neural network (POINet). The different context vectors represent the different topic (poi category) distribution and the topics/categories don't have to be the same (or in the same order), especially under the non-linear aggregation of neural networks. Ultimately, the model performance is not affected - I am not so sure about it, so please correct me if I am wrong.

That being said, I think it is still beneficial to have only one dictionary, as it is more rigorous and easier to interpret.

What do you think of this issue? I would appreciate it if you could share your thoughts. Thanks.
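The single-dictionary alternative raised in this question can be illustrated with a small sketch. It is not taken from poi.py: the category lists are hypothetical, plain counts stand in for the LDA or TF-IDF embeddings, and `build_vocabulary` and `count_vector` are made-up helpers. The point is only that one shared vocabulary gives every buffer a vector whose dimensions mean the same thing.

```python
def build_vocabulary(buffers):
    """Collect every POI category seen in any buffer, in sorted order."""
    return sorted({category for pois in buffers.values() for category in pois})


def count_vector(pois, vocabulary):
    """Count categories in the fixed order given by the shared vocabulary."""
    return [pois.count(category) for category in vocabulary]


# Hypothetical POI categories found within two buffer distances of a location.
buffers = {
    100: ["cafe", "shop"],
    500: ["cafe", "shop", "school", "cafe"],
}
vocabulary = build_vocabulary(buffers)
vectors = {dist: count_vector(pois, vocabulary) for dist, pois in buffers.items()}
print(vocabulary)    # ['cafe', 'school', 'shop']
print(vectors[100])  # [1, 0, 1]
print(vectors[500])  # [2, 1, 1]
```

With per-buffer dictionaries, the 100 m vector would have only two dimensions and its positions would not line up with those of the 500 m vector.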

Running model for inference

Hi,
I was able to successfully train the model using my dataset.
Now, I would like to run it for inference on production datasets produced by my application.
How do I do that? Should I simply run
python preprocessing/geolife.py 20
and
python main.py config/geolife/transformer.yml
on production datasets?
Thanks

IndexError: index out of range in self

Hi,
I did my best to follow the instructions in the README. When I ran

    python main.py config/geolife/transformer.yml

execution failed producing the traceback log listed below.

Did I do something wrong? Could you please help me resolve this issue?

Thanks

TRACEBACK LOG

Traceback (most recent call last):
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/LocationPrediction/main.py", line 65, in <module>
    res_single = single_run(train_loader, val_loader, test_loader, config, device, log_dir)
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/LocationPrediction/main.py", line 22, in single_run
    model, perf = get_trainedNets(config, model, train_loader, val_loader, device, log_dir)
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/LocationPrediction/utils/utils.py", line 46, in get_trainedNets
    best_model, performance = trainNet(config, model, train_loader, val_loader, device, log_dir=log_dir)
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/LocationPrediction/utils/train.py", line 183, in trainNet
    globaliter = train(
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/LocationPrediction/utils/train.py", line 263, in train
    logits = model(x, x_dict, device)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/LocationPrediction/models/MHSA.py", line 35, in forward
    emb = self.Embedding(src, context_dict)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/LocationPrediction/models/embed.py", line 149, in forward
    emb = self.emb_loc(src)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/nn/modules/sparse.py", line 163, in forward
    return F.embedding(
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/nn/functional.py", line 2264, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
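In PyTorch, this error from an embedding lookup generally means that some input index is negative or not smaller than the number of rows in the embedding table. Whether that is the cause here cannot be confirmed from the traceback alone. A minimal, framework-free sketch of the check is shown below; `out_of_range_ids` is a hypothetical helper, and the IDs and table size are made-up values.

```python
def out_of_range_ids(location_ids, num_embeddings):
    """Return the sorted IDs that a table with num_embeddings rows cannot look up."""
    return sorted({i for i in location_ids if i < 0 or i >= num_embeddings})


# Hypothetical example: a table with 10 rows accepts indices 0 to 9 only.
batch_ids = [2, 3, 10, 11, 3]
print(out_of_range_ids(batch_ids, num_embeddings=10))  # [10, 11]
```

If such a check reports offending IDs on the encoded location column, the table size used to build the model and the largest encoded location ID are worth comparing.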

ValueError: Length of values (105) does not match length of index (104)

Hi,

Again sorry to bother with what probably is my issue. However, I am a bit stuck. Perhaps, you have an idea/suggestion as to how I could resolve these latest problems I have run into. Execution of python3 preprocessing/geolife.py 20 fails producing the traceback log #1 reported below.

If I remove a user and associated trajectory folder from the folder data then execution fails with the traceback log #2 reported below.

Did you ever see these problems?

Thanks

TRACEBACK LOG #1

Traceback (most recent call last):
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/LocationPrediction/preprocessing/geolife.py", line 168, in <module>
    get_dataset(config=CONFIG, epsilon=args.epsilon)
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/LocationPrediction/preprocessing/geolife.py", line 48, in get_dataset
    valid_user = calculate_user_quality(sp.copy(), trips.copy(), quality_file, quality_filter)
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/LocationPrediction/preprocessing/utils.py", line 138, in calculate_user_quality
    total_quality["days"] = (
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/pandas/core/frame.py", line 3980, in __setitem__
    self._set_item(key, value)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/pandas/core/frame.py", line 4174, in _set_item
    value = self._sanitize_column(value)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/pandas/core/frame.py", line 4915, in _sanitize_column
    com.require_length_match(value, self.index)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/pandas/core/common.py", line 571, in require_length_match
    raise ValueError(
ValueError: Length of values (105) does not match length of index (104)

TRACEBACK LOG #2

Traceback (most recent call last):
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/LocationPrediction/preprocessing/geolife.py", line 168, in <module>
    get_dataset(config=CONFIG, epsilon=args.epsilon)
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/LocationPrediction/preprocessing/geolife.py", line 87, in get_dataset
    _filter_sp_history(sp_time)
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/LocationPrediction/preprocessing/geolife.py", line 101, in _filter_sp_history
    vali_data["location_id"] = enc.transform(vali_data["location_id"].values.reshape(-1, 1)) + 2
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py", line 930, in transform
    X_int, X_mask = self._transform(
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py", line 124, in _transform
    X_list, n_samples, n_features = self._check_X(
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py", line 44, in _check_X
    X_temp = check_array(X, dtype=None, force_all_finite=force_all_finite)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/sklearn/utils/validation.py", line 805, in check_array
    raise ValueError(
ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required.

Trajectory Semantic Clustering

Hi,

Thank you for your research and for publishing this code. I'm reproducing your experiment on the Foursquare dataset and have some questions (not technical ones):

  1. The output of your model is the next location. What changes should I make to output clusters of user trajectory movement semantically? For example, two users do not necessarily have to be in the same physical trajectory to be in the same cluster, but rather in the same semantic context (home, work, etc.).
  2. Instead of next-location prediction, do you have any idea of using a seq2seq autoencoder to encode trajectory features, and then cluster them using K-MEANS or something?

I would appreciate your help with this.

trackintel data format error

Hi,
I am using your model with my dataset formatted like the geolife dataset.
Until last week, there was no problem completing the pre-processing step. Since then, pre-processing fails with the traceback log reported below.

Below you will also find the input dataset which causes the failure.

Below, I also include an input dataset which does not cause pre-processing to fail.

I cannot see any difference between the two input datasets in terms of record structure.

I also do not quite understand the error message:

ValueError: time data "2024-0-2 7:0:57" at position 0 doesn't match format specified

Is the error message saying that one of the records in the dataset contains
2024-0-2 7:0:57? As you can see below no record in the dataset contains this pattern.

Could you please help me resolve this issue? What am I not seeing and/or doing wrong?

Thanks

TRACEBACK LOG

Traceback (most recent call last):
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/LocationPrediction/preprocessing/geolife.py", line 168, in <module>
    get_dataset(config=CONFIG, epsilon=args.epsilon)
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/LocationPrediction/preprocessing/geolife.py", line 26, in get_dataset
    pfs, _ = read_geolife(config["raw_geolife"], print_progress=True)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/trackintel/io/dataset_reader.py", line 117, in read_geolife
    gdf = pd.concat(_get_df(geolife_path, uids, print_progress), axis=0, ignore_index=True)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 368, in concat
    op = _Concatenator(
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 422, in __init__
    objs = list(objs)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/trackintel/io/dataset_reader.py", line 189, in _get_df
    data["tracked_at"] = pd.to_datetime(data["date"] + " " + data["time"], format="%Y-%m-%d %H:%M:%S", utc=True)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 1068, in to_datetime
    values = convert_listlike(arg._values, format)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 438, in _convert_listlike_datetimes
    result, tz_parsed = objects_to_datetime64ns(
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py", line 2177, in objects_to_datetime64ns
    result, tz_parsed = tslib.array_to_datetime(
  File "pandas/_libs/tslib.pyx", line 427, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 599, in pandas._libs.tslib.array_to_datetime
ValueError: time data "2024-0-2 7:0:57" at position 0 doesn't match format specified

INPUT DATASET WHICH CAUSES FAILURE

Geolife trajectory
WGS 84
Altitude is in Feet
Reserved 3
0,2,255,My Track,0,0,2,8421376
0
42.96057621353667,-81.34552067527518,0,0,45293,2024-0-2,7:0:57
42.960854208386955,-81.3463197534195,0,0,45293,2024-0-2,7:0:57
42.96091271518786,-81.34635018147927,0,0,45293,2024-0-2,7:0:57
42.9737293483869,-81.32829863748867,0,0,45293,2024-0-2,7:0:57
42.973755004795365,-81.32842788316327,0,0,45293,2024-0-2,7:0:57
42.97564504340588,-81.32194977472004,0,0,45293,2024-0-2,7:0:57
42.97593313104018,-81.32199810424413,0,0,45293,2024-0-2,7:0:57
42.975969807381006,-81.32198093591352,0,0,45293,2024-0-2,7:0:57
42.9760710258729,-81.32179280518828,0,0,45293,2024-0-2,7:0:57
42.976106288386895,-81.32209860085274,0,0,45293,2024-0-2,7:0:57
42.9761150983869,-81.32209800082813,0,0,45293,2024-0-2,7:0:57
42.97624936154101,-81.32201358626166,0,0,45293,2024-0-2,7:0:57
42.97625405914777,-81.32214258413602,0,0,45293,2024-0-2,7:0:57
42.987786275880026,-81.32659013043525,0,0,45293,2024-0-2,7:0:57
42.99006595838683,-81.31835366186378,0,0,45293,2024-0-2,7:0:57
42.993750715746145,-81.30514688644925,0,0,45293,2024-0-2,7:0:57
43.000314908386756,-81.300993833221,0,0,45293,2024-0-2,7:0:57

SAMPLE INPUT DATASET WHICH DOES NOT CAUSE PRE-PROCESSING TO FAIL

Geolife trajectory
WGS 84
Altitude is in Feet
Reserved 3
0,2,255,My Track,0,0,2,8421376
0
44.43645965890645,-81.40441678790644,0,0,45284,2023-11-24,18:16:17
44.43650193688819,-81.40460628161391,0,0,45284,2023-11-24,18:16:17
44.436505806041566,-81.40444538043556,0,0,45284,2023-11-24,18:16:17
44.436558719075165,-81.40496128823457,0,0,45284,2023-11-24,18:16:17
44.4365688055671,-81.40466895052742,0,0,45284,2023-11-24,18:16:17
44.43657726997211,-81.40477304125808,0,0,45284,2023-11-24,18:16:17
44.43657726997211,-81.40477304125808,0,0,45284,2023-11-24,18:16:17
44.436600603305465,-81.40471637458066,0,0,45284,2023-11-24,18:16:17
44.436608936638784,-81.40487637457682,0,0,45284,2023-11-24,18:16:17
44.436608936638784,-81.40487637457682,0,0,45284,2023-11-24,18:16:17
44.43662226997211,-81.40469804123732,0,0,45284,2023-11-24,18:16:17
44.43662226997211,-81.40469804123732,0,0,45284,2023-11-24,18:16:17
44.43667913937491,-81.40447871317181,0,0,45284,2023-11-24,18:16:17
44.43669060330546,-81.40472470787246,0,0,45284,2023-11-24,18:16:17
44.43669060330546,-81.40472470787246,0,0,45284,2023-11-24,18:16:17
44.43672993275473,-81.40442388112147,0,0,45284,2023-11-24,18:16:17
44.43678254039121,-81.40446641887749,0,0,45284,2023-11-24,18:16:17
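Two observations follow from the material above. The traceback shows the reader joining the date and time columns with a space, so the record fields `2024-0-2,7:0:57` become the string `2024-0-2 7:0:57` quoted in the error. That string has 0 in the month position, and `%m` only covers months 1 to 12. The sketch below reproduces the check with the standard library rather than pandas, so it only approximates what the reader does; `find_bad_timestamps` is a hypothetical helper.

```python
from datetime import datetime

# Format string shown in the traceback.
GEOLIFE_FORMAT = "%Y-%m-%d %H:%M:%S"


def find_bad_timestamps(rows, fmt=GEOLIFE_FORMAT):
    """Return (row_number, text) for every date/time pair that fails to parse."""
    bad = []
    for number, (date, time) in enumerate(rows):
        # Join the two columns with a space, as the traceback shows.
        text = f"{date} {time}"
        try:
            datetime.strptime(text, fmt)
        except ValueError:
            bad.append((number, text))
    return bad


# Date and time columns taken from the two datasets above.
rows = [("2023-11-24", "18:16:17"), ("2024-0-2", "7:0:57")]
print(find_bad_timestamps(rows))  # [(1, '2024-0-2 7:0:57')]
```

Only the row from the failing dataset is reported, which points at the month value written by whatever produces the file.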
