
Training on custom dataset about openibl HOT 21 CLOSED

yxgeee avatar yxgeee commented on July 17, 2024
Training on custom dataset

from openibl.

Comments (21)

yxgeee avatar yxgeee commented on July 17, 2024 6

Oops, sorry for pressing the wrong button..

It's indeed hard to understand without comments. I cannot even remember it now as I wrote it half a year ago. I will add some comments regarding the dataset or prepare a template to write a custom dataset later.


yxgeee avatar yxgeee commented on July 17, 2024 5

You have 5 sublists in identities but 6 indices, which is what raises the "list index out of range" error.
In your case the indices are expected to be [0,1,2,3,4]. Please double-check where the extra index comes from.
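
A quick way to track down this kind of mismatch is to check every index in the splits against the length of identities. The following is only a minimal sketch: find_invalid_pids and the toy data are hypothetical and are not part of OpenIBL.

```python
def find_invalid_pids(identities, splits):
    """Return {split_name: [bad indices]} for indices that cannot address identities."""
    n = len(identities)
    bad = {}
    for name, pids in splits.items():
        out_of_range = [pid for pid in pids if not 0 <= pid < n]
        if out_of_range:
            bad[name] = out_of_range
    return bad


# Hypothetical data mirroring the situation above:
# 5 identities, but one split lists 6 indices.
identities = [["a.jpg"], ["b.jpg"], ["c.jpg"], ["d.jpg"], ["e.jpg"]]
splits = {"q_train": [0, 1, 2, 3, 4, 5], "db_train": [0, 1, 2, 3, 4]}

print(find_invalid_pids(identities, splits))  # {'q_train': [5]}
```

An empty result means every index points at an existing sublist.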


yxgeee avatar yxgeee commented on July 17, 2024 4

Hello, please refer to https://github.com/yxgeee/OpenIBL/blob/master/docs/INSTALL.md#use-custom-dataset-optional for creating a custom dataset.


yxgeee avatar yxgeee commented on July 17, 2024 3

@yxgeee should I create vgg16_pitts_64_desc_cen for my dataset as well? I just changed the "dataset" to "dummy" and it throws an error stating "Dataset not found".

Generally speaking, you need to create vgg16_pitts_64_desc_cen for your dataset. However, I am not sure whether your own vgg16_pitts_64_desc_cen would perform better than the original vgg16_pitts_64_desc_cen, although the original vgg16_pitts_64_desc_cen was generated based on Pitts dataset. You could try them both and compare the performance.

Did you register the dataset as mentioned in step 2 (https://github.com/yxgeee/OpenIBL/blob/master/docs/INSTALL.md#use-custom-dataset-optional)? If you used python setup.py install to install the library, you need to install it again whenever you change the files in ibl/.


yxgeee avatar yxgeee commented on July 17, 2024 1

I have tried your dummy dataset, and it works.
[Screenshot 2020-09-03_17-14-48-953]

Did you modify anything that leads to the error? Try to refresh the code by pulling the repo again.

Plus, I found that you use "examples/data/my_dataset/raw/IMG_7201.JPG" in identities; you should use "IMG_7201.JPG" instead. Note that every image needs to be saved under "examples/data/my_dataset/raw".
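
As an illustration, full paths can be reduced to bare file names with os.path.basename. This is a sketch only; to_file_names is a hypothetical helper, not an OpenIBL function.

```python
import os.path as osp


def to_file_names(identities):
    """Strip directory components so each entry is a bare file name."""
    return [[osp.basename(path) for path in images] for images in identities]


# One identity taken from the dataset file posted in this thread.
identities = [
    ["examples/data/my_dataset/query/IMG_7201.JPG",
     "examples/data/my_dataset/raw/IMG_7201.JPG"],
]

print(to_file_names(identities))  # [['IMG_7201.JPG', 'IMG_7201.JPG']]
```

Note that the two entries collapse to the same name here, so images that are actually different need distinct file names once they all live under the raw folder.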


yxgeee avatar yxgeee commented on July 17, 2024 1

Could you provide your training command?


yxgeee avatar yxgeee commented on July 17, 2024

Hi,
I have updated the code and added a quite simple way to extract descriptors. Please refer to https://github.com/yxgeee/OpenIBL#extract-descriptor-for-a-single-image


Zumbalamambo avatar Zumbalamambo commented on July 17, 2024

@yxgeee nice... May I know how to train on a custom dataset as well? I have RGB images with pose information.


Zumbalamambo avatar Zumbalamambo commented on July 17, 2024

@yxgeee It throws the following error,

ValueError: Unknown resampling filter (640). Use Image.NEAREST (0), Image.LANCZOS (1), Image.BILINEAR (2), Image.BICUBIC (3), Image.BOX (4) or Image.HAMMING (5)


yxgeee avatar yxgeee commented on July 17, 2024

Modify transforms.Resize(480, 640) to transforms.Resize((480, 640))
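
The reason is argument binding: the second positional argument of transforms.Resize is the interpolation mode, so 640 is treated as a resampling filter, which matches the error message. The sketch below uses a hypothetical stand-in function with the same argument order rather than torchvision itself, so it only illustrates the binding.

```python
def resize(size, interpolation="bilinear"):
    """Hypothetical stand-in that only mimics the argument order of transforms.Resize."""
    return {"size": size, "interpolation": interpolation}


wrong = resize(480, 640)    # 640 is bound to interpolation, not to the width
right = resize((480, 640))  # the (height, width) tuple is bound to size

print(wrong)  # {'size': 480, 'interpolation': 640}
print(right)  # {'size': (480, 640), 'interpolation': 'bilinear'}
```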


yxgeee avatar yxgeee commented on July 17, 2024

@yxgeee nice... May I know how to train on a custom dataset as well? I have RGB images with pose information.

To train on a custom dataset, you need to write a dataset file following https://github.com/yxgeee/OpenIBL/blob/master/ibl/datasets/pitts.py. The key is to generate two json files, meta.json and splits.json. Please refer to https://drive.google.com/drive/folders/1ZFMUW0BAcdi_vp88K4ZqrDcQGDH3da5v?usp=sharing for an example of generated json files.
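
For orientation, here is a rough sketch of what the two files could contain. The key names follow the dataset file posted in this thread; the file names, coordinates and the temporary output folder are made up, and the files produced by pitts.py may hold more than this.

```python
import json
import os.path as osp
import tempfile

# Hypothetical toy data: each identity is a list of image file names,
# and utm holds one [x, y] coordinate pair per identity.
identities = [["img_000.jpg"], ["img_001.jpg"], ["img_002.jpg"]]
utms = [[0.0, 0.0], [5.0, 0.0], [40.0, 0.0]]

meta = {"name": "my_dataset", "identities": identities, "utm": utms}

# Each split is a sorted list of indices into identities.
# Reusing every index in every split is only acceptable for a smoke test.
all_pids = list(range(len(identities)))
split_names = ("q_train", "db_train", "q_val", "db_val", "q_test", "db_test")
splits = {name: all_pids for name in split_names}

# Written to a temporary folder here; the real files live in the dataset root.
root = tempfile.mkdtemp()
for file_name, content in (("meta.json", meta), ("splits.json", splits)):
    with open(osp.join(root, file_name), "w") as f:
        json.dump(content, f, indent=2)

print(sorted(splits))
```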


Zumbalamambo avatar Zumbalamambo commented on July 17, 2024

@yxgeee May I know what is meant by utm in meta.json, and what the numbers in splits.json are?


Zumbalamambo avatar Zumbalamambo commented on July 17, 2024

Thank you!... I'm waiting for it!..


Zumbalamambo avatar Zumbalamambo commented on July 17, 2024

@yxgeee it would be better to have a template for a custom dataset, since the Pittsburgh dataset is not available. I tried my best to gain access to it :(

At the moment I have a sequence of frames and a CSV file containing imagename, x, y, z positions.


Zumbalamambo avatar Zumbalamambo commented on July 17, 2024

@yxgeee thank you! I have followed your guideline to load the dataset. I am unsure which values must go into q_train_pids, db_train_pids, q_val_pids, db_val_pids, q_test_pids and db_test_pids.

I have created a dummy dataset with very few images to start with, and used the same data for both validation and testing as well. This is my code at the moment:

import os.path as osp

import torch.distributed as dist  # needed for dist.get_rank() in arrange()

from ..utils.data import Dataset
from ..utils.serialization import write_json
from ..utils.dist_utils import synchronize


class MyDataset(Dataset):


    def __init__(self, root, scale=None, verbose=True):
        super(MyDataset, self).__init__(root)

        self.arrange()
        self.load(verbose)

    def arrange(self):
        if self._check_integrity():
            return

        try:
            rank = dist.get_rank()
        except:
            rank = 0

        # the root path for raw dataset
        raw_dir = osp.join(self.root, 'raw')
        if (not osp.isdir(raw_dir)):
            raise RuntimeError("Dataset not found.")


        identities = [["examples/data/my_dataset/query/IMG_7201.JPG","examples/data/my_dataset/raw/IMG_7201.JPG"],
                      ["examples/data/my_dataset/query/IMG_7207.JPG","examples/data/my_dataset/raw/IMG_7207.JPG"],
                      ["examples/data/my_dataset/query/IMG_7208.JPG","examples/data/my_dataset/raw/IMG_7208.JPG"],
                      ["examples/data/my_dataset/query/IMG_7209.JPG","examples/data/my_dataset/raw/IMG_7209.JPG"],
                      ["examples/data/my_dataset/query/IMG_7210.JPG","examples/data/my_dataset/raw/IMG_7210.JPG"]]

        utms = [[1.294619, 0.885227],
                [-0.409010, -0.449514],
                [-0.109162, 0.164040],
                [0.094267, 0.795477],
                [0.351835, 1.336169]
                ]
        # Save meta information into a json file
        meta = {
                'name': 'demo', # change it to your dataset name
                'identities': identities,
                'utm': utms
                }

        if rank == 0:
            write_json(meta, osp.join(self.root, 'meta.json'))

        # Every identity goes into every split, because this dummy dataset
        # reuses the same images for training, validation and testing.
        q_train_pids = list(range(len(identities)))
        db_train_pids = list(range(len(identities)))
        q_val_pids = list(range(len(identities)))
        db_val_pids = list(range(len(identities)))
        q_test_pids = list(range(len(identities)))
        db_test_pids = list(range(len(identities)))

        # Save the training / test / val split into a json file
        splits = {
            'q_train': sorted(q_train_pids),
            'db_train': sorted(db_train_pids),
            'q_val': sorted(q_val_pids),
            'db_val': sorted(db_val_pids),
            'q_test': sorted(q_test_pids),
            'db_test': sorted(db_test_pids)}

        if rank == 0:
            write_json(splits, osp.join(self.root, 'splits.json'))

        synchronize()

It throws the following error :(,

/home/anaconda3//bin/python /home/workspace/OpenIBL/train.py
Traceback (most recent call last):
  File "/home/workspace/OpenIBL/train.py", line 2, in <module>
    dataset = create('my_dataset', 'examples/data/my_dataset')
  File "/home/workspace/OpenIBL/ibl/datasets/__init__.py", line 33, in create
    return __factory[name](root, *args, **kwargs)
  File "/home/workspace/OpenIBL/ibl/datasets/my_dataset.py", line 15, in __init__
    self.load(verbose)
  File "/home/workspace/OpenIBL/ibl/utils/data/dataset.py", line 75, in load
    self.q_train = _pluck(identities, utm, q_train_pids, relabel=False)
  File "/home/workspace/OpenIBL/ibl/utils/data/dataset.py", line 14, in _pluck
    pid_images = identities[pid]
IndexError: list index out of range


Zumbalamambo avatar Zumbalamambo commented on July 17, 2024

@yxgeee I have spotted the error. Since meta.json and splits.json already existed, they were not being overwritten. I deleted those two files and reran the loader, and it works now... Let me try the training pipeline :)
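
For anyone hitting the same problem, a small helper along these lines removes the stale files so that they are regenerated on the next load. It is a sketch only: remove_cached_json is a hypothetical name, and the demo runs against a temporary folder instead of a real dataset root.

```python
import os
import os.path as osp
import tempfile


def remove_cached_json(root, file_names=("meta.json", "splits.json")):
    """Delete previously generated json files under root and report what was removed."""
    removed = []
    for file_name in file_names:
        path = osp.join(root, file_name)
        if osp.isfile(path):
            os.remove(path)
            removed.append(file_name)
    return removed


# Demo on a temporary folder that stands in for the dataset root.
root = tempfile.mkdtemp()
for file_name in ("meta.json", "splits.json"):
    with open(osp.join(root, file_name), "w") as f:
        f.write("{}")

print(remove_cached_json(root))  # ['meta.json', 'splits.json']
print(remove_cached_json(root))  # [] because nothing is left to delete
```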


Zumbalamambo avatar Zumbalamambo commented on July 17, 2024

May I know what "vgg16_pitts_64_desc_cen" is used for? I suppose it contains the clustered centroids of features.

I get the following error when I run sh train_baseline_dist.sh triplet:


===> Start calculating pairwise distances
===> Start sorting gallery
Traceback (most recent call last):
  File "examples/netvlad_img.py", line 294, in <module>
    main()
  File "examples/netvlad_img.py", line 114, in main
    main_worker(args)
  File "examples/netvlad_img.py", line 188, in main_worker
    vlad=args.vlad, loss_type=args.loss_type)
  File "/home/workspace/OpenIBL/ibl/trainers.py", line 33, in train
    data_loader.new_epoch()
  File "/home/workspace/OpenIBL/ibl/utils/data/__init__.py", line 20, in new_epoch
    self.iter = iter(self.loader)
  File "/home/anaconda3/envs/odom/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/home/anaconda3/envs/odom/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 746, in __init__
    self._try_put_index()
  File "/home/anaconda3/envs/odom/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 861, in _try_put_index
    index = self._next_index()
  File "/home/anaconda3/envs/odom/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 339, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
  File "/home/anaconda3/envs/odom/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 200, in __iter__
    for idx in self.sampler:
  File "/home/workspace/OpenIBL/ibl/utils/data/sampler.py", line 85, in __iter__
    assert(len(neg_indices)==self.neg_num)
AssertionError
Traceback (most recent call last):
  File "/home/anaconda3/envs/odom/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/anaconda3/envs/odom/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/anaconda3/envs/odom/lib/python3.7/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/home/anaconda3/envs/odom/lib/python3.7/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/anaconda3/envs/odom/bin/python', '-u', 'examples/netvlad_img.py', '--launcher', 'pytorch', '--tcp-port', '10000', '-d', 'dummy', '--scale', '30k', '-a', 'vgg16', '--layers', 'conv5', '--vlad', '--syncbn', '--sync-gather', '--width', '640', '--height', '480', '--tuple-size', '1', '-j', '1', '--neg-num', '1', '--test-batch-size', '1', '--margin', '0.1', '--lr', '0.001', '--weight-decay', '0.001', '--loss-type', 'triplet', '--eval-step', '1', '--epochs', '5', '--step-size', '1', '--cache-size', '1000', '--logs-dir', 'logs/netVLAD/dummy30k-vgg16/conv5-triplet-lr0.001-tuple1']' returned non-zero exit status 1.

I use a single GPU, by the way.


Zumbalamambo avatar Zumbalamambo commented on July 17, 2024

@yxgeee I added print(len(neg_indices), self.neg_num) to sampler.py, and it prints 0, 2.


yxgeee avatar yxgeee commented on July 17, 2024

len(neg_indices)=0 is abnormal; it seems that no negative sample is found. I guess the problem is still in the dataset.
Did you use [abscissa, ordinate] in utms as the coordinates? In https://github.com/yxgeee/OpenIBL/blob/master/ibl/utils/data/dataset.py#L43, we use 10m as the positive distance threshold and 25m as the negative distance threshold. If all the samples in your dataset are within 25m, no negative pair can be found. If this is the problem, you need to modify the intra_thres and inter_thres values.
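
This can be checked offline before training. The sketch below counts, for each sample, how many other samples lie farther away than the negative threshold. It assumes plain Euclidean distance on the [x, y] pairs and a strict greater-than comparison, which may not match the library exactly; count_negatives is a hypothetical helper, and the coordinates are the utms posted earlier in this thread.

```python
import math


def count_negatives(utms, inter_thres=25.0):
    """For each sample, count the other samples farther away than inter_thres."""
    counts = []
    for i, (xi, yi) in enumerate(utms):
        far = sum(
            1
            for j, (xj, yj) in enumerate(utms)
            if j != i and math.hypot(xi - xj, yi - yj) > inter_thres
        )
        counts.append(far)
    return counts


utms = [
    [1.294619, 0.885227],
    [-0.409010, -0.449514],
    [-0.109162, 0.164040],
    [0.094267, 0.795477],
    [0.351835, 1.336169],
]

# All points lie within about 2.2 units of each other, so with a
# threshold of 25 no sample has any negative.
print(count_negatives(utms))  # [0, 0, 0, 0, 0]

# A threshold scaled to the data does yield negatives.
print(count_negatives(utms, inter_thres=1.0))
```

If the first call returns only zeros, the thresholds are too large for the coordinate scale of the dataset.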


Zumbalamambo avatar Zumbalamambo commented on July 17, 2024

@yxgeee I have loaded almost 50k images with pose information. When I try to train, my system freezes. These are the settings I use:


Use GPU: 0 for training, rank no.0 of world_size 1

Args:Namespace(arch='vgg16', cache_size=1000, data_dir='/home/workspace/OpenIBL/examples/data', dataset='dummy', deterministic=False, epochs=5, eval_step=1, features=4096, gpu=0, height=480, init_dir='/home/workspace/OpenIBL/examples/../logs', iters=0, launcher='pytorch', layers='conv5', logs_dir='logs/netVLAD/dummy30k-vgg16/conv5-triplet-lr0.001-tuple1', loss_type='triplet', lr=0.001, margin=0.1, momentum=0.9, neg_num=10, neg_pool=1000, ngpus_per_node=1, nowhiten=False, num_clusters=64, print_freq=10, rank=0, rerank=False, resume='', scale='30k', seed=43, step_size=5, sync_gather=True, syncbn=True, tcp_port='10000', test_batch_size=1, tuple_size=1, vlad=True, weight_decay=0.001, width=640, workers=1, world_size=1)


Zumbalamambo avatar Zumbalamambo commented on July 17, 2024

I ran the ./train_baseline_dist.sh triplet command.

Since I can't upload .sh and .py files, I'm pasting the content of my sh file:

#!/bin/sh
PYTHON=${PYTHON:-"python"}
GPUS=1

DATASET=dummy
SCALE=30k
ARCH=vgg16
LAYERS=conv5
LOSS=$1
LR=0.001

if [ $# -ne 1 ]
  then
    echo "Arguments error: <LOSS_TYPE (triplet|sare_ind|sare_joint)>"
    exit 1
fi


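# Pick a random port and retry until one is found that nothing listens on.
# The break below is only valid inside a loop.
while true
do
    PORT=$(( ((RANDOM<<15)|RANDOM) % 49152 + 10000 ))
    status="$(nc -z 127.0.0.1 $PORT < /dev/null &>/dev/null; echo $?)"
    if [ "${status}" != "0" ]; then
        break;
    fi
done
echo $PORT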

$PYTHON -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT --use_env \
examples/netvlad_img.py --launcher pytorch --tcp-port ${PORT} \
  -d ${DATASET} --scale ${SCALE} \
  -a ${ARCH} --layers ${LAYERS} --vlad --syncbn --sync-gather \
  --width 640 --height 480 --tuple-size 1 -j 1 --test-batch-size 1 \
  --margin 0.1 --lr ${LR} --weight-decay 0.001 --loss-type ${LOSS} \
  --eval-step 1 --epochs 5 --step-size 5 --cache-size 200 \
  --logs-dir logs/netVLAD/${DATASET}${SCALE}-${ARCH}/${LAYERS}-${LOSS}-lr${LR}-tuple${GPUS}

Settings in netvlad_img.py:

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description="NetVLAD/SARE training")
    parser.add_argument('--launcher', type=str,
                        choices=['none', 'pytorch', 'slurm'],
                        default='none', help='job launcher')
    parser.add_argument('--tcp-port', type=str, default='5017')
    # data
    parser.add_argument('-d', '--dataset', type=str, default='pitts',
                        choices=datasets.names())
    parser.add_argument('--scale', type=str, default='30k')
    parser.add_argument('--tuple-size', type=int, default=1,
                        help="tuple numbers in a batch")
    parser.add_argument('--test-batch-size', type=int, default=1,
                        help="tuple numbers in a batch")
    parser.add_argument('--cache-size', type=int, default=200)
    parser.add_argument('-j', '--workers', type=int, default=1)
    parser.add_argument('--height', type=int, default=480, help="input height")
    parser.add_argument('--width', type=int, default=640, help="input width")
    parser.add_argument('--neg-num', type=int, default=2,
                        help="negative instances for one anchor in a tuple")
    parser.add_argument('--num-clusters', type=int, default=64)
    parser.add_argument('--neg-pool', type=int, default=200)
    # model
    parser.add_argument('-a', '--arch', type=str, default='vgg16',
                        choices=models.names())
    parser.add_argument('--layers', type=str, default='conv5')
    parser.add_argument('--nowhiten', action='store_true')
    parser.add_argument('--syncbn', action='store_true')
    parser.add_argument('--sync-gather', action='store_true')
    parser.add_argument('--features', type=int, default=4096)
    # optimizer
    parser.add_argument('--lr', type=float, default=0.001,
                        help="learning rate of new parameters, for pretrained ")
    parser.add_argument('--momentum', type=float, default=0.9)
    parser.add_argument('--weight-decay', type=float, default=0.001)
    parser.add_argument('--loss-type', type=str, default='triplet', help="[triplet|sare_ind|sare_joint]")
    parser.add_argument('--step-size', type=int, default=5)
    # training configs
    parser.add_argument('--resume', type=str, default='', metavar='PATH')
    parser.add_argument('--vlad', action='store_true')
    parser.add_argument('--eval-step', type=int, default=1)
    parser.add_argument('--rerank', action='store_true',
                        help="evaluation only")
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--iters', type=int, default=0)
    parser.add_argument('--seed', type=int, default=43)
    parser.add_argument('--deterministic', action='store_true')
    parser.add_argument('--print-freq', type=int, default=10)
    parser.add_argument('--margin', type=float, default=0.1, help='margin for the triplet loss with batch hard')
    # path
    working_dir = osp.dirname(osp.abspath(__file__))
    parser.add_argument('--data-dir', type=str, metavar='PATH',
                        default=osp.join(working_dir, 'data'))
    parser.add_argument('--logs-dir', type=str, metavar='PATH',
                        default=osp.join(working_dir, 'logs'))
    parser.add_argument('--init-dir', type=str, metavar='PATH',
                        default=osp.join(working_dir, '..', 'logs'))
    main()
