Training problem about smoke (open)

lzccccc commented on June 1, 2024
Training problem

from smoke.

Comments (14)

vobecant commented on June 1, 2024

@nikhil-nakhate I didn't make any additional changes. I ran this exact command:

now=$(date +"%Y%m%d_%H%M%S")
EXPNAME=SMOKE_trainOnTrain_120ep_${now}
JOB_FILE=./jobs/${EXPNAME}.job
g=4
NUMCPUS=16
CONFIG_FILE=/path/to/SMOKE/configs/smoke_trainOnTrain_testOnVal_120ep.yaml
python tools/plain_train_net.py --num-gpus 4 --config-file ${CONFIG_FILE}

To get the numbers that I reported, I used this command:

python tools/plain_train_net.py --eval-only --config-file "${CONFIG_FILE}"

followed by running my new script:

import numpy as np
import os


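def get_ap(prec, ap_type):
    # Assumes prec holds precision at 41 evenly spaced recall points (0, 1/40, ..., 1).
    prec = np.asarray(prec)
    sums = 0
    if ap_type == 11:
        # R11: every 4th point, i.e. recall 0.0, 0.1, ..., 1.0 (11 samples).
        for i in range(0, prec.shape[-1], 4):
            sums = sums + prec[..., i]
        ap = sums / 11 * 100
    else:
        # R40: skip the recall-0 point and average the remaining 40 samples.
        for i in range(1, prec.shape[-1]):
            sums = sums + prec[..., i]
        ap = sums / 40 * 100
    return ap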


def get_aps(results_dir):
    def print_file(s, f):
        # Write a line to the results file and echo it to stdout.
        f.write('{}\n'.format(s))
        print(s)

    labels = ['car', 'pedestrian', 'cyclist']
    eval_types = ['detection', 'detection_ground', 'detection_3d', 'orientation']
    eval_types_short = {'detection': '2d', 'detection_3d': '3d', 'orientation': 'aos', 'detection_ground': 'bev'}
    difficulties = ['easy', 'moderate', 'hard']

    res_path = os.path.join(results_dir, 'parsed_res.txt')
    # Use a context manager so the output file is closed even if parsing fails.
    with open(res_path, 'w') as f:
        for label in labels:
            for ap_type in [11, 40]:
                print_file('\n{}_R{}'.format(label, ap_type), f)
                for eval_type in eval_types:
                    res_file = os.path.join(results_dir, 'stats_{}_{}.txt'.format(label, eval_type))
                    with open(res_file, 'r') as fl:
                        lines = fl.readlines()
                    diff_res = [eval_types_short[eval_type]]
                    # The first three lines are read as the easy, moderate and hard curves.
                    for i, difficulty in enumerate(difficulties):
                        prec = [float(tmp) for tmp in lines[i].split()]
                        ap_res = get_ap(prec, ap_type)
                        diff_res.append('{:.2f}'.format(ap_res))
                    print_file(', '.join(diff_res), f)
    print('Saved parsed results to {}'.format(res_path))


if __name__ == '__main__':
    results_dir = '/path/to/results/logs_trainOnTrain_120ep/inference/kitti_train'
    get_aps(results_dir)

Feel free to check the code!
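
As a quick sanity check of the two averaging schemes in the script, here is a minimal sketch that restates get_ap in vectorised form and runs it on synthetic curves. It assumes each precision row has 41 values sampled at recalls 0, 1/40, ..., 1, which is what the divisors 11 and 40 imply. The helper name ap_from_curve and both test curves are made up for illustration.

```python
import numpy as np


def ap_from_curve(prec, ap_type):
    """Vectorised restatement of get_ap for a 41-point precision curve."""
    prec = np.asarray(prec, dtype=float)
    if ap_type == 11:
        # R11: recalls 0.0, 0.1, ..., 1.0 are every 4th sample, recall 0 included.
        return prec[..., ::4].sum(axis=-1) / 11 * 100
    # R40: drop the recall-0 sample and average the other 40.
    return prec[..., 1:].sum(axis=-1) / 40 * 100


# Synthetic curve 1: perfect precision at every recall level.
perfect = np.ones(41)
# Synthetic curve 2: precision 1 at recall 0 only, 0 everywhere else.
spike = np.zeros(41)
spike[0] = 1.0

for name, curve in [("perfect", perfect), ("spike", spike)]:
    print(name,
          round(float(ap_from_curve(curve, 11)), 2),
          round(float(ap_from_curve(curve, 40)), 2))
```

The second curve shows why an R11 value of about 9.09 (100/11) can appear while R40 is 0: R11 counts the recall-0 sample and R40 does not. R11 and R40 numbers are therefore not directly comparable, especially when recall is very low.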

lzccccc commented on June 1, 2024

Hi,

How much AP did you achieve?
As indicated in Sec. 5.1 of our paper, the performance degrades on train/val set because of the lack of examples. This is reasonable since detecting each object as a single point is a difficult task.

Or maybe there is an undiscovered bug in the code.

lianqing01 commented on June 1, 2024

Hi @lzccccc ,

Thanks for your wonderful code. I also tried training on the training split (3712) and evaluating on the validation split (3769), but I only get the following AP (3D / BEV):

             Easy           Moderate      Hard
Car          10.7 / 15.9    7.7 / 12.2    7.7 / 10.2

which is much lower than what the paper reports.

I use the original settings and train on 4 GPUs. Should I train longer or use a smaller learning rate?

Best regards,
Qing LIAN

lianqing11 commented on June 1, 2024

I can get similar results from the released code on the validation dataset.

Thanks for kindly sharing the code.

ZhxJia commented on June 1, 2024

I can get similar results from the released code in the validation dataset.

Thanks for kindly sharing the code.

Hi @lzccccc ,

Thanks for your wonderful code. I also try to train on training dataset (3712) and evaluate on the validation dataset (3769), but only can get the following ap (3d/bev):

             Easy           Moderate      Hard
Car          10.7 / 15.9    7.7 / 12.2    7.7 / 10.2
which is much lower than the paper reported.

I use the original setting that trains on 4 gpus, should I train longer or use a smaller learning rate?

Best regards,
Qing LIAN

Hi, could you share how you solved it? Thanks.

ZiYang-xie commented on June 1, 2024

I have exactly the same problem. I followed the paper, training on the train split (3712) for 60 epochs and dropping the learning rate by a factor of 10 at epochs 25 and 40, but I still get low AP on the val split (3769):

car_detection_3d AP: 11.122600 8.383281 6.976796
pedestrian_detection_3d AP: 2.784572 2.732527 2.739558
cyclist_detection_3d AP: 0.606061 0.568182 0.568182

Could anyone share the solution? Thanks.

ZiYang-xie commented on June 1, 2024

Hi, thanks for the high-quality code. I have solved the problem by simply using the 14500 iterations set in default.py, which is about 60 epochs on the trainval set (7481) but about 120 epochs on the train set (3712). With that I get:

car_detection_3d AP: 16.485666 14.154558 11.966417
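
The iteration-to-epoch arithmetic behind these figures can be sketched as follows. This is a rough check, assuming a global batch size of 32 (the IMS_PER_BATCH value in the config posted in this thread) and that one epoch means one pass over the split. The function name is made up for illustration.

```python
def iters_to_epochs(iterations, num_images, batch_size=32):
    """Approximate number of passes over the data for a given iteration count."""
    return iterations * batch_size / num_images


TRAIN_IMAGES = 3712     # train split size quoted in this thread
TRAINVAL_IMAGES = 7481  # trainval split size quoted in this thread

# Iteration counts mentioned in this thread.
for iters in (7250, 14500, 25000):
    print(iters,
          round(iters_to_epochs(iters, TRAIN_IMAGES), 1),
          round(iters_to_epochs(iters, TRAINVAL_IMAGES), 1))
```

Under these assumptions, 14500 iterations comes to roughly 62 epochs on trainval and 125 epochs on train, in line with the "about 60" and "about 120" figures.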

mrsempress commented on June 1, 2024

Hi, thks for the high-quality code, I have solved the problem, just using the 14500 iter in the default.py, about 60 epochs on trainval(7481) set but 120 epochs on train(3712) set.
And I get the AP car_detection_3d AP: 16.485666 14.154558 11.966417

Do you mean that the result is normal after you set the iterations to 7250 to keep 60 epochs on the train set?
I didn't change any code (so I use 25000 iterations), and my results are even lower than yours.

             Easy           Moderate      Hard
Car          6.74 / 12.17   4.35 / 8.09   4.02 / 7.65
Pedestrian   2.12 / 2.63    1.78 / 1.93   1.39 / 1.52
Cyclist      1.04 / 1.34    0.41 / 0.50   0.41 / 0.51

Does that mean my training is overfitting?

vobecant commented on June 1, 2024

I trained the network using the following setup

MODEL:
  WEIGHT: "catalog://ImageNetPretrained/DLA34"
INPUT:
  FLIP_PROB_TRAIN: 0.5
  SHIFT_SCALE_PROB_TRAIN: 0.3
OUTPUT_DIR: "./tools/logs_trainOnTrain_120ep"
DATASETS:
  DETECT_CLASSES: ("Car", "Cyclist", "Pedestrian")
  TRAIN: ("kitti_train",)
  TEST: ("kitti_train",)
  TRAIN_SPLIT: "train"
  TEST_SPLIT: "val"
SOLVER:
  BASE_LR: 2.5e-4
  STEPS: (5800, 9280)
  MAX_ITERATION: 13920
  IMS_PER_BATCH: 32

and got these results:

car_R11
2d, 88.27, 78.84, 70.18
bev, 24.41, 19.57, 16.81
3d, 18.25, 15.26, 14.21
aos, 88.04, 78.41, 69.58

car_R40
2d, 91.47, 83.36, 74.03
bev, 19.91, 13.72, 11.61
3d, 12.68, 8.85, 7.12
aos, 91.19, 82.84, 73.27

pedestrian_R11
2d, 63.12, 48.89, 41.05
bev, 11.49, 11.27, 11.15
3d, 11.03, 10.89, 10.35
aos, 49.35, 38.70, 32.88

pedestrian_R40
2d, 60.76, 50.06, 41.44
bev, 4.49, 3.91, 3.22
3d, 3.44, 2.92, 2.29
aos, 46.02, 37.66, 31.16

cyclist_R11
2d, 48.29, 34.82, 33.71
bev, 4.23, 3.80, 3.03
3d, 3.25, 2.27, 2.27
aos, 31.60, 22.19, 21.75

cyclist_R40
2d, 48.38, 31.85, 29.94
bev, 2.13, 1.18, 0.87
3d, 1.74, 0.74, 0.72
aos, 32.51, 21.08, 19.96
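
For reference, the SOLVER numbers in the config above are consistent with a schedule expressed in epochs. The sketch below assumes 116 full batches per epoch (3712 images at IMS_PER_BATCH 32) and reads the schedule as 120 epochs with learning-rate drops after epochs 50 and 80. That reading is inferred from the arithmetic and the "120ep" output directory name, not stated in the comment, and the helper name is made up.

```python
def solver_schedule(num_images, batch_size, total_epochs, decay_epochs):
    """Convert an epoch-based schedule into iteration counts."""
    iters_per_epoch = num_images // batch_size  # assumes only full batches count
    return {
        "STEPS": tuple(e * iters_per_epoch for e in decay_epochs),
        "MAX_ITERATION": total_epochs * iters_per_epoch,
    }


print(solver_schedule(3712, 32, total_epochs=120, decay_epochs=(50, 80)))
# {'STEPS': (5800, 9280), 'MAX_ITERATION': 13920}
```

The same helper can be used to scale the schedule when the batch size or split changes, for example when training on trainval instead of train.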

nikhil-nakhate commented on June 1, 2024

@vobecant Could you share which IoU threshold you used to get these results?

nikhil-nakhate commented on June 1, 2024

I used your config and wasn't able to replicate the results. Did you make any additional changes? @vobecant

nikhil-nakhate commented on June 1, 2024

Hey @vobecant, Thanks so much for this. It really helps. Let me get back to you with the training results.

nikhil-nakhate commented on June 1, 2024

@vobecant The following are the results that I got with your configs:

car_R11
2d, 84.38, 76.68, 68.68
bev, 21.89, 18.15, 15.98
3d, 16.25, 13.93, 13.66
aos, 84.17, 76.05, 67.74

car_R40
2d, 84.13, 77.06, 70.24
bev, 15.62, 10.93, 9.48
3d, 9.06, 6.34, 5.79
aos, 83.92, 76.36, 69.15

pedestrian_R11
2d, 54.75, 47.58, 40.32
bev, 6.33, 6.18, 5.85
3d, 5.87, 5.62, 5.64
aos, 37.60, 33.65, 29.05

pedestrian_R40
2d, 54.87, 46.86, 40.52
bev, 3.05, 2.71, 2.14
3d, 2.09, 1.88, 1.69
aos, 34.69, 30.42, 26.16

cyclist_R11
2d, 49.69, 35.25, 34.59
bev, 2.06, 1.14, 1.14
3d, 2.05, 1.14, 1.14
aos, 36.92, 23.43, 22.65

cyclist_R40
2d, 48.46, 32.58, 30.64
bev, 0.99, 0.61, 0.52
3d, 0.83, 0.47, 0.35
aos, 33.77, 22.03, 20.56

arwagh commented on June 1, 2024

@vobecant Hello, I ran the code with your configuration and got these results:

car_R11
2d, 19.04, 13.31, 12.71
bev, 9.09, 4.55, 4.55
3d, 9.09, 4.55, 4.55
aos, 18.28, 12.45, 11.73

car_R40
2d, 12.98, 12.04, 10.87
bev, 0.55, 0.47, 0.35
3d, 0.10, 0.05, 0.05
aos, 12.17, 11.16, 9.93

pedestrian_R11
2d, 14.54, 9.74, 9.79
bev, 0.91, 0.71, 0.75
3d, 0.64, 0.56, 0.56
aos, 6.74, 4.66, 4.69

pedestrian_R40
2d, 10.67, 8.03, 6.73
bev, 0.46, 0.20, 0.20
3d, 0.34, 0.15, 0.15
aos, 4.94, 3.76, 3.17

cyclist_R11
2d, 1.73, 1.57, 1.57
bev, 0.00, 0.00, 0.00
3d, 0.00, 0.00, 0.00
aos, 0.01, 0.46, 0.46

cyclist_R40
2d, 0.48, 0.43, 0.43
bev, 0.00, 0.00, 0.00
3d, 0.00, 0.00, 0.00

Do you have any idea why they are so low?
Thank you.
