Comments (8)
Seems to be related to #117.
Also, it looks like the NNCF path used per-channel activation quantization, while the POT path used per-tensor quantization. This might be a sign of HW config file misalignment between NNCF and POT (our fault), or of mismatched configurations between POT and NNCF if you did not use a specific HW configuration. As a workaround, try forcing per-tensor quantization in NNCF via the config file parameter "per_channel": false and see if it improves the situation.
from nncf.
Try checking out #124 to see if it fixes the FakeQuantize order in NNCF.
Hi, thank you for the fast reply :-)
I used "hw_config_type": "cpu" in the NNCF config. Did you mean that I should directly add "per_channel": true to the NNCF config? Thank you.
@summer110669 - sorry, made a mistake there - should read "per_channel": false, of course. I edited the comment so as not to mislead future readers. Add the "per_channel": false line at the same level in the .json structure as the "algorithm": "quantization" key-value pair.
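For illustration, the placement being described would put the key directly alongside "algorithm" in the compression section (a sketch with hypothetical surrounding keys; note that the schema validation output later in this thread ends up rejecting "per_channel" at this level):

```json
"compression": {
    "algorithm": "quantization",
    "per_channel": false
}
```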
I added "per_channel": false to the config as you said, and here's the error:
ERROR:nncf:Invalid NNCF config supplied!
Traceback (most recent call last):
File "train.py", line 247, in <module>
main(args)
File "train.py", line 160, in main
nncf_config = NNCFConfig.from_json(args.nncf_config)
File "/usr/local/lib/python3.6/dist-packages/nncf/config.py", line 45, in from_json
NNCFConfig.validate(loaded_json)
File "/usr/local/lib/python3.6/dist-packages/nncf/config.py", line 81, in validate
validate_single_compression_algo_schema(compression_section)
File "/usr/local/lib/python3.6/dist-packages/nncf/config_schema.py", line 595, in validate_single_compression_algo_schema
raise type(e)("For algorithm: '{}'\n".format(algo_name) + str(e)).with_traceback(sys.exc_info()[2])
File "/usr/local/lib/python3.6/dist-packages/nncf/config_schema.py", line 592, in validate_single_compression_algo_schema
jsonschema.validate(single_compression_algo_dict, schema=REF_VS_ALGO_SCHEMA[algo_name])
File "/usr/local/lib/python3.6/dist-packages/jsonschema/validators.py", line 934, in validate
raise error
jsonschema.exceptions.ValidationError: For algorithm: 'quantization'
Additional properties are not allowed ('per_channel' was unexpected)
Failed validating 'additionalProperties' in schema:
{'additionalProperties': False,
'properties': {'activations': {'additionalProperties': False,
'description': 'Constraints to be '
'applied to model '
'activations '
'quantization only. '
'Overrides higher-level '
'settings.',
'properties': {'bits': {'description': 'Bitwidth '
'to '
'quantize '
'to.',
'type': 'number'},
'ignored_scopes': {'description': 'A '
'list '
'of '
'model '
'control '
'flow '
'graph '
'node '
'scopes '
'to '
'be '
'ignored '
'for '
'this '
'operation '
'- '
'functions '
'as '
'a '
"'blacklist'. "
'Optional.',
'oneOf': [{'items': {'type': 'string'},
'type': 'array'},
{'type': 'string'}]},
'mode': {'description': 'Mode '
'of '
'quantization',
'type': 'string'},
'per_channel': {'description': 'Whether '
'to '
'quantize '
'inputs '
'per '
'channel '
'(i.e. '
'per '
'0-th '
'dimension '
'for '
'weight '
'quantization, '
'and '
'per '
'1-st '
'dimension '
'for '
'activation '
'quantization)',
'type': 'boolean'},
'signed': {'description': 'Whether '
'to '
'use '
'signed '
'or '
'unsigned '
'input/output '
'values '
'for '
'quantization. '
'If '
'specified '
'as '
'unsigned '
'and '
'the '
'input '
'values '
'during '
'initialization '
'have '
'differing '
'signs, '
'will '
'reset '
'to '
'performing '
'signed '
'quantization '
'instead.',
'type': 'boolean'},
'target_scopes': {'description': 'A '
'list '
'of '
'model '
'control '
'flow '
'graph '
'node '
'scopes '
'to '
'be '
'considered '
'for '
'this '
'operation '
'- '
'functions '
'as '
'a '
"'whitelist'. "
'Optional.',
'oneOf': [{'items': {'type': 'string'},
'type': 'array'},
{'type': 'string'}]}},
'type': 'object'},
'algorithm': {'const': 'quantization'},
'export_to_onnx_standard_ops': {'description': 'Determines '
'how '
'should '
'the '
'additional '
'quantization '
'operations '
'be '
'exported '
'into '
'the '
'ONNX '
'format. '
'Set '
'this '
'to '
'false '
'for '
'export '
'to '
'OpenVINO-supported '
'FakeQuantize '
'ONNX, '
'or to '
'true '
'for '
'export '
'to '
'ONNX '
'standard '
'QuantizeLinear-DequantizeLinear '
'node '
'pairs '
'(8-bit '
'quantization '
'only '
'in the '
'latter '
'case). '
'Default: '
'false',
'type': 'boolean'},
'ignored_scopes': {'description': 'A list of model '
'control flow graph '
'node scopes to be '
'ignored for this '
'operation - '
'functions as a '
"'blacklist'. "
'Optional.',
'items': {'type': 'string'},
'type': ['array', 'string']},
'initializer': {'additionalProperties': False,
'properties': {'batchnorm_adaptation': {'additionalProperties': False,
'properties': {'num_bn_adaptation_steps': {'description': 'Number '
'of '
'batches '
'from '
'the '
'training '
'dataset '
'to '
'use '
'for '
'model '
'inference '
'during '
'the '
'BatchNorm '
'statistics '
'adaptation '
'procedure '
'for '
'the '
'compressed '
'model',
'type': 'number'},
'num_bn_forget_steps': {'description': 'Number '
'of '
'batches '
'from '
'the '
'training '
'dataset '
'to '
'use '
'for '
'model '
'inference '
'during '
'the '
'BatchNorm '
'statistics '
'adaptation '
'in '
'the '
'initial '
'statistics '
'forgetting '
'step',
'type': 'number'}},
'type': 'object'},
'precision': {'additionalProperties': False,
'properties': {'bits': {'description': 'A '
'list '
'of '
'bitwidth '
'to '
'choose '
'from '
'when '
'performing '
'precision '
'initialization.',
'examples': [[4,
8]],
'items': {'type': 'number'},
'type': 'array'},
'bitwidth_per_scope': {'description': 'Manual '
'settings '
'for '
'the '
'quantizer '
'bitwidths. '
'Scopes '
'are '
'used '
'to '
'identify '
'the '
'quantizers.',
'items': {'description': 'A '
'tuple '
'of '
'a '
'bitwidth '
'and '
'a '
'scope '
'of '
'the '
'quantizer '
'to '
'assign '
'the '
'bitwidth '
'to.',
'items': [{'type': 'number'},
{'type': 'string'}],
'type': 'array'},
'type': 'array'},
'iter_number': {'description': 'Maximum '
'number '
'of '
'iterations '
'of '
'Hutchinson '
'algorithm '
'to '
'Estimate '
'Hessian '
'trace, '
'200 '
'by '
'default',
'type': 'number'},
'num_data_points': {'description': 'Number '
'of '
'data '
'points '
'to '
'iteratively '
'estimate '
'Hessian '
'trace, '
'200 '
'by '
'default.',
'type': 'number'},
'tolerance': {'description': 'Minimum '
'relative '
'tolerance '
'for '
'stopping '
'the '
'Hutchinson '
'algorithm. '
"It's "
'calculated '
'between '
'mean '
'average '
'trace '
'from '
'previous '
'iteration '
'and '
'current '
'one. '
'1e-5 '
'by '
'defaultbitwidth_per_scope',
'type': 'number'},
'type': {'description': 'Type '
'of '
'precision '
'initialization.',
'type': 'string'}},
'type': 'object'},
'range': {'additionalProperties': False,
'properties': {'max_percentile': {'description': 'For '
"'percentile' "
'type '
'- '
'specify '
'the '
'percentile '
'of '
'input '
'value '
'histograms '
'to '
'be '
'set '
'as '
'the '
'initial '
'value '
'for '
'maximum '
'quantizer '
'input',
'type': 'number'},
'min_percentile': {'description': 'For '
"'percentile' "
'type '
'- '
'specify '
'the '
'percentile '
'of '
'input '
'value '
'histograms '
'to '
'be '
'set '
'as '
'the '
'initial '
'value '
'for '
'minimum '
'quantizer '
'input',
'type': 'number'},
'num_init_steps': {'description': 'Number '
'of '
'batches '
'from '
'the '
'training '
'dataset '
'to '
'consume '
'as '
'sample '
'model '
'inputs '
'for '
'purposes '
'of '
'setting '
'initial '
'minimum '
'and '
'maximum '
'quantization '
'ranges',
'type': 'number'},
'type': {'description': 'Type '
'of '
'the '
'initializer '
'- '
'determines '
'which '
'statistics '
'gathered '
'during '
'initialization '
'will '
'be '
'used '
'to '
'initialize '
'the '
'quantization '
'ranges',
'type': 'string'}},
'type': 'object'}},
'type': 'object'},
'params': {'additionalProperties': False,
'properties': {'activations_quant_start_epoch': {'description': 'Epoch '
'to '
'start '
'binarizing '
'activations',
'type': 'number'},
'base_lr': {'description': 'Initial '
'value '
'of '
'learning '
'rate',
'type': 'number'},
'base_wd': {'description': 'Initial '
'value '
'of '
'weight '
'decay',
'type': 'number'},
'batch_multiplier': {'description': 'Gradients '
'will '
'be '
'accumulated '
'for '
'this '
'number '
'of '
'batches '
'before '
'doing '
'a '
"'backward' "
'call. '
'Increasing '
'this '
'may '
'improve '
'training '
'quality, '
'since '
'binarized '
'networks '
'exhibit '
'noisy '
'gradients '
'requiring '
'larger '
'batch '
'sizes '
'than '
'could '
'be '
'accomodated '
'by '
'GPUs',
'type': 'number'},
'disable_wd_start_epoch': {'description': 'Epoch '
'to '
'disable '
'weight '
'decay '
'in '
'the '
'optimizer',
'type': 'number'},
'lr_poly_drop_duration_epochs': {'description': 'Duration, '
'in '
'epochs, '
'of '
'the '
'learning '
'rate '
'dropping '
'process.',
'type': 'number'},
'lr_poly_drop_start_epoch': {'description': 'Epoch '
'to '
'start '
'dropping '
'the '
'learning '
'rate',
'type': 'number'},
'weights_quant_start_epoch': {'description': 'Epoch '
'to '
'start '
'binarizing '
'weights',
'type': 'number'}},
'type': 'object'},
'quantizable_subgraph_patterns': {'description': 'Each '
'sub-list '
'in '
'this '
'list '
'will '
'correspond '
'to a '
'sequence '
'of '
'operations '
'in '
'the '
'model '
'control '
'flow '
'graph '
'that '
'will '
'have '
'a '
'quantizer '
'appended '
'at '
'the '
'end '
'of '
'the '
'sequence',
'examples': [['cat',
'batch_norm'],
'h_swish'],
'items': {'items': {'type': 'string'},
'type': ['array',
'string']},
'type': 'array'},
'quantize_inputs': {'default': True,
'description': 'Whether the model '
'inputs should be '
'immediately '
'quantized prior to '
'any other model '
'operations.',
'type': 'boolean'},
'quantize_outputs': {'default': False,
'description': 'Whether the model '
'outputs should be '
'additionally '
'quantized.',
'type': 'boolean'},
'scope_overrides': {'description': 'This option is '
'used to specify '
'overriding '
'quantization '
'constraints for '
'specific '
'scope,e.g. in case '
'you need to '
'quantize a single '
'operation '
'differently than '
'the rest of the '
'model.',
'patternProperties': {'.*': {'additionalProperties': False,
'properties': {'bits': {'description': 'Bitwidth '
'to '
'quantize '
'to.',
'type': 'number'},
'mode': {'description': 'Mode '
'of '
'quantization',
'type': 'string'},
'per_channel': {'description': 'Whether '
'to '
'quantize '
'inputs '
'per '
'channel '
'(i.e. '
'per '
'0-th '
'dimension '
'for '
'weight '
'quantization, '
'and '
'per '
'1-st '
'dimension '
'for '
'activation '
'quantization)',
'type': 'boolean'},
'signed': {'description': 'Whether '
'to '
'use '
'signed '
'or '
'unsigned '
'input/output '
'values '
'for '
'quantization. '
'If '
'specified '
'as '
'unsigned '
'and '
'the '
'input '
'values '
'during '
'initialization '
'have '
'differing '
'signs, '
'will '
'reset '
'to '
'performing '
'signed '
'quantization '
'instead.',
'type': 'boolean'}},
'type': 'object'}},
'type': 'object'},
'target_scopes': {'description': 'A list of model '
'control flow graph '
'node scopes to be '
'considered for this '
'operation - '
'functions as a '
"'whitelist'. "
'Optional.',
'items': {'type': 'string'},
'type': ['array', 'string']},
'weights': {'additionalProperties': False,
'description': 'Constraints to be applied '
'to model weights '
'quantization only. '
'Overrides higher-level '
'settings.',
'properties': {'bits': {'description': 'Bitwidth '
'to '
'quantize '
'to.',
'type': 'number'},
'ignored_scopes': {'description': 'A '
'list '
'of '
'model '
'control '
'flow '
'graph '
'node '
'scopes '
'to '
'be '
'ignored '
'for '
'this '
'operation '
'- '
'functions '
'as '
'a '
"'blacklist'. "
'Optional.',
'oneOf': [{'items': {'type': 'string'},
'type': 'array'},
{'type': 'string'}]},
'mode': {'description': 'Mode '
'of '
'quantization',
'type': 'string'},
'per_channel': {'description': 'Whether '
'to '
'quantize '
'inputs '
'per '
'channel '
'(i.e. '
'per '
'0-th '
'dimension '
'for '
'weight '
'quantization, '
'and '
'per '
'1-st '
'dimension '
'for '
'activation '
'quantization)',
'type': 'boolean'},
'signed': {'description': 'Whether '
'to '
'use '
'signed '
'or '
'unsigned '
'input/output '
'values '
'for '
'quantization. '
'If '
'specified '
'as '
'unsigned '
'and '
'the '
'input '
'values '
'during '
'initialization '
'have '
'differing '
'signs, '
'will '
'reset '
'to '
'performing '
'signed '
'quantization '
'instead.',
'type': 'boolean'},
'target_scopes': {'description': 'A '
'list '
'of '
'model '
'control '
'flow '
'graph '
'node '
'scopes '
'to '
'be '
'considered '
'for '
'this '
'operation '
'- '
'functions '
'as '
'a '
"'whitelist'. "
'Optional.',
'oneOf': [{'items': {'type': 'string'},
'type': 'array'},
{'type': 'string'}]}},
'type': 'object'}},
'required': ['algorithm'],
'type': 'object'}
On instance:
{'algorithm': 'quantization',
'initializer': {'range': {'num_init_steps': 50}},
'per_channel': False}
I also tried 'per_channel': False in the config, but it still fails. The error is as follows:
Traceback (most recent call last):
File "train.py", line 247, in <module>
main(args)
File "train.py", line 160, in main
nncf_config = NNCFConfig.from_json(args.nncf_config)
File "/usr/local/lib/python3.6/dist-packages/nncf/config.py", line 44, in from_json
loaded_json = json.load(f)
File "/usr/local/lib/python3.6/dist-packages/jstyleson.py", line 127, in load
return loads(fp.read(), **kwargs)
File "/usr/local/lib/python3.6/dist-packages/jstyleson.py", line 123, in loads
return json.loads(dispose(text), **kwargs)
File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.6/json/decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 10 column 18 (char 215)
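(As a side note, an inference from the error rather than something confirmed in the thread: an "Expecting value" JSONDecodeError at a specific line and column usually means a non-JSON literal ended up in the file. JSON requires double quotes and lowercase booleans, i.e. the line must read:

```json
"per_channel": false
```

rather than the Python-style 'per_channel': False.)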
Try "activations": { "per_channel": false } at the same JSON level instead.
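Putting the pieces from this thread together, a minimal sketch of the resulting config might look like the following ("sample_size" and the top-level layout are hypothetical; "num_init_steps": 50 is taken from the config shown in the validation error above):

```json
{
    "input_info": {
        "sample_size": [1, 3, 224, 224]
    },
    "compression": {
        "algorithm": "quantization",
        "initializer": {
            "range": {"num_init_steps": 50}
        },
        "activations": {
            "per_channel": false
        }
    }
}
```

Per the schema dumped above, "per_channel" is only accepted inside the "activations" and "weights" sub-sections of the quantization algorithm, not at its top level.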
@summer110669 have you had any luck with this yet?
@summer110669 Have you been able to improve nncf inference time?