rrze-hpc / kerncraft
Loop Kernel Analysis and Performance Modeling Toolkit
License: GNU Affero General Public License v3.0
Please check the following code. I get this error: AssertionError: all statements before the for loop need to be declarations or pragmas. It happens when a pragma appears in an inner loop.
To reproduce, run: kerncraft -p ECMData -m machine-files/seiferth.yaml -D N 400 -D s 5 -D heat_N 20 -v kernels/irk_A_1_heat.c -vvv
See the Compilerbau 2 (Compiler Construction 2) lecture on back edges.
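If the point of this reference is finding loops in compiled code, the textbook approach is to look for back edges in the control-flow graph. The following is a minimal sketch under that assumption: the CFG is a hypothetical dict mapping a block label to its successor labels, and find_back_edges is not part of kerncraft. For reducible control flow, which compiler-generated loops from structured C normally are, every edge found this way closes a natural loop.

```python
def find_back_edges(cfg, entry):
    """Return the back edges (source, target) found by a DFS starting at entry.

    cfg is a hypothetical representation: {block_label: [successor_labels]}.
    An edge u -> v is a back edge if v is still on the DFS stack when u is visited.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {}
    back_edges = []

    def dfs(node):
        color[node] = GRAY
        for succ in cfg.get(node, []):
            state = color.get(succ, WHITE)
            if state == GRAY:
                back_edges.append((node, succ))  # jump back to an open block: a loop
            elif state == WHITE:
                dfs(succ)
        color[node] = BLACK

    dfs(entry)
    return back_edges


# Two nested loops: outer header h1, inner header h2
cfg = {'entry': ['h1'], 'h1': ['h2', 'exit'], 'h2': ['body'],
       'body': ['h2', 'latch'], 'latch': ['h1'], 'exit': []}
print(find_back_edges(cfg, 'entry'))  # [('body', 'h2'), ('latch', 'h1')]
```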
With long outer dimensions, memory usage explodes due to recent changes for #26.
Solve this by never using more iterations than correspond to the cache size.
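One way to read the proposed fix is to bound the iteration count by the data volume that fills the (largest) cache. A minimal sketch under that assumption; the function name and the bytes-per-iteration input are hypothetical and not taken from kerncraft's code:

```python
def capped_iterations(requested, cache_size_bytes, bytes_per_iteration):
    """Limit the iteration count to what is needed to cover the cache once.

    Hypothetical helper: requested is the iteration count implied by the loop
    bounds, bytes_per_iteration the new data volume one iteration touches.
    """
    # Ceiling division: enough iterations to touch cache_size_bytes of data
    needed = -(-cache_size_bytes // bytes_per_iteration)
    return min(requested, needed)


# A 25 MiB cache and 16 B of new data per iteration
print(capped_iterations(10**9, 25 * 2**20, 16))  # 1638400
```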
On a Broadwell architecture, it reports:
cores per NUMA domain: 0.1
On a Bulldozer architecture, it reports:
cores per NUMA domain: 0.125
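Both values look like inverted ratios. For comparison, the E5-2640v4 machine file quoted in another report on this page lists cores per socket: 10 and NUMA domains per socket: 1.0, yet cores per NUMA domain: 0.1, which is exactly 1.0 / 10. Assuming the value is derived from those two quantities, the division should go the other way; the helper below is a hypothetical sketch, not kerncraft's code:

```python
def cores_per_numa_domain(cores_per_socket, numa_domains_per_socket):
    """Cores per NUMA domain = cores per socket / NUMA domains per socket."""
    # float() keeps true division under Python 2 as well
    return float(cores_per_socket) / numa_domains_per_socket


print(cores_per_numa_domain(10, 1.0))  # 10.0, not 0.1
```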
Support caching behaviors other than LRU and inclusive caching. Maybe even take associativity into account.
Allow loading the machine model from the current machine as well as from a file.
Reproduce by using L, M, N in 3d-long-range-stencil.
emmy$ kerncraft -m machine-files/IvyBridgeEP_E5-2660v2.yml --pmodel Benchmark -D L 50 -D M 500 -D N 500 --cache-predictor LC --compiler icc kernels/himeno.c
[...]
Runtime (per cacheline update): 120.84 cy/CL
MEM volume (per repetition): 760030000 Byte
Performance: 4952.08 MFLOP/s
Performance: 145.65 MLUP/s
Performance: 145.65 It/s
The performance output (4952 MFLOP/s) does not match the runtime per CL output:
(16 LUPs / 121 cy) * 34 FLOP/LUP * 2.2 Gcy/s = 9.89 GFLOP/s
This is exactly twice the reported performance number above. Based on my own benchmarks (and the performance analysis provided by kerncraft's ECM model), I conclude that the cy/CL number is too small by a factor of 2, but the performance output is correct. I suspect this has to do with the Himeno benchmark using single-precision data.
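The consistency check above can be scripted. The sketch below recomputes MFLOP/s from the reported 120.84 cy/CL, 34 FLOP/LUP and 2.2 GHz; the helper is hypothetical. With 16 LUPs per 64 B cacheline (4-byte floats) it gives about twice the reported value, while 8 LUPs per cacheline (what 8-byte doubles would give) reproduces the reported 4952 MFLOP/s. This is consistent with the single-precision suspicion but does not prove it.

```python
def mflops_from_cy_per_cl(cy_per_cl, lups_per_cl, flops_per_lup, clock_ghz):
    """Convert a cycles-per-cacheline prediction into MFLOP/s."""
    flops_per_cycle = lups_per_cl / cy_per_cl * flops_per_lup
    return flops_per_cycle * clock_ghz * 1000.0  # GHz -> MHz gives MFLOP/s


CL_BYTES = 64
single = mflops_from_cy_per_cl(120.84, CL_BYTES // 4, 34, 2.2)  # float: 16 LUPs/CL
double = mflops_from_cy_per_cl(120.84, CL_BYTES // 8, 34, 2.2)  # double: 8 LUPs/CL
print(round(single), round(double))  # 9904 4952
```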
In kernel.py at line 583 you still have:
parser = CParser(lextab='kerncraft.pycparser.lextab',
yacctab='kerncraft.pycparser.yacctab')
Shouldn't this be changed after the recent change?
Currently hardcoded in models/benchmark.py.
The documentation/usage instructions suggest downloading the machine file phinally.yaml via
wget https://raw.githubusercontent.com/RRZE-HPC/kerncraft/master/examples/machine-files/phinally.yaml
This returns:
ERROR 404: Not Found
The documentation should be updated.
This reduces tests and allows the use of Python 3-only features.
When trying to install it via pip:
pip install kerncraft
Collecting kerncraft
Downloading kerncraft-0.5.10.tar.gz (189kB)
100% |████████████████████████████████| 194kB 1.5MB/s
Collecting ruamel.yaml<0.14.0,>=0.13.4 (from kerncraft)
Using cached ruamel.yaml-0.13.14-cp27-cp27mu-manylinux1_x86_64.whl
Collecting six (from kerncraft)
Using cached six-1.11.0-py2.py3-none-any.whl
Collecting sympy>=0.7.7 (from kerncraft)
Collecting pycachesim>=0.1.5 (from kerncraft)
Collecting pylru (from kerncraft)
Collecting numpy (from kerncraft)
Using cached numpy-1.13.3-cp27-cp27mu-manylinux1_x86_64.whl
Collecting requests (from kerncraft)
Using cached requests-2.18.4-py2.py3-none-any.whl
Collecting ruamel.ordereddict (from ruamel.yaml<0.14.0,>=0.13.4->kerncraft)
Using cached ruamel.ordereddict-0.4.13-cp27-cp27mu-manylinux1_x86_64.whl
Collecting typing (from ruamel.yaml<0.14.0,>=0.13.4->kerncraft)
Using cached typing-3.6.2-py2-none-any.whl
Collecting mpmath>=0.19 (from sympy>=0.7.7->kerncraft)
Collecting urllib3<1.23,>=1.21.1 (from requests->kerncraft)
Using cached urllib3-1.22-py2.py3-none-any.whl
Collecting idna<2.7,>=2.5 (from requests->kerncraft)
Using cached idna-2.6-py2.py3-none-any.whl
Collecting chardet<3.1.0,>=3.0.2 (from requests->kerncraft)
Using cached chardet-3.0.4-py2.py3-none-any.whl
Collecting certifi>=2017.4.17 (from requests->kerncraft)
Downloading certifi-2017.11.5-py2.py3-none-any.whl (330kB)
100% |████████████████████████████████| 337kB 1.0MB/s
Building wheels for collected packages: kerncraft
Running setup.py bdist_wheel for kerncraft ... error
Complete output from command /users/staff/ifi/guerrera/anaconda2/envs/myenv/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-hA1hbo/kerncraft/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpK5yDlrpip-wheel- --python-tag cp27:
running bdist_wheel
running build
running build_py
Build the lexing/parsing tables
Traceback (most recent call last):
File "_build_tables.py", line 15, in <module>
ast_gen = ASTCodeGenerator('_c_ast.cfg')
File "/tmp/pip-build-hA1hbo/kerncraft/kerncraft/pycparser/_ast_gen.py", line 24, in __init__
for (name, contents) in self.parse_cfgfile(cfg_filename)]
File "/tmp/pip-build-hA1hbo/kerncraft/kerncraft/pycparser/_ast_gen.py", line 42, in parse_cfgfile
with open(filename, "r") as f:
IOError: [Errno 2] No such file or directory: '_c_ast.cfg'
creating build
creating build/lib
creating build/lib/kerncraft
copying kerncraft/roofline-plot.py -> build/lib/kerncraft
copying kerncraft/prefixedunit.py -> build/lib/kerncraft
copying kerncraft/picklemerge.py -> build/lib/kerncraft
copying kerncraft/machinemodel.py -> build/lib/kerncraft
copying kerncraft/likwid_bench_auto.py -> build/lib/kerncraft
copying kerncraft/kernel.py -> build/lib/kerncraft
copying kerncraft/kerncraft.py -> build/lib/kerncraft
copying kerncraft/intervals.py -> build/lib/kerncraft
copying kerncraft/iaca_get.py -> build/lib/kerncraft
copying kerncraft/iaca.py -> build/lib/kerncraft
copying kerncraft/cachetile.py -> build/lib/kerncraft
copying kerncraft/cacheprediction.py -> build/lib/kerncraft
copying kerncraft/__init__.py -> build/lib/kerncraft
creating build/lib/kerncraft/pycparser
copying kerncraft/pycparser/lextab.py -> build/lib/kerncraft/pycparser
copying kerncraft/pycparser/yacctab.py -> build/lib/kerncraft/pycparser
copying kerncraft/pycparser/plyparser.py -> build/lib/kerncraft/pycparser
copying kerncraft/pycparser/c_parser.py -> build/lib/kerncraft/pycparser
copying kerncraft/pycparser/c_lexer.py -> build/lib/kerncraft/pycparser
copying kerncraft/pycparser/c_generator.py -> build/lib/kerncraft/pycparser
copying kerncraft/pycparser/c_ast.py -> build/lib/kerncraft/pycparser
copying kerncraft/pycparser/ast_transforms.py -> build/lib/kerncraft/pycparser
copying kerncraft/pycparser/_build_tables.py -> build/lib/kerncraft/pycparser
copying kerncraft/pycparser/_ast_gen.py -> build/lib/kerncraft/pycparser
copying kerncraft/pycparser/__init__.py -> build/lib/kerncraft/pycparser
creating build/lib/kerncraft/models
copying kerncraft/models/roofline.py -> build/lib/kerncraft/models
copying kerncraft/models/layer_condition.py -> build/lib/kerncraft/models
copying kerncraft/models/ecm.py -> build/lib/kerncraft/models
copying kerncraft/models/benchmark.py -> build/lib/kerncraft/models
copying kerncraft/models/__init__.py -> build/lib/kerncraft/models
creating build/lib/kerncraft/pycparser/ply
copying kerncraft/pycparser/ply/ygen.py -> build/lib/kerncraft/pycparser/ply
copying kerncraft/pycparser/ply/yacc.py -> build/lib/kerncraft/pycparser/ply
copying kerncraft/pycparser/ply/lex.py -> build/lib/kerncraft/pycparser/ply
copying kerncraft/pycparser/ply/ctokens.py -> build/lib/kerncraft/pycparser/ply
copying kerncraft/pycparser/ply/cpp.py -> build/lib/kerncraft/pycparser/ply
copying kerncraft/pycparser/ply/__init__.py -> build/lib/kerncraft/pycparser/ply
running egg_info
writing requirements to kerncraft.egg-info/requires.txt
writing kerncraft.egg-info/PKG-INFO
writing top-level names to kerncraft.egg-info/top_level.txt
writing dependency_links to kerncraft.egg-info/dependency_links.txt
writing entry points to kerncraft.egg-info/entry_points.txt
reading manifest file 'kerncraft.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files found matching 'examples/kernels/*.c_compilable.c'
writing manifest file 'kerncraft.egg-info/SOURCES.txt'
creating build/lib/kerncraft/headers
copying kerncraft/headers/dummy.c -> build/lib/kerncraft/headers
copying kerncraft/headers/kerncraft.h -> build/lib/kerncraft/headers
copying kerncraft/headers/timing.c -> build/lib/kerncraft/headers
copying kerncraft/headers/timing.h -> build/lib/kerncraft/headers
installing to build/bdist.linux-x86_64/wheel
running install
running install_lib
creating build/bdist.linux-x86_64
creating build/bdist.linux-x86_64/wheel
creating build/bdist.linux-x86_64/wheel/kerncraft
creating build/bdist.linux-x86_64/wheel/kerncraft/headers
copying build/lib/kerncraft/headers/timing.h -> build/bdist.linux-x86_64/wheel/kerncraft/headers
copying build/lib/kerncraft/headers/timing.c -> build/bdist.linux-x86_64/wheel/kerncraft/headers
copying build/lib/kerncraft/headers/kerncraft.h -> build/bdist.linux-x86_64/wheel/kerncraft/headers
copying build/lib/kerncraft/headers/dummy.c -> build/bdist.linux-x86_64/wheel/kerncraft/headers
creating build/bdist.linux-x86_64/wheel/kerncraft/models
copying build/lib/kerncraft/models/__init__.py -> build/bdist.linux-x86_64/wheel/kerncraft/models
copying build/lib/kerncraft/models/benchmark.py -> build/bdist.linux-x86_64/wheel/kerncraft/models
copying build/lib/kerncraft/models/ecm.py -> build/bdist.linux-x86_64/wheel/kerncraft/models
copying build/lib/kerncraft/models/layer_condition.py -> build/bdist.linux-x86_64/wheel/kerncraft/models
copying build/lib/kerncraft/models/roofline.py -> build/bdist.linux-x86_64/wheel/kerncraft/models
creating build/bdist.linux-x86_64/wheel/kerncraft/pycparser
creating build/bdist.linux-x86_64/wheel/kerncraft/pycparser/ply
copying build/lib/kerncraft/pycparser/ply/__init__.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser/ply
copying build/lib/kerncraft/pycparser/ply/cpp.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser/ply
copying build/lib/kerncraft/pycparser/ply/ctokens.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser/ply
copying build/lib/kerncraft/pycparser/ply/lex.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser/ply
copying build/lib/kerncraft/pycparser/ply/yacc.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser/ply
copying build/lib/kerncraft/pycparser/ply/ygen.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser/ply
copying build/lib/kerncraft/pycparser/__init__.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser
copying build/lib/kerncraft/pycparser/_ast_gen.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser
copying build/lib/kerncraft/pycparser/_build_tables.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser
copying build/lib/kerncraft/pycparser/ast_transforms.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser
copying build/lib/kerncraft/pycparser/c_ast.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser
copying build/lib/kerncraft/pycparser/c_generator.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser
copying build/lib/kerncraft/pycparser/c_lexer.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser
copying build/lib/kerncraft/pycparser/c_parser.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser
copying build/lib/kerncraft/pycparser/plyparser.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser
copying build/lib/kerncraft/pycparser/yacctab.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser
copying build/lib/kerncraft/pycparser/lextab.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser
copying build/lib/kerncraft/__init__.py -> build/bdist.linux-x86_64/wheel/kerncraft
copying build/lib/kerncraft/cacheprediction.py -> build/bdist.linux-x86_64/wheel/kerncraft
copying build/lib/kerncraft/cachetile.py -> build/bdist.linux-x86_64/wheel/kerncraft
copying build/lib/kerncraft/iaca.py -> build/bdist.linux-x86_64/wheel/kerncraft
copying build/lib/kerncraft/iaca_get.py -> build/bdist.linux-x86_64/wheel/kerncraft
copying build/lib/kerncraft/intervals.py -> build/bdist.linux-x86_64/wheel/kerncraft
copying build/lib/kerncraft/kerncraft.py -> build/bdist.linux-x86_64/wheel/kerncraft
copying build/lib/kerncraft/kernel.py -> build/bdist.linux-x86_64/wheel/kerncraft
copying build/lib/kerncraft/likwid_bench_auto.py -> build/bdist.linux-x86_64/wheel/kerncraft
copying build/lib/kerncraft/machinemodel.py -> build/bdist.linux-x86_64/wheel/kerncraft
copying build/lib/kerncraft/picklemerge.py -> build/bdist.linux-x86_64/wheel/kerncraft
copying build/lib/kerncraft/prefixedunit.py -> build/bdist.linux-x86_64/wheel/kerncraft
copying build/lib/kerncraft/roofline-plot.py -> build/bdist.linux-x86_64/wheel/kerncraft
running install_egg_info
Copying kerncraft.egg-info to build/bdist.linux-x86_64/wheel/kerncraft-0.5.10-py2.7.egg-info
running install_scripts
error: [Errno 2] No such file or directory: 'LICENSE'
----------------------------------------
Failed building wheel for kerncraft
Running setup.py clean for kerncraft
Failed to build kerncraft
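Two separate problems show up in this log: _build_tables.py opens '_c_ast.cfg' by a bare relative name, and the wheel step cannot find 'LICENSE'. Both are non-Python files, so one plausible cause is that they are missing from the source distribution (for example, not listed in MANIFEST.in with a line such as include LICENSE). Another is that the script runs in a different working directory. The sketch below only addresses the second cause by resolving the file next to the script; resolve_next_to is a hypothetical helper.

```python
import os


def resolve_next_to(module_file, filename):
    """Return filename resolved relative to the directory containing module_file."""
    return os.path.join(os.path.dirname(os.path.abspath(module_file)), filename)


# In _build_tables.py, instead of ASTCodeGenerator('_c_ast.cfg'), one could write:
#     ast_gen = ASTCodeGenerator(resolve_next_to(__file__, '_c_ast.cfg'))
print(resolve_next_to('/tmp/pkg/_build_tables.py', '_c_ast.cfg'))  # /tmp/pkg/_c_ast.cfg
```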
machine description -> hardware description
model -> mode
Replace b[j][i] with b[i][j] in 2d-5pt.c and run:
kerncraft -p ECMData 2d-5pt.c -m ../machine-files/IvyBridgeEP_E5-2660v2.yml -D N 1000 -D M 1000 --cache-pre LC
It (wrongly) reports 6 / 6 / 0, whereas --cache-predictor SIM reports much higher numbers due to cache misses in L1 and L2.
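A quick count shows why 6 / 6 / 0 is implausible after this change. The sketch below (hypothetical helper, assuming a 64 B cacheline and the usual row-major layout of double b[M][N] with N = 1000) counts the distinct cachelines the inner i-loop touches for one fixed j:

```python
CL = 64      # cacheline size in bytes
ELEM = 8     # sizeof(double)
N = 1000     # row length, -D N 1000


def lines_touched(byte_offsets):
    """Number of distinct cachelines covered by the given byte offsets."""
    return len({off // CL for off in byte_offsets})


j = 5
row_wise = [(j * N + i) * ELEM for i in range(N)]     # b[j][i]: unit stride
column_wise = [(i * N + j) * ELEM for i in range(N)]  # b[i][j]: stride of N * 8 bytes
print(lines_touched(row_wise), lines_touched(column_wise))  # 125 1000
```

With b[i][j], one sweep of the inner loop touches 1000 lines (64 kB) of b alone, more than a 32 kB L1 can hold, so the reuse between consecutive j iterations cannot come from L1.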
The entries in a machine description file should follow a logical order, so that the data most interesting to human readers is at the top, while long tables of numbers are at the bottom.
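A minimal sketch of such an ordering, assuming the machine file is loaded into a mapping before being written back. The key lists are an illustrative choice taken from keys visible in the machine file quoted elsewhere on this page, not an agreed order, and reorder is a hypothetical helper:

```python
from collections import OrderedDict

# Human-relevant keys first; long, table-like sections last (illustrative choice)
HEAD = ['model name', 'model type', 'micro-architecture', 'clock',
        'sockets', 'cores per socket', 'threads per core']
TAIL = ['memory hierarchy']


def reorder(machine):
    """Return a copy of machine with HEAD keys first, TAIL keys last, the rest sorted."""
    head = [k for k in HEAD if k in machine]
    tail = [k for k in TAIL if k in machine]
    rest = sorted(k for k in machine if k not in HEAD and k not in TAIL)
    return OrderedDict((k, machine[k]) for k in head + rest + tail)


machine = {'memory hierarchy': [], 'clock': '2.47 GHz', 'compiler': {},
           'model name': 'Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz'}
print(list(reorder(machine)))  # ['model name', 'clock', 'compiler', 'memory hierarchy']
```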
Since all scalars and arrays are initialized with the same value (0.23), the following code runs faster than expected:
(U_op - y[j]) / R
since U_op - y[j] is zero and the division takes fewer cycles to complete.
A solution would be to make the initial values random or to let the user define them.
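A minimal sketch of the first option, assuming the generated code assigns one value per variable. Each variable gets its own pseudo-random value from a seeded generator, so differences such as U_op - y[j] are no longer zero while runs stay reproducible. The helper names and the value range are hypothetical; for arrays such as y, the value would go into the array's initialization loop rather than a plain assignment.

```python
import random


def initial_values(names, seed=42, low=0.1, high=1.0):
    """Assign each variable its own pseudo-random initial value (reproducible via seed)."""
    rng = random.Random(seed)
    return {name: rng.uniform(low, high) for name in names}


def c_initializers(values):
    """Render C scalar assignments, one per variable, in a stable order."""
    return '\n'.join('{} = {!r};'.format(name, value)
                     for name, value in sorted(values.items()))


print(c_initializers(initial_values(['U_op', 'R'])))
```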
I have tried kerncraft (current checkout, 0.5.7) with the himeno.c code and clang:
(python) gh@einstein:~/programming/python/kerncraft/examples$ kerncraft --machine machine-files/BroadwellEP_E5-2697_CoD.yml --pmodel ECM -D M 50 -D N 50 -D L 500 --cache-predictor LC --compiler clang-4.0 kernels/himeno.c
[...]
IACA analysis failed: pointer_increment could not be detected automatically
This happens with clang 3.8 and 4.0.
For the attached code, kerncraft in LC mode predicts 7 misses and 1 evict, but the actual values are 6 misses and 1 evict. SIM mode is correct here. Example run: kerncraft -p ECM -m Emmy.yml -D n 1000000 -D s 4 LC_lji.c --cache-predictor=LC -vvv
For the following kernel, from n=7*1e5 to n=13*1e5 there should be 1 evict and 1 miss, corresponding to the Y vector. In kerncraft this holds for approx. n=7*1e5 to n=8.0*1e5; then the evicts strangely drop to 0 until approx. n=11*1e5, after which they rise slowly until n reaches 13*1e5.
Try, for instance, kerncraft -p ECM -m ~/Emmy.yml -D n 1000000 -D s 4 LC_ilj_2.c --cache-predictor=SIM -vvv; here the evicts are 0.
After running kerncraft, the following files can be found in the current working directory:
They do not seem to be needed by the user.
kerncraft does not seem to be fully compatible with Python 3 (3.4.3).
For example:
File "kerncraft.py", line 187, in run
required_consts = [v[1] for v in kernel.variables.itervalues() if v[1] is not None]
AttributeError: 'dict' object has no attribute 'itervalues'
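dict.itervalues() no longer exists in Python 3; dict.values() works in both versions (a list in Python 2, a view in Python 3), and six.itervalues() would be an alternative since six is already a dependency. A minimal sketch of the fixed line, assuming, as the comprehension implies, that each entry of kernel.variables is a tuple whose second element is the dimensions or None for scalars; the sample dict is made up:

```python
# Hypothetical stand-in for kernel.variables: name -> (type, dimensions or None)
variables = {
    'a': ('double', ('M', 'N')),
    'b': ('double', ('M', 'N')),
    's': ('double', None),  # scalar: no dimensions
}

# Python 2 and 3 compatible replacement for variables.itervalues()
required_consts = [v[1] for v in variables.values() if v[1] is not None]
print(required_consts)  # [('M', 'N'), ('M', 'N')]
```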
Pass #pragma directives through to the compiler and ignore them otherwise. This requires modifying pycparser.
When using the machine file generated for the Intel Xeon E5-2640v4:
FLOPs per cycle:
  DP:
    ADD: 4
    FMA: 8
    MUL: 4
    total: 16
  SP:
    ADD: 8
    FMA: 16
    MUL: 8
    total: 32
NUMA domains per socket: 1.0
...
cacheline size: 64 B
clock: 2.47 GHz
compiler:
  clang: -03 -mavx2 -D_POSIX_C_SOURCE=200112L
  gcc: -O3 -march=core-avx2 -D_POSIX_C_SOURCE=200112L
  icc: -O3 -xCORE-AVX2 -fno-alias
cores per NUMA domain: 0.1
cores per socket: 10
memory hierarchy:
- cache per group:
    cl_size: 64
    load_from: L2
    replacement_policy: LRU
    sets: 64
    store_to: L2
    ways: 8
    write_allocate: True
    write_back: True
  cores per group: 1.0
  cycles per cacheline transfer: 1
  groups: 20
  level: L1
  performance counter metrics:
    accesses: MEM_UOPS_RETIRED_LOADS_ALL:PMC[0-3]
    evicts: L2_TRANS_L1D_WB:PMC[0-3]
    misses: L1D_REPLACEMENT:PMC[0-3]
  size per group: !!python/object:prefixedunit.PrefixedUnit
    prefix: k
    unit: B
    value: 32.0
  threads per group: 1.0
- cache per group:
    cl_size: 64
    load_from: L3
    replacement_policy: LRU
    sets: 512
    store_to: L3
    ways: 8
    write_allocate: True
    write_back: True
  cores per group: 1.0
  cycles per cacheline transfer: 2
  groups: 20
  level: L2
  performance counter metrics:
    accesses: L1D_REPLACEMENT:PMC[0-3]
    evicts: L2_TRANS_L2_WB:PMC[0-3]
    misses: L2_LINES_IN_ALL:PMC[0-3]
  size per group: !!python/object:prefixedunit.PrefixedUnit
    prefix: k
    unit: B
    value: 256.0
  threads per group: 1.0
- cache per group:
    cl_size: 64
    replacement_policy: LRU
    sets: 20480
    ways: 20
    write_allocate: True
    write_back: True
  cores per group: 10.0
  cycles per cacheline transfer: INFORMATION_REQUIRED
  groups: 2
  level: L3
  performance counter metrics:
    accesses: L2_LINES_IN_ALL:PMC[0-3]
    evicts: (LLC_VICTIMS_M:CBOX0C[01] + LLC_VICTIMS_M:CBOX1C[01] + LLC_VICTIMS_M:CBOX2C[01] +
      LLC_VICTIMS_M:CBOX3C[01] + LLC_VICTIMS_M:CBOX4C[01] + LLC_VICTIMS_M:CBOX5C[01] +
      LLC_VICTIMS_M:CBOX6C[01] + LLC_VICTIMS_M:CBOX7C[01] + LLC_VICTIMS_M:CBOX8C[01] +
      LLC_VICTIMS_M:CBOX9C[01] + LLC_VICTIMS_M:CBOX10C[01] + LLC_VICTIMS_M:CBOX11C[01] +
      LLC_VICTIMS_M:CBOX12C[01] + LLC_VICTIMS_M:CBOX13C[01] + LLC_VICTIMS_M:CBOX14C[01] +
      LLC_VICTIMS_M:CBOX15C[01] + LLC_VICTIMS_M:CBOX16C[01] + LLC_VICTIMS_M:CBOX17C[01] +
      LLC_VICTIMS_M:CBOX18C[01] + LLC_VICTIMS_M:CBOX19C[01] + LLC_VICTIMS_M:CBOX20C[01] +
      LLC_VICTIMS_M:CBOX21C[01])
    misses: (LLC_LOOKUP_DATA_READ:CBOX0C[01] + LLC_LOOKUP_DATA_READ:CBOX1C[01] +
      LLC_LOOKUP_DATA_READ:CBOX2C[01] + LLC_LOOKUP_DATA_READ:CBOX3C[01] +
      LLC_LOOKUP_DATA_READ:CBOX4C[01] + LLC_LOOKUP_DATA_READ:CBOX5C[01] +
      LLC_LOOKUP_DATA_READ:CBOX6C[01] + LLC_LOOKUP_DATA_READ:CBOX7C[01] +
      LLC_LOOKUP_DATA_READ:CBOX8C[01] + LLC_LOOKUP_DATA_READ:CBOX9C[01] +
      LLC_LOOKUP_DATA_READ:CBOX10C[01] + LLC_LOOKUP_DATA_READ:CBOX11C[01] +
      LLC_LOOKUP_DATA_READ:CBOX12C[01] + LLC_LOOKUP_DATA_READ:CBOX13C[01] +
      LLC_LOOKUP_DATA_READ:CBOX14C[01] + LLC_LOOKUP_DATA_READ:CBOX15C[01] +
      LLC_LOOKUP_DATA_READ:CBOX16C[01] + LLC_LOOKUP_DATA_READ:CBOX17C[01] +
      LLC_LOOKUP_DATA_READ:CBOX18C[01] + LLC_LOOKUP_DATA_READ:CBOX19C[01] +
      LLC_LOOKUP_DATA_READ:CBOX20C[01] + LLC_LOOKUP_DATA_READ:CBOX21C[01])
  size per group: !!python/object:prefixedunit.PrefixedUnit
    prefix: M
    unit: B
    value: 25.0
  threads per group: 10.0
- cores per group: 10
  cycles per cacheline transfer: null
  level: MEM
  penalty cycles per read stream: 0
  size per group: null
  threads per group: 10
micro-architecture: BDW
model name: Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz
model type: Intel Xeon Broadwell EN/EP/EX processor
non-overlapping model:
  performance counter metric: T_OL + T_L1L2 + T_L2L3 + T_L3MEM
  ports: ["2D", "3D"]
overlapping model:
  performance counter metric:
    Max(UMASK_UOPS_EXECUTED_PORT_PORT_0:PMC[0-3],
    UMASK_UOPS_EXECUTED_PORT_PORT_1:PMC[0-3],
    UMASK_UOPS_EXECUTED_PORT_PORT_4:PMC[0-3],
    UMASK_UOPS_EXECUTED_PORT_PORT_5:PMC[0-3],
    UMASK_UOPS_EXECUTED_PORT_PORT_6:PMC[0-3],
    UMASK_UOPS_EXECUTED_PORT_PORT_7:PMC[0-3])
  ports: ["0", "0DV", "1", "2", "2D", "3", "3D", "4", "5", "6", "7"]
sockets: 2
threads per core: 1
I get the following error:
Traceback (most recent call last):
File "/users/staff/ifi/guerrera/anaconda2/envs/myenv3.6/bin/kerncraft", line 11, in <module>
load_entry_point('kerncraft==0.5.10', 'console_scripts', 'kerncraft')()
File "/users/staff/ifi/guerrera/anaconda2/envs/myenv3.6/lib/python3.6/site-packages/kerncraft/kerncraft.py", line 295, in main
run(parser, args)
File "/users/staff/ifi/guerrera/anaconda2/envs/myenv3.6/lib/python3.6/site-packages/kerncraft/kerncraft.py", line 259, in run
model = getattr(models, model_name)(kernel, machine, args, parser)
File "/users/staff/ifi/guerrera/anaconda2/envs/myenv3.6/lib/python3.6/site-packages/kerncraft/models/ecm.py", line 88, in __init__
self.predictor = CacheSimulationPredictor(self.kernel, self.machine, self.cores)
File "/users/staff/ifi/guerrera/anaconda2/envs/myenv3.6/lib/python3.6/site-packages/kerncraft/cacheprediction.py", line 218, in __init__
csim = self.machine.get_cachesim(self.cores)
File "/users/staff/ifi/guerrera/anaconda2/envs/myenv3.6/lib/python3.6/site-packages/kerncraft/machinemodel.py", line 71, in get_cachesim
cs, caches, mem = cachesim.CacheSimulator.from_dict(cache_dict)
File "/users/staff/ifi/guerrera/anaconda2/envs/myenv3.6/lib/python3.6/site-packages/cachesim/cache.py", line 63, in from_dict
name=name, **{k:v for k,v in conf.items() if k not in ['store_to', 'load_from']})
File "/users/staff/ifi/guerrera/anaconda2/envs/myenv3.6/lib/python3.6/site-packages/cachesim/cache.py", line 253, in __init__
assert is_power2(ways), "ways needs to be a power of 2"
AssertionError: ways needs to be a power of 2
In this case, L3 has 20-way associativity.
Should I round it to the nearest power of 2, or is there a better option?
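Keeping sets unchanged, rounding the ways changes the modeled L3 capacity: 16 ways shrink it from 25 MiB to 20 MiB, and 32 ways grow it to 40 MiB. Which compromise is acceptable (or whether the simulator should accept non-power-of-2 ways) is a question for the maintainers. The helpers below are a hypothetical sketch that only quantifies the trade-off; they are not part of pycachesim.

```python
def is_power2(n):
    """True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def power2_neighbours(n):
    """Closest powers of two at or below n and at or above n."""
    lower = 1 << (n.bit_length() - 1)
    upper = lower if lower == n else lower << 1
    return lower, upper


def capacity(sets, ways, cl_size):
    """Modeled cache capacity in bytes."""
    return sets * ways * cl_size


sets, ways, cl_size = 20480, 20, 64  # L3 entry of the E5-2640v4 machine file
print(is_power2(ways), capacity(sets, ways, cl_size))  # False 26214400
for w in power2_neighbours(ways):
    print(w, capacity(sets, w, cl_size))  # 16 20971520, then 32 41943040
```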
When running:
kerncraft -p ECM stencil.c -m Intel_Xeon_CPU_X5650_2.67GHz_mod.yml -D M 1224 -D N 1224
I get:
kerncraft
stencil.c -m Intel_Xeon_CPU_X5650_2.67GHz_mod.yml
-D M 1224 -D N 1224
------------------------------------- ECM --------------------------------------
IACA analysis failed: pointer_increment could not be detected automatically. Use --pointer-increment to set manually to byte offset of store pointer address between consecutive assembly block iterations.
Content of stencil.c:
double a[M][N];
double b[M][N];
double W[M][N][2];
for(int j=1; j < M-1; j++){
for(int i=1; i < N-1; i++){
b[j][i] = W[j][i][0] * a[j][i]
+ W[j][i][1] * ((a[j][i-1] + a[j][i+1]) + (a[j-1][i] + a[j+1][i]))
;
}
}
The LC analysis in Kerncraft does not seem to cope well with arrays that have more (leading) indices than the number of loops in the nest:
$ kerncraft --machine machine-files/BroadwellEP_E5-2697_CoD.yml --pmodel LC -D M 500 -D N 500 -D L 500 --cache-predictor LC kernels/himeno.c
================================== kerncraft ===================================
kernels/himeno.c -m machine-files/BroadwellEP_E5-2697_CoD.yml
-D L 500 -D M 500 -D N 500
-------------------------------------- LC --------------------------------------
Traceback (most recent call last):
File "/home/gh/programming/python/bin/kerncraft", line 11, in <module>
sys.exit(main())
File "/home/gh/programming/python/lib/python3.5/site-packages/kerncraft/kerncraft.py", line 289, in main
run(parser, args)
File "/home/gh/programming/python/lib/python3.5/site-packages/kerncraft/kerncraft.py", line 255, in run
model.analyze()
File "/home/gh/programming/python/lib/python3.5/site-packages/kerncraft/models/layer_condition.py", line 222, in analyze
raise ValueError("Can not apply layer-condition, order of indices in array "
ValueError: Can not apply layer-condition, order of indices in array does not follow order of loop indices. Single-dimension is currently not supported.
The following code has a problem in the calculation of iteration counts. This can be clearly observed in the deviation between the ECM and Benchmark modes.
wrong_cl.tar.gz
A workaround would be to set --pointer-increment manually.
The goal is (and always was) to give the user intuitive and transparent access to intermediate generated files (namely a compilable C version for IACA analysis, an assembly version for IACA analysis, and a compilable C version with the LIKWID marker API for execution). The current approach creates a $kernel_file_name.c_compilable.c for any compilable C code (either for IACA analysis or for execution with LIKWID) and a $kernel_file_name.c_compilable.s for assembly files. The C file differs depending on which performance model was run last.
I'm looking for a cleaner way to do this, especially one that is unambiguous about the origin and purpose of the intermediates and that does not clutter the kernels directory.
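One possible scheme, sketched under assumptions: intermediates go into a separate base directory (for example a user-chosen build or cache directory), with one subdirectory per kernel file and per performance model, and a file name that states the purpose. The helper and the directory layout are hypothetical. Two kernel files with the same name in different directories would collide; hashing the full kernel path would be one way around that.

```python
import os


def intermediate_path(base_dir, kernel_file, model, purpose, ext):
    """Path of an intermediate file, encoding kernel, model and purpose.

    Hypothetical layout: <base_dir>/<kernel file name>/<model>/<purpose>.<ext>
    """
    kernel_name = os.path.basename(kernel_file)
    return os.path.join(base_dir, kernel_name, model, '{}.{}'.format(purpose, ext))


print(intermediate_path('/tmp/kerncraft', 'kernels/2d-5pt.c', 'ECM', 'iaca', 's'))
# /tmp/kerncraft/2d-5pt.c/ECM/iaca.s
print(intermediate_path('/tmp/kerncraft', 'kernels/2d-5pt.c', 'Benchmark', 'likwid', 'c'))
# /tmp/kerncraft/2d-5pt.c/Benchmark/likwid.c
```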
I would like to have an additional search path, if possible. I installed kerncraft via EasyBuild (a scientific software management tool), which makes the software available as a modulefile. I also installed IACA through EasyBuild. When I run kerncraft with the ECM model, I get the following error:
[guerrera@dmi-cl-login ~]$ kerncraft -p ECM code.c -m HaswellEX_E5-2695v3.yml -D M 200 -D N 200
================================== kerncraft ===================================
cazzo.c -m HaswellEX_E5-2695v3.yml
-D M 200 -D N 200
------------------------------------- ECM --------------------------------------
IACA analysis failed: No IACA installation found in ['/users/staff/ifi/guerrera/.kerncraft/iaca/lin64/', '/users/staff/ifi/guerrera/.local/easybuild/software/kerncraft/20171205-foss-2017a-Python-3.6.1/lib/python3.6/site-packages/kerncraft-0.5.9-py3.6.egg/kerncraft/iaca/lin64/']. Run iaca_get command to fix this issue.
IACA is not installed in those paths, but it is installed and available in the environment at $EBROOTIACA.
Alternatively, simply test whether iaca is present: since it is in the PATH, it can be called directly as iaca.
Is it possible to add it to the search path or to test whether it is available?
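A minimal sketch of both suggestions, assuming the iaca executable sits either directly in a search directory or in its bin subdirectory: check the configured directories plus $EBROOTIACA, then fall back to whatever iaca is on the PATH via shutil.which (Python 3.3+). find_iaca and the directory layout are hypothetical, not kerncraft's current lookup.

```python
import os
import shutil


def find_iaca(search_dirs, env=None):
    """Return the path of an executable named iaca, or None if none is found.

    Checks search_dirs, then $EBROOTIACA (and its bin/ subdirectory), then PATH.
    """
    env = os.environ if env is None else env
    candidates = list(search_dirs)
    root = env.get('EBROOTIACA')
    if root:
        candidates += [root, os.path.join(root, 'bin')]
    for directory in candidates:
        path = os.path.join(directory, 'iaca')
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    # Fall back to the PATH, as a user calling "iaca" from the shell would
    return shutil.which('iaca', path=env.get('PATH'))
```

For example, find_iaca([os.path.expanduser('~/.kerncraft/iaca/lin64/')]) would still find an EasyBuild-provided IACA through $EBROOTIACA or the PATH.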
The predicted cache misses oscillate, probably due to cacheline alignment. So far I have not found any bugs in the cache simulator or prediction algorithms that cause this behavior.
Assuming there are no bugs, possible solutions could be:
especially GCC, but also clang and PGI
kerncraft/examples$ kerncraft -p ECM -m machine-files/IvyBridgeEP_E5-2660v2.yml kernels/2d-5pt.c -D M 200000 -D N 20 --cache-predictor=SIM
================================== kerncraft ===================================
kernels/2d-5pt.c -m machine-files/IvyBridgeEP_E5-2660v2.yml
-D N 20 -D M 200000
------------------------------------- ECM --------------------------------------
Traceback (most recent call last):
File "/home/hpc/unrz/unrza308/.conda/envs/my/bin/kerncraft", line 11, in <module>
load_entry_point('kerncraft', 'console_scripts', 'kerncraft')()
File "/home/hpc/unrz/unrza308/msc/prototype/kerncraft/kerncraft.py", line 294, in main
run(parser, args)
File "/home/hpc/unrz/unrza308/msc/prototype/kerncraft/kerncraft.py", line 261, in run
model.analyze()
File "/home/hpc/unrz/unrza308/msc/prototype/kerncraft/models/ecm.py", line 356, in analyze
self._data.analyze()
File "/home/hpc/unrz/unrza308/msc/prototype/kerncraft/models/ecm.py", line 152, in analyze
self.calculate_cache_access()
File "/home/hpc/unrz/unrza308/msc/prototype/kerncraft/models/ecm.py", line 87, in calculate_cache_access
self.predictor = CacheSimulationPredictor(self.kernel, self.machine)
File "/home/hpc/unrz/unrza308/msc/prototype/kerncraft/cacheprediction.py", line 282, in __init__
iteration=range(bench_iteration_start, bench_iteration_end)))
File "/home/hpc/unrz/unrza308/msc/prototype/kerncraft/kernel.py", line 421, in compile_global_offsets
assert max(iteration) < self.subs_consts(total_length), \
ValueError: max() arg is an empty sequence
I'm trying to generate a machine file for an Intel Skylake CPU (i5-6600) and keep running into problems. Using Python 2.7.9, I installed the latest kerncraft version with pip.
There is a typo (codes instead of cores) in line 48 of the current likwid_bench_auto.py:
'cores per NUMA domain': codes_per_numa_domain,
There also seem to be problems with the PrefixedUnit type in lines 232 and 233. Unfortunately, the script aborts with an error:
sizes_per_core = [t/cores[i] for i, t in enumerate(total_sizes)]
sizes_per_thread = [t/threads[i] for i, t in enumerate(total_sizes)]
Error:
./likwid_bench_auto > mf.yml
Traceback (most recent call last):
File "./likwid_bench_auto", line 11, in
sys.exit(main())
File "/mnt/home/stud-erha1011/workspace/kerncraft/kerncraft_env/local/lib/python2.7/site-packages/kerncraft/likwid_bench_auto.py", line 206, in main
sizes_per_core = [t/cores[i] for i, t in enumerate(total_sizes)]
TypeError: unsupported operand type(s) for /: 'PrefixedUnit' and 'int'
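Under Python 2, a / b on objects calls __div__ unless true division is enabled, so a class that only implements __truediv__ fails with exactly this kind of TypeError. That is one plausible cause, not a confirmed diagnosis. The sketch below uses a hypothetical stand-in class (not kerncraft's prefixedunit.PrefixedUnit) to show division by a plain number that works in both Python versions:

```python
class SimplePrefixedUnit(object):
    """Hypothetical stand-in for a value with prefix and unit, e.g. 32.0 kB."""

    def __init__(self, value, prefix, unit):
        self.value, self.prefix, self.unit = float(value), prefix, unit

    def __truediv__(self, other):
        # Dividing by a plain number scales the value and keeps prefix and unit
        if isinstance(other, (int, float)):
            return SimplePrefixedUnit(self.value / other, self.prefix, self.unit)
        return NotImplemented

    __div__ = __truediv__  # Python 2 "/" without "from __future__ import division"

    def __repr__(self):
        return '{} {}{}'.format(self.value, self.prefix, self.unit)


total_sizes = [SimplePrefixedUnit(256.0, 'k', 'B')]
cores = [4]
sizes_per_core = [t / cores[i] for i, t in enumerate(total_sizes)]
print(sizes_per_core)  # [64.0 kB]
```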
Generate a single report combining all information that can be gathered about a kernel on a specific architecture, including graphs and such.
Currently kerncraft crashes when the code contains comments.
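Until the parser handles them, one workaround is to strip comments before parsing. The sketch below is a hypothetical pre-processing step, not kerncraft's code. It uses a simple regular expression, so it does not understand string or character literals (a // inside a string would be cut), which is usually acceptable for numerical kernels. Newlines inside block comments are kept so that line numbers in error messages stay valid.

```python
import re

# Block comments (possibly spanning lines) or line comments up to the end of the line
_COMMENT_RE = re.compile(r'/\*.*?\*/|//[^\n]*', re.DOTALL)


def strip_c_comments(source):
    """Replace C/C++ comments by whitespace, preserving the number of lines."""
    def replacement(match):
        return ' ' + '\n' * match.group(0).count('\n')
    return _COMMENT_RE.sub(replacement, source)


kernel = "a[i] = b[i]; // copy\n/* scale\n   by s */ c[i] = s * a[i];"
print(strip_c_comments(kernel))
```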
Merge changes to upstream and go back to using vanilla pycparser
When running:
kerncraft -p LC stencil.c -m Intel_Xeon_CPU_X5650_2.67GHz_mod.yml -D M 1224 -D N 1224
I get:
kerncraft
stencil.c -m Intel_Xeon_CPU_X5650_2.67GHz_mod.yml
-D M 1224 -D N 1224
-------------------------------------- LC --------------------------------------
Traceback (most recent call last):
File "/home/hpc/ihpc/ihpc07/miniconda2/envs/myenv/bin/kerncraft", line 11, in <module>
load_entry_point('kerncraft==0.6.0', 'console_scripts', 'kerncraft')()
File "/home/hpc/ihpc/ihpc07/miniconda2/envs/myenv/lib/python3.5/site-packages/kerncraft/kerncraft.py", line 284, in main
run(parser, args)
File "/home/hpc/ihpc/ihpc07/miniconda2/envs/myenv/lib/python3.5/site-packages/kerncraft/kerncraft.py", line 250, in run
model.analyze()
File "/home/hpc/ihpc/ihpc07/miniconda2/envs/myenv/lib/python3.5/site-packages/kerncraft/models/layer_condition.py", line 201, in analyze
raise ValueError("Can not apply layer condition, order of indices in array "
ValueError: Can not apply layer condition, order of indices in array does not follow order of loop indices. Single-dimension is currently not supported.
Content of stencil.c:
double a[M][N];
double b[M][N];
double W[M][N][2];
for(int j=1; j < M-1; j++){
for(int i=1; i < N-1; i++){
b[j][i] = W[j][i][0] * a[j][i]
+ W[j][i][1] * ((a[j][i-1] + a[j][i+1]) + (a[j-1][i] + a[j+1][i]))
;
}
}
Cache simulation is present in the ecm and roofline models, as well as IACA instrumentation (probably more).
Pragmas are misplaced in the generated C code; I have attached an example. Without this pragma, kerncraft predicts the loop increment wrongly for this particular kernel with Intel compilers, because the Intel compiler unrolls and jams the outer loop.
pragma_problem.tar.gz
This happens only in Benchmark mode. I ran the code like this:
kerncraft -p Benchmark -m HaswellEX_E5-2695v3.yml -D N 2000000 -D s 4 irk_A_2_3loop.c