rrze-hpc / kerncraft

Loop Kernel Analysis and Performance Modeling Toolkit

License: GNU Affero General Public License v3.0

C 0.16% Python 21.64% Assembly 2.10% Shell 0.35% Jupyter Notebook 75.75%

kerncraft's Issues

Error with associativity: "ways needs to be a power of 2"

When using as a machine file the one generated for the Intel Xeon E5-2640v4:

FLOPs per cycle:
  DP:
    ADD: 4
    FMA: 8
    MUL: 4
    total: 16
  SP:
    ADD: 8
    FMA: 16
    MUL: 8
    total: 32
NUMA domains per socket: 1.0

...


cacheline size:  64 B
clock: 2.47 GHz
compiler:
  clang: -03 -mavx2 -D_POSIX_C_SOURCE=200112L
  gcc: -O3 -march=core-avx2 -D_POSIX_C_SOURCE=200112L
  icc: -O3 -xCORE-AVX2 -fno-alias
cores per NUMA domain: 0.1
cores per socket: 10
memory hierarchy:
- cache per group:
    cl_size: 64
    load_from: L2
    replacement_policy: LRU
    sets: 64
    store_to: L2
    ways: 8
    write_allocate: True
    write_back: True
  cores per group: 1.0
  cycles per cacheline transfer: 1
  groups: 20
  level: L1
  performance counter metrics:
    accesses: MEM_UOPS_RETIRED_LOADS_ALL:PMC[0-3]
    evicts: L2_TRANS_L1D_WB:PMC[0-3]
    misses: L1D_REPLACEMENT:PMC[0-3]
  size per group: !!python/object:prefixedunit.PrefixedUnit
    prefix: k
    unit: B
    value: 32.0
  threads per group: 1.0
- cache per group:
    cl_size: 64
    load_from: L3
    replacement_policy: LRU
    sets: 512
    store_to: L3
    ways: 8
    write_allocate: True
    write_back: True
  cores per group: 1.0
  cycles per cacheline transfer: 2
  groups: 20
  level: L2
  performance counter metrics:
    accesses: L1D_REPLACEMENT:PMC[0-3]
    evicts: L2_TRANS_L2_WB:PMC[0-3]
    misses: L2_LINES_IN_ALL:PMC[0-3]
  size per group: !!python/object:prefixedunit.PrefixedUnit
    prefix: k
    unit: B
    value: 256.0
  threads per group: 1.0
- cache per group:
    cl_size: 64
    replacement_policy: LRU
    sets: 20480
    ways: 20
    write_allocate: True
    write_back: True
  cores per group: 10.0
  cycles per cacheline transfer: INFORMATION_REQUIRED
  groups: 2
  level: L3
  performance counter metrics:
    accesses: L2_LINES_IN_ALL:PMC[0-3]
    evicts: (LLC_VICTIMS_M:CBOX0C[01] + LLC_VICTIMS_M:CBOX1C[01] + LLC_VICTIMS_M:CBOX2C[01] +
               LLC_VICTIMS_M:CBOX3C[01] + LLC_VICTIMS_M:CBOX4C[01] + LLC_VICTIMS_M:CBOX5C[01] +
               LLC_VICTIMS_M:CBOX6C[01] + LLC_VICTIMS_M:CBOX7C[01] + LLC_VICTIMS_M:CBOX8C[01] +
               LLC_VICTIMS_M:CBOX9C[01] + LLC_VICTIMS_M:CBOX10C[01] + LLC_VICTIMS_M:CBOX11C[01] +
               LLC_VICTIMS_M:CBOX12C[01] + LLC_VICTIMS_M:CBOX13C[01] + LLC_VICTIMS_M:CBOX14C[01] +
               LLC_VICTIMS_M:CBOX15C[01] + LLC_VICTIMS_M:CBOX16C[01] + LLC_VICTIMS_M:CBOX17C[01] +
               LLC_VICTIMS_M:CBOX18C[01] + LLC_VICTIMS_M:CBOX19C[01] + LLC_VICTIMS_M:CBOX20C[01] +
               LLC_VICTIMS_M:CBOX21C[01])
    misses: (LLC_LOOKUP_DATA_READ:CBOX0C[01] + LLC_LOOKUP_DATA_READ:CBOX1C[01] +
               LLC_LOOKUP_DATA_READ:CBOX2C[01] + LLC_LOOKUP_DATA_READ:CBOX3C[01] +
               LLC_LOOKUP_DATA_READ:CBOX4C[01] + LLC_LOOKUP_DATA_READ:CBOX5C[01] +
               LLC_LOOKUP_DATA_READ:CBOX6C[01] + LLC_LOOKUP_DATA_READ:CBOX7C[01] +
               LLC_LOOKUP_DATA_READ:CBOX8C[01] + LLC_LOOKUP_DATA_READ:CBOX9C[01] +
               LLC_LOOKUP_DATA_READ:CBOX10C[01] + LLC_LOOKUP_DATA_READ:CBOX11C[01] +
               LLC_LOOKUP_DATA_READ:CBOX12C[01] + LLC_LOOKUP_DATA_READ:CBOX13C[01] +
               LLC_LOOKUP_DATA_READ:CBOX14C[01] + LLC_LOOKUP_DATA_READ:CBOX15C[01] +
               LLC_LOOKUP_DATA_READ:CBOX16C[01] + LLC_LOOKUP_DATA_READ:CBOX17C[01] +
               LLC_LOOKUP_DATA_READ:CBOX18C[01] + LLC_LOOKUP_DATA_READ:CBOX19C[01] +
               LLC_LOOKUP_DATA_READ:CBOX20C[01] + LLC_LOOKUP_DATA_READ:CBOX21C[01])
  size per group: !!python/object:prefixedunit.PrefixedUnit
    prefix: M
    unit: B
    value: 25.0
  threads per group: 10.0
- cores per group: 10
  cycles per cacheline transfer: null
  level: MEM
  penalty cycles per read stream: 0
  size per group: null
  threads per group: 10
micro-architecture: BDW
model name: Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz
model type: Intel Xeon Broadwell EN/EP/EX processor
non-overlapping model:
  performance counter metric: T_OL + T_L1L2 + T_L2L3 + T_L3MEM
  ports: ["2D", "3D"]
overlapping model:
  performance counter metric: 
    Max(UMASK_UOPS_EXECUTED_PORT_PORT_0:PMC[0-3],
      UMASK_UOPS_EXECUTED_PORT_PORT_1:PMC[0-3],
      UMASK_UOPS_EXECUTED_PORT_PORT_4:PMC[0-3],
      UMASK_UOPS_EXECUTED_PORT_PORT_5:PMC[0-3],
      UMASK_UOPS_EXECUTED_PORT_PORT_6:PMC[0-3],
      UMASK_UOPS_EXECUTED_PORT_PORT_7:PMC[0-3])
  ports: ["0", "0DV", "1", "2", "2D", "3", "3D", "4", "5", "6", "7"]
sockets: 2
threads per core: 1

I get the following error:

Traceback (most recent call last):
  File "/users/staff/ifi/guerrera/anaconda2/envs/myenv3.6/bin/kerncraft", line 11, in <module>
    load_entry_point('kerncraft==0.5.10', 'console_scripts', 'kerncraft')()
  File "/users/staff/ifi/guerrera/anaconda2/envs/myenv3.6/lib/python3.6/site-packages/kerncraft/kerncraft.py", line 295, in main
    run(parser, args)
  File "/users/staff/ifi/guerrera/anaconda2/envs/myenv3.6/lib/python3.6/site-packages/kerncraft/kerncraft.py", line 259, in run
    model = getattr(models, model_name)(kernel, machine, args, parser)
  File "/users/staff/ifi/guerrera/anaconda2/envs/myenv3.6/lib/python3.6/site-packages/kerncraft/models/ecm.py", line 88, in __init__
    self.predictor = CacheSimulationPredictor(self.kernel, self.machine, self.cores)
  File "/users/staff/ifi/guerrera/anaconda2/envs/myenv3.6/lib/python3.6/site-packages/kerncraft/cacheprediction.py", line 218, in __init__
    csim = self.machine.get_cachesim(self.cores)
  File "/users/staff/ifi/guerrera/anaconda2/envs/myenv3.6/lib/python3.6/site-packages/kerncraft/machinemodel.py", line 71, in get_cachesim
    cs, caches, mem = cachesim.CacheSimulator.from_dict(cache_dict)
  File "/users/staff/ifi/guerrera/anaconda2/envs/myenv3.6/lib/python3.6/site-packages/cachesim/cache.py", line 63, in from_dict
    name=name, **{k:v for k,v in conf.items() if k not in ['store_to', 'load_from']})
  File "/users/staff/ifi/guerrera/anaconda2/envs/myenv3.6/lib/python3.6/site-packages/cachesim/cache.py", line 253, in __init__
    assert is_power2(ways), "ways needs to be a power of 2"
AssertionError: ways needs to be a power of 2

In this case, the L3 cache is 20-way associative.

Should I round it to the nearest power of 2, or is there another way?
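To make the trade-off behind rounding concrete, here is a small sketch that checks the cache geometry from the machine file above. The helper names are hypothetical (not part of kerncraft or pycachesim), and it assumes binary prefixes (k = 1024, M = 1024²), which matches the `sets` values listed in the file. Rounding only the associativity either shrinks the modelled capacity (if sets are kept) or changes the number of sets (if capacity is kept); which is preferable is a modelling decision this sketch does not settle.

```python
# Hypothetical helpers: sanity-check cache geometry from a machine file.

def is_power_of_two(n):
    """Return True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def sets_from_size(size_bytes, cl_size, ways):
    """Number of sets implied by total size, cache-line size and associativity."""
    lines, rem = divmod(size_bytes, cl_size)
    sets, rem2 = divmod(lines, ways)
    if rem or rem2:
        raise ValueError("size is not a multiple of cl_size * ways")
    return sets


KiB, MiB = 1024, 1024 * 1024
levels = {
    # level: (size in bytes, cl_size, ways) as given in the machine file
    "L1": (32 * KiB, 64, 8),
    "L2": (256 * KiB, 64, 8),
    "L3": (25 * MiB, 64, 20),
}

for name, (size, cl, ways) in levels.items():
    print(name, "sets:", sets_from_size(size, cl, ways),
          "ways is power of 2:", is_power_of_two(ways))

# Rounding L3 to 16 ways: keeping 20480 sets shrinks the capacity,
# keeping 25 MiB changes the number of sets.
print("16 ways x 20480 sets:", 20480 * 16 * 64 // MiB, "MiB")
print("25 MiB with 16 ways:", sets_from_size(25 * MiB, 64, 16), "sets")
```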

Support for #pragma

Pass #pragma directives to the compiler and ignore them otherwise. This requires modifying pycparser.

LC analysis fails due to order of indices (?)

When running:

kerncraft -p LC stencil.c -m Intel_Xeon_CPU_X5650_2.67GHz_mod.yml  -D M 1224 -D N 1224

I get:

================================== kerncraft ===================================
stencil.c -m Intel_Xeon_CPU_X5650_2.67GHz_mod.yml
-D M 1224 -D N 1224
-------------------------------------- LC --------------------------------------
Traceback (most recent call last):
  File "/home/hpc/ihpc/ihpc07/miniconda2/envs/myenv/bin/kerncraft", line 11, in <module>
    load_entry_point('kerncraft==0.6.0', 'console_scripts', 'kerncraft')()
  File "/home/hpc/ihpc/ihpc07/miniconda2/envs/myenv/lib/python3.5/site-packages/kerncraft/kerncraft.py", line 284, in main
    run(parser, args)
  File "/home/hpc/ihpc/ihpc07/miniconda2/envs/myenv/lib/python3.5/site-packages/kerncraft/kerncraft.py", line 250, in run
    model.analyze()
  File "/home/hpc/ihpc/ihpc07/miniconda2/envs/myenv/lib/python3.5/site-packages/kerncraft/models/layer_condition.py", line 201, in analyze
    raise ValueError("Can not apply layer condition, order of indices in array "
ValueError: Can not apply layer condition, order of indices in array does not follow order of loop indices. Single-dimension is currently not supported.

Content of stencil.c:

double a[M][N];
double b[M][N];
double W[M][N][2];

for(int j=1; j < M-1; j++){
for(int i=1; i < N-1; i++){
b[j][i] = W[j][i][0] * a[j][i]
+ W[j][i][1] * ((a[j][i-1] + a[j][i+1]) + (a[j-1][i] + a[j+1][i]))
;
}
}

Extra leading array indices do not work with --pmodel LC

The LC analysis in Kerncraft does not seem to cope well with arrays that have more (leading) indices than the number of loops in the nest:

$ kerncraft --machine machine-files/BroadwellEP_E5-2697_CoD.yml --pmodel LC -D M 500 -D N 500 -D L 500 --cache-predictor LC kernels/himeno.c 
================================== kerncraft ===================================
kernels/himeno.c                        -m machine-files/BroadwellEP_E5-2697_CoD.yml
-D L 500 -D M 500 -D N 500
-------------------------------------- LC --------------------------------------
Traceback (most recent call last):
  File "/home/gh/programming/python/bin/kerncraft", line 11, in <module>
    sys.exit(main())
  File "/home/gh/programming/python/lib/python3.5/site-packages/kerncraft/kerncraft.py", line 289, in main
    run(parser, args)
  File "/home/gh/programming/python/lib/python3.5/site-packages/kerncraft/kerncraft.py", line 255, in run
    model.analyze()
  File "/home/gh/programming/python/lib/python3.5/site-packages/kerncraft/models/layer_condition.py", line 222, in analyze
    raise ValueError("Can not apply layer-condition, order of indices in array "
ValueError: Can not apply layer-condition, order of indices in array does not follow order of loop indices. Single-dimension is currently not supported.

Single Report Generation

Generate a single report combining all information that can be gathered about a kernel on a specific architecture, including graphs and similar output.

Memory usage explodes

For long outer dimensions, memory usage explodes due to the recent changes for #26.

To solve this, never exceed the cache size in the number of iterations.

LC predictor does not complain in strided store access

Replace b[j][i] with b[i][j] in 2d-5pt.c and run:

kerncraft -p ECMData 2d-5pt.c -m ../machine-files/IvyBridgeEP_E5-2660v2.yml -D N 1000 -D M 1000 --cache-pre LC

It (wrongly) reports 6 / 6 / 0, whereas --cache-predictor SIM reports much higher numbers due to cache misses in L1 and L2.
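A rough illustration of why the strided store should cost more: the sketch below (not kerncraft code) counts the distinct cache lines touched by 64 inner-loop iterations for a row-major `double b[M][N]`, assuming N = 1000 as in the command above and 64-byte cache lines. The contiguous access b[j][i] stays within a handful of lines, while b[i][j] touches a new line on every iteration.

```python
# Minimal sketch: distinct cache lines touched by the innermost loop (index i)
# for a row-major array of doubles with N columns.

CL_SIZE = 64  # bytes per cache line (assumed)
ELEM = 8      # sizeof(double)


def lines_touched(index_fn, n_iters, row_len):
    """Count distinct cache lines touched over n_iters inner iterations.

    index_fn maps the inner index i to the (row, col) of the accessed element.
    """
    lines = set()
    for i in range(1, n_iters + 1):
        row, col = index_fn(i)
        byte_offset = (row * row_len + col) * ELEM
        lines.add(byte_offset // CL_SIZE)
    return len(lines)


N = 1000
j = 1  # fixed outer-loop index
contiguous = lines_touched(lambda i: (j, i), 64, N)  # b[j][i]
strided = lines_touched(lambda i: (i, j), 64, N)     # b[i][j]
print(contiguous, strided)
```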

likwid_bench_auto problems

I'm trying to generate a machine file for an Intel Skylake CPU (i5-6600) and keep running into problems. Using Python 2.7.9, I installed the latest kerncraft version with pip.

There is a typo in line 48 of the current likwid_bench_auto.py:

'cores per NUMA domain': codes_per_numa_domain,

There also seem to be problems with the PrefixedUnit type in lines 232 and 233, where the script stops with an error:

sizes_per_core = [t/cores[i] for i, t in enumerate(total_sizes)]
sizes_per_thread = [t/threads[i] for i, t in enumerate(total_sizes)]

Error:

./likwid_bench_auto > mf.yml
Traceback (most recent call last):
  File "./likwid_bench_auto", line 11, in <module>
    sys.exit(main())
  File "/mnt/home/stud-erha1011/workspace/kerncraft/kerncraft_env/local/lib/python2.7/site-packages/kerncraft/likwid_bench_auto.py", line 206, in main
    sizes_per_core = [t/cores[i] for i, t in enumerate(total_sizes)]
TypeError: unsupported operand type(s) for /: 'PrefixedUnit' and 'int'
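The TypeError means the value type defines no division operator for a plain number. The sketch below uses a hypothetical minimal stand-in class, NOT kerncraft's actual prefixedunit.PrefixedUnit, to show what such an operator could look like. Note that Python 2 dispatches `/` to `__div__` (unless `from __future__ import division` is active) while Python 3 uses `__truediv__`, so both are defined.

```python
# Hypothetical minimal stand-in for a prefixed-unit value; it only
# illustrates the operator that the failing list comprehension needs.

class PrefixedUnit(object):
    def __init__(self, value, prefix, unit):
        self.value = float(value)
        self.prefix = prefix
        self.unit = unit

    def __truediv__(self, other):
        # Dividing by a plain number scales the value, keeping prefix and unit.
        if isinstance(other, (int, float)):
            return PrefixedUnit(self.value / other, self.prefix, self.unit)
        return NotImplemented

    # Python 2 uses __div__ for "/" without "from __future__ import division".
    __div__ = __truediv__

    def __repr__(self):
        return "{} {}{}".format(self.value, self.prefix, self.unit)


total_sizes = [PrefixedUnit(32, 'k', 'B'), PrefixedUnit(25, 'M', 'B')]
cores = [1, 10]
sizes_per_core = [t / cores[i] for i, t in enumerate(total_sizes)]
print(sizes_per_core)  # [32.0 kB, 2.5 MB]
```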

Benchmark mode cy/CL output for Himeno is off by a factor of 2

emmy$ kerncraft -m machine-files/IvyBridgeEP_E5-2660v2.yml --pmodel Benchmark -D L 50 -D M 500 -D N 500 --cache-predictor LC --compiler icc kernels/himeno.c
[...]
Runtime (per cacheline update): 120.84 cy/CL
MEM volume (per repetition): 760030000 Byte
Performance: 4952.08 MFLOP/s
Performance: 145.65 MLUP/s
Performance: 145.65 It/s

The performance output (4952 MFLOP/s) does not match the runtime per CL output:

(16 LUPs / 121 cy) * 34 FLOP/LUP * 2.2 Gcy/s = 9.89 GFLOP/s

This is exactly twice the reported performance above. Based on my own benchmarks (and the ECM model analysis in kerncraft), I conclude that the cy/CL number is too small by a factor of 2, but the performance output is correct. I suspect this has to do with the Himeno benchmark using single-precision data.
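The arithmetic can be redone for both element sizes. This sketch takes the clock (2.2 GHz) and 34 FLOP/LUP from the formula above and assumes 64-byte cache lines, so a line holds 16 single-precision or 8 double-precision elements. With 120.84 cy/CL, 8 LUPs per cache line reproduces the reported 145.65 MLUP/s and roughly 4952 MFLOP/s, while 16 LUPs per cache line gives the ~9.9 GFLOP/s above. This suggests, but does not prove, that the reported MLUP/s and MFLOP/s assume double-sized elements while the cy/CL figure does not.

```python
# Worked check: performance implied by a cy/CL figure for a given element size.

CL_BYTES = 64        # cache-line size in bytes (assumed)
CY_PER_CL = 120.84   # reported runtime per cache line
FLOP_PER_LUP = 34    # from the formula in the report
CLOCK_HZ = 2.2e9     # from the formula in the report


def lups_per_cl(elem_bytes):
    """Lattice-site updates per cache line for a given element size."""
    return CL_BYTES // elem_bytes


def mlups(elem_bytes):
    return lups_per_cl(elem_bytes) / CY_PER_CL * CLOCK_HZ / 1e6


def mflops(elem_bytes):
    return mlups(elem_bytes) * FLOP_PER_LUP


for name, size in (("float", 4), ("double", 8)):
    print(name, lups_per_cl(size), "LUP/CL,",
          round(mlups(size), 2), "MLUP/s,", round(mflops(size), 2), "MFLOP/s")
```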

Problems with cache predictor persists for some cases

For the following kernel, for n between 7*1e5 and 13*1e5 there should be 1 evict and 1 miss, corresponding to the Y vector. In kerncraft this holds for approximately n=7*1e5 to n=8.0*1e5; then the evicts strangely drop to 0 until approximately n=11*1e5, after which they rise slowly until n reaches 13*1e5.

For instance, try kerncraft -p ECM -m ~/Emmy.yml -D n 1000000 -D s 4 LC_ilj_2.c --cache-predictor=SIM -vvv; here the evicts are 0.

LC_ilj.tar.gz

IACA analysis fails due to pointer_increment

When running:

 kerncraft -p ECM stencil.c -m Intel_Xeon_CPU_X5650_2.67GHz_mod.yml  -D M 1224 -D N 1224

I get:

================================== kerncraft ===================================
stencil.c -m Intel_Xeon_CPU_X5650_2.67GHz_mod.yml
-D M 1224 -D N 1224
------------------------------------- ECM --------------------------------------
IACA analysis failed: pointer_increment could not be detected automatically. Use --pointer-increment to set manually to byte offset of store pointer address between consecutive assembly block iterations.

Content of stencil.c:

double a[M][N];
double b[M][N];
double W[M][N][2];

for(int j=1; j < M-1; j++){
for(int i=1; i < N-1; i++){
b[j][i] = W[j][i][0] * a[j][i]
+ W[j][i][1] * ((a[j][i-1] + a[j][i+1]) + (a[j-1][i] + a[j+1][i]))
;
}
}

Offsets compilation error

kerncraft/examples$ kerncraft -p ECM -m machine-files/IvyBridgeEP_E5-2660v2.yml kernels/2d-5pt.c -D M 200000 -D N 20 --cache-predictor=SIM
================================== kerncraft ===================================
kernels/2d-5pt.c                        -m machine-files/IvyBridgeEP_E5-2660v2.yml
-D N 20 -D M 200000
------------------------------------- ECM --------------------------------------
Traceback (most recent call last):
  File "/home/hpc/unrz/unrza308/.conda/envs/my/bin/kerncraft", line 11, in <module>
    load_entry_point('kerncraft', 'console_scripts', 'kerncraft')()
  File "/home/hpc/unrz/unrza308/msc/prototype/kerncraft/kerncraft.py", line 294, in main
    run(parser, args)
  File "/home/hpc/unrz/unrza308/msc/prototype/kerncraft/kerncraft.py", line 261, in run
    model.analyze()
  File "/home/hpc/unrz/unrza308/msc/prototype/kerncraft/models/ecm.py", line 356, in analyze
    self._data.analyze()
  File "/home/hpc/unrz/unrza308/msc/prototype/kerncraft/models/ecm.py", line 152, in analyze
    self.calculate_cache_access()
  File "/home/hpc/unrz/unrza308/msc/prototype/kerncraft/models/ecm.py", line 87, in calculate_cache_access
    self.predictor = CacheSimulationPredictor(self.kernel, self.machine)
  File "/home/hpc/unrz/unrza308/msc/prototype/kerncraft/cacheprediction.py", line 282, in __init__
    iteration=range(bench_iteration_start, bench_iteration_end)))
  File "/home/hpc/unrz/unrza308/msc/prototype/kerncraft/kernel.py", line 421, in compile_global_offsets
    assert max(iteration) < self.subs_consts(total_length), \
ValueError: max() arg is an empty sequence
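Since max() over a range only raises this ValueError when the range is empty, range(bench_iteration_start, bench_iteration_end) must have been empty here, i.e. the end was not greater than the start. Why that happens for M=200000, N=20 is not answered by this sketch. It shows a hypothetical guarded check (not kerncraft code) that reports the empty window explicitly instead of failing inside max().

```python
# Hypothetical guarded version of the offset check in compile_global_offsets.

def check_iteration_window(start, end, total_length):
    """Validate a benchmark iteration window before using it."""
    iteration = range(start, end)
    if len(iteration) == 0:
        # max() on an empty range raises a ValueError like the one in the
        # traceback, which hides the actual cause; report it explicitly.
        raise ValueError(
            "empty benchmark iteration window: start=%d, end=%d" % (start, end))
    assert max(iteration) < total_length, "iteration window exceeds total length"
    return iteration


# An empty range reproduces the error from the traceback:
try:
    max(range(10, 10))
except ValueError as err:
    print("ValueError:", err)
```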

Deduplicate code in models

Cache simulation is duplicated in the ECM and Roofline models, as is the IACA instrumentation (and probably more).

Order YAML Entries for Easier Readability

The entries in a machine description file should follow a logical order, so that the data most interesting to human readers is at the top, while long tables of numbers are at the bottom.

phinally.yaml not available

The documentation (usage section) suggests downloading the machine file phinally.yaml via
wget https://raw.githubusercontent.com/RRZE-HPC/kerncraft/master/examples/machine-files/phinally.yaml

This returns:
ERROR 404: Not Found

The documentation should be updated.

Error while installing via pip

When trying to install it via pip:

pip install kerncraft
Collecting kerncraft
  Downloading kerncraft-0.5.10.tar.gz (189kB)
    100% |████████████████████████████████| 194kB 1.5MB/s 
Collecting ruamel.yaml<0.14.0,>=0.13.4 (from kerncraft)
  Using cached ruamel.yaml-0.13.14-cp27-cp27mu-manylinux1_x86_64.whl
Collecting six (from kerncraft)
  Using cached six-1.11.0-py2.py3-none-any.whl
Collecting sympy>=0.7.7 (from kerncraft)
Collecting pycachesim>=0.1.5 (from kerncraft)
Collecting pylru (from kerncraft)
Collecting numpy (from kerncraft)
  Using cached numpy-1.13.3-cp27-cp27mu-manylinux1_x86_64.whl
Collecting requests (from kerncraft)
  Using cached requests-2.18.4-py2.py3-none-any.whl
Collecting ruamel.ordereddict (from ruamel.yaml<0.14.0,>=0.13.4->kerncraft)
  Using cached ruamel.ordereddict-0.4.13-cp27-cp27mu-manylinux1_x86_64.whl
Collecting typing (from ruamel.yaml<0.14.0,>=0.13.4->kerncraft)
  Using cached typing-3.6.2-py2-none-any.whl
Collecting mpmath>=0.19 (from sympy>=0.7.7->kerncraft)
Collecting urllib3<1.23,>=1.21.1 (from requests->kerncraft)
  Using cached urllib3-1.22-py2.py3-none-any.whl
Collecting idna<2.7,>=2.5 (from requests->kerncraft)
  Using cached idna-2.6-py2.py3-none-any.whl
Collecting chardet<3.1.0,>=3.0.2 (from requests->kerncraft)
  Using cached chardet-3.0.4-py2.py3-none-any.whl
Collecting certifi>=2017.4.17 (from requests->kerncraft)
  Downloading certifi-2017.11.5-py2.py3-none-any.whl (330kB)
    100% |████████████████████████████████| 337kB 1.0MB/s 
Building wheels for collected packages: kerncraft
  Running setup.py bdist_wheel for kerncraft ... error
  Complete output from command /users/staff/ifi/guerrera/anaconda2/envs/myenv/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-hA1hbo/kerncraft/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpK5yDlrpip-wheel- --python-tag cp27:
  running bdist_wheel
  running build
  running build_py
  Build the lexing/parsing tables
  Traceback (most recent call last):
    File "_build_tables.py", line 15, in <module>
      ast_gen = ASTCodeGenerator('_c_ast.cfg')
    File "/tmp/pip-build-hA1hbo/kerncraft/kerncraft/pycparser/_ast_gen.py", line 24, in __init__
      for (name, contents) in self.parse_cfgfile(cfg_filename)]
    File "/tmp/pip-build-hA1hbo/kerncraft/kerncraft/pycparser/_ast_gen.py", line 42, in parse_cfgfile
      with open(filename, "r") as f:
  IOError: [Errno 2] No such file or directory: '_c_ast.cfg'
  creating build
  creating build/lib
  creating build/lib/kerncraft
  copying kerncraft/roofline-plot.py -> build/lib/kerncraft
  copying kerncraft/prefixedunit.py -> build/lib/kerncraft
  copying kerncraft/picklemerge.py -> build/lib/kerncraft
  copying kerncraft/machinemodel.py -> build/lib/kerncraft
  copying kerncraft/likwid_bench_auto.py -> build/lib/kerncraft
  copying kerncraft/kernel.py -> build/lib/kerncraft
  copying kerncraft/kerncraft.py -> build/lib/kerncraft
  copying kerncraft/intervals.py -> build/lib/kerncraft
  copying kerncraft/iaca_get.py -> build/lib/kerncraft
  copying kerncraft/iaca.py -> build/lib/kerncraft
  copying kerncraft/cachetile.py -> build/lib/kerncraft
  copying kerncraft/cacheprediction.py -> build/lib/kerncraft
  copying kerncraft/__init__.py -> build/lib/kerncraft
  creating build/lib/kerncraft/pycparser
  copying kerncraft/pycparser/lextab.py -> build/lib/kerncraft/pycparser
  copying kerncraft/pycparser/yacctab.py -> build/lib/kerncraft/pycparser
  copying kerncraft/pycparser/plyparser.py -> build/lib/kerncraft/pycparser
  copying kerncraft/pycparser/c_parser.py -> build/lib/kerncraft/pycparser
  copying kerncraft/pycparser/c_lexer.py -> build/lib/kerncraft/pycparser
  copying kerncraft/pycparser/c_generator.py -> build/lib/kerncraft/pycparser
  copying kerncraft/pycparser/c_ast.py -> build/lib/kerncraft/pycparser
  copying kerncraft/pycparser/ast_transforms.py -> build/lib/kerncraft/pycparser
  copying kerncraft/pycparser/_build_tables.py -> build/lib/kerncraft/pycparser
  copying kerncraft/pycparser/_ast_gen.py -> build/lib/kerncraft/pycparser
  copying kerncraft/pycparser/__init__.py -> build/lib/kerncraft/pycparser
  creating build/lib/kerncraft/models
  copying kerncraft/models/roofline.py -> build/lib/kerncraft/models
  copying kerncraft/models/layer_condition.py -> build/lib/kerncraft/models
  copying kerncraft/models/ecm.py -> build/lib/kerncraft/models
  copying kerncraft/models/benchmark.py -> build/lib/kerncraft/models
  copying kerncraft/models/__init__.py -> build/lib/kerncraft/models
  creating build/lib/kerncraft/pycparser/ply
  copying kerncraft/pycparser/ply/ygen.py -> build/lib/kerncraft/pycparser/ply
  copying kerncraft/pycparser/ply/yacc.py -> build/lib/kerncraft/pycparser/ply
  copying kerncraft/pycparser/ply/lex.py -> build/lib/kerncraft/pycparser/ply
  copying kerncraft/pycparser/ply/ctokens.py -> build/lib/kerncraft/pycparser/ply
  copying kerncraft/pycparser/ply/cpp.py -> build/lib/kerncraft/pycparser/ply
  copying kerncraft/pycparser/ply/__init__.py -> build/lib/kerncraft/pycparser/ply
  running egg_info
  writing requirements to kerncraft.egg-info/requires.txt
  writing kerncraft.egg-info/PKG-INFO
  writing top-level names to kerncraft.egg-info/top_level.txt
  writing dependency_links to kerncraft.egg-info/dependency_links.txt
  writing entry points to kerncraft.egg-info/entry_points.txt
  reading manifest file 'kerncraft.egg-info/SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  warning: no previously-included files found matching 'examples/kernels/*.c_compilable.c'
  writing manifest file 'kerncraft.egg-info/SOURCES.txt'
  creating build/lib/kerncraft/headers
  copying kerncraft/headers/dummy.c -> build/lib/kerncraft/headers
  copying kerncraft/headers/kerncraft.h -> build/lib/kerncraft/headers
  copying kerncraft/headers/timing.c -> build/lib/kerncraft/headers
  copying kerncraft/headers/timing.h -> build/lib/kerncraft/headers
  installing to build/bdist.linux-x86_64/wheel
  running install
  running install_lib
  creating build/bdist.linux-x86_64
  creating build/bdist.linux-x86_64/wheel
  creating build/bdist.linux-x86_64/wheel/kerncraft
  creating build/bdist.linux-x86_64/wheel/kerncraft/headers
  copying build/lib/kerncraft/headers/timing.h -> build/bdist.linux-x86_64/wheel/kerncraft/headers
  copying build/lib/kerncraft/headers/timing.c -> build/bdist.linux-x86_64/wheel/kerncraft/headers
  copying build/lib/kerncraft/headers/kerncraft.h -> build/bdist.linux-x86_64/wheel/kerncraft/headers
  copying build/lib/kerncraft/headers/dummy.c -> build/bdist.linux-x86_64/wheel/kerncraft/headers
  creating build/bdist.linux-x86_64/wheel/kerncraft/models
  copying build/lib/kerncraft/models/__init__.py -> build/bdist.linux-x86_64/wheel/kerncraft/models
  copying build/lib/kerncraft/models/benchmark.py -> build/bdist.linux-x86_64/wheel/kerncraft/models
  copying build/lib/kerncraft/models/ecm.py -> build/bdist.linux-x86_64/wheel/kerncraft/models
  copying build/lib/kerncraft/models/layer_condition.py -> build/bdist.linux-x86_64/wheel/kerncraft/models
  copying build/lib/kerncraft/models/roofline.py -> build/bdist.linux-x86_64/wheel/kerncraft/models
  creating build/bdist.linux-x86_64/wheel/kerncraft/pycparser
  creating build/bdist.linux-x86_64/wheel/kerncraft/pycparser/ply
  copying build/lib/kerncraft/pycparser/ply/__init__.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser/ply
  copying build/lib/kerncraft/pycparser/ply/cpp.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser/ply
  copying build/lib/kerncraft/pycparser/ply/ctokens.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser/ply
  copying build/lib/kerncraft/pycparser/ply/lex.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser/ply
  copying build/lib/kerncraft/pycparser/ply/yacc.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser/ply
  copying build/lib/kerncraft/pycparser/ply/ygen.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser/ply
  copying build/lib/kerncraft/pycparser/__init__.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser
  copying build/lib/kerncraft/pycparser/_ast_gen.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser
  copying build/lib/kerncraft/pycparser/_build_tables.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser
  copying build/lib/kerncraft/pycparser/ast_transforms.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser
  copying build/lib/kerncraft/pycparser/c_ast.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser
  copying build/lib/kerncraft/pycparser/c_generator.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser
  copying build/lib/kerncraft/pycparser/c_lexer.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser
  copying build/lib/kerncraft/pycparser/c_parser.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser
  copying build/lib/kerncraft/pycparser/plyparser.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser
  copying build/lib/kerncraft/pycparser/yacctab.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser
  copying build/lib/kerncraft/pycparser/lextab.py -> build/bdist.linux-x86_64/wheel/kerncraft/pycparser
  copying build/lib/kerncraft/__init__.py -> build/bdist.linux-x86_64/wheel/kerncraft
  copying build/lib/kerncraft/cacheprediction.py -> build/bdist.linux-x86_64/wheel/kerncraft
  copying build/lib/kerncraft/cachetile.py -> build/bdist.linux-x86_64/wheel/kerncraft
  copying build/lib/kerncraft/iaca.py -> build/bdist.linux-x86_64/wheel/kerncraft
  copying build/lib/kerncraft/iaca_get.py -> build/bdist.linux-x86_64/wheel/kerncraft
  copying build/lib/kerncraft/intervals.py -> build/bdist.linux-x86_64/wheel/kerncraft
  copying build/lib/kerncraft/kerncraft.py -> build/bdist.linux-x86_64/wheel/kerncraft
  copying build/lib/kerncraft/kernel.py -> build/bdist.linux-x86_64/wheel/kerncraft
  copying build/lib/kerncraft/likwid_bench_auto.py -> build/bdist.linux-x86_64/wheel/kerncraft
  copying build/lib/kerncraft/machinemodel.py -> build/bdist.linux-x86_64/wheel/kerncraft
  copying build/lib/kerncraft/picklemerge.py -> build/bdist.linux-x86_64/wheel/kerncraft
  copying build/lib/kerncraft/prefixedunit.py -> build/bdist.linux-x86_64/wheel/kerncraft
  copying build/lib/kerncraft/roofline-plot.py -> build/bdist.linux-x86_64/wheel/kerncraft
  running install_egg_info
  Copying kerncraft.egg-info to build/bdist.linux-x86_64/wheel/kerncraft-0.5.10-py2.7.egg-info
  running install_scripts
  error: [Errno 2] No such file or directory: 'LICENSE'
  
  ----------------------------------------
  Failed building wheel for kerncraft
  Running setup.py clean for kerncraft
Failed to build kerncraft

User-defined Initial Values for scalars and arrays

Since all scalars and arrays are initialized with the same value (0.23), the following code runs faster than expected:

(U_op - y[j]) / R

because U_op - y[j] is zero and the division takes fewer cycles to complete.

A solution would be to initialize the values randomly or to let the user define them.
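One way the random variant could look, as a sketch: a hypothetical helper (not part of kerncraft) that assigns each scalar or array a distinct pseudo-random value. A fixed seed keeps benchmark runs reproducible, and drawing from [0.5, 1.5] keeps values away from zero so divisors stay nonzero. How such values would be fed into the generated C code is left open here.

```python
import random


def initial_values(names, seed=0, low=0.5, high=1.5):
    """Hypothetical helper: one distinct pseudo-random initial value per name.

    A fixed seed makes the values reproducible across benchmark runs; the
    default range keeps them away from zero.
    """
    rng = random.Random(seed)
    values = {}
    for name in names:
        v = rng.uniform(low, high)
        while v in values.values():  # guard against (unlikely) duplicates
            v = rng.uniform(low, high)
        values[name] = v
    return values


vals = initial_values(["U_op", "y", "R"], seed=42)
print(vals["U_op"] - vals["y"])  # nonzero, unlike with a common value of 0.23
```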

Oscillating Cache Predictions

The predicted cache misses oscillate, probably due to cache-line alignment. So far I have not found any bugs in the cache simulator or prediction algorithms that would cause this behavior.

Assuming there are no bugs, possible solutions could be:

  • Simulate one complete inner-loop iteration, which should smooth out alignment effects.
  • Better alignment of the benchmark cache line, but what is the ideal alignment?

Extend search path for IACA

If possible, I would like an additional search path for IACA. I installed kerncraft via EasyBuild (a scientific software management tool), which makes the software available as a modulefile. I also installed IACA through EasyBuild. When I run kerncraft with the ECM model, I get the following error:

[guerrera@dmi-cl-login ~]$ kerncraft -p ECM code.c -m HaswellEX_E5-2695v3.yml -D M 200 -D N 200
================================== kerncraft ===================================
cazzo.c                                               -m HaswellEX_E5-2695v3.yml
-D M 200 -D N 200
------------------------------------- ECM --------------------------------------
IACA analysis failed: No IACA installation found in ['/users/staff/ifi/guerrera/.kerncraft/iaca/lin64/', '/users/staff/ifi/guerrera/.local/easybuild/software/kerncraft/20171205-foss-2017a-Python-3.6.1/lib/python3.6/site-packages/kerncraft-0.5.9-py3.6.egg/kerncraft/iaca/lin64/']. Run iaca_get command to fix this issue.

IACA is not installed in those paths, but it is installed and available in the environment at $EBROOTIACA.
Alternatively, kerncraft could simply test whether iaca is present: since it is in the PATH, it can be called directly as iaca and runs.

Is it possible to add it to the search path, or to test whether it is available?
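A possible lookup order, sketched as a hypothetical function (not kerncraft's actual code): first anything named `iaca` on PATH, then $EBROOTIACA (EasyBuild defines EBROOT<NAME> for loaded modules), then the directories kerncraft already searches. Whether the binary sits directly in $EBROOTIACA or in a bin/ subdirectory is an assumption, so both are checked.

```python
import os
import shutil


def find_iaca(default_dirs):
    """Hypothetical IACA lookup: PATH, then $EBROOTIACA, then default_dirs.

    Returns the path to an executable named 'iaca', or None if none is found.
    """
    # 1. Anything already on PATH (e.g. made available by a loaded module).
    on_path = shutil.which("iaca")
    if on_path:
        return on_path
    # 2. EasyBuild root of a loaded IACA module (layout assumed).
    candidates = []
    eb_root = os.environ.get("EBROOTIACA")
    if eb_root:
        candidates += [eb_root, os.path.join(eb_root, "bin")]
    # 3. The directories that are searched today (e.g. ~/.kerncraft/iaca/lin64/).
    candidates += list(default_dirs)
    for d in candidates:
        exe = os.path.join(d, "iaca")
        if os.path.isfile(exe) and os.access(exe, os.X_OK):
            return exe
    return None
```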

Lextab in the KernelCode

In kernel.py at line 583 you still have:

parser = CParser(lextab='kerncraft.pycparser.lextab',
                 yacctab='kerncraft.pycparser.yacctab')

Shouldn't this be changed after the recent change?

Generic Cache Simulation

Support caching behaviors other than LRU and inclusive caching, and possibly take associativity into account.
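As a starting point for the associativity part, here is a toy single-level, set-associative LRU cache. It is illustrative only, not pycachesim's design: inclusive/exclusive hierarchies and other replacement policies are out of scope. Because the set index is computed with a modulo, any positive number of ways and sets is accepted, including the 20-way L3 from the first issue.

```python
from collections import OrderedDict


class SetAssociativeLRU(object):
    """Toy single-level set-associative cache with LRU replacement."""

    def __init__(self, sets, ways, cl_size=64):
        self.sets, self.ways, self.cl_size = sets, ways, cl_size
        # One OrderedDict per set; key order runs from LRU to MRU.
        self.data = [OrderedDict() for _ in range(sets)]
        self.hits = self.misses = self.evictions = 0

    def access(self, address):
        """Access one byte address; return True on a hit."""
        line = address // self.cl_size
        s = self.data[line % self.sets]  # modulo works for any set count
        if line in s:
            s.move_to_end(line)          # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(s) >= self.ways:
            s.popitem(last=False)        # evict the least recently used line
            self.evictions += 1
        s[line] = None
        return False


# Fully associative, 3 ways: lines 0, 1, 2, 0 (hit), 3 (evicts 1), 1 (evicts 2)
c = SetAssociativeLRU(sets=1, ways=3)
for addr in (0, 64, 128, 0, 192, 64):
    c.access(addr)
print(c.hits, c.misses, c.evictions)  # 1 5 2
```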

IACA analysis incompatible with clang (3.8 and 4.0)

I have tried kerncraft (current checkout, 0.5.7) with the himeno.c code and clang:

(python) gh@einstein:~/programming/python/kerncraft/examples$ kerncraft --machine machine-files/BroadwellEP_E5-2697_CoD.yml --pmodel ECM -D M 50 -D N 50 -D L 500 --cache-predictor LC --compiler clang-4.0 kernels/himeno.c
[...]
IACA analysis failed: pointer_increment could not be detected automatically

This happens with clang 3.8 and 4.0.

#pragma directives not embedded correctly in generated C code

Pragmas are misplaced in the generated C code; I have attached an example. Without this pragma, kerncraft wrongly predicts the loop increment for this particular kernel with the Intel compiler, because the Intel compiler unrolls and jams the outer loop.
pragma_problem.tar.gz

This happens only in Benchmark mode. I ran the code as follows:
kerncraft -p Benchmark -m HaswellEX_E5-2695v3.yml -D N 2000000 -D s 4 irk_A_2_3loop.c

Improve handling of intermediate files

The goal is (and always was) to give the user intuitive and transparent access to intermediate generated files (namely a compilable C version for IACA analysis, an assembly version for IACA analysis, and a compilable C version with the LIKWID marker API for execution). The current approach is to create a $kernel_file_name.c_compilable.c for any compilable C code (either for IACA analysis or for execution with LIKWID) and a $kernel_file_name.c_compilable.s for assembly files. The C file differs depending on which performance model was run last.

I'm looking for a cleaner way to do this, especially one that is unambiguous about the origin and purpose of the intermediates and that does not clutter the kernels directory.
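One possible naming scheme, sketched as a hypothetical function (the directory name and kind labels are assumptions, not an existing kerncraft convention): put every intermediate under a separate build directory, keyed by kernel name, performance model, and purpose, so files from different models never overwrite each other and nothing is written next to the kernel source.

```python
import os

# Hypothetical mapping from purpose to file extension.
KINDS = {
    "iaca_c": ".c",    # compilable C for IACA analysis
    "iaca_asm": ".s",  # assembly for IACA analysis
    "likwid_c": ".c",  # compilable C with LIKWID marker API for execution
}


def intermediate_path(kernel_file, model, kind, build_root=".kerncraft_build"):
    """Return <build_root>/<kernel name>/<model>/<kind><ext>."""
    if kind not in KINDS:
        raise ValueError("unknown intermediate kind: %r" % kind)
    kernel_name = os.path.splitext(os.path.basename(kernel_file))[0]
    return os.path.join(build_root, kernel_name, model, kind + KINDS[kind])


print(intermediate_path("kernels/2d-5pt.c", "ECM", "iaca_asm"))
```

Kernels with the same file name in different directories would still collide; including a hash of the absolute kernel path in the directory name would be one way around that.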

incompatible with Python 3

kerncraft does not seem to be fully compatible with Python 3 (3.4.3).

E.g.:

  File "kerncraft.py", line 187, in run
    required_consts = [v[1] for v in kernel.variables.itervalues() if v[1] is not None]
AttributeError: 'dict' object has no attribute 'itervalues'
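dict.itervalues() no longer exists in Python 3, whereas dict.values() works in both versions (a list in Python 2, a view in Python 3). A sketch of the fix for the line above, with made-up example data standing in for kernel.variables:

```python
# Made-up example data; the real kernel.variables structure may differ.
variables = {'a': ('double', ('M', 'N')), 's': ('double', None)}

# values() exists in Python 2 and 3; itervalues() was removed in Python 3.
required_consts = [v[1] for v in variables.values() if v[1] is not None]
print(required_consts)  # [('M', 'N')]
```

If lazy iteration on Python 2 matters, six.itervalues(d) from the six package (already pulled in as a dependency in the pip log above) behaves the same on both versions.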
