trackreco / mkfit

Vectorized, Parallelized Tracking

Home Page: https://trackreco.github.io/

License: Apache License 2.0

C++ 85.48% C 9.02% Makefile 0.21% Perl 1.28% Python 1.02% Shell 2.89% PHP 0.09%

mkfit's People

Contributors

areinsvo, cerati, dan131riley, davidlange6, imacneill, kmcdermo, leonardogiannini, makortel, mmasciov, osschar, pwittich, slava77, slava77devel, srlantz, tresreid


mkfit's Issues

include -ipo in optimization options

@brnorris03 pointed out that along with -O3 it would be good to specify -ipo to enable icc to do interprocedural optimizations. This allows inlining of functions that are defined in other files. It could be pretty important for vectorization, as icc won't vectorize a loop that contains function calls unless the functions are (a) standard ones like sin or cos or (b) able to be inlined.
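A purely illustrative sketch of the kind of loop this affects; pretend scale() is defined in another .cc file (it is defined here only to keep the sketch one file):

```cpp
#include <cstddef>

// Hypothetical illustration of the inlining point above. If scale() lived in
// another translation unit, -O3 alone would let the compiler see only its
// declaration: it cannot inline the call, so icc refuses to vectorize the
// loop. With icc -ipo (or gcc link-time optimization) the definition becomes
// visible across files and the loop can be vectorized.
double scale(double x) { return 2.0 * x + 1.0; }

double sumScaled(const double* v, std::size_t n) {
  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i)
    sum += scale(v[i]);  // vectorizable only if scale() can be inlined
  return sum;
}
```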

The closest equivalent for gcc is link-time optimization, -flto (optionally combined with -fwhole-program). Including this is a little awkward because Makefile.config has only one set of optimization options (OPT) that applies to both compilers. (Maybe the structure of Makefile.config should be revisited. As of now it assumes the compiler is icc if INTEL_COMPILER_LICENSE exists in the environment. There is also a bunch of old MIC stuff in there.)

Migrate to multi-threaded ROOT6 for validation

Not so much an issue, more of a sticky for an update to the validation. At the moment, PlotValidation.cpp (the main code for reading the trees and making the physics performance plots) is all serial and takes quite a bit of time to run.

With the developments in ROOT6 for parallelism + multithreading, I want to update the code at some point to take advantage of this. The tutorials from ROOT provide some good examples: https://root.cern.ch/doc/v612/group__tutorial__multicore.html

Namely, parallelize the loop over entries when reading the tree by using a TTreeReaderValue, making a vector of TThreadedObjects, or something like this.
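As a stdlib-only sketch of the pattern that ROOT's TThreadedObject encapsulates (fill thread-local objects, merge once at the end); the Entry struct and countAbove name are illustrative stand-ins for reading tree entries and filling a histogram:

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Illustrative stand-in for one tree entry's payload.
struct Entry { double pt; };

// Each thread fills its own partial result, mirroring TThreadedObject<TH1F>:
// no sharing inside the loop, a single merge step at the end.
long countAbove(const std::vector<Entry>& entries, double ptCut, unsigned nThreads) {
  std::vector<long> partial(nThreads, 0);
  std::vector<std::thread> pool;
  const std::size_t chunk = (entries.size() + nThreads - 1) / nThreads;
  for (unsigned t = 0; t < nThreads; ++t) {
    pool.emplace_back([&, t] {
      const std::size_t lo = t * chunk;
      const std::size_t hi = std::min(entries.size(), lo + chunk);
      for (std::size_t i = lo; i < hi; ++i)
        if (entries[i].pt > ptCut) ++partial[t];  // thread-local "fill"
    });
  }
  for (auto& th : pool) th.join();
  return std::accumulate(partial.begin(), partial.end(), 0L);  // merge
}
```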

This is rather low priority for me. It would be a fun side project, when I get time (don't know when...), to dive in and multithread the validation code. Considering our project is all about MT, this would be a nice exercise in what ROOT has under the hood for doing these sorts of tasks.

Building and fitting

I wanted to open an issue to track this until we decide what we want to do with building and fitting. There are three distinct options:

  1. Separate building and fitting routines (current situation)
  2. Combined building and fitting routines (previous situation)
  3. Keep options for both (have building and fitting separate, and also have them together) --> would require careful logic and command line options

If we have b+f together in a single task (even as an option alongside the split routines), we would need to abstract the copying to and from the Matriplex data structures and the normal track vectors. This is because a combined b+f would have to reuse the event_of_comb_candidates, which would overwrite the track parameters between building and fitting. Since we wish to validate the track quality at each stage separately, the copy in/out would have to be part of the TBB task.

This is not the end of the world, as we can always make the copy in/out configurable, depending on whether validation is being run. For the timing benchmarks, we can decide whether it is fair to exclude such a section from the measured time.
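A minimal sketch of what the validation-gated copy-out could look like (names here are hypothetical, not the actual mkfit classes):

```cpp
#include <vector>

// Hypothetical sketch: snapshot the post-building track parameters only when
// validation is enabled, so a combined build+fit task can reuse the candidate
// structures without losing the intermediate state needed to validate each
// stage separately. TrackParams and BuildFitTask are illustrative names.
struct TrackParams { double pt; };

struct BuildFitTask {
  bool runValidation;                    // configurable: copy out only if set
  std::vector<TrackParams> afterBuild;   // snapshot taken between build and fit

  void run(std::vector<TrackParams>& tracks) {
    // ... building would fill 'tracks' here ...
    if (runValidation)
      afterBuild = tracks;               // copy-out before the fit overwrites it
    for (auto& t : tracks) t.pt *= 1.01; // stand-in for the backward fit update
  }
};
```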

We discussed this a bit at the last meeting, Jan 26, so I will link the relevant slides here (flip to slide 8): https://indico.cern.ch/event/690884/contributions/2836372/attachments/1589129/2516956/BkFit_260118.pdf

@osschar @slava77 @dan131riley @cerati

Removal of obsolete branches

Can you please delete branches you created that are no longer relevant? It's probably best if everyone deletes their own branches, including the merged ones.

I just did a cleanup of my branches: removed about 10 and kept 7 that I might still need. I'll eventually move those to a clone of this repo, as it makes no sense to keep them here.

Parallelizing seed preparation methods

As mentioned in PR #95 , I think we should start thinking about how we would parallelize the functions outside of the standard building routines if we want to start comparing total wall clock time for nEvents using multiple events in flight.

This mainly would be the sections dedicated to preparing the seeds for the building. I already mentioned using parallel_for's inside the clean_cms_seedtracks. We could also do the same for:

  1. clean_cms_simtracks: https://github.com/cerati/mictest/blob/full-det-tracking/Event.cc#L627
  2. create_seeds_from_simtracks: https://github.com/cerati/mictest/blob/full-det-tracking/mkFit/MkBuilder.cc#L319
  3. import_seeds: https://github.com/cerati/mictest/blob/full-det-tracking/mkFit/MkBuilder.cc#L407-L419
  4. map_seed_hits: https://github.com/cerati/mictest/blob/full-det-tracking/mkFit/MkBuilder.cc#L1004

Although to really take advantage of the parallel_for's, we would probably need to switch to tbb::concurrent_vector for the seed track containers. In fact, I have shown the potential of such containers before with find_seeds().

For 4., we also do not have to map hits for every layer... just those that correspond to the list of seed layers given to us by the plugin. Presumably it is fairly trivial to write a function that returns (or fills by reference) a vector of the seed layer indices, to reduce the number of mappings. This would also be a nice feature for remap_seed_hits() .
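A sketch of such a seed-layer-index helper (the types are simplified stand-ins, not the actual mkfit classes):

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper (not in MkBuilder today): collect the unique layer
// indices touched by the seed tracks, so map_seed_hits / remap_seed_hits
// can remap only those layers instead of all of them.
struct SeedHit { int layer; };
using SeedTrack = std::vector<SeedHit>;

std::vector<int> seedLayerIndices(const std::vector<SeedTrack>& seeds) {
  std::vector<int> layers;
  for (const auto& s : seeds)
    for (const auto& h : s)
      layers.push_back(h.layer);
  std::sort(layers.begin(), layers.end());                              // sort...
  layers.erase(std::unique(layers.begin(), layers.end()), layers.end()); // ...and dedupe
  return layers;
}
```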

Adapt benchmark scripts to try more nVU

Following the studies from @srlantz and myself, there was a request to adapt the benchmarking scripts to try nVU sizes larger than the vector width, i.e. testing the effects of having things loaded into L1 vs L2.

Mostly a straightforward change to the scripts. We will need to check the optimal nVU for each platform to use in the nTH tests.

Studies are here: https://indico.cern.ch/event/783877/contributions/3261832/attachments/1825229/2986852/MatriplexSizesAndCaches.pdf

Validation fixups

Per discussions over the last few weeks, there are a few things that need to be cleaned up in the validation.

  • Create "fair" comparison between CMSSW and mkFit in simtrack validation by using the same seed collection
    • Currently, CMSSW uses full CMSSW seed collection, while mkFit uses the n^2 seed collection
    • This matters because we mark simtracks "findable" if they share 4 hits with a seed track, meaning CMSSW has a larger denom than mkFit
    • Proposed fix: use full CMSSW seed collection for marking findability
  • Per an internal thread raised by @mmasciov , we need to account for pixel hits found in building as valid hits, rather than excluding them from the hit matching denom
    • The is_seed variable (a bool signifying if a hit is on a seeding layer, i.e. in iter0 if it is a pixel hit) automatically excludes these hits in hit/layer counting
    • However, both mkFit tracks and CMSSW tracks can find additional pixel hits in the building step if they are in the transition region --> unfair to exclude these hits from counting
    • Proposed fix: Simply add a check in nUniqueLayers() to allow counting of is_seed layers after Config::nlayers_per_seed (==4) has been reached
    • Sticky point: it is possible that a CMSSW reco track picks up 1-2 pixel layers during outward building, but the backward fit then cleans out some of the original pixel seed hits.
  • Figure out why axis titles are being dropped (part of PR #143)
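A simplified sketch of the proposed nUniqueLayers() change (the real function works on the validation code's hit/layer arrays; this only illustrates the counting logic):

```cpp
#include <vector>

// Illustrative sketch: count unique layers, excluding the first
// nlayers_per_seed is_seed layers (the actual seed), but allowing further
// is_seed layers (pixel hits picked up during building) to count.
struct HitOnLayer { int layer; bool is_seed; };
constexpr int nlayers_per_seed = 4;  // == Config::nlayers_per_seed in iter0

int nUniqueLayers(const std::vector<HitOnLayer>& hits) {  // hits sorted by layer
  int n = 0, seedSeen = 0, lastLayer = -1;
  for (const auto& h : hits) {
    if (h.layer == lastLayer) continue;       // same layer twice: not unique
    lastLayer = h.layer;
    if (h.is_seed && seedSeen < nlayers_per_seed) {
      ++seedSeen;                             // one of the 4 seed layers: excluded
      continue;
    }
    ++n;  // non-seed layer, or a seeding-layer hit found after the seed: counts
  }
  return n;
}
```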

Makefile.config inconsistencies between icc and gcc

When AVX_512 is defined in the make command line, the compiler options for gcc and icc become, respectively (from Makefile.config):

VEC_GCC  := -mavx512f -mavx512cd
VEC_ICC := -xHost -qopt-zmm-usage=high

However, these sets of options do not do the same thing: -mavx512f -mavx512cd (gcc) translates to -xCOMMON-AVX512 (icc), whereas -xHost (icc) translates to -march=native (gcc). The latter means using instructions appropriate to the processor that is running the compilation. This actually brings in quite a few more AVX-512 instructions on SKL-SP vs. KNL (i.e., beyond the subset common to both). It's also troubling that on SNB, specifying make AVX_512:=1 produces code that has no AVX-512 instructions at all.

Relevant comments already appear in PR #165, especially:

  • @dan131riley's comment that -xCOMMON-AVX512 was intentionally replaced with -xHost in PR #140 due to an apparent icc compiler bug (reference to issue #139), and
  • @kmcdermo's comment that in the benchmarking scripts we don't cross-compile anymore, we scp the source directly to each test machine and compile locally.

Whatever the history, I think we want to keep these options consistent between icc and gcc. There are a couple of ways to do this.

  1. The icc compiler bug may be fixed by now. If so, we specify -xCOMMON-AVX512 again for icc when AVX_512:=1 and include a comment about which versions of icc have trouble with it.
  2. We make a new distinction in Makefile.config, so that icc uses -xCORE-AVX512 for Skylake-SP and -xMIC-AVX512 for KNL. Similarly, gcc would use -march=skylake-avx512 and -march=knl in those cases (these options are valid in recent gcc versions). The icc options are equivalent to -xHost on each AVX-512 platform, so they should not trigger the compiler bug in icc versions that are affected by it. Accordingly, we would eliminate AVX_512 from make and split it into (say) CORE-AVX512 and MIC-AVX512.

In addition, we could consider changing the default vectorization options to be -xHost (icc) and -march=native (gcc). This may be more sensible than what we currently do, which is to specify the lowest common denominator of all the available processors.

Differences between mkFit standalone validation vs MTV

Main Issue

This has been an open issue for a while now, and I wanted to make one place where we have some plots for reference.

The main hypothesis is that we are losing short tracks with non-positive-definite covariance matrices. Namely, the loss in efficiency of mkFit tracks compared to CMSSW tracks, as seen in MTV, is due to these tracks being dropped in the producer that interfaces between mkFit output and MTV input. These wacky covariances come from the backward fit within mkFit, and in particular seem to affect shorter tracks more than longer ones.

The MTV results are in stark contrast to our standalone validation, in which we see near-identical performance between mkFit and CMSSW for pT > 0.9 GeV, and significantly better performance from mkFit compared to CMSSW in the barrel for pT > 0 GeV.

To illustrate this hypothesis, I am shamelessly borrowing some slides from Allie's CMS Week talk that highlight the differences in definitions between the mkFit standalone validation and MTV, as well as some plots of efficiency vs eta and pT. In addition, I have attached the efficiency vs number of layers from MTV, from Giuseppe.

Varying Requirements with mkFit Validation

I also made some plots varying the definition of matching and good tracks. Keeping the definition of hit matching the same, the results are below:

I also changed the definition of the hit matching from 50% after the seed, to 75% including the seed (to better approximate MTV), with the results below:

I made some very quick slides demonstrating the different minimum layer requirements and hit matching schemes. As can be seen, we definitely do not do as well with lower layer requirements on tracks, nor with the 75% matching criterion. This is most notable at low pT, but it affects the full pT and eta spectrum.
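For reference, the two matching definitions can be sketched as follows (simplified to a per-hit matched flag; integer arithmetic avoids floating-point thresholds):

```cpp
#include <vector>

// Illustrative comparison of the two matching definitions: both structs and
// function names are simplified stand-ins for the validation code.
struct RecoHit { bool matched; bool is_seed; };

// 50% of the hits after the seed must match (standalone default).
bool match50AfterSeed(const std::vector<RecoHit>& hits) {
  int n = 0, good = 0;
  for (const auto& h : hits)
    if (!h.is_seed) { ++n; if (h.matched) ++good; }
  return n > 0 && 2 * good >= n;           // good/n >= 0.5
}

// 75% of all hits, seed included (closer to the MTV definition).
bool match75WithSeed(const std::vector<RecoHit>& hits) {
  int good = 0;
  for (const auto& h : hits) if (h.matched) ++good;
  return !hits.empty() && 4 * good >= 3 * static_cast<int>(hits.size());
}
```

A track with a fully matched 4-hit seed and half of its building hits matched passes the first definition but can fail the second, which is the direction of the efficiency loss described above.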

Main takeaways

  • mkFit struggles to find short sim tracks (i.e. tracks with ~10 layers including the seed)
  • mkFit tends to add bad hits for longer tracks, as seen in worsening efficiency with the 75% matching threshold with all reco hits

Proposed studies to determine where mkFit starts to fall off w.r.t. CMSSW

  • Dedicated study scanning the efficiency by varying the hit matching threshold with and without the seed
  • Dedicated study scanning the efficiency by varying the nLayers requirements for sim and reco tracks

Recipes

To produce the SimVal plots like those above for quick comparisons, simply run the validation script as normal (using the forConf parameter):

./val_scripts/validation-cmssw-benchmarks.sh forConf

You can save time by dropping the CMSSWVal, and then drop the irrelevant directories and plots in the web scripts by using the diffs below:

Below is a list of diffs used to make changes to minimum layer requirements and hit matching:

All of the diffs are .txt files (they really should be .patch files, but GH Markdown won't let me upload that extension). To apply a patch file directly, do:

git apply <name_of_text_file>

Fix application of material effects

As discussed in PR #190, there is room for improvement when applying the material effects. On the docket for relatively straightforward-to-implement changes are the following:

  1. When in a split stereo layer, skip applying material effects in propagation to 2nd part of split layer if the 1st split layer had material effects applied.
  2. Use the dot product of dR/|dZ| and the track momentum to apply the correct material constants when going forward or backward.

To achieve 1), we can encode this in the layer plan, which will have a flag to skip material effects on the second half of a split layer. However, if we missed a hit on the previous layer, no material effects will be applied for this layer.

This technicality was already mentioned in the PR: if we do not find a hit on a layer n and then propagate to n+1, the material effects are applied only for n -> n+1, where they should be for n-1 -> n+1 (i.e., integrate the full material passed). Making such a tool is non-trivial but could be done. It would require a look-up of the last hit layer, getting the material constants for each crossed layer, and then applying the constants in one fell swoop.
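The integration idea can be sketched as follows (the radiation-length table and function name are made up for illustration):

```cpp
#include <cmath>
#include <vector>

// Hypothetical sketch of "integrate the full material passed": when the last
// hit was on layer n-1 and we propagate directly to n+1, sum the material of
// every layer crossed instead of only the last step. The table values are
// invented for this example.
std::vector<double> radLenByLayer = {0.02, 0.03, 0.025, 0.04};

double integratedRadLen(int lastHitLayer, int targetLayer) {
  double sum = 0.0;
  for (int l = lastHitLayer + 1; l <= targetLayer; ++l)
    sum += radLenByLayer[l];  // accumulate every crossed layer, hit or not
  return sum;
}
```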

turbo off on phi3

As of the latest meeting, the consensus within the group is to leave turbo off on phi3 and turn it on only when running tests. If so, I suggest we update the documentation, run the benchmarking, and declare a new baseline (if in fact turbo is off on phi3).

Phi outside of [-pi, pi] in newest data (with ccc, I think)

I did this quick fix ... but maybe we should squash it at the source.

matevz@phi3 ksegv> git diff
diff --git a/mkFit/HitStructures.cc b/mkFit/HitStructures.cc
index 9bcb127..98b65db 100644
--- a/mkFit/HitStructures.cc
+++ b/mkFit/HitStructures.cc
@@ -157,7 +157,7 @@ void LayerOfHits::SuckInHits(const HitVec &hitv)
       curr_phi_bin = 0;
     }

-    int phi_bin = GetPhiBin(ha[j].phi);
+    int phi_bin = std::min(std::max(GetPhiBin(ha[j].phi), 0), Config::m_nphi - 1);

     if (phi_bin > curr_phi_bin)
     {
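A sketch of squashing it at the source, i.e. normalizing phi into [-pi, pi) before binning rather than clamping the bin index afterwards (where exactly this would be applied in the mkfit code is left open):

```cpp
#include <cmath>

// Normalize phi into [-pi, pi) so GetPhiBin never sees an out-of-range value.
// Sketch only; the loop form is fine here because inputs are expected to be
// at most a few multiples of 2*pi away from the range.
float wrapPhi(float phi) {
  const float kPi = 3.14159265358979f;
  const float kTwoPi = 2.0f * kPi;
  while (phi >=  kPi) phi -= kTwoPi;
  while (phi <  -kPi) phi += kTwoPi;
  return phi;
}
```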

Return to use of benchmarking scripts

Hi all,

As we have discussed in previous meetings, now that full-det-tracking has been merged, we ought to get back to regularly benchmarking the code: first, for checking diffs between PRs, and second, to better understand our performance in general. These two goals can conflict if we want to do the measurements in a manageable amount of time. So, I put together a comprehensive list of the benchmarks we might want for the first point under "core benchmarking", while the second point can be expanded upon for conference results, wowing tracking conveners, and for torturing oneself waiting for the plethora of results to come back and hopelessly trying to correlate them.

Core benchmarking:

  • CMSSW building only (to start... see below): Slava's new 4.5k event TTbar + PU70 sample
  • SNB, KNC, KNL
  • vs numThreadsFinder [numThreadsEvents == 1], time only building routine, intrinsics + max VU
  • vs numThreadsEvents [vary numThreadsFinder], time overall loop time, intrinsics + max VU
  • vs numVU
  • physics validation test + nHits across platforms

--> What about ToyMC?
At the moment we do not have the toyMC steering in place, so this is a bit of a moot point until then... Otherwise, consider doing all the same benchmarks on a similar number of events with ~2k tracks/ev.

--> What about fitting?
Well, in the past we used this as a measure of our best possible performance, which we could still do, although we often achieved unrealistic results by having events with an unrealistic number of tracks. Once we reduced to a reasonable amount, the performance severely degraded. Of course, with multiple events in flight (or event mixing), this should somewhat recover. My personal feeling is that this

--> What about the untuned sections for multiple events in flight?
Namely, the mapping and remapping hits functions. We touched on this before, but there is no way to really drop these sections from timing measurements in this case...

--> End to end tracking?
Ideally, we would like to measure the time for reading in seeds, fitting seeds, building candidates, then backwards fitting candidates, as this would be a demonstration of "end to end" tracking.

--> What about various numa control options on KNL?
This is probably more a question for @slava77 @srlantz (and @dan131riley @osschar ). If we start to add a few, then we again have to multiply the number of tests... maybe just one setting to start? Specifically the one that returned the fastest times already from @slava77 , namely using the high-bandwidth MCDRAM.

@mpbl , I do not want to neglect GPUs here; however, I am not sure how best to test for effects on GPU performance. Do you have any thoughts?

Recovering short tracks continuing saga: adding overlapping hits + outlier rejection

As discussed extensively on the group chat, we are proposing a three-step plan for our algorithm to improve the efficiency of short tracks (on top of later tunings of layer window settings, chi2, etc.).

The proposal is the following:

  1. Forward propagation with one hit per layer as we are already doing with mkFit.
  2. Backward propagation to search for extra hits, but do not update when extra hits are added, as suggested originally by @cerati and @areinsvo.
  3. Perform our own final fit (forward, backward, zig-zag, etc) to perform outlier rejection + obtain final track parameters as bumped by @IHateLinus .

For starters, we will rely on CMSSW to give us the final fit with outlier rejection, so we should focus on implementing 2. As @mmasciov pointed out, this means that the mtv-like-val in standalone validation will still be sub-optimal, but hopefully improved. We can always run MTV on the CMSSW side with our tracks after the final fit in CMSSW to see how we do. However, the hope is that by adding some extra hits, even without outlier rejection, we will raise our shared-hits fraction and help recover some efficiency for short tracks.

To implement 2 properly, we will perhaps need to extend the current max length of the hit array from 32 to something greater, so as not to overwrite the last hit index in the array, which would defeat the whole purpose of appending to the list (although now that we have stopped appending -1s ad nauseam, it is probably okay...).

We would then have to adapt the current backward fit in MkBuilder to have a window search in between the propagate and update step. This would require that the search is only over hits that are on the opposing overlapping section, so the hits would need an extra bit to say which side of the overlap they are on.

For 3., we can think on the technical implementation. We could still do this in MkBuilder, or pass the completed built tracks to MkFitter (although all the issues of fitting with different nhits / cand will need to be re-addressed). Perhaps it is better to just re-use the final candidates again from the backward propagation to do the final, final fit with outlier rejection.

In any case, in order to remain vectorized, we can perform the propagation + update regardless of hit goodness, but choose whether to store the update based on an evaluation of hit goodness on each layer in outlier rejection (and if the hit is to be rejected, replace it in the hit list with a -4 or something and simply keep the previous updated parameters).
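A sketch of that always-compute, conditionally-commit pattern (the width, names, and -4 marker are illustrative; this is not the actual Matriplex code):

```cpp
#include <array>

// Illustrative sketch: run propagate+update across all lanes unconditionally,
// then commit per lane only if the hit passes the outlier check, so the loop
// stays branch-light and vectorizable.
constexpr int NN = 8;  // stand-in for the Matriplex width

void conditionalUpdate(std::array<double, NN>& params,
                       const std::array<double, NN>& updated,
                       const std::array<bool, NN>& hitIsGood,
                       std::array<int, NN>& hitIdx) {
  for (int n = 0; n < NN; ++n) {
    params[n] = hitIsGood[n] ? updated[n] : params[n];  // keep previous params
    hitIdx[n] = hitIsGood[n] ? hitIdx[n] : -4;          // mark the rejected hit
  }
}
```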

This is tied to Issues #195, #193, #196, and #71 (although more indirectly).

Problems with parameterized magnetic field

As we have seen a few times now, the efficiency for both building and backward fit degrades when the parameterized magnetic field is turned on. It seems to affect the transition region the most, and is less noticeable elsewhere.

There have already been some ideas on how to check what is happening. I wanted to post this here for quick feedback. Please feel free to add insights as they come.

  • Check the value of the magnetic field we expect against the value actually computed for each propagation.

For reference, I am linking the delta for the magnetic field map between a static 3.8 T field and the parameterized one made from a simple macro (magfield.C): https://kmcdermo.web.cern.ch/kmcdermo/keep_old_mictrk/magfield/fieldmap/

Deadlock in jemalloc in CMSSW_10_4_0

(moving to an issue from e-mail)

Running mkFit from CMSSW_10_4_0_patch1 (single thread, single event in flight) leads to a deadlock in jemalloc in certain circumstances. A recipe to reproduce on phi3 is below

source /cvmfs/cms.cern.ch/cmsset_default.sh
source /opt/intel/bin/compilervars.sh intel64

cmsrel CMSSW_10_4_0_patch1
pushd CMSSW_10_4_0_patch1/src
cmsenv
git cms-init
popd

git clone [email protected]:cerati/mictest
pushd mictest
git remote add makortel [email protected]:makortel/mictest.git
git fetch makortel
git checkout -b cmsswTo104x makortel/cmsswTo104x
TBB_PREFIX=$(dirname $(cd $CMSSW_BASE && scram tool tag tbb INCLUDE)) make -j 12 AVX_512:=1
popd

pushd CMSSW_10_4_0_patch1/src
cat <<EOF >mkfit.xml
<tool name="mkfit" version="1.0">
  <client>
    <environment name="MKFITBASE" default="$PWD/../../mictest"/>
    <environment name="LIBDIR" default="\$MKFITBASE/lib"/>
    <environment name="INCLUDE" default="\$MKFITBASE"/>
  </client>
  <runtime name="MKFIT_BASE" value="\$MKFITBASE"/>
  <lib name="MicCore"/>
  <lib name="MkFit"/>
</tool>
EOF
scram setup mkfit.xml
cmsenv
git cms-remote add makortel
git fetch makortel
git checkout -b mkfit_1040p1 makortel/mkfit_1040p1
git cms-addpkg RecoTracker/MkFit Validation/RecoTrack
scram b -j 12

cp /data2/mkortela/step3.py .
cmsRun step3.py 

The job will hang after the following printout

08-Feb-2019 08:04:20 PST  Initiating request to open file file:/data2/mkortela/cmssw_samples/slava77/CMSSW_10_4_0_patch1-orig/11024.0_TTbar_13/AVE_70_BX01_25ns/step2_raw.root
08-Feb-2019 08:04:22 PST  Successfully opened file file:/data2/mkortela/cmssw_samples/slava77/CMSSW_10_4_0_patch1-orig/11024.0_TTbar_13/AVE_70_BX01_25ns/step2_raw.root
TrackerInfo::ExecTrackerInfoCreatorPlugin processing '/home/users/mkortela/CMSSW_10_4_0_patch1/src/../../mictest/Geoms/CMS-2017.so'
CMS-2017 -- Create_TrackerInfo finished
Begin processing the 1st record. Run 1, Event 27, LumiSection 1 on stream 0 at 08-Feb-2019 08:04:27.000 PST
%MSG-e TkDetLayers:  SeedingLayersEDProducer:initialStepSeedLayers  08-Feb-2019 08:04:44 PST Run: 1 Event: 27
 ForwardDiskSectorBuilderFromDet: Trying to build Petal Wedge from Dets at different z positions !! Delta_z = -0.950417
%MSG

and killing it gives the following stack trace

Thread 2 (Thread 0x7faf2f446700 (LWP 3347097)):
#0  0x00007faf4d47d279 in waitpid () from /lib64/libpthread.so.0
#1  0x00007faf44a5d057 in edm::service::cmssw_stacktrace_fork() () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/pluginFWCoreServicesPlugins.so
#2  0x00007faf44a5db3a in edm::service::InitRootHandlers::stacktraceHelperThread() () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/pluginFWCoreServicesPlugins.so
#3  0x00007faf4da5f8ff in std::execute_native_thread_routine (__p=0x7faf45727f00) at ../../../../../libstdc++-v3/src/c++11/thread.cc:83
#4  0x00007faf4d475e25 in start_thread () from /lib64/libpthread.so.0
#5  0x00007faf4d19fbad in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7faf4b87f480 (LWP 3346971)):
#0  0x00007faf4d194f0d in poll () from /lib64/libc.so.6
#1  0x00007faf44a5d587 in full_read.constprop () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/pluginFWCoreServicesPlugins.so
#2  0x00007faf44a5dc1c in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/pluginFWCoreServicesPlugins.so
#3  0x00007faf44a5ec87 in sig_dostack_then_abort () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00007faf4d47c51b in __lll_lock_wait () from /lib64/libpthread.so.0
#6  0x00007faf4d477e1b in _L_lock_812 () from /lib64/libpthread.so.0
#7  0x00007faf4d477ce8 in pthread_mutex_lock () from /lib64/libpthread.so.0
#8  0x00007faf4ea52ed9 in malloc_mutex_lock_final (mutex=0x7faf4b606218) at include/jemalloc/internal/mutex.h:141
#9  je_malloc_mutex_lock_slow (mutex=mutex@entry=0x7faf4b606218) at src/mutex.c:84
#10 0x00007faf4ea14468 in malloc_mutex_lock (mutex=0x7faf4b606218, tsdn=0x7faf4b878878) at include/jemalloc/internal/mutex.h:205
#11 je_arena_tcache_fill_small (tsdn=tsdn@entry=0x7faf4b878878, arena=arena@entry=0x7faf4b600980, tcache=tcache@entry=0x7faf4b878a38, tbin=tbin@entry=0x7faf4b878c88, binind=binind@entry=24, prof_accumbytes=prof_accumbytes@entry=0) at src/arena.c:1261
#12 0x00007faf4ea72274 in je_tcache_alloc_small_hard (tsdn=tsdn@entry=0x7faf4b878878, arena=arena@entry=0x7faf4b600980, tcache=tcache@entry=0x7faf4b878a38, tbin=tbin@entry=0x7faf4b878c88, binind=binind@entry=24, tcache_success=tcache_success@entry=0x7ffeccebe5ef) at src/tcache.c:93
#13 0x00007faf4ea0a8a7 in tcache_alloc_small (slow_path=false, zero=false, binind=24, size=<optimized out>, tcache=0x7faf4b878a38, arena=<optimized out>, tsd=<optimized out>) at include/jemalloc/internal/tcache_inlines.h:60
#14 arena_malloc (slow_path=false, tcache=0x7faf4b878a38, zero=false, ind=24, size=<optimized out>, arena=0x0, tsdn=<optimized out>) at include/jemalloc/internal/arena_inlines_b.h:94
#15 iallocztm (slow_path=false, arena=0x0, is_internal=false, tcache=0x7faf4b878a38, zero=false, ind=24, size=<optimized out>, tsdn=<optimized out>) at include/jemalloc/internal/jemalloc_internal_inlines_c.h:53
#16 imalloc_no_sample (ind=24, usize=2048, size=<optimized out>, tsd=0x7faf4b878878, dopts=<synthetic pointer>, sopts=<synthetic pointer>) at src/jemalloc.c:1709
#17 imalloc_body (tsd=0x7faf4b878878, dopts=<synthetic pointer>, sopts=<synthetic pointer>) at src/jemalloc.c:1905
#18 imalloc (dopts=<synthetic pointer>, sopts=<synthetic pointer>) at src/jemalloc.c:2005
#19 malloc (size=size@entry=2048) at src/jemalloc.c:2038
#20 0x00007faf4ea77109 in newImpl<false> (size=2048) at src/jemalloc_cpp.cpp:78
#21 operator new (size=2048) at src/jemalloc_cpp.cpp:87
#22 0x00007faf4073c6ce in DTGeometry::add(DTSuperLayer*) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/libGeometryDTGeometry.so
#23 0x00007faf2af5a04b in DTGeometryBuilderFromCondDB::build(std::shared_ptr<DTGeometry> const&, RecoIdealGeometry const&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/libGeometryDTGeometryBuilder.so
#24 0x00007faf2af79d9d in DTGeometryESModule::setupDBGeometry(DTRecoGeometryRcd const&, std::shared_ptr<edm::ESProductHost<DTGeometry, MuonNumberingRecord, DTRecoGeometryRcd> >&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/pluginDTGeometryESModule.so
#25 0x00007faf2af7a481 in DTGeometryESModule::produce(MuonGeometryRecord const&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/pluginDTGeometryESModule.so
#26 0x00007faf2af7d92b in edm::eventsetup::CallbackProxy<edm::eventsetup::Callback<DTGeometryESModule, std::shared_ptr<DTGeometry>, MuonGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<MuonGeometryRecord> >, MuonGeometryRecord, std::shared_ptr<DTGeometry> >::getImpl(edm::eventsetup::EventSetupRecordImpl const&, edm::eventsetup::DataKey const&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/pluginDTGeometryESModule.so
#27 0x00007faf4feb4471 in edm::eventsetup::DataProxy::get(edm::eventsetup::EventSetupRecordImpl const&, edm::eventsetup::DataKey const&, bool, edm::ActivityRegistry const*) const () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/libFWCoreFramework.so
#28 0x00007faf4fe8daf6 in edm::eventsetup::EventSetupRecordImpl::getFromProxy(edm::eventsetup::DataKey const&, edm::eventsetup::ComponentDescription const*&, bool) const () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/libFWCoreFramework.so
#29 0x00007faf27903847 in MuonDetLayerGeometryESProducer::produce(MuonRecoGeometryRecord const&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/pluginRecoMuonDetLayersPlugins.so
#30 0x00007faf279061b7 in edm::eventsetup::CallbackProxy<edm::eventsetup::Callback<MuonDetLayerGeometryESProducer, std::unique_ptr<MuonDetLayerGeometry, std::default_delete<MuonDetLayerGeometry> >, MuonRecoGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<MuonRecoGeometryRecord> >, MuonRecoGeometryRecord, std::unique_ptr<MuonDetLayerGeometry, std::default_delete<MuonDetLayerGeometry> > >::getImpl(edm::eventsetup::EventSetupRecordImpl const&, edm::eventsetup::DataKey const&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/pluginRecoMuonDetLayersPlugins.so
#31 0x00007faf4feb4471 in edm::eventsetup::DataProxy::get(edm::eventsetup::EventSetupRecordImpl const&, edm::eventsetup::DataKey const&, bool, edm::ActivityRegistry const*) const () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/libFWCoreFramework.so
#32 0x00007faf4fe8daf6 in edm::eventsetup::EventSetupRecordImpl::getFromProxy(edm::eventsetup::DataKey const&, edm::eventsetup::ComponentDescription const*&, bool) const () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/libFWCoreFramework.so
#33 0x00007faf2aeb7a13 in GlobalDetLayerGeometryESProducer::produce(RecoGeometryRecord const&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/pluginTrackingToolsRecoGeometryPlugins.so
#34 0x00007faf2aeb8f77 in edm::eventsetup::CallbackProxy<edm::eventsetup::Callback<GlobalDetLayerGeometryESProducer, std::unique_ptr<DetLayerGeometry, std::default_delete<DetLayerGeometry> >, RecoGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<RecoGeometryRecord> >, RecoGeometryRecord, std::unique_ptr<DetLayerGeometry, std::default_delete<DetLayerGeometry> > >::getImpl(edm::eventsetup::EventSetupRecordImpl const&, edm::eventsetup::DataKey const&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/pluginTrackingToolsRecoGeometryPlugins.so
#35 0x00007faf4feb4471 in edm::eventsetup::DataProxy::get(edm::eventsetup::EventSetupRecordImpl const&, edm::eventsetup::DataKey const&, bool, edm::ActivityRegistry const*) const () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/libFWCoreFramework.so
#36 0x00007faf4fe8daf6 in edm::eventsetup::EventSetupRecordImpl::getFromProxy(edm::eventsetup::DataKey const&, edm::eventsetup::ComponentDescription const*&, bool) const () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/libFWCoreFramework.so
#37 0x00007faf2a71481c in (anonymous namespace)::KFTrajectoryFitterESProducer::produce(TrajectoryFitterRecord const&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/pluginTrackingToolsTrackFittersPlugins.so
#38 0x00007faf2a71421f in edm::eventsetup::CallbackProxy<edm::eventsetup::Callback<(anonymous namespace)::KFTrajectoryFitterESProducer, std::unique_ptr<TrajectoryFitter, std::default_delete<TrajectoryFitter> >, TrajectoryFitterRecord, edm::eventsetup::CallbackSimpleDecorator<TrajectoryFitterRecord> >, TrajectoryFitterRecord, std::unique_ptr<TrajectoryFitter, std::default_delete<TrajectoryFitter> > >::getImpl(edm::eventsetup::EventSetupRecordImpl const&, edm::eventsetup::DataKey const&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/pluginTrackingToolsTrackFittersPlugins.so
#39 0x00007faf4feb4471 in edm::eventsetup::DataProxy::get(edm::eventsetup::EventSetupRecordImpl const&, edm::eventsetup::DataKey const&, bool, edm::ActivityRegistry const*) const () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/libFWCoreFramework.so
#40 0x00007faf4fe8daf6 in edm::eventsetup::EventSetupRecordImpl::getFromProxy(edm::eventsetup::DataKey const&, edm::eventsetup::ComponentDescription const*&, bool) const () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/libFWCoreFramework.so
#41 0x00007faf2a72a5fa in (anonymous namespace)::KFFittingSmootherESProducer::produce(TrajectoryFitterRecord const&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/pluginTrackingToolsTrackFittersPlugins.so
#42 0x00007faf2a72a0de in edm::eventsetup::CallbackProxy<edm::eventsetup::Callback<(anonymous namespace)::KFFittingSmootherESProducer, std::unique_ptr<TrajectoryFitter, std::default_delete<TrajectoryFitter> >, TrajectoryFitterRecord, edm::eventsetup::CallbackSimpleDecorator<TrajectoryFitterRecord> >, TrajectoryFitterRecord, std::unique_ptr<TrajectoryFitter, std::default_delete<TrajectoryFitter> > >::getImpl(edm::eventsetup::EventSetupRecordImpl const&, edm::eventsetup::DataKey const&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/pluginTrackingToolsTrackFittersPlugins.so
#43 0x00007faf4feb4471 in edm::eventsetup::DataProxy::get(edm::eventsetup::EventSetupRecordImpl const&, edm::eventsetup::DataKey const&, bool, edm::ActivityRegistry const*) const () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/libFWCoreFramework.so
#44 0x00007faf4fe8daf6 in edm::eventsetup::EventSetupRecordImpl::getFromProxy(edm::eventsetup::DataKey const&, edm::eventsetup::ComponentDescription const*&, bool) const () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/libFWCoreFramework.so
#45 0x00007faf2a72609a in (anonymous namespace)::FlexibleKFFittingSmootherESProducer::produce(TrajectoryFitterRecord const&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/pluginTrackingToolsTrackFittersPlugins.so
#46 0x00007faf2a724fde in edm::eventsetup::CallbackProxy<edm::eventsetup::Callback<(anonymous namespace)::FlexibleKFFittingSmootherESProducer, std::unique_ptr<TrajectoryFitter, std::default_delete<TrajectoryFitter> >, TrajectoryFitterRecord, edm::eventsetup::CallbackSimpleDecorator<TrajectoryFitterRecord> >, TrajectoryFitterRecord, std::unique_ptr<TrajectoryFitter, std::default_delete<TrajectoryFitter> > >::getImpl(edm::eventsetup::EventSetupRecordImpl const&, edm::eventsetup::DataKey const&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/pluginTrackingToolsTrackFittersPlugins.so
#47 0x00007faf4feb4471 in edm::eventsetup::DataProxy::get(edm::eventsetup::EventSetupRecordImpl const&, edm::eventsetup::DataKey const&, bool, edm::ActivityRegistry const*) const () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/libFWCoreFramework.so
#48 0x00007faf4fe8daf6 in edm::eventsetup::EventSetupRecordImpl::getFromProxy(edm::eventsetup::DataKey const&, edm::eventsetup::ComponentDescription const*&, bool) const () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/libFWCoreFramework.so
#49 0x00007faf16c4ca39 in TrackProducerBase<reco::Track>::getFromES(edm::EventSetup const&, edm::ESHandle<TrackerGeometry>&, edm::ESHandle<MagneticField>&, edm::ESHandle<TrajectoryFitter>&, edm::ESHandle<Propagator>&, edm::ESHandle<MeasurementTracker>&, edm::ESHandle<TransientTrackingRecHitBuilder>&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/pluginRecoTrackerTrackProducerPlugins.so
#50 0x00007faf16c46d14 in TrackProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/pluginRecoTrackerTrackProducerPlugins.so
#51 0x00007faf4ffbd353 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventPrincipal const&, edm::EventSetup const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/libFWCoreFramework.so
#52 0x00007faf4feee602 in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventPrincipal const&, edm::EventSetup const&, edm::ModuleCallingContext const*) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/libFWCoreFramework.so
#53 0x00007faf4fe8793a in decltype ({parm#1}()) edm::convertException::wrap<bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/libFWCoreFramework.so
#54 0x00007faf4fe87afd in bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/libFWCoreFramework.so
#55 0x00007faf4fe891fb in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/libFWCoreFramework.so
#56 0x00007faf4fe8a2a4 in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/libFWCoreFramework.so
#57 0x00007faf4e7c5176 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all (this=0x7faf4ab6b200, parent=..., child=<optimized out>) at ../../src/tbb/custom_scheduler.h:521
#58 0x00007faf4ff3c070 in edm::EventProcessor::processLumis(std::shared_ptr<void> const&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/libFWCoreFramework.so
#59 0x00007faf4ff451a2 in edm::EventProcessor::runToCompletion() () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0/lib/slc7_amd64_gcc700/libFWCoreFramework.so
#60 0x000000000040fe8a in main::{lambda()#1}::operator()() const ()
#61 0x000000000040e162 in main ()

Current Modules:

Module: TrackProducer:initialStepTracks (crashed)

Typically deadlocks in jemalloc are caused by memory management errors elsewhere (like delete instead of delete[] of an array).
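As a minimal illustration of that class of bug (the struct and counter here are purely hypothetical, just to show the correct pairing):

```cpp
#include <cassert>

struct Hit {
  static int live;       // counts constructed-but-not-destroyed objects
  Hit()  { ++live; }
  ~Hit() { --live; }
};
int Hit::live = 0;

// Correct pairing: memory obtained with new[] must be released with
// delete[], which runs every element's destructor and hands the array
// allocation back to the allocator intact. Writing `delete hits;` here
// instead would be undefined behavior and can corrupt allocator
// (e.g. jemalloc) metadata, surfacing much later as a deadlock.
void good_free() {
  Hit* hits = new Hit[16];
  delete[] hits;  // NOT `delete hits;`
}
```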

Minor issue: update file header to include presence of sim trackstates

This is a really small and minor issue that can be solved at some point in the future (namely because it means regenerating files).

In the spirit of unifying the code, I noticed that the read-in of the extra simTrackStates used in the ROOT validation for pulls does not follow the same routine as other extra sections like seeds and reco tracks.

Compare
simtrackstates: https://github.com/cerati/mictest/blob/devel/Event.cc#L479-L485
vs.
seeds: https://github.com/cerati/mictest/blob/devel/Event.cc#L502-L517

I wrote a small piece of code to make this more uniform, but realized this would then change the file format (as it changes the header size):
https://github.com/kmcdermo/mictest/blob/cmd-opts-cleanup/Event.cc#L468-L485
https://github.com/kmcdermo/mictest/blob/cmd-opts-cleanup/Event.h#L82

When we see an opportunity to regenerate files, we might want to make this change.

Revamp of benchmark scripts

As discussed today, we decided to do the following from now for the default benchmarking and validation for both PRs and conferences:

  • Drop phi1 and phi2 as test platforms (still available, but disabled by default)
  • Drop SIMVAL and CMSSWVAL
  • Add SIMVAL + MTV-Like, SIMVAL + MTV-Like w/ seed matching

I would like to raise the issue of #220 again. Do we want to process more events (or just fix the number of events)? We have seen that with more events processed per core, the thread utilization increases. For PRs, so we don't take forever, we can still keep 20 evs / core. For conference plots, we could increase this to 120 evs / core, and then do some of the manipulation that @makortel did in his stress tests.

The benchmarks are rather fast anyways, so bumping to 120 evts would not be the end of the world. The validation takes the longest since it is not really optimized.

Strange MEIF parallelization behavior for CE

As reported by Dan on 16/02/18, the multiple events in flight parallelization benchmark for CE shows odd behavior when run on Cornell's new SKX: https://indico.cern.ch/event/690887/contributions/2836403/attachments/1602175/2540493/dsr-skylake-2018-02-16.pdf (slide 6).

What happens is that for nTH = nEV, for nEV > 1, the speedup/time for that point is behaving as if it has only one event in flight, then jumps to "expected behavior" for nTH > nEV. FV does not display such behavior.

It seems we may be seeing hints of this on KNL: https://kmcdermo.web.cern.ch/kmcdermo/pr126/MultEvInFlight/KNL_CMSSW_TTbar_PU70_CE_MEIF_speedup.png

Except now, it is doing this for nTH = nEV, for nEV > 8... This should be investigated, considering this is not seen for FV. Although the loop structures are the same between CE and FV, there could be some pernicious scheduling bug.

AVX-512 Broken?

For some reason, even with all the latest fixes protecting against NaNs, we see a serious loss of hits/track on phi2 (KNL) with AVX-512 enabled (at least with max nThreads): https://kmcdermo.web.cern.ch/kmcdermo/catching-nans-4-pt2/PlotsFromDump/CMSSW_TTbar_PU70_CE_nHits.png

This is true for BH, STD, CE. This most likely explains the enormous vectorization speedup on phi2: https://kmcdermo.web.cern.ch/kmcdermo/catching-nans-4-pt2/Benchmarks/KNL_CMSSW_TTbar_PU70_VU_speedup.png

A first test would be to make the same plot with nTh=1, to isolate multithreading from AVX-512, and perhaps making the same plot for nVU=2,4,8 nTh=1.

If I remember correctly, @slava77 previously reported seeing lots of new NaNs with max vectorization that @osschar did not see, but perhaps this was because we focused efforts on phiphi and not phi2.

The use of label()

I wanted to lay out for good the use of the label() in our code and propose what to do for future samples. I will copy this into the validation-desc.txt and push it directly to the repo.

tl;dr

I explain the full logic of the label, and its relation to the various track collections depending on the input configuration and time of execution.

I am proposing at the end, based on all of this, to have the external CMSSW tracks have their label == the seedID of the seed from which they originated, in order to make one-to-one comparisons. In other words, if extRecTracks[i].label() == seedTracks_[j].label(), then we know that this CMSSW track originated from that seed.

Introduction

The label currently has multiple meanings depending on the type of track and where it is in the pipeline between seeding, building, and validation. I wanted to write this from the perspective of the validation.

To begin, allow me to map out the differences in inputs for the various validation sequences, and the associated track associator function:

  1. ToyMC Geom + sim seeds: setMCTrackIDInfoByLabel()
  2. CMSSW Geom + sim seeds (--geom CMS-2017): setMCTrackIDInfoByLabel()
  3. ToyMC Geom + found seeds (--find-seeds): setMCTrackIDInfo()
  4. CMSSW Geom + cmssw seeds (--geom CMS-2017 --cmssw-seeds): setMCTrackIDInfo()
  5. CMSSW Geom + cmssw seeds + external CMSSW tracks as reference: setCMSSWTrackIDInfo()

Important note about hits and relation to the label:

As a reminder, all hits that originate from a simulated particle will have a mcHitID_ >= 0. This is the index to the vector of simHitInfo_, where each element of the vector contains additional information about the hit. Most importantly, it stores the mcTrackID_ that the hit originated from.

As such, the following must be respected for the tracks inside simTracks_: label_ == mcTrackID_ == position inside the track vector. If the simTracks_ are moved, shuffled, sorted, deleted, etc., this means that the matching of candidate tracks via mcTrackID_'s via hits via mcHitID_ will be ruined!
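A minimal sketch of a consistency check for this invariant (the SimTrack struct and function name here are hypothetical, not the real API):

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal stand-in for the sim track fields relevant here.
struct SimTrack {
  int label_;
  int mcTrackID_;
};

// The invariant described above: for simTracks_, label_ == mcTrackID_ ==
// position inside the vector. A check like this would catch accidental
// sorting/shuffling before it silently breaks hit-based matching.
bool simTracksConsistent(const std::vector<SimTrack>& simTracks) {
  for (std::size_t i = 0; i < simTracks.size(); ++i) {
    if (simTracks[i].label_ != static_cast<int>(i) ||
        simTracks[i].mcTrackID_ != static_cast<int>(i))
      return false;
  }
  return true;
}
```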

Case 1. and 2.: setMCTrackIDInfoByLabel()

In both 1. and 2., the seeds are generated from the simtracks, and as such their label_ == mcTrackID_. Before the building starts, the seeds can be moved around and into different structures. Regardless, for each seed, a candidate track is created with its label_ equal to the label_ of the seed it originated from. At the end of building, the candidate tracks are dumped into their conventional candidateTracks_ collection. At this point, the label_ of the track may not be pointing to its position inside the vector, but still uniquely identifies it as to which seed it came from.

So we then create a TrackExtra for the track, storing the label_ as the seedID, and then reassign the label_ of track to be its position inside the candidate track vector. We actually do this for the seed and fit tracks also. Each track collection has an associated track extra collection, indexed the same such that candidateTracks_[i] has an associated candidateTracksExtra_[i].

The associator is run for each candidate track, using the fact that the now stored seedID_ also points to the correct mcTrackID_ this candidate was created from, and counting the number of hits in the candidate track (after the seed) that match this id. If more than 50% are matched, the candidate track's track extra sets mcTrackID_ == seedID_.

We then produce two maps to map the candidate tracks:

  1. simToCandidates:
  • map key = mcTrackID
  • mapped value = vector of candidate track label_'s, where the label_'s now represent the positions in the candidate vector for tracks who have the mcTrackID_ in question
  2. seedToCandidates:
  • map key = seedID_
  • mapped value = label_ of candidate track (again, the label_ now being the position inside the track vector)

These maps are then used to get the associated sim and reco information for the trees.
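A sketch of how the two maps above could be filled, with hypothetical names standing in for the real TrackExtra fields:

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical minimal track-extra record; in the real code these fields
// live in TrackExtra objects paired index-for-index with candidateTracks_.
struct CandExtra {
  int label_;      // position inside the candidate vector (after reassignment)
  int seedID_;     // original label_, stored before reassignment
  int mcTrackID_;  // set by the associator on a successful hit match, else < 0
};

// Fill the two association maps described above.
void buildMaps(const std::vector<CandExtra>& extras,
               std::map<int, std::vector<int>>& simToCandidates,
               std::map<int, int>& seedToCandidates) {
  for (const auto& ex : extras) {
    if (ex.mcTrackID_ >= 0)
      simToCandidates[ex.mcTrackID_].push_back(ex.label_);
    seedToCandidates[ex.seedID_] = ex.label_;
  }
}
```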

Case 3. and 4.: setMCTrackIDInfo()

In both 3. and 4., the seedTracks are not intrinsically related to the simTracks_. For 3., the seeds are generated from find_seeds(), and the label_ assigned to the track is just the index at which the seed was created. For 4., the label_ is the seedID assigned by CMSSW.

However, it can already be the case that the label_ does not equal the position inside the track vector! So the building proceeds in the same manner as 1. and 2., where each seed first generates a single candidate track with a label_ equal to the seed label_, which happens to be its seedID_. The candidateTracks_ are dumped out in some order, where the label_ is still the seedID_.

We then generate a TrackExtra for each candidate track (and seed and fit tracks), with the seedID_ set to the label_, then reassigning the label_ to be the position inside the track vector.

The associator is run, now just counting how many hits on the candidate track are matched to a single mcTrackID. If the fraction of hits matching a single mcTrackID is greater than 75%, then the track extra mcTrackID_ is set to the matched mcTrackID.

The associate maps are then used in the same fashion as described above.

Case 5.: setCMSSWTrackIDInfo()

This is the case most in flux. The CMSSW seeds are read in, and then cleaned. If a surviving seed has a label_ > 0, the label stays the same, and becomes its seedID_. In the case of the N^2 cleaning, some seeds may remain which have a label_ == -1. Since there might be more than one and we want to uniquely identify them after building, we reassign the label_'s with an increasing negative number.

So the first seed track with label_ == -1 keeps label_ == -1, the second track with label_ == -1 then has a new label_ == -2, the third is assigned label_ == -3, etc.
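The relabeling described above might look like this as a minimal sketch (the Seed struct and function name are hypothetical):

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal seed record.
struct Seed {
  int label_;
};

// Seeds surviving the N^2 cleaning with label_ == -1 get unique,
// decreasing negative labels so each can still be identified uniquely
// after building. The first stays -1, the next becomes -2, and so on.
void relabelUnmatchedSeeds(std::vector<Seed>& seeds) {
  int nextNeg = -1;
  for (auto& s : seeds) {
    if (s.label_ == -1)
      s.label_ = nextNeg--;
  }
}
```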

The building proceeds, tracks are dumped out, track extra seedID_ are set to the track candidate label_, and the label_ is reassigned to the track's position inside the vector.

We also take the chance to generate a track extra for the CMSSW tracks, storing the label_ as the seedID_, and reassigning the label_ to the CMSSW track's position inside the extRecTracks_ vector. I want to make clear here though, I believe the original label_ of the CMSSW track is not actually the seedID, but something else. That, or the original label of the seedTracks is not actually its seedID either (I thought it was its mcTrackID?).

The candidate track to CMSSW associator is run, matching by chi2 and dphi. If a track finds at least one CMSSW track with a match, the cmsswTrackID_ is set to the label_ of the CMSSW track. We then produce a map of the CMSSW tracks to the candidate mkFit tracks.

cmsswToCandidates:

  • map key = cmsswTrackID (which is now the position of a cmssw track in extRecTracks_)
  • mapped value = vector of candidate track label_'s, where the label_'s now represent the positions in the candidate vector for tracks who have the cmsswTrackID_ in question

Proposal

--> Because it appears that the seedIDs of the cmsswTracks != seedIDs of the candidate tracks (even among those that share the same seed hits!), if we want to study one-to-one with CMSSW seeds, we must make the labels agree. So I propose that the extRecTracks_ have a label == the label of the track in seedTracks_ that generated said extRecTrack.

binary file feature requests 1Q18v1

Some details were requested recently. This issue is to collect them and see if all could make it to the next update of the bin files.

  • CMSSW seed chi2
    • [Apr 16] The seed chi2 is not going to be available. It is not computed anywhere during the seed reconstruction in CMSSW.
    • the proposed solution is to refit the seeds inside mkFit and compute the chi2 here
  • post-CPE residuals for CMSSW tracks
  • beam spot parameters: needed to have a better reference point for back-propagated tracks
    • Ntuple: this has been available in the ntuple for a while now in the bsp_ branches
    • Binary file: todo
  • (related to the above) perhaps add vertices used as inputs to the given iteration
  • normalized cluster charge for CCC implementation
    • Ntuple:
      • [May 4] code is available in slava77/cmssw:CMSSW_9_1_0_pre1/tkNtuple4micNtkLayout c93773a
      • [Summer 2018] trackingNtuples are available, e.g. /data2/slava77/samples/2017/pass-c93773a
    • Binary file:
      • files with a cut applied are available from #162 (Sep 2018)
      • Addition of the cluster charge to mkFit event content data is left for a possible future addition

Please post other missing things, I'll later update the list in this issue description.

Validation tasks for follow-up

Although we merged PR #122 , I wanted to post this as a reminder of some remaining issues with the validation.

  • Decide which PCA estimation we wish to use. This decision will also influence the issue #121. A description of possible computations for the PCA are here: https://github.com/cerati/mictest/files/1654557/pca.pdf

  • Optimize cuts for track parameter matching, used for matching CMSSW reco tracks to backward fit tracks.

    • Currently the cuts are in a 2-param chi2 (1/pt, eta) + dphi for the forward built tracks, and were optimized with pure CMSSW tracks and our forward built tracks. We know this breaks down at low pT as the dphi swimming yields NaNs.
    • Currently each cut is binned in pT only
    • May wish to add bins of pt/eta OR investigate possibility of 3-param chi2: i.e. include momentum phi in the chi2, and drop the dphi window
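As a rough sketch of the current matching scheme described above — a 2-parameter chi2 in (1/pt, eta) plus a dphi window — where all names, uncertainties, and cut values are placeholders, not the tuned ones:

```cpp
#include <cassert>
#include <cmath>

// Illustrative 2-parameter chi2 in (1/pt, eta) plus a dphi window.
// sigInvpt/sigEta and the cut values are hypothetical placeholders.
bool matchesCMSSW(float invpt, float eta, float phi,
                  float refInvpt, float refEta, float refPhi,
                  float sigInvpt, float sigEta,
                  float chi2Cut, float dphiCut) {
  const float dInvpt = (invpt - refInvpt) / sigInvpt;
  const float dEta   = (eta - refEta) / sigEta;
  const float chi2   = dInvpt * dInvpt + dEta * dEta;

  // dphi window, folded so the difference never exceeds pi
  float dphi = std::fabs(phi - refPhi);
  if (dphi > static_cast<float>(M_PI))
    dphi = 2.0f * static_cast<float>(M_PI) - dphi;

  return chi2 < chi2Cut && dphi < dphiCut;
}
```

A 3-parameter chi2 (including momentum phi) would fold the dphi term into the sum instead of applying it as a separate window.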

What to do about Track

As has been discussed for a while (and most recently last Friday), there is interest in redoing the Track class by splitting it into a base candidate class and an inherited final candidate class.

The base class would have minimal information passed around for mkFit building needs (parameters, covariance, score up to some amount of bits, last updated hit, etc), and at the very end of building, the final candidate would contain the rest (full list of hits, full chi2, other status objects, etc).
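A minimal sketch of the proposed split, with hypothetical names and field choices (not a definitive layout, especially pending the possible 5x5 representation):

```cpp
#include <cassert>
#include <vector>

// Lightweight base: only what the hot building loop needs to carry around.
struct TrackCandBase {
  float params[6];  // track parameters (would shrink with a 5x5 layout)
  float cov[21];    // packed symmetric covariance
  float score;      // candidate score, possibly truncated to fewer bits
  int   lastHitIdx; // last updated hit
  int   lastHitLyr;
};

// Final candidate, filled only at the very end of building: the full
// hit list, full chi2, and other status bookkeeping.
struct TrackCandFinal : TrackCandBase {
  std::vector<int> hitIdxs;
  std::vector<int> hitLyrs;
  float totalChi2;
  int   status;
};
```

The point of the split is that the base stays small enough to copy cheaply during combinatorial cloning, while the bulky per-hit vectors are materialized once per surviving candidate.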

Granted, we may have a seismic shift coming soon in the form of a 5x5 block diagonal representation from our 6x6.

Recovering efficiency for short tracks

This issue is to track ideas to try to recover the efficiency for short tracks. Note that these are not monitored in the standard validation, see issue #193 to find the instructions to lower our selection.

  • A first set of changes could be to change maxHolesPerCand from 12 to 3 and chi2Cut from 30 to 15 (as already studied by Mario). The effect can be seen by comparing the plots here to those posted by Kevin. In particular, it's interesting to see the effect on the efficiency vs pt (new, old) and vs eta (new, old). As expected, the fake rate also improves dramatically; see e.g. new vs old.

  • Another idea is to modify our candidate ranking. I think what we do is not correct for short tracks. If I understand correctly we compute the number of missing hits here. But in this way we are counting all invalid hits, even those after the last valid hit. Instead we should count only the invalid hits 'inside' a track. @mmasciov, can you please take a look? You can probably just loop backwards and count invalid hits only after a first valid hit is found.

  • We should review our parameters for the building keeping the short tracks in the metric plots: @mmasciov of course and also @areinsvo for the cluster charge cut, this is something you should take a look at...

  • Placeholder for more ideas!

Revenge of the NaNs

In my investigations via printouts for fixing the BH PCA output, I noticed that there are lots of tracks with NaNs for their track parameters (trk.x(), trk.momPhi(), etc)...

I dumped the parameters in three places:

  1. After building, before bkFit over layers
  2. After bkFit over layers, before PCA
  3. After PCA

and in each case I see lots of NaNs running around for what look like legitimate tracks...
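A minimal helper of the kind one could use to flag such tracks automatically (the function and parameter layout are hypothetical; the actual dumps looked at accessors like trk.x() and trk.momPhi()):

```cpp
#include <cassert>
#include <cmath>

// Return true if any of the n track parameters is NaN. NaN compares
// unequal to everything including itself, so std::isnan is the reliable
// check (plain comparisons can be optimized away under fast-math).
bool hasNaN(const float* params, int n) {
  for (int i = 0; i < n; ++i)
    if (std::isnan(params[i]))
      return true;
  return false;
}
```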

I tried this with both the 10muHS and TTbar PU70 sample. In the 10mu sample, the first NaN was in event 39, while the TTbar sample had NaNs in the very first track dumped.

If you would like to reproduce the issue yourself, I made a branch: https://github.com/kmcdermo/mictest/tree/lots_of_nans

This branch is off the HEAD of my PR 126 branch (https://github.com/kmcdermo/mictest/tree/mask_bad_sim_cms_seeds), the only changes being in mkFit/MkFinder.cc for the cout statements, and a script, dump.sh

The printouts are as such (for the list above):

  1. Loop index, Label, x(), momPhi(), and then dump of hit lyrs + idxs
  2. Loop index, Label, x(), momPhi()
  3. Loop index, Label, x(), momPhi(), and then dump of hit lyrs + idxs

The script ./dump.sh runs out of the box to produce the log file to investigate the printouts with 1 event, nTH = 1, nVU = 1, TTbar PU70, which can all be configured, look for KM4MT in the file.

Recover efficiency for long tracks

As announced previously and discussed in the meeting today, the fixes in hit counting (see PR here) that were merged in April changed our physics performance. We do worse for long tracks than we did before. This was shown in this presentation, for example.

Compare the results from before the changes to hit counting with the results after the changes. Both results are with our standard stand-alone validation definitions, which haven't changed.

We should make an effort to recover this efficiency.

Unifying initialization scripts

Hi all,

I wanted to open a separate issue on compilers from the one raised PR #148 , as I did not want the discussion of environment variables, compilers, c++ libraries, etc. to distract from the new physics validation (I guess a bit too late for some of that).

I propose the following:

  • We decide on which c++ standard we wish to use and edit Makefile.config
    • one that is driven by what CMSSW and ROOT are using
    • given the discussion from the PR, it seems like c++14 is the minimum
    • however, c++17 might be a better choice as CMSSW is moving in that direction "soonish", to quote Matti
  • install/pick up the appropriate devtoolset for each platform
  • create an initSKL-SP.sh script that users must source themselves or copy into .bashrc
  • modify the initSNB.sh and initKNL.sh to be as unified as possible

(just saw @cerati 's comment, more or less what I am proposing)

Phi greater than |pi|?

So one curiosity that I showed today was the fact that some reco tracks end up with |phi| > pi. This can be seen in the fake rate plot, which plots the momentum phi at the last layer the track ended up on:

https://kmcdermo.web.cern.ch/kmcdermo/full-det-tracking-validation/SNB_ToyMC_Barrel_FR_phi.png

This is ultimately not so surprising, since in our new polar coordinate system phi is one of the parameters of the track state, and is propagated+updated by a series of numerical computations, rather than computed from atan2(py,px).

Regardless, it is something we should track down and resolve (with the fallback option to place a bounds check after propagation and after update).
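The fallback bounds check could be as simple as folding phi back into range after propagation and after update; a sketch (function name hypothetical):

```cpp
#include <cassert>
#include <cmath>

// Fold a numerically-evolved phi back into (-pi, pi]. Since phi is a
// track-state parameter updated by propagation rather than recomputed
// from atan2(py, px), it can drift outside the principal range.
float wrapPhi(float phi) {
  const float pi = static_cast<float>(M_PI);
  while (phi >  pi) phi -= 2.0f * pi;
  while (phi < -pi) phi += 2.0f * pi;
  return phi;
}
```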

Legacy SMatrix code issues

Should have stated this earlier: all the new open issues are related to cerati/full-det-tracking.

  • Gen flat pz: I removed this option entirely from the code, as we have moved to generating flat in eta. If we want this restored, it would be best to implement it once the hack to exclude the transition region is removed, then compute the corresponding pz_max constant needed for eta_max (i.e. 2.3 in the old code for an eta of 1.0)

  • Event routines: Segment(), Seed(), Build(), Fit(). I removed all the extraneous validation from these routines, and now only efficiency, fake rate, and duplicate rate exist for the smatrix code (and of course matriplex). However, it might be time to retire them in some fashion, as these routines use data members (e.g. segmentMap) inside Event. Of course, we can move these out and make them local to the routines that use them. --- PR #75 now does just that

Infinite loop in CE Single Track Full Detector Building

As already mentioned in Issue #73, I encountered an infinite loop in clone engine single track events. If you are interested in recreating the problem, I have the binary file on phiphi here:
/home/kmcdermo/simtracks_fulldet_100kx1.bin

Simply compile the head of cerati/full-det-tracking, and then do:
./mkFit/mkFit --read --file-name simtracks_fulldet_100kx1.bin --build-ce --num-thr 1

It gets caught at event 23, stuck forever printing out "processing lay=-1", L1273. Turning on the Debug.h and debug=true everywhere, the attached log is here for event 23: dump.txt

I will try to narrow down where exactly it is caught inside the loop, although processing layer -1 is probably not good...

Bug in counting of invalid hits for stopping a track and cand score

As already pointed out by @cerati in PR #195 , we are unfairly penalizing short tracks by counting the negative hit indices after the last positive hit index on a candidate as missing hits for the score.

In addition, after discussing with @osschar, it turns out we are also incorrectly accounting for layers missed because a track propagated outside the sensitive region of the detector: these can wrongly count towards the number of holes considered when deciding to stop a track.

As a reminder:

  • idx >= 0: valid hit
  • idx == -1: missed finding a hit on this layer
  • idx == -2: track is stopped after accumulating too many misses
  • idx == -3: track propagated to outside of sensitive region of a layer

In principle, hit indices labeled -3 should NOT count towards the number of holes for stopping a track; only -1's should. It turns out there is a bug where in some cases a -3 actually does count towards the number of holes and then kills the track (by the simple nature of how we compute invalid hits in MkFinder).

Related lines

To demonstrate this issue, consider this dump of tracks + hit indices. This list was produced with the first ttbar PU70 event with CE, and setting the number_of_holes to 3 and hit chi2 to 15 using this patch file (via the suggestion of #195).

At track labels 90 and 91, you can see the track is incorrectly ended with a -2, even though it had only one -1 and two -3s. Then, looking at label 95, things are correct: three -1's are accumulated, the track continues, even finds a -3, continues, until it finds another -1, which then swaps to a -2, and ends.

Needed fixes

  • Proper counter of only -1's (ignore -3's) in a track for stopping it with a -2
  • Counter of -1's (ignore -3's) WITHIN a track for the score (i.e. -1's up until the last positive index)
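A sketch of the two counters described above, using the hit-index convention from this issue (the function names are illustrative, not the real MkFinder code):

```cpp
#include <cassert>
#include <vector>

// Hit-index convention: >= 0 valid, -1 missed on layer, -2 stopped,
// -3 propagated outside the sensitive region of a layer.

// Holes relevant for stopping a track: count only -1's; -3's (and the
// terminal -2) must not contribute.
int countHolesForStopping(const std::vector<int>& hitIdxs) {
  int holes = 0;
  for (int idx : hitIdxs)
    if (idx == -1) ++holes;
  return holes;
}

// Holes relevant for the candidate score: only -1's strictly inside the
// track, i.e. before the last valid (>= 0) hit. Looping backwards and
// counting -1's only after the first valid hit is seen ignores trailing
// misses, so short tracks are not unfairly penalized.
int countHolesForScore(const std::vector<int>& hitIdxs) {
  int holes = 0;
  bool seenValid = false;
  for (int i = static_cast<int>(hitIdxs.size()) - 1; i >= 0; --i) {
    if (hitIdxs[i] >= 0) seenValid = true;
    else if (seenValid && hitIdxs[i] == -1) ++holes;
  }
  return holes;
}
```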

Backward fit and chi2

As we have discussed at length offline and in PR #186 , it was noticed that the bkfit produces some wild chi2 that have to be truncated.

This is an indication that our approach to the bkfit may not be entirely optimal (simply starting from the last layer with the uncertainty inflated by x100), or that we ought to consider outlier rejection in order to keep the chi2 sane.

This actually affects performance on the CMSSW MTV side, as these candidates are dropped, resulting in some loss in efficiency (which only gets worse with PU).

Again, as with Issue #188, it might be best to wait until we know whether or not we will use the 5x5 representation, and if so when it goes in, to look into this.

README.md to-do

Hi all,

Here is a running list of things we should add to the README.md (or create additional text files and link them to the README). Feel free to check the boxes as we go, making sure to point to the PR. I attached names as suggestions for these new sections.

Copied out wrong parameters for BH -- will fix

Hi all, @slava77 ,

I figured out the problem with BH for the backward fit tracks: https://indico.cern.ch/event/690887/#preview:2540491

As Slava pointed out on Friday, the peak in the dPhi implies that the PCA is not being applied, as the dPhi peak is for ~3 cm propagation (i.e. dist from bpix lay1 to the origin). After staring at the code and doing some debugging via printouts, I realized that in fact the BH is doing the backward fit + PCA calculation. However, the parameters that are being copied out are from the input of the PCA calc (i.e. the updated parameters at bpix layer1), and not the output of PCA calc.

contrast BH copy out:
https://github.com/cerati/mictest/blob/devel/mkFit/MkFinder.cc#L1047-L1048

to combinatorial copy out:
https://github.com/cerati/mictest/blob/devel/mkFit/MkFinder.cc#L1063-L1064

since PCA calc is a pure propagation, the parameters that should be copied out are iP.... I will fix this and add it to the open PR.
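A toy illustration of the bug, using scalars as stand-ins for the real Matriplex states (only the names iC/iP are taken from the code; the rest is hypothetical): a pure propagation writes its result into iP and leaves iC untouched, so copying out iC after the PCA step returns the pre-propagation (bpix layer 1) parameters.

```cpp
struct TrackState {
    double iC; // updated parameters (e.g. at bpix layer 1)
    double iP; // propagated parameters (e.g. result of the PCA propagation)
};

// A pure propagation: result goes into iP, iC is left as-is.
void propagateToPCA(TrackState& s, double delta)
{
    s.iP = s.iC + delta; // stand-in for the real propagation math
}

// The fix: after a pure propagation, copy out iP, not iC.
double copyOutAfterPCA(const TrackState& s)
{
    return s.iP;
}
```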

Peace
Kevin

migrating to gcc7

In line with today's discussion, and other issues like #174, #171, #170, the proposal is to move to gcc7 on phi1, phi2, phi3 and then update the Makefile + documentation accordingly. My only concern is that the proposal included grabbing devtoolset to get the latest gcc, which we have tried before, failing utterly to get a consistent icc/gcc/c++/ROOT version (see issue #149).

The best option would be to tell the initialization scripts to pick up a consistent ROOT/gcc setup. This would also tie us closer to CMSSW_10_X, which sounds like what was proposed today anyway.

Switching between samples with a flag

One possible improvement to the physics validation is a flag to switch between samples to process. The idea is to pass a single flag to the validation scripts to move between ttbar noPU, PU35, PU70, and 10mu.

Inside the validation script, the flag would set the sub directory for the samples, as well as the number of events to process. Currently, we use 500 events for ttbar PU70, so we would need to scale accordingly for the other samples (1000 for ttbar PU35, 2000 for noPU, and 10000 events for 10mu, maybe?).

This flag would also need to handle the renaming of the output files appropriately such that the web scripts also pick up on the right labels.

Since it only really makes sense to test with ttbar PU70 for compute tests, this would not affect the benchmarking script. If we did want to test something other than PU70 in the benchmarking, we would need high stats samples of the other datasets to give enough work to the tests, specifically MEIF. We use 20 events * number of events in flight for PU70, so on KNL at full load, we are processing 2560 events. Given that the other samples have less work we would probably even need to bump up the number of events to process per event in flight.

Segfault with zero seed tracks

So in trying to produce validation plots for the TSG talk on Tuesday, I discovered that when running over the 10muon events, the code kept crashing for events that had zero seed-tracks.

Running over the full 100k events, here are the culprits:

  • 5133
  • 7584
  • 17699
  • 20841
  • 24705
  • 26026
  • 27780
  • 34007
  • 37262
  • 40594
  • 43029
  • 47012

@slava77 took a quick look at the ntuple for event 5133, and despite this event having simtracks, the sim PV was at z=27 cm. The ntuple claimed a few seeds but no reco tracks. However, our binary file read in zero seed tracks...

However, this exposes a hole in the code: it cannot simply pass over such events, failing to move past empty loops. The events consistently crashed here: https://github.com/cerati/mictest/blob/devel/mkFit/MkBuilder.cc#L788

Printing out the layer it crashed at in this loop for 5133, it was at layer 71.

@dan131riley , when you reworked these map hits functions, did you ever see this?
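A guard of the following sort would let the code skip such events cleanly before entering the per-layer loops. This is only a sketch of the check itself, not the surrounding MkBuilder code, and the function/container names are hypothetical:

```cpp
#include <cstdio>
#include <vector>

// Skip an event cleanly when no seed tracks were read from the binary
// file, instead of entering the per-layer loops with empty containers.
bool eventHasWork(const std::vector<int>& seedLabels)
{
    if (seedLabels.empty()) {
        std::fprintf(stderr, "no seed tracks in event, skipping\n");
        return false;
    }
    return true;
}
```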

Code review proposal

Hi all,

I wanted to open an issue about moving to a more administrative CMSSW approach to handling pull requests given the following (I am a bit nauseous from suggesting we need more administrative tasks at all):

A) From this morning: It is a bit terrifying that anyone connected to the repo can lose an arbitrary number of commits by a simple mirror or push or whatever really happened. I do not mean to single out Boyana, just concerned that this is a possibility at all. Thankfully, no real harm was done, but it would be nice to prevent something like this from happening again.

B) PR backlog: Some PR's end up taking a long time to merge because they are not reviewed promptly. Having to play some PR's on top of each other while waiting for others to be merged has caused some confusion.

C) Self-commits: Sometimes PR's are opened and then immediately merged by those who made them. While it is nice that a log is made, we have seen that sometimes these commits are reverted after a review.

D) Direct pushes to devel: aside from A), it can be hard to keep track of when direct pushes are made, which can lead to confusion when trying to trace down effects on timing/physics.


So, I propose the following (feel free to tell me I am crazy and should shut up, or clamor with applause):

  1. Configure cerati/devel to explicitly deny any direct push to that branch unless it is an approved PR (much like CMSSW).
  2. Configure cerati/devel to not allow self-merges of a new PR (if possible) -- in other words, require another user to hit the merge button (no collusion!).
  3. Require that the benchmarks+validation are run regardless of the changes being made. It may seem silly for one-line fixes, but at least then we have a history of validation plots should unexpected changes occur.
  4. Nominate reviewers for different aspects of the code to sign-off on changes, e.g. I can review code related to benchmarking, validation, or physics (along with Slava, Mario, or Matevz as an example). Or PR submitters can nominate reviewers. Can sort out who for what if we really want to.

And lastly, after any update to intel software, run the validation :)


It should be stated that in the effort of trying to standardize things, if we agree with the above, we should also stick to what is suggested in #150 following the discussion of #149.

Long unmatched mkFit tracks with unfindable CMSSW tracks

As discussed on Friday, I pointed out that there was this strange class of mkFit tracks that, even when using "pure CMSSW seeds", showed the following properties:

  • nFoundHits > 20 (including the seed)
  • fracHitsMatched < 5% (only counting hits after the seed)
  • CMSSW track is labeled as "unfindable" (which at the time meant that the CMSSW track failed either nUniqueLayers < 8 OR pT <0.5)

As a reminder, "pure CMSSW seeds" means that I am only using the CMSSW seeds that produced a CMSSW track. A plot of these weirdo tracks is here (just a copy from the slides from 25/08/17, slide 15, bottom right, ttbar+noPU CE):
ce_ttbar_nopu_badtracks

I updated the unfindability criteria based on the discussion on Friday to be:

  • CMSSW track has nUniqueLayers < 8
  • CMSSW track has its last hit position in the transition region (i.e. 0.9 < |eta| < 1.7)

I then reran the text file dump, which is attached here: cmssw2mkfitdump.txt

The selection for entering the dumper is listed at the top of the text file:

  • fracHitsMatched < 0.1
  • nFoundHits > 20
  • mcTrackID >= 0 [ensures we can dump the mcTrack info and that seed is based on a real sim track]
  • cmsswmask_build < 0 [with the selection already, this ensures the underlying CMSSW track is unfindable, i.e. cmsswTrackID == -7]

Upon inspecting the file, the first mkFit track actually finds all of the sim track hits, while the CMSSW track dies just one layer after the seed. However, the mkFit track continues plowing through, and is picking up hits which have an mcTrackID = -1 (which, if I understand how the binary file does the mcTrackID assignment for hits, means that there was no mcTrack saved for this hit and is likely a pileup track).

However, looking at the rest of the ten mkFit tracks in this dump, seven of them end up getting >= 19 hits matched to the correct sim track, while the CMSSW track dies early. Another two mkFit tracks end up tracing another single mcTrack after their seed.

So, this is good news. Now, I personally believe we should still leave these tracks out of the numerator and denominator of the fake rate when we are comparing to CMSSW tracks. When comparing straight to sim tracks, we should (and already do) add them back into the efficiency and fake rate.

Fixing up of SlurpIn

As reported in various threads over the last weeks, we have seen crashes within the simtrack validation during standard building. The crash would occur with many threads and MEIF, falling down at some "fixed" number of events.

The cause has been tracked down to an abuse of SlurpIn within the BackwardFit leading to undefined behavior. Namely, when computing offsets to read in the tracks to fit (which come from a vector of vectors), there is the possibility that the offsets become too large. Computing offsets between different allocations is what brings SlurpIn crashing down.

We have been burned by this before, so this is a call to fix this up for good. So we have a short term and long term plan.

Short term

Since we know this crash is triggered by large amounts of memory in play, we can simply avoid triggering the crash by limiting the number of events processed. Given we have a few open PRs (and more to come), the recommendation for now is to simply change the number of events to process in the validation from 500 to 100.

diff --git a/val_scripts/validation-cmssw-benchmarks.sh b/val_scripts/validation-cmssw-benchmarks.sh
index 595f388..42b2c97 100755
--- a/val_scripts/validation-cmssw-benchmarks.sh
+++ b/val_scripts/validation-cmssw-benchmarks.sh
@@ -17,7 +17,7 @@ source xeon_scripts/init-env.sh
 dir=/data2/slava77/samples/2017/pass-c93773a/initialStep
 subdir=PU70HS/10224.0_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2017PU_GenSimFullINPUT+DigiFullPU_2017PU+RecoFullPU_2017PU+HARVESTFullPU_2017PU
 file=memoryFile.fv3.clean.writeAll.CCC1620.recT.082418-25daeda.bin
-nevents=500
+nevents=100
 
 ## Common executable setup
 maxth=64

Long Term

@osschar has volunteered to give this a try by designing some interfaces for some helper packer for the Matriplexes to avoid total rewrites of I/O tracks/hits to matriplexes and related routines.

@srlantz and @dan131riley have also agreed to think on this.
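To illustrate why the offsets are dangerous: pointer arithmetic between elements of different allocations is undefined behavior, so a SlurpIn-style base+offset gather is only valid when every offset points inside the same allocation as the base. A portable (if slower) fallback gathers via individual element pointers instead. This sketch is not the interface @osschar is designing, just an illustration of the safe pattern; names are hypothetical:

```cpp
#include <cstring>
#include <vector>

// Gather per-track payloads into one contiguous destination using the
// element pointers themselves, avoiding cross-allocation pointer
// arithmetic entirely. 'srcs' may point into different allocations.
void gatherTracks(const std::vector<const float*>& srcs,
                  std::size_t nFloatsPerTrack,
                  std::vector<float>& dst)
{
    dst.resize(srcs.size() * nFloatsPerTrack);
    for (std::size_t i = 0; i < srcs.size(); ++i)
        std::memcpy(&dst[i * nFloatsPerTrack], srcs[i],
                    nFloatsPerTrack * sizeof(float));
}
```

The helper-packer idea would presumably hide this distinction behind one interface, so the Matriplex I/O routines do not need to know whether their sources share an allocation.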

More than one hit per layer: overlaps and loopers

One issue I raised in #69 on L158 of Event.cc in that PR was about having more than one hit per layer when simulating a track when approaching an overlap region from a polygonal geometry with extended edges like CMS. Namely, one could have a sim track pass through an overlap region in the barrel (or endcap), like in CMSSW, in which the sim track will register a hit in the same layer index more than once, as seen here: doublehit.

This really is only a problem for how we save the extra hit inside the hit index array inside the sim track. This hit will automatically be added to the global ev.layerHits_, as will the simHitsInfo_ (as it is uniquely identified by the mcHitID index and ithLayerHit).

In reality, it isn't "really" a problem, since the second hit will be added like normal to the hit index array, and the respective counters get incremented. The problem is really making the hit index array large enough to account for overlaps.

Or we could consider creating an inherited Track class called SimTrack, which has a hit index array size much larger than the standard Track class, so as to not blow up the size of the reco tracks. It would be useful to signify "inner" and "outer" hits on the same layer (i.e. closer or further from the origin), through a bool inside simHitInfo or something. Or simply just replace the hit index array with a vector (as we had before).
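The vector-based option could look like the sketch below. The (layer, hit index) pair layout and all names here are assumptions for illustration, not the existing Track layout:

```cpp
#include <utility>
#include <vector>

// Sketch of a SimTrack that replaces the fixed-size hit index array
// with a vector of (layer, hit index) pairs, so the same layer can
// legitimately appear twice when a sim track crosses an overlap.
struct SimTrack {
    std::vector<std::pair<int, int>> hits; // (layer, index into layerHits_)

    void addHit(int layer, int hitIdx) { hits.emplace_back(layer, hitIdx); }

    // Number of hits this sim track registered on a given layer;
    // > 1 signals an overlap crossing on that layer.
    int nHitsOnLayer(int layer) const {
        int n = 0;
        for (const auto& h : hits)
            if (h.first == layer) ++n;
        return n;
    }
};
```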

A related problem is simtrack seeds: if we simply expanded the hit index array, it is quite possible that a sim track hits the same pixel layer twice and another one once, and since we use pixel triplets (at the moment), the sim track seed would span just two layers instead of three. Algorithmically, this isn't really a problem, even though we proceed by loops over layers, as the track state will just be propagated to the first building layer. However, it might be best to "split" the seeds: i.e. have a seed for each overlap.

During reconstruction, this is also not really a problem, as all the hits explored by a track are considered, adding one hit per layer per candidate. This is also what is done in CMSSW -- when considering a propagated state near an overlap of a single layer, still only one hit is added per layer. During the fitting/smoothing step, there are options to remove spurious hits and also add more hits, in which case it is possible to add a second hit from the same layer if you are at an overlap region.

And finally, for validation, this is not a problem, as again, the definition of efficiency is tied to the number of hits inside the reconstructed track that are matched to exactly one sim track.


Although we will never track loopers, they pose a similar problem: how to add their hits to sim tracks in a sensible manner.


N.B. This excludes the transition, where a track could pass between a barrel layer and endcap disk very close by, which is distinct from this problem where the same layer index is used to identify two hits from the same sim track. This should be its own discussion.

Full Vector building has strange eta residual

As pointed out in previous PRs, FV building has a strange shape in the eta residual between CMSSW and mkFit backward fit tracks (to the PCA):

Could be related to 1/pt, which also shows some tail compared to BH or CE:
https://kmcdermo.web.cern.ch/kmcdermo/pr126/CMSSWVAL/fit/diffs/SNB_CMSSW_TTbar_PU70_allmatch_dinvpt_fit_pt0.9_CMSSWVAL.png

Needs to be understood... perhaps the hit assignment or track labeling is messed up?

no difference in computational performance between AVX-512 and AVX2?

As @kmcdermo noted in his comment to PR #165, the benchmarks show no discernible difference in computational performance between AVX-512 and AVX2, either on phi2 (KNL) or phi3 (SKL-SP). This warrants further investigation.

One possible explanation is that performance is limited by memory access speed. We are already aware that the arithmetic intensity of most of the code is in bandwidth-limited territory. Accordingly, we may not gain much advantage when we increase the top potential flop rate through using wider vectors. (Data always traverse the levels of memory in units of cache lines, which are 512 bits in any case.) This hypothesis is somewhat borne out by @dan131riley's observation that with AVX-512, there is actually very little speedup when increasing the number of "vector units" (Matriplex width) from VU=8 to VU=16, either on SKL-SP or KNL.
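The bandwidth-limited argument can be made quantitative with the roofline model: attainable flop rate = min(peak compute, arithmetic intensity x bandwidth). Doubling the vector width only raises the peak term, so below the ridge point nothing changes. A tiny helper makes this concrete (all numbers in the usage below are illustrative, not measurements of our hardware):

```cpp
#include <algorithm>

// Roofline model: attainable flop rate is capped by either the peak
// compute rate or arithmetic intensity (flops/byte) times memory
// bandwidth, whichever is smaller.
double attainableGflops(double peakGflops, double aiFlopsPerByte,
                        double bandwidthGBs)
{
    return std::min(peakGflops, aiFlopsPerByte * bandwidthGBs);
}
```

For example, at an arithmetic intensity of 0.5 flop/byte and 100 GB/s of bandwidth, a 1000 GFLOP/s (AVX2-like) peak and a 2000 GFLOP/s (AVX-512-like) peak both yield the same 50 GFLOP/s, which is exactly the null result seen in the benchmarks.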

Further evidence for this explanation: phi3 has Turbo Boost characteristics that don't come into play for memory-limited performance, but should affect flop-limited performance. (Turbo is disabled on phi2.) On a fully-loaded SKL-SP 6130, we might expect AVX-512 to run flops 1.5x faster than AVX2 (i.e., 2x based on vector width, but slowed down by 25% due to the lower Turbo frequency). However, these characteristics are not at all evident on phi3. It would be informative to monitor the average frequencies of the CPUs while the code is running.

A different possible explanation is that the compiler options we choose, or the intrinsics we use in each case, are not doing what we think they are doing. This seems less likely. Intel Advisor could possibly verify that the expected instruction set is actually being compiled into the code and used in each case.

Force compile DEBUG before PR

As discussed, although we do not yet have continuous integration for PRs (like cms-bot) for this project, we often break debug printouts with commits, since that code goes untested given it is ifdef'ed out.

@dan131riley is already working on some fixes for the latest round. However, it was discussed that we should force a compilation with the debug ifdef enabled before submitting a PR, to ensure it compiles and runs.

Perhaps this can be a separate script added to the very beginning of ./xeon_scripts/runBenchmark.sh; if the step fails (compilation error, segfault), the rest of the script exits.

FV has noticeable gaps in transition region for hit-based matching

Possible Corollary to Issue #128

FV building has noticeable dips in efficiency for the transition region when using CMSSW tracks as the reference set of tracks with hit-based matching. This is not present in simtrack hit-based matching, nor CMSSW track-parameter matching.

Compare the following, using CMSSW n^2-cleaned seeds as input, pT > 0.9:

Simtrack hit-based matching, with 10mu HS sample

snb_cmssw_10mu_eff_eta_build_pt0 9_simval

CMSSWtracks hit-based matching, with 10mu HS sample

snb_cmssw_10mu_eff_eta_build_pt0 9_cmsswval

CMSSWtracks track-parameter matching, with 10mu HS sample

snb_cmssw_10mu_eff_eta_fit_pt0 9_cmsswval

Loss of high pT tracks and high eta tracks

Still needs to be investigated: will plot nHits as a function of barrel and endcap and see what comes out. Same with last layer the track ended up on (I will submit another PR on this addition to the validation).

https://kmcdermo.web.cern.ch/kmcdermo/full-det-tracking-validation/SNB_ToyMC_Barrel_EFF_pt.png
https://kmcdermo.web.cern.ch/kmcdermo/full-det-tracking-validation/SNB_ToyMC_Barrel_FR_pt.png
https://kmcdermo.web.cern.ch/kmcdermo/full-det-tracking-validation/SNB_ToyMC_Barrel_EFF_eta.png

Remember: FR pT is "high" at pT < 1 and pT >10, because we in fact only simulate tracks with 1 < pT < 10. And since the reconstructed value can be anything, there are no real tracks at pT < 1 or pT >10 to bring this rate down.

Longer running benchmarks

As demonstrated in the throughput studies from @makortel, running with a larger number of events eliminates edge effects and improves parallel throughput performance.

Quoting Matti on the chat:

so on phi3 with 32 threads or jobs, the throughput of multithreading vs. multiprocessing is

  • 20 events/thread: 75 %
  • 120 events/thread: 94 %

with 64 threads or jobs, the same fractions are

  • 20 events/thread: 67 %
  • 120 events/thread: 92 %

It seems that it may be beneficial to rewrite part of the benchmarking scripts to use more events / thread to achieve a higher parallel utilization. The question is: is this solely "forConf", to have our "best" results on display, or should we be doing this with every PR as well?

The case for every PR (although it will lengthen the time to run the benchmarking) is that compute performance gains and losses could be hiding behind this under-utilization in some systematic way. I should mention we partially account for this when running the standard benchmarks, as we drop the first event from the average build time, since we have seen it does in fact have a time per event an order of magnitude different from the average. The question is: even with this first event dropped, does the average time per event improve when processing more events?

Let me know what you think (and who might want to tackle this).
