It seems that the bottleneck for prepare_snippets_h5 is the transpose operation inside the Bin1RecordingExtractor. Here's some code from bin1recordingextractor.py:
buf = kp.load_bytes(self._raw, start=i1, end=i2, p2p=self._p2p)
X = np.frombuffer(buf, dtype=np.int16).reshape((end_frame - start_frame, self._raw_num_channels))
# old method
# ret = np.zeros((M, N))
# for ii, ch_id in enumerate(channel_ids):
#     ret[ii, :] = X[:, self._channel_map[str(ch_id)]]
# new (equivalent method)
X = X.T.copy() # this is the part we want to try to speed up
ret = X[[int(self._channel_map[str(ch_id)]) for ch_id in channel_ids]]
I think it should be possible to speed up the transpose operation here.
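One thing that might help (a sketch only, not tested against the real extractor): the current code makes two copies, first the full transposed array via `X.T.copy()`, then a second copy via fancy indexing. Selecting the channel columns first and then taking a transposed view does the fancy-index copy once and skips the full-array transpose entirely. The `idx` list below is a stand-in for the `self._channel_map` lookup:

```python
import numpy as np

# Hypothetical sizes matching the post: N timepoints x 384 channels
M, N = 384, 100000
X = (np.random.normal(0, 1, (N, M)) * 20).astype(np.int16)
idx = list(range(M))  # stand-in for [int(self._channel_map[str(ch_id)]) ...]

# Current approach: copy the whole transposed array, then copy again
# via fancy indexing
ret_current = X.T.copy()[idx]

# Alternative: fancy-index the needed columns first (one copy), then
# take a transposed view (no copy)
ret_alt = X[:, idx].T

assert np.array_equal(ret_current, ret_alt)
```

The caveat is that `ret_alt` is not C-contiguous; if downstream code needs contiguous rows, wrapping it in `np.ascontiguousarray` reintroduces a transpose-like copy, but only over the selected channels.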
import time
import numpy as np
import matplotlib.pyplot as plt
def transpose_speed_test(M, N):
    X = (np.random.normal(0, 1, (N, M)) * 20).astype(np.int16)
    print('.')
    timer = time.time()
    X = X.T.copy()
    elapsed = time.time() - timer
    rate = (N / 1000000) / elapsed
    print(f'Elapsed for transpose: {elapsed} sec')
    print(f'Rate: {rate} million frames/sec')
    return rate
M = 384 # number of channels
Ns = [100, 500, 1000, 5000, 10000, 50000, 100000, 500000] # number of timepoints
rates = [
    transpose_speed_test(M, N)
    for N in Ns
]
plt.plot(Ns, rates, 'b.')
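As an aside, a single `time.time()` measurement per size can be noisy: the first run pays page-fault and CPU frequency ramp-up costs. A sketch of a more stable measurement using `timeit.repeat`, taking the best of several runs:

```python
import timeit
import numpy as np

M, N = 384, 100000  # same shape as one point in the sweep above
X = (np.random.normal(0, 1, (N, M)) * 20).astype(np.int16)

# Repeat the transpose and keep the fastest run; min over repeats is
# the conventional low-noise estimate
per_call = min(timeit.repeat(lambda: X.T.copy(), number=3, repeat=5)) / 3
rate = (N / 1e6) / per_call
print(f'Rate: {rate:.1f} million frames/sec')
```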
Am I naive in thinking that the transfer rate should be roughly constant as N increases? I suppose this depends on the size of the cache.
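The cache intuition seems plausible: copying the transpose of an (N, 384) int16 array means every consecutive element read is 768 bytes apart in memory, so once the array outgrows cache, each read touches a new cache line. The textbook remedy is a tiled (blocked) transpose, which keeps each tile's reads and writes cache-resident. A minimal sketch (whether it actually beats NumPy's built-in strided copy here would need measuring):

```python
import numpy as np

def blocked_transpose(X, block=128):
    """Transpose X tile by tile so each tile fits in cache."""
    N, M = X.shape
    out = np.empty((M, N), dtype=X.dtype)
    for i in range(0, N, block):
        for j in range(0, M, block):
            # NumPy slices clip at the array edge, so ragged tiles are fine
            out[j:j + block, i:i + block] = X[i:i + block, j:j + block].T
    return out

X = (np.random.normal(0, 1, (50000, 384)) * 20).astype(np.int16)
assert np.array_equal(blocked_transpose(X), X.T)
```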