xdf-modules / pyxdf

Python package for working with XDF files

License: BSD 2-Clause "Simplified" License

Languages: Python 98.05%, Cython 1.95%
Topics: xdf, python, fileformat, pyxdf

pyxdf's Introduction


pyXDF

pyXDF is a Python importer for XDF files.

Sample usage

import matplotlib.pyplot as plt
import numpy as np

import pyxdf

data, header = pyxdf.load_xdf("test.xdf")

for stream in data:
    y = stream["time_series"]

    if isinstance(y, list):
        # list of strings, draw one vertical line for each marker
        for timestamp, marker in zip(stream["time_stamps"], y):
            plt.axvline(x=timestamp)
            print(f'Marker "{marker[0]}" @ {timestamp:.2f}s')
    elif isinstance(y, np.ndarray):
        # numeric data, draw as lines
        plt.plot(stream["time_stamps"], y)
    else:
        raise RuntimeError("Unknown stream format")

plt.show()

CLI examples

pyxdf includes an examples module that can be run from the command line for basic functionality.

  • print_metadata enables a DEBUG logger to log read messages, then prints basic metadata about each stream it finds.
    • python -m pyxdf.examples.print_metadata -f=/path/to/my.xdf
  • playback_lsl opens an XDF file and replays its data in an infinite loop, using current timestamps. This is useful for prototyping online processing.
    • python -m pyxdf.examples.playback_lsl /path/to/my.xdf

Installation

The latest stable version can be installed with pip install pyxdf.

For the latest development version, use pip install git+https://github.com/xdf-modules/pyxdf.git.

For maintainers

Releases are uploaded to PyPI automatically: as soon as a release is created on GitHub (using a tag labeled e.g. v1.16.3), a PyPI package is published with the version number matching the release tag.

pyxdf's People

Contributors

agricolab, cboulay, cbrnr, chkothe, expensne, hankso, mcvain, musicinmybrain, ollie-d, sappelhoff, tstenner


pyxdf's Issues

We shouldn't be rounding nominal_srate in load_xdf

I've seen devices that advertise non-integer sampling rates. TDT comes to mind, though a quick scan of their online docs gives numbers like "~50 kHz". Also, I have a couple of apps where the user can specify the sampling rate as a float. There's no good reason to limit the sampling rate to integer values.

Version 1.16.0 does not work with Python 2.7 anymore

$ python -V
Python 2.7.15
$ python -c 'import pyxdf'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/hoechenberger/miniconda3/envs/pyxdf/lib/python2.7/site-packages/pyxdf/__init__.py", line 11, in <module>
    from .pyxdf import load_xdf, resolve_streams, match_streaminfos
  File "/Users/hoechenberger/miniconda3/envs/pyxdf/lib/python2.7/site-packages/pyxdf/pyxdf.py", line 746
    chunk = {**chunk, **_parse_streamheader(xml)}
              ^
SyntaxError: invalid syntax

Metadata in setup.py says it should be compatible with both Py2 and Py3:
https://github.com/xdf-modules/xdf-python/blob/049aef3486040851f53ca710cee761c796b6835f/setup.py#L63-L64

Release 1.15.2

I really need #19, which has already been merged, for MNELAB. @cboulay could you please upload a new release (1.15.2) to PyPI (please modify CHANGELOG.md first by adding the release date)? If you don't have time, I can also do it, just let me know (do I have proper permissions to upload to PyPI?).

Possible bug with _clock_sync and clock reset detection

While looking through the code of _clock_sync (https://github.com/xdf-modules/pyxdf/blob/main/pyxdf/pyxdf.py#L538), which is used at load time to synchronise clocks between streams, I think I found a bug.

In the synchronisation method there is a part that detects clock resets and then processes each chunk of data separately if there are significant differences between the clock offsets.

The problem is that when the coefficients of the linear function are taken and applied to the timestamps, the code uses indices from the range determined from the clock_values. These are not the same indices as those that should be used to address the timestamps of the recording. This means only a tiny subset of the recording gets synchronised, as the clock_values are saved only every 5 seconds or so.

The relevant code is in this for loop, specifically where stream.time_stamps is addressed with the slice:
https://github.com/xdf-modules/pyxdf/blob/main/pyxdf/pyxdf.py#L637
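For illustration, here is a rough sketch of the mismatch (not pyxdf's actual code; names like clock_times, clock_values, and the bounds a and b are assumed, with a and b indexing the first and last clock measurement of a segment):

import numpy as np

# fit offset ~ intercept + slope * time over one segment of clock measurements
coefs = np.polyfit(clock_times[a:b + 1], clock_values[a:b + 1], deg=1)

# buggy: a:b indexes the sparse clock measurements (one every ~5 s), so only a
# handful of timestamps get corrected:
#   stream.time_stamps[a:b + 1] += np.polyval(coefs, stream.time_stamps[a:b + 1])

# intended: correct every timestamp that falls inside this clock segment
mask = (stream.time_stamps >= clock_times[a]) & (stream.time_stamps <= clock_times[b])
stream.time_stamps[mask] += np.polyval(coefs, stream.time_stamps[mask])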

Apply dejittering / sychronization to several streams at once

Some devices (case in point: the BrainAmps) record a digital trigger alongside the EEG data and offer three options to send the markers:

  1. as an additional channel in the EEG stream converted to whatever data type the EEG data has
  2. as a continuous int8 / int16 / string stream with the same sampling rate as the EEG stream
  3. as a separate, irregular int8 / int16 / string stream

The procedure is as follows (pseudocode, tidied here into roughly valid NumPy):

data, triggers, first_timestamp = device.get_chunk()
trigger_indices = np.flatnonzero(triggers)
data_outlet.push(data, first_timestamp)
# option 1: triggers appended as an extra channel of the EEG stream
trigger1_outlet.push(np.column_stack([data, triggers]), first_timestamp)
# option 2: continuous trigger stream at the EEG sampling rate
trigger2_outlet.push(triggers, first_timestamp)
# option 3: irregular stream, one sample per trigger onset
trigger3_outlet.push(triggers[trigger_indices], first_timestamp + trigger_indices / sampling_rate)

Both for resource usage and usability, option 3 is the best, but irregularly sampled streams can't be dejittered, so the markers aren't as reliable as they should be.

But, as seen in the example above, the timestamps have the same jitter and time offsets, so if it were possible to copy dejitter / synchronization parameters from one stream to another, the third option would be feasible for offline analysis.
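If the raw timestamps of both streams are kept around, something like the following nearest-neighbour remapping could approximate this offline (a sketch with assumed names: marker_raw and eeg_raw are the uncorrected timestamps, eeg_fixed the EEG stream's dejittered/synchronized ones):

import numpy as np

# for each marker, find the EEG sample acquired at (nearly) the same raw time
idx = np.searchsorted(eeg_raw, marker_raw).clip(0, len(eeg_raw) - 1)
# ... and reuse that sample's corrected timestamp for the marker
marker_fixed = eeg_fixed[idx]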

Rename repo to pyxdf

The package name is pyxdf, and if you google for pyxdf you won't find this repo at all (at least not on the first page). Therefore, I'd like to change the repo name to pyxdf, which would also be much more consistent.

Update PyPI information

I noticed that several fields at the PyPI website still point to the old location or don't work as expected.

  • Homepage should link to https://github.com/xdf-modules/xdf-Python
  • Author currently shows "('Christian Kothe, Tristan Stenner', 'Clemens Brunner')" as a single string that links to Christian's email. I don't know if it is possible to specify multiple authors. I've checked some projects with multiple authors (e.g. pandas, scipy), and they omit the author field completely. We should consider doing the same and, instead of using the author field, populate the maintainers list (which supports multiple persons).
  • The project description should mirror README.md from our repository (and BTW this should be updated because e.g. uploading to PyPI is now much easier).

XDF.jl (Julia importer)

I know that this is not the best place, but I wanted to let you know that I've written a Julia importer for XDF:

https://github.com/cbrnr/XDF.jl

If you want to give it a try, let me know how you like it and be sure to report any issues and feature requests you might have.

Opening .xdf generated by LabRecorder

Hello.
My name is Nelson.
I managed to run LabRecorder on my Raspberry Pi 4.
However, opening the generated .xdf with MNELAB is not possible.
Find the error and file attached.
Thanks a lot.

[attachments: screenshot "Capture", foo.zip]

Reorder samples in irregular rate streams - Worthwhile feature?

The latest version of the Pupil Labs LSL plugin forces the Pupil Capture service to use the LSL clock (great!). When frames are acquired from the video, they get timestamped with an LSL timestamp. Sometimes frames get processed out of order, and a recent sample can get pushed before an older sample. This is OK because they're timestamped with the LSL times from when they were acquired, and the stream is marked as irregular rate, so there is no automatic dejittering.

Should streams with irregular rate be sorted by their timestamp on import? Is there a use case where this might be undesirable? e.g., "I want to know the order the samples were pushed; I don't care about their timestamps!" That seems pretty unlikely. I also think we don't have to worry about people using the timestamps for anything other than timestamps because they would have encountered problems from clock offset adjustment.

Are there technical reasons why we wouldn't want to support this in pyxdf? I'm guessing these streams would have to be eagerly loaded and couldn't be lazily loaded if lazy loading ever becomes a feature.
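If this were supported, the import-time sort itself could be as simple as the following sketch (a stable sort, so simultaneous samples keep their push order; not existing pyxdf code):

import numpy as np

order = np.argsort(stream["time_stamps"], kind="stable")
stream["time_stamps"] = stream["time_stamps"][order]
ts = stream["time_series"]
# time_series is an ndarray for numeric streams but a list for string streams
stream["time_series"] = ts[order] if isinstance(ts, np.ndarray) else [ts[i] for i in order]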

jooc: why are some values in stream meta-data lists of dicts and some just dicts?

I find this rather confusing. For example, if I want to look at channel labels for a stream I need something like:

for i, ch in enumerate(eeg_stream['info']['desc'][0]['channels'][0]['channel']):
    print(ch['label'][0])

In what scenario would there ever be more than one label, desc, or (for that matter) list of channel dictionaries in info? It makes sense that 'channels' contains a list of 'channel' dicts, but it's odd that its value is itself a list containing a single list of 'channel' dicts.

Whenever I work with this I find myself constantly having to remind myself which values are lists and which are plain values. Especially confusing to me is the fact that nominal_srate is a list (again, there will only ever be one, so why?) but effective_srate is not.

I am sure there is a perfectly good explanation for this but I can't work out what it could be.
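In the meantime, a tiny helper can hide the wrapping (a convenience sketch, not part of pyxdf):

def first(value):
    # unwrap the one-element lists pyxdf builds for XML child elements
    return value[0] if isinstance(value, list) else value

channels = first(first(eeg_stream['info']['desc'])['channels'])['channel']
labels = [first(ch['label']) for ch in channels]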

Writer wishlist

I've started a rough prototype for writer support. Currently, it looks like this:

with pyxdf.writer.Writer('/tmp/foo.xdf.gz') as w:
    w.add_stream(streamid=1, header='<info><streamname>XYZ</streamname>…</info>')
    w.add_stream_data(streamid=1, data=np.array([[1, 2], [2, 3], [3, 4]]), ts=np.array([5.1, np.nan, 5.3]))
    w.add_stream_offsets(streamid=1, offsets=np.array([[1.0, 5.0], [1.3, 5.3]]))

Goals:

  • Acceptable performance in pure Python
  • Optional cython-compiled parts
  • Full support for the XDF 1.0 spec

Non-goals:

  • accept the reader output directly as input
  • type conversion (i.e. convert data to the stream's data type; write str objects to a string stream)
  • write everything in C and call a compiled library

New Release

Hi guys,

is it possible to make a new release to include the feature I added a few months ago in PR #105?

It would be very much appreciated!

Clean up branches

@tstenner do you still need your branch tstenner-patch-1 in this repo? If not, please remove it. If you're still working on it, I'd prefer if you moved it to your fork.

Bump Python to >= 3.9

pyxdf seems to support Python >= 3.5, at least based on the badge we're showing in the README.md. I don't think that this is accurate, and in any case I propose to bump the minimum required Python version to 3.9. All versions <= 3.7 are EOL, and support for 3.8 will end soon (see here). The scientific Python ecosystem has converged on 3.9 as its current minimum version, so I think we should do the same. OK with everyone?

(This would allow us to e.g. get rid of the try/except hoops we're currently jumping through in #100.)

Make progress bar

Hello!
I need to make a progress bar to display the xdf file loading process.
I am using a PyQt5 GUI.

I have now added my function (signal) to the main loop, but I can't accurately calculate the loading percentage from 0-100%.

with open_xdf(filename) as f:
    # for each chunk
    while True:
        # noinspection PyBroadException
        try:
            # read [NumLengthBytes], [Length]
            chunklen = _read_varlen_int(f)
            progress.emit(1)  # <--- my function!!!
        except EOFError:
            break
        except Exception:
            ...

How can I do this?
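One way to get a real percentage (a sketch, assuming an uncompressed .xdf file) is to compare the file position f.tell() against the total file size instead of emitting a fixed increment:

import os

total_bytes = os.path.getsize(filename)

with open_xdf(filename) as f:
    while True:
        try:
            chunklen = _read_varlen_int(f)
        except EOFError:
            break
        # ... read/parse the chunk as load_xdf already does ...
        # f.tell() is the current byte offset, so this yields a true 0-100%
        progress.emit(int(100 * f.tell() / total_bytes))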

error: unpack requires a buffer of 4 bytes

I get the following error when I try to read one of my .xdf files:

~/lib/python3.7/site-packages/pyxdf/pyxdf.py in load_xdf(filename, select_streams, on_chunk, synchronize_clocks, handle_clock_resets, dejitter_timestamps, jitter_break_threshold_seconds, jitter_break_threshold_samples, clock_reset_threshold_seconds, clock_reset_threshold_stds, clock_reset_threshold_offset_seconds, clock_reset_threshold_offset_stds, winsor_threshold, verbose)
    258             log_str = ' Read tag: {} at {} bytes, length={}'.format(tag, f.tell(), chunklen)
    259             if tag in [2, 3, 4, 6]:
--> 260                 StreamId = struct.unpack('<I', f.read(4))[0]
    261                 log_str += ', StreamId={}'.format(StreamId)
    262             else:

error: unpack requires a buffer of 4 bytes

However, I can easily read the same file using EEGLAB in MATLAB. Do you have any recommendations on how I can solve this?
Thanks.

Create what's new

We should document all changes in a what's new file. I suggest that we create a Markdown file in the project root, but I'm not sure if there's a best practice regarding file name or format.

Rename master -> main

Any objections? This is a really painless process, and we should do it sooner than later.

Replace Azure Pipelines with GitHub Actions

Our Azure Pipeline workflows have been dysfunctional for some time now. This means that neither tests nor publishing on PyPI works. We should replace this with suitable GitHub Actions and restore automatic publishing whenever a new release is created.

Merge streams

I am looking for a way to merge multiple streams (like EEG, ECG, events) into a single stream or a dataframe in python. Any suggestions? Thank you.
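This isn't a pyxdf feature, but one common approach is to put each stream into a pandas DataFrame indexed by its timestamps and outer-join them (a sketch; regular-rate streams usually need resampling or interpolation afterwards, since their timestamps rarely coincide exactly):

import pandas as pd

frames = []
for stream in data:
    name = stream['info']['name'][0]
    df = pd.DataFrame(stream['time_series'],
                      index=pd.Index(stream['time_stamps'], name='time'))
    df.columns = ['{}_{}'.format(name, i) for i in range(df.shape[1])]
    frames.append(df)

merged = pd.concat(frames, axis=1).sort_index()  # NaN where a stream has no sample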

Incompatible with numpy 1.24

pyxdf does not work with numpy 1.24. It works with numpy up to 1.23.
The 1.24 version raises:

  File ~/pyvenv/mscheltienne/eeg-flow/lib/python3.10/site-packages/eeg_flow/io.py:32 in load_xdf
    streams, _ = pyxdf.load_xdf(fname)

  File ~/pyvenv/mscheltienne/eeg-flow/lib/python3.10/site-packages/pyxdf/pyxdf.py:303 in load_xdf
    temp[StreamId] = StreamData(hdr)

  File ~/pyvenv/mscheltienne/eeg-flow/lib/python3.10/site-packages/pyxdf/pyxdf.py:37 in __init__
    string=np.object,

  File ~/pyvenv/mscheltienne/eeg-flow/lib/python3.10/site-packages/numpy/__init__.py:284 in __getattr__
    raise AttributeError("module {!r} has no attribute "

AttributeError: module 'numpy' has no attribute 'object'

Add XDF support in MNELAB - please test

I'm currently working on adding XDF support to MNELAB: cbrnr/mnelab#22

If you have time, please feel free to give it a try and let me know what you think (it's in a very preliminary state right now, but it already shows a list of streams contained in the file). This also requires #19.

New release on PyPI (1.15)

@cboulay I think we should make a new release on PyPI because we fixed some important bugs. Do you have time to upload the new release? You might want to consider merging #10, #11, and #13 before that.

Nominal vs. effective sampling rate

I recently worked with an XDF file containing two streams, an EEG stream and a marker stream. I noticed that when using the nominal sampling rate (e.g. 1000 Hz), the two streams drift apart over time when the effective sampling rate differs (even slightly). In my data, the effective sampling rate was 1000.01218...Hz. Over the course of the recording, this difference adds up to several milliseconds.

This difference can be problematic if I want to match marker stream events to events (e.g. spikes) in a channel in the EEG stream, because their difference will increase with time. Therefore, I was wondering if I should just use the effective sampling rate for the EEG stream. Is this a good idea? What is more precise, the nominal sampling rate claimed by the amp or LSL time stamps (which in most cases will use standard computer clocks)?
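For scale, a back-of-the-envelope calculation with the numbers above (the 10-minute recording length is an assumption):

nominal, effective = 1000.0, 1000.01218
duration = 600.0  # seconds of recording
drift = duration * (effective - nominal) / nominal  # ~0.0073 s, i.e. ~7 ms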

OSError: Invalid XDF file

Upon trying to load the XDF file using pyxdf, I get this error. However, the file is fine and works in MATLAB. Please advise.

Single PC clock offset vs. latency and jitter

Hi - My setup has a Cognionics headset and a heart rate monitor connected to the same PC via bluetooth, with both peripherals sending data to an LSL stream that is being recorded by the LabRecorder App on the same PC. Because this is a single PC setup, I am having trouble understanding the clock offset measurements that I'm seeing in my xdf file and how they relate to latency and jitter.

My code to analyze this xdf file is in python so I make use of your pyxdf library, but I'm not sure how to interpret the data that I am seeing.

import pyxdf

# read raw data
in_file = 'baseline.xdf'

data, header = pyxdf.load_xdf(in_file)

# disaggregate streams
for stream in data:
    [stream_name] = stream['info']['name']
    [stream_type] = stream['info']['type']
    if stream_type == 'ECG':
        ecg_stream = stream
    elif stream_type == 'EEG':
        eeg_stream = stream

When I examine eeg_stream, I see a series of timestamps and clock offsets:

>>> eeg_stream['footer']['info']
{'first_timestamp': ['1386534.7123869'],
             'last_timestamp': ['1387116.712358'],
             'sample_count': ['291008'],
             'clock_offsets': [defaultdict(list,
                          {'offset': [defaultdict(list,
                                        {'time': ['1386540.36854905'],
                                         'value': ['-2.094986848533154e-05']}),
                            defaultdict(list,
                                        {'time': ['1386545.3686529'],
                                         'value': ['-1.600000541657209e-05']}),
                            defaultdict(list,
                                        {'time': ['1386550.3687949'],
                                         'value': ['-2.410006709396839e-05']}),
 ...
                            defaultdict(list,
                                        {'time': ['1386575.37059335'],
                                         'value': ['-2.505001612007618e-05']}),
...

I can also see a time_series of what I assume is the time at which each sample is received:

>>> eeg_stream['time_series']
array([[ 7.6612516e+04, -1.0331230e+03, -1.2857312e+05, ...,
         1.1788505e+06,  9.0000000e+00,  0.0000000e+00],
       [ 7.6611164e+04, -1.0314032e+03, -1.2857277e+05, ...,
         1.1788505e+06,  1.0000000e+01,  0.0000000e+00],...

After reading the FAQs and the Time Synchronization page, as well as looking through the load_xdf function, I still have the following questions:

  • I have a series of timestamps in 'time_series' for each peripheral. I assume this is the local CPU clock time at which each sample is received, without any corrections applied? Or are these corrected timestamps?
  • For a single PC setup, what are the clock offset measurements actually measuring? They seem to be used here in _clock_sync but I'm not sure how they are being used to apply corrections (if any) to the raw time stamps.
  • Dejittering seems to be done here by simply assuming a constant sampling rate for each uninterrupted segment (i.e. # of samples received / duration of samples).
  • Is there any way to estimate uncorrected latency and jitter based on the data contained in this xdf file?
  • How would I go about estimating the # of samples that were dropped / lost, if any?

Thanks!

Licensing

The main module pyxdf.py currently contains the following copyright statement at the top of the file:

Copyright (c) 2015-2018, Syntrogi Inc. dba Intheon

The docstring of load_xdf contains another copyright notice:

    License:
        This file is covered by the BSD license.

        Copyright (c) 2015-2018, Syntrogi Inc. dba Intheon

        Redistribution and use in source and binary forms, with or without
        modification, are permitted provided that the following conditions are
        met:

            * Redistributions of source code must retain the above copyright
              notice, this list of conditions and the following disclaimer.
            * Redistributions in binary form must reproduce the above copyright
              notice, this list of conditions and the following disclaimer in
              the documentation and/or other materials provided with the
              distribution.

        THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
        "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
        LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
        A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
        OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
        SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
        LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
        DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
        THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
        (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
        OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

How do we proceed? First, we need to update the year to 2019. Second, I am starting to change more of the code, so it would be nice if this were acknowledged somewhere. Third, I think we should list the current maintainers as copyright holders (but of course continue to mention Syntrogi Inc. dba Intheon). Here's a suggestion of what I'd put at the top of the file:

Authors: Syntrogi Inc. dba Intheon
         Clemens Brunner <mail address>
         Tristan Stenner <mail address>
         Chadwick Boulay <mail address>
         <others?>

License: BSD (2-clause)

Then I'd remove the copyright and license notice from the function docstring completely.

It would be even nicer if we could add the names of the original authors instead of/in addition to Syntrogi (@chkothe?).

Thoughts @cboulay @tstenner?

"Reading chunk length error" after a Python crash at the end of the recording, any solution ?

Hi guys,
I'm trying to read .xdf files with the pyxdf.load_xdf() function, but I get this error for some of my files:

Error reading chunk length
Traceback (most recent call last):
  File "/user/jbelo/home/anaconda3/lib/python3.7/site-packages/pyxdf/pyxdf.py", line 237, in load_xdf
    chunklen = _read_varlen_int(f)
  File "/user/jbelo/home/anaconda3/lib/python3.7/site-packages/pyxdf/pyxdf.py", line 487, in _read_varlen_int
    raise RuntimeError("invalid variable-length integer encountered.")
RuntimeError: invalid variable-length integer encountered.
got zero-length chunk, scanning forward to next boundary chunk.

For those files, Python crashed during the recording at the very end of the experiment, and even though the experiment was already finished, it seems the files were corrupted by the crash.
I checked the files and there is no "Footer" dict.
The second problem is that when I want to convert my .xdf files into .fif files (using MNELAB), it doesn't work, probably because of this error.

Do you have any suggestions to solve these problems, or are those files definitely unusable?

Thanks in advance,
Joan

unpack requires a buffer of 8 bytes

Hi! I am trying to read in an .xdf file (streamed physiological data). However, I get the following error:

  File "C:\Users\agleo\anaconda3\lib\site-packages\pyxdf\pyxdf.py", line 335, in load_xdf
    struct.unpack("<d", f.read(8))[0]

error: unpack requires a buffer of 8 bytes

This error only occurs at a certain point in reading the file. In other words, if I turn the debug output on, it only happens way down the line.

Error when dejitter_timestamps=False on stream without samples

When calling load_xdf(..., dejitter_timestamps=False) on a file containing a stream without any samples, an IndexError is thrown at pyxdf.py:384:

duration = stream.time_stamps[-1] - stream.time_stamps[0]
IndexError: index -1 is out of bounds for axis 0 with size 0

This is because the first and last elements of this empty array are accessed. For that matter, if the stream had only a single sample, duration would be 0, and the next line, stream.effective_srate = len(stream.time_stamps) / duration, would cause a division by zero.

This could be fixed by adding a condition

# default the effective sampling rate to 0 if the stream has at most one sample
if len(stream.time_stamps) <= 1:
    stream.effective_srate = 0
    break

I couldn't manage to open a new pull request, but please let me know if you need further information. I'm using pyxdf==1.16.3, but the error still seems to be present in #5384490.

Rename xdf-Python to xdf-python

I know this is a tiny nitpick, but could we rename this repository to xdf-python (with a small p)? It kind of hurts my eyes every time I look at it (which is admittedly a very subjective argument), and not mixing upper and lower case in names also causes fewer problems (e.g. on case-insensitive file systems, which are still the default e.g. on macOS).

Critical Bug in _jitter_removal

Hello,

Please forgive me if I am posting this in an undesired format. I simply want to explain the bug, how I found it, and what the solution is.

The bug
When jitter removal is enabled, the current code creates an invalid mapping, thus distorting time. I am not sure if the effect is as dramatic as what I have experienced, but in my case it renders my markers completely useless.

Problematic code (line 563 of pyxdf.py):
indices = np.hstack((0, indices, indices, nsamples - 1))

How I Found it
I collected some pilot data and realized that my classifier was at chance. After going through all of the relevant code, I determined that my filtering and classification were being done correctly. I then ran my analysis on simulated data, and the result was perfect. Next I started analyzing my code for epoching the data. Luckily, I have started collecting photosensor channels so as not to rely too heavily on software markers. When I examined the markers' positions relative to the photosensor, I realized they were not aligned:
[screenshot]
I then looked at my data in MATLAB via the load_xdf() function shipped with EEGLAB, and the data looked as expected:
[screenshot]
After that, I individually changed all of the options in Python and discovered that when dejitter_timestamps was set to False, my data looked as expected:
[screenshot]

After determining that jitter removal was the problem, I went through each line of the relevant code in MATLAB and Python and discovered that the ranges being used were inconsistent. For example, with my data, MATLAB first finds a problem in the range [2 11]. I expected Python to find the same problem in [1 10], but instead it finds it at [0 10]. Every range Python found had a lower bound 1 index smaller than MATLAB's, even after accounting for 1-based indexing:
[screenshot]
This subsequently caused a discrepancy in the mapping values between Python and MATLAB.

Solution
Increment the first index by 1 to match MATLAB.
indices = np.hstack((0, indices+1, indices, nsamples - 1))

Conclusion
An easy fix for a problem that had me questioning everything. Please let me know if you would like my data to test this solution on. Again, I'm not certain this is always a problem, but it certainly was in my case.
The setup I'm using is an actiCHamp streaming data via RDA with 64 + 8 AUX channels at 500 Hz. Markers are sent using pylsl. The photosensor is activated on the same frame as my stimulus display, and I have done extensive testing to validate its precision and accuracy.

Thank you,
Ollie

Re-add verbose argument to `load_xdf`

The short of it is that I updated some Intheon code to no longer use 'verbose' in the call to load_xdf. Then @chkothe reverted that change and pinned their use of pyxdf to an older version. So obviously this is important to them, though I haven't had a conversation about why.

Looking back, it seems that I'm the one who removed it in this commit. The only related thing I see is this comment. So I guess my interpretation was "why use a 'verbose' argument when you should be setting the logger level separately?".

So if we re-add the verbose argument, does that mean we should use it to override the logger level? Or if verbose is set, do we skip the logger output and print to screen?
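If we go with overriding the logger level, a minimal sketch (assumed signature, not a committed design) would touch the logger only when verbose is explicitly passed:

import logging

logger = logging.getLogger(__name__)

def load_xdf(filename, verbose=None, **kwargs):
    # None leaves the logger configuration untouched
    if verbose is not None:
        logger.setLevel(logging.DEBUG if verbose else logging.WARNING)
    ...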

Requesting input for CI

We need to set up CI for this repo. It seems like there have been some changes in the CI landscape recently. Any recommendations?

For now the project is pure Python so we really only need a Linux host, but this might not always be true.

Azure Pipelines broken?

It seems like the automatic publishing step doesn't work anymore. I just created a new tag v1.16.4, but the version on PyPI is still 1.16.3. I can't even find evidence that Azure Pipelines was running the job, and I don't seem to have access to its settings either (I was expecting to see this in the "Actions" section here in the GitHub repo).

@cboulay can you check what's going on? And maybe you could also add me to Azure Pipelines so that I can fix it myself next time? That is, if Azure Pipelines runs under your personal account, we should probably create a new one for this organization and repo (i.e. xdf-modules/pyxdf).

Alternatively, we could also switch to GitHub Actions, which doesn't require any external services.

xdf_load does not synchronize clocks

Hi,

Loading the XDF file I created using LabRecorder did not sync my streams. Specifying synchronize_clocks=True and playing around with other parameters did not work. Any idea where the error might be? Attached is an example XDF file; note the timestamp difference between streams 0 and 2.

I had to add a .txt extension for GitHub to let me upload the file. Please rename it to test.xdf before opening.

Thanks,
Sari
test.xdf.txt

Release 0.16.0

I think it's time for a new release. We have a new feature (reading only specific streams) and important fixes. I can do the release, but let me know if there's something stopping us. @cboulay @tstenner

Using select_streams argument traverses file twice

Currently, when one uses the select_streams argument, load_xdf first traverses the file's chunks to find and parse the headers, and sets some flags that are later checked in the main loop, which then traverses and parses the file again (*). It does that using an alternative code path that mirrors the chunk traversal and header parsing of the main loop but currently skips, among other things, some error checking for file corruption.

Side note: I do like the idea of breaking load_xdf up into smaller subroutines, and I like the generator approach of _read_chunks, i.e., there may be a possibility to refactor load_xdf in this overall style, especially if the logic gets more complex over the years. However, as it stands, the current main loop is still pretty simple and easy to follow (especially when one reads it alongside the spec), and it could continue to serve as an approachable reference implementation for future language ports for a couple more years. So I'm not ready to pick a side at this point, also considering the effort that a full and clean refactor would come down to.

For now, maybe a way to reconcile the code duplication (which I hope is temporary) and the double traversal of the file would be to add a self.skip bool (as in, skip processing chunks of this stream) in the StreamData constructor, and to move the matching logic there or into a helper method/function (see the sketch below). This way, it would run the first time a header is encountered, and whenever one then sees a chunk of that stream, one can, roughly, do an if streams[StreamId].skip: continue near the place where the check currently happens. We could then earmark the remainder of that alternative code path (parse_chunks, _read_chunks, ...) for future consideration when or if we take on a refactor of load_xdf in this general style (maybe with a git tag).

I'd be willing to implement the suggested change (using StreamData.skip) this week if there's no objection. I think this may also get us closer to a future simple and fast load-only-headers option for load_xdf (some time soon I'm hoping to have a separate discussion on that).

(*) The double traversal may not sound like much, but it would be relatively more costly on a network file system.
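A minimal sketch of the proposed flag (names such as _matches are assumed helpers, not existing pyxdf code):

class StreamData:
    def __init__(self, hdr, select_streams=None):
        ...
        # decided once, when the stream's header chunk is first parsed
        self.skip = not _matches(hdr, select_streams)

# in the main loop, for every later chunk belonging to that stream:
if streams[StreamId].skip:
    continue  # single pass; deselected streams are ignored, not re-parsed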
