Changing time resolution problem in preprocessing (I assume)

When I change the line below to "time_resolution = 4" in the config, step two (test) returns a signal length mismatch error:

I think this is because in the preprocessing file the time resolution is hard coded?

# desired time resolution of the automated annotation (in seconds)
time_resolution = 1

EDIT: I think the issue may be in this function here:
convert_state_vector_to_state_intervals(state_vector, time_resolution=1., mapping=None):
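
To make the suspected failure mode concrete, here is a minimal sketch of such a conversion. It is not somnotate's implementation: the function name `state_vector_to_intervals` and the `(state, start, stop)` return format are hypothetical. It only illustrates that a caller relying on the default `time_resolution=1.` instead of passing the configured value gets interval times that are off by that factor.

```python
import numpy as np

def state_vector_to_intervals(state_vector, time_resolution=1.0):
    """Hypothetical helper: collapse runs of equal states into (state, start, stop) in seconds."""
    state_vector = np.asarray(state_vector)
    if state_vector.size == 0:
        return []
    # Indices at which the state differs from the previous epoch.
    change_points = np.flatnonzero(np.diff(state_vector)) + 1
    starts = np.concatenate(([0], change_points))
    stops = np.concatenate((change_points, [state_vector.size]))
    return [
        (int(state_vector[a]), float(a * time_resolution), float(b * time_resolution))
        for a, b in zip(starts, stops)
    ]

vector = [1, 1, 2, 2, 2, 3]
with_default = state_vector_to_intervals(vector)                        # assumes 1 s epochs
with_config = state_vector_to_intervals(vector, time_resolution=4.0)    # epochs of 4 s
```

With 4 s epochs the last interval should end at 24 s; with the default it ends at 6 s, which is the kind of mismatch described above.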

[Solved] Alaska2 branch: Mismatch lengths of signal array vs state vector with 10s epoch in spite of running the truncation routine

The problem seems to be rooted in the fact that my EDF data have a record length of 5 seconds, so a file sometimes contains an odd number of records, while alignment with time_resolution = 10 requires an even number of records in the data file. The error appears with the sequence: preprocess - truncate - test state annotation. Ignoring the error message from test state annotation and running training gives the same error, and training stalls. However, the behavior is not consistent (odd and even refer to the number of records in the data file):
a) file1 even + file2 odd: OK
b) file1 odd + file2 even: Error
c) file1 even + file2 odd + file3 even: OK
d) file1 even + file2 odd + file3 odd: Error
At time resolution = 5 errors do not occur (expected).
At time resolution = 10 and only running the sequence preprocess - automated_state_annotation , errors do not occur.

Would it be possible to check on load for records that do not align with the set time resolution and truncate them in memory? I notice some code in data_io.py under def load_edf_channels(signal_labels, edf_reader):

total_samples = [edf_reader.samples_in_file(idx) for idx in indices]
assert len(set(total_samples)) == 1, "All signals need to have the same length! Lengths of selected signals: {}".format(total_samples)
total_samples, = set(total_samples)

output_array = np.zeros((len(signal_labels), total_samples), dtype=np.int32)
for jj, idx in enumerate(indices):
    edf_reader.read_digital_signal(idx, 0, total_samples, output_array[jj])

However, this code appears to check only that the different channels have the same length, not that the length aligns with the time resolution?
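
A minimal sketch of the in-memory truncation being asked for, assuming the signals are held in an array of shape (channels, samples). The function name `truncate_to_whole_epochs` and its parameters are hypothetical and not part of somnotate.

```python
import numpy as np

def truncate_to_whole_epochs(signal_array, sampling_frequency_in_hz, time_resolution_in_sec):
    """Drop trailing samples that do not fill a complete epoch (hypothetical helper)."""
    samples_per_epoch = int(round(sampling_frequency_in_hz * time_resolution_in_sec))
    total_epochs = signal_array.shape[-1] // samples_per_epoch
    return signal_array[..., :total_epochs * samples_per_epoch]

# Three 5 s records at 25 Hz: 15 s of data, i.e. an odd number of records.
signals = np.zeros((2, 3 * 5 * 25), dtype=np.int32)
aligned_10s = truncate_to_whole_epochs(signals, 25, 10)   # keeps one full 10 s epoch
aligned_5s = truncate_to_whole_epochs(signals, 25, 5)     # already aligned, unchanged
```

Truncating on load rather than on disk would leave the EDF files untouched, at the cost of silently discarding up to one epoch per file.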

The motivation for trying to make this work with time resolution = 10 is that current testing indicates, contrary to what was expected, that results agree better with the manual scores at 10 s than at 5 s or 1 s time resolution.

I regret not truncating all files at the 10 sec border instead of at the 5 sec record when converting them to EDF. An alternative solution would be for me to write a dedicated routine to truncate only the un-aligned test samples that we are manually scoring.

While this has been tested on my Alaska2 branch, the same behavior would perhaps be expected in the main branch, or at least in the alaska branch I used as starting point.

Edit: I just noticed issue: #4 which seems to be related. I missed it when posting as it was closed.

Edit2: Paul @paulbrodersen , It is not as simple as outlined by a) and b) as even with swapping the odd and even file order in test a) I still got the error.

Problem with pomegranate library during run 02_test_state_annotation.py

Hi, I installed somnotate version 0.3.3 and pomegranate 0.14.8 with Python 3.12.4 on Windows.
I installed somnotate in a conda environment using pip.
While running example_pipeline/02_test_state_annotation.py, I encounter this error:

  File "C:\Users\Utente\somnotate\somnotate-master\example_pipeline\02_test_state_annotation.py", line 12, in <module>
    from somnotate.automated_state_annotation import StateAnnotator
  File "C:\Users\Utente\somnotate\somnotate-master\example_pipeline\somnotate_automated_state_annotation.py", line 10, in <module>
    from pomegranate import (
  File "C:\Users\Utente\anaconda3\envs\my_somnotate_virtual_environment_name\pkgs\pomegranate-0.14.8\pomegranate_init.py", line 11, in <module>
    from .base import *
ModuleNotFoundError: No module named 'pomegranate.base'

Screenshot of the folder pkgs\pomegranate-0.14.8\pomegranate attached here.
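
A small diagnostic sketch that may help narrow this down. It only reports whether Python can locate the package and the submodule named in the error; the helper `module_available` is hypothetical, and the idea that `pomegranate.base` is a compiled extension that was never built for this Python version is an assumption, not something confirmed here.

```python
import importlib.util

def module_available(dotted_name):
    """Return True if `dotted_name` can be located; False if it is missing or its parent fails to import."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        # Looking up a submodule imports the parent package, which may itself fail.
        return False

for name in ("pomegranate", "pomegranate.base"):
    print(name, "found" if module_available(name) else "NOT found")
```

If the package is found but the submodule is not, the installed copy is probably incomplete for the interpreter running the script.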

Thanks

state likelihood output

Hi again, sorry for posting multiple issues! I am a bit confused by the outputs I'm getting from somnotate. I have an 'intervals.hyp' file, and I assume the score it provides is the likelihood of each scored state. However, the spreadsheet does not say which state that was, and it does not clearly correspond with the 'automated_state_annotation.hyp' file, so I cannot match up which state each score refers to. Am I missing a step?

Another thing: I assumed this score is a probability of up to 1, but many of the values I am getting are above 1. Unless it is actually a percentage up to 100, in which case the majority of my scores are below 10, so I guess it hasn't worked so well.

"List index out of range" error with --only parameter if numbers are skipped in the list of files

Gaps in the sequence of --only parameters cause a "list index out of range" error. Let test1.csv be a list with 6 valid entry rows. (In the examples below I am leaving out the full paths used.) The following execute without errors:
02_test_state_annotation.py Beartest1n.csv --only 0 1 2 3 4 5
02_test_state_annotation.py Beartest1n.csv --only 0 1 2
However, the following fail on the first file pointed to by the index after the gap, or on the one after it:
02_test_state_annotation.py Beartest1n.csv --only 0 1 3 4 5
02_test_state_annotation.py Beartest1n.csv --only 0 2

Traceback (most recent call last):
  File "C:\somnotate\example_pipeline\02_test_state_annotation.py", line 116, in <module>
    accuracy[ii] = annotator.score(signal_arrays[ii], np.abs(state_vectors[ii]))
IndexError: list index out of range

I am not sure if this is a limitation or a bug. It is not important to me as I prefer to create dedicated .csv files, so I do not intend to work on it. It looks like the index used is the row index in the file, while it should be the position within the list of indices passed to --only, or vice versa.
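
The suspected pattern can be reproduced in isolation. This is a sketch, not the script's actual code: the file names and the variables `rows`, `only`, and `loaded` are made up for illustration.

```python
rows = [f"recording_{i}.npy" for i in range(6)]   # stand-in for the 6 rows of the spreadsheet
only = [0, 2]                                     # indices passed via --only, with a gap

# Only the selected rows are loaded, so this list has len(only) entries.
loaded = [rows[row_index] for row_index in only]

# Suspected bug: indexing the compact list with the original row index.
failed = False
try:
    for row_index in only:
        _ = loaded[row_index]      # loaded[2] does not exist
except IndexError as error:
    failed = True
    print("fails as in the report:", error)

# Using the position within `only` avoids the problem.
selected = [(row_index, loaded[position]) for position, row_index in enumerate(only)]
```

This also matches the observation that "--only 0 1 2" works: without gaps, row index and position coincide.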

File paths are os dependent

The file paths read in from the spreadsheets are kept as strings, and we keep running into backslash/forward-slash problems as we switch between Unix and Windows.

Would it be worth updating the code to read them in as pathlib objects?
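
One possible approach, sketched under the assumption that the spreadsheet cells are plain strings that may use either separator. The helper name `normalise_path` is hypothetical. `PureWindowsPath` accepts both backslashes and forward slashes, so it can be used to parse either style on any OS before handing the result to `Path`.

```python
from pathlib import Path, PureWindowsPath

def normalise_path(raw):
    """Parse a path string written with either separator into a native Path (hypothetical helper)."""
    return Path(PureWindowsPath(raw).as_posix())

from_windows = normalise_path("data\\processed\\148139_301123.npy")
from_unix = normalise_path("data/processed/148139_301123.npy")
print(from_windows, from_unix)
```

Absolute paths with drive letters would still not be portable between machines, so relative paths in the spreadsheets remain the safer choice.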

Recordings of different lengths

When running '02_test_state_annotation.py', it seems to have an issue when my edfs are of different lengths. I typically record for different amounts of time for different animals so truncating all to the same length is not an option. Can you please help with a solution for this? Error below:

Loading data sets...
C:\Users\rp17047\22qScore\processed\148139_301123.npy (1/3)
C:\Users\rp17047\somnotate\somnotate\_utils.py:145: UserWarning: Interval values are converted from floats to integers.
  warnings.warn("Interval values are converted from floats to integers.")
C:\Users\rp17047\22qScore\processed\148140_301123.npy (2/3)
C:\Users\rp17047\22qScore\processed\148144_281123.npy (3/3)
Testing...
C:\Users\rp17047\somnotate\example_pipeline\02_test_state_annotation.py:94: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  unique_states = np.unique(np.abs(state_vectors))
Traceback (most recent call last):
  File "C:\Users\rp17047\somnotate\example_pipeline\02_test_state_annotation.py", line 94, in <module>
    unique_states = np.unique(np.abs(state_vectors))
  File "<__array_function__ internals>", line 180, in unique
  File "C:\Users\rp17047\AppData\Local\miniconda3\envs\somnotate_env\lib\site-packages\numpy\lib\arraysetops.py", line 274, in unique
    ret = _unique1d(ar, return_index, return_inverse, return_counts,
  File "C:\Users\rp17047\AppData\Local\miniconda3\envs\somnotate_env\lib\site-packages\numpy\lib\arraysetops.py", line 336, in _unique1d
    ar.sort()
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
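
The traceback points at `np.unique(np.abs(state_vectors))`, where `state_vectors` is a list of arrays with different lengths. A possible workaround, sketched here with made-up data, is to concatenate the vectors into one flat array before taking the unique values. Whether other parts of the script also assume equal lengths is not checked here.

```python
import numpy as np

# Stand-ins for the per-recording state vectors; lengths differ on purpose.
state_vectors = [
    np.array([1, 1, -1, 2, 2]),
    np.array([3, 3, 2, 0, -3, 1, 1, 2]),
]

# Concatenating yields a regular 1-D array, so np.unique no longer sees a ragged sequence.
unique_states = np.unique(np.abs(np.concatenate(state_vectors)))
print(unique_states)
```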

TypeError: 'tuple'

When trying to run the '05_manual_refinement.py' script, the GUI does not seem to work properly, and almost every key I press (except 'w', 'n', and 'r' for state changes) results in the error below. I think this might be due to the version of matplotlib that I am using. Generally, it would be useful to get a list of the exact versions of all your dependencies if possible.

Traceback (most recent call last):
  File "C:\Users\rp17047\AppData\Local\miniconda3\envs\somnotate_env\lib\site-packages\matplotlib\cbook.py", line 298, in process
    func(*args, **kwargs)
  File "C:\Users\rp17047\somnotate\somnotate\_manual_state_annotation.py", line 829, in _on_motion
    super(TimeSeriesStateAnnotator, self)._on_motion(event)
  File "C:\Users\rp17047\somnotate\somnotate\_manual_state_annotation.py", line 274, in _on_motion
    self._handle_hold_click(event)
  File "C:\Users\rp17047\somnotate\somnotate\_manual_state_annotation.py", line 264, in _handle_hold_click
    self._update_selection(*sorted([self.button_press_start, event.xdata]))
  File "C:\Users\rp17047\somnotate\somnotate\_manual_state_annotation.py", line 600, in _update_selection
    vertices[[0, 1, -1], 0] = selection_lower_bound
TypeError: 'tuple' object does not support item assignment
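
The final frame shows an in-place assignment into `vertices`, which at that point is a tuple. A generic illustration of the failure and of one way around it follows. The vertex values are made up, and which matplotlib call returns the tuple in the installed version is not known here, so this is only a sketch of the pattern, not a patch for `_update_selection`.

```python
import numpy as np

# Made-up rectangle vertices, stored as an immutable tuple of (x, y) pairs.
vertices = ((0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0), (0.0, 0.0))
selection_lower_bound = 2.5

# Copying into an array makes the same indexing expression valid.
vertices = np.array(vertices, dtype=float)
vertices[[0, 1, -1], 0] = selection_lower_bound
print(vertices)
```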

Different recording lengths, sample rates and vigilance states

Hello,

I have been trying to get somnotate running on my mouse EEG data but have run into several issues.

When running '02_test_state_annotation.py', it seems to have an issue when my edfs are of different lengths. I typically record for different amounts of time for different animals so truncating all to the same length is not an option. Can you please help with a solution for this? Error below:

Loading data sets...
C:\Users\rp17047\22qScore\processed\148139_301123.npy (1/3)
C:\Users\rp17047\somnotate\somnotate\_utils.py:145: UserWarning: Interval values are converted from floats to integers.
  warnings.warn("Interval values are converted from floats to integers.")
C:\Users\rp17047\22qScore\processed\148140_301123.npy (2/3)
C:\Users\rp17047\22qScore\processed\148144_281123.npy (3/3)
Testing...
C:\Users\rp17047\somnotate\example_pipeline\02_test_state_annotation.py:94: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  unique_states = np.unique(np.abs(state_vectors))
Traceback (most recent call last):
  File "C:\Users\rp17047\somnotate\example_pipeline\02_test_state_annotation.py", line 94, in <module>
    unique_states = np.unique(np.abs(state_vectors))
  File "<__array_function__ internals>", line 180, in unique
  File "C:\Users\rp17047\AppData\Local\miniconda3\envs\somnotate_env\lib\site-packages\numpy\lib\arraysetops.py", line 274, in unique
    ret = _unique1d(ar, return_index, return_inverse, return_counts,
  File "C:\Users\rp17047\AppData\Local\miniconda3\envs\somnotate_env\lib\site-packages\numpy\lib\arraysetops.py", line 336, in _unique1d
    ar.sort()
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I have 4 vigilance states I want to score: wake, NREM, REM and artefact (any stage). I have modified these in the configuration file, but I am getting the following error when running '02_test_state_annotation.py'.

(somnotate_env) C:\Users\rp17047\somnotate>python C:\Users\rp17047\somnotate\example_pipeline\02_test_state_annotation.py C:\Users\rp17047\22qScore\TrainingSetshort.csv
Loading data sets...
C:\Users\rp17047\22qScore\processed\148139_301123.npy (1/2)
C:\Users\rp17047\somnotate\somnotate\_utils.py:145: UserWarning: Interval values are converted from floats to integers.
  warnings.warn("Interval values are converted from floats to integers.")
C:\Users\rp17047\22qScore\processed\148140_301123.npy (2/2)
Testing...
C:\Users\rp17047\22qScore\processed\148139_301123.npy (1/2)
accuracy : 91.5%
Traceback (most recent call last):
  File "C:\Users\rp17047\somnotate\example_pipeline\02_test_state_annotation.py", line 114, in <module>
    confusion[ii] = get_confusion_matrix(np.abs(state_vectors[ii]), annotator.predict(signal_arrays[ii]), labels=unique_states)
ValueError: could not broadcast input array from shape (3,3) into shape (4,4)

I have modified the configuration file as follows. I did try leaving in the unused variables, but it did not help.

state_to_int = dict([
    ('Wake'      ,  1),
    ('Art'       , -1),
    # ('sleep movement'     ,  1),
    ('N2'        ,  2),
    # ('non-REM (artefact)' , -2),
    ('REM'       ,  3),
    # ('REM (artefact)'     , -3),
    ('undefined' ,  0),
])

# Construct the inverse mapping to convert back from state predictions to human readable labels.
int_to_state = {ii : state for state, ii in state_to_int.items() if state != 'sleep movement'}

# define the keymap used for the manual annotation
keymap = {
    'w' : 'Wake',
    'W' : 'Art',
    'n' : 'N2',
    # 'N' : 'non-REM (artefact)',
    'r' : 'REM',
    # 'R' : 'REM (artefact)',
    'x' : 'undefined',
    # 'X' : 'undefined (artefact)',
    # 'm' : 'sleep movement',
    # 'M' : 'sleep movement (artefact)',
}

# define the visual display of states
state_to_color = {
    'Wake'      : 'crimson',
    'Art'       : 'coral',
    # 'sleep movement'      : 'violet',
    'N2'        : 'blue',
    # 'non-REM (artefact)'  : 'cornflowerblue',
    'REM'       : 'gold',
    # 'REM (artefact)'      : 'yellow',
    # 'sleep movement'      : 'purple',
    'undefined' : 'gray',
    # 'undefined (artefact)': 'lightgray',
}

state_display_order = [
    'Wake',
    'Art',
    'N2',
    # 'non-REM (artefact)',
    'REM',
    # 'REM (artefact)',
    # 'sleep movement',
    # 'sleep movement (artefact)',
    'undefined',
    # 'undefined (artefact)',
]
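
One thing worth checking with this mapping: the failing line in the traceback applies `np.abs` to the state vector, so a state coded as -1 becomes indistinguishable from the state coded as 1. The sketch below only demonstrates that collapse with a made-up sequence of labels. Whether this is what produces the (3,3) versus (4,4) mismatch is an assumption that has not been verified.

```python
import numpy as np

state_to_int = {'Wake': 1, 'Art': -1, 'N2': 2, 'REM': 3, 'undefined': 0}

# Made-up annotation containing all four scored states.
labels = ['Wake', 'Art', 'N2', 'REM', 'Art', 'Wake']
state_vector = np.array([state_to_int[label] for label in labels])

# After np.abs, 'Art' (-1) and 'Wake' (1) share the same integer code.
collapsed = np.abs(state_vector)
states_before = np.unique(state_vector)
states_after = np.unique(collapsed)
print(states_before, states_after)
```

If 'Art' is meant to be scored as its own state, giving it a distinct positive integer may be worth trying.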

I seem to have solved this one. However, '01_preprocess_signals.py' does not seem to work with a sample rate other than 256 Hz (I'm using 250 Hz), apparently because the sample rate is imported as a float rather than an int. I worked around this by adding the following after line 134:
datasets['sampling_frequency_in_hz']=datasets['sampling_frequency_in_hz'].astype(int)

Traceback (most recent call last):
  File "C:\Users\rp17047\somnotate\example_pipeline\01_preprocess_signals.py", line 167, in <module>
    time, frequencies, preprocessed_signal = preprocess(signal, dataset['sampling_frequency_in_hz'],
  File "C:\Users\rp17047\somnotate\example_pipeline\01_preprocess_signals.py", line 84, in preprocess
    frequencies, time, spectrogram = get_spectrogram(raw_signal,
  File "C:\Users\rp17047\AppData\Local\miniconda3\envs\somnotate_env\lib\site-packages\lspopt\lsp.py", line 145, in spectrogram_lspopt
    H, taper_weights = lspopt(n=nperseg, c_parameter=c_parameter)
  File "C:\Users\rp17047\AppData\Local\miniconda3\envs\somnotate_env\lib\site-packages\lspopt\lsp.py", line 56, in lspopt
    h = np.vstack((np.ones((n,)), 2 * t1))
  File "C:\Users\rp17047\AppData\Local\miniconda3\envs\somnotate_env\lib\site-packages\numpy\core\numeric.py", line 191, in ones
    a = empty(shape, dtype, order)
TypeError: 'float' object cannot be interpreted as an integer
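
A slightly more defensive variant of the cast, as a sketch: `astype(int)` silently truncates a non-integer rate such as 249.9, whereas the hypothetical helper below refuses it. The name `as_integer_rate` is made up and not part of somnotate.

```python
def as_integer_rate(sampling_frequency_in_hz):
    """Convert a sampling rate to int, refusing values that are not whole numbers (hypothetical helper)."""
    rate = float(sampling_frequency_in_hz)
    if not rate.is_integer():
        raise ValueError(f"Sampling frequency must be a whole number of Hz, got {rate}")
    return int(rate)

rate = as_integer_rate(250.0)   # value as it might arrive from the spreadsheet
print(rate, type(rate))
```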

[Solved] Alaska2 branch: Inconsistent time axis in plot 2-4 running 02_test_state_annotation.py with --show option and >1s time_resolution

With time_resolution = 10, the three bottom plots are compressed to 1/10 of the width of the top plot. Apparently the time_resolution parameter is not passed on to one or more routines. I found that time_resolution was not passed as a parameter in the two calls to convert_state_vector_to_state_intervals(). Adding it to those calls fixed the two bottom annotation plots, but the transformed signal plot remains compressed along the x-axis. The problematic code is likely this:

            transformed_signals = annotator.transform(signal_arrays[ii])
            plot_signals(transformed_signals, ax=axes[1])
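
If the transformed signals are plotted against their epoch index, the x-axis is in epochs rather than seconds, which would explain the compression. A sketch of building an explicit time axis follows. The helper `epoch_times` is hypothetical, the (epochs, components) shape of the stand-in array is an assumption, and whether `plot_signals` accepts such an axis has not been checked.

```python
import numpy as np

def epoch_times(total_epochs, time_resolution_in_sec):
    """Start time in seconds of each epoch (hypothetical helper)."""
    return np.arange(total_epochs) * time_resolution_in_sec

# Stand-in for the transformed signals: one hour of data at 10 s epochs, 3 components.
transformed = np.zeros((360, 3))
time = epoch_times(transformed.shape[0], time_resolution_in_sec=10)

# With matplotlib one could then call, for example: ax.plot(time, transformed[:, 0])
print(time[0], time[-1])
```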

Sample rate importing as float

Just in case anyone else comes across this issue:

'01_preprocess_signals.py' does not seem to work with a sample rate other than 256 Hz (I'm using 250 Hz), apparently because the sample rate is imported as a float rather than an int. I worked around this by adding the following after line 134:
datasets['sampling_frequency_in_hz']=datasets['sampling_frequency_in_hz'].astype(int)

The error I was getting:
Traceback (most recent call last):
  File "C:\Users\rp17047\somnotate\example_pipeline\01_preprocess_signals.py", line 167, in <module>
    time, frequencies, preprocessed_signal = preprocess(signal, dataset['sampling_frequency_in_hz'],
  File "C:\Users\rp17047\somnotate\example_pipeline\01_preprocess_signals.py", line 84, in preprocess
    frequencies, time, spectrogram = get_spectrogram(raw_signal,
  File "C:\Users\rp17047\AppData\Local\miniconda3\envs\somnotate_env\lib\site-packages\lspopt\lsp.py", line 145, in spectrogram_lspopt
    H, taper_weights = lspopt(n=nperseg, c_parameter=c_parameter)
  File "C:\Users\rp17047\AppData\Local\miniconda3\envs\somnotate_env\lib\site-packages\lspopt\lsp.py", line 56, in lspopt
    h = np.vstack((np.ones((n,)), 2 * t1))
  File "C:\Users\rp17047\AppData\Local\miniconda3\envs\somnotate_env\lib\site-packages\numpy\core\numeric.py", line 191, in ones
    a = empty(shape, dtype, order)
TypeError: 'float' object cannot be interpreted as an integer

Read file problems

Two issues I've found so far. I have no idea of the best way to fix them at the moment, but I am recording them here.

In 00_convert_sleepsign_files.py: I've written my sleepsign file outputs slightly differently, as a csv rather than tab-delimited. I might be missing how to pass the *kwarg from the terminal, but I had to modify the script to include deliminator="." to get it to read the files correctly.
I've written my EDF files to continue until the data runs out, rather than being strictly 24 hours. As a result, I kept running into problems in 03_train_state_annotations.py, with it complaining that the dimension is >86400. I solved this by manually editing 01_preprocess_signals.py as follows.

old

for signal in raw_signals.T:
    time, frequencies, preprocessed_signal = preprocess(
        signal,
        dataset['sampling_frequency_in_hz'],
        time_resolution_in_sec = time_resolution,
        low_cut                = 1.,
        high_cut               = 90.,
        notch_low_cut          = 45.,
        notch_high_cut         = 55.,
    )
    preprocessed_signals.append(preprocessed_signal)

to new

for signal in raw_signals.T:
    time, frequencies, preprocessed_signal = preprocess(
        signal,
        dataset['sampling_frequency_in_hz'],
        time_resolution_in_sec = time_resolution,
        low_cut                = 1.,
        high_cut               = 90.,
        notch_low_cut          = 45.,
        notch_high_cut         = 55.,
    )
    # keep at most 86400 time bins (24 hours at 1 s resolution)
    preprocessed_signal = preprocessed_signal[:, :86400]
    preprocessed_signals.append(preprocessed_signal)
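
The hard-coded 86400 corresponds to 24 hours only at a time resolution of 1 s. A sketch of deriving the limit from the configured resolution instead follows. The helper name `truncate_to_duration` and its parameters are hypothetical, and the (frequencies, time) layout is inferred from the slicing in the workaround above.

```python
import numpy as np

def truncate_to_duration(preprocessed_signal, time_resolution_in_sec, max_duration_in_sec=24 * 60 * 60):
    """Keep at most `max_duration_in_sec` worth of time bins along the last axis (hypothetical helper)."""
    max_bins = int(max_duration_in_sec // time_resolution_in_sec)
    return preprocessed_signal[:, :max_bins]

# 25 hours of made-up data at 1 s resolution, 5 frequency bands.
long_recording = np.zeros((5, 25 * 3600))
one_day = truncate_to_duration(long_recording, time_resolution_in_sec=1)

# A recording shorter than the limit is returned unchanged.
short_recording = np.zeros((5, 1000))
unchanged = truncate_to_duration(short_recording, time_resolution_in_sec=10)
```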

Using different vigilance states

I have 4 vigilance states I want to score: wake, NREM, REM and artefact (any stage). I have modified these in the configuration file, but I am getting the following error when running '02_test_state_annotation.py'.

(somnotate_env) C:\Users\rp17047\somnotate>python C:\Users\rp17047\somnotate\example_pipeline\02_test_state_annotation.py C:\Users\rp17047\22qScore\TrainingSetshort.csv
Loading data sets...
C:\Users\rp17047\22qScore\processed\148139_301123.npy (1/2)
C:\Users\rp17047\somnotate\somnotate\_utils.py:145: UserWarning: Interval values are converted from floats to integers.
  warnings.warn("Interval values are converted from floats to integers.")
C:\Users\rp17047\22qScore\processed\148140_301123.npy (2/2)
Testing...
C:\Users\rp17047\22qScore\processed\148139_301123.npy (1/2)
accuracy : 91.5%
Traceback (most recent call last):
  File "C:\Users\rp17047\somnotate\example_pipeline\02_test_state_annotation.py", line 114, in <module>
    confusion[ii] = get_confusion_matrix(np.abs(state_vectors[ii]), annotator.predict(signal_arrays[ii]), labels=unique_states)
ValueError: could not broadcast input array from shape (3,3) into shape (4,4)

I have modified the configuration file as follows. I did try leaving in the unused variables, but it did not help.

state_to_int = dict([
    ('Wake'      ,  1),
    ('Art'       , -1),
    # ('sleep movement'     ,  1),
    ('N2'        ,  2),
    # ('non-REM (artefact)' , -2),
    ('REM'       ,  3),
    # ('REM (artefact)'     , -3),
    ('undefined' ,  0),
])

# Construct the inverse mapping to convert back from state predictions to human readable labels.
int_to_state = {ii : state for state, ii in state_to_int.items() if state != 'sleep movement'}

# define the keymap used for the manual annotation
keymap = {
    'w' : 'Wake',
    'W' : 'Art',
    'n' : 'N2',
    # 'N' : 'non-REM (artefact)',
    'r' : 'REM',
    # 'R' : 'REM (artefact)',
    'x' : 'undefined',
    # 'X' : 'undefined (artefact)',
    # 'm' : 'sleep movement',
    # 'M' : 'sleep movement (artefact)',
}

# define the visual display of states
state_to_color = {
    'Wake'      : 'crimson',
    'Art'       : 'coral',
    # 'sleep movement'      : 'violet',
    'N2'        : 'blue',
    # 'non-REM (artefact)'  : 'cornflowerblue',
    'REM'       : 'gold',
    # 'REM (artefact)'      : 'yellow',
    # 'sleep movement'      : 'purple',
    'undefined' : 'gray',
    # 'undefined (artefact)': 'lightgray',
}

state_display_order = [
    'Wake',
    'Art',
    'N2',
    # 'non-REM (artefact)',
    'REM',
    # 'REM (artefact)',
    # 'sleep movement',
    # 'sleep movement (artefact)',
    'undefined',
    # 'undefined (artefact)',
]

paulbrodersen / somnotate