@teonbrooks
Member

An attempt at fixing the overflow issue in the read_raw_cnt reader. This error surfaced after a NumPy upgrade.

Reference issue

Fixes #13547.

What does this implement/fix?

This follows a pattern suggested in #12907 to cast the integer to int64.

@larsoner
Member

To read your file it needs a few more fixes actually... I'll push

@larsoner
Member

Definitely still something wrong here...

$ python -uic "import mne; raw = mne.io.read_raw_cnt('~/Desktop/945flankers_ready.cnt', data_format='int16').load_data(); raw.plot(annotation_regex='aaa')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
    import mne; raw = mne.io.read_raw_cnt('~/Desktop/945flankers_ready.cnt', data_format='int16').load_data(); raw.plot(annotation_regex='aaa')
                      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "<decorator-gen-190>", line 12, in load_data
  File "/home/larsoner/python/mne-python/mne/io/base.py", line 589, in load_data
    self._preload_data(True)
    ~~~~~~~~~~~~~~~~~~^^^^^^
  File "/home/larsoner/python/mne-python/mne/io/base.py", line 601, in _preload_data
    self._data = self._read_segment(data_buffer=data_buffer)
                 ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<decorator-gen-189>", line 12, in _read_segment
  File "/home/larsoner/python/mne-python/mne/io/base.py", line 420, in _read_segment
    data = _allocate_data(data_buffer, data_shape, dtype)
  File "/home/larsoner/python/mne-python/mne/io/base.py", line 2577, in _allocate_data
    data = np.zeros(shape, dtype)
numpy._core._exceptions._ArrayMemoryError: Unable to allocate 2.06 TiB for an array with shape (66, 4294966564) and data type float64

Same error if I use data_format='int32'. If I remove the .load_data and use data_format='int32' the plot at least looks okay

image

So need to figure out the n_samples issue, 4294966564 samples for 66 channels is totally unreasonable for a 150MB file...
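For reference, a quick sanity check of the numbers in the traceback. This is only a sketch: the 150 MB file size is the rough figure from the comment above, and the 2 bytes per value assumes data_format='int16'.

```python
# Numbers taken from the traceback above.
n_channels = 66
n_samples_from_header = 4294966564
bytes_per_float64 = 8

# Size of the float64 array that np.zeros was asked to allocate.
requested_bytes = n_channels * n_samples_from_header * bytes_per_float64
requested_tib = requested_bytes / 2**40
print(f"requested: {requested_tib:.2f} TiB")

# Rough upper bound on samples a ~150 MB file could hold, assuming
# int16 data (2 bytes per value) and ignoring headers and the event table.
approx_file_bytes = 150 * 10**6
max_plausible_samples = approx_file_bytes // (n_channels * 2)
print(f"plausible upper bound: {max_plausible_samples} samples")
```

The header value is several thousand times larger than anything the file could actually contain, so the header field itself cannot be trusted here.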

Comment on lines -140 to +142
- (n_samples,) = np.frombuffer(fid.read(4), dtype="<i4")
+ fid.seek(_NSAMPLES_OFFSET)
+ n_samples = int(np.frombuffer(fid.read(4), dtype="<u4").item())
Member

Interestingly, this was <i4 here but <u4 in cnt.py. When we use <i4 in cnt.py we get a negative n_samples (-732)
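
The two values are the same four bytes read two ways. A minimal sketch using the stdlib struct module, whose "<I" and "<i" formats correspond to NumPy's "<u4" and "<i4" (the bytes are reconstructed from the reported value, not read from the actual file):

```python
import struct

# Rebuild the 4 little-endian bytes that decode to the reported value.
raw = struct.pack("<I", 4294966564)

# Same bytes, unsigned vs. signed 32-bit interpretation.
(as_unsigned,) = struct.unpack("<I", raw)
(as_signed,) = struct.unpack("<i", raw)

print(as_unsigned)  # 4294966564
print(as_signed)    # -732

# The two readings differ by exactly 2**32 (two's complement wraparound).
assert as_unsigned - as_signed == 2**32
```

Either way the field does not hold a usable sample count, so switching the dtype only changes how the bad value looks.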

Member Author

I just reverted it back to "<i4"; it should be a signed int. When I first encountered it, I thought something was wrong sign-wise.

but yeah, I'm getting the same values as you with the "<i4". I guess that's why there is the later implementation mentioned below.

@larsoner
Member

@teonbrooks I'm done pushing/looking for now, I hope the changes I made help debugging a bit more. Something is wrong with n_samples here, it gets read as 4294966564 ...

@teonbrooks
Member Author

thanks @larsoner!

came across this post and adding it here for reference in the future
https://paulbourke.net/dataformats/eeg/

@teonbrooks
Member Author

> @teonbrooks I'm done pushing/looking for now, I hope the changes I made help debugging a bit more. Something is wrong with n_samples here, it gets read as 4294966564 ...

according to the link above, it looks like this is not an uncommon occurrence:

> Experience has shown that many (most) of the fields are not filled out correctly by the software. In particular, the best way to work out the number of samples is

it looks like n_samples should be calculated as:

nsamples = (SETUP.EventTablePos - (900 + 75 * nchannels)) / (2 * nchannels)
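
A minimal sketch of that calculation, reading the formula as "size of the data block divided by bytes per multiplexed sample", i.e. with the subtraction done before the division. The function name and the bytes_per_sample parameter are hypothetical (not from MNE); the 900 and 75 constants come from the formula above, and the 2 in the formula is taken to mean int16 data. The example values are made up, not from the problem file.

```python
def estimate_n_samples(event_table_pos, n_channels, bytes_per_sample=2):
    """Estimate the sample count from the event table position.

    Hypothetical helper: assumes a 900-byte general header, 75 bytes of
    header per channel, and a data block that runs up to the event table.
    """
    header_bytes = 900 + 75 * n_channels
    data_bytes = event_table_pos - header_bytes
    # Each time point stores one value per channel.
    return data_bytes // (bytes_per_sample * n_channels)


# Made-up example: 66 channels, 1000 samples of int16 data.
n_channels = 66
event_table_pos = 900 + 75 * n_channels + 1000 * 2 * n_channels
print(estimate_n_samples(event_table_pos, n_channels))
```

With int32 data the divisor doubles, which is why the result depends on knowing data_format.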

@larsoner
Member

Great, can you add some comments / links in the code for the next time we dig into this, and try the suggested fix?

@teonbrooks
Member Author

Added a note. After trying it out and looking more closely at the code, it looks as though the n_samples logic is already there, starting at https://github.com/mne-tools/mne-python/blob/main/mne/io/cnt/cnt.py#L339.

@teonbrooks
Member Author

I actually don't know what to do about n_samples. It looks like the code is already trying to handle the data as best it can without knowing the data_format and without a reliable header entry for the sample count.

Development

Successfully merging this pull request may close these issues.

Overflow Error with read_raw_cnt reader