Comments (78)

evanmiller avatar evanmiller commented on June 15, 2024

Hi,

ReadStat is designed to work on big endian architectures (see e.g. https://github.com/WizardMac/ReadStat/blob/master/src/readstat_sas.c#L266).

Do you get the same errors on a little endian machine?
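For readers unfamiliar with the term: a parser that supports both byte orders typically compares the byte order declared by the file with the host's and swaps multi-byte integers when they differ. A minimal sketch of that idea follows; the helper names are hypothetical and this is not ReadStat's actual code.

```c
#include <stdint.h>

/* Returns 1 when the host stores the least significant byte first. */
int host_is_little_endian(void) {
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Reverses the byte order of a 32-bit value. */
uint32_t byteswap32(uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Converts a 32-bit integer read from a file into host byte order.
 * file_is_little_endian is whatever the file header declares. */
uint32_t file_to_host32(uint32_t raw, int file_is_little_endian) {
    int same_order = (!!file_is_little_endian) == host_is_little_endian();
    return same_order ? raw : byteswap32(raw);
}
```

With this shape, a big endian file read on a little endian host goes through the swap path, and the same file read on AIX does not.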

from readstat.

vinhdizzo avatar vinhdizzo commented on June 15, 2024

Hmm, we only have AIX, so I'm not sure about a little endian machine. Let me try to get my hands on a little endian machine with SAS to generate a SAS file. Then I will go back to the original machine to try to figure out if it's an issue with ReadStat, haven, AIX, or my particular machine.

Any other thoughts with the debugging process? Thanks.

vinhdizzo avatar vinhdizzo commented on June 15, 2024

Re-tried on a tiny file and it works. Tried other files that I had and things work. It's only failing on some large files, and I believe it's been reported here, so I'm tracking that now. Sorry for jumping the gun.

evanmiller avatar evanmiller commented on June 15, 2024

No worries... if you are able to share specific files that won't load I will look into it.

vinhdizzo avatar vinhdizzo commented on June 15, 2024

Ok, I will try to generate a file that won't load. The thread that I referenced did contain links to files that will not load. Have you tried those?

evanmiller avatar evanmiller commented on June 15, 2024

As I mentioned in tidyverse/haven#92 (comment), the linked file worked for me, so in that case I suspect it's an out-of-date version of haven (or ReadStat) being used. I am always on the lookout for files that fail with the latest version of ReadStat though :-)

vinhdizzo avatar vinhdizzo commented on June 15, 2024

Hmm, I generated a 700mb (10,000 columns, 10,000 rows) sas file with the following:

data libfoo.bar ;
    call streaminit(123);
    array x{10000} x1-x10000 ;
    do i=1 to 10000 ;
        do j=1 to dim(x) ;
            x(j) = rand('Uniform') ;
        end ;
        output ;
    end ;
run ;

and I get the following with haven 0.2.0 + R 3.2.2 on AIX:

> library(haven)
> d1 = read_sas('bar.sas7bdat')
Error: cannot allocate vector of size 78 Kb
> d1 = read_sas('bar.sas7bdat')
Error: Failed to parse /sas/data04/vinh/bar.sas7bdat: Invalid file, or file has unsupported features.
> d1 = read_sas('bar.sas7bdat')
Error: Failed to parse /sas/data04/vinh/bar.sas7bdat: Invalid file, or file has unsupported features.

On a Windows machine running Revolution R Open 3.2.2, I get:

> library(haven)
> d1 = read_sas('bar.sas7bdat')
> dim(d1)
[1] 10000 10002

Could it be an issue with my compiled R on AIX? Probably not because I played around with the number of rows of data and was able to get a similar error on my Windows R at 30,000 rows:

Error: Failed to parse G:\path\to\bar.sas7bdat: Invalid file, or file has unsupported features.

I uploaded my test data set here. Curious if it's a haven or ReadStat issue, and hopefully a fix will be forthcoming. Thanks so much!

evanmiller avatar evanmiller commented on June 15, 2024

Please try again with the development version of haven -- 0.2.0 is a bit out of date at this point.

vinhdizzo avatar vinhdizzo commented on June 15, 2024

Darndest thing. I just tried the same file on Mac OS X with RRO 3.2.2 and haven 0.2.0 and the file loaded. Weird. I checked the md5sum on all 3 platforms (AIX, Windows, and Mac) and they are all the same: 79a7aaf549449ab3c296b05b152841c2.

vinhdizzo avatar vinhdizzo commented on June 15, 2024

Just tried the dev version of haven (0.2.0.9000) on Windows and I get the same error.

evanmiller avatar evanmiller commented on June 15, 2024

Thanks for the additional information. So to clarify, with the dev version of haven:

  • Mac OS X / Little Endian:
  • Windows / Little Endian:
  • AIX / Big Endian:

Is that correct?

vinhdizzo avatar vinhdizzo commented on June 15, 2024

The sas file was created on AIX, so big endian for all.

Haven dev on Windows: no go.
Haven 0.2.0 on Mac: works.
Haven 0.2.0 on AIX: no go (can't get dev here because I can't get the R curl package and devtools compiled yet)

evanmiller avatar evanmiller commented on June 15, 2024

Got it, thanks.

Are you able to create a file < 2 GB that fails on Windows? I ask because the file you shared is just over 2.2 GB, which indicates it might be a 32-bit / 64-bit issue of some kind.

vinhdizzo avatar vinhdizzo commented on June 15, 2024

I could play with it some more tomorrow. I tried 10k, 15k, 20k, 25k, and 30k rows. The last broke it. I could try 27.5k, etc., and check file size.

From the sound of the previous thread, the file size was also 2ish gb. I did try that file on mac and it worked. I could try it on windows tomorrow too.

evanmiller avatar evanmiller commented on June 15, 2024

From the looks of your testing, I'm going to guess it's a 32-bit / 64-bit issue. It would help to know exactly when the error is produced, either if haven provides some kind of progress bar, or a simple timing test comparing e.g. time to parse 25k successfully vs time of error when parsing 30k.

vinhdizzo avatar vinhdizzo commented on June 15, 2024

The following applies to Windows RRO 3.2.2 (64-bit) with haven-dev.

Regarding file size:

  • Success at 26,500 rows (2145046528 bytes)
  • Failed at 26,750 rows (2165270528 bytes)

Regarding timing, when it errors out, it errors out immediately:

## no error
> system.time(d1 <- read_sas('bar.sas7bdat'))
   user  system elapsed 
  60.03    1.66   83.72

## error
> system.time(d1 <- read_sas('bar.sas7bdat'))
Error: Failed to parse G:\path\bar.sas7bdat: Invalid file, or file has unsupported features.
Timing stopped at: 0 0 0.09 

evanmiller avatar evanmiller commented on June 15, 2024

Ok perfect, this is pretty good evidence that there's some 32-bit issue in the code (2^31 = 2147483648 bytes). It's also helpful knowing that it errors out immediately. Let me poke around the code and get back to you.
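The arithmetic behind that guess can be checked directly against the two file sizes reported above. A small sketch (the function name is made up for illustration):

```c
#include <stdint.h>

/* Returns 1 if a byte count can be represented in a signed 32-bit integer,
 * i.e. if it is at most 2^31 - 1 = 2147483647; returns 0 otherwise. */
int fits_in_int32(int64_t nbytes) {
    return nbytes >= 0 && nbytes <= INT32_MAX;
}
```

Here fits_in_int32(2145046528LL) is 1 for the 26,500-row file that loads, and fits_in_int32(2165270528LL) is 0 for the 26,750-row file that fails, so the two sizes sit on either side of the signed 32-bit limit.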

evanmiller avatar evanmiller commented on June 15, 2024

Can you tell me what the first 5 bytes are in the failing and non-failing files?

vinhdizzo avatar vinhdizzo commented on June 15, 2024

You mean in binary form? Or in the sas data (variable x1)? If the former, show me how. I can do it in about an hour.

evanmiller avatar evanmiller commented on June 15, 2024

I just mean in binary form. Actually the first several bytes if you don't mind. Something like

hexdump bar.sas7bdat | head

should do the trick. If you don't have hexdump sometimes the command hd will do it.

evanmiller avatar evanmiller commented on June 15, 2024

(By "several" I think I need the first 40 bytes)

vinhdizzo avatar vinhdizzo commented on June 15, 2024

hexdump head of success file:

$ hexdump bar.sas7bdat | head
0000000 0000 0000 0000 0000 0000 0000 eac2 6081
0000010 14b3 cf11 92bd 0008 c709 8c31 1f18 1110
0000020 2233 3300 0033 3102 0001 0000 0000 0000
0000030 0000 0103 1f18 1110 2233 3300 0033 3102
0000040 3301 2301 0033 001d 2000 0103 0000 0000
0000050 0000 0000 4153 2053 4946 454c 4142 2052
0000060 2020 2020 2020 2020 2020 2020 2020 2020
*
0000090 2020 2020 2020 2020 2020 2020 4144 4154
00000a0 2020 2020 0000 0000 da41 3838 920e 1e4f

hexdump head of failed:

$ hexdump bar.sas7bdat | head
0000000 0000 0000 0000 0000 0000 0000 eac2 6081
0000010 14b3 cf11 92bd 0008 c709 8c31 1f18 1110
0000020 2233 3300 0033 3102 0001 0000 0000 0000
0000030 0000 0103 1f18 1110 2233 3300 0033 3102
0000040 3301 2301 0033 001d 2000 0103 0000 0000
0000050 0000 0000 4153 2053 4946 454c 4142 2052
0000060 2020 2020 2020 2020 2020 2020 2020 2020
*
0000090 2020 2020 2020 2020 2020 2020 4144 4154
00000a0 2020 2020 0000 0000 da41 3c38 0f14 9eb1

evanmiller avatar evanmiller commented on June 15, 2024

Thanks. It appears both of these files are saved as 64-bit, which potentially narrows down the problem a bit.
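For reference, a sketch of how that can be read off the dumps. It assumes the community-documented (reverse-engineered) sas7bdat header layout, in which byte 32 is 0x33 for 64-bit files and byte 37 is 0x01 for little endian data and 0x00 for big endian. Those offsets are an assumption here, not something stated in this thread, and the names are hypothetical. Note that hexdump's default output prints 16-bit words in host order, so in the dumps above each pair of bytes appears swapped (the text "SAS FILE" shows up as 4153 2053 4946 454c).

```c
#include <stddef.h>
#include <stdint.h>

/* Offsets assumed from community documentation of the sas7bdat header. */
enum { SAS_OFF_U64 = 32, SAS_OFF_ENDIAN = 37, SAS_MIN_HEADER = 40 };

/* Returns 1 if the header marks the file as 64-bit, 0 if not, -1 if too short. */
int sas_header_is_u64(const uint8_t *hdr, size_t len) {
    if (len < SAS_MIN_HEADER) return -1;
    return hdr[SAS_OFF_U64] == 0x33;
}

/* Returns 1 for little endian, 0 for big endian, -1 if too short. */
int sas_header_is_little_endian(const uint8_t *hdr, size_t len) {
    if (len < SAS_MIN_HEADER) return -1;
    return hdr[SAS_OFF_ENDIAN] == 0x01;
}

/* First 40 bytes of the dumps above, with hexdump's word swapping undone. */
const uint8_t example_header[40] = {
    0x00,0x00,0x00,0x00, 0x00,0x00,0x00,0x00, 0x00,0x00,0x00,0x00, 0xc2,0xea,0x81,0x60,
    0xb3,0x14,0x11,0xcf, 0xbd,0x92,0x08,0x00, 0x09,0xc7,0x31,0x8c, 0x18,0x1f,0x10,0x11,
    0x33,0x22,0x00,0x33, 0x33,0x00,0x02,0x31
};
```

Under those assumptions, both helpers agree with the thread when applied to example_header: the file is 64-bit and big endian, which is consistent with a file written on AIX.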

evanmiller avatar evanmiller commented on June 15, 2024

Can you test the <2GB file on AIX and let me know if that works? If it doesn't work, then we are probably looking at 2 different bugs here.

vinhdizzo avatar vinhdizzo commented on June 15, 2024

I do believe it is 2 separate bugs. In my previous test, the 700mb file failed (see previous post). The first fail complained about allocation of a vector in R. Then the second fail yielded the same error.

evanmiller avatar evanmiller commented on June 15, 2024

Ok thanks. I've opened a separate issue for the Windows parse error:

#34

In the meantime, let me know if you manage to run the dev version of haven on AIX. I am re-opening this issue, but please close it if the dev version works.

vinhdizzo avatar vinhdizzo commented on June 15, 2024

Installed haven-dev on AIX, and I get the following on the 2.1GB file:

> d1 = read_sas('bar.sas7bdat')
Error: Failed to parse /path/to/bar.sas7bdat: Unable to open file.

I get the following on the 700mb file:

> d1 = read_sas('bar.sas7bdat')
Error: cannot allocate vector of size 78 Kb

So, the AIX error is still there. Note that it didn't seem like the error was immediate like before, so I think it was able to read the file early on and then errored out later. Thanks.

evanmiller avatar evanmiller commented on June 15, 2024

Ok, thanks. The second error looks like it is from the Haven/R side. (No such error message appears in the ReadStat code base -- also 78 Kb is not a very large vector!) I would recommend opening an issue with Haven about this specific error message if you have not already.

I'll keep this issue open to address AIX > 2.1 GB failures. Might be the same bug as Windows, but I'll keep them separate for now. Does the error on the 2.1 GB file happen immediately?

vinhdizzo avatar vinhdizzo commented on June 15, 2024

Yes, the error on the AIX > 2.1GB file happens immediately. I'll post the AIX bug to haven.

evanmiller avatar evanmiller commented on June 15, 2024

Sounds good. Since more people will be able to test on Windows, I'll focus on that first and post updates over on issue #34.

vinhdizzo avatar vinhdizzo commented on June 15, 2024

Understood. I'm fine with Windows support. I'm already following issue #34. Thanks and let me know whenever you need testing.

vinhdizzo avatar vinhdizzo commented on June 15, 2024

When I try reading a > 2GB file created from a little-endian machine, I get the following error:

> d1 = read_sas('bar_little_endian.sas7bdat')
Error: Failed to parse /pah/to/bar_little_endian.sas7bdat: Unable to open file.

This is the same error if the file was generated from a big-endian machine.

However, if the file is < 2GB (say, 700 mb), I get the allocate error. Thus on AIX, I think there are 3 issues at play:

  • 2GB error from big-endian files: #34 (same as Windows)

  • 2GB error from little-endian files: #33 (we can use this current thread as it is specific to AIX)

  • allocate error: #113

evanmiller avatar evanmiller commented on June 15, 2024

Got it -- I hadn't noticed the new error message ("Unable to open file"). Definitely a separate issue.

That error is only generated in one or two places so I'm hoping it will be relatively easy to narrow down.

evanmiller avatar evanmiller commented on June 15, 2024

If I create a patch for you to try, will you be able to test it on AIX? I'd like to print out the error string from the open function, which appears to be failing.

vinhdizzo avatar vinhdizzo commented on June 15, 2024

I have haven-dev as a directory. If you show me the commands to apply I could do so and re-install haven and send you the output.

evanmiller avatar evanmiller commented on June 15, 2024

I have a guess. In the haven-dev directory, try changing line 8 of readstat_io.c from:

 return open(filename, O_RDONLY

to

 return open(filename, O_RDONLY | O_LARGEFILE

Recompile haven and let me know if you still get the same error when you attempt to open the file.
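For context, a sketch of what such an open call might look like as a complete function. O_LARGEFILE is not defined on every platform, so it is guarded here; the function name is hypothetical and this is not the actual contents of readstat_io.c.

```c
#include <fcntl.h>

/* Opens a file read-only, requesting large-file (> 2 GB) support where the
 * platform exposes O_LARGEFILE. Returns a file descriptor, or -1 on error. */
int open_readonly_largefile(const char *filename) {
    int flags = O_RDONLY;
#ifdef O_LARGEFILE
    flags |= O_LARGEFILE;
#endif
    return open(filename, flags);
}
```

Some platforms offer a compile-time switch instead (for example, defining _FILE_OFFSET_BITS=64 with glibc), in which case the flag is applied implicitly.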

vinhdizzo avatar vinhdizzo commented on June 15, 2024

Recompiled haven with your patch, and I get the following on AIX:

> library(haven)
> d1 = read_sas('bar_little_endian.sas7bdat')
Error: Failed to parse bar_little_endian.sas7bdat: Unable to read file.
> d1 = read_sas('bar_big_endian.sas7bdat')
Error: Failed to parse bar_big_endian.sas7bdat: Unable to read file.

evanmiller avatar evanmiller commented on June 15, 2024

Hmm ok. Try applying this patch:

121e50d

You can do it manually, or download this readstat_sas.c file into haven-dev:

https://raw.githubusercontent.com/WizardMac/ReadStat/121e50d11b76813783b2a6d598e6939a858dbf89/src/readstat_sas.c

vinhdizzo avatar vinhdizzo commented on June 15, 2024

Just tried it:

> library(haven)
> d1 = read_sas('bar_little_endian.sas7bdat')
Error: Failed to parse bar_little_endian.sas7bdat: Unable to read file.
> d1 = read_sas('bar_big_endian.sas7bdat')
Error: Failed to parse bar_big_endian.sas7bdat: Unable to read file.

Same.

evanmiller avatar evanmiller commented on June 15, 2024

This is surprising to me. Are you sure it's picking up the modified code?

vinhdizzo avatar vinhdizzo commented on June 15, 2024

Just re-did (deleted .so and .o files in src/ and R CMD INSTALL'd). Here's a sample of the diff:

5d4
< #include <errno.h>
182a182,190
> typedef struct cat_ctx_s {
>     readstat_value_label_handler   value_label_handler;
>     int        u64;
>     void      *user_ctx;
>     int32_t    header_size;
>     int32_t    page_size;
>     int32_t    page_count;
> } cat_ctx_t;
>
747c755
<         char name[4*32+1];
---
>         char name[4*8+1];
759,767d766
<         if ((lsp[12] & 0x80)) { // has long name
<             /* Uncomment to return long name to client code instead of short name
<             retval = readstat_convert(name, sizeof(name), &lsp[116+pad], 32, ctx->converter);
<             if (retval != READSTAT_OK)

evanmiller avatar evanmiller commented on June 15, 2024

Thanks for checking. I thought that haven used the error handler to write to the console, but perhaps I am mistaken.

I've created a special branch of the repository to try to debug this:

https://github.com/WizardMac/ReadStat/tree/debug-aix

Please replace readstat_sas.c with this one and recompile:

https://raw.githubusercontent.com/WizardMac/ReadStat/debug-aix/src/readstat_sas.c

That should give a message "Opening [filename]" and then log the error if the open fails.
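The kind of instrumentation described here could look roughly like the following sketch. The function name and message wording are illustrative assumptions, not the code on the debug-aix branch.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Opens a file read-only, announcing the attempt and, on failure, reporting
 * errno both numerically and as text. Returns the descriptor or -1. */
int open_with_logging(const char *filename) {
    fprintf(stderr, "Opening file: %s\n", filename);
    int fd = open(filename, O_RDONLY);
    if (fd == -1) {
        int saved_errno = errno;  /* fprintf may overwrite errno */
        fprintf(stderr, "Error opening file (%d): %s\n",
                saved_errno, strerror(saved_errno));
        errno = saved_errno;      /* leave errno intact for the caller */
    }
    return fd;
}
```

Printing the numeric errno alongside the strerror text makes it easier to compare results across platforms whose message strings differ.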

vinhdizzo avatar vinhdizzo commented on June 15, 2024

> library(haven)
> d1 = read_sas('bar_little_endian.sas7bdat')
Opening file: bar_little_endian.sas7bdat
Error: Failed to parse /bar_little_endian.sas7bdat: Unable to read file.
> d1 = read_sas('bar_big_endian.sas7bdat')
Opening file: bar_big_endian.sas7bdat
Error: Failed to parse bar_big_endian.sas7bdat: Unable to read file.

evanmiller avatar evanmiller commented on June 15, 2024

Oooooh I see. The error is different now, which I didn't notice before. Previously it was "Unable to open file" and now it is "Unable to read file". Progress!

I've updated that file with this commit:

dbe68c3

Please redownload the readstat_sas.c file and recompile to test.

evanmiller avatar evanmiller commented on June 15, 2024

I've pushed another update to try to address the problem. You'll need these three files:

readstat_io.c
readstat_io.h
readstat_sas.c

vinhdizzo avatar vinhdizzo commented on June 15, 2024

Getting this error during compilation:

readstat_io.c:7:27: error: 'O_RDONLY' undeclared (first use in this function)
     return open(filename, O_RDONLY
                           ^
readstat_io.c:7:27: note: each undeclared identifier is reported only once for each function it appears in
readstat_io.c:11:15: error: 'O_LARGEFILE' undeclared (first use in this function)
             | O_LARGEFILE
               ^
/sas/outmva/opt/lib/R/etc/Makeconf:134: recipe for target 'readstat_io.o' failed
make: *** [readstat_io.o] Error 1
ERROR: compilation failed for package 'haven'

evanmiller avatar evanmiller commented on June 15, 2024

Is it perhaps a copy/paste error? Is this line at the top of the readstat_io.c file?

#include <fcntl.h>

vinhdizzo avatar vinhdizzo commented on June 15, 2024

Yep, sorry.

> library(haven)
> d1 = read_sas('bar_little_endian.sas7bdat')

 *** caught segfault ***
address 4, cause 'invalid permissions'

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

Same with big endian.

evanmiller avatar evanmiller commented on June 15, 2024

Does the segfault happen immediately?

vinhdizzo avatar vinhdizzo commented on June 15, 2024

Within 2 seconds...so not immediately.

evanmiller avatar evanmiller commented on June 15, 2024

Updated, try this: readstat_sas.c

vinhdizzo avatar vinhdizzo commented on June 15, 2024

Same error.

evanmiller avatar evanmiller commented on June 15, 2024

See if this fixes the segfault: readstat_sas.c

vinhdizzo avatar vinhdizzo commented on June 15, 2024

Same error.

evanmiller avatar evanmiller commented on June 15, 2024

Another attempt: readstat_sas.c

vinhdizzo avatar vinhdizzo commented on June 15, 2024

Same error.

evanmiller avatar evanmiller commented on June 15, 2024

To make sure I didn't break anything: Does the segfault occur only with >2.1 GB files?

evanmiller avatar evanmiller commented on June 15, 2024

One more try: readstat_sas.c

vinhdizzo avatar vinhdizzo commented on June 15, 2024

Getting this now:

> library(haven)
> d1 = read_sas('bar_little_endian.sas7bdat')
ReadStat: Error reading file (22): Invalid argument

Error: Failed to parse bar_little_endian.sas7bdat: Unable to read file.
## same for big endian

> d1 = read_sas('foo.sas7bdat')
> d1
  a b c d
1 1 2 3 x
2 1 2 3 y
3 4 5 6 z

Doesn't appear to break 'small' sas data sets.

evanmiller avatar evanmiller commented on June 15, 2024

All right, this looks like progress. Please update:

readstat.h
readstat_error.c
readstat_sas.c

I don't expect it to work, but it will tell me whether the problem is with reading bytes from the file or with seeking to a location within the file. I should be able to narrow it down from there.

vinhdizzo avatar vinhdizzo commented on June 15, 2024

> library(haven)
> d1 = read_sas('bar_little_endian.sas7bdat')
ReadStat: Unable to seek within file (retval = 13): Invalid argument (errno = 22)

Error: Failed to parse bar_little_endian.sas7bdat: Unable to seek within file.

evanmiller avatar evanmiller commented on June 15, 2024

Thanks. Try this: readstat_sas.c

vinhdizzo avatar vinhdizzo commented on June 15, 2024

Segfaulting now.

evanmiller avatar evanmiller commented on June 15, 2024

Can you give me the full output?

vinhdizzo avatar vinhdizzo commented on June 15, 2024

> d1 = read_sas('bar_little_endian.sas7bdat')

 *** caught segfault ***
address 4, cause 'invalid permissions'

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: 2

evanmiller avatar evanmiller commented on June 15, 2024

This update will print out more debugging information: readstat_sas.c

vinhdizzo avatar vinhdizzo commented on June 15, 2024

Indeed:

> library(haven)
> d1 = read_sas('bar_little_endian.sas7bdat')
Seeking to end of file
Seeking to beginning of file
Page count: 27516  Page size: 80896
Seeking to page 0
Parsing page 0 (Pass 1)
Seeking to page 1
Parsing page 1 (Pass 1)
Seeking to page 2
Parsing page 2 (Pass 1)
Seeking to page 3
Parsing page 3 (Pass 1)
Seeking to page 4
Parsing page 4 (Pass 1)
Seeking to page 5
Parsing page 5 (Pass 1)
Seeking to page 6
Parsing page 6 (Pass 1)
Seeking to page 7
Parsing page 7 (Pass 1)
Seeking to page 8
Parsing page 8 (Pass 1)
Seeking to page 9
Parsing page 9 (Pass 1)
Seeking to page 10
Parsing page 10 (Pass 1)
Seeking to page 11
Parsing page 11 (Pass 1)
Seeking to page 12
Parsing page 12 (Pass 1)
Seeking to page 13
Parsing page 13 (Pass 1)
Seeking to page 14
Parsing page 14 (Pass 1)
Seeking to page 15
Parsing page 15 (Pass 1)
Seeking to page 16
Seeking to page 27515
Seeking to starting position
Reading page 0
Parsing page 0 (Pass 2)
Reading page 1
Parsing page 1 (Pass 2)
Reading page 2
Parsing page 2 (Pass 2)
Reading page 3
Parsing page 3 (Pass 2)
Reading page 4
Parsing page 4 (Pass 2)
Reading page 5
Parsing page 5 (Pass 2)
Reading page 6
Parsing page 6 (Pass 2)
Reading page 7
Parsing page 7 (Pass 2)
Reading page 8
Parsing page 8 (Pass 2)
Reading page 9
Parsing page 9 (Pass 2)
Reading page 10
Parsing page 10 (Pass 2)
Reading page 11
Parsing page 11 (Pass 2)
Reading page 12
Parsing page 12 (Pass 2)
Reading page 13
Parsing page 13 (Pass 2)
Reading page 14
Parsing page 14 (Pass 2)
Reading page 15
Parsing page 15 (Pass 2)
Reading page 16
Parsing page 16 (Pass 2)

 *** caught segfault ***
address 4, cause 'invalid permissions'
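The page geometry in this log also shows why 32-bit offsets are a concern for this file: with a page size of 80896 bytes, the last page (27515) starts more than 2^31 bytes into the file. A sketch of the calculation in 64-bit arithmetic follows. The header size is left as a parameter because it isn't printed in the log, and the function name is hypothetical. Whether this is what caused the earlier "Unable to seek within file" error is not established in this thread.

```c
#include <stdint.h>

/* Byte offset of a page, computed in 64-bit arithmetic so the product
 * cannot overflow for any 32-bit inputs. */
int64_t page_offset(int32_t header_size, int32_t page_size, int32_t page_index) {
    return (int64_t)header_size + (int64_t)page_size * (int64_t)page_index;
}
```

Ignoring the header, page_offset(0, 80896, 27515) is 2225853440, which exceeds INT32_MAX (2147483647), while page 16 starts at only 1294336 bytes.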

evanmiller avatar evanmiller commented on June 15, 2024

This should have more debug messages: readstat_sas.c

vinhdizzo avatar vinhdizzo commented on June 15, 2024

Wow, the output was long and I was only able to catch this

evanmiller avatar evanmiller commented on June 15, 2024

Thanks :-).

This will produce more helpful messages: readstat_sas.c

vinhdizzo avatar vinhdizzo commented on June 15, 2024

> d1 = read_sas('bar_little_endian.sas7bdat')
Seeking to end of file
Seeking to beginning of file
Page count: 27516  Page size: 80896
Seeking to page 0
Parsing page 0 (Pass 1)
Seeking to page 1
Parsing page 1 (Pass 1)
Seeking to page 2
Parsing page 2 (Pass 1)
Seeking to page 3
Parsing page 3 (Pass 1)
Seeking to page 4
Parsing page 4 (Pass 1)
Seeking to page 5
Parsing page 5 (Pass 1)
Seeking to page 6
Parsing page 6 (Pass 1)
Seeking to page 7
Parsing page 7 (Pass 1)
Seeking to page 8
Parsing page 8 (Pass 1)
Seeking to page 9
Parsing page 9 (Pass 1)
Seeking to page 10
Parsing page 10 (Pass 1)
Seeking to page 11
Parsing page 11 (Pass 1)
Seeking to page 12
Parsing page 12 (Pass 1)
Seeking to page 13
Parsing page 13 (Pass 1)
Seeking to page 14
Parsing page 14 (Pass 1)
Seeking to page 15
Parsing page 15 (Pass 1)
Seeking to page 16
Seeking to page 27515
Seeking to starting position
Reading page 0
Parsing page 0 (Pass 2)
Reading page 1
Parsing page 1 (Pass 2)
Reading page 2
Parsing page 2 (Pass 2)
Reading page 3
Parsing page 3 (Pass 2)
Reading page 4
Parsing page 4 (Pass 2)
Reading page 5
Parsing page 5 (Pass 2)
Reading page 6
Parsing page 6 (Pass 2)
Reading page 7
Parsing page 7 (Pass 2)
Reading page 8
Parsing page 8 (Pass 2)
Reading page 9
Parsing page 9 (Pass 2)
Reading page 10
Parsing page 10 (Pass 2)
Reading page 11
Parsing page 11 (Pass 2)
Reading page 12
Parsing page 12 (Pass 2)
Reading page 13
Parsing page 13 (Pass 2)
Reading page 14
Parsing page 14 (Pass 2)
Reading page 15
Parsing page 15 (Pass 2)
Reading page 16
Parsing page 16 (Pass 2)
Submitting columns...

 *** caught segfault ***
address 4, cause 'invalid permissions'

evanmiller avatar evanmiller commented on June 15, 2024

We're almost there. Try this: readstat_sas.c

vinhdizzo avatar vinhdizzo commented on June 15, 2024

Here you go.

evanmiller avatar evanmiller commented on June 15, 2024

Would you mind running the command a couple more times to see if it crashes in the same place? I.e. "Calling variable handler on variable 1105..."

It appears that the segfault is occurring inside haven's variable handler. If the segfault is predictable then I will suggest filing an issue with haven; if it's not predictable then there could be some kind of memory corruption on the ReadStat side of things.
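The tracing used to localize the crash follows a simple pattern: log before and after each callback, flushing so the last message survives a segfault. A generic sketch with hypothetical types and names; it is not ReadStat's handler API.

```c
#include <stdio.h>

/* Hypothetical callback type: receives a variable index and a user context,
 * returns 0 on success. */
typedef int (*variable_handler_fn)(int index, void *ctx);

/* Calls the handler once per variable, logging around every call.
 * Stops and returns the handler's code on the first non-zero result. */
int call_handlers_traced(variable_handler_fn handler, int count, void *ctx) {
    for (int i = 0; i < count; i++) {
        fprintf(stderr, "Calling variable handler on variable %d...\n", i);
        fflush(stderr);  /* make sure the message is out before a possible crash */
        int rc = handler(i, ctx);
        if (rc != 0) return rc;
        fprintf(stderr, "Finished calling variable handler.\n");
    }
    return 0;
}

/* Example handler for illustration: counts how many times it was called. */
int counting_handler(int index, void *ctx) {
    (void)index;
    ++*(int *)ctx;
    return 0;
}
```

If the last line printed is a "Calling..." message with no matching "Finished...", the crash happened inside the callback for that index.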

vinhdizzo avatar vinhdizzo commented on June 15, 2024

Tried it 4x and they all error out at 1105. If you think it's an issue on the haven side, could you file it? I wouldn't know how to describe the error. Thanks!

On to the Windows side next?

evanmiller avatar evanmiller commented on June 15, 2024

Can you try with a few different files and see what happens?

vinhdizzo avatar vinhdizzo commented on June 15, 2024

When I tried a big endian file, it errored out after:

Initializing variable...
Calling variable handler on variable 1136...

Repeat yielded the same thing.

A similar 2.3gb file generated:

Initializing variable...
Calling variable handler on variable 1013...

 *** caught segfault ***
address 4, cause 'invalid permissions'

A 9.1gb file (actual data set and not all generated as uniform random numbers) yielded:

> library(haven)
> d1 = read_sas('combined_all.sas7bdat')
Seeking to end of file
Seeking to beginning of file
Page count: 148085  Page size: 65536
Seeking to starting position
Reading page 0
Parsing page 0 (Pass 2)
Reading page 1
Parsing page 1 (Pass 2)
Calling info handler...
Finished calling info handler.
Initializing variable...
Calling variable handler on variable 0...
Finished calling variable handler.
Initializing variable...
Calling variable handler on variable 1...
Finished calling variable handler.
Initializing variable...
Calling variable handler on variable 2...
Finished calling variable handler.
Initializing variable...
Calling variable handler on variable 3...
Finished calling variable handler.
Initializing variable...
Calling variable handler on variable 4...
Finished calling variable handler.
Initializing variable...
Calling variable handler on variable 5...
Finished calling variable handler.
Initializing variable...
Calling variable handler on variable 6...
Finished calling variable handler.
Initializing variable...
Calling variable handler on variable 7...
Finished calling variable handler.
Initializing variable...
Calling variable handler on variable 8...
Finished calling variable handler.
Initializing variable...
Calling variable handler on variable 9...
Finished calling variable handler.
Initializing variable...
Calling variable handler on variable 10...
Error: cannot allocate vector of size 16.7 Mb

The 700mb file yielded:

Finished calling variable handler.
Initializing variable...
Calling variable handler on variable 3041...
Error: cannot allocate vector of size 78 Kb

evanmiller avatar evanmiller commented on June 15, 2024

Ok great. It looks like this issue is fundamentally the same as tidyverse/haven#113. I will add my comments over there and close this issue.

Thanks!
