Comments (22)

xiaodaigh commented on June 15, 2024

Compression support is definitely needed! I have to stick to the parso library in Java until this is supported.

from readstat.

xiaodaigh commented on June 15, 2024

"anyone has example files,"

That is usually the problem!

I try to keep track of them here:

https://github.com/xiaodaigh/sas7bdat-resources

sclewis23 commented on June 15, 2024

If anyone has example files, please try them with this new code branch:

https://github.com/WizardMac/ReadStat/tree/sas-binary-compression

I'll create a sample tomorrow.

evanmiller commented on June 15, 2024

@ofajardo Thanks for testing – I will wait a few days for results from other files and then merge if everything looks okay.

evanmiller commented on June 15, 2024

@sclewis23 Is this the correct data?

"IDnumber","week1","week16","AverageLoss"
"2477",195.000000,163.000000,32.000000
"2431",220.000000,198.000000,22.000000
"2456",173.000000,155.000000,18.000000
"2412",135.000000,116.000000,19.000000
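
One way to check this output against the sample SAS program posted elsewhere in this thread, which computes AverageLoss=week1-week16, is to recompute the difference for each row. The sketch below is illustrative Python and not part of ReadStat; the `rows_consistent` helper name is made up for this example.

```python
import csv
import io

# CSV text copied from the comment above.
CSV_TEXT = '''"IDnumber","week1","week16","AverageLoss"
"2477",195.000000,163.000000,32.000000
"2431",220.000000,198.000000,22.000000
"2456",173.000000,155.000000,18.000000
"2412",135.000000,116.000000,19.000000
'''


def rows_consistent(csv_text):
    """Return True if AverageLoss == week1 - week16 on every row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # An empty table is treated as inconsistent rather than trivially OK.
    return bool(rows) and all(
        float(r["AverageLoss"]) == float(r["week1"]) - float(r["week16"])
        for r in rows
    )


print(rows_consistent(CSV_TEXT))
```

All four rows satisfy the relation, so the decoded values at least agree with how the sample dataset was defined.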

ajdamico commented on June 15, 2024

Hi, I'm not sure whether this is the right place for a minimal reproducible example, since tidyverse/haven#31 was closed. The latest version of haven still fails on compressed data. Thanks.

# devtools::install_github('biostatmatt/sas7bdat.parso')
library(haven)
library(sas7bdat.parso)


tf1 <- tempfile()

download.file( "http://www.census.gov/housing/extract_files/data%20extracts/cpsasec14/hhld.sas7bdat" , tf1 , mode = 'wb' )

# fails
x1 <- read_sas( tf1 )

# works
y1 <- read.sas7bdat.parso( tf1 )

hadley commented on June 15, 2024

@evanmiller is this on the schedule?

evanmiller commented on June 15, 2024

No schedule for this. Note that many compression issues were misdiagnosed as binary compression when they were actually bugs in the character decompressor. It seems that 90%+ of compressed files in the wild are character compressed.

evanmiller commented on June 15, 2024

Note that the example file provided by @ajdamico was fixed in 69d5751 and 8c0463a.

reikoch commented on June 15, 2024

If a truly binary-compressed SAS dataset is needed, you may use https://github.com/reikoch/testfiles/blob/master/binary.sas7bdat (haven 1.0 fails on it with "ReadStat: Error parsing page 0, bytes 8192-16383").
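
The byte range in that error message is consistent with an 8192-byte file header followed by 8192-byte pages, so that page 0 spans bytes 8192-16383. Both sizes are inferred here from the message alone, not read from the file, and the helper name is hypothetical. A minimal sketch of the arithmetic in Python:

```python
def page_byte_range(page_index, header_size=8192, page_size=8192):
    """Return the inclusive (first, last) byte offsets of a page.

    header_size and page_size are assumptions inferred from the error
    message quoted above; other files may use different sizes.
    """
    first = header_size + page_index * page_size
    last = first + page_size - 1
    return first, last


print(page_byte_range(0))  # (8192, 16383)
```

Under those assumptions, a failure on page 0 means the reader got through the header and stopped on the very first data page.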

sclewis23 commented on June 15, 2024

Is this encoding error related to compression?
Unsupported character set code: 204.
tidyverse/haven#482

evanmiller commented on June 15, 2024

@sclewis23 No - the error is related to the file's character encoding.

If you know which encoding was used to create the file, I can try to add support.

sclewis23 commented on June 15, 2024

@evanmiller - the encoding is set to "any"
outencoding=any

sclewis23 commented on June 15, 2024

@evanmiller
Here is a sample SAS program that creates one file with UTF-8 encoding and one with "any":

#SAS CODE:
# Data Weight2; 
# input IDnumber $ week1 week16; 
# AverageLoss=week1-week16; 
# datalines; 
# 2477 195 163
# 2431 220 198
# 2456 173 155
# 2412 135 116
# ;
# libname outlib '~' outencoding='UTF-8';
# data outlib.Weight_utf8;
# Set Weight2;
# Run;
# libname out_any '~' outencoding='any';
# data out_any.Weight2;
# Set Weight2;
# Run;

sclewis23 commented on June 15, 2024

@evanmiller
I can send example files if that helps?

evanmiller commented on June 15, 2024

If anyone has example files, please try them with this new code branch:

https://github.com/WizardMac/ReadStat/tree/sas-binary-compression

ofajardo commented on June 15, 2024

Tested OK in pyreadstat with the attached sample file, which I generated in SAS like this:

data SAMPLES.sample_bincompressed(compress=binary);
set SAMPLES.sample;
run;

The file is stored permanently in the pyreadstat repo at test_data/basic/sample_bincompressed.sas7bdat, for now in the sasbin_dev branch.

sample_bincompressed.sas7bdat.zip
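
For anyone who wants to repeat this test, a minimal Python sketch using pyreadstat's `read_sas7bdat` follows. The local path and the `summarize_sas_file` helper are assumptions made for illustration. The attachment has to be unzipped first, and the function returns None instead of failing when the file is absent.

```python
import os


def summarize_sas_file(path):
    """Read a sas7bdat file and return (row count, column names).

    Returns None if the file does not exist, so the sketch can be run
    before the sample file has been downloaded.
    """
    if not os.path.exists(path):
        return None
    import pyreadstat  # imported lazily; needs `pip install pyreadstat`

    # read_sas7bdat returns a pandas DataFrame and a metadata object.
    df, meta = pyreadstat.read_sas7bdat(path)
    return len(df), list(df.columns)


# Hypothetical local path to the unzipped attachment.
print(summarize_sas_file("sample_bincompressed.sas7bdat"))
```

Comparing the result with the same call on the uncompressed source file is a simple way to confirm that binary decompression produced the expected shape.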

sclewis23 commented on June 15, 2024

Here is a very small sample file with binary compression (4 rows).

weigth2.zip

sclewis23 commented on June 15, 2024

Looks good:
image

reikoch commented on June 15, 2024

Looks good on my two testfiles dates_binary.sas7bdat and dates_longname_binary.sas7bdat in https://github.com/reikoch/testfiles - congratulations!

evanmiller commented on June 15, 2024

@reikoch Thanks for letting me know. The reports are all positive so I'll get this merged into dev later today.

evanmiller commented on June 15, 2024

Merged into master and included in 1.1.4 - closing
