Compression support is definitely needed! I have to stick to the parso library in Java until this is supported.
> anyone has example files,

That is usually the problem! I try to keep track of them here:
https://github.com/xiaodaigh/sas7bdat-resources
If anyone has example files, please try them with this new code branch:
https://github.com/WizardMac/ReadStat/tree/sas-binary-compression
I'll create a sample tomorrow.
@ofajardo Thanks for testing – I will wait a few days for results from other files and then merge if everything looks okay.
@sclewis23 Is this the correct data?

```csv
"IDnumber","week1","week16","AverageLoss"
"2477",195.000000,163.000000,32.000000
"2431",220.000000,198.000000,22.000000
"2456",173.000000,155.000000,18.000000
"2412",135.000000,116.000000,19.000000
```
Hi, I'm not sure whether this is the right place for a minimal reproducible example, since tidyverse/haven#31 was closed, but the latest version of haven still fails on compressed data. Thanks!
```r
# devtools::install_github('biostatmatt/sas7bdat.parso')
library(haven)
library(sas7bdat.parso)  # R wrapper around the Java parso library

tf1 <- tempfile()
download.file("http://www.census.gov/housing/extract_files/data%20extracts/cpsasec14/hhld.sas7bdat", tf1, mode = "wb")

# fails with haven
x1 <- read_sas(tf1)

# works with parso
y1 <- read.sas7bdat.parso(tf1)
```
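To run the same haven-vs-parso comparison over more files, the two calls can be wrapped in a small helper. This is only a sketch: `compare_readers()` is a hypothetical name that is not part of haven or sas7bdat.parso, and each reader is assumed to be a function that takes a file path and signals an R error when it fails.

```r
# Hypothetical helper (not part of haven or sas7bdat.parso): try each reader
# on the same file and record whether it returned without an error.
compare_readers <- function(path, readers) {
  vapply(names(readers), function(nm) {
    tryCatch({
      readers[[nm]](path)  # result discarded; only success matters here
      TRUE
    }, error = function(e) {
      message(nm, " failed: ", conditionMessage(e))
      FALSE
    })
  }, logical(1))
}

# Usage with the file downloaded above (requires both packages):
# compare_readers(tf1, list(haven = read_sas, parso = read.sas7bdat.parso))
```

The result is a named logical vector, so a batch of test files can be checked in a loop and the failures collected for a bug report.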
@evanmiller is this on the schedule?
No schedule for this. Note that many compression issues were misdiagnosed as binary compression when they were actually bugs in the character decompressor. It seems that 90%+ of compressed files in the wild are character-compressed.
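Before reporting a file as binary-compressed, it may help to check which compression it actually uses. The sketch below is a heuristic under stated assumptions: it scans the raw bytes for the signature strings that open-source readers (for example pandas' SAS reader) associate with RLE/character compression (`SASYZCRL`) and RDC/binary compression (`SASYZCR2`). `sas_compression_type()` is a hypothetical helper, not a ReadStat or haven API; it only reads the first `max_bytes` bytes and could in principle be fooled by a data value containing the same text.

```r
# Heuristic sketch (hypothetical helper, not a ReadStat API): look for the
# compression signatures assumed to appear in a sas7bdat file's metadata.
sas_compression_type <- function(path, max_bytes = 16 * 1024^2) {
  n <- min(file.size(path), max_bytes)
  bytes <- readBin(path, what = "raw", n = n)
  # grepRaw() returns integer(0) when the fixed pattern is absent
  has <- function(sig) length(grepRaw(sig, bytes, fixed = TRUE)) > 0
  if (has("SASYZCR2")) {
    "binary (RDC)"
  } else if (has("SASYZCRL")) {
    "character (RLE)"
  } else {
    "none detected"
  }
}

# Usage: sas_compression_type("hhld.sas7bdat")
```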
Note that the example file provided by @ajdamico was fixed in 69d5751 and 8c0463a.
If a truly binary-compressed SAS dataset is needed, you can use https://github.com/reikoch/testfiles/blob/master/binary.sas7bdat. haven 1.0 fails on it with "ReadStat: Error parsing page 0, bytes 8192-16383".
Is this encoding error related to compression?
Unsupported character set code: 204.
tidyverse/haven#482
@sclewis23 No - the error is related to the file's character encoding.
If you know which encoding was used to create the file, I can try to add support.
@evanmiller The encoding is set to "any":
outencoding=any
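Until a given character set code is supported, one possible workaround is to override the declared encoding when reading; haven's `read_sas()` accepts an `encoding` argument for this. The wrapper below is a hypothetical sketch: it assumes the error text contains "Unsupported character set", as in the report above, and `fallback = "latin1"` is only a placeholder. It must match the session encoding the file was actually written in, since `outencoding=any` asks SAS not to transcode the data.

```r
# Hypothetical helper: retry a read with an explicit encoding when the file's
# declared character set is not supported. `reader` is any function taking
# (path, encoding = ...), e.g. haven::read_sas. "latin1" is an assumed default.
read_with_encoding_fallback <- function(path, reader, fallback = "latin1") {
  tryCatch(
    reader(path, encoding = NULL),  # NULL: use the encoding declared in the file
    error = function(e) {
      if (grepl("Unsupported character set", conditionMessage(e), fixed = TRUE)) {
        message("Declared encoding not supported; retrying with ", fallback)
        reader(path, encoding = fallback)
      } else {
        stop(e)  # unrelated errors are re-raised unchanged
      }
    }
  )
}

# Usage: read_with_encoding_fallback("weight2.sas7bdat", haven::read_sas)
```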
@evanmiller Here is a sample SAS program; it creates one dataset with UTF-8 encoding and one with "any":

```sas
data Weight2;
  input IDnumber $ week1 week16;
  AverageLoss = week1 - week16;
  datalines;
2477 195 163
2431 220 198
2456 173 155
2412 135 116
;

libname outlib '~' outencoding='UTF-8';
data outlib.Weight_utf8;
  set Weight2;
run;

libname out_any '~' outencoding='any';
data out_any.Weight2;
  set Weight2;
run;
```
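To check a decoded file against this program, the expected contents can be rebuilt in R. This is a sketch: `expected_weight2()` and `matches_weight2()` are hypothetical helpers, the values are taken from the `datalines` above, and attributes such as SAS formats and labels are ignored in the comparison.

```r
# Expected contents of Weight2, rebuilt from the DATA step above.
expected_weight2 <- function() {
  df <- data.frame(
    IDnumber = c("2477", "2431", "2456", "2412"),  # character ($) variable
    week1    = c(195, 220, 173, 135),
    week16   = c(163, 198, 155, 116),
    stringsAsFactors = FALSE
  )
  df$AverageLoss <- df$week1 - df$week16  # same formula as the SAS code
  df
}

# Compare a data frame read from disk (e.g. by haven::read_sas) with the
# expected values; column names must match, attributes are ignored.
matches_weight2 <- function(x) {
  want <- expected_weight2()
  x <- as.data.frame(x)
  identical(names(x), names(want)) &&
    isTRUE(all.equal(x, want, check.attributes = FALSE))
}

# Usage: matches_weight2(haven::read_sas("weight2.sas7bdat"))
```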
@evanmiller I can send example files if that would help.
Tested OK in pyreadstat with the attached sample file, which I generated in SAS like this:

```sas
data SAMPLES.sample_bincompressed(compress=binary);
  set SAMPLES.sample;
run;
```

The file is stored permanently in the pyreadstat repo at test_data/basic/sample_bincompressed.sas7bdat (for now in the sasbin_dev branch).
sample_bincompressed.sas7bdat.zip
Here is a very small sample file with binary compression (4 rows).
Looks good on my two testfiles dates_binary.sas7bdat and dates_longname_binary.sas7bdat in https://github.com/reikoch/testfiles - congratulations!
@reikoch Thanks for letting me know. The reports are all positive, so I'll get this merged into dev later today.
Merged into master and included in 1.1.4 - closing