Comments (21)
If you can point me to the sas7bcat file I'll take a look.
from haven.
Unfortunately, all of the SAS files I work with are HIPAA-protected data located on our servers, so I'm not able to do that. To have the data transfer over from SAS with its value labels, do I need to reference the catalog file, or am I doing something wrong? When I reference only the sas7bdat file, none of the labels are included.
I apologize for not having a reproducible example; I can close the issue since I am unable to provide one if that would be appropriate (it's why I closed it originally).
from haven.
Thanks for the update. @hadley might be able to shed light on the issue of importing into a data frame that already has value labels applied.
from haven.
Yes, you need the catalog file if you want labels. Could you try creating an example with dummy data?
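
For readers following along, here is a minimal sketch of passing the catalog next to the data file. The helper name `read_sas_with_labels` and the file names are hypothetical; the only haven call used is `read_sas()` with its `catalog_file` argument. It assumes haven is installed and is not a definitive recipe.

```r
# Sketch only: the helper name and all file names are hypothetical placeholders.
read_sas_with_labels <- function(data_path, catalog_path) {
  # Fail early with a clear message instead of a parser error.
  for (p in c(data_path, catalog_path)) {
    if (!file.exists(p)) stop("File not found: ", p)
  }
  # read_sas() accepts the catalog through its catalog_file argument.
  haven::read_sas(data_path, catalog_file = catalog_path)
}

# Usage (hypothetical paths):
# df <- read_sas_with_labels("mydata.sas7bdat", "formats.sas7bcat")
# attr(df$gender, "labels")  # value labels, if the catalog supplied any
```

If the labels attribute comes back `NULL`, the catalog was read but its formats were not matched to that column.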
from haven.
I'm having the same problem. When I reference the catalog file using read_sas, I get:
"Error: Failed to parse K:...\formats.sas7bcat: Invalid file, or file has unsupported features."
My data contains personal health information and I don't have SAS to create dummy data.
I can send the catalog file if that would be any benefit.
from haven.
@cullenjd Yes, a sample catalog file that demonstrates the problem would help.
from haven.
Here's a link to the sas7bcat file:
https://www.dropbox.com/s/o6xwioy72xs4g3i/formats.sas7bcat?dl=0
from haven.
Thanks @cullenjd. I believe I've tracked down the issue, but without an accompanying data set I cannot fully test. I've committed code to ReadStat here:
You can test this code with the instructions here:
https://github.com/hadley/haven#updating-readstat
Let me know if it correctly reads and applies your value labels. @hadley please re-open this issue until @cullenjd confirms it is resolved.
For posterity: It looks like catalog files can contain both an 8-character short name and a 32-character long name for each variable format (e.g. CLAVICLE and CLAVICLE_FORMAT). The presence of the long name appears to be determined by a bit-field. Without more sample data files I'm not sure whether to return the short name or the long name to the client code, i.e. whether the data file refers to formats using their short name or long name.
from haven.
I can't get the code to update ReadStat to work. I think it might be something to do with the path. I get the following error:
file.copy(src, "src/", overwrite = TRUE)
Error in file.copy(src, "src/", overwrite = TRUE) :
more 'from' files than 'to' files
from haven.
Should be fixed by PR #114.
from haven.
I've tried updating ReadStat using the code given on Hadley's Haven page but still get the same error. Is there a way to explicitly specify the 'to' directory? I'm not sure which src directory is referenced with the "src/" parameter.
from haven.
Hi @cullenjd -- for now, just try this:
devtools::install_github("evanmiller/haven")
That will install my fork of haven, which has the latest ReadStat code.
from haven.
If that doesn't fix it, I am working on another bugfix here: WizardMac/ReadStat#38
from haven.
Hi @evanmiller - your fork installed fine and I was able to pass the name of the catalog without error; however, no value labels from the catalog file were applied. For example:
table(mydata$gender)
returned 0 and 1 not Male and Female as specified in the catalog file.
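
As a stopgap while the catalog labels are not being applied, the codes can be labelled by hand in base R. This is only a sketch: the sample vector is a hypothetical stand-in for `mydata$gender`, and the mapping of 0 and 1 to "Male" and "Female" is an assumption for illustration, since the thread does not say which code is which. Once labels are read correctly, `haven::as_factor()` should make this step unnecessary.

```r
# Hypothetical stand-in for mydata$gender as it came back without labels.
gender <- c(0, 1, 1, 0, 1)

# ASSUMPTION: the code-to-label mapping below is invented for illustration;
# check the real mapping in your format catalog before using it.
gender_labelled <- factor(gender, levels = c(0, 1), labels = c("Male", "Female"))

# The table now shows label names instead of raw codes.
table(gender_labelled)
```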
from haven.
Hi @cullenjd,
Please try the latest hadley/haven master, which includes all of the latest fixes. If you still experience the same problem, please file a new issue against haven.
from haven.
@cullenjd I suspect the issue has to do with short vs long names of formats mentioned earlier (e.g. CLAVICLE vs CLAVICLE_FORMAT). Right now I am returning the short format to haven but I'm not sure if that's correct. Let's continue the discussion over here: #121
from haven.
I'm still having this issue with
http://www.cdc.gov/brfss/annual_data/2014/files/formats14.sas7bcat
I can read in the data from the corresponding .sas7bdat file, but reading the catalog fails with:
Error: Failed to parse C:\Users\WisemBH\AppData\Local\Temp\Rtmpi2Z4F3\file14f8582e3cc6: Invalid file, or file has unsupported features.
from haven.
Hi @BenWiseman, are you using the development version of haven?
devtools::install_github("hadley/haven")
from haven.
@evanmiller
I used devtools to install it but still encounter the same problem:
Error in df_parse_sas_file(spec_data, spec_cat, encoding = encoding, cols_only = cols_only) : Failed to parse /home/s/Documents/health/2005.sas7bdat: File has an unsupported character set.
from haven.
@2YC Can you share the sas7bdat file that you're working with?
from haven.
Hi @evanmiller, I'm still encountering the same problem.
It's coming from a publicly available dataset from the PISA study.
Data can be downloaded from here: http://vs-web-fs-1.oecd.org/pisa/PUF_SAS_COMBINED_CMB_STU_QQQ.zip
The download can take some time (about 5-10 minutes), as the dataset is roughly 2 GB.
library(haven)
pisa_2015 <- read_sas("cy6_ms_cmb_stu_qqq.sas7bdat", "CY6_MS_CMB_STU_QQQ.sas7bdat.format.sas")
Error in df_parse_sas_file(spec_data, spec_cat, encoding = encoding, cols_only = cols_only) :
Failed to parse /dir//CY6_MS_CMB_STU_QQQ.sas7bdat.format.sas: Invalid file, or file has unsupported features.
I also tried changing the extension from sas7bdat.format.sas to sas7bcat and the result is the same. Do note that:
pisa_2015 <- read_sas("cy6_ms_cmb_stu_qqq.sas7bdat")
reads the data correctly.
Any ideas?
from haven.