Comments (8)
Missing strings are represented as zero-length strings, but the treatment of missing numeric values is currently inconsistent. The DTA parser returns NULL, while the others return a NaN. The SAS parser returns the representation stored in the file (a NaN, but possibly a tagged NaN?), while the others return a system NaN.
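To make the "system NaN" versus "tagged NaN" distinction concrete, here is a minimal sketch in plain C. It assumes IEEE 754 binary64 doubles; the helper names `make_tagged_nan` and `nan_tag` are hypothetical and not part of ReadStat, and the choice of the low mantissa byte for the tag is only an illustration, not the layout any file format actually uses.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build a quiet NaN whose low mantissa byte carries a tag.
 * Assumes IEEE 754 binary64 doubles. */
static double make_tagged_nan(uint8_t tag) {
    uint64_t bits = UINT64_C(0x7FF8000000000000) | (uint64_t)tag;
    double value;
    memcpy(&value, &bits, sizeof value);  /* copy the bit pattern without conversion */
    return value;
}

/* Hypothetical helper: read the low byte of a double's bit pattern back out. */
static uint8_t nan_tag(double value) {
    uint64_t bits;
    memcpy(&bits, &value, sizeof bits);
    return (uint8_t)(bits & 0xFF);
}
```

Both a plain NaN and a tagged one satisfy `isnan()`, so a client that only calls `isnan()` cannot tell them apart; recovering the tag requires inspecting the bits.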
from readstat.
What about integers? It also seems a bit dangerous to encode missing strings as zero-length strings.
Only the DTA parser returns integers, which is why it uses the NULL convention rather than NaNs.
I think using NULL in all cases might make sense. READSTAT_TYPE_MISSING loses information about the underlying type, which puts more of a burden on the client to keep track of the column types.
Using NULL seems reasonable to me. Another option would be to provide readstat_value_is_missing() or similar so the representation could be changed in the future.
In the meantime we could do both. A new readstat_value_t type could be a possibly NULL pointer, which we could change to a struct later.
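A rough sketch of that idea, using hypothetical `demo_` names rather than any actual ReadStat declarations: the value type is a possibly NULL pointer for now, and clients go through an accessor so the representation can change later without touching their code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: for now a value is just a possibly-NULL pointer to the data.
 * This typedef could later become a struct without changing callers that
 * only use the accessors below. */
typedef const void *demo_value_t;

/* Hypothetical accessor: hides how "missing" is represented. */
static bool demo_value_is_missing(demo_value_t value) {
    return value == NULL;
}

/* Hypothetical accessor: read a double, or return a fallback when missing. */
static double demo_value_as_double(demo_value_t value, double fallback) {
    return demo_value_is_missing(value) ? fallback : *(const double *)value;
}
```

A client would then write `if (demo_value_is_missing(v)) ...` instead of comparing against NULL or calling `isnan()` directly.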
Ok, the callbacks now all receive NULL for missing numeric values:
Strings are trickier since I believe none of these file formats distinguish between zero-length strings and missing strings. (RData might be an exception.)
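For illustration, a value handler written against this convention might look like the sketch below. Every name here (`demo_type_t`, `demo_counts_t`, `demo_handle_value`) is hypothetical and the signature is not ReadStat's actual callback signature; the point is only that the column type is still known when the value pointer is NULL, and that an empty string is counted separately rather than treated as missing.

```c
#include <stddef.h>

/* Hypothetical column types; the type stays known even when the value is NULL. */
typedef enum { DEMO_TYPE_STRING, DEMO_TYPE_INT32, DEMO_TYPE_DOUBLE } demo_type_t;

/* Hypothetical tallies kept by the client. */
typedef struct {
    int missing_numeric;
    int empty_strings;
    int present;
} demo_counts_t;

/* Hypothetical handler: NULL means a missing numeric value; strings arrive as
 * possibly empty C strings, since empty and missing cannot be told apart. */
static void demo_handle_value(demo_type_t type, const void *value,
                              demo_counts_t *counts) {
    if (type == DEMO_TYPE_STRING) {
        const char *s = value;
        if (s == NULL || s[0] == '\0') {
            counts->empty_strings++;   /* empty or missing: indistinguishable */
        } else {
            counts->present++;
        }
    } else if (value == NULL) {
        counts->missing_numeric++;     /* missing integer or double */
    } else {
        counts->present++;
    }
}
```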
Oh interesting. In that case, leaving as empty strings seems reasonable to me.
Closing