extract2ddi's People

Contributors

alkondrashov, bbarker, csimmer, larsvilhuber, spuddybike

extract2ddi's Issues

RecommendedDataType should be Numeric not type

In the Variable Representation, RecommendedDataType should be "Numeric" for a Code Representation, not the literal string "type":
<ddi:VariableRepresentation>
<r:CodeRepresentation>
<r:RecommendedDataType>type</r:RecommendedDataType>
<r:CodeListReference>
<r:URN typeOfIdentifier="Canonical">urn:ddi:uk.closer:72c7a745-5bad-42e9-b95d-ddfde33587e2:1</r:URN>
<r:TypeOfObject>CodeList</r:TypeOfObject>
</r:CodeListReference>
</r:CodeRepresentation>
</ddi:VariableRepresentation>

DataItem attribute values missing from Stata output

For DataItem, the attribute values are not correctly populated:

            <ddi1:DataItem>
                <r:VariableReference>
                    <r:URN typeOfIdentifier="Canonical">urn:ddi:uk.closer:e9b94a1b-797c-4f89-bf1c-09bf25492359:1</r:URN>
                    <r:TypeOfObject>Variable</r:TypeOfObject>
                </r:VariableReference>
                <r:ProprietaryInfo>
                    <r:ProprietaryProperty>
                        <r:AttributeKey>Width</r:AttributeKey>
                        <r:AttributeValue>???</r:AttributeValue>
                    </r:ProprietaryProperty>
                    <r:ProprietaryProperty>
                        <r:AttributeKey>Decimals</r:AttributeKey>
                        <r:AttributeValue>???</r:AttributeValue>
                    </r:ProprietaryProperty>
                    <r:ProprietaryProperty>
                        <r:AttributeKey>WriteFormatType</r:AttributeKey>
                        <r:AttributeValue>???</r:AttributeValue>
                    </r:ProprietaryProperty>
                </r:ProprietaryInfo>
            </ddi1:DataItem>
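
A check like the following could flag unpopulated values before a file is published. This is a minimal sketch, not part of Extract2DDI: the class name `PlaceholderAttributeCheck` is hypothetical, and it assumes that "???" (or a blank) marks an unpopulated value. It needs the complete output file as input, because the fragment above does not carry the namespace declarations for the `r:` and `ddi1:` prefixes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of Extract2DDI.
final class PlaceholderAttributeCheck {

    /** Returns the AttributeKey of every ProprietaryProperty whose value is blank or "???". */
    static List<String> unpopulatedKeys(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // match on local names, whatever the prefix
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
        List<String> keys = new ArrayList<>();
        NodeList properties = doc.getElementsByTagNameNS("*", "ProprietaryProperty");
        for (int i = 0; i < properties.getLength(); i++) {
            Element property = (Element) properties.item(i);
            String value = childText(property, "AttributeValue").trim();
            // Assumption: "???" is the placeholder written when no value was extracted.
            if (value.isEmpty() || value.equals("???")) {
                keys.add(childText(property, "AttributeKey").trim());
            }
        }
        return keys;
    }

    /** Text of the first descendant with the given local name, or "" if there is none. */
    private static String childText(Element parent, String localName) {
        NodeList nodes = parent.getElementsByTagNameNS("*", localName);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent();
    }

    public static void main(String[] args) {
        String xml = "<r:ProprietaryInfo xmlns:r=\"urn:example:r\">"
                + "<r:ProprietaryProperty><r:AttributeKey>Width</r:AttributeKey>"
                + "<r:AttributeValue>???</r:AttributeValue></r:ProprietaryProperty>"
                + "</r:ProprietaryInfo>";
        System.out.println(unpopulatedKeys(xml));
    }
}
```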

Update documentation

Update documentation to show all options

  • Codebook output options
  • DDI-L 3.2 output options
  • DDI-L 3.3 output options

Errors in summary stats in format 3.3Fragment

CodeValue should report valid and invalid counts; both currently show 0:

        <TotalResponses>4</TotalResponses>
        <SummaryStatistic>
            <TypeOfSummaryStatistic>ValidCases</TypeOfSummaryStatistic>
            <Statistic>0</Statistic>
        </SummaryStatistic>
        <SummaryStatistic>
            <TypeOfSummaryStatistic>InvalidCases</TypeOfSummaryStatistic>
            <Statistic>0</Statistic>
        </SummaryStatistic>
    </VariableStatistics>
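
For illustration, the two counts could be derived along these lines. This is a sketch under stated assumptions, not the project's code: the class `CaseCounts` is hypothetical, and it assumes that a case is invalid when its value is absent (system-missing) or matches one of the variable's declared missing codes. Under that definition the two counts add up to the number of cases examined.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of Extract2DDI.
final class CaseCounts {
    final int valid;
    final int invalid;

    private CaseCounts(int valid, int invalid) {
        this.valid = valid;
        this.invalid = invalid;
    }

    /**
     * Counts valid and invalid cases for one variable.
     * Assumption: null means system-missing; missingCodes holds user-defined missing values.
     */
    static CaseCounts of(List<Double> values, Set<Double> missingCodes) {
        int valid = 0;
        int invalid = 0;
        for (Double value : values) {
            if (value == null || missingCodes.contains(value)) {
                invalid++;
            } else {
                valid++;
            }
        }
        return new CaseCounts(valid, invalid);
    }

    public static void main(String[] args) {
        // Four cases, one system-missing and one coded -9 (declared missing here).
        CaseCounts counts = of(Arrays.asList(1.0, 2.0, null, -9.0), Collections.singleton(-9.0));
        System.out.println(counts.valid + " valid, " + counts.invalid + " invalid");
    }
}
```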

Add optional citation to 3.2 Format

Add an option to the config file that inserts a blank citation at each of these points:

  • After DDIInstance / URN
  • After ResourcePackage / URN
  • After PhysicalInstance / URN

<r:Citation>
<r:Title><r:String xml:lang="en-GB"></r:String></r:Title>
<r:AlternateTitle><r:String xml:lang="en-GB"></r:String></r:AlternateTitle>
</r:Citation>

Add software tagging for SPSS and Stata 3.3

  <GrossFileStructure isUniversallyUnique="true">
    <r:URN>urn:ddi:uk.genscotland:e3fc49ff-d893-434e-a4c6-24bc6b7c3934:1</r:URN>
    <r:Agency>uk.genscotland</r:Agency>
    <r:ID>e3fc49ff-d893-434e-a4c6-24bc6b7c3934</r:ID>
    <r:Version>1</r:Version>
    <CaseQuantity>5</CaseQuantity>
  </GrossFileStructure>

Add CreationSoftware element as below

Duplicate VariableStatistics IDs in 3.3 output

For all variables, the corresponding VariableStatistics element has the same ID as the variable. In the attached example, variable 5a99caa4-6044-41d1-b7cc-cc94f1ae0e9c has a corresponding VariableStatistics element with the same ID. This element then has a VariableReference with the same, correct ID.
lfsp_jm15_eul_11.sav.xml.txt

Duplicate variable CodeListReference IDs in 3.3 output

For all variables, the CodeListReference under VariableRepresentation -> CodeRepresentation carries the variable's own ID, so the CodeList it points to does not exist in the output file. The corresponding CodeList IDs are unique but never referenced. In the example output, ACTHR (ID 5a99caa4-6044-41d1-b7cc-cc94f1ae0e9c) references its CodeList using the variable's ID rather than the ID of the corresponding CodeList (presumably 25dac1d6-6cf8-4602-b2a8-9f169ffed68f).
lfsp_jm15_eul_11.sav.xml.txt
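
Both ID problems reported for the 3.3 output could be caught by a consistency check run over the generated file. The sketch below is hypothetical (the class `IdChecks` and its methods are not part of Extract2DDI) and assumes the IDs have already been collected from the XML: `duplicates` finds IDs shared by more than one identifiable item, and `dangling` finds referenced IDs that no item of the expected type defines.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of Extract2DDI.
final class IdChecks {

    /** IDs that occur more than once among the identifiable items (variables, statistics, code lists). */
    static Set<String> duplicates(Collection<String> itemIds) {
        Set<String> seen = new HashSet<>();
        Set<String> repeated = new TreeSet<>();
        for (String id : itemIds) {
            if (!seen.add(id)) { // add() returns false when the ID was already present
                repeated.add(id);
            }
        }
        return repeated;
    }

    /** Referenced IDs that are not defined by any item of the expected type. */
    static Set<String> dangling(Collection<String> referencedIds, Collection<String> definedIds) {
        Set<String> unresolved = new TreeSet<>(referencedIds);
        unresolved.removeAll(definedIds);
        return unresolved;
    }

    public static void main(String[] args) {
        String variableId = "5a99caa4-6044-41d1-b7cc-cc94f1ae0e9c";
        String codeListId = "25dac1d6-6cf8-4602-b2a8-9f169ffed68f";
        // The variable and its VariableStatistics share one ID.
        System.out.println(duplicates(Arrays.asList(variableId, variableId, codeListId)));
        // The CodeListReference carries the variable's ID, which no CodeList defines.
        System.out.println(dangling(Arrays.asList(variableId), Arrays.asList(codeListId)));
    }
}
```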

Stata fails with java.lang.OutOfMemoryError: Java heap space

its-meta:extract jon$ java -jar Extract2DDI.jar -f test-file-data-types.dta --format 3.2 --config format32-stata
2022-11-15 08:42:48,454 [main] INFO edu.cornell.ncrn.ced2ar.stata.StataReaderFactory - Stata Data file test-file-data-types.dta is not a Format 115.
2022-11-15 08:42:48,455 [main] INFO edu.cornell.ncrn.ced2ar.stata.StataReaderFactory - Stata Data file test-file-data-types.dta is not a Format 114.
2022-11-15 08:42:48,455 [main] INFO edu.cornell.ncrn.ced2ar.stata.StataReaderFactory - Stata Data file test-file-data-types.dta is not a Format 113.
2022-11-15 08:42:48,457 [main] INFO edu.cornell.ncrn.ced2ar.stata.StataReaderFactory - Stata Data file test-file-data-types.dta is not a Format 117
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at edu.cornell.ncrn.ced2ar.stata.impl.DtaReader.readValueLabels(DtaReader.java:419)
at edu.cornell.ncrn.ced2ar.stata.impl.Dta118Reader.readVariables(Dta118Reader.java:222)
at edu.cornell.ncrn.ced2ar.stata.impl.Dta117Reader.<init>(Dta117Reader.java:65)
at edu.cornell.ncrn.ced2ar.stata.impl.Dta118Reader.<init>(Dta118Reader.java:49)
at edu.cornell.ncrn.ced2ar.stata.StataReaderFactory.getStataReader(StataReaderFactory.java:42)
at edu.cornell.ncrn.ced2ar.ddigen.csv.StataCsvGenerator.generateVariablesCsv(StataCsvGenerator.java:40)
at edu.cornell.ncrn.ced2ar.ddigen.DdiLifecycleGenerator.generateVariablesCsv(DdiLifecycleGenerator.java:55)
at edu.cornell.ncrn.ced2ar.ddigen.GenerateDDI32.generateDDI(GenerateDDI32.java:34)
at edu.cornell.ncrn.ced2ar.ddigen.Main.main(Main.java:132)
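
The trace shows the heap running out inside DtaReader.readValueLabels; the report does not establish why. One pattern that can produce this symptom in binary readers generally is allocating a buffer from a length field that was misread. The sketch below assumes that scenario and uses a hypothetical `BoundedRead` helper, not the project's code: it checks the declared length against the bytes actually remaining before allocating. Raising the heap with the standard JVM `-Xmx` option is another thing to try, though it would not help if a length is being misread.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; not part of Extract2DDI.
final class BoundedRead {

    /**
     * Reads a block whose size comes from a length field in the file.
     * Fails fast with a descriptive message instead of attempting a huge allocation.
     */
    static byte[] readBlock(ByteBuffer buffer, int declaredLength) {
        if (declaredLength < 0 || declaredLength > buffer.remaining()) {
            throw new IllegalStateException("Declared length " + declaredLength
                    + " does not fit the " + buffer.remaining()
                    + " bytes remaining at offset " + buffer.position());
        }
        byte[] block = new byte[declaredLength];
        buffer.get(block); // advances the buffer position by declaredLength
        return block;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});
        System.out.println(readBlock(buffer, 2).length);
        try {
            readBlock(buffer, 1_000_000_000); // implausible length for a 4-byte input
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```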

Debug output should write out column

If a column cannot be processed, the debug output should identify which column caused the problem,
e.g.
2022-11-15 08:42:48,454 [main] INFO edu.cornell.ncrn.ced2ar.stata.StataReaderFactory - Stata Data file test-file-data-types.dta is not a Format 115.
2022-11-15 08:42:48,455 [main] INFO edu.cornell.ncrn.ced2ar.stata.StataReaderFactory - Stata Data file test-file-data-types.dta is not a Format 114.
2022-11-15 08:42:48,455 [main] INFO edu.cornell.ncrn.ced2ar.stata.StataReaderFactory - Stata Data file test-file-data-types.dta is not a Format 113.
2022-11-15 08:42:48,457 [main] INFO edu.cornell.ncrn.ced2ar.stata.StataReaderFactory - Stata Data file test-file-data-types.dta is not a Format 117
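
As an illustration of the requested behaviour, a reader could catch the failure per column and put the column's position and name into the message. This is a minimal sketch with a hypothetical `ColumnDiagnostics` class; numeric parsing stands in for whatever processing the real reader performs on each column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of Extract2DDI.
final class ColumnDiagnostics {

    /**
     * Tries to parse one raw cell per column as a number and returns a message
     * for each column that fails, naming the column by position and by name.
     */
    static List<String> problems(List<String> columnNames, List<String> rawCells) {
        List<String> messages = new ArrayList<>();
        for (int i = 0; i < columnNames.size(); i++) {
            try {
                Double.parseDouble(rawCells.get(i));
            } catch (NumberFormatException e) {
                messages.add("Column " + (i + 1) + " (" + columnNames.get(i) + "): " + e.getMessage());
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("age", "sex");
        List<String> cells = Arrays.asList("42", "abc");
        for (String message : problems(names, cells)) {
            System.out.println(message);
        }
    }
}
```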

Mandatory items

For 3.2 and 3.3:

  • f - filename
  • agency
  • ddilang
  • format (3.2 or 3.3)

For 2.5:

  • filename
  • format (2.5)
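
The rule above could be enforced at startup with a check like this one. It is a sketch only: the class `RequiredOptions` is hypothetical, and the option keys (`f`, `agency`, `ddilang`, `format`) are taken from the list above on the assumption that the parsed options are available as a map under those names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Extract2DDI.
final class RequiredOptions {

    /** Returns the names of mandatory options that are absent or blank. */
    static List<String> missing(Map<String, String> options) {
        // The filename and the format are mandatory for every format.
        List<String> required = new ArrayList<>(Arrays.asList("f", "format"));
        String format = options.get("format");
        if ("3.2".equals(format) || "3.3".equals(format)) {
            required.addAll(Arrays.asList("agency", "ddilang"));
        }
        List<String> missing = new ArrayList<>();
        for (String name : required) {
            String value = options.get(name);
            if (value == null || value.trim().isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put("f", "test-file-data-types.dta");
        options.put("format", "3.2");
        System.out.println(missing(options));
    }
}
```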

NumberFormatException error

Occurs when using the format33 and format32 config files:

2022-10-10 14:04:29,000 [main] ERROR edu.cornell.ncrn.ced2ar.ddigen.csv.SpssCsvGenerator - An error occured in reading observation 1. Skipping this observation java.lang.NumberFormatException: empty String
2022-10-10 14:04:29,003 [main] ERROR edu.cornell.ncrn.ced2ar.ddigen.csv.SpssCsvGenerator - An error occured in reading observation 2. Skipping this observation java.lang.NumberFormatException: empty String
2022-10-10 14:04:29,005 [main] ERROR edu.cornell.ncrn.ced2ar.ddigen.csv.SpssCsvGenerator - An error occured in reading observation 3. Skipping this observation java.lang.NumberFormatException: empty String
2022-10-10 14:04:29,007 [main] ERROR edu.cornell.ncrn.ced2ar.ddigen.csv.SpssCsvGenerator - An error occured in reading observation 5. Skipping this observation java.lang.NumberFormatException: empty String
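
"empty String" is the message `Double.parseDouble` produces for an empty or whitespace-only input, which suggests that a blank cell is reaching a numeric parser; the log does not show where in SpssCsvGenerator this happens. A minimal sketch of a tolerant parse, assuming that a blank cell should be treated as a missing value rather than as an error (the class `CellParser` is hypothetical):

```java
import java.util.OptionalDouble;

// Hypothetical helper for illustration; not part of Extract2DDI.
final class CellParser {

    /**
     * Parses a numeric cell. A null or blank cell yields an empty result
     * (assumption: blank means missing) instead of a NumberFormatException.
     */
    static OptionalDouble parseNumericCell(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(Double.parseDouble(raw.trim()));
    }

    public static void main(String[] args) {
        System.out.println(parseNumericCell("").isPresent());
        System.out.println(parseNumericCell(" 3.5 ").getAsDouble());
    }
}
```

Non-blank text that is not a number still raises NumberFormatException, so genuinely malformed cells are not silently dropped.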
