python_scripts's Introduction

October 2019

Week 1 (Oct 1 - Oct 4)

  1. File manipulations
    1. Change all .txt files from 2 to 3/4 columns
      1. modify_columns.py
        • If a file has 2 columns, this script fixes it and overwrites the file
        • Outputs a notes.txt showing what was done to each file
        • Need to modify the hard-coded directory path before running
    2. Remove all carriage returns and correct ascii characters from .xml files
      1. clean_text.py
        • Clean the text field in .xml
      2. clean_xml_and_newline.py
        • Remove carriage returns
        • Expand the escaped characters, for example: &#163; becomes £
  2. Scrape Archivist
    1. Download all the .xml from Instrument
      1. scrape_archivist_selenium.py
        • Use selenium to download all .xml files
        • Need to set the user login and password before running
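
The column fix described for modify_columns.py can be sketched roughly as follows. This is a minimal illustration, not the script itself: it assumes the .txt files are tab-separated and that missing columns are filled with empty values, neither of which is stated above. The names `pad_file` and `fix_directory` are hypothetical, and the directory is passed as an argument instead of being hard-coded.

```python
from pathlib import Path

NOTES_NAME = "notes.txt"


def pad_file(path: Path, target_columns: int, filler: str = "") -> bool:
    """Pad every 2-column line of a tab-separated file in place.

    Returns True if the file was rewritten, False if it was left alone.
    """
    lines = path.read_text(encoding="utf-8").splitlines()
    fixed = []
    for line in lines:
        fields = line.split("\t")
        if len(fields) == 2:
            # Assumption: missing columns are filled with an empty value.
            fields += [filler] * (target_columns - 2)
        fixed.append("\t".join(fields))
    if fixed == lines:
        return False
    path.write_text("\n".join(fixed) + "\n", encoding="utf-8")
    return True


def fix_directory(directory: Path, target_columns: int = 3) -> list:
    """Fix every .txt file in a directory and record the outcome in notes.txt."""
    notes = []
    for path in sorted(directory.glob("*.txt")):
        if path.name == NOTES_NAME:
            continue  # do not treat the notes file from an earlier run as data
        changed = pad_file(path, target_columns)
        status = f"padded to {target_columns} columns" if changed else "unchanged"
        notes.append(f"{path.name}: {status}")
    (directory / NOTES_NAME).write_text("\n".join(notes) + "\n", encoding="utf-8")
    return notes
```

A call such as `fix_directory(Path("path/to/txt_files"), target_columns=4)` would cover the 4-column case; the path here is a placeholder.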

Week 2 (Oct 7 - Oct 11)

  1. Scrape Archivist (continued)
    1. Download all the .txt from Datasets
  2. Setup Heroku
    1. Record how to insert table from file on wiki
  3. Fix problems around ncds_81_i.xml:
    1. could not download: fixed in scrape_archivist_selenium.py
    2. file contains &amp;# instead of &#: fixed in clean_text.py
    3. Note: run clean_text.py first, then clean_xml_and_newline.py
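
Why the order of the two cleaning scripts matters can be shown with a small sketch. It assumes the ncds_81_i.xml problem was a double-escaped ampersand (`&amp;#` where `&#` was intended) and that the expansion step targets numeric character references. The function names are hypothetical and only stand in for the two scripts.

```python
import html
import re

# A numeric character reference whose ampersand was escaped a second time,
# e.g. "&amp;#163;" or "&amp;#xA3;".
DOUBLE_ESCAPED = re.compile(r"&amp;(#\d+;|#[xX][0-9a-fA-F]+;)")
# A well-formed numeric character reference, e.g. "&#163;".
NUMERIC_REF = re.compile(r"&#(?:\d+|[xX][0-9a-fA-F]+);")
# Characters that must stay escaped for the XML to remain well-formed.
XML_SPECIAL = set("<>&\"'")


def fix_double_escaped(text: str) -> str:
    """Stand-in for the clean_text.py fix: turn '&amp;#163;' back into '&#163;'."""
    return DOUBLE_ESCAPED.sub(r"&\1", text)


def expand_numeric_refs(text: str) -> str:
    """Replace numeric references with their characters, except XML specials."""
    def _expand(match):
        char = html.unescape(match.group(0))
        return match.group(0) if char in XML_SPECIAL else char
    return NUMERIC_REF.sub(_expand, text)


def remove_carriage_returns(text: str) -> str:
    return text.replace("\r", "")


def clean(text: str) -> str:
    """Apply the steps in the order given in the note above."""
    return remove_carriage_returns(expand_numeric_refs(fix_double_escaped(text)))
```

Running the expansion before the fix would leave `&amp;#163;` untouched, because at that point the text does not yet contain a `&#` sequence to expand.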

November 2019

  1. Process NCDS_2004_tables_version5.xlsx
    1. pre_process_db_input.py
      • Output csv files
  2. Build the database from the csv files above; see the Populate database wiki
    1. db_temp.sql
    2. db_insert.sql
      • From temporary tables, insert to database tables
    3. db_delete.sql
      • Delete a study
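
The temporary-table workflow above can be illustrated with a short sketch. It uses Python's built-in sqlite3 module purely so that it runs without a database server; the real scripts are SQL files, and the table and column names here (`question`, `temp_question`, `study`, `label`, `literal`) are hypothetical and not taken from them.

```python
import csv
import io
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical target table; the real schema is defined elsewhere.
    conn.execute("CREATE TABLE question (study TEXT, label TEXT, literal TEXT)")


def load_study(conn: sqlite3.Connection, csv_text: str) -> None:
    """Load csv rows into a temporary table, then copy them to the real table."""
    # Step 1 (in the spirit of db_temp.sql): stage the csv in a temporary table.
    conn.execute("CREATE TEMP TABLE temp_question (study TEXT, label TEXT, literal TEXT)")
    rows = csv.DictReader(io.StringIO(csv_text))
    conn.executemany(
        "INSERT INTO temp_question VALUES (:study, :label, :literal)", rows
    )
    # Step 2 (in the spirit of db_insert.sql): copy from staging to the target.
    conn.execute(
        "INSERT INTO question (study, label, literal) "
        "SELECT study, label, literal FROM temp_question"
    )
    conn.execute("DROP TABLE temp_question")
    conn.commit()


def delete_study(conn: sqlite3.Connection, study: str) -> None:
    """In the spirit of db_delete.sql: remove every row belonging to one study."""
    conn.execute("DELETE FROM question WHERE study = ?", (study,))
    conn.commit()
```

Staging the csv first means the file can be checked or transformed before anything touches the real tables, and a study can be removed and reloaded as a unit.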

Dec 2019 - Apr 2020

  1. The Longitudinal Study of Young People in England (LSYPE), also known as "Next Steps"
    1. Wave 8
    2. Wave 7
    3. Wave 5
    4. Wave 4
    5. Wave 3
    6. Wave 2
    7. Wave 1

May 2020

  1. UCL Centre for Longitudinal Studies COVID-19 Online Survey Questionnaire

    1. Wave 1
  2. Export and clean xml files

    1. archivist_click_export_button.py
    2. export_clean_xml.py
  3. Understanding Society Coronavirus Study

    1. April 2020 questionnaire
  4. Code lists can be shared by different questions; fix the LSYPE studies

    1. LSYPE_clean_codes.py
      • Same script is used for all studies; need to modify the input dir

python_scripts's People

Contributors

jli755, spuddybike