
wiki's Introduction

wiki's People

Contributors

dfsnow, wrridgeway, jeancochrane, njardine, dsharm, ccao-jardine, damonamajor, wagnerlmichael

Forkers

bradley39e

wiki's Issues

Document how to add features to the residential model

Adding features to the residential/condo model is now a somewhat lengthy process, with steps across many repositories and domains. We should document this process on the wiki to make it easier to remember for future features. Write a how-to article that outlines each step of the process, including any details and caveats.

Unclear how to recreate sales ratio sample data from open data

Cross posted.

I am trying to create a sample of sales from the Cook County Assessor's Open Data portal for sales ratio studies. The SOP on sales ratio studies lists these exclusions:

  • Properties with known characteristic changes. Properties known to have undergone physical and/or legal characteristic changes between the time of sale and assessment are excluded.
  • Special properties. Some residential properties classified as 'Single-Family' are valued by the 'Special Properties' division of the Valuations Department. These are excluded from the sales ratio study.

It is unclear to me how to identify these properties from the sales data, or what fields in another data set I can join in to ID these sales.

Add overall architecture diagram for the Data Dept.

Create an architecture diagram that shows the general structure of the department's data architecture. It should give newcomers an idea of how data flows for the processes the Data Department is responsible for.

Consolidate data inventories and catalogues into single workbook

See old GitLab issue. This issue needs to be updated to reflect current data cataloguing plans. (Summer 2023)

We should consolidate all of our disparate data catalogues, inventories, and trackers into a single Excel sheet. I've created a template of what should be included:

new_data_catalog.xlsx

And I'm working to consolidate the following sheets:

warehouse_athena_map.xlsx
data_catalog_wiki.xlsx
data_catalog_warehouse.xlsx
update_inventory.xlsx

The final workbook should:

  • Live at Data/Data_Dept_Catalog.xlsx in this repo
  • Be linked from the Home and _sidebar wiki pages, and from a README note in the data architecture repo
  • Be tracked using Git LFS
  • Have its orange columns updated programmatically via daily API calls to AWS. GitLab's CI + boto3 can accomplish this
  • Be machine-readable in long format, with no merged cells
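
As a rough illustration of the programmatic update, here is a minimal sketch that builds long-format rows (one row per table and column) from a Glue-style client. It assumes the catalogued tables live in the AWS Glue Data Catalog, which the issue does not confirm. The function names `catalog_rows` and `rows_to_csv` and the output field names are hypothetical. The client is passed in as an argument so the logic can be tried without AWS credentials.

```python
import csv
import io
from typing import Any, Dict, List

# Hypothetical column names for the long-format output.
FIELDS = ["database", "table", "column", "type", "location"]


def catalog_rows(glue_client: Any, database: str) -> List[Dict[str, str]]:
    """Return one row per (table, column) in the given Glue database.

    `glue_client` is expected to behave like boto3.client("glue"):
    it must provide get_paginator("get_tables").
    """
    rows: List[Dict[str, str]] = []
    paginator = glue_client.get_paginator("get_tables")
    for page in paginator.paginate(DatabaseName=database):
        for table in page["TableList"]:
            descriptor = table.get("StorageDescriptor", {})
            for column in descriptor.get("Columns", []):
                rows.append(
                    {
                        "database": database,
                        "table": table["Name"],
                        "column": column["Name"],
                        "type": column.get("Type", ""),
                        "location": descriptor.get("Location", ""),
                    }
                )
    return rows


def rows_to_csv(rows: List[Dict[str, str]]) -> str:
    """Serialize the rows as CSV text: flat, long format, no merged cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

In a scheduled CI job, this might be called as `catalog_rows(boto3.client("glue"), database_name)`. The result would then be written into the workbook's programmatic columns. How that write happens is left open here.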

Document organizational processes and timelines

Create a new wiki section "Processes" to document how internal CCAO processes work:

  • Start with an investigation of existing process documents
    • Look in the shared drives for existing documentation
    • Ask Tia and Mirella if any such documents already exist
    • Check the intranet and CCAO handbook

Things to Document

  • What the general timeline is between departments for finalizing assessments (Data --> Valuations --> iasWorld --> Legacy?)
  • Timeline between offices (CCAO --> BoR --> Clerk --> Treasurer) + what data they are passing
  • Internal data flows after the Data Department hands off data
  • GIS processes for updating parcel files and other boundaries (timeline?)

Create a directory of file storage locations used by the Data Dept.

The Data Department's data is now spread over multiple locations/servers. We need to create a short directory that shows what is stored where. Include (at least) the following locations:

  • SharePoint
  • Shared drive ("ocommon")
  • Data's S3 buckets
  • Open Data Portal
  • iasWorld
  • Teams

Update Mission Vision and Values

The Mission, Vision, and Values section of the handbook hasn't been touched in a while. We may want to revisit this section to make sure it aligns with where the Department is headed.

Some specific edits that should be made:

  • Trim down the number of values. It's hard to embody values when you can't even remember them all. We should pare back to the ones that really matter and collapse similar ones. Something like the social rules of the Recurse Center might be more useful.

Create model selection SOP

We should write a Standard Operating Procedure (SOP) codifying how and why we select final valuation model runs. This is mostly about formalizing and documenting the best practices and making sure that we're implementing them internally.

  • Collect examples of similar SOPs from other departments/companies
  • Collect best practices on model selection (see the Tidymodels docs, Max Kuhn's work, and other predictive modeling resources)
  • Seek feedback from Valuations on any proposed changes
  • Publish the SOP to the wiki, with link from README

Create list of automated processes

Create a table of what processes are running, where, and on what schedule. Some processes include:

  • Sqoop
  • Glue jobs
    • Ratio stats
  • Open data pull
  • Appeal worksheets
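
One possible shape for that table, sketched as a small record type rendered to a Markdown table for the wiki. The process names come from the list above. The `Process` class, its field names, and the "TBD" placeholders are assumptions, since the actual locations and schedules are not stated in this issue.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Process:
    """One automated process; unknown details default to "TBD"."""

    name: str
    where: str = "TBD"
    schedule: str = "TBD"


# Names taken from the issue; where/schedule still need to be filled in.
PROCESSES = [
    Process("Sqoop"),
    Process("Glue jobs: ratio stats"),
    Process("Open data pull"),
    Process("Appeal worksheets"),
]


def to_markdown(processes: List[Process]) -> str:
    """Render the processes as a Markdown table, one row per process."""
    lines = ["| Process | Where | Schedule |", "| --- | --- | --- |"]
    lines += [f"| {p.name} | {p.where} | {p.schedule} |" for p in processes]
    return "\n".join(lines)
```

Calling `print(to_markdown(PROCESSES))` gives a table that can be pasted into a wiki page and filled in as details are confirmed.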

Add how-to article for setting up git

  • How to set up SSH keys and link them with GitHub
  • Setting up a global email address, both locally and on the server
  • Linking that email address to GitHub
  • Cloning a repo
  • Optional: GPG setup
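
The core steps above might reduce to a handful of standard commands. The sketch below only assembles and prints them (a dry run) and does not execute anything, so it is safe to try. The helper names, the ed25519 key type, and the example email and repository URL are assumptions for illustration. Adding the public key and the email address to a GitHub account happens in GitHub's web settings and is not covered by the commands.

```python
import shlex
from typing import List


def git_setup_commands(email: str, repo_ssh_url: str) -> List[List[str]]:
    """Return the setup commands in the order they would be run."""
    return [
        # Generate an SSH key pair labelled with the user's email.
        ["ssh-keygen", "-t", "ed25519", "-C", email],
        # Set the global commit email (repeat on the server as well).
        ["git", "config", "--global", "user.email", email],
        # After adding the public key on GitHub, check the connection.
        ["ssh", "-T", "git@github.com"],
        # Clone a repository over SSH.
        ["git", "clone", repo_ssh_url],
    ]


def render(commands: List[List[str]]) -> str:
    """Format the commands as copy-pasteable shell lines."""
    return "\n".join(shlex.join(command) for command in commands)
```

For example, `print(render(git_setup_commands("jane@example.com", "git@github.com:example-org/example-repo.git")))` lists the four commands, one per line.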

Add brief DVC documentation

DVC can be a little confusing when starting out. While the DVC documentation is thorough, it is dense for those just getting started. It would be nice for us to have a small guide to help folks understand the basics.

Create a list of Data Department-specific accounts

We should create a list of any Data Department-specific accounts, including their login, who maintains credentials, who primarily uses them, and the account purpose. So far, I can think of four accounts:

  • PyPI
  • Cook County Data Portal
  • Reetro
  • draw.io

This excludes personal accounts tied to an organization, e.g., GitHub.

Document `noctua` caching behavior and 403 errors

As documented in DyfanJones/noctua#96, the noctua R package tries to delete results from the results S3 bucket after retrieving them. However, our read-only AWS accounts aren't permissioned to delete things from S3, resulting in a 403 error after every query.

This behavior can be disabled by enabling noctua caching: noctua_options(cache_size = 10)

We should document this flag in the How-To/Connect-to-AWS-Resources.md doc.

Create coding practices SOP

Now that the Data Department is growing, we should create a short document outlining coding best practices in the office. This should include things like:

  • Styling and linting practices for different languages
  • Pre-commit standards and practices
  • Code review / PR practices
  • Standards for setting user permissions

Steps:

  • Move coding standards from handbook into separate SOP
  • Update the onboarding issue template in ccao-data/people to include link to this standard
