
serif_takehome's Introduction

Serif Health Takehome - Matt Lewis

How to run the script

Because this is meant to be a ~90-minute exercise, I kept the script relatively simple. All imports are from the Python standard library.

Assuming you are on a Unix/macOS machine with Homebrew available:

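# install pyenv and the pyenv-virtualenv plugin if you don't have them
# (the `pyenv virtualenv` command below comes from the plugin)
brew update
brew install pyenv pyenv-virtualenv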

# install python 3.8 
pyenv install 3.8

# create an env and activate 
pyenv virtualenv 3.8 matt_takehome
pyenv activate matt_takehome

# run the script 
python takehome_script.py
# results written to locations.txt

Results

  • 56 distinct URLs
  • Runtime ~0:13:35

Discussion:

  • This script took around 90 mins to write, though there are clearly gains that could be made through refactoring.
  • The general premise of this script was to iterate through the gzip file without fully loading it, parsing it bit by bit to extract relevant URLs through some basic text matching, and writing distinct URL values to a text file.
  • The first task for this script was to find a way to stream through the gzipped file. A little searching showed me that we can do this pretty easily with the gzip module, part of the Python standard library. I had to futz around somewhat to get this to work as needed.
  • If this were a task that was going to be repeated many times on the same or a similar gzip file, there are some interesting things that could be done with libraries that allow random access to gzip files in Python, like indexed_gzip.
  • Anthem has an interactive MRF lookup system. This lookup can be used to gather additional information - but it requires you to input the EIN or name of an employer who offers an Anthem health plan: Anthem EIN lookup. How might you find a business likely to be in the Anthem NY PPO? How can you use this tool to confirm if your answer is complete?

  • It seems like you could use this tool to spot-check that the data we've parsed out of the file does indeed correspond to Anthem health plans, by matching the EINs given in the gzip file against the lookup.
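
For illustration, the streaming approach described in the discussion above could look roughly like the sketch below. This is not the code in takehome_script.py: the function names, the URL regex, the keyword filter, and the chunk size are all assumptions made up for this example. It reads the gzip file in fixed-size text chunks, carries over any partially read quoted string to the next chunk, and collects distinct URLs in a set.

```python
import gzip
import re

# Hypothetical pattern: any double-quoted http(s) URL. The matching rules
# used by the real script are not shown in this README.
URL_PATTERN = re.compile(r'"(https?://[^"\s]+)"')


def extract_distinct_urls(gz_path, keyword="", chunk_size=1024 * 1024):
    """Stream a gzip file chunk by chunk and return the distinct URLs found.

    Only URLs containing `keyword` are kept (an empty keyword keeps all).
    """
    urls = set()
    tail = ""  # text carried over from the previous chunk
    with gzip.open(gz_path, "rt", encoding="utf-8", errors="replace") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            text = tail + chunk
            for match in URL_PATTERN.finditer(text):
                url = match.group(1)
                if keyword in url:
                    urls.add(url)
            # Keep everything from the last quote onward, so a URL that is
            # split across two chunks is completed on the next read.
            cut = text.rfind('"')
            tail = text[cut:] if cut != -1 else ""
    return urls


def write_urls(urls, out_path):
    """Write one URL per line, sorted so the output is reproducible."""
    with open(out_path, "w", encoding="utf-8") as out:
        for url in sorted(urls):
            out.write(url + "\n")
```

A call such as write_urls(extract_distinct_urls("index.json.gz", keyword="ny_ppo"), "locations.txt") would then produce the output file; the input filename and keyword here are placeholders, not values taken from the actual script. Reading in chunks rather than line by line is a deliberate choice, since large JSON files are sometimes serialized as a single very long line.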
