ficpa_article

Programming for Efficiency

This repository includes example Python code from an article written for FICPA called Programming for Efficiency.

The examples were written in Python 3.6, and require the following libraries to be installed:

  • requests
  • beautifulsoup4 (Beautiful Soup)
  • lxml (the parser used in Example 2)
  • openpyxl
  • pandas
  • pdfplumber

Example 1: Using Excel to prep a PBC TB for import

There are several Python libraries designed to work with Excel data, including openpyxl and pandas. While both are powerful, openpyxl makes simple Excel tasks easier, such as reading a workbook in, editing it, and saving it back to Excel.

This example shows the use of openpyxl to read in the PBC trial balance, clean it up to be import-ready in a new tab, and save as a new file.

Take an example of a trial balance formatted like this:

pbc tb

After running example_1_tb.py, the output file includes a new tab with this data:

import tb
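The actual script, example_1_tb.py, lives in the repository and is not reproduced here. As a rough sketch of the same read, clean, and save steps, the code below assumes a hypothetical PBC layout: the first worksheet holds one header row followed by rows of account number, account name, debit, and credit (numeric or empty), mixed with blank and subtotal rows. The function names, the 'Import TB' tab name, and the output column headers are illustrative assumptions, not taken from the article.

```python
def clean_tb_rows(rows):
    """Return (account_number, account_name, net_balance) for each account row.

    Assumes each row is (account_number, account_name, debit, credit).
    Rows without an account number (blank lines, subtotals) are skipped.
    """
    cleaned = []
    for row in rows:
        # Pad short rows so unpacking always finds four values.
        acct, name, debit, credit = (tuple(row) + (None,) * 4)[:4]
        if acct is None or str(acct).strip() == '':
            continue
        # Net debits against credits; empty cells count as zero.
        balance = round((debit or 0) - (credit or 0), 2)
        cleaned.append((str(acct).strip(), str(name or '').strip(), balance))
    return cleaned


def prep_tb(in_path, out_path, sheet_title='Import TB'):
    """Read the first worksheet, write cleaned rows to a new tab, save a new file."""
    # Imported here so clean_tb_rows can be used without openpyxl installed.
    from openpyxl import load_workbook

    wb = load_workbook(in_path)
    source = wb.worksheets[0]
    # min_row=2 skips the assumed single header row.
    rows = source.iter_rows(min_row=2, values_only=True)
    target = wb.create_sheet(title=sheet_title)
    target.append(('Account', 'Description', 'Balance'))
    for cleaned_row in clean_tb_rows(rows):
        target.append(cleaned_row)
    wb.save(out_path)
```

Calling prep_tb('pbc_tb.xlsx', 'pbc_tb_import.xlsx') would leave the original tab untouched and add the cleaned tab to the new file; both file names are placeholders.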

Example 2: Scraping the web

This short script prints the author and a 60-character excerpt for each of the first three testimonials on FICPA's testimonials page.

It uses the requests and Beautiful Soup (bs4) Python libraries, two powerful libraries for fetching and parsing web pages.

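import requests
from bs4 import BeautifulSoup

# Download the testimonials page and stop early on an HTTP error
r = requests.get('http://www.ficpa.org/Content/Members/Member-Testimonials.aspx')
r.raise_for_status()

# Parse the HTML (the 'lxml' parser requires the lxml package)
soup = BeautifulSoup(r.text, 'lxml')

# Each testimonial sits in its own <div class="testimonial-wrapper">
testimonials = soup.find_all('div', class_='testimonial-wrapper')

# Print the author and the first 60 characters of the first three testimonials
for testimonial in testimonials[:3]:
    author = testimonial.find(class_='testimonial-author').get_text()
    excerpt = testimonial.get_text().lstrip()
    print('Author: {}'.format(author))
    print('Excerpt: {}'.format(excerpt[:60]))
    print('-------------------------------------------------------------------')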

An example of the resulting output:
Author: John Smith, CPA — Smith & Smith, LLC
Excerpt: Joining the FICPA and having the chance to participate in th
-------------------------------------------------------------------
Author: Jamie J. Johnson — J. J. Johnson & Associates, PA, CPA
Excerpt: I will always feel honored to be able to contribute – and be
-------------------------------------------------------------------
Author: Bobby L. O’Charley — Longfellow Consulting Group
Excerpt: I recently attended the 2014 University of South Florida Acc
-------------------------------------------------------------------

Example 3: Extracting tables from PDFs

This is one of my new favorite tools. pdfplumber can extract text, and even identify tables, from PDF files. This example uses the PDF file from https://www.opm.gov/policy-data-oversight/data-analysis-documentation/federal-employment-reports/reports-publications/salary-information-for-the-executive-branch.pdf

Let's say you wanted to extract the data from this table on page 2:

pdf table

Using the Python code in example 3, the output looks like this:

pdf table output
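The example 3 script is in the repository rather than on this page. A minimal sketch of the approach might look like the following, assuming the PDF has already been downloaded to a local file and that the first extracted row holds the column headers. The function names are hypothetical; pdfplumber.open, pdf.pages, and page.extract_table are the library calls being illustrated.

```python
def table_to_records(table):
    """Turn a list of rows (first row = headers) into a list of dicts."""
    if not table:
        return []
    # Header cells can contain line breaks when the PDF wraps the text.
    headers = [(cell or '').replace('\n', ' ').strip() for cell in table[0]]
    records = []
    for row in table[1:]:
        # Empty cells come back as None; normalize them to empty strings.
        values = [(cell or '').strip() for cell in row]
        records.append(dict(zip(headers, values)))
    return records


def extract_table(pdf_path, page_number=2):
    """Extract the largest table pdfplumber finds on the given 1-based page."""
    # Imported here so table_to_records works without pdfplumber installed.
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_number - 1]  # pdf.pages is 0-indexed
        table = page.extract_table()       # None when no table is detected
    return table_to_records(table)
```

The resulting records can be passed to pandas.DataFrame(records) for further cleanup or export to Excel.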

Contributors

danshorstein
