GithubHelp home page GithubHelp logo

isabelsw / cfgdataanalysis Goto Github PK

View Code? Open in Web Editor NEW
0.0 2.0 0.0 84 KB

This was an assignment for the CFG course "Introduction to Python & Apps". It entailed importing and working with data from a pre-existing Excel file and then exporting all the results to a new Excel workbook. See README for details.

License: MIT License

Python 100.00%
excel matplotlib openpyxl pandas pycharm-ide python3 seaport

cfgdataanalysis's Introduction


 

Table of Contents
  1. About The Project
  2. Built With
  3. Features
  4. Setup
  5. Roadmap
  6. Contributing
  7. Licence
  8. Contact
  9. Acknowledgements
  10. Copyright Notices

 

About The Project

This is a solution to the assignment "Spreadsheet Analysis" as outlined in the course "Introduction to Python & Apps" by Code First Girls Ltd. It entails basic analysis of fictional sales data using both Excel and Python. See Features for an overview of the methods and libraries/modules used.

 

(back to top)

Built With

Python PyCharm pandas Matplotlib Seaborn Microsoft Excel 365
Python pycharm Pandas mpl Seaborn

 

  • Use different libraries/modules for simple analysis of data from an Excel workbook:
    • pandas
    • openpyxl
    • seaborn
    • matplotlib
  • Find the total sum of values and the minimum and maximum values
  • Use pct_change for working out periodic changes as a percentage
  • Create a simple lineplot for showing periodic changes as a percentage
  • Import an Excel file and then write results to a new Excel file
  • Assign data to different spreadsheets upon writing to the new Excel file

 

(back to top)

Setup

Prerequisites

  • Install Python and/or Install PyCharm by Jet Brains (you can opt for the free and open-source community edition). Make sure you are downloading a Python 3 version.

  • Get a subscription for Microsoft 365 and download the latest verstion of MS Excel.

  • See Pycharm Guide for creating a new Py project.

  • Download the file sales.csv via the repository's main page.

  • PIP package

  1. See if PIP is already installed on your system. Write in either Windows OS shell or macOS terminal:

    Install

    pip help
    pip3 help
    python3 -m pip help
  2. If it is not installed, write in shell or terminal:

    python3 get-pip.py

N.B. Make sure you have added pip.exe to the Environment Variables (Path).

  • pandas library; openpyxl module; seaborn library; matplotlib

    There are 2 ways, EITHER:

    • Write in PyCharm IDE install [library name]. E.g.,

      install pandas
      
    • OR Write in shell or terminal pip install [library name]. For macOS terminal, if that instruction does not work, you can write pip3 install [library name]. E.g.,

      pip install pandas

 

(back to top)

Roadmap

If you want to have a go yourself, I have provided the guide below. You will create 3 separate Py files that reflect each stage of the process.

  • N.B. From this point on, Python will be abbreviated to Py.

PY FILE 1

  • Create a new .py file in your project folder in PyCharm (if you have not already done so). Name the file.
  • Import openpyxl and math.
    • N.B. It is good practice to import your libraries/modules at the top; however, I have imported pandas as pd later in the code so that it is shown within its immediate context.
  • Import sales.csv file (sales.csv); otherwise, create your own spreadsheet with similar quantitative data.

 

PyCharm_table_sales_csv

 

  • Create a function named read_data using def to retrieve the sales and months from the spreadsheet.
  • Use the def method to create a function that makes a Py dictionary from the data for monthly sales (from sales.csv).
  • Collect all the sales from each month into a single list using the list() function.
  • Get the sales total (across all months) using the sum() function.
  • Find the largest and smallest values in the sales data using the min() and max() functions.
  • Find the months with the lowest/highest number of sales by using the dictionary get() method and writing the keys as max_value and min_value.
  • Find the average value of sales using the lens() function (returns the number of values for sales). Then continue the formula by writing total / number_of_sales (total divided by the number of values returned by the lens function).
  • Using math.ceil() method (from the math module), round the number of average sales that was calculated in the previous step. You do not have to do this, but it ensures better readability of the table that we will be creating later.
  • Import pandas as pd to read sales.csv and create a dataframe. Use pct_change() method to retrieve sales data and calculate the monthly change in sales from one month to the next as a percentage. Multiply pct_change by 100 (* 100) to complete the formula. NaN will appear in the table for Jan if you do not use the fillna() function; this is because there is no value for the previous month (in the sales data) with which to calculate the change in sales for Jan. Fill none with the value 0 to make it clear that there is no monthly-change data for this month.
  • Create a pandas dataframe for the output. Then use ExcelWriter and df.to_excel to write the results (df) to a new workbook and spreadsheet. Write the parameter index=False to remove the surplus column.
  • This step is optional: you can load the existing workbook and change its formatting so that the size of the cells fit the values better. I have used from openpyxl import load_workbook to load the workbook and from openpyxl.utils import get_column_letter for this task.
  • Save and close the workbook.
  • Print the results with the string format() method so that they appear in the console (you will need these values for writing to the new Excel sheet; see below). /n in the print function adds a line break. Then write run() to complete def run().

 

PyCharm_Console_Output_basicdataanalysis

 

PY FILE 2

  • Import pandas as pd.
  • Read the Excel file (created via the previous steps) into a pandas dataframe object with the pd.read_excel function.
  • With your results from PY FILE 1, add the new data as a dictionary of lists and create a dataframe for it with the pd.DataFrame function.
  • Write the dataframe to a new Excel spreadsheet using pd.Excelwriter. Name your sheet in the parameters and write index=false to remove the generated extra column.

PY FILE 3

  • Import: seaborn as sns; pandas as pd; matplotlib as plt.
  • Specify columns from which to retrieve the data. Read this data into a pandas dataframe via the pd.read_excel function.
  • Print this data.

 

PyCharm_Console_Output_graph

 

Figure_1_cfgspreadsheetanalysis

 

  • Create a simple lineplot with the sns.lineplot() function.
  • Show this lineplot by writing plt.show(). You can then save this lineplot to a local folder.

See the open issues for a full list of proposed features and known issues.

 

(back to top)

Contributing

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

 

(back to top)

Licence

Distributed under the MIT Licence. See "LICENSE.txt" for more information.

 

(back to top)

Contact

My profile - isabelsw

Project link - https://github.com/isabelsw/cfgdataanalysis

 

(back to top)

Acknowledgements

 

(back to top)

Copyright Notices

  • PyCharm logo: Copyright © 2000-2024 JetBrains s.r.o. JetBrains and the JetBrains logo are registered trademarks of JetBrains s.r.o.
  • Python logo: Copyright © 2001-2024 Python Software Foundation.
  • pandas logo: Copyright © 2008 AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team.
  • Seaport logo: Copyright © Matthias Bussonnier and Seaport.
  • matplotlib logo: Copyright © 2002–2012 John Hunter, Darren Dale, Eric Firing, Michael Droettboom and the Matplotlib development team; 2012–2024 The Matplotlib development team.

 

(back to top)

cfgdataanalysis's People

Contributors

isabelsw avatar

Watchers

Kostas Georgiou avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.