In this module, you will:
1. Perform a guided exploratory data analysis project.
2. Conduct a unique data analysis exploration.
The objective is to craft a compelling narrative ("tell a story") using data, showcasing not only your analytical skills but also your professional and engaging communication abilities.
This module builds on the skills learned in previous chapters. The first project is a guided exploratory data analysis of diamonds.csv, based on Exercise 9.16 beginning on page 352 of the text. The second is a project of your choice, related to your domain.
Decide what you would like your second project to focus on or showcase. Review the project requirements and make sure your topic lends itself to completing all of them successfully.
- GitHub
  - Create a new repository named datafun-06-projects on GitHub.
  - Initialize it with the default README.md.
- Local Machine
  - In VS Code, clone your new repo into your Documents folder.
- Repository Essentials
  - Add a .gitignore file from a previous Python project.
  - Add a requirements.txt file that lists the external dependencies for Jupyter notebooks, and add others as you need them.
- Update README.md
  - Modify the README.md to include your name, the link to your repo, and the focus of this project repository.
  - Include instructions with the exact commands to:
    - Create and activate your virtual environment.
    - Install all required external dependencies.
    - Execute your Python files.
- Create your local virtual environment (hint: use venv to create a .venv folder).
- Activate your local virtual environment (hint: run the activation script inside the .venv subfolder).
- Install any external dependencies you need (hint: list the packages needed for Jupyter notebooks, pandas, etc. in requirements.txt and install from that file).
- Push to GitHub
  - Add and commit all your changes with the commit message "Initialized repo".
  - Push your changes to GitHub.
  - Take a screenshot of your GitHub project repository after you've pushed these changes.
  - Display the screenshot as evidence of task completion.
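Before installing dependencies, it can help to confirm that your notebook kernel or script is actually using the .venv interpreter. The check below is a minimal sketch using only the standard library; the helper name `in_virtual_env` is hypothetical. It relies on the fact that, inside a venv-created environment, `sys.prefix` points at the environment folder while `sys.base_prefix` still points at the base Python installation. In a notebook, the result reflects the selected kernel, so pick the .venv kernel in VS Code first.

```python
import sys


def in_virtual_env() -> bool:
    """Return True when this interpreter runs inside a venv-created environment."""
    # venv sets sys.prefix to the environment folder (e.g. .venv), while
    # sys.base_prefix keeps pointing at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Running inside a virtual environment:", in_virtual_env())
    print("Interpreter prefix:", sys.prefix)
```

If this prints `False`, activate .venv (or switch the notebook kernel) before running `pip install -r requirements.txt`.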
This first project is a guided exploration.
- Follow the instructions for Exercise 9.16 (starting pg. 350).
- Complete the exercise in a Jupyter notebook.
- Include the notebook title, your name, and the date at the top.
- Include the following Markdown section headings in your notebook:
  - Section 1-Load: Get the file, store it in your repo, and load it into a DataFrame.
  - Section 2-View: Display the first 7 rows and the last 7 rows.
  - Section 3-Describe: Use the DataFrame describe() method to calculate basic descriptive statistics for all numeric columns.
  - Section 4-Series: Use the Series describe() method to calculate descriptive statistics for each category/text column.
  - Section 5-Unique: Use the Series unique() method to get the unique category values.
  - Section 6-Histograms: Use the DataFrame hist() method to create a histogram for each numeric column.
- Execute the completed notebook.
- Add, commit, and push your changes to GitHub. You can make incremental commits as you work; provide useful commit messages.
  - At the end, use a commit message like "Task 2 complete".
- Verify that your notebook appears complete and well presented on GitHub.
- Capture a screenshot of your completed notebook as viewed in GitHub at the conclusion of this task.
- Display the screenshot as evidence of task completion.
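The six notebook sections map onto a handful of pandas calls. The following is a minimal sketch, assuming pandas (and matplotlib, for the histograms) is installed in your .venv and that diamonds.csv is saved in the repository root. The helper names (`load_data`, `view`, `describe_numeric`, `text_columns`, `describe_text`, `unique_values`) are hypothetical, and every non-numeric column is treated as a category/text column. In the notebook, each call would typically sit in its own cell under the matching Markdown heading.

```python
from pathlib import Path

import pandas as pd


def load_data(path: Path) -> pd.DataFrame:
    """Section 1-Load: read a CSV file into a DataFrame."""
    return pd.read_csv(path)


def view(df: pd.DataFrame, n: int = 7) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Section 2-View: return the first n rows and the last n rows."""
    return df.head(n), df.tail(n)


def describe_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Section 3-Describe: basic statistics for the numeric columns."""
    # With mixed column types, DataFrame.describe() summarizes numeric columns by default.
    return df.describe()


def text_columns(df: pd.DataFrame) -> list[str]:
    """Treat every non-numeric column as a category/text column."""
    return list(df.select_dtypes(exclude="number").columns)


def describe_text(df: pd.DataFrame) -> dict[str, pd.Series]:
    """Section 4-Series: Series.describe() for each category/text column."""
    return {col: df[col].describe() for col in text_columns(df)}


def unique_values(df: pd.DataFrame) -> dict[str, list]:
    """Section 5-Unique: Series.unique() for each category/text column."""
    return {col: list(df[col].unique()) for col in text_columns(df)}


if __name__ == "__main__":
    csv_path = Path("diamonds.csv")  # assumption: the file is stored in the repo root
    if csv_path.exists():
        diamonds = load_data(csv_path)
        first_rows, last_rows = view(diamonds, n=7)
        print(first_rows, last_rows, sep="\n\n")
        print(describe_numeric(diamonds))
        for name, summary in describe_text(diamonds).items():
            print(f"\n{name}:\n{summary}")
        print(unique_values(diamonds))
        # Section 6-Histograms: one histogram per numeric column (requires matplotlib).
        # Jupyter renders the figure inline; a plain script would also need
        # matplotlib.pyplot.show().
        diamonds.hist(figsize=(12, 8))
```

Passing `n` to `view` keeps the helper reusable for the second project, which asks for 5 rows instead of 7.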
Use everything you've learned to conduct a unique data exploration project using some information related to your domain.
Create a new notebook that uses a dataset of your choice.
The notebook name should make it clear this is your unique project.
Use this project to feature all of the key skills learned. See the list above.
Include challenging Python programming aspects - find a reason to use filter(), map(), and list comprehensions.
Have fun and make it unique.
Your second project must show the following Python skills and Markdown sections:
- Section 1-Load: Read from a data file into a pandas DataFrame.
- Section 2-View: Display the first 5 rows and the last 5 rows.
- Section 3-Describe: Use the DataFrame describe() method to calculate basic descriptive statistics for all numeric columns.
- Section 4-Series: Use the Series describe() method to calculate descriptive statistics for each category/text column.
- Section 5-Unique: Use the Series unique() method to get the unique category values.
- Section 6-Histograms: Use the DataFrame hist() method to create a histogram for each numeric column.
- Section 7-List: Get some of your information into a list. Process each item in the list (use for loops or comprehensions as you like).
- Section 8-Filter: Use filter() to show only part of the information.
- Section 9-Map: Use map() to transform some of the data.
- Include a title section with your name. This is your branding, so make it professional and attractive.
- Use Markdown section headings to present your work professionally.
- Tell a story with data: lead us through your project and summarize your interesting results.
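Sections 7 through 9 can be handled with plain Python once a column has been pulled into a list. The sketch below uses only the standard library. The helper names (`label_values`, `above_average`, `convert_units`) and the high/low labeling rule are hypothetical and only meant to show one reasonable use for a list comprehension, filter(), and map().

```python
from statistics import mean


def label_values(values: list[float], threshold: float) -> list[str]:
    """Section 7-List: process each item with a list comprehension."""
    return ["high" if v >= threshold else "low" for v in values]


def above_average(values: list[float]) -> list[float]:
    """Section 8-Filter: keep only the values greater than the mean."""
    avg = mean(values)
    return list(filter(lambda v: v > avg, values))


def convert_units(values: list[float], factor: float) -> list[float]:
    """Section 9-Map: multiply every value by a factor (e.g. a unit conversion)."""
    return list(map(lambda v: v * factor, values))
```

In a notebook, you might build the input list with something like `values = df["price"].tolist()`, where `price` stands in for a numeric column from your own dataset.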
Execute the completed notebook. Add, commit, and push your changes to GitHub. You can make incremental commits as you work; provide useful commit messages. At the end, use a commit message like "Task 3 complete". Verify that your notebook appears complete and well presented on GitHub.
Capture a screenshot of your completed notebook as viewed in GitHub at the conclusion of this task. Display the screenshot as evidence of task completion.
As part of your custom project, use a library or module we did not explore in class. Consider imageio, nltk, textatistic, textblob, wordcloud, or others. Look for something new that interests you, learn it on your own, and apply it to your domain/project. Clearly label your bonus section with a Markdown heading such as "Section Bonus". In the bonus section, explain what you chose, why, how it went, and your results. Do you recommend it? This is a chance to show advanced, creative skills; make it valuable for others by describing it well.
Execute the completed notebook. Add, commit, and push your changes to GitHub. You can make incremental commits as you work; provide useful commit messages. At the end, use a commit message like "Bonus complete". Verify that your notebook appears complete and well presented on GitHub.
Capture a screenshot of your completed notebook as viewed in GitHub at the conclusion of this task. Display the screenshot as evidence of task completion.