w205-assignment-1's Introduction

template-activity-01

Assignment 01: Set up and prerequisites

  1. Git
  • Install git. https://git-scm.com/downloads

  • You may see references to the standalone git app for your desktop. That's not what we're using for this course.

  • Watch the videos in this series that you need to watch (seriously, even if you've been working with git for a while, it's sometimes handy to revisit the basics, e.g., the difference between git and GitHub). They are on YouTube. If you don't have a subscription, short ads will pop up. Sorry, but these are really decent videos. There's about 30 minutes total.

https://www.youtube.com/playlist?list=PL5-da3qGB5IBLMp7LtN8Nc3Efd4hJq0kD

  • Follow the instructions to do what the videos walk you through.
  2. Data Engineering Jobs
  • Google "data engineering jobs"
  • Read between 5 and 10 ads
  • What are companies looking for in skills, experience, competencies?
    • Answer:
      • I tried to look at data engineering jobs from companies that I know operate at significant scale. The idea is that if you can work for them, you can work for anyone.
      • I chose roles at:
      • Observations:
        • There's an adage in writing and filmmaking which essentially says that you should "show and not tell". The idea is that you should demonstrate what you can do, not TELL people what you can do. That's reflected in these job descriptions. The number of experience requirements far outweighs the number of skills requirements. So, for example, the Facebook role doesn't have a skill requirement for Python or Java. It asks for 4+ years of experience using Python or Java. This pattern is common across all the roles.
        • All ask for a bachelor's or master's degree in CS or a related field
        • All the roles mention communication skills, albeit with different levels of specificity
        • Hadoop is mentioned in the reqs from: Amazon and Google
        • Java is mentioned by: Airbnb, Google and Facebook
        • Python is mentioned by: Salesforce, Google and Facebook
        • Every company mentions experience with big data explicitly
  3. Submit a PR for this assignment.

Follow the instructions below for GitHub procedures in this class.

  1. You should know a few things about Markdown, the markup language that determines how things look when you view them on the GitHub web interface. That is what we see when we review your work, so you should always check to see how your README.md file looks before you submit. You might check out this cheat sheet for some pointers.

Markdown is designed so that the plain text looks pretty much the way you would guess it will look once it is rendered into pretty HTML.

Here are some basics.

Use #, ##, ###, and so on to indicate headers. The header above is ###.

Emphasis, aka italics, with *asterisks* or _underscores_.

Strong emphasis, aka bold, with **asterisks** or __underscores__.

Combined emphasis with **asterisks and _underscores_**.

Strikethrough uses two tildes. ~~Scratch this.~~

[This is a link](https://www.google.com)

Look like this:

Emphasis, aka italics, with asterisks or underscores.

Strong emphasis, aka bold, with asterisks or underscores.

Combined emphasis with asterisks and underscores.

Strikethrough uses two tildes. Scratch this.

This is a link

Formatting Code

Since much of what we'll be doing is showing code and output, it's important to know how to display that such that it is readable.

Inline `code` has `back-ticks around` it.

Inline code has back-ticks around it.

Blocks of code can be indicated by indenting with 4 spaces or with three back-ticks (```).

```sql
SELECT this, that, the_other
FROM my_table;
```
SELECT this, that, the_other
FROM my_table;
```
col1               col2               col3
fun                dog                cat
mouse              rat                banana
```
col1               col2               col3
fun                dog                cat
mouse              rat                banana

Without the back-ticks, that SQL would look like:

SELECT this, that, the_other FROM my_table;

and that pretty table would look like this (please don't do this!!):

col1 col2 col3 fun dog cat mouse rat banana


GitHub Procedures

In your Python class you used GitHub, with a single repo for all assignments, where you committed without doing a pull request. In this class, we will try to mimic the real world more closely, so our procedures will be enhanced.

Each project, including this one, will have its own repo.

Important: In w205, please never merge your assignment branch to the master branch.

Using the git command line: clone down the repo, leave the master branch untouched, create an assignment branch, and move to that branch:

  • Open a Linux command line to your virtual machine and be sure you are logged in as jupyter.
  • Create a ~/w205 directory if it does not already exist: `mkdir ~/w205`
  • Change directory into the ~/w205 directory: `cd ~/w205`
  • Clone down your repo: `git clone <https url for your repo>`
  • Change directory into the repo: `cd <repo name>`
  • Create an assignment branch: `git branch assignment`
  • Check out the assignment branch: `git checkout assignment`

The previous steps only need to be done once. Once your clone is on the assignment branch, it will remain on that branch unless you check out another branch.
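As a rough sketch, the one-time setup might look like this when run end to end. To keep it runnable offline, it makes some assumptions that are not part of the class setup: a temp directory plays the role of ~/w205, a local bare repo stands in for your GitHub repo, and `my-repo` is a hypothetical repo name. In class you would use your real https URL instead.

```shell
#!/usr/bin/env bash
set -euo pipefail

# ASSUMPTION: everything lives in a temp directory so the sketch runs offline.
WORK="$(mktemp -d)"

# Hypothetical stand-in for your GitHub repo: a bare repo with one commit on master.
git init --quiet "$WORK/seed"
git -C "$WORK/seed" symbolic-ref HEAD refs/heads/master
echo "# template" > "$WORK/seed/README.md"
git -C "$WORK/seed" add README.md
git -C "$WORK/seed" -c user.name="Example" -c user.email="example@example.com" \
    commit --quiet -m "Initial commit"
git clone --quiet --bare "$WORK/seed" "$WORK/remote.git"
REPO_URL="$WORK/remote.git"   # in class: the https URL for your repo

# The one-time steps ($WORK/w205 plays the role of ~/w205).
mkdir -p "$WORK/w205"
cd "$WORK/w205"
git clone --quiet "$REPO_URL" my-repo
cd my-repo
git branch assignment
git checkout --quiet assignment

# Confirm which branch the clone is on.
git rev-parse --abbrev-ref HEAD   # prints: assignment
```

At this point master and assignment still point at the same commit. Only assignment should move from here on.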

The project workflow follows this pattern, which may be repeated as many times as needed. In fact, it's best to do this frequently, since it saves your work to GitHub in case your virtual machine becomes corrupted:

  • Make changes to existing files as needed.
  • Add new files as needed
  • Stage modified files: `git add <filename>`
  • Commit staged files: `git commit -m "<meaningful comment about your changes>"`
  • Push the commit on your assignment branch from your clone to GitHub: `git push origin assignment`
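Here is a minimal sketch of one pass through that workflow. It again assumes a temp directory and a local bare repo in place of your VM and GitHub. The file name `notes.md`, the commit message, and the user name and email are all hypothetical; your VM may already have an identity configured.

```shell
#!/usr/bin/env bash
set -euo pipefail

# ASSUMPTION: a temp directory and a local bare repo stand in for your VM and GitHub.
WORK="$(mktemp -d)"
git init --quiet "$WORK/seed"
git -C "$WORK/seed" symbolic-ref HEAD refs/heads/master
echo "# template" > "$WORK/seed/README.md"
git -C "$WORK/seed" add README.md
git -C "$WORK/seed" -c user.name="Example" -c user.email="example@example.com" \
    commit --quiet -m "Initial commit"
git clone --quiet --bare "$WORK/seed" "$WORK/remote.git"
git clone --quiet "$WORK/remote.git" "$WORK/my-repo"
cd "$WORK/my-repo"
git branch assignment
git checkout --quiet assignment

# Hypothetical identity so the commit works on a fresh machine.
git config user.name "Example Student"
git config user.email "student@example.com"

# One pass through the workflow.
echo "My answers go here." >> README.md    # change an existing file
echo "scratch notes" > notes.md            # add a new file (hypothetical name)
git add README.md notes.md                 # stage
git commit --quiet -m "Add answers and notes for assignment 01"
git push --quiet origin assignment         # push the assignment branch only

# After the push, nothing should be left unstaged or uncommitted.
git status --short
git log --oneline -1
```

If `git status --short` prints nothing, everything you changed has been committed, and the push has put that commit on the assignment branch of the remote.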

Once you are done, go to the GitHub web interface and create a pull request comparing the assignment branch to the master branch. Add your instructor, and only your instructor, as the reviewer. The date and time stamp of the pull request is considered the submission time for late penalties.
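Before opening the pull request, it can help to preview locally what it will contain. The sketch below is one way to do that, not a required step. It builds a throwaway repo in a temp directory (an assumption, with hypothetical file contents and identity) and then lists the commits and file changes that are on assignment but not on master, which should roughly match what GitHub's compare view shows.

```shell
#!/usr/bin/env bash
set -euo pipefail

# ASSUMPTION: a throwaway repo with a master branch and an assignment branch.
WORK="$(mktemp -d)"
git init --quiet "$WORK/demo"
cd "$WORK/demo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Student"       # hypothetical identity
git config user.email "student@example.com"

echo "# template" > README.md
git add README.md
git commit --quiet -m "Initial commit"

git branch assignment
git checkout --quiet assignment
echo "My answers go here." >> README.md
git add README.md
git commit --quiet -m "Add answers for assignment 01"

# Commits on assignment that are not on master.
git log --oneline master..assignment

# Files changed on assignment since it branched off from master.
git diff --stat master...assignment
```

If the log comes back empty, your commits are probably not on the assignment branch, and the pull request would have nothing to compare.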

If you decide to make more changes after you have created a pull request, you can simply close the pull request (without merging!), make more changes, stage, commit, push, and create a final pull request when you are done. Note that the date and time stamp of the last pull request will be considered the submission time for late penalties.

Make sure you receive the emails related to your repository! Your project feedback will be given as comments on the pull request. When you receive the feedback, you can address problems or simply comment that you have read the feedback. AFTER receiving and answering the feedback, merge your PR to master. Your project only counts as complete once this is done.

