👥 The best directory of Yale personnel, with a clean API to match. Used by 70% of undergrads!

Home Page: https://yalies.io


api's Introduction

A website and API for getting information on students at Yale College.

Initial Setup

To develop changes to the application, you'll need to run it locally for testing.

This guide assumes that you have a macOS or Linux-based OS. If you use Windows, you can still follow along, but some commands may be different.

Install Homebrew

  • If you're on a Mac, first install Homebrew.
  • If you're on Linux, follow along with your distro's package manager (pacman, apt-get, yum, etc).

Install Python

This project requires Python version 3.10.7. You may already have Python on your machine—it comes preinstalled with macOS—but it may be the wrong version.

To check your python version, run:

python3 --version

If it says anything other than 3.10.7, keep following along.
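If you'd rather check from inside Python, here is a minimal sketch of the same comparison. The helper name version_matches is made up for this example; it only compares the running interpreter against the version named above.

```python
import sys

REQUIRED = (3, 10, 7)  # the version this guide asks for


def version_matches(version_info=sys.version_info, required=REQUIRED):
    """Return True when major.minor.micro of the interpreter equals `required`."""
    return tuple(version_info[:3]) == required


found = ".".join(str(part) for part in sys.version_info[:3])
status = "OK" if version_matches() else "mismatch, keep following along"
print(f"Python {found}: {status}")
```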

We are going to use pyenv to install multiple versions of Python on our machine, and pyenv-virtualenv to manage dependencies in our package. Run:

brew install pyenv
brew install pyenv-virtualenv

Now, we must add pyenv to our PATH. PATH is an environment variable that tells the shell which directories to search for executables.

If you're using zsh (the default for macOS nowadays), open ~/.zprofile in your favorite text editor. If you're still using bash, edit ~/.bash_profile. Add the following lines at the bottom:

export PYENV_ROOT="$HOME/.pyenv"
export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init --path)"
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"

Note: for fish shell, add this to ~/.config/fish/config.fish instead. If you don't know what fish is, ignore this.

set PATH $PATH "$HOME/.pyenv/bin"
eval "$(pyenv init --path)"
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"

Finally, install the correct Python version and tell pyenv to use it by default:

pyenv install 3.10.7
pyenv global 3.10.7

Install PostgreSQL

Run:

brew install postgresql

Clone the repository

If you need a refresher on Git, Eric made a Git tutorial video. The video describes the workflow the development team will be using.

Clone the repo using

git clone https://github.com/Yalies/api Yalies
cd Yalies

Note: If you are not a member of the Y/CS Yalies Dev Team but want to contribute, make a fork of the repo and set upstream.

Create your venv

Normally, when you install Python packages with pip3, the packages are installed in your user directory. But what if two projects use different versions of the same package? To avoid conflicts, instead of installing packages globally, we'll install them in a virtual environment localized to our project.

Create a virtual environment (venv) inside your project directory with:

python3 -m venv .venv

Optional: Install the VSCode extension to work with venv.

Now, activate the venv:

source .venv/bin/activate

Or, for fish:

source .venv/bin/activate.fish

Or, for powershell:

Set-ExecutionPolicy Unrestricted -Scope Process
.venv\Scripts\activate     

Now that you've activated your venv, the commands python3 and pip3 point to the copies inside your project's .venv directory rather than the system-wide ones.
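To confirm the venv is really active, you can ask the interpreter where it lives. This is a small sketch, not part of the project: inside a venv, sys.prefix points at the venv directory while sys.base_prefix still points at the underlying Python installation.

```python
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


print("venv active:", in_virtualenv())
print("interpreter:", sys.executable)
```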

Install dependencies

pip3 install -r requirements.txt
pip3 install -r requirements-test.txt

Run migrations

Finally, run the database migrations to get the local SQLite database configured:

python3 -m flask db upgrade

Running

To locally launch the application:

FLASK_APP=app.py FLASK_ENV=development flask run

Or, for powershell:

$env:FLASK_APP="app.py"
$env:FLASK_ENV="development"
flask run    

The app will subsequently be available at localhost:5000.

When running locally, the app will use a non-hosted SQLite database, meaning that all database contents will be stored in app.db. If you wish to run SQL queries on this database, simply install sqlite (best obtained through Homebrew or other package manager), and run:

sqlite3 app.db
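If you prefer to poke at the database from Python, the standard library's sqlite3 module can open the same file. The sketch below lists the table names; list_tables is a helper invented for this example, and the actual tables depend on the project's migrations. Note that sqlite3.connect creates an empty file if the path does not exist yet, so run it from the project directory after migrating.

```python
import sqlite3


def list_tables(db_path="app.db"):
    """Return the names of all tables in the SQLite file, sorted alphabetically."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling list_tables() with no arguments reads app.db in the current directory.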

Scraper

Our scraper crawls Yale's websites in order to obtain the data we provide. See documentation here.

Submitting changes

Switch to a new branch to hold your changes:

git checkout -b changes_description

The name of the branch should be short and refer to what you’re planning to change.

Next, make your code changes! Be sure to test them and make sure the app runs as you expect. Then commit your code:

# To tell git to track all the files you changed
git add -A
# To label this set of changes:
git commit -m "Describe your changes here"
# Make sure to title your commit in the imperative mood, for example "Add new features" instead of "Added…", "Adding…", etc.

Next, upload your code to the repo, replacing your_branch_name with the name of the branch you created above. This won't affect master, only your feature branch.

git push -u origin your_branch_name

NOTE: If you make another change on this branch, you can just do git push (without additional flags) and it will automatically push to the last remote/branch you specified.

Before you make a pull request, please make sure you have all the latest changes from master. Resolve all conflicts accordingly.

git merge master

Next, create a pull request (a request to merge your changes into the main repository) by going to the repository page on GitHub and clicking the green "Compare & Pull Request" button that appears.

Title the pull request with a description of all included changes.

In the description, write "Fixes #X" or "Resolve #X", with X being an issue number, for each issue you're fixing in this PR. This will save time by telling GitHub to automatically close those issues once your changes are merged.

On the right side under Reviewers, select one other team member, your team lead (Eric Yoon), and Erik Boesen. You'll need all three people to approve your changes before they can be merged.

Congratulations! Your changes will be up for review. After they are merged, you'll need to switch back to master.

git checkout master

Repeat until all features are implemented and all bugs fixed! 🙂

License

Licensed under the MIT license.

Author

Built by Erik Boesen. Maintained by the Yale Computer Society.

api's People

Contributors

bearsyankees, bgdncz, davidtjeong, ericyoondotcom, erikboesen, evgerritz, goldinguy, helenhall, jeffreyjgong, neilshah12, redorhcs, rencewang, salmogy22, transdoan

api's Issues

Add admin and banned columns to users table

Currently, when checking that a user is permitted to do certain privileged operations (e.g. running the scraper), we just check if the user's CAS NetID is equal to my NetID (ekb33). We should add a boolean admin column to the users table that would allow users to be set as administrators, and then check if the current user is an admin when attempting to perform privileged operations, rather than checking against my hardcoded NetID. If you really want to be fancy, you could try to figure out how to add a decorator for this (like @admin_required, similar to how flask-cas and flask-login implement @login_required).

For banned, it would be good to be able to ban individual users who we don't want using the site. Just in case.
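A rough, framework-free sketch of what such a decorator could look like. Everything here is an assumption for illustration: the User class stands in for the real users model, the NetID is made up, and get_current_user is a hypothetical callable. In the Flask app the check would more likely read the logged-in user from the session and respond with a 403 rather than raise PermissionError.

```python
from dataclasses import dataclass
from functools import wraps


@dataclass
class User:
    """Stand-in for a users-table row with the two proposed boolean columns."""
    netid: str
    admin: bool = False
    banned: bool = False


def admin_required(get_current_user):
    """Build a decorator that only lets non-banned admins call the wrapped function."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            user = get_current_user()
            if user is None or user.banned or not user.admin:
                raise PermissionError("administrator access required")
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Usage with a hypothetical current user.
current = User("abc123", admin=True)


@admin_required(lambda: current)
def run_scraper():
    return "scraper started"
```

With this shape, granting or revoking access is a matter of flipping the admin or banned flag on the user record instead of editing a hardcoded NetID.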

If no people were found on face book page and we abort scraping, delete the saved page file

This failure happens when authentication has failed. Currently, if we change the passed token so that it's valid, and then immediately rerun the scraper, it'll use the existing page.html file from when the request failed, and the problem won't be fixed until we restart the Heroku dyno (which resets the ephemeral filesystem).

Add code to the failing case to delete the page.html file.
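One possible shape for that cleanup, sketched with pathlib. The function name discard_cached_page and its people_found argument are hypothetical; the real scraper would call something like this at the point where it aborts.

```python
from pathlib import Path


def discard_cached_page(people_found, page_path="page.html"):
    """Delete the cached page when a scrape found nobody, so the next run refetches it.

    Returns True when a file was actually removed.
    """
    path = Path(page_path)
    if people_found > 0 or not path.exists():
        return False
    path.unlink()
    return True
```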

Scrape more faculty information from department websites

Lots of academic departments at Yale have People pages that list (in an apparently somewhat consistent format) all the people (grad students, faculty, staff, etc.) in the department.

These websites have lots of extra information, such as:

  • Suffixes (M.D., Ph.D, etc.)
  • Links to personal or lab webpages
  • Full professorship titles (for example "Sterling Prof of Sociology, Director, Urban Ethnography Project; Prof African American Studies")
  • Pictures

Examples:
https://ling.yale.edu/people
https://cpsc.yale.edu/people
https://afamstudies.yale.edu/people
https://math.yale.edu/people
https://mcdb.yale.edu/people
https://medicine.yale.edu/anesthesiology/people/

Many more... full list here: https://www.yale.edu/academics/departments-programs

ElasticSearch can return different results each search, causing different ordering for each page of results

Pretty much the title. If you do a broad search that returns many results ("Hopper", for instance), you may notice that some people are duplicated across pages, or possibly omitted. This is only an issue for very large searches (which few people do, apparently preferring to use filters), but it's a very obvious problem once you notice it. One solution to this might be to use Elasticsearch's scroll API.
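An alternative to scrolling is to make the ordering itself deterministic: sort by score, break ties on a unique field, and page with search_after. The sketch below only builds the request body as a plain dict. The "id" tiebreaker field and the simple_query_string query are assumptions; the app's real query and mapping may differ, and the tiebreaker must be a sortable field in the index.

```python
def build_search_body(query, page_size=20, search_after=None):
    """Build an Elasticsearch request body whose result order is stable across pages."""
    body = {
        "query": {"simple_query_string": {"query": query}},
        "size": page_size,
        # Score first, then a unique field so equal scores always sort the same way.
        "sort": [{"_score": "desc"}, {"id": "asc"}],
    }
    if search_after is not None:
        # Sort values of the last hit on the previous page.
        body["search_after"] = list(search_after)
    return body
```

For the next page, pass the "sort" values from the last hit of the previous response as search_after.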

Re-show residence filters once room numbers return to Face Book

The Face Book has removed all room numbers this semester. This may be because of us. It also may be because of the irregularity of COVID. For this reason, I hid the filters that use room numbers (building code, entryway, floor, etc.) in app/templates/index.html. If/when room numbers are put back, we should show these filters again.

Use more secure filenames for S3 images

Right now, someone could theoretically iterate through every number from 1-100,000 and get all the user images. Rather than using Yale's naming scheme for the files, we should generate a securely random name for the image file based on some properties of the user that aren't likely to change. For example, we can append UPI, image ID, netid, etc. together and then hash that somehow and name the file thus.
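A minimal sketch of that idea. It uses a keyed hash (HMAC-SHA256) rather than a plain hash, because a plain hash of values that can themselves be enumerated could be recomputed by anyone who guesses the scheme. The function name, the choice of fields, and the secret key are all assumptions for illustration.

```python
import hashlib
import hmac


def image_filename(secret_key, upi, netid, extension="jpg"):
    """Derive a stable, hard-to-guess file name from user properties and a server-side secret."""
    message = f"{upi}:{netid}".encode("utf-8")
    digest = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
    return f"{digest}.{extension}"
```

The same inputs always give the same name, so re-running the scraper would not orphan existing images as long as the secret stays the same.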

Refactor scraper into multiple files

It's huge, and once we implement #44, it's only gonna get huger. We should split different components into multiple files. Some ideas for divisions:

  • face_book
  • directory
  • department_websites
  • util (for example clean_* functions)

Write more complete API documentation

I think it would be really cool to have a Swagger docs system like this, where you can test the API in-browser and see what the responses are like. At minimum we should document the filters endpoint and add a list of fields Person has.

Come up with a new way to tell if people are on leave

Currently we just check if the graduation year of each student has increased since the saved copy we have from last year. Once this semester ends, we'll no longer have a reliable way to tell whether people are still on leave or if they only took one semester off. We'll either need to find another way to get leave data, or change the labeling to signify that this student HAS taken a leave but may not necessarily still be on it.
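For reference, the current comparison and the proposed relabeling could be sketched like this. The function name and the label strings are placeholders, not the project's actual values.

```python
def leave_status(previous_year, current_year):
    """Compare last year's saved graduation year with the current one.

    A later year suggests the student has taken a leave at some point,
    not necessarily that they are still on one.
    """
    if previous_year is None or current_year is None:
        return "unknown"
    if current_year > previous_year:
        return "has taken leave"
    return "no leave detected"
```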

Persist query in URL parameters

One kind of nice (although very bugged) thing that the Yale Face Book does is that when you run a search, the search information is stored in the URL. That way, if you want to send someone the results of your search, you can just copy and paste the URL, which would be something like:

yalies.io/?query=Some+name&filters=...
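The round trip between a search and its URL can be sketched with urllib.parse. The real parameter layout is still undecided, so this example makes its own assumption: each filter becomes its own parameter instead of being packed into a single filters value, and the filter name "college" is just an example.

```python
from urllib.parse import parse_qs, urlencode


def search_to_query_string(query, filters):
    """Encode a query plus a dict of filter name -> list of values."""
    params = {"query": query}
    params.update(filters)
    return urlencode(params, doseq=True)


def query_string_to_search(query_string):
    """Decode a query string back into (query, filters)."""
    parsed = parse_qs(query_string)
    query = parsed.pop("query", [""])[0]
    return query, parsed


encoded = search_to_query_string("Some name", {"college": ["Hopper"]})
print(encoded)  # query=Some+name&college=Hopper
```

A filter literally named "query" would collide with the search text in this layout, which is one reason a nested or prefixed scheme might be preferable.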
