This project forked from harvard-lil/perma-capture

License: GNU Affero General Public License v3.0

Perma-Capture

Development

Spin up some containers

Start up the Docker containers in the background:

$ docker-compose up -d

The first time this runs it will build the Docker images, which may take several minutes. (After the first time, it should only take 1-3 seconds.)

Then log into the main Docker container:

$ docker-compose exec web bash

(Commands from here on out that start with # are being run in Docker.)

Run Django

You should now have a working installation!

Migrate the database:

# ./manage.py migrate

Spin up the development server:

# fab run

Create a test admin user (follow the prompts, then log in using those credentials):

# ./manage.py createsuperuser

Optional: wire up a capture service

To capture websites and serve archives, this application must be configured to communicate with a running capture service. You can run one locally at the same time as this application using Minikube (Docker driver).

Follow the directions for the KubeCaptures "Sample Development Flow." The capture service will be exposed to your localhost on the port printed by minikube service --url browserkube; make note of it. The minio storage service should also be available to your localhost on port 9000.

In this application's settings.py, add the following stanzas:

# Tell the capture service to direct its webhook callbacks to this Django app,
# using Docker's special hostname for internal routing.
CALLBACK_PREFIX = "http://host.docker.internal:8000"

# Tell this application where the capture service's WACZ files are hosted:
# if accessed by a user, via curl or their browser, minio will be at localhost;
# if accessed by the Django application, inside its container, minio will be at
# Docker's special hostname for internal routing.
OVERRIDE_ACCESS_URL_NETLOC = {'internal': 'host.docker.internal:9000', 'external': 'localhost:9000'}
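
As an illustration of what a netloc override like this implies, here is a minimal sketch of swapping the host and port of an access URL depending on who will fetch it. The helper name override_netloc and the example URL are hypothetical and are not taken from this codebase; only the setting's two keys come from the stanza above.

```python
from urllib.parse import urlsplit, urlunsplit

# Mirrors the setting shown above.
OVERRIDE_ACCESS_URL_NETLOC = {
    "internal": "host.docker.internal:9000",
    "external": "localhost:9000",
}


def override_netloc(url: str, audience: str) -> str:
    """Return `url` with its host:port replaced by the configured netloc.

    `audience` is "internal" (the Django app, inside its container) or
    "external" (a user's browser or curl). Hypothetical helper.
    """
    parts = urlsplit(url)
    return urlunsplit(parts._replace(netloc=OVERRIDE_ACCESS_URL_NETLOC[audience]))


# Example with a made-up access URL:
print(override_netloc("http://minio:9000/bucket/archive.wacz?sig=abc", "external"))
```

Only the netloc changes; the scheme, path, and query string pass through untouched.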

Then, add the following, swapping in the port your capture service is exposed on ("49445" in this example):

BACKEND_API = "http://host.docker.internal:49445"

Remember, every time you start the capture service, it will be exposed on a different port, so you will have to update this setting regularly. The Django application will restart automatically when you make changes to settings.py.
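
If editing settings.py on every restart becomes tedious, one possible alternative is to read the port from an environment variable. This is only a sketch: the variable name CAPTURE_SERVICE_PORT and the helper backend_api_from_env are assumptions, not part of this project, and the fallback port is the example value used above.

```python
import os


def backend_api_from_env(environ=os.environ, default_port="49445"):
    """Build the capture service URL from a (hypothetical) environment variable."""
    port = environ.get("CAPTURE_SERVICE_PORT", default_port)
    if not port.isdigit():
        raise ValueError(f"CAPTURE_SERVICE_PORT must be numeric, got {port!r}")
    return f"http://host.docker.internal:{port}"


# In settings.py, this would replace the hard-coded value:
BACKEND_API = backend_api_from_env()
```

Note that the variable would have to be set in the container's environment before the Django process starts; changing it afterwards does not trigger the automatic restart that editing settings.py does.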

If everything is working, you should now be able to (while logged in as your test superuser):

  • request multiple captures of http://example.com
  • watch the capture jobs complete in your kubernetes dashboard
  • watch the capture service POST to our webhook callback, see the newly created 'Archive' objects at http://localhost:8000/admin/main/archive/, and observe that the hash of each file is different
  • watch the Django application poll continuously and eventually report the capture jobs are complete
  • download the WACZ files using the UI button or the API
  • view playbacks of the WACZ files using the UI button
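
The polling step in the list above follows a common pattern: ask for a job's status repeatedly until it reaches a final state or a deadline passes. The sketch below is generic and is not this application's implementation; the function name, the status strings "completed" and "failed", and the timing defaults are all assumptions.

```python
import time


def poll_until_complete(fetch_status, interval=1.0, timeout=60.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Call `fetch_status()` until it returns a final status or time runs out.

    `fetch_status` is any zero-argument callable returning a status string.
    The status names checked here are hypothetical.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in ("completed", "failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError("capture job did not finish in time")
        sleep(interval)


# Example with a fake status source standing in for a real API call:
statuses = iter(["pending", "in_progress", "completed"])
print(poll_until_complete(lambda: next(statuses), interval=0))
```

Passing sleep and clock in as parameters keeps the loop easy to test without real waiting.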

Stop

When you are finished, spin down Docker containers by running:

$ docker-compose down

Your database will persist and will load automatically the next time you run docker-compose up -d.

Or, you can clean up everything Docker-related, so you can start fresh, as with a new installation:

$ bash docker/clean.sh

Testing

Test Commands

  1. pytest runs the Python tests
  2. flake8 lints the Python code

Coverage

Coverage will be generated automatically for all manually-run tests.

Migrations

We use standard Django migrations.

Contributions

Contributions to this project should be made in individual forks and then merged by pull request. Here's an outline:

  1. Fork and clone the project.
  2. Make a branch for your feature and switch to it: git checkout -b feature-1
  3. Commit your changes with git add and git commit. (git diff --staged is handy here!)
  4. Push your branch to your fork: git push origin feature-1
  5. Submit a pull request to the upstream develop branch through GitHub.

Design Notes

JSON Keys

The KubeCaptures capture service returns JSON data with keys formatted in camel case. Since this application is primarily a wrapper for the capture service, we too return camel case keys, even though the Python convention is to use snake case.

That requires a bit of acrobatics.

We treat the conversion like that between strings and bytes: we work exclusively with snake case keys within the application and convert to and from camel case at the application boundaries, only when communicating with the capture service API or when serializing our own API responses.
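
To make the boundary conversion concrete, here is a minimal sketch of recursive key conversion in both directions. It is not the project's own utility function: the names camel_to_snake, snake_to_camel, and convert_keys, and the example keys, are hypothetical.

```python
import re


def camel_to_snake(name):
    """'accessUrl' -> 'access_url'"""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def snake_to_camel(name):
    """'access_url' -> 'accessUrl'"""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def convert_keys(data, convert):
    """Apply `convert` to every dict key in nested dicts and lists."""
    if isinstance(data, dict):
        return {convert(key): convert_keys(value, convert) for key, value in data.items()}
    if isinstance(data, list):
        return [convert_keys(item, convert) for item in data]
    return data


# Incoming JSON (camel case) becomes snake case inside the application...
incoming = {"accessUrl": "http://localhost:9000/a.wacz", "captureJobs": [{"jobId": 1}]}
internal = convert_keys(incoming, camel_to_snake)
# ...and is converted back to camel case on the way out.
outgoing = convert_keys(internal, snake_to_camel)
```

This simple regex treats every capital letter as a word boundary, so keys containing acronyms (for example "accessURL") would not round-trip cleanly; a real utility would need a rule for that case.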

Utilities should, for the most part, keep developers from having to think about this much: Django REST Framework will handle the conversion during serialization/deserialization of standard Django requests/responses, and our utility function will handle the conversion when we interact with the capture API inside a view function. However, developers should be aware that the conversion is occurring, because it may occasionally become relevant (for instance, during signature validation).
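
The sketch below illustrates why key case can matter for signature validation. This document does not describe the capture service's signing scheme, so the example assumes a common webhook design: an HMAC-SHA256 hex digest computed over the raw request body. The function name, secret, and payload are all hypothetical.

```python
import hashlib
import hmac
import json


def verify_signature(raw_body: bytes, signature_hex: str, secret: bytes) -> bool:
    """Check an HMAC-SHA256 hex digest against the exact bytes received."""
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)


secret = b"example-secret"        # hypothetical shared secret
raw_body = b'{"jobId": 1}'        # bytes as sent, with camel case keys
signature = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()

# Validating the original bytes succeeds.
ok_raw = verify_signature(raw_body, signature, secret)

# Re-serializing the snake case dictionary produces different bytes, so it fails.
converted_body = json.dumps({"job_id": 1}).encode()
ok_converted = verify_signature(converted_body, signature, secret)
```

Under this assumption, validation has to run against the bytes as received, before any key conversion takes place.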

Please keep this in mind when updating docs: JSON should be in camel case, even though the corresponding Python dictionary of data will be in snake case.

License

This codebase is Copyright 2020 The President and Fellows of Harvard College and is licensed under the open-source AGPLv3 for public use and modification. See LICENSE for details.

Contributors

bensteinberg, ikreymer, rebeccacremona
