Comments (7)

april commented on August 23, 2024

I think I can make this work! This sort of thing is on my list of goals to accomplish over the next quarter or two. :)

from http-observatory.

jarondl commented on August 23, 2024

+1 from me.

BTW, my current solution (once #156 and related issues are resolved) is something along these lines:

>>> from httpobs.scanner import analyzer, retriever
Retrieving the Chromium HSTS preload list
Successfully downloaded and parsed the Chromium HSTS preload list
WARNING: Disconnected from PostgreSQL
WARNING: Unable to connect to PostgreSQL.
>>> resp = retriever.retrieve_all('mozilla.org')
>>> grade = 100 + sum(test(resp)['score_modifier'] for test in analyzer.tests)
>>> print(grade)
75

To clarify: this does not currently work because of the database calls in #156.
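
For anyone who wants to see the scoring arithmetic without a working install, here is a minimal self-contained sketch. It assumes only what the snippet above shows: each test takes the retrieved response and returns a dict with a 'score_modifier'. The check functions, header names, and numbers below are made up for illustration and are not real httpobs tests.

```python
# Hypothetical stand-ins for analyzer tests: each takes the response dict
# and returns a result dict carrying a 'score_modifier'.
def check_hsts(resp):
    present = "strict-transport-security" in resp["headers"]
    return {"name": "hsts", "score_modifier": 0 if present else -25}


def check_referrer_policy(resp):
    present = "referrer-policy" in resp["headers"]
    return {"name": "referrer-policy", "score_modifier": 5 if present else 0}


def grade(resp, tests, baseline=100):
    """Start from the baseline and add every test's score modifier."""
    return baseline + sum(test(resp)["score_modifier"] for test in tests)


# A fake response with one of the two headers set.
resp = {"headers": {"referrer-policy": "no-referrer"}}
print(grade(resp, [check_hsts, check_referrer_policy]))  # 100 - 25 + 5 = 80
```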

april commented on August 23, 2024

Let me plug away at this for the next day or two! I'm thinking of an environment variable named HTTPOBS_LOCAL_MODE. If it's set, it does everything:

a) without celery/redis
b) without the database

How does that sound?
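
A rough sketch of how such a toggle could be wired, purely as an assumption about the shape of the idea: only the variable name HTTPOBS_LOCAL_MODE comes from this comment, while `local_mode_enabled`, `run_scan`, and the `save` callback are hypothetical names used for illustration.

```python
import os


def local_mode_enabled(environ=None):
    """Return True when HTTPOBS_LOCAL_MODE is set, whatever its value."""
    environ = os.environ if environ is None else environ
    return "HTTPOBS_LOCAL_MODE" in environ


def run_scan(hostname, save=None, environ=None):
    """Produce a result and persist it only when not in local mode.

    'save' stands in for the database/queue side; in local mode it is
    never called, so no external service is needed.
    """
    result = {"hostname": hostname, "local": local_mode_enabled(environ)}
    if not result["local"] and save is not None:
        save(result)
    return result


saved = []
run_scan("example.com", save=saved.append, environ={"HTTPOBS_LOCAL_MODE": "1"})
run_scan("example.org", save=saved.append, environ={})
print([r["hostname"] for r in saved])  # only example.org was persisted
```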

aneeshusa commented on August 23, 2024

An environment variable sounds workable on our end, but I'd propose another solution. My ideal use case would be being able to import some functions from httpobs and call them directly, instead of requiring running a separate process/service with a special environment. I haven't looked at any of the implementation yet, but this would involve separating the functions that do processing (analyzing, testing, etc.) from those doing database/redis/celery I/O, so we can just call the functions we need and pass around Python data structures, skipping serialization/deserialization.

An example: The retriever.retrieve_all function already returns a dict which we can pass directly to the various testing functions. The scan task wraps that to do DB I/O: picks up an item off the celery queue and inserts results into the database. To get around the select_site_headers issue (#156), have retrieve_all take the headers as another argument, and make scan responsible for fetching that data from the DB and passing it to retrieve_all.

This avoids needing yet another config option that contributes to an exponential explosion of the testing space, and it reduces your maintenance load since everything uses the same code path.
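
To make the proposed split concrete, here is a minimal sketch under stated assumptions. The names retrieve_all, scan, and select_site_headers echo the discussion, but every body below is a stand-in: nothing here touches the network, celery, or a real database, and `InMemoryStore`, `analyze`, and `check_custom_headers` are hypothetical.

```python
def retrieve_all(hostname, headers=None):
    # Stand-in retriever: takes the headers as an argument instead of
    # looking them up itself, and returns a plain dict.
    return {"hostname": hostname, "request_headers": dict(headers or {})}


def check_custom_headers(resp):
    # Hypothetical check operating only on the passed-in dict.
    modifier = 0 if resp["request_headers"] else -5
    return {"name": "custom-headers", "score_modifier": modifier}


def analyze(resp, tests):
    # Pure processing: no I/O, just Python data structures in and out.
    return [test(resp) for test in tests]


class InMemoryStore:
    """Stand-in for the database side of the scanner."""

    def __init__(self, site_headers):
        self.site_headers = site_headers
        self.results = {}

    def select_site_headers(self, hostname):
        return self.site_headers.get(hostname, {})

    def insert_results(self, hostname, results):
        self.results[hostname] = results


def scan(hostname, store, tests):
    # The I/O wrapper: fetch headers from the store, call the pure
    # functions, then write the results back.
    headers = store.select_site_headers(hostname)
    results = analyze(retrieve_all(hostname, headers=headers), tests)
    store.insert_results(hostname, results)
    return results
```

A library user would call `analyze(retrieve_all(...), tests)` directly, while the service keeps using `scan` with a real store, so both go through the same processing code.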

april commented on August 23, 2024

Sounds good to me. That should be workable. :)

april commented on August 23, 2024

Okay! Could y'all check out this branch:

https://github.com/marumari/http-observatory/tree/local-mode

And test things out for me? It talks about it in the readme, but it's basically:

>>> from httpobs.scanner.local import scan
>>> scan('foo.com')

This local scan also lets you do a bunch of other things, since it's not a public API. You can specify cookies and headers to send, choose non-default ports for both http and https, and scan an arbitrary path that isn't /. Please let me know how it goes!

Thanks!

(cc: @floatingatoll)

floatingatoll commented on August 23, 2024

@aneeshusa Local scan is working out well for several people now. If you're willing to install the PostgreSQL client libraries for now, you can simply use the local scanner pip3 instructions in README.md and use the .scan() function documented therein to scan your sites (with optional paths and custom ports). You won't need any running servers, just that one client library, and I'm poking at #185 to try to make it possible to install the local scanner requirements without the database client library someday. Please do report back when you get a chance if this works out for you, or if there are any improvements you need!
