city-bureau / city-scrapers

Scrape, standardize and share public meetings from local government websites

Home Page: https://cityscrapers.org

License: MIT License

Languages: Python 41.11%, Shell 0.03%, HTML 58.86%
Topics: web-scraping, python, open-data, scrapy, city-scrapers

city-scrapers's People

Contributors

ab1470, bergren2, bonfirefan, brettvanderwerff, cherdeman, ckwms63, csethna, dcldmartin, diaholliday, eads, easherma, haileyhoyat, hancush, jim, joshuarrrr, kevivjaknap, maxine, mkrump, mwgalloway, myersjustinc, novellac, o-stovicek, palewire, pjsier, rhetr, simmonsritchie, the-furies, thenoelman, vincecima, wildisle

city-scrapers's Issues

clean up installation docs

Since I just copied the installation docs over from the README, they include material that is now covered in the "contributing a spider" guide and doesn't need to be repeated there.

Context for meetings

Hi all,

We've built out a few sites that seem somewhat similar to the aims of this project.

We have found that these have not been widely used.

We suspect that the main barrier to participation is not a lack of knowledge about when and where meetings occur, but a lack of understanding of:

  • what takes place in the meeting
  • why a person might go to the meeting
  • how to engage in the meeting (either as a speaker or just as an observer)
  • whether going to the meeting feels safe

A comprehensive source for the time and place of public meetings can be useful, but I'm wondering if you all are thinking of addressing any of the other barriers?

Schedules in PDF format

Based on a discussion, we need to document the following process for developers:

  • If a data source is difficult to scrape (data is in a PDF, image, etc.), note that on the sources spreadsheet and don't attempt to write a scraper for it.

Tasks to close this issue:

  • Add documentation on how to handle difficult-to-scrape sources.

Original issue content

Not sure if we want to get into trying to parse PDFs, but the Chicago Housing Authority's meetings are posted as an image and PDF.

Is it worth trying to automate the import of this information? Or is it better to just have a list of sources that a human needs to enter into the system once per year?

what to do about files the genspider tests download

@r-wei

The genspider tests leave new files in the tests directory, somewhat out of necessity. Not a big deal but worth noting.

we could:

  • ignore completely
  • remove when finished
  • do some mock stuff so they never get written
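
For the "remove when finished" option, one possibility is to write into a temporary directory that is deleted when the test ends. This is only a rough sketch using the standard library: `write_spider_files` is a hypothetical stand-in for whatever the genspider task actually writes, not the project's real function.

```python
import tempfile
from pathlib import Path


def write_spider_files(directory, spider_name):
    """Hypothetical stand-in for the file-writing step of genspider."""
    path = Path(directory) / f"{spider_name}.py"
    path.write_text(f"# spider: {spider_name}\n")
    return path


def test_files_are_cleaned_up():
    # Everything written inside the context manager is removed on exit,
    # so nothing is left behind in the tests directory.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_spider_files(tmp, "testspider")
        assert path.exists()
    assert not path.exists()
```

This assumes the task can be told which directory to write into. If it can't, pytest's built-in `tmpdir` fixture or an explicit teardown step would be other ways to get the same effect.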

How to add public meetings for other municipalities?

Hey @diaholliday, @eads said you would be the one to ask about this. I'd like to contribute events to your aggregator from the Rockford area. Since Rockford is one of the more populous municipal areas in Illinois, I'd like to get some action going around documenting public meetings here, as nothing like this currently exists. Ideally, I will be able to add agencies/orgs not just from Rockford proper but also from the surrounding municipalities within Winnebago County.

What is the appropriate way to go about doing this? I don't want to step on toes and it looks as if your current spreadsheet has things almost exclusively limited to Chicago and Cook County. Should I add a new tab for Rockford Area, or add Rockford and related items to their own bordered section on the appropriate tabs that already exist?

I'm looking forward to contributing!

Event format validator

I think it would be nice to have an event validator as a part of the pipeline in order to ensure that any data that gets downstream has all of the required fields in the correct formats.
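
To make the idea concrete, here is a minimal sketch of what such a validator might look like. The field names and types in `REQUIRED_FIELDS` are assumptions for illustration only, not the project's actual event schema.

```python
from datetime import datetime

# Hypothetical schema: field name -> expected type.
REQUIRED_FIELDS = {
    "name": str,
    "start_time": datetime,
    "location": dict,
}


def validate_event(event):
    """Return a list of problems found in `event`; an empty list means valid."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in event:
            errors.append(f"missing field: {field}")
        elif not isinstance(event[field], expected):
            actual = type(event[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    return errors


good = {"name": "Board meeting", "start_time": datetime(2017, 9, 5, 18, 30), "location": {}}
bad = {"name": "Board meeting", "start_time": "tomorrow"}

print(validate_event(good))  # []
print(validate_event(bad))
# ['start_time: expected datetime, got str', 'missing field: location']
```

In a Scrapy project this check could plausibly live in an item pipeline's `process_item` method, dropping or logging any item for which the list is non-empty, but that wiring is left out here.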

failing gen_spider test

(documenters-aggregator) ➜  documenters-aggregator git:(master) ✗ pytest
======================================================================== test session starts =========================================================================
platform darwin -- Python 3.6.2, pytest-3.1.3, py-1.4.34, pluggy-0.4.0
rootdir: /Users/DE-Admin/Code/documenters-aggregator, inifile:
collected 310 items

tests/test_cchhs.py ..........................................................................................................................................................................................................................................................
tests/test_idph.py .......................................................
tests/test_tasks.py ....F

============================================================================== FAILURES ==============================================================================
_______________________________________________________________________ test_gen_html_content ________________________________________________________________________

    def test_gen_html_content():
        tasks._gen_html(SPIDER_NAME, SPIDER_START_URLS)
        test_file_content = read_test_file_content('files/testspider_articles.html.example')
        rendered_content = read_test_file_content('files/testspider_articles.html')
>       assert test_file_content == rendered_content
E       assert '<!doctype ht...ody>\n</html>' == '<!doctype htm...ody>\n</html>'
E         Skipping 3940 identical leading characters in diff, use -v to show
E         Skipping 210551 identical trailing characters in diff, use -v to show
E         - ed/common-5087336d1f748f6e2186-min.js"]; })(SQUARESPACE_ROLLUPS, 'squarespace-common');</script>
E         ?           ^ -----  ^^ ^^^  ^ ^
E         + ed/common-d8a982e40d16144e2580-min.js"]; })(SQUARESPACE_ROLLUPS, 'squarespace-common');</script>
E         ?           ^^^^^^^^   ^^ ^  ^ ^
E         - <script crossorigin="anonymous" src="//static.squarespace.com/universal/scripts-compressed/common-5087336d1f748f6e2186-min.js"></script><script>(function(rollups, name) { if (!ro...
E
E         ...Full output truncated (9 lines hidden), use '-vv' to show

tests/test_tasks.py:40: AssertionError
================================================================ 1 failed, 309 passed in 2.85 seconds ================================================================

invoke fails on windows

invoke runtests
You indicated pty=True, but your platform doesn't support the 'pty' module!

Unfortunately, pseudo-terminal handling is not available on Windows, and the pty=True flag prevents you from invoking runtests. Most people are probably not doing the frustrating thing of trying to run things on Windows like I am, but just in case: one solution that was mentioned was to put in an OS check.
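
The OS check could look roughly like the sketch below. `pty_supported` is a hypothetical helper, and `runtests` is written as a plain function taking an Invoke-style context (anything with a `run(command, pty=...)` method); in the real tasks file it would presumably carry Invoke's `@task` decorator and may run a different command.

```python
import sys


def pty_supported(platform=sys.platform):
    """Return False on Windows, where the `pty` module is unavailable."""
    return not platform.startswith("win")


def runtests(ctx):
    # Only request a pseudo-terminal where the platform can provide one.
    ctx.run("pytest", pty=pty_supported())


print(pty_supported("win32"))   # False
print(pty_supported("darwin"))  # True
```

Passing the platform as a parameter is only there to make the check easy to test; calling `pty_supported()` with no argument uses the current platform.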

what should the date format be?

The OpenCivicData spec is unclear on this point. To wit, "Starting date / time of the event. This should be fully timezone qualified."

What does that even mean?
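
One plausible reading (an assumption, since the spec does not spell it out) is that the value should be an ISO 8601 timestamp carrying an explicit UTC offset, rather than a "naive" local time. A small sketch of the difference, using a fixed UTC-05:00 offset for Chicago daylight time as an example:

```python
from datetime import datetime, timedelta, timezone

# Assumption: Chicago local time during daylight saving, i.e. UTC-05:00.
CHICAGO_CDT = timezone(timedelta(hours=-5))

naive = datetime(2017, 9, 5, 18, 30)                      # no timezone: ambiguous
aware = datetime(2017, 9, 5, 18, 30, tzinfo=CHICAGO_CDT)  # timezone-aware

print(naive.isoformat())  # 2017-09-05T18:30:00
print(aware.isoformat())  # 2017-09-05T18:30:00-05:00
print(aware.astimezone(timezone.utc).isoformat())  # 2017-09-05T23:30:00+00:00
```

A fixed offset keeps the example self-contained; a real scraper would more likely attach a named zone such as America/Chicago so that daylight saving changes are handled automatically.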
