michalczaplinski / pitchfork

Unofficial Python API for pitchfork.com reviews.

License: MIT License

Python 100.00%
pitchfork python scraper requests beautifulsoup

pitchfork's Introduction

pitchfork

An unofficial Python API for pitchfork.com reviews.

Installation

You can install it from the Python Package Index (PyPI):

pip install pitchfork

You can also clone the repository. Note that pitchfork depends on beautifulsoup4 for HTML parsing, so you have to install the dependencies yourself before using it:

git clone https://github.com/michalczaplinski/pitchfork.git
cd pitchfork
pip install -r requirements.txt

Usage

>>> import pitchfork

>>> p = pitchfork.search('kanye west', 'my beautiful') # the title is autocompleted
>>> p.album() # the full album title
u'My Beautiful Dark Twisted Fantasy'

>>> p.label()
u'Def Jam / Roc-A-Fella'

>>> p.editorial()[:100] # get the first 100 characters of the review.
u"Kanye West's 35-minute super-video,\xa0Runaway, peaks with a parade. Fireworks flash while red hoods ma"

# the link to the album cover image
>>> p.cover()
'http://cdn4.pitchfork.com/albums/15935/homepage_large.831179e9.jpg'

>>> p.score()
10.0
# pretty overrated IMHO!

Tests

You can run the basic tests located in the tests directory with:

$ cd path/to/tests
$ python -m unittest discover

License

MIT

Contributions

If you want to add a new feature, suggest an improvement, or anything else, you're welcome to message me or send a pull request!

pitchfork's People

Contributors

aristeia, diazcastaneda, fortes, johnwmillr, michalczaplinski, omardelarosa

pitchfork's Issues

Switch to using a Session object for HTTP requests

Copied from an issue on tejassharma96's repository.

I propose switching to a Session object from the requests library for the package's HTTP requests. We currently open a new connection to the same host for each request. A Session would reuse the connection and let some parameters persist across a user's requests, which may be friendlier to the Pitchfork server.

I can handle the switch to the requests library, but first I'll think some more about whether the change really makes sense.
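
As a rough illustration of the proposal, here is a minimal sketch of what a shared Session could look like. The names make_session and fetch, the User-Agent string, and the timeout value are assumptions made for this example and are not part of the package.

```python
import requests

# Host as it appears in the issues on this page.
BASE_URL = "https://pitchfork.com"


def make_session(user_agent="pitchfork-client-sketch"):
    """Create one Session to be reused for every request to the same host.

    The Session pools connections and keeps headers between calls,
    so each request does not have to open a fresh connection.
    """
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch(session, path, timeout=10):
    """Fetch a path relative to BASE_URL through the shared session."""
    response = session.get(BASE_URL + path, timeout=timeout)
    response.raise_for_status()  # surface 404s and similar as exceptions
    return response.text
```

A caller would create the session once (for example at import time or inside search()) and pass it to every fetch call, rather than building a new request object each time.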

.year() function does not work as intended

Caught in unit testing:

https://travis-ci.org/bruno207/pitchfork/jobs/187940542#L252

Sample run:

> python3
Python 3.5.2 (default, Nov 17 2016, 17:05:23) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pitchfork
>>> p = pitchfork.search('mogwai', 'come on')
>>> p.year()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bruno207/.local/lib/python3.5/site-packages/pitchfork/pitchfork.py", line 94, in year
    year = self.soup.find(class_='year').contents[1].get_text()
  File "/home/bruno207/.local/lib/python3.5/site-packages/bs4/element.py", line 730, in __getattr__
    self.__class__.__name__, attr))
AttributeError: 'NavigableString' object has no attribute 'get_text'
>>> 
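
The traceback shows that .contents[1] is a NavigableString, which has no get_text() method. One possible workaround, sketched below, is to call get_text() on the tag itself and pull the four-digit year out with a regular expression. The function name extract_year and the sample markup are assumptions for illustration; the real page structure may differ.

```python
import re

from bs4 import BeautifulSoup


def extract_year(soup):
    """Return the first four-digit year inside the element with class 'year'.

    Calling get_text() on the tag avoids indexing into .contents, whose
    items may be plain strings or comments rather than tags.
    """
    tag = soup.find(class_="year")
    if tag is None:
        return None
    match = re.search(r"\b(?:19|20)\d{2}\b", tag.get_text())
    return match.group(0) if match else None


# Hypothetical markup: a comment and a separator precede the year.
sample = '<span class="year"><!-- marker --> &bull; 2014</span>'
year = extract_year(BeautifulSoup(sample, "html.parser"))
```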

Pitchfork response format has changed, breaks a number of features

The response format from Pitchfork has changed, which breaks a number of this package's features. I've tested the code on a Mac using the Anaconda distribution of Python 2.7.13 and Python 3.5.2 from python.org.

Most of these issues can be fixed by changing the JSON keys the search() function expects and the HTML classes used when parsing with BeautifulSoup.

I'm working on a fix for the bugs that I've found and will submit a pull request soon.

John

Bugs

Within the search function

URL to the review
  File "pitchfork/pitchfork.py", line 203, in search
    url = review_dict['site_url']
KeyError: 'site_url'
Key for artist name
  File "pitchfork/pitchfork.py", line 204, in search
    matched_artist = review_dict['content'].strip().split('\n\n\n')[0]
KeyError: 'content'
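
Both failures above are plain dictionary lookups on keys that the response no longer contains. A tolerant lookup, sketched below, tries several candidate keys and reports the available ones when none match. The helper name first_present and the fallback key 'url' are hypothetical; the actual replacement keys are not stated in this report.

```python
def first_present(mapping, candidates):
    """Return the value of the first candidate key found in mapping.

    Raises a KeyError that lists the available keys, which makes it
    easier to see how the response format has changed.
    """
    for key in candidates:
        if key in mapping:
            return mapping[key]
    raise KeyError(
        "none of {!r} found; available keys: {!r}".format(
            list(candidates), sorted(mapping)
        )
    )


# 'site_url' is the old key from the traceback; 'url' is a made-up fallback.
review_dict = {"url": "<review-url>"}
url = first_present(review_dict, ["site_url", "url"])
```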

Within the Review class

URL to the album cover
  File "pitchfork/pitchfork.py", line 70, in cover
    image_link = artwork.img['src'].strip()
AttributeError: 'NoneType' object has no attribute 'img'
Label
  File "pitchfork/pitchfork.py", line 85, in label
    label = self.soup.find(class_='label-list').get_text()
AttributeError: 'NoneType' object has no attribute 'get_text'
Year
  File "pitchfork/pitchfork.py", line 95, in year
    year = self.soup.find(class_='year').contents[1].get_text()
  File "/usr/local/lib/python3.5/site-packages/bs4/element.py", line 737, in __getattr__
    self.__class__.__name__, attr))
AttributeError: 'Comment' object has no attribute 'get_text'
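
The cover and label failures share one cause: soup.find() returns None when the class is missing, and the code then calls a method on None. A small guard, sketched below, returns None instead of raising. The helper name text_of and the sample markup are assumptions for illustration only.

```python
from bs4 import BeautifulSoup


def text_of(soup, class_name):
    """Return the stripped text of the first element with the given class,
    or None when no such element exists."""
    tag = soup.find(class_=class_name)
    return tag.get_text().strip() if tag is not None else None


# Hypothetical markup using the 'label-list' class named in the traceback.
sample = BeautifulSoup('<ul class="label-list"><li>Def Jam</li></ul>', "html.parser")
label = text_of(sample, "label-list")
missing = text_of(sample, "no-such-class")
```

Returning None lets the caller decide whether a missing field is an error, instead of failing with an AttributeError deep inside the parser code.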

Search no longer working

Pitchfork's site redesign has broken search().

Searching now raises HTTP Error 404: Not Found on line 197 of pitchfork.py:

response = urlopen(request)

From what I can tell, the search URL has changed and no longer includes 'ac/' in the static URL that queries are appended to ('http://pitchfork.com/search/ac/?query=').
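
To make the reported change concrete, the sketch below builds the query URL with 'ac/' dropped from the base. The resulting base URL is inferred from the description above and has not been verified against the live site; the function name build_search_url is hypothetical.

```python
from urllib.parse import urlencode

# Old base was 'http://pitchfork.com/search/ac/'; this drops 'ac/' as the
# report suggests. Treat it as an unverified assumption.
SEARCH_URL = "http://pitchfork.com/search/"


def build_search_url(artist, album):
    """Build a search URL with the artist and album safely URL-encoded."""
    query = urlencode({"query": "{} {}".format(artist, album)})
    return "{}?{}".format(SEARCH_URL, query)


search_url = build_search_url("mogwai", "come on")
```

Using urlencode rather than string concatenation keeps spaces and special characters in artist or album names from producing a malformed URL.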

Search no longer works after site redesign

It looks like Pitchfork's redesign breaks how search worked here (by reading a JSON object that no longer exists). After poking around in the page source, I found no straightforward replacement.

The https://pitchfork.com/api/v2/search/_ac/?query= endpoint seems promising, but it surprisingly fails for many queries, such as mogwai atomic, even though that review exists and is returned when searching for just mogwai.
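
Since the artist-only query reportedly succeeds, one possible workaround is to query by artist alone and match the album on the client side. The sketch below shows only the matching step. The result shape (a list of dicts with 'album' and 'url' keys) is invented for this example, because the actual response format of the endpoint is not documented here.

```python
def find_review(results, album):
    """Return the first result whose album title starts with the given
    text (case-insensitive), or None if nothing matches."""
    wanted = album.strip().lower()
    for item in results:
        if item.get("album", "").lower().startswith(wanted):
            return item
    return None


# Made-up results standing in for an artist-only search response.
results = [
    {"album": "Come On Die Young", "url": "<review-url-1>"},
    {"album": "Atomic", "url": "<review-url-2>"},
]
match = find_review(results, "atomic")
```

Prefix matching mirrors the autocomplete behaviour shown in the usage section, where a partial title is enough to find the album.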

Specify python 3 more clearly

I've been playing around with your project today and ran into a few issues running your pip package in Python. I was running it in Python 2, since I assumed that is where most packages run, and didn't notice it was a Python 3 project until I saw the badge in the README. After installing with pip3 and running with python3, things went smoothly.

I've written a pull request with some slight modifications to make this clearer in the README.

As a side note, I noticed that your .travis.yml had an unknown build status. I went ahead and modified it to use pip3 and python3 for builds, which got CI to run on my repo.
