csu / pyquora

A Python module for fetching and parsing data from Quora.

Home Page: http://christopher.su/pyquora/

License: Other

Python 100.00%

parsed-data python python-library quora statistics

pyquora's Issues

Write test cases to ensure legacy support

Doesn't need to actually check the functionality of the methods/API (because they would just be aliases to methods that are being tested elsewhere in the test suite), just need to check that the old API/methods exist and can be called with the correct parameters.
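A minimal sketch of such an existence check, assuming Python 3 for `inspect.signature`. The `Quora` class below is a hypothetical stand-in so the example runs on its own, and the parameter names in `LEGACY_API` are assumptions rather than the library's confirmed signatures.

```python
import inspect


class Quora(object):
    """Hypothetical stand-in for the real class; only its shape matters here."""

    @staticmethod
    def get_question_stats(question):
        return {}

    @staticmethod
    def get_one_answer(question, author=None):
        return {}


# Legacy method name -> parameter names the old API is expected to accept
# (assumed names, for illustration only).
LEGACY_API = {
    'get_question_stats': ['question'],
    'get_one_answer': ['question', 'author'],
}


def missing_legacy_methods(cls, expected):
    """Return a list of problems; an empty list means the legacy API is intact."""
    problems = []
    for name, params in sorted(expected.items()):
        method = getattr(cls, name, None)
        if not callable(method):
            problems.append('%s is missing' % name)
            continue
        actual = list(inspect.signature(method).parameters)
        if actual != params:
            problems.append('%s takes %s, expected %s' % (name, actual, params))
    return problems
```

A test case would then only need to assert that `missing_legacy_methods(Quora, LEGACY_API)` is empty, without exercising any scraping.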

get_user_activity does not scrape data anymore because of recent UI changes

Because of Quora's recent UI change, Quora.get_user_activity no longer scrapes data correctly.

A direct consequence on quora-api can be observed by making a GET request on:
http://quora-api.herokuapp.com/users//activity/answers
where an empty array is returned.

https://github.com/csu/pyquora/edit/master/quora/pyquora.py#L45

Write test suite

Most static methods in Quora and User really should be class methods

We should fix this, but maintain the legacy API so things that currently use pyquora don't need to be rewritten. We can throw away the legacy support at a certain milestone, like v2.0 or something.
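One possible shape for this, as a sketch only: the new behaviour lives in a class method and the old static name stays as a thin alias that warns. The name `question_stats` is hypothetical, and the stand-in class does no scraping.

```python
import warnings


class Quora(object):
    """Hypothetical stand-in; the real methods would fetch and scrape pages."""

    @classmethod
    def question_stats(cls, question):
        # New-style class method (hypothetical name).
        return {'question': question}

    @staticmethod
    def get_question_stats(question):
        # Legacy static alias, kept so existing callers keep working until
        # the milestone where legacy support is dropped.
        warnings.warn('get_question_stats is deprecated; use question_stats',
                      DeprecationWarning, stacklevel=2)
        return Quora.question_stats(question)
```

Existing callers keep working unchanged and only see a `DeprecationWarning` if they have warnings enabled.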

get_random_answers breaks

>>> from quora import Quora
>>> Quora.get_random_answers(5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "quora/quora.py", line 139, in get_random_answers
    answer = Quora.get_one_answer(question)
  File "quora/quora.py", line 50, in get_one_answer
    return Quora.scrape_one_answer(soup)
  File "quora/quora.py", line 54, in scrape_one_answer
    answer = soup.find('div', id = re.compile('_answer_content$')).find('div', id = re.compile('_container'))
AttributeError: 'NoneType' object has no attribute 'find'
>>>

Add tests for answer statistics

Update PyPI Readme

No longer the same as the markdown/GitHub readme.

Take a different approach to testing

If I'm not wrong, the tests only check whether data has been scraped. They do not check whether it has been scraped correctly.

How about adding selected HTML pages into the test folder rather than loading the page from Quora every time the test is run? This way, we can check if there is a difference between what was expected and what was received.
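A rough sketch of the fixture idea, under stated assumptions: the `tests/fixtures` path is hypothetical, and `scrape_answer_count` is a toy regex scraper used only to keep the example free of third-party imports (the library itself parses pages with BeautifulSoup).

```python
import os
import re

FIXTURE_DIR = os.path.join('tests', 'fixtures')  # hypothetical location


def load_fixture(name, fixture_dir=FIXTURE_DIR):
    """Read a saved HTML page from disk instead of requesting it from Quora."""
    path = os.path.join(fixture_dir, name)
    with open(path, 'rb') as handle:
        return handle.read().decode('utf-8')


def scrape_answer_count(html):
    """Toy scraper for this sketch only; returns None when nothing matches."""
    match = re.search(r'class="answer_count">\s*(\d+)', html)
    return int(match.group(1)) if match else None
```

Because the saved page never changes, a test can compare the scraped value against a known expected value instead of only checking that something came back.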

Separate user statistics and activity code into separate files

Not absolutely necessary, but could improve readability.

Add question description to get_question_stats

Write basic tests for user statistics

Rewrite tests to use soups from local test HTML files

Add tests for legacy API

So csu/quora-backup#6 doesn't happen again.

Use Python properties to have User Activity as an attribute to User

e.g. the end API usage should be like

user = Quora.User('Christopher-J-Su')
activity = user.activity
print activity.activity_type

I.e. we shouldn't have to call a method to get activity, rather, it should be an attribute of the User class like the other statistics (followers, following, edits, etc.).
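A minimal sketch of the property approach. The `fetch_activity` hook and both classes here are hypothetical stand-ins so the example runs without network access; the real `User` would call the library's own scraping code.

```python
class Activity(object):
    """Stand-in container for parsed activity."""

    def __init__(self, answers=None):
        self.answers = answers or []


class User(object):
    def __init__(self, username, fetch_activity=None):
        self.username = username
        # fetch_activity is a hypothetical hook: a callable that takes a
        # username and returns an Activity.
        self._fetch_activity = fetch_activity or (lambda name: Activity())
        self._activity = None

    @property
    def activity(self):
        # Fetch on first access, then reuse the cached result.
        if self._activity is None:
            self._activity = self._fetch_activity(self.username)
        return self._activity
```

With this shape, `user.activity` reads like any other attribute, and the page is only fetched the first time it is accessed.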

Add tests for get_latest_answers

Fix answer activity

Answers aren't being parsed from the feed properly.

Test code:

from quora import Quora, Activity

quora = Quora()
activity = quora.get_activity('Christopher-J-Su')
print activity.answers

Results:

(env)csu:pyquora (master)$ python debug.py
[]

Also, from quora-api:

{
  "items": []
}

Open organization to give contributors push access

@rohithpr and @aaronwinter have contributed enough and are familiar enough with the codebase to directly push to pyquora and quora-api, as well as review and accept pull requests. An org should be created to grant them push access to the repositories.

Add question statistics

Fetch the number of views, edits, followers, etc. for a question, but not the content (for now, just to be safe 😄).

Add question stats, answer stats, and get_latest_answers to readme

get_latest_answers returns some empty dicts

This happens when the answer's author has a number at the end of their username.
Example: the author is Foo-Bar-23, but we call get_one_answer(question, 'Foo-Bar').

One workaround would be to detect the invalid (empty) dict and retry with
get_one_answer(question, 'Foo-Bar-1'), get_one_answer(question, 'Foo-Bar-2') and so on until a valid dict is returned, but that is highly inefficient.

So we need to find another way to get these answers.
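One alternative, sketched under two assumptions that would need checking against the actual page: that the feed exposes each answer's permalink, and that the permalink has the form `/<question-slug>/answer/<author-slug>`. If both hold, the full author slug (including any numeric suffix) can be read from the link instead of guessed. `split_answer_link` is a hypothetical helper name.

```python
import re

# Assumed permalink shape: /<question-slug>/answer/<author-slug>
ANSWER_LINK = re.compile(r'/(?P<question>[^/]+)/answer/(?P<author>[^/?#]+)')


def split_answer_link(href):
    """Return (question_slug, author_slug), or None if href is not an answer link."""
    match = ANSWER_LINK.search(href)
    if match is None:
        return None
    return match.group('question'), match.group('author')
```

The resulting pair could then be passed straight to get_one_answer(question, author), with no retry loop.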

Add topic follows to user activity

Write class/serializer for user statistics

Quora is blocking scrapers

As I've stated here, Quora is blocking some (all?) scripts.

from bs4 import BeautifulSoup
import requests

url = 'http://www.quora.com/search?q=flowers'
soup = BeautifulSoup(requests.get(url).text)
print soup

<html>
  <head>
    <title>503 Service Unavailable</title>
  </head>
  <body>
    <h1>503 Service Unavailable</h1>
      The server is currently unavailable. Please try again at a later time.<br/><br/>
      Our automated scripts have detected a possible scraper. If you feel we have made an error, please email [email protected]. Sorry for the inconvenience. Thanks.


  </body>
</html>
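A small sketch of how the library could at least detect this response and fail loudly instead of trying to parse it. `looks_blocked` is a hypothetical helper; the marker phrase is taken from the page text above, and Quora may change it at any time.

```python
BLOCK_MARKER = 'detected a possible scraper'  # phrase from the 503 page above


def looks_blocked(status_code, body):
    """Return True if the response looks like Quora's scraper block page."""
    if status_code == 503:
        return True
    return BLOCK_MARKER in body.lower()
```

With requests, this would be called as `looks_blocked(response.status_code, response.text)` before handing the body to BeautifulSoup.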

get_user_stats raises an IndexError

stats = quora.get_user_stats('Christopher-J-Su')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "../quora/quora.py", line 143, in get_user_stats
    return User.get_user_stats(u)
  File "../quora/user.py", line 156, in get_user_stats
    user_dict = {'answers'   : data_stats[1],
IndexError: list index out of range

Add travis.yml for Travis CI

Come up with a better way to test Activity

Right now, my example tests (written just to get CI working) only check whether any of the activity attributes (answers, questions, etc.) return an empty list. An empty list does not necessarily mean something is broken.

For example, if someone hasn't posted a review in a long time, their activity.review_requests will be empty, even if pyquora is working properly.

How to get all the answers to a question

Quora.get_one_answer('6hARL') returns only one answer for a question.
How can I get all the answers to a question?
Thanks

nosetests imports `quora` from whatever is installed in the virtualenv and not the current working directory

I ran into this while writing tests and making changes to quora.py.
Those changes have no effect because nosetests imports quora from the venv.
Is this the expected behaviour or am I doing something wrong?
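A quick way to confirm which copy is being picked up, as a sketch assuming Python 3 (`importlib.util.find_spec`). `module_origin` is a hypothetical helper name.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin
```

If `module_origin('quora')` points into the virtualenv's site-packages, installing the working tree in development mode with `pip install -e .` from the repository root is one common way to make the tests import the local source instead.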

Question details doesn't work

Try Is-there-a-proof-of-the-Four-Color-Theorem-that-does-not-involve-substantial-computation.

GET: http://quora-api.herokuapp.com/questions/Is-there-a-proof-of-the-Four-Color-Theorem-that-does-not-involve-substantial-computation

Output:

{
  "answer_count": 4, 
  "answer_wiki": "<div class=\"hidden\" id=\"answer_wiki\"><div id=\"ld_ebgwib_28688\"><div id=\"__w2_sHb6iqm_wiki\"></div></div></div>", 
  "question_details": null, 
  "question_text": "Is there a proof of the Four Color Theorem that does not involve substantial computation?", 
  "topics": [
    "Science, Engineering, and Technology", 
    "Science", 
    "Formal Sciences", 
    "Mathematics"
  ], 
  "want_answers": 1
}

question_details is null, but the question has details on Quora.

get_random_answers isn't particularly useful

What possible use cases are there for get_random_answers? I don't think it's necessary/useful. Plus, we're importing string and random just for it.

Write class/serializer for question statistics

Fix the "name" field so that it gives the user's full name instead of their username

Set up Read the Docs

How to get latest or popular question?

I don't see any endpoint for this. Is there a method to do that?
Thanks

Create a "User" class to serialize user statistics

Correct USAGE instruction in README.md

Currently the USAGE instruction in README.md is like this:

    from quora import Quora, Activity

    quora = new Quora()

    # get user activity
    activity = get_activity('Christopher-J-Su')

But it should be like this:

    from quora import Quora, Activity

    quora = Quora()

    # get user activity
    activity = quora.get_activity('Christopher-J-Su')

Unhandled exception in `get_question_stats()`

When the get_question_stats() method is called with an invalid question, an unhandled exception occurs. Here's a dump of an error:

question = Quora.get_question_stats('Medicine-and-Healthcare')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/quora/quora.py", line 115, in get_question_stats
    return Quora.scrape_question_stats(soup)
  File "/usr/local/lib/python2.7/dist-packages/quora/quora.py", line 125, in scrape_question_stats
    answer_count = soup.find('div', attrs={'class' : 'answer_count'}).next.split()[0]
AttributeError: 'NoneType' object has no attribute 'next'
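The failure comes from chaining `.next` onto a `find()` result that is `None`. One possible guard, as a sketch: `ScrapeError` and `require` are hypothetical names, not part of the library.

```python
class ScrapeError(Exception):
    """Raised when an expected element is missing from the page."""


def require(node, description):
    """Return node unchanged, or raise a descriptive error if it is None."""
    if node is None:
        raise ScrapeError('could not find %s; is the question valid?' % description)
    return node
```

In the scraper this would wrap the lookup, e.g. `require(soup.find('div', attrs={'class': 'answer_count'}), 'answer count').next`, so callers get a catchable, descriptive error rather than an AttributeError.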

Add tests for question statistics

Fix get_one_answer

It's still not working for me. It's also not working at http://quora-api.herokuapp.com/answers/How-can-I-join-Open-Source-Rails-projects/Tobias-Sandelius.

>>> from quora import Quora
>>> Quora.get_one_answer('How-can-I-join-Open-Source-Rails-projects', 'Tobias-Sandelius')
{}

Move "Usage"

Currently it is in readme.md. Wouldn't it be better to move it from there to another folder with code examples?

Add question text/title to question statistics

Make Python 3 compatible

Aiming for Python 2.6+ and Python 3.3+ compatibility.

try_cast_int ignores the 'k' in cases where there are over a thousand upvotes/want answers

print Quora.get_question_stats('What-are-the-best-Cyanide-Happiness-comics')
{'want_answers': 2, 'question_text': u'What are the best Cyanide & Happiness comics?', 'topics': [u'Communication', u'Writing', u'Books', u'Publishing', u'Comics (narrative art form)'], 'question_details': None, 'answer_count': 474, 'answer_wiki': None}

want_answers should've been 2k! 😆
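A possible fix, sketched as a standalone function rather than a patch to try_cast_int (whose current body isn't shown here). `parse_count` is a hypothetical name, and returning `None` for unparseable input is an assumption about the desired behaviour.

```python
import re

# Digits with optional thousands separators or a decimal part, then an optional "k".
_COUNT = re.compile(r'^\s*([\d,]*\.?\d+)\s*(k?)\s*$', re.IGNORECASE)


def parse_count(text):
    """Convert strings like '474', '1,234', '2k' or '1.5k' to an int."""
    match = _COUNT.match(text)
    if match is None:
        return None
    number = float(match.group(1).replace(',', ''))
    multiplier = 1000 if match.group(2) else 1
    return int(round(number * multiplier))
```

Note that a value scraped as "2k" has already been rounded by Quora, so the result is approximate.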

Fix activity so that it recognizes "want answers" instead of "followed question"

In light of new Quora UI changes, we need to fix how we detect question follows in user activity.

Rewrite test suite to use new API

User information hidden when not logged in

Looks like Quora is masking usernames when you view an answer without logging in. "Quora User" is shown in place of the user's actual name. This is also affecting answers fetched by requests.

I haven't checked to see the extent to which this is applied.

PS: It's not an issue with the user being banned or anything, it shows the name properly after logging in.
