jaryaman / propnews


Size: 223 KB

Making the news proportionate to global priorities

License: GNU Affero General Public License v3.0

Python 100.00%

propnews's Issues

Make log more robust

Convert error_log.txt into an SQL table.
Create the log if it doesn't exist; otherwise, append to it.
Add the log to .gitignore so that pulling the repo doesn't require discarding the old log.
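A minimal sketch of the first two points using Python's built-in sqlite3 module. The database file name, the error_log table layout and the open_log/log_error helpers are hypothetical; CREATE TABLE IF NOT EXISTS creates the log on first use, and every later call appends a row.

```python
import sqlite3
from datetime import datetime, timezone


def open_log(db_path: str = "error_log.db") -> sqlite3.Connection:
    """Open (or create) the log database; the table is created only if missing."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS error_log (
               id INTEGER PRIMARY KEY,
               logged_at TEXT NOT NULL,
               message TEXT NOT NULL
           )"""
    )
    return conn


def log_error(conn: sqlite3.Connection, message: str) -> None:
    """Append one row to the log with a UTC timestamp."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO error_log (logged_at, message) VALUES (?, ?)",
            (datetime.now(timezone.utc).isoformat(), message),
        )
```

The database file (error_log.db in this sketch) would then be the entry added to .gitignore.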

Generalise to different news vendors

The bot is currently hardcoded for BBC News. We should generalise it so that it can take a user-defined list of news sites.
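One possible approach is a registry that maps each vendor's hostname to its own content extractor, so that supporting a new site means registering one function. This is a sketch: EXTRACTORS, register_extractor, get_content and bbc_extractor are hypothetical names, and the extractor body is a placeholder rather than the repo's get_full_content.get_bbc_content().

```python
from typing import Callable, Dict
from urllib.parse import urlparse

# Maps a hostname (e.g. "www.bbc.co.uk") to a function that extracts article text from a URL.
Extractor = Callable[[str], str]
EXTRACTORS: Dict[str, Extractor] = {}


def register_extractor(*hosts: str) -> Callable[[Extractor], Extractor]:
    """Decorator that registers an extractor for one or more hostnames."""
    def decorator(func: Extractor) -> Extractor:
        for host in hosts:
            EXTRACTORS[host.lower()] = func
        return func
    return decorator


def get_content(url: str) -> str:
    """Dispatch to the extractor registered for the URL's hostname."""
    host = (urlparse(url).hostname or "").lower()
    try:
        extractor = EXTRACTORS[host]
    except KeyError:
        raise ValueError(f"No extractor registered for {host!r}") from None
    return extractor(url)


@register_extractor("www.bbc.co.uk", "www.bbc.com")
def bbc_extractor(url: str) -> str:
    # Placeholder: a real version would wrap get_full_content.get_bbc_content().
    return f"BBC article at {url}"
```

The user-defined list of news sites then corresponds to the set of registered hostnames.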

Investigate the optimal call frequency of NewsAPI

We are limited by page_limit_per_request=10 and results_per_page=100. As we expand the list of news sources we search, could we hit the maximum number of results NewsAPI can return in a single call, and therefore need to query NewsAPI more frequently?
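A back-of-envelope check, assuming page_limit_per_request caps the number of pages fetched per call: one call can then return at most 10 × 100 = 1,000 articles, so calls must be close enough together that the searched sources publish fewer articles than that in between. The publication rate is a hypothetical input that would need to be measured, and both function names are illustrative.

```python
import math


def max_articles_per_call(page_limit_per_request: int = 10, results_per_page: int = 100) -> int:
    """Upper bound on the number of articles one call can return."""
    return page_limit_per_request * results_per_page


def max_polling_interval_hours(articles_per_hour: float, **limits: int) -> float:
    """Longest gap between calls before new articles could exceed one call's capacity."""
    if articles_per_hour <= 0:
        return math.inf
    return max_articles_per_call(**limits) / articles_per_hour
```

For example, if the combined sources published 250 articles per hour, calls would need to be at most 4 hours apart.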

get_bbc_content doesn't get full content

It seems that get_full_content.get_bbc_content() doesn't return the full content of the corresponding BBC article: the extracted text isn't exactly the same as the article's. Is this a concern?
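To decide whether the discrepancy matters, it would help to measure it. A sketch using the standard library's difflib.SequenceMatcher to compare the extracted text with a reference copy of the article; obtaining the reference (e.g. copying it by hand from the page) is assumed, and both helper names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List


def _normalise(text: str) -> str:
    # Collapse runs of whitespace so layout differences don't count as changes.
    return " ".join(text.split())


def text_similarity(extracted: str, reference: str) -> float:
    """Similarity ratio in [0, 1] between the extracted and reference text."""
    return SequenceMatcher(None, _normalise(extracted), _normalise(reference), autojunk=False).ratio()


def missing_fragments(extracted: str, reference: str, min_len: int = 20) -> List[str]:
    """Passages of at least min_len characters present in the reference but not the extraction."""
    a, b = _normalise(extracted), _normalise(reference)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        b[j1:j2]
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag in ("insert", "replace") and j2 - j1 >= min_len
    ]
```

A ratio close to 1 with only short missing fragments would suggest the differences are cosmetic; long missing passages would mean paragraphs are being dropped.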

Numbers as a necessary condition for non-zero score

score_articles.score_article should search the article content for (large?) numbers and for words such as double, triple, halve, third and percent, and give a non-zero score only to articles that pass this test.
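A sketch of such a gate using regular expressions. The keyword stems come from the issue; the min_digits threshold for what counts as a "large" number is an assumption left open, as in the issue, and passes_number_test and gated_score are hypothetical helpers rather than the existing score_articles.score_article signature.

```python
import re

# Stems catch inflections such as "doubled", "tripling", "halved", "thirds", "per cent";
# a bare "%" sign also counts.
QUANT_WORDS = re.compile(
    r"\b(?:doubl\w*|tripl\w*|halv\w*|thirds?|per\s?cent\w*)\b|%",
    re.IGNORECASE,
)
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def passes_number_test(text: str, min_digits: int = 1) -> bool:
    """True if the text has a quantitative keyword or a number with >= min_digits integer digits."""
    if QUANT_WORDS.search(text):
        return True
    for match in NUMBER.finditer(text):
        integer_part = match.group().split(".")[0].replace(",", "")
        if len(integer_part) >= min_digits:
            return True
    return False


def gated_score(text: str, raw_score: float, min_digits: int = 1) -> float:
    """Zero the score of any article that fails the number test."""
    return raw_score if passes_number_test(text, min_digits) else 0.0
```

Note that years (e.g. "2000") also count as numbers here, which is one reason a stricter definition of "large" may be needed.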

Duplicate tweets

Twitter doesn't allow the same tweet to be posted twice. Currently, if main.py attempts to post a duplicate tweet, we write to error_log.txt; instead, we ought to keep sampling articles until a new one comes up.
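A sketch of the resampling the issue describes: keep drawing articles until one yields a tweet that hasn't been posted before. Where the set of previous tweets comes from (e.g. the database) is assumed, and pick_new_tweet and make_tweet are hypothetical.

```python
import random
from typing import Callable, Iterable, Optional, Sequence


def pick_new_tweet(
    articles: Sequence[dict],
    already_tweeted: Iterable[str],
    make_tweet: Callable[[dict], str],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Return tweet text for a randomly chosen article not yet tweeted, or None."""
    rng = rng or random.Random()
    seen = set(already_tweeted)
    # Shuffle a copy so each article is tried at most once; this bounds the loop.
    candidates = list(articles)
    rng.shuffle(candidates)
    for article in candidates:
        text = make_tweet(article)
        if text not in seen:
            return text
    return None  # every candidate would duplicate an earlier tweet
```

Shuffling and iterating samples without replacement, so unlike repeated random draws it cannot keep retrying the same duplicate, and a None result is the point at which logging an error still makes sense.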

Doc2Vec NLP

Use Doc2Vec instead of search strings to assign scores to articles.
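Whatever model produces the vectors, scoring then reduces to comparing an article's vector with a vector for each priority topic. A sketch of that final step with plain cosine similarity; it assumes the vectors already exist (for instance from a trained gensim Doc2Vec model's infer_vector()), and the topic names and score_by_similarity are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def score_by_similarity(
    article_vec: Sequence[float], topic_vecs: Dict[str, Sequence[float]]
) -> Tuple[str, float]:
    """Return the closest priority topic and its similarity, in place of keyword matches."""
    return max(
        ((topic, cosine(article_vec, vec)) for topic, vec in topic_vecs.items()),
        key=lambda pair: pair[1],
    )
```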

Make clean NewsAPI Key

The API key we currently use has been compromised. We should generate a new one.
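Keeping the new key out of the repository avoids a repeat. A minimal sketch that reads the key from an environment variable; the variable name NEWSAPI_KEY and load_newsapi_key are assumptions, not existing project settings.

```python
import os


def load_newsapi_key(var_name: str = "NEWSAPI_KEY") -> str:
    """Read the NewsAPI key from the environment instead of hardcoding it in source."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your NewsAPI key.")
    return key
```

Any key that was committed stays in the git history, so the old key should be revoked rather than only deleted from the code.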

get_articles.get_url_content always returns ''

get_articles.get_url_content() never appears to retrieve any URL content. To test this, I added the following lines to get_articles.py:

content = get_url_content(url_path, url)
if len(content) > 0:
    pass

and set a breakpoint on the 'pass' line. After more than 100 articles had been searched, execution had never reached it, so get_url_content() appears to return an empty string every time. This doesn't seem right.
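A breakpoint on a non-empty result can't tell a network error, a refused request and a parsing problem apart. A diagnostic sketch using only the standard library that raises instead of returning '' and tallies outcomes over a batch of URLs. fetch_url_content and diagnose are hypothetical helpers that do not reuse the repo's code, and the browser-like User-Agent is an assumption (some sites refuse urllib's default one), not a confirmed cause.

```python
import urllib.request
from typing import Callable, Dict, Iterable, List, Tuple

# Assumption: a browser-like User-Agent, since some sites reject urllib's default one.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; propnews-debug)"}


def fetch_url_content(url: str, timeout: float = 10.0) -> str:
    """Fetch a page's HTML, raising on failure instead of returning ''."""
    request = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def diagnose(
    urls: Iterable[str], fetch: Callable[[str], str] = fetch_url_content
) -> Dict[str, object]:
    """Count non-empty, empty and failed fetches so the failure mode is visible."""
    ok = empty = 0
    errors: List[Tuple[str, str]] = []
    for url in urls:
        try:
            content = fetch(url)
        except (OSError, ValueError) as exc:  # URLError, HTTPError and timeouts are OSErrors
            errors.append((url, repr(exc)))
            continue
        if content:
            ok += 1
        else:
            empty += 1
    return {"ok": ok, "empty": empty, "errors": errors}
```

Passing fetch=lambda u: get_url_content(url_path, u) runs the same tally against the existing function; if fetch_url_content succeeds on URLs where get_url_content returns '', the problem is in how the latter handles the response rather than in reaching the site.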

SQL Primary keys

news.db currently uses the URL as its primary key, but integer primary keys are more efficient. Should we hash the URL to an integer?
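A sketch of the hashing option with sqlite3. In SQLite a column declared INTEGER PRIMARY KEY is an alias for the 64-bit signed rowid, so the first 8 bytes of a SHA-256 digest can be packed into a signed integer key. Python's built-in hash() is unsuitable because string hashing is randomised per process. The articles table layout and the helper names are hypothetical.

```python
import hashlib
import sqlite3


def url_id(url: str) -> int:
    """Stable 64-bit signed integer derived from a URL (fits SQLite's INTEGER)."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


def create_articles_table(conn: sqlite3.Connection) -> None:
    # INTEGER PRIMARY KEY makes id an alias for the table's rowid.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles (id INTEGER PRIMARY KEY, url TEXT NOT NULL, title TEXT)"
    )


def insert_article(conn: sqlite3.Connection, url: str, title: str) -> bool:
    """Insert an article; return False if the same URL (by hash) is already stored."""
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO articles (id, url, title) VALUES (?, ?, ?)",
            (url_id(url), url, title),
        )
    return cur.rowcount == 1
```

Truncating to 64 bits makes a collision possible in principle, but the chance is negligible for a news archive of this size; the full URL is still stored so it can be checked if needed.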

Replace get_articles with get_new_articles

Integrate get_new_articles into main, drawing the articles to tweet from the database within a time window (e.g. the last 24 hours).
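A sketch of drawing only recent articles from news.db. The articles table, its published_at column and the storage format (UTC strings like "2024-01-02T06:00:00Z", which sort correctly as text) are assumptions about the schema, and recent_articles is hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # assumed storage format (UTC); sorts correctly as text


def recent_articles(
    conn: sqlite3.Connection,
    window: timedelta = timedelta(hours=24),
    now: Optional[datetime] = None,
) -> List[Tuple[str, str]]:
    """Return (url, title) for articles published within `window` of `now` (UTC), newest first."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - window).strftime(TIME_FORMAT)
    return conn.execute(
        "SELECT url, title FROM articles WHERE published_at >= ? ORDER BY published_at DESC",
        (cutoff,),
    ).fetchall()
```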

Set debug mode as a flag

While debugging, we don't want to call the API heavily. tweeting.tweet_news already takes a dbg_mode argument; this should be exposed as a command-line flag when running main.py, e.g.:

$ ipython
In [1]: run main.py -d
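A sketch of the flag with the standard argparse module. The -d/--debug flag matches the invocation above; the call into tweeting.tweet_news appears only as a comment because its arguments other than dbg_mode aren't given here, and parse_args/main are illustrative.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse command-line options; argv=None reads sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Run the propnews bot.")
    parser.add_argument(
        "-d", "--debug",
        action="store_true",
        help="debug mode: avoid heavy API use (passed through as dbg_mode)",
    )
    return parser.parse_args(argv)


def main(argv: Optional[List[str]] = None) -> None:
    args = parse_args(argv)
    dbg_mode = args.debug
    # e.g. tweeting.tweet_news(..., dbg_mode=dbg_mode); other arguments omitted
    print(f"dbg_mode={dbg_mode}")


if __name__ == "__main__":
    main()
```

In IPython, arguments after the script name in run main.py -d are placed in the script's sys.argv, so parse_args() picks up -d there just as it does with python main.py -d from a shell.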