gonzaloulla / unlp-dbd-newsler
Newsler - News crawler from Websites and Twitter - DB Design - MS in Software Engineering - UNLP
License: MIT License
Is your feature request related to a problem? Please describe.
I have no idea how to visualize our data.
Describe the solution you'd like
I would like to have a Kibana Dashboard stored as a JSON file in our GitHub repo.
Describe alternatives you've considered
None yet
Is your feature request related to a problem? Please describe.
Starting the entire stack at once might make your computer/vm crash.
Describe the solution you'd like
Since the docker-compose depends_on: attribute only controls the order in which containers are started and does not wait for a service to be ready, a wait-for-it.sh script should be used.
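The readiness check that a wait-for-it.sh script performs can be sketched in Python, the language of the crawlers. This is only an illustration of the idea, not the script itself: the function name `wait_for_port` and the host/port mentioned in the comment are assumptions.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or unresolved: wait and retry.
            time.sleep(interval)
    return False


# Hypothetical usage inside a crawler entry point (service name and port assumed):
#     if not wait_for_port("elasticsearch", 9200, timeout=120):
#         raise SystemExit("dependency not reachable, giving up")
```

An open port does not guarantee that the service is fully initialized, so a real check may also need an application-level probe.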
Describe the solution
3 tables in README.md
The first one containing the Newsler components and their data attribute prefixes (news_ for news-crawler and tweet_ for twitter-crawler)
News Crawler: a list of attributes and description of data provided by news-crawler
Twitter Crawler: a list of attributes and description of data provided by twitter-crawler
Describe alternatives you've considered
A Google Docs file, an Excel spreadsheet, a JSON file
Is your feature request related to a problem? Please describe.
YES!!
After a long time running, when my laptop becomes idle, it shuts down all network data transmissions and, thus, twitter-crawler fails.
Whenever one of the crawlers fails, the Python process stops and is never restarted.
Describe the solution you'd like
Use supervisord to keep each crawler process up and running at all times
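To make the requested behaviour concrete, here is a minimal Python sketch of restart-on-failure. It is not supervisord and does not use its configuration format; the function name `keep_alive` and its parameters are assumptions chosen for illustration.

```python
import subprocess
import sys
import time


def keep_alive(cmd, max_restarts=3, delay=1.0):
    """Run cmd and restart it after every non-zero exit.

    Returns the number of times the command was started. Stops after a
    clean exit (code 0) or once max_restarts restarts have been used.
    """
    starts = 0
    while True:
        starts += 1
        exit_code = subprocess.call(cmd)
        if exit_code == 0 or starts > max_restarts:
            return starts
        time.sleep(delay)  # brief pause before the next attempt


# Hypothetical usage (script name assumed):
#     keep_alive([sys.executable, "twitter_crawler.py"], max_restarts=10, delay=5)
```

A process manager such as supervisord adds logging, backoff, and daemonization on top of this basic loop.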
Value of the US for each role
As a Developer, I want to document Newsler arch & features so I can share it with stakeholders.
As a Student, I want to pass DB Design.
As a Professor, I want to have proper evidence of the work that's been done.
As a User, I don't know all the Newsler prerequisites, features, and configurations (e.g., Kibana index patterns) needed to use it.
Describe the solution you'd like
A clear and concise User Manual (check alternatives below) detailing the following items:
MVP (acceptance criteria)
Alternatives
.rst files (maybe it's way too much)
Is your feature request related to a problem? Please describe.
DB Design requirement, we should encapsulate and decouple from the internal ELK technical details.
Describe the solution you'd like
Not sure yet, some decoupled framework to abstract data models and decouple CRUD operations to any persistence layer.
Describe alternatives you've considered
Use Newsler "AS IS" right now, using different Logstash output plugins
A common framework/library (set of Python modules) shared by both the news-crawler and twitter-crawler components to abstract data models and decouple CRUD operations
Option 1 plus a Kafka input/output mechanism (maybe a couple of new components and dependencies?)
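One way to picture the shared-library alternative is a small repository interface that the crawlers code against, with the persistence layer swapped in behind it. The sketch below is an assumption-laden illustration: `NewsItem`, `Repository`, `InMemoryRepository`, and the field names are hypothetical and not taken from the Newsler code base.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Optional, Protocol


@dataclass
class NewsItem:
    # Hypothetical data model; only the news_ prefix comes from the project.
    news_id: str
    news_title: str


class Repository(Protocol):
    """CRUD operations the crawlers depend on, independent of the backend."""

    def save(self, item: NewsItem) -> None: ...
    def get(self, news_id: str) -> Optional[NewsItem]: ...
    def delete(self, news_id: str) -> None: ...


class InMemoryRepository:
    """Stand-in backend; an Elasticsearch or Kafka one would expose the same methods."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    def save(self, item: NewsItem) -> None:
        self._rows[item.news_id] = asdict(item)  # create or update

    def get(self, news_id: str) -> Optional[NewsItem]:
        row = self._rows.get(news_id)
        return NewsItem(**row) if row else None

    def delete(self, news_id: str) -> None:
        self._rows.pop(news_id, None)
```

With this shape, a crawler would only receive a `Repository` and never import an ELK client directly.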
Additional context
I need help, a lot.
Is your feature request related to a problem? Please describe.
How can I parameterize an existing Kibana dashboard to filter/dive into specific results?
Describe the solution you'd like
For example, as a User I would like to define some parameters in order to:
tweet_id_str)
Describe the solution you'd like
Jupyter notebooks assessing Sentiment Analysis performance.
Solution
Epic
Check epic #22
Inputs that may be useful to you:
Describe the solution you'd like
Use the current Filebeat + Logstash + Elasticsearch + Kibana infrastructure to collect logs and metrics from each service. New Logstash pipelines can be used to achieve this.
Describe the bug
If a tweet is a retweet of another tweet, tweet_likes is always zero. That is, the number of likes is not saved.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Twitter: https://i.imgur.com/elsozs9.png
Kibana: For tweet_id = 1251658883650203648, tweet_likes must be 140
Current behavior
Twitter: https://i.imgur.com/elsozs9.png
Kibana: https://i.imgur.com/UMFQjOs.png
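A possible cause, stated here as an assumption rather than a confirmed diagnosis: in Twitter API v1.1 payloads a retweet usually reports `favorite_count` as 0, while the like count lives on the nested `retweeted_status` object. If twitter-crawler receives dicts of that shape, a fix could look like the sketch below; `extract_likes` is a hypothetical helper name.

```python
def extract_likes(tweet: dict) -> int:
    """Return the like count, preferring the original tweet for retweets.

    Assumes a Twitter API v1.1 style payload where a retweet nests the
    original tweet under the "retweeted_status" key.
    """
    original = tweet.get("retweeted_status")
    source = original if original else tweet
    return source.get("favorite_count", 0)
```

The returned value would then be stored as tweet_likes instead of the top-level count.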
Describe the bug
Some of the links generated by the CNN and FoxNews crawler are broken
To Reproduce
Steps to reproduce the behavior:
Run the CNN or FoxNews crawler and check the links.
Some links are not under the /world path but under, for instance, europe or us. By default, /world is still added to the URL.
Expected behavior
Links are not broken and the URL is honored
Current behavior
/world is added to all the links
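A minimal sketch of one way to honor each link's own path, assuming the crawler collects root-relative `href` values (such as `/europe/...`) from a section page. `absolute_link` and the example domain are hypothetical; `urljoin` is the standard-library function.

```python
from urllib.parse import urljoin


def absolute_link(page_url: str, href: str) -> str:
    """Resolve href against the page it was found on.

    A root-relative href keeps its own section instead of having the
    page's path (for example /world) prepended to it.
    """
    return urljoin(page_url, href)


# Placeholder URLs, not the crawler's real targets:
#     absolute_link("https://news.example.com/world", "/europe/story.html")
```

Resolving against the page URL also leaves already-absolute links untouched.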
Inputs:
Describe the solution you'd like
Describe the bug
Out of 1082 tweets, 443 of them have nil polarity
To Reproduce
Expected behavior
Tweets must not have a nil value in the tweet_sentiment_polarity field
Current behavior
https://i.imgur.com/zXzohR8.png
https://i.imgur.com/89iH9Hw.png
Is your feature request related to a problem? Please describe.
No
Describe the solution you'd like
Only one way to run web spiders: right now this'd be websites.py, which applies the Template Method pattern
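For readers unfamiliar with the pattern, the sketch below shows Template Method applied to a spider. It is an illustration under assumptions, not the contents of websites.py: the class names, method names, and record fields are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class WebSpider(ABC):
    """Base class: run() fixes the workflow, subclasses fill in the steps."""

    source = "unknown"

    def run(self) -> List[Dict[str, str]]:
        # The template method: the order of steps is the same for every site.
        html = self.fetch()
        return [self.to_record(title) for title in self.parse(html)]

    @abstractmethod
    def fetch(self) -> str:
        """Download the raw page for this site."""

    @abstractmethod
    def parse(self, html: str) -> List[str]:
        """Extract headline strings from the raw page."""

    def to_record(self, title: str) -> Dict[str, str]:
        # Shared step; the field names are assumptions.
        return {"news_title": title.strip(), "news_source": self.source}


class StaticSpider(WebSpider):
    """Toy subclass that 'fetches' a fixed string instead of a web page."""

    source = "example"

    def fetch(self) -> str:
        return "First headline\nSecond headline "

    def parse(self, html: str) -> List[str]:
        return html.splitlines()
```

A single entry point can then iterate over the spider subclasses and call `run()` on each, which is what makes one way of running all web spiders practical.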