docnow / docnow
A Twitter data collection and appraisal application.
License: MIT License
Now that we have a homepage that lists trending hashtags we can start to implement a Search view that it can link to. The idea is that a logged-in user can click on a hashtag on the homepage and go to a page that summarizes the recent history of that hashtag. You can see a sketch of this page in this design:
It would be nice to have a fancier "loading" animated gif for the cards as they are populated. Perhaps something related to our logo, or something simple that's better than the word "loading"?
It could be useful to display verified users in explore mode, or perhaps after the collection has started. I think highlighting the verified users could be useful because they are individuals who have gone the extra step to authenticate their account, and could be more interested in their content having a home in an archive.
Thoughts?
A ranked list of URLs should appear on the Explore page.
At the moment jslint is running as part of the webpack configuration, which only processes the src/client code. It would be nice if jslint ran on the src/server code as well. Perhaps the best place to do this is in package.json when running npm start. It would be nice if the server failed to start if the linter failed. Also, it could be useful if npm test exercised jslint on src.
(node:36) DeprecationWarning: loaderUtils.parseQuery() received a non-string value which can be problematic, see webpack/loader-utils#56
webpack_1 | parseQuery() will be replaced with getOptions() in the next major version of loader-utils.
too much padding on right in NewLocation
It could be useful to see the number of times a user tweets per hour or day by taking their total tweets and dividing by the time that has passed since the account was created. It is a useful metric for determining when an account is a bot, because tweeting more than 30 times an hour on average is a pretty good indicator that there is automation involved, which sometimes means it's a bot or a cyborg.
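Here's a rough sketch of how that calculation could look. It assumes the tweet total and creation date come from the statuses_count and created_at fields of the Twitter user object; the function names and the threshold constant are made up for illustration.

```javascript
// Threshold from the discussion above: more than 30 tweets an hour on
// average suggests automation. The constant name is hypothetical.
const BOT_THRESHOLD_PER_HOUR = 30

// Average tweets per hour over the lifetime of the account.
// `createdAt` can be anything the Date constructor accepts.
function tweetsPerHour(statusesCount, createdAt, now = new Date()) {
  const msPerHour = 60 * 60 * 1000
  const hours = (now.getTime() - new Date(createdAt).getTime()) / msPerHour
  // Guard against brand-new accounts, bad dates or clock skew.
  if (!(hours > 0)) return 0
  return statusesCount / hours
}

function looksAutomated(statusesCount, createdAt, now = new Date()) {
  return tweetsPerHour(statusesCount, createdAt, now) > BOT_THRESHOLD_PER_HOUR
}
```

Since this is a lifetime average it will miss accounts that only recently became automated, so it is probably best treated as a hint rather than a verdict.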
The profile page includes rudimentary support for adding location ids. It would be useful to be able to remove locations as well.
Twitter allows us to link our app to a Terms of Service. I think it's important for us to clearly state what the terms of service are for any app we put in front of the public.
Both the privacy policy #47 and the terms of service will be useful if we want to use the Instagram API.
ElasticSearch v6 is the latest version of ElasticSearch. It breaks compatibility with v5 by no longer allowing more than one type of document in a single index. server/db will need to be significantly refactored to accommodate this.
The secret used to encrypt the user id in the logged-in user's cookie is hard coded at the moment. It needs to be generated on first start up, or handled some other way. Otherwise anyone can spoof DocNow cookies.
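One way to avoid a hard coded value is to generate the secret the first time the server starts and keep it on disk. This is only a sketch: the function name and the idea of storing the secret in a file are assumptions, and it could just as well live in Redis with the other settings.

```javascript
const crypto = require('crypto')
const fs = require('fs')

// Load the cookie secret from `path`, creating it on first start up.
// `path` is a hypothetical location chosen by whoever deploys the app.
function loadCookieSecret(path) {
  try {
    return fs.readFileSync(path, 'utf8').trim()
  } catch (err) {
    // Anything other than "file does not exist" is a real problem.
    if (err.code !== 'ENOENT') throw err
    const secret = crypto.randomBytes(32).toString('hex')
    // mode 0o600 keeps the file readable only by the server's user.
    fs.writeFileSync(path, secret, { mode: 0o600 })
    return secret
  }
}
```

Because the secret is read back on later starts, existing sessions survive a restart, which would not be the case if a new value were generated every time.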
Currently you need to know the WOEID for a place to enter it which isn't workable long term:
There is a db method for loading all the places into redis which currently runs every time the server is started. So it should be possible to create an API endpoint that returns all the possible ones. Perhaps the component could autocomplete them?
Maybe the existing places endpoint could take a query parameter ?all=true to return all places instead of only the ones the user has selected to follow?
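To make the ?all=true idea a bit more concrete, here is a sketch of the selection logic on its own, without the Express route around it. The shape of a place ({ id, name }) and both function names are assumptions, not what the db module currently returns.

```javascript
// `allPlaces` stands in for the places loaded into Redis at start up and
// `followedIds` for the ids the user has chosen to follow.
// `query` is the parsed query string, e.g. { all: 'true' }.
function selectPlaces(allPlaces, followedIds, query = {}) {
  if (query.all === 'true') return allPlaces
  const followed = new Set(followedIds)
  return allPlaces.filter(place => followed.has(place.id))
}

// Case-insensitive prefix match that an autocomplete component could call.
function autocompletePlaces(allPlaces, prefix, limit = 10) {
  const needle = prefix.toLowerCase()
  return allPlaces
    .filter(place => place.name.toLowerCase().startsWith(needle))
    .slice(0, limit)
}
```

If the full list turns out to be large it might be better to do the prefix match on the server and only return the matches.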
It should be possible to represent the hashtag counts on the explore page as a simple bar chart, preferably using d3.
The sample search should default to 1000 instead of 100 tweets. There also should be a button to request more tweets be retrieved. Since it will take about 10 seconds to retrieve the tweets some of the top users and top hashtags cards will likely shift and update.
Promises are easier to read than callbacks.
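As a small illustration, Node's util.promisify (available from Node 8) can wrap an existing callback-style function. The getUser function below is made up for the example and is not part of the DocNow code.

```javascript
const { promisify } = require('util')

// `getUser` is a hypothetical callback-style function using the usual
// Node (err, result) convention.
function getUser(id, callback) {
  setImmediate(() => callback(null, { id, name: `user-${id}` }))
}

// The wrapped version returns a Promise instead of taking a callback.
const getUserAsync = promisify(getUser)

// Steps then chain instead of nesting, with one place to handle errors.
function describeUser(id) {
  return getUserAsync(id)
    .then(user => `${user.id}: ${user.name}`)
    .catch(err => `lookup failed: ${err.message}`)
}
```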
When starting up the trends process will emit errors until it is able to get Twitter credentials of the admin user. These are non-fatal but are disconcerting if you are watching the docker logs, and don't otherwise know that it's ok.
The trends process should just quietly keep trying until it gets the keys it needs.
The DocNow application needs to regularly fetch trends for up to three locations per user. The user's application keys are used to fetch the trends using the trends/place API call. Each API keypair can request trends up to 75 times per 15 minutes (1 request every 12 seconds).
If we assume that we only need a granularity of 1 minute per trending location, we can go through each location and see if it has been updated in the last minute. If it hasn't, get/set the trends and move on to the next location.
I might be wrong but I think that this can be handled in the docnow app itself (using setTimeout) rather than with a separate process that will need to be managed.
Here's a sketch in pseudo-code of what it could possibly look like:
granularity = 1 minute
while True:
    waitTime = 0
    timeStart = now
    foreach location in getLocations():
        if timeStart - location.lastUpdate < granularity:
            if timeStart - location.lastUpdate > waitTime:
                waitTime = timeStart - location.lastUpdate
            continue
        keypair = getKeys(location)
        trends = getTrends(location, keypair)
        saveTrends(trends)
    sleep(waitTime)
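The pseudo-code above could look roughly like this in Node using setTimeout. It is a sketch under a few assumptions: getKeys, getTrends and saveTrends are the hypothetical helpers from the pseudo-code, passed in as arguments, and lastUpdate is a millisecond timestamp. One deliberate difference is that it waits for the time remaining until the next location is due, rather than the time already elapsed.

```javascript
const GRANULARITY = 60 * 1000 // one minute, in milliseconds

// One pass over the locations. Resolves to how long to wait (in ms)
// before the next pass.
async function updateTrends(locations, helpers, now = Date.now()) {
  const { getKeys, getTrends, saveTrends } = helpers
  let waitTime = GRANULARITY
  for (const location of locations) {
    const age = now - location.lastUpdate
    if (age < GRANULARITY) {
      // Still fresh: remember the shortest time until any location is due.
      waitTime = Math.min(waitTime, GRANULARITY - age)
      continue
    }
    const keypair = await getKeys(location)
    const trends = await getTrends(location, keypair)
    await saveTrends(location, trends)
    location.lastUpdate = now
  }
  return waitTime
}

// Re-arm with setTimeout inside the app instead of a separate process.
function startTrendsLoop(getLocations, helpers) {
  const tick = async () => {
    let waitTime = GRANULARITY
    try {
      waitTime = await updateTrends(await getLocations(), helpers)
    } catch (err) {
      // Non-fatal: log it and try again after the default interval.
      console.error('trends update failed:', err.message)
    }
    setTimeout(tick, waitTime)
  }
  tick()
}
```

With a one minute granularity and up to three locations per user, each keypair makes at most three requests a minute, which stays under the limit of one request every 12 seconds.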
The intro paragraph style is not working as it is coded in profile.js as a div. The correct syntax can be found in trends.js but there are if/then statements there. I did not want to attempt it myself and break something :/
FontAwesome is currently being loaded from the Web and really should be part of the application bundle.
Now that ElasticSearch is being used for storing tweets and Twitter users I think it makes sense to move our User, Trends, Places and Settings models from Redis to ElasticSearch. JSON stores more naturally in ElasticSearch without having to manually maintain id indexes. Redis will still be useful for caching and queuing work, so I think it makes sense to not remove Redis support completely.
The design page for the homepage has an Add Location input when the user is logged in. Will it be possible to reuse the component from the Settings page on the homepage?
If it's easy to do as part of this issue it would be good to have the default locations displaying when a user is not logged in.
It would be useful to know what the options are for managing a Docker swarm from Node. I'm wondering if data collection processes (twarc?) could be managed inside a Docker container.
We're looking at creating a set of design components similar to Material Design so that we can drop snippets of HTML/CSS into our various apps, but mostly focused on the DocNow app. Since some might get used in other projects we should put these in a new repository, docnow-design?
To get started let's see if we can create some components required on the settings page.
The twitter.getTrendsAtPlace() function occasionally throws this error. I think it's happening when the quota for requesting trends has been exceeded. The function should check for this error and back off.
Unhandled rejection TypeError: Cannot read property 'locations' of undefined
at /Users/ed/Projects/docnow/src/server/twitter.js:50:19
at runCallback (timers.js:781:20)
at tryOnImmediate (timers.js:743:5)
at processImmediate [as _immediateCallback] (timers.js:714:5)
From previous event:
at /Users/ed/Projects/docnow/src/server/twitter.js:48:12
at Promise (<anonymous>)
at Twitter.getTrendsAtPlace (/Users/ed/Projects/docnow/src/server/twitter.js:45:12)
at Array.map (<anonymous>)
at /Users/ed/Projects/docnow/src/server/db.js:170:43
at <anonymous>
at process._tickCallback (internal/process/next_tick.js:188:7)
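A minimal sketch of what checking and backing off could look like. It assumes the TypeError comes from an empty response (trends/place normally returns an array whose first element has a locations property), which I haven't confirmed. fetchTrends is a stand-in for a call to twitter.getTrendsAtPlace(), and the retry count and delays are arbitrary.

```javascript
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms))

// Retry `fetchTrends` with exponential backoff: baseDelay, then 2x, 4x ...
// Gives up and rethrows the last error after `retries` retries.
async function withBackoff(fetchTrends, { retries = 5, baseDelay = 1000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      const result = await fetchTrends()
      // Treat an empty response as a failure instead of reading
      // result[0].locations and crashing.
      if (!result || !result[0] || !result[0].locations) {
        throw new Error('empty trends response')
      }
      return result
    } catch (err) {
      if (attempt >= retries) throw err
      await sleep(baseDelay * 2 ** attempt)
    }
  }
}
```

The same wrapper would also cover the start up case where the trends process has no Twitter credentials yet: it just keeps trying quietly until the call succeeds or the retries run out.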
When you launch the app on AWS using our ansible repo you are met with this error when trying to set up the app:
Error: Desktop applications only support the oauth_callback value 'oob'
at Strategy.parseErrorResponse (/code/node_modules/passport-twitter/lib/strategy.js:206:12)
at Strategy.OAuthStrategy._createOAuthError (/code/node_modules/passport-oauth1/lib/strategy.js:393:16)
at /code/node_modules/passport-oauth1/lib/strategy.js:244:41
at /code/node_modules/oauth/lib/oauth.js:543:17
at passBackControl (/code/node_modules/oauth/lib/oauth.js:397:13)
at IncomingMessage.<anonymous> (/code/node_modules/oauth/lib/oauth.js:409:9)
at emitNone (events.js:110:20)
at IncomingMessage.emit (events.js:207:7)
at endReadableNT (_stream_readable.js:1045:12)
at _combinedTickCallback (internal/process/next_tick.js:138:11)
at process._tickCallback (internal/process/next_tick.js:180:9)
The steps to reproduce are:
Clone https://github.com/DocNow/docnow-ansible
Run the playbook as described, then enter the Twitter API key/secret.
The path in the error above isn't anywhere obvious that I can find.
After introducing https:// for the demo.docnow.io application, authentication via Twitter breaks with an internal server error.
search results page should show number of favorites/replies on RTs
Search should be renamed to Explore. This should happen at least in the UI, and perhaps (for sanity) in the client/server side code.
We need to bring the app's profile page more in line with the original design. The idea is to add a profile component to the docnow-design repo, and then we'll move it over to docnow/docnow.
We should be running demo.docnow.io under SSL/https.
It would be useful if the server pushed updates to the client when things change, rather than having the client poll for changes. I think this will be more important as we start building the search dashboard.
I ran across a pattern for using websockets with redux that looked quite simple & elegant. I'm going to try to get it working on the trends page.
It would be useful to have Travis run the tests. The Twitter keys used during testing will need to be encrypted.
docker-compose up throws a lot of errors, and ends up not serving. It appears that the db server failed to start.
docker-compose 1.8.1 on Ubuntu 16.04
...
webpack_1 | npm info lifecycle [email protected]~postclean: [email protected]
webpack_1 | npm info ok
webpack_1 | Project is running at http://0.0.0.0:3000/
webpack_1 | webpack output is served from http://0.0.0.0:3000/assets/bundles/
webpack_1 | (node:38) DeprecationWarning: loaderUtils.parseQuery() received a non-string value which can be problematic, see https://github.com/webpack/loader-utils/issues/56
webpack_1 | parseQuery() will be replaced with getOptions() in the next major version of loader-utils.
db_1 | creating configuration files ... ok
django_1 | Traceback (most recent call last):
django_1 | File "/usr/local/lib/python3.6/site-packages/django/db/backends/base/base.py", line 199, in ensure_connection
django_1 | self.connect()
django_1 | File "/usr/local/lib/python3.6/site-packages/django/db/backends/base/base.py", line 171, in connect
django_1 | self.connection = self.get_new_connection(conn_params)
django_1 | File "/usr/local/lib/python3.6/site-packages/django/db/backends/postgresql/base.py", line 176, in get_new_connection
django_1 | connection = Database.connect(**conn_params)
django_1 | File "/usr/local/lib/python3.6/site-packages/psycopg2/__init__.py", line 130, in connect
django_1 | conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
django_1 | psycopg2.OperationalError: could not connect to server: Connection refused
django_1 | Is the server running on host "db" (172.18.0.5) and accepting
django_1 | TCP/IP connections on port 5432?
django_1 |
django_1 |
django_1 | The above exception was the direct cause of the following exception:
django_1 |
django_1 | Traceback (most recent call last):
django_1 | File "/usr/local/bin/django-admin", line 11, in <module>
django_1 | sys.exit(execute_from_command_line())
django_1 | File "/usr/local/lib/python3.6/site-packages/django/core/management/__init__.py", line 367, in execute_from_command_line
django_1 | utility.execute()
django_1 | File "/usr/local/lib/python3.6/site-packages/django/core/management/__init__.py", line 359, in execute
django_1 | self.fetch_command(subcommand).run_from_argv(self.argv)
django_1 | File "/usr/local/lib/python3.6/site-packages/django/core/management/base.py", line 294, in run_from_argv
django_1 | self.execute(*args, **cmd_options)
django_1 | File "/usr/local/lib/python3.6/site-packages/django/core/management/base.py", line 345, in execute
django_1 | output = self.handle(*args, **options)
django_1 | File "/usr/local/lib/python3.6/site-packages/django/core/management/commands/migrate.py", line 83, in handle
django_1 | executor = MigrationExecutor(connection, self.migration_progress_callback)
django_1 | File "/usr/local/lib/python3.6/site-packages/django/db/migrations/executor.py", line 20, in __init__
django_1 | self.loader = MigrationLoader(self.connection)
django_1 | File "/usr/local/lib/python3.6/site-packages/django/db/migrations/loader.py", line 52, in __init__
django_1 | self.build_graph()
django_1 | File "/usr/local/lib/python3.6/site-packages/django/db/migrations/loader.py", line 203, in build_graph
django_1 | self.applied_migrations = recorder.applied_migrations()
django_1 | File "/usr/local/lib/python3.6/site-packages/django/db/migrations/recorder.py", line 65, in applied_migrations
django_1 | self.ensure_schema()
django_1 | File "/usr/local/lib/python3.6/site-packages/django/db/migrations/recorder.py", line 52, in ensure_schema
django_1 | if self.Migration._meta.db_table in self.connection.introspection.table_names(self.connection.cursor()):
django_1 | File "/usr/local/lib/python3.6/site-packages/django/db/backends/base/base.py", line 231, in cursor
django_1 | cursor = self.make_debug_cursor(self._cursor())
django_1 | File "/usr/local/lib/python3.6/site-packages/django/db/backends/base/base.py", line 204, in _cursor
django_1 | self.ensure_connection()
django_1 | File "/usr/local/lib/python3.6/site-packages/django/db/backends/base/base.py", line 199, in ensure_connection
django_1 | self.connect()
django_1 | File "/usr/local/lib/python3.6/site-packages/django/db/utils.py", line 94, in __exit__
django_1 | six.reraise(dj_exc_type, dj_exc_value, traceback)
django_1 | File "/usr/local/lib/python3.6/site-packages/django/utils/six.py", line 685, in reraise
django_1 | raise value.with_traceback(tb)
django_1 | File "/usr/local/lib/python3.6/site-packages/django/db/backends/base/base.py", line 199, in ensure_connection
django_1 | self.connect()
django_1 | File "/usr/local/lib/python3.6/site-packages/django/db/backends/base/base.py", line 171, in connect
django_1 | self.connection = self.get_new_connection(conn_params)
django_1 | File "/usr/local/lib/python3.6/site-packages/django/db/backends/postgresql/base.py", line 176, in get_new_connection
django_1 | connection = Database.connect(**conn_params)
django_1 | File "/usr/local/lib/python3.6/site-packages/psycopg2/__init__.py", line 130, in connect
django_1 | conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
django_1 | django.db.utils.OperationalError: could not connect to server: Connection refused
django_1 | Is the server running on host "db" (172.18.0.5) and accepting
django_1 | TCP/IP connections on port 5432?
django_1 |
db_1 | running bootstrap script ... ok
db_1 | performing post-bootstrap initialization ... ok
trends_1 | ERROR:root:could not connect to server: Connection refused
trends_1 | Is the server running on host "db" (172.18.0.5) and accepting
trends_1 | TCP/IP connections on port 5432?
trends_1 |
db_1 | syncing data to disk ... ok
db_1 |
db_1 | Success. You can now start the database server using:
db_1 |
db_1 | pg_ctl -D /var/lib/postgresql/data -l logfile start
db_1 |
db_1 |
db_1 | WARNING: enabling "trust" authentication for local connections
db_1 | You can change this by editing pg_hba.conf or using the option -A, or
db_1 | --auth-local and --auth-host, the next time you run initdb.
db_1 | ****************************************************
db_1 | WARNING: No password has been set for the database.
db_1 | This will allow anyone with access to the
db_1 | Postgres port to access your database. In
db_1 | Docker's default configuration, this is
db_1 | effectively any other container on the same
db_1 | system.
db_1 |
db_1 | Use "-e POSTGRES_PASSWORD=password" to set
db_1 | it in "docker run".
db_1 | ****************************************************
db_1 | waiting for server to start....LOG: database system was shut down at 2017-03-24 18:16:36 UTC
db_1 | LOG: MultiXact member wraparound protections are now enabled
db_1 | LOG: database system is ready to accept connections
db_1 | LOG: autovacuum launcher started
db_1 | done
db_1 | server started
db_1 | ALTER ROLE
db_1 |
db_1 |
db_1 | /docker-entrypoint.sh: ignoring /docker-entrypoint-initdb.d/*
db_1 |
db_1 | LOG: received fast shutdown request
db_1 | LOG: aborting any active transactions
db_1 | waiting for server to shut down...LOG: autovacuum launcher shutting down
db_1 | .LOG: shutting down
db_1 | LOG: database system is shut down
db_1 | done
db_1 | server stopped
db_1 |
db_1 | PostgreSQL init process complete; ready for start up.
db_1 |
db_1 | LOG: database system was shut down at 2017-03-24 18:16:43 UTC
db_1 | LOG: MultiXact member wraparound protections are now enabled
db_1 | LOG: database system is ready to accept connections
db_1 | LOG: autovacuum launcher started
db_1 | ERROR: relation "docnow_user" does not exist at character 302
db_1 | STATEMENT: SELECT "docnow_user"."id", "docnow_user"."password", "docnow_user"."last_login", "docnow_user"."is_superuser", "docnow_user"."username", "docnow_user"."first_name", "docnow_user"."last_name", "docnow_user"."email", "docnow_user"."is_staff", "docnow_user"."is_active", "docnow_user"."date_joined" FROM "docnow_user" WHERE "docnow_user"."is_superuser" = true
trends_1 | ERROR:root:relation "docnow_user" does not exist
trends_1 | LINE 1: ...er"."is_active", "docnow_user"."date_joined" FROM "docnow_us...
trends_1 | ^
trends_1 |
(the "relation docnow_user does not exist" error repeats many many times)
I think it would be useful to add a logger to the server side code, so that we can keep track of asynchronous things like fetching trends without console.log and the resulting warnings from eslint.
It would also be nice to see the requests to the server. I've seen a bunch of projects using morgan but I have no idea what it does really.
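For what it's worth, morgan is HTTP request logging middleware for Express. The sketch below is not morgan's code, just a hand-rolled illustration of what that kind of middleware does; the function name and the log format are made up.

```javascript
// Returns Express-style middleware that logs one line per request.
// `log` defaults to console.log so a real logger could be swapped in.
function requestLogger(log = console.log) {
  return (req, res, next) => {
    const start = Date.now()
    // 'finish' is emitted once the response has been sent.
    res.on('finish', () => {
      const elapsed = Date.now() - start
      log(`${req.method} ${req.url} ${res.statusCode} ${elapsed}ms`)
    })
    next()
  }
}
```

It would be registered with something like app.use(requestLogger()) before the routes, and passing in the server side logger's info function would keep request lines and the trends fetching messages in one place.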
Twitter allows us to link our app to a privacy policy. I think it's important for us to have one before inviting people to use our app.
I think a good starting place would be modeling a policy based on what the Lentil folks did for getting access to the Instagram API.
The current settings should be adapted to look and function more like @alexandradm's mockups.
I think that the user management piece can wait till a later iteration but for now it should allow you to do a sequence of things:
3-5 are the most important things to do for now. And they can only be done in sequence: you can't set locations until the user has logged in and you have their API access keys, and they can't login until the application keys have been set. So it's actually kind of complicated :-)
When logging in for the first time you get redirected to Twitter and back OK. But the header component does not show you as being logged in. Reloading has no effect. Clicking login again works. I think there must be a problem with the way the initial login works when a new user has arrived at the application.
The console of the server displays this, which hints at the problem:
Error: Can't set headers after they are sent.
at validateHeader (_http_outgoing.js:504:11)
at ServerResponse.setHeader (_http_outgoing.js:511:3)
at ServerResponse.header (/Users/ed/Projects/docnow/node_modules/express/lib/response.js:730:10)
at ServerResponse.location (/Users/ed/Projects/docnow/node_modules/express/lib/response.js:847:15)
at ServerResponse.redirect (/Users/ed/Projects/docnow/node_modules/express/lib/response.js:885:18)
at /Users/ed/Projects/docnow/src/server/auth.js:64:9
at Layer.handle [as handle_request] (/Users/ed/Projects/docnow/node_modules/express/lib/router/layer.js:95:5)
at next (/Users/ed/Projects/docnow/node_modules/express/lib/router/route.js:137:13)
at complete (/Users/ed/Projects/docnow/node_modules/passport/lib/middleware/authenticate.js:250:13)
at /Users/ed/Projects/docnow/node_modules/passport/lib/middleware/authenticate.js:257:15
at pass (/Users/ed/Projects/docnow/node_modules/passport/lib/authenticator.js:421:14)
at Authenticator.transformAuthInfo (/Users/ed/Projects/docnow/node_modules/passport/lib/authenticator.js:443:5)
at /Users/ed/Projects/docnow/node_modules/passport/lib/middleware/authenticate.js:254:22
at /Users/ed/Projects/docnow/node_modules/passport/lib/http/request.js:60:7
at pass (/Users/ed/Projects/docnow/node_modules/passport/lib/authenticator.js:267:43)
at serialized (/Users/ed/Projects/docnow/node_modules/passport/lib/authenticator.js:276:7)
There is a branch for adding an ansible deployment for diffengine. I think we settled on using a separate repository diffengine_ansible instead. Let's create the new repository diffengine_ansible and extract the code there.
It would be useful if tweets with videos allowed playback when they are embedded using Twitter's embed JavaScript library. This would allow users to view the content in the DocNow app without having to leave and go somewhere else.
The media card on the search sample should be split into video and photos and should show some things!
django_1 | [10/Apr/2017 16:20:01] "GET /favicon.ico HTTP/1.1" 404 2690
django_1 | [10/Apr/2017 16:20:06] "GET /api/v1/trends HTTP/1.1" 200 3185
django_1 | [10/Apr/2017 16:20:11] "GET /api/v1/trends HTTP/1.1" 200 3185
django_1 | [10/Apr/2017 16:20:16] "GET /api/v1/trends HTTP/1.1" 200 3185
django_1 | [10/Apr/2017 16:20:21] "GET /api/v1/trends HTTP/1.1" 200 3185
django_1 | [10/Apr/2017 16:20:26] "GET /api/v1/trends HTTP/1.1" 200 3185
django_1 | [10/Apr/2017 16:20:31] "GET /api/v1/trends HTTP/1.1" 200 3185
trends_1 | ERROR:root:list index out of range
happens repeatedly, cycling through build
Now that there is demo.docnow.io it would be useful to have an ansible setup that will deploy the docnow app on AWS. At the moment this will include:
If it makes sense I think it could be useful to try to install nginx and docnow/docnow as Docker containers using AWS Elastic Container Service. But I'm definitely open to other ways of doing it.
One thing to bear in mind is that we'll probably be adding AWS ElasticSearch to the mix sometime soon, but not yet.
One thing I noticed is that for the docnow app to work behind nginx as a proxy you need to set up the proxy like this:
location / {
    proxy_pass http://127.0.0.1:3000;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header X-Forwarded-Host $host:$server_port;
}
Otherwise the Twitter authentication redirects to localhost instead of the hostname of the proxy server.
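To show why those two headers matter, here is a sketch of how a callback URL could be derived from them. I haven't checked how the app or passport actually builds the URL, so the function name, the fallback host and the /auth/twitter/callback path are all assumptions.

```javascript
// Build the OAuth callback URL from the proxy headers, falling back to
// the address the app itself listens on. Node lower-cases header names.
function callbackUrl(headers, fallbackHost = 'localhost:3000') {
  const proto = headers['x-forwarded-proto'] || 'http'
  const host = headers['x-forwarded-host'] || headers.host || fallbackHost
  return `${proto}://${host}/auth/twitter/callback`
}
```

Without the forwarded headers the function only ever sees the proxied request to 127.0.0.1:3000, which matches the redirect-to-localhost behaviour described above.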