catalystcode / project-fortis-pipeline

Project Fortis is a data ingestion, analysis and visualization pipeline.

License: Apache License 2.0

Shell 96.71% Python 3.29%
fortis acs-containers spark-streaming cassandra kubernetes-setup azure deis

project-fortis-pipeline's Introduction

This repository is outdated and was migrated to project-fortis.




Deploy Fortis Pipeline

Deploy your own Fortis pipeline to an Azure subscription with a single click.

Deploy to Azure

Pipeline Architecture

A fully containerized real-time Spark pipeline running on Kubernetes.

Background

Project Fortis is a data ingestion, analysis and visualization pipeline. The Fortis pipeline collects social media conversations and postings from the public web and darknet data sources.

Related Repositories

Documentation

Demo Videos

Deployment Prerequisites

  • First and foremost, you'll need an existing Azure subscription. You can create one for free here.
  • Generate a public/private SSH key pair following these instructions. The contents of the generated MyKey.pub file will be used for the SSH Public Key field.
  • You'll need an existing Azure service principal. You can follow these instructions if you need to generate a new service principal. Your service principal's Application ID will be used for the Service Principal App ID field, and the Authentication Key will be used for the Service Principal App Key field.

Fortis Monitored Data Sources

  • Public Web - Bing
  • Reddit
  • Twitter
  • Facebook
  • Instagram
  • Radio Broadcasts
  • ACLED

Site Types

  • The site type selection determines which default topics, public sites and Facebook pages are auto-generated for your site as part of the deployment process.
  • Available site types
    • Humanitarian
    • Climate Change
    • Health

Post Deployment Instructions

  • Grab a large cup of coffee as the deployment can take north of an hour to complete.

  • Once the deployment has finished, click the Manage your resources link. [Screenshot: ARM template after a successful deployment, with the management link to the newly created resource group highlighted]

  • Select the Tags tab in the Azure Portal, then point your browser to the site at the FORTIS_ADMIN_INTERFACE_URL. [Screenshot: Azure portal with the Fortis admin site URL highlighted in the tags]

  • In the Fortis admin portal, you can now finalize the setup of your Fortis deployment. Once you've completed all the admin configuration, your deployment is ready to be used. [Screenshot: the Fortis admin interface]

project-fortis-pipeline's People

Contributors

c-w, erikschlegel, jcjimenez, kevinhartman, nathanielrose, smarker, xtophs

project-fortis-pipeline's Issues

No PowerBI export

Customers are asking for export to PowerBI to build custom dashboards or to combine Fortis data with other data sources.

Can't reset map to configured geofence

After zooming into the map, the only way to restore the initial view is to refresh the page. Refreshing the page loses all other filter settings as well.

Would be great to have a way to reset the map to the configured area.

Admin site does not display configured event sources

Fortis V1 displayed lists of configured Facebook and Twitter accounts in great detail.

V2 only displays "Twitter" and "Facebook". The admin page gives no details about which accounts are configured.

This makes it very hard to verify and troubleshoot for users who don't know how to query the configuration from the database.

Improve post-deployment site configuration experience

First run experience must be performable by a novice user without knowledge of low level packages like Cassandra.

Some ideas:

  • Wizard driven config for sources
  • Step-by-step wizard for all required config steps (search terms, geo fence, sources)
  • import cloned config and customize from existing config #178
  • Search based definition of geo fence, i.e. type location name and system will look up geo fence CatalystCode/project-fortis@cfe7afb
  • shared config library, i.e. new sites present a library of configs made available by other users #178
  • curated configs made available from Microsoft or crowd sourcing #178

Deployment Tool does not display site URL or admin URL

Per docs at https://github.com/CatalystCode/project-fortis-pipeline#post-deployment-instructions user is expected to go into the Azure portal, navigate to non-intuitive fields resourcegroup->deployments->[deploymentname] and look up a field called FORTISSITEADMINURL.

  1. In most of my attempts, this field is empty.
  2. The step to retrieve the URL is hard to perform even for an IT admin familiar with Azure. It would be great to display that URL on the page that launched the deployment once the deployment has finished successfully.

Site doesn't come up in IE

Site sits there with loading message. No errors in F12

If IE is not supported, then we should display an "unsupported browser" page.

No progress indicator or error messages when page starts

Page sits at "loading [sitename] watcher..." with no indication that anything is happening. This is confusing over slower connections because the user doesn't know that the site is making progress loading.
The page also does not display an error when something goes wrong and the page does not load, leaving the user wondering whether loading will ever complete.

Setup only allows a single geofence to filter events

Customers may want finer-grained configuration options, beyond a single rectangle, to define the desired origin of events. Combined with the lack of site cloning, configuring a site for non-contiguous geographies is too hard.
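
As a rough illustration of the request, the sketch below filters an event location against several rectangles instead of one. `BoundingBox` and `in_any_geofence` are hypothetical names, not part of Fortis, and the rectangles are assumed not to cross the antimeridian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned rectangle in degrees (hypothetical stand-in for a geofence)."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat, lon):
        # Assumes the box does not cross the antimeridian (west <= east).
        return self.south <= lat <= self.north and self.west <= lon <= self.east


def in_any_geofence(lat, lon, fences):
    """Return True if the point falls inside at least one configured rectangle."""
    return any(fence.contains(lat, lon) for fence in fences)


# Two non-contiguous areas configured for the same site.
fences = [BoundingBox(0, 0, 10, 10), BoundingBox(40, -80, 45, -70)]
print(in_any_geofence(42, -75, fences))
```

An event is kept when any rectangle contains it, so non-contiguous geographies no longer need separate sites.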

Missing Admin Audit capabilities

There does not seem to be a capability to see audit logs of modifications.

Would be great for debugging and possible legal reasons if user account and originating IP address for config changes were recorded and audit logs were available.
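
A minimal sketch of what such an audit record could capture, assuming an append-only list of JSON lines as the log. The field names and the `record_change` helper are hypothetical. Fortis's actual storage layer is not shown here.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    """One configuration change (all field names are illustrative)."""
    user: str
    source_ip: str
    setting: str
    old_value: str
    new_value: str
    timestamp: str


def record_change(log, user, source_ip, setting, old_value, new_value, now=None):
    """Append a JSON-encoded audit entry to `log` and return the entry."""
    if now is None:
        now = datetime.now(timezone.utc)
    entry = AuditEntry(user, source_ip, setting, old_value, new_value,
                       now.isoformat())
    log.append(json.dumps(asdict(entry), sort_keys=True))
    return entry


audit_log = []
record_change(audit_log, "alice", "203.0.113.7", "geofence", "old", "new")
print(audit_log[0])
```

Recording the timestamp in UTC keeps entries comparable across admins in different time zones.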

Can't reset UI after changing timeline filter

Repro steps:

  • Navigate to the site.
  • Change the timeline filter to narrow down the time range.

Clicking on Reset does not restore timeline to setting selected by the dropdown in the upper right corner.

Suggestion:

  • Remove the Reset button, or
  • Rename it to reflect what it does
  • Move the interval dropdown and timeline widget closer together, since the former controls the scope of the latter.

Credentials displayed in clear text

Credentials for Microsoft and third-party services (Facebook, Twitter) are displayed in full, in clear text, on the admin site.

To protect against credential leaks, applications typically obfuscate tokens and credentials in the UI. Some display partial credentials, others don't display them at all.
(examples: GitHub, Azure Portal, Amazon, Google Cloud)
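
One common way to do this is to show only the last few characters of a secret. The sketch below assumes a hypothetical `mask_secret` helper. It is not taken from the Fortis code base.

```python
def mask_secret(secret, visible=4, mask_char="*"):
    """Hide all but the last `visible` characters of a secret.

    Secrets no longer than `visible` are masked entirely so that short
    values are never shown in full.
    """
    if not secret:
        return ""
    if len(secret) <= visible:
        return mask_char * len(secret)
    return mask_char * (len(secret) - visible) + secret[-visible:]


print(mask_secret("abcd1234efgh"))  # prints ********efgh
```

Masking should happen on the server before the value reaches the browser, otherwise the full credential is still visible in the page source or network traffic.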

Timeline widget uses weird start dates

The timeline chart uses a Sat - Sat range for "Last Week" with seemingly arbitrary start and end times.
The first day of the week should come from the user's first-day-of-week setting.
The weird start time may be due to timezone conversions.
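
To make the expected behavior concrete, here is a small sketch that derives the "Last Week" range from a configurable first day of the week. It assumes "Last Week" means the previous full calendar week in the user's local calendar. `last_week_range` is a hypothetical helper, and the returned datetimes are naive (no timezone attached).

```python
from datetime import date, datetime, time, timedelta


def last_week_range(today, first_day_of_week=0):
    """Return (start, end) datetimes for the last full week before `today`.

    first_day_of_week follows date.weekday(): 0 is Monday, 6 is Sunday.
    The range is half-open: start is inclusive, end is exclusive.
    """
    days_since_week_start = (today.weekday() - first_day_of_week) % 7
    this_week_start = today - timedelta(days=days_since_week_start)
    last_week_start = this_week_start - timedelta(days=7)
    return (datetime.combine(last_week_start, time.min),
            datetime.combine(this_week_start, time.min))


# 2017-11-15 was a Wednesday; with Monday as the first day of the week the
# previous week runs from Monday 2017-11-06 up to (not including) 2017-11-13.
start, end = last_week_range(date(2017, 11, 15), first_day_of_week=0)
print(start, end)
```

Computing the boundaries at local midnight first, and converting to UTC only when querying, avoids the arbitrary-looking start times that appear when the conversion happens in the other order.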

Term Filters: language inconsistent

  • open site
  • change language to en
  • type gunfire in filter list and wait for UI to change

The Top 5 terms list still contains some Spanish items.

VM SKU filter in ARM template is not consistent with nodeSelector in helm chart

The ARM template allows all SKUs with local SSD, i.e. the full range of DS, L, GS, etc. VM sizes. The Cassandra Helm chart only deploys Cassandra on a single VM SKU (L4s), see https://github.com/erikschlegel/charts/blob/spark-localssd/incubator/cassandra/templates/statefulset.yaml#L30 and https://github.com/erikschlegel/charts/blob/spark-localssd/incubator/cassandra/values.yaml#L10.

In clusters with any VM size other than L4s, the Azure deployment will finish successfully, i.e. the user thinks everything is up and running, but Fortis is not deployed because Kubernetes holds back the Cassandra install until L4s nodes join the cluster.

Possible remedies:

  • remove the nodeSelector for Cassandra from the helm chart
  • restrict ARM template to L4S deployment only

Also, consider a Custom Script Extension as part of the template that will only finish successfully when Fortis is up and running, to avoid customer confusion when the deployment reports success but Fortis is not actually up and running.

Item Pop up does not display images

Often, posts and articles include a picture that's very important to understand the item.

The image is not displayed. Neither is the link to the image clickable. A user has to copy the link, open another browser window and then paste the link to fully understand what an item was about.

Local Environment Health-Check Scripts & ReadMe

Create a health-check script for Windows & Linux that verifies that all Fortis dependencies are properly installed.

Script should include the following criteria

  1. java & javac health check for v1.8 or greater and OpenJDK version.
  2. mvn check for v3.0 or greater.
  3. node & npm check for v5.0 and v6.0 respectively.
  4. scala check for v2.0 or greater.
  5. sbt check for v0.13 or greater.
  6. cassandra check for v3.0 or greater.
  7. spark path check and verify v2.7 or greater.
  8. kubectl check.
  9. helm validation.
  10. kube config: validate that it exists and is non-empty.
  11. Redirect additional issues to Fortis Docs for guidance.
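
The version checks in the list above could be sketched roughly as follows. This is only an outline under assumptions: it covers just three of the tools, the `--version` commands are assumed to print a dotted version number, and all function names are hypothetical.

```python
import re
import shutil
import subprocess

# Minimum versions come from the list above. The commands used to query each
# tool are assumptions and may need adjusting per platform.
REQUIREMENTS = {
    "mvn": (["mvn", "--version"], (3, 0)),
    "node": (["node", "--version"], (5, 0)),
    "npm": (["npm", "--version"], (6, 0)),
}


def parse_version(text):
    """Return the first dotted version in text as a tuple of ints, or None."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(found, minimum):
    """Compare version tuples, so (3, 5, 2) satisfies a minimum of (3, 0)."""
    return found is not None and found >= minimum


def check_tool(name, command, minimum):
    """Return (ok, message) for a single dependency."""
    executable = shutil.which(command[0])
    if executable is None:
        return False, f"{name}: not found on PATH"
    result = subprocess.run([executable] + command[1:],
                            capture_output=True, text=True)
    found = parse_version(result.stdout + result.stderr)
    if not meets_minimum(found, minimum):
        return False, f"{name}: found {found}, need at least {minimum}"
    return True, f"{name}: ok {found}"


def run_checks(requirements=REQUIREMENTS):
    """Check every requirement and return a list of (ok, message) pairs."""
    return [check_tool(name, command, minimum)
            for name, (command, minimum) in requirements.items()]
```

Calling `run_checks()` and printing each message gives a pass/fail report. Because `shutil.which` and `subprocess.run` behave the same on Windows and Linux, one script could serve both platforms, though tool-specific output formats would still need verifying.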

Filters input fails intermittently

  • Pick a different filter in the filters input (rafaga, captura)
  • either hit enter or tab out

Sometimes it works and sometimes it doesn't

No easy way to clone a site.

There does not seem to be an easy way to retrieve config settings for a deployed fortis site.

Would be greatly helpful to retrieve site settings and deploy another site with the same settings as a starting point for a new site.

Example: UNDP may want to deploy sites with the same event stream configurations for different geographies. It appears that today every site needs to be set up from scratch, or it requires a data export / import from the underlying database, which a casual user probably cannot perform.
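
A minimal sketch of the export-and-clone flow being requested, assuming site settings can be represented as a JSON-serializable dictionary. The keys (`name`, `geofence`, `terms`) and both helper functions are hypothetical and do not reflect the actual Fortis schema.

```python
import copy
import json


def export_site_config(config):
    """Serialize a site's settings so they can be saved or shared."""
    return json.dumps(config, indent=2, sort_keys=True)


def clone_site_config(exported, overrides):
    """Build a new site's settings from an export plus site-specific overrides."""
    cloned = json.loads(exported)
    cloned.update(copy.deepcopy(overrides))
    return cloned


base = {"name": "site-a", "geofence": [1, 2, 3, 4], "terms": ["flood"]}
new_site = clone_site_config(export_site_config(base),
                             {"name": "site-b", "geofence": [5, 6, 7, 8]})
print(new_site["terms"])  # the event stream settings carry over unchanged
```

The new site keeps the shared event stream settings and only replaces what differs per geography.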

Timeline pop up missing values

Popup only displays one value, even though there are 10 lines in the graph.

Might be something specific to the chart widget, but it's confusing to the user.

Pop-Ups are not searchable

  • Expand larger news item from the item feed (pick an RSS-based item).
  • Press CTRL-F to search on the page

Not being able to search makes it very hard to find keywords in a longer text, especially since keywords are not highlighted in the item text.

Time not formatted according to local settings

  • Hover over timeline
  • Wait for pop-up

Time displayed is formatted in ISO format. End users are not very familiar with ISO formatted times. Would be better to read time display format from local settings to also account for customary display in all regions around the world.

Facebook links don't show facebook content

Clicking on the link in the item feed or on the Read Original link for Facebook posts leads to a Facebook page saying that the site is not available.

Maybe get the content from the API instead of redirecting?

No translate button on the Event Details pop-up

The Translate button on the feed widget is great.

However, to understand a longer event text, such as an RSS feed item, users expand the feed item. Unfortunately, the pop-up with the full item text does not allow for translating the text.

Suggestion: Color code Tags on item details

This would be similar to viewing pages in the Google or Bing cache, where tags found in the text are highlighted in different colors.
You could give each tag (and even each location) a different color and then highlight matches accordingly in the text.
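
The idea can be sketched as follows: assign each tag a color from a fixed palette, then wrap every match in the text with a highlighted element. The palette, the `<mark>` markup and the function names are assumptions for illustration. Matching here is a case-insensitive substring match with no word-boundary handling.

```python
import html
import re

PALETTE = ["yellow", "cyan", "lime", "orange", "pink"]


def assign_colors(tags):
    """Give each tag a color, cycling through the palette if needed."""
    return {tag: PALETTE[i % len(PALETTE)] for i, tag in enumerate(tags)}


def highlight(text, tags):
    """Return HTML where every tag occurrence is wrapped in a colored <mark>."""
    tags = [tag for tag in tags if tag]
    if not tags:
        return html.escape(text)
    lookup = {tag.lower(): color for tag, color in assign_colors(tags).items()}
    # Longest tags first so "climate change" wins over "climate".
    ordered = sorted(tags, key=len, reverse=True)
    pattern = re.compile("(" + "|".join(re.escape(t) for t in ordered) + ")",
                         re.IGNORECASE)
    parts = []
    for piece in pattern.split(text):
        color = lookup.get(piece.lower())
        if color:
            parts.append(f'<mark style="background:{color}">'
                         f'{html.escape(piece)}</mark>')
        else:
            parts.append(html.escape(piece))
    return "".join(parts)


print(highlight("Gunfire near the bridge", ["gunfire", "bridge"]))
```

Using the same tag-to-color mapping for the tag chips and the text highlights lets a reader see at a glance which tag matched where.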

No support for sharing dashboard views.

Dashboards do not allow easy sharing of the current filter settings, either through a simple Share button in the UI or by copying and pasting a URL that carries all the filter settings.

Users are used to being able to share a particular view from apps like Google Maps, Amazon, GitHub, etc.
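
One way to support this is to encode the filter state in the URL query string. The sketch below shows the round trip under assumptions: the base URL and the filter names (`lang`, `terms`, `zoom`) are made up, and both helpers are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_share_url(base_url, filters):
    """Append the current filter settings to base_url as a query string."""
    query = urlencode(sorted(filters.items()), doseq=True)
    return f"{base_url}?{query}" if query else base_url


def parse_share_url(url):
    """Recover filter settings from a shared URL.

    Repeated keys come back as lists and single keys as plain strings, so a
    one-element list does not round-trip as a list.
    """
    parsed = parse_qs(urlsplit(url).query)
    return {key: values if len(values) > 1 else values[0]
            for key, values in parsed.items()}


filters = {"terms": ["gunfire", "flood"], "lang": "en", "zoom": "8"}
url = build_share_url("https://example.com/site/demo", filters)
print(url)
print(parse_share_url(url) == filters)
```

Keeping the whole view in the URL also means a page refresh no longer loses the filter settings.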
