gchq / gaffer-docker

Gaffer Docker images and associated Helm charts for deploying on Kubernetes

Home Page: https://gchq.github.io/gaffer-docker

License: Apache License 2.0

Shell 26.85% Dockerfile 11.71% Java 2.52% Mustache 3.76% JavaScript 19.23% Pug 22.18% Jupyter Notebook 8.41% Python 4.64% Smarty 0.70%
gaffer hdfs accumulo docker helm

gaffer-docker's People

Contributors

ahmetkaftan, b956022, c95560, cn337131, ctas582, cybermaggedon, d47853, gchqdeveloper314, github-actions[bot], l46978, lb324567, m29827, m316257, p013570, p29876, p3430233, r32575, rocky341, t616178, t92549, tb06904

gaffer-docker's Issues

Add autoscaling to Helm charts

Gaffer should scale up when HDFS starts filling up and scale down when it becomes under-utilised. A nice-to-have feature would be the ability to do this on a schedule as well, so that Gaffer can in effect shut down at night to reduce resource usage and costs when deployed on a Kubernetes cluster.

Rename Gaffer Wildfly to Gaffer REST

Rename Gaffer Wildfly to Gaffer REST so that in the future, if we need to, we can switch out the web container.

All references to "gaffer-wildfly" should be updated to "gaffer-rest".

Add copyright headers to everything

Add Crown Copyright headers to everything and add a script so that we can update them every year. There are examples of this in other Gaffer projects.

Allow users to add their own Operations to their Gaffer deployment

Users often want to add their own operations (or even existing operations that haven't been enabled by default). To do this, the operations would have to be added to the classpath of the REST service (if the operations are non-standard). They would also need to be configured into the Gaffer instance by using an operation declarations file.

Improve version update script to exclude dependency versions

The version update script (./cd/updateAppVersion.sh) performs a find and replace on the Helm charts and app_version file. This is fine while we're on 0.x versions and the dependencies are all on versions greater than 0.x, however when the project matures and a version 1 is released there could be issues with conflicting versions (such as an app version getting updated accidentally). Improve the script's find and replace function so that only the versions that we want to update in the script are updated.

One idea is to put a comment on the lines that the script should update:

...
version: 0.5.0 # managed version
dependencies:
- name: zookeeper
  version: 2.1.1

Then only update the lines in the charts that have the # managed version comment.
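The idea above could be sketched roughly as follows. This is an illustration in Python rather than a change to the existing shell script; the function name `update_managed_versions` and the exact marker matching are assumptions, not part of the project.

```python
import re

# Marker comment proposed in the issue; only lines ending with it are touched.
MANAGED_MARKER = "# managed version"
# Matches "version: <value>" so that only the value is swapped out.
VERSION_PATTERN = re.compile(r"(version:\s*)(\S+)")


def update_managed_versions(chart_text, new_version):
    """Replace the version only on lines carrying the managed-version marker."""
    updated_lines = []
    for line in chart_text.splitlines():
        if line.rstrip().endswith(MANAGED_MARKER):
            line = VERSION_PATTERN.sub(
                lambda match: match.group(1) + new_version, line, count=1
            )
        updated_lines.append(line)
    return "\n".join(updated_lines)


chart = """version: 0.5.0 # managed version
dependencies:
- name: zookeeper
  version: 2.1.1"""

print(update_managed_versions(chart, "0.6.0"))
```

With this input, the first line becomes `version: 0.6.0 # managed version` while the zookeeper dependency stays at 2.1.1, because its line has no marker.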

Provision Gaffer instance with non-root users

The Accumulo instance deployed by the Gaffer Helm Chart currently only provisions with a root user. Update the Gaffer Helm Chart to allow additional Gaffer/Accumulo users to be created after the Accumulo instance has been initialised.

Allow users to add their own custom functions to deployment

Graph owners may want to add their own custom functions to their Gaffer instance. These should be added to the classpath of the web app and also to the iterators which Accumulo has access to (in the case of functions/predicates/binary operators).

Road Traffic Example

Develop additional containers, a docker-compose file and a Helm chart that deploy the example Road Traffic Gaffer graph.

Gaffer-as-a-Service on AWS

Develop a CloudFormation template that deploys the Gaffer Helm Chart onto an EKS cluster.

Wrap this in another CloudFormation template which spins up an EKS cluster and registers the above template with Service Catalog.

Users would then be able to spin up new Gaffer graphs by 'ordering' one from the Service Catalog and wouldn't have to worry about spinning up an EKS cluster and configuring it.

Update Gaffer version to 1.12.0

Update Gaffer version to 1.12.0. We're currently on 1.11.0, so a search and replace should do the trick. At the same time, update the app version to 0.6.0 by running: ./cd/updateAppVersion.sh 0.6.0

Schema Migration

Add a post-upgrade Helm Chart hook to upgrade schemas and iterators

Add a system diagram

Add a system diagram to show how all the containers interact on a Kubernetes cluster.

Refactor Docker images

Split Docker images up into the following constituent parts:

  • HDFS
  • Accumulo
  • Gaffer REST service

By making the Docker images more modular, we make it easier to deploy to a production environment.

Add a guide for loading in data

A common need will be for users to load in their own data. Add a guide detailing how to do this including:

  • Loading via REST
  • Loading via HDFS
  • Loading via Accumulo import

The guide should cover how to do this on Kubernetes, as well as locally (using docker compose).

Decouple root user password from trace user

The Accumulo container and Gaffer Helm Chart currently assume that the root user is also being used for the trace user, and set the root user password to be the same as the trace user's password. This might not be the case.

Allow the root user password to be specified by:

  • the trace.token.property.password property in accumulo-site.xml
  • another property in accumulo-site.xml
  • environment variable

Add some documentation around the various passwords and user setup

When users get hold of the Gaffer Helm chart, the first thing they do is try to deploy it. It breaks. Why? Because they haven't set up the users and passwords for Accumulo. This can be quite confusing, and it would be handy to have just a line or two saying what they need to do to fix it.

Reduce HDFS container size

The HDFS container image is 1.02GB (uncompressed).

Looking at the layers it appears that the Hadoop files are being committed into the image twice. Once as the distribution tarball and then again once they have been extracted. Although we delete the tarball in the RUN instruction that extracts the files, the tarball has already been committed to the image by the previous COPY instruction. Try moving the extraction into the builder stage and COPY the extracted directory instead.

Remove workarounds for GAFFER-2262

Gaffer v1.12.0 included a bugfix which allows schema directories to be recursive. There are a number of workarounds for this in the Kubernetes Helm charts. These workarounds can be removed now that the Gaffer version is set to 1.12.0.

Add Docker version to README.md

This project should work for most versions of Docker on most operating systems. However it is possible that certain versions are incompatible.

We should add the Docker versions that we as developers use or know to work to a file. This could be done directly in the top level README or in a separate file and linked from the README.

Support Ingresses using a path prefix

The Gaffer Helm Chart deploys, and exposes, a number of web UIs:

  • HDFS Namenode
  • Accumulo Monitor
  • Gaffer REST API
  • Gaffer UI

Unfortunately, none of these work if they are exposed by an Ingress configured to route to them using a path prefix. Work out what is required to make this possible.

The HDFS Namenode UI will work if Hadoop is built using this patch: https://github.com/gchq/gaffer-docker/blob/e5597da9e7d7c4300b4e1e6fe1207a2c0daff1a0/docker/hdfs/files/patches/3.2.1/enable-urls-with-path-prefix.patch

Schema Validation

Add a hook to the Gaffer Helm Chart that validates the schema before deployment. Could also validate that it is possible to migrate to new schemas when there is an upgrade.

Validation shouldn't just check that the schema can be deserialised, but it should check that any classes that the schema refers to are available on the class path.
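A rough sketch of the class-availability check described above, written in Python for brevity. A real hook would more likely run on the JVM so that the classes can actually be loaded. It assumes the schema is JSON in which every referenced Java class appears as the value of a "class" key, and that the set of available class names has already been gathered by some other means. The function names and the class com.example.CustomSum are hypothetical.

```python
import json


def referenced_classes(node):
    """Recursively collect every string stored under a "class" key."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "class" and isinstance(value, str):
                found.add(value)
            else:
                found |= referenced_classes(value)
    elif isinstance(node, list):
        for item in node:
            found |= referenced_classes(item)
    return found


def missing_classes(schema_json, available):
    """Return the referenced class names that are not in `available`."""
    return sorted(referenced_classes(json.loads(schema_json)) - set(available))


# Made-up schema fragment; com.example.CustomSum stands in for a user class.
schema_json = """
{
  "types": {
    "count": {
      "class": "java.lang.Long",
      "aggregateFunction": {"class": "com.example.CustomSum"}
    }
  }
}
"""

print(missing_classes(schema_json, {"java.lang.Long"}))
```

A non-empty result would be a reason for the hook to fail the deployment before the schema reaches the graph.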

REST API can't run custom Operations and Functions

Sometimes Graph owners have custom operations or operations from different libraries that they want to run through their REST API. In order to add these operations / functions, the jar files will need to be added to the classpath of the REST service, as well as the Accumulo lib/ext directory.

Operations have a further complication in that they need to be added to the REST API using an operation declarations file. See the bottom of this page for an example.

AddElementsFromHDFS does not work using Hadoop 3.2.1

Using the latest version of Hadoop causes MethodNotFound errors when using AddElementsFromHDFS. This is because Gaffer is still using an older version of Hadoop (2.6.5).

To address this, revert the Hadoop version to one that's compatible with Gaffer. Add a test to make sure AddElementsFromHDFS works.

Docker run commands

Please can I ask that you update the Gaffer document to provide basic instructions for how a new user can launch gaffer in a docker environment.

When I run Gaffer using:

docker run -d -it -p 8080:8080 --name gaffer gchq/gaffer

The container auto stops. I assume there are some dependent packages or containers I need to run and link in first. However, I do not know what these are.

Add code for releasing to Dockerhub

Add code into cd/deploy.sh for releasing to Dockerhub / updating version control from Travis CI when code is merged into master

When code is merged into master, Travis CI should:

  • Tag the release in Git
  • Push images to dockerhub
  • Update the version on develop

Gaffer Docker upload failing

The Gaffer Docker chart upload is not working because the URL is incorrect. Specifically, it doesn't use a release id as it should (referenced here). You can see all the release assets which are created by making a request to the GitHub API. The deploy script needs to get the id generated when a release is created and use that id when creating assets.
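As an illustration of the fix described above, the sketch below reads the id from a create-release response and builds the asset upload URL from it. It follows the upload endpoint shape documented for the GitHub REST API. The response body, the asset name and the function name `asset_upload_url` are made up for the example, and no network request is made.

```python
import json
from urllib.parse import quote


def asset_upload_url(owner, repo, release_id, asset_name):
    """Build the upload URL for a release asset from the release id."""
    return (
        f"https://uploads.github.com/repos/{owner}/{repo}"
        f"/releases/{release_id}/assets?name={quote(asset_name)}"
    )


# Trimmed, made-up stand-in for the JSON returned when a release is created.
create_release_response = '{"id": 12345, "tag_name": "0.6.0"}'
release_id = json.loads(create_release_response)["id"]

# "gaffer-0.6.0.tgz" is a hypothetical chart archive name.
print(asset_upload_url("gchq", "gaffer-docker", release_id, "gaffer-0.6.0.tgz"))
```

The create-release response also carries an upload_url template, so a deploy script could use that instead of assembling the URL itself.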

Add a guide for deploying on AWS

A guide exists for deploying Gaffer onto a local Kind deployment, but it would be really useful to have one for deploying onto EKS.

Clarify where scripts should be run from

Some scripts in the docs won't work if you run them from the top level directory. Add a line at the top of these docs informing the user of where the scripts should be run from.

Integrate gaffer-docker into Gaffer release process

When a Gaffer release is published, the gaffer-docker image should also be updated to match the new Gaffer version. The updated image should also be pushed up to Docker Hub and tagged with the 'latest' tag.

Add a guide for deploying with your own schema

A pretty common use case will be to stand up a Gaffer graph with a custom schema. However, there are no guides anywhere for how to do that.

Add a guide for deploying a custom schema locally (using docker-compose) and on Kubernetes.

Fix Bad Request when uploading Github Release

As part of the release, we automatically update GitHub with the issues that have been resolved as part of the release. However, this is currently not working. It looks as though the \n characters were replaced with real newline characters, which appears to be the cause.
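To illustrate the suspected cause: a raw newline inside a JSON string is invalid, so a request body assembled by plain string substitution can be rejected. The sketch below uses Python rather than the project's own deploy script, and shows a JSON encoder producing the escaped form. The function name `build_release_body` and the issue titles are hypothetical; `tag_name` and `body` are fields of the GitHub create-release request.

```python
import json


def build_release_body(tag, resolved_issues):
    """Serialise release notes so newlines are escaped as \\n in the JSON."""
    notes = "\n".join(f"- {issue}" for issue in resolved_issues)
    return json.dumps({"tag_name": tag, "body": notes})


payload = build_release_body("0.6.0", ["Issue A", "Issue B"])

# The serialised payload is a single line: the newline between the two
# bullet points is stored as the two characters backslash and n.
print(payload)
```

Decoding the payload again with json.loads gives back the original two-line notes, which is what the API needs to receive.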

Incorporate HDFS patch into docker image

At present, the continuous integration build would fail if dfs.namenode.datanode.registration.ip-hostname-check is set to true in hdfs-site.xml.

Setting it to false is a workaround for a bug in HDFS around DNS, described in this issue.

To fix it, ideally HDFS would adopt this patch. However the patch is over a year old and therefore unlikely to be merged.

Apply this patch to our own version of the hdfs image or find another way to ensure the namenode does not reject the datanodes.

Identify suitable default memory settings

Memory limits need to be configured for:

  • containers in Accumulo Pod specifications
  • the JVM used for each Accumulo component
  • Accumulo's Native Map feature

Identify suitable minimum configuration and update all config files as appropriate.

Rather than requiring users to set all memory settings manually, can we calculate sensible values automatically? For example, for Tablet Servers configured to have 2GB: Kubernetes Pod specs set to request and limit memory usage to 2GB, JVM configured to 1GB, native maps configured to 1GB.
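One possible shape for such a calculation is sketched below. It takes a single Pod memory figure and splits it between the JVM heap and the native maps. The even split mirrors the 2GB example above. The function name, the key names and the absence of any headroom for other JVM memory are assumptions that would need tuning against real deployments.

```python
def tablet_server_memory(pod_memory_mb, jvm_fraction=0.5):
    """Derive Pod, JVM heap and native map sizes from one Pod memory figure."""
    if pod_memory_mb <= 0:
        raise ValueError("pod_memory_mb must be positive")
    if not 0 < jvm_fraction < 1:
        raise ValueError("jvm_fraction must be between 0 and 1")
    jvm_heap_mb = int(pod_memory_mb * jvm_fraction)
    # Whatever the heap does not take is given to the native maps.
    native_map_mb = pod_memory_mb - jvm_heap_mb
    return {
        "pod_request_mb": pod_memory_mb,
        "pod_limit_mb": pod_memory_mb,
        "jvm_heap_mb": jvm_heap_mb,
        "native_map_mb": native_map_mb,
    }


settings = tablet_server_memory(2048)
print(settings)
```

For 2048 MB this yields a 1024 MB heap and 1024 MB of native maps, with the Pod request and limit both at 2048 MB.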

Add documentation to the Docker images

Add documentation to each of the README.md files in the Docker images directories explaining what each one does and how to use it.

Try to use a consistent layout for each of them.
