gchq / gaffer-docker
Gaffer Docker images and associated Helm charts for deploying on Kubernetes
Home Page: https://gchq.github.io/gaffer-docker
License: Apache License 2.0
Gaffer should scale up when HDFS starts filling up and scale down when it becomes under-utilised. A nice-to-have feature would be the ability to do this on a schedule as well, so that Gaffer can in effect shut down at night to reduce resources and costs when deployed on a Kubernetes cluster.
Rename Gaffer Wildfly to Gaffer REST so that in the future, if we need to, we can switch out the web container.
All references to "gaffer-wildfly" should be updated to "gaffer-rest"
Add Crown Copyright to everything and add a script so that we can update it every year. There are examples of this in other Gaffer projects.
Users often want to add their own operations (or even existing operations that haven't been enabled by default). To do this, the operations would have to be added to the classpath of the REST service (if the operations are non-standard). They would also need to be configured into the Gaffer instance by using an operation declarations file.
The version update script (./cd/updateAppVersion.sh) performs a find and replace on the Helm charts and app_version file. This is fine while we're on 0.x versions and the dependencies are all on versions greater than 0.x, however when the project matures and a version 1 is released there could be issues with conflicting versions (such as an app version getting updated accidentally). Improve the script's find and replace function so that only the versions that we want to update in the script are updated.
One idea is to use a comment on the lines that we want to use the script for:
...
version: 0.5.0 # managed version
dependencies:
- name: zookeeper
  version: 2.1.1
Then only update the lines in Charts which have the # managed version comment.
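A minimal sketch of the marker idea, written in Python for illustration only (the existing script is a shell script, and the function name `update_managed_versions` is hypothetical). It assumes the marker text is exactly `# managed version` and that the version sits on a `version:` line:

```python
import re

# Matches a "version:" line only when it carries the "# managed version" marker.
# [ \t]* is used instead of \s* so a match can never spill onto another line.
MANAGED_VERSION = re.compile(
    r"^([ \t]*version:[ \t]*)\S+([ \t]*# managed version[ \t]*)$",
    re.MULTILINE,
)


def update_managed_versions(chart_text: str, new_version: str) -> str:
    """Replace the version on marked lines and leave every other line untouched."""
    return MANAGED_VERSION.sub(
        lambda match: f"{match.group(1)}{new_version}{match.group(2)}",
        chart_text,
    )


if __name__ == "__main__":
    chart = (
        "version: 0.5.0 # managed version\n"
        "dependencies:\n"
        "- name: zookeeper\n"
        "  version: 2.1.1\n"
    )
    print(update_managed_versions(chart, "0.6.0"))
```

Because the match is keyed on the comment rather than on the old version number, a dependency that happens to share the same version string would not be changed.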
The Accumulo instance deployed by the Gaffer Helm Chart currently only provisions with a root user. Update the Gaffer Helm Chart to allow additional Gaffer/Accumulo users to be created after the Accumulo instance has been initialised.
Update the GitHub URLs for the Kind nginx-ingress controller installation.
Fix AWS ECR login command
Graph owners may want to add their own custom functions to their Gaffer instance. These should be added to the classpath of the web app and also to the iterators which Accumulo has access to (in the case of functions, predicates and binary operators).
Develop additional containers, docker-compose and Helm chart that deploy the example Road Traffic Gaffer graph
When updating GitHub as part of a release, we make use of the GitHub API. To authenticate we use a GitHub token. However, the way in which we use the token is deprecated. See here for more detail: https://developer.github.com/changes/2020-02-10-deprecating-auth-through-query-param/
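The linked notice deprecates passing the token as a query parameter in favour of sending it in the Authorization header. As a rough illustration in Python (the release scripts themselves are shell, and `build_github_request` is a hypothetical helper name), the request could be built like this without putting the token in the URL:

```python
import urllib.request


def build_github_request(url: str, token: str) -> urllib.request.Request:
    """Build a GitHub API request that carries the token in a header, not in the URL."""
    request = urllib.request.Request(url)
    # The token travels in the Authorization header instead of ?access_token=...
    request.add_header("Authorization", f"token {token}")
    return request


if __name__ == "__main__":
    req = build_github_request(
        "https://api.github.com/repos/gchq/gaffer-docker/releases",
        "example-token",  # placeholder value, not a real token
    )
    print(req.full_url)
    print(req.get_header("Authorization"))
```

The sketch only constructs the request; sending it and handling the response are left out.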
Develop a CloudFormation template that deploys the Gaffer Helm Chart onto an EKS cluster.
Wrap this in another CloudFormation template which spins up an EKS cluster and registers the above template with Service Catalog.
Users would then be able to spin up new Gaffer graphs by 'ordering' one from the Service Catalog and wouldn't have to worry about spinning up an EKS cluster and configuring it.
Update Gaffer version to 1.12.0. We're currently on 1.11.0 so a search and replace should do the trick. At the same time, update the app version to 0.6.0 by running: ./cd/updateAppVersion.sh 0.6.0
Add a post-upgrade Helm Chart hook to upgrade schemas and iterators
Add a system diagram to show how all the containers interact on a Kubernetes cluster.
Split Docker images up into the following constituent parts:
By making the Docker images more modular, we make it easier to deploy to a production environment.
Noticed some of the docker-compose yaml files are building the same image multiple times for different services (same args, same context). This likely increases build times.
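One way to find the affected services is to group them by their build definition. The sketch below is in Python and assumes the compose file has already been parsed into a dictionary (for example by a YAML library); the function name and the service names in the example are hypothetical:

```python
import json
from collections import defaultdict


def find_duplicate_builds(compose: dict) -> dict:
    """Group services by identical build definition and return groups of two or more."""
    groups = defaultdict(list)
    for name, service in compose.get("services", {}).items():
        build = service.get("build")
        if build is None:
            continue  # the service uses a prebuilt image only
        if isinstance(build, str):
            build = {"context": build}  # short form: build: ./dir
        # A canonical JSON string makes equal definitions compare equal.
        groups[json.dumps(build, sort_keys=True)].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    compose = {
        "services": {
            "service-a": {"build": {"context": ".", "args": {"VERSION": "1"}}},
            "service-b": {"build": {"context": ".", "args": {"VERSION": "1"}}},
            "service-c": {"build": {"context": "./other"}},
        }
    }
    for definition, names in find_duplicate_builds(compose).items():
        print(names, "share build", definition)
```

Services reported together are candidates for building the image once and referencing it by image name elsewhere.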
A common need will be for users to load in their own data. Add a guide detailing how to do this including:
The guide should cover how to do this on Kubernetes, as well as locally (using docker compose)
The Accumulo container and Gaffer Helm Chart currently assume that the root user is also being used for the trace user, and sets the root user password to be the same as the trace user's password. This might not be the case.
Allow the root user password to be specified by:
When users get hold of the Gaffer Helm chart, the first thing they do is try to deploy it. It breaks. Why? Because they haven't set up the users and passwords for Accumulo. This can be quite confusing and it would be handy to have just a line or two saying what they need to do to fix it.
Pull in all relevant docs from this project into the Gaffer Doc project.
The HDFS container image is 1.02GB (uncompressed).
Looking at the layers it appears that the Hadoop files are being committed into the image twice: once as the distribution tarball and then again once they have been extracted. Although we delete the tarball in the RUN instruction that extracts the files, the tarball has already been committed to the image by the previous COPY instruction. Try moving the extraction into the builder stage and COPY the extracted directory instead.
In v1.12.0, Gaffer released a bugfix which allowed schema directories to be recursive. There are a number of workarounds for this in the Kubernetes Helm charts. These workarounds can be removed now that the Gaffer version is set to 1.12.0.
This project should work for most versions of Docker on most operating systems. However it is possible that certain versions are incompatible.
We should add the Docker versions that we as developers use or know to work to a file. This could be done directly in the top level README or in a separate file and linked from the README.
The Gaffer Helm Chart deploys, and exposes, a number of web UIs:
Unfortunately, none of these work if they are exposed by an Ingress configured to route to them using a path prefix. Work out what is required to make this possible.
The HDFS Namenode UI will work if Hadoop is built using this patch: https://github.com/gchq/gaffer-docker/blob/e5597da9e7d7c4300b4e1e6fe1207a2c0daff1a0/docker/hdfs/files/patches/3.2.1/enable-urls-with-path-prefix.patch
Add a hook to the Gaffer Helm Chart that validates the schema before deployment. Could also validate that it is possible to migrate to new schemas when there is an upgrade.
Validation shouldn't just check that the schema can be deserialised, but it should check that any classes that the schema refers to are available on the class path.
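A rough sketch of the class check, in Python for illustration. It assumes class names appear in the schema JSON as string values under a `class` key, and that the set of classes available on the classpath has been gathered separately (for example by listing jar contents); both function names are hypothetical:

```python
def collect_class_names(node) -> set:
    """Recursively collect every string stored under a "class" key."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "class" and isinstance(value, str):
                found.add(value)
            else:
                found |= collect_class_names(value)
    elif isinstance(node, list):
        for item in node:
            found |= collect_class_names(item)
    return found


def missing_classes(schema: dict, available: set) -> set:
    """Return the classes the schema refers to that are not in the available set."""
    return collect_class_names(schema) - available


if __name__ == "__main__":
    schema = {
        "types": {
            "string": {"class": "java.lang.String"},
            "custom": {
                "class": "com.example.CustomType",  # hypothetical class
                "validateFunctions": [{"class": "com.example.CustomCheck"}],
            },
        }
    }
    available = {"java.lang.String", "com.example.CustomType"}
    print(missing_classes(schema, available))
```

A hook could fail the deployment when the returned set is not empty. This does not replace loading the schema with Gaffer itself, which would also catch problems this scan cannot see.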
Sometimes Graph owners have custom operations or operations from different libraries that they want to run through their REST API. In order to add these operations / functions, the jar files will need to be added to the classpath of the REST service, as well as the Accumulo lib/ext directory.
Operations have a further complication in that they need to be added to the REST api using an operation declarations file. See the bottom of this page for an example
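For illustration, the sketch below generates a declarations document in Python. The layout (an `operations` list whose entries pair an `operation` class with a `handler` class) is an assumption based on the Gaffer operation declarations format and should be checked against the Gaffer documentation. The class names and the helper name are hypothetical:

```python
import json


def operation_declarations(pairs) -> dict:
    """Build a declarations document from (operation class, handler class) pairs."""
    return {
        "operations": [
            {"operation": operation, "handler": {"class": handler}}
            for operation, handler in pairs
        ]
    }


if __name__ == "__main__":
    declarations = operation_declarations(
        [
            # Hypothetical custom operation and its handler.
            ("com.example.MyOperation", "com.example.MyOperationHandler"),
        ]
    )
    print(json.dumps(declarations, indent=2))
```

The jars containing the declared classes would still need to be on the classpath, as described above.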
Using the latest version of Hadoop causes MethodNotFound errors when using AddElementsFromHDFS. This is because Gaffer is still using an older version of Hadoop (2.6.5).
To address this, revert the hadoop version to one that's compatible with Gaffer. Add a test to make sure AddElementsFromHDFS works.
Please can I ask that you update the Gaffer document to provide basic instructions for how a new user can launch gaffer in a docker environment.
When I run Gaffer using:
docker run -d -it -p 8080:8080 --name gaffer gchq/gaffer
The container auto stops. I assume there are some dependent packages or containers I need to run and link in first. However, I do not know what these are.
Add code into cd/deploy.sh for releasing to Dockerhub / updating version control from Travis CI when code is merged into master
When code is merged into master, Travis CI should:
The Gaffer Docker chart upload is not working. This is due to the URL being incorrect. Specifically it doesn't use a release id like it should (referenced here). You can see all the release assets which are created by making a request to the Github API. The deploy script needs to get the id generated when a release is created and use that id when creating assets.
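A small Python sketch of the id handling, for illustration only (the deploy script is shell, and the helper names are hypothetical). It assumes the create-release response has already been parsed from JSON and that assets are uploaded to the uploads.github.com host with the asset name given as a query parameter:

```python
from urllib.parse import quote


def release_id_from_response(release: dict) -> int:
    """Read the numeric id that GitHub assigns when a release is created."""
    return release["id"]


def asset_upload_url(owner: str, repo: str, release_id: int, asset_name: str) -> str:
    """Build the asset upload URL for a specific release id."""
    return (
        f"https://uploads.github.com/repos/{owner}/{repo}"
        f"/releases/{release_id}/assets?name={quote(asset_name)}"
    )


if __name__ == "__main__":
    # Example response fragment; the id value is made up.
    created = {"id": 123, "tag_name": "v0.6.0"}
    release_id = release_id_from_response(created)
    print(asset_upload_url("gchq", "gaffer-docker", release_id, "gaffer-0.6.0.tgz"))
```

The release response also includes an `upload_url` template, which could be used instead of assembling the URL by hand.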
Add Helm charts for Gaffer and HDFS so that users can deploy Gaffer on Kubernetes with ease.
There exists a guide for deploying Gaffer onto a local Kind deployment but it would be really useful to have one for deploying onto EKS.
Some scripts in the docs won't work if you run them from the top level directory. Add a line at the top of these docs informing the user of where the scripts should be run from.
As a part of the expected Java upgrade in Gaffer, the JDK references will need updating to use JDK 11 based images.
Related: gchq/Gaffer#2300
When a Gaffer release is published, the gaffer-docker image should also be updated to match the new Gaffer version. The updated image should also be pushed up to Docker Hub and tagged with the 'latest' tag.
A pretty common use case will be to stand up a Gaffer graph with a custom schema. However, there are no guides anywhere for how to do that.
Add a guide for deploying a custom schema locally (using docker-compose) and on Kubernetes.
Create a ways of working / contributing.md file which will lay out issue / pull request templates as well as best practices to follow when developing this project.
As part of the release, we automatically update GitHub with the issues that have been resolved as part of the release. However, this is currently not working. It looks as though the \n characters were replaced with real newline characters, which appears to be the cause.
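One possible way to avoid this class of problem is to let a JSON library do the escaping rather than assembling the request body by hand. The Python sketch below is illustrative only (the release script is shell, and `release_notes_payload` is a hypothetical name); it shows that newlines in the notes end up as escaped `\n` sequences in the serialised body:

```python
import json


def release_notes_payload(tag: str, issue_titles: list) -> str:
    """Serialise release notes so that newlines are escaped inside the JSON string."""
    body = "\n".join(f"- {title}" for title in issue_titles)
    # json.dumps turns each real newline in body into the two characters \ and n.
    return json.dumps({"tag_name": tag, "body": body})


if __name__ == "__main__":
    payload = release_notes_payload("v0.6.0", ["First issue", "Second issue"])
    print(payload)
```

Decoding the payload gives back the original multi-line text, so the notes render as separate lines once GitHub has parsed the request.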
At present, the continuous integration build would fail if dfs.namenode.datanode.registration.ip-hostname-check is set to true in hdfs-site.xml.
This is a workaround for a bug in HDFS around DNS described in this issue
To fix it, ideally HDFS would adopt this patch. However the patch is over a year old and therefore unlikely to be merged.
Apply this patch to our own version of the hdfs image or find another way to ensure the namenode does not reject the datanodes.
The chart creation fails because the script uses a different version for the chart. This is due to develop having its version updated before the chart is generated.
Memory limits need to be configured for:
Identify suitable minimum configuration and update all config files as appropriate.
Rather than requiring users to manually set all memory settings, can we automatically calculate sensible values e.g. Tablet Servers configured to have 2GB - Kubernetes Pod specs set to request and limit memory usage to 2GB, JVM configured to 1GB, native maps configured to 1GB.
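The arithmetic in that example can be sketched as follows, in Python. The even split between JVM heap and native maps mirrors the 2GB example above; the function name, the result keys and the default fraction are assumptions for illustration, not existing chart values:

```python
def split_pod_memory(pod_memory_mb: int, heap_fraction: float = 0.5) -> dict:
    """Derive JVM heap and native map sizes from a single pod memory figure."""
    if pod_memory_mb <= 0:
        raise ValueError("pod_memory_mb must be positive")
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")
    heap_mb = int(pod_memory_mb * heap_fraction)
    return {
        "pod_request_mb": pod_memory_mb,  # Kubernetes memory request
        "pod_limit_mb": pod_memory_mb,    # Kubernetes memory limit
        "jvm_heap_mb": heap_mb,
        "native_map_mb": pod_memory_mb - heap_mb,
    }


if __name__ == "__main__":
    print(split_pod_memory(2048))
```

A real calculation would probably also need to leave some headroom for memory the JVM uses outside the heap, which this sketch ignores.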
Add documentation to each of the README.md files in the Docker images directories explaining what each one does and how to use it.
Try to use a consistent layout for each of them.
Create a separate Helm Chart for deploying Accumulo so that the Gaffer Helm Chart can be used to deploy multiple graphs on the same Accumulo instance