drogue-iot / drogue-cloud

113.0 8.0 30.0 6.56 MB

Cloud Native IoT

Home Page: https://drogue.io

License: Apache License 2.0

Dockerfile 0.38% Rust 94.86% Makefile 1.45% Shell 3.06% HTML 0.10% SCSS 0.02% JavaScript 0.10% PLpgSQL 0.04%

iot serverless kubernetes knative cloud-events protocol-normalization

drogue-cloud's Issues

Inconsistent use of dirname - hack | scripts

The release archives promote the use of ./scripts/drogue.sh in the generated assets (install-*.zip), while the source code contains ./hack/drogue.sh.
This may lead to inconsistencies in the documentation.

"make test" is broken

Originally "make test" used docker/podman to run the build and the tests.

However, the tests now start containers as well, which means that we start containers inside of containers. Unfortunately, that broke "make test". "make container-test" still works, but requires the user to have all kinds of development tools installed, which can be tricky on Windows and macOS.

Allow adding scopes to API tokens

API tokens currently allow full access to a project. Allow users to limit this to specific resources and operations. This should not be too fine-grained, but offer some basics like read, write, and admin.
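
A minimal sketch of what coarse-grained token scopes could look like, assuming the three levels above form a hierarchy (admin implies write, write implies read). `Scope`, `Operation`, `ApiToken` and the operation names are hypothetical, not existing Drogue Cloud types, and treating a token without scopes as full access is only one possible way to stay backward compatible.

```rust
use std::collections::HashSet;

/// Hypothetical coarse-grained scopes. The derived ordering
/// (Read < Write < Admin) encodes "admin implies write implies read".
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub enum Scope {
    Read,
    Write,
    Admin,
}

/// Hypothetical operations on a project, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Operation {
    ListDevices,
    UpdateDevice,
    ManageMembers,
}

impl Operation {
    /// The minimum scope required to perform this operation.
    pub fn required_scope(self) -> Scope {
        match self {
            Operation::ListDevices => Scope::Read,
            Operation::UpdateDevice => Scope::Write,
            Operation::ManageMembers => Scope::Admin,
        }
    }
}

pub struct ApiToken {
    pub scopes: HashSet<Scope>,
}

impl ApiToken {
    /// A token without scopes keeps today's "full access" behavior (an assumption).
    pub fn allows(&self, op: Operation) -> bool {
        let needed = op.required_scope();
        self.scopes.is_empty() || self.scopes.iter().any(|s| *s >= needed)
    }
}
```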

Add AWS IoT compatible endpoint for MQTT

MQTT stream doesn't stop if the Kafka topic is not available

1. Start consuming events with the MQTT endpoint.
2. Delete the app.

=> I suppose the MQTT session should be ended.
Instead, the mqtt-integration is stuck in a while loop:

[2021-08-11T10:55:38Z INFO  rdkafka::client] librdkafka: PARTCNT [thrd:main]: Topic events-example-app partition count changed from 3 to 0
[2021-08-11T10:55:38Z ERROR rdkafka::client] librdkafka: Global error: UnknownPartition (Local: Unknown partition): events-example-app [0]: desired partition is no longer available (Local: Unknown partition)
[2021-08-11T10:55:38Z ERROR rdkafka::client] librdkafka: Global error: UnknownPartition (Local: Unknown partition): events-example-app [1]: desired partition is no longer available (Local: Unknown partition)
[2021-08-11T10:55:38Z ERROR rdkafka::client] librdkafka: Global error: UnknownPartition (Local: Unknown partition): events-example-app [2]: desired partition is no longer available (Local: Unknown partition)

I have the same issue with the WebSocket service.
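
A sketch of the expected behavior, written against a hypothetical `ConsumeError` type rather than the real rdkafka API: transient errors are skipped, but an error meaning the topic is gone ends the stream, so the caller can close the MQTT (or WebSocket) session. How librdkafka's UnknownPartition error would be mapped onto `TopicGone` is an assumption and not shown.

```rust
/// Hypothetical error type standing in for errors reported by the Kafka consumer.
#[derive(Debug, PartialEq)]
pub enum ConsumeError {
    /// Temporary problem (e.g. broker briefly unreachable): keep consuming.
    Transient(String),
    /// The topic or its partitions no longer exist (e.g. the app was deleted).
    TopicGone(String),
}

/// Why the event stream ended, so the session can be closed accordingly.
#[derive(Debug, PartialEq)]
pub enum StreamEnd {
    SourceExhausted,
    TopicGone(String),
}

/// Forward events until the source ends or reports that the topic is gone,
/// instead of looping forever on a non-recoverable error.
pub fn forward_events<I, F>(source: I, mut deliver: F) -> StreamEnd
where
    I: IntoIterator<Item = Result<String, ConsumeError>>,
    F: FnMut(String),
{
    for item in source {
        match item {
            Ok(event) => deliver(event),
            Err(ConsumeError::Transient(_)) => continue,
            Err(ConsumeError::TopicGone(topic)) => return StreamEnd::TopicGone(topic),
        }
    }
    StreamEnd::SourceExhausted
}
```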

Provide dedicated "health" endpoints

Currently, all services/endpoints expose their "health" information on the main API endpoint. This should change so that each endpoint/service has a dedicated "health" endpoint, which is only exposed internally.

In the past we had issues with Knative deployments having only one port to check. However, this should change as we will be using normal deployments for most services/endpoints soon.

The "authentication service" is already deployed that way, so it might be a good first candidate.

One goal of this task is that at least the configuration of this endpoint is consistent. The current implementation of the endpoints should be kept as is. Switching to an alternative, existing implementation for providing Kubernetes readiness/liveness information in Rust can be done in a separate issue.
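
A minimal, std-only sketch of a dedicated health listener, assuming hypothetical `/liveness` and `/readiness` paths and a small JSON body; the actual paths, port and response format would come from the shared configuration this issue asks for. `serve_health` is meant to be bound to an internal-only port, separate from the main API listener.

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::{TcpListener, TcpStream};

/// Hypothetical health state tracked by a service.
pub struct Health {
    pub alive: bool,
    pub ready: bool,
}

/// Build an HTTP/1.1 response for a path on the internal health port.
/// The `/liveness` and `/readiness` paths are assumptions.
pub fn health_response(health: &Health, path: &str) -> String {
    let ok = match path {
        "/liveness" => health.alive,
        "/readiness" => health.ready,
        _ => return response("404 Not Found", ""),
    };
    if ok {
        response("200 OK", "{\"success\":true}")
    } else {
        response("503 Service Unavailable", "{\"success\":false}")
    }
}

fn response(status: &str, body: &str) -> String {
    format!(
        "HTTP/1.1 {}\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        status,
        body.len(),
        body
    )
}

/// Serve health checks on a dedicated listener, one request per connection.
pub fn serve_health(listener: TcpListener, health: &Health) {
    for stream in listener.incoming().flatten() {
        // An error on one connection must not stop the health listener.
        let _ = handle_connection(stream, health);
    }
}

fn handle_connection(mut stream: TcpStream, health: &Health) -> std::io::Result<()> {
    let path = {
        let mut reader = BufReader::new(&stream);
        let mut request_line = String::new();
        reader.read_line(&mut request_line)?;
        // Drain the remaining request headers up to the empty line.
        let mut header = String::new();
        while reader.read_line(&mut header)? > 0 && header != "\r\n" {
            header.clear();
        }
        // The request line looks like "GET /readiness HTTP/1.1".
        request_line.split_whitespace().nth(1).unwrap_or("/").to_string()
    };
    stream.write_all(health_response(health, &path).as_bytes())
}
```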

Install DCO app for enforcing sign-offs

The DCO app can be installed by the maintainers to enforce sign-offs, with minimal effort.

Improve visibility of drg login in Console

One helpful thing might be for the landing page of http://sandbox.drogue.cloud to mention "drg login http://api.sandbox.drogue.cloud/", instead of requiring users to navigate to "Getting started > Register devices".

Allow using QoS 1 with the MQTT Integration

Add Fiware compatible endpoints

Evaluate the different options Fiware offers: https://www.fiware.org/
Come up with a strategy
Implement at least one compatible endpoint
Add this to our deployment
Add system tests and documentation
Test with an existing Fiware example/application

Provide additional deployment model for a public cloud provider

We want to have at least one additional deployment for a public cloud provider.

The goal is not to use OpenShift and abstract away all the differences. Nor is the goal to host a bunch of additional deployments for all the different APIs and requirements.

Rather, the goal is to see what changes are required for a specific Kubernetes variant like GKE, Azure, AWS, DigitalOcean, … and to learn things that might help generalize the deployment.

Find a replacement for webpack

Currently we are using webpack and wasm-pack to compile/package the frontend.

However, it looks like webpack 5 has issues with WASM and webpack 4 is no longer maintained, accumulating NPM security advisories.

Also, the stability of the toolchain leaves room for improvement.

Webpack is used for the console-frontend project itself, but also for the SwaggerUI embedded in it, so a replacement would need to take that into account too. Splitting these up into two different "projects" would work too, of course.

One potential replacement could be "trunk", which seems to be becoming more popular in the Rust world: https://github.com/thedodd/trunk

Refactor the Rust client for the management API to re-use the token provider trait

Currently, we have an OpenIdTokenProvider and four different REST clients written in Rust.

Two of them use the token provider trait, one needs to pass in the original request token, and the remaining one is the command line client.

We need to refactor this so that:

We possibly use the same implementation (or trait)
We have the ability to re-use a token from the original request
It works with the backend and the command line application

Maybe we should consider extracting this component into a dedicated repository and making it a dependency of both the Drogue Cloud backend and the command line application.
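
A simplified sketch of such a shared abstraction, assuming a synchronous trait for brevity (the real clients are async, and the existing token provider trait in the code base may look different). `TokenProvider`, `PassThroughToken`, `NoToken` and `Client` are hypothetical names: one client type, generic over where the token comes from, covers re-using the original request token as well as other token sources. An OpenID provider that fetches and refreshes tokens would be one more implementation and is not shown.

```rust
/// Hypothetical error type for token retrieval.
#[derive(Debug)]
pub struct TokenError(pub String);

/// Simplified, synchronous stand-in for a shared token provider trait.
pub trait TokenProvider {
    /// Returns the token to send, or `None` for unauthenticated requests.
    fn provide_token(&self) -> Result<Option<String>, TokenError>;
}

/// Re-uses the bearer token of the original (incoming) request.
pub struct PassThroughToken(pub String);

impl TokenProvider for PassThroughToken {
    fn provide_token(&self) -> Result<Option<String>, TokenError> {
        Ok(Some(self.0.clone()))
    }
}

/// Sends no token at all.
pub struct NoToken;

impl TokenProvider for NoToken {
    fn provide_token(&self) -> Result<Option<String>, TokenError> {
        Ok(None)
    }
}

/// One client type, generic over the token source, usable by both the
/// backend and the command line application.
pub struct Client<T: TokenProvider> {
    pub base_url: String,
    pub tokens: T,
}

impl<T: TokenProvider> Client<T> {
    /// Value for the `Authorization` header of an outgoing request, if any.
    pub fn authorization_header(&self) -> Result<Option<String>, TokenError> {
        Ok(self.tokens.provide_token()?.map(|t| format!("Bearer {}", t)))
    }
}
```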

Provide an aggregated API

Currently we have a bunch of services, with a bunch of endpoints.

However, now that we have aligned the different APIs, we could/should offer a single API endpoint.

Add application metrics using prometheus

Provide a basic metrics setup using prometheus:

Add basic infrastructure to services/endpoints
Add Prometheus + Grafana to the deployment (maybe through a separate Helm chart, drogue-cloud-metrics)
Add some reasonable first metrics
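
To illustrate what a scrape target would return, here is a hand-rolled counter rendered in the Prometheus text exposition format. The metric and label names are made up, and a real implementation would most likely use an existing Prometheus client crate instead.

```rust
use std::collections::BTreeMap;
use std::fmt::Write;

/// A tiny labeled counter, rendered in the Prometheus text exposition format.
pub struct Counter {
    name: String,
    help: String,
    label: String,
    values: BTreeMap<String, u64>,
}

impl Counter {
    pub fn new(name: &str, help: &str, label: &str) -> Self {
        Counter {
            name: name.to_string(),
            help: help.to_string(),
            label: label.to_string(),
            values: BTreeMap::new(),
        }
    }

    /// Increment the counter for one label value.
    pub fn inc(&mut self, label_value: &str) {
        *self.values.entry(label_value.to_string()).or_insert(0) += 1;
    }

    pub fn render(&self) -> String {
        let mut out = String::new();
        // Writing to a String cannot fail, so the results are ignored.
        let _ = writeln!(out, "# HELP {} {}", self.name, self.help);
        let _ = writeln!(out, "# TYPE {} counter", self.name);
        for (value, count) in &self.values {
            let _ = writeln!(out, "{}{{{}=\"{}\"}} {}", self.name, self.label, escape(value), count);
        }
        out
    }
}

/// Escape a label value: backslash, double quote and newline must be escaped.
fn escape(value: &str) -> String {
    value.replace('\\', "\\\\").replace('"', "\\\"").replace('\n', "\\n")
}
```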

Implement TTN controller/operator

Implement an example controller/operator using the device registry events, syncing devices from our internal registry to TTN using the v3 API.

This should:

Sync the app
Sync devices of the app
Set up at least one endpoint (e.g. HTTP)
Work with upstream and downstream events
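
The core of such a controller is a reconcile step that compares the desired and the actual state. A minimal sketch, assuming devices are identified by ID only; `plan_sync` and `SyncPlan` are hypothetical, and the TTN v3 API calls that would apply the plan are not shown.

```rust
use std::collections::BTreeSet;

/// The actions needed to make TTN match the internal registry for one app.
#[derive(Debug, PartialEq)]
pub struct SyncPlan {
    pub create: Vec<String>,
    pub delete: Vec<String>,
}

/// Compare device IDs in the Drogue registry (desired state) with those
/// currently registered in TTN (actual state). Results are sorted by ID.
pub fn plan_sync(registry: &BTreeSet<String>, ttn: &BTreeSet<String>) -> SyncPlan {
    SyncPlan {
        create: registry.difference(ttn).cloned().collect(),
        delete: ttn.difference(registry).cloned().collect(),
    }
}
```

Running the same comparison on every registry event (and periodically) keeps the controller idempotent: applying an already-applied plan results in an empty plan next time.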

API keys cannot be used for consuming events

Trying to use an API key to authenticate a client connecting to the MQTT integration, I get:

CONNECT failed as CONNACK contained an Error Code: BAD_USER_NAME_OR_PASSWORD.

The "delete" button in apps list does nothing

When clicking "delete", nothing happens.

Make a deployment configurable

It'd be nice to have switches that would install separate components like knative infra, drogue infra, additional services, etc.

Missing installer for generic kubernetes

The installer zip for kubernetes cluster type is not included in releases.

Add a `list` method to retrieve a list of devices for an app.

Proposal: /api/v1/<appId>/devices/

Add Eclipse Hono compatible HTTP endpoint

Add Azure IoT compatible endpoints

Evaluate the different options and implement at least one Azure IoT compatible endpoint

Configure keycloak to provide "audience" in token

Currently Keycloak is not set up to provide the required aud (audience) information in the token. The console backend will reject incoming requests.
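
For reference, a sketch of the kind of check that makes a backend reject such tokens. Per RFC 7519, the `aud` claim may be either a single string or an array of strings, so both forms need to be handled. The `Audience` type and `check_audience` function are hypothetical, and the expected audience `drogue` is taken from the realm config snippet in this issue.

```rust
/// The `aud` claim may be a single string or an array of strings (RFC 7519).
pub enum Audience {
    Single(String),
    Many(Vec<String>),
}

impl Audience {
    pub fn contains(&self, expected: &str) -> bool {
        match self {
            Audience::Single(a) => a == expected,
            Audience::Many(list) => list.iter().any(|a| a == expected),
        }
    }
}

/// Accept a token only if its `aud` claim is present and includes `expected`.
pub fn check_audience(aud: Option<&Audience>, expected: &str) -> Result<(), String> {
    match aud {
        Some(a) if a.contains(expected) => Ok(()),
        Some(_) => Err(format!("token audience does not include '{}'", expected)),
        None => Err("token has no 'aud' claim".to_string()),
    }
}
```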

To my understanding this could be implemented by adding the following to the realm config:

    clientScopes:
      - name: good-service
        attributes:
          "include.in.token.scope": "true"
          "display.on.consent.screen": "true"
        protocolMappers:
          - name: app-audience
            protocol: openid-connect
            protocolMapper: oidc-audience-mapper
            consentRequired: false
            config:
              "included.client.audience": "drogue"
              "id.token.claim": "false"
              "access.token.claim": "true"

However, that isn't supported by the current version of the keycloak operator. It is only on "master" at the moment.

Maybe we can add this to the client also: https://stackoverflow.com/a/61059910

More information:

Document list of necessary packages

There are several packages that are needed to build the modules, like cmake, cyrus-sasl-devel and others.
They are included in the GitHub Actions image, so everything works there, but when running the build locally it's cumbersome to discover each missing package only when the build fails.
Having a list of packages in the README would be nice :)

Define mapping of IoT related information to Cloud Events.

Currently, the mapping of IoT-related information (device ID, model ID, …) to the Cloud Events attributes is a bit of a mess.

We need to define how to best use the existing attributes, and fix any spec violating constructs that we currently might have.
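
One concrete spec constraint to check against: CloudEvents 1.0 requires attribute names (including extension attributes) to consist only of lower-case ASCII letters and digits, and recommends that they not exceed 20 characters. Names like `device_id` or `deviceId` are therefore not valid. A small validator sketch; the function name and the choice to report the length limit as a warning rather than an error are assumptions.

```rust
/// Validate a CloudEvents attribute name (CloudEvents 1.0): only lower-case
/// ASCII letters and digits are allowed. Names longer than the recommended
/// 20 characters are accepted, but reported as a warning.
pub fn validate_attribute_name(name: &str) -> Result<Option<String>, String> {
    if name.is_empty() {
        return Err("attribute name must not be empty".to_string());
    }
    if !name.chars().all(|c| c.is_ascii_lowercase() || c.is_ascii_digit()) {
        return Err(format!("'{}' may only contain a-z and 0-9", name));
    }
    if name.len() > 20 {
        return Ok(Some(format!(
            "'{}' is longer than the recommended 20 characters",
            name
        )));
    }
    Ok(None)
}
```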

Support multi-tenant deployment

Currently, everything is tied to the drogue-cloud namespace.

command-endpoint image uses dejan's repo

kubectl describe ksvc command-endpoint

Name:         command-endpoint
Namespace:    drogue-iot
Labels:       app.kubernetes.io/part-of=endpoints
              image-source=build
Annotations:  serving.knative.dev/creator: kubernetes-admin
              serving.knative.dev/lastModifier: system:serviceaccount:knative-eventing:eventing-webhook
API Version:  serving.knative.dev/v1
Kind:         Service
Metadata:
  Creation Timestamp:  2021-01-13T17:05:42Z
  Generation:          2
  Managed Fields:
    API Version:  serving.knative.dev/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
        f:labels:
          .:
          f:app.kubernetes.io/part-of:
          f:image-source:
      f:spec:
        .:
        f:template:
          .:
          f:metadata:
            .:
            f:labels:
              .:
              f:bindings.knative.dev/include:
              f:image-source:
          f:spec:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2021-01-13T17:05:42Z
    API Version:  serving.knative.dev/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        f:template:
          f:spec:
            f:containers:
    Manager:      webhook
    Operation:    Update
    Time:         2021-01-13T17:06:05Z
    API Version:  serving.knative.dev/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:address:
          .:
          f:url:
        f:conditions:
        f:latestCreatedRevisionName:
        f:latestReadyRevisionName:
        f:observedGeneration:
        f:traffic:
        f:url:
    Manager:         controller
    Operation:       Update
    Time:            2021-01-13T17:06:15Z
  Resource Version:  7959
  Self Link:         /apis/serving.knative.dev/v1/namespaces/drogue-iot/services/command-endpoint
  UID:               541a28b8-9a16-4aad-8b84-70e48c9a9890
Spec:
  Template:
    Metadata:
      Creation Timestamp:  <nil>
      Labels:
        bindings.knative.dev/include:  true
        Image - Source:                build
    Spec:
      Container Concurrency:  0
      Containers:
        Env:
          Name:   RUST_LOG
          Value:  info
          Name:   K_SINK
          Value:  http://iot-commands-kn-channel.drogue-iot.svc.cluster.local
          Name:   K_CE_OVERRIDES
        Image:    quay.io/dejanb/command-endpoint:latest
        Name:     user-container
        Readiness Probe:
          Success Threshold:  1
          Tcp Socket:
            Port:  0
        Resources:
      Enable Service Links:  false
      Timeout Seconds:       300
  Traffic:
    Latest Revision:  true
    Percent:          100
Status:
  Address:
    URL:  http://command-endpoint.drogue-iot.svc.cluster.local
  Conditions:
    Last Transition Time:        2021-01-13T17:06:13Z
    Status:                      True
    Type:                        ConfigurationsReady
    Last Transition Time:        2021-01-13T17:06:15Z
    Status:                      True
    Type:                        Ready
    Last Transition Time:        2021-01-13T17:06:15Z
    Status:                      True
    Type:                        RoutesReady
  Latest Created Revision Name:  command-endpoint-00002
  Latest Ready Revision Name:    command-endpoint-00002
  Observed Generation:           2
  Traffic:
    Latest Revision:  true
    Percent:          100
    Revision Name:    command-endpoint-00002
  URL:                http://command-endpoint.drogue-iot.172.18.0.2.nip.io
Events:
  Type    Reason   Age   From                Message
  ----    ------   ----  ----                -------
  Normal  Created  10m   service-controller  Created Configuration "command-endpoint"
  Normal  Created  10m   service-controller  Created Route "command-endpoint"

About dialog picture

Currently, the about dialog of the web console doesn't have a picture. It would be nice to have one.

What we need is a picture which fits into the dialog shown below, and looks good in B/W (the part on the right):

AuthenticationError 403 while using x509 certificates.

I am using the following script:
script.sh

It does the following:

Create a trust anchor certificate.
Add it to the application object using drg.
Create a device certificate and sign it with the app's private key.

After running the script, the trust anchor is successfully added to the application object. This can be verified, as the app object contains the following:

  "status": {
    "trustAnchors": {
      "anchors": [
        {
          "valid": {
            "certificate": "...",
            "notAfter": "2022-06-23T12:31:15Z",
            "notBefore": "2021-06-23T12:31:15Z",
            "subject": "O=Drogue IoT, OU=Cloud, CN=app12"
          }
        }
      ]
    }
  }

Then I use the device certificate to authenticate using the following command.
http --cert test-certs/device-certs.pem --cert-key test-certs/app-private.key POST https://http.sandbox.drogue.cloud/v1/foo

but it returns 403.

While monitoring the server logs during this request together with @jbtrystram, we found this:

[2021-06-23T12:31:31Z DEBUG drogue_cloud_http_endpoint] Accepting client certificates: "[organizationName = \"Drogue IoT\", organizationalUnitName = \"Cloud\", commonName = \"d7\"]"
[2021-06-23T12:31:31Z DEBUG drogue_cloud_http_endpoint] Accepting client certificates: "[organizationName = \"Drogue IoT\", organizationalUnitName = \"Cloud\", commonName = \"d7\"]"
[2021-06-23T12:31:31Z DEBUG drogue_cloud_http_endpoint::x509] Try extracting client cert
[2021-06-23T12:31:32Z DEBUG actix_web::extract] Error for Option<T> extractor: UnknownError
[2021-06-23T12:31:32Z DEBUG drogue_cloud_http_endpoint::telemetry] Publish to 'foo'
[2021-06-23T12:31:32Z DEBUG actix_web::middleware::logger] Error in response: HttpEndpointError(AuthenticationError)
[2021-06-23T12:31:32Z INFO  actix_web::middleware::logger] 10.130.2.1:53128 "POST /v1/foo HTTP/1.1" 403 76 "-" "HTTPie/0.9.8" 0.000071
[2021-06-23T12:31:33Z DEBUG drogue_cloud_http_endpoint::x509] Try extracting client cert
[2021-06-23T12:31:33Z DEBUG actix_web::extract] Error for Option<T> extractor: UnknownError
[2021-06-23T12:31:33Z DEBUG drogue_cloud_http_endpoint::telemetry] Publish to 'status'
[2021-06-23T12:31:33Z DEBUG drogue_client::openid::provider] Token still valid
[2021-06-23T12:31:33Z DEBUG hyper::client::pool] reuse idle connection for ("http", authentication-service) 
[2021-06-23T12:31:33Z DEBUG hyper::proto::h1::io] flushed 1629 bytes 
[2021-06-23T12:31:33Z DEBUG hyper::proto::h1::io] parsed 3 headers 
[2021-06-23T12:31:33Z DEBUG hyper::proto::h1::conn] incoming body is content-length (464 bytes) 
[2021-06-23T12:31:33Z DEBUG hyper::proto::h1::conn] incoming body completed 
[2021-06-23T12:31:33Z DEBUG hyper::client::pool] pooling idle connection for ("http", authentication-service) 
[2021-06-23T12:31:33Z DEBUG reqwest::async_impl::client] response '200 OK' for http://authentication-service/api/v1/auth

Add Eclipse Hono compatible MQTT endpoint

Create a WebSocket endpoint for consuming events

Similar to the MQTT Integration endpoint, we should have a WebSocket version of this. This could also replace the current SSE-based "Spy" endpoint.

Add tracing using Jaeger/OpenTracing

Add tracing capabilities, allowing to use Jaeger/OpenTracing with our services/endpoints.

Two HTTP command subscribers on the same device cause a panic

I just had two HTTP command subscribers on the same device, each with a command delay, which resulted in a panic due to an .unwrap():

[2021-03-10T15:44:37Z DEBUG drogue_cloud_endpoint_common::commands] Device Id { app_id: "app_id", device_id: "device_id" } subscribed to receive commands
[2021-03-10T15:44:37Z DEBUG drogue_cloud_endpoint_common::commands] Device Id { app_id: "app_id", device_id: "device_id" } unsubscribed from receiving commands
thread 'actix-rt|system:0|arbiter:0' panicked at 'called `Option::unwrap()` on a `None` value', http-endpoint/src/command.rs:27:39
[2021-03-10T15:44:37Z DEBUG drogue_cloud_endpoint_common::commands] Device Id { app_id: "app_id", device_id: "device_id" } unsubscribed from receiving commands
thread 'actix-rt|system:0|arbiter:2' panicked at 'called `Option::unwrap()` on a `None` value', http-endpoint/src/command.rs:27:39
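
A sketch of a subscriber registry that tolerates several subscribers per device, assuming each subscription gets its own ID, so that one subscriber unsubscribing neither removes the other nor triggers an `unwrap()` on a missing entry. It uses `std::sync::mpsc` and hypothetical types (`DeviceKey`, `Commands`) instead of the actual async code in `drogue_cloud_endpoint_common::commands`.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical device key, mirroring the app/device IDs in the log above.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct DeviceKey {
    pub app_id: String,
    pub device_id: String,
}

/// Keeps every subscriber per device under its own subscription ID.
#[derive(Default)]
pub struct Commands {
    next_id: u64,
    subscribers: HashMap<DeviceKey, HashMap<u64, Sender<String>>>,
}

impl Commands {
    pub fn subscribe(&mut self, device: DeviceKey) -> (u64, Receiver<String>) {
        let (tx, rx) = channel();
        let id = self.next_id;
        self.next_id += 1;
        self.subscribers.entry(device).or_default().insert(id, tx);
        (id, rx)
    }

    /// Returns false (instead of panicking) if the subscription is already gone.
    pub fn unsubscribe(&mut self, device: &DeviceKey, id: u64) -> bool {
        let removed = match self.subscribers.get_mut(device) {
            Some(subs) => subs.remove(&id).is_some(),
            None => false,
        };
        // Drop the device entry once its last subscriber is gone.
        if self.subscribers.get(device).map_or(false, |s| s.is_empty()) {
            self.subscribers.remove(device);
        }
        removed
    }

    /// Deliver a command to all current subscribers; returns how many got it.
    pub fn send(&self, device: &DeviceKey, command: &str) -> usize {
        self.subscribers.get(device).map_or(0, |subs| {
            subs.values()
                .filter_map(|tx| tx.send(command.to_string()).ok())
                .count()
        })
    }
}
```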

Simplify running Drogue Cloud locally

Although we know Kubernetes and the ecosystem well, users new to Drogue IoT may not be familiar with these technologies, and the bar for getting drogue cloud working just for evaluation is high.

The sandbox alleviates this a bit, but in the end it is just a sandbox.

If we could provide a smaller, more self-contained version of Drogue Cloud that could run locally, ideally as a single binary packaged the same way as the drg tool, that would lower the bar significantly.

Some goals for such a tool would be:

Single entry point for all endpoints and management APIs. Use multiple ports to run different services.
Ability to use simpler variants of Keycloak, Kafka and PostgreSQL to make it even easier
(Optional) Allow selecting some components to use an external instance of Keycloak, Kafka or PostgreSQL to be able to run "mini production" environment locally.

Example for what this could look like:

Spinning up a local server with default auth, kafka and postgresql alternatives:

$ drg server run
Starting services...done!

Console: https://localhost:8080

Running a local server, but using third-party services for dependencies (this probably needs more options for credentials and such):

$ drg server run --database-url myhost:5354 --kafka-bootstrap myhost:12345 --oauth-server https://cloud.google.com/...
...

Whether or not this would be baked into drg is not that important, but I think if it were, it would be extremely simple, and then you can easily switch to a 'scalable Drogue IoT' instance using that very same tool.

Refresh access token in frontend

Split up Kafka topic

We still have a single Kafka topic, but have wanted to have different topics for a while.

There are three modes we could have here:

Single topic for all: what we have now
Topic per application: all data of an application flows into the same topic
Topic per application per channel: an application could have "sub"-topics, defined by the "channel" information. That would allow "telemetry", "event", or possibly others.
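
A sketch of how the three modes could map to topic names. The `events-<app>` pattern matches the topic name seen in the MQTT integration logs above (`events-example-app`); the single-topic name and the `-<channel>` suffix are hypothetical.

```rust
/// The three topic layouts described above.
pub enum TopicMode {
    Single,
    PerApplication,
    PerApplicationChannel,
}

/// Derive the Kafka topic for an event of `app` on `channel`.
pub fn topic_for(mode: &TopicMode, app: &str, channel: &str) -> String {
    match mode {
        // Hypothetical name for today's shared topic.
        TopicMode::Single => "iot-events".to_string(),
        TopicMode::PerApplication => format!("events-{}", app),
        // Hypothetical per-channel suffix.
        TopicMode::PerApplicationChannel => format!("events-{}-{}", app, channel),
    }
}
```

Kafka topic names only allow ASCII letters, digits, '.', '_' and '-', and application names may themselves contain '-', so the per-channel form can be ambiguous ("events-a-b-c"); a stricter naming scheme or a lookup table may be needed.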

Parsing error on device names with space on Sandbox

While using the sandbox, viewing the details of a device works well, except when the device name contains a space: then the details page doesn't open.
Here is a video to explain the situation.

droguessu.mp4
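
The symptom suggests that the device name is not percent-encoded (or decoded) consistently when building and parsing the details URL; that is a guess, not a confirmed cause. A std-only sketch of RFC 3986 percent-encoding for a single path segment, with the matching decoder; the function names are hypothetical.

```rust
/// Percent-encode a string for use as one URL path segment (RFC 3986):
/// unreserved characters are kept, every other byte becomes %XX.
pub fn encode_path_segment(segment: &str) -> String {
    let mut out = String::with_capacity(segment.len());
    for b in segment.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Decode a percent-encoded path segment; returns None on malformed input.
pub fn decode_path_segment(segment: &str) -> Option<String> {
    let bytes = segment.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            // Exactly two hex digits must follow the '%'.
            let hex = segment
                .get(i + 1..i + 3)
                .filter(|h| h.bytes().all(|c| c.is_ascii_hexdigit()))?;
            out.push(u8::from_str_radix(hex, 16).ok()?);
            i += 3;
        } else {
            out.push(bytes[i]);
            i += 1;
        }
    }
    String::from_utf8(out).ok()
}
```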

Refactor the Management API

The current structure of the management API is a bit "grown". We should re-structure the current API:

Focus on the device management API
Keep the functionality, but change the representation (endpoints, paths, data formats, …)
Create an OpenAPI 3 spec

Console backend receives empty client secret on first start

When deploying the stack, the console-backend has an empty client secret at first:

CLIENT_ID=drogue
CLIENT_SECRET=

While the secret has the proper content:

data:
  CLIENT_ID: ZHJvZ3Vl
  CLIENT_SECRET: ZTc0ODM2Y2QtMGY3NS00ZDZkLWIzNDEtNDIyNDE5YjZjMTk3

I assume the Keycloak operator updates the secret later on, so we must handle this somehow.
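
One possible mitigation, sketched under assumptions: poll for a non-empty secret with a bounded number of attempts instead of starting with an empty one. `read_secret` is a hypothetical stand-in for however the backend actually reads the secret. Environment variables populated from a Kubernetes Secret are only set when the container starts, so with env vars the practical option is to fail fast and let Kubernetes restart the pod; polling only helps when the secret is read from a mounted volume, which the kubelet updates eventually.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Poll `read_secret` until it yields a non-empty value, giving up after
/// `attempts` tries. Returns None if no usable secret showed up in time.
pub fn wait_for_secret<F>(mut read_secret: F, attempts: u32, delay: Duration) -> Option<String>
where
    F: FnMut() -> Option<String>,
{
    for attempt in 0..attempts {
        if let Some(secret) = read_secret().filter(|s| !s.is_empty()) {
            return Some(secret);
        }
        // Do not sleep after the last attempt.
        if attempt + 1 < attempts {
            sleep(delay);
        }
    }
    None
}
```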

[WS] API keys don't work

I wasn't able to get API keys to work. I am not sure if the Either<Bearer,Basic> idea works as intended.

I added a few lines of debug output in the start_ function, but never saw that triggered.
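
To narrow down which scheme a client actually sends, a small std-only sketch of parsing an `Authorization` header value into either bearer or Basic credentials. This is not the actix extractor used in the endpoint; the `Credentials` type is hypothetical, and Basic credentials are left base64-encoded. HTTP authentication scheme names are case-insensitive, which is easy to get wrong when matching by hand.

```rust
/// Credentials extracted from an `Authorization` header value.
#[derive(Debug, PartialEq)]
pub enum Credentials {
    /// E.g. an OpenID Connect access token.
    Bearer(String),
    /// Still base64-encoded "user:password" (e.g. username and API key).
    Basic(String),
}

/// Parse the header value; the scheme is matched case-insensitively.
pub fn parse_authorization(value: &str) -> Option<Credentials> {
    let (scheme, rest) = value.trim().split_once(' ')?;
    let rest = rest.trim();
    if rest.is_empty() {
        return None;
    }
    if scheme.eq_ignore_ascii_case("bearer") {
        Some(Credentials::Bearer(rest.to_string()))
    } else if scheme.eq_ignore_ascii_case("basic") {
        Some(Credentials::Basic(rest.to_string()))
    } else {
        None
    }
}
```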

Enable TLS 1.3

HTTP endpoint
MQTT endpoint

Proposal on making install script more appealing

Just a thought I had regarding our discussion on scripts vs helm vs operator. This felt too small to be an RFC, so I thought I'd start a discussion here. The idea is just to modify/rename drogue.sh and status.sh, or wrap them in a drgadm script.

The drgadm script could just be drogue.sh renamed, wrap it, or combine it with helm or operator under the hood. I find it appealing that you don't really need to touch Kubernetes with such an interface. It would be similar to the kubeadm tool that exists for installing and managing Kubernetes clusters, in the same way drg is similar to kubectl.

drgadm install -c kubernetes # Does whatever drogue.sh does
drgadm status # Invokes status.sh

Maybe the first step could be just to refactor the drogue.sh and status.sh into functions that can be sourced and invoked from this tool?

Where this ties into helm I'm not sure, but if it uses helm, we introduce a dependency other than kubectl. Maybe helm would just be separate to this, I'm not sure.

Knative services fail to start up

Some services fail to start:

NAME                        URL                                                                LATESTCREATED                     LATESTREADY           READY   REASON
device-management-service   http://device-management-service.drogue-iot.10.103.42.167.nip.io   device-management-service-00001                         False   RevisionMissing
http-endpoint               http://http-endpoint.drogue-iot.10.103.42.167.nip.io               http-endpoint-00002               http-endpoint-00002   True    
influxdb-pusher             http://influxdb-pusher.drogue-iot.10.103.42.167.nip.io             influxdb-pusher-00001                                   False   RevisionMissing

It is possible to nudge them:

➜  install-minikube-0.2.0-rc3 kn -n drogue-iot service update device-management-service -e N=1
Updating Service 'device-management-service' in namespace 'drogue-iot':

  0.039s unsuccessfully observed a new generation
  0.089s Configuration "device-management-service" does not have any ready Revision.
  0.134s Configuration "device-management-service" is waiting for a Revision to become ready.
  3.416s ...
  3.460s Ingress has not yet been reconciled.
  3.511s Waiting for load balancer to be ready
  3.695s Ready to serve.

Service 'device-management-service' updated to latest revision 'device-management-service-bswsg-2' is available at URL:
http://device-management-service.drogue-iot.10.103.42.167.nip.io

However, this should not be necessary: tracking knative/serving#10344

We also should wait for jonhoo/fantoccini#134 to be resolved.
