oats-center / isoblue
ISOBlue Hardware, Avena software, and Deployment Files
License: MIT License
What containers to build, and based on what?

Option 1: `services/*`
NB: No need to deal with GH action path rules. This requires only one GH workflow file (and only one line needs to be added to enable a new service). DockerHub "does the right thing" when you push exactly the same container to it multiple times.

Option 2: `services/<service-name>/*`
NB: This is more "correct" but would need a workflow for each service (I think), and those workflows would largely be exact duplicates of each other. However, this enables services that need a non-standard workflow. We could factor a lot of our custom logic into a bash file that all workflows call.
Push releases (`master` branch, and/or `v*` tags) to DockerHub `isoblue` right away.
Push other branches to `isobluedev` right away for easier PR / branch testing.
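Under option 1, the single workflow could use a build matrix, so enabling a new service really is a one-line change. A sketch, not the repo's actual workflow (the image names, branch/tag triggers, and matrix entries are assumptions based on the services named elsewhere in these issues):

```yaml
name: build-services
on:
  push:
    branches: [master]
    tags: ['v*']
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # the one line to touch when adding a service:
        service: [kafka-gps-log, engine-rpm-log]
    steps:
      - uses: actions/checkout@v2
      - run: docker build -t isoblue/${{ matrix.service }} services/${{ matrix.service }}
      - run: docker push isoblue/${{ matrix.service }}
```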
Automated testing of things like CA auth and wireguard will require further work, but to get started we need a way to automatically deploy a new "virtual" ISOBlue to run on something like GitHub Actions or Travis CI.
In the wireguard portion of Ansible we pause and request that the user copy the wireguard public key into their bounce server. This is great for the initial deploy, but it does not need to be done every time Ansible runs (i.e., when updating the device) and can cause unneeded delays if the user switches to another task while waiting for Ansible to finish.
One solution would be some sort of ping to the bounce server. If it is reachable, then the server clearly has our public key. However, if the bounce server is unreachable, that does not necessarily mean the config is the issue.
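A hedged sketch of that check as Ansible tasks (the interface name `wg0`, the variable `bounce_server_wg_ip`, and the prompt wording are assumptions): ping the bounce server over the wg interface and only pause when the ping fails.

```yaml
- name: check whether the bounce server answers over wireguard
  command: ping -c 1 -W 2 -I wg0 {{ bounce_server_wg_ip }}
  register: wg_ping
  failed_when: false
  changed_when: false

- name: ask the operator to install the wireguard public key
  pause:
    prompt: "Add the public key to the bounce server config, then press <enter>"
  when: wg_ping.rc != 0
```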
A `contributing.md` file serves to define procedures for PRs, releases, etc. It is important for keeping things consistent, as well as for helping newcomers contribute to the project.
Previous discussion: #45 (comment)
Currently our `docker-compose.yml` files are chained together with a `docker-compose -f ... -f ... -f ...` command to bring up a series of containers for manual testing. We would like each `docker-compose.yml` file to be self-contained and (eventually) run automated tests when brought up (maybe a separate `docker-compose-testing.yml`?).
For service `xxx`, the `docker-compose.yml` should have the following properties: from `isoblue-avena/services/xxx`, `docker-compose {build,up,down,etc.}` should work.

@abalmos This is your vision. Any other comments?
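A self-contained `services/xxx/docker-compose.yml` might look like this minimal sketch (the image name is a placeholder):

```yaml
version: '3'
services:
  xxx:
    build: .
    image: isoblue/xxx
    restart: unless-stopped
```

With something like this in place, `docker-compose build`, `up`, and `down` all work from inside `services/xxx` with no `-f` chaining.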
After installing a new kernel, there should be a `/var/run/reboot-required` file. For some reason that file does not exist and the reboot does not happen. For the meantime, the `avena` role always reboots.
This is not well thought out; however, could we somehow make the core Avena install read-only? Only Docker containers could be downloaded and started. We ought to be able to run SSH in a container and give that container enough access to the system that you can admin the entire thing from within it.
The idea here is the base Avena doesn't change very often and deals with the minimal set of hardware issues it needs to. The rest is easily upgradeable from remote via docker deploys.
This SSH container would need to deal with CA auth.
Hard-coding the host IP in `kafka-gps-log.py` is not a good solution, as docker's `localhost` IP can change.

The optimal choice is to use `host.docker.internal` to have docker resolve it to `localhost`. However, as mentioned here, Linux won't have this function until Docker version 20.10.

Maybe the current workaround is to use `network_mode: host` in `docker-compose.yml` and hard-code `localhost` in `kafka-gps-log.py`.
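A minimal sketch of that interim workaround in compose (the environment variable name is an assumption, not the container's actual config):

```yaml
services:
  kafka-gps-log:
    build: .
    network_mode: host
    environment:
      - KAFKA_HOST=localhost   # safe to hard-code under host networking
```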
Might need to make an index on `(sent, timestamp)` or something. Or maybe use `sent` as the space index in TSDB.
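To make the trade-off concrete, here is a sketch using `sqlite3` so it runs anywhere (the actual table lives in TimescaleDB and its name and columns are assumptions): a composite `(sent, timestamp)` index, plus a partial index that only covers the unsent backlog.

```python
import sqlite3

# Hypothetical log table; names are assumptions, not the real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gps (timestamp REAL, lat REAL, lng REAL, sent INTEGER)")

# Composite index: "unsent rows, oldest first" queries avoid a full scan.
conn.execute("CREATE INDEX idx_gps_sent_ts ON gps (sent, timestamp)")

# Alternative: a partial index covering only the (usually small) backlog.
conn.execute("CREATE INDEX idx_gps_unsent ON gps (timestamp) WHERE sent = 0")

conn.executemany(
    "INSERT INTO gps VALUES (?, ?, ?, ?)",
    [(1.0, 40.42, -86.91, 0), (2.0, 40.43, -86.90, 1)],
)
backlog = conn.execute(
    "SELECT timestamp FROM gps WHERE sent = 0 ORDER BY timestamp"
).fetchall()
```

Postgres supports the same two index forms, so whichever wins here should translate directly.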
From meeting with @aultac on 8/7/2020
There should be some sort of flag or mechanism to ask the ISOBlue to stay online even if no CAN is detected. This would be very helpful to keep demos and deploys from being interrupted.
There are several ways we could do this; however, containerizing `can-watchdog` may add some complications. Would using DBUS be feasible?
This likely should be done in conjunction with #51
Not sure why?
Can we use a pip module? Or maybe `pip`'s return code communicates whether something was changed or not?
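For what it's worth, Ansible's built-in `pip` module is already idempotent and only reports "changed" when a package was actually installed or removed. A sketch (the package list is an example, not our actual requirements):

```yaml
- name: ensure python dependencies are present
  pip:
    name:
      - gps3   # example package
    state: present
```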
Current docker containers start up with default settings, which assign each topic an unknown number of partitions. This will cause consumers to consume from multiple partitions and produce out-of-order messages if not handled correctly.
Links for instructions under Apalis device and concepts are dead.
In the bitbake days of ISOBlue we had a systemd-managed C program that controlled the LED lights on the board, allowing us to indicate internet connectivity and general device health, among other things. This was helpful for troubleshooting in the field, as the operator telling us the LED status was an easy and informative first step in debugging.

We have discussed containerizing things (like NetworkManager) that would have to break some of the containerization principles, and if we choose to go down that path, this might be a good first one to get our feet wet. On the old kernel, the LED status was controlled by writing a value to a file.
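On newer kernels the same file-write approach usually still works through the LED sysfs class. A sketch (the LED node name `status` is an assumption; the real name depends on the device tree):

```python
from pathlib import Path

# Assumed sysfs node; the actual name depends on the board's device tree.
STATUS_LED = Path("/sys/class/leds/status/brightness")

def set_led(on: bool, led: Path = STATUS_LED) -> None:
    """Turn an LED on or off by writing "1"/"0" to its brightness file."""
    led.write_text("1" if on else "0")
```

A containerized version would only need `/sys/class/leds` bind-mounted into the container.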
Current `engine-rpm-pub.py` sends seemingly high-rate engine RPM data that overloads ThingsBoard. It is unclear whether the engine RPM messages from the bus are too much or whether something is up with Kafka.
Need to investigate.
This one is very important, and it goes very much against Ansible best practices.
Issues:
Not idempotent: it always re-generates the certificate.
- Should we only do this when near expiration?
- Should Ansible be generating these at all? Maybe something like `vault` (can it do it?) should generate/store them, and Ansible just grabs the stored one and ensures that it is on the node. It seems like a business issue to regenerate/extend the node's trust.
Major hack:
- We use `with_items` with one item to enable a double template expansion ... this is needed because we fetch the host key from the remote directly (keys are generated on the node, and private keys never leave it for security reasons).
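For the expiration question, one hedged sketch is to gate regeneration on `openssl x509 -checkend`, which exits non-zero when the cert will expire within the given window (the cert path and the included task file are hypothetical):

```yaml
- name: check whether the node certificate expires within 30 days
  command: openssl x509 -checkend 2592000 -noout -in /etc/ssl/avena/node.crt
  register: cert_check
  failed_when: false
  changed_when: false

- name: regenerate the certificate
  include_tasks: generate-cert.yml   # hypothetical task file
  when: cert_check.rc != 0
```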
Current containers spit out everything that they can onto the container logs.
It would be more storage-friendly for the containers (`kafka-canX-log`, `kafka-gps-log`, `engine-rpm-log`) to have log levels so that we can view the container output better.
Use a cluster VM as a GH builder?
It is likely possible to move gpsd into a container.
This should be done at the same time as the chrony container.
`apt install gpsd`, or something like: https://hub.docker.com/r/skyhuborg/gpsd
Can we use `pyudev` in the same way we use the system dbus?

Doing something like:
$ sudo ./installers/toradex-apalis/make-install-disk.sh /dev/sda
fails because it tries to copy files at the end but the path is wrong. The script must be run from its own directory.
output:
All paritions will be deleted. Do you wish to continue? [y/N]y
umount: /dev/sda: not mounted.
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
Checking that no-one is using this disk right now ... OK
Disk /dev/sda: 14.6 GiB, 15664676864 bytes, 30595072 sectors
Disk model: Cruzer Dial
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x522718ac
Old situation:
>>> Created a new DOS disklabel with disk identifier 0x8e9f2343.
/dev/sda1: Created a new partition 1 of type 'Linux' and of size 14.6 GiB.
Partition #1 contains a vfat signature.
/dev/sda2: Done.
New situation:
Disklabel type: dos
Disk identifier: 0x8e9f2343
Device Boot Start End Sectors Size Id Type
/dev/sda1 2048 30595071 30593024 14.6G 83 Linux
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
mkfs.fat 4.1 (2017-01-24)
--2020-04-08 14:04:32-- https://cdimage.debian.org/debian-cd/current/armhf/iso-cd/debian-10.3.0-armhf-netinst.iso
Resolving cdimage.debian.org (cdimage.debian.org)... 2001:6b0:19::165, 2001:6b0:19::173, 194.71.11.173, ...
Connecting to cdimage.debian.org (cdimage.debian.org)|2001:6b0:19::165|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://caesar.ftp.acc.umu.se/debian-cd/current/armhf/iso-cd/debian-10.3.0-armhf-netinst.iso [following]
--2020-04-08 14:04:33-- https://caesar.ftp.acc.umu.se/debian-cd/current/armhf/iso-cd/debian-10.3.0-armhf-netinst.iso
Resolving caesar.ftp.acc.umu.se (caesar.ftp.acc.umu.se)... 2001:6b0:19::142, 194.71.11.142
Connecting to caesar.ftp.acc.umu.se (caesar.ftp.acc.umu.se)|2001:6b0:19::142|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 488943616 (466M) [application/x-iso9660-image]
Saving to: ‘/tmp/tmp.5p4DCoDpNh/debian-10.3.0-armhf-netinst.iso’
debian-10.3.0-armhf-netinst.iso 100%[============================================================================>] 466.29M 9.08MB/s in 53s
2020-04-08 14:05:27 (8.75 MB/s) - ‘/tmp/tmp.5p4DCoDpNh/debian-10.3.0-armhf-netinst.iso’ saved [488943616/488943616]
--2020-04-08 14:05:27-- http://http.us.debian.org/debian/dists/buster/main/installer-armhf/current/images/hd-media/hd-media.tar.gz
Resolving http.us.debian.org (http.us.debian.org)... 2600:3404:200:237::2, 2600:3402:200:227::2, 2620:0:861:1:208:80:154:15, ...
Connecting to http.us.debian.org (http.us.debian.org)|2600:3404:200:237::2|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 24306376 (23M) [application/x-gzip]
Saving to: ‘STDOUT’
- 100%[============================================================================>] 23.18M 10.2MB/s in 2.3s
2020-04-08 14:05:30 (10.2 MB/s) - written to stdout [24306376/24306376]
Making Avena install u-boot script
mkimage: Can't stat install-avena: No such file or directory
Copying Debian preseed file
cp: cannot stat 'preseed.cfg': No such file or directory
Copying device tree update script
cp: cannot stat 'update-device-tree': No such file or directory
Unmounting installer
An app that polls the RSSI power data from ModemManager.
Right now Ansible 2.7 is the minimum because of `include_tasks:` with `file:` and `apply:`. Can we do it another way?
Node.js has a very nice ENV-based logger called `debug`. This is especially helpful in our containers for writing logs to disk instead of having to chase them in stdout/stderr. We should find and use a similar library for our Python-based containers.
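For reference, the core of node's `debug` behaviour is small enough to sketch in Python: a `DEBUG` env var lists comma-separated namespaces (with trailing-`*` wildcards) that should emit logs. The namespace names below are examples only.

```python
import os
import sys

def debug(namespace: str):
    """Return a logger that only emits when DEBUG matches `namespace`."""
    patterns = [p for p in os.environ.get("DEBUG", "").split(",") if p]
    active = any(
        namespace == p or (p.endswith("*") and namespace.startswith(p[:-1]))
        for p in patterns
    )
    def log(msg: str) -> None:
        if active:
            print(f"{namespace} {msg}", file=sys.stderr)
    return log

log = debug("avena:gps")  # example namespace
log("only emitted when DEBUG=avena:gps or DEBUG=avena:*")
```

An existing library with this interface would obviously be preferable to maintaining our own.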
Currently we have no way to manage and deploy docker containers to our Avena devices in the field. In the few instances where I have needed to test containers in the field, I have manually `git clone`d the repo and used `docker-compose` to start things by hand. This is not elegant or sustainable.

Possible solutions:
- A `docker-compose` wrapper. Seems more aimed at a docker replacement than a container management tool.

@abalmos Are there any other approaches we discussed that I am missing?
Restore the Prometheus/Grafana dashboard functionality. It was lost during move to oats2 and new docker-compose strategy.
No comment needed
With all docker containers running, it seems like 4 cores on apalis-imx6 are nearly maxed out.
It is probable that Kafka caused this and we need to find a way to start Kafka and zookeeper containers with addition or removal of arguments like this.
Postgres/timescaledb currently uses "password" as the password for the database. As the ISOBlue lies behind a VPN, no one outside of the VPN should be able to access it regardless of the password. However, this should still be investigated, especially if we want to expose it on the wireguard interface in the future.
Original comment by @abalmos in #24 (comment)
Network traffic FROM isoblue is needed to create the NAT tunnel, but by default there is none. How can we get the tunnel from the get go?
TASK [wireguard : gather the wireguard public key] **********************************************
Tuesday 15 September 2020 13:27:33 -0400 (0:00:00.582) 0:02:45.574 *****
ok: [avena-apalis-3]
ok: [avena-apalis-6]
TASK [wireguard : pause] **********************************************
Tuesday 15 September 2020 13:27:34 -0400 (0:00:00.391) 0:02:45.966 *****
[wireguard : pause]
Please add:
[Peer]
# Name = avena-apalis-3
PublicKey = XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
AllowedIPs = XXX.XXX.XXX
to your bounce server wireguard configuration and then run
$ sudo su
# wg addconf <wg-interface> <(wg-quick strip <wg-interface>)
before hitting <enter> to continue.
:
ok: [avena-apalis-3]
TASK [wireguard : ensure wireguard auto restarts on failure] **********************************************
Tuesday 15 September 2020 13:30:31 -0400 (0:02:56.825) 0:05:42.791 *****
ok: [avena-apalis-3]
ok: [avena-apalis-6]
We should be seeing two prompts with wireguard public keys in this case.
It is likely possible to move chrony into a container.
This should be done at the same time as the gpsd container.
`apt install chrony`, or something like: https://hub.docker.com/r/geoffh1977/chrony
Ensure `systemd-timesyncd` is disabled (the chrony apt package does that right now).

`avena-wireguard`
(at least through an option) should route all traffic through the wg interface.
This might require host networking mode (but I think that is still better than a local install)
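A containerized wireguard service under host networking could look something like this sketch, using a candidate image such as `cmulk/wireguard-docker` (the tag, mounts, and capabilities are assumptions):

```yaml
services:
  wireguard:
    image: cmulk/wireguard-docker:buster   # tag is an assumption
    network_mode: host        # wg must manage routes on the host
    cap_add:
      - NET_ADMIN
    volumes:
      - /etc/wireguard:/etc/wireguard
    restart: unless-stopped
```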
`apt install wireguard`, or something like: https://hub.docker.com/r/cmulk/wireguard-docker

It always causes a "change" due to the way it is implemented.
Maybe backports has a newer openssh with these old moduli already removed?
Task "ensure wireguard key exists" from role "avena-wireguard" failed with the following error:
TASK [avena-wireguard : ensure wireguard key exists] ******************************************************************************************************************** fatal: [avena-apalis-dev04]: FAILED! => {"changed": false, "msg": "Unable to change directory before execution: [Errno 2] No such file or directory: '/etc/wireguard'"}
Wireguard was not installed and deployment was halted.
Postgres/timescaledb is becoming our primary storage layer for data logs. We should consider some schemes for getting this data back to a remote postgres for permanent storage.
This is an interesting project that we may be able to use and/or learn from: https://bucardo.org/Bucardo/
We could have a standalone opensshd container. Mounting root, host networking mode, etc. would give you close to complete control.
You can mount the docker socket, so you could control the host docker daemon from within the container. Docker-in-docker (dind) is a somewhat understood practice.
Then we could have a CA version of that container for CA-based fleets like ours.
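A sketch of that admin container in compose (the image name and mount points are hypothetical; only the docker-socket and host-networking ideas come from the discussion above):

```yaml
services:
  admin-sshd:
    image: isoblue/admin-sshd   # hypothetical image
    network_mode: host
    volumes:
      - /:/host                                    # host filesystem
      - /var/run/docker.sock:/var/run/docker.sock  # control host docker
    restart: unless-stopped
```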
We should make issues to track migrating as many of the "core" services into containers as possible. Each issue should indicate one thing to migrate and a short plan of attack.
We need a license for this repo. Not something urgent though.
We are currently only logging time/lat/lng data from the gps module. According to `man 8 gpsd`, the following additional data is also available:

Likely all of these should be logged, even if they will not be uploaded to OADA or wherever. However, I noticed that with the gps2tsdb container, some of them are null. I am unsure whether our GPS module does not support them or it is an issue with the Python library, but we should discuss how to record values that are not available.
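One way to record missing values is to normalize each gpsd TPV report into a fixed set of columns, mapping absent fields to `None` (and hence SQL NULL). The field names below follow gpsd's JSON protocol; which of them our module actually reports is exactly the open question.

```python
# Columns we would log; names follow gpsd's TPV report.
FIELDS = ("time", "lat", "lon", "alt", "speed", "climb", "track", "epx", "epy")

def normalize_tpv(tpv: dict) -> dict:
    """gpsd omits keys it has no data for; map those to None (-> SQL NULL)."""
    return {field: tpv.get(field) for field in FIELDS}

row = normalize_tpv({"time": "2020-08-07T12:00:00Z", "lat": 40.42, "lon": -86.91})
```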
As the ISOBlue devices in the field tend to be offline as much as they are online, updating the ISOBlues with the typical 'push' Ansible mentality involves waiting for each device to come online and pushing the update before it goes back offline, which is quite inconvenient and time-consuming. At some point we should equip the ISOBlues with an Ansible Pull style updating system, where each ISOBlue periodically applies the most recent stable version of the code automatically.
Wireguard sets the wg interface MTU based off of the default route's MTU when first started. However, when the default route interface changes, the MTU is not updated. This can cause connectivity issues.

Should we enable tcp_mtu_probing? This seems to have its own issues...

An /etc/network/if-up.d script to re-compute and set the MTU?

Punting on the issue for now ... we will only have cell-based ones at first.
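Whatever mechanism re-computes the MTU, the arithmetic itself is simple: `wg-quick` subtracts 80 bytes (the worst-case WireGuard overhead, assuming an IPv6 outer packet) from the default route's MTU. A sketch of the calculation an if-up.d hook would perform; the 1280 floor is the IPv6 minimum MTU:

```python
WIREGUARD_OVERHEAD = 80   # worst case: IPv6 outer packet
IPV6_MIN_MTU = 1280

def wg_mtu(default_route_mtu: int) -> int:
    """MTU to set on the wg interface given the current default route's MTU."""
    return max(default_route_mtu - WIREGUARD_OVERHEAD, IPV6_MIN_MTU)
```

The hook would read the default route's MTU from `ip route`, compute this, and apply it with `ip link set`.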
Step 3 of the Apalis installer states:
Make installer disk:
$ ./installers/toradex-apalis/make-install-disk.sh DEVICE_FILE
Running this from the root directory of the project, as the installer guide states, yields the following errors:
╭─aaron@TRANSLTR ~/code/OATS/isoblue-avena ‹field-fixes*›
╰─$ sudo ./installers/toradex-apalis/make-install-disk.sh /dev/sdb
...
Making Avena install u-boot script
mkimage: Can't stat install-avena: No such file or directory
Copying Debian preseed file
cp: cannot stat 'preseed.cfg': No such file or directory
Copying device tree update script
cp: cannot stat 'update-device-tree': No such file or directory
Unmounting installer
However, after `cd`ing into the `toradex-apalis` folder and running the script again, it proceeds without errors:
╭─aaron@TRANSLTR ~/code/OATS/isoblue-avena/installers/toradex-apalis ‹field-fixes*›
╰─$ sudo ./make-install-disk.sh /dev/sdb
...
Making Avena install u-boot script
Image Name: Install Avena on Apalis
Created: Tue Jun 23 17:26:02 2020
Image Type: PowerPC Linux Script (uncompressed)
Data Size: 519 Bytes = 0.51 KiB = 0.00 MiB
Load Address: 00000000
Entry Point: 00000000
Contents:
Image 0: 511 Bytes = 0.50 KiB = 0.00 MiB
Copying Debian preseed file
Copying device tree update script
Unmounting installer
Three things should be changed:
- Run it from `toradex-apalis` (if the next two fixes cannot be done immediately) ... or at least some of it.
Talk to both modem manager and network manager via dbus to configure, start, and stop the various networking interfaces.
Right now, if you run `./make-install-disk.sh /dev/diskname` without root, the script will output:
All paritions will be deleted. Do you wish to continue? [y/N]y
umount: /dev/sdb: not mounted.
umount: /dev/sdb1: not mounted.
sfdisk: cannot open /dev/sdb: Permission denied
sfdisk: cannot open /dev/sdb: Permission denied
mkfs.fat 4.1 (2017-01-24)
mkfs.vfat: unable to open /dev/sdb1: Permission denied
mount: only root can do that
--2020-04-04 17:39:55-- https://cdimage.debian.org/debian-cd/current/armhf/iso-cd/debian-10.3.0-armhf-netinst.iso
Resolving cdimage.debian.org (cdimage.debian.org)... 194.71.11.173, 194.71.11.165, 2001:6b0:19::165, ...
Connecting to cdimage.debian.org (cdimage.debian.org)|194.71.11.173|:443... connected.
...
The script ignores the permission denied messages and keeps doing the rest. The script should prompt the user to run it with `sudo` and exit.
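A sketch of that guard for the top of `make-install-disk.sh` (parameterized on the uid so it can be exercised without actually being root):

```shell
# Refuse to continue when not run as root, instead of ploughing on past
# the later "permission denied" errors.
check_root() {
  if [ "$1" -ne 0 ]; then
    echo "error: this script must be run as root (try: sudo $0)" >&2
    return 1
  fi
}

# Near the top of make-install-disk.sh:
#   check_root "$(id -u)" || exit 1
```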
When a gps is plugged in after isoblue starts, `gpsd` is not started.
The current `docker-compose.yml` and `.env` files are very platform-specific. We should come up with a way to improve this.
The current `pyGObject`-based DBUS lib used with `gps2tsdb` requires 100s of MB of unrelated dependencies. If we continue to use Python for most of our containers and DBUS as our main communication bus, this will become unsustainable soon.

Options:
- Find the minimal dependencies of `python3-dbus` and only download those. Will require significant effort to find these.
- `systemd sleep` call

A canbus-to-postgres container.
A few ideas:
Option 3 has the advantage of letting us try out nano message where we thought DBUS might work well for us.
Option 2 has the advantage of using existing socketcan code
Option 1 has the advantage of less moving parts.
Current thinking: Try out option 3 as a learning experience around nanomsg. Most of the core code is needed for all options, so we can always "downgrade" if we experience issues.
Current thinking 2: We can keep running our `candump` loggers for a while too ... as a backup, until we trust our new setup.