Image
20230609T214426Z
Describe the bug
- https://docs.tritondatacenter.com/private-cloud/install/linux-compute-node-installation
The Triton documentation references a URL with an unsupported naming convention.
As a result, the downloaded .tgz file is also named incorrectly, and sdcadm then chokes on it during the sdcadm platform install ./platform-latest.tgz step.
- Triton documentation: the -o argument is empty, causing the command to fail:
curl -o https://us-east.manta.joyent.com/Joyent_Dev/public/TritonDCLinux/joyent-debian_live-latest.usb.gz
Additionally, the Triton documentation instructs you to download the .usb.gz image rather than the platform image, which sdcadm does not like at all.
- https://github.com/TritonDataCenter/linux-live/blob/master/docs/6-quick-start.md
The GitHub quickstart documentation references a corrupted, 133-byte image with a build timestamp of 20210731T223008Z.
- Numerous failures occurred while setting up a Linux compute node during the setup phase in Triton.
The first real issue that I ran into is probably a combination of bugs in sdcadm platform install and the aforementioned documentation.
Both sets of existing instructions reference curl commands with URLs that do not produce the file naming convention required by sdcadm platform install.
Triton documented curl command:
# This command is just outright bad (even the file name to download is
# incorrect). Corrected on the following line:
# curl -o https://us-east.manta.joyent.com/Joyent_Dev/public/TritonDCLinux/joyent-debian_live-latest.usb.gz
curl -o joyent-debian_live-latest.tgz https://us-east.manta.joyent.com/Joyent_Dev/public/TritonDCLinux/joyent-debian_live-latest.usb.gz
sdcadm platform install ./joyent-debian_live-latest.tgz
# Failure
Using platform file joyent-linux_live-latest.tgz
Installing Platform Image onto USB key
Invalid platform version format: PLATFORM-LATEST
Please ensure this is a valid SmartOS platform image.
install-platform.sh script failed for platform joyent-debian_live-latest.tgz.
Please, check /tmp/install_platform.log for additional information.
Error: Platform setup failed
In order not to have to re-download image, joyent-debian_live-latest.tgz has been left behind.
After correcting above problem, rerun `sdcadm platform install joyent-debian_live--latest.tgz`.
sdcadm platform install: error: Platform setup failed
GitHub quickstart curl command:
curl -OC - https://us-central.manta.mnx.io/Joyent_Dev/public/TritonDCLinux/20210731T223008Z/platform-20210731T223008Z.tgz
# This produces a 133 byte corrupted tar archive file that does not work.
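A 133-byte file is far too small to be a platform image, so a bad download like the one above can be caught before it reaches sdcadm. The following is a minimal sketch, assuming gzip is available on the headnode; is_valid_tgz is a hypothetical helper name and is not part of sdcadm.

```shell
# Hypothetical helper (not part of sdcadm): succeed only if the file
# exists, is non-empty, and passes gzip's built-in integrity test.
is_valid_tgz() {
    [ -s "$1" ] && gzip -t "$1" 2>/dev/null
}

# Usage (not run here):
#   is_valid_tgz ./platform-20210731T223008Z.tgz || echo "download is corrupt"
```

This only checks the gzip layer; it does not confirm that the archive actually contains a platform image.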
Steps I took to correct:
To obtain and assign a linuxcn image, I traversed the Manta directory and found the latest platform image timestamp from mtime, which in this case is 20230609T214426Z.
sdcadm initially complained about the name not being correct for Triton, and while traversing the Manta directories I noticed that the "links" referring to latest in the Manta hierarchy all have mtime stamps of 20230404, which is 2 months older than the actual latest published build.
I can only assume that most of these issues stem from the CI/CD process, a build automation or regression test failure, or a lack of community contribution to the documentation.
I listed the Manta objects in Joyent_Dev/public/TritonDCLinux with the following:
[root@headnode (dfw002) /var/tmp/linuxcn]# IFS=""; while read -r a;do echo $a | json -Ha name type mtime;done<<<$(curl https://us-central.manta.mnx.io/Joyent_Dev/public/TritonDCLinux)
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1544 0 1544 0 0 2382 0 --:--:-- --:--:-- --:--:-- 2848
20210806T203445Z directory 2022-07-31T18:11:07.740Z
20210809T231231Z directory 2022-07-31T18:11:07.750Z
20210814T215810Z directory 2022-07-31T18:11:07.522Z
20210818T230702Z directory 2022-07-31T18:11:07.740Z
20230227T175258Z directory 2023-04-04T19:34:46.867Z
20230510T190821Z directory 2023-05-16T21:34:36.591Z
20230609T214426Z directory 2023-06-15T19:54:59.511Z
joyent-debian_live-latest.iso object 2023-04-04T20:30:22.703Z
joyent-debian_live-latest.usb.gz object 2023-04-04T20:19:25.470Z
latest object 2023-04-04T20:04:09.539Z
platform-latest.tgz object 2023-04-04T20:13:20.360Z
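Because the build directory names in the listing above are fixed-width UTC timestamps, they sort chronologically as plain strings, so the newest build can be picked without trusting the latest objects. A minimal sketch, where latest_build is a hypothetical helper name:

```shell
# Hypothetical helper: read entry names on stdin (one per line) and print
# the newest one that looks like a build timestamp (YYYYMMDDTHHMMSSZ).
# Non-timestamp entries such as "latest" or "platform-latest.tgz" are ignored.
latest_build() {
    grep -E '^[0-9]{8}T[0-9]{6}Z$' | sort | tail -n 1
}

# Usage (not run here), feeding it the first column of the listing:
#   awk '{ print $1 }' listing.txt | latest_build
```

Here listing.txt stands for a saved copy of the name/type/mtime output shown above.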
I manually updated the curl command referencing the latest build timestamp and
the correct output filename and was then able to install the image:
# Documentation updates
mkdir /var/tmp/linuxcn
cd /var/tmp/linuxcn
curl -o platform-20230609T214426Z.tgz https://us-central.manta.mnx.io/Joyent_Dev/public/TritonDCLinux/20230609T214426Z/platform-20230609T214426Z.tgz
sdcadm platform install ./platform-20230609T214426Z.tgz
# Success!
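The working commands above follow a fixed pattern: the build timestamp appears once in the directory and once in the file name. A sketch of deriving both from a single timestamp, so the local file name and URL cannot drift apart; BASE_URL is taken from this report, while platform_file and platform_url are hypothetical helper names:

```shell
# Base location of the Linux platform builds, as used in this report.
BASE_URL="https://us-central.manta.mnx.io/Joyent_Dev/public/TritonDCLinux"

# Hypothetical helpers: derive the file name and URL from a build timestamp.
platform_file() {
    printf 'platform-%s.tgz\n' "$1"
}

platform_url() {
    printf '%s/%s/%s\n' "$BASE_URL" "$1" "$(platform_file "$1")"
}

# Usage (not run here); -f makes curl fail on an HTTP error instead of
# saving the error body under the archive's name:
#   ts=20230609T214426Z
#   curl -f -o "$(platform_file "$ts")" "$(platform_url "$ts")"
#   sdcadm platform install "./$(platform_file "$ts")"
```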
The remainder of the GitHub documentation (which is the documentation I chose to follow) was correct.
Additional PI installation notes:
- Either the latest Manta links should be updated so that they redirect properly and produce file output with a supported naming convention that includes the build timestamp (i.e., platform-TIMESTAMP.tgz), the URL in the docs should reference the actual timestamp, or sdcadm should be able to pull the build timestamp out of the tarball itself so that it does not choke on a platform image that simply isn't named properly on the local filesystem.
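Until one of those options is in place, the file name can be checked locally before calling sdcadm. A minimal sketch: the platform-TIMESTAMP.tgz pattern here is inferred from the file name that worked in this report, not taken from the sdcadm source, and is_platform_name is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the file's base name has the form
# platform-YYYYMMDDTHHMMSSZ.tgz (pattern assumed from this report).
is_platform_name() {
    printf '%s\n' "$(basename "$1")" |
        grep -Eq '^platform-[0-9]{8}T[0-9]{6}Z\.tgz$'
}

# Usage (not run here):
#   is_platform_name ./platform-latest.tgz || echo "rename the file first"
```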
The remainder of this report focuses on the compute node installation.
- The installer fails to properly set up the compute node out of the box: joysetup.sh fails to set up the /var zfs dataset.
To resolve, I manually intervened and performed the following steps to recover:
Note: I assumed that /var should have a mountpoint of type legacy; if this is incorrect, I will update this report.
SVCS="var-lib-lxd-devlxd.mount triton-lxd lxcfs"
for i in ${SVCS};do systemctl stop $i;done
mv /var /var.orig
mkdir /var
zfs set mountpoint=legacy zones/var
mount -t zfs zones/var /var
rsync -arv --progress --partial /var.orig/ /var
for i in ${SVCS}; do systemctl start $i;done
rm -rf /var.orig
Upon running setup again, joysetup.sh completed successfully.
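After a manual recovery like the one above, it may be worth confirming that /var is actually backed by ZFS before rerunning setup. A minimal sketch that reads a mounts table in /proc/mounts format; fstype_at is a hypothetical helper name, and the expectation that /var reports zfs rests on the legacy-mountpoint assumption stated in the note above.

```shell
# Hypothetical helper: print the filesystem type mounted at a path.
# $1 = mount point, $2 = mounts table (defaults to /proc/mounts).
# If a path appears more than once, the last (topmost) mount wins.
fstype_at() {
    awk -v p="$1" '$2 == p { t = $3 } END { if (t != "") print t }' \
        "${2:-/proc/mounts}"
}

# Usage (not run here):
#   [ "$(fstype_at /var)" = "zfs" ] || echo "/var is not a ZFS mount"
```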
- After failure 1, agentsetup.sh failed, now complaining about the lack of /opt/triton/config and /opt/triton/config/triton-setup-state.json.
I resolved the agentsetup.sh issues manually as well, with the following:
# verified that /opt was an existing, mounted zfs dataset with "zfs mount -l | grep opt"
mkdir -p /opt/triton/{bin,config}
mkdir -p /opt/smartdc/config
touch /opt/triton/config/triton-setup-state.json
export ASSETS_URL=<IP From the headnode assets zone (vmadm get $(vmadm lookup alias=~assets) | json nics.0.ip)>
/var/tmp/agentsetup.sh
After all of the above I ran setup a final time and the linuxcn rebooted and came up as expected.
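The two paths that agentsetup.sh complained about can be checked up front before rerunning setup. A minimal sketch: the path list comes only from the failure described in this report and may not be everything agentsetup.sh needs, and missing_agent_paths is a hypothetical helper name.

```shell
# Hypothetical helper: print each path from this report that is absent.
# $1 = optional root prefix (empty means the real filesystem root).
missing_agent_paths() {
    _root="${1:-}"
    for _p in /opt/triton/config \
              /opt/triton/config/triton-setup-state.json; do
        [ -e "${_root}${_p}" ] || printf '%s\n' "$_p"
    done
}

# Usage (not run here): no output means both paths are present.
#   missing_agent_paths
```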