
Linux compute node platform image tools. This is the Linux counterpart to smartos-live.

License: Mozilla Public License 2.0

Shell 92.02% JavaScript 0.68% Makefile 1.91% Rust 2.71% C 2.68%

linux-live's Introduction

linux-live

This repo is used to build the Triton Datacenter Linux platform image. This is not a standalone product and is intended to be used with Triton Datacenter.

THIS IS A TECHNOLOGY PREVIEW. Not everything works as expected yet. But you (yes, you!) can help shape the direction of the product by getting involved.

Detailed documentation can be found in the docs directory. Some of this is likely to become official documentation and some of it captures current areas of exploration.

Bugs

If you find bugs with the platform image, please report them in this repo. If you find bugs in the Triton API stack, please file them in their respective repo (you can find a list in the triton repository).

You can also find bugs filed in our internal Jira here:

Getting Help

You can reach out to us in the following ways:

linux-live's People

Contributors

aqw, bahamat, kusor, twhiteman

linux-live's Issues

need serial console support

For iso/usb boot, grub should present the menu on the graphical console, ttya, and ttyb with the ability to use any of them for OS console access.

Setup Tries to put setup_complete in /var/svc

[2021-10-04T19:13:33.005Z] INFO: ur/2727 on ac-1f-6b-a4-df-ee: Publishing execute-reply to request 854ab79c-5e02-ef05-f430-c9aa5956d41c:
[2021-10-04T19:13:33.005Z] INFO: ur/2727 on ac-1f-6b-a4-df-ee:
{
exit_status: 1,
stdout: '',
stderr: "touch: cannot touch '/var/svc/setup_complete': No such file or directory\n"
}

reboot in container does not work

When root in a container runs reboot, or machinectl reboot is run on the host, the container dies and does not restart. While running journalctl -ef on the host and executing machinectl reboot $uuid, I see:

Feb 20 13:43:58 debian-live-20200211T200343Z systemd-nspawn[43129]: Sending SIGTERM to remaining processes...
Feb 20 13:43:58 debian-live-20200211T200343Z systemd-nspawn[43129]: Sending SIGKILL to remaining processes...
Feb 20 13:43:58 debian-live-20200211T200343Z systemd-nspawn[43129]: Rebooting.
Feb 20 13:43:58 debian-live-20200211T200343Z systemd[1]: Stopping Container 371f18c0-9f73-6e86-94f6-c1cf71188d23...
Feb 20 13:43:58 debian-live-20200211T200343Z systemd-machined[36777]: Failed to drop reference to machine scope, ignoring: Unit has not been referenced yet.
Feb 20 13:43:58 debian-live-20200211T200343Z systemd[1]: [email protected]: Succeeded.
Feb 20 13:43:58 debian-live-20200211T200343Z systemd[1]: Stopped Container 371f18c0-9f73-6e86-94f6-c1cf71188d23.
Feb 20 13:43:58 debian-live-20200211T200343Z systemd-machined[36777]: Machine 371f18c0-9f73-6e86-94f6-c1cf71188d23 terminated.

The message "Failed to drop reference to machine scope, ignoring: Unit has not been referenced yet" is mentioned in a commit that appears in the v244-rc1 and v245-rc1 releases. Debian 10 is running v241.

Workarounds like this should appear as drop-in files under /usr/lib/systemd/system/.

There are two methods of overriding vendor settings in unit files: copying the unit file from /usr/lib/systemd/system to /etc/systemd/system and modifying the chosen settings. Alternatively, one can create a directory named unit.d/ within /etc/systemd/system and place a drop-in file name.conf there that only changes the specific settings one is interested in. Note that multiple such drop-in files are read if present, processed in lexicographic order of their filename.

Drop-in files work under /usr/lib/systemd as well, allowing software installations to override the system default while still giving the admin free rein in /etc/systemd.

The fix that is developed here should be documented as a best practice for dealing with services that are broken in some versions of some distributions.
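
As a rough illustration of the drop-in layout described above, the following Python sketch writes a drop-in file into a unit's .d directory. The helper names, the example@.service unit, the 10-workaround file name, and the Restart=always setting are all placeholders chosen for illustration; none of them is the actual fix for this issue.

```python
from pathlib import Path


def render_dropin(section, settings):
    """Return drop-in text that overrides only the given settings."""
    lines = [f"[{section}]"]
    lines += [f"{key}={value}" for key, value in settings.items()]
    return "\n".join(lines) + "\n"


def write_dropin(root, unit, name, section, settings):
    """Write <root>/<unit>.d/<name>.conf and return its path.

    root would be /usr/lib/systemd/system for an image-provided workaround,
    or /etc/systemd/system for an admin override.
    """
    dropin_dir = Path(root) / f"{unit}.d"
    dropin_dir.mkdir(parents=True, exist_ok=True)
    path = dropin_dir / f"{name}.conf"
    path.write_text(render_dropin(section, settings))
    return path


# Hypothetical usage (placeholder unit and setting, not the real fix):
# write_dropin("/usr/lib/systemd/system", "example@.service",
#              "10-workaround", "Service", {"Restart": "always"})
```

Because drop-ins are processed in lexicographic order of their file names, a numeric prefix such as 10- leaves room for later files to override the workaround.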

vmadm list type should be "LXD"?

vmadm on linux shows TYPE as "-" for lxd.

root@ac-1f-6b-a4-df-ee:~# vmadm list
UUID                                  TYPE  RAM      STATE             ALIAS
d13e5d97-856a-40cf-a7bb-31d1d41e4c4d  -     4096     running           lxd-test-3

triton images show the images as type "lxd" but vmadm shows them as "-".

46a52c59  ubuntu_bionic_amd64_cloud  20211004_07_42  S      linux    lxd           2021-10-04
f45889b7  ubuntu_focal_amd64_cloud   20211004_07_42  S      linux    lxd           2021-10-04
86e0c1a5  ubuntu_xenial_amd64_cloud  20211004_07_42  S      linux    lxd           2021-10-04

vmadm on SmartOS shows TYPE as OS, LX, or BHYV.

7dcd7a5c-9e8b-68b9-a5e9-a40a780a16a9  LX    16384    running           gitlab
43c3ec27-7737-634b-867f-a85e9a17b758  BHYV  32768    running           prod-dv-konvoy-worker-9

zfs build needs to be separated

I think that for developer productivity, the existing debian-live build is on the right path. However, we need to have a way to create a distributable PI and have a way to allow external users to build the zfs binaries easily.

The general direction I expect this to take is described at

https://github.com/joyent/linux-live/blob/929f86535e75e2deff852eb1f464e252b613300d/docs/2-platform-image.md#image-distribution

Before blindly following the direction described in that link, compare it to what is found at a similar location in the linuxcn and/or master branches.

ssh host keys not generated on boot

During boot, the ssh host keys should be generated so that sshd is useful.

Workaround:

# dpkg-reconfigure openssh-server

You will probably also need to do something like the following to allow root login via ssh.

sed -i -e 's/^#PermitRootLogin .*/PermitRootLogin yes/' /etc/ssh/sshd_config
service ssh restart
passwd root

joysetup.sh fails on lxcfs permission errors

diff --git a/scripts/joysetup.sh b/scripts/joysetup.sh
index fda7ffbd..663affd0 100755
--- a/scripts/joysetup.sh
+++ b/scripts/joysetup.sh
@@ -700,7 +700,7 @@ setup_datasets()
         # shellcheck disable=2064
         trap "cp /tmp/joysetup.$$ /var/log/joysetup.log" EXIT
 
-        if ( ! find . -print | TMPDIR=/tmp cpio -pdm "/${VARDS}" ); then
+        if ( ! find . -not -path '*/lib/lxcfs*' -print | TMPDIR=/tmp cpio -pdm "/${VARDS}" ); then
             fatal "failed to initialize the var directory"
         fi

grub menu does not appear on serial console

I have been unable to convince the grub menu to appear anywhere other than the graphical console. This is not great for remote operations.

I've tried various incantations similar to the following with no progress:

diff --git a/tools/debian-live b/tools/debian-live
index a634a17..32565af 100755
--- a/tools/debian-live
+++ b/tools/debian-live
@@ -356,11 +356,15 @@ function prepare_archive_bits {
        cat <<EOF >$scratch/grub.cfg
 search --set=root --file /JOYENT_DEBIAN_LIVE

-insmod all_video
+#insmod all_video

 set default="0"
 set timeout=30

+serial --speed=115200 --unit=0 --word=8 --parity=no --stop=1
+terminal_input serial
+terminal_output serial
+
 menuentry "Joyent Debian Live $release without DHCP ttyS0" {
     linux /vmlinuz boot=live console=ttyS0
     initrd /initrd

In the terminal_* lines, I've tried console serial and --append serial. Once grub starts the kernel, the kernel's messages are seen on the serial port.

need persistent hostid

Whether the zpool can be imported without -f is determined by whether the hostid matches the hostid of the system that previously imported it. Linux normally persists the hostid in /etc/hostid, which is not so useful with live media. If /etc/hostid does not exist, the hostid is set based upon the IP address. In a DHCP world, that's not so great.

Better would be to use the system's UUID with something like the following. This calculates a CRC32 of the system's UUID. It is left as an exercise to the reader to actually set the hostid to this value. This approach requires adding the dmidecode package, which will likely be needed by sysinfo and friends anyway.

#! /usr/bin/python3

import zlib
from subprocess import check_output

uuid = check_output(['/usr/sbin/dmidecode', '-s', 'system-uuid']).strip(b'\n')
print("%08x" % zlib.crc32(uuid))
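
One possible way to finish that exercise is sketched below. It assumes that /etc/hostid holds the hostid as four raw bytes in the machine's native byte order, which is how glibc's gethostid() reads it; verify that against the target system before relying on it. The function names are made up for this sketch, and the UUID is passed in as text rather than read from dmidecode so the code can be tried without root.

```python
import struct
import zlib


def hostid_from_uuid(uuid_text):
    """Return a 32-bit hostid derived from the system UUID via CRC32."""
    return zlib.crc32(uuid_text.strip().encode("ascii")) & 0xFFFFFFFF


def write_hostid(hostid, path="/etc/hostid"):
    """Write the hostid as 4 bytes in native byte order (assumed file format)."""
    with open(path, "wb") as f:
        f.write(struct.pack("=I", hostid))


# Hypothetical usage on the live system, as root:
# from subprocess import check_output
# uuid = check_output(["/usr/sbin/dmidecode", "-s", "system-uuid"]).decode()
# write_hostid(hostid_from_uuid(uuid))
```

Running this early in boot, before the pool is imported, would be needed for the value to take effect; where exactly that hook belongs in the image is not settled here.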

need network generator

Triton's booter will provide networking.json, indicating how networking should be configured. A systemd generator is needed to create the appropriate .link and .network files under /var/systemd.

In the absence of networking.json, it should generate a generic configuration that enables DHCP on all interfaces.
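
The fallback case could look roughly like the Python sketch below, which writes a single .network file enabling DHCP when the booter-provided configuration is absent. The function name, the 99-dhcp-all.network file name, and the choice of Name=* as the match are assumptions; a real generator might prefer a narrower match, and translating the booter-provided file is deliberately not attempted because its schema is not described here.

```python
from pathlib import Path

# Minimal systemd.network unit: match every interface and enable DHCP.
DHCP_ALL = "[Match]\nName=*\n\n[Network]\nDHCP=yes\n"


def generate(config_path, out_dir, filename="99-dhcp-all.network"):
    """Write the DHCP fallback when config_path is absent.

    Returns the path written, or None when the config file exists.
    """
    if Path(config_path).exists():
        # Translating the booter-provided file is out of scope for this sketch.
        return None
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    target = out / filename
    target.write_text(DHCP_ALL)
    return target
```

The output directory is a parameter so that the same function can be pointed at whichever directory the image ends up using for network units.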

ipv6 does not work

As of debian-live-20191218T161612Z, the system seems to prefer IPv6 over IPv4. For whatever reason, my router is giving out IPv6 addresses but IPv6 connections never succeed from the Debian image. I've not noticed this on other SmartOS, macOS, or Linux instances running on the same network.

Workaround 1:

root@debian-live-20191218T161612Z:~# sysctl -w net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.all.disable_ipv6 = 1
root@debian-live-20191218T161612Z:~# sysctl -w net.ipv6.conf.default.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6 = 1
root@debian-live-20191218T161612Z:~#

Workaround 2:

https://askubuntu.com/questions/32298/prefer-a-ipv4-dns-lookups-before-aaaaipv6-lookups#answer-38468

# For sites which prefer IPv4 connections change the last line to
#precedence ::ffff:0:0/96 100

need ur-agent

In order to bootstrap the CN, ur-agent needs to be in the image.

linuxcn setup failures in sdcadm, joysetup.sh, and agentsetup.sh

Image
20230609T214426Z

Describe the bug

  • https://docs.tritondatacenter.com/private-cloud/install/linux-compute-node-installation
    Triton documentation references a URL with an unsupported naming convention.
    This causes the resulting .tgz file to be named incorrectly as well, which sdcadm
    then chokes on during the sdcadm platform install ./platform-latest.tgz
    step.
  • Triton documentation: the -o argument has no output filename, which causes the command to fail:
    curl -o https://us-east.manta.joyent.com/Joyent_Dev/public/TritonDCLinux/joyent-debian_live-latest.usb.gz
    Additionally, the Triton documentation instructs you to download the .usb.gz image rather
    than the platform image, which sdcadm does not like at all.
  • https://github.com/TritonDataCenter/linux-live/blob/master/docs/6-quick-start.md
    The GitHub quickstart documentation references a corrupted, 133-byte image
    with a build timestamp of 20210731T223008Z.
  • Numerous failures setting up a Linux compute node during the setup phase in Triton.

The first real issue that I ran into is probably a combination of bugs in
sdcadm platform install and the aforementioned documentation.
Both sets of instructions reference curl commands
with URLs that do not produce the file naming convention expected by
sdcadm platform install.

Triton Documented curl command:

# This command is just outright bad. (Even the file name to download is
# incorrect.) Corrected on the following line:
# curl -o https://us-east.manta.joyent.com/Joyent_Dev/public/TritonDCLinux/joyent-debian_live-latest.usb.gz
curl -o joyent-debian_live-latest.tgz https://us-east.manta.joyent.com/Joyent_Dev/public/TritonDCLinux/joyent-debian_live-latest.usb.gz
sdcadm platform install ./joyent-debian_live-latest.tgz

# Failure
Using platform file joyent-linux_live-latest.tgz
Installing Platform Image onto USB key
Invalid platform version format: PLATFORM-LATEST
Please ensure this is a valid SmartOS platform image.
install-platform.sh script failed for platform joyent-debian_live-latest.tgz.
Please, check /tmp/install_platform.log for additional information.
Error: Platform setup failed
In order not to have to re-download image, joyent-debian_live-latest.tgz has been left behind.
After correcting above problem, rerun `sdcadm platform install joyent-debian_live--latest.tgz`.
sdcadm platform install: error: Platform setup failed

Github Quickstart curl command:

curl -OC - https://us-central.manta.mnx.io/Joyent_Dev/public/TritonDCLinux/20210731T223008Z/platform-20210731T223008Z.tgz

# This produces a 133 byte corrupted tar archive file that does not work. 

Steps I took to correct:
To obtain and assign a linuxcn image, I traversed the Manta directory and found
the latest platform image timestamp from mtime, which in this case is
20230609T214426Z. sdcadm initially complained that the name was not
correct for Triton. While traversing the Manta directories I also noticed that
the "latest" links in the Manta hierarchy all have mtime stamps
of 20230404, about two months older than the actual latest published build.
I can only assume the majority of these issues are related to the CI/CD process,
a build automation regression test failure, or a lack of community
contribution to the documentation.

I got the Manta objects in Joyent_Dev/public/TritonDCLinux with the following

[root@headnode (dfw002) /var/tmp/linuxcn]# IFS=""; while read -r a;do echo $a | json -Ha name type mtime;done<<<$(curl https://us-central.manta.mnx.io/Joyent_Dev/public/TritonDCLinux)

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1544    0  1544    0     0   2382      0 --:--:-- --:--:-- --:--:--  2848

20210806T203445Z directory 2022-07-31T18:11:07.740Z
20210809T231231Z directory 2022-07-31T18:11:07.750Z
20210814T215810Z directory 2022-07-31T18:11:07.522Z
20210818T230702Z directory 2022-07-31T18:11:07.740Z
20230227T175258Z directory 2023-04-04T19:34:46.867Z
20230510T190821Z directory 2023-05-16T21:34:36.591Z
20230609T214426Z directory 2023-06-15T19:54:59.511Z
joyent-debian_live-latest.iso object 2023-04-04T20:30:22.703Z
joyent-debian_live-latest.usb.gz object 2023-04-04T20:19:25.470Z
latest object 2023-04-04T20:04:09.539Z
platform-latest.tgz object 2023-04-04T20:13:20.360Z
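
The same selection can be sketched in Python. This assumes, as the one-liner above implies, that the directory listing is returned as one JSON object per line with name, type, and mtime fields; the function name is made up for this sketch. It picks the newest timestamp-named directory by name rather than trusting the "latest" objects.

```python
import json
import re

# Build directories are named like 20230609T214426Z.
STAMP = re.compile(r"^\d{8}T\d{6}Z$")


def latest_build(listing_text):
    """Return the newest timestamp-named directory in the listing, or None."""
    stamps = []
    for line in listing_text.splitlines():
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        name = entry.get("name", "")
        if entry.get("type") == "directory" and STAMP.match(name):
            stamps.append(name)
    # Fixed-width timestamps of this form sort correctly as plain strings.
    return max(stamps) if stamps else None
```

Fed the listing shown above, this would select 20230609T214426Z, the build used in the rest of this report.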

I manually updated the curl command referencing the latest build timestamp and
the correct output filename and was then able to install the image:

# Documentation updates
mkdir /var/tmp/linuxcn
cd /var/tmp/linuxcn
curl -o platform-20230609T214426Z.tgz https://us-central.manta.mnx.io/Joyent_Dev/public/TritonDCLinux/20230609T214426Z/platform-20230609T214426Z.tgz
sdcadm platform install ./platform-20230609T214426Z.tgz
# Success!

The remainder of the github documentation (Which is the documentation I chose
to follow) was correct.

Additional PI installation notes:

  1. One of three things should change: the latest Manta links should be updated
    so that the downloaded file follows a supported naming convention that
    includes the build timestamp (i.e. platform-TIMESTAMP.tgz), the URL in the
    docs should reference the actual timestamp, or sdcadm should be able to
    pull the build timestamp from somewhere within the tarball so that it does
    not choke on a platform image that simply isn't named properly on the local
    filesystem.
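
A quick pre-flight check for the file name can be sketched as follows. The pattern is inferred from the platform-TIMESTAMP.tgz convention and the timestamps seen in this report; it is an assumption about what sdcadm accepts, not a copy of sdcadm's own validation, and the function name is made up here.

```python
import os
import re

# Assumed convention: platform-<YYYYMMDD>T<HHMMSS>Z.tgz
PLATFORM_NAME = re.compile(r"^platform-(\d{8}T\d{6}Z)\.tgz$")


def platform_stamp(filename):
    """Return the build timestamp embedded in the file name, or None."""
    match = PLATFORM_NAME.match(os.path.basename(filename))
    return match.group(1) if match else None
```

A name such as platform-latest.tgz yields None here, which matches the kind of file that the install step rejected above.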

The remainder of this report focuses on the compute node installation

  1. The installer fails to properly set up the compute node out of the box.
    joysetup.sh fails to set up the /var ZFS dataset.

To resolve, I manually intervened and performed the following steps to recover:

Note: I assumed that /var should have mountpoint type legacy. If this is incorrect, I will update this report.

SVCS="var-lib-lxd-devlxd.mount triton-lxd lxcfs"
for i in ${SVCS};do systemctl stop $i;done
mv /var /var.orig
mkdir /var
zfs set mountpoint=legacy zones/var
mount -t zfs zones/var /var
rsync -arv --progress --partial /var.orig/ /var
for i in ${SVCS}; do systemctl start $i;done
rm -rf /var.orig

Upon running setup again...joysetup.sh completed successfully.

  1. After failure 1 was resolved, agentsetup.sh failed, complaining about the lack of /opt/triton/config and /opt/triton/config/triton-setup-state.json.

I resolved the agentsetup.sh issues manually as well with the following:

# verified that /opt was an existing, mounted zfs dataset with "zfs mount -l | grep opt"
mkdir -p /opt/triton/{bin,config}
mkdir -p /opt/smartdc/config
touch /opt/triton/config/triton-setup-state.json
export ASSETS_URL=<IP From the headnode assets zone (vmadm get $(vmadm lookup alias=~assets) | json nics.0.ip)> 
/var/tmp/agentsetup.sh

After all of the above I ran setup a final time and the linuxcn rebooted and came up as expected.
