coreos / mantle Goto Github PK

Mantle: Gluing Container Linux together

License: Apache License 2.0

Makefile 0.01% Go 99.82% Shell 0.17%

mantle's Introduction

🚨 Mantle for Fedora CoreOS and RHEL CoreOS has been merged into coreos-assembler. 🚨
This cl branch is for CoreOS Container Linux.

Mantle: Gluing Container Linux together

This repository is a collection of utilities for developing Container Linux. Most of the tools are for uploading, running, and interacting with Container Linux instances running locally or in a cloud.

Overview

Mantle is composed of many utilities:

cork for handling the Container Linux SDK
gangue for downloading from Google Storage
kola for launching instances and running tests
kolet an agent for kola that runs on instances
ore for interfacing with cloud providers
plume for releasing Container Linux

All of the utilities support the help command to get a full listing of their subcommands and options.

Tools

cork

Cork is a tool that helps with working with Container Linux images and the SDK.

cork create

Download and unpack the Container Linux SDK.

cork create

cork enter

Enter the SDK chroot, and optionally run a command. The command and its arguments can be given after --.

cork enter -- repo sync

cork download-image

Download a Container Linux image into $PWD/.cache/images.

cork download-image --platform=qemu

Building Container Linux with cork

See Modifying Container Linux for an example of using cork to build a Container Linux image.

gangue

Gangue is a tool for downloading and verifying files from Google Storage with authenticated requests. It is primarily used by the SDK.

gangue get

Get a file from Google Storage and verify it using GPG.

kola

Kola is a framework for testing software integration in Container Linux instances across multiple platforms. It is primarily designed to operate within the Container Linux SDK for testing software that has landed in the OS image. Ideally, all software needed for a test should be included by building it into the image from the SDK.

Kola supports running tests on multiple platforms: currently QEMU, GCE, AWS, VMware vSphere, Packet, and OpenStack. systemd-nspawn and other platforms may be added in the future. As a design principle of kola, tests on local platforms do not rely on Internet access, which minimizes external dependencies; any network services they require are built directly into kola itself. Machines on cloud platforms do not have direct access to kola, so tests there may depend on Internet services such as discovery.etcd.io or quay.io instead.

Kola outputs assorted logs and test data to _kola_temp for later inspection.

Kola is still under heavy development and it is expected that its interface will continue to change.

By default, kola uses the qemu platform with the most recently built image (assuming it is run from within the SDK).

kola run

The run command invokes the main kola test harness. It runs any tests whose registered names match a glob pattern.

kola run <glob pattern>

--blacklist-test can be used if one or more tests in the pattern should be skipped. This switch may be provided once:

kola --blacklist-test linux.nfs.v3 run

multiple times:

kola --blacklist-test linux.nfs.v3 --blacklist-test linux.nfs.v4 run

and can also be used with glob patterns:

kola --blacklist-test 'linux.nfs*' --blacklist-test 'crio.*' run
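The selection rules above can be illustrated with a small, self-contained sketch. It assumes glob semantics equivalent to Go's path.Match; the helpers filterTests and matchesAny, and every test name other than linux.nfs.v3 and linux.nfs.v4, are hypothetical. This is not kola's actual implementation.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// matchesAny reports whether name matches at least one of the glob patterns.
func matchesAny(name string, patterns []string) (bool, error) {
	for _, p := range patterns {
		ok, err := path.Match(p, name)
		if err != nil {
			return false, err
		}
		if ok {
			return true, nil
		}
	}
	return false, nil
}

// filterTests (hypothetical) keeps the registered names that match pattern,
// then drops any name that matches a blacklist pattern.
func filterTests(registered []string, pattern string, blacklist []string) ([]string, error) {
	var selected []string
	for _, name := range registered {
		ok, err := path.Match(pattern, name)
		if err != nil {
			return nil, err
		}
		if !ok {
			continue
		}
		skip, err := matchesAny(name, blacklist)
		if err != nil {
			return nil, err
		}
		if !skip {
			selected = append(selected, name)
		}
	}
	sort.Strings(selected)
	return selected, nil
}

func main() {
	tests := []string{"linux.nfs.v3", "linux.nfs.v4", "crio.base", "docker.network"}
	got, err := filterTests(tests, "*", []string{"linux.nfs*", "crio.*"})
	if err != nil {
		panic(err)
	}
	fmt.Println(got) // [docker.network]
}
```

Quoting the patterns on the command line keeps the shell from expanding them against files in the current directory before kola sees them.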

kola list

The list command lists all of the available tests.

kola spawn

The spawn command launches Container Linux instances.

kola mkimage

The mkimage command creates a copy of the input image with its primary console set to the serial port (/dev/ttyS0). This causes more output to be logged on the console, which is also logged in _kola_temp. This can only be used with QEMU images and must be used with the coreos_*_image.bin image, not the coreos_*_qemu_image.img.

kola bootchart

The bootchart command launches an instance then generates an svg of the boot process using systemd-analyze.

kola updatepayload

The updatepayload command launches a Container Linux instance then updates it by sending an update to its update_engine. The update is the coreos_*_update.gz in the latest build directory.

kola subtest parallelization

Subtests can be parallelized by adding c.H.Parallel() at the top of the inline function given to c.Run. Avoid using the FailFast flag in tests that rely on this, as it can have unintended results.

kola test namespacing

The top-level namespace of tests should fit into one of the following categories:

Groups of tests targeting specific packages/binaries may use that namespace (ex: docker.*)
Tests that target multiple supported distributions may use the coreos namespace.
Tests that target singular distributions may use the distribution's namespace.

kola test registration

Registering kola tests currently requires that the tests are registered under the kola package and that the test function itself lives within the mantle codebase.

Groups of similar tests are registered in an init() function inside the kola package. Register(*Test) is called per test. A kola Test struct requires a unique name, and a single function that is the entry point into the test. Additionally, userdata (such as a Container Linux Config) can be supplied. See the Test struct in kola/register/register.go for a complete list of options.
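As a rough illustration of the init()/Register pattern, the sketch below uses hypothetical stand-ins (Test, TestCluster, registry) rather than the real types in kola/register/register.go. UserData is modelled as a plain string holding a Container Linux Config, which the real struct may not do.

```go
package main

import (
	"fmt"
	"sort"
)

// TestCluster is a placeholder for platform.TestCluster (hypothetical).
type TestCluster interface{}

// Test is a hypothetical, trimmed-down stand-in for the kola Test struct:
// a unique Name, a Run entry point and optional UserData.
type Test struct {
	Name     string
	Run      func(TestCluster)
	UserData string // a Container Linux Config, kept as a string for simplicity
}

// registry holds every registered test, keyed by its unique name.
var registry = map[string]*Test{}

// Register adds t to the registry and panics on a duplicate name.
func Register(t *Test) {
	if _, dup := registry[t.Name]; dup {
		panic(fmt.Sprintf("kola test %q registered twice", t.Name))
	}
	registry[t.Name] = t
}

// init registers a group of related tests, one Register call per test.
func init() {
	Register(&Test{Name: "example.noop", Run: func(TestCluster) {}})
	Register(&Test{
		Name: "example.file",
		Run:  func(TestCluster) {},
		UserData: `storage:
  files:
    - path: /etc/example
      filesystem: root
      contents:
        inline: hello`,
	})
}

func main() {
	names := make([]string, 0, len(registry))
	for name := range registry {
		names = append(names, name)
	}
	sort.Strings(names)
	fmt.Println(names) // [example.file example.noop]
}
```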

kola test writing

A kola test is a Go function that is passed a platform.TestCluster to run code against. It must have the signature func(platform.TestCluster), and it must be registered and built into the kola binary.

A TestCluster implements the platform.Cluster interface and will give you access to a running cluster of Container Linux machines. A test writer can interact with these machines through this interface.
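A sketch of a test body with the func(TestCluster) shape follows. TestCluster and Machine here are hypothetical, trimmed-down interfaces (Machines, SSH, Fatalf) chosen for illustration, not the real platform API; a fake cluster lets the example run without any machines.

```go
package main

import (
	"fmt"
	"strings"
)

// Machine and TestCluster are hypothetical stand-ins for the platform types.
type Machine interface {
	ID() string
}

type TestCluster interface {
	Machines() []Machine
	SSH(m Machine, cmd string) ([]byte, error)
	Fatalf(format string, args ...interface{})
}

// osRelease is an example test: every machine must report ID=coreos.
func osRelease(c TestCluster) {
	for _, m := range c.Machines() {
		out, err := c.SSH(m, "grep ^ID= /etc/os-release")
		if err != nil {
			c.Fatalf("%s: ssh failed: %v", m.ID(), err)
			return // the fake Fatalf below does not stop execution
		}
		if strings.TrimSpace(string(out)) != "ID=coreos" {
			c.Fatalf("%s: unexpected os-release line %q", m.ID(), out)
			return
		}
	}
}

// fakeMachine and fakeCluster let the test run locally.
type fakeMachine string

func (f fakeMachine) ID() string { return string(f) }

type fakeCluster struct {
	output   string   // canned SSH output returned for every command
	failures []string // messages recorded by Fatalf
}

func (f *fakeCluster) Machines() []Machine {
	return []Machine{fakeMachine("m0"), fakeMachine("m1")}
}

func (f *fakeCluster) SSH(m Machine, cmd string) ([]byte, error) {
	return []byte(f.output + "\n"), nil
}

func (f *fakeCluster) Fatalf(format string, args ...interface{}) {
	f.failures = append(f.failures, fmt.Sprintf(format, args...))
}

func main() {
	c := &fakeCluster{output: "ID=coreos"}
	osRelease(c)
	fmt.Println("failures:", len(c.failures)) // failures: 0
}
```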

To see test examples look under kola/tests in the mantle codebase.

For a quickstart see kola/README.md.

kola native code

For some tests, the Cluster interface is limited and it is desirable to run native Go code directly on one of the Container Linux machines. This is currently possible by using the NativeFuncs field of a kola Test struct, which acts like a limited RPC interface.

NativeFuncs is used similarly to the Run field of a registered kola test. It registers and names functions in nearby packages. These functions, unlike the Run entry point, must be manually invoked inside a kola test using a TestCluster's RunNative method. The function itself is then run natively on the specified running Container Linux instances.
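The NativeFuncs idea amounts to a name-to-function table plus a dispatcher on the machine. The sketch below assumes a map[string]func() error; nativeFuncs, runNative and the function names are hypothetical and only approximate what kolet does on the instance.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// nativeFuncs is a hypothetical stand-in for the NativeFuncs field: each
// name maps to a function meant to execute directly on the machine.
var nativeFuncs = map[string]func() error{
	"CheckHostname": checkHostname,
	"AlwaysFails":   func() error { return errors.New("deliberate failure") },
}

// checkHostname is an example native check.
func checkHostname() error {
	h, err := os.Hostname()
	if err != nil {
		return err
	}
	if h == "" {
		return errors.New("hostname is empty")
	}
	return nil
}

// runNative looks up a function by name and runs it, the way an on-machine
// agent could serve a RunNative request.
func runNative(name string) error {
	fn, ok := nativeFuncs[name]
	if !ok {
		return fmt.Errorf("unknown native function %q", name)
	}
	return fn()
}

func main() {
	for _, name := range []string{"CheckHostname", "AlwaysFails", "Missing"} {
		fmt.Printf("%s: %v\n", name, runNative(name))
	}
}
```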

For more examples, look at the coretest suite of tests under kola. These tests were ported into kola and make heavy use of the native code interface.

Manhole

The platform.Manhole() function creates an interactive SSH session which can be used to inspect a machine during a test.

kolet

kolet is run on kola instances to run native functions in tests. Generally kolet is not invoked manually.

ore

Ore provides a low-level interface for each cloud provider. It has commands related to launching instances on a variety of platforms (gcloud, aws, azure, esx, and packet) within the latest SDK image. Ore closely mimics the underlying API of each cloud provider, so the interface differs from provider to provider. See each provider's help command for the available actions.

Note: when uploading to some cloud providers (e.g. gce), the image may need to be packaged with a different --format (e.g. --format=gce) when running image_to_vm.sh.

plume

Plume is the Container Linux release utility. Releases are done in two stages, each with their own command: pre-release and release. Both of these commands are idempotent.

plume pre-release

The pre-release command does as much of the release process as possible without making anything public. This includes uploading images to cloud providers (except those like gce which don't allow us to upload images without making them public).

plume release

Publish a new Container Linux release. This makes the images uploaded by pre-release public and uploads images that pre-release could not. It copies the release artifacts to public storage buckets and updates the directory index.

plume index

Generate and upload index.html objects to turn a Google Cloud Storage bucket into a publicly browsable file tree. Useful if you want something like Apache's directory index for your software download repository. Plume release handles this as well, so it does not need to be run as part of the release process.
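Object stores such as Google Cloud Storage have flat object names, so producing a browsable tree means grouping names by prefix and writing one index page per level. A minimal sketch of that grouping follows, using a hypothetical listDir helper and html/template; uploading the resulting index.html objects is left out, and this is not plume's actual code.

```go
package main

import (
	"html/template"
	"os"
	"sort"
	"strings"
)

// listing describes one directory level of a bucket (hypothetical type).
type listing struct {
	Prefix string
	Dirs   []string // immediate subdirectories, each with a trailing slash
	Files  []string // objects directly under Prefix
}

// listDir groups flat object names into the children of a single prefix.
func listDir(objects []string, prefix string) listing {
	l := listing{Prefix: prefix}
	seen := map[string]bool{}
	for _, obj := range objects {
		if !strings.HasPrefix(obj, prefix) {
			continue
		}
		rest := strings.TrimPrefix(obj, prefix)
		if rest == "" {
			continue
		}
		if i := strings.Index(rest, "/"); i >= 0 {
			dir := rest[:i+1]
			if !seen[dir] {
				seen[dir] = true
				l.Dirs = append(l.Dirs, dir)
			}
		} else {
			l.Files = append(l.Files, rest)
		}
	}
	sort.Strings(l.Dirs)
	sort.Strings(l.Files)
	return l
}

// indexTmpl renders a minimal page; html/template escapes the names.
var indexTmpl = template.Must(template.New("index").Parse(`<html><body>
<h1>Index of /{{.Prefix}}</h1>
<ul>
{{range .Dirs}}<li><a href="{{.}}index.html">{{.}}</a></li>
{{end}}{{range .Files}}<li><a href="{{.}}">{{.}}</a></li>
{{end}}</ul>
</body></html>
`))

func main() {
	objects := []string{
		"stable/1234.5.0/version.txt",
		"stable/1234.5.0/coreos_production_image.bin.bz2",
		"stable/current/version.txt",
		"stable/README",
	}
	if err := indexTmpl.Execute(os.Stdout, listDir(objects, "stable/")); err != nil {
		panic(err)
	}
}
```

Running listDir once per prefix, starting from the empty prefix, yields every page that would need an index.html object.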

Platform Credentials

Each platform reads the credentials it uses from different files. The aws, azure, do, esx and packet platforms support selecting from multiple configured credentials, called "profiles". The examples below are for the "default" profile, but other profiles can be specified in the credentials files and selected via the --<platform-name>-profile flag:

kola spawn -p aws --aws-profile other_profile

aws

aws reads the ~/.aws/credentials file used by Amazon's aws command-line tool. It can be created using the aws command:

$ aws configure

To configure a different profile, use the --profile flag:

$ aws configure --profile other_profile

The ~/.aws/credentials file can also be populated manually:

[default]
aws_access_key_id = ACCESS_KEY_ID_HERE
aws_secret_access_key = SECRET_ACCESS_KEY_HERE

To install the aws command in the SDK, run:

sudo emerge --ask awscli

azure

azure uses ~/.azure/azureProfile.json. This can be created using the az command:

$ az login

It also requires that the environment variable AZURE_AUTH_LOCATION point to a JSON file (this can also be set via the --azure-auth parameter). The JSON file requires an Azure Active Directory service principal to be created.

Service principals can be created via the az command (the output will contain an appId field, which is used as the clientId value in the AZURE_AUTH_LOCATION JSON):

az ad sp create-for-rbac

The client secret can be created inside the Azure portal by opening the service principal's application under the Azure Active Directory service, on the App registrations tab.

You can find your subscriptionId & tenantId in the ~/.azure/azureProfile.json via:

cat ~/.azure/azureProfile.json | jq '{subscriptionId: .subscriptions[].id, tenantId: .subscriptions[].tenantId}'

The JSON file exported to the variable AZURE_AUTH_LOCATION should be generated by hand and have the following contents:

{
  "clientId": "<service provider id>", 
  "clientSecret": "<service provider secret>", 
  "subscriptionId": "<subscription id>", 
  "tenantId": "<tenant id>", 
  "activeDirectoryEndpointUrl": "https://login.microsoftonline.com", 
  "resourceManagerEndpointUrl": "https://management.azure.com/", 
  "activeDirectoryGraphResourceId": "https://graph.windows.net/", 
  "sqlManagementEndpointUrl": "https://management.core.windows.net:8443/", 
  "galleryEndpointUrl": "https://gallery.azure.com/", 
  "managementEndpointUrl": "https://management.core.windows.net/"
}
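A sketch of reading and validating that file in Go follows. It assumes an explicit path (such as the value of --azure-auth) takes precedence over AZURE_AUTH_LOCATION; azureAuth, parseAzureAuth and loadAzureAuth are hypothetical names, and only the four credential fields are checked.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"strings"
)

// azureAuth holds the credential fields of the AZURE_AUTH_LOCATION file.
// The endpoint URL fields are ignored; json.Unmarshal skips unknown keys.
type azureAuth struct {
	ClientID       string `json:"clientId"`
	ClientSecret   string `json:"clientSecret"`
	SubscriptionID string `json:"subscriptionId"`
	TenantID       string `json:"tenantId"`
}

// parseAzureAuth decodes the file and reports any empty required field.
func parseAzureAuth(data []byte) (azureAuth, error) {
	var a azureAuth
	if err := json.Unmarshal(data, &a); err != nil {
		return a, fmt.Errorf("decoding auth file: %w", err)
	}
	var missing []string
	for _, f := range []struct{ key, val string }{
		{"clientId", a.ClientID},
		{"clientSecret", a.ClientSecret},
		{"subscriptionId", a.SubscriptionID},
		{"tenantId", a.TenantID},
	} {
		if f.val == "" {
			missing = append(missing, f.key)
		}
	}
	if len(missing) > 0 {
		return a, fmt.Errorf("auth file is missing: %s", strings.Join(missing, ", "))
	}
	return a, nil
}

// loadAzureAuth prefers an explicit path and falls back to the
// AZURE_AUTH_LOCATION environment variable (assumed precedence).
func loadAzureAuth(explicitPath string) (azureAuth, error) {
	path := explicitPath
	if path == "" {
		path = os.Getenv("AZURE_AUTH_LOCATION")
	}
	if path == "" {
		return azureAuth{}, errors.New("AZURE_AUTH_LOCATION is not set")
	}
	data, err := os.ReadFile(path)
	if err != nil {
		return azureAuth{}, err
	}
	return parseAzureAuth(data)
}

func main() {
	_, err := parseAzureAuth([]byte(`{"clientId": "id", "tenantId": "t"}`))
	fmt.Println(err) // auth file is missing: clientSecret, subscriptionId
}
```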

do

do uses ~/.config/digitalocean.json. This can be configured manually:

{
    "default": {
        "token": "token goes here"
    }
}

esx

esx uses ~/.config/esx.json. This can be configured manually:

{
    "default": {
        "server": "server.address.goes.here",
        "user": "user.goes.here",
        "password": "password.goes.here"
    }
}
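The do, esx, openstack and packet files share one shape: a JSON object keyed by profile name. A minimal sketch of loading esx.json and selecting a profile follows; ESXProfile, selectProfile and loadProfile are hypothetical names, not mantle's API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// ESXProfile matches the fields in the esx.json example above.
type ESXProfile struct {
	Server   string `json:"server"`
	User     string `json:"user"`
	Password string `json:"password"`
}

// selectProfile decodes a document keyed by profile name and returns one profile.
func selectProfile(data []byte, name string) (ESXProfile, error) {
	var profiles map[string]ESXProfile
	if err := json.Unmarshal(data, &profiles); err != nil {
		return ESXProfile{}, fmt.Errorf("parsing credentials: %w", err)
	}
	p, ok := profiles[name]
	if !ok {
		return ESXProfile{}, fmt.Errorf("profile %q not found", name)
	}
	return p, nil
}

// loadProfile reads ~/.config/esx.json and selects the named profile.
func loadProfile(name string) (ESXProfile, error) {
	home, err := os.UserHomeDir()
	if err != nil {
		return ESXProfile{}, err
	}
	data, err := os.ReadFile(filepath.Join(home, ".config", "esx.json"))
	if err != nil {
		return ESXProfile{}, err
	}
	return selectProfile(data, name)
}

func main() {
	doc := []byte(`{
		"default":       {"server": "esx.example.com",  "user": "root",  "password": "secret"},
		"other_profile": {"server": "esx2.example.com", "user": "admin", "password": "hunter2"}
	}`)
	p, err := selectProfile(doc, "other_profile")
	fmt.Println(p.Server, err) // esx2.example.com <nil>
}
```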

gce

gce uses the ~/.boto file. When the gce platform is first used, it will print a link that can be used to log into your account with gce and get a verification code you can paste in. This will populate the .boto file.

See Google Cloud Platform's Documentation for more information about the .boto file.

openstack

openstack uses ~/.config/openstack.json. This can be configured manually:

{
    "default": {
        "auth_url": "auth url here",
        "tenant_id": "tenant id here",
        "tenant_name": "tenant name here",
        "username": "username here",
        "password": "password here",
        "user_domain": "domain id here",
        "floating_ip_pool": "floating ip pool here",
        "region_name": "region here"
    }
}

user_domain is required on some newer versions of OpenStack using Keystone V3 but is optional on older versions. floating_ip_pool and region_name can be optionally specified here to be used as a default if not specified on the command line.

packet

packet uses ~/.config/packet.json. This can be configured manually:

{
	"default": {
		"api_key": "your api key here",
		"project": "project id here"
	}
}

qemu

qemu is run locally and needs no credentials, but does need to be run as root.

qemu-unpriv

qemu-unpriv is run locally and needs no credentials. It has a restricted set of functionality compared to the qemu platform, such as:

Single node only, no machine to machine networking
DHCP provides no data (forces several tests to be disabled)
No Local cluster


mantle's Issues

kola: etcdctl

Need to test etcdctl since we removed its use as part of the basic etcd discovery tests. This depends on etcdctl returning to using non-zero exit codes to facilitate scripting with it.

plume: publish AMIs

Replace the functionality in the prod-publish.sh/publish_ami.sh scripts.

rfc: test naming scheme

our test names are super disorganized, and have no consistency. having a proper naming scheme that is globbable by category would be nice.

something like:

base/adduser
fleet/submitunit
etcd/discovery
etcd/atomicswap
docker/push
docker/pull
systemd/journald/remote
systemd/nspawn
net/nfs/v3
net/nfs/v4
ext/deis
ext/kubernetes

then i can tell kola run "docker/*", for example.

consider moving to libretto

might simplify our code quite a bit while gaining new platform support.

kola: systemd.journal.remote is broken with systemd v229

the remote journal stuff now generates a journal file with no port, e.g. /var/log/journal/remote/remote-10.0.0.2.journal instead of /var/log/journal/remote/remote-10.0.0.2:19531.journal. the test needs to be fixed to handle this for v229.

mantle: clean-up manual instance spawning

ore and kola now both share the ability to manually spawn a VM. plume shares code with ore and the copied code is out of sync. The kola spawn command probably belongs in ore and we should try and factor common code out among all three binaries and clean-up the user-interfaces. See: #160

kola: vmware support

we should investigate using kola to run tests on vmware.

https://github.com/vmware/govmomi looks like it might be useful.

Limit Kola tests to applicable architectures

The kola test docker.oldclient will only work for amd64 hosts. As we develop more support for arm devices, we'll need some mechanism to tag certain tests as arm or x86 only.

kola: cannot reboot machines

if we reboot a machine during a test, it breaks our ssh client. this limits our ability to test CoreOS beyond the first boot.

Split --gce-project

The flag --gce-project is used as both the gce-image-project and the gce-project. So if you want to use an image from a different project than the one on which machines are spawned, it can't be done. Gcloud differentiates these projects and we should too.

kola: check coreos semver

kola tests should be able to specify which CoreOS versions they can execute on, and if the remote machine does not have the appropriate version, the tests should be skipped.

i'm not sure what this looks like yet, since neither the actual test functions nor the platform receive information (*kola.Test) about the currently running test.
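One possible shape for the version gate, sketched under the assumption that machines report a plain MAJOR.MINOR.PATCH version such as 1122.2.0 (the version numbers below are made up): parse both versions, compare them numerically, and skip when the machine is older than the test's minimum. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version is a parsed MAJOR.MINOR.PATCH triple.
type version [3]int

// parseVersion accepts exactly three dot-separated non-negative integers.
func parseVersion(s string) (version, error) {
	var v version
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("invalid version %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return v, fmt.Errorf("invalid version %q", s)
		}
		v[i] = n
	}
	return v, nil
}

// less reports whether v sorts before w, comparing components in order.
func (v version) less(w version) bool {
	for i := range v {
		if v[i] != w[i] {
			return v[i] < w[i]
		}
	}
	return false
}

// shouldSkip reports whether a test needing at least minVersion must be
// skipped on a machine that reports osVersion.
func shouldSkip(osVersion, minVersion string) (bool, error) {
	have, err := parseVersion(osVersion)
	if err != nil {
		return false, err
	}
	want, err := parseVersion(minVersion)
	if err != nil {
		return false, err
	}
	return have.less(want), nil
}

func main() {
	skip, err := shouldSkip("1122.2.0", "1153.0.0")
	fmt.Println(skip, err) // true <nil>
}
```

Comparing components as integers matters: a string comparison would order 1153.0.10 before 1153.0.9.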

plume: build aws images

Plume should build AWS images free of java and python dependencies. This will unblock doing automated tests of AWS images.

update Google Cloud API client import paths and more

The Google Cloud API client libraries for Go are making some breaking changes:

The import paths are changing from google.golang.org/cloud/... to
cloud.google.com/go/.... For example, if your code imports the BigQuery client
it currently reads
import "google.golang.org/cloud/bigquery"
It should be changed to
import "cloud.google.com/go/bigquery"
Client options are also moving, from google.golang.org/cloud to
google.golang.org/api/option. Two have also been renamed:
- WithBaseGRPC is now WithGRPCConn
- WithBaseHTTP is now WithHTTPClient
The cloud.WithContext and cloud.NewContext methods are gone, as are the
deprecated pubsub and container functions that required them. Use the Client
methods of these packages instead.

You should make these changes before September 12, 2016, when the packages at
google.golang.org/cloud will go away.

coreos.filesystem.writabledirs is flaky

find: `/etc/gshadow.lock': No such file or directory
find: `/etc/shadow.lock': No such file or directory
find: `/etc/passwd.lock': No such file or directory
find: `/etc/group.lock': No such file or directory
...
2016-08-23T03:27:09Z kola: --- FAIL: coreos.filesystem.writabledirs on gce (34.599s)
2016-08-23T03:27:09Z kola:         Failed to run find: output [], status: Process exited with: 1. Reason was:  ()

it looks like there's a race of find reading the direntries of /etc. possibly racing against the gce agent adding users.

what's a sane way to fix this race? stop google-accounts-manager.service?

kola: run external Docker unit/functional tests

Docker has its own set of unit/functional tests. Let's investigate making these run on kola without having to import and manually update this test code. Ideally, this test runs the set of Docker tests that most closely match the version of Docker in the CoreOS image being run.

kola: test basic docker networking

Make sure outbound traffic from containers works properly.

kola: selinux tests

right now there are some outstanding quirks with selinux on coreos. kola should have tests that set up selinux in enforcing mode and do some basic sanity checks.

cork: download-image: rename qemu->raw, support downloading qemu qcow images

currently, when one specifies qemu as a platform to cork download-image, it downloads coreos_production_image.bin.bz2 which is a raw disk file. instead, qemu should download coreos_production_qemu_image.img.bz2 which is a qemu qcow image, and raw should download coreos_production_image.bin.bz2.

kola/jenkins: authenticate with GS to allow fetching from private buckets

The imageroot parameter in kola and kola-gce jobs should work with private buckets like builds.release.core-os.net so they can be used to test releases.

qemu.image and other flags broken

since c9c2611 flags defined in https://github.com/coreos/mantle/blob/master/kola/flags.go no longer work because flag.Parse is no longer called.

bump k8s test to v1.3.5_coreos.1

kola: emit test results in a format jenkins can parse

using JUnit with https://github.com/jstemmer/go-junit-report seems like a good choice, since jenkins supports JUnit.
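As an alternative to post-processing output with go-junit-report, kola could write JUnit XML directly. The sketch below uses encoding/xml with a hypothetical result type and the common testsuite/testcase/failure elements; it is not an existing kola feature.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"time"
)

// result is a hypothetical record of one finished kola test.
type result struct {
	Name     string
	Duration time.Duration
	Failure  string // empty when the test passed
}

type junitFailure struct {
	Message string `xml:"message,attr"`
}

type junitCase struct {
	Name    string        `xml:"name,attr"`
	Time    string        `xml:"time,attr"` // seconds, as JUnit consumers expect
	Failure *junitFailure `xml:"failure,omitempty"`
}

type junitSuite struct {
	XMLName  xml.Name    `xml:"testsuite"`
	Name     string      `xml:"name,attr"`
	Tests    int         `xml:"tests,attr"`
	Failures int         `xml:"failures,attr"`
	Cases    []junitCase `xml:"testcase"`
}

// buildSuite converts results into a JUnit testsuite value.
func buildSuite(name string, results []result) junitSuite {
	s := junitSuite{Name: name, Tests: len(results)}
	for _, r := range results {
		c := junitCase{Name: r.Name, Time: fmt.Sprintf("%.3f", r.Duration.Seconds())}
		if r.Failure != "" {
			c.Failure = &junitFailure{Message: r.Failure}
			s.Failures++
		}
		s.Cases = append(s.Cases, c)
	}
	return s
}

// toJUnit renders the suite as an XML document with a header.
func toJUnit(name string, results []result) ([]byte, error) {
	out, err := xml.MarshalIndent(buildSuite(name, results), "", "  ")
	if err != nil {
		return nil, err
	}
	return append([]byte(xml.Header), out...), nil
}

func main() {
	out, err := toJUnit("kola", []result{
		{Name: "coreos.filesystem.writabledirs", Duration: 34599 * time.Millisecond, Failure: "Failed to run find"},
		{Name: "docker.network", Duration: 12 * time.Second},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```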

QEMU cluster does not stop ntp.Server

network/ntp.Server is brought up during qemu tests, but is never stopped.

/cc @marineam

kola: test that no systemd units fail

we should be testing that no systemd units have failed. this would hopefully catch e.g. coreos/bugs#917 or coreos/bugs#914 or coreos/bugs#447.

kola: manual test jenkins job against AWS

To have parity with the release tests that we are running today we need to run against a set of hosts on AWS. Steps:

Set up a "manual AWS jenkins job" that takes an existing AMI as a parameter, and write docs on how to do this
Set up a job that can look at AMI ids for a release on the release mirror and run the AWS tests
Set up a job that can look at AMI ids for a release on a private release URL and run the AWS tests

Update coreos.com/releases

This unnamed tool will grab the latest release info and publish it to coreos.com/releases.

kola: comprehensive flannel tests

we need to have tests for flannel. currently, we cannot test flannel on our qemu target because it has no internet connectivity and thus cannot pull the docker image.

however, we can write tests for gce and aws, and test flannel's gce and aws-vpc backends.

kola: test automatically against GCE

@pbx0 can you edit this and add in your tasks.

Publish Azure

This is going to require either a client on the Windows machine or reverse engineering their utilities.

kola: etcd1 tests fail

etcd1 tests fail because etcdctl now tries to reach a uri which 404s with etcd1. we either need to rewrite the tests to use e.g. curl for etcd1, or nuke the tests.

kola: test docker at scale

We need to create some tests focused on running large amounts of Docker containers in order to stress it. This is a good starting point: coreos/bugs#481 (comment).

kola: run coretest equivalent

kola: write cluster tests for etcd3 in rkt container

kola: separate testing of kola from testing of OS using kola

This is a summary of @marineam's suggestion on having stable kola releases:

Since it's undesirable to have our release builds break because kola happened to break, we want the release tests to use a known-good version of kola. Testing kola itself will continue to happen by triggering test runs from a PR in mantle, which will always use the latest commit to master.

Bumping the ebuild in the SDK can be the definitive process for cutting a release of kola. Doing this means you've tested that the latest kola commit works fine against the current SDK builds. This also supports developers working in the SDK so they can locally run and test their latest image builds using a stable kola commit.

To automatically propagate ebuild bumps (new kola releases) to be immediately used in our release tests, we will have to upload the latest kola builds alongside the latest OS image builds. The release tests can then just use that version of kola rather than compiling from master.

kola: test kubernetes using CoreOS docs

Currently, a kubernetes multi-node smoke test exists that uses a fairly direct translation of some upstream community docs. This test should have its cloud-configs replaced to use our own docs which include TLS and using the built-in CoreOS kubelet service file. This is important to test our built in kubelet binary (soon to be an aci).

kola: figure out why ssh fails

almost every test run in qemu we see:

2015-12-09T22:00:26Z kola: Cluster failed starting machines: ssh unreachable: dial tcp 10.0.0.2:22: getsockopt: connection refused

causing tests to fail.

GitHub bug wrangler

We need a new tool to help manage GitHub bugs, possible features include:

View/sort bugs across repos
View related bugs across repos (bug report and PR are often in different places)
Update bugs as a fix rolls through the release process so people can track when it hits alpha, beta, stable.
Help establish a pattern we can use to organize and prioritize bugs.

In short, we are terrible at tracking bugs right now. We need to fix that.

kola: test ignition

kola currently tests coreos-cloudinit pretty thoroughly, although indirectly.

kola should also test ignition, with a good base set of ignition configurations that meet common use cases.

kola: list tests by available platforms

UI bug essentially. If you do kola list and see all the tests and then try to run just that test on a platform for which it is not available, 0 tests will be listed. This is confusing, so just make it clear in kola list which platforms the tests are available on.

Update CoreOS docs

Similar to updating the releases page, this will update the docs in the rolling fashion discussed in person.

kola: use text/template for userdata templates in test specifications

kola: test etcd dns-discovery

The etcd team has found regressions related to dns-discovery before, so it makes sense to add it to kola. https://github.com/coreos/etcd/blob/master/Documentation/clustering.md#dns-discovery

kola/jenkins: trigger builds on private bucket

Trigger gce and qemu kola builds on new releases appearing in private bucket (builds.release.core-os.net). Depends on completing #135

platform: packet.net support

https://github.com/packethost/packngo

lack of internet access prevents testing flannel

flannel is pulled from quay.io, and since kola does not bridge to the wan under qemu, it cannot be tested.

would it be possible to connect the bridge kola creates to a nic connected to the wan via a flag?

plume pre-release: copy instead of re-upload

plume pre-release would be quicker to complete the azure stuff if it simply copied between azure storage containers rather than uploading twice.

serializing cluster state

i'd like to be able to serialize cluster state and reconstitute it later. this means saving the platform, and the instance IDs into a json file.

however, it's currently not possible to save the SSH private key, because the ssh agent used in the platform code doesn't expose private keys.

a simple fix is to expose the private key in network.SSHAgent.

for this to work, these things need to be serializable and able to be created from serialized form -

platform.Cluster
platform.Machine
network.SSHAgent
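A sketch of the JSON shape such a snapshot might take follows, assuming the platform name plus per-machine IDs and IPs are enough to reattach. clusterState, saveState and loadState are hypothetical names, and the private key field stays empty until network.SSHAgent can export it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// machineState captures what would be needed to reattach to one instance.
type machineState struct {
	ID string `json:"id"`
	IP string `json:"ip"`
}

// clusterState is a hypothetical serializable snapshot of a cluster.
type clusterState struct {
	Platform string         `json:"platform"`
	Machines []machineState `json:"machines"`
	// SSHPrivateKey would hold a PEM-encoded key; this only becomes possible
	// if network.SSHAgent exposes its private key, as proposed above.
	SSHPrivateKey string `json:"ssh_private_key,omitempty"`
}

// saveState encodes the snapshot as indented JSON.
func saveState(s clusterState) ([]byte, error) {
	return json.MarshalIndent(s, "", "  ")
}

// loadState reconstitutes a snapshot from JSON.
func loadState(data []byte) (clusterState, error) {
	var s clusterState
	err := json.Unmarshal(data, &s)
	return s, err
}

func main() {
	data, err := saveState(clusterState{
		Platform: "aws",
		Machines: []machineState{{ID: "i-0123456789abcdef0", IP: "10.0.0.2"}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))
}
```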
