Comments (6)

bgandon commented on August 15, 2024

As a first step toward a fix, I’ve contributed PR cloudfoundry/config-server#17.
This change will then have to be pulled into the Bosh CLI.

CredHub server and CLI don’t seem to be affected.

With the Bosh Director, though, I’m concerned about NATS client certs. It looks like these certs are generated with the 3-letter USA country code, and trusted with that exact country code.
See https://github.com/cloudfoundry/bosh/blob/main/jobs/nats/templates/nats.cfg.erb#L10-L32.
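For reference, RFC 5280 defines X520countryName as PrintableString (SIZE (2)), i.e. a 2-letter ISO 3166 code, which is why C=USA is out of spec. Below is a minimal sketch, using Ruby's stdlib openssl (the same OpenSSL::X509::Name.parse call the Director uses), of how one could flag offending countryName values in a subject. The invalid_country_codes helper is hypothetical, and it only checks the shape (two uppercase letters), not membership in the actual ISO 3166 list.

```ruby
require "openssl"

# Hypothetical helper: returns the countryName (C) values of an X.509 name
# that do not look like a 2-letter ISO 3166 alpha-2 code, as required by
# X520countryName ::= PrintableString (SIZE (2)). Shape check only.
def invalid_country_codes(name)
  name.to_a                                            # [[short_name, value, asn1_type], ...]
      .select { |attr, _value, _type| attr == "C" }    # keep countryName entries only
      .map { |_attr, value, _type| value }
      .reject { |value| value.match?(/\A[A-Z]{2}\z/) } # drop the well-formed ones
end

legacy = OpenSSL::X509::Name.parse("/C=USA/O=Cloud Foundry/CN=example.agent.bosh-internal")
fixed  = OpenSSL::X509::Name.parse("/C=US/O=Cloud Foundry/CN=example.agent.bosh-internal")

p invalid_country_codes(legacy) # prints ["USA"]
p invalid_country_codes(fixed)  # prints []
```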

Fixing this might require two distinct phases, with a proper transition that trusts both types of generated certificates, with either the US or the USA country code. @rkoster or @jpalermo, don’t hesitate to bring more context here for a better understanding of the NATS client cert generation and trust mechanisms.

bgandon commented on August 15, 2024

Following up on this: if we regenerate a NATS CA with a fixed subject (using a proper X.520 country code, i.e. a 2-letter ISO 3166 country code), the problem posed by the new NATS CA regenerated by CredHub is the same as when the NATS CA expires and needs to be rotated.

For the time being, it’s still unclear to me how the NATS CA can safely be rotated, so I need more input to understand which scenario is already supported, and how.

Indeed, I imagine the Director has to “see” that the NATS CA has changed, regenerate all NATS client certificates for Agents, then push these through the live “update VM settings” mechanism… which unfortunately relies on NATS. This would require the NATS server to temporarily trust both the old and the new NATS CA (because Agents are still using their old client certs to connect to NATS and receive updated VM settings). I’m not sure there is another safe way, so I need to investigate further.
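To make the “trust both CAs temporarily” idea concrete, here is a minimal sketch with Ruby's stdlib openssl. It only illustrates the generic X.509 mechanism (a trust store holding both the old and the new CA), not the NATS server's actual implementation. The issue_cert helper, the example-nats-ca subject and the 1-hour validity are hypothetical choices for the demo.

```ruby
require "openssl"
require "securerandom"

# Hypothetical helper: issues a certificate for `subject`. Without
# issuer_cert/issuer_key it is self-signed; with ca: true it gets CA extensions.
def issue_cert(subject, key, issuer_cert: nil, issuer_key: nil, ca: false)
  cert = OpenSSL::X509::Certificate.new
  cert.version = 2 # X.509 v3
  cert.serial = SecureRandom.hex(16).to_i(16)
  cert.subject = OpenSSL::X509::Name.parse(subject)
  cert.issuer = issuer_cert ? issuer_cert.subject : cert.subject
  cert.public_key = key.public_key
  cert.not_before = Time.now - 60
  cert.not_after = Time.now + 3600
  if ca
    ef = OpenSSL::X509::ExtensionFactory.new
    ef.subject_certificate = cert
    ef.issuer_certificate = issuer_cert || cert
    cert.add_extension(ef.create_extension("basicConstraints", "CA:TRUE", true))
    cert.add_extension(ef.create_extension("keyUsage", "keyCertSign,cRLSign", true))
  end
  cert.sign(issuer_key || key, OpenSSL::Digest.new("SHA256"))
  cert
end

# Old CA with the 3-letter code, new CA with the 2-letter code.
old_ca_key = OpenSSL::PKey::RSA.new(2048)
old_ca = issue_cert("/C=USA/O=Cloud Foundry/CN=example-nats-ca", old_ca_key, ca: true)
new_ca_key = OpenSSL::PKey::RSA.new(2048)
new_ca = issue_cert("/C=US/O=Cloud Foundry/CN=example-nats-ca", new_ca_key, ca: true)

# The same Agent, before and after its client cert is regenerated.
agent_key = OpenSSL::PKey::RSA.new(2048)
subject_tail = "/O=Cloud Foundry/CN=example.agent.bosh-internal"
old_client = issue_cert("/C=USA#{subject_tail}", agent_key, issuer_cert: old_ca, issuer_key: old_ca_key)
new_client = issue_cert("/C=US#{subject_tail}", agent_key, issuer_cert: new_ca, issuer_key: new_ca_key)

# Transition: trust both CAs while some Agents still hold old client certs.
transition_store = OpenSSL::X509::Store.new
transition_store.add_cert(old_ca)
transition_store.add_cert(new_ca)

# Once every Agent has its new client cert: trust the new CA only.
final_store = OpenSSL::X509::Store.new
final_store.add_cert(new_ca)

puts transition_store.verify(old_client) # true
puts transition_store.verify(new_client) # true
puts final_store.verify(old_client)      # false
puts final_store.verify(new_client)      # true
```

The catch discussed above remains: the new client certs can only be delivered over NATS, so the dual-trust window must stay open until every Agent has picked them up.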

Then, there is also the question of eliminating the countryName (C) “RelativeDistinguishedName” (or “RDN”, as the RFC says) from the certificate subject. So far, I see no reason why the countryName RDN would be mandatory there: neither RFC 2253, pointed out this afternoon, nor RFC 2459, mentioned above (which might be more relevant here), states that it is mandatory. Correct me if I’m wrong. I need to run some tests on a Bosh environment, switching the Director to generating NATS certs without the countryName RDN, and see what happens.

Finally, @rkoster mentioned today in the CFF Foundational Infrastructure meeting the TLS Authentication chapter of the NATS documentation, which links to RFC 2253. It’s interesting to see, in section 2.2 and later in the examples, that RDNs can natively be multi-valued, with their values separated by a + sign.

This means that we could possibly trust certificates with Subject: C=US+C=USA, O=Cloud Foundry, CN=bla-bla-bla, or try something in that direction.

bgandon commented on August 15, 2024

On the subject of multi-valued RDNs like C=US+C=USA: such an RDN doesn’t mean “US or USA” but “US and USA”, so it would not help us.

More interestingly, the NATS client certs don’t need a countryName (C) RDN (see the verification below). The challenge, though, is to synchronize the Subject of the NATS certificates (generated by the Director) with the usernames put in the NATS config (by the separate bosh-nats-sync).

If we suddenly change the NATS client certs subject in the Director’s code, then we need:

  1. To trigger a recreation of all the client certificates. There is no such trigger in the code, and if we implemented one, we would need to remove it at some point. And as discussed last week with @jpalermo, “when?” is a very good question! 😆
  2. To deliver those recreated certs together with a synchronized NATS config that temporarily grants permissions to both certificates, or that switches permissions to the new certificate only after it has been properly delivered to the Agent. (The Agent will probably exit after a NATS connection failure (with retries), and sv will restart the Agent until it successfully connects.) To synchronize the Director with the external NATS config generator, the NATS usernames could no longer be a shared convention between these separate components, and would have to be transmitted by the Director to the NATS config generator. This may involve storing the NATS username in the database, which means tough refactoring for the small benefit of aligning the countryName (C) RDN with the RFC.
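For the “temporary permissions on both certificates” option in item 2, a minimal sketch of what the generated NATS config could carry: one user entry per accepted client-cert subject, all sharing the same permissions. The transition_agent_users helper is hypothetical (not bosh-nats-sync code); the user string format follows the agent_user method of NATSSync::NatsAuthConfig, and the permissions hash is a placeholder supplied by the caller.

```ruby
# Hypothetical sketch: during a transition window, emit one NATS user per
# accepted subject variant (nil = no C RDN), all with the same permissions,
# so an Agent is authorized with either its old or its regenerated client cert.
def transition_agent_users(cn, permissions, country_variants)
  country_variants.map do |country|
    rdns = country ? ["C=#{country}"] : []
    rdns += ["O=Cloud Foundry", "CN=#{cn}.agent.bosh-internal"]
    { "user" => rdns.join(", "), "permissions" => permissions }
  end
end

# Placeholder permissions, for illustration only.
permissions = { "publish" => ["hm.agent.heartbeat.example-agent-id"] }
users = transition_agent_users("example", permissions, ["USA", nil])
users.each { |u| puts u["user"] }
# C=USA, O=Cloud Foundry, CN=example.agent.bosh-internal
# O=Cloud Foundry, CN=example.agent.bosh-internal
```

Dropping the old variant from the list, once all Agents have their new cert, closes the window; deciding “when” is precisely the open question from item 1.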

Instead, the NATS client certs could be re-generated as part of a normal NATS CA rotation process, as documented in “Rotating NATS Certificate Authorities”.

My advice would be to modify the generate_nats_client_certificate(common_name) method (in the NatsClientCertGenerator class) so that it grabs the countryName (C) RDN from the NATS CA subject and uses it in the generated NATS client certificates.
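A minimal sketch of that change, assuming a hypothetical client_subject_from_ca helper. In the real class the CA subject would presumably come from @root_ca.subject, which the generator already uses as the issuer. When the CA subject has no C RDN, the client subject omits it too, matching the patched behaviour tested below.

```ruby
require "openssl"

# Hypothetical helper (not the actual Bosh code): build a NATS client cert
# subject that reuses the countryName (C) of the NATS CA, or omits C when
# the CA subject has none.
def client_subject_from_ca(ca_subject, common_name)
  # Name#to_a yields [short_name, value, asn1_type] triples.
  country_entry = ca_subject.to_a.find { |attr, _value, _type| attr == "C" }
  prefix = country_entry ? "/C=#{country_entry[1]}" : ""
  OpenSSL::X509::Name.parse("#{prefix}/O=Cloud Foundry/CN=#{common_name}")
end

ca_with_us   = OpenSSL::X509::Name.parse("/C=US/O=Cloud Foundry/CN=example-nats-ca")
ca_without_c = OpenSSL::X509::Name.parse("/O=Cloud Foundry/CN=example-nats-ca")

puts client_subject_from_ca(ca_with_us, "example.agent.bosh-internal")
puts client_subject_from_ca(ca_without_c, "example.agent.bosh-internal")
```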

Then the agent_user(agent_id, cn) method (in the NATSSync::NatsAuthConfig class) would need to know the countryName (C) RDN from the CA subject in order to put the correct usernames in the generated NATS config. Such transmission of information is already done for the Director and Health Monitor NATS usernames, so adding a third one for the NATS CA countryName is affordable.
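On the bosh-nats-sync side, the username only has to mirror that subject. A minimal sketch, where the agent_username helper and its ca_country argument (the countryName transmitted alongside the Director and Health Monitor usernames, nil when the CA has no C RDN) are hypothetical:

```ruby
# Hypothetical sketch: compute the NATS username of an Agent so that it
# matches a client cert whose C RDN was copied from the NATS CA.
def agent_username(cn, ca_country)
  rdns = []
  rdns << "C=#{ca_country}" if ca_country
  rdns << "O=Cloud Foundry"
  rdns << "CN=#{cn}.agent.bosh-internal"
  rdns.join(", ")
end

puts agent_username("example", "US") # prints: C=US, O=Cloud Foundry, CN=example.agent.bosh-internal
puts agent_username("example", nil)  # prints: O=Cloud Foundry, CN=example.agent.bosh-internal
```

Because both sides derive the country from the same CA, rotating the CA is what switches usernames and certs together.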

This way, the NATS client certs re-generation (with the correct countryName (C) RDN) would be synchronized with the NATS CA change, and no new trigger in the Director would be necessary. Operators would do the operation at their own pace, following the usual NATS CA rotation process when the time has come for them to do so.

Code ref.: short-lived NATS credentials: dec31de

NATS auth doesn’t need the countryName (C) RDN in the certs subject

On a running Director VM, patch both “director” and “nats-sync” Gems.

# cd /var/vcap/packages/director
# patch -p1 # paste the 1st patch below then type Control-D (possibly two times)
--- director/gem_home/ruby/3.2.0/gems/bosh-director-0.0.0/lib/bosh/director/nats_client_cert_generator.rb	2023-11-07 11:53:28.089549055 +0000
+++ director/gem_home/ruby/3.2.0/gems/bosh-director-0.0.0/lib/bosh/director/nats_client_cert_generator.new.rb	2023-11-07 11:52:24.473567658 +0000
@@ -35,7 +35,7 @@
 
       cert.serial = SecureRandom.hex(16).to_i(16)
 
-      cert.subject = OpenSSL::X509::Name.parse "/C=USA/O=Cloud Foundry/CN=#{common_name}"
+      cert.subject = OpenSSL::X509::Name.parse "/O=Cloud Foundry/CN=#{common_name}"
       cert.issuer = @root_ca.subject # root CA is the issuer
       cert.public_key = key.public_key
       cert.not_before = Time.now
^D
# cd /var/vcap/packages/nats
# patch -p1 # paste the 2nd patch below then type Control-D
--- nats/gem_home/ruby/3.2.0/gems/bosh-nats-sync-0.0.0/lib/nats_sync/nats_auth_config.rb	2023-10-28 23:13:14.000000000 +0000
+++ nats/gem_home/ruby/3.2.0/gems/bosh-nats-sync-0.0.0/lib/nats_sync/nats_auth_config.new.rb	2023-11-07 12:09:33.345254409 +0000
@@ -30,7 +30,7 @@
 
     def agent_user(agent_id, cn)
       {
-        'user' => "C=USA, O=Cloud Foundry, CN=#{cn}.agent.bosh-internal",
+        'user' => "O=Cloud Foundry, CN=#{cn}.agent.bosh-internal",
         'permissions' => {
           'publish' => [
             "hm.agent.heartbeat.#{agent_id}",
^D

Before running monit restart bosh_nats_sync, one can inspect a NATS client certificate:

$ bosh ssh scratchpad/0 -d scratchpad
$ sudo apt update -qq && sudo apt install -y -qq jq
$ sudo jq -r .env.bosh.mbus.cert.certificate /var/vcap/bosh/settings.json | openssl x509 -noout -subject
subject=C = USA, O = Cloud Foundry, CN = 09b9b58e-96ad-4f9f-b0dd-4e078b23ea9e.bootstrap.agent.bosh-internal
$ exit
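Where installing jq on the VM is not an option, a similar check can be done with Ruby's stdlib on any machine holding a copy of settings.json. This is a minimal sketch: the mbus_cert_subject helper and the script usage are hypothetical, and the ONELINE format is only expected to look like the openssl CLI output shown above.

```ruby
require "json"
require "openssl"

# Hypothetical helper: returns the subject of the NATS (mbus) client cert
# stored under .env.bosh.mbus.cert.certificate in an Agent settings document.
def mbus_cert_subject(settings_json)
  settings = JSON.parse(settings_json)
  pem = settings.dig("env", "bosh", "mbus", "cert", "certificate")
  raise KeyError, "no env.bosh.mbus.cert.certificate in settings" if pem.nil?
  OpenSSL::X509::Certificate.new(pem).subject
end

# Usage: ruby mbus_cert_subject.rb /path/to/copy/of/settings.json
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  # ONELINE prints a "C = ..., O = ..., CN = ..." style subject.
  puts mbus_cert_subject(File.read(ARGV[0])).to_s(OpenSSL::X509::Name::ONELINE)
end
```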

On the Director, restart the director and bosh_nats_sync monit processes:

# monit restart director ; monit restart bosh_nats_sync ; watch -n1 monit summary # wait until both have restarted

The regenerated NATS config in /var/vcap/data/nats/auth.json makes all Agents suddenly unresponsive.

Try re-creating a VM.

$ bosh recreate --fix scratchpad/0 -d scratchpad --non-interactive
...
$ bosh ssh scratchpad/0 -d scratchpad
$ sudo apt update -qq && sudo apt install -y -qq jq
$ sudo jq -r .env.bosh.mbus.cert.certificate /var/vcap/bosh/settings.json | openssl x509 -noout -subject
subject=O = Cloud Foundry, CN = 67aead65-2985-43cb-ab63-191df827c373.bootstrap.agent.bosh-internal
$ exit

It works without the countryName (C) RDN.

The Director can be brought back to its former state by applying the reversed patches with patch -p1 -R.

rkoster commented on August 15, 2024

@gberche-orange great job on the analysis! @bgandon has volunteered during the Foundational Infrastructure working group meeting to make PRs to change all occurrences of USA to US.

gberche-orange commented on August 15, 2024

Great, thanks @rkoster for the update and the reminder about the working group meeting. We'll try to join next time we submit issues/PRs.

Thinking it through, my initial proposal to introduce an opt-in property was unnecessary. The USA -> US country code change should not have negative side effects for bosh deployments that currently support zero-downtime certificate rotation using certs generated by bosh interpolate or bosh config servers: the update procedure already supports having distinct certificates (N and N+1) with distinct Subjects (with the USA or US country code).

Thanks @bgandon for your proposal to submit PRs to change the country code from USA to US in bosh, much appreciated!

gberche-orange commented on August 15, 2024

Thanks a lot @bgandon, @rkoster and @jpalermo for your work on this issue, and sorry we were unable to participate in the related discussions in the infrastructure working group meeting.

Within Orange, your fix will enable the bosh directors created using bosh create-env to have valid X.509 certs with 2-letter country codes.

From @jpalermo's analysis in cloudfoundry/config-server#17 (review):

I believe we decided there was no risk to these changes.

This code is pulled in by the bosh-cli and it will change how certificate variables are generated when doing a bosh create-env, but that will at most impact the CA/server cert generated for NATS, not any of the client certs, and the clients don't use the country for any sort of server validation.

I understand that this change will not cause the agents of these upgraded directors to become unresponsive.

At Orange, the bosh directors not deployed using bosh create-env (i.e. "nested bosh directors") already use certificates patched to not include the country code in their subject. This lets operators choose, in an emergency, to renew the certificates using the openssl CLI (without rotating the private key, just extending the expiration date) instead of going through the full procedure documented at https://bosh.io/docs/nats-ca-rotation/, which requires two redeployments of each bosh deployment. This method, while less secure than changing the private key, enables us to avoid hitting the expired cert condition, especially on directors with a large number of deployments and VMs, where recovery through recreating the deployments has too heavy an operational impact.
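The renewal described here is done with the openssl CLI; below is a rough Ruby stdlib equivalent of the same general technique, as a sketch only. It re-issues a cert with the same subject, public key and extensions, a fresh serial and a later expiry, signed by the same CA key. The renew_cert helper, the example subjects and the validity periods are hypothetical. The demo shows why no Agent needs new certs: a client cert issued by the original CA still chains to the renewed CA, because subject and key are unchanged.

```ruby
require "openssl"
require "securerandom"

# Hypothetical helper: re-issue `cert` keeping its subject, key pair and
# extensions, with a new serial and validity window, signed by `ca_key`.
# For a self-signed CA, pass the CA itself as both `cert` and `ca_cert`.
def renew_cert(cert, ca_cert, ca_key, validity_seconds)
  renewed = OpenSSL::X509::Certificate.new
  renewed.version = cert.version
  renewed.serial = SecureRandom.hex(16).to_i(16)
  renewed.subject = cert.subject
  renewed.issuer = ca_cert.subject
  renewed.public_key = cert.public_key # same key pair: no private key rotation
  renewed.not_before = Time.now - 60
  renewed.not_after = Time.now + validity_seconds
  cert.extensions.each { |ext| renewed.add_extension(ext) }
  renewed.sign(ca_key, OpenSSL::Digest.new("SHA256"))
  renewed
end

# Demo: a soon-to-expire self-signed CA (no C RDN) and a client cert it issued.
ca_key = OpenSSL::PKey::RSA.new(2048)
ca = OpenSSL::X509::Certificate.new
ca.version = 2
ca.serial = 1
ca.subject = OpenSSL::X509::Name.parse("/O=Cloud Foundry/CN=example-nats-ca")
ca.issuer = ca.subject
ca.public_key = ca_key.public_key
ca.not_before = Time.now - 60
ca.not_after = Time.now + 3600 # close to expiry
ef = OpenSSL::X509::ExtensionFactory.new
ef.subject_certificate = ca
ef.issuer_certificate = ca
ca.add_extension(ef.create_extension("basicConstraints", "CA:TRUE", true))
ca.add_extension(ef.create_extension("keyUsage", "keyCertSign,cRLSign", true))
ca.sign(ca_key, OpenSSL::Digest.new("SHA256"))

client_key = OpenSSL::PKey::RSA.new(2048)
client = OpenSSL::X509::Certificate.new
client.version = 2
client.serial = 2
client.subject = OpenSSL::X509::Name.parse("/O=Cloud Foundry/CN=example.agent.bosh-internal")
client.issuer = ca.subject
client.public_key = client_key.public_key
client.not_before = Time.now - 60
client.not_after = Time.now + 3600
client.sign(ca_key, OpenSSL::Digest.new("SHA256"))

# Extend the CA by one year, keeping its key pair.
renewed_ca = renew_cert(ca, ca, ca_key, 365 * 24 * 3600)

store = OpenSSL::X509::Store.new
store.add_cert(renewed_ca)
puts store.verify(client) # true: the existing client cert still chains to the renewed CA
```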
