Production Secrets Management with HashiCorp Vault and GitHub Actions

Managing secrets in CI/CD pipelines is one of those problems that starts simple and quietly grows into a liability. GitHub’s native encrypted secrets work fine early on, but once you have a dozen repos, each with its own set of credentials, you start to feel the friction. Rotating a secret means touching every repo individually, there is no audit trail of which job read which secret, and a compromised runner has access to everything it was ever given.

The main reason I looked into this now is managing client IDs and secrets for Authentik OAuth providers across multiple applications. Each app has its own client ID and secret, and keeping all of that in sync across GitHub Secrets was becoming a mess.

This post walks through how I replaced GitHub Secrets with HashiCorp Vault across my homelab CI/CD pipelines, using JWT/OIDC authentication so runners never hold a static token, and Terraform to manage the whole thing as code.


The Goal

  • Vault running in production on Ubuntu 24.04 with a valid TLS certificate
  • GitHub Actions runners authenticate to Vault via JWT/OIDC with no static tokens
  • Secrets scoped per repo so each repo can only read its own paths
  • Vault policies and roles managed as code in a dedicated GitHub repo
  • Adding a new repo to the system is just dropping a single file and opening a PR

Part 1 - Standing Up Vault

TLS First

Vault should not run without TLS in production, so the certificate comes first. Because the server isn’t publicly reachable on port 80, the HTTP-01 Let’s Encrypt challenge isn’t an option. Instead I used the DNS-01 challenge, which only requires adding a TXT record at your DNS provider and works regardless of whether anything is listening on the server.

sudo apt install -y certbot

sudo certbot certonly \
  --manual \
  --preferred-challenges dns \
  --agree-tos \
  --email you@yourdomain.com \
  -d vault.yourdomain.com

Certbot pauses and gives you a TXT record to add under _acme-challenge.vault.yourdomain.com. Add it at your DNS provider, wait for it to propagate, verify with dig before pressing Enter, and the certificate lands at /etc/letsencrypt/live/vault.yourdomain.com/.

# Verify propagation before continuing
watch -n 5 "dig TXT _acme-challenge.vault.yourdomain.com +short"
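
If you would rather script the wait than watch the output by eye, a small polling helper can block until the record is visible. This is only a sketch: the function name wait_for_txt and the POLL_SECONDS variable are made up for illustration, and it assumes dig +short prints each TXT value wrapped in double quotes.

```shell
# Hypothetical helper: poll DNS until a TXT record carries the expected value.
# Usage: wait_for_txt NAME EXPECTED [MAX_ATTEMPTS]
wait_for_txt() {
  local name="$1" expected="$2" tries="${3:-60}" i=0
  while [ "$i" -lt "$tries" ]; do
    # dig +short prints each TXT value wrapped in double quotes
    if dig TXT "$name" +short | grep -qF "\"$expected\""; then
      return 0
    fi
    i=$((i + 1))
    sleep "${POLL_SECONDS:-5}"
  done
  # Record never showed up within the allowed attempts
  return 1
}
```

Called as wait_for_txt _acme-challenge.vault.yourdomain.com "<value from certbot>", it returns 0 once the value resolves and 1 if it gives up. Your local resolver may see the record later than the Let’s Encrypt validators do, or earlier, so treat a success as a good sign rather than a guarantee.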

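Certs are valid for 90 days. Because this certificate was issued with --manual, certbot renew cannot complete the DNS challenge unattended unless you supply a --manual-auth-hook script or switch to a DNS plugin for your provider. Either way, set up a cron job to warn you before the certificate expires and a deploy hook that copies the renewed files into /opt/vault/tls and reloads Vault.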
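
As a rough sketch of those two pieces, the helpers below check for upcoming expiry and install a renewed certificate. The function names are hypothetical, and the reload step assumes the packaged systemd unit maps reload to SIGHUP, which is the signal Vault uses to re-read its TLS certificate and key.

```shell
# Hypothetical helper: succeed (exit 0) if CERT expires within DAYS days.
# Usage: cert_expires_within CERT DAYS
cert_expires_within() {
  local cert="$1" days="$2"
  # openssl -checkend exits non-zero when the cert expires inside the window,
  # so the result is negated to make "expiring soon" the success case
  ! openssl x509 -checkend "$((days * 86400))" -noout -in "$cert" >/dev/null
}

# Hypothetical deploy hook body: copy a renewed lineage into the TLS directory
# used by Vault and ask Vault to reload its listener certificate.
# Usage: deploy_vault_cert LINEAGE_DIR TLS_DIR
deploy_vault_cert() {
  local lineage="$1" tls_dir="$2"
  install -m 0644 "$lineage/fullchain.pem" "$tls_dir/vault.crt" || return 1
  install -m 0600 "$lineage/privkey.pem"   "$tls_dir/vault.key" || return 1
  chown vault:vault "$tls_dir/vault.crt" "$tls_dir/vault.key" || return 1
  # Assumption: "reload" sends SIGHUP to the Vault process
  systemctl reload vault
}
```

A script placed in /etc/letsencrypt/renewal-hooks/deploy/ could call deploy_vault_cert "$RENEWED_LINEAGE" /opt/vault/tls, since certbot sets RENEWED_LINEAGE for deploy hooks. A daily cron job could run cert_expires_within /opt/vault/tls/vault.crt 14 and send a notification when it succeeds.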

Installing Vault

curl -fsSL https://apt.releases.hashicorp.com/gpg \
  | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg

echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] \
  https://apt.releases.hashicorp.com $(lsb_release -cs) main" \
  | sudo tee /etc/apt/sources.list.d/hashicorp.list

sudo apt update && sudo apt install -y vault

Configuration

Vault uses the integrated Raft storage backend so there is no external database needed. Place the following in /etc/vault.d/vault.hcl:

ui            = true
disable_mlock = false

storage "raft" {
  path    = "/opt/vault/data"
  node_id = "vault-node-1"
}

listener "tcp" {
  address         = "0.0.0.0:8200"
  tls_cert_file   = "/opt/vault/tls/vault.crt"
  tls_key_file    = "/opt/vault/tls/vault.key"
  tls_min_version = "tls13"
}

api_addr     = "https://vault.yourdomain.com:8200"
cluster_addr = "https://vault.yourdomain.com:8201"

Copy the certs, lock down permissions and start the service:

sudo mkdir -p /opt/vault/data /opt/vault/tls
sudo cp /etc/letsencrypt/live/vault.yourdomain.com/fullchain.pem /opt/vault/tls/vault.crt
sudo cp /etc/letsencrypt/live/vault.yourdomain.com/privkey.pem   /opt/vault/tls/vault.key
sudo chown -R vault:vault /opt/vault
sudo chmod 600 /opt/vault/tls/vault.key
sudo systemctl enable --now vault

Initialise and Unseal

export VAULT_ADDR="https://vault.yourdomain.com:8200"
vault operator init

This produces 5 unseal keys and a root token. Store them somewhere safe and offline, because they cannot be recovered if lost. Vault starts sealed after every restart, so you will need them again. Unseal with any 3 of the 5 keys:

vault operator unseal  # key 1
vault operator unseal  # key 2
vault operator unseal  # key 3
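
To confirm the result, or to alert when a reboot has left Vault sealed, the exit code of vault status is enough: 0 means unsealed, 2 means sealed, and 1 means the command hit an error. A minimal sketch, with seal_state being a made-up name:

```shell
# Hypothetical helper: translate the exit code of "vault status" into a word.
# Relies on VAULT_ADDR being exported, as in the init step.
seal_state() {
  local rc=0
  vault status >/dev/null 2>&1 || rc=$?
  case "$rc" in
    0) echo unsealed ;;
    2) echo sealed ;;
    *) echo error ;;   # 1 or anything unexpected: unreachable, TLS failure, ...
  esac
}
```

Something like [ "$(seal_state)" = "unsealed" ] || notify-me fits in a cron job, where notify-me stands in for whatever alerting you already use.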

Part 2 - JWT/OIDC Authentication for GitHub Actions

The key design decision here is using GitHub’s built-in OIDC token issuer rather than a static Vault token. Every job requests a short-lived signed JWT from GitHub, presents it to Vault, and receives a Vault token valid for 5 minutes. Even if a runner is compromised, the token it holds expires within minutes and can only read the paths granted to that one repo.

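This requires no secrets to be stored in GitHub at all. The runner just needs network access to Vault, and Vault needs outbound HTTPS access to token.actions.githubusercontent.com so it can fetch GitHub’s signing keys.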
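
To make the exchange concrete, here is roughly what happens inside a job, written as two shell functions. This is an illustrative sketch and not how hashicorp/vault-action is implemented: the function names are invented, the JSON is parsed with sed only to avoid extra dependencies, and it assumes the job has id-token: write so that GitHub injects the two ACTIONS_ID_TOKEN_REQUEST_* variables.

```shell
# Ask the GitHub OIDC endpoint for a JWT with the given audience.
# Usage: fetch_github_jwt AUDIENCE
fetch_github_jwt() {
  local audience="$1"
  # The response is a small JSON object whose "value" field holds the JWT
  curl -fsS \
    -H "Authorization: bearer ${ACTIONS_ID_TOKEN_REQUEST_TOKEN}" \
    "${ACTIONS_ID_TOKEN_REQUEST_URL}&audience=${audience}" \
    | sed -n 's/.*"value"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Trade the JWT for a Vault token using the named role.
# Usage: vault_jwt_login ROLE JWT
vault_jwt_login() {
  local role="$1" jwt="$2"
  # -field=token prints only the client token from the login response.
  # Note: the JWT is visible in the process list while this runs.
  vault write -field=token auth/jwt/login role="$role" jwt="$jwt"
}
```

Inside a workflow step this would be used as VAULT_TOKEN=$(vault_jwt_login myapp "$(fetch_github_jwt https://github.com/yourorg)"), with the audience matching the role’s bound_audiences. In practice the action does all of this for you.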

Enable the JWT Auth Backend

vault auth enable jwt

vault write auth/jwt/config \
  oidc_discovery_url="https://token.actions.githubusercontent.com" \
  bound_issuer="https://token.actions.githubusercontent.com"

Enable KV v2

vault secrets enable -path=secret kv-v2

Store Your Secrets

vault kv put secret/myapp/config \
  api_key="your-api-key" \
  db_password="your-db-password"

Create a Policy

vault policy write myapp - <<'POLICY'
path "secret/data/myapp/*" {
  capabilities = ["read"]
}
POLICY
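
Note the extra data/ segment in the policy path. With KV v2, the vault kv commands take the logical path (secret/myapp/config), while policies and API clients address the same secret under <mount>/data/<key>. The one-line helper below, with a made-up name, just spells out that mapping:

```shell
# Hypothetical helper: map a KV v2 mount and logical key to the API path
# that policies (and API-based clients) must reference.
# Usage: kv2_data_path MOUNT KEY
kv2_data_path() {
  # Strip a trailing slash from the mount and a leading slash from the key
  local mount="${1%/}" key="${2#/}"
  printf '%s/data/%s\n' "$mount" "$key"
}
```

For example, kv2_data_path secret myapp/config prints secret/data/myapp/config. To double-check what was stored and granted, vault kv get -field=api_key secret/myapp/config reads a value back and vault policy read myapp prints the policy.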

Create a JWT Role

The role binds a GitHub repository to a policy. The runner’s JWT is validated against the repository claim so only jobs from yourorg/yourrepo can assume this role.

One thing to watch out for here is that passing bound_claims as a JSON string on the command line gets mangled by the shell. Pass it as a file instead:

cat > /tmp/role.json <<'ROLE'
{
  "role_type": "jwt",
  "bound_audiences": ["https://github.com/yourorg"],
  "user_claim": "sub",
  "bound_claims_type": "glob",
  "bound_claims": { "repository": "yourorg/yourrepo" },
  "policies": ["myapp"],
  "ttl": "5m"
}
ROLE

vault write auth/jwt/role/myapp @/tmp/role.json
rm /tmp/role.json
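
When a login is rejected, the usual cause is a claim that does not match the role, so it helps to look at what the JWT actually contains. The sketch below decodes the payload segment of a token. It does not verify the signature, so it is a debugging aid only, and jwt_payload is a name chosen for illustration.

```shell
# Hypothetical helper: print the decoded JSON payload of a JWT.
# Debugging aid only: the signature is NOT checked.
# Usage: jwt_payload TOKEN
jwt_payload() {
  local seg
  # Take the middle segment and convert base64url to standard base64
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the '=' padding that base64url encoding strips
  case $(( ${#seg} % 4 )) in
    2) seg="${seg}==" ;;
    3) seg="${seg}=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}
```

Compare the aud and repository values in the output against bound_audiences and bound_claims, and use vault read auth/jwt/role/myapp to see what the role currently expects.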

The GitHub Actions Workflow

The workflow needs id-token: write permission to request a JWT from GitHub. The hashicorp/vault-action step handles the authentication and injects secrets as environment variables for all subsequent steps.

jobs:
  deploy:
    runs-on: self-hosted
    permissions:
      id-token: write
      contents: read

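    steps:
      - name: Check out the repo so deploy.sh is available
        uses: actions/checkout@v4

      - name: Import secrets from Vault
        uses: hashicorp/vault-action@v3
        with:
          url: https://vault.yourdomain.com:8200
          method: jwt
          role: myapp
          # Request the audience the role expects in bound_audiences;
          # without this, vault-action asks GitHub for its own default audience
          jwtGithubAudience: https://github.com/yourorg
          secrets: |
            secret/data/myapp/config api_key     | API_KEY ;
            secret/data/myapp/config db_password | DB_PASSWORD

      - name: Deploy
        run: ./deploy.sh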
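
The contents of deploy.sh are outside the scope of this post, but one pattern worth having at the top of any such script is a guard that fails fast when an expected secret was not exported. A minimal sketch, with require_env as a hypothetical name:

```shell
# Hypothetical guard for the top of deploy.sh: fail early and clearly if a
# variable that the Vault step should have exported is missing or empty.
# Usage: require_env NAME [NAME...]
require_env() {
  local name value missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name; names are trusted input
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      # Report the name only, never the value
      echo "missing required environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

With require_env API_KEY DB_PASSWORD || exit 1 as the first line of the script, a typo in the secrets mapping shows up as a clear error instead of a confusing failure halfway through a deploy.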

Part 3 - Managing Vault as Code

Once you have more than a couple of repos using Vault, managing roles and policies by hand does not scale. I created a dedicated vault-config repo that uses Terraform to manage everything, with one file per repo.

The Repo Structure

terraform/
├── main.tf         # discovers repos/*.hcl automatically
├── provider.tf
├── backend.tf
└── repos/
    ├── myapp.hcl
    ├── another-repo.hcl
    └── vault-config.hcl.current  # bootstrap role, manually managed

One File Per Repo

Each .hcl file in repos/ contains only the Vault path rules for that repo. The filename, minus the extension, is the repo name, and nothing else is required.

terraform/repos/myapp.hcl:

path "secret/data/myapp/*" {
  capabilities = ["read"]
}

Terraform Discovers Them Automatically

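main.tf uses fileset to find every .hcl file and creates a matching Vault policy and JWT role from it. Files that do not end in .hcl, such as vault-config.hcl.current, fall outside the glob and are ignored. No other files need to change when you add a new repo.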

locals {
  repo_files = fileset("${path.module}/repos", "*.hcl")

  repos = {
    for f in local.repo_files :
    trimsuffix(f, ".hcl") => file("${path.module}/repos/${f}")
  }
}

resource "vault_policy" "repos" {
  for_each = local.repos
  name     = each.key
  policy   = each.value
}

resource "vault_jwt_auth_backend_role" "repos" {
  for_each          = local.repos
  backend           = "jwt"
  role_name         = each.key
  role_type         = "jwt"
  user_claim        = "sub"
  bound_audiences   = ["https://github.com/yourorg"]
  bound_claims_type = "glob"
  bound_claims      = { repository = "yourorg/${each.key}" }
  token_policies    = [vault_policy.repos[each.key].name]
  token_ttl         = 300
}

Adding a new repo to Vault is now:

cat > terraform/repos/new-repo.hcl <<'HCL'
path "secret/data/new-repo/*" {
  capabilities = ["read"]
}
HCL

git add terraform/repos/new-repo.hcl
git commit -m "feat: add vault access for new-repo"
# open a PR, review the plan, merge
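
If you add repos often, the file creation can be wrapped in a small function that also catches two easy mistakes: a name that could not be a repo name, and overwriting an existing policy. This is a sketch with a hypothetical function name, and it assumes the same secret/data/<repo>/* layout used above.

```shell
# Hypothetical scaffold: write DIR/<repo>.hcl granting read-only access to the
# repo's own KV v2 subtree. Refuses bad names and existing files.
# Usage: new_repo_policy REPO [DIR]
new_repo_policy() {
  local repo="$1" dir="${2:-terraform/repos}"
  case "$repo" in
    # Empty, or containing anything other than letters, digits, '.', '_', '-'
    ""|*[!A-Za-z0-9._-]*) echo "invalid repo name: $repo" >&2; return 1 ;;
  esac
  local file="$dir/$repo.hcl"
  if [ -e "$file" ]; then
    echo "$file already exists" >&2
    return 1
  fi
  cat > "$file" <<HCL
path "secret/data/$repo/*" {
  capabilities = ["read"]
}
HCL
}
```

Running new_repo_policy new-repo from the root of the vault-config repo produces the same file as the heredoc above, ready to commit.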

What Not to Manage in Terraform

The JWT auth backend, KV mount, and the vault-config role itself are intentionally left out of Terraform. If a bad apply destroyed the JWT backend, the workflow could no longer authenticate to Vault to fix it, a deadlock the pipeline cannot resolve on its own. Everything else, meaning the policies and per-repo roles, is safe for Terraform to own.


The Result

Every repo that needs secrets gets a single .hcl file defining exactly what it can read. The CI/CD workflow authenticates with a token that expires after 5 minutes, which leaves very little time for anyone to misuse it. Rotating a secret means updating one value in Vault with no GitHub repo settings to touch. And once an audit device is enabled (for example with vault audit enable), the audit log records which repo’s role read which secret and when.

It is a bit of upfront infrastructure but once it is running the operational overhead is close to zero.

This post is licensed under CC BY 4.0 by the author.