Authentik as Code with Terraform, Vault, and GitHub Actions

Managing every Authentik group, user, and OAuth2 application as code with Terraform, secrets in Vault, and a GitHub Actions pipeline that plans on each PR and applies on merge to main.

Authentik ended up being the single most important service in my lab. It is the front door to a dozen self-hosted apps, from dashboards to a wiki to file storage, all of them handing off login to one identity provider. That is great until you realise the entire thing is configured by clicking around a web UI. Who is in which group? Which redirect URIs does an app actually allow? When did that provider change? The honest answer was usually “I think I set that up months ago,” which is exactly the answer you do not want for the service that guards everything else.

So I did the same thing I did with DNS: I treated it as infrastructure. Groups, users, OAuth2 providers, applications, and the policy bindings that tie them together all live in Terraform now. Every change is a pull request with a plan attached, nothing gets applied until it merges, and a job runs every morning to tell me if anyone touched the live config behind Terraform’s back.


The Goal

  • Define groups, users, and every OAuth2 application as code, in one repo.
  • Keep secrets (the API token, per-app client IDs and secrets) out of the repo entirely.
  • Plan on every PR so I can see exactly what will change before it happens.
  • Apply only on merge to main, gated behind a protected environment.
  • Get told automatically when the live config drifts from the repo.

Part 1 - The Provider and State

The Authentik Terraform provider talks to the same API the web UI uses. The provider block itself is two lines, a URL and a token, and the token's value never appears in the repo:

terraform {
  required_version = ">= 1.13.4"
  required_providers {
    authentik = {
      source  = "goauthentik/authentik"
      version = "~> 2026.2.0"
    }
  }
}

provider "authentik" {
  url   = "https://auth.example.com"
  token = var.authentik_api_token
}

State lives in S3 with native locking turned on. The lockfile is the part that matters: it means two runs can never apply over each other and corrupt state, which is a real risk once a daily drift job and a merge can both fire at the same moment.

terraform {
  backend "s3" {
    bucket       = "example-tfstate"
    key          = "authentik/terraform.tfstate"
    region       = "ap-southeast-2"
    encrypt      = true
    use_lockfile = true # native S3 state locking, no DynamoDB table needed
  }
}

Part 2 - Config You Edit, Secrets You Do Not

The split I care about most is between configuration and secrets. Configuration is the stuff I edit every week and is perfectly safe to commit: which groups exist, who is in them, and what each app looks like. That all lives in environment/production.tfvars.

groups = {
  administrators = { is_superuser = true }
  grafana        = { is_superuser = false }
  nextcloud      = { is_superuser = false }
  wiki           = { is_superuser = false }
}

users = {
  asmith = {
    name   = "alex smith"
    email  = "alex@example.com"
    groups = ["administrators", "grafana", "nextcloud", "wiki"]
  }

  jlee = {
    name   = "jordan lee"
    email  = "jordan@example.com"
    groups = ["grafana", "wiki"]
  }
}

apps = {
  grafana = {
    redirect_uris = ["https://grafana.example.com/login/generic_oauth"]
    groups        = ["grafana"]
    icon_url      = "https://cdn.jsdelivr.net/gh/selfhst/icons@main/png/grafana.png"
  }

  nextcloud = {
    redirect_uris = ["https://cloud.example.com/apps/oidc_login/oidc"]
    groups        = ["nextcloud"]
    icon_url      = "https://cdn.jsdelivr.net/gh/selfhst/icons@main/png/nextcloud.png"
  }
}

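Secrets never go near the repo. The Authentik API token and every app’s client_id and client_secret come from Vault at run time, injected as TF_VAR_* environment variables. The variable definitions mark them sensitive so Terraform redacts them in plan and apply output. Marking a variable sensitive does not keep it out of state, though: the client secrets still land there, which is why the state bucket has encrypt = true.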

variable "authentik_api_token" {
  description = "Authentik API token"
  type        = string
  sensitive   = true
}

variable "app_secrets" {
  description = "Client IDs and secrets for OAuth2 apps"
  type = map(object({
    client_id     = string
    client_secret = string
  }))
  sensitive = true
}
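
The non-secret maps can be typed just as strictly. The definitions below are a sketch inferred from the shape of production.tfvars, not the repo's actual variables file. In particular, extra_scopes as an optional flag that defaults to false is an assumption based on how the wiki's custom scope is described in Part 3.

```hcl
# Sketch only: types inferred from the tfvars shown above.
variable "groups" {
  description = "Authentik groups, keyed by group name"
  type = map(object({
    is_superuser = bool
  }))
}

variable "users" {
  description = "Authentik users, keyed by username"
  type = map(object({
    name   = string
    email  = string
    groups = list(string) # keys of var.groups
  }))
}

variable "apps" {
  description = "OAuth2 applications, keyed by app name"
  type = map(object({
    redirect_uris = list(string)
    groups        = list(string)          # keys of var.groups
    icon_url      = string
    extra_scopes  = optional(bool, false) # assumed flag, off unless set
  }))
}
```

Because app_secrets is a map of objects, TF_VAR_app_secrets has to carry the whole map as a single HCL or JSON-style literal, for example {"grafana": {"client_id": "<id>", "client_secret": "<secret>"}} with placeholder values. Terraform parses environment values for complex-typed variables as expressions rather than plain strings.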

Part 3 - Turning Maps Into Resources

The whole config is data-driven. Rather than writing a block per app, every map in production.tfvars gets expanded with for_each, so adding an app is a data edit, not a code edit.

Groups and users come first, with users referencing groups by key so membership stays readable:

resource "authentik_group" "groups" {
  for_each     = var.groups
  name         = each.key
  is_superuser = each.value.is_superuser
}

resource "authentik_user" "users" {
  for_each = var.users
  username = each.key
  name     = each.value.name
  email    = each.value.email
  groups = [
    for g in each.value.groups :
    authentik_group.groups[g].id
  ]
}

Each app becomes an OAuth2 provider plus an application. This is also where the configuration and the secret get stitched back together: the public bits come from var.apps, the client_id and client_secret from var.app_secrets. A precondition guards the join so a missing Vault entry fails loudly with a message that tells me exactly what to fix, instead of a confusing null error deep in the plan:

resource "authentik_provider_oauth2" "apps" {
  for_each = var.apps

  name               = each.key
  client_id          = var.app_secrets[each.key].client_id
  client_secret      = var.app_secrets[each.key].client_secret
  authorization_flow = data.authentik_flow.default_provider_authorization.id
  invalidation_flow  = data.authentik_flow.default_invalidation_flow.id
  signing_key        = data.authentik_certificate_key_pair.default.id

  allowed_redirect_uris = [
    for uri in each.value.redirect_uris : {
      matching_mode = "strict"
      url           = uri
    }
  ]

  lifecycle {
    precondition {
      condition     = contains(keys(var.app_secrets), each.key)
      error_message = "No client_id/client_secret found in Vault (app_secrets) for app '${each.key}'."
    }
  }
}
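
The provider block references three data sources and is paired with an application resource, none of which appear above. A minimal sketch of both follows. The flow slugs and certificate name are the ones a stock Authentik install ships with and are assumptions about this particular setup. The application arguments (slug reusing the map key, meta_icon taking icon_url) are likewise a guess at how the pieces fit together, not the actual code from the repo.

```hcl
# Assumed lookups: slugs and name match Authentik's out-of-the-box defaults.
data "authentik_flow" "default_provider_authorization" {
  slug = "default-provider-authorization-implicit-consent"
}

data "authentik_flow" "default_invalidation_flow" {
  slug = "default-provider-invalidation-flow"
}

data "authentik_certificate_key_pair" "default" {
  name = "authentik Self-signed Certificate"
}

# One application per entry in var.apps, attached to the provider
# that was created for the same key.
resource "authentik_application" "apps" {
  for_each = var.apps

  name              = each.key
  slug              = each.key # assumption: the map key doubles as the slug
  protocol_provider = authentik_provider_oauth2.apps[each.key].id
  meta_icon         = each.value.icon_url
}
```

Keying both resources by the same map means authentik_application.apps["grafana"] and authentik_provider_oauth2.apps["grafana"] always line up, and the policy bindings can address an application by its key.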

The last piece is access control. I do not bind users to apps directly; I bind groups. A small flatten turns the groups list on each app into one binding per app-and-group pair, so “who can log into what” is driven entirely by group membership:

locals {
  app_group_pairs = flatten([
    for app, cfg in var.apps : [
      for g in cfg.groups : { app = app, group = g }
    ]
  ])
}

resource "authentik_policy_binding" "app_bindings" {
  for_each = {
    for pair in local.app_group_pairs :
    "${pair.app}-${pair.group}" => pair
  }

  target = authentik_application.apps[each.value.app].uuid
  group  = authentik_group.groups[each.value.group].id
  order  = 0
}
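
To see what the flatten produces, a throwaway output can print the binding keys. The output name app_binding_keys is hypothetical and not part of the original config; it simply rebuilds the same map the for_each uses.

```hcl
# Hypothetical debugging output; remove once the shape is clear.
output "app_binding_keys" {
  value = keys({
    for pair in local.app_group_pairs :
    "${pair.app}-${pair.group}" => pair
  })
}

# With the sample production.tfvars shown in Part 2 this evaluates to:
#   ["grafana-grafana", "nextcloud-nextcloud"]
```

An app that lists two groups would contribute two keys, one per binding, so giving a second group access to an app is a one-word change in the tfvars.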

One app needed more than the standard scopes. My wiki wants a username claim that the default mappings do not emit, so that gets added as a custom property mapping and switched on per app with an optional extra_scopes = true flag in the tfvars. Everything else just gets openid, email, and profile.
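
One way the extra scope could be wired up is sketched below. It assumes var.apps declares extra_scopes as an optional bool defaulting to false, and every name here (the standard and username labels, the scope name, the expression body, the app_property_mappings local) is illustrative rather than taken from the repo.

```hcl
# Look up the built-in mappings for openid, email, and profile.
data "authentik_property_mapping_provider_scope" "standard" {
  managed_list = [
    "goauthentik.io/providers/oauth2/scope-openid",
    "goauthentik.io/providers/oauth2/scope-email",
    "goauthentik.io/providers/oauth2/scope-profile",
  ]
}

# Custom mapping that emits a plain "username" claim (assumed shape).
resource "authentik_property_mapping_provider_scope" "username" {
  name       = "OAuth2 scope: username"
  scope_name = "username"
  expression = <<-EOT
    return {"username": request.user.username}
  EOT
}

locals {
  # Standard scopes for every app, plus the custom one where the flag is set.
  app_property_mappings = {
    for app, cfg in var.apps : app => concat(
      data.authentik_property_mapping_provider_scope.standard.ids,
      cfg.extra_scopes ? [authentik_property_mapping_provider_scope.username.id] : [],
    )
  }
}
```

The OAuth2 provider resource would then take property_mappings = local.app_property_mappings[each.key], so the flag in the tfvars stays the only per-app switch.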


Part 4 - The Pipeline

CI runs on a self-hosted runner and authenticates the same way for every job, so all the setup lives in one composite action, tf-prepare. It pulls secrets from Vault over JWT, assumes an AWS role for the state backend via OIDC, installs Terraform, and runs init:

- name: 🔐 Import secrets from Vault
  uses: hashicorp/vault-action@v4
  with:
    url: https://vault.example.com:8200
    method: jwt
    role: authentik
    secrets: |
      secret/data/authentik/config      authentik_api_token | TF_VAR_authentik_api_token ;
      secret/data/authentik/app_secrets value               | TF_VAR_app_secrets

From there the main workflow has three jobs, each scoped to when it should run:

  • validate runs on every push and PR: terraform fmt -check, terraform validate, and tflint. Nothing else runs if this fails.
  • plan runs on PRs only. It writes the plan and posts it as a comment on the PR, updating the same comment in place on each push so the thread does not fill up with stale plans.
  • apply runs only on a push to main, and only after passing through the production GitHub environment, which is where the approval gate lives.

apply:
  needs: validate
  if: github.event_name == 'push' && github.ref == 'refs/heads/main'
  runs-on: self-hosted
  environment: production
  steps:
    - uses: actions/checkout@v7
    - uses: ./.github/actions/tf-prepare
    - run: terraform plan -input=false -var-file=environment/production.tfvars -out=authentik.tfplan
      working-directory: terraform
    - run: terraform apply -input=false authentik.tfplan # a saved plan applies without prompting
      working-directory: terraform

Every Terraform job, including the drift one below, shares a single terraform-authentik concurrency group with cancel-in-progress: false. That serializes everything: a drift check can never run while an apply is holding the state lock, and vice versa.


Part 5 - Catching Drift

The weak point of any “config as code” setup is the web UI still being right there. Someone (often me, in a hurry) edits a redirect URI by hand, and now the repo is lying about reality. A scheduled job catches that. Every morning it runs a plan with -detailed-exitcode, which exits 0 when nothing would change, 1 on error, and 2 when there is a diff:

- name: 🔍 Detect drift
  id: drift
  run: |
    set +e
    terraform plan -input=false -detailed-exitcode \
      -var-file=environment/production.tfvars -out=authentik.tfplan
    echo "exitcode=$?" >> "$GITHUB_OUTPUT"
    exit 0
  working-directory: terraform

If it finds drift, it opens (or updates) a GitHub issue labelled authentik-drift with the plan attached, and fails the run so it shows up red. Once the live config matches the repo again, the next clean run comments on the issue and closes it. The result is a service that tells me when it has been changed out from under its own source of truth, which for an identity provider is exactly the kind of thing I want to hear about.


The Day-to-Day

In practice the workflow is dull, which is the point:

  • Add an app? Add an entry under apps in production.tfvars, drop its client_id and client_secret into Vault, open a PR. The plan shows up as a comment. Merge it and the app exists.
  • Onboard a person? Add them to users with the right group list. The policy bindings already exist, so group membership is the only lever.
  • Someone clicked something in the UI? I get a drift issue in the morning, and I either fix the repo to match or merge to re-assert the desired state.

The thing I actually gained is not speed, it is auditability. Every change to my identity provider is now a commit with a name, a date, and a diff, and the one service that guards everything else is no longer configured by memory.

This post is licensed under CC BY 4.0 by the author.