Authentik as Code with Terraform, Vault, and GitHub Actions
Managing every Authentik group, user, and OAuth2 application as code with Terraform, secrets in Vault, and a GitHub Actions pipeline that plans on each PR and applies on merge to main.
Authentik ended up being the single most important service in my lab. It is the front door to a dozen self-hosted apps, from dashboards to a wiki to file storage, all of them handing off login to one identity provider. That is great until you realise the entire thing is configured by clicking around a web UI. Who is in which group? Which redirect URIs does an app actually allow? When did that provider change? The honest answer was usually “I think I set that up months ago,” which is exactly the answer you do not want for the service that guards everything else.
So I did the same thing I did with DNS: I treated it as infrastructure. Groups, users, OAuth2 providers, applications, and the policy bindings that tie them together all live in Terraform now. Every change is a pull request with a plan attached, nothing gets applied until it merges, and a job runs every morning to tell me if anyone touched the live config behind Terraform’s back.
The Goal
- Define groups, users, and every OAuth2 application as code, in one repo.
- Keep secrets (the API token, per-app client IDs and secrets) out of the repo entirely.
- Plan on every PR so I can see exactly what will change before it happens.
- Apply only on merge to main, gated behind a protected environment.
- Get told automatically when the live config drifts from the repo.
Part 1 - The Provider and State
The Authentik Terraform provider talks to the same API the web UI uses. The provider block needs only two settings, the instance URL and an API token, and the token never appears in the repo:
terraform {
  required_version = ">= 1.13.4"
  required_providers {
    authentik = {
      source  = "goauthentik/authentik"
      version = "~> 2026.2.0"
    }
  }
}
provider "authentik" {
  url   = "https://auth.example.com"
  token = var.authentik_api_token
}
State lives in S3 with native locking turned on. The lockfile is the part that matters: it means two runs can never apply over each other and corrupt state, which is a real risk once a daily drift job and a merge can both fire on the same day.
terraform {
  backend "s3" {
    bucket       = "example-tfstate"
    key          = "authentik/terraform.tfstate"
    region       = "ap-southeast-2"
    encrypt      = true
    use_lockfile = true # native S3 state locking, no DynamoDB table needed
  }
}
Part 2 - Config You Edit, Secrets You Do Not
The split I care about most is between configuration and secrets. Configuration is the stuff I edit every week and is perfectly safe to commit: which groups exist, who is in them, and what each app looks like. That all lives in environment/production.tfvars.
groups = {
  administrators = { is_superuser = true }
  grafana        = { is_superuser = false }
  nextcloud      = { is_superuser = false }
  wiki           = { is_superuser = false }
}
users = {
  asmith = {
    name   = "alex smith"
    email  = "alex@example.com"
    groups = ["administrators", "grafana", "nextcloud", "wiki"]
  }
  jlee = {
    name   = "jordan lee"
    email  = "jordan@example.com"
    groups = ["grafana", "wiki"]
  }
}
apps = {
  grafana = {
    redirect_uris = ["https://grafana.example.com/login/generic_oauth"]
    groups        = ["grafana"]
    icon_url      = "https://cdn.jsdelivr.net/gh/selfhst/icons@main/png/grafana.png"
  }
  nextcloud = {
    redirect_uris = ["https://cloud.example.com/apps/oidc_login/oidc"]
    groups        = ["nextcloud"]
    icon_url      = "https://cdn.jsdelivr.net/gh/selfhst/icons@main/png/nextcloud.png"
  }
}
Secrets never go near the repo. The Authentik API token and every app’s client_id and client_secret come from Vault at run time, injected as TF_VAR_* environment variables. The variable definitions mark them sensitive so they stay out of plan output:
variable "authentik_api_token" {
  description = "Authentik API token"
  type        = string
  sensitive   = true
}
variable "app_secrets" {
  description = "Client IDs and secrets for OAuth2 apps"
  type = map(object({
    client_id     = string
    client_secret = string
  }))
  sensitive = true
}
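The non-secret maps need declarations as well. A sketch of what they might look like, with types inferred from production.tfvars above. Two parts are assumptions rather than the original code: extra_scopes is declared as optional(bool, false) so only the apps that need it have to set it, and the users validation is an optional extra that rejects a mistyped group key at plan time (it relies on Terraform 1.9 or later, which lets a validation reference another variable):

```hcl
variable "groups" {
  description = "Authentik groups, keyed by group name"
  type = map(object({
    is_superuser = bool
  }))
}

variable "users" {
  description = "Authentik users, keyed by username"
  type = map(object({
    name   = string
    email  = string
    groups = list(string)
  }))

  # Assumed extra: fail early if a user lists a group that is not defined.
  # Referencing var.groups from here requires Terraform 1.9 or later.
  validation {
    condition = alltrue(flatten([
      for u in values(var.users) : [
        for g in u.groups : contains(keys(var.groups), g)
      ]
    ]))
    error_message = "Every group listed under a user must also exist in var.groups."
  }
}

variable "apps" {
  description = "OAuth2 applications, keyed by app name"
  type = map(object({
    redirect_uris = list(string)
    groups        = list(string)
    icon_url      = optional(string)
    # Assumed shape of the per-app flag for the extra username scope.
    extra_scopes = optional(bool, false)
  }))
}
```

With extra_scopes declared optional, the grafana and nextcloud entries above stay valid as written.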
Part 3 - Turning Maps Into Resources
The whole config is data-driven. Rather than writing a block per app, every map in production.tfvars gets expanded with for_each, so adding an app is a data edit, not a code edit.
Groups and users come first, with users referencing groups by key so membership stays readable:
resource "authentik_group" "groups" {
  for_each     = var.groups
  name         = each.key
  is_superuser = each.value.is_superuser
}
resource "authentik_user" "users" {
  for_each = var.users
  username = each.key
  name     = each.value.name
  email    = each.value.email
  groups = [
    for g in each.value.groups :
    authentik_group.groups[g].id
  ]
}
Each app becomes an OAuth2 provider plus an application. This is also where the configuration and the secret get stitched back together: the public bits come from var.apps, the client_id and client_secret from var.app_secrets. A precondition guards the join so a missing Vault entry fails loudly with a message that tells me exactly what to fix, instead of a cryptic invalid-index error deep in the plan:
resource "authentik_provider_oauth2" "apps" {
  for_each           = var.apps
  name               = each.key
  client_id          = var.app_secrets[each.key].client_id
  client_secret      = var.app_secrets[each.key].client_secret
  authorization_flow = data.authentik_flow.default_provider_authorization.id
  invalidation_flow  = data.authentik_flow.default_invalidation_flow.id
  signing_key        = data.authentik_certificate_key_pair.default.id
  allowed_redirect_uris = [
    for uri in each.value.redirect_uris : {
      matching_mode = "strict"
      url           = uri
    }
  ]
  lifecycle {
    precondition {
      condition     = contains(keys(var.app_secrets), each.key)
      error_message = "No client_id/client_secret found in Vault (app_secrets) for app '${each.key}'."
    }
  }
}
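The provider block references three data sources, and the policy bindings below reference authentik_application.apps; none of those appear in the snippet above. A sketch of the missing pieces, assuming authentik's default flow slugs and default self-signed certificate name have not been renamed on this instance (check them under Flows and Certificates in the admin UI), and assuming the app key doubles as the application slug:

```hcl
# Built-in flows and signing key, looked up by authentik's default names.
# Assumption: these defaults still exist unchanged on the instance.
data "authentik_flow" "default_provider_authorization" {
  slug = "default-provider-authorization-implicit-consent"
}

data "authentik_flow" "default_invalidation_flow" {
  slug = "default-provider-invalidation-flow"
}

data "authentik_certificate_key_pair" "default" {
  name = "authentik Self-signed Certificate"
}

# The application half of each app: wraps the matching OAuth2 provider
# so it shows up in the user's app list. Using each.key as the slug is
# an assumption.
resource "authentik_application" "apps" {
  for_each          = var.apps
  name              = each.key
  slug              = each.key
  protocol_provider = authentik_provider_oauth2.apps[each.key].id
  meta_icon         = each.value.icon_url
}
```

Whether the authorization flow should be the implicit- or explicit-consent default is a matter of taste; swap the slug if users should see a consent screen.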
The last piece is access control. I do not bind users to apps directly; I bind groups. A small flatten turns the groups list on each app into one binding per app-and-group pair, so “who can log into what” is driven entirely by group membership:
locals {
  app_group_pairs = flatten([
    for app, cfg in var.apps : [
      for g in cfg.groups : { app = app, group = g }
    ]
  ])
}
resource "authentik_policy_binding" "app_bindings" {
  for_each = {
    for pair in local.app_group_pairs :
    "${pair.app}-${pair.group}" => pair
  }
  target = authentik_application.apps[each.value.app].uuid
  group  = authentik_group.groups[each.value.group].id
  order  = 0
}
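To see what the flatten produces, a throwaway output (the name app_binding_keys is hypothetical) can print the binding keys during a plan. With the production.tfvars shown earlier, each app lists exactly one group, so there are two bindings:

```hcl
# Hypothetical debugging output; remove once the bindings look right.
output "app_binding_keys" {
  value = keys({
    for pair in local.app_group_pairs :
    "${pair.app}-${pair.group}" => pair
  })
  # With the example tfvars this evaluates to:
  # ["grafana-grafana", "nextcloud-nextcloud"]
}
```

Keying for_each on a stable string such as "grafana-grafana", rather than using count, means adding or removing one group on one app creates or destroys only that binding; nothing else shifts position and gets replaced.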
One app needed more than the standard scopes. My wiki wants a username claim that the default mappings do not emit, so that gets added as a custom property mapping and switched on per app with an optional extra_scopes = true flag in the tfvars. Everything else just gets openid, email, and profile.
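A sketch of how that flag could be wired up. The scope name, claim key, and mapping name are placeholders, since the wiki's actual claim is not named here; the managed identifiers refer to authentik's built-in openid, email, and profile scope mappings. The conditional keeps every other app on the default three:

```hcl
# Placeholder custom scope: emits a "username" claim when a client
# requests the "username" scope. Match both names to what the wiki expects.
resource "authentik_property_mapping_provider_scope" "username" {
  name       = "wiki-username"
  scope_name = "username"
  expression = <<-EOT
    return {"username": request.user.username}
  EOT
}

# authentik's built-in openid, email, and profile scope mappings.
data "authentik_property_mapping_provider_scope" "defaults" {
  managed_list = [
    "goauthentik.io/providers/oauth2/scope-openid",
    "goauthentik.io/providers/oauth2/scope-email",
    "goauthentik.io/providers/oauth2/scope-profile",
  ]
}

# Fragment: the extra argument on the existing provider resource.
resource "authentik_provider_oauth2" "apps" {
  for_each = var.apps
  # ...other arguments as above...
  property_mappings = concat(
    data.authentik_property_mapping_provider_scope.defaults.ids,
    # try() treats a missing extra_scopes flag as false.
    try(each.value.extra_scopes, false) ? [authentik_property_mapping_provider_scope.username.id] : []
  )
}
```

The wiki also has to request the custom scope in its own OAuth settings; authentik only runs a scope mapping for scopes the client asks for.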
Part 4 - The Pipeline
CI runs on a self-hosted runner and authenticates the same way for every job, so all the setup lives in one composite action, tf-prepare. It pulls secrets from Vault over JWT, assumes an AWS role for the state backend via OIDC, installs Terraform, and runs init:
- name: 🔐 Import secrets from Vault
  uses: hashicorp/vault-action@v4
  with:
    url: https://vault.example.com:8200
    method: jwt
    role: authentik
    secrets: |
      secret/data/authentik/config authentik_api_token | TF_VAR_authentik_api_token ;
      secret/data/authentik/app_secrets value | TF_VAR_app_secrets
From there the main workflow has three jobs, each scoped to when it should run:
- validate runs on every push and PR: terraform fmt -check, terraform validate, and tflint. Nothing else runs if this fails.
- plan runs on PRs only. It writes the plan and posts it as a comment on the PR, updating the same comment in place on each push so the thread does not fill up with stale plans.
- apply runs only on a push to main, and only after passing through the production GitHub environment, which is where the approval gate lives.
apply:
  needs: validate
  if: github.event_name == 'push' && github.ref == 'refs/heads/main'
  runs-on: self-hosted
  environment: production
  steps:
    - uses: actions/checkout@v7
    - uses: ./.github/actions/tf-prepare
    - run: terraform plan -input=false -var-file=environment/production.tfvars -out=authentik.tfplan
      working-directory: terraform
    - run: terraform apply -input=false -auto-approve authentik.tfplan
      working-directory: terraform
Every Terraform job, including the drift one below, shares a single terraform-authentik concurrency group with cancel-in-progress: false. That serializes everything: a drift check can never run while an apply is holding the state lock, and vice versa.
Part 5 - Catching Drift
The weak point of any “config as code” setup is the web UI still being right there. Someone (often me, in a hurry) edits a redirect URI by hand, and now the repo is lying about reality. A scheduled job catches that. Every morning it runs a plan with -detailed-exitcode, which exits 0 when nothing has changed, 1 on an error, and 2 when there is a diff:
- name: 🔍 Detect drift
  id: drift
  run: |
    set +e
    terraform plan -input=false -detailed-exitcode \
      -var-file=environment/production.tfvars -out=authentik.tfplan
    echo "exitcode=$?" >> "$GITHUB_OUTPUT"
    exit 0
  working-directory: terraform
If it finds drift, it opens (or updates) a GitHub issue labelled authentik-drift with the plan attached, and fails the run so it shows up red. Once the live config matches the repo again, the next clean run comments on the issue and closes it. The result is a service that tells me when it has been changed out from under its own source of truth, which for an identity provider is exactly the kind of thing I want to hear about.
The Day-to-Day
In practice the workflow is dull, which is the point:
- Add an app? Add an entry under apps in production.tfvars, drop its client_id and client_secret into Vault, open a PR. The plan shows up as a comment. Merge it and the app exists.
- Onboard a person? Add them to users with the right group list. The policy bindings already exist, so group membership is the only lever.
- Someone clicked something in the UI? I get a drift issue in the morning, and I either fix the repo to match or merge to re-assert the desired state.
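As an illustration of the first bullet, adding the wiki as an app could be a tfvars entry like the one below. The redirect URI and icon URL are placeholders (use whatever callback the wiki documents), and extra_scopes is the optional per-app flag described in Part 3. A matching wiki entry also has to exist in the app_secrets map in Vault, or the precondition stops the plan:

```hcl
apps = {
  # grafana and nextcloud entries unchanged ...
  wiki = {
    redirect_uris = ["https://wiki.example.com/oauth/callback"] # placeholder
    groups        = ["wiki"]
    icon_url      = "https://wiki.example.com/favicon.png" # placeholder
    extra_scopes  = true
  }
}
```

Because the wiki group already exists and both users are already in it, the plan for such a PR should only add resources (the provider, the application, and a wiki-wiki policy binding) without touching existing ones.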
The thing I actually gained is not speed, it is auditability. Every change to my identity provider is now a commit with a name, a date, and a diff, and the one service that guards everything else is no longer configured by memory.