Commit 0b4180d

k8s: Terraform deployment for Azure clusters
This provides a Terraform configuration for deploying our Kubernetes clusters to Azure. We deploy an identical cluster to each region in a list. Each cluster has one small node for admin purposes, because AKS requires a default node pool and that pool can't use spot instances. Each cluster also has two autoscaling spot node pools: one with small 8 vCPU nodes for most jobs, and one with bigger nodes for the more resource-intensive ones.

This differs from our current scheme, where each cluster has a single node group and we direct jobs from Jenkins. With this scheme we can let the Kubernetes scheduler place jobs. We can still direct jobs to specific node sizes by giving them a nodeSelector that matches the labels assigned to the node pools. This is a more Kubernetes way of doing things, and it further decouples us from Jenkins.

Signed-off-by: Mark Brown <[email protected]>
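
The following is a rough sketch of directing one job to a particular node size. It uses the Terraform kubernetes provider's `kubernetes_job` resource to set a `node_selector` that matches the `kernelci/worker-size` label from aks-cluster.tf. The example is illustrative only: in practice the jobs are submitted from Jenkins rather than from Terraform, and the job name, image and command are hypothetical. The toleration is based on AKS's standard behaviour of tainting spot node pools with `kubernetes.azure.com/scalesetpriority=spot:NoSchedule`. Without that toleration, a pod that selects either worker pool cannot be scheduled.

```hcl
# Hypothetical example job; assumes the hashicorp/kubernetes provider is
# already configured against one of the worker clusters.
resource "kubernetes_job" "example_build" {
  metadata {
    name = "example-build" # hypothetical name
  }

  spec {
    backoff_limit = 0

    template {
      metadata {}

      spec {
        # Only schedule onto nodes in the big worker pool
        node_selector = {
          "kernelci/worker-size" = "big"
        }

        # AKS taints spot node pools, so the pod must tolerate that taint
        toleration {
          key      = "kubernetes.azure.com/scalesetpriority"
          operator = "Equal"
          value    = "spot"
          effect   = "NoSchedule"
        }

        container {
          name    = "build"
          image   = "debian:bullseye" # hypothetical image
          command = ["nproc"]         # hypothetical command
        }

        restart_policy = "Never"
      }
    }
  }
}
```

You can let the scheduler choose between the two worker pools instead. To do that, select on `"kernelci/worker" = "worker"` rather than on the size label. This keeps jobs off the untainted management node.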
1 parent f043f71 commit 0b4180d

5 files changed: +173 -0 lines changed

k8s/azure/README

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
This needs to be run on a machine with Terraform and the Azure CLI
installed. To log in to the Azure CLI with a personal account, use:

    az login

The clusters themselves use a service principal account, though, which
is created with:

    az ad sp create-for-rbac -n kernelci-k8s

This outputs an appId and password, which should be distributed via the
credential store and set as Terraform variables (see variables.tf).

Once the clusters have been created, a logged-in user can set up kubectl
credentials for all of them like this:

    for c in $(az aks list --resource-group kernelci-workers --query '[].name' -o tsv) ; do
        az aks get-credentials --resource-group kernelci-workers --name "${c}"
    done

(TBD: also put this in outputs.tf; need to figure out the syntax for arrays.)
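
One way to supply the service principal credentials is a `terraform.tfvars` file in k8s/azure, which Terraform loads automatically. This is a sketch rather than the project's documented process. The values below are placeholders for the appId and password printed by `az ad sp create-for-rbac`. Because the file holds secrets, it should not be committed. Terraform also reads the same variables from the `TF_VAR_appId` and `TF_VAR_password` environment variables.

```hcl
# terraform.tfvars (hypothetical; do not commit, it contains secrets)
# Placeholder values: substitute the output of az ad sp create-for-rbac.
appId    = "00000000-0000-0000-0000-000000000000"
password = "REPLACE-WITH-SERVICE-PRINCIPAL-PASSWORD"
```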

k8s/azure/aks-cluster.tf

Lines changed: 132 additions & 0 deletions
@@ -0,0 +1,132 @@
# Store the Terraform state in cloud storage rather than just the
# current directory; Terraform supports Azure blob storage directly.
# This means the configuration doesn't need to live on a single machine
# somewhere.
#
# See https://www.terraform.io/language/settings/backends/azurerm
#
terraform {
  backend "azurerm" {
    resource_group_name  = "kernelci-tf-storage"
    storage_account_name = "kernelcitf" # lowercase letters and digits only
    container_name       = "tfstate"
    key                  = "workers.terraform.tfstate"
  }
}

provider "azurerm" {
  features {}
}

# All clusters are assigned to the same resource group. This is purely
# for accounting, so the resource group's location doesn't matter.
resource "azurerm_resource_group" "workers" {
  name     = "kernelci-workers"
  location = "East US"

  tags = {
    environment = "kernelci-workers"
  }
}

locals {
  # Azure regions to deploy a cluster in (regions, not availability zones)
  zones = toset([
    "francecentral",
    "uksouth",
    "eastus2",
  ])
}

resource "azurerm_kubernetes_cluster" "workers" {
  for_each = local.zones

  name                = "${each.key}-workers-aks"
  location            = each.key
  resource_group_name = azurerm_resource_group.workers.name
  dns_prefix          = "${each.key}-workers-k8s"

  # Automatically roll out upgrades from AKS
  automatic_channel_upgrade = "stable"

  # AKS requires a default node pool, and Terraform and/or AKS won't let
  # us make it a spot instance pool. Since we'd ideally like to scale the
  # builders down to 0, this is a single small, always-present node that
  # isn't labelled for work.
  default_node_pool {
    name            = "default"
    node_count      = 1
    vm_size         = "Standard_DS2_v2"
    os_disk_size_gb = 30

    node_labels = {
      "kernelci/management" = "management"
    }
  }

  service_principal {
    client_id     = var.appId
    client_secret = var.password
  }

  role_based_access_control {
    enabled = true
  }

  tags = {
    environment = "kernelci"
  }
}

# Smaller nodes for most jobs
resource "azurerm_kubernetes_cluster_node_pool" "small_workers" {
  for_each = azurerm_kubernetes_cluster.workers

  name                  = "smallworkers"
  kubernetes_cluster_id = each.value.id

  # 3rd gen Xeon, 8 vCPUs, 32G RAM - general purpose
  vm_size = "Standard_D8s_v5"

  # Scaling to 0 doesn't currently work well, so always keep one node
  enable_auto_scaling = true
  min_count           = 1
  node_count          = 1
  max_count           = 10

  priority = "Spot"
  # -1 means pay up to the on-demand price; a lower cap would control
  # costs at the risk of more frequent evictions
  spot_max_price = -1

  node_labels = {
    "kernelci/worker"      = "worker"
    "kernelci/worker-size" = "small"
  }
}

# Big nodes for more intensive jobs (and large numbers of small jobs)
resource "azurerm_kubernetes_cluster_node_pool" "big_workers" {
  for_each = azurerm_kubernetes_cluster.workers

  name                  = "bigworkers"
  kubernetes_cluster_id = each.value.id

  # Intel Xeon, 32 vCPUs, 64G RAM - compute optimised
  vm_size = "Standard_F32s_v2"

  # Scaling to 0 doesn't currently work well, so always keep one node
  enable_auto_scaling = true
  min_count           = 1
  node_count          = 1
  max_count           = 10

  priority = "Spot"
  # -1 means pay up to the on-demand price; a lower cap would control
  # costs at the risk of more frequent evictions
  spot_max_price = -1

  node_labels = {
    "kernelci/worker"      = "worker"
    "kernelci/worker-size" = "big"
  }
}
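
The azurerm backend expects its resource group, storage account and container to exist before `terraform init` runs, and Terraform does not create them. The following is a minimal sketch of a separate bootstrap configuration, applied once with local state. The storage account name must match `storage_account_name` in the backend block. Azure only permits 3 to 24 lowercase letters and digits in storage account names, with no hyphens. The name `kernelcitf`, the location and the replication type below are assumptions.

```hcl
# Hypothetical one-off bootstrap config for the Terraform state storage,
# kept in its own directory and applied with local state.
provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "tf_storage" {
  name     = "kernelci-tf-storage"
  location = "East US" # assumption
}

resource "azurerm_storage_account" "tf_state" {
  # Must match storage_account_name in the azurerm backend block;
  # 3-24 characters, lowercase letters and digits only.
  name                     = "kernelcitf"
  resource_group_name      = azurerm_resource_group.tf_storage.name
  location                 = azurerm_resource_group.tf_storage.location
  account_tier             = "Standard"
  account_replication_type = "LRS" # assumption
}

resource "azurerm_storage_container" "tfstate" {
  name                  = "tfstate"
  storage_account_name  = azurerm_storage_account.tf_state.name
  container_access_type = "private"
}
```

Once this has been applied, running `terraform init` in k8s/azure can configure the remote backend.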

k8s/azure/outputs.tf

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
output "resource_group_name" {
  value = azurerm_resource_group.workers.name
}
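
The README's TBD about listing the clusters in outputs.tf could be addressed with a `for` expression. Because `azurerm_kubernetes_cluster.workers` uses `for_each`, Terraform exposes it as a map keyed by region, so the expression iterates over its values. This is a sketch, and the output name `cluster_names` is hypothetical.

```hcl
# Hypothetical output: the names of all worker clusters, one per region.
# Iterating a map with a single variable yields its values (the clusters).
output "cluster_names" {
  value = [for cluster in azurerm_kubernetes_cluster.workers : cluster.name]
}
```

`terraform output -json cluster_names` would then print the names as a JSON array for scripts to consume.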

k8s/azure/variables.tf

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
variable "appId" {
  description = "Azure Kubernetes Service Cluster service principal"
}

variable "password" {
  description = "Azure Kubernetes Service Cluster password"
}
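
versions.tf already requires Terraform 0.14 or later, so the password variable could be declared `sensitive`. Terraform then redacts its value in plan and apply output, although the value is still stored in plain text in the state file. A possible variant follows; the explicit `string` types are an added assumption.

```hcl
variable "appId" {
  description = "Azure Kubernetes Service Cluster service principal"
  type        = string
}

# Redacted in plan/apply output (Terraform >= 0.14)
variable "password" {
  description = "Azure Kubernetes Service Cluster password"
  type        = string
  sensitive   = true
}
```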

k8s/azure/versions.tf

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "2.66.0"
    }
  }

  required_version = ">= 0.14"
}
