6 changes: 3 additions & 3 deletions cmd/kops/integration_test.go
@@ -945,11 +945,11 @@ func TestKarpenter(t *testing.T) {
		withOIDCDiscovery().
		withDefaults24().
		withAddons("karpenter.sh-k8s-1.19").
		withServiceAccountRole("aws-node-termination-handler.kube-system", true).
		withoutNTH().
		withServiceAccountRole("karpenter.kube-system", true)
	test.expectTerraformFilenames = append(test.expectTerraformFilenames,
		"aws_launch_template_karpenter-nodes-single-machinetype.minimal.example.com_user_data",
		"aws_launch_template_karpenter-nodes-default.minimal.example.com_user_data",
		"aws_s3_object_nodeupscript-karpenter-nodes-single-machinetype_content",
		"aws_s3_object_nodeupscript-karpenter-nodes-default_content",
		"aws_s3_object_nodeupconfig-karpenter-nodes-single-machinetype_content",
		"aws_s3_object_nodeupconfig-karpenter-nodes-default_content",
	)
131 changes: 92 additions & 39 deletions docs/operations/karpenter.md
@@ -1,69 +1,122 @@
# Karpenter

[Karpenter](https://karpenter.sh) is a Kubernetes-native capacity manager that directly provisions Nodes and underlying instances based on Pod requirements. On AWS, kOps supports managing an InstanceGroup with either Karpenter or an AWS Auto Scaling Group (ASG).
[Karpenter](https://karpenter.sh) is an open-source node lifecycle management project built for Kubernetes.
Adding Karpenter to a Kubernetes cluster can dramatically improve the efficiency and cost of running workloads on that cluster.

On AWS, kOps supports managing an InstanceGroup with either Karpenter or an AWS Auto Scaling Group (ASG).

## Prerequisites

Managed Karpenter requires kOps 1.34+ and that [IAM Roles for Service Accounts (IRSA)](/cluster_spec#service-account-issuer-discovery-and-aws-iam-roles-for-service-accounts-irsa) be enabled for the cluster.
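
A minimal sketch of what enabling IRSA can look like in the cluster spec (field names as in the kOps IRSA docs; the store URL is a placeholder):

```yaml
spec:
  serviceAccountIssuerDiscovery:
    discoveryStore: s3://my-discovery-store
    enableAWSOIDCProvider: true
```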

If an older version of Karpenter was installed, it must be uninstalled before installing the new version.

## Installing

If using kOps 1.26 or older, enable the Karpenter feature flag:
### New clusters
> **Member:** As we feature-flagged Karpenter, I think it is OK to remove support for the old pre-1.0 version of Karpenter. (Users that don't want to upgrade to Karpenter 1.0+ can stay with the older version of kOps for a while.) I think that's a reasonable position for us to take here...
>
> **Member Author:** Karpenter is no longer feature-flagged... We removed the flag just before the major changes.


```sh
export KOPS_FEATURE_FLAGS="Karpenter"
```
export KOPS_STATE_STORE="s3://my-state-store"
export KOPS_DISCOVERY_STORE="s3://my-discovery-store"
export NAME="my-cluster.example.com"
export ZONES="eu-central-1a"

Karpenter requires that external permissions for ServiceAccounts be enabled for the cluster. See [AWS IAM roles for ServiceAccounts documentation](/cluster_spec#service-account-issuer-discovery-and-aws-iam-roles-for-service-accounts-irsa) for how to enable this.
kops create cluster --name ${NAME} \
  --cloud=aws \
  --instance-manager=karpenter \
  --discovery-store=${KOPS_DISCOVERY_STORE} \
  --zones=${ZONES} \
  --yes

kops validate cluster --name ${NAME} --wait=10m

kops export kubeconfig --name ${NAME} --admin
```
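
Once the cluster validates, a quick hedged check that the add-on is running (assuming kOps deploys Karpenter into `kube-system` with the standard `app.kubernetes.io/name` label):

```sh
kubectl -n kube-system get pods -l app.kubernetes.io/name=karpenter
```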

### Existing clusters

On existing clusters, you can create a Karpenter InstanceGroup by adding the following to its InstanceGroup spec:
The Karpenter addon must be enabled in the cluster spec:

```yaml
spec:
  manager: Karpenter
  karpenter:
    enabled: true
```

You also need to enable the Karpenter addon in the cluster spec:
To create a Karpenter InstanceGroup, set the following in its InstanceGroup spec:

```yaml
spec:
  karpenter:
    enabled: true
  manager: Karpenter
```
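
As a fuller worked example (illustrative only; the cluster name, machine type, and subnet are placeholders), a Karpenter-managed InstanceGroup could look like:

```yaml
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: my-cluster.example.com
  name: nodes
spec:
  manager: Karpenter
  role: Node
  subnets:
    - eu-central-1a
  machineType: t3.medium
```

Apply it with `kops replace -f ig.yaml` (or edit in place with `kops edit ig nodes`), then run `kops update cluster --yes` to roll it out.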

### New clusters

On new clusters, you can simply add the `--instance-manager=karpenter` flag:
### EC2NodeClass and NodePool

```sh
kops create cluster --name mycluster.example.com --cloud aws --networking=amazonvpc --zones=eu-central-1a,eu-central-1b --master-count=3 --yes --discovery-store=s3://discovery-store/
USER_DATA=$(aws s3 cp ${KOPS_STATE_STORE}/${NAME}/igconfig/node/nodes/nodeupscript.sh -)
USER_DATA=${USER_DATA//$'\n'/$'\n    '}
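# Editor's note: the substitution above rewrites every newline as a newline
# plus four spaces, so the fetched script nests correctly under the
# `userData: |` key of the EC2NodeClass manifest below.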

kubectl apply -f - <<YAML
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: Custom
  amiSelectorTerms:
    - ssmParameter: /aws/service/canonical/ubuntu/server/24.04/stable/current/amd64/hvm/ebs-gp3/ami-id
    - ssmParameter: /aws/service/canonical/ubuntu/server/24.04/stable/current/arm64/hvm/ebs-gp3/ami-id
  associatePublicIPAddress: true
  tags:
    KubernetesCluster: ${NAME}
    kops.k8s.io/instancegroup: nodes
    k8s.io/role/node: "1"
  subnetSelectorTerms:
    - tags:
        KubernetesCluster: ${NAME}
  securityGroupSelectorTerms:
    - tags:
        KubernetesCluster: ${NAME}
        Name: nodes.${NAME}
  instanceProfile: nodes.${NAME}
  userData: |
    ${USER_DATA}
YAML

kubectl apply -f - <<YAML
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
YAML
```
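
A quick sanity check after applying (illustrative; the plural resource names come from the Karpenter v1 CRDs):

```sh
kubectl get ec2nodeclasses,nodepools
kubectl describe nodepool default
```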

## Karpenter-managed InstanceGroups

A Karpenter-managed InstanceGroup controls a corresponding Karpenter Provisioner resource. kOps will ensure that the Provisioner is configured with the correct AWS security groups, subnets, and launch templates. Just like with ASG-managed InstanceGroups, you can add labels and taints to Nodes and kOps will ensure those are added accordingly.

Note that not all features of InstanceGroups are supported.

## Subnets

By default, kOps will tag subnets with `kops.k8s.io/instance-group/<instancegroup>: "true"` for each InstanceGroup the subnet is assigned to. If you enable manual tagging of subnets, you have to ensure these tags are added; otherwise Karpenter will fail to provision any instances.

## Instance Types

If you do not specify a mixed instances policy, only the instance type specified by `spec.machineType` will be used. With Karpenter, one typically wants a wider range of instances to choose from. kOps supports both providing a list of instance types through `spec.mixedInstancesPolicy.instances` and providing instance type requirements through `spec.mixedInstancesPolicy.instanceRequirements`. See the [InstanceGroup documentation](/instance_groups) for more details.
A Karpenter-managed InstanceGroup controls the bootstrap script. kOps will ensure the correct AWS security groups, subnets, and permissions are configured.
`EC2NodeClass` and `NodePool` objects must be created by the cluster operator.

## Known limitations

### Karpenter-managed Launch Templates

On EKS, Karpenter creates its own launch templates for Provisioners. These launch templates will not work with a kOps cluster for a number of reasons. Most importantly, they do not use supported AMIs and they do not install and configure nodeup, the instance-side kOps component. The Karpenter features that require Karpenter to directly manage launch templates will not be available on kOps.

### Unmanaged Provisioner resources

As mentioned above, kOps will manage a Provisioner resource per InstanceGroup. It is technically possible to create Provisioner resources directly, but you have to ensure that you configure Provisioners according to kOps requirements. As mentioned above, Karpenter-managed launch templates do not work and you have to maintain your own kOps-compatible launch templates.

### Other minor limitations

* Control plane nodes must be provisioned with an ASG, not Karpenter.
* Provisioners will unconditionally use spot with a fallback on on-demand instances.
* Provisioners will unconditionally include burstable instance types such as the T3 instance family.
* kOps will not allow mixing arm64 and amd64 instances in the same Provisioner.
* **Upgrade is not supported** from the previous version of managed Karpenter.
> **Member:** You might want to say "We recommend creating a new cluster (karpenter support is currently feature-flagged / experimental so we do reserve the right to require new clusters)".
>
> **Member Author:** Karpenter is no longer feature-flagged... We removed the flag just before the major changes.

* Control plane nodes must be provisioned with an ASG.
* All `EC2NodeClass` objects must have the `spec.amiFamily` set to `Custom`.
* `spec.instanceStorePolicy` configuration is not supported in `EC2NodeClass`.
* `spec.kubelet`, `spec.taints` and `spec.labels` are not supported in `EC2NodeClass`, but they can be configured in the `Cluster` or `InstanceGroup` spec, as sketched below.
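
Since those settings live on the kOps objects rather than on the `EC2NodeClass`, a hedged sketch of where they go instead (field names per the kOps InstanceGroup API; values are placeholders):

```yaml
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: my-cluster.example.com
  name: nodes
spec:
  manager: Karpenter
  kubelet:
    maxPods: 110
  nodeLabels:
    workload: batch
  taints:
    - workload=batch:NoSchedule
```
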
6 changes: 4 additions & 2 deletions docs/releases/1.34-NOTES.md
@@ -15,7 +15,7 @@ This is a document to gather the release notes prior to the release.

## AWS

* TODO
* Karpenter has been upgraded to v1.6.2. ([17567](https://github.com/kubernetes/kops/pull/17567))

## GCP

@@ -33,7 +33,9 @@ This is a document to gather the release notes prior to the release.

## Other breaking changes

* Legacy addons have been removed from the kOps repo. These were only referenced by kOps <1.22 ([17322](https://github.com/kubernetes/kops/pull/17332))
* Legacy addons have been removed from the kOps repo. These were only referenced by kOps <1.22. ([17322](https://github.com/kubernetes/kops/pull/17332))

* If an older version of Karpenter was installed, it must be uninstalled before upgrading. ([17567](https://github.com/kubernetes/kops/pull/17567))

# Known Issues

3 changes: 3 additions & 0 deletions pkg/apis/kops/validation/validation.go
@@ -1898,6 +1898,9 @@ func validateMetricsServer(cluster *kops.Cluster, spec *kops.MetricsServerConfig
}

func validateNodeTerminationHandler(cluster *kops.Cluster, spec *kops.NodeTerminationHandlerSpec, fldPath *field.Path) (allErrs field.ErrorList) {
	if (spec.Enabled == nil || *spec.Enabled) && cluster.Spec.Karpenter != nil && cluster.Spec.Karpenter.Enabled {
		allErrs = append(allErrs, field.Forbidden(fldPath, "nodeTerminationHandler cannot be used in conjunction with Karpenter"))
	}
	if spec.IsQueueMode() {
		if spec.EnableSpotInterruptionDraining != nil && !*spec.EnableSpotInterruptionDraining {
			allErrs = append(allErrs, field.Forbidden(fldPath.Child("enableSpotInterruptionDraining"), "spot interruption draining cannot be disabled in Queue Processor mode"))
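
To make the new rule concrete, here is a hypothetical test (not part of this PR; type and field names are assumed from the kOps API) showing that enabling both NTH and Karpenter now fails validation:

```go
package validation

import (
	"testing"

	"k8s.io/apimachinery/pkg/util/validation/field"

	"k8s.io/kops/pkg/apis/kops"
	"k8s.io/kops/upup/pkg/fi"
)

// Illustrative only: NTH defaults to enabled, so a cluster that enables
// Karpenter must disable NTH explicitly (or drop it, as the integration
// test's withoutNTH() does).
func TestNTHForbiddenWithKarpenter(t *testing.T) {
	cluster := &kops.Cluster{
		Spec: kops.ClusterSpec{
			Karpenter: &kops.KarpenterConfig{Enabled: true},
		},
	}
	nth := &kops.NodeTerminationHandlerSpec{Enabled: fi.PtrTo(true)}
	errs := validateNodeTerminationHandler(cluster, nth, field.NewPath("spec", "nodeTerminationHandler"))
	if len(errs) == 0 {
		t.Fatal("expected a Forbidden error when NTH and Karpenter are both enabled")
	}
}
```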
22 changes: 12 additions & 10 deletions pkg/model/awsmodel/autoscalinggroup.go
@@ -74,19 +74,26 @@ func (b *AutoscalingGroupModelBuilder) Build(c *fi.CloudupModelBuilderContext) error {
}
}

	task, err := b.buildLaunchTemplateTask(c, name, ig)
	// Always create the user data, even for Karpenter-managed instance groups.
	// Karpenter expects the user data to be available in the state store:
	// ${KOPS_STATE_STORE}/${CLUSTER_NAME}/igconfig/node/${IG_NAME}/nodeupscript.sh
	userData, err := b.BootstrapScriptBuilder.ResourceNodeUp(c, ig)
	if err != nil {
		return err
	}
	c.AddTask(task)

	// @step: now lets build the autoscaling group task
	if ig.Spec.Manager != "Karpenter" {
		lt, err := b.buildLaunchTemplateTask(c, name, ig, userData)
		if err != nil {
			return err
		}
		c.AddTask(lt)

		asg, err := b.buildAutoScalingGroupTask(c, name, ig)
		if err != nil {
			return err
		}
	asg.LaunchTemplate = task
		asg.LaunchTemplate = lt
		c.AddTask(asg)

		warmPool := b.Cluster.Spec.CloudProvider.AWS.WarmPool.ResolveDefaults(ig)
@@ -136,7 +143,7 @@ func (b *AutoscalingGroupModelBuilder) Build(c *fi.CloudupModelBuilderContext) error {
}

// buildLaunchTemplateTask is responsible for creating the template task into the aws model
func (b *AutoscalingGroupModelBuilder) buildLaunchTemplateTask(c *fi.CloudupModelBuilderContext, name string, ig *kops.InstanceGroup) (*awstasks.LaunchTemplate, error) {
func (b *AutoscalingGroupModelBuilder) buildLaunchTemplateTask(c *fi.CloudupModelBuilderContext, name string, ig *kops.InstanceGroup, userData fi.Resource) (*awstasks.LaunchTemplate, error) {
	// @step: add the iam instance profile
	link, err := b.LinkToIAMInstanceProfile(ig)
	if err != nil {
@@ -180,11 +187,6 @@ func (b *AutoscalingGroupModelBuilder) buildLaunchTemplateTask(c *fi.CloudupMode
		return nil, fmt.Errorf("error building cloud tags: %v", err)
	}

	userData, err := b.BootstrapScriptBuilder.ResourceNodeUp(c, ig)
	if err != nil {
		return nil, err
	}

	lt := &awstasks.LaunchTemplate{
		Name:      fi.PtrTo(name),
		Lifecycle: b.Lifecycle,
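
Net effect of the changes above: a Karpenter-managed InstanceGroup no longer gets a launch template or ASG, but its bootstrap script is still rendered and published. A hedged way to inspect it (bucket and instance-group names are placeholders):

```sh
aws s3 cp ${KOPS_STATE_STORE}/${NAME}/igconfig/node/nodes/nodeupscript.sh - | head
```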
51 changes: 49 additions & 2 deletions pkg/model/bootstrapscript.go
@@ -27,11 +27,13 @@ import (
"k8s.io/kops/pkg/apis/nodeup"
"k8s.io/kops/pkg/assets"
"k8s.io/kops/pkg/model/resources"
"k8s.io/kops/pkg/nodemodel/wellknownassets"
"k8s.io/kops/pkg/wellknownservices"
"k8s.io/kops/upup/pkg/fi"
"k8s.io/kops/upup/pkg/fi/fitasks"
"k8s.io/kops/upup/pkg/fi/utils"
"k8s.io/kops/util/pkg/architectures"
"k8s.io/kops/util/pkg/vfs"
)

type NodeUpConfigBuilder interface {
@@ -65,6 +67,8 @@ type BootstrapScript struct {

	// nodeupConfig contains the nodeup config.
	nodeupConfig fi.CloudupTaskDependentResource
	// nodeupScript contains the nodeup bootstrap script, for use with Karpenter.
	nodeupScript fi.CloudupTaskDependentResource
}

var (
@@ -74,7 +78,7 @@ var (
)

// kubeEnv returns the boot config for the instance group
func (b *BootstrapScript) kubeEnv(ig *kops.InstanceGroup, c *fi.CloudupContext) (*nodeup.BootConfig, error) {
func (b *BootstrapScript) kubeEnv(cluster *kops.Cluster, ig *kops.InstanceGroup, c *fi.CloudupContext) (*nodeup.BootConfig, error) {
	wellKnownAddresses := make(WellKnownAddresses)

	for _, hasAddress := range b.hasAddressTasks {
@@ -121,6 +125,40 @@ func (b *BootstrapScript) kubeEnv(ig *kops.InstanceGroup, c *fi.CloudupContext)
	bootConfig.NodeupConfigHash = base64.StdEncoding.EncodeToString(sum256[:])
	b.nodeupConfig.Resource = fi.NewBytesResource(configData)

	if ig.Spec.Manager == kops.InstanceManagerKarpenter {
		assetBuilder := assets.NewAssetBuilder(vfs.NewVFSContext(), cluster.Spec.Assets, false)
		nodeUpAssets := make(map[architectures.Architecture]*assets.MirroredAsset)
		for _, arch := range architectures.GetSupported() {
			asset, err := wellknownassets.NodeUpAsset(assetBuilder, arch)
			if err != nil {
				return nil, err
			}
			nodeUpAssets[arch] = asset
		}

		var nodeupScript resources.NodeUpScript
		nodeupScript.NodeUpAssets = nodeUpAssets
		nodeupScript.BootConfig = bootConfig

		nodeupScript.WithEnvironmentVariables(cluster, ig)
		nodeupScript.WithProxyEnv(cluster)
		nodeupScript.WithSysctls()

		nodeupScript.CompressUserData = fi.ValueOf(ig.Spec.CompressUserData)

		nodeupScript.CloudProvider = string(cluster.GetCloudProvider())

		scriptResource, err := nodeupScript.Build()
		if err != nil {
			return nil, err
		}
		scriptData, err := fi.ResourceAsBytes(scriptResource)
		if err != nil {
			return nil, err
		}
		b.nodeupScript.Resource = fi.NewBytesResource(scriptData)
	}

	return bootConfig, nil
}

@@ -194,6 +232,7 @@ func (b *BootstrapScriptBuilder) ResourceNodeUp(c *fi.CloudupModelBuilderContext
	}
	task.resource.Task = task
	task.nodeupConfig.Task = task
	task.nodeupScript.Task = task
	c.AddTask(task)

	c.AddTask(&fitasks.ManagedFile{
@@ -202,6 +241,14 @@ func (b *BootstrapScriptBuilder) ResourceNodeUp(c *fi.CloudupModelBuilderContext
		Location: fi.PtrTo("igconfig/" + ig.Spec.Role.ToLowerString() + "/" + ig.Name + "/nodeupconfig.yaml"),
		Contents: &task.nodeupConfig,
	})
	if ig.Spec.Manager == kops.InstanceManagerKarpenter {
		c.AddTask(&fitasks.ManagedFile{
			Name:      fi.PtrTo("nodeupscript-" + ig.Name),
			Lifecycle: b.Lifecycle,
			Location:  fi.PtrTo("igconfig/" + ig.Spec.Role.ToLowerString() + "/" + ig.Name + "/nodeupscript.sh"),
			Contents:  &task.nodeupScript,
		})
	}
	return &task.resource, nil
}
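
For orientation, the resulting state-store layout for a Karpenter InstanceGroup named `nodes` would look roughly like this (paths follow the `Location` values above; the bucket and cluster names are placeholders):

```sh
aws s3 ls --recursive "s3://my-state-store/my-cluster.example.com/igconfig/node/nodes/"
# illustrative output:
#   igconfig/node/nodes/nodeupconfig.yaml
#   igconfig/node/nodes/nodeupscript.sh
```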

@@ -231,7 +278,7 @@ func (b *BootstrapScript) Run(c *fi.CloudupContext) error {
		return nil
	}

	bootConfig, err := b.kubeEnv(b.ig, c)
	bootConfig, err := b.kubeEnv(b.cluster, b.ig, c)
	if err != nil {
		return err
	}