Merged
294 changes: 111 additions & 183 deletions README.md

Large diffs are not rendered by default.

112 changes: 112 additions & 0 deletions docs/README.md
@@ -0,0 +1,112 @@
# OpenCenter Service Configuration Guides

This directory contains comprehensive configuration guides for all services available in the openCenter platform. Each guide provides detailed configuration examples, common pitfalls, troubleshooting steps, and best practices.

## Available Configuration Guides

### Core Infrastructure Services

| Service | Guide | Description |
|---------|-------|-------------|
| **Cert-manager** | [cert-manager-config-guide.md](cert-manager-config-guide.md) | TLS certificate management and automation |
| **Harbor** | [harbor-config-guide.md](harbor-config-guide.md) | Container registry with security scanning |
| **Keycloak** | [keycloak-config-guide.md](keycloak-config-guide.md) | Identity and access management |
| **Kyverno** | [kyverno-config-guide.md](kyverno-config-guide.md) | Kubernetes-native policy engine |
| **Longhorn** | [longhorn-config-guide.md](longhorn-config-guide.md) | Distributed block storage system |
| **MetalLB** | [metallb-config-guide.md](metallb-config-guide.md) | Load balancer for bare-metal clusters |
| **Sealed Secrets** | [sealed-secrets-config-guide.md](sealed-secrets-config-guide.md) | GitOps-friendly secret encryption |
| **Velero** | [velero-config-guide.md](velero-config-guide.md) | Backup and disaster recovery |

### Observability Stack

| Component | Guide | Description |
|-----------|-------|-------------|
| **Kube-Prometheus-Stack** | [kube-prometheus-stack-config-guide.md](kube-prometheus-stack-config-guide.md) | Complete monitoring with Prometheus, Grafana, Alertmanager |
| **Loki** | [loki-config-guide.md](loki-config-guide.md) | Log aggregation and storage system |
| **Tempo** | [tempo-config-guide.md](tempo-config-guide.md) | Distributed tracing backend |
| **OpenTelemetry** | [opentelemetry-kube-stack-config-guide.md](opentelemetry-kube-stack-config-guide.md) | Unified observability data collection |

## Guide Structure

Each configuration guide follows a consistent structure:

### 1. Overview
Brief description of the service and its role in the Kubernetes cluster.

### 2. Key Configuration Choices
Detailed examples of important configuration options with explanations of why specific choices were made.

### 3. Common Pitfalls
Description of frequently encountered issues, their causes, and step-by-step solutions with verification commands.

### 4. Required Secrets
Documentation of all secrets required by the service, including field descriptions and examples.

### 5. Verification
Commands to verify the service is running correctly and functioning as expected.

### 6. Usage Examples
Practical examples of common use cases and configuration patterns.

## Templates

### Service Documentation Templates

| Template | Purpose | Location |
|----------|---------|----------|
| **Service README Template** | Base template for service README files | [templates/service-readme-template.md](templates/service-readme-template.md) |
| **Configuration Guide Template** | Template for detailed configuration guides | [templates/service-config-guide-template.md](templates/service-config-guide-template.md) |
| **Service Standards Template** | Template for service standards documentation | [templates/service-standards-template.md](templates/service-standards-template.md) |

## Getting Started

1. **Choose Your Service**: Select the service you want to configure from the tables above
2. **Read the Configuration Guide**: Follow the detailed configuration examples and explanations
3. **Implement Configuration**: Apply the configurations to your cluster with appropriate customizations
4. **Verify Deployment**: Use the verification steps to ensure the service is working correctly
5. **Troubleshoot Issues**: Refer to the common pitfalls section if you encounter problems

## Best Practices

### Configuration Management
- Use GitOps principles for all configuration changes
- Store sensitive data in encrypted secrets (Sealed Secrets or SOPS)
- Implement proper resource limits and requests
- Follow security best practices for each service
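As an illustration of the resource limits and requests bullet above, a container spec fragment might look like the following. The values are placeholder assumptions, not recommendations from these guides; size them from the usage you actually observe.

```yaml
# Fragment of a Pod or Deployment container spec.
# All values below are illustrative assumptions.
resources:
  requests:
    cpu: 100m        # amount the scheduler reserves for the container
    memory: 128Mi
  limits:
    memory: 256Mi    # container is OOM-killed if it exceeds this
```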

### Monitoring and Observability
- Enable monitoring for all services using the observability stack
- Set up appropriate alerts for service health and performance
- Implement proper logging and tracing for troubleshooting

### Security
- Follow the principle of least privilege for RBAC
- Use network policies to restrict traffic between services
- Regularly update services and scan for vulnerabilities
- Implement proper backup and disaster recovery procedures
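The network policy bullet above could start from a default-deny rule per namespace, with explicit allow rules added afterwards. This is a minimal sketch; the namespace name is hypothetical, and the policy only takes effect if the cluster's CNI plugin enforces NetworkPolicy.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: example-namespace   # hypothetical namespace
spec:
  podSelector: {}                # empty selector matches every pod in the namespace
  policyTypes:
  - Ingress                      # no ingress rules listed, so all inbound traffic is denied
```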

### Maintenance
- Regularly review and update configurations
- Test backup and restore procedures
- Monitor resource usage and scale as needed
- Keep documentation up to date with configuration changes

## Contributing

When adding new services or updating existing ones:

1. Use the appropriate template from the `templates/` directory
2. Follow the established structure and formatting
3. Include comprehensive examples and troubleshooting information
4. Test all configuration examples before documenting them
5. Update this README to include the new service

## Support

For service-specific issues:
1. Check the relevant configuration guide for troubleshooting steps
2. Review the service's upstream documentation
3. Check the service logs and Kubernetes events
4. Consult the observability dashboards for metrics and alerts

For platform-wide issues, refer to the main [README](../README.md) and service standards documentation.
194 changes: 194 additions & 0 deletions docs/cert-manager-config-guide.md
@@ -0,0 +1,194 @@
# Cert-manager Configuration Guide

## Overview
Cert-manager automates the management and issuance of TLS certificates from various issuing sources. It ensures certificates are valid and up-to-date, and attempts to renew certificates at a configured time before expiry.

## Key Configuration Choices

### Certificate Issuers
```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: [email protected]
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: nginx
```
**Why**:
- ClusterIssuer allows certificate issuance across all namespaces
- Let's Encrypt provides free, automated certificates
- HTTP01 challenge works with most ingress controllers

### DNS Challenge Configuration
```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: [email protected]
    privateKeySecretRef:
      name: letsencrypt-dns
    solvers:
    - dns01:
        cloudflare:
          email: [email protected]
          apiTokenSecretRef:
            name: cloudflare-api-token
            key: api-token
```
**Why**: DNS01 challenges enable wildcard certificates and work behind firewalls

### Certificate Resource
```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-tls
  namespace: default
spec:
  secretName: example-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
  - example.com
  - www.example.com
```
**Why**: Explicit certificate management provides fine-grained control over certificate lifecycle

## Common Pitfalls

### Certificate Stuck in Pending State
**Problem**: Certificate remains in pending state and is never issued

**Solution**: Follow the issuance chain (Certificate → CertificateRequest → Order → Challenge) and check the status and events of each resource for detailed error messages

**Verification**:
```bash
kubectl describe certificate <cert-name> -n <namespace>
kubectl get certificaterequest -n <namespace>
kubectl describe order <order-name> -n <namespace>
kubectl get challenges -n <namespace>
```

### HTTP01 Challenge Failures
**Problem**: ACME HTTP01 challenges fail due to ingress misconfiguration

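**Solution**: Ensure the ingress controller can route `/.well-known/acme-challenge/` paths to the cert-manager solver pods, and that the domain resolves to the ingress and is reachable on port 80 from the public internet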

### Rate Limiting Issues
**Problem**: Let's Encrypt rate limits prevent certificate issuance

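**Solution**: Use the Let's Encrypt staging environment (`https://acme-staging-v02.api.letsencrypt.org/directory`) for testing, and rely on cert-manager's built-in retry backoff instead of repeatedly deleting and recreating Certificate resources, since each recreation places a new order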

```bash
# Search the cert-manager logs for rate limit errors (case-insensitive)
kubectl logs -n cert-manager deployment/cert-manager | grep -i "rate limit"
```
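A staging issuer for testing might look like the sketch below. The issuer name `letsencrypt-staging` and the `<your-email>` placeholder are assumptions for illustration; the server URL is the Let's Encrypt staging ACME endpoint, and the solver mirrors the HTTP01 configuration shown earlier in this guide.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging          # hypothetical name
spec:
  acme:
    # Staging endpoint: much higher rate limits than production
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: <your-email>              # placeholder
    privateKeySecretRef:
      name: letsencrypt-staging      # separate account key from production
    solvers:
    - http01:
        ingress:
          class: nginx
```

Certificates issued by the staging environment are not trusted by browsers, so point Certificates at the production issuer once issuance succeeds end to end.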

## Required Secrets

### DNS Provider API Tokens
For DNS01 challenges, API tokens for your DNS provider are required

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: cloudflare-api-token
  namespace: cert-manager
type: Opaque
stringData:
  api-token: your-cloudflare-api-token
```

**Key Fields**:
- `api-token`: Cloudflare API token with Zone:Read and DNS:Edit permissions (required)
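To keep the token out of Git in plain text, the Secret could instead be committed as a SealedSecret. The following is a sketch that assumes the Sealed Secrets controller is installed in the cluster; the ciphertext is a placeholder that would be produced by `kubeseal` from the plain Secret.

```yaml
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: cloudflare-api-token
  namespace: cert-manager
spec:
  encryptedData:
    # Placeholder: replace with the ciphertext generated by kubeseal
    api-token: <ciphertext-produced-by-kubeseal>
  template:
    metadata:
      name: cloudflare-api-token
      namespace: cert-manager
    type: Opaque
```

The controller decrypts this into a regular Secret with the same name and namespace, so the `apiTokenSecretRef` in the ClusterIssuer does not need to change.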

### ACME Account Private Key
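cert-manager generates this key automatically when the issuer first registers with the ACME server. Pre-create it only if you need to reuse an existing ACME account, for example to keep the account portable across clusters.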

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: letsencrypt-prod
  namespace: cert-manager
type: Opaque
data:
  tls.key: <base64-encoded-private-key>
```

**Key Fields**:
- `tls.key`: ACME account private key (automatically generated if not provided)

## Verification
```bash
# Check cert-manager pods are running
kubectl get pods -n cert-manager

# Verify ClusterIssuer is ready
kubectl get clusterissuer

# Check certificate status
kubectl get certificates -A

# View certificate details
kubectl describe certificate <cert-name> -n <namespace>
```

## Usage Examples

### Automatic Certificate with Ingress Annotations
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
  - hosts:
    - example.com
    secretName: example-tls
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: example-service
            port:
              number: 80
```

### Wildcard Certificate
```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-example-com
spec:
  secretName: wildcard-example-com-tls
  issuerRef:
    name: letsencrypt-dns
    kind: ClusterIssuer
  dnsNames:
  - "*.example.com"
  - example.com
```

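cert-manager renews certificates automatically. By default, renewal starts once two-thirds of the certificate's lifetime has elapsed, which for 90-day Let's Encrypt certificates is about 30 days before expiry. Monitor certificate expiry dates and renewal events through Prometheus metrics and Kubernetes events.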
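The renewal window can be set per Certificate with `spec.renewBefore`. The sketch below reuses the `example-tls` Certificate from this guide; the 360h value is an illustrative assumption, not a recommendation.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-tls
  namespace: default
spec:
  secretName: example-tls
  renewBefore: 360h          # illustrative: start renewal 15 days before expiry
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
  - example.com
  - www.example.com
```

A shorter window leaves less time to notice and fix a failing renewal, so pair any reduction with alerting on certificate expiry.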