Describes some experiments with the Llama Stack. The repository essentially offers an LLS stack (a Llama Stack Stack).
Start a local Kubernetes cluster, deploy Ollama, and install the llama-stack-k8s-operator:

./hack/00-run-k8s.sh
./hack/01-deploy-ollama.sh
kubectl apply -f https://raw.githubusercontent.com/llamastack/llama-stack-k8s-operator/main/release/operator.yaml
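
Before creating a distribution, it can be worth checking that these pieces came up. A minimal sketch: the ollama-dist namespace is taken from the OLLAMA_URL used further below, while the operator's namespace depends on the operator release, so the grep is deliberately loose:

# Ollama server created by 01-deploy-ollama.sh (namespace from OLLAMA_URL below)
kubectl get pods -n ollama-dist

# Operator controller pod(s); exact namespace/name vary by operator release
kubectl get pods -A | grep llama-stack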
Run the Ollama distribution:
cat <<-EOF | kubectl apply -f -
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: llamastackdistribution-sample
spec:
  replicas: 1
  server:
    distribution:
      name: ollama
    containerSpec:
      env:
        - name: INFERENCE_MODEL
          value: llama3.2:1b
        - name: OLLAMA_URL
          value: http://ollama-server-service.ollama-dist.svc.cluster.local:11434
      name: llama-stack
      resources: {}
    storage:
      mountPath: /home/lls/.lls
      size: 20Gi
EOF
NOTE: There is also ./hack/02-deploy-llama-stack.sh, which applies the manifest above for you.
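
While waiting, the reconciliation can be watched with plain kubectl; the CR kind and name come from the manifest above:

# Watch the operator turn the CR into a running pod
kubectl get pods -w

# Inspect the CR's status and events if the pod does not appear
kubectl describe llamastackdistribution llamastackdistribution-sample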
After a couple of minutes the container should be created, and you will see:
k get llamastackdistribution.llamastack.io
NAME                            VERSION   READY
llamastackdistribution-sample             true
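
To actually talk to the stack, the server can be reached via port-forward. A sketch under assumptions: the Service name below guesses a <cr-name>-service pattern (verify with kubectl get svc first), and 8321 is the Llama Stack server's default port:

# Find the Service the operator created for the distribution
kubectl get svc

# Forward the assumed Service; adjust the name to what the command above shows
kubectl port-forward svc/llamastackdistribution-sample-service 8321:8321 &

# Cheap smoke test: ask the Llama Stack API which models it knows about
curl -s http://localhost:8321/v1/models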