LLMOps Guide
Model Deployment
Ray Serve Configuration
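The RayService manifest below runs two replicas of the LLM serving deployment, each with 16 CPUs and one GPU, and serves the llama2-7b model with a batch size of 4. The 300-second thresholds control how long KubeRay waits before considering the service or deployment unhealthy.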
apiVersion: ray.io/v1alpha1
kind: RayService
metadata:
  name: llm-inference
spec:
  # Seconds to wait before marking the service / deployment unhealthy.
  serviceUnhealthySecondThreshold: 300
  deploymentUnhealthySecondThreshold: 300
  serveDeployments:
    - name: llm-deployment
      numReplicas: 2
      rayStartParams:
        num-cpus: "16"
        num-gpus: "1"   # one GPU per replica
      containerConfig:
        image: llm-server:latest
        env:
          - name: MODEL_NAME
            value: "llama2-7b"
          - name: BATCH_SIZE
            value: "4"

Model Monitoring
Prometheus Rules
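With the Prometheus Operator, alerting rules for the serving layer can be packaged as a PrometheusRule resource. The sketch below is illustrative only: the metric names (inference_request_duration_seconds_bucket, inference_requests_failed_total, inference_requests_total), the thresholds, and the alert names are placeholder assumptions, not values defined elsewhere in this guide.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: llm-inference-alerts
spec:
  groups:
    - name: llm-inference
      rules:
        - alert: HighInferenceLatency
          # Placeholder metric; substitute the latency histogram your serving layer exports.
          expr: |
            histogram_quantile(0.95,
              sum(rate(inference_request_duration_seconds_bucket[5m])) by (le)) > 2
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "p95 inference latency has exceeded 2 s for 5 minutes"
        - alert: HighInferenceErrorRate
          # Placeholder metrics for failed vs. total requests.
          expr: |
            sum(rate(inference_requests_failed_total[5m]))
              / sum(rate(inference_requests_total[5m])) > 0.05
          for: 10m
          labels:
            severity: critical
          annotations:
            summary: "More than 5% of inference requests are failing"
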
Performance Optimization
Triton Inference Server
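Triton can run on the same cluster as a plain Kubernetes Deployment. The sketch below is a minimal, assumption-laden example rather than a configuration taken from this guide: the image tag, the emptyDir model repository, and the single-GPU resource limit are placeholders to adapt (a real setup would typically mount models from a PVC or object store).

apiVersion: apps/v1
kind: Deployment
metadata:
  name: triton-inference
spec:
  replicas: 1
  selector:
    matchLabels:
      app: triton-inference
  template:
    metadata:
      labels:
        app: triton-inference
    spec:
      containers:
        - name: triton
          image: nvcr.io/nvidia/tritonserver:24.01-py3  # placeholder tag
          command: ["tritonserver", "--model-repository=/models"]
          ports:
            - containerPort: 8000  # HTTP
            - containerPort: 8001  # gRPC
            - containerPort: 8002  # Prometheus metrics
          resources:
            limits:
              nvidia.com/gpu: 1
          volumeMounts:
            - name: model-repository
              mountPath: /models
      volumes:
        - name: model-repository
          emptyDir: {}  # placeholder; use a PVC or object-store-backed volume in practice
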
Best Practices