In the era of digital transformation, DevOps and Cloud Native practices have become fundamental to delivering software quickly, reliably, and at scale. Organizations that adopt Cloud Native practices can deploy up to 100x more frequently with a failure rate 60x lower. This article takes an in-depth look at DevOps best practices and Cloud Native deployment strategies in 2025.
Understanding DevOps Culture and Principles
1. Core DevOps Principles
DevOps is not just about tools; it is a cultural philosophy that brings together development (Dev) and operations (Ops):
• Collaboration: Break down silos between teams
• Automation: Automate repetitive tasks and processes
• Continuous Improvement: Kaizen philosophy for incremental improvements
• Customer Centricity: Focus on delivering value to customers
• Measurement: Data-driven decision making
• Sharing: Knowledge sharing and collective ownership
2. CALMS Framework for DevOps Success
• Culture: Shared ownership and a blameless culture
• Automation: Toolchains that automate the entire lifecycle
• Lean: Optimizing flow and eliminating waste
• Measurement: Metrics and monitoring for continuous improvement
• Sharing: Collaborative environment and knowledge sharing
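The Measurement pillar can be made concrete with a small calculation. The sketch below is illustrative only: it assumes a hypothetical list of deployment records with `deployedAt`, `failed`, and `restoredAt` fields (these names do not come from any specific tool) and derives deployment frequency, change failure rate, and mean time to recovery from them.

```javascript
// Hypothetical record shape (an assumption for this sketch):
//   { deployedAt: Date, failed: boolean, restoredAt?: Date }
function summarizeDeployments(deployments, periodDays) {
  const failures = deployments.filter((d) => d.failed);
  const recovered = failures.filter((d) => d.restoredAt);
  // Subtracting two Date objects yields the difference in milliseconds.
  const totalRecoveryMs = recovered.reduce(
    (sum, d) => sum + (d.restoredAt - d.deployedAt),
    0
  );
  return {
    deploymentsPerDay: deployments.length / periodDays,
    changeFailureRate: deployments.length
      ? failures.length / deployments.length
      : 0,
    // null when no failed deployment has been restored yet
    mttrMinutes: recovered.length
      ? totalRecoveryMs / recovered.length / 60000
      : null,
  };
}

const sample = [
  { deployedAt: new Date('2025-01-01T10:00:00Z'), failed: false },
  {
    deployedAt: new Date('2025-01-02T10:00:00Z'),
    failed: true,
    restoredAt: new Date('2025-01-02T10:30:00Z'),
  },
  { deployedAt: new Date('2025-01-03T10:00:00Z'), failed: false },
  { deployedAt: new Date('2025-01-04T10:00:00Z'), failed: false },
];

console.log(summarizeDeployments(sample, 7));
```

Tracking these numbers over time, rather than as a one-off snapshot, is what makes them useful for continuous improvement.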
3. DevOps Evolution Stages
```javascript
// DevOps Maturity Assessment Framework
class DevOpsMaturityAssessment {
  constructor() {
    // Each stage lists its typical characteristics and delivery metrics.
    this.stages = {
      initial: {
        name: 'Initial',
        characteristics: ['Manual processes', 'Siloed teams', 'Ad-hoc deployments'],
        deploymentFrequency: 'monthly',
        leadTime: 'months',
        mttr: 'weeks'
      },
      managed: {
        name: 'Managed',
        characteristics: ['Basic automation', 'Defined processes', 'Limited visibility'],
        deploymentFrequency: 'weekly',
        leadTime: 'weeks',
        mttr: 'days'
      },
      defined: {
        name: 'Defined',
        characteristics: ['CI/CD pipelines', 'Infrastructure as Code', 'Monitoring'],
        deploymentFrequency: 'daily',
        leadTime: 'days',
        mttr: 'hours'
      },
      optimized: {
        name: 'Optimized',
        characteristics: ['Full automation', 'GitOps', 'Self-healing systems'],
        deploymentFrequency: 'multiple daily',
        leadTime: 'hours',
        mttr: 'minutes'
      }
    };
  }

  assessOrganization(organizationData) {
    const score = this.calculateScore(organizationData);
    const stage = this.determineStage(score);
    return {
      currentStage: stage,
      score,
      recommendations: this.getRecommendations(stage),
      roadmap: this.createRoadmap(stage)
    };
  }

  // Weighted sum of five maturity dimensions, each scored 0-100.
  calculateScore(data) {
    let score = 0;
    // Automation maturity (0-100)
    score += data.automationLevel * 0.25;
    // Collaboration maturity (0-100)
    score += data.collaborationLevel * 0.20;
    // Monitoring maturity (0-100)
    score += data.monitoringLevel * 0.20;
    // Security integration (0-100)
    score += data.securityLevel * 0.15;
    // Cloud adoption (0-100)
    score += data.cloudLevel * 0.20;
    return Math.round(score);
  }

  determineStage(score) {
    if (score < 25) return this.stages.initial;
    if (score < 50) return this.stages.managed;
    // NOTE: the source is truncated from this point on. The 75 cutoff below is
    // an assumption that continues the 25/50 pattern.
    if (score < 75) return this.stages.defined;
    return this.stages.optimized;
  }

  // getRecommendations(stage) and createRoadmap(stage) are called by
  // assessOrganization(), but their definitions are missing from the source.
}
```
```yaml
# NOTE: the start of this GitHub Actions workflow is missing from the source
# (name, triggers, the REGISTRY and IMAGE_NAME env values, and the
# quality-check and security-scan jobs). The source mentions a Dockerfile,
# k8s/dev-deployment.yaml, and k8s/prod-deployment.yaml at this point.
# The fragment resumes inside the service containers of the test job; the
# parent keys below are inferred, and the service name "postgres" is an
# assumption.
jobs:
  test:
    services:
      postgres:
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        ports:
          - 5432:5432
      redis:
        image: redis:7-alpine
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        ports:
          - 6379:6379
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '18'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Run unit tests
        run: npm run test:unit
        env:
          DATABASE_URL: postgresql://postgres:postgres@localhost:5432/test_db
          REDIS_URL: redis://localhost:6379
      - name: Upload coverage reports
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage/lcov.info

  # Build and Push
  build-and-push:
    needs: [quality-check, security-scan, test]
    runs-on: ubuntu-latest
    if: github.event_name == 'push'
    outputs:
      image-digest: ${{ steps.build.outputs.digest }}
      # NOTE: steps.meta.outputs.tags can hold several newline-separated tags;
      # the sed substitutions in the deploy jobs assume a single image reference.
      image-tag: ${{ steps.meta.outputs.tags }}
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Log in to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=sha,prefix={{branch}}-
            type=raw,value=latest,enable={{is_default_branch}}
      - name: Build and push Docker image
        id: build
        uses: docker/build-push-action@v5
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  # Deploy to Development
  deploy-dev:
    needs: build-and-push
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/develop'
    environment:
      name: development
      url: https://dev.yourapp.com
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Configure kubectl
        uses: azure/k8s-set-context@v3
        with:
          method: kubeconfig
          kubeconfig: ${{ secrets.KUBE_CONFIG_DEV }}
      - name: Deploy to development
        run: |
          sed -i 's|IMAGE_PLACEHOLDER|${{ needs.build-and-push.outputs.image-tag }}|' k8s/dev-deployment.yaml
          kubectl apply -f k8s/dev-deployment.yaml
          kubectl rollout status deployment/yourapp-dev -n development
      - name: Run integration tests
        run: |
          npm run test:integration -- --baseUrl=https://dev.yourapp.com

  # Deploy to Production
  deploy-prod:
    needs: build-and-push
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    environment:
      name: production
      url: https://yourapp.com
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Configure kubectl
        uses: azure/k8s-set-context@v3
        with:
          method: kubeconfig
          kubeconfig: ${{ secrets.KUBE_CONFIG_PROD }}
      - name: Deploy to production
        run: |
          # Canary deployment strategy
          sed -i 's|IMAGE_PLACEHOLDER|${{ needs.build-and-push.outputs.image-tag }}|' k8s/canary-deployment.yaml
          kubectl apply -f k8s/canary-deployment.yaml
          # Wait for canary to be ready
          kubectl rollout status deployment/yourapp-canary -n production
          # Run smoke tests
          npm run test:smoke -- --baseUrl=https://canary.yourapp.com
          # Promote canary to production
          sed -i 's|IMAGE_PLACEHOLDER|${{ needs.build-and-push.outputs.image-tag }}|' k8s/prod-deployment.yaml
          kubectl apply -f k8s/prod-deployment.yaml
          kubectl rollout status deployment/yourapp-prod -n production
      - name: Notify deployment
        uses: 8398a7/action-slack@v3
        with:
          status: ${{ job.status }}
          channel: '#deployments'
          webhook_url: ${{ secrets.SLACK_WEBHOOK }}
```
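The production job promotes the canary as soon as the smoke tests pass. Teams often add a metrics-based gate before promotion; the sketch below shows one possible shape for such a check. The function name, the counter fields (`requests`, `errors`), and both thresholds are assumptions made for illustration, not part of the workflow above.

```javascript
// Decide whether a canary may be promoted, based on request counters
// collected during the canary window. All thresholds are illustrative.
function shouldPromoteCanary(canary, baseline, options = {}) {
  const { minRequests = 100, maxErrorRateIncrease = 0.01 } = options;

  // Too little traffic means the error rate is not statistically meaningful.
  if (canary.requests < minRequests) {
    return { promote: false, reason: 'not enough canary traffic' };
  }

  const canaryRate = canary.errors / canary.requests;
  const baselineRate = baseline.requests
    ? baseline.errors / baseline.requests
    : 0;

  // Compare against the stable version rather than an absolute number,
  // so that pre-existing errors do not block every release.
  if (canaryRate - baselineRate > maxErrorRateIncrease) {
    return { promote: false, reason: 'canary error rate too high' };
  }
  return { promote: true, reason: 'canary within error budget' };
}

console.log(
  shouldPromoteCanary(
    { requests: 1000, errors: 5 },
    { requests: 10000, errors: 40 }
  )
);
```

A script like this could run as an extra step between the smoke tests and the `kubectl apply` of the production manifest, exiting non-zero when `promote` is false.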
Kubernetes and Container Orchestration
1. Multi-Environment Kubernetes Configuration
```yaml
# k8s/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: development
  labels:
    environment: development
---
apiVersion: v1
kind: Namespace
metadata:
  name: staging
  labels:
    environment: staging
---
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    environment: production
---
# k8s/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: yourapp-config
  namespace: development
data:
  NODE_ENV: "development"
  LOG_LEVEL: "debug"
  API_BASE_URL: "https://dev-api.yourapp.com"
  REDIS_HOST: "redis-service"
  REDIS_PORT: "6379"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: yourapp-config
  namespace: staging
data:
  NODE_ENV: "staging"
  LOG_LEVEL: "info"
  API_BASE_URL: "https://staging-api.yourapp.com"
  REDIS_HOST: "redis-service"
  REDIS_PORT: "6379"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: yourapp-config
  namespace: production
data:
  NODE_ENV: "production"
  LOG_LEVEL: "warn"
  API_BASE_URL: "https://api.yourapp.com"
  REDIS_HOST: "redis-service"
  REDIS_PORT: "6379"
---
# k8s/secrets.yaml
# The values are blank in the source; each key expects a base64-encoded value.
apiVersion: v1
kind: Secret
metadata:
  name: yourapp-secrets
  namespace: development
type: Opaque
data:
  DATABASE_URL:
  JWT_SECRET:
  REDIS_PASSWORD:
---
apiVersion: v1
kind: Secret
metadata:
  name: yourapp-secrets
  namespace: production
type: Opaque
data:
  DATABASE_URL:
  JWT_SECRET:
  REDIS_PASSWORD:
---
# k8s/hpa.yaml - Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: yourapp-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: yourapp-prod
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
        - type: Pods
          value: 5
          periodSeconds: 60
      selectPolicy: Max
---
# k8s/network-policy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: yourapp-network-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: yourapp
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx
      ports:
        - protocol: TCP
          port: 3000
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              name: database
      ports:
        - protocol: TCP
          port: 5432
    - to:
        - namespaceSelector:
            matchLabels:
              name: cache
      ports:
        - protocol: TCP
          port: 6379
    # DNS lookups to any destination
    - to: []
      ports:
        - protocol: TCP
          port: 53
        - protocol: UDP
          port: 53
---
# k8s/pod-disruption-budget.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: yourapp-pdb
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: yourapp
      environment: production
```
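To see how the autoscaler above turns metric readings into a replica count, the sketch below applies the scaling rule described in the Kubernetes documentation: `desired = ceil(currentReplicas * currentMetric / targetMetric)`, evaluated per metric, with the largest proposal winning and the result clamped to `minReplicas`/`maxReplicas`. It is a simplification: the `behavior` policies and stabilization windows are ignored, and the function and field names are assumptions for this example.

```javascript
// Simplified model of the HorizontalPodAutoscaler replica calculation.
// metrics: [{ current: number, target: number }], one entry per metric.
function desiredReplicas(currentReplicas, metrics, limits) {
  const { minReplicas, maxReplicas, tolerance = 0.1 } = limits;

  const proposals = metrics.map(({ current, target }) => {
    const ratio = current / target;
    // Within the tolerance band (10% by default) no change is proposed.
    if (Math.abs(ratio - 1) <= tolerance) return currentReplicas;
    return Math.ceil(currentReplicas * ratio);
  });

  // With several metrics, the highest proposed replica count is used.
  const desired = Math.max(...proposals);
  return Math.min(maxReplicas, Math.max(minReplicas, desired));
}

// 4 pods at 90% CPU (target 70), 60% memory (target 80), 100 req/s (target 100)
const replicas = desiredReplicas(
  4,
  [
    { current: 90, target: 70 },
    { current: 60, target: 80 },
    { current: 100, target: 100 },
  ],
  { minReplicas: 3, maxReplicas: 50 }
);
console.log(replicas); // 6
```

CPU is the metric furthest above its target here, so it drives the decision even though memory alone would suggest scaling down.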
2. Helm Charts for Template Management
```yaml
# helm/yourapp/Chart.yaml
apiVersion: v2
name: yourapp
description: A Helm chart for YourApp
type: application
version: 0.1.0
appVersion: "1.0.0"
keywords:
  - web
  - application
  - nodejs
home: https://github.com/yourorg/yourapp
sources:
  - https://github.com/yourorg/yourapp
maintainers:
  - name: YourTeam
    email: "[email protected]"
---
# helm/yourapp/values.yaml
# Default values for yourapp.
replicaCount: 1

image:
  repository: yourapp
  pullPolicy: IfNotPresent
  tag: ""

imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""

serviceAccount:
  create: true
  annotations: {}
  name: ""

podAnnotations: {}

podSecurityContext:
  fsGroup: 1001

securityContext:
  allowPrivilegeEscalation: false
  runAsNonRoot: true
  runAsUser: 1001
  readOnlyRootFilesystem: true
  capabilities:
    drop:
      - ALL

service:
  type: ClusterIP
  port: 80
  targetPort: 3000

ingress:
  enabled: false
  className: ""
  annotations: {}
    # kubernetes.io/ingress.class: nginx
    # cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: yourapp.local
      paths:
        - path: /
          pathType: Prefix
  tls: []
  #  - secretName: yourapp-tls
  #    hosts:
  #      - yourapp.local

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 250m
    memory: 256Mi

autoscaling:
  enabled: false
  minReplicas: 1
  maxReplicas: 100
  targetCPUUtilizationPercentage: 80
  # targetMemoryUtilizationPercentage: 80

nodeSelector: {}
tolerations: []
affinity: {}

# Custom configurations
config:
  nodeEnv: "production"
  logLevel: "info"
  database:
    host: ""
    port: 5432
    name: yourapp
    ssl: true
  redis:
    host: ""
    port: 6379
    database: 0

secrets:
  databaseUrl: ""
  jwtSecret: ""
  redisPassword: ""

# Health check configurations
livenessProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /ready
    port: http
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 3

# Monitoring and observability
monitoring:
  enabled: false
  serviceMonitor:
    enabled: false
    namespace: monitoring
    labels: {}
    annotations: {}
    interval: 30s
    scrapeTimeout: 10s

# Logging configuration
logging:
  enabled: true
  fluentd:
    enabled: false
    config: {}

# Backup configuration
backup:
  enabled: false
  schedule: "0 2 * * *"
  retention: "7d"
---
# helm/yourapp/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "yourapp.fullname" . }}
  labels:
    {{- include "yourapp.labels" . | nindent 4 }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "yourapp.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      annotations:
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
        checksum/secret: {{ include (print $.Template.BasePath "/secret.yaml") . | sha256sum }}
        {{- with .Values.podAnnotations }}
        {{- toYaml . | nindent 8 }}
        {{- end }}
      labels:
        {{- include "yourapp.selectorLabels" . | nindent 8 }}
    spec:
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      serviceAccountName: {{ include "yourapp.serviceAccountName" . }}
      securityContext:
        {{- toYaml .Values.podSecurityContext | nindent 8 }}
      containers:
        - name: {{ .Chart.Name }}
          securityContext:
            {{- toYaml .Values.securityContext | nindent 12 }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - name: http
              containerPort: 3000
              protocol: TCP
          livenessProbe:
            {{- toYaml .Values.livenessProbe | nindent 12 }}
          readinessProbe:
            {{- toYaml .Values.readinessProbe | nindent 12 }}
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
          env:
            - name: NODE_ENV
              value: {{ .Values.config.nodeEnv }}
            - name: LOG_LEVEL
              value: {{ .Values.config.logLevel }}
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: {{ include "yourapp.fullname" . }}
                  key: database-url
            - name: JWT_SECRET
              valueFrom:
                secretKeyRef:
                  name: {{ include "yourapp.fullname" . }}
                  key: jwt-secret
            - name: REDIS_HOST
              value: {{ .Values.config.redis.host }}
            - name: REDIS_PORT
              value: {{ .Values.config.redis.port | quote }}
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: {{ include "yourapp.fullname" . }}
                  key: redis-password
          volumeMounts:
            # Writable scratch space, needed because the root filesystem is read-only
            - name: tmp
              mountPath: /tmp
      volumes:
        - name: tmp
          emptyDir: {}
      {{- with .Values.nodeSelector }}
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.affinity }}
      affinity:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.tolerations }}
      tolerations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
```
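The `checksum/config` and `checksum/secret` annotations in the Deployment template deserve a short explanation. A Deployment only rolls its pods when the pod template changes, so editing a ConfigMap or Secret on its own leaves running pods untouched. Hashing the rendered config into an annotation makes any config change alter the pod template. The sketch below imitates that idea outside Helm, using Node's built-in `crypto` module; the function name and the sample config strings are assumptions for illustration.

```javascript
const crypto = require('node:crypto');

// Mirrors what `sha256sum` does in the template: hash the rendered text.
function configChecksum(renderedConfig) {
  return crypto.createHash('sha256').update(renderedConfig).digest('hex');
}

const before = configChecksum('LOG_LEVEL: "info"\n');
const after = configChecksum('LOG_LEVEL: "debug"\n');

// A different digest means a different annotation value, which changes the
// pod template and therefore triggers a rolling update.
console.log(before !== after); // true
```

Identical config always yields the same digest, so unrelated upgrades do not restart pods unnecessarily.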
Infrastructure as Code (IaC) with Terraform
1. Multi-Cloud Infrastructure Setup
```hcl
# terraform/main.tf
terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.20"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.10"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.5"
    }
  }

  backend "s3" {
    bucket         = "yourapp-terraform-state"
    key            = "production/terraform.tfstate"
    region         = "ap-southeast-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

# terraform/provider.tf
provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      Environment = var.environment
      Project     = "yourapp"
      ManagedBy   = "terraform"
    }
  }
}

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
  }
}

provider "helm" {
  kubernetes {
    host                   = module.eks.cluster_endpoint
    cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      command     = "aws"
      args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
    }
  }
}

# terraform/variables.tf
variable "environment" {
  description = "Environment name"
  type        = string
  default     = "production"
}

variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "ap-southeast-1"
}

variable "project_name" {
  description = "Project name"
  type        = string
  default     = "yourapp"
}

variable "domain_name" {
  description = "Root domain name"
  type        = string
  default     = "yourapp.com"
}

# terraform/eks.tf
# Needed for the account ID referenced in aws_auth_roles below.
data "aws_caller_identity" "current" {}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.15"

  cluster_name    = "${var.project_name}-${var.environment}"
  cluster_version = "1.28"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  cluster_endpoint_private_access = true
  cluster_endpoint_public_access  = true

  cluster_addons = {
    coredns = {
      most_recent = true
    }
    kube-proxy = {
      most_recent = true
    }
    vpc-cni = {
      most_recent = true
    }
    aws-ebs-csi-driver = {
      most_recent = true
    }
  }

  # The source passed module.eks.cluster_security_group_id and
  # module.eks.node_security_group_id back into this module, which is a
  # self-reference (dependency cycle). The module creates both security
  # groups itself and exposes them as outputs, so the inputs are omitted.

  self_managed_node_groups = {
    app_nodes = {
      instance_type = "m5.large"
      min_size      = 3
      max_size      = 10
      desired_size  = 3

      k8s_labels = {
        Environment = var.environment
        Application = var.project_name
      }

      additional_tags = {
        Name        = "${var.project_name}-app-nodes"
        Environment = var.environment
      }
    }
  }

  manage_aws_auth_configmap = true

  aws_auth_roles = [
    {
      rolearn  = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/yourapp-admin-role"
      username = "yourapp-admin"
      groups   = ["system:masters"]
    }
  ]
}

# terraform/vpc.tf
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "${var.project_name}-${var.environment}-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["ap-southeast-1a", "ap-southeast-1b", "ap-southeast-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway = true
  single_nat_gateway = true

  tags = {
    Name        = "${var.project_name}-${var.environment}-vpc"
    Environment = var.environment
  }
}

# terraform/rds.tf
module "rds" {
  source  = "terraform-aws-modules/rds/aws"
  version = "~> 6.0"

  identifier = "${var.project_name}-${var.environment}-db"

  engine                = "postgres"
  engine_version        = "15.3"
  instance_class        = "db.t3.medium"
  allocated_storage     = 100
  max_allocated_storage = 1000
  storage_encrypted     = true
  storage_type          = "gp2"

  db_name  = "yourapp"
  username = "yourapp_user"
  port     = 5432

  vpc_security_group_ids = [module.eks.node_security_group_id]
  create_db_subnet_group = true
  subnet_ids             = module.vpc.private_subnets

  maintenance_window      = "Mon:00:00-Mon:03:00"
  backup_window           = "03:00-06:00"
  backup_retention_period = 7

  skip_final_snapshot       = false
  final_snapshot_identifier = "${var.project_name}-${var.environment}-final-snapshot"

  tags = {
    Name        = "${var.project_name}-${var.environment}-rds"
    Environment = var.environment
  }
}

# terraform/redis.tf
# The source references random_password.redis_auth_token without declaring
# it; the length and character settings here are assumptions.
resource "random_password" "redis_auth_token" {
  length  = 32
  special = false
}

module "elasticache" {
  source  = "terraform-aws-modules/elasticache/aws"
  version = "~> 1.0"

  create_replication_group      = true
  replication_group_id          = "${var.project_name}-${var.environment}-redis"
  replication_group_description = "Redis cluster for ${var.project_name}"

  node_type            = "cache.t3.micro"
  port                 = 6379
  parameter_group_name = "default.redis7"

  subnet_ids = module.vpc.private_subnets
  vpc_id     = module.vpc.vpc_id

  at_rest_encryption_enabled = true
  transit_encryption_enabled = true
  auth_token                 = random_password.redis_auth_token.result

  automatic_failover_enabled = true
  multi_az_enabled           = true
  num_cache_clusters         = 2

  tags = {
    Name        = "${var.project_name}-${var.environment}-redis"
    Environment = var.environment
  }
}

# terraform/outputs.tf
output "cluster_endpoint" {
  description = "EKS cluster endpoint"
  value       = module.eks.cluster_endpoint
}

output "cluster_name" {
  description = "EKS cluster name"
  value       = module.eks.cluster_name
}

output "database_endpoint" {
  description = "RDS endpoint"
  value       = module.rds.db_instance_endpoint
}

output "redis_endpoint" {
  description = "Redis endpoint"
  value       = module.elasticache.replication_group_primary_endpoint_address
}

output "vpc_id" {
  description = "VPC ID"
  value       = module.vpc.vpc_id
}

output "private_subnets" {
  description = "Private subnet IDs"
  value       = module.vpc.private_subnets
}
```
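Subnet planning is an easy place to make mistakes before `terraform apply` ever runs. As a quick sanity check, the sketch below verifies that each subnet from the VPC module fits inside the VPC CIDR and that no two subnets overlap. The helper names are assumptions for this example, and only IPv4 is handled.

```javascript
// Convert dotted IPv4 notation to a number (plain arithmetic avoids the
// signed 32-bit pitfalls of JavaScript bitwise operators).
function ipToNumber(ip) {
  return ip.split('.').reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// Return the first and last address covered by a CIDR block.
function cidrRange(cidr) {
  const [ip, bits] = cidr.split('/');
  const size = 2 ** (32 - Number(bits));
  const start = Math.floor(ipToNumber(ip) / size) * size;
  return { start, end: start + size - 1 };
}

function cidrContains(outer, inner) {
  const o = cidrRange(outer);
  const i = cidrRange(inner);
  return i.start >= o.start && i.end <= o.end;
}

function cidrsOverlap(a, b) {
  const ra = cidrRange(a);
  const rb = cidrRange(b);
  return ra.start <= rb.end && rb.start <= ra.end;
}

const vpcCidr = '10.0.0.0/16';
const subnets = [
  '10.0.1.0/24', '10.0.2.0/24', '10.0.3.0/24',
  '10.0.101.0/24', '10.0.102.0/24', '10.0.103.0/24',
];

const allInside = subnets.every((s) => cidrContains(vpcCidr, s));
const anyOverlap = subnets.some((a, i) =>
  subnets.slice(i + 1).some((b) => cidrsOverlap(a, b))
);
console.log({ allInside, anyOverlap });
```

For the values used in the VPC module, every subnet falls inside `10.0.0.0/16` and none overlap.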
Monitoring and Observability
1. Prometheus and Grafana Stack
```yaml
# monitoring/prometheus-operator.yaml
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: prometheus-community
  namespace: monitoring
spec:
  interval: 1h
  url: https://prometheus-community.github.io/helm-charts
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: kube-prometheus-stack
  namespace: monitoring
spec:
  interval: 5m
  chart:
    spec:
      chart: kube-prometheus-stack
      version: "48.0.0"
      sourceRef:
        kind: HelmRepository
        name: prometheus-community
        namespace: monitoring
  releaseName: prometheus
  values:
    prometheus:
      prometheusSpec:
        storageSpec:
          volumeClaimTemplate:
            spec:
              storageClassName: gp3
              accessModes: ["ReadWriteOnce"]
              resources:
                requests:
                  storage: 100Gi
        resources:
          requests:
            cpu: 1000m
            memory: 4Gi
          limits:
            cpu: 2000m
            memory: 8Gi
    grafana:
      adminPassword: "secure-password"
      persistence:
        enabled: true
        storageClassName: gp3
        size: 20Gi
      resources:
        requests:
          cpu: 500m
          memory: 2Gi
        limits:
          cpu: 1000m
          memory: 4Gi
      dashboardProviders:
        dashboardproviders.yaml:
          apiVersion: 1
          providers:
            - name: 'default'
              orgId: 1
              folder: ''
              type: file
              disableDeletion: false
              editable: true
              options:
                path: /var/lib/grafana/dashboards/default
    alertmanager:
      alertmanagerSpec:
        storage:
          volumeClaimTemplate:
            spec:
              storageClassName: gp3
              accessModes: ["ReadWriteOnce"]
              resources:
                requests:
                  storage: 20Gi
    defaultRules:
      rules:
        etcd: true
        k8s: true
        kubeScheduler: true
        kubeControllerManager: true
        node: true
        prometheusOperator: true
        prometheus: true
        general: true
    # The chart expects a map of rule files, each containing rule groups.
    additionalPrometheusRulesMap:
      yourapp-rules:
        groups:
          - name: yourapp-rules
            rules:
              - alert: YourAppHighErrorRate
                # Aggregate both sides by job so the division matches series
                # (the raw 5xx and total series carry different status labels).
                expr: sum by (job) (rate(http_requests_total{status=~"5.."}[5m])) / sum by (job) (rate(http_requests_total[5m])) > 0.05
                for: 10m
                labels:
                  severity: critical
                annotations:
                  summary: "High error rate detected"
                  description: "Error rate is {{ $value | humanizePercentage }} for {{ $labels.job }}"
              - alert: YourAppHighResponseTime
                expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 1
                for: 10m
                labels:
                  severity: warning
                annotations:
                  summary: "High response time detected"
                  description: "95th percentile response time is {{ $value }}s for {{ $labels.job }}"
              - alert: YourAppPodRestart
                expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
                for: 5m
                labels:
                  severity: warning
                annotations:
                  summary: "Pod restarting frequently"
                  description: "Pod {{ $labels.pod }} is restarting frequently"
---
# monitoring/servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: yourapp-metrics
  namespace: production
  labels:
    app: yourapp
spec:
  selector:
    matchLabels:
      app: yourapp
  endpoints:
    - port: metrics
      path: /metrics
      interval: 30s
      scrapeTimeout: 10s
---
# monitoring/grafana-dashboards.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: yourapp-dashboards
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  yourapp-overview.json: |
    {
      "dashboard": {
        "id": null,
        "title": "YourApp Overview",
        "tags": ["yourapp", "overview"],
        "timezone": "browser",
        "panels": [
          {
            "title": "Request Rate",
            "type": "graph",
            "targets": [
              {
                "expr": "rate(http_requests_total[5m])",
                "legendFormat": "{{ method }} {{ status }}"
              }
            ],
            "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0}
          },
          {
            "title": "Response Time",
            "type": "graph",
            "targets": [
              {
                "expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))",
                "legendFormat": "95th percentile"
              },
              {
                "expr": "histogram_quantile(0.50, rate(http_request_duration_seconds_bucket[5m]))",
                "legendFormat": "50th percentile"
              }
            ],
            "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0}
          }
        ],
        "time": {"from": "now-1h", "to": "now"},
        "refresh": "30s"
      }
    }
```
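The `for: 10m` clause on the error-rate alert is easy to misread: the alert fires only when the condition holds at every evaluation throughout the window, not when it is true once. The sketch below models that behaviour for the 5% error-rate rule, assuming one sample per evaluation interval; the function name and the sample shape (`total`, `errors`) are assumptions for illustration.

```javascript
// samples: one entry per evaluation interval, oldest first.
// Each entry holds the requests seen in that interval:
//   { total: number, errors: number }
function errorRateAlertFires(samples, options = {}) {
  const { threshold = 0.05, forIntervals = 10 } = options;

  // Not enough history yet: the alert can only be pending, never firing.
  if (samples.length < forIntervals) return false;

  // Every interval in the window must breach the threshold.
  return samples
    .slice(-forIntervals)
    .every((s) => s.total > 0 && s.errors / s.total > threshold);
}

const sustained = Array.from({ length: 10 }, () => ({ total: 100, errors: 10 }));
const withDip = sustained.map((s, i) =>
  i === 4 ? { total: 100, errors: 1 } : s
);

console.log(errorRateAlertFires(sustained)); // true
console.log(errorRateAlertFires(withDip));   // false
```

A single healthy interval resets the window, which is why short error spikes do not page anyone under this rule.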
Conclusion
DevOps and Cloud Native deployment have become essential to modern software development. With proper implementation of CI/CD pipelines, container orchestration, infrastructure as code, and comprehensive monitoring, organizations can substantially improve deployment velocity and reliability.
Key success factors:
• Cultural Transformation: Adopt DevOps culture before tools
• Automation: Automate everything that can be automated
• Measurement: Measure everything to drive continuous improvement
• Security Integration: Shift security left (DevSecOps)
• Observability: Implement comprehensive monitoring and logging
Investing in DevOps capabilities delivers significant ROI through:
• Faster time to market
• Higher deployment success rates
• Reduced failure recovery time
• Improved team productivity
• Better customer satisfaction
Start small, iterate continuously, and focus on delivering value to customers. The DevOps journey is a marathon, not a sprint.