Observability

Code Search includes built-in observability features for monitoring, troubleshooting, and performance optimization.

Code Search exposes Prometheus metrics for monitoring search performance, job queue status, and system health.

metrics:
  enabled: true    # Enable Prometheus metrics
  path: "/metrics" # Endpoint path

Environment Variables:

  • CS_METRICS_ENABLED - Enable/disable metrics
  • CS_METRICS_PATH - Metrics endpoint path (default: /metrics)

| Metric | Type | Description |
| --- | --- | --- |
| code_search_http_requests_total | Counter | Total HTTP requests by method, path, and status |
| code_search_http_request_duration_seconds | Histogram | Request duration histogram |
| code_search_searches_total | Counter | Total searches by type (text/regex) |
| code_search_search_duration_seconds | Histogram | Search execution time |
| code_search_search_results_total | Histogram | Number of results per search |
| code_search_jobs_total | Counter | Total jobs by type and status |
| code_search_job_duration_seconds | Histogram | Job execution time |
| code_search_errors_total | Counter | Total errors by component and type |
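
To spot-check the endpoint by hand, the Prometheus text format is simple enough to parse in a few lines. The following is a minimal sketch and not part of Code Search: `parse_samples` and `total_by_label` are hypothetical helpers, the payload and the `/api/search` path in it are made up for illustration, and the `status` label name is assumed from the description of code_search_http_requests_total above.

```python
import re
from collections import defaultdict

# Matches sample lines such as: name{label="value",...} 12.5
# Simplified on purpose: ignores timestamps and escaped quotes in label values.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Yield (metric_name, labels_dict, value) for each sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and HELP/TYPE comments
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        yield name, labels, float(value)


def total_by_label(text, metric, label):
    """Sum all samples of one metric, grouped by the value of one label."""
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == metric:
            totals[labels.get(label, "")] += value
    return dict(totals)


# Made-up payload in the shape a /metrics endpoint returns.
SAMPLE_PAYLOAD = """\
# TYPE code_search_http_requests_total counter
code_search_http_requests_total{method="GET",path="/api/search",status="200"} 120
code_search_http_requests_total{method="POST",path="/api/search",status="200"} 30
code_search_http_requests_total{method="GET",path="/api/search",status="500"} 3
"""

print(total_by_label(SAMPLE_PAYLOAD, "code_search_http_requests_total", "status"))
```

For anything beyond a quick check, a Prometheus server or an existing client library is the better tool; this parser skips parts of the exposition format.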

Add Code Search as a scrape target in your prometheus.yml:

scrape_configs:
  - job_name: 'code-search'
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: /metrics
    scrape_interval: 15s

Example queries for Grafana dashboards:

# Request rate
rate(code_search_http_requests_total[5m])
# Error rate (fraction of requests answered with a 5xx status)
sum(rate(code_search_http_requests_total{status=~"5.."}[5m]))
  / sum(rate(code_search_http_requests_total[5m]))
# Search latency (p99), aggregated across all label combinations
histogram_quantile(0.99, sum by (le) (rate(code_search_search_duration_seconds_bucket[5m])))
# Active jobs
sum(code_search_jobs_total{status="running"})
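
The p99 query estimates a quantile from cumulative bucket counts rather than from individual measurements. The sketch below shows the kind of linear interpolation histogram_quantile() performs on classic histograms. It is a simplified illustration, not Prometheus's implementation: the function is a hypothetical stand-in, the bucket boundaries and counts are invented, and edge cases such as negative bucket bounds are ignored.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Expects at least one finite bucket plus a final +Inf bucket whose count
    is the total number of observations.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    for i, (upper, count) in enumerate(buckets):
        if count >= rank:
            break
    if math.isinf(upper):
        # The quantile lies beyond the last finite bound; report that bound.
        return buckets[-2][0]
    lower = buckets[i - 1][0] if i > 0 else 0.0
    below = buckets[i - 1][1] if i > 0 else 0.0
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)


# Invented cumulative counts for search durations, in seconds.
SEARCH_BUCKETS = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]

for q in (0.5, 0.95, 0.999):
    print(q, histogram_quantile(q, SEARCH_BUCKETS))
```

Because the estimate is interpolated inside a bucket, its accuracy depends on how closely the bucket boundaries bracket the latencies you care about.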

Code Search supports distributed tracing via OpenTelemetry, compatible with Jaeger, Datadog, and other OTLP-compatible backends.

tracing:
  enabled: false              # Enable tracing
  service_name: "code-search"
  service_version: "1.0.0"
  environment: "development"
  endpoint: "localhost:4317"
  protocol: "grpc"            # grpc or http
  sample_rate: 1.0            # 0.0 to 1.0
  insecure: true              # Disable TLS for local dev

Environment Variables:

  • CS_TRACING_ENABLED - Enable/disable tracing
  • CS_TRACING_SERVICE_NAME - Service name in traces
  • CS_TRACING_SERVICE_VERSION - Service version
  • CS_TRACING_ENVIRONMENT - Environment (development, staging, production)
  • CS_TRACING_ENDPOINT - OTLP collector endpoint
  • CS_TRACING_PROTOCOL - Protocol (grpc or http)
  • CS_TRACING_SAMPLE_RATE - Sampling rate (1.0 = 100%)
  • CS_TRACING_INSECURE - Disable TLS
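
The sampling rate is the fraction of traces that are kept. How Code Search makes that decision is not described here; the sketch below assumes a deterministic decision based on the trace ID, similar in spirit to OpenTelemetry's ratio-based sampler. `should_sample` is a hypothetical helper written for this illustration.

```python
import random

TRACE_ID_LIMIT = 1 << 64  # the decision uses the low 64 bits of the trace ID


def should_sample(trace_id, sample_rate):
    """Return True if a trace with this ID is kept at the given rate.

    The result depends only on the trace ID, so every service that sees
    the same trace reaches the same decision.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    bound = round(sample_rate * TRACE_ID_LIMIT)
    return (trace_id % TRACE_ID_LIMIT) < bound


# Rough check: roughly a quarter of random 128-bit trace IDs are kept at 0.25.
rng = random.Random(42)
trace_ids = [rng.getrandbits(128) for _ in range(10_000)]
kept = sum(should_sample(t, 0.25) for t in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```

A rate of 1.0 keeps every trace, which suits local development; lower rates reduce exporter and storage load at the cost of missing some requests.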

Code Search also respects standard OpenTelemetry environment variables:

| Standard Variable | CS Variable | Description |
| --- | --- | --- |
| OTEL_SERVICE_NAME | CS_TRACING_SERVICE_NAME | Service name |
| OTEL_EXPORTER_OTLP_ENDPOINT | CS_TRACING_ENDPOINT | OTLP endpoint |
| OTEL_EXPORTER_OTLP_PROTOCOL | CS_TRACING_PROTOCOL | Protocol |
| OTEL_EXPORTER_OTLP_INSECURE | CS_TRACING_INSECURE | Disable TLS |

For Datadog APM integration:

| Datadog Variable | CS Variable | Description |
| --- | --- | --- |
| DD_TRACE_ENABLED | CS_TRACING_ENABLED | Enable tracing |
| DD_SERVICE | CS_TRACING_SERVICE_NAME | Service name |
| DD_VERSION | CS_TRACING_SERVICE_VERSION | Service version |
| DD_ENV | CS_TRACING_ENVIRONMENT | Environment |

The following operations are traced:

  • HTTP requests - All API endpoints with method, path, and status
  • Search operations - Query execution with search type, query, and result count
  • Database operations - SQL queries with operation type
  • Job processing - Background jobs with job type and duration

  1. Start Jaeger all-in-one:

    docker run -d --name jaeger \
      -p 6831:6831/udp \
      -p 16686:16686 \
      -p 4317:4317 \
      jaegertracing/all-in-one:latest

  2. Configure Code Search:

    tracing:
      enabled: true
      endpoint: "localhost:4317"
      protocol: "grpc"
      insecure: true

  3. Open Jaeger UI at http://localhost:16686

services:
  api:
    image: ghcr.io/techquestsdev/code-search-api:latest
    environment:
      CS_TRACING_ENABLED: "true"
      CS_TRACING_ENDPOINT: "jaeger:4317"
      CS_TRACING_SERVICE_NAME: "code-search-api"
      CS_TRACING_ENVIRONMENT: "production"
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"
      - "4317:4317"

Code Search includes per-client IP rate limiting to protect against abuse.

rate_limit:
  enabled: false          # Enable rate limiting
  requests_per_second: 10 # Requests per second per IP
  burst_size: 20          # Maximum burst size

Environment Variables:

  • CS_RATE_LIMIT_ENABLED - Enable/disable rate limiting
  • CS_RATE_LIMIT_REQUESTS_PER_SECOND - Rate limit (requests/second)
  • CS_RATE_LIMIT_BURST_SIZE - Burst capacity

Rate limiting uses a token bucket algorithm:

  1. Each client IP gets a bucket with burst_size tokens
  2. Tokens refill at requests_per_second rate
  3. Each request consumes one token
  4. Requests without tokens receive HTTP 429 (Too Many Requests)
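
The four steps above can be sketched as follows. This is an illustrative model, not Code Search's implementation: the `TokenBucket` and `RateLimiter` classes are hypothetical, the injected `clock` exists only to make the demo deterministic, and the IP addresses are documentation placeholders.

```python
import time


class TokenBucket:
    """Token bucket for a single client."""

    def __init__(self, requests_per_second, burst_size, clock=time.monotonic):
        self.rate = requests_per_second
        self.capacity = burst_size
        self.tokens = float(burst_size)  # step 1: start with a full bucket
        self.clock = clock
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Step 2: refill for the elapsed time, never exceeding burst_size.
        refill = (now - self.updated) * self.rate
        self.tokens = min(self.capacity, self.tokens + refill)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0  # step 3: each request consumes one token
            return True
        return False  # step 4: the caller answers with HTTP 429


class RateLimiter:
    """Keeps one bucket per client IP."""

    def __init__(self, requests_per_second, burst_size, clock=time.monotonic):
        self.rate = requests_per_second
        self.burst = burst_size
        self.clock = clock
        self.buckets = {}

    def status_for(self, client_ip):
        bucket = self.buckets.get(client_ip)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.burst, self.clock)
            self.buckets[client_ip] = bucket
        return 200 if bucket.allow() else 429


# Demo with a fake clock so the outcome does not depend on real time.
fake_now = [0.0]
limiter = RateLimiter(10, 20, clock=lambda: fake_now[0])
first_burst = [limiter.status_for("203.0.113.7") for _ in range(21)]
print(first_burst.count(200), "allowed,", first_burst.count(429), "rejected")

fake_now[0] += 0.5  # half a second refills 5 tokens at 10 requests/second
after_wait = [limiter.status_for("203.0.113.7") for _ in range(6)]
print(after_wait)
```

With the default values, a client can send 20 requests at once and then sustain 10 per second; a second IP address gets its own full bucket.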

When rate limiting is enabled, responses include:

| Header | Description |
| --- | --- |
| X-RateLimit-Limit | Maximum requests per second |
| X-RateLimit-Remaining | Remaining tokens |
| X-RateLimit-Reset | Seconds until bucket refill |
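
A client can use these headers to back off instead of retrying immediately. The helper below is a hypothetical sketch: it assumes X-RateLimit-Reset carries a plain number of seconds, as described above, and falls back to a fixed delay when the header is missing or unparsable.

```python
def retry_delay(status, headers, default_delay=1.0):
    """Return how many seconds a client should wait before retrying.

    Only HTTP 429 responses trigger a wait; other statuses return 0.
    """
    if status != 429:
        return 0.0
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    raw = normalized.get("x-ratelimit-reset")
    try:
        return max(0.0, float(raw))
    except (TypeError, ValueError):
        return default_delay


print(retry_delay(200, {"X-RateLimit-Remaining": "19"}))
print(retry_delay(429, {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "2"}))
print(retry_delay(429, {}))
```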

| Environment | requests_per_second | burst_size |
| --- | --- | --- |
| Development | disabled | - |
| Internal/staging | 50 | 100 |
| Production | 10-20 | 30-50 |

Code Search provides health check endpoints for orchestrators:

| Endpoint | Purpose | Checks |
| --- | --- | --- |
| GET /health | Liveness probe | Process is running |
| GET /ready | Readiness probe | Database, Redis, Zoekt connectivity |

livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
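
Outside an orchestrator, for example in a deployment script or an integration test, the readiness endpoint can be polled directly. The sketch below is one way to do that using only the standard library. `http_ok` and `wait_until_ready` are hypothetical helpers, and the address in the usage comment assumes the API listens on localhost:8080 as in the examples on this page.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=2.0):
    """Return True if a GET request to url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and non-2xx statuses all count as not ready.
        return False


def wait_until_ready(check, attempts=30, interval=1.0, sleep=time.sleep):
    """Call check() until it returns True or the attempts run out."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(interval)  # pause between attempts, not after the last one
    return False


# Usage (requires a running instance):
# ready = wait_until_ready(lambda: http_ok("http://localhost:8080/ready"))
```

Polling /ready rather than /health matters here: /health only confirms the process is running, while /ready also covers the database, Redis, and Zoekt.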

Full observability stack with Prometheus, Grafana, and Jaeger:

services:
  api:
    image: ghcr.io/techquestsdev/code-search-api:latest
    environment:
      CS_METRICS_ENABLED: "true"
      CS_TRACING_ENABLED: "true"
      CS_TRACING_ENDPOINT: "jaeger:4317"
      CS_RATE_LIMIT_ENABLED: "true"
      CS_RATE_LIMIT_REQUESTS_PER_SECOND: "20"
    ports:
      - "8080:8080"
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
  grafana:
    image: grafana/grafana:latest
    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin
    ports:
      - "3000:3000"
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"
      - "4317:4317"