Sharding Configuration
The sharding section configures horizontal scaling for large deployments with thousands of repositories.
Deployment Modes
Section titled “Deployment Modes”Code Search supports three deployment modes for scaling:
1. Single Indexer (Default)
Section titled “1. Single Indexer (Default)”The simplest deployment with one indexer and its Zoekt sidecar:
- Storage: Single PersistentVolume (ReadWriteOnce)
- Features: Full search, file browsing, replace jobs (with shared storage to API)
- Scale: Handles thousands of repositories
- Best for: Most deployments
# Default - no sharding config neededsharding: enabled: false2. Shared Storage (RWX)
Section titled “2. Shared Storage (RWX)”Multiple indexer workers sharing the same PersistentVolume:
- Storage: ReadWriteMany (NFS, CephFS, EFS, Azure Files)
- Features: All features work - search, file browsing, replace jobs
- Scale: Parallel indexing with queue-based work distribution
- Best for: Faster indexing without complex configuration
# In Kubernetes/Helmindexer: replicaCount: 4 # Multiple workerspersistence: accessMode: ReadWriteMany # Shared storage
# No sharding config neededsharding: enabled: false3. Hash-Based Sharding with Federated Access
Section titled “3. Hash-Based Sharding with Federated Access”Each shard handles a subset of repositories using consistent FNV hashing:
- Storage: Each shard has its own PersistentVolume (ReadWriteOnce)
- Features: Search via parallel Zoekt queries, file browsing/replace via federated proxy
- Scale: Extreme horizontal scaling without shared storage
- Best for: Very large deployments, cloud environments without good RWX options
sharding: enabled: true total_shards: 3 indexer_api_port: 8081 indexer_service: "code-search-indexer-headless" federated_access: trueConfiguration Options
Section titled “Configuration Options”enabled
Section titled “enabled”Enable hash-based sharding mode.
| Property | Value |
|---|---|
| Type | boolean |
| Default | false |
| Environment | CS_SHARDING_ENABLED |
When enabled:
- Repositories are distributed across shards using consistent FNV hashing
- Each indexer only processes repositories assigned to its shard
- Search queries all Zoekt instances in parallel
total_shards
Section titled “total_shards”Total number of indexer shards. Must match the number of indexer replicas.
| Property | Value |
|---|---|
| Type | integer |
| Default | 1 |
| Environment | CS_SHARDING_TOTAL_SHARDS |
The shard index is determined from the pod’s ordinal number (e.g., indexer-0 is shard 0).
indexer_api_port
Section titled “indexer_api_port”HTTP API port exposed by each indexer for federated file/replace access.
| Property | Value |
|---|---|
| Type | integer |
| Default | 8081 |
| Environment | CS_SHARDING_INDEXER_API_PORT |
This port is used internally for API-to-indexer communication. The endpoints are:
GET /files/{repo}/tree- List directory contentsGET /files/{repo}/blob- Get file contentsGET /files/{repo}/exists- Check if repo exists on shardPOST /replace/execute- Execute replace job on shard
indexer_service
Section titled “indexer_service”Headless Kubernetes service name for pod discovery.
| Property | Value |
|---|---|
| Type | string |
| Default | "code-search-indexer-headless" |
| Environment | CS_SHARDING_INDEXER_SERVICE |
The API uses this service to construct pod DNS names for routing requests:
{indexer_service}.{namespace}.svc.cluster.localFor shard N:
code-search-indexer-N.{indexer_service}.{namespace}.svc.cluster.localfederated_access
Section titled “federated_access”Enable federated file browsing and replace jobs via proxy.
| Property | Value |
|---|---|
| Type | boolean |
| Default | false |
| Environment | CS_SHARDING_FEDERATED_ACCESS |
When enabled:
- File browser requests are proxied to the indexer that owns the repository
- Replace jobs are fanned out to all shards and results are merged
- Each indexer runs an HTTP API on
indexer_api_port
Environment Variables
Section titled “Environment Variables”CS_SHARDING_ENABLED="true"CS_SHARDING_TOTAL_SHARDS="3"CS_SHARDING_INDEXER_API_PORT="8081"CS_SHARDING_INDEXER_SERVICE="code-search-indexer-headless"CS_SHARDING_FEDERATED_ACCESS="true"How Sharding Works
Section titled “How Sharding Works”Repository Distribution
Section titled “Repository Distribution”Repositories are assigned to shards using consistent FNV hashing:
shard = FNV32(repoName) % totalShardsThis ensures:
- Same repository always goes to the same shard
- Even distribution across shards
- No coordination needed between shards
Search Flow
Section titled “Search Flow”- API receives search query
- Query is sent to all Zoekt instances in parallel
- Results are merged and returned to the client
File Browsing Flow (Federated)
Section titled “File Browsing Flow (Federated)”- API receives file browse request for repository X
- API calculates:
shard = hash(X) % totalShards - API proxies request to
indexer-{shard}via headless service - Indexer returns file contents from its local storage
Replace Job Flow (Federated)
Section titled “Replace Job Flow (Federated)”- API receives replace request with matches from multiple repos
- API groups matches by shard (based on repository hash)
- API sends execute request to each shard in parallel
- Each shard processes its assigned repositories
- API merges results and returns to client
Helm Chart Configuration
Section titled “Helm Chart Configuration”For Kubernetes deployments, use the Helm chart’s sharding configuration:
sharding: enabled: true replicas: 3 federatedAccess: enabled: true indexerAPIPort: 8081This automatically:
- Creates a StatefulSet with the specified replicas
- Configures the headless service for pod discovery
- Sets up environment variables for sharding
- Exposes the indexer API port
Recommendations
Section titled “Recommendations”| Scenario | Mode | Configuration |
|---|---|---|
| < 1000 repos | Single | Default (no sharding) |
| 1000-5000 repos, fast indexing | Shared Storage | RWX + multiple replicas |
| > 5000 repos, no RWX available | Sharding | Hash-based with federated access |
| Cloud with limited storage options | Sharding | Hash-based with federated access |
Migrating Between Modes
Section titled “Migrating Between Modes”Single → Shared Storage
Section titled “Single → Shared Storage”- Change storage class to RWX
- Increase
indexer.replicaCount - No data migration needed
Shared Storage → Sharding
Section titled “Shared Storage → Sharding”- Enable sharding with
total_shardsmatching desired replicas - Each shard will re-index its assigned repositories
- Consider running re-index during off-peak hours
Sharding → Single/Shared
Section titled “Sharding → Single/Shared”- Disable sharding
- Scale to single replica
- Re-index all repositories to single storage
Next Steps
Section titled “Next Steps”- Indexer Configuration - Configure indexer workers
- Helm Deployment - Deploy with Helm
- Kubernetes Deployment - Deploy with manifests