Indexer Configuration
The indexer section configures the background indexer worker that clones and indexes repositories.
Configuration

    indexer:
      concurrency: 2
      clone_timeout: "10m"
      index_timeout: "30m"

Options
concurrency
Number of concurrent indexing jobs.
| Property | Value |
|---|---|
| Type | integer |
| Default | 2 |
| Environment | CS_INDEXER_CONCURRENCY |
Higher concurrency means faster indexing but more resource usage (CPU, memory, disk I/O).
Recommendations:
- Small deployments (< 100 repos): 1-2
- Medium deployments (100-1000 repos): 2-4
- Large deployments (> 1000 repos): 4-8
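The recommendations above can be written as a small lookup. This is only an illustrative sketch: the function name `recommended_concurrency` is hypothetical, and how the boundary values (exactly 100 or 1000 repos) are bucketed is an assumption, since the ranges above do not say.

```python
def recommended_concurrency(repo_count: int) -> tuple[int, int]:
    """Return the (low, high) concurrency range suggested for a deployment size."""
    if repo_count < 0:
        raise ValueError("repo_count must be non-negative")
    if repo_count < 100:
        return (1, 2)  # small deployment
    if repo_count <= 1000:
        return (2, 4)  # medium deployment (boundary handling is an assumption)
    return (4, 8)      # large deployment
```

Start at the low end of the range and raise the value only while CPU, memory, and disk I/O have headroom.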
clone_timeout
Maximum time allowed for cloning a repository.
| Property | Value |
|---|---|
| Type | duration |
| Default | "10m" |
| Environment | CS_INDEXER_CLONE_TIMEOUT |
Increase this for large repositories or slow network connections.
Examples:
- "10m" - 10 minutes (default)
- "30m" - 30 minutes (for large repos)
- "1h" - 1 hour (for very large monorepos)
index_timeout
Maximum time allowed for indexing a repository with Zoekt.
| Property | Value |
|---|---|
| Type | duration |
| Default | "30m" |
| Environment | CS_INDEXER_INDEX_TIMEOUT |
Increase this for very large repositories.
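To make the option tables above concrete, here is a minimal sketch of reading the three settings from their environment variables and falling back to the documented defaults. It is not the indexer's actual loader: the names `parse_duration` and `load_indexer_settings` are hypothetical, the duration parser only accepts whole-number `h`/`m`/`s` components (an assumption based on the examples shown), and how environment variables interact with the config file is not modeled.

```python
import os
import re

# Defaults copied from the option tables above.
DEFAULTS = {
    "CS_INDEXER_CONCURRENCY": "2",
    "CS_INDEXER_CLONE_TIMEOUT": "10m",
    "CS_INDEXER_INDEX_TIMEOUT": "30m",
}

_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}
_DURATION_PART = re.compile(r"(\d+)([hms])")


def parse_duration(text: str) -> int:
    """Convert a duration such as "10m" or "1h30m" to whole seconds.

    Assumption: only integer hour/minute/second components are supported.
    """
    parts = _DURATION_PART.findall(text)
    # Reject input with anything left over besides number+unit pairs.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in parts)


def load_indexer_settings(env=None) -> dict:
    """Read indexer settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env

    def get(name: str) -> str:
        return env.get(name, DEFAULTS[name])

    return {
        "concurrency": int(get("CS_INDEXER_CONCURRENCY")),
        "clone_timeout_s": parse_duration(get("CS_INDEXER_CLONE_TIMEOUT")),
        "index_timeout_s": parse_duration(get("CS_INDEXER_INDEX_TIMEOUT")),
    }
```

Passing a plain dictionary instead of `os.environ` makes the sketch easy to try out, for example `load_indexer_settings({"CS_INDEXER_CLONE_TIMEOUT": "1h"})`.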
Environment Variables

    CS_INDEXER_CONCURRENCY="2"
    CS_INDEXER_CLONE_TIMEOUT="10m"
    CS_INDEXER_INDEX_TIMEOUT="30m"

How the Indexer Works
- Poll Queue - Indexer polls Redis for pending jobs
- Clone Repository - Git clone/fetch the repository
- Run Zoekt Index - Create search index
- Update Database - Mark repository as indexed
- Notify - Signal Zoekt to reload indexes
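The steps above can be sketched as a simple loop. This is a conceptual illustration only, not the indexer's code: an in-memory `deque` stands in for the Redis queue, and `run_job`, `drain_queue`, and the four step callbacks are hypothetical names introduced for the example.

```python
from collections import deque


def run_job(job, clone, index, mark_indexed, notify):
    """Run one job through the clone, index, update, and notify steps."""
    repo_path = clone(job["repo"])  # clone or fetch the repository
    index(repo_path)                # build the search index
    mark_indexed(job["repo"])       # record the repository as indexed
    notify()                        # signal that indexes should be reloaded


def drain_queue(queue, **steps):
    """Poll the queue until it is empty and return the number of jobs run."""
    processed = 0
    while queue:
        run_job(queue.popleft(), **steps)
        processed += 1
    return processed


# Stand-in callbacks that only record the order in which steps run.
events = []


def fake_clone(repo):
    events.append(("clone", repo))
    return f"/data/repos/{repo}"


def fake_index(path):
    events.append(("index", path))


def fake_mark_indexed(repo):
    events.append(("update", repo))


def fake_notify():
    events.append(("notify",))


queue = deque([{"type": "index", "repo": "myrepo"}])
processed = drain_queue(
    queue,
    clone=fake_clone,
    index=fake_index,
    mark_indexed=fake_mark_indexed,
    notify=fake_notify,
)
```

Because each step depends on the previous one succeeding, an exception in `clone` or `index` stops the job before the repository is marked as indexed.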
Job Types
| Type | Description |
|---|---|
| index | Initial indexing of a new repository |
| sync | Re-sync an existing repository (fetch + re-index) |
| replace | Execute a search-and-replace operation |
Scaling Indexers
There are three ways to scale indexing:
1. Multiple Workers with Shared Storage
Run multiple indexer instances sharing the same storage:

    # Docker Compose
    docker compose up -d --scale indexer=4

    # Kubernetes (requires ReadWriteMany PVC)
    kubectl scale deployment code-search-indexer --replicas=4

Each indexer processes jobs independently from the Redis queue. All workers share the same index and repos directories.
Requirements: ReadWriteMany (RWX) storage class (NFS, CephFS, EFS, Azure Files)
2. Hash-Based Sharding
For very large deployments or when RWX storage isn’t available, use hash-based sharding:

    sharding:
      enabled: true
      total_shards: 3
      federated_access: true

Each shard:
- Has its own PersistentVolume (ReadWriteOnce)
- Processes only repositories assigned to it via consistent hashing
- Runs its own Zoekt instance
See Sharding Configuration for details.
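To illustrate how consistent hashing can map repositories to shards, here is a minimal hash-ring sketch. It is not the project's implementation: the class name `ShardRing`, the use of SHA-256, the virtual-node count, and the `shard-N-vnode-M` key format are all assumptions made for the example.

```python
import bisect
import hashlib


def _point(key: str) -> int:
    """Map a string to a position on the ring (first 8 bytes of SHA-256)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ShardRing:
    """Hash ring with virtual nodes; each repo belongs to the next point clockwise."""

    def __init__(self, total_shards: int, vnodes: int = 64) -> None:
        if total_shards < 1:
            raise ValueError("total_shards must be at least 1")
        points = sorted(
            (_point(f"shard-{shard}-vnode-{v}"), shard)
            for shard in range(total_shards)
            for v in range(vnodes)
        )
        self._hashes = [h for h, _ in points]
        self._owners = [s for _, s in points]

    def shard_for(self, repo: str) -> int:
        """Return the shard index that owns the given repository name."""
        # Wrap around to the first point when the repo hashes past the last one.
        i = bisect.bisect_right(self._hashes, _point(repo)) % len(self._hashes)
        return self._owners[i]
```

The point of a ring over a plain `hash % total_shards` is stability: when a shard is added, a repository either stays where it was or moves to the new shard, so existing shards do not have to re-index each other's repositories.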
3. Single Indexer (Default)
For smaller deployments (< 1000 repos), a single indexer handles everything:
- Simpler to operate
- No shared storage requirements
- Scale vertically with more CPU/memory
Resource Requirements
Indexing is CPU-intensive. Each concurrent job uses approximately 1 CPU core.
Memory
Memory usage depends on repository size:
- Small repos (< 100 MB): ~512 MB per job
- Medium repos (100 MB - 1 GB): ~1 GB per job
- Large repos (> 1 GB): ~2-4 GB per job
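The CPU and memory figures above can be combined into a rough sizing estimate. The sketch below is a back-of-the-envelope helper, not a guarantee: the function names are hypothetical, 1 GB is treated as 1024 MB, and it assumes the worst case in which every concurrent job is working on a repository as large as the largest one.

```python
def memory_per_job_mb(repo_size_mb: float) -> tuple[int, int]:
    """Return the (low, high) per-job memory estimate in MB for a repo size."""
    if repo_size_mb < 100:
        return (512, 512)    # small repos: ~512 MB
    if repo_size_mb <= 1024:
        return (1024, 1024)  # medium repos: ~1 GB
    return (2048, 4096)      # large repos: ~2-4 GB


def estimate_resources(concurrency: int, largest_repo_mb: float) -> dict:
    """Estimate CPU cores and memory for a given concurrency setting."""
    low, high = memory_per_job_mb(largest_repo_mb)
    return {
        "cpu_cores": concurrency,  # approximately 1 core per concurrent job
        "memory_mb_low": concurrency * low,
        "memory_mb_high": concurrency * high,
    }
```

For example, `estimate_resources(4, 2048)` suggests planning for about 4 cores and 8 to 16 GB of memory before any overhead from the container or operating system.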
Disk I/O
Indexing is disk I/O intensive. Use SSDs for best performance.
Network
Initial clones download the full repository. Subsequent syncs only fetch changes.
Git Configuration
The indexer uses these Git settings:

    # Shallow clone for initial index (faster)
    git clone --depth 1 --single-branch

    # Full history for sync operations
    git fetch --all

Authentication
Git authentication is handled via the connection’s access token. The token is used for HTTPS cloning:

    https://oauth2:{token}@github.com/org/repo.git

Branch Support
By default, only the default branch (usually main or master) is indexed. Zoekt supports indexing multiple branches using the -branches flag.
How Branch Indexing Works
When a repository is indexed, the indexer runs:

    zoekt-git-index -index /data/index -branches main,develop /data/repos/myrepo

This creates searchable indexes for each specified branch.
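As a sketch of how the command above might be assembled for an arbitrary branch list, the helper below builds the argument vector without running it. The function name `zoekt_index_command` is hypothetical, and it uses only the `-index` and `-branches` flags shown above.

```python
def zoekt_index_command(index_dir: str, repo_dir: str, branches: list[str]) -> list[str]:
    """Build the zoekt-git-index argument list for the given branches."""
    if not branches:
        raise ValueError("at least one branch is required")
    return [
        "zoekt-git-index",
        "-index", index_dir,
        "-branches", ",".join(branches),  # comma-separated, no spaces
        repo_dir,
    ]


argv = zoekt_index_command("/data/index", "/data/repos/myrepo", ["main", "develop"])
```

Keeping the command as a list rather than a single string avoids shell quoting problems if it is later passed to something like `subprocess.run(argv, check=True)`.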
Searching Branches
Use the branch: filter to search specific branches:

    FOO branch:develop

Omitting the branch: filter searches the default branch (HEAD).
Current Limitations
- Tags are not currently indexed (only branches)
- Multi-branch indexing requires manual configuration per repository
- The default behavior indexes only the default branch for efficiency
Future Enhancements
Planned improvements include:
- Per-repository branch configuration
- Tag indexing support
- Automatic branch discovery based on patterns
Troubleshooting
Clone timeout

    job failed: clone timeout after 10m

Increase clone_timeout for large repositories:

    indexer:
      clone_timeout: "30m"

Index timeout

    job failed: index timeout after 30m

Increase index_timeout for very large repositories:

    indexer:
      index_timeout: "1h"

Out of memory
If the indexer is killed by OOM:
- Reduce concurrency
- Increase container/pod memory limits
- Exclude very large repositories
Disk full
The indexer needs space for:
- Git clones (repos_dir)
- Zoekt indexes (index_dir)
- Temporary files during indexing
Monitor disk usage and increase storage as needed.
Jobs stuck in “running”
If jobs are stuck:
- Check indexer logs for errors
- Restart the indexer: docker compose restart indexer
- Failed jobs will be retried