IOTA Indexer
The iota-indexer binary in writer mode ingests checkpoints from an IOTA checkpoint source, writes them into PostgreSQL tables, and maintains that database over time (migrations on start-up, pruning per retention policy). See the Extended Data Services overview for how it fits together with the JSON-RPC and GraphQL reader services.
The iota-indexer binary also has a run-backfill subcommand, a one-off maintenance tool for filling specific tables over a historical range, typically after a schema change. It is not part of normal operation.
Scope
The writer:
- Downloads checkpoints from a remote source (a fullnode gRPC endpoint or a historical checkpoint store).
- Indexes each checkpoint into structured Postgres tables (objects, transactions, events, indexes for tx_* and event_* filters, and more).
- Runs the needed schema migrations on the database on start-up. A database ahead of the binary (migrations the binary doesn't know about) fails to start; a database behind the binary is migrated forward.
- Optionally runs background pruning tasks that enforce per-table retention policies. See Pruning below.
- Exposes Prometheus metrics for monitoring, such as ingestion lag and indexing throughput.
The writer is the main component that writes to the indexer database.
Reader services (JSON-RPC, GraphQL) read from the same database. They write only to optimistic_transactions, objects, and display, as part of optimistic indexing.
Run a single writer instance per database. Running more than one against the same database is not supported.
Dependencies
- A PostgreSQL database. The database must not be ahead of the binary; on start-up the writer applies any pending migrations and refuses to start if the database contains migrations the binary doesn't know about. See Appendix: PostgreSQL configuration for a recommended postgresql.conf.
- A checkpoint source. At least one of:
  - --remote-store-url pointing at a fullnode gRPC endpoint (e.g. http://0.0.0.0:50051).
  - --remote-store-url pointing at a historical checkpoint store hosting batched checkpoint files.
  - --data-ingestion-path pointing at a local directory that a colocated fullnode populates with checkpoint files.
- Optionally a live checkpoint store (--live-checkpoints-store-url) that serves the latest checkpoints not yet in the historical archive. Combine it with a historical --remote-store-url to get full coverage from genesis with low tip latency. See the Hybrid Historical Checkpoint Store section in the overview.
URLs for the publicly hosted historical and live checkpoint stores are listed under Checkpoint sources.
Hardware requirements
Disk is the main requirement and depends on the configured retention. See the sizing table under Pruning for predicted database sizes.
Run the writer on the same network as Postgres to keep latency low.
Running the writer
A minimal start command, ingesting from a fullnode gRPC endpoint into a local Postgres database, without pruning:
iota-indexer \
--db-url postgres://<user>:<password>@<host>:5432/<db> \
indexer \
--remote-store-url http://<fullnode-host>:50051
Alternatively, ingesting from a historical checkpoint store combined with a live checkpoint store for minimal tip latency:
iota-indexer \
--db-url postgres://<user>:<password>@<host>:5432/<db> \
indexer \
--remote-store-url https://checkpoints.mainnet.iota.cafe/ingestion/historical \
--live-checkpoints-store-url https://checkpoints.mainnet.iota.cafe/ingestion/live
For networks other than mainnet, use the matching URLs from the Checkpoint sources section below.
Configuration
Settings come from CLI flags and matching environment variables on the indexer subcommand. The only configuration file the writer reads is the optional pruning retention TOML described in the Pruning configuration file (TOML) section below. All flags are listed in the reference tables at the bottom of the page.
Database connection
- --db-url: Postgres connection URL (e.g. postgres://<user>:<password>@<host>:5432/<db>). See the Postgres docs for the full URI syntax.
- --pool-size: connection pool size. Raise it for higher concurrency, but stay within the Postgres max_connections setting and account for other clients of the same database.
- --connection-timeout: caps how long a request waits for a free database connection before failing.
- --statement-timeout: caps how long a single SQL query may run before it's cancelled.
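The pool-size guidance above amounts to a simple connection budget. A minimal sketch in Python, assuming the max_connections and superuser_reserved_connections values from the appendix and a hypothetical deployment in which two reader services share the database (the reader pool sizes are illustrative assumptions, not documented defaults):

```python
def connection_headroom(max_connections: int, reserved: int, pool_sizes: list[int]) -> int:
    """Connections left over once every client pool is fully used."""
    return max_connections - reserved - sum(pool_sizes)


# 2000 and 5 come from the appendix postgresql.conf; the first 100 is the
# writer's default --pool-size. The two reader pools are hypothetical.
headroom = connection_headroom(2000, 5, [100, 100, 100])
print(headroom)  # 1695
```

A negative result would mean the combined pools can exceed what Postgres accepts, so some connection attempts would fail under load.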
Checkpoint sources
At least one source must be configured. The CLI accepts:
- --remote-store-url: a fullnode gRPC URL or a historical checkpoint store URL. A historical checkpoint store covers all checkpoints from genesis; a fullnode only serves the checkpoints it hasn't pruned. Available historical stores:
  - Mainnet: https://checkpoints.mainnet.iota.cafe/ingestion/historical
  - Testnet: https://checkpoints.testnet.iota.cafe/ingestion/historical
  - Devnet: https://checkpoints.devnet.iota.cafe/ingestion/historical
- --data-ingestion-path: a local filesystem directory that a colocated fullnode populates with checkpoint files.
- --live-checkpoints-store-url: a live checkpoint store URL. Used as a fallback for the latest checkpoints that aren't yet in the historical archive. Requires --remote-store-url to also be set. Available live stores:
  - Mainnet: https://checkpoints.mainnet.iota.cafe/ingestion/live
  - Testnet: https://checkpoints.testnet.iota.cafe/ingestion/live
  - Devnet: https://checkpoints.devnet.iota.cafe/ingestion/live
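The source rules above (at least one of --remote-store-url or --data-ingestion-path, and --live-checkpoints-store-url only together with --remote-store-url) can be checked before launching the writer. The following Python sketch is a hypothetical pre-flight helper, not part of iota-indexer; the function name and messages are assumptions made for illustration:

```python
from typing import Optional


def check_sources(
    remote_store_url: Optional[str] = None,
    data_ingestion_path: Optional[str] = None,
    live_checkpoints_store_url: Optional[str] = None,
) -> list[str]:
    """Return the problems with a flag combination (an empty list means OK)."""
    problems = []
    if not remote_store_url and not data_ingestion_path:
        problems.append("set --remote-store-url or --data-ingestion-path")
    if live_checkpoints_store_url and not remote_store_url:
        problems.append("--live-checkpoints-store-url requires --remote-store-url")
    return problems


# Historical store plus live store: a valid combination.
ok = check_sources(
    remote_store_url="https://checkpoints.mainnet.iota.cafe/ingestion/historical",
    live_checkpoints_store_url="https://checkpoints.mainnet.iota.cafe/ingestion/live",
)
```

The writer performs its own validation on start-up; a check like this only surfaces a misconfiguration earlier, for example in a deployment script.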
Colocated fullnode
--data-ingestion-path requires a colocated fullnode dumping each executed checkpoint into a shared directory. Set data-ingestion-dir in the fullnode YAML and point the writer's --data-ingestion-path at the same path.
Prometheus metrics
The Prometheus endpoint exposes ingestion lag, counts of committed checkpoints, transactions, and epochs, indexing latencies, database connection pool size, etc.
--metrics-address sets the address the Prometheus metrics endpoint binds to (default 0.0.0.0:9184).
Resetting the database
Passing --reset-db on the indexer subcommand drops every table on start-up, re-applies all migrations, and restarts ingestion from scratch. The operation is destructive and affects any reader sharing the same database, so use it only on a database you intend to fully rebuild.
Pruning
A running IOTA Indexer accumulates one row per checkpoint, transaction, and event the network produces, so the database grows without bound as long as the network progresses. For deployments that intend to run for months or years, that growth may not be sustainable.
The indexer addresses this with opt-in pruning: the operator declares a retention period (a number of most recent epochs) per table, and at each epoch boundary the indexer schedules anything older for deletion; the rows are removed after a configurable delay (see How pruning is applied). Data older than the retention period becomes inaccessible to reads against the indexer database. A few tables are never pruned; see Non-prunable tables for the list.
Because retention is configured per table, the operator decides which data to retain. Shrinking the retention window of a table shortens the time range over which queries backed by that table return results.
The effect of pruning a table depends on its role and the queries it backs:
- Point lookups (transactions, events, checkpoints) - direct lookup of a transaction by digest, an event by digest, etc. Trying to fetch an old digest whose data has been pruned will result in an error, unless a KV Store fallback is configured. For point-lookup tables, configuring a fallback lets the data be served for both pruned and unpruned ranges.
- Filtered queries (tx_*, event_*) - list transactions or events by attribute (sender, recipient, emitter package, etc.). Pruned transactions/events will be omitted from the result, with no explicit error returned. This case cannot be handled by the fallback service.
- Historical object tables (objects_backward_history) - back queries that read past versions of objects (e.g., consistent reads at a checkpoint, dynamic-field state at a specific point in time). Pruning shortens the time range over which such historical-state queries can be served; queries that target a checkpoint older than the retained range will fail or return missing data.
- Optimistic tables (optimistic_transactions) - Data in those tables is needed only for short periods of time and doesn't need long retention. It can be safely pruned with no side effects on data availability, as long as current and previous epochs are kept.
The subsections below cover sizing and retention guidance, how to enable pruning, the configuration file format, the schedule on which deletions happen, how to monitor pruner progress, how to reclaim disk space, and a full inventory of prunable and non-prunable tables.
Sizing and retention
Assuming a sustained 500 user TPS, the predicted DB size at typical retention windows is:
| Retention | DB size |
|---|---|
| 2 days | ~604 GB |
| 7 days | ~1.6 TB |
| 30 days | ~6.3 TB |
| 1 year | ~74 TB |
We recommend 30 epochs of retention as a default for all tables, and 2 epochs for objects_backward_history and optimistic_transactions.
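For retention windows between the rows of the sizing table, a rough estimate can be read off by linear interpolation. The Python sketch below assumes only the four data points from the table (with 1.6 TB taken as 1600 GB, and so on) and the same 500 user TPS load; it is an interpolation aid, not a measured prediction:

```python
# (retention in days, predicted size in GB), copied from the sizing table.
SIZING_POINTS = [(2, 604), (7, 1600), (30, 6300), (365, 74000)]


def estimate_db_size_gb(retention_days: float) -> float:
    """Linearly interpolate the predicted DB size between table rows."""
    first_day, last_day = SIZING_POINTS[0][0], SIZING_POINTS[-1][0]
    if not first_day <= retention_days <= last_day:
        raise ValueError(f"retention must be within {first_day}..{last_day} days")
    for (d0, s0), (d1, s1) in zip(SIZING_POINTS, SIZING_POINTS[1:]):
        if d0 <= retention_days <= d1:
            return s0 + (s1 - s0) * (retention_days - d0) / (d1 - d0)
    raise AssertionError("unreachable: the range check above covers all inputs")


two_weeks_gb = estimate_db_size_gb(14)  # roughly 3 TB
```

The table rows grow by roughly 200 GB per retained day, which is why a straight-line estimate between neighbouring rows is a reasonable first approximation.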
Enabling pruning
Pruning is enabled by passing --pruning-config-path <PATH> to the indexer subcommand, pointing at a TOML file that describes retention policies. If no config path is supplied, pruning is disabled.
A sync worker with pruning enabled is started like this:
iota-indexer \
--db-url <DATABASE_URL> \
indexer \
--remote-store-url <REMOTE_STORE_URL> \
--pruning-config-path <PATH_TO_PRUNING_TOML>
Pruning configuration file (TOML)
The TOML file contains a default retention (in epochs) and an optional table of per-table overrides:
# Default retention applied to every prunable table (in epochs).
epochs_to_keep = 10
# Per-table overrides. Keys are the snake_case table names listed below.
[overrides]
transactions = 5
events = 5
tx_senders = 3
Every retention value — both the default and each override — must be strictly greater than zero.
Defaults. When the indexer resolves a final retention map, every prunable table receives the configured epochs_to_keep value. Any value supplied in [overrides] replaces the default for that table.
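The resolution rule can be sketched in a few lines of Python. This is an illustration of the behaviour described above, not the indexer's implementation; the table list is a small illustrative subset of the prunable tables, and the input is assumed to be the already-parsed TOML document (for example the result of tomllib.loads on Python 3.11 or later):

```python
# Illustrative subset of the prunable tables, not the full inventory.
EXAMPLE_TABLES = ["transactions", "events", "checkpoints", "tx_senders", "tx_digests"]


def resolve_retention(config: dict, tables: list[str] = EXAMPLE_TABLES) -> dict[str, int]:
    """Apply epochs_to_keep to every table, then let [overrides] replace it."""
    default = config["epochs_to_keep"]
    overrides = config.get("overrides", {})
    # Every retention value, default or override, must be strictly positive.
    for name, epochs in {"epochs_to_keep": default, **overrides}.items():
        if epochs <= 0:
            raise ValueError(f"{name} must be greater than zero, got {epochs}")
    return {table: overrides.get(table, default) for table in tables}


# The example configuration shown earlier in this section, already parsed.
parsed = {"epochs_to_keep": 10, "overrides": {"transactions": 5, "events": 5, "tx_senders": 3}}
retention = resolve_retention(parsed)
# checkpoints and tx_digests fall back to 10; the three overrides win.
```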
Selective pruning. The configuration model is all-or-nothing at the worker level: there is no flag to skip a specific table. Once pruning is enabled, the pruner evaluates every prunable table. To approximate per-table opt-out, set a very large epochs_to_keep override for any table you want to retain. For example, to prune only tx_senders and keep the rest effectively forever:
# Default high enough that no real network will ever reach it.
epochs_to_keep = 1_000_000
[overrides]
tx_senders = 5
Inter-table dependencies. Some tables only return useful data when their dependencies are present. For example, the tx_* index tables (tx_senders, tx_calls_*, tx_changed_objects, …) are looked up via the tx_digests table; setting a shorter retention on tx_digests than on the tx_* indexes means the indexes still hold rows that can no longer be resolved to a transaction digest. Keep retention on tx_digests at least as long as the retention on any other tx_* table that depends on it.
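A resolved retention map can be checked for this condition before it is deployed. The Python sketch below is a hypothetical helper; it assumes that every table whose name starts with tx_ depends on tx_digests, which is a simplification of the dependency described above:

```python
def tables_outliving_tx_digests(retention: dict[str, int]) -> list[str]:
    """List tx_* tables that are retained longer than tx_digests."""
    digests_epochs = retention["tx_digests"]
    return sorted(
        table
        for table, epochs in retention.items()
        if table.startswith("tx_") and table != "tx_digests" and epochs > digests_epochs
    )


# tx_senders keeps 5 epochs but tx_digests only 3, so tx_senders is flagged.
flagged = tables_outliving_tx_digests(
    {"tx_digests": 3, "tx_senders": 5, "tx_kinds": 3, "events": 10}
)
```

An empty result means no tx_* index is configured to outlive the digests it resolves through.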
How pruning is applied
The pruner runs as a set of background tasks inside the sync worker process:
- A watermarks-update task polls every few seconds. When it detects that the network has advanced to a new epoch, it recomputes each prunable table's lower-bound watermark from the retention policy and writes the new bound into the watermarks table.
- A per-table pruner task runs in parallel for each prunable table, polling every few seconds. When that table's watermark has been advanced, the task waits a configurable delay (--pruning-delay-ms, default 2 hours) before actually deleting data; the delay protects in-flight reads against losing rows mid-query. It then deletes in batches sized by --pruning-batch-size (default 1000), sleeping briefly between chunks to limit pressure on the database.
As a result, disk usage drops some time after the epoch transition, not at the boundary itself. Likewise, when a sync worker is started with pruning enabled for the first time, deletions will not begin sooner than the configured --pruning-delay-ms after the worker starts.
Monitoring pruning progress
Pruner state is exposed via the watermarks table — one row per table (every prunable table plus a few bookkeeping entries). Useful columns:
| Column | Meaning |
|---|---|
| entity | Name of the table this row describes. |
| current_epoch | Most recent epoch ingestion has written into the table. |
| min_available_epoch / min_available_cp / min_available_tx | Lowest epoch / checkpoint / transaction sequence number still queryable. Anything below is scheduled for pruning. |
| lowest_unpruned_key | Next epoch / checkpoint / transaction sequence number the pruner will actually delete from the database. Anything below has already been removed. |
| min_bounds_updated_at_timestamp_ms | Timestamp of the last lower-bound update. The deletion delay (--pruning-delay-ms, 2 hours by default) is measured from this value. |
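Two numbers are often useful when reading a watermarks row: when deletion becomes due, and how much is scheduled but not yet deleted. The Python sketch below derives both from the column meanings in the table; the helper names are hypothetical, and it assumes the caller passes the min_available_* column that matches the table's pruning unit (epoch, checkpoint, or transaction):

```python
DEFAULT_PRUNING_DELAY_MS = 7_200_000  # default of --pruning-delay-ms (2 hours)


def deletion_due_at_ms(
    min_bounds_updated_at_timestamp_ms: int,
    pruning_delay_ms: int = DEFAULT_PRUNING_DELAY_MS,
) -> int:
    """Earliest time (ms since epoch) at which the pruner may start deleting."""
    return min_bounds_updated_at_timestamp_ms + pruning_delay_ms


def pruning_backlog(min_available: int, lowest_unpruned_key: int) -> int:
    """Units scheduled for pruning (below min_available) but not yet deleted."""
    return max(0, min_available - lowest_unpruned_key)


due = deletion_due_at_ms(1_700_000_000_000)  # two hours after the bound moved
backlog = pruning_backlog(min_available=500, lowest_unpruned_key=420)
```

A backlog that stays above zero long after the delay has elapsed suggests the pruner is not keeping up with the configured retention.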
Storage reclamation
Enabling pruning on a previously unpruned indexer — or shortening retention on an already-pruned one — does not immediately return the freed disk space to the operating system. Pruning deletes rows from Postgres; autovacuum reclaims the dead tuples for future inserts but does not shrink the on-disk footprint of tables and indexes.
What pruning does guarantee is bounded growth going forward: pruned tables stop accumulating data past the configured retention window.
To recover the existing on-disk footprint, run a compaction tool against the database (for example VACUUM FULL, REINDEX TABLE CONCURRENTLY, pg_repack, or pg_squeeze). Each option has different operational caveats — VACUUM FULL holds an ACCESS EXCLUSIVE lock for the full table rewrite, while the others run online but need temporary disk space. Review the linked Postgres docs before running any of these against a live database. The IOTA Indexer does not invoke these itself.
To prevent pruned tables from re-bloating over time, autovacuum must run aggressively enough to keep up with the pruner's deletes. The following per-table autovacuum settings are a good starting point for every prunable table:
vacuum_index_cleanup = on
autovacuum_vacuum_scale_factor = 0.01
autovacuum_vacuum_cost_limit = 500
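In Postgres, per-table autovacuum settings are storage parameters applied with ALTER TABLE ... SET (...). The Python sketch below only generates the statements for a list of tables chosen by the operator; the helper name is hypothetical, and the statements should be reviewed before being run with a SQL client such as psql:

```python
# The three per-table settings listed above.
AUTOVACUUM_SETTINGS = {
    "vacuum_index_cleanup": "on",
    "autovacuum_vacuum_scale_factor": "0.01",
    "autovacuum_vacuum_cost_limit": "500",
}


def autovacuum_statements(tables: list[str]) -> list[str]:
    """Build one ALTER TABLE statement per table name."""
    params = ", ".join(f"{key} = {value}" for key, value in AUTOVACUUM_SETTINGS.items())
    statements = []
    for table in tables:
        # Guard against typos or injected SQL in the table name.
        if not table.isidentifier():
            raise ValueError(f"unexpected table name: {table!r}")
        statements.append(f'ALTER TABLE "{table}" SET ({params});')
    return statements


for statement in autovacuum_statements(["tx_senders", "tx_digests"]):
    print(statement)
```

The usage above covers only DELETE-pruned tables; transactions and events are pruned by dropping whole partitions, so they are left out here.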
On Docker-hosted Postgres, also raise shm_size to at least 2g — the default 64M is too small for autovacuum to complete on large prunable tables, which silently disables the cleanup.
Prunable tables
Tables are pruned using one of three strategies, determined by how the table is structured:
| Strategy | Mechanism | Disk reclaimed | Tables |
|---|---|---|---|
| By epoch partition | Drop the partition for each retired epoch. | Immediately | transactions, events |
| By checkpoint | DELETE rows below a checkpoint cutoff. | After compaction | checkpoints, pruner_cp_watermark, objects_backward_history |
| By transaction | DELETE rows below a transaction-sequence cutoff. | After compaction | event_emit_package, event_emit_module, event_senders, event_struct_instantiation, event_struct_module, event_struct_name, event_struct_package, tx_calls_pkg, tx_calls_mod, tx_calls_fun, tx_changed_objects, tx_digests, tx_input_objects, tx_kinds, tx_recipients, tx_senders, tx_wrapped_or_deleted_objects, tx_global_order, optimistic_transactions |
Tables not listed here are never pruned by the indexer.
Non-prunable tables
Several tables hold data that is retained for the full lifetime of the database. They are not affected by any retention setting and will not shrink when pruning is enabled. The largest contributors are objects_version, objects_snapshot, and objects.
Appendix: PostgreSQL configuration
The postgresql.conf below is the recommended configuration for an indexer host with roughly 96 GB of RAM and NVMe storage (mainnet-class hardware). Scale memory and parallelism values to your hardware.
# Connections
max_connections = 2000
listen_addresses = '*'
superuser_reserved_connections = 5
# Memory
shared_buffers = 32GB
effective_cache_size = 64GB
work_mem = 64MB
maintenance_work_mem = 2GB
temp_buffers = 64MB
wal_buffers = 64MB
# Write-ahead log
wal_level = replica
synchronous_commit = off
full_page_writes = on
wal_compression = off
wal_writer_delay = 200ms
checkpoint_completion_target = 0.9
checkpoint_timeout = 15min
min_wal_size = 4GB
max_wal_size = 32GB
# Query planner
default_statistics_target = 200
random_page_cost = 1.0
effective_io_concurrency = 200
# Parallelism
max_worker_processes = 24
max_parallel_workers = 16
max_parallel_workers_per_gather = 4
max_parallel_maintenance_workers = 4
# Logging
log_min_duration_statement = 500
log_checkpoints = on
log_autovacuum_min_duration = 250
log_temp_files = 512kB
log_statement = 'none'
log_line_prefix = '%t [%p]: [%l-1] user=%u db=%d app=%a client=%h '
# Other
huge_pages = try
max_files_per_process = 10000
archive_mode = off
# Timeouts
statement_timeout = 6000
Per-table autovacuum tuning and shm_size guidance that pair with this config are documented in Storage reclamation above.
All settings
Top-level iota-indexer flags
| Flag | Env var | Default | Description |
|---|---|---|---|
| --db-url | (none) | (required) | Postgres connection URL. Also accepted as --database-url. |
| --pool-size | DB_POOL_SIZE | 100 | Number of parallel Postgres connections held in the pool. |
| --connection-timeout | DB_CONNECTION_TIMEOUT | 30 (seconds) | How long a request waits for a free database connection before failing. |
| --statement-timeout | DB_STATEMENT_TIMEOUT | 3600 (seconds) | Cap on how long a single SQL query may run before it's cancelled. |
| --metrics-address | (none) | 0.0.0.0:9184 | Address the Prometheus metrics endpoint binds to. |
indexer subcommand flags
| Flag | Env var | Default | Description |
|---|---|---|---|
| --remote-store-url | (none) | (none) | Fullnode gRPC URL or historical checkpoint store URL serving batched checkpoint files. |
| --live-checkpoints-store-url | (none) | (none) | Live checkpoint store URL. Fallback for the latest checkpoints not yet in the historical archive. Requires --remote-store-url. |
| --data-ingestion-path | (none) | (none) | Local directory populated with checkpoint files by a colocated fullnode. |
| --checkpoint-download-queue-size | DOWNLOAD_QUEUE_SIZE | 200 | Maximum number of checkpoints buffered for indexing. |
| --checkpoint-download-queue-size-bytes | CHECKPOINT_PROCESSING_BATCH_DATA_LIMIT | 20000000 | Byte cap on the checkpoint download queue. |
| --checkpoint-download-timeout | INGESTION_READER_TIMEOUT_SECS | 20 | Per-checkpoint download timeout, in seconds. |
| --pruning-config-path | (none) | (none) | Path to a TOML file describing per-table retention policies. Enables pruning when set. See Pruning. |
| --pruning-delay-ms | PRUNING_DELAY_MS | 7200000 (ms) | Delay between a watermark's lower bound being advanced and the pruner deleting rows. Lets in-flight reads finish before their data disappears. |
| --pruning-batch-size | PRUNING_BATCH_SIZE | 1000 | Upper bound on units (checkpoints, transactions, or global sequence numbers) pruned per chunk, and on rows deleted per statement. |
| --reset-db | (none) | false | Drop all tables and re-apply migrations from scratch on start-up. Starts ingestion from scratch. Use with care. |
At least one of --remote-store-url or --data-ingestion-path must be set.