
IOTA Indexer

The iota-indexer binary in writer mode ingests checkpoints from an IOTA checkpoint source, writes them into PostgreSQL tables, and maintains that database over time (migrations on start-up, pruning per retention policy). See the Extended Data Services overview for how it fits together with the JSON-RPC and GraphQL reader services.

The iota-indexer binary also has a run-backfill subcommand, a one-off maintenance tool for filling specific tables over a historical range, typically after a schema change. It is not part of normal operation.

Scope

The writer:

  • Downloads checkpoints from a remote source (a fullnode gRPC endpoint or a historical checkpoint store).
  • Indexes each checkpoint into structured Postgres tables (objects, transactions, events, indexes for tx_* and event_* filters, and more).
  • Runs the needed schema migrations on the database on start-up. A database ahead of the binary (migrations the binary doesn't know about) fails to start; a database behind the binary is migrated forward.
  • Optionally runs background pruning tasks that enforce per-table retention policies. See Pruning below.
  • Exposes Prometheus metrics for monitoring, such as ingestion lag and indexing throughput.

The writer is the main component that writes to the indexer database. Reader services (JSON-RPC, GraphQL) read from the same database. Their only writes go to optimistic_transactions, objects, and display, as part of optimistic indexing.

Run a single writer instance per database. Running more than one against the same database is not supported.

Dependencies

  • A PostgreSQL database. The database must not be ahead of the binary; on start-up the writer applies any pending migrations and refuses to start if the database contains migrations the binary doesn't know about. See Appendix: PostgreSQL configuration for a recommended postgresql.conf.
  • A checkpoint source. At least one of:
    • --remote-store-url pointing at a fullnode gRPC endpoint (e.g. http://0.0.0.0:50051).
    • --remote-store-url pointing at a historical checkpoint store hosting batched checkpoint files.
    • --data-ingestion-path pointing at a local directory that a colocated fullnode populates with checkpoint files.
  • Optionally a live checkpoint store (--live-checkpoints-store-url) that serves the latest checkpoints not yet in the historical archive. Combine it with a historical --remote-store-url to get full coverage from genesis with low tip latency. See the Hybrid Historical Checkpoint Store section in the overview.

URLs for the publicly hosted historical and live checkpoint stores are listed under Checkpoint sources.

Hardware requirements

Disk is the main requirement and depends on the configured retention. See the sizing table under Pruning for predicted database sizes.

Run the writer on the same network as Postgres to keep latency low.

Running the writer

A minimal start command, ingesting from a fullnode gRPC endpoint into a local Postgres database, without pruning:

iota-indexer \
--db-url postgres://<user>:<password>@<host>:5432/<db> \
indexer \
--remote-store-url http://<fullnode-host>:50051

Alternatively, ingesting from a historical checkpoint store combined with a live checkpoint store for minimal tip latency:

iota-indexer \
--db-url postgres://<user>:<password>@<host>:5432/<db> \
indexer \
--remote-store-url https://checkpoints.mainnet.iota.cafe/ingestion/historical \
--live-checkpoints-store-url https://checkpoints.mainnet.iota.cafe/ingestion/live

For networks other than mainnet, use the matching URLs from the Checkpoint sources section below.
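If you manage several deployments, you can generate the commands above instead of editing them by hand. The Python sketch below assumes the store URLs keep the https://checkpoints.<network>.iota.cafe/ingestion/{historical,live} pattern that all three networks under Checkpoint sources follow. build_writer_argv is a hypothetical helper, not part of the indexer.

```python
import shlex

# Networks with publicly hosted historical and live checkpoint stores.
NETWORKS = ("mainnet", "testnet", "devnet")


def build_writer_argv(db_url: str, network: str) -> list[str]:
    """Return argv for a writer ingesting historical + live checkpoints.

    Assumes the URL pattern shown under "Checkpoint sources".
    """
    if network not in NETWORKS:
        raise ValueError(f"unknown network: {network!r}")
    base = f"https://checkpoints.{network}.iota.cafe/ingestion"
    return [
        "iota-indexer",
        "--db-url", db_url,  # top-level flag, placed before the subcommand
        "indexer",
        "--remote-store-url", f"{base}/historical",
        "--live-checkpoints-store-url", f"{base}/live",
    ]


if __name__ == "__main__":
    argv = build_writer_argv("postgres://user:pass@localhost:5432/indexer", "testnet")
    print(shlex.join(argv))  # shell-quoted, ready to paste
```

You can pass the list directly to subprocess.run, which avoids shell quoting issues.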

Configuration

Settings come from CLI flags, several of which also accept a matching environment variable. Some flags belong to iota-indexer itself and go before the subcommand (for example --db-url). The rest belong to the indexer subcommand. The only configuration file the writer reads is the optional pruning retention TOML described in the Pruning configuration file (TOML) section below. All flags are listed in the reference tables at the bottom of the page.

Database connection

  • --db-url — Postgres connection URL (e.g. postgres://<user>:<password>@<host>:5432/<db>). See the Postgres docs for the full URI syntax.
  • --pool-size — connection pool size. Raise it for higher concurrency, but stay within the Postgres max_connections setting and account for other clients of the same database.
  • --connection-timeout — caps how long a request waits for a free database connection before failing.
  • --statement-timeout — caps how long a single SQL query may run before it's cancelled.
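--db-url is a URI, so reserved characters in the user name, password or database name (such as @, : or /) must be percent-encoded, as in any PostgreSQL connection URI. The minimal Python sketch below assumes the writer decodes the URL according to standard URI rules. postgres_url is a hypothetical helper.

```python
from urllib.parse import quote


def postgres_url(user: str, password: str, host: str, db: str, port: int = 5432) -> str:
    """Build a postgres:// URL with percent-encoded credentials and db name."""
    # safe="" makes quote() encode '/' too; ':' and '@' are always encoded,
    # so none of them can be mistaken for URI delimiters.
    return (
        f"postgres://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(db, safe='')}"
    )
```

For example, a password of p@ss:w/rd becomes p%40ss%3Aw%2Frd in the URL.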

Checkpoint sources

At least one source must be configured. The CLI accepts:

  • --remote-store-url — a fullnode gRPC URL or a historical checkpoint store URL. A historical checkpoint store covers all checkpoints from genesis; a fullnode only serves the checkpoints it hasn't pruned. Available historical stores:
    • Mainnet: https://checkpoints.mainnet.iota.cafe/ingestion/historical
    • Testnet: https://checkpoints.testnet.iota.cafe/ingestion/historical
    • Devnet: https://checkpoints.devnet.iota.cafe/ingestion/historical
  • --data-ingestion-path — a local filesystem directory that a colocated fullnode populates with checkpoint files.
  • --live-checkpoints-store-url — a live checkpoint store URL. Used as a fallback for the latest checkpoints that aren't yet in the historical archive. Requires --remote-store-url to also be set. Available live stores:
    • Mainnet: https://checkpoints.mainnet.iota.cafe/ingestion/live
    • Testnet: https://checkpoints.testnet.iota.cafe/ingestion/live
    • Devnet: https://checkpoints.devnet.iota.cafe/ingestion/live
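You can check the source rules above before launching the writer, for example in a deployment script. The sketch below only mirrors the stated constraints. At least one of --remote-store-url or --data-ingestion-path must be set, and --live-checkpoints-store-url is accepted only together with --remote-store-url. validate_sources is hypothetical and does not replace the writer's own argument handling.

```python
from typing import Optional


def validate_sources(
    remote_store_url: Optional[str] = None,
    data_ingestion_path: Optional[str] = None,
    live_checkpoints_store_url: Optional[str] = None,
) -> None:
    """Raise ValueError if the checkpoint-source flags break the documented rules."""
    if not (remote_store_url or data_ingestion_path):
        raise ValueError("set --remote-store-url or --data-ingestion-path")
    if live_checkpoints_store_url and not remote_store_url:
        raise ValueError("--live-checkpoints-store-url requires --remote-store-url")
```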

Colocated fullnode

--data-ingestion-path requires a colocated fullnode that writes each executed checkpoint into a shared directory. Set data-ingestion-dir in the fullnode YAML and point the writer's --data-ingestion-path at the same path.

Prometheus metrics

The Prometheus endpoint exposes ingestion lag, counts of committed checkpoints, transactions, and epochs, indexing latencies, database connection pool size, etc.

--metrics-address sets the address the Prometheus metrics endpoint binds to (default 0.0.0.0:9184).
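For a quick look at what the endpoint exposes, the sketch below fetches the exposition text and lists the metric names. It assumes the metrics are served under the conventional /metrics path. This page does not list the exact metric names, so none are hard-coded.

```python
from urllib.request import urlopen


def fetch_metrics(address: str = "127.0.0.1:9184", timeout: float = 5.0) -> str:
    """Fetch the Prometheus text exposition from the writer.

    Assumes the conventional /metrics path on the --metrics-address port.
    """
    with urlopen(f"http://{address}/metrics", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def metric_names(exposition: str) -> set[str]:
    """Collect metric names from Prometheus text-format sample lines."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        # A sample looks like `name value` or `name{label="..."} value`.
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names
```

On the writer host, print(sorted(metric_names(fetch_metrics()))) shows what is available to scrape.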

Resetting the database

Passing --reset-db on the indexer subcommand drops every table on start-up, re-applies all migrations from scratch, and restarts ingestion from the beginning. The operation is destructive and affects any reader sharing the same database. Use it only on a writer instance you intend to fully rebuild.

Pruning

A running IOTA Indexer accumulates one row per checkpoint, transaction, and event the network produces, so the database grows without bound as long as the network progresses. For deployments that intend to run for months or years, that growth may not be sustainable.

The indexer addresses this with opt-in pruning. The operator declares a retention period (a number of most recent epochs) per table, and after each epoch boundary the indexer deletes anything older. Data older than the retention period becomes inaccessible to reads against the indexer database. A few tables are never pruned; see Non-prunable tables for the list.

Because retention is configured per table, the operator decides which data to retain. Shrinking the retention window of a table shortens the time range over which queries backed by that table return results.

The effect of pruning a table depends on its role and the queries it backs:

  • Point lookups (transactions, events, checkpoints): direct lookup of a transaction by digest, an event by digest, etc. Fetching an old digest whose data has been pruned returns an error unless a KV Store fallback is configured. For point-lookup tables, a fallback lets the data be served for both pruned and unpruned ranges.
  • Filtered queries (tx_*, event_*): listing transactions or events by attribute (sender, recipient, emitter package, etc.). Pruned transactions and events are omitted from the result, with no explicit error returned. The fallback service cannot cover this case.
  • Historical object tables (objects_backward_history): serve queries that read past versions of objects (e.g., consistent reads at a checkpoint, dynamic-field state at a specific point in time). Pruning shortens the time range over which such historical-state queries can be served. Queries that target a checkpoint older than the retained range fail or return missing data.
  • Optimistic tables (optimistic_transactions): data in this table is needed only for short periods and doesn't need long retention. It can be pruned with no side effects on data availability, as long as the current and previous epochs are kept.

The subsections below cover sizing and retention guidance, how to enable pruning, the configuration file format, the schedule on which deletions happen, how to monitor pruner progress, how to reclaim disk space, and a full inventory of prunable and non-prunable tables.

Sizing and retention

Assuming a sustained 500 user TPS, the predicted DB size at typical retention windows is:

Retention | DB size
--- | ---
2 days | ~604 GB
7 days | ~1.6 TB
30 days | ~6.3 TB
1 year | ~74 TB
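To estimate disk for a retention window between the rows above, linear interpolation is a reasonable first approximation, because the published points grow by roughly 200 GB per retained day. The Python sketch below assumes the same 500 TPS load and 1 TB = 1000 GB. estimate_db_size_gb is a hypothetical helper, and it refuses to extrapolate outside the table.

```python
from bisect import bisect_left

# (retention in days, predicted DB size in GB) at a sustained 500 user TPS,
# taken from the table above, assuming 1 TB = 1000 GB.
SIZING = [(2, 604.0), (7, 1600.0), (30, 6300.0), (365, 74000.0)]


def estimate_db_size_gb(days: float) -> float:
    """Linearly interpolate between the published sizing points."""
    xs = [d for d, _ in SIZING]
    if not xs[0] <= days <= xs[-1]:
        raise ValueError(f"{days} days is outside the published range {xs[0]}..{xs[-1]}")
    i = bisect_left(xs, days)
    if xs[i] == days:
        return SIZING[i][1]
    (x0, y0), (x1, y1) = SIZING[i - 1], SIZING[i]
    return y0 + (days - x0) * (y1 - y0) / (x1 - x0)
```

For example, estimate_db_size_gb(14) gives about 3030 GB. Treat the result as an order-of-magnitude planning figure, since real load rarely matches the assumed TPS.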

We recommend 30 epochs of retention as a default for all tables, and 2 epochs for objects_backward_history and optimistic_transactions.

Enabling pruning

Pruning is enabled by passing --pruning-config-path <PATH> to the indexer subcommand, pointing at a TOML file that describes retention policies. If no config path is supplied, pruning is disabled.

A writer with pruning enabled is started like this:

iota-indexer \
--db-url <DATABASE_URL> \
indexer \
--remote-store-url <REMOTE_STORE_URL> \
--pruning-config-path <PATH_TO_PRUNING_TOML>

Pruning configuration file (TOML)

The TOML file contains a default retention (in epochs) and an optional table of per-table overrides:

# Default retention applied to every prunable table (in epochs).
epochs_to_keep = 10

# Per-table overrides. Keys are the snake_case table names listed below.
[overrides]
transactions = 5
events = 5
tx_senders = 3

Every retention value — both the default and each override — must be strictly greater than zero.

Defaults. When the indexer resolves a final retention map, every prunable table receives the configured epochs_to_keep value. Any value supplied in [overrides] replaces the default for that table.

Selective pruning. The configuration model is all-or-nothing at the writer level: there is no flag to skip a specific table. Once pruning is enabled, the pruner evaluates every prunable table. To approximate a per-table opt-out, give every table you want to keep a retention so large it is never reached, either through the epochs_to_keep default or through an override. For example, to prune only tx_senders and keep the rest effectively forever:

# Default high enough that no real network will ever reach it.
epochs_to_keep = 1_000_000

[overrides]
tx_senders = 5

Inter-table dependencies. Some tables only return useful data when their dependencies are present. For example, the tx_* index tables (tx_senders, tx_calls_*, tx_changed_objects, …) are looked up via the tx_digests table; setting a shorter retention on tx_digests than on the tx_* indexes means the indexes still hold rows that can no longer be resolved to a transaction digest. Keep retention on tx_digests at least as long as the retention on any other tx_* table that depends on it.
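A pre-deployment check for this rule takes a resolved retention map (table name to epochs) and reports tx_* tables kept longer than tx_digests. The page does not list the exact dependents, so as a conservative assumption the sketch treats every tx_* table as depending on tx_digests. tables_outliving_tx_digests is a hypothetical helper.

```python
def tables_outliving_tx_digests(retention: dict[str, int]) -> list[str]:
    """Return tx_* tables whose retention exceeds that of tx_digests.

    Assumption: every tx_* table depends on tx_digests.
    """
    digests = retention["tx_digests"]
    return sorted(
        table
        for table, epochs in retention.items()
        if table.startswith("tx_") and table != "tx_digests" and epochs > digests
    )
```

An empty result means no tx_* index outlives the digests it resolves through.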

How pruning is applied

The pruner runs as a set of background tasks inside the writer process:

  • A watermarks-update task polls every few seconds. When it detects that the network has advanced to a new epoch, it recomputes each prunable table's lower-bound watermark from the retention policy and writes the new bound into the watermarks table.
  • A per-table pruner task runs in parallel for each prunable table, polling every few seconds. When that table's watermark has been advanced, the task waits a configurable delay (--pruning-delay-ms, default 2 hours) before actually deleting data — the delay protects in-flight reads against losing rows mid-query. It then deletes in batches sized by --pruning-batch-size (default 1000), sleeping briefly between chunks to limit pressure on the database.

As a result, rows are removed some time after the epoch transition, not at the boundary itself. Likewise, when a writer is started with pruning enabled for the first time, deletions do not begin until at least --pruning-delay-ms after the writer starts.
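The schedule above lets you predict when deletions start and how many chunks a backlog needs. The Python sketch below uses the documented defaults of --pruning-delay-ms and --pruning-batch-size. It ignores polling intervals and the pauses between chunks, so real timings run somewhat later. Both helper names are hypothetical.

```python
import math
from datetime import datetime, timezone

DEFAULT_PRUNING_DELAY_MS = 7_200_000  # --pruning-delay-ms default (2 hours)
DEFAULT_PRUNING_BATCH_SIZE = 1000     # --pruning-batch-size default


def earliest_deletion(bounds_updated_at_ms: int,
                      delay_ms: int = DEFAULT_PRUNING_DELAY_MS) -> datetime:
    """Earliest UTC time at which the pruner may delete after a bound update."""
    return datetime.fromtimestamp((bounds_updated_at_ms + delay_ms) / 1000, tz=timezone.utc)


def chunks_needed(units_to_prune: int,
                  batch_size: int = DEFAULT_PRUNING_BATCH_SIZE) -> int:
    """Number of chunks to prune `units_to_prune` checkpoints/transactions."""
    return math.ceil(units_to_prune / batch_size)
```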

Monitoring pruning progress

Pruner state is exposed via the watermarks table — one row per table (every prunable table plus a few bookkeeping entries). Useful columns:

Column | Meaning
--- | ---
entity | Name of the table this row describes.
current_epoch | Most recent epoch ingestion has written into the table.
min_available_epoch / min_available_cp / min_available_tx | Lowest epoch / checkpoint / transaction sequence number still queryable. Anything below is scheduled for pruning.
lowest_unpruned_key | Next epoch / checkpoint / transaction sequence number the pruner will actually delete from the database. Anything below has already been removed.
min_bounds_updated_at_timestamp_ms | Timestamp of the last lower-bound update. The pruning delay (--pruning-delay-ms, default 2 hours) is measured from this value.
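You can combine the columns above into a simple per-table health check. The gap between min_available_* and lowest_unpruned_key is what the pruner still has to delete. Adding the pruning delay to min_bounds_updated_at_timestamp_ms tells you whether deletion should already be under way. The sketch below assumes you select the min_available_* column that matches each table's pruning unit (epoch, checkpoint or transaction). WatermarkRow and pruning_backlog are hypothetical names.

```python
from dataclasses import dataclass

DEFAULT_PRUNING_DELAY_MS = 7_200_000  # --pruning-delay-ms default (2 hours)


@dataclass
class WatermarkRow:
    entity: str
    # min_available_epoch, min_available_cp or min_available_tx, whichever
    # matches the unit the table is pruned by (assumption of this sketch).
    min_available: int
    lowest_unpruned_key: int
    min_bounds_updated_at_timestamp_ms: int


def pruning_backlog(row: WatermarkRow, now_ms: int,
                    delay_ms: int = DEFAULT_PRUNING_DELAY_MS) -> tuple[int, bool]:
    """Return (units still awaiting deletion, whether the delay has elapsed)."""
    pending = max(row.min_available - row.lowest_unpruned_key, 0)
    delay_elapsed = now_ms >= row.min_bounds_updated_at_timestamp_ms + delay_ms
    return pending, delay_elapsed
```

You can read the rows with any Postgres client, for example SELECT entity, min_available_cp, lowest_unpruned_key, min_bounds_updated_at_timestamp_ms FROM watermarks; for checkpoint-pruned tables. A backlog that stays non-zero long after the delay has elapsed may indicate that the pruner is not keeping up.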

Storage reclamation

Enabling pruning on a previously unpruned indexer — or shortening retention on an already-pruned one — does not immediately return the freed disk space to the operating system. Pruning deletes rows from Postgres; autovacuum reclaims the dead tuples for future inserts but does not shrink the on-disk footprint of tables and indexes.

What pruning does guarantee is bounded growth going forward: pruned tables stop accumulating data past the configured retention window.

To recover the existing on-disk footprint, run a compaction tool against the database (for example VACUUM FULL, REINDEX TABLE CONCURRENTLY, pg_repack, or pg_squeeze). Each option has different operational caveats. VACUUM FULL holds an ACCESS EXCLUSIVE lock for the full table rewrite, while the others run online but need temporary disk space. Review the PostgreSQL documentation for each tool before running it against a live database. The IOTA Indexer does not invoke these itself.

To prevent pruned tables from re-bloating over time, autovacuum must run aggressively enough to keep up with the pruner's deletes. The following per-table autovacuum settings are a good starting point for every prunable table:

vacuum_index_cleanup           = on
autovacuum_vacuum_scale_factor = 0.01
autovacuum_vacuum_cost_limit   = 500

On Docker-hosted Postgres, also raise shm_size to at least 2g — the default 64M is too small for autovacuum to complete on large prunable tables, which silently disables the cleanup.
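Per-table autovacuum parameters are set with ALTER TABLE ... SET (...). The sketch below generates those statements for the tables pruned by DELETE, which are the checkpoint and transaction strategies under Prunable tables. It leaves out transactions and events for two reasons. They are pruned by dropping partitions, which reclaims disk immediately, and PostgreSQL does not accept storage parameters on a partitioned parent table. Review the output before applying it, for example by piping it into psql.

```python
# Tables pruned with DELETE (checkpoint and transaction strategies), copied
# from "Prunable tables". transactions/events are omitted on purpose.
DELETE_PRUNED_TABLES = (
    "checkpoints", "pruner_cp_watermark", "objects_backward_history",
    "event_emit_package", "event_emit_module", "event_senders",
    "event_struct_instantiation", "event_struct_module", "event_struct_name",
    "event_struct_package",
    "tx_calls_pkg", "tx_calls_mod", "tx_calls_fun", "tx_changed_objects",
    "tx_digests", "tx_input_objects", "tx_kinds", "tx_recipients",
    "tx_senders", "tx_wrapped_or_deleted_objects", "tx_global_order",
    "optimistic_transactions",
)

# Starting-point values from the section above.
AUTOVACUUM_SETTINGS = {
    "vacuum_index_cleanup": "on",
    "autovacuum_vacuum_scale_factor": "0.01",
    "autovacuum_vacuum_cost_limit": "500",
}


def autovacuum_statements(tables=DELETE_PRUNED_TABLES) -> list[str]:
    """Return one ALTER TABLE ... SET (...) statement per table."""
    options = ", ".join(f"{k} = {v}" for k, v in AUTOVACUUM_SETTINGS.items())
    return [f"ALTER TABLE {table} SET ({options});" for table in tables]


if __name__ == "__main__":
    print("\n".join(autovacuum_statements()))
```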

Prunable tables

Tables are pruned using one of three strategies, determined by how the table is structured:

Strategy | Mechanism | Disk reclaimed | Tables
--- | --- | --- | ---
By epoch partition | Drop the partition for each retired epoch. | Immediately | transactions, events
By checkpoint | DELETE rows below a checkpoint cutoff. | After compaction | checkpoints, pruner_cp_watermark, objects_backward_history
By transaction | DELETE rows below a transaction-sequence cutoff. | After compaction | event_emit_package, event_emit_module, event_senders, event_struct_instantiation, event_struct_module, event_struct_name, event_struct_package, tx_calls_pkg, tx_calls_mod, tx_calls_fun, tx_changed_objects, tx_digests, tx_input_objects, tx_kinds, tx_recipients, tx_senders, tx_wrapped_or_deleted_objects, tx_global_order, optimistic_transactions

Tables not listed here are never pruned by the indexer.

Non-prunable tables

Several tables hold data that is retained for the full lifetime of the database. They are not affected by any retention setting and will not shrink when pruning is enabled. The largest contributors are objects_version, objects_snapshot, and objects.

Appendix: PostgreSQL configuration

The postgresql.conf below is the recommended configuration for an indexer host with roughly 96 GB of RAM and NVMe storage (mainnet-class hardware). Scale memory and parallelism values to your hardware.

# Connections
max_connections = 2000
listen_addresses = '*'
superuser_reserved_connections = 5

# Memory
shared_buffers = 32GB
effective_cache_size = 64GB
work_mem = 64MB
maintenance_work_mem = 2GB
temp_buffers = 64MB
wal_buffers = 64MB

# Write-ahead log
wal_level = replica
synchronous_commit = off
full_page_writes = on
wal_compression = off
wal_writer_delay = 200ms
checkpoint_completion_target = 0.9
checkpoint_timeout = 15min
min_wal_size = 4GB
max_wal_size = 32GB

# Query planner
default_statistics_target = 200
random_page_cost = 1.0
effective_io_concurrency = 200

# Parallelism
max_worker_processes = 24
max_parallel_workers = 16
max_parallel_workers_per_gather = 4
max_parallel_maintenance_workers = 4

# Logging
log_min_duration_statement = 500
log_checkpoints = on
log_autovacuum_min_duration = 250
log_temp_files = 512kB
log_statement = 'none'
log_line_prefix = '%t [%p]: [%l-1] user=%u db=%d app=%a client=%h '

# Other
huge_pages = try
max_files_per_process = 10000
archive_mode = off

# Timeouts
statement_timeout = 6000    # milliseconds (6 s)

Per-table autovacuum tuning and shm_size guidance that pair with this config are documented in Storage reclamation above.
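One rough way to scale memory to your hardware is to keep the same RAM ratios as the 96 GB reference (shared_buffers about 1/3 of RAM, effective_cache_size about 2/3). The sketch below does exactly that for three settings. Proportional scaling is an assumption of this sketch, not a rule from this page. It does not touch the parallelism settings, which depend on CPU cores rather than RAM.

```python
# Memory settings from the reference postgresql.conf above, in GB,
# tuned for a host with REFERENCE_RAM_GB of RAM.
REFERENCE_RAM_GB = 96
REFERENCE_MEMORY_GB = {
    "shared_buffers": 32,
    "effective_cache_size": 64,
    "maintenance_work_mem": 2,
}


def scale_memory_settings(ram_gb: float) -> dict[str, str]:
    """Scale the reference values proportionally to ram_gb (heuristic)."""
    factor = ram_gb / REFERENCE_RAM_GB
    # Round to whole GB and never go below 1GB.
    return {name: f"{max(1, round(gb * factor))}GB"
            for name, gb in REFERENCE_MEMORY_GB.items()}
```

For a 48 GB host this yields shared_buffers = 16GB, effective_cache_size = 32GB and maintenance_work_mem = 1GB. Validate the values under real load before relying on them.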

All settings

Top-level iota-indexer flags

Flag | Env var | Default | Description
--- | --- | --- | ---
--db-url | - | (required) | Postgres connection URL. Also accepted as --database-url.
--pool-size | DB_POOL_SIZE | 100 | Number of parallel Postgres connections held in the pool.
--connection-timeout | DB_CONNECTION_TIMEOUT | 30 (seconds) | How long a request waits for a free database connection before failing.
--statement-timeout | DB_STATEMENT_TIMEOUT | 3600 (seconds) | Cap on how long a single SQL query may run before it's cancelled.
--metrics-address | - | 0.0.0.0:9184 | Address the Prometheus metrics endpoint binds to.

indexer subcommand flags

Flag | Env var | Default | Description
--- | --- | --- | ---
--remote-store-url | - | (none) | Fullnode gRPC URL or historical checkpoint store URL serving batched checkpoint files.
--live-checkpoints-store-url | - | (none) | Live checkpoint store URL. Fallback for the latest checkpoints not yet in the historical archive. Requires --remote-store-url.
--data-ingestion-path | - | (none) | Local directory populated with checkpoint files by a colocated fullnode.
--checkpoint-download-queue-size | DOWNLOAD_QUEUE_SIZE | 200 | Maximum number of checkpoints buffered for indexing.
--checkpoint-download-queue-size-bytes | CHECKPOINT_PROCESSING_BATCH_DATA_LIMIT | 20000000 | Byte cap on the checkpoint download queue.
--checkpoint-download-timeout | INGESTION_READER_TIMEOUT_SECS | 20 | Per-checkpoint download timeout, in seconds.
--pruning-config-path | - | (none) | Path to a TOML file describing per-table retention policies. Enables pruning when set. See Pruning.
--pruning-delay-ms | PRUNING_DELAY_MS | 7200000 (ms) | Delay between a watermark's lower bound being advanced and the pruner deleting rows. Lets in-flight reads finish before their data disappears.
--pruning-batch-size | PRUNING_BATCH_SIZE | 1000 | Upper bound on units (checkpoints, transactions, or global sequence numbers) pruned per chunk, and on rows deleted per statement.
--reset-db | - | false | Drop all tables and re-apply migrations from scratch on start-up. Starts ingestion from scratch. Use with care.

At least one of --remote-store-url or --data-ingestion-path must be set.