# Monitoring

After successfully setting up a validator node, the next important step is to set up monitoring so you can keep track of the node's health and performance.
## Node Health Metrics

IOTA nodes expose a wide range of metrics that can be scraped by Prometheus. By default, metrics are available at the `http://localhost:9184/metrics` endpoint. The best way to visualize these metrics is with Grafana. Additionally, it is common to run Node Exporter alongside the node to expose host-level performance metrics (CPU, memory, disk) for Prometheus to scrape as well.
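A minimal Prometheus scrape job for the node's metrics endpoint could look like the sketch below. The file path, job name, and scrape interval are illustrative assumptions; adapt them to your own Prometheus deployment.

```shell
# Sketch of a minimal Prometheus scrape job for the IOTA node metrics endpoint.
# Assumptions: Prometheus runs on the same host as the node and its configuration
# lives at /etc/prometheus/prometheus.yml; adjust the target and path otherwise.
cat > /etc/prometheus/prometheus.yml <<'EOF'
scrape_configs:
  - job_name: "iota-node"
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:9184"]   # metrics_path defaults to /metrics
EOF
```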
### Fetch key health metrics

You can fetch the key health metrics via the `/metrics` HTTP endpoint:

```shell
curl -s localhost:9184/metrics | grep -E "^last_executed_checkpoint|^highest_synced_checkpoint|^highest_known_checkpoint|^last_committed_round|^consensus_threshold_clock_round|^highest_received_round|^consensus_proposed_blocks|^uptime"
```
For instance, for a validator node, the output might look like this:

```text
consensus_proposed_blocks{force="false"} 247
consensus_proposed_blocks{force="true"} 1
consensus_threshold_clock_round 257
highest_known_checkpoint 555
highest_synced_checkpoint 886
last_executed_checkpoint 890
last_executed_checkpoint_age_bucket{le="0.001"} 0
last_executed_checkpoint_age_bucket{le="0.005"} 0
last_executed_checkpoint_age_bucket{le="0.01"} 0
...
last_executed_checkpoint_age_bucket{le="60"} 891
last_executed_checkpoint_age_bucket{le="90"} 891
last_executed_checkpoint_age_bucket{le="+Inf"} 891
last_executed_checkpoint_age_sum 156.52341099999992
last_executed_checkpoint_age_count 891
last_executed_checkpoint_timestamp_ms 1748335503888
uptime{chain_identifier="b5d7e5c8",is_docker="false",os_version="macOS 15.5 Sequoia",process="validator",version="1.1.0"} 196
```
### Ensure node health using the last checkpoint timestamp

To make sure your node is running properly, check that the last executed checkpoint is recent enough:

- 10 seconds is typical
- 30 seconds is still fine
- The timestamp difference should stay under 1 minute

You can read this from the `last_executed_checkpoint_timestamp_ms` metric shown above and compare it against the current time using the following command:
```shell
last_executed_checkpoint_timestamp_ms="$(curl -s localhost:9184/metrics | grep ^last_executed_checkpoint_timestamp_ms | awk '{print $2}')"
# Current time in milliseconds; %3N requires GNU date.
now_timestamp="$(date +%s%3N)"
if (( now_timestamp - last_executed_checkpoint_timestamp_ms < 60000 )); then
  echo "[OK] healthy & in sync"
else
  echo "[ERROR] Node unhealthy. Last known checkpoint is too old."
fi
```
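If you scrape the node with Prometheus as recommended above, the same condition can be expressed as an alerting rule. The rule file path, group name, alert name, and `for:` duration below are illustrative assumptions; only the expression is derived from the metric documented above.

```shell
# Sketch of an equivalent Prometheus alerting rule (illustrative names and paths).
# The expression compares the checkpoint timestamp with the current time, both in milliseconds.
cat > /etc/prometheus/rules/iota-node.yml <<'EOF'
groups:
  - name: iota-node
    rules:
      - alert: IotaLastCheckpointTooOld
        expr: (time() * 1000) - last_executed_checkpoint_timestamp_ms > 60000
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Last executed checkpoint is older than 60 seconds"
EOF
```

Remember to reference the rule file from your Prometheus configuration via `rule_files` and reload Prometheus afterwards.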
### Monitor consensus sync status

To ensure your node's consensus module is properly synced with the network, monitor the difference between `consensus_commit_sync_local_index` and `consensus_commit_sync_quorum_index`:
metrics="$(curl -s localhost:9184/metrics)"
local_index="$(echo "$metrics" | grep ^consensus_commit_sync_local_index | awk '{print $2}')"
quorum_index="$(echo "$metrics" | grep ^consensus_commit_sync_quorum_index | awk '{print $2}')"
difference=$((local_index - quorum_index))
if (( difference > 100 )); then
echo "[WARNING] Consensus module not in sync. Difference: $difference"
echo "[INFO] Monitor this difference over time:"
echo " - If growing: Node falling behind network"
echo " - If shrinking: Node catching up to network"
else
echo "[OK] Consensus module in sync. Difference: $difference"
fi
Key indicators:

- Difference > 100: The node's consensus module is not in sync
- Growing difference: The network is advancing faster than the node is syncing (see the sampling sketch below)
- Shrinking difference: The node is syncing correctly and catching up to the network
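To tell whether the difference is growing or shrinking, sample it a few times in a row. The following is a minimal sketch; the number of samples and the interval are arbitrary choices.

```shell
# Sample the consensus sync lag a few times to see its trend over time.
for i in 1 2 3 4 5; do
  metrics="$(curl -s localhost:9184/metrics)"
  local_index="$(echo "$metrics" | grep ^consensus_commit_sync_local_index | awk '{print $2}')"
  quorum_index="$(echo "$metrics" | grep ^consensus_commit_sync_quorum_index | awk '{print $2}')"
  echo "$(date +%T) difference: $((quorum_index - local_index))"
  sleep 10
done
```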
### Monitor validator connections

Check connections to other validators using the `consensus_subscribed_by` and `consensus_subscribed_to` metrics, with the committee size determined dynamically:
metrics="$(curl -s localhost:9184/metrics)"
committee_size="$(echo "$metrics" | grep ^consensus_last_committed_authority_round | wc -l)"
min_connections=$((committee_size * 80 / 100))
# Count connections with value = 1 and report any with different values
subscribed_by_good="$(echo "$metrics" | grep ^consensus_subscribed_by | awk '{if($NF == 1) count++} END {print count+0}')"
subscribed_to_good="$(echo "$metrics" | grep ^consensus_subscribed_to | awk '{if($NF == 1) count++} END {print count+0}')"
# Report any connections with values != 1
bad_by="$(echo "$metrics" | grep ^consensus_subscribed_by | awk '{if($NF != 1) {match($0, /authority="([^"]*)"/, arr); print arr[1] ": " $NF}}')"
bad_to="$(echo "$metrics" | grep ^consensus_subscribed_to | awk '{if($NF != 1) {match($0, /authority="([^"]*)"/, arr); print arr[1] ": " $NF}}')"
if [[ -n "$bad_by" ]]; then
echo "[WARNING] subscribed_by connections with value != 1 (ignore your own name):"
echo "$bad_by"
fi
if [[ -n "$bad_to" ]]; then
echo "[WARNING] subscribed_to connections with value != 1 (ignore your own name):"
echo "$bad_to"
fi
# Check if both metrics have the same set of authorities
by_authorities="$(echo "$metrics" | grep ^consensus_subscribed_by | awk '{match($0, /authority="([^"]*)"/, arr); print arr[1]}' | sort)"
to_authorities="$(echo "$metrics" | grep ^consensus_subscribed_to | awk '{match($0, /authority="([^"]*)"/, arr); print arr[1]}' | sort)"
if [[ "$by_authorities" != "$to_authorities" ]]; then
echo "[WARNING] Different sets of authorities in subscribed_by vs subscribed_to metrics (ignore your own name)"
echo "Only in subscribed_by: $(comm -23 <(echo "$by_authorities") <(echo "$to_authorities") | tr '\n' ' ')"
echo "Only in subscribed_to: $(comm -13 <(echo "$by_authorities") <(echo "$to_authorities") | tr '\n' ' ')"
fi
if (( subscribed_by_good >= min_connections && subscribed_to_good >= min_connections )); then
echo "[OK] Sufficient validator connections"
else
echo "[WARNING] Low validator connections (need at least 80% of committee)"
echo "Committee size: $committee_size"
echo "Minimum required connections (80%): $min_connections"
echo "Connections subscribed by (value=1): $subscribed_by_good"
echo "Connections subscribed to (value=1): $subscribed_to_good"
fi
difference=$((subscribed_by_good > subscribed_to_good ? subscribed_by_good - subscribed_to_good : subscribed_to_good - subscribed_by_good))
if (( difference > 1 )); then
echo "[WARNING] Asymmetric connections - some validators only reachable one-way"
fi
Connection health indicators:

- Minimum 80% of committee: Required for healthy operation
- Value = 1: Healthy connections should have the value 1; other values indicate issues
- Equal counts: Both metrics should report a similar number of good connections
- Same authority sets: Both metrics should contain the same set of authorities
- Unequal counts or sets: Indicate one-way connections to some validators
### Overall node health

When all of the above health checks pass, your node can be considered healthy:

- ✅ Last checkpoint timestamp is recent (< 60 seconds)
- ✅ Consensus sync difference is acceptable (< 100)
- ✅ Validator connections are sufficient (≥ 80% of committee)
- ✅ Connection symmetry is maintained

For unattended monitoring, these checks can be combined into a single script, as sketched below.
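The following minimal sketch combines only the checkpoint-age and consensus-sync checks and signals failure with a non-zero exit code, which makes it suitable for cron or an external alerting system; extend it with the connection checks as needed. The thresholds mirror the values used above.

```shell
#!/usr/bin/env bash
# Combined health check sketch: exits non-zero if any check fails.
# Covers checkpoint age (< 60s) and consensus sync difference (< 100) only.

metrics="$(curl -s localhost:9184/metrics)"
status=0

# 1. Last executed checkpoint must be recent.
last_ts="$(echo "$metrics" | grep ^last_executed_checkpoint_timestamp_ms | awk '{print $2}')"
now_ts="$(date +%s%3N)"   # requires GNU date for millisecond precision
if (( now_ts - last_ts >= 60000 )); then
  echo "[ERROR] Last executed checkpoint is too old"
  status=1
fi

# 2. Consensus sync difference must stay below 100.
local_index="$(echo "$metrics" | grep ^consensus_commit_sync_local_index | awk '{print $2}')"
quorum_index="$(echo "$metrics" | grep ^consensus_commit_sync_quorum_index | awk '{print $2}')"
if (( quorum_index - local_index > 100 )); then
  echo "[ERROR] Consensus module not in sync"
  status=1
fi

(( status == 0 )) && echo "[OK] Node healthy"
exit "$status"
```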
## Logs

### Configuring Logs

The log level (`error`, `warn`, `info`, `debug`, `trace`) is controlled using the `RUST_LOG` environment variable. The `RUST_LOG_JSON=1` environment variable can optionally be set to enable logging in a structured JSON format.
Depending on your deployment method, these are configured in the following places:

#### Systemd

Add the environment variables to the `[Service]` section of the systemd unit file:

```ini
[Service]
...
Environment=RUST_BACKTRACE=1
Environment=RUST_LOG=info,iota_core=debug,consensus=debug,jsonrpsee=error
```

#### Docker Compose

Add the following to the node container settings:

```yaml
environment:
  - RUST_BACKTRACE=1
  - RUST_LOG=info,iota_core=debug,consensus=debug,jsonrpsee=error
```
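After changing either configuration, restart the node so the new environment takes effect (or use the admin interface described below to change the log filter at runtime). The service name `iota-node` is taken from the journalctl examples further down; adjust it if yours differs.

```shell
# Systemd: reload unit files and restart the service
sudo systemctl daemon-reload
sudo systemctl restart iota-node

# Docker Compose: recreate the container so the updated environment is applied
sudo docker compose up -d
```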
It is possible to change the logging configuration while a node is running using the admin interface.
### Verify Configured Logging Values

#### Systemd

To view the currently configured logging values:

```shell
curl -w "\n" localhost:1337/logging
```

To change the currently configured logging values:

```shell
curl localhost:1337/logging -d "info"
```

#### Docker Compose

Note that the admin port (`1337`) is only exposed to `localhost` by default, so the commands must be executed inside the container.

To view the currently configured logging values:

```shell
docker exec <FULLNODE_CONTAINER_NAME> curl -w "\n" localhost:1337/logging
```

To change the currently configured logging values:

```shell
docker exec <FULLNODE_CONTAINER_NAME> curl localhost:1337/logging -d "info"
```

Replace `<FULLNODE_CONTAINER_NAME>` with your actual container name, such as `iota-fullnode-docker-setup-fullnode-1`.
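The examples above set a plain log level. Since `RUST_LOG` accepts per-crate directives, the admin endpoint presumably accepts the same filter syntax; treat this as an assumption and read the value back to confirm the change took effect:

```shell
# Assumption: the /logging endpoint accepts RUST_LOG-style filter directives, not just a plain level.
curl localhost:1337/logging -d "info,iota_core=debug,consensus=debug"
# Read the value back to verify the change
curl -w "\n" localhost:1337/logging
```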
### Viewing Logs

#### Systemd

To view and follow the IOTA node logs:

```shell
journalctl -u iota-node -f
```

To search for a particular match:

```shell
journalctl -u iota-node -g <SEARCH_TERM>
```

#### Docker Compose

To view and follow the logs:

```shell
sudo docker compose logs -f [node_container_name]
```

By default, all logs are output. Limit this using `--since`:

```shell
sudo docker logs --since 10m -f [node_container_name]
```
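If `RUST_LOG_JSON=1` is enabled, the structured output can be filtered with `jq`. The field name `level` assumes the default `tracing` JSON layout; verify it against your own log lines first.

```shell
# Show only warnings and errors from the structured JSON logs (Systemd example).
# Assumes each log line is a JSON object with a "level" field.
journalctl -u iota-node -o cat -f | jq -c 'select(.level == "ERROR" or .level == "WARN")'
```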
## Monitoring Services

Implementing monitoring services is essential for ensuring the reliability, security, and performance of the blockchain network: they provide real-time insights, detect anomalies, enable proactive issue resolution, and deliver automatic alerts.
### Prometheus and Grafana (recommended)

Example pre-made dashboards you can use:

- Community-owned Grafana dashboard: https://github.com/stakeme-team/grafana-iota
- The Grafana setup for the local private network, which can serve as a good example for building your own setup.
- Officially supported dashboards in the IOTA repository.
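If you do not have Grafana running yet, a local instance can be started quickly with Docker; the official image and its default port are used below, and the Prometheus data source still has to be added manually in the UI.

```shell
# Start a local Grafana instance (official image, default port 3000).
sudo docker run -d --name grafana -p 3000:3000 grafana/grafana
```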
### Dolphin

Dolphin is a CLI tool that provides high-level features for validator and fullnode monitoring. Under the hood, it uses the IOTA node's Prometheus metrics exporter to check the health of the node. More info: https://gitlab.com/blockscope-net/dolphin-v2.