
Monitoring

After successfully setting up a validator node, the next important step is to set up monitoring so you can keep track of the node's health and performance.

Node Health Metrics

IOTA nodes expose a wide range of metrics for Prometheus to scrape. By default, metrics are available at the http://localhost:9184/metrics endpoint. The best way to visualize these metrics is to use Grafana. Additionally, it is common to run node exporter alongside the node so that Prometheus can also collect host-level performance metrics (CPU, memory, disk, and network).
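
For example, a minimal Prometheus scrape configuration for the node's metrics endpoint could look like the following sketch (the job names, scrape interval, and the node exporter target on port 9100 are placeholder assumptions to adapt to your setup):

scrape_configs:
  - job_name: "iota-node"
    scrape_interval: 15s
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:9184"]
  - job_name: "node-exporter"
    static_configs:
      - targets: ["localhost:9100"]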

Fetch key health metrics

Key health metrics via the /metrics HTTP endpoint:

curl -s localhost:9184/metrics | grep -E "^last_executed_checkpoint|^highest_synced_checkpoint|^highest_known_checkpoint|^last_committed_round|^consensus_threshold_clock_round|^highest_received_round|^consensus_proposed_blocks|^uptime"

For a validator node, for instance, the output looks similar to this:

consensus_proposed_blocks{force="false"} 247
consensus_proposed_blocks{force="true"} 1
consensus_threshold_clock_round 257
highest_known_checkpoint 555
highest_synced_checkpoint 886
last_executed_checkpoint 890
last_executed_checkpoint_age_bucket{le="0.001"} 0
last_executed_checkpoint_age_bucket{le="0.005"} 0
last_executed_checkpoint_age_bucket{le="0.01"} 0
...
last_executed_checkpoint_age_bucket{le="60"} 891
last_executed_checkpoint_age_bucket{le="90"} 891
last_executed_checkpoint_age_bucket{le="+Inf"} 891
last_executed_checkpoint_age_sum 156.52341099999992
last_executed_checkpoint_age_count 891
last_executed_checkpoint_timestamp_ms 1748335503888
uptime{chain_identifier="b5d7e5c8",is_docker="false",os_version="macOS 15.5 Sequoia",process="validator",version="1.1.0"} 196

Ensure node health using last checkpoint timestamp

To make sure your node is running properly, check that the last executed checkpoint is recent enough:

  • Under 10 seconds is typical
  • Up to 30 seconds is still fine
  • The timestamp difference should stay under 1 minute

You can read the last_executed_checkpoint_timestamp_ms metric shown above and compare it against the current time using this command:

# Note: %3N (milliseconds) requires GNU date.
last_executed_checkpoint_timestamp_ms="$(curl -s localhost:9184/metrics | grep ^last_executed_checkpoint_timestamp_ms | awk '{print $2}')"
now_timestamp="$(date +%s%3N)"
if (( now_timestamp - last_executed_checkpoint_timestamp_ms < 60000 )); then
    echo "[OK] healthy & in sync"
else
    echo "[ERROR] Node unhealthy. Last known checkpoint is too old."
fi

Monitor consensus sync status

To ensure your node's consensus module is properly synced with the network, monitor the difference between consensus_commit_sync_local_index and consensus_commit_sync_quorum_index:

metrics="$(curl -s localhost:9184/metrics)"
local_index="$(echo "$metrics" | grep ^consensus_commit_sync_local_index | awk '{print $2}')"
quorum_index="$(echo "$metrics" | grep ^consensus_commit_sync_quorum_index | awk '{print $2}')"
# How far the quorum is ahead of this node's local commit index.
difference=$((quorum_index - local_index))

if (( difference > 100 )); then
    echo "[WARNING] Consensus module not in sync. Difference: $difference"
    echo "[INFO] Monitor this difference over time:"
    echo " - If growing: Node falling behind network"
    echo " - If shrinking: Node catching up to network"
else
    echo "[OK] Consensus module in sync. Difference: $difference"
fi

Key indicators:

  • Difference > 100: Node's consensus module is not in sync
  • Growing difference: Network is advancing faster than node is syncing
  • Shrinking difference: Node is correctly syncing and catching up
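
To observe the trend over time, you can sample the difference periodically; the following sketch (interval and number of samples are arbitrary) prints one reading every 30 seconds:

metrics_url="localhost:9184/metrics"
for i in {1..10}; do
    metrics="$(curl -s "$metrics_url")"
    local_index="$(echo "$metrics" | grep ^consensus_commit_sync_local_index | awk '{print $2}')"
    quorum_index="$(echo "$metrics" | grep ^consensus_commit_sync_quorum_index | awk '{print $2}')"
    echo "$(date -u +%H:%M:%S) difference: $((quorum_index - local_index))"
    sleep 30
done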

Monitor validator connections

Check connections to other validators using the consensus_subscribed_by and consensus_subscribed_to metrics, with committee size determined dynamically:

metrics="$(curl -s localhost:9184/metrics)"
committee_size="$(echo "$metrics" | grep ^consensus_last_committed_authority_round | wc -l)"
min_connections=$((committee_size * 80 / 100))

# Count connections with value = 1 and report any with different values
subscribed_by_good="$(echo "$metrics" | grep ^consensus_subscribed_by | awk '{if($NF == 1) count++} END {print count+0}')"
subscribed_to_good="$(echo "$metrics" | grep ^consensus_subscribed_to | awk '{if($NF == 1) count++} END {print count+0}')"

# Report any connections with values != 1
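# Note: the three-argument form of match() used below requires GNU awk (gawk).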
bad_by="$(echo "$metrics" | grep ^consensus_subscribed_by | awk '{if($NF != 1) {match($0, /authority="([^"]*)"/, arr); print arr[1] ": " $NF}}')"
bad_to="$(echo "$metrics" | grep ^consensus_subscribed_to | awk '{if($NF != 1) {match($0, /authority="([^"]*)"/, arr); print arr[1] ": " $NF}}')"

if [[ -n "$bad_by" ]]; then
echo "[WARNING] subscribed_by connections with value != 1 (ignore your own name):"
echo "$bad_by"
fi

if [[ -n "$bad_to" ]]; then
echo "[WARNING] subscribed_to connections with value != 1 (ignore your own name):"
echo "$bad_to"
fi

# Check if both metrics have the same set of authorities
by_authorities="$(echo "$metrics" | grep ^consensus_subscribed_by | awk '{match($0, /authority="([^"]*)"/, arr); print arr[1]}' | sort)"
to_authorities="$(echo "$metrics" | grep ^consensus_subscribed_to | awk '{match($0, /authority="([^"]*)"/, arr); print arr[1]}' | sort)"

if [[ "$by_authorities" != "$to_authorities" ]]; then
echo "[WARNING] Different sets of authorities in subscribed_by vs subscribed_to metrics (ignore your own name)"
echo "Only in subscribed_by: $(comm -23 <(echo "$by_authorities") <(echo "$to_authorities") | tr '\n' ' ')"
echo "Only in subscribed_to: $(comm -13 <(echo "$by_authorities") <(echo "$to_authorities") | tr '\n' ' ')"
fi

if (( subscribed_by_good >= min_connections && subscribed_to_good >= min_connections )); then
    echo "[OK] Sufficient validator connections"
else
    echo "[WARNING] Low validator connections (need at least 80% of committee)"
    echo "Committee size: $committee_size"
    echo "Minimum required connections (80%): $min_connections"
    echo "Connections subscribed by (value=1): $subscribed_by_good"
    echo "Connections subscribed to (value=1): $subscribed_to_good"
fi

difference=$((subscribed_by_good > subscribed_to_good ? subscribed_by_good - subscribed_to_good : subscribed_to_good - subscribed_by_good))
if (( difference > 1 )); then
    echo "[WARNING] Asymmetric connections - some validators only reachable one-way"
fi

Connection health indicators:

  • Minimum 80% of committee: Required for healthy operation
  • Value = 1: Healthy connections should have value 1, others indicate issues
  • Equal counts: Both metrics should have similar numbers of good connections
  • Same authority sets: Both metrics should contain the same set of authorities
  • Unequal counts or sets: Indicates one-way connections to some validators

Overall node health

When all of the above health checks pass, your node can be considered healthy:

  • ✅ Last checkpoint timestamp is recent (< 60 seconds)
  • ✅ Consensus sync difference is acceptable (< 100)
  • ✅ Validator connections are sufficient (≥ 80% of committee)
  • ✅ Connection symmetry is maintained
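
If you want a single pass/fail signal (for example for a cron job or an external alerting tool), the individual checks above can be combined into one script that exits non-zero when something is off. The following is a minimal sketch covering the checkpoint age and consensus sync checks, using the same thresholds as above:

metrics="$(curl -s localhost:9184/metrics)"
status=0

# Checkpoint age must stay under 60 seconds (%3N requires GNU date).
last_ts="$(echo "$metrics" | grep ^last_executed_checkpoint_timestamp_ms | awk '{print $2}')"
now_ts="$(date +%s%3N)"
if (( now_ts - last_ts >= 60000 )); then
    echo "[ERROR] Last executed checkpoint is too old."
    status=1
fi

# Consensus sync difference must stay under 100.
local_index="$(echo "$metrics" | grep ^consensus_commit_sync_local_index | awk '{print $2}')"
quorum_index="$(echo "$metrics" | grep ^consensus_commit_sync_quorum_index | awk '{print $2}')"
if (( quorum_index - local_index > 100 )); then
    echo "[ERROR] Consensus module not in sync."
    status=1
fi

exit "$status"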

Logs

Configuring Logs

The log level (error, warn, info, debug, trace) is controlled using the RUST_LOG environment variable. The RUST_LOG_JSON=1 environment variable can optionally be set to enable logging in JSON structured format.

Depending on your deployment method, these variables are configured in different places. For a systemd deployment, set them in the service unit file:

[Service]
...
Environment=RUST_BACKTRACE=1
Environment=RUST_LOG=info,iota_core=debug,consensus=debug,jsonrpsee=error
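
If you run the node with Docker Compose instead, the same variables can be set under the service's environment key. The service name and image below are placeholders; adjust them to your deployment:

services:
  iota-node:
    image: iotaledger/iota-node:<version>
    environment:
      - RUST_BACKTRACE=1
      - RUST_LOG=info,iota_core=debug,consensus=debug,jsonrpsee=error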

It is possible to change the logging configuration while a node is running using the admin interface.

Verify Configured Logging Values

To view the currently configured logging values:

curl -w "\n" localhost:1337/logging

To change the currently configured logging values:

curl localhost:1337/logging -d "info"
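
Assuming the endpoint accepts the same filter directive syntax as RUST_LOG, you can also target individual modules, for example:

curl localhost:1337/logging -d "info,consensus=debug"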

Viewing Logs

To view and follow the IOTA node logs:

journalctl -u iota-node -f

To search for a particular match:

journalctl -u iota-node -g <SEARCH_TERM>

Monitoring Services

Monitoring services are essential for the reliability, security, and performance of the blockchain network: they provide real-time insights, detect anomalies, enable proactive issue resolution, and can send automatic alerts.

Prometheus and Grafana (recommended)

Example pre-made dashboards you can use:

Dolphin
Dolphin is a CLI tool that provides high-level features for validator and fullnode monitoring. Under the hood, it uses the IOTA node Prometheus metric exporter to check the health of the node. More info: https://gitlab.com/blockscope-net/dolphin-v2.