
Monitoring

After successfully setting up a node, the next important step is to set up monitoring. Monitoring lets you keep track of the node's health and performance.

JSON-RPC Endpoint

Test the JSON-RPC Interface

After the full node starts, you can test the JSON-RPC interfaces.
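
For example, you can query the chain identifier to confirm that the RPC interface responds. The method name iota_getChainIdentifier is an assumption here; port 9000 matches the default JSON-RPC port used in the examples below:

# The method name is an assumption; adjust it if your node does not expose it
curl --json '{"jsonrpc":"2.0","id":1,"method":"iota_getChainIdentifier","params":[]}' localhost:9000 -s | jq .result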

View Activity on Your Local Full Node with IOTA Explorer

The IOTA Explorer supports connecting to any network as long as it has HTTPS enabled. To view activity on your local full node:

Fetch latest checkpoint using JSON-RPC

curl --json '{"jsonrpc":"2.0","id":1,"method":"iota_getLatestCheckpointSequenceNumber","params":[]}' localhost:9000 -s | jq .result

To ensure node health, you can check that this value matches the latest checkpoint known by the rest of the network, which you can find on https://explorer.iota.org/.
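
You can also script this comparison by querying a public full node for the same value. The remote URL below is only an illustrative assumption; substitute the public JSON-RPC endpoint of the network your node runs on:

# The remote URL is an assumed example; use the public fullnode endpoint for your network
local_cp="$(curl --json '{"jsonrpc":"2.0","id":1,"method":"iota_getLatestCheckpointSequenceNumber","params":[]}' localhost:9000 -s | jq -r .result)"
remote_cp="$(curl --json '{"jsonrpc":"2.0","id":1,"method":"iota_getLatestCheckpointSequenceNumber","params":[]}' https://api.mainnet.iota.cafe -s | jq -r .result)"
echo "local: $local_cp, network: $remote_cp, behind by: $((remote_cp - local_cp))"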

Node Health Metrics

IOTA nodes expose a wide range of metrics that can be scraped by Prometheus. By default, metrics are available at the http://localhost:9184/metrics endpoint. The most common way to visualize these metrics is with Grafana. Additionally, it is common to run node exporter alongside the node so that Prometheus can also collect host-level performance metrics (CPU, memory, disk).
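
As a sketch of the Prometheus side, the node can be added as a scrape target. The job name and scrape interval below are illustrative assumptions; merge the job into the scrape_configs section of your existing prometheus.yml:

# Minimal sketch of a Prometheus scrape job for the node metrics endpoint
# (job name and interval are illustrative assumptions)
scrape_configs:
  - job_name: "iota-node"
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:9184"]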

Fetch key health metrics

Key health metrics via the /metrics HTTP endpoint:

curl -s localhost:9184/metrics | grep -E "^last_executed_checkpoint|^highest_synced_checkpoint|^highest_known_checkpoint|^last_committed_round|^consensus_threshold_clock_round|^highest_received_round|^consensus_proposed_blocks|^uptime"

For instance, on a validator node the output might look like this:

consensus_proposed_blocks{force="false"} 247
consensus_proposed_blocks{force="true"} 1
consensus_threshold_clock_round 257
highest_known_checkpoint 555
highest_synced_checkpoint 886
last_executed_checkpoint 890
last_executed_checkpoint_age_bucket{le="0.001"} 0
last_executed_checkpoint_age_bucket{le="0.005"} 0
last_executed_checkpoint_age_bucket{le="0.01"} 0
...
last_executed_checkpoint_age_bucket{le="60"} 891
last_executed_checkpoint_age_bucket{le="90"} 891
last_executed_checkpoint_age_bucket{le="+Inf"} 891
last_executed_checkpoint_age_sum 156.52341099999992
last_executed_checkpoint_age_count 891
last_executed_checkpoint_timestamp_ms 1748335503888
uptime{chain_identifier="b5d7e5c8",is_docker="false",os_version="macOS 15.5 Sequoia",process="validator",version="1.1.0"} 196

Ensure node health using last checkpoint timestamp

To make sure your node is running properly, check that the last processed checkpoint is recent enough:

  • An age of around 10 seconds is typical
  • Up to 30 seconds is still fine
  • The timestamp difference should stay under 1 minute

You can read the last_executed_checkpoint_timestamp_ms metric shown above and compare it against the current time using this command:

# Note: date +%s%3N (milliseconds) requires GNU date; on macOS use gdate from coreutils
last_executed_checkpoint_timestamp_ms="$(curl -s localhost:9184/metrics | grep ^last_executed_checkpoint_timestamp_ms | awk '{print $2}')"
now_timestamp="$(date +%s%3N)"
if (( now_timestamp - last_executed_checkpoint_timestamp_ms < 60000 )); then
  echo "[OK] healthy & in sync"
else
  echo "[ERROR] Node unhealthy. Last known checkpoint is too old."
fi

Monitor consensus sync status

To ensure your node's consensus module is properly synced with the network, monitor the difference between consensus_commit_sync_quorum_index and consensus_commit_sync_local_index:

metrics="$(curl -s localhost:9184/metrics)"
local_index="$(echo "$metrics" | grep ^consensus_commit_sync_local_index | awk '{print $2}')"
quorum_index="$(echo "$metrics" | grep ^consensus_commit_sync_quorum_index | awk '{print $2}')"
difference=$((local_index - quorum_index))

if (( difference > 100 )); then
echo "[WARNING] Consensus module not in sync. Difference: $difference"
echo "[INFO] Monitor this difference over time:"
echo " - If growing: Node falling behind network"
echo " - If shrinking: Node catching up to network"
else
echo "[OK] Consensus module in sync. Difference: $difference"
fi

Key indicators:

  • Difference > 100: Node's consensus module is not in sync
  • Growing difference: Network is advancing faster than node is syncing
  • Shrinking difference: Node is correctly syncing and catching up
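
To see whether the difference is growing or shrinking, you can sample it repeatedly, for example every 30 seconds (the interval and number of samples below are arbitrary):

# Sample the commit sync gap a few times to see the trend (interval and count are arbitrary)
for i in $(seq 1 5); do
  metrics="$(curl -s localhost:9184/metrics)"
  local_index="$(echo "$metrics" | grep ^consensus_commit_sync_local_index | awk '{print $2}')"
  quorum_index="$(echo "$metrics" | grep ^consensus_commit_sync_quorum_index | awk '{print $2}')"
  echo "$(date +%T) gap: $((quorum_index - local_index))"
  sleep 30
done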

Monitor validator connections

Check connections to other validators using the consensus_subscribed_by and consensus_subscribed_to metrics, with committee size determined dynamically:

metrics="$(curl -s localhost:9184/metrics)"
committee_size="$(echo "$metrics" | grep ^consensus_last_committed_authority_round | wc -l)"
min_connections=$((committee_size * 80 / 100))

# Count connections with value = 1 and report any with different values
subscribed_by_good="$(echo "$metrics" | grep ^consensus_subscribed_by | awk '{if($NF == 1) count++} END {print count+0}')"
subscribed_to_good="$(echo "$metrics" | grep ^consensus_subscribed_to | awk '{if($NF == 1) count++} END {print count+0}')"

# Report any connections with values != 1
bad_by="$(echo "$metrics" | grep ^consensus_subscribed_by | awk '{if($NF != 1) {match($0, /authority="([^"]*)"/, arr); print arr[1] ": " $NF}}')"
bad_to="$(echo "$metrics" | grep ^consensus_subscribed_to | awk '{if($NF != 1) {match($0, /authority="([^"]*)"/, arr); print arr[1] ": " $NF}}')"

if [[ -n "$bad_by" ]]; then
echo "[WARNING] subscribed_by connections with value != 1 (ignore your own name):"
echo "$bad_by"
fi

if [[ -n "$bad_to" ]]; then
echo "[WARNING] subscribed_to connections with value != 1 (ignore your own name):"
echo "$bad_to"
fi

# Check if both metrics have the same set of authorities
by_authorities="$(echo "$metrics" | grep ^consensus_subscribed_by | awk '{match($0, /authority="([^"]*)"/, arr); print arr[1]}' | sort)"
to_authorities="$(echo "$metrics" | grep ^consensus_subscribed_to | awk '{match($0, /authority="([^"]*)"/, arr); print arr[1]}' | sort)"

if [[ "$by_authorities" != "$to_authorities" ]]; then
echo "[WARNING] Different sets of authorities in subscribed_by vs subscribed_to metrics (ignore your own name)"
echo "Only in subscribed_by: $(comm -23 <(echo "$by_authorities") <(echo "$to_authorities") | tr '\n' ' ')"
echo "Only in subscribed_to: $(comm -13 <(echo "$by_authorities") <(echo "$to_authorities") | tr '\n' ' ')"
fi

if (( subscribed_by_good >= min_connections && subscribed_to_good >= min_connections )); then
echo "[OK] Sufficient validator connections"
else
echo "[WARNING] Low validator connections (need at least 80% of committee)"
echo "Committee size: $committee_size"
echo "Minimum required connections (80%): $min_connections"
echo "Connections subscribed by (value=1): $subscribed_by_good"
echo "Connections subscribed to (value=1): $subscribed_to_good"
fi

difference=$((subscribed_by_good > subscribed_to_good ? subscribed_by_good - subscribed_to_good : subscribed_to_good - subscribed_by_good))
if (( difference > 1 )); then
echo "[WARNING] Asymmetric connections - some validators only reachable one-way"
fi

Connection health indicators:

  • Minimum 80% of committee: Required for healthy operation
  • Value = 1: Healthy connections should have value 1, others indicate issues
  • Equal counts: Both metrics should have similar numbers of good connections
  • Same authority sets: Both metrics should contain the same set of authorities
  • Unequal counts or sets: Indicates one-way connections to some validators

Overall node health

When all of the above health checks pass, your node can be considered healthy:

  • ✅ Last checkpoint timestamp is recent (< 60 seconds)
  • ✅ Consensus sync difference is acceptable (< 100)
  • ✅ Validator connections are sufficient (≥ 80% of committee)
  • ✅ Connection symmetry is maintained
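
These checks can be combined into a single script that exits non-zero when something fails, which makes it easy to run from cron or an external alerting system. The sketch below only covers the checkpoint-age and consensus-sync checks described above, using the same thresholds:

#!/usr/bin/env bash
# Minimal sketch combining the checkpoint-age and consensus-sync checks from this page.
# Consensus metrics are only exposed on validator nodes.

metrics="$(curl -s localhost:9184/metrics)"
status=0

# 1. Last executed checkpoint must be younger than 60 seconds
last_ts="$(echo "$metrics" | grep ^last_executed_checkpoint_timestamp_ms | awk '{print $2}')"
now_ts="$(date +%s%3N)"
if (( now_ts - last_ts >= 60000 )); then
  echo "[ERROR] Last executed checkpoint is older than 60 seconds"
  status=1
fi

# 2. Consensus commit sync gap must stay below 100
local_index="$(echo "$metrics" | grep ^consensus_commit_sync_local_index | awk '{print $2}')"
quorum_index="$(echo "$metrics" | grep ^consensus_commit_sync_quorum_index | awk '{print $2}')"
if (( quorum_index - local_index > 100 )); then
  echo "[ERROR] Consensus commit sync gap is $((quorum_index - local_index))"
  status=1
fi

if (( status == 0 )); then
  echo "[OK] Node is healthy"
fi
exit "$status"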

Logs

Configuring Logs

The log level (error, warn, info, debug, trace) is controlled using the RUST_LOG environment variable. The RUST_LOG_JSON=1 environment variable can optionally be set to enable logging in JSON structured format.

Depending on your deployment method, these variables are configured in different places. For a systemd deployment, set them in the service unit:

[Service]
...
Environment=RUST_BACKTRACE=1
Environment=RUST_LOG=info,iota_core=debug,consensus=debug,jsonrpsee=error
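
If you run the node with Docker instead, the same variables can be passed as container environment variables. The image name, tag, and volume path below are assumptions; adapt them to your deployment:

# Sketch only: the image name, tag, and volume path are assumptions
docker run -d --name iota-node \
  -e RUST_BACKTRACE=1 \
  -e RUST_LOG="info,iota_core=debug,consensus=debug,jsonrpsee=error" \
  -e RUST_LOG_JSON=1 \
  -v /opt/iota/config:/opt/iota/config \
  iotaledger/iota-node:latest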

It is possible to change the logging configuration while a node is running using the admin interface.

Verify Configured Logging Values

To view the currently configured logging values:

curl -w "\n" localhost:1337/logging

To change the currently configured logging values:

curl localhost:1337/logging -d "info"
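
The value is interpreted as a RUST_LOG-style filter, so per-component levels should also work. The example below assumes the admin endpoint accepts the full filter syntax:

# Assumes the admin endpoint accepts full RUST_LOG-style filter directives
curl localhost:1337/logging -d "info,iota_core=debug,consensus=debug"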

Viewing Logs

To view and follow the IOTA node logs:

journalctl -u iota-node -f

To search for a particular match:

journalctl -u iota-node -g <SEARCH_TERM>
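
journalctl can also restrict the output to a time window or export structured entries, which helps when correlating log events with metric anomalies:

# Logs from the last hour only
journalctl -u iota-node --since "1 hour ago"

# Export entries as JSON for further processing
journalctl -u iota-node --since "1 hour ago" -o json | jq -r .MESSAGE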

Monitoring Services

Monitoring services are essential for the reliability, security, and performance of the network: they provide real-time insight, detect anomalies, enable proactive issue resolution, and deliver automatic alerts.

Prometheus and Grafana (recommended)

Example pre-made dashboards you can use:

Dolphin

Dolphin is a CLI tool that provides high-level features for validator and fullnode monitoring. Under the hood, it uses the IOTA node Prometheus metric exporter to check the health of the node. More info: https://gitlab.com/blockscope-net/dolphin-v2.