Skip to main content

Monitoring

After a successful setup of a validator node, the next important step is to set up monitoring. Monitoring is important to keep track of the node's health and performance.

Node Health Metrics

IOTA nodes expose a wide range of metrics to be scraped by Prometheus. By default, metrics are available at the http://localhost:9184/metrics endpoint. The best way to visualize these metrics is to use Grafana. Additionally, a common approach is to use node exporter to scrape performance metrics from the node and push them to Prometheus.

Fetch key health metrics

Key health metrics via the /metrics HTTP endpoint:

curl -s localhost:9184/metrics | grep -E "^last_executed_checkpoint|^highest_synced_checkpoint|^highest_known_checkpoint|^last_committed_round|^consensus_threshold_clock_round|^highest_received_round|^consensus_proposed_blocks|^uptime"

For instance, for a validator node, the output would be:

consensus_proposed_blocks{force="false"} 840272
consensus_proposed_blocks{force="true"} 2255
consensus_threshold_clock_round 1318080
highest_known_checkpoint 60575714
highest_synced_checkpoint 60575710
last_executed_checkpoint 60575714
last_executed_checkpoint_age_ms{pct="50"} 0
last_executed_checkpoint_age_ms{pct="95"} 0
last_executed_checkpoint_age_ms{pct="99"} 0
last_executed_checkpoint_age_ms_count 1342973
last_executed_checkpoint_age_ms_sum 367068033
last_executed_checkpoint_timestamp_ms 1745504535018
uptime{chain_identifier="2304aa97",is_docker="false",os_version="Linux (Ubuntu 24.04)",process="validator",version="0.12.0-rc-7e49c58b826b"} 700369

Ensure node health using last checkpoint timestamp

To make sure your node runs properly, we check that the last processed checkpoint is recent enough:

  • 10 seconds is typical
  • 30 seconds is still fine
  • You want to check that the timestamp difference stays under 1 minute

You can check that from the previous metric last_executed_checkpoint_timestamp_ms, and compare timestamps with now using this command:

last_executed_checkpoint_timestamp_ms="$(curl -s localhost:9184/metrics | grep ^last_executed_checkpoint_timestamp_ms | awk '{print $2}')"
now_timestamp="$(date +%s%3N)"
if (now_timestamp - last_executed_checkpoint_timestamp_ms < 60000); then
echo "[OK] healthy & in sync"
else
echo "[ERROR] Node unhealthy. Last known checkpoint is too old."
fi

Logs

Configuring Logs

Log level (error, warn, info, trace) is controlled using the RUST_LOG environment variable. The RUST_LOG_JSON=1 environment variable can optionally be set to enable logging in JSON structured format.

Depending on your deployment method, these are configured in the following places:

[Service]
...
Environment=RUST_BACKTRACE=1
Environment=RUST_LOG=info,iota_core=debug,consensus=debug,jsonrpsee=error

It is possible to change the logging configuration while a node is running using the admin interface.

Verify Configured Logging Values

To view the currently configured logging values:

curl -w "\n" localhost:1337/logging

To change the currently configured logging values:

curl localhost:1337/logging -d "info"

Viewing Logs

To view and follow the IOTA node logs:

journalctl -u iota-node -f

To search for a particular match:

$ journalctl -u iota-node -g <SEARCH_TERM>

Monitoring Services

Implementing monitoring services is essential to ensure the reliability, security, and performance of the blockchain network by providing real-time insights, detecting anomalies, enabling proactive issue resolution, and receiving automatic alerts.

Prometheus and Grafana (recommended)

Example pre-made dashboards you can use:

Dolphin
Dolphin is a CLI tool that provides high-level features for validator, fullnode, and bridge monitoring. Under the hood, it uses the IOTA node Prometheus metric exporter to check the health of the node. More info: https://gitlab.com/blockscope-net/dolphin-v2.