Nexapp - Maintenance and Troubleshooting
This section describes alerting, safe maintenance procedures, and troubleshooting tools for High Availability (HA) clusters in NexappOS.
Alerting
Subscription required: This feature is available only if the firewall has a valid subscription.
The HA cluster provides automated monitoring and notifications to help administrators respond quickly to failover events or synchronization issues.
Available alerts:
ha:sync:failed: Triggered when configuration synchronization between the primary and secondary nodes fails. This usually means the secondary node is unreachable because of network issues, hardware failure, or a service interruption.
ha:primary:failed: Triggered during a failover event, when the primary node becomes unavailable and the secondary takes over.
Maintenance
HA is designed to be automatic and low-maintenance, but you may occasionally need to stop one node for upgrades or hardware work.
Secondary node maintenance
You can safely power off the secondary node without affecting traffic.
- Stop Keepalived on the secondary:
/etc/init.d/keepalived stop
- Perform maintenance.
- Start Keepalived again:
/etc/init.d/keepalived start
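Before powering off the secondary for hardware work, it is worth confirming that Keepalived has actually stopped. A minimal sketch using the standard pgrep tool; the init-script path comes from the steps above, and the check is guarded so it also runs on machines without that script:

```shell
# Stop Keepalived if the init script is present (guarded so the
# snippet does not fail on machines without it)
if [ -x /etc/init.d/keepalived ]; then
    /etc/init.d/keepalived stop
fi

# Verify no keepalived process remains before powering off
if pgrep -x keepalived >/dev/null; then
    echo "keepalived is still running - do not power off yet"
else
    echo "keepalived is stopped - safe to power off for maintenance"
fi
```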
Primary node maintenance
Stopping the primary node will trigger failover. The secondary becomes Master and takes over VIPs and all services.
- Stop Keepalived on the primary:
/etc/init.d/keepalived stop
- Perform maintenance.
- Start Keepalived again:
/etc/init.d/keepalived start
Remote access
- The primary node is reachable from LAN and WAN.
- The secondary node is reachable only from LAN while in standby.
To access the secondary from a remote site:
- SSH into the primary node.
- From the primary, run:
ns-ha-config ssh-remote
This opens an SSH session to the secondary using the key created during HA setup.
Upgrade
The secondary node does not download system updates automatically because it has no direct Internet access while in standby.
To upgrade the secondary:
- SSH into the primary node.
- Run:
ns-ha-config upgrade-remote
This command downloads the latest image, uploads it to the secondary, installs it, and reboots the secondary.
Troubleshooting
HA troubleshooting often requires checking both nodes. Remember: the secondary node, in standby state, does not have direct Internet access. Therefore:
- It cannot resolve external DNS names
- It cannot reach the Controller or external portals
- It cannot receive system updates directly
Start troubleshooting via SSH access to both nodes.
Identifying the nodes
After the first sync, both nodes share the same hostname, but the shell prompt shows the node's role:
Primary prompt:
root@Nexapp [P]:~#
Secondary prompt:
root@Nexapp [S]:~#
Keepalived status
Check HA/VRRP statistics:
ns-ha-config status
Example excerpt:
Keepalived Statistics:
advert_rcvd: 249
advert_sent: 0
become_master: 1
release_master: 0
packet_len_err: 0
advert_interval_err: 0
ip_ttl_err: 0
invalid_type_rcvd: 0
addr_list_err: 0
invalid_authtype: 0
authtype_mismatch: 0
auth_failure: 0
pri_zero_rcvd: 1
pri_zero_sent: 0
Interpretation:
On the primary (Master):
- become_master should be ≥ 1
- advert_sent should be > 0 (sending VRRP advertisements)
On the secondary (Backup):
- advert_rcvd should be > 0 (receiving advertisements)
- become_master is normally 0 unless a failover occurred
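The interpretation above can be automated. A minimal sketch, assuming only standard awk: it classifies a node from the two advertisement counters, fed here from a here-document containing the excerpt shown earlier (on a live node, pipe the output of ns-ha-config status in instead):

```shell
# Classify the node role from Keepalived statistics. The sample input
# below is illustrative; on a real node use: ns-ha-config status
role=$(awk '
/advert_sent:/ { sent = $2 }
/advert_rcvd:/ { rcvd = $2 }
END {
    if (sent > 0)
        print "sending advertisements - this node is acting as Master"
    else if (rcvd > 0)
        print "receiving advertisements - this node is acting as Backup"
    else
        print "no VRRP activity seen - check keepalived"
}' <<'EOF'
Keepalived Statistics:
advert_rcvd: 249
advert_sent: 0
become_master: 1
EOF
)
echo "$role"
```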
VRRP traffic
The primary sends VRRP advertisements every second.
Run on the primary:
tcpdump -vnnpi <lan_interface> vrrp
Replace <lan_interface> with your HA interface name (example: eth0).
You should see advertisements like:
192.168.100.238 > 192.168.100.239: VRRPv2, Advertisement ...
Running the same command on the secondary should show the same advertisements being received.
Logs
HA logs are stored in:
/var/log/messages
Useful filters:
- rsync synchronization logs:
grep ns-rsync.sh /var/log/messages
- SSH sync activity:
grep dropbear /var/log/messages
- Keepalived events:
grep Keepalived /var/log/messages
- Network imports on secondary:
grep "ns-ha: Importing network configuration" /var/log/messages
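The filters above can be tried against a small sample before pointing them at a busy /var/log/messages. A sketch with illustrative, made-up log lines; only the grep patterns are taken from the filters above:

```shell
# Build a tiny sample log. The line contents are illustrative, not
# verbatim NexappOS output - only the matched keywords matter here.
log=$(mktemp)
cat > "$log" <<'EOF'
Jan 10 12:00:01 Nexapp ns-rsync.sh: configuration sync completed
Jan 10 12:00:02 Nexapp dropbear[1234]: Child connection from 192.168.100.239
Jan 10 12:00:03 Nexapp Keepalived_vrrp[987]: (lan) Entering MASTER STATE
Jan 10 12:00:04 Nexapp ns-ha: Importing network configuration
EOF

# The same filters as above, pointed at the sample file
grep ns-rsync.sh "$log"
grep dropbear "$log"
grep Keepalived "$log"
grep "ns-ha: Importing network configuration" "$log"
```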
Debugging
If logs aren’t enough:
- Debug any HA command:
bash -x ns-ha-config <action> [options]
- View live keepalived configuration:
cat /tmp/keepalived.conf
- Enable keepalived debug logging (primary):
uci set keepalived.primary.debug=1
uci commit keepalived
reload_config
Then search in logs for:
grep Keepalived_vrrp /var/log/messages
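bash -x echoes every command before it runs, which is a quick way to see where an ns-ha-config action stalls. A generic illustration with a throwaway stand-in script (the script name and contents are hypothetical; on the firewall you would trace ns-ha-config <action> directly, as shown above):

```shell
# Create a stand-in script so the tracing behaviour can be shown
# without a live firewall; ns-ha-config would be traced the same way.
cat > /tmp/demo-ha-action.sh <<'EOF'
#!/bin/sh
step="sync"
echo "running step: $step"
EOF

# -x prints each command (prefixed with +) before executing it
bash -x /tmp/demo-ha-action.sh
```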
Reset HA configuration
Reset restores the HA configuration to its default state. The primary keeps running, but the secondary should be reset separately afterward to avoid conflicts.
After the reset, only the HA interface remains active, so a reboot is required.
Run on the primary node:
ns-ha-config reset
reboot
Reset will:
- Stop/disable keepalived and conntrackd
- Remove HA configuration files
- Clean dropbear config and HA SSH keys