Nexapp - Maintenance and Troubleshooting
This section describes alerting, safe maintenance procedures, and troubleshooting tools for High Availability (HA) clusters in NexappOS.
Alerting
Subscription required: This feature is available only if the firewall has a valid subscription.
The HA cluster provides automated monitoring and notifications to help administrators respond quickly to failover events or synchronization issues.
Available alerts:
ha:sync:failed: Triggered when configuration synchronization between the primary and secondary nodes fails. This usually means the secondary node is unreachable because of network issues, hardware failure, or a service interruption.
ha:primary:failed: Triggered during a failover event, when the primary node becomes unavailable and the secondary takes over.
Maintenance
HA is designed to be automatic and low-maintenance, but you may occasionally need to stop one node for upgrades or hardware work.
Secondary node maintenance
You can safely power off the secondary node without affecting traffic.
- Stop Keepalived on the secondary:
/etc/init.d/keepalived stop
- Perform maintenance.
- Start Keepalived again:
/etc/init.d/keepalived start
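Before powering off the secondary for hardware work, it is worth confirming that Keepalived has actually stopped. A minimal sketch using the standard pgrep tool; the init-script path comes from the steps above, and the check is guarded so it also runs on machines without that script:

```shell
# Stop Keepalived if the init script is present (guarded so the
# snippet does not fail on machines without it)
if [ -x /etc/init.d/keepalived ]; then
    /etc/init.d/keepalived stop
fi

# Verify no keepalived process remains before powering off
if pgrep -x keepalived >/dev/null; then
    echo "keepalived is still running - do not power off yet"
else
    echo "keepalived is stopped - safe to power off for maintenance"
fi
```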
Primary node maintenance
Stopping the primary node will trigger failover. The secondary becomes Master and takes over VIPs and all services.
- Stop Keepalived on the primary:
/etc/init.d/keepalived stop
- Perform maintenance.
- Start Keepalived again:
/etc/init.d/keepalived start
Remote access
- The primary node is reachable from LAN and WAN.
- The secondary node is reachable only from LAN while in standby.
To access the secondary from a remote site:
- SSH into the primary node.
- From the primary, run:
ns-ha-config ssh-remote
This opens an SSH session to the secondary using the key created during HA setup.
Upgrade
The secondary node does not download system updates automatically because it has no direct Internet access while in standby.
To upgrade the secondary:
- SSH into the primary node.
- Run:
ns-ha-config upgrade-remote
This command downloads the latest image, uploads it to the secondary, installs it, and reboots the secondary.
Troubleshooting
HA troubleshooting often requires checking both nodes. Remember: the secondary node, in standby state, does not have direct Internet access. Therefore:
- It cannot resolve external DNS names
- It cannot reach the Controller or external portals
- It cannot receive system updates directly
Start troubleshooting via SSH access to both nodes.
Identifying the nodes
After the first sync, both nodes share the same hostname, but the shell prompt shows the node's role:
Primary prompt:
root@Nexapp [P]:~#
Secondary prompt:
root@Nexapp [S]:~#
Keepalived status
Check HA/VRRP statistics:
ns-ha-config status
Example excerpt:
Keepalived Statistics:
advert_rcvd: 249
advert_sent: 0
become_master: 1
release_master: 0
packet_len_err: 0
advert_interval_err: 0
ip_ttl_err: 0
invalid_type_rcvd: 0
addr_list_err: 0
invalid_authtype: 0
authtype_mismatch: 0
auth_failure: 0
pri_zero_rcvd: 1
pri_zero_sent: 0
Interpretation:
On the primary (Master):
- become_master should be ≥ 1
- advert_sent should be > 0 (sending VRRP advertisements)
On the secondary (Backup):
- advert_rcvd should be > 0 (receiving advertisements)
- become_master is normally 0 unless a failover occurred
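The interpretation above can be automated. A minimal sketch, assuming only standard awk: it classifies a node from the two advertisement counters, fed here from a here-document containing the excerpt shown earlier (on a live node, pipe the output of ns-ha-config status in instead):

```shell
# Classify the node role from Keepalived statistics. The sample input
# below is illustrative; on a real node use: ns-ha-config status
role=$(awk '
/advert_sent:/ { sent = $2 }
/advert_rcvd:/ { rcvd = $2 }
END {
    if (sent > 0)
        print "sending advertisements - this node is acting as Master"
    else if (rcvd > 0)
        print "receiving advertisements - this node is acting as Backup"
    else
        print "no VRRP activity seen - check keepalived"
}' <<'EOF'
Keepalived Statistics:
advert_rcvd: 249
advert_sent: 0
become_master: 1
EOF
)
echo "$role"
```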
VRRP traffic
The primary sends VRRP advertisements every second.
Run on the primary:
tcpdump -vnnpi <lan_interface> vrrp
Replace <lan_interface> with your HA interface name (example: eth0).
You should see advertisements like:
192.168.100.238 > 192.168.100.239: VRRPv2, Advertisement ...
Running the same command on the secondary should show the same advertisements being received.
Logs
HA logs are stored in:
/var/log/messages
Useful filters:
- rsync synchronization logs:
grep ns-rsync.sh /var/log/messages
- SSH sync activity:
grep dropbear /var/log/messages
- Keepalived events:
grep Keepalived /var/log/messages
- Network imports on secondary:
grep "ns-ha: Importing network configuration" /var/log/messages
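The filters above can be tried against a small sample before pointing them at a busy /var/log/messages. A sketch with illustrative, made-up log lines; only the grep patterns are taken from the filters above:

```shell
# Build a tiny sample log. The line contents are illustrative, not
# verbatim NexappOS output - only the matched keywords matter here.
log=$(mktemp)
cat > "$log" <<'EOF'
Jan 10 12:00:01 Nexapp ns-rsync.sh: configuration sync completed
Jan 10 12:00:02 Nexapp dropbear[1234]: Child connection from 192.168.100.239
Jan 10 12:00:03 Nexapp Keepalived_vrrp[987]: (lan) Entering MASTER STATE
Jan 10 12:00:04 Nexapp ns-ha: Importing network configuration
EOF

# The same filters as above, pointed at the sample file
grep ns-rsync.sh "$log"
grep dropbear "$log"
grep Keepalived "$log"
grep "ns-ha: Importing network configuration" "$log"
```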
Debugging
If logs aren’t enough:
- Debug any HA command:
bash -x ns-ha-config <action> [options]
- View live keepalived configuration:
cat /tmp/keepalived.conf
- Enable keepalived debug logging (primary):
uci set keepalived.primary.debug=1
uci commit keepalived
reload_config
Then search in logs for:
grep Keepalived_vrrp /var/log/messages
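bash -x echoes every command before it runs, which is a quick way to see where an ns-ha-config action stalls. A generic illustration with a throwaway stand-in script (the script name and contents are hypothetical; on the firewall you would trace ns-ha-config <action> directly, as shown above):

```shell
# Create a stand-in script so the tracing behaviour can be shown
# without a live firewall; ns-ha-config would be traced the same way.
cat > /tmp/demo-ha-action.sh <<'EOF'
#!/bin/sh
step="sync"
echo "running step: $step"
EOF

# -x prints each command (prefixed with +) before executing it
bash -x /tmp/demo-ha-action.sh
```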
Reset HA configuration
Reset restores the HA configuration to its default state. The primary keeps running, but the secondary should be reset separately afterward to avoid conflicts.
After the reset, only the HA interface remains active, so a reboot is required.
Run on the primary node:
ns-ha-config reset
reboot
Reset will:
- Stop/disable keepalived and conntrackd
- Remove HA configuration files
- Clean dropbear config and HA SSH keys