Juniper SRX – Minimal Downtime Upgrade of an HA Cluster

Please note that this describes the process for upgrading an HA pair running JunOS code earlier than 11. Newer JunOS releases allow upgrading without corrupting the policy on the peer device.

!! Note: interface names below are the physical, not the logical, names
!! The following assumes node0 is master and node1 is backup
01.) Download the package to /var/tmp on both devices
02.) Disable node1's interfaces (node1's, NOT node0's) by running the following on node0. When committed, the change replicates to node1
  set interfaces ge-8/0/0 disable
  set interfaces ge-8/0/1 disable
  set interfaces ge-8/0/2 disable
  set interfaces ge-8/0/3 disable
  set interfaces ge-8/0/4 disable
  set interfaces ge-8/0/5 disable
  set interfaces ge-8/0/6 disable
  set interfaces ge-8/0/7 disable
  set interfaces ge-8/0/8 disable
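Typing nine near-identical lines per step is error-prone. A minimal Python sketch that generates them instead, assuming (as in this document) that each node's ports are ge-<fpc>/0/0 through ge-<fpc>/0/8 and that node1's ports sit on FPC 8. The helper name `interface_commands` is hypothetical, not a JunOS feature; the output is meant to be pasted into the CLI.

```python
def interface_commands(action: str, fpc: int, ports: range = range(9)) -> list[str]:
    """Build 'set'/'delete' lines for 'interfaces ge-<fpc>/0/<port> disable'.

    action: "set" disables the ports, "delete" removes the disable statement.
    fpc: FPC slot holding the node's ports (8 for node1, 0 for node0 here).
    ports: port numbers on PIC 0; defaults to 0-8 as used in this procedure.
    """
    if action not in ("set", "delete"):
        raise ValueError("action must be 'set' or 'delete'")
    return [f"{action} interfaces ge-{fpc}/0/{port} disable" for port in ports]


if __name__ == "__main__":
    # Step 02: disable node1's interfaces (FPC 8); paste the output on node0.
    print("\n".join(interface_commands("set", 8)))
```

The same helper with `"delete"` produces the re-enable lines used in step 24.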
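03.) On node 0 (primary), disable the TCP three-way-handshake (SYN) and sequence-number checks for sessions, so that mid-stream TCP traffic is not dropped by the node that takes over (sessions are no longer synchronized once the fabric link is disconnected)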
  set security flow tcp-session no-syn-check
  set security flow tcp-session no-sequence-check
04.) Commit on node 0 (primary)
  commit
05.) Disconnect the fabric link (fab# interfaces) cables and the control link cables
06.) Commit on both devices
07.) Upgrade node 1 (Backup)
  request system software add /var/tmp/junos-srx1k3k-10.4R3.4-domestic.tgz no-validate no-copy
  request system reboot
08.) On node 1 (still backup, now upgraded), run the following to verify the upgrade
  show version
  show chassis cluster status
  show chassis fpc pic-status
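To confirm in step 08 that node 1 really booted the new code, the release reported by `show version` can be compared with the one embedded in the package filename. A minimal Python sketch, assuming the filename follows the `junos-<platform>-<release>-<edition>.tgz` pattern used in this document and that `show version` prints a line like `JUNOS Software Release [10.4R3.4]` (an assumed format; check it against your own device). The helper names are hypothetical.

```python
import re


def package_release(filename: str) -> str:
    """Extract the release (e.g. '10.4R3.4') from a package name such as
    'junos-srx1k3k-10.4R3.4-domestic.tgz' (naming pattern assumed)."""
    match = re.search(r"-(\d+\.\d+[A-Z]\d+(?:\.\d+)?)-", filename)
    if match is None:
        raise ValueError(f"no release found in {filename!r}")
    return match.group(1)


def running_releases(show_version: str) -> list[str]:
    """Return every release found in 'Release [...]' lines of show version.

    A cluster can print one block per node, hence a list.
    """
    return re.findall(r"Release \[([^\]]+)\]", show_version)


def upgraded(show_version: str, filename: str) -> bool:
    """True if at least one release is reported and all of them match the package."""
    wanted = package_release(filename)
    found = running_releases(show_version)
    return bool(found) and all(release == wanted for release in found)
```

Paste the `show version` text into `upgraded(...)` together with the package name from step 07; the same check applies to node 0 in step 14.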
09.) Re-run "show chassis fpc pic-status" until all slots show Online rather than Present before going to step 10
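Step 09 (like steps 15 and 22) comes down to "every slot and PIC reports Online". A minimal Python sketch that checks pasted `show chassis fpc pic-status` output, assuming lines shaped like `Slot 0   Online   ...` and `  PIC 0  Online   ...` (layout assumed from typical output, not from a specification). The helper names are hypothetical.

```python
import re

# Assumed line layout: "Slot 0   Online   SRX3k SFB 12GE" and
# "  PIC 0  Online   8x 1GE-TX 4x 1GE-SFP". Adjust if your release differs.
_STATUS_LINE = re.compile(r"^\s*(Slot|PIC)\s+(\d+)\s+(\w+)")


def parse_pic_status(output: str) -> list[tuple[str, str]]:
    """Return (label, state) pairs, e.g. ("Slot 2 PIC 0", "Present")."""
    entries = []
    slot = "?"
    for line in output.splitlines():
        match = _STATUS_LINE.match(line)
        if match is None:
            continue  # skips headers such as "node1:" and separator lines
        kind, number, state = match.groups()
        if kind == "Slot":
            slot = number
            label = f"Slot {number}"
        else:
            label = f"Slot {slot} PIC {number}"
        entries.append((label, state))
    return entries


def ready_for_next_step(output: str) -> bool:
    """True only if something was parsed and every slot and PIC is Online."""
    entries = parse_pic_status(output)
    return bool(entries) and all(state == "Online" for _, state in entries)
```

Returning the labelled pairs rather than a bare boolean makes it easy to see which slot is still Present.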
10.) On node 0, then on node 1, enter ALL of the following commands (they re-enable node1's interfaces and disable node0's)
  delete interfaces ge-8/0/0 disable
  delete interfaces ge-8/0/1 disable
  delete interfaces ge-8/0/2 disable
  delete interfaces ge-8/0/3 disable
  delete interfaces ge-8/0/4 disable
  delete interfaces ge-8/0/5 disable
  delete interfaces ge-8/0/6 disable
  delete interfaces ge-8/0/7 disable
  delete interfaces ge-8/0/8 disable
  set interfaces ge-0/0/0 disable
  set interfaces ge-0/0/1 disable
  set interfaces ge-0/0/2 disable
  set interfaces ge-0/0/3 disable
  set interfaces ge-0/0/4 disable
  set interfaces ge-0/0/5 disable
  set interfaces ge-0/0/6 disable
  set interfaces ge-0/0/7 disable
  set interfaces ge-0/0/8 disable
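Step 10 is a mirror swap: remove the disable statements from one node's ports and add them to the other node's ports. Step 16 is the same swap with the FPC numbers reversed. A minimal Python sketch that builds the list, assuming FPC 0 holds node0's ports and FPC 8 holds node1's, with ports 0-8 as in this document. `swap_commands` is a hypothetical helper name.

```python
def swap_commands(enable_fpc: int, disable_fpc: int, ports: range = range(9)) -> list[str]:
    """Re-enable ge-<enable_fpc>/0/* and disable ge-<disable_fpc>/0/*.

    The deletes come first, matching the order used in this procedure.
    """
    deletes = [f"delete interfaces ge-{enable_fpc}/0/{p} disable" for p in ports]
    sets = [f"set interfaces ge-{disable_fpc}/0/{p} disable" for p in ports]
    return deletes + sets


if __name__ == "__main__":
    # Step 10: bring node1 (FPC 8) up and take node0 (FPC 0) down.
    print("\n".join(swap_commands(enable_fpc=8, disable_fpc=0)))
    # Step 16 would use swap_commands(enable_fpc=0, disable_fpc=8).
```

Generating both node's lists from one function keeps the two halves of the swap from drifting apart.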
11.) Commit on both devices at the same time  !! IMPORTANT: BOTH COMMITS MUST HAPPEN AT THE SAME TIME !!
  commit
12.) Verify that node1 has taken over as master (if the input counters in the monitor command are increasing, it has taken over)
  run show security flow session summary
  run monitor interface traffic
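The check in step 12 boils down to comparing two readings of the input packet counters taken a few seconds apart. A minimal Python sketch, assuming the counters are copied from `monitor interface traffic` into a dict by hand; the interface names and numbers below are made-up illustrations, and `interfaces_receiving` is a hypothetical helper.

```python
def interfaces_receiving(before: dict[str, int], after: dict[str, int]) -> list[str]:
    """Return interfaces whose input packet counter grew between two readings.

    Interfaces missing from either reading are ignored.
    """
    return sorted(
        name for name in before.keys() & after.keys() if after[name] > before[name]
    )


if __name__ == "__main__":
    first = {"ge-8/0/0": 1200, "ge-8/0/1": 560, "ge-8/0/2": 0}
    second = {"ge-8/0/0": 1875, "ge-8/0/1": 560, "ge-8/0/2": 0}
    print(interfaces_receiving(first, second))  # ['ge-8/0/0']
```

Per the note in step 12, growing input counters on node1's ports mean node1 has taken over; an empty list suggests it has not.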
13.) On node 0:
  request system software add /var/tmp/junos-srx1k3k-10.4R3.4-domestic.tgz no-validate no-copy
  request system reboot
14.) On node 0, after the upgrade, verify:
  show version
  show chassis cluster status
  show chassis fpc pic-status
15.) Re-run "show chassis fpc pic-status" until all slots show Online
16.) On node 1, then on node 0, enter ALL of the following commands (this fails over so node0 is master again)
  delete interfaces ge-0/0/0 disable
  delete interfaces ge-0/0/1 disable
  delete interfaces ge-0/0/2 disable
  delete interfaces ge-0/0/3 disable
  delete interfaces ge-0/0/4 disable
  delete interfaces ge-0/0/5 disable
  delete interfaces ge-0/0/6 disable
  delete interfaces ge-0/0/7 disable
  delete interfaces ge-0/0/8 disable
  set interfaces ge-8/0/0 disable
  set interfaces ge-8/0/1 disable
  set interfaces ge-8/0/2 disable
  set interfaces ge-8/0/3 disable
  set interfaces ge-8/0/4 disable
  set interfaces ge-8/0/5 disable
  set interfaces ge-8/0/6 disable
  set interfaces ge-8/0/7 disable
  set interfaces ge-8/0/8 disable
17.) Commit on both devices at the same time
  commit
18.) Reconnect the control link cable
19.) Verify that node0 is primary
  run show chassis cluster status
20.) Reboot node1, and reconnect the fab# interface cables between the nodes while it is rebooting
21.) Verify node0 is still passing traffic
  run monitor interface traffic
22.) Wait for all slots to show Online
  run show chassis fpc pic-status
23.) Verify that failover group 2 shows its priority
24.) On node0, re-enable node1's interfaces and restore the TCP SYN and sequence checks (the commit will replicate to node1)
  delete interfaces ge-8/0/0 disable
  delete interfaces ge-8/0/1 disable
  delete interfaces ge-8/0/2 disable
  delete interfaces ge-8/0/3 disable
  delete interfaces ge-8/0/4 disable
  delete interfaces ge-8/0/5 disable
  delete interfaces ge-8/0/6 disable
  delete interfaces ge-8/0/7 disable
  delete interfaces ge-8/0/8 disable

  delete security flow tcp-session no-syn-check
  delete security flow tcp-session no-sequence-check
25.) Commit on node0
  commit
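Before closing out, it is worth confirming that no temporary statements from steps 02 and 03 survive in the configuration. A minimal Python sketch that scans pasted output of the operational-mode command `show configuration | display set` for leftover interface `disable` statements and the two TCP check overrides. `leftover_statements` is a hypothetical helper; it flags every matching line, including ports that were disabled on purpose before the upgrade, so review the list rather than deleting blindly.

```python
import re

# Temporary statements added during this procedure (steps 02, 03, 10 and 16).
_TEMPORARY = [
    re.compile(r"^set interfaces ge-\d+/0/\d+ disable$"),
    re.compile(r"^set security flow tcp-session no-(syn|sequence)-check$"),
]


def leftover_statements(display_set: str) -> list[str]:
    """Return config lines that match one of the temporary statements."""
    return [
        line.strip()
        for line in display_set.splitlines()
        if any(pattern.match(line.strip()) for pattern in _TEMPORARY)
    ]
```

An empty list means neither the interface disables nor the TCP check overrides remain.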
26.) Verify the failover groups (groups 0 and 1 should show primary or secondary status and priorities)
  run show chassis cluster status
    26.a) If group 2 is not showing priorities and node1's status is "disabled", another reboot of node1 may be necessary. This is related to the fab# interfaces
    26.b) When node1 comes back online, verify that the fab interfaces are showing up, and allow a minute or two for "show chassis cluster status" to show priorities and status
    26.c) This may take some time while sessions are being synchronized
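Steps 19 and 26 both read `show chassis cluster status`: which node is primary for a group, and whether any node shows a status other than primary or secondary (such as "disabled"). A minimal Python sketch over pasted output, assuming the layout shown in the comment (a "Redundancy group: N" line followed by one line per node with name, priority and status); the layout is an assumption and the helper names are hypothetical.

```python
import re
from typing import Optional

# Assumed layout, e.g.
#   Redundancy group: 1 , Failover count: 1
#       node0                   200         primary        no       no
_GROUP = re.compile(r"Redundancy group:\s*(\d+)")
_NODE = re.compile(r"^\s*(node\d+)\s+(\d+)\s+(\S+)")


def parse_cluster_status(output: str) -> dict[int, dict[str, tuple[int, str]]]:
    """Map redundancy group -> node -> (priority, status)."""
    groups: dict[int, dict[str, tuple[int, str]]] = {}
    current = None
    for line in output.splitlines():
        group_match = _GROUP.search(line)
        if group_match:
            current = int(group_match.group(1))
            groups[current] = {}
            continue
        node_match = _NODE.match(line)
        if node_match and current is not None:
            node, priority, status = node_match.groups()
            groups[current][node] = (int(priority), status)
    return groups


def primary_of(output: str, group: int) -> Optional[str]:
    """Name of the node that is primary for the group, or None."""
    for node, (_, status) in parse_cluster_status(output).get(group, {}).items():
        if status == "primary":
            return node
    return None


def unhealthy(output: str) -> list[str]:
    """Entries whose status is neither primary nor secondary (e.g. 'disabled')."""
    return [
        f"group {group} {node}: {status} (priority {priority})"
        for group, nodes in sorted(parse_cluster_status(output).items())
        for node, (priority, status) in nodes.items()
        if status not in ("primary", "secondary")
    ]
```

For step 19, `primary_of(text, 0)` should return "node0"; for step 26, `unhealthy(text)` should be empty once priorities and status have settled.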
27.) If needed, run the following on node0 to download and install IDP updates. The status commands report the progress of the download or install
  run request security idp security-package download full-update
  run request security idp security-package download status

  run request security idp security-package install
  run request security idp security-package install status
28.) Verify that the versions match on both nodes and are up to date
  run show security idp security-package-version
  run request security idp security-package download check-server
    28.a) A failover may be required to download IDP updates if node0 has no internet access (per Juniper) or if the versions do not match
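Comparing the IDP versions by eye across two nodes is easy to get wrong. A minimal Python sketch that parses pasted `show security idp security-package-version` output and lists fields whose values differ between nodes, assuming "nodeN:" section headers and lines like `Detector version :10.4.160110525` (an assumed layout; the sample in the checks is illustrative, not captured from a device). The helper names are hypothetical.

```python
import re

_NODE_HEADER = re.compile(r"^(node\d+):\s*$")


def package_versions(output: str) -> dict[str, dict[str, str]]:
    """Map node -> {field: value}. Output without 'nodeN:' headers is filed
    under 'local'. Splits on the first ':' only, so dates keep their colons."""
    versions: dict[str, dict[str, str]] = {}
    node = "local"
    for raw in output.splitlines():
        line = raw.strip()
        header = _NODE_HEADER.match(line)
        if header:
            node = header.group(1)
            continue
        if ":" not in line or line.startswith("-"):
            continue  # blank lines and separator rows
        field, value = line.split(":", 1)
        versions.setdefault(node, {})[field.strip()] = value.strip()
    return versions


def mismatches(output: str) -> list[str]:
    """Fields whose values differ between nodes (or are missing on one)."""
    versions = package_versions(output)
    fields = sorted({field for per_node in versions.values() for field in per_node})
    return [
        field
        for field in fields
        if len({per_node.get(field) for per_node in versions.values()}) > 1
    ]
```

With output from a single node this always returns an empty list, so make sure both node sections are present before trusting the result.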