I’ve been building and deploying Juniper SRX Firewall clusters for a good 6 years now and even managed to pick up a JNCIE-SEC along the way, but last week I stumbled across an interesting configuration feature when using LACP and Reth interfaces that I’d never seen documented before.
Let’s start with a quick primer on SRX Redundant Ethernet (Reth) Interfaces and LACP:
Firstly, one or more physical ports from each SRX chassis-cluster node are assigned to a Reth interface:
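A minimal sketch of that assignment in set-command form. The node 0 ports ge-0/0/4 and ge-0/0/5 are the ones used in this post; the node 1 port names (ge-5/0/4 and ge-5/0/5, as numbered on an SRX240 cluster), the reth0 name and the gigether-options stanza are assumptions that vary by platform and Junos release:

```
# Node 0 member ports (named in this post)
set interfaces ge-0/0/4 gigether-options redundant-parent reth0
set interfaces ge-0/0/5 gigether-options redundant-parent reth0
# Node 1 member ports (assumed SRX240-style numbering)
set interfaces ge-5/0/4 gigether-options redundant-parent reth0
set interfaces ge-5/0/5 gigether-options redundant-parent reth0
```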
Under the reth interface we configure LACP:
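Something along these lines, assuming the reth is called reth0. The lacp active and periodic fast settings are illustrative choices rather than anything this post depends on; minimum-links 1 is the value discussed below:

```
# LACP is configured once on the reth, not on the member ports
set interfaces reth0 redundant-ether-options lacp active
set interfaces reth0 redundant-ether-options lacp periodic fast
# Keep each node's sub-LAG up while at least one member port is active
set interfaces reth0 redundant-ether-options minimum-links 1
```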
Behind the scenes this creates two distinct LACP sub-LAGs, one from each physical SRX node to the downstream device.
The minimum-links 1 option means that each LACP sub-LAG is considered to be up for as long as at least one of its ports is still active.
Finally, we assign the Reth to a redundancy-group, which specifies the “Primary” node on which interfaces in the Reth will be active by way of the priority (higher being more preferred). Optionally, preempt fails the RG back to the primary node when it becomes available again.
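As a rough sketch, assuming redundancy-group 1, a single reth, and priorities of 200 and 100 (all three are illustrative values, not taken from a real deployment):

```
# Number of reth interfaces the cluster should create (assumed: one)
set chassis cluster reth-count 1
# Higher priority wins, so node 0 is the preferred primary for RG1
set chassis cluster redundancy-group 1 node 0 priority 200
set chassis cluster redundancy-group 1 node 1 priority 100
# Optional: fail back to node 0 once it is available again
set chassis cluster redundancy-group 1 preempt
# Bind the reth to the redundancy-group
set interfaces reth0 redundant-ether-options redundancy-group 1
```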
Something that isn’t immediately obvious to newcomers is that, given the above configuration, if both ge-0/0/4 and ge-0/0/5 are unplugged from the primary node, the minimum-links threshold will be crossed and the reth on the primary node will go down. However, the redundancy-group will NOT fail over.
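If you want to watch this happen in a lab, the following operational-mode commands are a reasonable place to look. The reth0 and redundancy-group 1 names are assumptions carried over from the sketches in this post, and no output is reproduced here:

```
show chassis cluster status redundancy-group 1
show chassis cluster interfaces
show lacp interfaces reth0
```

The first shows which node is currently primary for the RG, the second shows the state of each reth and any monitored interfaces, and the third shows the LACP state of the member ports.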
I will digress here for a moment and say that to this day, I still can’t think of a single reason why this behaviour is ever desirable. If you’ve got a topology that benefits somehow from completely losing a reth without failing over, I want to know about it! :)
Each redundancy group has an in-built Threshold counter which determines when fail-over to the secondary node will occur - this value is set to be 255 under normal conditions.
Looking at our redundancy-group again, we now add in the interface-monitor statements, which specify a weight against physical interfaces.
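For example, using the weight of 128 discussed below and assuming redundancy-group 1 (the node 1 port names are again assumed SRX240-style numbering):

```
# Each monitored port carries a weight that counts against the RG threshold of 255
set chassis cluster redundancy-group 1 interface-monitor ge-0/0/4 weight 128
set chassis cluster redundancy-group 1 interface-monitor ge-0/0/5 weight 128
# Node 1 ports (assumed names)
set chassis cluster redundancy-group 1 interface-monitor ge-5/0/4 weight 128
set chassis cluster redundancy-group 1 interface-monitor ge-5/0/5 weight 128
```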
Now whenever any of the four monitored interfaces goes down, its weight will be subtracted from the redundancy-group threshold. When this threshold reaches 0, the redundancy-group will fail over and activate the interfaces of its reths on the secondary node.
It should be noted that the physical interfaces being monitored don’t have to be members of a Reth interface associated with this redundancy-group.
You can see the results of this using the hidden-until-recently command
show chassis cluster information
In the example above, I’ve deliberately used a weight of 128 so that a single link loss will not cause fail-over, instead requiring that both links to a node go down before failing over. This configuration achieves the same thing as configuring minimum-links 1 in the LACP bundle, except that it actually causes the redundancy-group to fail over.
This seems somewhat (wait for it…) redundant to me.
A better way
What I recently discovered, however, is that you can configure interface-monitor to monitor the Reth interface itself instead of the physical links that make it up, e.g.:
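A sketch of what that could look like, assuming reth0 and redundancy-group 1. The weight of 255 is my assumption, chosen so that the reth going down on its own exhausts the threshold:

```
# Monitor the reth itself rather than its member ports (weight is an assumed value)
set chassis cluster redundancy-group 1 interface-monitor reth0 weight 255
```

Any per-port interface-monitor statements for the reth's member links would be removed when switching to this approach.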
With this deployed, if the LACP sub-LAG bundle falls below minimum-links, it is taken down as before, but now interface-monitor will detect this and fail the redundancy-group over.
This also means that if you decide to scale the number of LACP ports up or down in the future, you don’t have to fiddle with interface-monitor weights.
As an added bonus, the interface-monitor now depends on the downstream device being an active LACP participant, rather than just monitoring physical link status - think of it as free BFD!