The Junos operating system turns 20 next year (the first release was July 7, 1998, according to Wikipedia), and to this day it still includes what is, in my opinion, the most powerful command-line interface available on a networking device.


But with the current trend in networking to move towards fully automated, API-driven, SDN-orchestrated, Jinja2 templated nirvana, you’d be forgiven for thinking that switches will all ship with no CLI at all.

I’m a bit more of a realist and while I’m completely onboard with the benefits of network automation, I (along with what I suspect are most other Network Engineers out there) also don’t want to have to punch out 400 lines of J2 template code every time I need to test out a new feature, or re-create a customer issue in the lab.

So to start the ball rolling in 2018 with a very unfashionable topic, I’ve decided to put together a list of lesser-known Junos CLI magic that I’ve picked up over the years from other engineers, customers and pure serendipity that might help you.

Protecting your assets

In the lab environment, or anywhere you are doing large scale automation it’s always best to err on the side of caution when making configuration changes.

To this end, I like to use the protect feature in Junos to lock specific sections of configuration that I don’t want accidentally (or deliberately) deleted - specifically management IPs, routes and last-resort local logins.

To implement protect, simply single out the sections of configuration you want to lock like so:

protect system login user backdoor
protect routing-options static route 0.0.0.0/0
protect interfaces em0

Now when you commit that configuration these sections of the configuration will be locked - observe:

{master:0}[edit]
root@qfx5100-1# delete
This will delete the entire configuration
Delete everything under this level? [yes,no] (no) yes

warning: [system login user backdoor] is protected, 'system login user backdoor' cannot be deleted
warning: [interfaces em0] is protected, 'interfaces em0' cannot be deleted
warning: [routing-options static route 0.0.0.0/0] is protected, 'routing-options static route 0.0.0.0/0' cannot be deleted

In order to delete or change these statements now, you need to unprotect them, commit, and then delete them and commit again:

unprotect system login user backdoor
unprotect routing-options static route 0.0.0.0/0
unprotect interfaces em0
commit
delete system login user backdoor
commit

This should slow down the annihilation when your PyEZ automation script becomes sentient!
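
Junos enforces protect at commit time on-box, but the same guard is worth sanity-checking in your tooling before you push deletes at scale. Here is a toy Python emulation of the prefix logic (the paths and warning format are copied from the example above; this is not how Junos implements it):

```python
# Toy emulation of Junos "protect": a delete is refused if it lands at,
# under, or above a protected hierarchy path. Paths are illustrative.
PROTECTED = [
    "system login user backdoor",
    "routing-options static route 0.0.0.0/0",
    "interfaces em0",
]

def check_delete(path):
    """Return a Junos-style warning if deleting `path` would remove or
    modify a protected hierarchy, otherwise None."""
    for prot in PROTECTED:
        if (path == prot
                or prot.startswith(path + " ")    # delete above the protected path
                or path.startswith(prot + " ")):  # delete inside the protected path
            return f"warning: [{prot}] is protected, '{path}' cannot be deleted"
    return None
```

The point of a pre-flight check like this is to let a script fail fast before it ever talks to the device; the authoritative enforcement still happens on-box at commit.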

Parse like a Boss

Junos implements quite a few UNIX commands in the shell that are exceptionally useful when parsing log or configuration files.

For example: if you’re browsing through a configuration file and want to quickly move to a specific section of it, you can use / to perform a forward regex search, e.g. type show configuration and then /interfaces.

In the same manner, you can search backwards for a section you wish to return to using ? to perform a reverse regex match, e.g. ?system will take you back to the system stanza (or to the first match of the string system).

When it comes to logging, we all know how frustrating it is having to troubleshoot issues sifting through on-box syslog files that are filled with hundreds of unrelated event messages.

Well, along with the forward and backward regex matches described above, you can also update match and exclude filters in real-time to rapidly zoom in on precisely the entries you care about.

Here’s an example with a very verbose flow trace file on an SRX:

bdale@0ffnet-srx210-gw> show log FLOW
Jun 29 15:39:38 15:39:38.221829:CID-0:RT:flow process pak, mbuf 0x42e06780, ifl 0, ctxt_type 0 inq type 5

Jun 29 15:39:38 15:39:38.221829:CID-0:RT: in_ifp <junos-host:.local..0>

Jun 29 15:39:38 15:39:38.221829:CID-0:RT:flow_process_pkt_exception: setting rtt in lpak to 0x495f1fe8

Jun 29 15:39:38 15:39:38.221829:CID-0:RT:host inq check inq_type 0x5

Jun 29 15:39:38 15:39:38.221829:CID-0:RT:Using vr id from pfe_tag with value= 0

Jun 29 15:39:38 15:39:38.221829:CID-0:RT:Changing lpak->in_ifp from:.local..0 -> to:.local..0

Jun 29 15:39:38 15:39:38.221829:CID-0:RT:Over-riding lpak->vsys with 0

Jun 29 15:39:38 15:39:38.221829:CID-0:RT:  .local..0:172.16.10.254/22->172.16.10.22/56060, tcp, flag 18

Jun 29 15:39:38 15:39:38.221829:CID-0:RT: find flow: table 0x48924c18, hash 20942(0xffff), sa 172.16.10.254, da 172.16.10.22, sp 22, dp 56060, proto 6, tok 2

Jun 29 15:39:38 15:39:38.221829:CID-0:RT:Found: session id 0x188d. sess tok 2

Jun 29 15:39:38 15:39:38.221829:CID-0:RT:  flow got session.

Jun 29 15:39:38 15:39:38.221829:CID-0:RT:  flow session id 6285

Firstly, it’s all double-spaced, which is hard to read, so let’s just focus on lines containing Jun:

m Jun
Match for: Jun
Jun 29 15:39:38 15:39:38.221829:CID-0:RT: in_ifp <junos-host:.local..0>
Jun 29 15:39:38 15:39:38.221829:CID-0:RT:flow_process_pkt_exception: setting rtt in lpak to 0x495f1fe8
Jun 29 15:39:38 15:39:38.221829:CID-0:RT:host inq check inq_type 0x5
Jun 29 15:39:38 15:39:38.221829:CID-0:RT:Using vr id from pfe_tag with value= 0
Jun 29 15:39:38 15:39:38.221829:CID-0:RT:Changing lpak->in_ifp from:.local..0 -> to:.local..0
Jun 29 15:39:38 15:39:38.221829:CID-0:RT:Over-riding lpak->vsys with 0
Jun 29 15:39:38 15:39:38.221829:CID-0:RT:  .local..0:172.16.10.254/22->172.16.10.22/56060, tcp, flag 18
Jun 29 15:39:38 15:39:38.221829:CID-0:RT: find flow: table 0x48924c18, hash 20942(0xffff), sa 172.16.10.254, da 172.16.10.22, sp 22, dp 56060, proto 6, tok 2
Jun 29 15:39:38 15:39:38.221829:CID-0:RT:Found: session id 0x188d. sess tok 2
Jun 29 15:39:38 15:39:38.221829:CID-0:RT:  flow got session.
Jun 29 15:39:38 15:39:38.221829:CID-0:RT:  flow session id 6285
Jun 29 15:39:38 15:39:38.221829:CID-0:RT: vector bits 0x2 vector 0x45ace0a0
Jun 29 15:39:38 15:39:38.221829:CID-0:RT:mbuf 0x42e06780, exit nh 0x1f0010
Jun 29 15:39:38 15:39:38.221829:CID-0:RT:flow_process_pkt_exception: Freeing lpak 0x489a9ad0 associated with mbuf 0x42e06780
Jun 29 15:39:38 15:39:38.221829:CID-0:RT: ----- flow_process_pkt rc 0x0 (fp rc 0)
Jun 29 15:39:38 15:39:38.222598:CID-0:RT:<172.16.10.22/56060->172.16.10.254/22;6> matched filter ALL:
Jun 29 15:39:38 15:39:38.222598:CID-0:RT:packet [52] ipid = 26605, @0x4232c51e
Jun 29 15:39:38 15:39:38.222598:CID-0:RT:---- flow_process_pkt: (thd 1): flow_ctxt type 15, common flag 0x0, mbuf 0x4232c300, rtbl_idx = 0
Jun 29 15:39:38 15:39:38.222598:CID-0:RT: flow process pak fast ifl 74 in_ifp vlan.10
Jun 29 15:39:38 15:39:38.222598:CID-0:RT:  vlan.10:172.16.10.22/56060->172.16.10.254/22, tcp, flag 10
Jun 29 15:39:38 15:39:38.222598:CID-0:RT: find flow: table 0x48924c18, hash 42742(0xffff), sa 172.16.10.22, da 172.16.10.254, sp 56060, dp 22, proto 6, tok 6

Now, let’s start culling entries that don’t appear to be useful like the ones mentioning vector and mbuf:

e vector
e mbuf
Match except (mbuf): vector
Jun 29 15:39:38 15:39:38.221829:CID-0:RT: in_ifp <junos-host:.local..0>
Jun 29 15:39:38 15:39:38.221829:CID-0:RT:flow_process_pkt_exception: setting rtt in lpak to 0x495f1fe8
Jun 29 15:39:38 15:39:38.221829:CID-0:RT:host inq check inq_type 0x5
Jun 29 15:39:38 15:39:38.221829:CID-0:RT:Using vr id from pfe_tag with value= 0
Jun 29 15:39:38 15:39:38.221829:CID-0:RT:Changing lpak->in_ifp from:.local..0 -> to:.local..0
Jun 29 15:39:38 15:39:38.221829:CID-0:RT:Over-riding lpak->vsys with 0
Jun 29 15:39:38 15:39:38.221829:CID-0:RT:  .local..0:172.16.10.254/22->172.16.10.22/56060, tcp, flag 18
Jun 29 15:39:38 15:39:38.221829:CID-0:RT: find flow: table 0x48924c18, hash 20942(0xffff), sa 172.16.10.254, da 172.16.10.22, sp 22, dp 56060, proto 6, tok 2
Jun 29 15:39:38 15:39:38.221829:CID-0:RT:Found: session id 0x188d. sess tok 2
Jun 29 15:39:38 15:39:38.221829:CID-0:RT:  flow got session.
Jun 29 15:39:38 15:39:38.221829:CID-0:RT:  flow session id 6285
Jun 29 15:39:38 15:39:38.221829:CID-0:RT: ----- flow_process_pkt rc 0x0 (fp rc 0)
Jun 29 15:39:38 15:39:38.222598:CID-0:RT:<172.16.10.22/56060->172.16.10.254/22;6> matched filter ALL:
Jun 29 15:39:38 15:39:38.222598:CID-0:RT:packet [52] ipid = 26605, @0x4232c51e
Jun 29 15:39:38 15:39:38.222598:CID-0:RT: flow process pak fast ifl 74 in_ifp vlan.10
Jun 29 15:39:38 15:39:38.222598:CID-0:RT:  vlan.10:172.16.10.22/56060->172.16.10.254/22, tcp, flag 10
Jun 29 15:39:38 15:39:38.222598:CID-0:RT: find flow: table 0x48924c18, hash 42742(0xffff), sa 172.16.10.22, da 172.16.10.254, sp 56060, dp 22, proto 6, tok 6

As you can see, as you add more matches and exceptions, the output is gradually pared down to just the information you care about.
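
Under the hood, m and e behave like stacking regex filters over the pager's output. A rough Python analogue of that behaviour (my own sketch, not Junos code; the log lines are trimmed from the trace above):

```python
import re

def filter_log(lines, matches=(), excepts=()):
    """Keep lines that match every regex in `matches` and drop lines
    that match any regex in `excepts` - roughly what the m and e keys
    do in the Junos log viewer."""
    return [line for line in lines
            if all(re.search(m, line) for m in matches)
            and not any(re.search(e, line) for e in excepts)]

log = [
    "Jun 29 15:39:38 ...CID-0:RT: vector bits 0x2 vector 0x45ace0a0",
    "",  # the blank lines that double-space the raw output
    "Jun 29 15:39:38 ...CID-0:RT:  flow got session.",
    "Jun 29 15:39:38 ...CID-0:RT:mbuf 0x42e06780, exit nh 0x1f0010",
]
filtered = filter_log(log, matches=["Jun"], excepts=["vector", "mbuf"])
# only the "flow got session." line survives
```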

As with most Junos commands, you can also chain these filters using pipes, even on real-time output:

monitor start messages | except alarmd | except mgd

will give you real-time log output from all daemons except alarmd and mgd.

And finally, don’t forget about Ctrl-R, the reverse search in the CLI.

This allows you to do a quick regex search of your CLI history so you can find that long command you entered earlier without having to press up arrow 47 times!

{master:0} <ctrl-r>
(history search) '172': show route protocol static table inet.0 172.16.6.0/26 

Realtime location tracking

Well, sort of… Did you know that tucked away inside Junos is everybody’s favourite real-time traceroute implementation mtr?

It’s a fairly old version (v0.69) but the fact that you can run this from inside Junos is sometimes quite useful for looking at any instantaneous packet loss along a routed path.

To run it, simply type

traceroute monitor 8.8.8.8

and you’ll see some output like the following, updating in real-time.

                             My traceroute  [v0.69]
qfx5100-1 (0.0.0.0)(tos=0x0 psize=64 bitpattern=0x00)  Mon Jan 22 20:50:38 2018
Keys:  Help   Display mode   Restart statistics   Order of fields   quit
                                              Packets               Pings
 Host                                       Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. vlan9.axs1.comlinx.lab                   0.0%     7    1.9   2.5   1.9   2.9   0.4
 2. vlan10.core1.comlinx.lab                 0.0%     7    3.1   3.5   2.5   6.5   1.4
 3. ge-0-0-1.62rob.bne.comlinx.com.au        0.0%     7    1.0   1.3   0.9   2.7   0.6
 4. 10.1.254.5                               0.0%     7    1.6   1.5   1.5   1.6   0.0
 5. 10.1.254.6                               0.0%     7    2.1   1.9   1.8   2.1   0.1
 6. ge-0-0-4.100wic.bne.comlinx.com.au       0.0%     7    1.7   1.6   1.5   1.7   0.1
 7. gen-xxx-xxx-xxx-xxx.ptr4.otw.net.au      0.0%     7    2.2   3.2   2.2   8.1   2.2
 8. gexxx.xx.pe2.100wic.bne.core.otw.net.au  0.0%     7    2.5   2.5   2.2   2.9   0.2
 9. as15169.nsw.ix.asn.au                    0.0%     7   14.3  14.6  14.2  16.7   0.9
10. 108.170.247.65                           0.0%     7   15.0  14.9  14.7  15.0   0.1
11. 209.85.247.157                           0.0%     7   15.4  15.4  15.2  15.9   0.3
12. google-public-dns-a.google.com           0.0%     7   14.4  14.4  14.3  14.4   0.0

Be advised that traceroute monitor is only available from within inet.0 though, which does limit its usefulness. If anyone from Juniper engineering is reading this: I’d love to see a routing-instance option added!
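
If you’re capturing this output off-box (say, scraped over SSH), the hop table is easy to post-process. A rough parser built on the column layout shown above; the hostnames come from the sample, but the 28.6% loss figure is invented for illustration:

```python
import re

# matches lines like " 2. vlan10.core1.comlinx.lab   28.6%   7   3.1 ..."
HOP_RE = re.compile(r"\s*(\d+)\.\s+(\S+)\s+(\d+\.\d+)%")

def hops_with_loss(mtr_output, threshold=0.0):
    """Return (hop, host, loss%) tuples for hops losing more than
    `threshold` percent of probes."""
    hops = []
    for line in mtr_output.splitlines():
        m = HOP_RE.match(line)
        if m and float(m.group(3)) > threshold:
            hops.append((int(m.group(1)), m.group(2), float(m.group(3))))
    return hops

sample = """\
 1. vlan9.axs1.comlinx.lab                   0.0%     7    1.9   2.5
 2. vlan10.core1.comlinx.lab                28.6%     7    3.1   3.5
"""
```

Running hops_with_loss(sample) flags only hop 2, which is usually the first thing you want out of a long mtr run.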

More Refreshments

And speaking of refreshing, another feature that doesn’t get much attention is the refresh action that you can pipe commands to.

What this does is re-run your previous command at the interval you specify, and time-stamp the output for you.

 show route 172.16.6.0/26 | refresh 5

{master:0}
root@qfx5100-1> show route 172.16.6.0/26 | refresh 5
---(refreshed at 2018-01-22 21:18:27 EST)---

inet.0: 8 destinations, 8 routes (8 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

172.16.6.0/26      *[Static/200] 00:00:27
                    > to 10.0.9.1 via em0.0
---(refreshed at 2018-01-22 21:18:32 EST)---
inet.0: 8 destinations, 8 routes (8 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

172.16.6.0/26      *[Static/200] 00:00:32
                    > to 10.0.9.1 via em0.0
---(refreshed at 2018-01-22 21:18:37 EST)---
inet.0: 8 destinations, 9 routes (8 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

172.16.6.0/26      *[OSPF/150] 00:00:05, metric 0, tag 0
                    > to 192.168.100.1 via xe-0/0/51:0.0
                    [Static/200] 00:00:39
                    > to 10.0.9.1 via em0.0

This is super helpful when troubleshooting intermittent issues, and the results can even be piped to a file. Much better than standing by your console overnight pressing Up-Arrow / Enter, and much less dangerous than setting up a drinking bird in your data centre.
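
If you’re working from a jump host against a box (or a command) that lacks refresh, the same loop is easy to approximate. A sketch; the ssh target in the docstring is hypothetical, and the timestamp format just mimics the Junos output above:

```python
import subprocess
import time
from datetime import datetime

def refresh(cmd, interval, count):
    """Re-run `cmd` every `interval` seconds, `count` times, prefixing
    each pass with a Junos-style refresh timestamp.
    e.g. cmd = ["ssh", "qfx5100-1", "show route 172.16.6.0/26"]"""
    passes = []
    for i in range(count):
        stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        out = subprocess.run(cmd, capture_output=True, text=True).stdout
        passes.append(f"---(refreshed at {stamp})---\n{out}")
        if i + 1 < count:
            time.sleep(interval)
    return passes
```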


And now for breadcrumbs

Another relatively unknown feature is configuration-breadcrumbs. This appeared around Junos 12.2 and displays the configuration hierarchy (the breadcrumbs) of the currently displayed configuration output.

This is super useful if you’re examining a large configuration file on-box and need to keep track of which stanza you’re under (especially subscriber detail on a BRAS, or deep configuration under a routing-instance):

{master:0}
bdale@qfx5100-2> show configuration
…
routing-instances {
    CRUMMY {
        instance-type virtual-router;
        routing-options {
            autonomous-system 65500;
        }
        protocols {
            bgp {
                group UPSTREAM {
                    neighbor 192.168.5.4 {
                        export LOOPBACK;
---(more 97%)---[routing-instances CRUMMY protocols bgp group UPSTREAM neighbor 192.168.5.4]---

To enable, you need to configure it under your login class:

 set system login class CRUMBS permissions all
 set system login class CRUMBS configuration-breadcrumbs

and then log back in again.

Are you committed?

This is one that I was shown by a customer very recently:

When you perform a commit confirmed the configuration will be applied and then Junos will wait for a follow-up commit before it removes the rollback timer.

On clustered Junos deployments such as EX virtual chassis, or Branch SRX Chassis Clusters with large configurations, running this follow-up commit can take quite a bit of time as your configuration is re-checked and applied against each VC member, even though nothing is changing.

It is especially daunting when you commit confirmed for only a minute or so, and then realise your testing will actually take up all of this time.

It turns out that by running a commit check instead, Junos will re-validate the configuration (against the REs only) and then remove the rollback timer, saving what might be precious seconds or minutes during a change window.
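
So a safer sequence for a change window on a cluster looks something like this (hostnames, timings and prompts are illustrative):

```
[edit]
bdale@srx-cluster# commit confirmed 5
commit confirmed will be automatically rolled back in 5 minutes unless confirmed
commit complete

<verify the change actually works>

[edit]
bdale@srx-cluster# commit check
configuration check succeeds
```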

That’s just the Tip of it

Well, that’s about it for this post - hopefully you’ve picked up at least one new trick that will make your Junos CLI-fu that bit stronger.

For even more CLI tips, Junos includes a whole bunch built in - type

help tip cli

for a random one, or

help tip cli <number>

to read a specific tip (they’re numbered 0-98). You can even give console/SSH users a tip of the day by adding the login-tip statement to their login class:

 set system login class helpful login-tip

Long live the CLI!

I’ve been building and deploying Juniper SRX Firewall clusters for a good 6 years now and even managed to pick up a JNCIE-SEC along the way, but last week I stumbled across an interesting configuration feature when using LACP and Reth interfaces that I’d never seen documented before.

Let’s start with a quick primer on SRX Redundant Ethernet (Reth) Interfaces and LACP:

Firstly, one or more physical ports from each SRX chassis-cluster node are assigned to a Reth interface:

ge-0/0/4 {
    gigether-options {
        redundant-parent reth4;
    }
}
ge-0/0/5 {
    gigether-options {
        redundant-parent reth4;
    }
}
ge-5/0/4 {
    gigether-options {
        redundant-parent reth4;
    }
}
ge-5/0/5 {
    gigether-options {
        redundant-parent reth4;
    }
}

Under the reth interface we configure LACP:

reth4 {
    redundant-ether-options {
        redundancy-group 4;
        minimum-links 1;
        lacp {
            active;
        }
    }
    unit 0 {
        description INTERNET;
        family inet {
            address 203.24.22.50/29;
        }
    }
}

Behind the scenes this creates two distinct LACP sub-LAGs, one from each physical SRX node to the downstream device:

SRX Reth LACP Topology

The minimum-links option ensures that each LACP sub-LAG is considered up as long as at least one of its member ports is still active.

Finally, we assign the Reth to a redundancy-group, which specifies the “primary” node (the node on which the Reth’s member interfaces will be active) by way of the priority statement (higher is more preferred); the optional preempt statement fails the RG back to the primary node when it becomes available again:

redundancy-group 4 {
    node 0 priority 100;
    node 1 priority 50;
    preempt;
}

Something that isn’t immediately obvious to newcomers is that, given the above configuration, if both ge-0/0/4 and ge-0/0/5 are unplugged from the primary node, the minimum-links threshold will be crossed and the reth on the primary node will go down. However, the redundancy-group will NOT fail over:

bdale@srx-lab-fw1# run show interfaces terse ge-[05]/0/[45]
Interface               Admin Link Proto    Local                 Remote
ge-0/0/4                up    down
ge-0/0/5                up    down
ge-5/0/4                up    up
ge-5/0/5                up    up


bdale@srx-lab-fw1# run show lacp interfaces
Aggregated interface: reth4
    LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity
      ge-0/0/4       Actor    No   Yes    No   No   No   Yes     Fast    Active
      ge-0/0/4     Partner    No   Yes    No   No   No   Yes     Fast   Passive
      ge-0/0/5       Actor    No   Yes    No   No   No   Yes     Fast    Active
      ge-0/0/5     Partner    No   Yes    No   No   No   Yes     Fast   Passive
      ge-5/0/4       Actor    No   Yes    No   No   No   Yes     Fast    Active
      ge-5/0/4     Partner    No   Yes    No   No   No   Yes     Fast   Passive
      ge-5/0/5       Actor    No   Yes    No   No   No   Yes     Fast    Active
      ge-5/0/5     Partner    No   Yes    No   No   No   Yes     Fast   Passive
    LACP protocol:        Receive State  Transmit State          Mux State 
      ge-0/0/4            Port disabled    No periodic           Detached
      ge-0/0/5            Port disabled    No periodic           Detached
      ge-5/0/4            Current         Fast periodic          Collecting Distributing
      ge-5/0/5            Current         Fast periodic          Collecting Distributing


bdale@srx-lab-fw1# run show chassis cluster interfaces
....
Redundant-ethernet Information:     
    Name         Status      Redundancy-group
    reth0        Down        Not configured   
    reth1        Down        Not configured   
    reth2        Down        Not configured   
    reth3        Down        Not configured   
    reth4        Down        4                
    reth5        Up          5                
    reth6        Up          6                
    reth7        Down        Not configured   
...

bdale@srx-lab-fw1# run show chassis cluster status
Cluster ID: 1 
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 0
    node0                   100         primary        no       no  
    node1                   50          secondary      no       no  

Redundancy group: 4 , Failover count: 0
    node0                   100         primary        yes      no  
    node1                   50          secondary      yes      no  

Redundancy group: 5 , Failover count: 0
    node0                   0           primary        yes      no  
    node1                   0           secondary      yes      no  

Redundancy group: 6 , Failover count: 0
    node0                   0           primary        yes      no  
    node1                   0           secondary      yes      no  

I will digress here for a moment and say that to this day, I still can’t think of a single reason why this behaviour is ever desirable. If you’ve got a topology that benefits somehow from completely losing a reth without failing over, I want to know about it! :)

Each redundancy group has a built-in threshold counter that determines when fail-over to the secondary node will occur; under normal conditions this value is 255.

Looking at our redundancy-group again, we now add interface-monitor statements, which assign a weight to each monitored physical interface:

redundancy-group 4 {
    node 0 priority 100;
    node 1 priority 50;
    preempt;
    interface-monitor {
        ge-0/0/4 weight 128;
        ge-0/0/5 weight 128;
        ge-5/0/4 weight 128;
        ge-5/0/5 weight 128;
    }
}

Now whenever any of the four interfaces listed above goes down, its weight is subtracted from the redundancy-group threshold; when the threshold reaches 0, the redundancy-group fails over and activates the member interfaces of its associated Reths on the secondary node.

It should be noted that the physical interfaces being monitored don’t have to be members of a Reth interface associated with this redundancy-group.
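
The bookkeeping is simple enough to model. A toy sketch of the node-local arithmetic (my own model for illustration, not how the SRX implements it):

```python
def rg_state(monitored, down_ifaces, threshold=255):
    """Subtract the weight of each down monitored interface from the
    redundancy-group threshold; fail-over triggers once it hits 0."""
    remaining = threshold - sum(weight for ifname, weight in monitored.items()
                                if ifname in down_ifaces)
    return remaining, remaining <= 0

monitor = {"ge-0/0/4": 128, "ge-0/0/5": 128,
           "ge-5/0/4": 128, "ge-5/0/5": 128}

# one node-0 link down:  255 - 128 = 127, still primary
# both node-0 links down: 255 - 256 = -1, RG fails over
```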

You can see the results of this using the hidden-until-recently command show chassis cluster information:

bdale@srx-lab-fw1# run show chassis cluster information 
node0:
--------------------------------------------------------------------------
Redundancy mode:
    Configured mode: active-active
    Operational mode: active-active

Redundancy group: 0, Threshold: 255, Monitoring failures: none
    Events:
        Jun 16 17:28:31.245 : hold->secondary, reason: Hold timer expired
        Jun 16 17:28:32.093 : secondary->primary, reason: Better priority (1/1)

Redundancy group: 4, Threshold: 255, Monitoring failures: none
    Events:
        Jun 18 03:28:54.773 : hold->secondary, reason: Hold timer expired
        Jun 18 03:28:54.809 : secondary->primary, reason: Remote yield (0/0)

In the example above, I’ve deliberately used a weight of 128 so that a single link loss will not cause fail-over; both links to a node must go down before the group fails over. This achieves the same thing as configuring minimum-links 1 on the LACP bundle, except that it actually triggers fail-over of the redundancy-group.

This seems somewhat (wait for it…) redundant to me.

A better way

What I recently discovered, however, is that you can configure interface-monitor to monitor the Reth interface itself, instead of the physical links that make it up:

redundancy-group 4 {
    node 0 priority 100;
    node 1 priority 50;
    preempt;
    interface-monitor {
        reth4 weight 255;
    }
}

With this deployed, if the LACP sub-LAG falls below minimum-links it is taken down as before, but now interface-monitor will detect this and fail the redundancy-group over.

This also means that if you decide to scale the number of LACP ports up or down in the future, you don’t have to fiddle with interface-monitor weights.

As an added bonus, interface-monitor now depends on the downstream device being an active LACP participant, rather than just monitoring physical link status - think of it as free BFD!