SD-WAN


VRRP on SD-Branch

  • 1.  VRRP on SD-Branch

    Posted Jan 26, 2026 09:27 PM

    I am playing around with the SD-Branch. 

    On my LAN I use a port-channel that I set as Untrusted.

    All VLANs on this port-channel are also set as Untrusted.

    I created an AAA profile, a role, and a policy (access-list).

    When configuring VRRP I noticed that the two VRRP peers were unable to exchange packets

    Permitting protocol 112 did not work

    Eventually I noticed that enabling TCP ports 9190 and 9199 worked

    I cannot recall seeing this anywhere.

    Is there documentation describing the best practices for which ports to enable on the LAN side in a redundant setup?



    ------------------------------
    Martijn van Overbeek
    Architect, Netcraftsmen a BlueAlly Company
    ------------------------------


  • 2.  RE: VRRP on SD-Branch

    Posted Jan 26, 2026 10:11 PM
    Hi Martijn,

    We have no dependency on the TCP ports you listed, and VRRP doesn't use them either. Protocol 112 is correct and needs to be permitted. I think there must be something else going on.

    Keith
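For reference, a minimal Python sketch of why a rule written for TCP ports cannot match VRRP itself: VRRP advertisements ride directly on IP with protocol number 112 and go to the multicast group 224.0.0.18, so there is no TCP or UDP port in the packet to match on. The helper names and the 10.100.3.x peer addresses are illustrative assumptions, not values read from the gateways.

```python
import socket
import struct

VRRP_PROTOCOL = 112            # IANA IP protocol number for VRRP
VRRP_MULTICAST = "224.0.0.18"  # IPv4 multicast group for VRRP advertisements
TCP_PROTOCOL = 6

IPV4_HEADER = "!BBHHHBBH4s4s"  # fixed 20-byte IPv4 header without options


def build_ipv4_header(protocol: int, src: str, dst: str, ttl: int = 255) -> bytes:
    """Build a bare IPv4 header for illustration (checksum left at 0)."""
    return struct.pack(
        IPV4_HEADER, 0x45, 0, 20, 0, 0, ttl, protocol, 0,
        socket.inet_aton(src), socket.inet_aton(dst),
    )


def parse_ipv4_header(packet: bytes) -> dict:
    """Return the protocol number and addresses from a raw IPv4 header."""
    if len(packet) < 20:
        raise ValueError("packet shorter than a minimal IPv4 header")
    fields = struct.unpack(IPV4_HEADER, packet[:20])
    if fields[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    return {
        "protocol": fields[6],
        "src": socket.inet_ntoa(fields[8]),
        "dst": socket.inet_ntoa(fields[9]),
    }


def is_vrrp_advertisement(packet: bytes) -> bool:
    """True when the header carries IP protocol 112 to 224.0.0.18."""
    header = parse_ipv4_header(packet)
    return header["protocol"] == VRRP_PROTOCOL and header["dst"] == VRRP_MULTICAST


# Hypothetical peer addresses, used only to exercise the functions.
vrrp_packet = build_ipv4_header(VRRP_PROTOCOL, "10.100.3.2", VRRP_MULTICAST)
tcp_packet = build_ipv4_header(TCP_PROTOCOL, "10.100.3.2", "10.100.3.3")
print(is_vrrp_advertisement(vrrp_packet))
print(is_vrrp_advertisement(tcp_packet))
```

A policy entry that permits only TCP would match the second header but never the first, which is why permitting protocol 112 is the relevant rule for the VRRP exchange.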






  • 3.  RE: VRRP on SD-Branch

    Posted Jan 27, 2026 10:27 AM

    Thanks Keith

    Yes, there are definitely some unexplained things happening. In summary:

    • DNS issues when using a local DNS server (other hosts in the same network are reachable)
    • The secondary Gateway is unreachable when connected through the primary gateway

    In more detail:

    DNS server:

    A packet capture shows ICMP and DNS going back and forth between the DNS server and a host connected through the gateways, but neither ICMP nor DNS appears to function properly. This only happens with the DNS server; other hosts in the same network, for example, do respond to ICMP. My ACLs are network based.

    Below is an example of what I see on Windows with nslookup:

    >nslookup cppm.vanoverbeek.net
    DNS request timed out.
        timeout was 2 seconds.
    Server:  UnKnown
    Address:  192.168.25.251

    DNS request timed out.
        timeout was 2 seconds.
    DNS request timed out.
        timeout was 2 seconds.
    DNS request timed out.
        timeout was 2 seconds.
    DNS request timed out.
        timeout was 2 seconds.
    *** Request to UnKnown timed-out

    When I bypass the branch gateway and use my normal network it looks like this:

    C:\Users\MartijnVanOverbeek>nslookup cppm.vanoverbeek.net
    Server:  nas.vanoverbeek.net
    Address:  192.168.25.251

    Name:    cppm.vanoverbeek.net
    Address:  192.168.25.240

    I am pretty sure that it is not a security setting on the DNS server: when I create an SVI in the same VLAN I use for the gateway, but one that bypasses the gateways, at least ICMP works fine. That makes me think it must be something on the gateway.

    More detail on the Branch Gateway:

    When I am connected through either a wireless tunneled SSID or a wired User-Based Tunnel, I am unable to ping the LAN IP address of the secondary Branch Gateway. The primary gateway and the VRRP address respond fine, but the secondary gateway is not reachable. The secondary gateway is reachable through other networks that do not rely on the gateway.

    Any ideas what this could be?



    ------------------------------
    Martijn van Overbeek
    Architect, Netcraftsmen a BlueAlly Company
    ------------------------------



  • 4.  RE: VRRP on SD-Branch

    Posted Jan 27, 2026 11:57 AM
    That definitely sounds strange. I would recommend double-checking, even triple-checking, the gateway configurations at their respective device levels.

    Things can be set in the UI and either not get pushed or be overridden by other configuration. I have to think that you would find some differences between the two gateways. Another place to check, at both the group and device level under Config mode, is the Config Audit tab.

    I can also confirm that yes, your clients should 100% be able to ping not only the VRRP VIP, but also both local addresses of the two gateways.

    Let me also ask: what version of code are you running?

    Cheers,
    Keith

    Keith Mataranglo

    Technical Marketing Engineer

    +1 (405) 590 4266







  • 5.  RE: VRRP on SD-Branch

    Posted Jan 27, 2026 09:55 AM

    I get what you are testing based on other threads of communication. Untrusted ports have worked the same way for as long as I can remember, back to 6.4. I am assuming that after enabling AAA and setting all VLANs as untrusted, the gateway of your VRRP peer showed up in a user role? I am not sure what your long-term intentions would be. Generally I would steer away from having core services between gateway peers show up in a role, and would try to make these VLANs trusted. If you can't, just use a simple allowall ACL for the "network" role.

    After spot-checking the design guide: the references are for UBT (User-Based Tunneling) from a switch to a VRRP gateway.

    https://arubanetworking.hpe.com/techdocs/VSG/docs/070-sd-branch-design/esp-sd-branch-design-120-sdb-branch-design/#l2-and-l3-boundaries

    I can propose a slightly different method if you are trying to use L2 to a switch and not use UBT. This suggestion could be slightly more limited. Directly connect both gateways on a trusted port, so that your VRRP heartbeats go over the direct link rather than the link to the switch. Use spanning tree to shut down the port on the switch from gateway 2; spanning tree will only enable that port if the primary gateway link goes down.

    As to general documentation, I personally wouldn't spend my time looking it up. There could be other traffic that is used when both gateways are in the same site. You could capture a tech-support and look for the ports to see if you can find the source service that is initiating that traffic.

    If there is a question of blocked traffic, I would check which ACL ACE index you are hitting. You can reference the "session dpi" output to capture that and validate.

    show datapath session dpi table x.x.x.x (used to get the ace index)
    show acl ace-table all (use your ace index to match acl)
    show acl hits role <name> (goto command for validating)
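As a rough illustration of what the ACE index represents: a session ACL is evaluated top-down and the first matching entry decides the action, so the index recorded against a session points at the single entry that permitted or denied it. The Python sketch below models only that idea. The `Ace` class, the index values, and the example entries are all hypothetical; they are not taken from any gateway output, and the real table layout will differ.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional


@dataclass(frozen=True)
class Ace:
    index: int                # hypothetical ACE index
    src: str                  # CIDR string or "any"
    dst: str                  # CIDR string or "any"
    protocol: Optional[int]   # IP protocol number, None means any
    action: str               # "permit" or "deny"


def _matches(rule: str, addr: str) -> bool:
    """True when addr falls inside the rule network, or the rule is 'any'."""
    return rule == "any" or ip_address(addr) in ip_network(rule)


def first_match(aces: List[Ace], src: str, dst: str, protocol: int) -> Optional[Ace]:
    """Return the first ACE matching the flow, or None (implicit deny)."""
    for ace in aces:
        if (_matches(ace.src, src) and _matches(ace.dst, dst)
                and ace.protocol in (None, protocol)):
            return ace
    return None


# Invented entries for illustration only.
ROLE_ACES = [
    Ace(index=10, src="any", dst="any", protocol=112, action="permit"),
    Ace(index=11, src="any", dst="192.168.25.251/32", protocol=17, action="deny"),
    Ace(index=12, src="any", dst="192.168.25.0/24", protocol=None, action="permit"),
]

hit = first_match(ROLE_ACES, "192.168.26.10", "192.168.25.251", 17)
print(hit.index, hit.action)
```

In this made-up policy a UDP flow to one host lands on a different entry than a flow to its neighbours in the same subnet, which is the kind of difference the ACE index makes visible.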

    -------------------------------------------



  • 6.  RE: VRRP on SD-Branch

    Posted Jan 27, 2026 03:26 PM

    Hi Justin

    Thanks for helping with this. I will do a little more research to see if your suggestion of just trusting VLAN 3 (the routing VLAN) is a better option. It might make things easier.

    On Keith's question: despite having done all configuration at the group level and only details on the gateways themselves, I was surprised to see a lot of differences between the devices that I couldn't really explain. I will go through it tonight in more detail to at least try to align them. It is a bit worrying because I literally wiped them last week and mainly worked in group mode.



    ------------------------------
    Martijn van Overbeek
    Architect, Netcraftsmen a BlueAlly Company
    ------------------------------



  • 7.  RE: VRRP on SD-Branch

    Posted Jan 27, 2026 04:45 PM
    Edited by mvanoverbeek Jan 27, 2026 05:30 PM

    Things improved, which is great: I can now ping the secondary gateway and SSH to it when connected to the gateway through UBT or wireless. Thanks to both of you for your help! It is a good reminder not to rely fully on the GUI. I am running 10.7.2.2 SSR.

    The remaining issue is DNS and the fact that I cannot reach this server (very strange). I will do some more debugging on that end.


    That said, there were some remnants from tests I did that I forgot about, so I should NOT blame Central.
    After comparing the GUI, I scanned through the two configurations and did see a few things I did not understand, listed below. Hopefully one of you can help me understand the importance of these differences.
    I am especially curious about uplink-lb-sys-racl, but basically all of them.

    Configuration lines that can only be found on gateway 1, not on gateway 2:
    netservice any-v6 255 
    netservice any 0 
    !
    ip access-list session sys-switch-acl 
        any any sys-svc-gre permit 
        any any sys-svc-syslog permit 
        any any sys-svc-snmp permit 
        any any sys-svc-http permit 
        any any sys-svc-https permit 
        user any sys-svc-kerberos-tcp permit 
        user any sys-svc-smb-tcp permit 
        any any sys-svc-snmp-trap permit 
        any any sys-svc-ntp permit 
        user any sys-svc-ftp permit 
        any user sys-svc-telnet deny 
    !
    ip access-list session switch-logon-acl 
        any any any permit 
    !
    ip access-list route uplink-lb-sys-racl 
        any host 44.228.90.242 any route next-hop-list load-balance-ipsecs 
        any network 192.168.26.0 255.255.255.0 any forward 
        any network 10.100.3.0 255.255.255.240 any forward 
    !
    user-role ap-role 
        no openflow-enable 
        access-list session global-sacl << this line
        access-list session apprf-ap-role-sacl << this line
        access-list session ra-guard 
        access-list session control 
        access-list session ap-acl 
        access-list session v6-control 
        access-list session v6-ap-acl 
    user-role sys-switch-role 
        access-list session global-sacl 
        access-list session apprf-sys-switch-role-sacl 
        access-list session sys-switch-acl 
    !
    user-role switch-logon 
        access-list session global-sacl << this line
        access-list session apprf-switch-logon-sacl << this line
        access-list session switch-logon-acl << this line
    !
    user-role guest-logon 
        captive-portal "default" 
        access-list session global-sacl << this line
        access-list session apprf-guest-logon-sacl << this line
        access-list session ra-guard 
        access-list session logon-control 
        access-list session captiveportal 
        access-list session v6-logon-control 
        access-list session captiveportal6 
    !
    user-role logon 
        access-list session global-sacl << this line
        access-list session apprf-logon-sacl << this line
        access-list session ra-guard 
        access-list session logon-control 
        access-list session captiveportal 
        access-list session vpnlogon 
        access-list session v6-logon-control 
        access-list session captiveportal6 

    Hope you can provide more clarity



    ------------------------------
    Martijn van Overbeek
    Architect, Netcraftsmen a BlueAlly Company
    ------------------------------



  • 8.  RE: VRRP on SD-Branch

    Posted Jan 27, 2026 07:38 PM

    The config that seems to be out of sync looks like default out-of-the-box config. You can check the device to make sure it's synced (show switches).

    It can't hurt to start off with a simpler design for testing and then layer in features. I have sort of drifted away from L2 designs at branches due to frequent changes of default gateway/edge device in multi-vendor designs. If I recall, on the 2930Fs with per-port tunneling the clients would get a special AAA profile when tunnel-node was in use. You can check which one they get from "show clients". I would use trusted VLANs if I was only supplying a mgmt VLAN to use UBT on the switch for all clients.

    Items you pointed out: nothing is really mission critical unless something was in the role that didn't exist on the other device. Normally I create the roles I want rather than use system default roles.

    [global-sacl, apprf-logon-sacl] - These have been in the code for many years. I have included a link from 8.x.

    https://arubanetworking.hpe.com/techdocs/ArubaOS/AOS_8x_WebHelp/Content/arubaos-solutions/roles-policies/conf-poli-aprf.htm

    [uplink-lb-sys-racl] - A default out-of-the-box policy-based route. I generally create what I need rather than use something out of the box, in case it is influenced by later updates. The IP address in this rule comes from the (show roleinfo) output. There is also no default config applied in the next-hop-list [load-balance-ipsecs]. In order to use the PBR you would need to populate *load-balance-ipsecs* and then apply the PBR to an SVI.

    I prefer route based, although at this time load balancing is not something that can be used with route-based metrics. Legacy Central uses the topology node list, and the metric increments by 10 for each item in the list.

    •  show ip oap route (can see metrics)

    Anything in a PBR that says forward will be sent to the routing table. You can check PBR hits with (show acl hits). I would suggest validating which PBRs or ACLs are in use with the following commands.

    • show route-access-list (summary of PBRs in use; also shows whether they are applied to user roles)
    • show ip access-group (shows which ACLs or PBRs are applied; an ACL can only be applied to a trusted VLAN/port)

    Next-hop-Lists can provide additional enhancements over traditional PBR rules. 
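To make the forward versus route distinction concrete, here is a small Python sketch that walks the three uplink-lb-sys-racl entries quoted earlier in the thread. It models a forward entry as "hand the packet to the normal routing table" and a route entry as "send it to the named next-hop list". The function name, the return values, and the behaviour when nothing matches (falling back to the routing table) are assumptions made for illustration, not documented gateway behaviour.

```python
from ipaddress import ip_address, ip_network
from typing import Optional, Tuple

# (destination, action, next-hop list) in configured order, copied from the
# uplink-lb-sys-racl lines posted above.
UPLINK_LB_SYS_RACL = [
    ("44.228.90.242/32", "route", "load-balance-ipsecs"),
    ("192.168.26.0/255.255.255.0", "forward", None),
    ("10.100.3.0/255.255.255.240", "forward", None),
]


def evaluate_pbr(dst: str, rules=UPLINK_LB_SYS_RACL) -> Tuple[str, Optional[int]]:
    """Return (decision, matched rule position) for a destination address.

    The first matching rule wins. 'forward' defers to the routing table,
    'route' steers the packet to the rule's next-hop list.
    """
    addr = ip_address(dst)
    for position, (network, action, nexthop_list) in enumerate(rules):
        if addr in ip_network(network):
            if action == "forward":
                return ("routing-table", position)
            return (f"next-hop-list:{nexthop_list}", position)
    # Assumption: unmatched traffic is routed normally.
    return ("routing-table", None)


for destination in ("44.228.90.242", "192.168.26.20", "10.100.3.14", "192.168.25.251"):
    print(destination, evaluate_pbr(destination))
```

Note that 192.168.25.251 matches none of the three entries, so in this model the policy does not influence traffic towards that host at all.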

    -------------------------------------------



  • 9.  RE: VRRP on SD-Branch

    Posted Jan 27, 2026 07:55 PM

    Forgot to mention: if DNS is still not working, grab the session table a few times while testing.

    • show datapath session table <dns-server-ip>
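If it helps to generate repeatable test traffic while watching the session table, a short Python sketch along these lines could stand in for running nslookup by hand. It builds a plain DNS A-record query and sends it over UDP a few times, recording the response time or None for each attempt. The function names, the query ID, and the defaults (5 attempts, 2 second timeout, matching the nslookup output earlier in the thread) are assumptions, and labels longer than 63 bytes are not handled.

```python
import socket
import struct
import time
from typing import List, Optional


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (recursion desired)."""
    # Header: ID, flags (RD set), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    )
    # QTYPE 1 = A, QCLASS 1 = IN.
    return header + qname + b"\x00" + struct.pack("!HH", 1, 1)


def probe_dns(server: str, hostname: str, attempts: int = 5,
              timeout: float = 2.0) -> List[Optional[float]]:
    """Send the query several times; return seconds per reply, None on failure."""
    query = build_dns_query(hostname)
    results: List[Optional[float]] = []
    for _ in range(attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            start = time.monotonic()
            try:
                sock.sendto(query, (server, 53))
                sock.recvfrom(512)
                results.append(time.monotonic() - start)
            except OSError:
                # Covers timeouts as well as other socket errors.
                results.append(None)
    return results
```

Calling `probe_dns("192.168.25.251", "cppm.vanoverbeek.net")` from the affected client would send five queries to the DNS server from the thread, giving a few seconds in which to capture the session table on the gateway.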

    -------------------------------------------



  • 10.  RE: VRRP on SD-Branch

    Posted Jan 27, 2026 08:37 PM
    Edited by mvanoverbeek Jan 28, 2026 08:24 AM

    Hi Justin,

    These were helpful commands. Through show route-access-list I learned that only a PBR I configured was in use, not the uplink-lb-cfg-racl access-list, which I can't recall even configuring.

    Looks like the other ACLs aren't really in use either.

    I still find it strange however that two identical boxes with the same software have configuration differences. The one thing that comes to mind is: I think I configured one while in the default group (this was relating to that other post and the other VLAN for WAN) while I configured the other while being in the branch-gateway group. That must be it!

    I removed the aaa-profile from vlan 3 and changed port-channel and VLAN to trusted, after this change ping to DNS worked again.

    Two things I learned:

    Setting up profiles, roles and policies is tedious and complex, and most likely adds little beyond that complexity, especially when I think I can still provide security in the roles themselves.

    I probably need to revisit the access-lists and the trust settings of VLANs and port-channels, and fully understand how this works.

    I had an aaa-routing profile on the VLAN but noticed it did not capture any deny logs for DNS, so it must be something else that caused the silent drops. The weird thing that remains is that only one IP in the subnet was affected, and after removing the aaa-profile and adding trust this IP was reachable again. I just can't explain that.

    Setup was like this:

                                                                          ==port channel == vlan 3,25,26 ==port channel== Gateway01==svi 26+svi3

    DNS server ==vlan25 ===switch with SVI 25

                                                                          ==port channel == vlan 3,25,26==port channel= Gateway02==svi 26+svi3



    ------------------------------
    Martijn van Overbeek
    Architect, Netcraftsmen a BlueAlly Company
    ------------------------------



  • 11.  RE: VRRP on SD-Branch

    Posted Jan 28, 2026 04:44 PM

    You're welcome!

    It's a hard OS to learn from the CLI, although the pros are definitely there. 10.x has all the same features and concepts as previous releases, with some added. Each architecture has its focus points, although configs still work pretty much the same. I wouldn't get bent out of shape over default configs that are not used by anything. You can try to move something back to the default group and then move it back over to see if that corrects it.

    Ideally, the goal is the least number of changes per deployment: apply 95% in group configs. I know you can use a CSV file for bulk config updates on branch gateways. I am not sure what options will exist when New Central supports orchestration. I have seen some general sync issues with new/legacy Central over the years. The general direction seems to be the same if it happens. I never really had this experience with Mobility Conductor on 8.x. Of course it had its own flaws at times, although it has been a real workhorse for us over the last 4 years with BOC-VPN.

    I am not 100% sure which VLAN your clients are in. If they are on the same subnet as the DHCP server and you are not using UBT, the ARP and forwarding tables on the switch would be used at L2 rather than traffic being directed back to the gateways. If you are using UBT or need to route traffic, then you would need to look at the datapath session table for flags. The datapath session "dpi" table will tell you the ACE index, that is, which ACL is actually used when the traffic hits. Even with DPI disabled on the firewall it will still show you the ACE index. There have been times when return traffic was blocked in the role of a device but not in the role of the source. When that happens you can typically find which ACE index is at fault and get it fixed pretty quickly. This used to be a tough spot when I was at customers years ago, as I could never identify what was blocking the traffic. Once understood, it's pretty easy.

    Also keep in mind, if you do have a PBR used back to the datacenter, make sure your local subnets are configured as "forward" rather than route. You don't want to route your local site traffic back to the datacenter. This is another reason I don't prefer PBR: it's much easier to troubleshoot and fix route-based designs.

    The routing profile (PBR) is used in parallel with a role (untrusted) or ACL (trusted). If you learn the session table well enough, you can pick up traffic that is not returned or is routed by PBR. I don't really go by what is in the config; instead I go by what the command outputs tell me. The only real caveat in the session table is when traffic is source-NATed: you won't see the response traffic *if you are filtering on the source/client IP*. If you look at the destination IP you will see the response. There are other ways around this, but you can identify the source/destination NAT that takes place if you use firewall session tables enough.
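The source-NAT caveat can be shown with a toy model in Python. It assumes a session table that stores one entry per direction and that the reply to a source-NATed flow is addressed to the NAT address rather than to the client, so the client IP appears in only one of the two entries. The `SessionEntry` type, the client address 192.168.26.10, and the NAT address 10.100.3.1 are invented for the example; only the server address comes from the thread.

```python
from typing import List, NamedTuple


class SessionEntry(NamedTuple):
    src: str
    dst: str
    packets: int


CLIENT = "192.168.26.10"   # hypothetical client
NAT_IP = "10.100.3.1"      # hypothetical source-NAT address on the gateway
SERVER = "192.168.25.251"  # DNS server from the thread

# One entry per direction; the reply is addressed to the NAT IP, not the client.
SESSION_TABLE = [
    SessionEntry(src=CLIENT, dst=SERVER, packets=4),
    SessionEntry(src=SERVER, dst=NAT_IP, packets=4),
]


def sessions_involving(table: List[SessionEntry], ip: str) -> List[SessionEntry]:
    """Return every entry where the address appears as source or destination."""
    return [entry for entry in table if ip in (entry.src, entry.dst)]


print(len(sessions_involving(SESSION_TABLE, CLIENT)))  # outbound entry only
print(len(sessions_involving(SESSION_TABLE, SERVER)))  # both directions
```

Filtering on the server address is what surfaces the return entry in this model, which matches the advice above to look at the destination IP when source NAT is involved.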

    It's hard to say what the fault was from a couple of chats without a real-time session. Some areas I would focus on:

    • does traffic arrive on gateway (session table)
    • is traffic routed by accident to datacenter (we need local subnets to be forwarded in pbr) (show acl hits)
    • are we traversing subnets, or do we need to look at local arp table, or are we bridging (show arp, show datapath bridge, show route-access-list, show ip access-group)
    • do clients/server show in user table (is port untrusted or UBT used?) (default design is to not permit client traffic if port is untrusted and client is not in user table)
    • are we using nat by accident?
    • does session table show response traffic for packets/bytes? 

    I would generally focus on the session table rather than spend time on pcaps. It could be something simple that was not in place, which blocked the response traffic or pushed the traffic to a destination you didn't want. You can rule things out by adding an allowall ACL to the roles that you use.

    -------------------------------------------



  • 12.  RE: VRRP on SD-Branch

    Posted Jan 29, 2026 08:03 AM

    Thanks Justin,

    Those are some good recommendations. I am going to test it out again over the weekend and see where I get with it.



    ------------------------------
    Martijn van Overbeek
    Architect, Netcraftsmen a BlueAlly Company
    ------------------------------



  • 13.  RE: VRRP on SD-Branch

    Posted Feb 03, 2026 11:34 AM

    I wanted to correct a previous statement of mine about the AAA profile. I came across a site of ours with 2930F + per-port tunneled node. It ended up using the same AAA profile as the one applied to the default wired authentication profile (show aaa authentication wired). This was on AOS 8.10, although it could be slightly different with per-user tunneling (as CX does not support per-port) or on AOS 10.x. My recommendation would be to check the AAA profiles used by clients in the user table rather than assume. If it's not something you are trying to achieve, then it's not applicable.

    The default is on the physical port. If the VLAN or port is untrusted, then it will use a role for every IP address (ARP inspect or DHCP inspect) that is plugged into the port. It's best to understand which profiles are used; that way incorrect configs that may block intended traffic will not be used.

    -------------------------------------------