The NMState Operator and Policy-Based Routes

I recently needed to have containers egress through a different public IP because of some issues archiving YouTube videos (they don’t seem to want you to do that for some reason).

I’m always looking for an excuse to put on my network administrator hat, so I grabbed some coffee and got to work. What resulted was perhaps the craziest Kubernetes and networking rabbit hole I have been down in a while. I won’t spoil the actual solution (that will be for a future post), but I did learn a neat trick with the NMState operator after some searching and some reverse engineering.

The NMState operator configures networking using NetworkManager on Kubernetes clusters. It is declarative, and allows the configuration of interfaces, sub-interfaces and bridges. This cuts down on management of the actual Linux endpoints, and is all but required for immutable distros like CoreOS (which was my first exposure to the operator).

For example, if I want to configure a VLAN sub-interface and create a bridge interface that is suitable for virtual machines, I can apply the following:

apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: example-config
spec:
  desiredState:
    interfaces:
    - name: enp72s0f3.10
      description: VLAN 10 on enp72s0f3
      type: vlan
      state: up
      vlan:
        id: 10
        base-iface: enp72s0f3


    - name: br10
      description: Linux bridge with enp72s0f3.10 as a port
      type: linux-bridge
      state: up
      ipv4:
        enabled: true
        dhcp: false
        address:
        - ip: 10.0.10.10
          prefix-length: 24
      bridge:
        options:
          stp:
            enabled: false
        port:
        - name: enp72s0f3.10

The above creates a VLAN sub-interface on enp72s0f3 and a new bridge interface called br10. The result is exactly what we would expect from creating the VLAN and bridge interfaces by hand, without all of the pesky nmtui, Ansible, or tool-of-choice work.

server ~ > ip addr show enp72s0f3.10
444: enp72s0f3.10@enp72s0f3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br10 state UP group default qlen 1000
    link/ether f4:ce:46:a5:c0:d3 brd ff:ff:ff:ff:ff:ff

Policy-Based Routing

Now, for my initial problem, I needed to send packets down a different route, not based on their destination, but based on their source (or other criteria). This, in essence, is what policy-based routing is for. Linux allows us to create separate routing tables and use rules to decide which traffic consults them.

Keep in mind that these are separate routing tables, so none of the routes one takes for granted on a Linux system are populated automatically. Just take a look at what I need to do on my firewall to create and populate a routing table and point traffic at it:

    ip route add default via ${WAN1_GW} dev ${WAN1_IF} table lab
    ip route add ${CLIENT_NET} dev ${CLIENT_IF} table lab
    ip route add ${STORAGE_NET} dev ${STORAGE_IF} table lab
    ip route add ${K8SPROD_NET} dev ${K8SPROD_IF} table lab
    ip route add ${VMPROD_NET} dev ${VMPROD_IF} table lab
    ip route add ${VMLAB_NET} dev ${VMLAB_IF} table lab
    ip route add ${CLIENT2_NET} dev ${CLIENT2_IF} table lab
    ip route add ${IOT_NET} dev ${IOT_IF} table lab
    ip route add ${GUEST_NET} dev ${GUEST_IF} table lab
    ip route add ${ADM_NET} dev ${ADM_IF} table lab
    ip route add ${VPN_NET} dev ${VPN_IF} table lab
    ip route add ${VMLAB2_NET} dev ${VMLAB2_IF} table lab
    ip route add ${SECURE_CLIENT_NET} dev ${SECURE_CLIENT_IF} table lab


    ip rule add from ${VMLAB_NET} dev ${VMLAB_IF} lookup lab priority 100
    ip rule add from ${VMLAB2_NET} dev ${VMLAB2_IF} lookup lab priority 100

A routing table is a series of routes; rules then decide which traffic uses that table. In the case above, any traffic from ${VMLAB_NET} and ${VMLAB2_NET} will use the lab routing table. This allows the lab traffic to exit through a different default gateway than the rest of my traffic. With the proper NAT rules, my lab traffic will now leave using a different interface and gateway on the firewall.
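One detail the commands above rely on: a table can only be referenced by name (here, lab) if that name is mapped to a numeric ID, traditionally in /etc/iproute2/rt_tables. A minimal sketch of registering such a name follows. The add_rt_table helper and the ID 100 are assumptions for illustration and are not taken from the firewall script above.

```shell
# Hypothetical helper: register a table name in an rt_tables-style file
# unless an entry with that name is already present.
add_rt_table() {
    id="$1"
    name="$2"
    file="${3:-/etc/iproute2/rt_tables}"
    # Entries look like "<id> <name>"; lines starting with # are comments.
    if ! awk -v n="$name" '$1 !~ /^#/ && $2 == n { found = 1 } END { exit !found }' "$file" 2>/dev/null; then
        printf '%s\t%s\n' "$id" "$name" >> "$file"
    fi
}

# Example (as root, ID 100 is an assumed value): add_rt_table 100 lab
```

Running it twice leaves a single entry, so it should be safe to call from a script that runs on every boot.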

Note that in the above example from my firewall, I needed to add every route to the new table, including the directly connected ones.
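Since a new table starts out empty, one way to avoid typing each connected route by hand is to copy them from the main table. The following is a rough sketch of that idea, not what the firewall script above does. mirror_connected_routes is a hypothetical helper that reads lines in the format printed by ip route show and only prints the matching ip route add commands, so they can be reviewed before anything is changed.

```shell
# Hypothetical helper: read `ip route show` style lines on stdin and print
# (not run) the matching `ip route add` commands for another table.
mirror_connected_routes() {
    table="$1"
    while read -r net dev_kw iface _rest; do
        # Only handle lines of the form "<prefix> dev <iface> ...";
        # this skips "default via ..." and other gateway routes.
        [ "$dev_kw" = "dev" ] || continue
        echo "ip route add $net dev $iface table $table"
    done
}

# Example (dry run):
#   ip -4 route show table main scope link | mirror_connected_routes lab
```

The default route still has to be added separately, since that is the one entry that is supposed to differ between the tables.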

NMState Operator and Policy-based routes

We can achieve the same result using the NMState operator. Admittedly, this has less value on a Kubernetes node, but when combined with Cilium’s egress policies, we can really do some cool stuff.

Here is the example from my NodeNetworkConfigurationPolicy:

apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: example-config
spec:
  desiredState:
...
    routes:
      config:
      - destination: 0.0.0.0/0
        next-hop-address: 10.0.10.1
        next-hop-interface: br10
        table-id: 101
      - destination: 10.0.10.0/24
        next-hop-interface: br10
        table-id: 101
    route-rules:
      config:
      - ip-from: 10.0.10.10/32
        route-table: 101

The above creates a new route table with an ID of 101 and adds two routes to it: a default route and the directly connected network. It also creates a rule matching any traffic that originates from the 10.0.10.10 IP address, which is assigned to br10.

The net result is the following:

server ~ > ip route show table 101
default via 10.0.10.1 dev br10 proto static 
10.0.10.0/24 dev br10 proto static scope link 

server ~ > ip rule show
...
30000:     from 10.0.10.10 iif br10 lookup 101 proto static
...

server ~ > curl ifconfig.me
131.191.104.138
server ~ > curl --interface br10 ifconfig.me
131.191.55.228

The path my packets take now depends on the source address, and therefore the interface, they are sent from.
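For comparison with the firewall example earlier, the routes and route-rules sections of the policy correspond roughly to three iproute2 commands. The sketch below only prints them. print_pbr_commands is a hypothetical helper, and the result is not identical to what NMState produces: NMState chose its own rule priority (30000 in the ip rule output above), which this version leaves to the kernel's default.

```shell
# Print (do not run) iproute2 commands roughly equivalent to the policy.
# print_pbr_commands is a hypothetical helper name.
print_pbr_commands() {
    src_ip="$1"   # source address to match, e.g. 10.0.10.10
    subnet="$2"   # directly connected network, e.g. 10.0.10.0/24
    gateway="$3"  # next hop for the default route, e.g. 10.0.10.1
    iface="$4"    # outgoing interface, e.g. br10
    table="$5"    # numeric routing table ID, e.g. 101

    echo "ip route add default via $gateway dev $iface table $table"
    echo "ip route add $subnet dev $iface table $table"
    echo "ip rule add from $src_ip/32 lookup $table"
}

print_pbr_commands 10.0.10.10 10.0.10.0/24 10.0.10.1 br10 101
```

The difference is mostly one of lifecycle: commands run by hand are lost on reboot, while the operator keeps the node in the declared state.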

The valid fields for a route rule in the API are:

  • family
  • state
  • ip-from
  • ip-to
  • priority
  • route-table
  • fwmark
  • fwmask
  • action
  • iif
  • suppress-prefix-length
  • suppress_prefixlength
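As an illustration of how a few of these fields might be combined, here is a small sketch that prints a route-rules fragment with an explicit priority and an incoming interface. The field names come from the list above; the values, the print_route_rule helper name, and the assumption that priority takes an integer and iif an interface name are mine, and I have not verified this fragment against a cluster.

```shell
# Hypothetical helper that prints a route-rules fragment using a few more fields.
# The fragment is meant to sit under spec.desiredState in the policy.
print_route_rule() {
    cat <<EOF
route-rules:
  config:
  - ip-from: $1
    iif: $2
    priority: $3
    route-table: $4
EOF
}

# Illustrative values only.
print_route_rule 10.0.10.10/32 br10 100 101
```

Setting the priority explicitly would, under these assumptions, replace the value the operator otherwise picks on its own.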

I have been fighting with the egress policy, but I figured a quick post to document how to create route rules would not hurt, as I didn’t find much in the way of documentation.

I’m still investigating, so drop me a line if you have an addition or correction.
