I recently needed containers to egress through a different public IP because of some issues archiving YouTube videos (they don’t seem to want you to do that, for some reason).
I’m always looking for an excuse to put on my network administrator hat, so I grabbed some coffee and got to work. What resulted was perhaps the craziest Kubernetes and networking rabbit hole I have been down in a while. I won’t spoil the actual solution (that will be for a future post), but I did learn a neat trick with the NMState operator after some searching and some reverse engineering.
The NMState operator configures networking using NetworkManager on Kubernetes clusters. It is declarative, and allows the configuration of interfaces, sub-interfaces and bridges. This cuts down on management of the actual Linux endpoints, and is all but required for immutable distros like CoreOS (which was my first exposure to the operator).
For example, if I want to configure a VLAN sub-interface and create a bridge interface that is suitable for virtual machines, I can apply the following:
```yaml
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: example-config
spec:
  desiredState:
    interfaces:
      - name: enp72s0f3.10
        description: VLAN 10 on enp72s0f3
        type: vlan
        state: up
        vlan:
          id: 10
          base-iface: enp72s0f3
      - name: br10
        description: Linux bridge with enp72s0f3.10 as a port
        type: linux-bridge
        state: up
        ipv4:
          enabled: true
          dhcp: false
          address:
            - ip: 10.0.10.10
              prefix-length: 24
        bridge:
          options:
            stp:
              enabled: false
          port:
            - name: enp72s0f3.10
```
The above creates a VLAN sub-interface on enp72s0f3 and a new bridge interface called br10. The result is exactly what we would expect had we created the VLAN and bridge interfaces by hand, without all of the pesky nmtui, Ansible, or tool-of-choice work.
```
server ~ > ip addr show enp72s0f3.10
444: enp72s0f3.10@enp72s0f3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br10 state UP group default qlen 1000
    link/ether f4:ce:46:a5:c0:d3 brd ff:ff:ff:ff:ff:ff
```
Policy-Based Routing
Now, for my initial problem, I needed to send packets down a different route based not on their destination, but on their source (or other criteria). This, in essence, is what policy-based routing is for. Linux allows us to create separate routing tables and use rules to decide which traffic looks them up.
Keep in mind that these are separate routing tables, so none of the entries one takes for granted being populated on a Linux system exist here. Just take a look at what I need to do on my firewall to create and populate a routing table:
```bash
ip route add default via ${WAN1_GW} dev ${WAN1_IF} table lab
ip route add ${CLIENT_NET} dev ${CLIENT_IF} table lab
ip route add ${STORAGE_NET} dev ${STORAGE_IF} table lab
ip route add ${K8SPROD_NET} dev ${K8SPROD_IF} table lab
ip route add ${VMPROD_NET} dev ${VMPROD_IF} table lab
ip route add ${VMLAB_NET} dev ${VMLAB_IF} table lab
ip route add ${CLIENT2_NET} dev ${CLIENT2_IF} table lab
ip route add ${IOT_NET} dev ${IOT_IF} table lab
ip route add ${GUEST_NET} dev ${GUEST_IF} table lab
ip route add ${ADM_NET} dev ${ADM_IF} table lab
ip route add ${VPN_NET} dev ${VPN_IF} table lab
ip route add ${VMLAB2_NET} dev ${VMLAB2_IF} table lab
ip route add ${SECURE_CLIENT_NET} dev ${SECURE_CLIENT_IF} table lab

ip rule add from ${VMLAB_NET} dev ${VMLAB_IF} lookup lab priority 100
ip rule add from ${VMLAB2_NET} dev ${VMLAB2_IF} lookup lab priority 100
```
A routing table is a series of routes, paired with a set of rules that determine which traffic will use that table. In the case of the above, any traffic from ${VMLAB_NET} or ${VMLAB2_NET} will use the lab routing table. This allows the lab traffic to exit through a different default gateway than the rest of my traffic. With the proper NAT rules, my lab traffic will now leave using a different interface and gateway on the firewall.
Note that in the above example from my firewall, I needed to add all of the routes, even for networks that are directly connected.
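For reference, the “proper NAT rules” amount to source-NATing the lab networks out of the alternate WAN interface. Here is a minimal sketch with iptables, reusing the placeholder variables from above; treat it as illustrative only and adapt it to whatever tooling your firewall uses:

```bash
# Illustrative only: masquerade lab traffic leaving via the alternate WAN
# interface so it egresses with that interface's public IP.
iptables -t nat -A POSTROUTING -s ${VMLAB_NET} -o ${WAN1_IF} -j MASQUERADE
iptables -t nat -A POSTROUTING -s ${VMLAB2_NET} -o ${WAN1_IF} -j MASQUERADE
```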
NMState Operator and Policy-Based Routes
We can achieve the same result using the NMState operator. Admittedly, this has less value on a Kubernetes node by itself, but when combined with Cilium’s egress policies, we can really do some cool stuff.
Here is the example from my NodeNetworkConfigurationPolicy:
```yaml
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: example-config
spec:
  desiredState:
    ...
    routes:
      config:
        - destination: 0.0.0.0/0
          next-hop-address: 10.0.10.1
          next-hop-interface: br10
          table-id: 101
        - destination: 10.0.10.0/24
          next-hop-interface: br10
          table-id: 101
    route-rules:
      config:
        - ip-from: 10.0.10.10/32
          route-table: 101
```
The above creates a new route table with an ID of 101 and adds a couple of routes to it. It also creates a rule matching any traffic that originates from the 10.0.10.10 IP address, which is the address assigned to br10 above.
The net result is the following:
```
server ~ > ip route show table 101
default via 10.0.10.1 dev br10 proto static
10.0.10.0/24 dev br10 proto static scope link

server ~ > ip rule show
...
30000: from 10.0.10.10 iif br10 lookup 101 proto static
...

server ~ > curl ifconfig.me
131.191.104.138

server ~ > curl --interface br10 ifconfig.me
131.191.55.228
```
The path my packets take now depends on the source address (and therefore the interface) they were assigned.
The other valid route-rule fields in the API are listed below; a sketch after the list shows a few of them in use.
- family
- state
- ip-from
- ip-to
- priority
- route-table
- fwmark
- fwmask
- action
- iif
- suppress-prefix-length
- suppress_prefixlength
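As a quick illustration, here is a hedged sketch of what more selective rules could look like using a few of those fields. The specific values (the destination network, priority, and fwmark) are invented for the example, and this fragment would slot into desiredState just like the route-rules block above; I have only verified the simple ip-from form shown earlier:

```yaml
    route-rules:
      config:
        # Hypothetical: send traffic arriving on br10 and destined for
        # 192.168.50.0/24 to table 101, evaluated at priority 100.
        - ip-to: 192.168.50.0/24
          iif: br10
          priority: 100
          route-table: 101
        # Hypothetical: route packets carrying firewall mark 0x1
        # (set elsewhere, e.g. by the firewall) via table 101.
        - fwmark: 1
          route-table: 101
```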
I have been fighting with the egress policy, but I figured a quick post to document how to create route rules would not hurt, as I didn’t find much in the way of documentation.
I’m still investigating, so drop me a line if you have an addition or correction.