In my experiments with "secure" computers, one of the key ingredients is reducing the attack surface as much as possible, and unfortunately, one big attack surface of any Linux-based deployment is the kernel itself. Thus, one of the first things I've done is to configure and compile my own extremely stripped-down version of the Linux kernel (based on the latest LTS branch).
Among the kernel features that were removed was also a very important security related subsystem, namely the Linux firewall, netfilter (or iptables, the old version, and nf_tables, the new version).
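To double-check that a kernel build really lacks the firewall subsystem, one can inspect its configuration. Below is a minimal sketch, assuming the config is available as text (for example from /proc/config.gz, which exists only when the kernel was built with CONFIG_IKCONFIG_PROC); the function name is my own, and the list of options is a small sample, not an exhaustive one.

```shell
#!/bin/sh
# Print the firewall-related options that are enabled in a kernel config
# read from standard input. Lines look like `CONFIG_X=y`, `CONFIG_X=m`,
# or `# CONFIG_X is not set`.
netfilter_options_enabled() {
    # Keep only options built in (=y) or built as modules (=m).
    # `|| true` avoids a failing exit status when nothing matches.
    grep -E '^CONFIG_(NETFILTER|NF_TABLES|IP_NF_IPTABLES|IP6_NF_IPTABLES)=[ym]$' || true
}

# Usage (assumes /proc/config.gz is present on the running kernel):
#   zcat /proc/config.gz | netfilter_options_enabled
# An empty output means none of the listed options is enabled.
```

On a kernel stripped down as described above, the output should be empty.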
Why remove the firewall subsystem when it's essential for security? Because there have been quite a few vulnerabilities that have impacted the Linux firewall subsystem, many of them granting root access, either to a local unprivileged user, or worse, RCE (Remote Code Execution) to an external attacker.
Here are some links to the most recent vulnerabilities impacting the Linux firewall subsystem:
- CVE-2024-1086: Flipping Pages: An analysis of a new Linux vulnerability in nf_tables and hardened exploitation techniques
- CVE-2023-32233: Linux kernel use-after-free in Netfilter nf_tables when processing batch requests can be abused to perform arbitrary reads and writes in kernel memory
- CVE-2022-32250: Linux kernel a use-after-free write in the netfilter subsystem
- CVE-2022-25636: The Discovery and Exploitation of CVE-2022-25636
- CVE-2021-22555: CVE-2021-22555: Turning \x00\x00 into 10000$
- (and the list goes on...)
However, I still do want the ability to firewall inbound traffic, and especially outbound traffic, in and out of my "secure" computer.
But, without the Linux firewall subsystem, how could we achieve this?
- one solution is to declare all ports as privileged:
      sysctl -w 'net.ipv4.ip_local_port_range = 65535 65535'
      sysctl -w 'net.ipv4.ip_unprivileged_port_start = 65535'
  thus, connecting sockets would require either root or the CAP_NET_BIND_SERVICE capability, plus it would also require the application to explicitly bind the socket to an explicit non-zero port;
- another solution is to use seccomp and either disable the socket related syscalls, or write a policy that allows a subset of calls (for example, for specific users, or specific ports and destinations, etc.);
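To make the first option survive a reboot, the two settings can be placed in a sysctl.d drop-in. The sketch below only prints such a fragment; the function name and the file name in the usage note are hypothetical. As far as I understand, the kernel refuses values that would make the ephemeral range overlap the privileged range, which is why the port range is written first; treat that ordering as an assumption to verify on your own kernel.

```shell
#!/bin/sh
# Emit a sysctl.d-style fragment with the two settings from the text.
# The ephemeral range comes first (assumption: the kernel rejects an
# ip_unprivileged_port_start above the start of the ephemeral range).
render_privileged_ports_conf() {
    cat <<'EOF'
net.ipv4.ip_local_port_range = 65535 65535
net.ipv4.ip_unprivileged_port_start = 65535
EOF
}

# Usage (as root; the file name is a hypothetical choice):
#   render_privileged_ports_conf > /etc/sysctl.d/90-privileged-ports.conf
#   sysctl --system
```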
However, none of these approaches gives us the ability to restrict incoming traffic itself. They only allow us to restrict what local processes can listen on (thus inbound traffic) or connect to (thus outbound traffic).
Which brings me to my alternative solution: Linux policy based routing.
Unfortunately, policy based routing is meant for, as the name says, "routing", and not for "firewalling", thus, it's worth stressing the fact that I'm abusing this tool.
Foremost, we must understand that routing applies only to outbound traffic, not inbound, thus it applies only to packets that are generated locally, and must be sent to somewhere else via a network device. (It also applies to forwarded traffic, when our node behaves as a router.)
It is also worth noting that policy based routing is stateless, as opposed to stateful, meaning that it doesn't see connections (e.g. TCP), or related traffic (like Linux's conntrack does for UDP), but instead just individual packets. Thus, any "firewalling" that we can do by using Linux policy based routing can't take flows into account.
And, as a final note, due to a feature of the Linux kernel called return path filtering (rp_filter, officially "reverse path filtering"), if enabled, we actually also get inbound traffic filtering. In a few words, with sysctl -w 'net.ipv4.conf.all.rp_filter = 1', the kernel is instructed that, before accepting and delivering a packet locally, it should check whether there is a route for a packet that has the source and destination addresses switched (thus the "return path" name), and if not, consider it "martian", causing it to be dropped (and logged, when log_martians is enabled), because it is not coming from an expected route.
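One detail worth checking is the per-interface setting: according to the kernel's ip-sysctl documentation, the value used for source validation on an interface is the maximum of conf/all/rp_filter and conf/<interface>/rp_filter (0 is off, 1 is strict, 2 is loose). A minimal sketch of that rule follows; the function name is mine, and eth0 in the usage note is just an example interface.

```shell
#!/bin/sh
# effective_rp_filter ALL_VALUE IFACE_VALUE
# Print the rp_filter mode in effect for an interface, i.e. the larger
# of the "all" value and the per-interface value.
effective_rp_filter() {
    if [ "$1" -ge "$2" ]; then
        echo "$1"
    else
        echo "$2"
    fi
}

# Usage (eth0 is an example interface name):
#   effective_rp_filter \
#       "$(cat /proc/sys/net/ipv4/conf/all/rp_filter)" \
#       "$(cat /proc/sys/net/ipv4/conf/eth0/rp_filter)"
```

Note that an interface set to 2 (loose) stays loose even when "all" is set to 1, and loose mode only requires the source to be reachable via some interface, so for the trick described here the result should be exactly 1.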
For more details on Linux policy based routing, see the ip-rule man page or this article.
Finally, here are the snippets I've used in my initial experiment:
(remember, these rules apply only to outbound packets, but due to the return path filtering feature, they also apply to inbound traffic in reverse, with the sport and dport values switched;)
- anything from a user-id of 2000 or above is not allowed to touch the network:
      ip rule add priority 1537 type prohibit iif lo uidrange 2000-4294967294
- we allow WireGuard UDP traffic from source port 51820 to destination port 51820:
      ip rule add priority 1538 type unicast table main iif lo ipproto 17 sport 51820 dport 51820
- we allow SSH server TCP traffic from the source port 22 to any non-privileged port:
      ip rule add priority 1538 type unicast table main iif lo uidrange 0-0 ipproto 6 sport 22 dport 1024-65534
  (if we also want to allow SSH client traffic, use a rule similar to the ones below;)
- we allow the root user and other "normal" users (user-id in the range of 1000 to 1999) HTTP/HTTPS access:
      ip rule add priority 1538 type unicast table main iif lo uidrange 0-0 ipproto 6 dport 80
      ip rule add priority 1538 type unicast table main iif lo uidrange 0-0 ipproto 6 dport 443
      ip rule add priority 1538 type unicast table main iif lo uidrange 1000-1999 ipproto 6 dport 80
      ip rule add priority 1538 type unicast table main iif lo uidrange 1000-1999 ipproto 6 dport 443
- we allow DNS client TCP and UDP traffic (to destination port 53) for any user-id in the range of 0 to 1999:
      ip rule add priority 1538 type unicast table main iif lo uidrange 0-1999 ipproto 6 dport 53
      ip rule add priority 1538 type unicast table main iif lo uidrange 0-1999 ipproto 17 dport 53
- we allow NTP client UDP traffic (to destination port 123), and also NTP-secure client TCP traffic (to destination port 4460), but only for the user-id 122 (under which chrony is being run); for some reason, we also need to whitelist the root account, otherwise chrony fails to bind to the UDP socket:
      ip rule add priority 1538 type unicast table main iif lo uidrange 0-0 ipproto 17 dport 123
      ip rule add priority 1538 type unicast table main iif lo uidrange 122-122 ipproto 17 dport 123
      ip rule add priority 1538 type unicast table main iif lo uidrange 122-122 ipproto 6 dport 4460
- we allow DHCP client UDP traffic (from the client port 68, to the server port 67):
      ip rule add priority 1538 type unicast table main iif lo uidrange 0-0 ipproto 17 dport 67 sport 68
- we allow only the root user to ping other hosts; this also allows other hosts to ping us:
      ip rule add priority 1538 type unicast table main iif lo uidrange 0-0 ipproto 1
- at last, some catch-all fallback rules to drop any ICMP, TCP and UDP traffic not whitelisted above, or that can't find a proper route in the main routing table:
      ip rule add priority 1539 type prohibit iif lo ipproto 1
      ip rule add priority 1539 type prohibit iif lo ipproto 6
      ip rule add priority 1539 type prohibit iif lo ipproto 17
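Since most of the allow rules above differ only in the user-id range, protocol and destination port, they can be generated by a small helper. This is just a sketch: the allow_rules name is mine, and it only prints the commands (using the same priority 1538 and table main as above), so they can be reviewed before being applied.

```shell
#!/bin/sh
# allow_rules UIDRANGE IPPROTO DPORT...
# Print one "allow" rule per destination port, in the same shape as the
# hand-written rules above. Nothing is executed here.
allow_rules() {
    uidrange=$1
    ipproto=$2
    shift 2
    for dport in "$@"; do
        echo "ip rule add priority 1538 type unicast table main iif lo uidrange $uidrange ipproto $ipproto dport $dport"
    done
}

# Usage: print the HTTP/HTTPS rules for the "normal" users:
#   allow_rules 1000-1999 6 80 443
# To apply them (as root, after reviewing the output):
#   allow_rules 1000-1999 6 80 443 | sh -e
# To inspect what is currently installed:
#   ip rule show
```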
A few observations:
- the way ip rules are matched is a bit interesting:
  - each rule is tried in turn, ordered by the priority;
  - it stops when either a type prohibit or type blackhole rule is found;
  - it also stops when a type unicast with table xyz rule is found, and the destination is actually found in that table xyz, else it continues;
- the iif lo argument is essential, as it selects only locally generated traffic; (without it, I think all traffic breaks;)
- the table main is the "normal" Linux routing table; however, it can be replaced with some other table, thus perhaps allowing further filtering, by treating routing tables as an ipset replacement; for example, having a routing table only for HTTP / HTTPS connections, that explicitly lists all allowed destination IPs, and adding a "default" blackhole route, ip route add blackhole 0.0.0.0/0;
- for some reason, for outbound connections (like for example HTTP/HTTPS), if one uses sport 1024-65534 to explicitly state that the local address of these connections should use ephemeral ports, it breaks all outgoing traffic; (the solution would be for the application to explicitly bind the local address to an explicit non-zero port;)
- unfortunately, the busybox ip applet doesn't understand the ipproto option, thus one needs to use the ip tool from the iproute2 project;
- unfortunately, for some reason I haven't explored much, having a catch-all prohibit rule without the ipproto breaks all traffic;
- when ping-ing other hosts, one needs to explicitly state the outbound interface, e.g. ping -I eth0 1.1.1.1, else (for some reason) it won't work; (perhaps ping tries to detect the interface address via some other technique, and that fails?)
- there is no way to do any quantitative filtering, like for example rate limiting, bandwidth throttling, etc.; for that the Linux firewall subsystem is required;
Have I sparked some new ideas?