A weekend of IPv6 bug chasing
To enable me to actually test some ideas for IPv6 in libvirt’s virtual networking APIs, I recently purchased a LinkSys WRT54GL wireless router which was promptly flashed to run OpenWRT. I won’t go into all the details of the setup in this post; suffice it to say that thanks to the folks at SIXXS my home network has a globally routable IPv6 /48 subnet (no stinking NAT required!). That gives me 80 bits of addressing to use on my LAN – enough to run a good sized handful of virtual machines :-) With a little scripting completed on the LinkSys router, full IPv6 connectivity is provided to any machine on the LAN which wants it. Which is more or less where the “fun” begins.
Initially I was just running radvd to advertise a /64 prefix for the LAN & letting devices do autoconfiguration (they basically combine the /64 prefix with their own MAC address to generate a globally unique 128-bit IPv6 address). As part of my exploration for IPv6 support in libvirt though, I wanted to give DHCPv6 a try too.
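To make the autoconfiguration step concrete, here is a minimal sketch of how a host could derive its address from the advertised /64 prefix and its MAC address using the modified EUI-64 scheme. The function name slaac_address, the documentation prefix 2001:db8:1:2::/64 and the MAC address are illustrative assumptions, not values from my network, and real hosts may use privacy addresses instead.

```python
import ipaddress


def slaac_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine a /64 prefix with a MAC address using modified EUI-64.

    Hypothetical helper for illustration only.
    """
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("autoconfiguration expects a /64 prefix")

    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")

    # Flip the universal/local bit in the first octet of the MAC.
    octets[0] ^= 0x02
    # Insert ff:fe in the middle to stretch 48 bits into a 64-bit interface ID.
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])

    # The prefix fills the top 64 bits, the interface ID the bottom 64.
    return network.network_address + int.from_bytes(interface_id, "big")


# Example values only: a documentation prefix and a made-up MAC.
print(slaac_address("2001:db8:1:2::/64", "00:11:22:33:44:55"))
```

The same function with the prefix fe80::/64 yields the link-local address for the interface.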
It was fairly straightforward – much like DHCP on IPv4 you tell the DHCPv6 server what address range it can use (make sure that matches the range that radvd is advertising) and then configure the networking scripts on the client to do DHCP instead of autoconfiguration (basically add DHCPV6C=yes to each interface config). In debugging this though I came across a fun bug in the DHCPv6 client & server in Fedora – it consistently passes in sizeof(sockaddr_in6.sin6_addr) as the salen parameter to getnameinfo() when it should be passing sizeof(sockaddr_in6). So all getnameinfo() calls were failing – fortunately this didn’t appear to have any operational ill-effects on the DHCPv6 client/server – it just means that your logs don’t include details of the addresses being handed out / received. So that was IPv6 bug #1.
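The size mismatch behind this bug is easy to see by mirroring the C structures. The sketch below uses ctypes to lay out struct sockaddr_in6 as it is commonly defined on Linux (an assumption; other platforms arrange the leading fields differently), and salen_is_valid is a hypothetical stand-in for the kind of length check a getnameinfo() implementation might perform. It is not the DHCPv6 code itself.

```python
import ctypes


class In6Addr(ctypes.Structure):
    # struct in6_addr: just the raw 128-bit address.
    _fields_ = [("s6_addr", ctypes.c_uint8 * 16)]


class SockaddrIn6(ctypes.Structure):
    # struct sockaddr_in6, assuming the usual Linux field layout.
    _fields_ = [
        ("sin6_family", ctypes.c_uint16),
        ("sin6_port", ctypes.c_uint16),
        ("sin6_flowinfo", ctypes.c_uint32),
        ("sin6_addr", In6Addr),
        ("sin6_scope_id", ctypes.c_uint32),
    ]


def salen_is_valid(salen: int) -> bool:
    """Hypothetical check: is salen big enough to hold a sockaddr_in6?"""
    return salen >= ctypes.sizeof(SockaddrIn6)


WRONG_SALEN = ctypes.sizeof(In6Addr)      # size of the address member only
RIGHT_SALEN = ctypes.sizeof(SockaddrIn6)  # size of the whole socket address

print("address member:", WRONG_SALEN, "bytes, valid:", salen_is_valid(WRONG_SALEN))
print("whole sockaddr:", RIGHT_SALEN, "bytes, valid:", salen_is_valid(RIGHT_SALEN))
```

Passing the size of the address member tells the callee that the buffer ends before the address field is even complete, so a careful implementation has little choice but to reject the call.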
With globally routable IPv6 addresses now being configured on my laptop, it was time to try browsing some IPv6 enabled websites. If you have a globally routable IPv6 address configured on your interface, then there’s no magic config needed in the web browsers – the getaddrinfo() calls will automatically return an IPv6 address for a site if it is available. BTW, if you’re still using the legacy gethostbyname() calls when writing network code you should really read Uli’s doc on modern address resolution APIs. Suffice to say, if you use getaddrinfo() and getnameinfo() correctly in your apps, IPv6 will pretty much ‘just work’. Well, while the hostname lookups & web browsers were working correctly, all outgoing connections would just hang. After much debugging I discovered that while the SYN packet was going out, the default ip6tables firewall rules were not letting the reply back through, so the connection never got established. In the IPv4 world there is a rule using conntrack to match on ‘state = RELATED,ESTABLISHED’ but there was no equivalent added in the IPv6 firewall rules. That gives us IPv6 bug #2.
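As a rough illustration of the getaddrinfo() pattern, here is a minimal sketch of a protocol-agnostic client connect. The helper names candidate_addresses and connect_any are hypothetical. The point is only that the code never mentions IPv4 or IPv6 explicitly; it tries whatever the resolver returns, in the order returned.

```python
import socket


def candidate_addresses(host, port):
    """Return (family, sockaddr) pairs in the order the resolver prefers."""
    results = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return [(family, sockaddr) for family, _type, _proto, _canon, sockaddr in results]


def connect_any(host, port, timeout=5.0):
    """Try each resolved address in turn, IPv6 or IPv4, until one connects."""
    last_error = None
    for family, sockaddr in candidate_addresses(host, port):
        try:
            sock = socket.socket(family, socket.SOCK_STREAM)
        except OSError as exc:
            # This address family is not usable on this host; try the next.
            last_error = exc
            continue
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            last_error = exc
            sock.close()
    raise last_error or OSError("no addresses returned for %s" % host)
```

Python’s standard library already wraps this loop as socket.create_connection(); it is spelled out here because the same structure applies when calling getaddrinfo() from C.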
With that problem temporarily hacked/worked around by allowing all port 80 traffic through the firewall, web browsing was working nicely. For a while at least. I noticed that inexplicably, every now & then, my network device would lose all its IPv6 addresses – even the link-local one! This was very odd indeed & I couldn’t figure out what on earth would be causing it. I was about to resort to using SystemTap when I suddenly realized the loss of addresses coincided with disconnecting from the office VPN. This gave two obvious targets for investigation – NetworkManager and/or VPNC. After yet more debugging it transpired that when a VPN connection is torn down, NetworkManager flushes all addresses & routes from the underlying physical device, but then only re-adds the IPv4 configuration. The fix for this was trivial – during the initial physical device configuration NetworkManager already has code to automatically add an IPv6 link-local address – that code just needed to be invoked from the VPN teardown script to re-add the link-local address after the device was flushed. Finally we have IPv6 bug #3. Only 3 minor, edge-case bugs is pretty good considering how few people actively use this stuff.
Overall it has been a very worthwhile learning exercise. Trying to get your head around IPv6 is non-trivial if you’re doing it merely based on reading HOWTOs & RFCs. As with many things, actually trying it out & making use of IPv6 for real is a far better way to learn just what it is all about. Second tip is to get yourself a globally routable IPv6 address & subnet right from the start – site-local addresses are deprecated & there’s not nearly as much you can do if you can’t route to the internet as a whole – remember there’s no NAT in IPv6 world. I would have been much less likely to have encountered the firewall / NetworkManager bugs if I had only been using site-local addresses, since I would not have been browsing public sites over IPv6. While there are almost certainly more IPv6 bugs lurking in various Fedora applications, on the whole Fedora Core 6 IPv6 support is working really rather well – the biggest problem is lack of documentation & the small userbase – the more people who try it, the more quickly we’ll be able to shake out & resolve the bugs.
BTW, there’s nothing stopping anyone trying out IPv6 themselves. Even if your internet connection is a crappy Verizon DSL service with a single dynamic IPv4 address that changes every time your DSL bounces, the folks at SIXXS have a way to get you a tunnel into the IPv6 internet with a fully routable global IPv6 address & subnet.
Isn’t IPv6 bug #2 a dupe of https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=209945 ?
You can also take a look at this post http://www.redhat.com/archives/fedora-test-list/2006-October/msg00580.html
Regards,
Dawid
No, it isn’t a dup – my bug was complaining about the fact that system-config-securitylevel does not even add a match on state == RELATED,ESTABLISHED to the ip6tables ruleset in the first place.
Though the kernel bug you mention does mean that even if it did add such a conntrack rule, things would still be broken. So I guess my bug 233725 would have to be marked as depending on 209945. As a temporary workaround system-config-securitylevel could at least add a non-conntrack style rule to INPUT matching on ‘! --syn’ to allow return traffic until the kernel conntrack stuff was fixed. As it stands it is just broken out of the box, which doesn’t help people trying out IPv6 because they will all have to manually allow port 80 (and similar for any other protocols they use outbound).
OK, thanks for the clarification.