Hi again IT World...
I've been quite busy lately... too much for comfort, and too often because of network issues (at best).
It has taken me almost two months of work to regain our desired levels of stability and tranquility since we upgraded our hybrid cloud infrastructure at OVH to PROXMOX 5, ZFS, and pfSense 2.4.
The quest for redundancy: Asymmetric routing
Part of the problems appeared as we tried to take full advantage of the OVH vRack features, which allow us to have an unrestricted number of Layer 2 VLANs spread transparently across datacenters, reaching any hypervisor node (and its VMs), storage server, dedicated server, and so on.
vRack is a very resilient design: it works at full interface speed, has low latency, and allows any Layer 2 and above traffic across datacenters. Thanks to its VLAN support (in combination with PROXMOX and Open vSwitch), you can create independent hybrid physical/virtual networks for mixed assets at an incredible price.
Putting resources on the OVH vRack gave us higher levels of availability... but at the same time, and all of a sudden, it rendered our classic routing schemes and our single, peer-to-peer VPN connections from premises to cloud resources not only obsolete, but dangerous single points of failure, true show-stoppers.
But I remembered my CCNA readings, those theories about equal-cost path load balancing native to protocols such as EIGRP and OSPF, which sounded like magic at the time...
So I thought the time had come to put that theory into practice, and that it would be fairly easy to let multiple p2p VPN uplinks combined with OSPF do the magic work... and you know... IT WORKED!!!!
But it worked only to uncover a huge problem: while asymmetric routing works nicely in Packet Tracer simulations, it was a bitter surprise to discover that not only pfSense, but also the PROXMOX firewall (and some others), are not designed to deal with asymmetric traffic at all!
If you're lucky, it is treated as some kind of hostile traffic and dropped, leaving some trace in the logs, but often it is dropped silently (this is the PROXMOX firewall's behaviour, for instance, but it is not the only one), and it is extremely confusing to work out what is happening.
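To make the failure mode concrete, here is a minimal Python sketch of why equal-cost load balancing and stateful firewalls clash. It is a toy model, not how pfSense or the PROXMOX firewall actually track connections: the names (pick_tunnel, StatefulFirewall, simulate), the tunnel labels vpn-a and vpn-b, the router labels and the IP addresses are all made up for illustration, and the SHA-256 hash merely stands in for whatever per-flow hash a real router uses. The assumption is one stateful firewall per VPN tunnel, with each router choosing its tunnel independently.

```python
import hashlib
from dataclasses import dataclass, field


def pick_tunnel(tunnels, router_id, src, dst, sport, dport):
    """Choose one equal-cost tunnel per flow (toy stand-in for a router's ECMP hash)."""
    key = f"{router_id}|{src}|{dst}|{sport}|{dport}".encode()
    digest = hashlib.sha256(key).digest()
    return tunnels[int.from_bytes(digest[:4], "big") % len(tunnels)]


@dataclass
class StatefulFirewall:
    """Toy firewall: forwards a reply only if it saw the opening packet itself."""
    name: str
    states: set = field(default_factory=set)
    dropped: int = 0

    def open_flow(self, src, dst, sport, dport):
        # Remember the flow when the first packet passes through this firewall.
        self.states.add((src, dst, sport, dport))

    def accept_reply(self, src, dst, sport, dport):
        # A reply matches a stored state with source and destination swapped.
        if (dst, src, dport, sport) in self.states:
            return True
        self.dropped += 1  # no state here: silently drop
        return False


def simulate(num_flows=20, tunnel_names=("vpn-a", "vpn-b")):
    """Send num_flows requests and replies; return (delivered, dropped)."""
    firewalls = {name: StatefulFirewall(name) for name in tunnel_names}
    tunnels = list(firewalls)
    delivered = 0
    client, server = "10.0.0.5", "192.168.1.10"  # example addresses
    for i in range(num_flows):
        sport, dport = 40000 + i, 443
        # The office router picks the outbound tunnel; that firewall creates state.
        out = pick_tunnel(tunnels, "office", client, server, sport, dport)
        firewalls[out].open_flow(client, server, sport, dport)
        # The cloud router picks the return tunnel independently.
        back = pick_tunnel(tunnels, "cloud", server, client, dport, sport)
        if firewalls[back].accept_reply(server, client, dport, sport):
            delivered += 1
    dropped = sum(fw.dropped for fw in firewalls.values())
    return delivered, dropped


delivered, dropped = simulate()
print(f"two tunnels: delivered={delivered} dropped={dropped}")
print("one tunnel: delivered=%d dropped=%d" % simulate(tunnel_names=("vpn-a",)))
```

With a single tunnel every reply comes back through the firewall that saw the request, so nothing is lost. With two tunnels, any flow whose reply takes the other tunnel hits a firewall that has no state for it, and in this model it is dropped without a word, which is the confusing behaviour described above.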
There are workarounds for many (if not all) of the issues, and we managed to handle the situation somehow, but in the end, the best solution we found was to get rid of any pre-made firewall solutions and go back to a do-it-yourself, raw Linux virtual router solution... and enjoy life!
The pfSense crisis
pfSense has been (and somehow it still is) my firewalling solution of choice for years.
It has worked very well on tiny platforms such as PCEngines Alix, Soekris, ViaTech NFRs and, more importantly here, KVM.
The fact that the project has been purchased by NetGate may have had some influence on the decisions made regarding support or availability of pfSense for some hardware and architectures in the future... in the end, they intend to sell NetGate appliances (which look very nice), but we never thought that moving from pfSense 2.2 to 2.4 would be a complete disaster.
In reality, pfSense VMs were somewhat tricky from the beginning: limited to qemu 32 CPU, they required some tweaks to configure on PROXMOX... but once they were up and running, they were rock solid!
We blindly trusted pfSense: we tested PROXMOX 5, we tested ZFS, we tested vRack compatibility... we tested everything... but assumed no problem with pfSense... big mistake...
In addition to hostile behaviour with asymmetric routing (which makes sense for a firewall after all), we started having crashes of pfSense 2.4 from the very beginning... it simply crashes under less than average load.
pfSense 2.3, with an equivalent setup, may stay up for several days; then its interfaces/networking stack crashes, forcing a reboot.
We had to set cron jobs on our pfSense machines to reboot them every 12 hours in order to have decent availability.
To do justice to pfSense, it has to be said that most of those problems come from the underlying BSD OS, and that the NetGate appliances manage to avoid them or suffer minimal impact.
Also, for simple setups, where none of the problematic features are used, pfSense may still be a good choice!
After a few weeks of struggling, we came to realize that a number of reported, well-known bugs in BSD/pfSense do exist, some of them long-standing, some others quite new, and that they impact some features that are critical for us:
- Virtio Drivers / Tx Offloading problems on KVM
- Crashing network interfaces due to IPsec
- SNMP causing full CPU usage
- Several important Quagga OSPF issues
In addition, some packages have been dropped from pfSense, such as Check_MK and others, and as of pfSense 2.4, no more BSD repos are available to use with pkg.
Again, everything pointed to a change of approach. The pfSense 2.3 GUI is nice, but tranquility is nicer... pfSense days are over... Debian came to the rescue!!!!
So, I had to migrate all features, from pfSense VMs, to Debian VMs.
As usual, I take notes, especially of configurations, and whenever I have trouble finding guides, articles, or clear forum posts about some setup, I consider sharing them on my own blog!
So in the next set of articles, I will share my 'recipes' for the following set of 'migrated' features:
- OpenVPN 1: Bridged/tap mode setup (connect road warriors directly to OVH vRack!)
- OpenVPN 2: Site-to-site point-to-point setup (typical for OSPF)
- From CARP to VRRP: Using keepalived to get your floating IPs
- IPsec 1: Connecting a Cisco router, with dynamic IP, to strongSwan IPsec
- IPsec 2: GRE under IPsec between Cisco and Linux (typical for OSPF)
There are some other features I had to migrate, including PKI management, DNS, DHCP and HAProxy... but well, they're widely documented.
Likewise, a lot of iptables stuff had to be tuned, which led to some quite nice tailored scripts, but, well... iptables is also vaaaaastly documented!
So... enough for today... next time we'll play with OpenVPN! :-D