The 74tjump patch described earlier has an overflow problem, and I don't think it is a simple enough fix.
The 74tjump2 fix is conceptually easier to grasp: for packets sent more than 250 milliseconds ago, according to the gettimeofday time, instead use a jiffies-based time. This change requires an extra datum per "frame", the data structure that tracks an AoE command, to record the sent time in jiffies.
By subtracting unsigned integers, the jiffies time calculation avoids overflow as long as the time difference is not more than 2^32 jiffies. That's about 1193 hours on a machine with HZ set to 1000.
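The fallback and the wraparound-safe subtraction described above can be sketched in userspace C. This is only an illustration, not the code from the 74tjump2 patch: the struct layout, the names (struct frame, sent_usec, sent_jiffies, elapsed_usec, jiffies_since, wrap_hours), the 32-bit tick counter, and HZ of 1000 are all assumptions made for the example. Treating a negative gettimeofday difference the same as one over 250 ms is also my assumption.

```c
#include <assert.h>
#include <stdint.h>

#define HZ 1000u               /* assumed tick rate, as in the text's example */
#define TOD_LIMIT_USEC 250000  /* 250 milliseconds, in microseconds */

/* Hypothetical, reduced stand-in for the driver's per-command "frame". */
struct frame {
	int64_t  sent_usec;    /* send time from a gettimeofday-style clock */
	uint32_t sent_jiffies; /* the extra datum: send time in ticks */
};

/* Unsigned subtraction is modulo 2^32, so the result is correct even
 * after the counter wraps, provided fewer than 2^32 ticks have passed. */
static uint32_t jiffies_since(uint32_t now, uint32_t sent)
{
	return now - sent;
}

/* Elapsed time for a frame, in microseconds.  The wall-clock difference
 * is trusted only up to 250 ms; beyond that (or if it is negative, an
 * assumption of this sketch) the tick-based time is used instead. */
static int64_t elapsed_usec(const struct frame *f, int64_t now_usec,
			    uint32_t now_jiffies)
{
	int64_t d = now_usec - f->sent_usec;

	if (d >= 0 && d <= TOD_LIMIT_USEC)
		return d;
	return (int64_t)jiffies_since(now_jiffies, f->sent_jiffies)
		* (1000000 / HZ);
}

/* Hours before a 32-bit tick counter wraps at a given tick rate;
 * about 1193 for hz == 1000. */
static double wrap_hours(unsigned hz)
{
	assert(hz > 0);
	return 4294967296.0 / hz / 3600.0;
}
```

With a frame sent at tick 0xFFFFFF00 and checked at tick 0x00000100, jiffies_since returns 512 ticks even though the counter wrapped in between.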
To best accommodate congested, uncongested, and changing networks, the aoe driver started using high-resolution timing information to track round-trip times in version 65.
In 2.6.31, sysfs exposes the request queue even for block drivers that don't use a request queue. The aoe driver was not allocating its request queue with the block layer's functions, such as blk_alloc_queue or blk_init_queue, so an oops was being generated.
This patch uses blk_alloc_queue to allocate the queue in the expected way. Jens Axboe says that the aoe driver should have been doing that all along; the recent changes to the kobject code only revealed the problem.
This patch is now (20090910) being pushed from Jens Axboe to Linus Torvalds.
This change improves diagnostic messages in the mainline kernel, where the driver is still limited to 16 slot addresses per shelf address. Once dynamic system minor device numbers are put into the mainline aoe driver, this restriction will be eliminated.
This change has been announced on the Linux Kernel Mailing List and pushed to the linux-next tree.
This patch allows the aoe driver to be built for a 2.6.30 kernel.
In 2.6.30, the hdreg.h macros like WIN_READ, etc., are only defined outside of the kernel. The changes committed to the mainline kernel allow aoe6-72pre2 to build for 2.6.30 but must be reverted during compatibility configuration for backwards compatibility with kernels so old that hdreg.h and ata.h are incompatible.
I used the relay_fs module (or the relay module plus debugfs) to do some tracing when last working on network congestion avoidance. I left the tracing code skeleton in, because it could be quite handy during testing, but it had been a major headache for the compatibility system: People running kernels without relay were building aoe modules that depended on relay, and their aoe module wouldn't load.
One reason for that issue was a simple mixup, where a different configuration was used to build the running kernel than the one that was in the kernel sources used to build the aoe driver. It turns out there was another reason. If the Module.symvers file is absent, then the "relay_open undefined" message that the compatibility test looks for never appears, even when the function is not defined.
This patch causes the compatibility system to create an aoe driver that doesn't use relay when Module.symvers is absent or when the relay functions are undefined.
This prerelease fixes a bug that was causing premature timeouts of I/O when AoE commands were reassigned from an unresponsive destination MAC address to a new one.
It also includes support for an easier way to specify local network interfaces to the aoe_iflist module parameter: Using commas to separate them means that fancy quoting is no longer needed in scripts.
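To show why commas help, here is a rough userspace sketch of splitting such a list on either commas or whitespace. It is not the driver's parser: the function name iflist_get, the IFNAME_MAX limit, and the interface names used below are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define IFNAME_MAX 16 /* assumed limit on name length, including the NUL */

/* Copy the n-th (0-based) interface name from a list separated by
 * commas and/or whitespace into out.  Returns 1 if the name was found
 * and fits, 0 otherwise. */
static int iflist_get(const char *list, int n, char out[IFNAME_MAX])
{
	static const char seps[] = ", \t\n";

	assert(list != NULL && out != NULL);
	while (*list) {
		size_t len;

		list += strspn(list, seps);  /* skip leading separators */
		len = strcspn(list, seps);   /* length of the next name */
		if (len == 0)
			break;
		if (n-- == 0) {
			if (len >= IFNAME_MAX)
				return 0;
			memcpy(out, list, len);
			out[len] = '\0';
			return 1;
		}
		list += len;
	}
	return 0;
}
```

Because the comma form contains no spaces, a value such as eth2,eth3 can be passed to aoe_iflist as a single shell word, whereas the space-separated form has to be quoted.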
Other changes are described in the NEWS file.
This patch helps the aoe driver handle complex networks with serious congestion problems. In our tests, this patch increased throughput from 36 MB/s to 186 MB/s in a scenario with a theoretical limit of about 200 MB/s. It decreased throughput for a best case network by only about 3%. Examples of networks that could benefit from this patch include...
The patch can be found here.
The three parts of this patch reflected in its name are,
This blog describes patches for the Linux aoe driver, the in-kernel AoE initiator. The patches change an aoe driver to fix problems or to add features. They are considered "unstable" because they are experimental and not thoroughly tested, but you can try them if you have a test environment. If you do, please leave feedback here by commenting. You can also email ecashin at coraid dot com directly.
A patch can be applied to the aoe driver source code using the "patch" command while working in the aoe source directory, e.g.,
patch -p1 < /tmp/aoe6-69-70pre1.diff
There are two places to find the aoe driver. One is at CORAID's website, and the other is in the Linux kernel distributed at kernel.org or through the various Linux distributions (like Fedora and Debian).
Unless otherwise specified, the patches here apply to the aoe driver at the CORAID website. There is ongoing work to bring the kernel.org driver up to the level of development that the CORAID-website driver has reached.