This bug affects sys-kernel/ck-sources as well as any other kernel which has BFQ support.

Reproducible: Always

Steps to Reproduce:
1. Use any kernel with BFQ as the default IO scheduler
2. Use eudev in its default configuration on a system with an SSD
3. Boot the system normally

Actual Results:
The less optimal IO scheduler, noop, is set by eudev due to a hardcoded "noop" in /lib/udev/rules.d/60-block.rules

Expected Results:
The optimal IO scheduler, BFQ, remains active once the system finishes booting.

https://github.com/gentoo/eudev/commit/b38f3aaba153414eb357dce18611772d2cffa1f6

> The BFQ developers' benchmarks on SSDs appear to account for both. They show noop as being far better than CFQ and second only to BFQ [...]

For SSD-backed systems with the BFQ IO scheduler, this commit is a performance regression.
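For context, the rule introduced by the linked commit is roughly of the following form (paraphrased, not the verbatim rule text - see the commit itself for the exact line):

```
# Approximation of the hardcoded rule in /lib/udev/rules.d/60-block.rules:
# on non-rotational block devices, unconditionally set the elevator to noop
ACTION=="add|change", SUBSYSTEM=="block", \
  ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="noop"
```

The unconditional assignment is what clobbers a kernel-default BFQ on SSDs.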
(In reply to kuzetsa from comment #0)
> This bug affects sys-kernel/ck-sources as well as any other kernel which has
> BFQ support.
>
> Reproducible: Always
>
> Steps to Reproduce:
> 1. use any kernel with BFQ as the default IO scheduler
> 2. use eudev in default configuration on a system with SSD
> 3. boot the system normally
>
> Actual Results:
> The less-optimal IO Scheduler, noop is set by eudev due to a hardcoded
> "noop" in /lib/udev/rules.d/60-block.rules
>
> Expected Results:
> The IO Scheduler, BFQ (optimal) remains active once the system finishes
> booting.
>
> https://github.com/gentoo/eudev/commit/
> b38f3aaba153414eb357dce18611772d2cffa1f6
>
> > The BFQ developers' benchmarks on SSDs appear to account for both. They
> > show noop as being far better than CFQ and second only to BFQ [...]
>
> For SSD-backed systems with the BFQ IO scheduler, this commit is a
> performance regression.

Short of reverting, do you have any other suggestions?
I don't know how [e]udev rules work well enough to know whether they can detect which kernel is running, nor do I know if (or how easily) the rules can be given an order of precedence such that BFQ would be prioritized over noop, when available, for a non-rotational SSD-type storage device. I tried to research such a remedy, but still haven't had time to follow through.
Ryao, this was your commit. Can you comment? Otherwise I'm going to revert it before the next release.
This was a heuristic meant to prevent DoS issues (operations that should take seconds taking hours) on certain hardware when discard is enabled on ext4 and other filesystems. It still prevents that, and it is not known to me whether BFQ suffers from the same issues. I have the hardware required to do testing, but it would take some time to move things off it so that I can repurpose it for tests. I am willing to do that, but until I have, there are two ways of approaching this.

The first is to assume that only CFQ is unsafe and implement a helper that we can execute on an add event to check whether cfq is set: if so, change it to noop, and otherwise leave it alone. Then, when the tests on BFQ are done, we could either leave it as-is or extend it to cover BFQ if we find BFQ is susceptible to the same problem. The other is to leave things alone until the tests are done.

Anthony, let me know what you want to do. If you would rather do something else such that there is no point in running the tests, I would prefer to know that before I go through the trouble of setting up a test environment with the hardware that can reproduce these problems.
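A minimal sketch of the first option, the helper run from an add event. All names here (the helper path and the rule text) are illustrative assumptions, not the actual eudev implementation:

```shell
#!/bin/sh
# Hypothetical helper along the lines described above.
# It would be invoked from a udev rule such as (sketch only):
#   ACTION=="add", SUBSYSTEM=="block", ATTR{queue/rotational}=="0", \
#     RUN+="/lib/udev/sched-helper %k"

# Extract the active elevator from sysfs's bracketed format,
# e.g. "noop deadline [cfq]" -> "cfq".
active_sched() {
    printf '%s\n' "$1" | sed -e 's/.*\[//' -e 's/\].*//'
}

main() {
    queue="/sys/block/$1/queue/scheduler"
    [ -w "$queue" ] || exit 0
    # Only downgrade cfq to noop; leave bfq, deadline, etc. untouched.
    if [ "$(active_sched "$(cat "$queue")")" = "cfq" ]; then
        printf '%s\n' noop > "$queue"
    fi
}

# main "$@"   # uncomment when installing as an actual helper
```

The key behavioural difference from the current rule is that a kernel-default BFQ (or deadline) would pass through untouched.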
@ryao - thanks for the insight. This particular sort of helper is beyond my current experience and the time I have to research a remedy, but it sounds like a good plan, assuming I understand correctly:

1) Detect if a particular hardware combination is bad for CFQ.
2) If something other than CFQ is set, leave it alone.
3) If CFQ is set, change it to noop.

Concerning the details of why CFQ should be avoided: I don't personally know enough about them to comment on the original issue. Even if it's bad enough to be described as a DoS, the various benchmarks I've reviewed, and my own local testing, show fewer "major latency problem" cases with BFQ on SSD block devices, and its overhead in certain workloads is less severe than what CFQ is shown to suffer from.

On the performance and UX side, BFQ shows a marked improvement (decrease) in launch times for applications of various sizes whenever the kernel cache is unavailable for IO, so that the SSD block device must be accessed to fulfill a request (e.g. first launch after a reboot, or after the cached data was released during a period of memory and/or IO contention).
Additional supporting info (incl. comparison against noop) - see the section "PLEXTOR SSD", "The next three figures show the cold-cache start-up time of bash, xterm and lowriter":

http://algo.ing.unimo.it/people/paolo/disk_sched/extra_results.php

I've personally timed (nothing automated - just a stopwatch) various browsers, Mozilla Thunderbird, and a few games for first run after boot as well. Both small and large applications on SSD have shown faster load times in my testing. Since I haven't kept or posted the results, you may wish to run some automated tests of various UX use cases comparing noop to BFQ, so that you'll have a record of non-anecdotal data to confirm this. That's entirely up to your judgement / prerogative.

Again, thanks :)
The problem is that TRIM is an unqueued command prior to SATA 3.1. Things like unlinking a million inodes on an ext4 rootfs mounted with discard will lock up a system for hours on certain hardware. The hardware that I have is the OCZ Vertex 2. CFQ's ordering of discard commands will exacerbate this. deadline and noop are known not to have issues. I do not know about BFQ.
By the way, I agree that BFQ is superior to other IO elevators on most filesystems, with ZFS being a notable exception because it has its own IO elevator. If people are not using ZFS and have BFQ built into their kernel, then they definitely ought to use it.
That particular detail about TRIM on SSDs is exactly the kind of thing I don't know enough about to test or research. Sorry I can't comment or contribute further, as I'm generally not a hardware person and don't know these kinds of things. I'll yield to whatever resolution happens on this bug, though I think it might be preferable to no longer force SSD users onto noop unconditionally.
It's also worth noting here that if anyone has SCSI blk-mq enabled, they'll get a forced noop in their kernel, and as I understand it blk-mq is recommended on SSDs. Given this is a udev rule and not something enforced via hwdb (right?), I think we should just evaluate what the majority wants here and then ensure the default ruleset goes that way. Since BFQ is, IIRC, not yet in mainline, that would likely still mean doing what it does now. It shouldn't be difficult to change the default behaviour in this case with an override rule in /etc/udev/rules.d/, right?
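A local override along those lines could look something like this (filename and rule text are a sketch, assuming BFQ is compiled into the running kernel):

```
# /etc/udev/rules.d/60-ssd-scheduler.rules (illustrative filename)
# Prefer bfq on non-rotational devices, overriding the shipped 60-block.rules.
ACTION=="add|change", SUBSYSTEM=="block", \
  ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="bfq"
```

Rules in /etc/udev/rules.d/ with the same filename take precedence over those in /lib/udev/rules.d/, and later-sorted rules can overwrite the attribute assignment, so users who want BFQ can opt back in without patching eudev.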
FWIW - fixing this bug may become more relevant in the next year or two: Linux kernel 4.12 is rumored to have the BFQ IO scheduler queued up for mainline.
The rule should definitely not be applied when blk-mq is used. First of all, there is no "noop" scheduler in that case, only "none". And the default in recent kernels, [mq-deadline], is right for both SSDs and spinning rust.
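One way to make the shipped rule skip blk-mq queues might be to match on the scheduler list itself, since udev match values accept shell-style globs and the sysfs file only contains "noop" on legacy (non-mq) queues. This is an untested sketch, not the actual eudev rule:

```
# Sketch: only force noop where a "noop" elevator actually exists,
# i.e. skip blk-mq devices, whose scheduler list contains "none" instead.
ACTION=="add|change", SUBSYSTEM=="block", ATTR{queue/rotational}=="0", \
  ATTR{queue/scheduler}=="*noop*", ATTR{queue/scheduler}="noop"
```

On a blk-mq device the match fails, so [mq-deadline] (or "none") is left as the kernel chose it.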
This rule has been removed from more recent versions of eudev.