Summary: | sys-fs/convertfs eats 100% CPU for days on end | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Andreas Klauer <Andreas.Klauer> |
Component: | [OLD] Core system | Assignee: | Tom Payne (RETIRED) <twp> |
Status: | RESOLVED WONTFIX | ||
Severity: | critical | CC: | antarus, jakub, martin.sandsmark, qa, world.root |
Priority: | High | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | Pending Removal | ||
Package list: | Runtime testing required: | --- | |
Attachments: | devremap profiling output; very ugly hack to fix 100% CPU problem | ||
Description
Andreas Klauer
2005-09-29 12:15:49 UTC
Created attachment 69517
devremap profiling output
devremap is the core program used by convertfs to relocate the blocks of the
sparse image file created within a filesystem to the actual physical partition.
Created attachment 69555
very ugly hack to fix 100% CPU problem
I analyzed the 100% CPU usage problem further; I still don't fully understand
what this program is doing, so the following is just what I'm assuming so
far.
devremap does its work on a block-by-block basis; my partition is 80GB and
has 78148161 blocks. For every block that has to be relocated, devremap
loops through the whole list of blocks to find out whether the target block is
already occupied (by another part of the sparse image file, perhaps?). And
this is what's taking so long...
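The lookup pattern described above can be sketched as follows. This is not devremap's actual code (its source is not quoted in this report); the function name `find_occupied_linear` and the flat array of occupied block numbers are hypothetical, chosen only to show why a full scan for every relocated block makes the total work grow roughly quadratically.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not devremap's code: `occupied` holds the block
 * numbers already in use, in no particular order. */

/* Return the index of `target` in `occupied`, or -1 if the block is free.
 * Every call starts again at index 0, so one call costs up to n
 * comparisons, and checking n blocks against a list of n entries costs
 * up to n*n comparisons in total. */
long find_occupied_linear(const unsigned long *occupied, size_t n,
                          unsigned long target)
{
    assert(occupied != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (occupied[i] == target)
            return (long)i;
    }
    return -1; /* miss: every entry was compared */
}
```

A miss always pays for a complete pass over the list, so the cost per block rises with the size of the partition being converted.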
By adding a lot of debug messages and such, I found out that when one target
block is occupied, chances are high that the next block is occupied too. So I
simply save the position where a match was found in the last iteration, so
that it can be checked and reused in the next iteration.
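The "remember the last match" idea can be sketched like this. It is a hypothetical illustration, not the attached patch (which the reporter says relies on global variables and goto); here the caller owns the saved position (`hint`, a name invented for this sketch) and the scan starts there and wraps around, so a hit at or just after the previous match costs one or two comparisons.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of reusing the last match position. The caller
 * initialises `*hint` to 0 and passes the same variable on every call.
 * Returns the index of `target` in `occupied`, or -1 if it is free. */
long find_occupied_hinted(const unsigned long *occupied, size_t n,
                          unsigned long target, size_t *hint)
{
    assert(hint != NULL);
    assert(occupied != NULL || n == 0);
    if (n == 0)
        return -1;

    /* Start at the previous match (or 0 if the hint is out of range). */
    size_t start = *hint < n ? *hint : 0;

    /* Walk the list once, wrapping around at the end. If consecutive
     * target blocks sit in consecutive entries, the next hit is found
     * almost immediately instead of after a scan from index 0. */
    for (size_t k = 0; k < n; k++) {
        size_t i = (start + k) % n;
        if (occupied[i] == target) {
            *hint = i; /* remember where this match was found */
            return (long)i;
        }
    }
    /* Miss: every entry was checked, exactly as in a plain linear scan;
     * the hint is left unchanged. */
    return -1;
}
```

The hint only speeds up hits that land near the previous match; a lookup for a free block still touches every entry, so the worst case is unchanged. Bounding every lookup would need a different structure (for example a sorted list with binary search), which is a separate technique from the one described here.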
I'm not really a C programmer, so I have a hard time debugging this thing; the
way I implemented the patch is a VERY UGLY HACK (it uses global variables and
goto statements, nuff said), so I don't recommend using this patch at all.
I don't even know yet whether the patch really works: the patched devremap is
running now at high speed (it now needs 0-2 seconds for 1022 blocks, as opposed
to 90-110 seconds before), but I'll still have to wait until it's finished to
see whether the result is corrupted. I'll report back later.
I'm just posting this patch here in case someone else is already stuck with a
very slow devremap, and maybe in the hope that a real C programmer can figure out
what I was doing there. ;-)
So, devremap is done now. The resulting filesystem is mountable and all files are present. I cannot verify the files' contents, though, because I don't have md5sums or anything similar for them. I tested a couple of files (tar archives, images, ...) and they are all okay, so I guess it worked and the patch should be reasonably safe to use (the code I added is still very ugly, though).

On a side note, I was mistaken about the block numbers printed by devremap... I thought they were the real blocks, but it finished at block group 18494243 (I thought it would go up to 78148161). So the original version wouldn't actually have taken as many days as I estimated (but still far too long for my taste).

I'll wait and see if the original author of this script replies to any of the emails I sent him; if he doesn't, I might consider turning this thing into properly working software (although I think I'd go for integration into parted rather than keeping software as dangerous as this in a shell script...).

Yay, straight-to-stable on version bumps rocks [1], especially with tools that mess with filesystems and have ewarn "This tool is HIGHLY DANGEROUS. Read the homepage before using it!" in the ebuild.

[1] http://bugs.gentoo.org/show_bug.cgi?id=88510
http://sources.gentoo.org/viewcvs.py/gentoo-x86/sys-fs/convertfs/convertfs-20050113.ebuild?hideattic=1&rev=1.1&view=markup

Please package.mask or at least ~arch this thing; it should never have gone stable in the first place. :/

(In reply to comment #4)
> Please package.mask or at least ~arch this thing, should have never gone stable
> in the first place. :/

To add to the reasons to kill it:
1. It doesn't clean up properly if you press Ctrl-C.
2. The last note in the changelog on the homepage is from 13 Jan 2005.
3. The maintainer is currently away according to herdstat -m.

I say mask it, and maybe even mark it for removal in 30 days.

I pushed this back to ~x86 since it's not really stable. What is going on with this package?
If it's this unsafe, it should probably be package.mask'd.

Agreed that the package should be masked, and possibly removed. I've emailed the author and will await his response. If there is none, then I suggest we remove it completely.

Last Rites sent to -dev.

PPC doesn't need to be CC'd, remove it :)

The Debian people have patched this program. They added a program called "ftwmv" to move the files individually, so you can now have a directory that takes up over 50% of the filesystem. They have also patched it to fix the bug with spaces in filenames, to add support for Reiser4, and some other minor stuff. The only thing they haven't patched (AFAICS) is what the veryuglyhack.patch fixes. I don't think it should be removed, only masked, with the new patches. It actually keeps a journal of some sort, so it should be safe to interrupt with Ctrl+C (i.e. it's recoverable).
http://bugs.debian.org/cgi-bin/pkgreport.cgi?pkg=convertfs
Patches: http://ftp.debian.org/debian/pool/main/c/convertfs/convertfs_20050113-1.diff.gz

Great to hear that there is some progress. The relocation process should be optimized, simply because it takes ages (the bigger the partition, the worse, and I doubt it scales linearly). With ever-growing hard disks and partitions, this is a serious issue. What's the use of the data being safe if it takes days or weeks of hard CPU work before you can access it again?

Removed from portage. Last rites sent to -dev 5 months ago, with no responses: http://article.gmane.org/gmane.linux.gentoo.devel/38641