Bug 107635

Summary: sys-fs/convertfs eats 100% CPU for days on end
Product: Gentoo Linux Reporter: Andreas Klauer <Andreas.Klauer>
Component: [OLD] Core system Assignee: Tom Payne (RETIRED) <twp>
Severity: critical CC: antarus, jakub, martin.sandsmark, qa, world.root
Priority: High    
Version: unspecified   
Hardware: All   
OS: Linux   
Whiteboard: Pending Removal
Package list:
Runtime testing required: ---
Attachments: devremap profiling output
very ugly hack to fix 100% CPU problem

Description Andreas Klauer 2005-09-29 12:15:49 UTC

I suggest that sys-fs/convertfs be masked, for several reasons.

1.) It's highly dangerous software. It requires an in-depth understanding of how
it works; otherwise data loss is imminent.

2.) Even if you understand the basic idea, it's buggy and has a lot of hidden
requirements. For example, it uses 'mv' to move all files from the root
directory of the original partition onto the image. If a 'file' happens to be a
directory, the whole directory is moved. Because the image is a different
filesystem, 'mv' cannot simply rename it: the source directory is only deleted
after the WHOLE directory tree, including all its contents, has been copied; so
for this to succeed, you need at least as much free space as the size of the
biggest directory on the partition. In other words, if a single directory
consumes more than 50% of the partition, convertfs will not succeed even if the
other 50% of the partition is free space.

3.) Even if you're lucky and have enough free space, it simply does not work.
I'm in exactly that situation right now: I read and understood the ideas behind
this script before actually using it, I was really careful about it, and I
still ran into a problem.

convertfs copied all data onto the sparse image file all right, unmounted the
partition and began relocating blocks. That was more than 24 hours ago. Right
now, the core program (devremap) is still working, using 100% CPU all the
time (the CPU is an Athlon XP 2000+, reasonably fast even by today's standards).
The progress is 1GB out of 80GB. At this rate, it will take months to finish.

I investigated the problem a little by profiling it using gprof. I'll attach the
output. 99.85% of the CPU time is wasted in a function called 'find_cross_block'.
Unfortunately this code is not documented at all, so although I have a basic
idea of what's going on, I don't really know how to fix it.

By searching the web and the Gentoo forums, I found that others have had the same
problem. Although most data loss reports can be ignored (caused by stupid users
/ newbies who don't know what they're doing, like convertfsing an already
mounted file system), the only success stories I found date back to 2002. So I
guess this 100% CPU usage issue is a bug that was introduced sometime in
between. Unfortunately I can't seem to find any older versions of this
software, so I can't even see what has changed.
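The free-space requirement from point 2 can be made concrete with a minimal C sketch. It assumes that each "unit" being moved (a single file or a whole directory) is copied in full before its source is deleted, and it ignores the image filesystem's own metadata overhead. The function name `first_unit_that_does_not_fit` and the sizes are hypothetical, not taken from convertfs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of convertfs: simulate moving "units"
 * (single files or whole directories) into the image one at a time,
 * where each unit is copied in full before its source is deleted.
 * Returns the index of the first unit that cannot be copied, or -1 if
 * all of them fit. Metadata overhead of the image filesystem is ignored. */
long first_unit_that_does_not_fit(const uint64_t *unit_sizes, size_t n,
                                  uint64_t free_bytes)
{
    assert(unit_sizes != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* Copy phase: the whole unit must fit into the current free space. */
        if (unit_sizes[i] > free_bytes)
            return (long)i;
        /* Delete phase: removing the source gives the same amount back,
         * so free space is unchanged when the next unit starts. */
    }
    return -1;
}
```

For example, on a 100-unit partition holding directories of 10, 60 and 5 units (25 free), the 60-unit directory cannot be moved as a whole; if its contents were moved as ten 6-unit files instead, every unit would fit. Under these assumptions the peak requirement is simply the size of the largest unit, which is why moving whole directories is the problem.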

In any case, the software as it is now is dangerous, so please mask it until
someone can come up with a proper fix for the issues mentioned. I have already
contacted the author about this.

Andreas Klauer

Reproducible: Always
Steps to Reproduce:

Portage (default-linux/x86/2005.0, gcc-3.3.6, glibc-2.3.5-r1, i686)
System uname: i686 AMD Athlon(tm) XP 2000+
Gentoo Base System version 1.6.13
dev-lang/python:     2.3.5-r2
sys-apps/sandbox:    1.2.12
sys-devel/autoconf:  2.13, 2.59-r6
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6
sys-devel/libtool:   1.5.18-r1
virtual/os-headers:  2.6.11-r2
CFLAGS="-march=athlon-xp -O2 -pipe -g"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3.4/env
/usr/kde/3.4/share/config /usr/kde/3.4/shutdown /usr/kde/3/share/config
/usr/lib/X11/xkb /usr/share/config /usr/share/texmf/dvipdfm/config/
/usr/share/texmf/dvips/config/ /usr/share/texmf/tex/generic/config/
/usr/share/texmf/tex/platex/config/ /usr/share/texmf/xdvi/ /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-march=athlon-xp -O2 -pipe -g"
FEATURES="autoconfig distlocks nostrip sandbox sfperms strict"
LDFLAGS="-Wl,-O1 -Wl,--sort-common -Wl,-z,now"
USE="x86 3dnow 3dnowext 3ds X Xaw3d a52 aac aalib acpi alsa anthy apm arts avi
bash-completion berkdb bigger-fonts bitmap-fonts browserplugin bzip2 canna cdda
cddb cdr cid cjk crypt css cups curl custom-cflags dga divx4linux doc dvd dvdr
dvdread eds emboss encode esd fam ffmpeg firefox flac font-server foomaticdb
fortran freewnn ftp gd gdb gdbm gif gimp gimpprint glx gnome gnutls gphoto2 gpm
gs gstreamer gtk gtk2 guile ieee1394 imagemagick imlib immqt ipv6 java jpeg
junit kde lame latex libg++ libwww logitech-mouse mad matroska mccp mikmod mime
mmx mmxext motif mozilla mp3 mpeg mplayer mysql ncurses nls nsplugin nvidia ogg
oggvorbis opengl oss pam pcre pdflib perl png python qt quicktime readline rtc
ruby sblive scanner sdl sndfile sox spell sse ssl subtitles tcltk tcpd tiff
truetype truetype-fonts type1 type1-fonts unicode usb utf8 vorbis win32codecs
xine xml xml2 xmms xv xvid zlib userland_GNU kernel_linux elibc_glibc"
Comment 1 Andreas Klauer 2005-09-29 12:17:30 UTC
Created attachment 69517
devremap profiling output

devremap is the core program used by convertfs to relocate the blocks of the
sparse image file created within a filesystem to the actual physical partition.
Comment 2 Andreas Klauer 2005-09-30 07:11:35 UTC
Created attachment 69555
very ugly hack to fix 100% CPU problem

I analyzed the 100% CPU usage problem further. I still don't fully understand
what this program is doing, so the following is just what I'm assuming so far:

devremap does its work on a block-by-block basis; my partition is 80GB and has
78148161 blocks. For every block that has to be relocated, devremap loops
through the whole list of blocks to find out whether the target block is
already occupied (by another part of the sparse image file, perhaps?). And
this is what's taking so long...
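A minimal C sketch of the lookup described above, assuming a hypothetical array `block_map` in which `block_map[j]` is the physical block currently holding logical block j of the image. devremap's real data structures and its `find_cross_block` function may look quite different; this only illustrates why a full scan per block is expensive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model (not devremap's actual data structures):
 * block_map[j] is the physical block that currently holds logical
 * block j of the image file. Before logical block i can be written to
 * physical block i, we must know whether some other image block j
 * already sits there. */
long find_occupant_linear(const uint32_t *block_map, size_t n, uint32_t target)
{
    assert(block_map != NULL || n == 0);
    /* Full scan: O(n) per lookup, so O(n^2) over a whole relocation pass. */
    for (size_t j = 0; j < n; j++) {
        if (block_map[j] == target)
            return (long)j;
    }
    return -1; /* no image block occupies the target: it is free */
}
```

Each call may scan all n entries, so one pass over n blocks costs up to n² comparisons; with the 78148161 blocks mentioned above that is about 6 × 10^15 in the worst case.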

Now, by adding a lot of debug messages and stuff, I found out that usually,
when one target block is occupied, chances are high that the next block will be
occupied too, so I'm just saving the position at which a match was found in the
last iteration, so it can be checked and re-used in the next iteration.
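The idea of remembering the last match can be sketched the same way. This is a hypothetical reimplementation of the idea, not the attached patch: it reuses the assumed `block_map` layout, passes the remembered position through a `hint` parameter instead of a global variable, and checks the previous match and the entry after it before falling back to a full scan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the "remember the last hit" idea; not the
 * actual patch. *hint holds the index of the previous match (or -1).
 * Because occupied targets tend to be consecutive, the entry right
 * after the previous match is the most likely owner of the next
 * target, so it is checked before falling back to a full scan. */
long find_occupant_hinted(const uint32_t *block_map, size_t n,
                          uint32_t target, long *hint)
{
    assert(hint != NULL);
    assert(block_map != NULL || n == 0);

    /* Fast path: check the previous match and its successor. */
    if (*hint >= 0) {
        for (long k = *hint; k <= *hint + 1 && (size_t)k < n; k++) {
            if (block_map[k] == target) {
                *hint = k;
                return k;
            }
        }
    }

    /* Slow path: the same full scan as before. */
    for (size_t j = 0; j < n; j++) {
        if (block_map[j] == target) {
            *hint = (long)j;
            return (long)j;
        }
    }
    return -1; /* on a miss, *hint is left unchanged */
}
```

Runs of consecutive occupied targets now cost O(1) each. A lookup for a free target block still falls through to the full O(n) scan, so under this sketch the speed-up depends on how clustered the occupied blocks are.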

I'm not really a C programmer, so I have a hard time debugging this thing; the
way I implemented the patch is a VERY UGLY HACK (it uses global variables and
goto statements, 'nuff said), so I don't recommend using this patch at all.

I don't even know if the patch really works. The patched devremap is running
at high speed now (it needs 0-2 seconds per 1022 blocks, compared with 90-110
seconds before), but I'll have to wait until it's finished to see whether the
result is corrupted. I'll report back later.

I'm just posting this patch here in case someone else is already stuck with a
very slow devremap, and in the hope that a real C programmer can figure out
what I was doing there. ;-)
Comment 3 Andreas Klauer 2005-09-30 16:59:19 UTC
So, devremap is done now. The resulting filesystem is mountable, and all files
are present. I cannot verify the files' contents, though, because I don't have
md5sums or similar for them. I tested a couple of files (tar archives, images,
...) and they are all okay, so I guess it worked and the patch should be
reasonably safe to use (the code I added is still very ugly, though).

On a side note, I was mistaken about the block numbers printed by devremap... I
thought they referred to the real blocks, but it finished at block group
18494243 (I thought it would go up to 78148161). So the original version
wouldn't actually have taken as many days as I estimated (but still far too
long for my taste).
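A rough worked check of this revised estimate, under two assumptions that are not confirmed in the report: that the counter devremap prints uses the same units as the 1022-block batches, and that the rates measured earlier (90 to 110 seconds per batch unpatched, 0 to 2 seconds patched) stayed constant for the whole run.

```latex
\[
N_{\text{batches}} = \frac{18\,494\,243}{1022} \approx 1.81 \times 10^{4}
\]
\[
t_{\text{unpatched}} \approx 1.81 \times 10^{4} \times (90 \text{ to } 110)\,\text{s}
  \approx (1.63 \text{ to } 1.99) \times 10^{6}\,\text{s} \approx 19 \text{ to } 23 \text{ days}
\]
\[
t_{\text{patched}} \lesssim 1.81 \times 10^{4} \times 2\,\text{s}
  \approx 3.6 \times 10^{4}\,\text{s} \approx 10 \text{ hours}
\]
```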

I'll wait and see if the original author of this script replies to any of the
emails I sent him; if he doesn't, I might consider trying to turn this thing
into properly working software (although I think I'd go for integration into
parted rather than keeping software as dangerous as this in a shell script...).
Comment 4 Jakub Moc (RETIRED) gentoo-dev 2005-12-30 07:01:07 UTC
Yay, straight-to-stable on version bumps rocks [1], especially with tools that mess with filesystems and have 

ewarn "This tool is HIGHLY DANGEROUS. Read the homepage before using it!"

in the ebuild.


Please package.mask or at least ~arch this thing; it should never have gone stable in the first place. :/
Comment 5 Lares Moreau 2005-12-30 10:33:23 UTC
(In reply to comment #4)

> Please package.mask or at least ~arch this thing, should have never gone stable
> in the first place. :/

To add to reasons to kill it,
 1. it doesn't clean up properly if you Ctrl-C.
 2. the last note in the changelog on the homepage is 13 Jan 2005
 3. the maintainer is currently away according to herdstat -m 

I say mask, and maybe even mark for removal in 30 days.
Comment 6 Mark Loeser (RETIRED) gentoo-dev 2006-01-06 14:55:48 UTC
I pushed this back to ~x86 since it's not really stable. What is going on with this package? If it's this unsafe, it should probably be package.mask'd.
Comment 7 Tom Payne (RETIRED) gentoo-dev 2006-01-15 09:28:51 UTC
Agree that package should be masked, and possibly removed.

I've emailed the author and will await his response. If there is none then I suggest we remove it completely.
Comment 8 Alec Warner archtester Gentoo Infrastructure gentoo-dev Security 2006-05-28 07:17:37 UTC
Last Rites sent to -dev
Comment 9 Joe Jezak (RETIRED) gentoo-dev 2006-07-08 16:10:51 UTC
PPC doesn't need to be CC'd, remove it :)
Comment 10 Martin Sandsmark 2006-07-23 04:58:43 UTC
The Debian people have patched this program.

They added a program called "ftwmv" to move the files individually, so you can now have a directory which takes up over 50% of the filesystem.
They have also patched it to fix the spaces-in-filenames bug, added support for Reiser4, and made some other minor changes. The only thing they haven't patched (AFAICS) is what the veryuglyhack.patch fixes. I don't think it should be removed; with the new patches, it should only be masked.
It actually keeps a journal of some sort, so it should be safe to interrupt with Ctrl+C (i.e. it's recoverable).

Comment 11 Andreas Klauer 2006-07-23 06:45:24 UTC
Great to hear that there is some progress.

The relocation process should be optimized, simply because it takes ages (the bigger the partition, the worse it gets, and I doubt it scales linearly). With the ever-growing size of hard disks and partitions, this is a serious issue. What's the use of the data being safe if it takes days or weeks of hard CPU work before you can access it again?
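One standard way to make the occupancy lookup itself cheap is to build an inverse index once: for every physical block, store which logical block of the image (if any) occupies it. This technique is swapped in here for illustration and is not what devremap does. The sketch reuses a hypothetical `block_map` array (`block_map[j]` = physical block currently holding logical block j); `NO_OWNER`, `build_owner_index` and `record_move` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NO_OWNER UINT32_MAX /* marks a physical block not used by the image */

/* Alternative technique (not devremap's): invert block_map once so that
 * "which image block occupies physical block p?" is a single array read.
 * Building costs O(n + nblocks) time and one uint32_t per physical block.
 * Returns NULL if allocation fails; the caller frees the result. */
uint32_t *build_owner_index(const uint32_t *block_map, size_t n, size_t nblocks)
{
    assert(n < NO_OWNER); /* logical indices must not collide with NO_OWNER */
    uint32_t *owner = malloc(nblocks * sizeof *owner);
    if (owner == NULL)
        return NULL;
    for (size_t p = 0; p < nblocks; p++)
        owner[p] = NO_OWNER;
    for (size_t j = 0; j < n; j++) {
        assert(block_map[j] < nblocks);
        owner[block_map[j]] = (uint32_t)j;
    }
    return owner;
}

/* Keep both directions consistent after logical block j has been copied
 * to physical block dst. Whatever occupied dst must have been moved away
 * first. */
void record_move(uint32_t *block_map, uint32_t *owner, uint32_t j, uint32_t dst)
{
    if (block_map[j] == dst)
        return; /* already in place */
    assert(owner[dst] == NO_OWNER);
    owner[block_map[j]] = NO_OWNER;
    owner[dst] = j;
    block_map[j] = dst;
}
```

Every lookup becomes O(1), so finding occupants over a whole pass costs O(n) instead of up to O(n²). The trade-off is memory: one 32-bit entry per physical block is roughly 300 MB for the 78148161 blocks of an 80GB partition, which may be too much to keep in RAM on some machines.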
Comment 12 Tom Payne (RETIRED) gentoo-dev 2006-11-07 03:00:43 UTC
Removed from portage.

Last rites sent to -dev 5 months ago, with no responses: