Summary: | sys-kernel/vanilla-sources-2.6.36 and above: sys-fs/lvm2's pvmove causes deadlock | |
---|---|---|---
Product: | Gentoo Linux | Reporter: | Xake <kanelxake>
Component: | [OLD] Core system | Assignee: | The Gentoo Linux Hardened Kernel Team (OBSOLETE) <hardened-kernel+disabled>
Status: | RESOLVED UPSTREAM | |
Severity: | normal | CC: | ago, hardened, kernel, klondike
Priority: | High | Keywords: | UPSTREAM
Version: | unspecified | |
Hardware: | All | |
OS: | Linux | |
Whiteboard: | | |
Package list: | | Runtime testing required: | ---
Attachments: | output from sysrq-W (list of blocked processes) | |
Description
Xake 2011-01-03 10:45:57 UTC

Created attachment 258727: output from sysrq-W (list of blocked processes)
A while back, talking to klondike on #gentoo-hardened, he suggested I turn on loglevel 8 and post this. I never had the time to finish the bug report back then, but here it is.
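For reference, this is roughly how the console loglevel and the blocked-task dump in the attachment can be obtained; a minimal sketch, run as root on the affected machine (where the output ends up depends on the console setup):

```sh
# Make sure the SysRq interface is fully enabled.
echo 1 > /proc/sys/kernel/sysrq

# Raise the console loglevel to 8 so all kernel messages reach the console.
dmesg -n 8

# Dump all blocked (uninterruptible) tasks, the same data as in the attachment.
# Equivalent to pressing Alt+SysRq+W on the console.
echo w > /proc/sysrq-trigger
```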
It seems like it is possible to get a "hang" in single-user mode too. First boot, run "init 1" in a console, then "pvmove" to resume the move, then Ctrl-C (which gives back the prompt), and then run "pvmove --abort" and watch for the apparently non-existing output. However, this time around emergency sync responds after about one to two minutes, so it may be that I/O just becomes so slow that the system becomes unusable. After 20 minutes there was no further response from pvmove.

pvmove hangs the vanilla kernel too; I tested 2.6.36.2 some days ago and 2.6.37-rc8+ this morning. Sometimes it hangs right away, sometimes it hangs after a couple of minutes, sometimes it hangs at 100%, and sometimes it does not hang at all. What seems to hang is only part of the disk I/O: if I have ssh running it still works, I may be able to run processes like top, and processes that are already running seldom hang until I close them or try to save something in them. So it seems like everything that does more than just read from disk is a no-go. After speaking with klondike we have a suspicion that this may be a race that happens because I am moving a mounted and in-use filesystem. I am going to do some more testing related to this when I get the time.

(In reply to comment #1)
> Created an attachment (id=258727)
> output from sysrq-W (list of blocked processes)
>
> A while back, talking to klondike on #gentoo-hardened, he suggested I turn on
> loglevel 8 and post this. I never had the time to finish the bug report back
> then, but here it is.

Looking at your list of blocked (deadlocked?) processes, the major players are jbd2/dm-2-8 and flush-253:2. The rest, I suspect, are blocked on I/O to the locked device. This looks like an upstream bug with vanilla 2.6.36: https://bugzilla.kernel.org/show_bug.cgi?id=25632 I'm not 100% sure, because it is the same kernel threads but different call traces. I'm going to try to reproduce.

Looking for a way to reproduce this, I tried the following today. First I found myself two empty hard disks, the smaller one about 160 GB, the bigger one 250 GB. On the smaller disk I created one partition of 1 GB and a second partition holding a VG with 14 LVs of 10 GB each. The VG also includes the other disk, but nothing is allocated on it. In an empty directory I created a mount point for each LV, mounted them all, and ran:

    for d in * ; do tar xaf linux-2.6.36.tar.bz2 -C $d ; done

When that finished I ran in one console:

    for d in * ; do make -C $d/linux-2.6.36 allyesconfig -j4 && make -C $d/linux-2.6.36 -j4 & echo ; done

in the next console "sync", and in the third console "pvmove <partition of small harddisk>".

Currently that system has the following problems: the compilation has stopped (wc reports 96 running make processes, but there is no output from them or from the multiple cc1 processes), sync does not return, and pvmove stopped reporting progress at about 7%. I can log in on consoles (so that part of the earlier problem could be because I was running the system from the partitions I tried to move), and "touch ~/test" returns at once, but "touch test1/test" does not return. Neither does "ls -l test1" or pvs. iotop and "bwm-ng -i disk" report no activity on any disk or process that is part of the VG, and nearly no activity on the other disks. top shows nothing consuming resources, yet it reports a load of about 81.65 and rising (checking back now it has reached 95.0).

So in other words my system works fine, except that everything trying to do any I/O against that volume group stops responding (other volume groups still work fine). I can write to the 1 GB partition on the same disk as the VG, so it seems like it is only the VG that is locked, not the entire disk. This is with a 2.6.36-r7 hardened kernel; I will try the same with vanilla. My next step is to vary this setup to find the minimal set of actions needed to make it hang, for example to see whether it hangs even if I only start the compilations, and so on.
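Condensed, the reproduction described above amounts to roughly the following; this is only a sketch, and the device names (/dev/sdb for the smaller disk, /dev/sdc for the larger empty one) and the volume group name are placeholders for the actual hardware:

```sh
# Assumed layout: /dev/sdb2 is the big partition on the ~160 GB disk,
# /dev/sdc is the ~250 GB disk with nothing allocated on it.
pvcreate /dev/sdb2 /dev/sdc
vgcreate testvg /dev/sdb2 /dev/sdc

# 14 LVs of 10 GB each, all allocated on the small disk, one mount point per LV.
for i in $(seq 1 14); do
    lvcreate -L 10G -n "test$i" testvg /dev/sdb2
    mkfs.ext4 "/dev/testvg/test$i"
    mkdir -p "test$i" && mount "/dev/testvg/test$i" "test$i"
done

# Generate heavy I/O on every LV, then move the PV out from under it.
for d in test*; do tar xaf linux-2.6.36.tar.bz2 -C "$d"; done
for d in test*; do ( make -C "$d/linux-2.6.36" allyesconfig && make -C "$d/linux-2.6.36" -j4 ) & done
sync &
pvmove /dev/sdb2
```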
xake, please focus testing on vanilla if you can, because it narrows the problem down and makes it more likely upstream will take it seriously. I tried to reproduce this on a virtual machine and could not. I think the issue requires heavy I/O; even the upstream bug suggests that. This probably increases the likelihood of hitting some race condition. The fact that you can still interact with the drive but not with the VG is important and further confirms that the issue is between the jbd and the flushing on the dm.

(In reply to comment #6)
> xake, please focus testing on vanilla if you can, because it narrows the problem down and
> makes it more likely upstream will take it seriously.

Yeah, I am currently building 2.6.37 from git to use. I have also enabled a bunch of the lock-related debug options in the kernel to try to find out where this is coming from, and when I hit this problem this morning while trying to emergency-sync the system (shutdown probably did not work, as it tries to kill stuff that did not want to die) I got a message about a possible spinlock deadlock. I will try to fetch that message over netconsole from a vanilla kernel if possible.
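For reference, the lock-related debug options in question live under "Kernel hacking" in the kernel config, and netconsole can forward the resulting report to another machine; a sketch with made-up addresses, interface name, and MAC:

```sh
# Kernel config fragment; CONFIG_PROVE_LOCKING is what produces the
# "possible circular locking dependency detected" report quoted below.
#   CONFIG_PROVE_LOCKING=y
#   CONFIG_DEBUG_SPINLOCK=y
#   CONFIG_DEBUG_MUTEXES=y
#   CONFIG_DETECT_HUNG_TASK=y

# Kernel command line for netconsole on the affected machine
# (source port@ip/interface, target port@ip/mac; the values are placeholders):
#   netconsole=6665@192.168.0.10/eth0,6666@192.168.0.2/00:11:22:33:44:55 loglevel=8

# On the receiving machine, listen for the UDP log stream:
nc -l -u -p 6666    # or "nc -u -l 6666", depending on the netcat flavor
```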
So I retried with tag v2.6.37 compiled from git. I started the same scenario as before (one kernel compilation per LV, 14 in total, plus sync and pvmove). After having it run for some time I had to abort, which I did with "killall make" followed by "pvmove --abort". Now I am back in a situation where everything works except things like "touch test1/test" or anything else touching an LV involved in the compilation, and there is no apparent activity on the disks. Just hitting SysRq+S spawns the following interesting message:

[16771.737229] SysRq : Emergency Sync
[16771.737640] =======================================================
[16771.737666] [ INFO: possible circular locking dependency detected ]
[16771.737680] 2.6.37 #1
[16771.737687] -------------------------------------------------------
[16771.737700] kworker/0:0/25366 is trying to acquire lock:
[16771.737711]  (&type->s_umount_key#23){++++..}, at: [<ffffffff8118e1de>] iterate_supers+0x5e/0xd0
[16771.737750] but task is already holding lock:
[16771.737764]  ((work)#2){+.+...}, at: [<ffffffff810b2926>] process_one_work+0x126/0x670
[16771.737799] which lock already depends on the new lock.
[16771.737821] the existing dependency chain (in reverse order) is:
[16771.737839] -> #3 ((work)#2){+.+...}:
[16771.737870]        [<ffffffff810d3018>] lock_acquire+0x98/0x1e0
[16771.737888]        [<ffffffff810b297c>] process_one_work+0x17c/0x670
[16771.737906]        [<ffffffff810b4b21>] worker_thread+0x161/0x340
[16771.737923]        [<ffffffff810b9b36>] kthread+0x96/0xa0
[16771.737940]        [<ffffffff8103fed4>] kernel_thread_helper+0x4/0x10
[16771.737959] -> #2 (events){+.+.+.}:
[16771.737987]        [<ffffffff810d3018>] lock_acquire+0x98/0x1e0
[16771.738004]        [<ffffffff810b30d9>] start_flush_work+0x119/0x1b0
[16771.738022]        [<ffffffff810b3a00>] flush_work+0x20/0x40
[16771.738039]        [<ffffffff810b5873>] schedule_on_each_cpu+0xe3/0x120
[16771.738057]        [<ffffffff81146240>] lru_add_drain_all+0x10/0x20
[16771.738076]        [<ffffffff811b8b48>] invalidate_bdev+0x28/0x40
[16771.738094]        [<ffffffff8125bf22>] ext4_put_super+0x1a2/0x360
[16771.738113]        [<ffffffff8118d0cd>] generic_shutdown_super+0x6d/0x100
[16771.738134]        [<ffffffff8118d18c>] kill_block_super+0x2c/0x50
[16771.738152]        [<ffffffff8118d4d5>] deactivate_locked_super+0x45/0x60
[16771.738169]        [<ffffffff8118e075>] deactivate_super+0x45/0x60
[16771.738186]        [<ffffffff811a8dc6>] mntput_no_expire+0x86/0xe0
[16771.738205]        [<ffffffff811a98aa>] sys_umount+0x6a/0x360
[16771.738221]        [<ffffffff8103f07b>] system_call_fastpath+0x16/0x1b
[16771.738241] -> #1 (&type->s_lock_key){+.+...}:
[16771.738268]        [<ffffffff810d3018>] lock_acquire+0x98/0x1e0
[16771.738285]        [<ffffffff8184976b>] mutex_lock_nested+0x4b/0x380
[16771.738303]        [<ffffffff8118cee2>] lock_super+0x22/0x30
[16771.738319]        [<ffffffff8125c7b8>] ext4_remount+0x48/0x4a0
[16771.738338]        [<ffffffff8118e431>] do_remount_sb+0x91/0x130
[16771.738496]        [<ffffffff811aa5b2>] do_mount+0x4c2/0x7c0
[16771.738513]        [<ffffffff811aabbb>] sys_mount+0x8b/0xe0
[16771.738529]        [<ffffffff8103f07b>] system_call_fastpath+0x16/0x1b
[16771.738548] -> #0 (&type->s_umount_key#23){++++..}:
[16771.738580]        [<ffffffff810d2588>] __lock_acquire+0x1a78/0x1e10
[16771.738598]        [<ffffffff810d3018>] lock_acquire+0x98/0x1e0
[16771.738615]        [<ffffffff81849f2f>] down_read+0x2f/0x50
[16771.738631]        [<ffffffff8118e1de>] iterate_supers+0x5e/0xd0
[16771.738648]        [<ffffffff811b61bb>] sync_filesystems+0x1b/0x20
[16771.738666]        [<ffffffff811b61d3>] do_sync_work+0x13/0x40
[16771.738682]        [<ffffffff810b2987>] process_one_work+0x187/0x670
[16771.738699]        [<ffffffff810b4b21>] worker_thread+0x161/0x340
[16771.738716]        [<ffffffff810b9b36>] kthread+0x96/0xa0
[16771.738732]        [<ffffffff8103fed4>] kernel_thread_helper+0x4/0x10
[16771.738751] other info that might help us debug this:
[16771.739069] 2 locks held by kworker/0:0/25366:
[16771.739370]  #0:  (events){+.+.+.}, at: [<ffffffff810b2926>] process_one_work+0x126/0x670
[16771.739736]  #1:  ((work)#2){+.+...}, at: [<ffffffff810b2926>] process_one_work+0x126/0x670
[16771.740166] stack backtrace:
[16771.741001] Pid: 25366, comm: kworker/0:0 Tainted: G W 2.6.37 #1
[16771.741426] Call Trace:
[16771.741845]  [<ffffffff810ce167>] print_circular_bug+0xe7/0xf0
[16771.742258]  [<ffffffff810d2588>] __lock_acquire+0x1a78/0x1e10
[16771.742670]  [<ffffffff810d3018>] lock_acquire+0x98/0x1e0
[16771.743073]  [<ffffffff8118e1de>] ? iterate_supers+0x5e/0xd0
[16771.743476]  [<ffffffff811b6290>] ? sync_one_sb+0x0/0x20
[16771.743869]  [<ffffffff81849f2f>] down_read+0x2f/0x50
[16771.744262]  [<ffffffff8118e1de>] ? iterate_supers+0x5e/0xd0
[16771.744656]  [<ffffffff8184b6b0>] ? _raw_spin_unlock+0x30/0x60
[16771.745055]  [<ffffffff8118e1de>] iterate_supers+0x5e/0xd0
[16771.745452]  [<ffffffff811b61bb>] sync_filesystems+0x1b/0x20
[16771.745844]  [<ffffffff811b61d3>] do_sync_work+0x13/0x40
[16771.746239]  [<ffffffff810b2987>] process_one_work+0x187/0x670
[16771.746632]  [<ffffffff810b2926>] ? process_one_work+0x126/0x670
[16771.747026]  [<ffffffff811b61c0>] ? do_sync_work+0x0/0x40
[16771.747421]  [<ffffffff810b4b21>] worker_thread+0x161/0x340
[16771.747813]  [<ffffffff810b49c0>] ? worker_thread+0x0/0x340
[16771.748205]  [<ffffffff810b9b36>] kthread+0x96/0xa0
[16771.748594]  [<ffffffff8103fed4>] kernel_thread_helper+0x4/0x10
[16771.748985]  [<ffffffff810804ff>] ? finish_task_switch+0x6f/0x100
[16771.749375]  [<ffffffff8184be80>] ? restore_args+0x0/0x30
[16771.749762]  [<ffffffff810b9aa0>] ? kthread+0x0/0xa0
[16771.750153]  [<ffffffff8103fed0>] ? kernel_thread_helper+0x0/0x10

What do you want me to do? I could try a bisect, but as that can take some time and involves poking around between breakages, I would really like to look for other, possibly faster, ways of hunting this bug if what is given here is not enough.

I've reported this upstream.

Xake, are you still encountering this? I haven't done any more testing.

(In reply to comment #10)
> Xake, are you still encountering this? I haven't done any more testing.

I have encountered it with one of the 2.6.38-rc's; I cannot remember if it was rc5 or rc6, as I had to do some reorganization among my PVs. I have not had time to look into this for a while.

Xake, on the gentoo-user list this issue was commented on; this bit may be interesting for you:
> Looks like an issue with heavy I/O, affecting the LVM layer trying to lock the filesystem.
> But I wonder if he's not running into a known issue (which can easily be worked around) where pvmove has a memory leak in its reporting (e.g. the bit that checks the progress every 5 seconds; reducing that to every 5 minutes significantly reduces it). However, I do believe this (mem-leak) was fixed.
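The 5-second progress check mentioned in the comment above corresponds to pvmove's -i/--interval option (seconds between progress polls); a sketch of the suggested workaround, with /dev/sdb2 as a placeholder for the PV being moved:

```sh
# Poll and report pvmove progress every 5 minutes instead of every few seconds.
pvmove -i 300 /dev/sdb2

# An interrupted move can be resumed or aborted the same way:
pvmove -i 300        # resume any unfinished pvmove
pvmove --abort       # abort the in-progress move
```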
(In reply to comment #12)
> Xake, on the gentoo-user list this issue was commented on; this bit may be interesting for you:
> > Looks like an issue with heavy I/O, affecting the LVM layer trying to lock the filesystem.
> > But I wonder if he's not running into a known issue (which can easily be worked around) where pvmove has a memory leak in its reporting. However, I do believe this (mem-leak) was fixed.

Links please.

(In reply to comment #12)
> > But I wonder if he's not running into a known issue (which can easily be worked around) where pvmove has a memory leak in its reporting (e.g. the bit that checks the progress every 5 seconds; reducing that to every 5 minutes significantly reduces it). However, I do believe this (mem-leak) was fixed.

I am pretty sure this is not it, since for me it does not matter how long pvmove has been running. Issuing pvmove can hang before the first report is done, and I have some memory of sync hanging when I did not even have pvmove running (just the move itself running in the background).

This version is no longer in the tree, so if the bug is reproducible keep this open, otherwise please close it. Please reopen with the correct package version if any of you reproduce this bug in the future.