Bug 115089 - parallel emerge + confcache => corrupted db

Status:	RESOLVED NEEDINFO

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	New packages
Hardware:	All Linux

Importance:	High normal
Assignee:	Brian Harring (RETIRED)

Reported:	2005-12-10 07:04 UTC by Carsten Lohrke (RETIRED)
Modified:	2005-12-26 18:46 UTC
CC List:	0 users

Description Carsten Lohrke (RETIRED) gentoo-dev

2005-12-10 07:04:18 UTC

I know parallel emerge isn't supported anyway, but maybe you're interested.
Every attempt to emerge with confcache enabled fails, as both samples below show.
Versions are Portage .53 and Confcache 0.3.3.


Good - your configure finished. Start make now

Traceback (most recent call last):
  File "/usr/bin/confcache", line 478, in ?
    sys.exit(c.run(args))
  File "/usr/bin/confcache", line 185, in run
    self._update(new_loc, curdir)
  File "/usr/bin/confcache", line 212, in _update
    self.file_db[f] = None
  File "/usr/lib/python2.4/shelve.py", line 130, in __setitem__
    self.dict[key] = f.getvalue()
  File "/usr/lib/python2.4/bsddb/__init__.py", line 218, in __setitem__
    self.db[key] = value
bsddb._db.DBRunRecoveryError: (-30978, 'DB_RUNRECOVERY: Fatal error, run
database recovery -- PANIC: Invalid argument')
Exception bsddb._db.DBRunRecoveryError: (-30978, 'DB_RUNRECOVERY: Fatal error,
run database recovery -- PANIC: fatal region error detected; run recovery') in
<bound method cache.__del__ of <__main__.cache object at 0x4041c3cc>> ignored
Exception bsddb._db.DBRunRecoveryError: (-30978, 'DB_RUNRECOVERY: Fatal error,
run database recovery -- PANIC: fatal region error detected; run recovery') in 
ignored
Exception bsddb._db.DBRunRecoveryError: (-30978, 'DB_RUNRECOVERY: Fatal error,
run database recovery -- PANIC: fatal region error detected; run recovery') in 
ignored

!!! Please attach the config.log to your bug report:
!!! /var/tmp/portage/kfind-3.5.0/work/kfind-3.5.0/config.log

!!! ERROR: kde-base/kfind-3.5.0 failed.
!!! Function econf, Line 495, Exitcode 0
!!! econf failed
!!! If you need support, post the topmost build error, NOT this status message.


*** Finished
    Don't forget to run ./configure
    If you haven't done so in a while, run ./configure --help
 * econf: updating kdebase-kioslaves-3.5.0/admin/config.guess with
/usr/share/gnuconfig/config.guess
 * econf: updating kdebase-kioslaves-3.5.0/admin/config.sub with
/usr/share/gnuconfig/config.sub
/usr/bin/confcache --confcache-dir /var/confcache ./configure --prefix=/usr
--host=i686-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info
--datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --with-ldap
--with-samba --without-hal --with-openexr --with-ldap --with-samba --without-hal
--with-openexr --with-ldap --with-samba --without-hal --with-openexr
--without-java --with-x --enable-mitshm --without-xinerama
--with-qt-dir=/usr/qt/3 --enable-mt --with-qt-libraries=/usr/qt/3/lib
--disable-dependency-tracking --disable-debug --without-debug --disable-final
--without-arts --prefix=/usr/kde/3.5 --mandir=/usr/kde/3.5/share/man
--infodir=/usr/kde/3.5/share/info --datadir=/usr/kde/3.5/share
--sysconfdir=/usr/kde/3.5/etc --build=i686-pc-linux-gnu
Traceback (most recent call last):
  File "/usr/bin/confcache", line 478, in ?
    sys.exit(c.run(args))
  File "/usr/bin/confcache", line 163, in run
    elif not self._verify_files():
  File "/usr/bin/confcache", line 257, in _verify_files
    for f, chksum in self.file_db.iteritems():
  File "/usr/lib/python2.4/UserDict.py", line 100, in iteritems
    for k in self:
  File "/usr/lib/python2.4/UserDict.py", line 87, in __iter__
    for k in self.keys():
  File "/usr/lib/python2.4/shelve.py", line 98, in keys
    return self.dict.keys()
  File "/usr/lib/python2.4/bsddb/__init__.py", line 238, in keys
    return self.db.keys()
bsddb._db.DBRunRecoveryError: (-30978, 'DB_RUNRECOVERY: Fatal error, run
database recovery -- PANIC: Invalid argument')
Exception bsddb._db.DBRunRecoveryError: (-30978, 'DB_RUNRECOVERY: Fatal error,
run database recovery -- PANIC: fatal region error detected; run recovery') in
<bound method cache.__del__ of <__main__.cache object at 0x4041d38c>> ignored
Exception bsddb._db.DBRunRecoveryError: (-30978, 'DB_RUNRECOVERY: Fatal error,
run database recovery -- PANIC: fatal region error detected; run recovery') in 
ignored
Exception bsddb._db.DBRunRecoveryError: (-30978, 'DB_RUNRECOVERY: Fatal error,
run database recovery -- PANIC: fatal region error detected; run recovery') in 
ignored

!!! ERROR: kde-base/kdebase-kioslaves-3.5.0 failed.
!!! Function econf, Line 495, Exitcode 0
!!! econf failed
!!! If you need support, post the topmost build error, NOT this status message.

Comment 1 Brian Harring (RETIRED) gentoo-dev

2005-12-26 00:47:56 UTC

FS?

I built locking into confcache to protect against this from the get go, and at least my testing of it indicates the locking is working...

try a 
( python -c'import fcntl,os,time;fcntl.flock(os.open("/var/tmp",os.O_RDONLY),fcntl.LOCK_EX);print "master snagged it";time.sleep(15)' &);
sleep 5s; python -c'import fcntl,os,time;fcntl.flock(os.open("/var/tmp", os.O_RDONLY),fcntl.LOCK_EX);print "got it"'

That's effectively the locking involved; it should succeed with "master snagged it", then 10 seconds later "got it".
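
For anyone repeating this check on a current system, the same test can be sketched in Python 3. This is only an illustration of the one-liners above, not code from confcache: the helper name `time_blocked_on_lock`, the shortened hold time, and the use of a temporary directory instead of /var/tmp are all assumptions made for the example. It relies on `fcntl.flock`, so it is Unix-only, and the result may differ on filesystems where flock behaves differently.

```python
import fcntl
import os
import subprocess
import sys
import tempfile
import time

# Child process: take an exclusive lock on the path, report it, hold it.
HOLDER = (
    "import fcntl, os, sys, time\n"
    "fd = os.open(sys.argv[1], os.O_RDONLY)\n"
    "fcntl.flock(fd, fcntl.LOCK_EX)\n"
    "print('master snagged it', flush=True)\n"
    "time.sleep(float(sys.argv[2]))\n"
)


def time_blocked_on_lock(path, hold_seconds=2.0):
    """Return the child's message and how long we waited for the lock."""
    child = subprocess.Popen(
        [sys.executable, "-c", HOLDER, path, str(hold_seconds)],
        stdout=subprocess.PIPE,
        text=True,
    )
    # Block until the child reports that it holds the lock.
    first_line = child.stdout.readline().strip()

    fd = os.open(path, os.O_RDONLY)
    start = time.monotonic()
    fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until the child releases the lock
    waited = time.monotonic() - start
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
    child.wait()
    return first_line, waited


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as lock_dir:
        line, waited = time_blocked_on_lock(lock_dir)
        print(line)
        print("got it after %.1f s" % waited)
```

If the locking works, the second message appears only after the child has finished holding the lock; a wait close to zero would suggest the lock is not being honoured on that filesystem.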

Comment 2 Carsten Lohrke (RETIRED) gentoo-dev

2005-12-26 05:06:47 UTC

(In reply to comment #1)
> FS?

No, the whole system is fine.

> I built locking into confcache to protect against this from the get go, and at
> least my testing of it indicates the locking is working...

Heh, I'm just a user of your code and haven't looked at it. The test printed "master snagged it". The db is dead though, and this was my first shot in the dark. :)

Comment 3 Brian Harring (RETIRED) gentoo-dev

2005-12-26 15:12:29 UTC

Err... so locking test succeeded?  If that's the case, it would indicate a race is possible (yuck).
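
As a rough illustration of what the locking is meant to guarantee, here is a minimal Python 3 sketch in which every open, update, and close of a shelve database happens while an exclusive flock is held. It is not confcache's actual code: the names `exclusive_lock`, `bump`, and `run_workers`, the `file_db` file name, and the counter key are hypothetical. Threads stand in for separate emerge processes here; each call opens its own descriptor, and flock locks taken through separate opens conflict with each other, so the updates are serialized in the same way.

```python
import fcntl
import os
import shelve
import tempfile
import threading
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive flock on lock_path for the duration of the block."""
    fd = os.open(lock_path, os.O_RDONLY)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def bump(cache_dir, key="runs"):
    """Open the shelf, update one entry, and close it, all under the lock."""
    with exclusive_lock(cache_dir):
        with shelve.open(os.path.join(cache_dir, "file_db")) as db:
            db[key] = db.get(key, 0) + 1


def run_workers(cache_dir, workers=4, bumps_each=5):
    """Run several concurrent writers and return the final counter value."""
    def work():
        for _ in range(bumps_each):
            bump(cache_dir)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    with exclusive_lock(cache_dir):
        with shelve.open(os.path.join(cache_dir, "file_db")) as db:
            return db["runs"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache_dir:
        print(run_workers(cache_dir))
```

A race of the kind suspected here would correspond to any access to the database that happens outside the locked region, for example a handle that stays open after the lock has been released.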

Comment 4 Carsten Lohrke (RETIRED) gentoo-dev

2005-12-26 18:46:02 UTC

No, the "got it" was printed of course, sorry for the confusion. :)