Bug 232464 - some file-based databases are not binary compatible between arches
Summary: some file-based databases are not binary compatible between arches
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: AMD64 Linux
Importance: High normal
Assignee: Jeroen Roovers (RETIRED)
URL: http://www.gentoo.org/doc/en/gentoo-a...
Whiteboard:
Keywords: InVCS
Depends on:
Blocks:
 
Reported: 2008-07-20 12:24 UTC by Rumi Szabolcs
Modified: 2008-09-02 18:22 UTC

See Also:
Package list:
Runtime testing required: ---


Description Rumi Szabolcs 2008-07-20 12:24:18 UTC
I'm in the process of migrating a server from an x86 platform to an amd64
platform. Several software packages (openldap, cyrus-imap) are using
berkeley db as their backend and apparently none of them work when
they are started with a database that has been previously created
on an x86 system.

Reproducible: Always

Steps to Reproduce:

1. Create databases with software using the bdb backend on x86
2. Copy such databases as-is to an amd64 system
3. Try to use them with the amd64 userland.
Actual Results:  
OpenLDAP cannot open the database (gracefully).
Cyrus-IMAP daemons simply crash. 

For example:

Starting program: /usr/lib64/cyrus/ctl_mboxlist
Program received signal SIGSEGV, Segmentation fault.
0x00007fe849bc7f89 in ?? () from /lib64/ld-linux-x86-64.so.2
(gdb) bt
#0  0x00007fe849bc7f89 in ?? () from /lib64/ld-linux-x86-64.so.2
#1  0x00007fe849bcb781 in ?? () from /lib64/ld-linux-x86-64.so.2
#2  0x00007fe849bd0ff2 in ?? () from /lib64/ld-linux-x86-64.so.2
#3  0x00007fe84848315a in __db_e_attach () from /usr/lib/libdb-4.5.so
#4  0x00007fe848480d93 in __env_open () from /usr/lib/libdb-4.5.so
#5  0x000000000042f7b3 in ?? ()
#6  0x00000000004260e6 in ?? ()
#7  0x0000000000417196 in ?? ()
#8  0x0000000000406818 in ?? ()
#9  0x00007fe847c81b74 in __libc_start_main () from /lib/libc.so.6
#10 0x00000000004054a9 in ?? ()
#11 0x00007fff51dd9c58 in ?? ()
#12 0x0000000000000000 in ?? ()




With OpenLDAP I have managed to work around the problem by moving
the database back to an x86 system, dumping the whole database
into LDIF with slapcat and then recreating it on amd64 with slapadd.
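
The OpenLDAP workaround can be sketched roughly as below. This is only a
minimal sketch: the function names and the default LDIF path are made up
for illustration, and it assumes slapd is stopped on both hosts and that
the amd64 side starts with an empty database directory.

```shell
#!/bin/sh
# Sketch only: function names and the default path are illustrative.

# Run on the old x86 host: write the whole directory out as LDIF text.
ldap_export() {
    ldif="${1:-/tmp/directory.ldif}"
    slapcat -l "$ldif"
}

# Run on the new amd64 host: rebuild the database from the LDIF file.
ldap_import() {
    ldif="${1:-/tmp/directory.ldif}"
    slapadd -l "$ldif"
}
```

Copy the LDIF file between the hosts with whatever transfer tool is at
hand; it is plain text, so the architecture no longer matters.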

With Cyrus-IMAP there are no tools (that I know of) to do this properly,
so now I'm stuck.

Any hints on how to work around this, especially with Cyrus?
I'm in a hurry with the migration, company email services are down...
Comment 1 Rumi Szabolcs 2008-07-20 12:39:50 UTC
imapd log:

Jul 20 13:48:30 nexus master[26767]: process started
Jul 20 13:48:30 nexus master[26771]: about to exec /usr/lib/cyrus/ctl_cyrusdb
Jul 20 13:48:30 nexus master[26767]: process 26771 exited, signaled to death by 11
Jul 20 13:48:30 nexus master[26767]: ready for work
Jul 20 13:48:30 nexus master[26776]: about to exec /usr/lib/cyrus/tls_prune
Jul 20 13:48:30 nexus master[26767]: process 26776 exited, signaled to death by 11
Jul 20 13:48:30 nexus master[26777]: about to exec /usr/lib/cyrus/ctl_deliver
Jul 20 13:48:30 nexus master[26778]: about to exec /usr/lib/cyrus/ctl_cyrusdb
Jul 20 13:48:30 nexus master[26767]: process 26778 exited, signaled to death by 11
Jul 20 13:48:30 nexus master[26767]: process 26777 exited, signaled to death by 11

This is about the same with every daemon or utility in the Cyrus-IMAP package...
Comment 2 Peter Alfredsen (RETIRED) gentoo-dev 2008-07-20 12:53:17 UTC
Is it possible to use the db4.5_dump and db4.5_load utilities on these files?
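
A rough sketch of what such a dump and reload could look like follows. The
function names and file arguments are hypothetical; the tool names default
to the Gentoo-slotted db4.5_dump and db4.5_load and can be overridden via
the DB_DUMP and DB_LOAD variables. Berkeley DB's documentation warns that
a Btree database using custom prefix or comparison functions may be
unusable after a dump and reload, so treat this as something to try on a
copy only.

```shell
#!/bin/sh
# Sketch only: function names are illustrative.
DB_DUMP="${DB_DUMP:-db4.5_dump}"
DB_LOAD="${DB_LOAD:-db4.5_load}"

# Run on the x86 host: dump a database file ($1) to a text file ($2).
bdb_dump_text() {
    "$DB_DUMP" -f "$2" "$1"
}

# Run on the amd64 host: load the text file ($1) into a new database ($2).
bdb_load_text() {
    "$DB_LOAD" -f "$1" "$2"
}
```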
Comment 3 Rumi Szabolcs 2008-07-20 14:09:46 UTC
I've moved them to an x86 machine and tried to dump but they seem to
be btree databases and the dumps seem to be corrupt.

I suspect this might be the reason:

"Dumping and reloading Btree databases that use user-defined prefix or comparison functions will result in new databases that use the default prefix and comparison functions. In this case, it is quite likely that the database will be damaged beyond repair permitting neither record storage or retrieval.

The only available workaround for either case is to modify the sources for the db4.5_load utility to load the database using the correct hash, prefix, and comparison functions."
Comment 4 Rumi Szabolcs 2008-07-20 17:48:27 UTC
I've found a workaround for Cyrus-IMAP by practically rebuilding
all its databases. Not too elegant but it does seem to work...

Here is my quick howto:


While still on the 32-bit platform, stop imapd and run:

/usr/lib/cyrus/ctl_mboxlist -d > /tmp/mailboxes.txt

If you forgot this (like I did) then you can move the mailboxes.db
file back onto an x86 machine which has cyrus installed or just
emerge a default install there, then do:

/usr/lib/cyrus/ctl_mboxlist -d -f /path/to/mailboxes.db > /tmp/mailboxes.txt


On the 64-bit platform, move all Cyrus data dirs into place and run:

tar cvpzf /tmp/var_imap_backup.tgz /var/imap
rm /var/imap/db/*
rm -rf /var/imap/db.backup*
rm /var/imap/*.db

Then, as user cyrus (su cyrus), run:

/usr/lib/cyrus/ctl_mboxlist -u < /tmp/mailboxes.txt
/usr/lib/cyrus/reconstruct -r -f user

Essentially, this dumps the mailbox list into a plain text file,
deletes all the current databases, rebuilds mailboxes.db from the
text file, and then reconstructs it and all the other databases by
scanning the mail spool.
This way you retain /var/imap/user (seen info) and /var/imap/quota
but you lose annotations.db, deliver.db, and tls_sessions.db.
I don't know exactly what information these databases contain,
but judging by how things look after such a rebuild, they are probably
not essential. It would be nice if someone who knows Cyrus IMAP
could comment on whether this is right or wrong...

Please note that this is not a fix but an ugly workaround.
Comment 5 Jeroen Roovers (RETIRED) gentoo-dev 2008-07-20 20:24:57 UTC
Is there any reason I should not resolve this bug as UPSTREAM? In other words, what does Gentoo have to do with this?
Comment 6 Rumi Szabolcs 2008-07-21 18:38:10 UTC
Well, the core problem is probably not Gentoo specific.

It may affect a number of Gentoo users though as it does affect several
key software packages. I suspect upstream won't come up with anything
useful in the near future so a Gentoo specific workaround or solution
could be useful at least temporarily.

The following ideas come to my mind right now:

- patch config.h or similar in the sys-libs/db ebuild on amd64 so that it
  does not use different size pointers or counters (or whatever it does)
  on amd64 than on x86 thereby circumventing the problem if possible

- some way to install 32bit binaries of affected software and/or the
  sys-libs/db library onto a 64bit system, either to be able to use
  the databases unaltered or at least temporarily to make necessary
  dumps or backups

- scripts or other tools to detect and maybe repair/convert 32bit
  databases on 64bit systems that may have this problem

- mentioning the problem in Gentoo amd64 documentation and/or ebuilds
  so that everybody knows about it before actually beginning with
  such a migration or at least when already being at it
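
As a very rough starting point for the "detect" idea, the sketch below
classifies a file by the Berkeley DB magic number stored at byte offset 12
of its metadata page (0x053162 for Btree, 0x061561 for Hash). The function
name is made up, and this is detection only: it repairs nothing, and since
x86 and amd64 are both little-endian it cannot tell on which of the two a
file was created. It merely helps to list the files that should be dumped
before migrating.

```shell
#!/bin/sh
# Sketch only: bdb_kind is an illustrative name, not an existing tool.
bdb_kind() {
    # Read the 4 magic bytes at offset 12 as hex, without spaces.
    magic=$(od -A n -t x1 -j 12 -N 4 "$1" 2>/dev/null | tr -d ' \n')
    case "$magic" in
        62310500) echo "btree little-endian" ;;
        00053162) echo "btree big-endian" ;;
        61150600) echo "hash little-endian" ;;
        00061561) echo "hash big-endian" ;;
        *)        echo "not-bdb" ;;
    esac
}

# Hypothetical usage: list candidate databases under a data directory.
# find /var/imap -type f | while read -r f; do
#     printf '%s: %s\n' "$f" "$(bdb_kind "$f")"
# done
```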

Please note that these are just some quick ideas; I have made no deep
checks or proof-of-concept, so they are not necessarily realistic at all.
Comment 7 Jeroen Roovers (RETIRED) gentoo-dev 2008-07-21 20:53:19 UTC
(In reply to comment #6)
> Well, the core problem is probably not Gentoo specific.
> 
> It may affect a number of Gentoo users though as it does affect several
> key software packages. I suspect upstream won't come up with anything
> useful in the near future so a Gentoo specific workaround or solution
> could be useful at least temporarily.

Define "temporary" - people have been migrating their data to 64-bit systems for decades. :)

> The following ideas come to my mind right now:

> - scripts or other tools to detect and maybe repair/convert 32bit
>   databases on 64bit systems that may have this problem

Our net-mail team would probably be happy to commit an ebuild for an established package of migration tools.

> - mentioning the problem in Gentoo amd64 documentation and/or ebuilds
>   so that everybody knows about it before actually beginning with
>   such a migration or at least when already being at it

The amd64 documentation covers installation of Gentoo Linux on x86_64 systems - there's nothing about migrating from x86 to x86_64. Apart from that, there is no Gentoo document (that I could find) that covers cyrus-imap.

Generally, when migrating data from one platform to another, you should check whether it works, whether your data hasn't been corrupted while it was first write accessed on the target platform.

Useful links Google found for me:
* "Re: [SLE] Cyrus-IMAP migration between two servers"[1] - This states that
  the platform wouldn't perhaps matter as much as the sys-libs/db version.
* "Migration 32 to 64 bit"[2] - from the info-cyrus mailing list.

If you plan to put your experience into a neat HOWTO, I am sure others could benefit from that - CMU would perhaps host it for you, perhaps at [3]. However, I still doubt this is a Gentoo specific problem, and I still think this bug report will serve Gentoo or its users no purpose that other distro's users would not also benefit from.


[1] http://linux.derkeiler.com/Mailing-Lists/SuSE/2005-12/msg01107.html
[2] http://lists.andrew.cmu.edu/pipermail/info-cyrus/2007-November/027857.html
[3] http://cyrusimap.web.cmu.edu/twiki/bin/view/Cyrus/WebHome
Comment 8 Ranjit Singh 2008-07-24 01:13:51 UTC
(In reply to comment #6)
> Well, the core problem is probably not Gentoo specific.
> 
> It may affect a number of Gentoo users though as it does affect several
> key software packages. I suspect upstream won't come up with anything
> useful in the near future so a Gentoo specific workaround or solution
> could be useful at least temporarily.
>
Agreed, it can always be punted upstream once tested and verified on x86 and amd64.

> The following ideas come to my mind right now:
> 
> - some way to install 32bit binaries of affected software and/or the
>   sys-libs/db library onto a 64bit system, either to be able to use
>   the databases unaltered or at least temporarily to make necessary
>   dumps or backups

If you keep it just to sys-libs/db, agreed.

> - scripts or other tools to detect and maybe repair/convert 32bit
>   databases on 64bit systems that may have this problem
>
++
 
> - mentioning the problem in Gentoo amd64 documentation and/or ebuilds
>   so that everybody knows about it before actually beginning with
>   such a migration or at least when already being at it
>
Perhaps in docs specific to the packages/areas covered, but not amd64 across the board, imo.
 
(In reply to comment #7) 
> Define "temporary" - people have been migrating their data to 64-bit systems
> for decades. :)
>
Surely that's more an argument for enabling this for Gentoo users than not?

> Generally, when migrating data from one platform to another, you should check
> whether it works, whether your data hasn't been corrupted while it was first
> write accessed on the target platform.
>
Which is where the bug report came from?
 
> If you plan to put your experience into a neat HOWTO, I am sure others could
> benefit from that - CMU would perhaps host it for you, perhaps at [3]. However,
> I still doubt this is a Gentoo specific problem, and I still think this bug
> report will serve Gentoo or its users no purpose that other distro's users
> would not also benefit from.
>
I agree it's not Gentoo specific, but I feel it would be very useful. I am unsure as to the documentation side of it, but would encourage the actual proposal (even if it should be discussed elsewhere, such as the dev m-l or in #gentoo-server.)

As to a howto, I'd start it in docs, tips and tricks on the forums, so it can be worked up into a decent shape with help from other knowledgeable users and perhaps put on the wiki if the doc team doesn't want to use it.
Comment 9 Jeroen Roovers (RETIRED) gentoo-dev 2008-07-25 15:39:42 UTC
(In reply to comment #8)
> > Generally, when migrating data from one platform to another, you should check
> > whether it works, whether your data hasn't been corrupted while it was first
> > write accessed on the target platform.
> >
> Which is where the bug report came from?

Yes, we're on our way to figuring out where the bug report is *going to*. :)
Comment 10 Jeroen Roovers (RETIRED) gentoo-dev 2008-08-01 15:36:42 UTC
Let's see if the documentation people want to work on this.
Comment 11 nm (RETIRED) gentoo-dev 2008-08-01 21:02:01 UTC
The AMD64 FAQ already covers this topic, basically:

http://www.gentoo.org/doc/en/gentoo-amd64-faq.xml#upgradex86

In short, you *cannot* really migrate/upgrade from x86 to x86-64; you have to reinstall. You can't expect stuff that's compiled on one architecture to run on another, though I suppose firefox-bin/OOo-bin and the like are exceptions to that rule.

Copying databases as-is is usually a bad idea anyway; I think there are database migration howtos elsewhere on the internet, possibly at tldp.org -- though that one may have been for something like mysql, not berkdb.

Nothing to do on the GDP's end; back to the wranglers.
Comment 12 Jeroen Roovers (RETIRED) gentoo-dev 2008-08-02 04:56:06 UTC
Yes, Josh, but this isn't about installing - it is about migrating stuff from one system to another. Surely that's not within the scope of installing Gentoo x86/amd64 or whatever, but the 'Net doesn't seem to have a "nice" document about migrating from 32- to 64-bit (or from one endian to another, for that matter), so maybe Gentoo could be nice and provide something...

(Bouncing this to bug-wranglers will not do - I heard some interns might have some time to write something up, so I thought something might come out of this...)
Comment 13 Rumi Szabolcs 2008-08-02 06:46:29 UTC
I do agree that with databases etc. a dump and restore operation
is the standard way to go and is also recommended by the vendors.

The problem occurs (as with bdb) when the database itself does not
have proper tools for that and the application software (cyrus-imap)
does not support such operations either. That is a real problem not
only for migrations like this one but also for plain regular backups.
This indeed points to upstream.

The other source of trouble is when the documentation does not
explicitly warn/remind the user that he is going to run into this
which he doesn't expect or simply forgets to think about. Then he
may have to migrate back or move his data back to another 32bit
system to make proper dumps which is quite a bit of a hassle.

Each and every affected software vendor should mention this in their
documentation, but generally I'd expect the distro (e.g. Gentoo) to
take care of it because this is more of a general problem than an
application-specific one. IMHO it belongs in an x86-to-amd64
migration HOWTO, FAQ or something.

Slightly off topic but FYI:

PostgreSQL crashes immediately when it sees a 32bit database on
amd64 (same software version otherwise). The way to go is pg_dumpall
on 32bit and then restore into a fresh empty installation on amd64.

MySQL 5.0.x does work seamlessly so they are probably using sizeof()
independent pointers and counters, maybe even endian-safe? *respect*
But still, as far as I can remember, MySQL AB strongly recommends dump
and restore for every major version change and an architecture change
should probably be considered as major...

Please note that in both cases there are proper tools at hand with
easy and documented procedures.
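
For reference, the two procedures look roughly like this. It is a minimal
sketch: the function names and dump file paths are illustrative, and
connection and authentication options are left out and will be needed on
most real setups.

```shell
#!/bin/sh
# Sketch only: function names are illustrative.

# PostgreSQL, on the 32-bit host: dump all databases to SQL text ($1).
pg_export() {
    pg_dumpall > "$1"
}

# PostgreSQL, on the amd64 host, against a fresh empty installation.
pg_import() {
    psql -f "$1" postgres
}

# MySQL, on the 32-bit host: dump all databases to SQL text ($1).
mysql_export() {
    mysqldump --all-databases > "$1"
}

# MySQL, on the amd64 host: replay the dump.
mysql_import() {
    mysql < "$1"
}
```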
Comment 14 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2008-08-16 22:56:53 UTC
(In reply to comment #13)
> Please note that in both cases there are proper tools at hand with
> easy and documented procedures.
OpenLDAP has slapcat/slapadd: you need to run slapcat on a system matching the original DB major version and architecture, and then take the output to the new system.

MySQL does fail if you change between endianness of the machine. 

My general opinion is just to mark this bug RESO-UPSTREAM, because while berkdb is being used as the backend store, it has specific warnings that dump+reload is not safe in many conditions when it's being used from a higher level.

The problem isn't limited to berkdb either, you can get it on sqlite and several other similar setups (mysql innodb, myisam). I've taken sys-libs/db out of the summary line because of this.
Comment 15 Peter Volkov (RETIRED) gentoo-dev 2008-08-21 06:54:26 UTC
(In reply to comment #11)
> The AMD64 FAQ already covers this topic, basically:
> 
> http://www.gentoo.org/doc/en/gentoo-amd64-faq.xml#upgradex86

But there is no information there about databases, or about binary data in general. Maybe it's a good idea to add it there...

============================================================================
Can I upgrade from my x86 system to amd64 by doing emerge -e world?

Due to several differences between an x86 and an amd64 installation, it is impossible to upgrade. Please perform a fresh install. The installation is slightly different than an x86 one, so please use the AMD64 Handbook. 

Also note that binary files created on an x86 system very probably cannot
be read by software on an amd64 system. In particular, this means that you
generally cannot transfer databases (MySQL, Berkeley DB, etc.) as-is; you
should dump the database into an architecture-independent format (e.g. a
text file) on the x86 system and then restore it on the amd64 system.

============================================================================

What do you think about such an addition? (Btw, I suppose the wording could be better.)
Comment 16 nm (RETIRED) gentoo-dev 2008-09-02 07:49:32 UTC
Okay, I've taken pva's suggestion and combined it with robbat2's notes, and added a paragraph on this to the AMD64 FAQ.

Jer, you're the bug owner, so you're free to resolve this however you want (Robin suggested UPSTREAM), but since I fixed the doc, from the GDP's point of view it's essentially closed. :)
Comment 17 Jeroen Roovers (RETIRED) gentoo-dev 2008-09-02 18:22:07 UTC
(In reply to comment #16)
> Okay, I've taken pva's suggestion and combined it with robbat2's notes, and
> added a paragraph on this to the AMD64 FAQ.

That appears to be InCVS and published now.

> Jer, you're the bug owner, so you're free to resolve this however you want
> (Robin suggested UPSTREAM), but since I fixed the doc, from the GDP's point of
> view it's essentially closed. :)

Thanks for all the work. I'm simply closing as FIXED because this is all we can do (and as far as I am aware, all that any UPSTREAM will do).