I'm in the process of migrating a server from an x86 platform to an amd64 platform. Several software packages (openldap, cyrus-imap) are using berkeley db as their backend and apparently none of them work when they are started with a database that has been previously created on an x86 system.

Reproducible: Always

Steps to Reproduce:
1. Create databases with software using the bdb backend on x86
2. Copy such databases as-is to an amd64 system
3. Try to use them with the amd64 userland.

Actual Results:
OpenLDAP cannot open the database (gracefully). Cyrus-IMAP daemons simply crash. For example:

Starting program: /usr/lib64/cyrus/ctl_mboxlist

Program received signal SIGSEGV, Segmentation fault.
0x00007fe849bc7f89 in ?? () from /lib64/ld-linux-x86-64.so.2
(gdb) bt
#0 0x00007fe849bc7f89 in ?? () from /lib64/ld-linux-x86-64.so.2
#1 0x00007fe849bcb781 in ?? () from /lib64/ld-linux-x86-64.so.2
#2 0x00007fe849bd0ff2 in ?? () from /lib64/ld-linux-x86-64.so.2
#3 0x00007fe84848315a in __db_e_attach () from /usr/lib/libdb-4.5.so
#4 0x00007fe848480d93 in __env_open () from /usr/lib/libdb-4.5.so
#5 0x000000000042f7b3 in ?? ()
#6 0x00000000004260e6 in ?? ()
#7 0x0000000000417196 in ?? ()
#8 0x0000000000406818 in ?? ()
#9 0x00007fe847c81b74 in __libc_start_main () from /lib/libc.so.6
#10 0x00000000004054a9 in ?? ()
#11 0x00007fff51dd9c58 in ?? ()
#12 0x0000000000000000 in ?? ()

With OpenLDAP I have managed to work around the problem by moving the database back to an x86 system, slapcat the whole database into LDIF and then recreate the database on amd64 with slapadd. With Cyrus-IMAP there are no tools (I know of) to do this properly so now I'm stuck.

Any hints how to circumvent this especially with Cyrus? I'm in a hurry with the migration, company email services are down...
imapd log:

Jul 20 13:48:30 nexus master[26767]: process started
Jul 20 13:48:30 nexus master[26771]: about to exec /usr/lib/cyrus/ctl_cyrusdb
Jul 20 13:48:30 nexus master[26767]: process 26771 exited, signaled to death by 11
Jul 20 13:48:30 nexus master[26767]: ready for work
Jul 20 13:48:30 nexus master[26776]: about to exec /usr/lib/cyrus/tls_prune
Jul 20 13:48:30 nexus master[26767]: process 26776 exited, signaled to death by 11
Jul 20 13:48:30 nexus master[26777]: about to exec /usr/lib/cyrus/ctl_deliver
Jul 20 13:48:30 nexus master[26778]: about to exec /usr/lib/cyrus/ctl_cyrusdb
Jul 20 13:48:30 nexus master[26767]: process 26778 exited, signaled to death by 11
Jul 20 13:48:30 nexus master[26767]: process 26777 exited, signaled to death by 11

This is about the same with every daemon or utility in the Cyrus-IMAP package...
Is it possible to use the db4.5_dump and db4.5_load utilities on these files?
I've moved them to an x86 machine and tried to dump but they seem to be btree databases and the dumps seem to be corrupt. I suspect this might be the reason: "Dumping and reloading Btree databases that use user-defined prefix or comparison functions will result in new databases that use the default prefix and comparison functions. In this case, it is quite likely that the database will be damaged beyond repair permitting neither record storage or retrieval. The only available workaround for either case is to modify the sources for the db4.5_load utility to load the database using the correct hash, prefix, and comparison functions."
I've found a workaround for Cyrus-IMAP by practically rebuilding all its databases. Not too elegant but it does seem to work... Here is my quick howto:

While still on the 32bit platform, stop imapd and do this:

    /usr/lib/cyrus/ctl_mboxlist -d > /tmp/mailboxes.txt

If you forgot this (like I did) then you can move the mailboxes.db file back onto an x86 machine which has cyrus installed, or just emerge a default install there, then do:

    /usr/lib/cyrus/ctl_mboxlist -d -f /path/to/mailboxes.db > /tmp/mailboxes.txt

On the 64bit platform move all cyrus data dirs into place and do this:

    tar cvpzf /tmp/var_imap_backup.tgz /var/imap
    rm /var/imap/db/*
    rm -rf /var/imap/db.backup*
    rm /var/imap/*.db

Then as user cyrus (su cyrus) do:

    /usr/lib/cyrus/ctl_mboxlist -u < /tmp/mailboxes.txt
    /usr/lib/cyrus/reconstruct -r -f user

What this essentially does is dump the mailboxes list into a plain text file, delete all the current databases, reconstruct mailboxes.db from the text file, and then reconstruct this and all the other databases by scanning the mail spool.

This way you retain /var/imap/user (seen info) and /var/imap/quota but you lose annotations.db, deliver.db, and tls_sessions.db. I don't exactly know what information these databases contain, but from what's apparent after such a rebuild, they are probably not essential.

It would be nice to have someone who has a clue about Cyrus IMAP comment on whether this is right or wrong... Please note that this is not a fix but an ugly workaround.
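For anyone who wants to review the 64-bit side of this howto before running it, here is a minimal Python sketch that only builds the command list and prints it. It executes nothing. The function name `plan_rebuild`, its parameters, and the "root" label on the tar/rm steps are assumptions made for this example; the commands and paths themselves are the ones from the howto above.

```python
import shlex

CYRUS_BIN = "/usr/lib/cyrus"  # path used in this thread; adjust for your install


def plan_rebuild(config_dir="/var/imap",
                 dump_file="/tmp/mailboxes.txt",
                 backup="/tmp/var_imap_backup.tgz"):
    """Return the 64-bit side of the workaround as (user, command) pairs.

    Nothing is executed; the list is meant to be read and checked first.
    The "root" label is an assumption, the howto only names user cyrus.
    """
    q = shlex.quote
    cfg = q(config_dir)
    return [
        # Keep a full backup before deleting anything.
        ("root", f"tar cvpzf {q(backup)} {cfg}"),
        # Remove the Berkeley DB environment, its backups and the old *.db files.
        ("root", f"rm {cfg}/db/*"),
        ("root", f"rm -rf {cfg}/db.backup*"),
        ("root", f"rm {cfg}/*.db"),
        # Rebuild mailboxes.db from the text dump, then rescan the mail spool.
        ("cyrus", f"{CYRUS_BIN}/ctl_mboxlist -u < {q(dump_file)}"),
        ("cyrus", f"{CYRUS_BIN}/reconstruct -r -f user"),
    ]


if __name__ == "__main__":
    for user, command in plan_rebuild():
        print(f"[{user}] {command}")
```

Printing the plan first makes it harder to run the rm steps against the wrong directory; whether to then paste the commands by hand or script them is left open here.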
What suggests that I should not resolve this bug as UPSTREAM? In other words, what does Gentoo have to do with this?
Well, the core problem is probably not Gentoo specific. It may affect a number of Gentoo users though as it does affect several key software packages. I suspect upstream won't come up with anything useful in the near future so a Gentoo specific workaround or solution could be useful at least temporarily. The following ideas come to my mind right now: - patch config.h or similar in the sys-libs/db ebuild on amd64 so that it does not use different size pointers or counters (or whatever it does) on amd64 than on x86 thereby circumventing the problem if possible - some way to install 32bit binaries of affected software and/or the sys-libs/db library onto a 64bit system, either to be able to use the databases unaltered or at least temporarily to make necessary dumps or backups - scripts or other tools to detect and maybe repair/convert 32bit databases on 64bit systems that may have this problem - mentioning the problem in Gentoo amd64 documentation and/or ebuilds so that everybody knows about it before actually beginning with such a migration or at least when already being at it Please note that these are just some quick ideas without any deep checks or proof-of-concept made so not necessarily realistic at all.
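As a starting point for the "detect" idea, here is a rough Python sketch that recognises Berkeley DB data files by the magic number in their metadata page and reports the byte order they were written in. The function names `identify_bdb` and `scan` are made up for this example. Note the limits: x86 and amd64 are both little-endian, so this cannot tell a 32-bit database from a 64-bit one; it only finds the files that would need a closer look. The backtrace in the original report fails inside __env_open, so the environment files may matter at least as much as the data files, but that is an inference, not something confirmed in this thread.

```python
import os
import struct

# Magic numbers stored in the metadata page of a Berkeley DB file.
BDB_MAGIC = {
    0x053162: "btree",  # also used by recno databases
    0x061561: "hash",
    0x042253: "queue",
}


def identify_bdb(header):
    """Return (access_method, byte_order, version, page_size), or None.

    `header` is the first 24 bytes of the file. The magic number is the
    32-bit word at offset 12, followed by the format version and page size.
    """
    if len(header) < 24:
        return None
    for prefix, order in (("<", "little-endian"), (">", "big-endian")):
        magic, version, page_size = struct.unpack_from(prefix + "III", header, 12)
        if magic in BDB_MAGIC:
            return BDB_MAGIC[magic], order, version, page_size
    return None


def scan(directory):
    """Yield (path, info) for every Berkeley DB file below `directory`."""
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            path = os.path.join(root, name)
            try:
                with open(path, "rb") as fh:
                    info = identify_bdb(fh.read(24))
            except OSError:
                continue  # unreadable file, skip it
            if info is not None:
                yield path, info
```

A call such as `list(scan("/var/imap"))` would list the candidate files; files in other formats simply do not match and are skipped.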
(In reply to comment #6)
> Well, the core problem is probably not Gentoo specific.
>
> It may affect a number of Gentoo users though as it does affect several
> key software packages. I suspect upstream won't come up with anything
> useful in the near future so a Gentoo specific workaround or solution
> could be useful at least temporarily.

Define "temporary" - people have been migrating their data to 64-bit systems for decades. :)

> The following ideas come to my mind right now:

> - scripts or other tools to detect and maybe repair/convert 32bit
>   databases on 64bit systems that may have this problem

Our net-mail team would probably be happy to commit an ebuild for an established package of migration tools.

> - mentioning the problem in Gentoo amd64 documentation and/or ebuilds
>   so that everybody knows about it before actually beginning with
>   such a migration or at least when already being at it

The amd64 documentation covers installation of Gentoo Linux on x86_64 systems - there's nothing about migrating from x86 to x86_64. Apart from that, there is no Gentoo document (that I could find) that covers cyrus-imap.

Generally, when migrating data from one platform to another, you should check whether it works, whether your data hasn't been corrupted while it was first write accessed on the target platform.

Useful links Google found for me:

* "Re: [SLE] Cyrus-IMAP migration between two servers"[1] - This states that the platform wouldn't perhaps matter as much as the sys-libs/db version.
* "Migration 32 to 64 bit"[2] - from the info-cyrus mailing list.

If you plan to put your experience into a neat HOWTO, I am sure others could benefit from that - CMU would perhaps host it for you, perhaps at [3]. However, I still doubt this is a Gentoo specific problem, and I still think this bug report will serve Gentoo or its users no purpose that other distro's users would not also benefit from.

[1] http://linux.derkeiler.com/Mailing-Lists/SuSE/2005-12/msg01107.html
[2] http://lists.andrew.cmu.edu/pipermail/info-cyrus/2007-November/027857.html
[3] http://cyrusimap.web.cmu.edu/twiki/bin/view/Cyrus/WebHome
(In reply to comment #6)
> Well, the core problem is probably not Gentoo specific.
>
> It may affect a number of Gentoo users though as it does affect several
> key software packages. I suspect upstream won't come up with anything
> useful in the near future so a Gentoo specific workaround or solution
> could be useful at least temporarily.
>
Agreed, it can always be punted upstream once tested and verified on x86 and amd64.

> The following ideas come to my mind right now:
>
> - some way to install 32bit binaries of affected software and/or the
>   sys-libs/db library onto a 64bit system, either to be able to use
>   the databases unaltered or at least temporarily to make necessary
>   dumps or backups
If you keep it just to sys-libs/db, agreed.

> - scripts or other tools to detect and maybe repair/convert 32bit
>   databases on 64bit systems that may have this problem
>
++

> - mentioning the problem in Gentoo amd64 documentation and/or ebuilds
>   so that everybody knows about it before actually beginning with
>   such a migration or at least when already being at it
>
Perhaps in docs specific to the packages/areas covered, but not amd64 across the board, imo.

(In reply to comment #7)
> Define "temporary" - people have been migrating their data to 64-bit systems
> for decades. :)
>
Surely that's more an argument for enabling this for Gentoo users than not?

> Generally, when migrating data from one platform to another, you should check
> whether it works, whether your data hasn't been corrupted while it was first
> write accessed on the target platform.
>
Which is where the bug report came from?

> If you plan to put your experience into a neat HOWTO, I am sure others could
> benefit from that - CMU would perhaps host it for you, perhaps at [3]. However,
> I still doubt this is a Gentoo specific problem, and I still think this bug
> report will serve Gentoo or its users no purpose that other distro's users
> would not also benefit from.
>
I agree it's not Gentoo specific, but I feel it would be very useful. I am unsure as to the documentation side of it, but would encourage the actual proposal (even if it should be discussed elsewhere, such as the dev m-l or in #gentoo-server.)

As to a howto, I'd start it in docs, tips and tricks on the forums, so it can be worked up into a decent shape with help from other knowledgeable users and perhaps put on the wiki if the doc team doesn't want to use it.
(In reply to comment #8)
> > Generally, when migrating data from one platform to another, you should check
> > whether it works, whether your data hasn't been corrupted while it was first
> > write accessed on the target platform.
> >
> Which is where the bug report came from?

Yes, we're on our way to figure out where the bug report is *going to*. :)
Let's see if the documentation people want to work on this.
The AMD64 FAQ already covers this topic, basically:

http://www.gentoo.org/doc/en/gentoo-amd64-faq.xml#upgradex86

In short, you *cannot* really migrate/upgrade from x86 to x86-64; you have to reinstall. You can't expect stuff that's compiled on one architecture to run on another, though I suppose firefox-bin/OOo-bin and the like are exceptions to that rule.

Copying databases as-is is usually a bad idea anyway; I think there are database migration howtos elsewhere on the internet, possibly at tldp.org -- though that one may have been for something like mysql, not berkdb.

Nothing to do on the GDP's end; back to the wranglers.
Yes, Josh, but this isn't about installing - it is about migrating stuff from one system to another. Surely that's not within the scope of installing Gentoo x86/amd64 or whatever, but the 'Net doesn't seem to have a "nice" document about migrating from 32- to 64-bit (or from one endian to another, for that matter), so maybe Gentoo could be nice and provide something... (Bouncing this to bug-wranglers will not do - I heard some interns might have some time to write something up, so I thought something might come out of this...)
I do agree that with databases etc. a dump and restore operation is the standard way to go and is also recommended by the vendors. The problem occurs (as with bdb) when the database itself does not have proper tools for that and the application software (cyrus-imap) does not support such operations either. That is a real problem, not only for migrations like this one but also for regular backups. This indeed points to upstream.

The other source of trouble is when the documentation does not explicitly warn or remind the user that he is going to run into this, which he doesn't expect or simply forgets to think about. Then he may have to migrate back or move his data back to another 32bit system to make proper dumps, which is quite a bit of a hassle. Mentioning this should be done by each and every affected software vendor in their documentation, but generally I'd expect the distro (e.g. Gentoo) to take care of this because this is more of a general problem than application specific. IMHO it belongs in an x86-amd64 migration HOWTO, FAQ or something.

Slightly off topic but FYI: PostgreSQL crashes immediately when it sees a 32bit database on amd64 (same software version otherwise). The way to go is pg_dumpall on 32bit and then restore into a fresh empty installation on amd64. MySQL 5.0.x does work seamlessly so they are probably using sizeof() independent pointers and counters, maybe even endian-safe? *respect* But still, as far as I can remember, MySQL AB strongly recommends dump and restore for every major version change and an architecture change should probably be considered as major...

Please note that in both cases there are proper tools at hand with easy and documented procedures.
(In reply to comment #13)
> Please note that in both cases there are proper tools at hand with
> easy and documented procedures.

OpenLDAP has slapcat/slapadd, which you need to run on a system matching the original DB major version and architecture, and then take the output to the new system.

MySQL does fail if you change between endianness of the machine.

My general opinion is just to mark this bug RESO-UPSTREAM, because while berkdb is being used as the backend store, it has specific warnings that dump+reload is not safe in many conditions when it's being used from a higher level.

The problem isn't limited to berkdb either, you can get it on sqlite and several other similar setups (mysql innodb, myisam). I've taken sys-libs/db out of the summary line because of this.
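The dump-and-reload pattern that the last few comments keep coming back to can be summarised in a small Python sketch. It only formats the steps for the three tools named in this thread and runs nothing. The output file names, the host labels, and the function name `migration_steps` are placeholders invented for this example; the commands are the ones mentioned above, plus `psql -f` as the usual way to reload a pg_dumpall script.

```python
# Dump on the old (x86) host, reload on the new (amd64) host.
# All file names under /tmp are placeholders chosen for this example.
PLANS = {
    "openldap": {
        "dump": "slapcat -l /tmp/directory.ldif",
        "load": "slapadd -l /tmp/directory.ldif",
    },
    "postgresql": {
        "dump": "pg_dumpall > /tmp/cluster.sql",
        "load": "psql -f /tmp/cluster.sql postgres",
    },
    "cyrus-mailboxes": {
        "dump": "/usr/lib/cyrus/ctl_mboxlist -d > /tmp/mailboxes.txt",
        "load": "/usr/lib/cyrus/ctl_mboxlist -u < /tmp/mailboxes.txt",
    },
}


def migration_steps(service, old_host="old-x86", new_host="new-amd64"):
    """Return the ordered steps for one service as printable strings."""
    plan = PLANS[service]
    return [
        f"[{old_host}] {plan['dump']}",
        f"[{old_host}] copy the dump file to {new_host}",
        f"[{new_host}] {plan['load']}",
    ]


if __name__ == "__main__":
    for service in PLANS:
        print(service)
        for step in migration_steps(service):
            print("  " + step)
```

The point of the ordering is that the dump always happens with the original binaries on the original architecture, so only the text output ever crosses over to the new machine.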
(In reply to comment #11)
> The AMD64 FAQ already covers this topic, basically:
>
> http://www.gentoo.org/doc/en/gentoo-amd64-faq.xml#upgradex86

But there is no information there about databases or about binary data in general. Maybe it's a good idea to add it...

============================================================================
Can I upgrade from my x86 system to amd64 by doing emerge -e world?

Due to several differences between an x86 and an amd64 installation, it is impossible to upgrade. Please perform a fresh install. The installation is slightly different than an x86 one, so please use the AMD64 Handbook.

Also note that it's very probable that binary files created on an x86 system cannot be read by software on an amd64 system. In particular, this means that it's impossible to transfer databases (MySQL, Berkeley DB, etc.) as-is; you should dump the database into some architecture-independent format (e.g. a text file) on the x86 system and then restore it on the amd64 system.
============================================================================

What do you think about such an addition? (Btw, I suppose the wording could be better.)
Okay, I've taken pva's suggestion and combined it with robbat2's notes, and added a paragraph on this to the AMD64 FAQ. Jer, you're the bug owner, so you're free to resolve this however you want (Robin suggested UPSTREAM), but since I fixed the doc, from the GDP's point of view it's essentially closed. :)
(In reply to comment #16)
> Okay, I've taken pva's suggestion and combined it with robbat2's notes, and
> added a paragraph on this to the AMD64 FAQ.

That appears to be InCVS and published now.

> Jer, you're the bug owner, so you're free to resolve this however you want
> (Robin suggested UPSTREAM), but since I fixed the doc, from the GDP's point of
> view it's essentially closed. :)

Thanks for all the work. I'm simply closing as FIXED because this is all we can do (and as far as I am aware, all that any UPSTREAM will do).