553974 – net-p2p/bitcoind-0.10.2 crashes (really is 0.10.1 with patches)


Status:	RESOLVED FIXED

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	Current packages
Hardware:	All Linux

Importance:	Normal normal
Assignee:	Anthony Basile

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2015-07-05 10:59 UTC by Lionel Bouton
Modified:	2017-01-25 08:55 UTC
CC List:	2 users

See Also:
Package list:
Runtime testing required:	---


Description Lionel Bouton 2015-07-05 10:59:02 UTC

I upgraded from bitcoind-0.9.3 to 0.10.2 two days ago.

Since then net-p2p/bitcoind-0.10.2 has crashed twice, each time with the same last messages in debug.log:

2015-07-05 03:44:26 ERROR: CheckProofOfWork() : nBits below minimum work
2015-07-05 03:44:26 ERROR: ReadBlockFromDisk : Errors in block header

Normally I would report this upstream but I'm not sure if this bug is in the official 0.10.2 release. The bitcoind distributed by Gentoo called 0.10.2 uses a reference to a commit in a different repository than mainline which seems to be based on 0.10.1 work in progress or 0.10.1 with patches (difficult to say without comparing the commits but this seems confirmed by bitcoin-cli getinfo, which returns "version" : 100100).
As the repository is maintained by the ebuild author it seems safer to report the bug here.
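
For readers wondering how "version" : 100100 maps to a release, here is a minimal sketch. It assumes the usual Bitcoin Core encoding of 1000000 * major + 10000 * minor + 100 * revision + build; the function name is made up for illustration.

```python
def decode_client_version(version: int) -> str:
    """Split the integer reported by getinfo into major.minor.revision.build.

    Assumes the encoding 1000000*major + 10000*minor + 100*revision + build.
    """
    major, rest = divmod(version, 1_000_000)
    minor, rest = divmod(rest, 10_000)
    revision, build = divmod(rest, 100)
    return f"{major}.{minor}.{revision}.{build}"


print(decode_client_version(100100))  # 0.10.1.0
```

Under that assumption a genuine 0.10.2 build would report 100200 instead.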

Note that there is currently no ebuild in Portage that is both BIP66 compatible (mandatory for miners and strongly recommended for wallet users) and stable.

I looked into the bitcoin overlay and the only ebuild I could find that could satisfy both conditions is 0.9.5, which uses the official repository (I'm not aware of any crash in the official 0.9.5 bitcoind). I'm running it now and will report if it crashes (I only had 2 crashes in 2 days with the Portage 0.10.2 ebuild, so it may take time).

Comment 1 Luke-Jr 2015-07-05 16:40:55 UTC

What USE flags and BITCOIN_POLICY do you have set? The base source for the "0.10.2" ebuild is d8ac90184254fea3a7f4991fd0529dfbd750aea0, which is the official 0.10.1 release tag from the master repository. I highly suspect disk corruption as the cause here, but let's look into it more first. Am I correct in assuming it doesn't crash at startup, but at some arbitrary random point later on?

Comment 2 Lionel Bouton 2015-07-05 17:05:15 UTC

(In reply to Luke-Jr from comment #1)
> What USE flags

logrotate and wallet

> and BITCOIN_POLICY do you have set?

none

> The base source for the
> "0.10.2" ebuild is d8ac90184254fea3a7f4991fd0529dfbd750aea0, which is the
> official 0.10.1 release tag from the master repository. I highly suspect
> disk corruption as the cause here,

If disk corruption occurred, it's most probably due to a bug in the previous version (0.9.3) or in this one: the filesystem is RAID1 Btrfs, so corruption caused by hardware problems would have been detected by its internal checksums.

> but let's look into it more first. Am I
> correct in assuming it doesn't crash at startup, but at some arbitrary
> random point later on?

Yes. It crashed approximately 1 day after the first launch and 6 hours after the second.

Note that we will probably need a real stable 0.10.2 shortly as there is a CVE (CVE-2015-3641) for an unspecified DoS to be announced in 2 days affecting pre-0.10.2 releases.

Comment 3 Luke-Jr 2015-07-05 17:40:08 UTC

(In reply to Lionel Bouton from comment #2)
> (In reply to Luke-Jr from comment #1)
> > What USE flags
> 
> logrotate and wallet
> 
> > and BITCOIN_POLICY do you have set?
> 
> none

Ok, from this information you have a vanilla 0.10.1.

> > The base source for the
> > "0.10.2" ebuild is d8ac90184254fea3a7f4991fd0529dfbd750aea0, which is the
> > official 0.10.1 release tag from the master repository. I highly suspect
> > disk corruption as the cause here,
> 
> If disk corruption occurred, it's most probably due to a bug in the previous
> version (0.9.3) or in this one: the filesystem is RAID1 Btrfs, so corruption
> caused by hardware problems would have been detected by its internal
> checksums.

Has the machine suddenly lost power at any point? This is known to be able to cause corruption with at least 0.10.x (not sure about 0.9.x). The only solution in the case of corrupt storage is to reindex (-reindex option).

> > but let's look into it more first. Am I
> > correct in assuming it doesn't crash at startup, but at some arbitrary
> > random point later on?
> 
> Yes. It did crash approximately 1 day after the first launch and 6 hours
> after the second.
> 
> Note that we will probably need a real stable 0.10.2 shortly as there is a
> CVE (CVE-2015-3641) for an unspecified DoS to be announced in 2 days
> affecting pre-0.10.2 releases.

There is unfortunately no version yet which can properly recover from missing/corrupt block data, nor any supported version which can avoid corruption on power failure. :(

Comment 4 Lionel Bouton 2015-07-05 17:55:59 UTC

(In reply to Luke-Jr from comment #3)
> Has the machine suddenly lost power at any point? This is known to be able
> to cause corruption with at least 0.10.x (not sure about 0.9.x). The only
> solution in the case of a corrupt storage is to reindex (-reindex option).

Indeed it has. We had a power cut last week so it is likely to be the cause (I didn't expect bitcoind to have this bug).

> > 
> > Note that we will probably need a real stable 0.10.2 shortly as there is a
> > CVE (CVE-2015-3641) for an unspecified DoS to be announced in 2 days
> > affecting pre-0.10.2 releases.
> 
> There is unfortunately no version yet which can properly recover from
> missing/corrupt block data, nor any supported version which can avoid
> corruption on power failure. :(

The CVE is about a DoS not power failure. So, forgetting about my own problem (which is probably a -reindex away from being fixed) let us address the other part of the problem: there will be a vulnerability Gentoo users will rightfully believe they are covered for as they installed a version supposed to have fixed this CVE.

Unfortunately Portage lies about the version. It should be 0.10.1 and not 0.10.2 so they will be vulnerable.
I tried to make a 0.10.2-r1 ebuild myself but the bitcoincore eclass is tied to ljr patches so it's unusable as-is.

Could you at least make a proper 0.10.2-r<n> release and mark the current ebuild unstable so that users upgrading will get a fix for CVE-2015-3641? Not all users will fix their systems but at least those keeping their system up to date will be covered.

Comment 5 Luke-Jr 2015-07-05 20:39:22 UTC

(In reply to Lionel Bouton from comment #4)
> The CVE is about a DoS not power failure. So, forgetting about my own
> problem (which is probably a -reindex away from being fixed) let us address
> the other part of the problem: there will be a vulnerability Gentoo users
> will rightfully believe they are covered for as they installed a version
> supposed to have fixed this CVE.

The CVE does not apply to 0.10.1. In fact, the official Ubuntu PPA has not been (and probably will not be) updated to 0.10.2.

> Unfortunately Portage lies about the version. It should be 0.10.1 and not
> 0.10.2 so they will be vulnerable.
> I tried to make a 0.10.2-r1 ebuild myself but the bitcoincore eclass is tied
> to ljr patches so it's unusable as-is.

The patches are all just fine with the real 0.10.2 code, so the only "problem" will be that the eclass only expects one possible commit per PV. I could easily modify it to use a different filename for this, or we could just update the Manifest and remove the fake/pre-revbump 0.10.2 from the tree and let Portage rename/remove the old distfile. So: should we fix this anyway, even though there's little value? If so, which approach to dealing with the distfile should be taken?

Comment 6 Lionel Bouton 2015-07-05 21:02:08 UTC

I'm not a Gentoo dev so I can't say with any authority what the right approach is for replacing the distfiles (although it will generate confusing errors for users reinstalling or upgrading if the checksums don't match their local copies; maybe there's a process to avoid this).
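
To illustrate the checksum concern, here is a rough sketch of how a cached distfile stops matching once the Manifest entry for the same filename changes. It assumes the common `DIST <name> <size> <HASH> <value> ...` Manifest line layout; the file name, contents, and helper names are made up, and this is not Portage's actual verification code.

```python
import hashlib


def parse_dist_entry(line):
    """Parse one Manifest DIST line into (name, size, {hash_name: hex_digest})."""
    fields = line.split()
    if len(fields) < 5 or fields[0] != "DIST":
        raise ValueError("not a DIST entry")
    name, size = fields[1], int(fields[2])
    # Hash names and values alternate after the size field.
    hashes = dict(zip(fields[3::2], fields[4::2]))
    return name, size, hashes


def matches_manifest(data, size, hashes):
    """True if the bytes have the recorded size and SHA256 digest."""
    if len(data) != size:
        return False
    expected = hashes.get("SHA256")
    return expected is not None and hashlib.sha256(data).hexdigest() == expected


# Hypothetical tarballs: same file name, different upstream commit.
old_tarball = b"contents fetched for the first ebuild"
new_tarball = b"contents fetched after the commit changed"

# Manifest regenerated for the new tarball under the unchanged file name.
entry = "DIST example-0.10.2.tar.gz {} SHA256 {}".format(
    len(new_tarball), hashlib.sha256(new_tarball).hexdigest()
)
name, size, hashes = parse_dist_entry(entry)

print(matches_manifest(new_tarball, size, hashes))  # True
print(matches_manifest(old_tarball, size, hashes))  # False: stale local copy
```

Using a different filename for the new tarball would sidestep the stale-copy case, which is one of the two options raised in comment 5.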

The only advice I can give here is: don't stray from upstream version numbers if you want maintaining ebuilds to remain simple. I liked that I could choose patches with USE flags or BITCOIN_POLICY and that the default was to not apply them. This helped identify the source of bugs (upstream or patches).
What was not clear from the ebuild is that it was based on a release. I was expecting a git tag, not a changeset, which (at least for me) implied a work in progress.
The fact that you chose the changeset of a release is good, because now I know that I can safely report upstream any bugs I find. But if using tags isn't possible in ebuilds, I suggest clearly documenting the version (# this changeset is release x.yy.zz) and avoiding version mismatches as much as possible, as they are a recipe for miscommunication.

As an example of unintended consequences, I suspect writing a GLSA for the coming CVE (or, to be more precise, any potential CVE that affects 0.10.1 and 0.10.2 differently) might not be easy if the writer wants to warn (or not) users who installed the current 0.10.2 ebuild.

Comment 7 Luke-Jr 2015-07-05 21:24:21 UTC

See https://bugs.gentoo.org/show_bug.cgi?id=552208#c22

Comment 8 Lionel Bouton 2015-07-05 21:33:18 UTC

As the crash seems linked to a known upstream bug and #552208 addresses my concerns about the version mismatch in more depth, I think this bug could be closed.

At least it will document that crashes are possible after an unclean shutdown and that -reindex is the appropriate fix, which will probably be useful for other users.
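
For anyone landing here with similar symptoms, a small sketch of a check for the two messages quoted in the description. The marker list is only what this report showed, not an official list of corruption messages, and the helper name is made up.

```python
# Messages quoted in the original report; treating them as signs of corrupt
# block data follows the diagnosis in this thread, not an official list.
CORRUPTION_MARKERS = (
    "CheckProofOfWork() : nBits below minimum work",
    "ReadBlockFromDisk : Errors in block header",
)


def find_corruption_lines(log_text):
    """Return the log lines that contain one of the known markers."""
    return [
        line
        for line in log_text.splitlines()
        if any(marker in line for marker in CORRUPTION_MARKERS)
    ]


# The first line is a made-up placeholder; the other two come from the report.
sample = "\n".join([
    "2015-07-05 03:44:25 placeholder for an unrelated log line",
    "2015-07-05 03:44:26 ERROR: CheckProofOfWork() : nBits below minimum work",
    "2015-07-05 03:44:26 ERROR: ReadBlockFromDisk : Errors in block header",
])

hits = find_corruption_lines(sample)
if hits:
    print(f"{len(hits)} suspicious line(s); consider restarting bitcoind with -reindex")
```

In practice the text would come from the debug.log in the bitcoind data directory rather than from an inline sample.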

Comment 9 Andreas Sturmlechner gentoo-dev 2017-01-24 11:32:14 UTC

Is this still an issue?

Comment 10 Lionel Bouton 2017-01-24 11:47:38 UTC

I don't think this is an issue anymore: I haven't had any crashes since then, and the packaging practices have been addressed in a separate bug.