Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 924864 - dev-libs/igraph-0.10.4-r1 fails to compile: random.c:684:17: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]
Summary: dev-libs/igraph-0.10.4-r1 fails to compile: random.c:684:17: error: dereferen...
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages (show other bugs)
Hardware: All Linux
: Normal normal (vote)
Assignee: Gentoo Science Biology related packages
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: lto
  Show dependency tree
 
Reported: 2024-02-18 07:42 UTC by Agostino Sarubbo
Modified: 2024-03-11 14:24 UTC (History)
4 users (show)

See Also:
Package list:
Runtime testing required: ---


Attachments
build.log (build.log,154.88 KB, text/plain)
2024-02-18 07:42 UTC, Agostino Sarubbo
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Agostino Sarubbo gentoo-dev 2024-02-18 07:42:15 UTC
https://blogs.gentoo.org/ago/2020/07/04/gentoo-tinderbox/

Issue: dev-libs/igraph-0.10.4-r1 fails to compile.
Discovered on: amd64 (internal ref: lto_tinderbox)
System: LTO-SYSTEM (https://wiki.gentoo.org/wiki/Project:Tinderbox/Common_Issues_Helper#LTO)

Info about the issue:
https://wiki.gentoo.org/wiki/Project:Tinderbox/Common_Issues_Helper#CF0014
Comment 1 Agostino Sarubbo gentoo-dev 2024-02-18 07:42:17 UTC
Created attachment 885282 [details]
build.log

build log and emerge --info
Comment 2 Agostino Sarubbo gentoo-dev 2024-02-18 07:42:19 UTC
Error(s) that match a know pattern in addition to what has been reported in the summary:


FAILED: src/CMakeFiles/igraph.dir/random/random.c.o 
/var/tmp/portage/dev-libs/igraph-0.10.4-r1/work/igraph-0.10.4/src/random/random.c:684:17: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]
Comment 3 Szabolcs Horvát 2024-02-19 15:28:53 UTC
This is not a bug in igraph, the data is intentionally reinterpreted. Either disable `-Wstrict-aliasing` or turn off treating warnings as errors using the methods I described at https://bugs.gentoo.org/895066
Comment 4 Eli Schwartz 2024-03-05 01:49:56 UTC
If the strict-aliasing violation is really needed, then shouldn't the build system be passing -fno-strict-aliasing to prevent miscompilation?
Comment 5 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-03-05 01:52:11 UTC
You can also use the 'may_alias' attribute. But if you want to properly misinterpret it, you need to memcpy it or can use a union. That might be wrong in other ways (which is what you want for your bad interpretation thing, I think?) but it's not an aliasing violation then.
Comment 6 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-03-05 01:55:53 UTC
Actually, having looked at the test, I think may_alias is the best fit here.
Comment 7 Szabolcs Horvát 2024-03-05 11:05:49 UTC
> If the strict-aliasing violation is really needed, then shouldn't the build system be passing -fno-strict-aliasing to prevent miscompilation?

It seems to me that it is Gentoo that enables this warning, not igraph. 

If there is a compiler (or compiler version) that won't compile igraph with _default_ settings, and you think igraph should support this compiler, open an issue at https://github.com/igraph/igraph/issues/new/choose To my knowledge, igraph works with all GCC versions since at least GCC 5, as well as all Clang releases from the recent few years.

What's happening here is that Gentoo isn't using the default warnings of the compiler. It turns on additional warnings.
Comment 8 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-03-05 11:07:33 UTC
(In reply to Szabolcs Horvát from comment #7)
> > If the strict-aliasing violation is really needed, then shouldn't the build system be passing -fno-strict-aliasing to prevent miscompilation?
> 
> It seems to me that it is Gentoo that enables this warning, not igraph. 
> 

The compile will enforce aliasing rules regardless of the warning. The warning simply tells you about violations.

> If there is a compiler (or compiler version) that won't compile igraph with
> _default_ settings, and you think igraph should support this compiler, open
> an issue at https://github.com/igraph/igraph/issues/new/choose To my
> knowledge, igraph works with all GCC versions since at least GCC 5, as well
> as all Clang releases from the recent few years.
> 
> What's happening here is that Gentoo isn't using the default warnings of the
> compiler. It turns on additional warnings.

Yes, additional warnings are enabled when testing with LTO to find likely runtime issues.
Comment 9 Szabolcs Horvát 2024-03-05 11:53:00 UTC
What's happening here is that the bits of a `double` are manipulated explicitly by accessing it as an `int64_t`. There is a safety test in the build system that checks that the sizes of these two types are identical, and that the representation of `double` is as expected. If this is not the case on some platform, the configuration step (with CMake) will fail.

This is not something that we can change, as it is performance-critical code.
Comment 10 David Seifert gentoo-dev 2024-03-05 12:06:01 UTC
(In reply to Szabolcs Horvát from comment #9)
> What's happening here is that the bits of a `double` are manipulated
> explicitly by accessing it as an `int64_t`. There is a safety test in the
> build system that checks that the sizes of these two types are identical,
> and that the representation of `double` is as expected. If this is not the
> case on some platform, the configuration step (with CMake) will fail.
> 
> This is not something that we can change, as it is performance-critical code.

This is still UB. You cannot cast a uint64_t* to double*, even if the sizes and alignments match, because the strict aliasing rules say so. Instead of trying to do clever pointer tricks, use a memcpy() (which gets optimized away with modern compilers) or even better, a union instead (https://godbolt.org/z/xGqrn5ovT):

#include <assert.h>
#include <string.h>
#include <stdint.h>

double bad(uint64_t x) {
    return *(double*)&x;
}

double better(uint64_t x) {
    double result;
    static_assert(sizeof(result) == sizeof(x), "");
    memcpy(&result, &x, sizeof(result));
    return result;
}

double best(uint64_t x) {
    union {
        uint64_t in;
        double out;
    } convert = {.in = x};
    return convert.out;
}

which compiles to

bad:
        movq    xmm0, rdi
        ret
better:
        movq    xmm0, rdi
        ret
best:
        movq    xmm0, rdi
        ret

Notice that all functions yield the same optimized code, with the exception that better() and best() do not invoke undefined behavior. You don't need to violate strict-aliasing rules to get performant code.
Comment 11 Szabolcs Horvát 2024-03-05 12:30:26 UTC
If you feel strongly about this, feel free to open an issue, and it will be discussed.

Keep in mind that "undefined behaviour" here means not defied by the standard—that is not the same as not defined by specific compilers on specific platforms we care about.

We had bad experiences with bugs in old compilers in the past so I am reluctant to change something that works and is well tested just to make a diagnostic go away. There's always a risk of breaking something with some old compiler version.

For Gentoo, the best use of time would be to actually upgrade to igraph 0.10.10 instead staying on the outdated (and known to be buggy) 0.10.4 ...
Comment 12 David Seifert gentoo-dev 2024-03-05 12:37:47 UTC
(In reply to Szabolcs Horvát from comment #11)
> If you feel strongly about this, feel free to open an issue, and it will be
> discussed.
> 
> Keep in mind that "undefined behaviour" here means not defied by the
> standard—that is not the same as not defined by specific compilers on
> specific platforms we care about.
> 
> We had bad experiences with bugs in old compilers in the past so I am
> reluctant to change something that works and is well tested just to make a
> diagnostic go away. There's always a risk of breaking something with some
> old compiler version.
> 
> For Gentoo, the best use of time would be to actually upgrade to igraph
> 0.10.10 instead staying on the outdated (and known to be buggy) 0.10.4 ...

And Gentoo has had far, far, far more problems with GCC N+1 or Clang N+1 breaking code because it invokes UB. Even more so, given that the union trick has been supported since forever and I have never had a bug due to it. It's somewhat ironic, since you rely on the union trick in the CMake check code...

"Well tested code" is not an argument, especially in light of people telling you that it invokes undefined behavior. And yes, both GCC *and* Clang define this to be undefined behavior, the two most prominent C compilers on Linux. That leaves pretty much no other mainstream compiler on the table.
Comment 13 Eli Schwartz 2024-03-05 12:55:29 UTC
(In reply to Szabolcs Horvát from comment #11)
> Keep in mind that "undefined behaviour" here means not defied by the
> standard—that is not the same as not defined by specific compilers on
> specific platforms we care about.


This is unfortunately not good enough. The problem is not that it's being done accidentally, the problem is that it's being done at all. Violating the C standard grants the compiler a license to kill (your code). It doesn't matter if you perform a configure check to see whether it matches. The compiler sees UB, and then all bets are off.

Compilers do NOT define this "on specific platforms we care about".

The -Werror's here are used to detect likely areas where the compiler tries to optimize based on assuming that UB cannot exist, leading to miscompilations. strict-aliasing rules are special here because they can be miscompiled even when LTO is not used, but LTO can make this much worse.

For this reason, Gentoo QA tests for packages that fail to build when these warnings are enabled, and solves the problem by:
- adding -fno-strict-aliasing to *FLAGS for the entire program (a pessimization that covers more than just this function)
- preventing LTO from being switched on for the entire program (a pessimization that covers more than just this function)


This is because we *know* that compilers will see your UB and commit horrible crimes to it in the name of optimizations, and also, the compilers are allowed to do it. But disabling optimization passes entirely makes it less likely the compiler will choose to do so.
Comment 14 Eli Schwartz 2024-03-05 13:02:41 UTC
(In reply to Szabolcs Horvát from comment #11)
> For Gentoo, the best use of time would be to actually upgrade to igraph
> 0.10.10 instead staying on the outdated (and known to be buggy) 0.10.4 ...

Also note: Gentoo QA has hundreds of packages to make LTO safe with the eventual goal of being able to recommend LTO as a standard flag (maybe even use it automatically on binhosts).

But we are not the maintainer of *this* package. If updating it would make it LTO-safe, it.might be worth the time to do so anyway, but it's been made clear that the upstream code in git has the same problem, even if it fixes other bugs, so upgrading the package is frankly better left to its actual maintainers.
Comment 15 Larry the Git Cow gentoo-dev 2024-03-11 14:24:05 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=37bd1334b41213813cb38acf46c98b1f80c5fd15

commit 37bd1334b41213813cb38acf46c98b1f80c5fd15
Author:     Eli Schwartz <eschwartz93@gmail.com>
AuthorDate: 2024-03-11 04:00:26 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2024-03-11 14:22:51 +0000

    dev-libs/igraph: add 0.10.10
    
    Include backported patch from upstream which fixes strict-aliasing
    violations.
    
    Closes: https://bugs.gentoo.org/924864
    Closes: https://bugs.gentoo.org/925227
    Closes: https://bugs.gentoo.org/921172
    Signed-off-by: Eli Schwartz <eschwartz93@gmail.com>
    Signed-off-by: Sam James <sam@gentoo.org>

 dev-libs/igraph/Manifest                           |  1 +
 .../808c083fbe661207ee8f0fcd3be5096b5dc17d0d.patch | 35 ++++++++++++++
 dev-libs/igraph/igraph-0.10.10.ebuild              | 53 ++++++++++++++++++++++
 3 files changed, 89 insertions(+)

Additionally, it has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=cccba67944b52d8d48beb95a39e653ecbbf5471f

commit cccba67944b52d8d48beb95a39e653ecbbf5471f
Author:     Eli Schwartz <eschwartz93@gmail.com>
AuthorDate: 2024-03-11 04:16:44 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2024-03-11 14:22:51 +0000

    dev-libs/igraph: mark as LTO-unsafe, strict-aliasing unsafe
    
    Only in 0.10.4, as it is fixed and backported into the ~arch version.
    
    Bug: https://bugs.gentoo.org/924864
    Signed-off-by: Eli Schwartz <eschwartz93@gmail.com>
    Signed-off-by: Sam James <sam@gentoo.org>

 dev-libs/igraph/igraph-0.10.4-r1.ebuild | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)