Bug 920556 - sci-libs/dealii-9.5.1[mpi]: with sys-cluster/mpich and sys-devel/gcc-13: segmentation fault of linked executables on startup
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: AMD64 Linux
Importance: Normal normal
Assignee: Matthias Maier
URL: https://github.com/dealii/dealii/issu...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-12-22 21:51 UTC by jang0
Modified: 2024-02-18 10:46 UTC
CC List: 5 users

See Also:
Package list:
Runtime testing required: ---


Attachments
emerge --info (emerge-info.log, 6.31 KB, text/x-log)
2023-12-22 21:51 UTC, jang0
emerge -pv dealii (emerge-dealii.log, 519 bytes, text/x-log)
2023-12-22 21:55 UTC, jang0
emerge -pv mpich (emerge-mpich.log, 347 bytes, text/x-log)
2023-12-22 22:00 UTC, jang0
gdb log (gdb.txt, 1.08 KB, text/plain)
2023-12-22 22:00 UTC, jang0
emerge @system (emerge-sys.log.xz, 519.31 KB, application/x-xz)
2023-12-22 22:01 UTC, jang0
ldd step-1 (ldd, 4.04 KB, text/plain)
2024-02-16 20:29 UTC, jang0

Description jang0 2023-12-22 21:51:34 UTC
Created attachment 880242 [details]
emerge --info

Since the last dealii-9.5.0 update, every program I build against the library fails when dealii is compiled with the "mpi" USE flag enabled. The code compiles and links, but the resulting executable segfaults when run; for example:
%make run
[ 33%] Building CXX object CMakeFiles/step-1.dir/step-1.cc.o
[ 66%] Linking CXX executable step-1
[ 66%] Built target step-1
[100%] Run step-1 with Debug configuration
make[3]: *** [CMakeFiles/run.dir/build.make:71: CMakeFiles/run] Segmentation fault
make[2]: *** [CMakeFiles/Makefile2:116: CMakeFiles/run.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:123: CMakeFiles/run.dir/rule] Error 2
make: *** [Makefile:137: run] Error 2
make run  37.83s user 12.90s system 92% cpu 54.941 total

Neither downgrading nor using the dealii rolling release version solves the problem.
Comment 1 jang0 2023-12-22 21:55:14 UTC
Created attachment 880243 [details]
emerge -pv dealii
Comment 2 jang0 2023-12-22 22:00:02 UTC
Created attachment 880244 [details]
emerge -pv mpich
Comment 3 jang0 2023-12-22 22:00:58 UTC
Created attachment 880245 [details]
gdb log
Comment 4 jang0 2023-12-22 22:01:51 UTC
Created attachment 880246 [details]
emerge @system
Comment 5 jang0 2024-02-14 21:35:38 UTC
Hi, I have exhausted every option I know of to fix dealii. Tamiko, please give me some clues so that I can keep digging into the problem; I use this library as a work tool and would like to help you maintain it in the near future, but I need your help today.
Comment 6 Matthias Maier gentoo-dev 2024-02-14 23:42:15 UTC
(In reply to jang0 from comment #5)
> Hi, I have exhausted every option I know of to fix dealii. Tamiko, please
> give me some clues so that I can keep digging into the problem; I use this
> library as a work tool and would like to help you maintain it in the near
> future, but I need your help today.

Would you mind attaching the output of  $ ldd step-1  as well?
Comment 7 Matthias Maier gentoo-dev 2024-02-15 01:45:58 UTC
Failing construct in question:

    template <typename T>
    const MPI_Datatype
      mpi_type_id_for_type = internal::MPIDataTypes::mpi_type_id(
        static_cast<std::remove_cv_t<std::remove_reference_t<T>> *>(nullptr));
Comment 8 Matthias Maier gentoo-dev 2024-02-15 01:49:30 UTC
Possibly invoked from this context (source/base/mpi.cc):
  namespace MPI
  {
#ifdef DEAL_II_WITH_MPI
    // Provide definitions of template variables for all valid instantiations.
    template const MPI_Datatype mpi_type_id_for_type<bool>;
Comment 9 Matthias Maier gentoo-dev 2024-02-15 08:10:24 UTC
I was able to reproduce this issue with sys-cluster/mpich (and your exact collection of USE flags). In the meantime, sys-cluster/openmpi should be fine.

I am now triaging whether this depends on one of the USE flag combinations (USE="mpi-threads, threads, valgrind") used with mpich, and verifying that I indeed do not run into the segfault with openmpi.
Comment 10 Larry the Git Cow gentoo-dev 2024-02-16 01:03:29 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=2c012df379f458f614c12177d41fbaa035706390

commit 2c012df379f458f614c12177d41fbaa035706390
Author:     Matthias Maier <tamiko@gentoo.org>
AuthorDate: 2024-02-16 01:01:17 +0000
Commit:     Matthias Maier <tamiko@gentoo.org>
CommitDate: 2024-02-16 01:01:34 +0000

    sci-libs/dealii: add 9.5.2, drop 9.5.1
    
     - apply mpich fix
    
    Closes: https://bugs.gentoo.org/920556
    Signed-off-by: Matthias Maier <tamiko@gentoo.org>

 sci-libs/dealii/Manifest                                     | 4 ++--
 sci-libs/dealii/{dealii-9.5.1.ebuild => dealii-9.5.2.ebuild} | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

Additionally, it has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=71634c39d291459439fadb16adfb9506278d2835

commit 71634c39d291459439fadb16adfb9506278d2835
Author:     Matthias Maier <tamiko@gentoo.org>
AuthorDate: 2024-02-16 00:55:22 +0000
Commit:     Matthias Maier <tamiko@gentoo.org>
CommitDate: 2024-02-16 00:59:14 +0000

    sci-libs/dealii: add 9.4.2, drop 9.4.1-r1
    
     - apply mpich fix
    
    Bug: https://bugs.gentoo.org/920556
    Signed-off-by: Matthias Maier <tamiko@gentoo.org>

 sci-libs/dealii/Manifest                           |  4 +-
 ...{dealii-9.4.1-r1.ebuild => dealii-9.4.2.ebuild} |  7 ++-
 ...-remove-superfluous-explicit-instantiatio.patch | 59 ++++++++++++++++++++++
 ...mark-a-template-variable-to-have-const-in.patch | 28 ++++++++++
 4 files changed, 94 insertions(+), 4 deletions(-)
Comment 11 Matthias Maier gentoo-dev 2024-02-16 09:26:33 UTC
Minimal reproducer for the segfault:


inline int mpi_type_id(const unsigned long int *) { return 42; }

template <typename T>
const int mpi_type_id_for_type = mpi_type_id(static_cast<T *>(nullptr));

void broadcast() {
  int ierr = mpi_type_id_for_type<unsigned long>;
}

template const int mpi_type_id_for_type<unsigned long int>;

int main() {}
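
For comparison, here is a minimal sketch of the same type lookup written as a function template instead of a variable template. It is only an illustration built on the reproducer above, not the change applied in the Gentoo patches or upstream. The name mpi_type_id_for_type_fn is hypothetical, and the int return value 42 is the same stand-in for MPI_Datatype used in the reproducer. Because the value is computed at the call site, no namespace-scope object needs dynamic initialization and no explicit instantiation is involved.

```cpp
#include <type_traits>

// Stand-in from the reproducer: the real function returns an MPI_Datatype.
inline int mpi_type_id(const unsigned long int *) { return 42; }

// Hypothetical alternative (assumption, not the upstream fix): a function
// template. The overload is selected by passing a typed null pointer, as in
// the original construct, but the lookup runs when the function is called.
template <typename T>
inline int mpi_type_id_for_type_fn()
{
  return mpi_type_id(
    static_cast<std::remove_cv_t<std::remove_reference_t<T>> *>(nullptr));
}

// Mirrors broadcast() from the reproducer, but returns the looked-up value.
int broadcast()
{
  return mpi_type_id_for_type_fn<unsigned long>();
}
```

Calling broadcast() from a main() function returns the stand-in value; cv-qualifiers and references on T are stripped before the overload is selected, so mpi_type_id_for_type_fn<const unsigned long &>() resolves to the same overload.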
Comment 12 jang0 2024-02-16 20:29:56 UTC
Created attachment 885161 [details]
ldd step-1

Here is the ldd step-1 output. I can't work around the problem by installing openmpi instead of mpich, because the dealii compilation then fails. However, I've read on the GitHub page linked above that this is a GCC-13 bug, so I'll downgrade it and test.
Thank you so much!
Comment 13 Matthias Maier gentoo-dev 2024-02-16 21:18:09 UTC
(In reply to jang0 from comment #12)
> I'll downgrade it and test.

Not necessary, simply test the new revision in the tree with your current setup.

> Thank you so much!

yw