Bug 921932 - app-emulation/xen-4.18 version bump
Summary: app-emulation/xen-4.18 version bump
Status: UNCONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Tomáš Mózes
URL: https://forums.gentoo.org/viewtopic.p...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-01-12 12:00 UTC by moritori
Modified: 2024-02-05 17:10 UTC
CC List: 7 users

See Also:
Package list:
Runtime testing required: ---


Attachments
draft of xen-4.18.0.ebuild (xen-4.18.0.ebuild,4.40 KB, text/plain)
2024-01-31 23:27 UTC, John L. Poole
xen-tools-4.18.0.ebuild (xen-tools-4.18.0.ebuild,15.70 KB, text/plain)
2024-01-31 23:28 UTC, John L. Poole
patch -- unvetted by xen-developers (xen-tools-4.18.0-jlpoole1.patch,556 bytes, patch)
2024-01-31 23:29 UTC, John L. Poole
Failed attempt to execute Woodhouse command (Woodhouse_command_request_1_202402012122.txt,28.21 KB, text/plain)
2024-02-02 05:45 UTC, John L. Poole
xen-operations.i (bzipped) requested by David Woodhouse (xen-operations.i.bz2,264.13 KB, application/x-bzip)
2024-02-02 13:55 UTC, John L. Poole

Comment 1 John L. Poole 2024-01-29 16:54:02 UTC
I created a 4.18 ebuild based on the previous release.  I nullified all of the patching relating to 4.17 and rem'd out vnc-png, as that flag is no longer recognized.

A scripted record of the failed build is here:

https://salemdata.us/xen/xen_tools_20240128_Sun_174740.script.html

table of contents for the above session:

###########

Line #    Description
======    ===================
3         tree of my repo for xen-tools
29        cat -n /var/db/repos/ryz7950/app-emulation/xen-tools/xen-tools-4.18.0.ebuild
572       date; time USE="-ipxe" emerge app-emulation/xen-tools
25307     first sign of error
25339     emerge --info '=app-emulation/xen-tools-4.18.0::ryz7950'
25423     cat /var/tmp/portage/app-emulation/xen-tools-4.18.0/temp/environment

###########

Notes:

1) I earlier ran into problems with ipxe and finessed it by negating the
USE flag.

2) vnc-png is no longer an option for qemu, see line 394 displaying line
365 of the ebuild

## Discussion

I did not feel I could pinpoint the source of error in the above
session. So I downloaded the project using this URL:

  https://downloads.xenproject.org/release/xen/4.18.0/xen-4.18.0.tar.gz

I was able to build it successfully outside of Gentoo.  For the Gentoo build, I tried choking down my number of processors from 32 to 5 and then to 1; both of those attempts resulted in the same failure.  I sent an email to xen@gentoo.org.

I'll try the Gentoo Forum and/or IRC and see if anyone has any suggestions on how to identify why the Gentoo build attempt of the same source code fails vs. building the source code without Gentoo packaging.

The good news is that Xen Project 4.18 successfully builds.  Now, I need to determine what in Gentoo is affecting the attempt and causing the build to fail.
Comment 2 John L. Poole 2024-01-29 18:38:14 UTC
Created a topic for discussion:  https://forums.gentoo.org/viewtopic-p-8814935.html#8814935
Comment 3 John L. Poole 2024-01-31 01:40:12 UTC
I've been learning more about the problem causing the build to fail.  I'm not sure whether I should add my various findings here to help others, or whether that would just muddy the waters.  Basically, what is happening is that the #error clause on line 5 is triggered:

jlpoole@ryzwork ~/problem_builds/xen-tools-4.18.0/work $ cat -n xen-4.18.0/tools/qemu-xen/include/hw/xen/xen_native.h|head -n 7|tail -n 5
     3
     4  #ifdef __XEN_INTERFACE_VERSION__
     5  #error In Xen native files, include xen_native.h before other Xen headers
     6  #endif
     7
jlpoole@ryzwork ~/problem_builds/xen-tools-4.18.0/work $ 

The compile command has in it:

-D__XEN_INTERFACE_VERSION__=__XEN_LATEST_INTERFACE_VERSION__ 

So an investigation is needed to determine where __XEN_LATEST_INTERFACE_VERSION__ is being populated, as I'm guessing it should be null at this point in the build.
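
One way to narrow this down, as a sketch under stated assumptions: #ifdef only tests whether a macro is defined, so the -D flag on the command line is enough to trip the guard regardless of the value it expands to. The search below lists every text file under a source tree that passes that flag. The function name find_macro_flag is hypothetical, and the -r and -I options assume GNU grep.

```shell
#!/bin/sh
# Hypothetical helper: report every place in a source tree where a macro
# is passed on the compiler command line as -D<macro>.
find_macro_flag() {
    tree=$1
    macro=${2:-__XEN_INTERFACE_VERSION__}
    # -r  recurse into the tree
    # -n  print line numbers
    # -I  skip binary files
    # -F  treat the pattern as a fixed string
    # -e  needed because the pattern itself starts with a dash
    grep -rnIF -e "-D${macro}" "$tree"
}
```

Running find_macro_flag on the unpacked work directory would print path:line:text for each hit; any hit in a Makefile fragment is a candidate for where the flag enters the QEMU compile command.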
Comment 4 John L. Poole 2024-01-31 05:24:31 UTC
I think I have found the problem.  Here's a "blame" listing for the line 5 error trigger:
  
     #error In Xen native files, include xen_native.h before other Xen headers

See https://xenbits.xen.org/gitweb/?p=qemu-xen.git;a=blame;f=include/hw/xen/xen_native.h;h=6bcc83baf9ebc6ae76e76ed40f710f18e1043ee8;hb=HEAD


The Commit, https://xenbits.xen.org/gitweb/?p=qemu-xen.git;a=commit;f=include/hw/xen/xen_native.h;h=e2abfe5ec67b69fb310fbeaacf7e68d61d16609e, explains:

Since the toolstack libraries may depend on the
specific version of Xen headers that they pull in (and will set the
__XEN_TOOLS__ macro to enable internal definitions that they depend on),
the rule is that xen_native.h (and thus the toolstack library headers)
must be included *before* any of the headers in include/hw/xen/interface.

It looks like the above policy has the implicit assumption that the "xen" core is built first, and then the tools.  Gentoo's approach happens to be the opposite: the tools are built first, i.e. app-emulation/xen-tools, and then the xen core is built, app-emulation/xen.  The reasoning above seems to run contrary to Gentoo's approach.  

In addition to adding the line 5 error, the file was renamed ("Rename xen_common.h to xen_native.h") in January 2023.
  The previous version of the file was therefore named xen_common.h and may be viewed at: https://xenbits.xen.org/gitweb/?p=qemu-xen.git;a=blob;f=include/hw/xen/xen_common.h;h=7edcf3eb25ea8333ecf34ffa2d62364c69844f30;hb=a9ae1418b36b20ab06fb760b1108f61f49a76164

It looks like in addition to the file renaming, there is also the introduction of the line 5 "if" clause which, of course, defeats one's ability to build app-emulation/xen-tools.

I could use some help on this: 1) to assess what patch, if any, might be used... maybe something to nullify line 5 **just to see** if the build continues and completes, 2) to assess Gentoo's process of building the tools first, then the xen instance vs. the Xen Project's approach of building xen first, then the tools.  I am not familiar with adding patches to ebuilds, but it would be great to try this.  It's possible David Woodhouse @Amazon may have introduced such checks elsewhere, which I would also like to nullify, if only to see if the build succeeds otherwise.

I determined all the timestamps of the object files from my successful build and  from those built before the package failed in Gentoo.  I then aligned the directory paths to determine the order of processing for the two packages.  The Xen project builds the xen instance, and then the tools.  Gentoo's app-emulation/xen-tools, clearly does not since by design, the building of xen is reserved for the package app-emulation/xen.
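
The timestamp comparison described above can be approximated as follows. This is a minimal sketch, not necessarily the exact procedure used here: the function name list_objects_by_mtime is hypothetical, and -printf assumes GNU find.

```shell
#!/bin/sh
# Hypothetical helper: list the object files under a build tree,
# oldest first, as a rough proxy for the order in which they were compiled.
list_objects_by_mtime() {
    # %T@ = modification time in seconds since the epoch
    # %P  = path relative to the starting directory
    find "$1" -type f -name '*.o' -printf '%T@ %P\n' |
        sort -n |          # numeric sort on the timestamp column
        cut -d' ' -f2-     # drop the timestamp, keep the path
}
```

Saving the output for each build tree to a file and viewing the two files side by side shows which top-level directories were compiled first in each build.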
Comment 5 John L. Poole 2024-01-31 23:26:31 UTC
I succeeded in building app-emulation/xen-tools-4.18.0 and app-emulation/xen-4.18.0.

OVERVIEW

1) xen-tools-4.18.0.ebuild

Modifications I made using xen-tools-4.17.3:
  # nullified ...PATCHSET... variables
  # disable png automagic - no longer needed
  # under src_install(), rem'd the line  rm -rv "${ED}/var/run" || die  because, in my session, there was no ${ED}/var/run, so the die was triggered.  Rem'ing out the line allowed the install to proceed.
  # I had app-emulation/qemu 8.2.0 installed and that caused xen-tools to abort installation because of file collisions with what looked like language files.  I removed app-emulation/qemu 8.2.0 and the xen-tools install completed.
  # I ran my emerge: 
        USE="-ipxe " emerge  xen-tools  
    because having ipxe caused a very early failure which I did not want to tackle.
  # @387 I added:
     eapply "${FILESDIR}"/${P}-jlpoole1.patch
  The patch file simply removes 3 lines, nos. 4-6, that were added January 2023 per commit message of David Woodhouse 7 Mar 2023 09:04:30 -0800 at  https://xenbits.xen.org/gitweb/?p=qemu-xen.git;a=commit;f=include/hw/xen/xen_native.h;h=e2abfe5ec67b69fb310fbeaacf7e68d61d16609e.  I verified that these three lines were newly introduced into the 4.18.0 code line and not present in the 4.17.3 release:

   4 #ifdef __XEN_INTERFACE_VERSION__
   5 #error In Xen native files, include xen_native.h before other Xen headers
   6 #endif

   What still needs to be done is to inquire with xen-developers whether removing the above header check could cause problems, given that Gentoo's build process causes the tools to be built before the xen image.  I'll do that.  

2) xen-4.18.0.ebuild

  # nullified ...PATCHSET... variables
  # rem'd out src_prepare re: Symlinks:

	# Symlinks do not work on fat32 volumes # 829765
	#if ! use boot-symlinks || use efi; then
	#	eapply "${XEN_GENTOO_PATCHES_DIR}"/no-boot-symlinks/${PN}-4.16-no-symlinks.patch
	#fi
  # used this to emerge:

      USE="-boot-symlinks" emerge  app-emulation/xen

  I could not grasp what was happening in the eapply and took the chance that my system could withstand the omission.  Others may have serious problems omitting the boot-symlinks feature.  You have been warned.
Comment 6 John L. Poole 2024-01-31 23:27:11 UTC
Created attachment 883881 [details]
draft of xen-4.18.0.ebuild

See accompanying comment for modification/limitations.
Comment 7 John L. Poole 2024-01-31 23:28:39 UTC
Created attachment 883882 [details]
xen-tools-4.18.0.ebuild

draft, see accompanying comments for limitations/warnings.
Comment 8 John L. Poole 2024-01-31 23:29:41 UTC
Created attachment 883883 [details, diff]
patch -- unvetted by xen-developers

This patch erases a recently introduced headers check, see comments.  This has not been vetted by the xen-developers at this time, I still have to pass it by them.
Comment 9 John L. Poole 2024-02-01 01:54:47 UTC
Discussion on xen-developers list:  https://lists.xenproject.org/archives/html/xen-devel/2024-02/msg00000.html
Comment 10 David Woodhouse 2024-02-01 22:45:43 UTC
That check is there for a reason. If you mix the qemu-internal copy of Xen headers and the external ones, you are going to get strange breakage.

As the #error says, you should include xen_native.h *before* any other Xen headers. But there are only three other includes in xen-operations.c before xen_native.h and none of them should be including Xen headers AFAICT.

Need to see preprocessed source for the offending file, please.
Comment 11 John L. Poole 2024-02-01 23:46:31 UTC
I will make available the entire build tree that failed and the log, as well as my findings in my effort to trace back where values could be set.  It will take me at least an hour to prepare this for you, but I thought I would advise that I'm working on it so those monitoring this bug will know something is forthcoming.
Comment 12 John L. Poole 2024-02-02 00:42:02 UTC
I reran the build without my patch so that the error would occur and I can provide the complete tree with the script session.  I used the name Woodhouse to help qualify the files since I have a lot of files from previous work on this matter.

The zipped archive of the Gentoo build tree (failed) is at:
  https://salemdata.us/xen/gentoo-4.18.0/Woodhouse_1/Woodhouse_run_20230201_1694.tar.bz2
  
I also unzipped the tree under the Woodhouse_1 directory so its contents may be traversed. Starting node is at:
   https://salemdata.us/xen/gentoo-4.18.0/Woodhouse_1/app-emulation/
   
If you are unfamiliar with Gentoo's build tree, you may want to go directly to the staged tree at:
   https://salemdata.us/xen/gentoo-4.18.0/Woodhouse_1/app-emulation/xen-tools-4.18.0/work/xen-4.18.0/
   
I used script for the session and the script file is at:
   https://salemdata.us/xen/gentoo-4.18.0/Woodhouse_1/xen-tools-Woodhouse_20240201_Thu_155042.script
   (timing log: https://salemdata.us/xen/gentoo-4.18.0/Woodhouse_1/xen-tools-Woodhouse_20240201_Thu_155042_timing.log)
   
An HTML representation of the script including line numbering and in color is at:
   https://salemdata.us/xen/gentoo-4.18.0/Woodhouse_1/xen-tools-Woodhouse_20240201_Thu_155042.script.html
   
At the end of the script file, I performed a cat of the ebuild, see line 24,690.  The ebuild is, if you will, the "recipe" Gentoo uses to build a project; it's kind of like a super "Make". Note: while there had been patches for the earlier 4.17.3 build, the references to them were nullified so that no patches would be applied to the 4.18 tree.  Therefore, the build should be of a pristine tree.

I was reviewing my notes which may be of interest to someone, so I'm placing them on my server:
    https://salemdata.us/xen/gentoo-4.18.0/Woodhouse_1/debugging_xen-tools_4.18_Jan_30_2024.txt

I eventually began to wonder about the order of processing and how The Xen Project's build, which succeeded, differed from the xen-tools build that failed.  The URLs for those files, including a LibreOffice spreadsheet, are:
    HTML table: https://salemdata.us/xen/gentoo-4.18.0/comparison_of_build_orders_Xen_4.18.xhtml
    LibreOffice spreadsheet: https://salemdata.us/xen/gentoo-4.18.0/comparison_of_build_orders_Xen_4.18.ods
    
I hope the above proves helpful.  Please do not hesitate to ask if there is anything else I can provide.  I can, of course, try out any patches you may have by using the emerge option "--buildpkgonly, -B"  (See https://wiki.gentoo.org/wiki/Full_manpages/emerge#OPTIONS for anything else that you may find helpful.  As I indicated on the xen-developers list, I'm not really proficient with emerge.)
Comment 13 David Woodhouse 2024-02-02 01:54:09 UTC
Please could you just cd into the relevant build directory and run the individual GCC build command line.
Comment 14 John L. Poole 2024-02-02 02:34:58 UTC
(In reply to David Woodhouse from comment #13)
> Please could you just cd into the relevant build directory and run the
> individual GCC build command line.

I'll need specifics.  
1) Which directory?  example: xen-4.18.0/tools/qemu-xen/include/exec?
2) Please provide the command you want run.  I'm assuming it is different from the very long command showing in the build log at line 24,150.
Comment 15 David Woodhouse 2024-02-02 04:56:57 UTC
I mean the specific command at the specific line I mentioned before on my email response, with the specific change I said before.
Comment 16 John L. Poole 2024-02-02 05:44:21 UTC
(In reply to David Woodhouse from comment #15)
> I mean the specific command at the specific line I mentioned before on my
> email response, with the specific change I said before.

For those following this bug, David wrote on 2/1/24 at 9:18 AM PST:
vvvvvvvvvvvvvvvvvvvvvvvvvvvv
That isn't what the #error told you to do, though.

 24788	In file included from ../qemu-xen/hw/xen/xen-operations.c:16:
 24789	/var/tmp/portage/app-emulation/xen-tools-4.18.0/work/xen-4.18.0/tools/qemu-xen/include/hw/xen/xen_native.h:5:2: error: #error In Xen native files, include xen_native.h before other Xen headers
 24790	    5 | #error In Xen native files, include xen_native.h before other Xen headers
 24791	      |  ^~~~~

So it's hw/xen/xen-operations.c which is failing. As far as I can tell
(visually and empirically because it does actually build elsewhere), it
*is* doing what the #error said — it *is* including xen_native.h before
any other Xen headers.

The first four non-comment lines of xen-operations.c should look
something like this...

  #include "qemu/osdep.h"
  #include "qemu/uuid.h"
  #include "qapi/error.h"

  #include "hw/xen/xen_native.h"

So... did you patch it so it doesn't start like that any more? Or does
one of those first three files (perhaps qemu/osdep.h?) end up bringing
in the Xen interface headers in a way that I didn't anticipate and
which doesn't seem to happen elsewhere?

I didn't cite the full gcc command line from line 24787 of your log
because it's huge. Can you run a variant of that command to just give
me the *preprocessed* output (-E -dD -o xen-operations.i).
^^^^^^^^^^^^^^^^^^^^^^^^^^
I tried, but failed: I cannot seem to generate the desired file "xen-operations.i".
I'm uploading my attempt and results in file: 
Woodhouse_command_request_1_202402012122.txt

Perhaps someone else could assist as it is clear I am unqualified to undertake David's directives?  I hope I have left a clear trail.
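
For anyone retrying this, here is a sketch of how the logged compile command might be turned into the preprocess-only variant David requested: swap -c for -E -dD and point -o at a .i file. The helper name to_preprocess_cmd is hypothetical, and the sed expressions assume the logged command contains a single " -c " and a single " -o <file>", with no spaces or sed metacharacters in either path.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a logged compile command so that it stops
# after preprocessing and writes the result to the file given as $2.
to_preprocess_cmd() {
    # First expression:  replace the compile-only flag -c with -E -dD
    #                    (-E = preprocess only, -dD = keep #define lines)
    # Second expression: replace the object-file output with the .i target
    printf '%s\n' "$1" |
        sed -E -e 's/ -c / -E -dD /' -e "s| -o [^ ]+| -o $2|"
}
```

The rewritten command is only printed, not executed, so it can be inspected first and then run from the same build directory as the original command.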
Comment 17 John L. Poole 2024-02-02 05:45:57 UTC
Created attachment 883971 [details]
Failed attempt to execute Woodhouse command
Comment 18 David Woodhouse 2024-02-02 06:32:09 UTC
Ah right, the #error makes the preprocessor bail out and not emit the file. Can you make your change (or preferably just change #error to #warning) and do the same?
Comment 19 John L. Poole 2024-02-02 13:55:14 UTC
Created attachment 884033 [details]
xen-operations.i (bzipped) requested by David Woodhouse

The compile statement was reaching back into /tmp/portage... instead of just within the preserved tree /home/jlpoole/build_problems.  So I modified the cached file:

   sed -i 's/error/warning/' /var/tmp/portage/app-emulation/xen-tools-4.18.0/work/xen-4.18.0/tools/qemu-xen/include/hw/xen/xen_native.h

Then the file  /tmp/xen-operations.i was created and is uploaded here.
I have a script of my session saved in my Bug transition directory which I can provide in the future if need be.
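
Note that the unanchored pattern s/error/warning/ rewrites the first occurrence of "error" on every line of the file, not only the #error directive. A narrower variant, as a sketch: the function name downgrade_guard is hypothetical, and in-place -i without a suffix assumes GNU sed.

```shell
#!/bin/sh
# Hypothetical helper: turn only the xen_native.h include-order guard
# from #error into #warning, leaving every other line untouched.
downgrade_guard() {
    header=$1
    # The address selects the one line carrying the guard message;
    # the substitution then applies only at the start of that line.
    sed -i '/xen_native\.h before other Xen headers/s/^#error/#warning/' "$header"
}
```

Called with the path to xen_native.h, this changes the guard line and nothing else, so any other line that happens to contain the word "error" stays as it was.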
Comment 20 David Woodhouse 2024-02-02 17:27:18 UTC
D'oh. Sorry, you *said* it was coming from the command line. The preprocessed source wasn't necessary at all.

So... *why* is the Gentoo build putting -D__XEN_INTERFACE_VERSION__=… into CFLAGS when building QEMU? That seems wrong. I can't quite see where it's coming from.
Comment 21 John L. Poole 2024-02-02 18:17:11 UTC
(In reply to David Woodhouse from comment #20)
> D'oh. Sorry, you *said* it was coming from the command line. The
> preprocessed source wasn't necessary at all.
> 
> So... *why* is the Gentoo build putting -D__XEN_INTERFACE_VERSION__=… into
> CFLAGS when building QEMU? That seems wrong. I can't quite see where it's
> coming from.

That's why I started to try to determine where __XEN_INTERFACE_VERSION__ was being set and I concluded it was coming from what appears to be a global/superior setting of __XEN_LATEST_INTERFACE_VERSION__ (nota bene "LATEST") and I tried to determine where that was being set.  Realizing I was getting in over my head, I thought to take a more practical approach and compare the time lines of the Xen Project's session of its Makefile vs. Gentoo's emerge of app-emulation/xen-tools session.  I believe the xen-tools selectively omits the "xen" build and proceeds right with the tools.  My time line referenced above suggests this.

I then concluded that something must be set in the processing of the core xen module that carries over in the environment so when the tools are built, there is a value there or not (I'm still not clear on what your test is testing).  That's when I punted and tried remming out your 3-line headers check and then determined that your check was something new introduced in 4.18 (which hasn't officially been pursued by the Gentoo Xen team as they do not have the bandwidth) and that the check had not occurred in 4.17.3 and presumably all prior versions.  When you advised that unpredictable results could occur without that check, I began to wonder if all prior xen instances in Gentoo were subject to this fault, and possibly they were and nobody's been able to figure it out.  It's my impression that very few people use Gentoo's Xen package.

So, here we are.  I'm wondering if the Gentoo Xen Team might consider changing their paradigm of building tools first and then xen so as to bring their process into alignment with The Xen Project's approach: xen, then tools.  But, I can see the merit of making sure your tool set is solid before undertaking the compilation of xen.  That is a policy decision made long ago, presumably by people no longer associated with the Xen Team in Gentoo, and might be reconsidered.  I say presumably because nobody has come forward on this bug, which ought to be of great interest to someone involved in the decision to bifurcate, in reverse, the build process.

I was hoping David W. might be able to assert what the headers should be and perhaps have an "if" clause, instead of #error, which either set the desired headers or #warns that you are using the desired headers instead of what is found in the environment?

I think it's time for people higher up in the Gentoo community to chime in as there are policy considerations here and I do not think it a good use of David Woodhouse's time to try and troubleshoot an issue which arises only because Gentoo has decided to reverse the sequence of building.  I'll alert Sam and Roy and invite them to take a look at this comment.

David Woodhouse: thank you for what time you have spent, let's see what the elders of Gentoo have to say about this.  Of course, if you can think of an easy fix, that would be great.  I'm inclined to change my patch to replace #error with #warning so at least the build can go forward as it apparently has for the last decade.  Of course, the caveat to whoever decided to "release" 4.18, is your warning that misalignment of headers can produce unpredictable results.