Created attachment 568246 [details]
Serial Console Log March 3 - March 8 of many boots

I've been wrestling with getting Xen to boot on my Intel Atom. At boot-up of the Xen kernel, it often hangs at the point where ExtINT is being masked on the various CPUs. The stop point appears to be random. Sometimes I get lucky and make it past the ExtINT stage, and the Xen kernel boots successfully. (I still do not get a login: prompt and have to access the instance via ssh.)

What is novel here is that the hang can occur after "masked ExtINT" on CPU#1 or #2 or #3, etc. There is no pattern. I'm attaching my serial console log started March 3, 2019, which has been in APPEND mode since then.

From my serial console log started 2019.03.03 09:51:00, here are some later points where boot of the kernel hung, in chronological order:

Line#  Entry
44254  (XEN) [2019-03-09 05:29:44] masked ExtINT on CPU#1
       reboot
44448  (XEN) [2019-03-09 05:31:38] masked ExtINT on CPU#1
       reboot
44644  (XEN) [2019-03-09 05:32:59] masked ExtINT on CPU#3
       reboot
44867  (XEN) [2019-03-09 05:34:26] masked ExtINT on CPU#3

Note, I can successfully boot into a normal kernel:

zeta / # uname -a
Linux zeta 4.19.23-gentoo #8 SMP Mon Mar 4 20:48:52 PST 2019 x86_64 Intel(R) Atom(TM) CPU C2750 @ 2.40GHz GenuineIntel GNU/Linux
zeta / #

I have another bug pending for app-emulation/xen-tools-4.12.0_rc4 in Bug #679824.

I can provide you my /boot/grub/grub.cfg, /etc/default/grub, and my kernel /usr/src/linux/.config. Just let me know what you'd like.

I had posted to the Xen Users mailing list and received no response. See:
https://lists.xenproject.org/archives/html/xen-users/2019-03/msg00006.html
https://lists.xenproject.org/archives/html/xen-users/2019-03/msg00018.html

I direct your attention to my msg00018 posting, where I report a possible theory (spurious interrupt). My postings on the Xen Users list also identify where in the Xen code the problem may be occurring.
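For anyone who wants to pull hang points like the ones listed above out of a long append-mode serial log, here is a minimal Python sketch. It assumes every boot attempt begins with a line containing "(XEN) Xen version" (as in the logs in this bug) and that timestamps are enabled; the function name is made up for illustration and is not part of any Xen tooling.

```python
import re

# Matches the hypervisor's per-CPU message, e.g.
# "(XEN) [2019-03-09 05:29:44] masked ExtINT on CPU#1"
EXTINT = re.compile(r"\(XEN\) \[([^\]]+)\] masked ExtINT on CPU#(\d+)")

# Assumption: each boot attempt starts with a line containing this banner.
BANNER = "(XEN) Xen version"


def last_extint_per_boot(lines):
    """Return one entry per boot attempt found in the log.

    Each entry is (line_no, timestamp, cpu) for the final 'masked ExtINT'
    message seen before the next boot banner or the end of the log.
    None marks a boot attempt that printed no such message at all.
    """
    results = []
    current = None
    seen_banner = False
    for line_no, line in enumerate(lines, start=1):
        if BANNER in line:
            if seen_banner:
                results.append(current)  # close the previous boot attempt
            seen_banner = True
            current = None
            continue
        match = EXTINT.search(line)
        if match and seen_banner:
            current = (line_no, match.group(1), int(match.group(2)))
    if seen_banner:
        results.append(current)  # close the final boot attempt
    return results
```

On this 8-core board a boot whose last reported CPU is below 7, or that has no entry at all, is a candidate hang. The sketch only narrows down where to look; it cannot tell a hang from a log that was simply cut short.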
I realize this issue may really be an issue within Xen, but I thought I'd start here just for the record.
So this is a new installation where no Xen version works? Have you tried legacy BIOS boot? Disabling Xen security features (XPTI, etc.)? Does another OS with Xen work on it?
Responding to the questions of Tomáš Mózes, 2019-03-09 11:07:27 UTC:

1) This is an upgrade installation, not a new installation. I purchased a Supermicro Intel Atom based unit in October 2016 and then undertook to install Xen. I had lots of problems trying to use Gentoo as the DOM0. I did successfully install using Debian, the distribution then recommended by the Xen Project on their wiki. But I wanted Gentoo, and I wanted to help clear the way for others desiring Gentoo instead of Debian. I ran into problems; for instance, see Gentoo Bug #601872 "xen.gz Kernel Load And Hangs". My attempts around December 2016 revealed there was a bug in binutils or something re: "COFF and never PE" - see https://lists.xenproject.org/archives/html/xen-devel/2016-12/msg00815.html. (Jan Beulich ?) I got around that issue by patching and/or using a patched package, then I ran into another problem with Grub. I ended up having a dialog with the person who seemed to specialize in Xen and Grub (who also worked for my employers in another division) and who advised that Grub was not quite ready for launching Xen and might not be until Spring 2017. So I cut my losses and adopted an EFI console procedure where I manually loaded the kernel. That manual procedure served me reliably. Xen 4.7.1 or thereabouts was the version I was using. I do not recall ever running into the masked ExtINT issue then. From December 2018 through now, I thought I'd try to get Grub to work, assuming the milestone of getting Grub to load Xen had been reached by now.
Specifications of this server are:

Product SKU: SuperServer 5018A-TN4 (Black)
Motherboard: Super A1SAi-2750F
Processor/Cache: CPU Intel® Atom® Processor C2750
                 CPU TDP 20W (8-Core)
                 FCBGA 1283
                 System-on-Chip
System Memory: 4x 204-pin DDR3 SO-DIMM slots
               Supports up to 64GB DDR3 ECC memory

From the sale Quote 11/1/2016:
SYS-5018A-TN4-OTO-50
--OPTIMIZED SYS-5018A-TN4(x1)A1SAi-2750F, 504-203B
--MEM-DR316L-CL02-ES16(x4)16GB DDR3-1600 1.35V 2RX8 ECC SODIMM
--HDD-T4000-MG04ACA400E(x1)[NR]Toshiba 3.5" 4TB SATA 6Gb/s 7.2K RPM 128M 512E

2) I have not tried legacy BIOS. I recall looking into this option and learning that "legacy BIOS" is just a mode UEFI runs to simulate BIOS. Had I the option of truly replacing UEFI with BIOS, I think I would have chosen to go with legacy BIOS.

3) I have not tried disabling XPTI. I do not know what that is, but I'll look into it, give it a try, and update this bug with my findings.

4) Yes, Debian in 2016 worked; I was able to boot into DOM0 without incident.
Created attachment 568328 [details] archive of grub & Linux .config Attaching more information: zeta /home/jlpoole/gentoobugs/679824 # tar -cjvf gentoo_bug_679824_addl.tar.bz2 addl addl/ addl/default_grub_201903090841 addl/boot_grub_grub_201903090842.cfg addl/linux_201903090840.config zeta /home/jlpoole/gentoobugs/679824 #
I tried modifying my command line by adding:

xpti=false

per https://xenbits.xen.org/docs/unstable/misc/xen-command-line.html#xpti-x86 which provides:

1.2.186 xpti (x86)
= List of [ default | <boolean> | dom0=<bool> | domu=<bool> ]
Default: false on hardware known not to be vulnerable to Meltdown (e.g. AMD)
Default: true everywhere else
Override default selection of whether to isolate 64-bit PV guest page tables.
true activates page table isolation even on hardware not vulnerable by Meltdown for all domains.
false deactivates page table isolation on all systems for all domains.
default sets the default behaviour.
With dom0 and domu it is possible to control page table isolation for dom0 or guest domains only.

Here are snippets from my log of my attempt:

(XEN) Command line: placeholder vga=gfx-1024x768x16 com1=115200,8n1,pci console=com1,vga console_timestamps console_to_ring conring_size=64 log_buf_len=16M loglvl=all guest_loglvl=all sync_console=true sched_debug iommu=verbose apic_verbosity=verbose xpti=false no-real-mode edd=off
...
(XEN) [2019-03-09 17:52:09] HVM: HAP page sizes: 4kB, 2MB
(XEN) [2019-03-09 17:52:05] masked ExtINT on CPU#1
(XEN) [2019-03-09 17:52:05] masked ExtINT on CPU#2
[HUNG]
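Because the "(XEN) Command line:" entry shows everything GRUB actually passed, including options that a grub.d script added on its own, it can help to split it into individual options before comparing attempts. A minimal Python sketch follows; the function name is made up for illustration, and the splitting is a simplification of whatever Xen's own parser does.

```python
def parse_xen_cmdline(cmdline):
    """Split a logged Xen command line into {option: value}.

    Options written as key=value keep the value as a string; bare flags
    such as "no-real-mode" map to True. This is only whitespace and "="
    splitting, not a reimplementation of Xen's option parser.
    """
    options = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        options[key] = value if sep else True
    return options


# The command line exactly as logged in the attempt above.
logged = (
    "placeholder vga=gfx-1024x768x16 com1=115200,8n1,pci console=com1,vga "
    "console_timestamps console_to_ring conring_size=64 log_buf_len=16M "
    "loglvl=all guest_loglvl=all sync_console=true sched_debug iommu=verbose "
    "apic_verbosity=verbose xpti=false no-real-mode edd=off"
)
options = parse_xen_cmdline(logged)
```

Looking at `options` makes it easy to confirm that `xpti=false` arrived, and also to notice options such as `no-real-mode` and `edd` that were not typed in by hand.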
I have been trying several times to start DOM0, and when I finally get past the masking of ExtINT, I'm able to log in remotely via ssh. The console on the server and the serial port do not show a login prompt. The last entry on the serial console is "* Starting local ..." followed by "[ ok ]". Then nothing more appears on the serial console until I perform a shutdown from another console. But that's a minor point.

Aside from that, I was configuring a bridge when my ssh session hung. The serial console showed this error message:

* Starting local ... [ ok ]
[ 220.213278] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0]
(XEN) [2019-03-11 03:16:13] Watchdog timer fired for domain 0
(XEN) [2019-03-11 03:16:13] Hardware Dom0 shutdown: watchdog rebooting machine

Lastly, I cannot say definitively, but if I go through the following sequence, I seem to have better luck getting a successful launch of the kernel: in grub, select either of the XEN menu entries, then press "e" to edit. Scroll a line or two, then press Ctrl-X. Hit return for the two default menu entries that appear thereafter. Then the (XEN) reporting starts.
Tomas Mozes brought to my attention this thread where Juergen Gross-3 on Jan 11, 2019 suggested setting "pcid=false": http://xen.1045712.n5.nabble.com/xen-domU-segfaults-with-xpti-on-intel-based-systems-td5744423.html So I tried adding "pcid=false" and the boot still hangs around the same place: Booting a command listBooting a command list Loading Xen xen ...Loading Xen xen ... WARNING: no console will be available to OSWARNING: no console will be available to OS Loading Linux x86_64-4.19.23-gentoo ...Loading Linux x86_64-4.19.23-gentoo ... Loading initial ramdisk ...Loading initial ramdisk ... error: no suitable video mode found. error: no suitable video mode found. Xen 4.11.1 (XEN) Xen version 4.11.1 (@[unknown]) (x86_64-pc-linux-gnu-gcc (Gentoo 7.3.0-r3 p1.4) 7.3.0) debug=n Wed Mar 6 19:34:00 PST 2019 (XEN) Latest ChangeSet: (XEN) Console output is synchronous. (XEN) Bootloader: GRUB 2.02 (XEN) Command line: placeholder pcid=false vga=gfx-1024x768x16 com1=115200,8n1,pci console=com1,vga console_timestamps console_to_ring conring_size=64 log_buf_len=16M loglvl=all guest_loglvl=all sync_console=true sched_debug iommu=verbose apic_verbosity=verbose xpti=false no-real-mode edd=off (XEN) Xen image load base address: 0 (XEN) Video information: (XEN) VGA is text mode 80x25, font 8x16 (XEN) Disc information: (XEN) Found 0 MBR signatures (XEN) Found 0 EDD information structures (XEN) Multiboot-e820 RAM map: (XEN) 0000000000000000 - 00000000000a0000 (usable) (XEN) 0000000000100000 - 000000007e16d000 (usable) (XEN) 000000007e16d000 - 000000007eba4000 (reserved) (XEN) 000000007eba4000 - 000000007ed12000 (usable) (XEN) 000000007ed12000 - 000000007f28d000 (ACPI NVS) (XEN) 000000007f28d000 - 000000007f5f3000 (reserved) (XEN) 000000007f5f3000 - 000000007f648000 type 20 (XEN) 000000007f648000 - 000000007f800000 (usable) (XEN) 00000000e0000000 - 00000000e4000000 (reserved) (XEN) 00000000fed01000 - 00000000fed04000 (reserved) (XEN) 00000000fed08000 - 00000000fed09000 (reserved) 
(XEN) 00000000fed0c000 - 00000000fed10000 (reserved) (XEN) 00000000fed1c000 - 00000000fed1d000 (reserved) (XEN) 00000000fef00000 - 00000000ff000000 (reserved) (XEN) 00000000ff800000 - 0000000100000000 (reserved) (XEN) 0000000100000000 - 0000000ff0000000 (usable) (XEN) New Xen image base address: 0x7da00000 (XEN) ACPI Error (tbxfroot-0217): A valid RSDP was not found [20070126] (XEN) System RAM: 63204MB (64721100kB) (XEN) No NUMA configuration found (XEN) Faking a node at 0000000000000000-0000000ff0000000 (XEN) Domain heap initialised (XEN) Allocated console ring of 64 KiB. (XEN) CPU Vendor: Intel, Family 6 (0x6), Model 77 (0x4d), Stepping 8 (raw 000406d8) (XEN) found SMP MP-table at 000fd8a0 (XEN) DMI 2.7 present. (XEN) Using APIC driver default (XEN) Intel MultiProcessor Specification v1.4 (XEN) Virtual Wire compatibility mode. (XEN) OEM ID: A M I Product ID: ALASKA APIC at: 0xfee00000 (XEN) Processor #00 6:13 APIC version 20 (XEN) Processor #02 6:13 APIC version 20 (XEN) Processor #04 6:13 APIC version 20 (XEN) Processor #06 6:13 APIC version 20 (XEN) Processor #08 6:13 APIC version 20 (XEN) Processor #0a 6:13 APIC version 20 (XEN) Processor #0c 6:13 APIC version 20 (XEN) Processor #0e 6:13 APIC version 20 (XEN) I/O APIC #2 Version 32 at 0xfec00000. (XEN) Enabling APIC mode: Flat. 
Using 1 I/O APICs (XEN) Processors: 8 (XEN) SMP: Allowing 8 CPUs (0 hotplug CPUs) (XEN) mapped APIC to ffff82cfffffb000 (fee00000) (XEN) mapped IOAPIC to ffff82cfffffa000 (fec00000) (XEN) IRQ limits: 24 GSI, 1528 MSI/MSI-X (XEN) CPU0: Intel machine check reporting enabled (XEN) Unrecognised CPU model 0x4d - assuming not reptpoline safe (XEN) Speculative mitigation facilities: (XEN) Hardware features: (XEN) Compiled-in support: INDIRECT_THUNK SHADOW_PAGING (XEN) Xen settings: BTI-Thunk RETPOLINE, SPEC_CTRL: No, Other: (XEN) Support for VMs: PV: RSB, HVM: RSB (XEN) XPTI (64-bit PV only): Dom0 disabled, DomU disabled (XEN) PV L1TF shadowing: Dom0 disabled, DomU disabled (XEN) Using scheduler: SMP Credit Scheduler (credit) (XEN) Platform timer is 1.193MHz PIT (XEN) Detected 2400.052 MHz processor. (XEN) Initing memory sharing. (XEN) alt table ffff82d08042a838 -> ffff82d08042c5ce (XEN) I/O virtualisation disabled (XEN) nr_sockets: 1 (XEN) enabled ExtINT on CPU#0 (XEN) ENABLING IO-APIC IRQs (XEN) -> Using new ACK method (XEN) init IO_APIC IRQs (XEN) IO-APIC (apicid-pin) 2-0, 2-6, 2-7, 2-10, 2-11, 2-12, 2-15 not connected. (XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1 (XEN) number of MP IRQ sources: 39. (XEN) number of IO-APIC #2 registers: 24. (XEN) testing the IO APIC....................... (XEN) IO APIC #2...... (XEN) .... register #00: 02000000 (XEN) ....... : physical APIC id: 02 (XEN) ....... : Delivery Type: 0 (XEN) ....... : LTS : 0 (XEN) .... register #01: 00170020 (XEN) ....... : max redirection entries: 0017 (XEN) ....... : PRQ implemented: 0 (XEN) ....... : IO APIC version: 0020 (XEN) .... 
IRQ redirection table: (XEN) NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect: (XEN) 00 000 00 1 0 0 0 0 0 0 00 (XEN) 01 001 01 0 0 0 0 0 1 1 28 (XEN) 02 001 01 0 0 0 0 0 1 1 F0 (XEN) 03 001 01 0 0 0 0 0 1 1 30 (XEN) 04 001 01 1 0 0 0 0 1 1 F1 (XEN) 05 001 01 0 0 0 0 0 1 1 38 (XEN) 06 000 00 1 0 0 0 0 0 0 00 (XEN) 07 000 00 1 0 0 0 0 0 0 00 (XEN) 08 001 01 0 0 0 0 0 1 1 40 (XEN) 09 001 01 1 1 0 0 0 1 1 48 (XEN) 0a 000 00 1 0 0 0 0 0 0 00 (XEN) 0b 000 00 1 0 0 0 0 0 0 00 (XEN) 0c 000 00 1 0 0 0 0 0 0 00 (XEN) 0d 001 01 0 0 0 0 0 1 1 50 (XEN) 0e 001 01 0 0 0 0 0 1 1 58 (XEN) 0f 000 00 1 0 0 0 0 0 0 00 (XEN) 10 001 01 1 1 0 1 0 1 1 60 (XEN) 11 001 01 1 1 0 1 0 1 1 68 (XEN) 12 001 01 1 1 0 1 0 1 1 70 (XEN) 13 001 01 1 1 0 1 0 1 1 78 (XEN) 14 001 01 1 1 0 1 0 1 1 88 (XEN) 15 001 01 1 1 0 1 0 1 1 90 (XEN) 16 001 01 1 1 0 1 0 1 1 98 (XEN) 17 001 01 1 1 0 1 0 1 1 A0 (XEN) Using vector-based indexing (XEN) IRQ to pin mappings: (XEN) IRQ240 -> 0:2 (XEN) IRQ40 -> 0:1 (XEN) IRQ48 -> 0:3 (XEN) IRQ241 -> 0:4 (XEN) IRQ56 -> 0:5 (XEN) IRQ64 -> 0:8 (XEN) IRQ72 -> 0:9 (XEN) IRQ80 -> 0:13 (XEN) IRQ88 -> 0:14 (XEN) IRQ96 -> 0:16 (XEN) IRQ104 -> 0:17 (XEN) IRQ112 -> 0:18 (XEN) IRQ120 -> 0:19 (XEN) IRQ136 -> 0:20 (XEN) IRQ144 -> 0:21 (XEN) IRQ152 -> 0:22 (XEN) IRQ160 -> 0:23 (XEN) .................................... done. (XEN) Using local APIC timer interrupts. (XEN) calibrating APIC timer ... (XEN) ..... CPU clock speed is 2400.0484 MHz. (XEN) ..... host bus clock speed is 100.0019 MHz. (XEN) ..... 
bus_scale = 0x6669
(XEN) TSC deadline timer enabled
(XEN) [2019-03-11 11:54:13] mwait-idle: MWAIT substates: 0x3000020
(XEN) [2019-03-11 11:54:13] mwait-idle: v0.4.1 model 0x4d
(XEN) [2019-03-11 11:54:13] mwait-idle: lapic_timer_reliable_states 0xffffffff
(XEN) [2019-03-11 11:54:13] VMX: Supported advanced features:
(XEN) [2019-03-11 11:54:13] - APIC MMIO access virtualisation
(XEN) [2019-03-11 11:54:13] - APIC TPR shadow
(XEN) [2019-03-11 11:54:13] - Extended Page Tables (EPT)
(XEN) [2019-03-11 11:54:13] - Virtual-Processor Identifiers (VPID)
(XEN) [2019-03-11 11:54:13] - Virtual NMI
(XEN) [2019-03-11 11:54:13] - MSR direct-access bitmap
(XEN) [2019-03-11 11:54:13] - Unrestricted Guest
(XEN) [2019-03-11 11:54:13] - VM Functions
(XEN) [2019-03-11 11:54:13] HVM: ASIDs enabled.
(XEN) [2019-03-11 11:54:13] HVM: VMX enabled
(XEN) [2019-03-11 11:54:13] HVM: Hardware Assisted Paging (HAP) detected
(XEN) [2019-03-11 11:54:13] HVM: HAP page sizes: 4kB, 2MB
(XEN) [2019-03-11 11:54:09] masked ExtINT on CPU#1
(XEN) [2019-03-11 11:54:09] masked ExtINT on CPU#2
[HUNG]

Also, my perceived routine of going into edit mode, making no edits, invoking the command with Ctrl-X, and then accepting the default menu entries for XEN does not seem to make a difference in whether I get past the "masked ExtINT" issue.

Lastly, I am still able to boot a regular Gentoo kernel and run in non-Xen mode without incident. I was beginning to wonder if I have a hardware issue, but my ability to launch a regular Gentoo kernel suggests the problem is something in the Xen kernel that is not properly handling a CPU interrupt, or whatever is going on in apic.c's function setup_local_APIC(void):
https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/apic.c;h=2a2432619e3edce2cdbc275abbd4e80ffcdcd9f0;hb=HEAD#l524
For anyone following this bug and wanting to learn more about interrupts, here is an explanation of masking interrupts from the IA-32 Intel® Architecture Software Developer’s Manual (dated 2001), a copy of which is at:
https://www.cs.cmu.edu/~410/doc/intel-sys.pdf

5.1.1.2. MASKABLE HARDWARE INTERRUPTS
Any external interrupt that is delivered to the processor by means of the INTR pin or through the local APIC is called a maskable hardware interrupt. The maskable hardware interrupts that can be delivered through the INTR pin include all IA-32 architecture defined interrupt vectors from 0 through 255; those that can be delivered through the local APIC include interrupt vectors 16 through 255. [sheet 140]

Sheet 146 of the same manual has "5.6.1 Masking Maskable Hardware Interrupts".

I guess I'll explore the BIOS settings for my processor and see if there is any configuration which affects the handling of interrupts and/or UEFI.
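To connect that to the log messages: the "enabled ExtINT" / "masked ExtINT" lines report what is written to each CPU's local APIC LVT LINT0 entry. Below is a small Python model of that decision as I understand it from the Linux-derived logic. Treat it as an assumption to check against setup_local_APIC() in apic.c rather than a transcription of it. The two constants follow the local vector table layout in the Intel manual (delivery mode in bits 8-10, mask in bit 16); the function and parameter names are made up for illustration.

```python
APIC_DM_EXTINT = 0x700     # delivery mode field (bits 8-10) = 111b, ExtINT
APIC_LVT_MASKED = 1 << 16  # mask bit of a local vector table entry


def lvt0_value(cpu, pic_mode, lvt0_was_masked):
    """Model of the LINT0 setup choice (assumed, not copied from Xen).

    Only the boot CPU (cpu 0) keeps ExtINT unmasked, and only when the
    platform is in PIC mode or LINT0 was not already masked. Every other
    CPU gets ExtINT with the mask bit set.
    Returns (register value, message as it would appear in the log).
    """
    if cpu == 0 and (pic_mode or not lvt0_was_masked):
        return APIC_DM_EXTINT, "enabled ExtINT on CPU#%d" % cpu
    return APIC_DM_EXTINT | APIC_LVT_MASKED, "masked ExtINT on CPU#%d" % cpu
```

If this model is right, "masked ExtINT on CPU#N" for N >= 1 is the normal message on a healthy boot, so the message itself is only the last thing printed before the hang and not necessarily its cause.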
I altered a setting in BIOS:

Extended APIC: from enabled to disabled

Boot-up hung after:

(XEN) [2019-03-11 13:04:00] HVM: HAP page sizes: 4kB, 2MB
(XEN) [2019-03-11 13:03:56] masked ExtINT on CPU#1
I tried setting apic_verbosity to its other setting of "debug" to see if any more information was output around the hanging point.

Conclusion: no difference between "verbose" and "debug" for apic_verbosity.

Here are my two attempts (in the 2nd I removed the xpti parameter) and their final output:

(XEN) Command line: placeholder vga=gfx-1024x768x16 com1=115200,8n1,pci console=com1,vga console_timestamps console_to_ring conring_size=64 log_buf_len=16M loglvl=all guest_loglvl=all sync_console=true sched_debug iommu=verbose apic_verbosity=debug xpti=false no-real-mode edd=off
(XEN) [2019-03-11 18:00:42] HVM: HAP page sizes: 4kB, 2MB
(XEN) [2019-03-11 18:00:38] masked ExtINT on CPU#1
(XEN) [2019-03-11 18:00:38] masked ExtINT on CPU#2

(XEN) Command line: placeholder vga=gfx-1024x768x16 com1=115200,8n1,pci console=com1,vga console_timestamps console_to_ring conring_size=64 log_buf_len=16M loglvl=all guest_loglvl=all sync_console=true sched_debug iommu=verbose apic_verbosity=debug no-real-mode edd=off
(XEN) [2019-03-11 18:02:58] HVM: ASIDs enabled.
(XEN) [2019-03-11 18:02:58] HVM: VMX enabled
(XEN) [2019-03-11 18:02:58] HVM: Hardware Assisted Paging (HAP) detected
(XEN) [2019-03-11 18:02:58] HVM: HAP page sizes: 4kB, 2MB
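When trying one parameter at a time, it is easy to lose track of what actually differed between two boots. A minimal Python sketch that compares two logged command lines token by token is shown here; the function name is made up for illustration.

```python
def cmdline_diff(old, new):
    """Return (removed, added) option tokens between two command lines.

    Order is preserved and tokens are compared as whole strings, so a
    changed value shows up as one removal plus one addition.
    """
    old_tokens, new_tokens = old.split(), new.split()
    removed = [tok for tok in old_tokens if tok not in new_tokens]
    added = [tok for tok in new_tokens if tok not in old_tokens]
    return removed, added


# The two command lines exactly as logged in the attempts above.
first = (
    "placeholder vga=gfx-1024x768x16 com1=115200,8n1,pci console=com1,vga "
    "console_timestamps console_to_ring conring_size=64 log_buf_len=16M "
    "loglvl=all guest_loglvl=all sync_console=true sched_debug iommu=verbose "
    "apic_verbosity=debug xpti=false no-real-mode edd=off"
)
second = (
    "placeholder vga=gfx-1024x768x16 com1=115200,8n1,pci console=com1,vga "
    "console_timestamps console_to_ring conring_size=64 log_buf_len=16M "
    "loglvl=all guest_loglvl=all sync_console=true sched_debug iommu=verbose "
    "apic_verbosity=debug no-real-mode edd=off"
)
removed, added = cmdline_diff(first, second)
```

For these two attempts the only difference is the xpti option, which supports reading the pair as a single-variable test.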
I propose creating a patch for the Xen kernel code which provides more details of the events leading up to the hang. I feel competent to insert print statements after important events in various *.c files, e.g. apic.c and setup.c. It's been years since I've done something like this. What I intend to do is create a custom copy of the ebuild under /usr/local/portage... and keep the patches in a subdirectory.

The problem I am encountering looks like something that should be of interest to the Xen code maintainers. I'd like to make it as easy as possible for them to focus on this so a resolution or analysis can be made.

Suggestions? Comments?
Tomas Mozes noted a search of "xen efi supermicro masked ExtINT on CPU#1" gives https://lists.xenproject.org/archives/html/xen-devel/2015-12/msg00653.html and possible work-around. I tried adding "efi=no-rs" and "reboot=acpi" to the kernel command line and the boot still hung. Subsequent responses to the referenced posting suggest using only the "reboot=acpi", so I tried that alone, as well. The result was the same: Below is a log including JLPDEBUG statements I added to isolate the point of failure. Xen 4.11.1 (XEN) Xen version 4.11.1 (@[unknown]) (x86_64-pc-linux-gnu-gcc (Gentoo 7.3.0-r3 p1.4) 7.3.0) debug=n Mon Mar 11 20:57:43 PDT 2019 (XEN) Latest ChangeSet: (XEN) Console output is synchronous. (XEN) Bootloader: GRUB 2.02 (XEN) Command line: placeholder vga=gfx-1024x768x16 com1=115200,8n1,pci console=com1,vga console_timestamps console_to_ring conring_size=64 log_buf_len=16M loglvl=all guest_loglvl=all sync_console=true sched_debug iommu=verbose apic_verbosity=debug xpti=false no-real-mode edd=off efi=no-rs reboot=acpi ... (XEN) [2019-03-12 05:04:21] HVM: Hardware Assisted Paging (HAP) detected (XEN) [2019-03-12 05:04:21] HVM: HAP page sizes: 4kB, 2MB (XEN) [2019-03-12 05:04:17] JLPDEBUG 527 starting setup_local_APIC() (XEN) [2019-03-12 05:04:17] JLPDEBUG 535 after pounding w/big hammer. (XEN) [2019-03-12 05:04:17] JLPDEBUG 550 after init_apc_ldr() (XEN) [2019-03-12 05:04:17] JLPDEBUG 555 starting after apic_write (XEN) [2019-03-12 05:04:17] JLPDEBUG 574 after for loop (XEN) [2019-03-12 05:04:17] JLPDEBUG 627 after apic_write. 
(XEN) [2019-03-12 05:04:17] masked ExtINT on CPU#1 (XEN) [2019-03-12 05:04:17] JLPDEBUG 649 after apic_write CPU#1 (XEN) [2019-03-12 05:04:17] JLPDEBUG 658 after if CPU#1 (XEN) [2019-03-12 05:04:17] JLPDEBUG 662 after apic_write CPU#1 (XEN) [2019-03-12 05:04:17] JLPDEBUG 673 after apic_write CPU#1 (XEN) [2019-03-12 05:04:17] JLPDEBUG 680 after apic_write CPU#1 (XEN) [2019-03-12 05:04:17] JLPDEBUG 700 after apic_pm_activate() CPU#1 (XEN) [2019-03-12 05:04:17] JLPDEBUG 527 starting setup_local_APIC() (XEN) [2019-03-12 05:04:17] JLPDEBUG 535 after pounding w/big hammer. (XEN) [2019-03-12 05:04:17] JLPDEBUG 550 after init_apc_ldr() (XEN) [2019-03-12 05:04:17] JLPDEBUG 555 starting after apic_write (XEN) [2019-03-12 05:04:17] JLPDEBUG 574 after for loop (XEN) [2019-03-12 05:04:17] JLPDEBUG 627 after apic_write. (XEN) [2019-03-12 05:04:17] masked ExtINT on CPU#2 (XEN) [2019-03-12 05:04:17] JLPDEBUG 649 after apic_write CPU#2 (XEN) [2019-03-12 05:04:17] JLPDEBUG 658 after if CPU#2 (XEN) [2019-03-12 05:04:17] JLPDEBUG 662 after apic_write CPU#2 (XEN) [2019-03-12 05:04:17] JLPDEBUG 673 after apic_write CPU#2 (XEN) [2019-03-12 05:04:17] JLPDEBUG 680 after apic_write CPU#2 (XEN) [2019-03-12 05:04:17] JLPDEBUG 700 after apic_pm_activate() CPU#2 (XEN) [2019-03-12 05:04:17] JLPDEBUG 527 starting setup_local_APIC() (XEN) [2019-03-12 05:04:17] JLPDEBUG 535 after pounding w/big hammer. (XEN) [2019-03-12 05:04:17] JLPDEBUG 550 after init_apc_ldr() (XEN) [2019-03-12 05:04:17] JLPDEBUG 555 starting after apic_write (XEN) [2019-03-12 05:04:17] JLPDEBUG 574 after for loop (XEN) [2019-03-12 05:04:17] JLPDEBUG 627 after apic_write. 
(XEN) [2019-03-12 05:04:17] masked ExtINT on CPU#3 (XEN) [2019-03-12 05:04:17] JLPDEBUG 649 after apic_write CPU#3 (XEN) [2019-03-12 05:04:17] JLPDEBUG 658 after if CPU#3 (XEN) [2019-03-12 05:04:17] JLPDEBUG 662 after apic_write CPU#3 (XEN) [2019-03-12 05:04:17] JLPDEBUG 673 after apic_write CPU#3 (XEN) [2019-03-12 05:04:17] JLPDEBUG 680 after apic_write CPU#3 (XEN) [2019-03-12 05:04:17] JLPDEBUG 700 after apic_pm_activate() CPU#3 [HUNG] ================= 2nd try with just "reboot=acpi" ======================== (XEN) Command line: placeholder vga=gfx-1024x768x16 com1=115200,8n1,pci console=com1,vga console_timestamps console_to_ring conring_size=64 log_buf_len=16M loglvl=all guest_loglvl=all sync_console=true sched_debug iommu=verbose apic_verbosity=debug xpti=false no-real-mode edd=off reboot=acpi (XEN) Xen image load base address: 0 ... (XEN) JLPOOLEDEBUG_smpboot 1155 before connect_bsp_APIC()<2>JLPOOLEDEBUG_smpboot 1157 before setup_local_APIC()JLPDEBUG 527 starting setup_local_APIC() (XEN) JLPDEBUG 535 after pounding w/big hammer. (XEN) JLPDEBUG 550 after init_apc_ldr() (XEN) JLPDEBUG 555 starting after apic_write (XEN) JLPDEBUG 574 after for loop (XEN) JLPDEBUG 627 after apic_write. (XEN) enabled ExtINT on CPU#0 (XEN) JLPDEBUG 649 after apic_write CPU#0 (XEN) JLPDEBUG 658 after if CPU#0 (XEN) JLPDEBUG 662 after apic_write CPU#0 (XEN) JLPDEBUG 673 after apic_write CPU#0 (XEN) JLPDEBUG 680 after apic_write CPU#0 (XEN) JLPDEBUG 700 after apic_pm_activate() CPU#0 (XEN) JLPOOLEDEBUG_smpboot 1159 before setup_io_apic()ENABLING IO-APIC IRQs (XEN) -> Using new ACK method ... (XEN) [2019-03-12 05:22:23] HVM: VMX enabled (XEN) [2019-03-12 05:22:23] HVM: Hardware Assisted Paging (HAP) detected (XEN) [2019-03-12 05:22:23] HVM: HAP page sizes: 4kB, 2MB [HUNG]
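With this many debug lines, the interesting fact is simply the last checkpoint reached before the output stops. A minimal Python sketch that extracts it from lines in the JLPDEBUG format used above is shown here; the function name is made up for illustration.

```python
import re

# Matches e.g. "JLPDEBUG 700 after apic_pm_activate() CPU#3" anywhere in a
# line; the number is the source line of the print statement in apic.c.
CHECKPOINT = re.compile(r"JLPDEBUG (\d+) (.*)")


def last_checkpoint(lines):
    """Return (source_line, text) of the final JLPDEBUG entry, or None."""
    last = None
    for line in lines:
        match = CHECKPOINT.search(line)
        if match:
            last = (int(match.group(1)), match.group(2).strip())
    return last


tail = [
    "(XEN) [2019-03-12 05:04:17] JLPDEBUG 673 after apic_write CPU#3",
    "(XEN) [2019-03-12 05:04:17] JLPDEBUG 680 after apic_write CPU#3",
    "(XEN) [2019-03-12 05:04:17] JLPDEBUG 700 after apic_pm_activate() CPU#3",
]
checkpoint = last_checkpoint(tail)
```

In the first attempt above the final checkpoint is the one at line 700 on CPU#3, the last print in the function, which indicates setup_local_APIC() ran to completion on that CPU and the hang happened somewhere after it returned.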
I think I may have found the problem. In /etc/grub.d/20_linux_xen, starting at line 119, is a "sed" insert:

    sed "s/^/$submenu_indentation/" << EOF
    echo '$(echo "$xmessage" | grub_quote)'
    if [ "\$grub_platform" = "pc" -o "\$grub_platform" = "" ]; then
        xen_rm_opts=
    else
        xen_rm_opts="no-real-mode edd=off"
        echo 'WARNING: JLPOOLE HACK of /etc/grub.d/20_linux_xen since failed id of pc'
        xen_rm_opts=
    fi
    multiboot ${rel_xen_dirname}/${xen_basename} placeholder ${xen_args} \${xen_rm_opts}
    echo '$(echo "$lmessage" | grub_quote)'
    module ${rel_dirname}/${basename} placeholder root=${linux_root_device_thisversion} ro ${args}
    EOF

The above includes my modification of the "else" clause. Early on in addressing this bug I noticed that the test in the "if" clause resulted in my grub_platform NOT being a "pc". So the else clause was triggered, and xen_rm_opts was being populated with two kernel parameters that were then included in my Xen kernel launch. When the trial of the two additional parameters "efi=no-rs" and "reboot=acpi" did not affect anything, I later remembered that the entire kernel invocation line was tainted with the two values populated in xen_rm_opts. So I went back and hacked /etc/grub.d/20_linux_xen to mimic the "then" clause.

So far I have had 3 successful boot-ups in a row by just selecting the

*Gentoo GNU/Linux, with Xen hypervisor

menu option in grub without further editing or modification. Therefore the absence of "no-real-mode edd=off" and the addition of "efi=no-rs reboot=acpi" seems to be working. I'll try 3 more boot-ups to verify.

All of this is premised upon the assumption that the test ("Grub Platform Test"):

    "\$grub_platform" = "pc" -o "\$grub_platform" = ""

should evaluate to true, which it does not on my system (UEFI). Should the Grub Platform Test evaluate to "pc" or "" on an Intel Atom based processor with UEFI?
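For reference, the unmodified branch (before the hack described above) can be modelled in a few lines of Python. This is only a model of the shell test quoted in this comment; the function name is made up, and the statement that GRUB reports "efi" on a UEFI boot is my understanding of GRUB's behaviour, not something shown in the logs here.

```python
def stock_xen_rm_opts(grub_platform):
    """Model of the unmodified test in /etc/grub.d/20_linux_xen.

    Platforms "pc" and "" (unset) get no extra options; anything else
    gets the two real-mode related options appended to the Xen line.
    """
    if grub_platform in ("pc", ""):
        return ""
    return "no-real-mode edd=off"


# Assumption: on a UEFI boot GRUB sets grub_platform to "efi", so the
# else branch is taken by design rather than because detection failed.
uefi_opts = stock_xen_rm_opts("efi")
bios_opts = stock_xen_rm_opts("pc")
```

Under that assumption the test is not expected to be true on this machine: a UEFI boot is supposed to land in the else branch and receive "no-real-mode edd=off".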
Alas: I rebooted successfully, then reset after getting past the posting of the ExtINT items, and the second time the Xen kernel hung. I then unplugged the unit and started afresh, and the Xen kernel hung again.

Here's my latest log:

Xen 4.11.1
(XEN) Xen version 4.11.1 (@[unknown]) (x86_64-pc-linux-gnu-gcc (Gentoo 7.3.0-r3 p1.4) 7.3.0) debug=n Mon Mar 11 20:57:43 PDT 2019
(XEN) Latest ChangeSet:
(XEN) Console output is synchronous.
(XEN) Bootloader: GRUB 2.02
(XEN) Command line: placeholder vga=gfx-1024x768x16 com1=115200,8n1,pci console=com1,vga console_timestamps console_to_ring conring_size=64 log_buf_len=16M loglvl=all guest_loglvl=all sync_console=true sched_debug iommu=verbose apic_verbosity=verbose efi=no-rs reboot=acpi
(XEN) Xen image load base address: 0
...
(XEN) [2019-03-12 14:30:56] HVM: VMX enabled
(XEN) [2019-03-12 14:30:56] HVM: Hardware Assisted Paging (HAP) detected
(XEN) [2019-03-12 14:30:56] HVM: HAP page sizes: 4kB, 2MB
[HUNG]
Created attachment 568922 [details] Crib Notes For Making/Deploying Patch Here are crib notes for creating the patch I made to add debug statements for xen-4.11.1-rc1
Created attachment 568924 [details, diff] Debug Patch with JLPOOLEDEBUG Here's a current patch as of March 12, 2019. I'm awaiting further word from the Xen Mailing list as to what other files the failure point could be in after completing apic.c's function. See https://lists.xenproject.org/archives/html/xen-users/2019-03/msg00026.html
Created attachment 568926 [details] output of dmidecode (637 lines)
Created attachment 568928 [details] lspci -vvv output (513 lines)
Created attachment 568930 [details]
Xen Console All Diagnostics [ key '*' (ascii '2a') => print all diagnostics]

I booted successfully and then, in my serial console, pressed Control-A three times to enter the Xen console. I then pressed "h" for help and later "*" for the complete "all diagnostics" output. Moments after the all-diagnostics output finished (line "...done"), the server rebooted of its own accord via the watchdog:

(XEN) [2019-03-13 02:14:20] .................................... done.
(XEN) [2019-03-13 02:14:39] Watchdog timer fired for domain 0
(XEN) [2019-03-13 02:14:39] Hardware Dom0 shutdown: watchdog rebooting machine
*** Bug 680472 has been marked as a duplicate of this bug. ***
I've logged Bug #680472 which arises from kernel 4.12.0. This bug, Bug #679826, arises from kernel 4.11.1. Each bug relates to a specific kernel and while 4.12.0 was marked RESOLVED, that status does not accurately depict the status of kernel 4.12.0. Since I'm running on 4.12.0 now, I'll be updating #680472 so as to keep the outputs derived from each kernel isolated.
Have you also tested linux kernel 4.14?
(In reply to Tomáš Mózes from comment #21)
> Have you also tested linux kernel 4.14?

I have not. I can do so if you indicate it will be helpful. I'd just take the 4.12.0_rc4 ebuilds and change the versions and cross my fingers. I would also like to know if I should then remain on 4.14, or revert back to 4.12.0 or whatever.
Do you pass all the dom0 kernel requirements? https://wiki.xenproject.org/wiki/Mainline_Linux_Kernel_Configs
(In reply to John L. Poole from comment #22)
> (In reply to Tomáš Mózes from comment #21)
> > Have you also tested linux kernel 4.14?
>
> I have not. I can do so if you indicate it will be helpful. I'd just take
> the 4.12.0_rc4 ebuilds and change the versions and cross my fingers. I
> would also like to know if I should then remain on 4.14, or revert back to
> 4.12.0 or whatever.

I mean the Linux kernel, not Xen.
I don't use kernel 4.19 yet, but have a few machines on 4.14 lts. Please note the difference between the xen versions (4.10, 4.11, 4.12) and the linux kernel versions (4.14, 4.19, 4.20, 5.0).
This expert analysis just came in from Xen developer Andrew Cooper at Citrix.

Conclusion:

"...the root of your problem is that Xen can't find the ACPI tables, which is either going to be a grub or a Xen build misconfiguration."

Further discussion:

"So the first problem is that there aren't any APCI tables to be found. You're presumably booting in EFI mode, but either Grub hasn't handed the SystemTable/etc to Xen, or Xen wasn't built with an EFI-capable toolchain and isn't capable of receiving them via the extended multiboot2 protocol. One way or another, this is the root of the problem."

Complete email at: https://lists.xenproject.org/archives/html/xen-devel/2019-03/msg01279.html

I'm going to hold off on pursuing the kernel changes proposed in the last hour, consider Mr. Cooper's analysis, and see what I can make of it.
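The symptom behind that analysis is visible in the boot logs earlier in this bug: an "ACPI Error ... A valid RSDP was not found" line, followed by Xen using the legacy MP table. A minimal Python sketch to check any captured boot log for the same pair of markers follows; the marker names and function name are made up for illustration, and the marker strings are copied from the log in this bug.

```python
# Marker strings copied from the boot log posted earlier in this bug.
MARKERS = {
    "rsdp_missing": "A valid RSDP was not found",
    "mp_table_used": "found SMP MP-table",
}


def boot_table_summary(lines):
    """Report which of the known marker strings occur in the log lines."""
    return {
        name: any(text in line for line in lines)
        for name, text in MARKERS.items()
    }


excerpt = [
    "(XEN) New Xen image base address: 0x7da00000",
    "(XEN) ACPI Error (tbxfroot-0217): A valid RSDP was not found [20070126]",
    "(XEN) System RAM: 63204MB (64721100kB)",
    "(XEN) found SMP MP-table at 000fd8a0",
]
summary = boot_table_summary(excerpt)
```

A boot where "rsdp_missing" comes back False after a grub or Xen rebuild would be a quick sign that the tables are now being handed over.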
For posterity, here is the version of grub I've been working with: zeta /home/jlpoole # eix grub -I [I] sys-boot/grub Available versions: (2) 2.02-r1(2/2.02-r1)^st ~2.02-r2(2/2.02-r2)^st ~2.02-r3(2/2.02-r3)^st **9999(2/9999)^st {debug device-mapper doc efiemu +fonts libzfs mount multislot nls sdl static test +themes truetype GRUB_PLATFORMS="coreboot efi-32 efi-64 emu ieee1275 loongson multiboot pc qemu qemu-mips uboot xen xen-32"} Installed versions: 2.02-r1(2/2.02-r1)^st(11:16:22 PM 03/02/2019)(fonts nls themes -debug -device-mapper -doc -efiemu -libzfs -mount -multislot -sdl -static -test -truetype GRUB_PLATFORMS="efi-64 pc xen -coreboot -efi-32 -emu -ieee1275 -loongson -multiboot -qemu -qemu-mips -uboot -xen-32") Homepage: https://www.gnu.org/software/grub/ Description: GNU GRUB boot loader zeta /home/jlpoole # Since Andrew Cooper has indicated there may be a grub issue afoot plus the fact that two years ago grub was not ready for UEFI xen support, I'm going to focus on grub for the moment. It seems to me I probably should have "multiboot", though the problem is happening solely within the early stages of the Xen kernel. Could someone from the Gentoo Xen team indicate their use flags for grub?
I have a single uefi system: sys-boot/grub-2.02-r3::gentoo was built with the following: USE="-debug -device-mapper -doc -efiemu -fonts -libzfs -mount -multislot -nls -sdl -static (-test) -themes -truetype" ABI_X86="(64)" GRUB_PLATFORMS="efi-64 -coreboot -efi-32 -emu -ieee1275 -loongson -multiboot -pc -qemu -qemu-mips -uboot -xen -xen-32" CFLAGS="" LDFLAGS="" app-emulation/xen-4.10.3-r1::gentoo was built with the following: USE="efi -custom-cflags -debug -flask" ABI_X86="(64)" CFLAGS="" LDFLAGS="" app-emulation/xen-tools-4.10.3-r1::gentoo was built with the following: USE="hvm pam qemu qemu-traditional screen -api -custom-cflags -debug -doc -flask -ocaml -ovmf -pygrub -python -sdl -static-libs -system-qemu -system-seabios" ABI_X86="(64)" PYTHON_TARGETS="python2_7" CFLAGS="-fno-strict-overflow" CXXFLAGS="-mtune=native -O2 -pipe -fno-strict-overflow" LDFLAGS=""
Upgraded grub:

zeta /home/jlpoole # eix -I grub
[I] sys-boot/grub
     Available versions:  (2) 2.02-r1(2/2.02-r1)^st ~2.02-r2(2/2.02-r2)^st (~)2.02-r3(2/2.02-r3)^st **9999(2/9999)^st
       {debug device-mapper doc efiemu +fonts libzfs mount multislot nls sdl static test +themes truetype GRUB_PLATFORMS="coreboot efi-32 efi-64 emu ieee1275 loongson multiboot pc qemu qemu-mips uboot xen xen-32"}
     Installed versions:  2.02-r3(2/2.02-r3)^st(06:47:25 PM 03/15/2019)(fonts nls themes -debug -device-mapper -doc -efiemu -libzfs -mount -multislot -sdl -static -test -truetype GRUB_PLATFORMS="efi-64 pc xen -coreboot -efi-32 -emu -ieee1275 -loongson -multiboot -qemu -qemu-mips -uboot -xen-32")
     Homepage:            https://www.gnu.org/software/grub/
     Description:         GNU GRUB boot loader
zeta /home/jlpoole #

Tried rebooting twice. Same problem.

To clarify, I think Andrew Cooper meant "ACPI", not "APCI", when he wrote "aren't any APCI tables". There seem to be some recent threads re: ACPI on the Xen list, so I'm going to research those.

In https://lists.xenproject.org/archives/html/xen-devel/2018-03/msg00524.html Andrew Cooper suggests "Upgrade Grub to 2.02". My previous grub was 2.02-r1; now I am at r3.
(In reply to John L. Poole from comment #29)
> In https://lists.xenproject.org/archives/html/xen-devel/2018-03/msg00524.html
> Andrew Cooper suggests "Upgrade Grub to 2.02". My previous grub was
> 2.02-r1, now I am at r3.

I remember installing a new HP DL360 Gen9 server 2 years ago with EFI, and it only worked while booting a normal kernel; under Xen, however, only 1 CPU was reported. That's why I switched back to legacy boot, and it worked fine.
OK, the easier path now is to explore whether my hardware will give me the opportunity to run in non-UEFI mode. I'm certain I went through this exercise before, but I did not document it.
I was unable to find any setting in the BIOS menu that allows me to be in a mode other than UEFI. Moreover, I consulted the Supermicro manual (https://www.supermicro.com/manuals/motherboard/Atom_on-chip/MNL-1568.pdf) for "B1SA4-2750F B1SA4-2550F" and there is no mention of switching BIOS modes or of a legacy BIOS. In fact, they state "The B1SA4-2750F/B1SA4-2550F Motherboard is a micro cloud motherboard optimized for the Supermicro Microblade chassis." Ironically, they also state: "This product is intended to be installed and serviced by professional technicians." I'm resigned to the fact that I am not a professional. :(
Progress: I have resorted to launching the Xen kernel from an EFI console. To get to the EFI console, I have been letting grub load, pressing "c" for the command line, and then typing "exit", which leaves grub and drops me into an EFI console session. (One can also go directly to an EFI console session before entering grub.)

Here are some highlights of what I have learned:

1) The serial console set-up I have (using Windows PuTTY 7.7.0.40 [2019] -- yes, you have to "contribute" to obtain a recent version) seems to be introducing invisible non-printing characters, e.g. \177, into the text. These characters could also have come from my Windows session of Notepad++, or from Emacs in a regular Gentoo kernel session; probably the former. So I ended up using the keyboard connected to the USB port on the Xen server to ensure that no extraneous invisible characters were introduced. Also, I do have a USB extension cable, and the cable may be causing some problems.

2) This is really quirky. I kept getting the error message below when executing a command, i.e. xen-4.12.0-rc.efi -cfg=xen.cfg, that I thought had worked years ago. I'm certain I had configuration files like "jp.cfg", but no matter what I tried, I kept getting "No configuration file found" from the just-launched xen-4.12.0-rc.efi. The error message may have been generated because of non-printing characters in the command line, which I later discovered in an editing session using nano.

fs0:\efi\gentoo> xen-4.12.0-rc.efi -cfg=xen.cfg
Xen 4.12.0-rc (c/s ) EFI loader
No configuration file found.

At any rate, I finally edited a file named xen.cfg in the EFI console (for EFI commands, see https://software.intel.com/en-us/articles/efi-shells-and-scripting/). I then launched the command "xen-4.12.0-rc.efi" by itself, with no "-cfg=..." specification, let its built-in search for a configuration file in the same directory do its job, and finally got past the "No configuration file found."
Heed this: "To illustrate the name handling, a binary named xen-4.2-unstable.efi would try xen-4.2-unstable.cfg, xen-4.2.cfg, xen-4.cfg, and xen.cfg in order." (from http://xenbits.xenproject.org/docs/unstable/misc/efi.html)

A successful load of the kernel produces this output on the serial console, and then no more:

Xen 4.12.0-rc (c/s ) EFI loader
Using configuration file 'xen.cfg'
xen-4.12.0-rc.gz: 0x000000005ad2b000-0x000000005ae49573
0x0000:0x02:0x00.0x0: ROM: 0x8000 bytes at 0x7c8bc028

Everything else, e.g. the (XEN) postings, goes only to the console attached to the Xen server, which cannot be routed to a log file. Note, I do not have this problem when launching from grub, so the serial console settings within the xen.cfg are not set correctly and I'll have to sort that out.

3) I get the same inconsistent stopping points when loading the Xen kernel from EFI. Since I cannot capture the output as it scrolls by, I have only the final postings to compare. But I'm guessing the output included the warning Andrew Cooper noted, which indicated that the kernel cannot find the correct table. This is important because it demonstrates that the Xen kernel I have built is having the problem, since grub is not part of the equation. I'm now going to review the Xen kernel configuration and the logs of the xen-tools emerge, which I believe builds the kernel (because the app-emulation/xen-tools logs are so large and the app-emulation/xen log is only a few lines).
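The name handling quoted from the Xen EFI documentation can be sketched as follows. This is only an illustration that reproduces the documented example; it is not Xen's loader code, and the function name candidate_cfg_names and the assumption that "-" and "." are the only separators between name components are mine.

```python
from typing import List


def candidate_cfg_names(efi_name: str) -> List[str]:
    """Return the .cfg names to try, most specific first.

    Assumption: trailing name components are separated by '-' or '.',
    which is enough to reproduce the example in the Xen EFI document.
    """
    # Drop the ".efi" extension if present.
    stem = efi_name[:-4] if efi_name.lower().endswith(".efi") else efi_name
    names = []
    while True:
        names.append(stem + ".cfg")
        # Find the last separator and drop everything from it onward.
        cut = max(stem.rfind("-"), stem.rfind("."))
        if cut <= 0:
            break
        stem = stem[:cut]
    return names


if __name__ == "__main__":
    for name in candidate_cfg_names("xen-4.12.0-rc.efi"):
        print(name)
```

Under these assumptions, a binary named xen-4.12.0-rc.efi ends its search at xen.cfg, which is consistent with the loader reporting "Using configuration file 'xen.cfg'" when no -cfg= argument is given.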
Created attachment 569408 [details] Boot Log (Unsuccessful) from EFI Console

The error message Andrew Cooper identified from the grub2 boot attempt:

(XEN) ACPI Error (tbxfroot-0217): A valid RSDP was not found [20070126]

is generated by xen/drivers/acpi/tables/tbxfroot.c at line 217. See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/drivers/acpi/tables/tbxfroot.c;h=18e5ad6e5a18804d80434354425dd0b7bb224e76;hb=HEAD#l217

ACPI is the Advanced Configuration and Power Interface (see Wikipedia), a power management and configuration standard for the PC developed by Intel, Microsoft and Toshiba. Per https://wiki.osdev.org/ACPI, to begin using ACPI, the operating system must look for the RSDP (Root System Description Pointer). See https://wiki.osdev.org/RSDP, especially https://wiki.osdev.org/RSDP#Detecting_the_RSDP

Line 217 is the line before the final return of tbxfroot's function acpi_tb_scan_memory_for_rsdp(u8 * start_address, u32 length). After 2 attempts to locate the root ACPI table (RSDT), the function prints this warning/error message.

In a previous posting, I concluded that the boot by EFI caused the same problem without grub2 and that therefore the problem must be in the Xen kernel. Unfortunately, at that time, when I booted using the EFI command line, all the print-outs went only to the console attached to the server, and my serial port session remained blank after the launch of the Xen kernel. I therefore could not see the above message "A valid RSDP was not found" on my console, because any such output had flown off the screen, but I found my system hanging at the same location. I made an error in jumping to the previous conclusion and, suspecting so, I have worked on getting the Xen kernel output (booted from the EFI command line) to my serial console so I can capture it into a log in my Windows PuTTY session. I have achieved that now.
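To make the RSDP search described above concrete, here is a minimal sketch of a legacy-style memory scan: look for the 8-byte signature "RSD PTR " on 16-byte boundaries and accept a hit only if the first 20 bytes (the ACPI 1.0 portion of the structure) sum to zero modulo 256. It is not a transcription of Xen's tbxfroot.c; the function names and the fabricated test table are hypothetical. Note also that on UEFI systems the RSDP is normally handed over through the EFI system table rather than found by scanning low memory, which may be relevant to why the grub2 and EFI boots differ.

```python
from typing import Optional

RSDP_SIGNATURE = b"RSD PTR "  # 8 bytes; the trailing space is part of it
RSDP_V1_LENGTH = 20           # signature, checksum, OEM ID, revision, RSDT address


def checksum_ok(table: bytes) -> bool:
    """An ACPI checksum is valid when all bytes sum to 0 modulo 256."""
    return sum(table) % 256 == 0


def find_rsdp(region: bytes) -> Optional[int]:
    """Return the offset of the first valid RSDP in `region`, or None."""
    for off in range(0, len(region) - RSDP_V1_LENGTH + 1, 16):
        candidate = region[off:off + RSDP_V1_LENGTH]
        if candidate[:8] == RSDP_SIGNATURE and checksum_ok(candidate):
            return off
    return None


def make_fake_rsdp() -> bytes:
    """Build a made-up 20-byte RSDP purely for demonstration."""
    body = bytearray(
        RSDP_SIGNATURE
        + b"\x00"                          # checksum placeholder
        + b"FAKEID"                        # 6-byte OEM ID (invented)
        + b"\x00"                          # revision 0 = ACPI 1.0
        + (0x1000).to_bytes(4, "little")   # RSDT address (invented)
    )
    body[8] = (-sum(body)) % 256           # make the bytes sum to zero
    return bytes(body)


if __name__ == "__main__":
    memory = bytes(32) + make_fake_rsdp() + bytes(12)
    print(find_rsdp(memory))
```

If no 16-byte-aligned candidate passes both the signature and checksum tests, the scan returns nothing, which corresponds to the situation the "A valid RSDP was not found" message reports.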
The EFI boot session which I have attached does *NOT* contain the "A valid RSDP was not found" message.

I realized I had copied the Gentoo kernel parameters to the EFI configuration one-to-one. It turns out the parameters for the Xen kernel are different from the ones for the Gentoo kernel. I therefore looked at each parameter, checked it against the documented Xen ones published at https://xenbits.xen.org/docs/unstable/misc/xen-command-line.html#apic-x86 and added an options line before the Gentoo kernel line. The result is that I have the following in my boot directory:

zeta /home/jlpoole # ls -la /boot/efi/gentoo
total 12778
drwxr-xr-x 3 root root    6144 Mar 17  2019 .
drwxr-xr-x 5 root root     512 Mar 17  2019 ..
drwxr-xr-x 2 root root    1024 Mar 15 21:05 attic
-rwxr-xr-x 1 root root 8919936 Mar  6 08:05 initramfs-genkernel-x86_64-4.19.23-gentoo
-rwxr-xr-x 1 root root     368 Mar 15 21:04 jp.conf
-rwxr-xr-x 1 root root     368 Mar 15 21:05 '#jp.config#'
-rwxr-xr-x 1 root root     368 Mar 15 21:04 jp.config
-rwxr-xr-x 1 root root 2980978 Mar 15 21:02 xen-4.12.0-rc.efi
-rwxr-xr-x 1 root root 1172851 Mar 15 21:03 xen-4.12.0-rc.gz
-rwxr-xr-x 1 root root     354 Mar 17  2019 xen.cfg
-rwxr-xr-x 1 root root     368 Mar 16 05:27 xen.cfg.WORKS
zeta /home/jlpoole #

Note: the xen-4.12.0-rc.efi was placed by app-emulation/xen under /usr/lib64/efi:

zeta /home/jlpoole # ls -la /usr/lib64/efi
total 2956
drwxr-xr-x  2 root root    4096 Mar 14 20:58 .
drwxr-xr-x 44 root root   36864 Mar 14 20:55 ..
-rw-r--r--  1 root root 2980978 Mar 14 20:58 xen-4.12.0-rc.efi
lrwxrwxrwx  1 root root      17 Mar 14 20:58 xen-4.12.efi -> xen-4.12.0-rc.efi
lrwxrwxrwx  1 root root      17 Mar 14 20:58 xen-4.efi -> xen-4.12.0-rc.efi
lrwxrwxrwx  1 root root      17 Mar 14 20:58 xen.efi -> xen-4.12.0-rc.efi
zeta /home/jlpoole #

The xen.cfg file (recall the program is looking for this particularly named file "xen.cfg", and the attempt to use -cfg=my.cfg failed with "not found") has these contents:

zeta /home/jlpoole # cat -n /boot/efi/gentoo/xen.cfg
     1  [global]
     2  default=abc
     3
     4  [abc]
     5  options=console=vga,com1 com1=115200,8n1
     6  kernel=xen-4.12.0-rc.gz root=/dev/sda4 vga=gfx-1024x768x16 com1=115200,8n1 console=com1 console_timestamps=date console_to_ring conring_size=16k loglvl=all guest_loglvl=all sync_console=true iommu=debug apic_verbosity=debug
     7  initramfs=initramfs-genkernel-x86_64-4.19.23-gentoo
     8
     9
zeta /home/jlpoole #

The point of all of the above is that in the EFI boot attempt, where the Xen kernel hangs at the ExtINT points, the EFI boot log does not contain the error message "RSDP was not found". Thus, different boot messages are being posted depending upon whether the boot is via grub2 or EFI. I am also seeing ACPI output in the boot log of my regular Gentoo kernel (which boots successfully), so I can contrast the successful Gentoo kernel output with the sporadic grub2 and EFI boot attempts and perhaps home in on how the Root System Description Pointer is being located in each case.

Vogue la galère!
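Given the earlier trouble with invisible characters and the hand-edited xen.cfg shown above, a small checker can be sketched that flags non-printing characters and keys the EFI loader may not recognise. The key set below is my reading of the Xen EFI document cited earlier and should be treated as an assumption to verify against the Xen version in use; the function name lint_xen_cfg is hypothetical.

```python
from typing import List

# Assumption: keys as listed in the Xen EFI documentation; verify for your version.
KNOWN_KEYS = {"default", "chain", "options", "kernel", "ramdisk", "video", "xsm", "ucode"}


def lint_xen_cfg(text: str) -> List[str]:
    """Return human-readable warnings for a xen.cfg style file."""
    warnings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Anything outside printable ASCII (tab allowed) is suspicious here,
        # e.g. \177 (DEL) introduced by a terminal or editor.
        bad = [ch for ch in line if not (32 <= ord(ch) < 127 or ch == "\t")]
        if bad:
            warnings.append(f"line {lineno}: non-printable character(s) {bad!r}")
        stripped = line.strip()
        # Skip blank lines, comments and [section] headers.
        if not stripped or stripped.startswith("#") or stripped.startswith("["):
            continue
        key = stripped.split("=", 1)[0].strip()
        if key not in KNOWN_KEYS:
            warnings.append(f"line {lineno}: unrecognised key '{key}'")
    return warnings


if __name__ == "__main__":
    with open("xen.cfg", encoding="latin-1") as fh:
        for warning in lint_xen_cfg(fh.read()):
            print(warning)
```

Run against the listing above, this sketch would flag the initramfs= line, since the documented key for the initial ramdisk appears to be ramdisk=. Whether that has any bearing on the hang is not established here.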
I tried building from the Xen source once I saw that the Gentoo build applied patches. I ran into the same problems, but without the error Andrew Cooper focused on, and have contacted the XEN-DEVEL list at https://lists.xenproject.org/archives/html/xen-devel/2019-03/msg01691.html
Any progress on this?
I've reached a point where Jan Beulich is "out of ideas for the moment." https://lists.xenproject.org/archives/html/xen-devel/2019-03/msg01976.html

I've compared Gentoo's Linux apic.c with the Xen Project's, and there are many differences. For instance, there is a preprocessor "if" statement with a value of 1 at line 605, and I do not understand why this hard override was implemented. I have to build a patch which monitors each step so I can see whether the hanging is occurring at a point where the two codes diverge.

I have two courses to pursue:

1) download an old Gentoo ebuild around version 7 which did load two years ago (through an EFI console) and then compare the apic.c and calling code to see what has changed.

2) debug the existing Xen code and Gentoo's with print statements and contrast them.

Both of these endeavors are going to take at least 4 hours, and I have not had the time and energy to undertake them so far.
Short version: a hardware incompatibility with a Microsoft wireless USB keyboard caused the problem.

The take-away: a USB keyboard can affect the boot of the Xen kernel. This was a hardware-caused problem.

Long version: I had several critical matters that I could not postpone, so my work on this was suspended since May. I finally had time to resume work on this problem. Recall, I could successfully boot a Gentoo kernel, but when I tried a Xen kernel, the system would hang early on at the masking of the CPUs.

By chance, I decided to swap out the USB keyboard, a "Microsoft Wireless Desktop Receiver 3.1", model 1028, because I had to keep replacing batteries, the range was very limited (e.g. 15"), and characters were dropping out. I replaced it with a generic Amazon USB keyboard. Suddenly the boot problems went away: no more hanging at the CPU masking point. I sailed through and successfully booted.

Moreover, I had placed a new hard disk in the server, disengaged the existing one, and installed the Debian version (8.6.0 of 11/8/2016) that I first used to test this server, so that I had an apples-to-apples test case before returning the server for service under warranty. During the installation, video artifacts prohibited the graphical install and dropped me into a console install with colorations that made selections invisible. After I installed Debian 8.6.0, I had the same problem: I could not get past the "masked ExtINT on CPU#..." messages.

Since this discovery several days ago, I have booted my various Xen kernels (in EFI) and have not encountered any of the problems I previously suffered. While I do have some other issues that relate to Gentoo-specific tweaks, I am not concerned, and I wanted to close this issue by reporting this discovery.

Of course, I can make the USB unit available to qualified persons if they want to test it, or I can attach it to the server to test a debugging version.
Oh, haven't thought about this possibility, although it's true I had this issue in the past, but not only with xen. Maybe try posting your findings to the xen mailing list so they'll decide whether to continue with the investigation. It would be best to continue your previous thread where Jan Beulich was "out of ideas".
I did post to the mailing list at the same time with the same text. It never occurred to me that a USB product such as a keyboard could do anything to interfere with the low-level set-up of processors. This kind of failure ought to be publicized, and the first question put to people having problems at the very initial start-up should be: what hardware do you have attached to the USB ports? It can affect the kernel start-up. A very, very expensive lesson. This cost me about 7 months and is a story I'll tell other people's grandchildren.
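For anyone wanting to include the attached USB hardware in a report of this kind, the following is a minimal sketch that lists USB devices from Linux sysfs. It assumes it is run from a kernel that boots (for example the regular Gentoo kernel), that sysfs is mounted at /sys, and that devices expose the usual idVendor, idProduct, manufacturer and product attribute files; the function name list_usb_devices is hypothetical.

```python
import os
from typing import Dict, List


def list_usb_devices(sysfs_root: str = "/sys/bus/usb/devices") -> List[Dict[str, str]]:
    """Collect vendor/product details for each USB device found in sysfs."""
    devices: List[Dict[str, str]] = []
    if not os.path.isdir(sysfs_root):
        return devices
    for entry in sorted(os.listdir(sysfs_root)):
        dev_dir = os.path.join(sysfs_root, entry)
        info: Dict[str, str] = {}
        for attr in ("idVendor", "idProduct", "manufacturer", "product"):
            try:
                with open(os.path.join(dev_dir, attr)) as fh:
                    info[attr] = fh.read().strip()
            except OSError:
                # Attribute missing (e.g. interface entries); skip it.
                continue
        # Only entries with both IDs are actual devices.
        if "idVendor" in info and "idProduct" in info:
            info["bus_id"] = entry
            devices.append(info)
    return devices


if __name__ == "__main__":
    for dev in list_usb_devices():
        print(dev["bus_id"], dev["idVendor"] + ":" + dev["idProduct"],
              dev.get("manufacturer", "?"), dev.get("product", "?"))
```

The output gives the vendor and product IDs that would let others check whether they have the same receiver attached.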