Bug 680472 - app-emulation/xen-4.12.0_rc4 - ?
Summary: app-emulation/xen-4.12.0_rc4 - ?
Status: RESOLVED DUPLICATE of bug 679826
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: AMD64 Linux
Importance: Normal normal
Assignee: Gentoo Xen Devs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-03-15 17:12 UTC by John L. Poole
Modified: 2019-03-18 17:14 UTC

See Also:
Package list:
Runtime testing required: ---


Attachments
emerge log of xen-tools (bzip2) (app-emulation_xen-tools-4.12.0_rc4_20190315-034739.log.tar.bz2,55.65 KB, application/octet-stream)
2019-03-15 17:41 UTC, John L. Poole
kernel config (kernel_Mar_4_2019_2046.config,162.16 KB, text/plain)
2019-03-15 17:41 UTC, John L. Poole
dmesg during DOM0 session (DOM0_dmesg_2019_0315_1015.log,39.08 KB, text/plain)
2019-03-15 17:42 UTC, John L. Poole
lspci -vvv output in DOM0 session (DOM0_lspci-vvv_2019_0315_1017.log,34.60 KB, text/plain)
2019-03-15 17:42 UTC, John L. Poole
* [all diagnostics] from serial port (Xen_all_diagnostics_serial_port_during_DOM0_session.log,71.84 KB, text/plain)
2019-03-15 17:51 UTC, John L. Poole
serial port log (zeta_serial_capture_4-12-0_rc4.log,412.18 KB, text/plain)
2019-03-15 17:52 UTC, John L. Poole
HTML in 3 columns displaying logs (BootLogAnalysis_2019_0318_0923.zip,22.05 KB, application/octet-stream)
2019-03-18 16:28 UTC, John L. Poole

Description John L. Poole 2019-03-15 17:12:59 UTC
This bug parallels Bug# 679826, which exhibits the same problems booting up the Xen kernel: the boot-up hangs during the masking of ExtINT on the various CPUs, and sometimes the kernel successfully boots up.  There is no discernible pattern.  Since there are a lot of diagnostics surrounding this, I've created this separate bug to isolate the logs, etc. concerning Xen kernel 4.12.0_rc4.

Here's an example of attempts and their failure points:
9:17 AM 3/15/2019   2   (XEN) [2019-03-15 16:17:33] Adding cpu 2 to runqueue 0
9:18 AM 3/15/2019   5   (XEN) [2019-03-15 16:18:53] Adding cpu 5 to runqueue 0
9:19 AM 3/15/2019   SUCCESS!!
after * key in serial port for "diagnostics all", shutdown by watchdog.
9:30 AM 3/15/2019   SUCCESS!!
(XEN) [2019-03-15 16:32:54] Hardware Dom0 shutdown: watchdog rebooting machine
9:34 AM 3/15/2019   Before 1: (XEN) [2019-03-15 16:34:14] HVM: HAP page sizes: 4kB, 2MB
9:35 AM 3/15/2019   3
9:36 AM 3/15/2019   4   (XEN) [2019-03-15 16:37:21] Adding cpu 4 to runqueue 0
9:37 AM 3/15/2019   Before 1: (XEN) [2019-03-15 16:38:33] HVM: HAP page sizes: 4kB, 2MB
9:39 AM 3/15/2019   SUCCESS!!
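
As an illustration, the failure point of each attempt (the last CPU that reached "Adding cpu N to runqueue") could be extracted from a serial capture with a short script instead of by eye. This is only a sketch: the function name `last_cpu_added` is hypothetical, and it assumes the log lines follow the `(XEN) [timestamp] Adding cpu N to runqueue 0` format quoted in this report.

```python
import re

# Matches lines such as: (XEN) [2019-03-15 16:17:33] Adding cpu 2 to runqueue 0
ADD_CPU = re.compile(r"\(XEN\) \[[^\]]+\] Adding cpu (\d+) to runqueue \d+")


def last_cpu_added(lines):
    """Return the number of the last CPU added to a runqueue, or None if no such line exists."""
    last = None
    for line in lines:
        match = ADD_CPU.search(line)
        if match:
            last = int(match.group(1))
    return last


# Tail of one hung boot attempt, copied from this report.
sample = [
    "(XEN) [2019-03-15 16:35:53] HVM: HAP page sizes: 4kB, 2MB",
    "(XEN) [2019-03-15 16:35:49] masked ExtINT on CPU#1",
    "(XEN) [2019-03-15 16:35:53] Adding cpu 1 to runqueue 0",
    "(XEN) [2019-03-15 16:35:49] masked ExtINT on CPU#2",
    "(XEN) [2019-03-15 16:35:53] Adding cpu 2 to runqueue 0",
    "(XEN) [2019-03-15 16:35:49] masked ExtINT on CPU#3",
    "(XEN) [2019-03-15 16:35:53] Adding cpu 3 to runqueue 0",
]

print(last_cpu_added(sample))  # prints 3
```

A return value of None would correspond to the "Before 1" attempts, where the hang occurs before any secondary CPU is added.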

I will update this bug with more information about the system.  Note this server is a Supermicro Intel Atom based server with UEFI.  I am certain the hardware is a contributing factor to this inconsistent behavior.  Also, I am able to boot up a regular Gentoo kernel with no problems; it is only while booting the Xen kernel that the inconsistent behavior occurs, at the point of masking the interrupts of processors 1-7 (after processor 0 has been handled).

Reproducible: Always

Steps to Reproduce:
1. Activate a serial console
2. Push start button on server
3. Select in grub2: *Gentoo GNU/Linux, with Xen hypervisor

Actual Results:  
Example of tail of boot-up log:

(XEN) [2019-03-15 16:35:53] HVM: ASIDs enabled.
(XEN) [2019-03-15 16:35:53] HVM: VMX enabled
(XEN) [2019-03-15 16:35:53] HVM: Hardware Assisted Paging (HAP) detected
(XEN) [2019-03-15 16:35:53] HVM: HAP page sizes: 4kB, 2MB
(XEN) [2019-03-15 16:35:49] masked ExtINT on CPU#1
(XEN) [2019-03-15 16:35:53] Adding cpu 1 to runqueue 0
(XEN) [2019-03-15 16:35:49] masked ExtINT on CPU#2
(XEN) [2019-03-15 16:35:53] Adding cpu 2 to runqueue 0
(XEN) [2019-03-15 16:35:49] masked ExtINT on CPU#3
(XEN) [2019-03-15 16:35:53] Adding cpu 3 to runqueue 0
[HUNG]

Expected Results:  
Chain Loading of Gentoo kernel and login screen



The hardware is:
Product SKU: SuperServer 5018A-TN4 (Black)
Motherboard: Super A1SAi-2750F
Processor/Cache: 
    CPU
    Intel® Atom® Processor C2750
    CPU TDP 20W (8-Core)
    FCBGA 1283
    System-on-Chip
System Memory:
4x 204-pin DDR3 SO-DIMM slots
Supports up to 64GB DDR3 ECC memory

From the sale Quote 11/1/2016:
SYS-5018A-TN4-OTO-50
--OPTIMIZED SYS-5018A-TN4(x1)A1SAi-2750F, 504-203B
--MEM-DR316L-CL02-ES16(x4)16GB DDR3-1600 1.35V 2RX8 ECC SODIMM
--HDD-T4000-MG04ACA400E(x1)[NR]Toshiba 3.5" 4TB SATA 6Gb/s 7.2K RPM 128M 512E

There also are two threads from the Xen Users mailing list that may touch upon this problem:

1)  "...BIOS was enabling the APs to interact with the legacy 8259
interrupt controller when only the BSP should. During POST the APs were
exposed to ExtINT/INTR events as a result of the mis-configuration
(probably due to a UEFI timer-tick using the 8259) and this left a
pending ExtINT/INTR interrupt latched on the APs." See https://lkml.org/lkml/2019/3/5/538

2) "Xen's EFI workarounds (/mapbs & efi=no-rs) on SuperMicro hardware; fixes solve 1/2 problems & SM responds that can't/won't fix their firmware"  See thread of posting to the xen-devel.lists.xen.org at https://lists.xenproject.org/archives/html/xen-devel/2015-12/msg00653.html

Additional information such as logs will be added to this bug shortly.
Comment 1 John L. Poole 2019-03-15 17:41:17 UTC
Created attachment 569220 [details]
emerge log of xen-tools (bzip2)
Comment 2 John L. Poole 2019-03-15 17:41:44 UTC
Created attachment 569222 [details]
kernel config
Comment 3 John L. Poole 2019-03-15 17:42:26 UTC
Created attachment 569224 [details]
dmesg during DOM0 session
Comment 4 John L. Poole 2019-03-15 17:42:58 UTC
Created attachment 569226 [details]
lspci -vvv output in DOM0 session
Comment 5 John L. Poole 2019-03-15 17:51:39 UTC
Created attachment 569228 [details]
* [all diagnostics] from serial port

During a successful boot-up of the Xen kernel and in a DOM0 instance, I switched to the Xen kernel (Control-A thrice in the serial port console, PuTTY on Windows) and pressed "*" for a complete diagnostic.  This is the log file of the output, extracted from the serial port log concurrently being posted in this bug.

Note: I found that about a minute after I performed a complete diagnostic, the watchdog would shut the instance down.  I have included the two final notification lines that follow the diagnostic dump.  Thereafter, I had to manually reboot the server.

(XEN) [2019-03-15 16:28:28] .................................... done.
(XEN) [2019-03-15 16:28:46] Watchdog timer fired for domain 0
(XEN) [2019-03-15 16:28:46] Hardware Dom0 shutdown: watchdog rebooting machine
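
For reference, the interval between the end of the dump and the watchdog firing can be computed from the Xen timestamps on these lines. The sketch below is illustrative only; the helper name `xen_time` is hypothetical, and it assumes the `(XEN) [YYYY-MM-DD HH:MM:SS]` prefix seen in the lines above.

```python
import re
from datetime import datetime

# Captures the timestamp inside the "(XEN) [...]" prefix.
STAMP = re.compile(r"\(XEN\) \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]")


def xen_time(line):
    """Parse the Xen console timestamp at the start of a log line."""
    match = STAMP.search(line)
    if match is None:
        raise ValueError("no Xen timestamp in line: " + line)
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")


done = "(XEN) [2019-03-15 16:28:28] .................................... done."
fired = "(XEN) [2019-03-15 16:28:46] Watchdog timer fired for domain 0"

gap = (xen_time(fired) - xen_time(done)).total_seconds()
print(gap)  # prints 18.0
```

The 18 seconds measured here run from the end of the dump, not from the "*" key press, so the time the dump itself takes is not included.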
Comment 6 John L. Poole 2019-03-15 17:52:30 UTC
Created attachment 569230 [details]
serial port log

Includes failed attempts to boot and successful attempts as well as Xen diagnostics "*"
Comment 7 John L. Poole 2019-03-15 18:13:01 UTC
In the earlier bug, Bug # 679826, I modified the app-emulation/xen package with debugging code in an attempt to isolate the event where the system hangs.  My attempts lead me to conclude that the function "setup_local_APIC(void)" [lines 524-726] completes its task and that the hang is occurring at a higher level.  Unfortunately, my inexperience with C leaves me clueless as to where this might be happening.  I tried wrapping debug statements around all calls to "setup_local_APIC", but there may be a callback or hook at play from some macro.

Here is a link to the patch I used: https://bugs.gentoo.org/attachment.cgi?id=568924

Here is a sample of my debug statements: https://bugs.gentoo.org/679826#c11 

I'm prepared to create a debug patch for this version if desired.  What I really need are suggestions on where to insert "print" statements outside of the apic.c file.
Comment 8 John L. Poole 2019-03-15 20:17:55 UTC
Posted "Bug" to xen-devel mailing list.
https://lists.xenproject.org/archives/html/xen-devel/2019-03/msg01268.html
Comment 9 Jeroen Roovers (RETIRED) gentoo-dev 2019-03-15 20:30:58 UTC
(In reply to John L. Poole from comment #0)
> This bug parallels Bug# 679826 which exhibits the same problems booting up

"the same problems"

*** This bug has been marked as a duplicate of bug 679826 ***
Comment 10 John L. Poole 2019-03-18 16:28:14 UTC
Created attachment 569606 [details]
HTML in 3 columns displaying logs

It is helpful for me to be able to compare and contrast the boot logs of:
1) regular Gentoo kernel 
2) grub2 launch of Xen kernel
3) EFI console launch of Xen kernel.

So I created an HTML page that displays all three in 3 scrollable columns.  The purpose of this is to see if the log entries of the successful Gentoo boot reveal something omitted from the Xen kernel.  This is the first round, and I wanted to post this so that if anyone finds it helpful and has suggestions, I can incorporate them sooner rather than later.  My next step is to map the output to various functions while reviewing the kernel configurations and the log files of the emerge of app-emulation/xen.
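
A page of this kind could be generated along the following lines. This is a minimal sketch and not the script used for the attachment: the function name `three_column_page`, the CSS, and the placeholder log text are all assumptions made for illustration.

```python
import html


def three_column_page(logs):
    """Build an HTML page showing each (title, text) pair in its own scrollable column."""
    columns = []
    for title, text in logs:
        # Escape the log text so characters such as "<" display literally.
        columns.append(
            "<div class='col'><h2>{}</h2><pre>{}</pre></div>".format(
                html.escape(title), html.escape(text)
            )
        )
    # Flexbox row; each column scrolls independently of the others.
    style = (
        ".row{display:flex}"
        ".col{flex:1;height:90vh;overflow:auto;border:1px solid #999;margin:2px}"
    )
    return (
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        "<style>{}</style></head><body><div class='row'>{}</div></body></html>"
    ).format(style, "".join(columns))


# Placeholder text stands in for the three real boot logs.
page = three_column_page([
    ("Regular Gentoo kernel", "placeholder log text 1"),
    ("grub2 launch of Xen kernel", "placeholder log text 2"),
    ("EFI console launch of Xen kernel", "placeholder log text 3"),
])
print(len(page) > 0)  # prints True
```

In practice each placeholder string would be replaced by the contents of the corresponding log file, and the returned string written out to an .html file.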