Bug 920747 - app-emulation/xen-4.16.6_pre1-kernel bug - xen/p2m.c: 542: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
Status: RESOLVED WORKSFORME
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo Linux bug wranglers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-12-26 22:20 UTC by John L. Poole
Modified: 2023-12-27 01:51 UTC

See Also:
Package list:
Runtime testing required: ---


Description John L. Poole 2023-12-26 22:20:03 UTC
On a new system, an AMD Ryzen 9 7950X with 64 GB of RAM (see below), I installed Gentoo's stage3-amd64-20210203T214504Z and compiled the installed kernel with genkernel --lvm (referred to as "Kernel A"). Then I reconfigured the kernel for LVM (https://wiki.gentoo.org/wiki/LVM) and Xen (https://wiki.gentoo.org/wiki/Xen) and recompiled it; the modified kernel is referred to as "Kernel B". The installed Xen packages are:
  
        ryzwork /etc/xen # eix xen -I
        [I] app-emulation/xen
             Available versions:  4.16.6_pre1^st ~4.17.3_pre1^st {+boot-symlinks debug efi flask secureboot}
             Installed versions:  4.16.6_pre1^st(09:50:46 AM 12/25/2023)(boot-symlinks efi -debug -flask)
             Homepage:            https://xenproject.org
             Description:         The Xen virtual machine monitor

        [I] app-emulation/xen-tools
             Available versions:  4.16.6_pre1(0/4.16)^t 4.16.6_pre1-r1(0/4.16)^t ~4.17.3_pre1(0/4.17)^t ~4.17.3_pre1-r1(0/4.17)^t {api debug doc +hvm +ipxe lzma ocaml ovmf pygrub python +qemu +qemu-traditional +rombios screen sdl selinux static-libs system-ipxe system-qemu system-seabios systemd zstd PYTHON_SINGLE_TARGET="python3_10 python3_11"}
             Installed versions:  4.16.6_pre1-r1(0/4.16)^t(09:50:11 AM 12/25/2023)(hvm ipxe ovmf python qemu qemu-traditional rombios -api -debug -doc -lzma -ocaml -pygrub -screen -sdl -selinux -static-libs -system-ipxe -system-qemu -system-seabios -systemd -zstd PYTHON_SINGLE_TARGET="python3_11 -python3_10")
             Homepage:            https://xenproject.org
             Description:         Xen tools including QEMU and xl

        Found 2 matches
        ryzwork /etc/xen #

Current system:

    ryzwork /etc/xen # uname -a
    Linux ryzwork 6.1.67-gentoo-x86_64 #1 SMP PREEMPT_DYNAMIC Sun Dec 24 19:07:13 PST 2023 x86_64 AMD Ryzen 9 7950X 16-Core Processor AuthenticAMD GNU/Linux
    ryzwork /etc/xen #

I can boot into Dom0 successfully and have seen no evidence of a problem there, though that says little, because once in Dom0 I proceeded to build my first guest VM, "ryzdesk".

Currently, all of my LVM volumes are on the Crucial 2 TB NVMe SSD.

I then created two new logical volumes in my LVM volume group "vg0":

    ryzwork /etc/xen # ls /dev/vg0
    jlpgenswap  jlpgentoo
    ryzwork /etc/xen #


and unpacked Gentoo's stage3-amd64-desktop-openrc-20231224T164659Z into the "jlpgentoo" logical volume. I entered the jlpgentoo image and made the appropriate adjustments and additions for the new VM.

I created a VM configuration file, /etc/xen/ryzdesk.conf (https://salemdata.us/xen/20231226_bug/ryzdesk.conf), based on one that works on an older Xen server. When I first encountered the problem, I commented out the swap device in case it and/or its specification was causing problems (noted inside ryzdesk.conf). The error persisted despite the omission of the swap specification.

When I attempt to launch the new VM, the console shows this failure:
 
      1500	ryzwork /home/jlpoole # cd /etc/xen
      1501	ryzwork /etc/xen # xl create ryzdesk.conf -c
      1502	Parsing config from ryzdesk.conf
      1503	libxl: error: libxl_xshelp.c:114:libxl__xs_vprintf: xenstore write failed: `/libxl/1/dm-version' = `qemu_xen': Bad file descriptor
      1504	libxl: error: libxl_xshelp.c:248:libxl__xs_transaction_start: could not create xenstore transaction: Bad file descriptor
      1505	libxl: error: libxl_create.c:1643:domcreate_launch_dm: Domain 1:unable to add disk devices
      1506	libxl: error: libxl_xshelp.c:149:libxl__xs_get_dompath: Domain 1:Failed to get dompath: Bad file descriptor
      1507	libxl: error: libxl_xshelp.c:149:libxl__xs_get_dompath: Failed to get dompath: Bad file descriptor
      1508	libxl: error: libxl_xshelp.c:149:libxl__xs_get_dompath: Domain 1:Failed to get dompath: Bad file descriptor
      1509	libxl: error: libxl_device.c:809:libxl__devices_destroy: unable to get xenstore device listing /libxl/1/device: Bad file descriptor
      1510	libxl: error: libxl_xshelp.c:149:libxl__xs_get_dompath: Domain 1:Failed to get dompath: Bad file descriptor
      1511	libxl: error: libxl_domain.c:1137:domain_destroy_callback: Domain 1:Unable to destroy guest
      1512	libxl: error: libxl_create.c:2022:domcreate_destruction_cb: Domain 1:unable to destroy domain following failed creation
      1513	libxl: error: libxl_xshelp.c:149:libxl__xs_get_dompath: Domain 1:Failed to get dompath: Bad file descriptor
      1514	libxl: error: libxl_xshelp.c:149:libxl__xs_get_dompath: Failed to get dompath: Bad file descriptor
      1515	libxl: error: libxl_xshelp.c:149:libxl__xs_get_dompath: Domain 1:Failed to get dompath: Bad file descriptor
      1516	libxl: error: libxl_device.c:809:libxl__devices_destroy: unable to get xenstore device listing /libxl/1/device: Bad file descriptor
      1517	libxl: error: libxl_xshelp.c:149:libxl__xs_get_dompath: Domain 1:Failed to get dompath: Bad file descriptor
      1518	libxl: error: libxl_domain.c:1137:domain_destroy_callback: Domain 1:Unable to destroy guest
      1519	libxl: error: libxl_domain.c:1064:domain_destroy_cb: Domain 1:Destruction of domain failed
      1520	ryzwork /etc/xen # dmesg -T
      
See lines 1501-1521 at https://salemdata.us/xen/20231226_bug/20231226_1111_root_xl_create.script.html
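Most of the libxl errors above end in "Bad file descriptor" on a xenstore operation, which suggests libxl had lost its connection to xenstored partway through creating the domain; the kernel BUG trace that follows names xenstored as the faulting task. A minimal sketch for condensing such a log, assuming POSIX sh and awk; libxl_triage is a hypothetical name, not part of xen-tools.

```shell
#!/bin/sh
# Hypothetical helper (not part of xen-tools): summarize libxl output read
# from stdin. Prints the number of "libxl: error:" lines, how many of them
# end in "Bad file descriptor", and the first error with its prefix removed.
libxl_triage() {
    awk '
        /libxl: error:/ {
            n++
            if (n == 1) {
                first = $0
                sub(/.*libxl: error: /, "", first)
            }
            if ($0 ~ /Bad file descriptor$/) ebadf++
        }
        END { printf "errors=%d ebadf=%d first=%s\n", n, ebadf, first }'
}

# Usage on a live Dom0 (as root):
#   xl create ryzdesk.conf 2>&1 | libxl_triage
#   pgrep -x xenstored >/dev/null || echo "xenstored is not running"
```

If the first error is already an EBADF on a xenstore write, checking whether xenstored is still alive is a reasonable next step before looking at the guest configuration.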
 
Moreover, dmesg reveals a nicely isolated "kernel BUG" trace:

      3013	[Tue Dec 26 11:12:04 2023] ------------[ cut here ]------------
      3014	[Tue Dec 26 11:12:04 2023] kernel BUG at arch/x86/xen/p2m.c:542!
      3015	[Tue Dec 26 11:12:04 2023] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
      3016	[Tue Dec 26 11:12:04 2023] CPU: 19 PID: 3555 Comm: xenstored Not tainted 6.1.67-gentoo-x86_64 #1
      3017	[Tue Dec 26 11:12:04 2023] Hardware name: ASRock X670E Steel Legend/X670E Steel Legend, BIOS 1.28 07/27/2023
      3018	[Tue Dec 26 11:12:04 2023] RIP: e030:xen_alloc_p2m_entry+0x485/0x8c0
      3019	[Tue Dec 26 11:12:04 2023] Code: 3d a8 6c 8a 01 73 5d 48 8b 05 a7 6c 8a 01 48 8b 04 f8 48 83 f8 ff 74 59 48 bf ff ff ff ff ff ff ff 3f 48 21 c7 e9 60 fc ff ff <0f> 0b 49 8d 7e 08 4c 89 f1 48 c7 c0 ff ff ff ff 49 c7 06 ff ff ff
      3020	[Tue Dec 26 11:12:04 2023] RSP: e02b:ffffc9004284fd80 EFLAGS: 00010246
      3021	[Tue Dec 26 11:12:04 2023] RAX: 0000000000000000 RBX: 0000000010007fff RCX: ffffffff82610000
      3022	[Tue Dec 26 11:12:04 2023] RDX: 0000000000000000 RSI: ffffc9008003fff8 RDI: 000000000e54d067
      3023	[Tue Dec 26 11:12:04 2023] RBP: ffffc9004284fe28 R08: ffffea0000000000 R09: 0000000000000000
      3024	[Tue Dec 26 11:12:04 2023] R10: 0000000000000000 R11: ffff898088000000 R12: ffffc9008003fff8
      3025	[Tue Dec 26 11:12:04 2023] R13: 0000000000000000 R14: 0000000010008000 R15: 0000000010008000
      3026	[Tue Dec 26 11:12:04 2023] FS:  00007f5ea714ac40(0000) GS:ffff88901f0c0000(0000) knlGS:0000000000000000
      3027	[Tue Dec 26 11:12:04 2023] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
      3028	[Tue Dec 26 11:12:04 2023] CR2: 0000563fa8a34000 CR3: 0000000106ba6000 CR4: 0000000000050660
      3029	[Tue Dec 26 11:12:04 2023] Call Trace:
      3030	[Tue Dec 26 11:12:04 2023]  <TASK>
      3031	[Tue Dec 26 11:12:04 2023]  ? __die_body.cold+0x1a/0x1f
      3032	[Tue Dec 26 11:12:04 2023]  ? die+0x2a/0x50
      3033	[Tue Dec 26 11:12:04 2023]  ? do_trap+0xc5/0x110
      3034	[Tue Dec 26 11:12:04 2023]  ? xen_alloc_p2m_entry+0x485/0x8c0
      3035	[Tue Dec 26 11:12:04 2023]  ? do_error_trap+0x6a/0x90
      3036	[Tue Dec 26 11:12:04 2023]  ? xen_alloc_p2m_entry+0x485/0x8c0
      3037	[Tue Dec 26 11:12:04 2023]  ? exc_invalid_op+0x4c/0x60
      3038	[Tue Dec 26 11:12:04 2023]  ? xen_alloc_p2m_entry+0x485/0x8c0
      3039	[Tue Dec 26 11:12:04 2023]  ? asm_exc_invalid_op+0x16/0x20
      3040	[Tue Dec 26 11:12:04 2023]  ? xen_alloc_p2m_entry+0x485/0x8c0
      3041	[Tue Dec 26 11:12:04 2023]  xen_alloc_unpopulated_pages+0x9a/0x430
      3042	[Tue Dec 26 11:12:04 2023]  gnttab_alloc_pages+0x14/0x40
      3043	[Tue Dec 26 11:12:04 2023]  gntdev_alloc_map+0x1cf/0x2e0
      3044	[Tue Dec 26 11:12:04 2023]  gntdev_ioctl+0x307/0x550
      3045	[Tue Dec 26 11:12:04 2023]  __x64_sys_ioctl+0x90/0xd0
      3046	[Tue Dec 26 11:12:04 2023]  do_syscall_64+0x3b/0x90
      3047	[Tue Dec 26 11:12:04 2023]  entry_SYSCALL_64_after_hwframe+0x64/0xce
      3048	[Tue Dec 26 11:12:04 2023] RIP: 0033:0x7f5ea7261d0b
      3049	[Tue Dec 26 11:12:04 2023] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
      3050	[Tue Dec 26 11:12:04 2023] RSP: 002b:00007fffc998b0e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      3051	[Tue Dec 26 11:12:04 2023] RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007f5ea7261d0b
      3052	[Tue Dec 26 11:12:04 2023] RDX: 00007fffc998b140 RSI: 0000000000184700 RDI: 000000000000000c
      3053	[Tue Dec 26 11:12:04 2023] RBP: 00007fffc998b1f0 R08: 00007fffc998b21c R09: 00007fffc998b140
      3054	[Tue Dec 26 11:12:04 2023] R10: 00007fffc998b21c R11: 0000000000000246 R12: 0000000000000001
      3055	[Tue Dec 26 11:12:04 2023] R13: 0000000000000003 R14: 000000000000000c R15: 00007fffc998b140
      3056	[Tue Dec 26 11:12:04 2023]  </TASK>
      3057	[Tue Dec 26 11:12:04 2023] Modules linked in: xen_pciback cfg80211 8021q garp mrp vfat fat amdgpu mfd_core iommu_v2 gpu_sched drm_buddy snd_hda_codec_realtek i2c_algo_bit drm_ttm_helper snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi ttm drm_display_helper snd_hda_intel cec btusb intel_rapl_msr snd_intel_dspcfg btrtl intel_rapl_common rc_core btbcm snd_intel_sdw_acpi snd_hda_codec btintel sd_mod crct10dif_pclmul snd_hda_core drm_kms_helper bluetooth snd_hwdep ghash_clmulni_intel snd_pcm sha512_ssse3 drm sp5100_tco ecdh_generic sha256_ssse3 uas snd_timer rfkill i2c_piix4 wmi_bmof joydev sha1_ssse3 usb_storage ecc pcspkr efi_pstore k10temp snd i2c_core soundcore ccp video wmi backlight gpio_amdpt gpio_generic mac_hid efivarfs xfs nvme nvme_core xhci_pci crc32_pclmul xhci_pci_renesas crc32c_intel t10_pi aesni_intel crypto_simd cryptd crc64_rocksoft r8169 ahci realtek xhci_hcd mdio_devres crc64 libahci libphy
      3058	[Tue Dec 26 11:12:04 2023] ---[ end trace 0000000000000000 ]---

See lines 3013-3057 at https://salemdata.us/xen/20231226_bug/20231226_1111_root_xl_create.script.html
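The trace is easier to read once the "?" frames (addresses the unwinder found on the stack but could not confirm) are dropped: what remains is a gntdev ioctl from xenstored that allocates grant-table pages and ends in xen_alloc_p2m_entry. A minimal sketch that extracts this summary from dmesg -T output, assuming POSIX sh and awk; summarize_bug is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: condense a kernel BUG splat from `dmesg -T` (stdin)
# into the BUG location, the faulting task, and the call-trace frames.
# Frames printed with a leading "?" are unreliable stack guesses and are
# skipped, as are the user-space register dumps inside <TASK>...</TASK>.
summarize_bug() {
    awk '
        /kernel BUG at/ {
            line = $0
            sub(/.*kernel BUG at /, "", line)
            print "bug:   " line
        }
        / Comm: / {
            for (i = 1; i < NF; i++)
                if ($i == "Comm:") print "task:  " $(i + 1)
        }
        /<TASK>/   { in_trace = 1; next }
        /<\/TASK>/ { in_trace = 0; next }
        in_trace && $(NF - 1) != "?" &&
        $NF ~ /^[A-Za-z_][A-Za-z0-9_.]*[+]0x[0-9a-f]+\/0x[0-9a-f]+$/ {
            print "frame: " $NF
        }'
}

# Usage (as root):
#   dmesg -T | summarize_bug
```

With a vmlinux built with debug info, the kernel source tree's scripts/faddr2line can then map the faulting RIP to a source line, e.g. `./scripts/faddr2line vmlinux xen_alloc_p2m_entry+0x485/0x8c0`.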

Kernel A's configuration is available at: https://salemdata.us/xen/20231226_bug/kernel-config-6.1-2.67-gentoo-x86_64_virgin_r2
Kernel B's configuration is available at: https://salemdata.us/xen/20231226_bug/LVM_Xen_kernel-config-6.1.67-gentoo-x86_64

Note: The Gentoo Handbook at https://wiki.gentoo.org/wiki/Handbook:AMD64/Installation/Tools states:
     It's recommended that sys-block/io-scheduler-udev-rules is installed for the correct scheduler behavior with e.g. nvme devices: 
     
I did *not* install sys-block/io-scheduler-udev-rules.

The directory on my server containing files related to this bug is accessible and may be viewed at:

     https://salemdata.us/xen/20231226_bug/
     
Hardware:

    AMD Ryzen™ 9 7950X 16-Core, 32-Thread Unlocked Desktop Processor 

    ASRock X670E Steel Legend AM5 ATX motherboard: 4x DDR5 slots, PCIe 5.0 x16, AMD CrossFire, quad M.2 slots, dual LAN ports (1 Gb and 2.5 Gb), Wi-Fi 6E, 7.1 HD audio, HDMI 2.1, DP 1.4 ports
    
    Kingston FURY Beast 64GB 2 x 32GB DDR5 SDRAM Memory Kit KF548C38BBK264
    
    Crucial P3 Plus 2TB PCIe 4.0 3D NAND NVMe M.2 SSD, up to 5000MB/s - CT2000P3PSSD8 
  
Given that this looks like a runtime problem, and a kernel bug that the Xen team will be concerned with, I'm not sure what else I can provide for this Gentoo bug report. Please specify any further details you need and I'll upload or stage them. I plan to email xen-users@lists.xenproject.org referencing this bug.
Comment 1 John L. Poole 2023-12-26 22:52:50 UTC
I successfully posted to xen-users list:
https://lists.xenproject.org/archives/html/xen-users/2023-12/msg00005.html
Comment 2 John L. Poole 2023-12-27 00:20:32 UTC
I began to wonder whether my Dom0 had been allocated all of the memory, so that the attempt to create a guest was failing because no memory was left. It appears, however, that this is not the case: the machine has 64 GB (I plan to increase that to 128 GB later this week), Dom0 is currently allocated 48 GB, and ryzdesk was allocated 16 GB.

    ryzwork /home/jlpoole # date; xl info
    Tue Dec 26 04:16:56 PM PST 2023
    host                   : ryzwork
    release                : 6.1.67-gentoo-x86_64
    version                : #1 SMP PREEMPT_DYNAMIC Sun Dec 24 19:07:13 PST 2023
    machine                : x86_64
    nr_cpus                : 32
    max_cpu_id             : 31
    nr_nodes               : 1
    cores_per_socket       : 16
    threads_per_core       : 2
    cpu_mhz                : 4491.631
    hw_caps                : 178bf3ff:76f8320b:2e500800:244037ff:0000000f:f1bf97a9:00405fce:00000500
    virt_caps              : pv hvm hvm_directio pv_directio hap shadow gnttab-v1 gnttab-v2
    total_memory           : 64632
    free_memory            : 137
    sharing_freed_memory   : 0
    sharing_used_memory    : 0
    outstanding_claims     : 0
    free_cpus              : 0
    xen_major              : 4
    xen_minor              : 16
    xen_extra              : .5
    xen_version            : 4.16.5
    xen_caps               : xen-3.0-x86_64 hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
    xen_scheduler          : credit2
    xen_pagesize           : 4096
    platform_params        : virt_start=0xffff800000000000
    xen_changeset          :
    xen_commandline        : placeholder no-real-mode edd=off
    cc_compiler            : x86_64-pc-linux-gnu-gcc (Gentoo 13.2.1_p20230826 p7) 13.2.1 202
    cc_compile_by          :
    cc_compile_domain      :
    cc_compile_date        : Mon Dec 25 09:50:29 PST 2023
    build_id               : 8ea32f853a84f4cb9929ebb8984c03715f8498ed
    xend_config_format     : 4
    ryzwork /home/jlpoole # date; free
    Tue Dec 26 04:17:00 PM PST 2023
                   total        used        free      shared  buff/cache   available
    Mem:        46958300      641512    42030820        1348     4285968    45706888
    Swap:              0           0           0
    ryzwork /home/jlpoole #
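In the xl info output above, total_memory and free_memory are in MiB, and free_memory is memory the hypervisor has not handed to any domain. Here it is 137 MiB out of 64632 MiB, so Dom0 owned essentially all of the RAM and a 16 GB guest could only be created by ballooning Dom0 down (xl's autoballoon behaviour). A minimal sketch of that check, assuming POSIX sh/awk and the xl info layout shown above; guest_fits is hypothetical, and 16384 MiB stands in for ryzdesk's 16 GB.

```shell
#!/bin/sh
# Hypothetical helper: read `xl info` output on stdin and report whether
# the hypervisor's free pool can hold a guest of the given size without
# ballooning Dom0. total_memory and free_memory are reported in MiB.
guest_fits() {
    awk -v need="$1" '
        BEGIN                { need += 0 }
        $1 == "total_memory" { total = $3 + 0 }
        $1 == "free_memory"  { free = $3 + 0 }
        END {
            printf "total=%d MiB free=%d MiB need=%d MiB\n", total, free, need
            if (free >= need) { print "fits"; exit 0 }
            print "does not fit without ballooning dom0"
            exit 1
        }'
}

# Usage on the Xen host (as root), for the 16 GB ryzdesk guest:
#   xl info | guest_fits 16384
```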
Comment 3 John L. Poole 2023-12-27 01:31:56 UTC
After a reboot, without running any Xen command, the total memory reported by "free" rose to 63741672 KiB, whereas after the failed attempt it showed 46958300 KiB. So I'm guessing that the failed attempt caused the memory set aside for the guest VM to remain allocated rather than being freed. I still need to determine whether I am assigning all 64 GB to Dom0. I doubt I did that, but I may well have been generous with Dom0's allocation, since I'm building out a new server and want it to have plenty of memory to speed things along.

    ryzwork /home/jlpoole # free
                   total        used        free      shared  buff/cache   available
    Mem:        63741672      282624    63346896        1348      112152    62985016
    Swap:              0           0           0
    ryzwork /home/jlpoole # xl info
    host                   : ryzwork
    release                : 6.1.67-gentoo-x86_64
    version                : #1 SMP PREEMPT_DYNAMIC Sun Dec 24 19:07:13 PST 2023
    machine                : x86_64
    nr_cpus                : 32
    max_cpu_id             : 31
    nr_nodes               : 1
    cores_per_socket       : 16
    threads_per_core       : 2
    cpu_mhz                : 4491.612
    hw_caps                : 178bf3ff:76f8320b:2e500800:244037ff:0000000f:f1bf97a9:00405fce:00000500
    virt_caps              : pv hvm hvm_directio pv_directio hap shadow gnttab-v1 gnttab-v2
    total_memory           : 64632
    free_memory            : 131
    sharing_freed_memory   : 0
    sharing_used_memory    : 0
    outstanding_claims     : 0
    free_cpus              : 0
    xen_major              : 4
    xen_minor              : 16
    xen_extra              : .5
    xen_version            : 4.16.5
    xen_caps               : xen-3.0-x86_64 hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
    xen_scheduler          : credit2
    xen_pagesize           : 4096
    platform_params        : virt_start=0xffff800000000000
    xen_changeset          :
    xen_commandline        : placeholder no-real-mode edd=off
    cc_compiler            : x86_64-pc-linux-gnu-gcc (Gentoo 13.2.1_p20230826 p7) 13.2.1 202
    cc_compile_by          :
    cc_compile_domain      :
    cc_compile_date        : Mon Dec 25 09:50:29 PST 2023
    build_id               : 8ea32f853a84f4cb9929ebb8984c03715f8498ed
    xend_config_format     : 4
    ryzwork /home/jlpoole #
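The two "free" totals are in KiB (the procps default), and their difference works out to almost exactly the 16 GiB given to ryzdesk. That is consistent with Dom0 having been ballooned down for the guest and the memory not coming back after the failed creation. A minimal sketch of the arithmetic, assuming POSIX sh and awk; mem_total_kib and ballooned_delta are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helpers for comparing two `free` snapshots taken in Dom0.
# `free` prints sizes in KiB by default; 1 GiB = 1048576 KiB.

# Print the "total" column of the "Mem:" row from `free` output on stdin.
mem_total_kib() {
    awk '$1 == "Mem:" { print $2 }'
}

# Print how much smaller the Dom0 total is in the second snapshot.
ballooned_delta() {
    full_kib="$1"      # total after a clean reboot
    reduced_kib="$2"   # total after the failed `xl create`
    delta=$((full_kib - reduced_kib))
    awk -v d="$delta" 'BEGIN { printf "%d KiB (%.2f GiB)\n", d, d / 1048576 }'
}

# Usage with the totals reported in this bug:
#   ballooned_delta 63741672 46958300    # prints: 16783372 KiB (16.01 GiB)
```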
Comment 4 John L. Poole 2023-12-27 01:51:12 UTC
Following the advice at https://wiki.xenproject.org/wiki/Xen_Project_Best_Practices, I did the following:

1) edited /etc/default/grub adding at the end:

   cat -n /etc/default/grub
   ...
   104  #
   105  # 12/16/2023 jlpoole: trying overcome the xl create bug
   106  # per: https://wiki.xenproject.org/wiki/Xen_Project_Best_Practices
   107  #
   108  GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=1024M,max:1024M"

2) edited /etc/xen/xl.conf and set autoballoon to "0":

     cat -n /etc/xen/xl.conf
     ...
     7  # Control whether dom0 is ballooned down when xen doesn't have enough
     8  # free memory to create a domain.  "auto" means only balloon if dom0
     9  # starts with all the host's memory.
    10  #autoballoon="auto"
    11  #
    12  # 12/26/2023 jlpoole
    13  # ref: https://wiki.xenproject.org/wiki/Xen_Project_Best_Practices
    14  # to overcome xl create: Gentoo Bug 920747
    15  #
    16  autoballoon=0

3) executed:

      grub-mkconfig -o /boot/grub/grub.cfg

4) rebooted
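After the reboot, xl info should reflect both changes: xen_commandline should now carry the dom0_mem= option taken from GRUB_CMDLINE_XEN_DEFAULT (the xl info output in Comments 2 and 3 shows only "placeholder no-real-mode edd=off"), and free_memory should have room for the guest. A minimal sketch of that check, assuming the xl info layout shown in this bug; check_dom0_cap is hypothetical, and 16384 MiB stands in for ryzdesk's 16 GB.

```shell
#!/bin/sh
# Hypothetical check, run on `xl info` output (stdin) after rebooting:
# confirm Xen was booted with a dom0_mem= cap and that the hypervisor's
# free pool (free_memory, in MiB) can hold a guest of the given size.
check_dom0_cap() {
    awk -v need="$1" '
        BEGIN                   { need += 0 }
        $1 == "xen_commandline" { cmd = $0 }
        $1 == "free_memory"     { free = $3 + 0 }
        END {
            if (cmd !~ /dom0_mem=/) { print "no dom0_mem= on the Xen command line"; exit 1 }
            if (free < need)        { print "only " free " MiB free, need " need; exit 1 }
            print "ok: dom0_mem is set and " free " MiB is free"
        }'
}

# Usage on the Xen host (as root), for a 16 GB guest:
#   xl info | check_dom0_cap 16384
```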

I then ran "free" and "xl info" and updated this bug noting the difference in memory.  I then tried:

     xl create ryzdesk.conf -c

and it WORKED: I successfully launched the VM.

So it looks like the kernel bug is triggered by autoballooning and/or by not limiting Dom0's memory on the Xen command line in GRUB2. I'm tempted to close this bug, but I'll continue trying out my new VM, and it may help others reading this bug to judge whether the condition I created could be better handled and/or documented, to keep others from making the same mistake. Someone else may close this bug with my blessing. If anything else arises that appears related, I can reopen it.