On a new system, AMD 7950X w/ 64 GB RAM (see hardware list below), I installed Gentoo's stage3-amd64-20210203T214504Z and compiled the installed kernel with genkernel --lvm (referred to as Kernel "A"). I then modified the kernel configuration for LVM (https://wiki.gentoo.org/wiki/LVM) and Xen (https://wiki.gentoo.org/wiki/Xen) and recompiled; the modified kernel is referred to as Kernel "B". I currently have Xen:

ryzwork /etc/xen # eix xen -I
[I] app-emulation/xen
     Available versions:  4.16.6_pre1^st ~4.17.3_pre1^st {+boot-symlinks debug efi flask secureboot}
     Installed versions:  4.16.6_pre1^st(09:50:46 AM 12/25/2023)(boot-symlinks efi -debug -flask)
     Homepage:            https://xenproject.org
     Description:         The Xen virtual machine monitor

[I] app-emulation/xen-tools
     Available versions:  4.16.6_pre1(0/4.16)^t 4.16.6_pre1-r1(0/4.16)^t ~4.17.3_pre1(0/4.17)^t ~4.17.3_pre1-r1(0/4.17)^t {api debug doc +hvm +ipxe lzma ocaml ovmf pygrub python +qemu +qemu-traditional +rombios screen sdl selinux static-libs system-ipxe system-qemu system-seabios systemd zstd PYTHON_SINGLE_TARGET="python3_10 python3_11"}
     Installed versions:  4.16.6_pre1-r1(0/4.16)^t(09:50:11 AM 12/25/2023)(hvm ipxe ovmf python qemu qemu-traditional rombios -api -debug -doc -lzma -ocaml -pygrub -screen -sdl -selinux -static-libs -system-ipxe -system-qemu -system-seabios -systemd -zstd PYTHON_SINGLE_TARGET="python3_11 -python3_10")
     Homepage:            https://xenproject.org
     Description:         Xen tools including QEMU and xl
Found 2 matches
ryzwork /etc/xen #

Current system:

ryzwork /etc/xen # uname -a
Linux ryzwork 6.1.67-gentoo-x86_64 #1 SMP PREEMPT_DYNAMIC Sun Dec 24 19:07:13 PST 2023 x86_64 AMD Ryzen 9 7950X 16-Core Processor AuthenticAMD GNU/Linux
ryzwork /etc/xen #

I can successfully boot into Dom0 and have not seen any evidence of a problem there, though that is not saying much. Once in Dom0, I proceeded to build out my first guest VM: "ryzdesk". Currently, all of my LVM storage is on the Crucial NVMe (2 TB) drive.
I then created two new logical volumes in my LVM volume group "vg0":

ryzwork /etc/xen # ls /dev/vg0
jlpgenswap  jlpgentoo
ryzwork /etc/xen #

and staged Gentoo's stage3-amd64-desktop-openrc-20231224T164659Z within the volume "jlpgentoo". I entered the jlpgentoo image and made the appropriate adjustments and additions for the new VM. I created a VM configuration file, /etc/xen/ryzdesk.conf (https://salemdata.us/xen/20231226_bug/ryzdesk.conf), based on one that works on another, older Xen server. When I first encountered the problem, I commented out the swap file in case it and/or its specification was causing trouble -- noted inside ryzdesk.conf. The error continued despite the omission of the swap file specification. When I attempt to launch the new VM, the console shows the failure:

1500  [?2004h]0;root@ryzwork:/home/jlpooleryzwork /home/jlpoole # cd /etc/xen
1501  [?2004l [?2004h]0;root@ryzwork:/etc/xenryzwork /etc/xen # xl create ryzdesk.conf -c
1502  [?2004l Parsing config from ryzdesk.conf
1503  libxl: error: libxl_xshelp.c:114:libxl__xs_vprintf: xenstore write failed: `/libxl/1/dm-version' = `qemu_xen': Bad file descriptor
1504  libxl: error: libxl_xshelp.c:248:libxl__xs_transaction_start: could not create xenstore transaction: Bad file descriptor
1505  libxl: error: libxl_create.c:1643:domcreate_launch_dm: Domain 1:unable to add disk devices
1506  libxl: error: libxl_xshelp.c:149:libxl__xs_get_dompath: Domain 1:Failed to get dompath: Bad file descriptor
1507  libxl: error: libxl_xshelp.c:149:libxl__xs_get_dompath: Failed to get dompath: Bad file descriptor
1508  libxl: error: libxl_xshelp.c:149:libxl__xs_get_dompath: Domain 1:Failed to get dompath: Bad file descriptor
1509  libxl: error: libxl_device.c:809:libxl__devices_destroy: unable to get xenstore device listing /libxl/1/device: Bad file descriptor
1510  libxl: error: libxl_xshelp.c:149:libxl__xs_get_dompath: Domain 1:Failed to get dompath: Bad file descriptor
1511  libxl: error: libxl_domain.c:1137:domain_destroy_callback: Domain 1:Unable to destroy guest
1512  libxl: error: libxl_create.c:2022:domcreate_destruction_cb: Domain 1:unable to destroy domain following failed creation
1513  libxl: error: libxl_xshelp.c:149:libxl__xs_get_dompath: Domain 1:Failed to get dompath: Bad file descriptor
1514  libxl: error: libxl_xshelp.c:149:libxl__xs_get_dompath: Failed to get dompath: Bad file descriptor
1515  libxl: error: libxl_xshelp.c:149:libxl__xs_get_dompath: Domain 1:Failed to get dompath: Bad file descriptor
1516  libxl: error: libxl_device.c:809:libxl__devices_destroy: unable to get xenstore device listing /libxl/1/device: Bad file descriptor
1517  libxl: error: libxl_xshelp.c:149:libxl__xs_get_dompath: Domain 1:Failed to get dompath: Bad file descriptor
1518  libxl: error: libxl_domain.c:1137:domain_destroy_callback: Domain 1:Unable to destroy guest
1519  libxl: error: libxl_domain.c:1064:domain_destroy_cb: Domain 1:Destruction of domain failed
1520  [?2004h]0;root@ryzwork:/etc/xenryzwork /etc/xen # dmetsstg -T
1521  [?2004l

See lines 1501-1521 at https://salemdata.us/xen/20231226_bug/20231226_1111_root_xl_create.script.html

Moreover, dmesg reveals a nicely isolated "kernel BUG" trace:

3013  [Tue Dec 26 11:12:04 2023] ------------[ cut here ]------------
3014  [Tue Dec 26 11:12:04 2023] kernel BUG at arch/x86/xen/p2m.c:542!
3015  [Tue Dec 26 11:12:04 2023] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
3016  [Tue Dec 26 11:12:04 2023] CPU: 19 PID: 3555 Comm: xenstored Not tainted 6.1.67-gentoo-x86_64 #1
3017  [Tue Dec 26 11:12:04 2023] Hardware name: ASRock X670E Steel Legend/X670E Steel Legend, BIOS 1.28 07/27/2023
3018  [Tue Dec 26 11:12:04 2023] RIP: e030:xen_alloc_p2m_entry+0x485/0x8c0
3019  [Tue Dec 26 11:12:04 2023] Code: 3d a8 6c 8a 01 73 5d 48 8b 05 a7 6c 8a 01 48 8b 04 f8 48 83 f8 ff 74 59 48 bf ff ff ff ff ff ff ff 3f 48 21 c7 e9 60 fc ff ff <0f> 0b 49 8d 7e 08 4c 89 f1 48 c7 c0 ff ff ff ff 49 c7 06 ff ff ff
3020  [Tue Dec 26 11:12:04 2023] RSP: e02b:ffffc9004284fd80 EFLAGS: 00010246
3021  [Tue Dec 26 11:12:04 2023] RAX: 0000000000000000 RBX: 0000000010007fff RCX: ffffffff82610000
3022  [Tue Dec 26 11:12:04 2023] RDX: 0000000000000000 RSI: ffffc9008003fff8 RDI: 000000000e54d067
3023  [Tue Dec 26 11:12:04 2023] RBP: ffffc9004284fe28 R08: ffffea0000000000 R09: 0000000000000000
3024  [Tue Dec 26 11:12:04 2023] R10: 0000000000000000 R11: ffff898088000000 R12: ffffc9008003fff8
3025  [Tue Dec 26 11:12:04 2023] R13: 0000000000000000 R14: 0000000010008000 R15: 0000000010008000
3026  [Tue Dec 26 11:12:04 2023] FS:  00007f5ea714ac40(0000) GS:ffff88901f0c0000(0000) knlGS:0000000000000000
3027  [Tue Dec 26 11:12:04 2023] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
3028  [Tue Dec 26 11:12:04 2023] CR2: 0000563fa8a34000 CR3: 0000000106ba6000 CR4: 0000000000050660
3029  [Tue Dec 26 11:12:04 2023] Call Trace:
3030  [Tue Dec 26 11:12:04 2023]  <TASK>
3031  [Tue Dec 26 11:12:04 2023]  ? __die_body.cold+0x1a/0x1f
3032  [Tue Dec 26 11:12:04 2023]  ? die+0x2a/0x50
3033  [Tue Dec 26 11:12:04 2023]  ? do_trap+0xc5/0x110
3034  [Tue Dec 26 11:12:04 2023]  ? xen_alloc_p2m_entry+0x485/0x8c0
3035  [Tue Dec 26 11:12:04 2023]  ? do_error_trap+0x6a/0x90
3036  [Tue Dec 26 11:12:04 2023]  ? xen_alloc_p2m_entry+0x485/0x8c0
3037  [Tue Dec 26 11:12:04 2023]  ? exc_invalid_op+0x4c/0x60
3038  [Tue Dec 26 11:12:04 2023]  ? xen_alloc_p2m_entry+0x485/0x8c0
3039  [Tue Dec 26 11:12:04 2023]  ? asm_exc_invalid_op+0x16/0x20
3040  [Tue Dec 26 11:12:04 2023]  ? xen_alloc_p2m_entry+0x485/0x8c0
3041  [Tue Dec 26 11:12:04 2023]  xen_alloc_unpopulated_pages+0x9a/0x430
3042  [Tue Dec 26 11:12:04 2023]  gnttab_alloc_pages+0x14/0x40
3043  [Tue Dec 26 11:12:04 2023]  gntdev_alloc_map+0x1cf/0x2e0
3044  [Tue Dec 26 11:12:04 2023]  gntdev_ioctl+0x307/0x550
3045  [Tue Dec 26 11:12:04 2023]  __x64_sys_ioctl+0x90/0xd0
3046  [Tue Dec 26 11:12:04 2023]  do_syscall_64+0x3b/0x90
3047  [Tue Dec 26 11:12:04 2023]  entry_SYSCALL_64_after_hwframe+0x64/0xce
3048  [Tue Dec 26 11:12:04 2023] RIP: 0033:0x7f5ea7261d0b
3049  [Tue Dec 26 11:12:04 2023] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
3050  [Tue Dec 26 11:12:04 2023] RSP: 002b:00007fffc998b0e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
3051  [Tue Dec 26 11:12:04 2023] RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007f5ea7261d0b
3052  [Tue Dec 26 11:12:04 2023] RDX: 00007fffc998b140 RSI: 0000000000184700 RDI: 000000000000000c
3053  [Tue Dec 26 11:12:04 2023] RBP: 00007fffc998b1f0 R08: 00007fffc998b21c R09: 00007fffc998b140
3054  [Tue Dec 26 11:12:04 2023] R10: 00007fffc998b21c R11: 0000000000000246 R12: 0000000000000001
3055  [Tue Dec 26 11:12:04 2023] R13: 0000000000000003 R14: 000000000000000c R15: 00007fffc998b140
3056  [Tue Dec 26 11:12:04 2023]  </TASK>
3057  [Tue Dec 26 11:12:04 2023] Modules linked in: xen_pciback cfg80211 8021q garp mrp vfat fat amdgpu mfd_core iommu_v2 gpu_sched drm_buddy snd_hda_codec_realtek i2c_algo_bit drm_ttm_helper snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi ttm drm_display_helper snd_hda_intel cec btusb intel_rapl_msr snd_intel_dspcfg btrtl intel_rapl_common rc_core btbcm snd_intel_sdw_acpi snd_hda_codec btintel sd_mod crct10dif_pclmul snd_hda_core drm_kms_helper bluetooth snd_hwdep ghash_clmulni_intel snd_pcm sha512_ssse3 drm sp5100_tco ecdh_generic sha256_ssse3 uas snd_timer rfkill i2c_piix4 wmi_bmof joydev sha1_ssse3 usb_storage ecc pcspkr efi_pstore k10temp snd i2c_core soundcore ccp video wmi backlight gpio_amdpt gpio_generic mac_hid efivarfs xfs nvme nvme_core xhci_pci crc32_pclmul xhci_pci_renesas crc32c_intel t10_pi aesni_intel crypto_simd cryptd crc64_rocksoft r8169 ahci realtek xhci_hcd mdio_devres crc64 libahci libphy
3058  [Tue Dec 26 11:12:04 2023] ---[ end trace 0000000000000000 ]---

See lines 3013-3058 at https://salemdata.us/xen/20231226_bug/20231226_1111_root_xl_create.script.html

The config for Kernel A is available at: https://salemdata.us/xen/20231226_bug/kernel-config-6.1-2.67-gentoo-x86_64_virgin_r2
The config for Kernel B is available at: https://salemdata.us/xen/20231226_bug/LVM_Xen_kernel-config-6.1.67-gentoo-x86_64

Note: the Gentoo Handbook at https://wiki.gentoo.org/wiki/Handbook:AMD64/Installation/Tools states: "It's recommended that sys-block/io-scheduler-udev-rules is installed for the correct scheduler behavior with e.g. nvme devices." I did *not* install sys-block/io-scheduler-udev-rules.

The directory on my server containing files related to this bug is accessible and may be viewed at: https://salemdata.us/xen/20231226_bug/

Hardware:
  - AMD Ryzen 9 7950X 16-core, 32-thread unlocked desktop processor
  - ASRock X670E Steel Legend AM5 ATX motherboard: 4x DDR5 slots, PCIe 5.0 x16, AMD CrossFire, quad M.2 slots, dual LAN ports (1 Gb and 2.5 Gb), WiFi 6E, 7.1 HD audio, HDMI 2.1, DP 1.4 ports
  - Kingston FURY Beast 64 GB (2 x 32 GB) DDR5 SDRAM memory kit, KF548C38BBK264
  - Crucial P3 Plus 2 TB PCIe 4.0 3D NAND NVMe M.2 SSD, up to 5000 MB/s, CT2000P3PSSD8

Given that this looks to be a run-time problem and a "kernel" bug that the Xen team will be concerned with, I'm not sure what else I can provide for this Gentoo bug filing. Please specify any further particulars desired and I'll upload and/or stage same.
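For readers who cannot reach the link above, ryzdesk.conf is a standard xl guest configuration of roughly this shape. This is a hypothetical sketch only: the memory figure and LV names come from this report, but the vcpu count, disk layout, and everything else here are illustrative guesses, not the actual file.

```
# Hypothetical sketch -- the real ryzdesk.conf is at the URL above.
name    = "ryzdesk"
memory  = 16384                                  # 16 GB, per the allocation noted later
vcpus   = 4                                      # assumed value
disk    = [ 'phy:/dev/vg0/jlpgentoo,xvda,w' ]    # root LV staged with the stage3
# 'phy:/dev/vg0/jlpgenswap,xvdb,w'               # swap LV, commented out while troubleshooting
```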
I plan to submit an email to xen-users@lists.xenproject.org referencing this bug.
I successfully posted to the xen-users list: https://lists.xenproject.org/archives/html/xen-users/2023-12/msg00005.html
I began to wonder if, perhaps, my Dom0 had been allocated all of the memory and the attempt to create a guest was failing because there was no memory left. It appears, however, that is not the case, since there are 64 GB in the machine (I plan to bump that up to 128 GB later this week), Dom0 is currently allocated roughly 48 GB, and ryzdesk was allocated 16 GB.

ryzwork /home/jlpoole # date; xl info
Tue Dec 26 04:16:56 PM PST 2023
host                   : ryzwork
release                : 6.1.67-gentoo-x86_64
version                : #1 SMP PREEMPT_DYNAMIC Sun Dec 24 19:07:13 PST 2023
machine                : x86_64
nr_cpus                : 32
max_cpu_id             : 31
nr_nodes               : 1
cores_per_socket       : 16
threads_per_core       : 2
cpu_mhz                : 4491.631
hw_caps                : 178bf3ff:76f8320b:2e500800:244037ff:0000000f:f1bf97a9:00405fce:00000500
virt_caps              : pv hvm hvm_directio pv_directio hap shadow gnttab-v1 gnttab-v2
total_memory           : 64632
free_memory            : 137
sharing_freed_memory   : 0
sharing_used_memory    : 0
outstanding_claims     : 0
free_cpus              : 0
xen_major              : 4
xen_minor              : 16
xen_extra              : .5
xen_version            : 4.16.5
xen_caps               : xen-3.0-x86_64 hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit2
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          :
xen_commandline        : placeholder no-real-mode edd=off
cc_compiler            : x86_64-pc-linux-gnu-gcc (Gentoo 13.2.1_p20230826 p7) 13.2.1 202
cc_compile_by          :
cc_compile_domain      :
cc_compile_date        : Mon Dec 25 09:50:29 PST 2023
build_id               : 8ea32f853a84f4cb9929ebb8984c03715f8498ed
xend_config_format     : 4
ryzwork /home/jlpoole # date; free
Tue Dec 26 04:17:00 PM PST 2023
               total        used        free      shared  buff/cache   available
Mem:        46958300      641512    42030820        1348     4285968    45706888
Swap:              0           0           0
ryzwork /home/jlpoole #
After a reboot without running any xen command, the free memory bumped up to 63741672, whereas running "free" after the failed attempt showed 46958300, so I'm guessing that the failed attempt caused the memory allocated for the guest VM to remain allocated and not freed. I still need to determine whether I am specifying that the Dom0 instance have all 64 GB -- I doubt I would have done that, but I probably would have been generous in my allocation to Dom0 since I'm building out a new server and would want it to have plenty of memory to speed things along.

ryzwork /home/jlpoole # free
               total        used        free      shared  buff/cache   available
Mem:        63741672      282624    63346896        1348      112152    62985016
Swap:              0           0           0
ryzwork /home/jlpoole # xl info
host                   : ryzwork
release                : 6.1.67-gentoo-x86_64
version                : #1 SMP PREEMPT_DYNAMIC Sun Dec 24 19:07:13 PST 2023
machine                : x86_64
nr_cpus                : 32
max_cpu_id             : 31
nr_nodes               : 1
cores_per_socket       : 16
threads_per_core       : 2
cpu_mhz                : 4491.612
hw_caps                : 178bf3ff:76f8320b:2e500800:244037ff:0000000f:f1bf97a9:00405fce:00000500
virt_caps              : pv hvm hvm_directio pv_directio hap shadow gnttab-v1 gnttab-v2
total_memory           : 64632
free_memory            : 131
sharing_freed_memory   : 0
sharing_used_memory    : 0
outstanding_claims     : 0
free_cpus              : 0
xen_major              : 4
xen_minor              : 16
xen_extra              : .5
xen_version            : 4.16.5
xen_caps               : xen-3.0-x86_64 hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit2
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          :
xen_commandline        : placeholder no-real-mode edd=off
cc_compiler            : x86_64-pc-linux-gnu-gcc (Gentoo 13.2.1_p20230826 p7) 13.2.1 202
cc_compile_by          :
cc_compile_domain      :
cc_compile_date        : Mon Dec 25 09:50:29 PST 2023
build_id               : 8ea32f853a84f4cb9929ebb8984c03715f8498ed
xend_config_format     : 4
ryzwork /home/jlpoole #
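The guess above can be sanity-checked with quick arithmetic on the two "total" figures from `free` (a sketch; the KiB values are copied from the outputs quoted in these comments):

```shell
# Sanity check on the ballooning hypothesis, using dom0's "total" memory
# (KiB) as reported by `free` in the two runs quoted above.
before_reboot_kib=46958300   # dom0 total after the failed `xl create`
after_reboot_kib=63741672    # dom0 total after a clean reboot
diff_kib=$(( after_reboot_kib - before_reboot_kib ))
echo "difference: $(( diff_kib / 1024 )) MiB"
```

The difference works out to roughly 16 GiB, which matches the 16 GB given to ryzdesk and is consistent with that memory staying ballooned out of Dom0 after the failed creation.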
Following the advice of https://wiki.xenproject.org/wiki/Xen_Project_Best_Practices, I did the following:

1) edited /etc/default/grub, adding at the end:

   cat -n /etc/default/grub
   ...
   104  #
   105  # 12/16/2023 jlpoole: trying overcome the xl create bug
   106  # per: https://wiki.xenproject.org/wiki/Xen_Project_Best_Practices
   107  #
   108  GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=1024M,max:1024M"

2) edited /etc/xen/xl.conf and set autoballoon to "0":

   cat -n /etc/xen/xl.conf
   ...
     7  # Control whether dom0 is ballooned down when xen doesn't have enough
     8  # free memory to create a domain. "auto" means only balloon if dom0
     9  # starts with all the host's memory.
    10  #autoballoon="auto"
    11  #
    12  # 12/26/2023 jlpoole
    13  # ref: https://wiki.xenproject.org/wiki/Xen_Project_Best_Practices
    14  # to overcome xl create: Gentoo Bug 920747
    15  #
    16  autoballoon=0

3) executed: grub-mkconfig -o /boot/grub/grub.cfg

4) rebooted

I then ran "free" and "xl info" and updated this bug noting the difference in memory. I then tried:

   xl create ryzdesk.conf -c

and it WORKED: I successfully launched the VM. So it looks like the kernel bug arises from autoballooning and/or not limiting Dom0 memory in GRUB2. I would be tempted to close this bug, but I'll continue to try out my new VM, and it might be helpful for others reading this bug to determine whether the condition I created might be better handled and/or documented to prevent others from making the same mistake. Someone else can close this bug with my blessing. If something else arises that hints at being related to this, I can re-open it.
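As a rough sketch of what the dom0_mem cap above implies (MiB figures taken from the earlier `xl info` output; Xen's own overhead is deliberately ignored, so the real free_memory figure will come out somewhat lower):

```shell
# Approximate memory left for guests once dom0 is capped by dom0_mem.
# Ignores Xen's own overhead; values are from the `xl info` output above.
total_mib=64632   # total_memory reported by `xl info`
dom0_mib=1024     # dom0_mem=1024M,max:1024M set in GRUB_CMDLINE_XEN_DEFAULT
echo "approx. guest-available: $(( total_mib - dom0_mib )) MiB"
```

That leaves on the order of 62 GB for guests, which is why the 16 GB ryzdesk domain could be created once the cap and autoballoon=0 were in place.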