I get the oops shown in this screenshot: http://xena.ww7.be/oops012014.jpg
It happens roughly once a day.

The banned user 81 shown in the screenshot is apache, which is the process generating the most disk I/O.
The server is a Dell R210 II with BIOS 2.7.0.

The "banned user" message in the screenshot appears only on the console, which I can see through the iDRAC console. It is NOT written to any log, so it seems the disks/RAID are no longer accessible at that point.

Reproducible: Always

Steps to Reproduce:
1. Dell R210 II server
2. Kernel linux-3.17.7-hardened-r1
3. Seems to crash under high disk usage

Actual Results: kernel oops

Expected Results: no oops ;) or an MCE if it's a hardware fault (not sure yet, but it could be)
is there anything special that apache is doing?
also, a disk testing tool like fio should reproduce it if you are willing to try.
(In reply to Matthew Thode ( prometheanfire ) from comment #1)
> is there anything special that apache is doing?

Nothing special that I know of, but I suspect some kind of 0day...

I had another oops on the same server / hardware; for this one I have it in the logs:

Jan 15 22:16:17 gemelos kernel: divide error: 0000 [#1] SMP
Jan 15 22:16:17 gemelos kernel: CPU: 2 PID: 18340 Comm: mysqld Not tainted 3.17.7-hardened-r1ww7_r10b #1
Jan 15 22:16:17 gemelos kernel: Hardware name: Dell Inc. PowerEdge R210 II/03X6X0, BIOS 2.7.0 11/15/2013
Jan 15 22:16:17 gemelos kernel: task: ee0c0930 ti: ee0c0c94 task.ti: ee0c0c94
Jan 15 22:16:17 gemelos kernel: EIP: 0060:[<00249241>] EFLAGS: 00210246 CPU: 2
Jan 15 22:16:17 gemelos kernel: EAX: 0000003a EBX: ffff66bd ECX: 00000000 EDX: 00000000
Jan 15 22:16:17 gemelos kernel: ESI: 0000003a EDI: 0000003a EBP: c230fc6c ESP: c230fc48
Jan 15 22:16:17 gemelos kernel: DS: 0068 ES: 0068 FS: 00d8 GS: 007b SS: 0068
Jan 15 22:16:17 gemelos kernel: CR0: 80050033 CR2: 204bc454 CR3: 01a04080 CR4: 001407f0
Jan 15 22:16:17 gemelos kernel: Stack:
Jan 15 22:16:17 gemelos kernel: 00000542 00000000 c230fc74 00000000 00000000 00000000 00000000 00000000
Jan 15 22:16:17 gemelos kernel: 00000000 c230fca8 000c1e69 00000000 00000000 02a70000 00000000 00000000
Jan 15 22:16:17 gemelos kernel: 00000000 000002a7 0000003b 00000000 00000001 00000065 00000000 ee3d49c4
Jan 15 22:16:17 gemelos kernel: Call Trace:
Jan 15 22:16:17 gemelos kernel: [<000c1e69>] bdi_position_ratio+0x181/0x1dd
Jan 15 22:16:17 gemelos kernel: [<000c2fc5>] balance_dirty_pages_ratelimited+0x43f/0x739
Jan 15 22:16:17 gemelos kernel: [<00498fe8>] ? nft_target_init+0x6b/0x17b
Jan 15 22:16:17 gemelos kernel: [<00498fe8>] ? nft_target_init+0x6b/0x17b
Jan 15 22:16:17 gemelos kernel: [<001a784d>] ? __ext4_journal_stop+0x53/0x6c
Jan 15 22:16:17 gemelos kernel: [<00017ffe>] ? intel_pmu_hw_config+0xa7/0xca
Jan 15 22:16:17 gemelos kernel: [<000bb9ff>] generic_perform_write+0x172/0x1af
Jan 15 22:16:17 gemelos kernel: [<003c0000>] ? bnx2x_queue_comp_cmd+0xcf/0x12d
Jan 15 22:16:17 gemelos kernel: [<000bcbc0>] __generic_file_write_iter+0x444/0x4c5
Jan 15 22:16:17 gemelos kernel: [<003c1000>] ? bnx2x_func_send_cmd+0xc7/0x459
Jan 15 22:16:17 gemelos kernel: [<00200246>] ? sha256_transform+0x19e0/0x24a2
Jan 15 22:16:17 gemelos kernel: [<00010000>] ? print_cpu_msr+0x3b/0x55
Jan 15 22:16:17 gemelos kernel: [<00010000>] ? print_cpu_msr+0x3b/0x55
Jan 15 22:16:17 gemelos kernel: [<00010000>] ? print_cpu_msr+0x3b/0x55
Jan 15 22:16:17 gemelos kernel: [<003c0000>] ? bnx2x_queue_comp_cmd+0xcf/0x12d
Jan 15 22:16:17 gemelos kernel: [<00180245>] ext4_file_write_iter+0x3b2/0x473
Jan 15 22:16:17 gemelos kernel: [<00010000>] ? print_cpu_msr+0x3b/0x55
Jan 15 22:16:17 gemelos kernel: [<00010000>] ? print_cpu_msr+0x3b/0x55
Jan 15 22:16:17 gemelos kernel: [<00010000>] ? print_cpu_msr+0x3b/0x55
Jan 15 22:16:17 gemelos kernel: [<000ee35d>] new_sync_write+0x5c/0x83
Jan 15 22:16:17 gemelos kernel: [<00010000>] ? print_cpu_msr+0x3b/0x55
Jan 15 22:16:17 gemelos kernel: [<00010000>] ? print_cpu_msr+0x3b/0x55
Jan 15 22:16:17 gemelos kernel: [<0000f000>] ? init_intel_cacheinfo+0x291/0x3bd
Jan 15 22:16:17 gemelos kernel: [<003c0000>] ? bnx2x_queue_comp_cmd+0xcf/0x12d
Jan 15 22:16:17 gemelos kernel: [<00010000>] ? print_cpu_msr+0x3b/0x55
Jan 15 22:16:17 gemelos kernel: [<00010000>] ? print_cpu_msr+0x3b/0x55
Jan 15 22:16:17 gemelos kernel: [<000ee301>] ? do_sync_readv_writev+0x70/0x70
Jan 15 22:16:17 gemelos kernel: [<000eee57>] vfs_write+0xe8/0x1c8
Jan 15 22:16:17 gemelos kernel: [<000ef286>] SyS_write+0x3f/0x7f
Jan 15 22:16:17 gemelos kernel: [<00010000>] ? print_cpu_msr+0x3b/0x55
Jan 15 22:16:17 gemelos kernel: [<003c0000>] ? bnx2x_queue_comp_cmd+0xcf/0x12d
Jan 15 22:16:17 gemelos kernel: [<00510b09>] syscall_call+0x7/0x7
Jan 15 22:16:17 gemelos kernel: [<00010000>] ? print_cpu_msr+0x3b/0x55
Jan 15 22:16:17 gemelos kernel: [<00010033>] ? print_cpu_info+0x19/0xb0
Jan 15 22:16:17 gemelos kernel: [<00200293>] ? sha256_transform+0x1a2d/0x24a2
Jan 15 22:16:17 gemelos kernel: [<00200293>] ? sha256_transform+0x1a2d/0x24a2
Jan 15 22:16:17 gemelos kernel: [<00020033>] ? smp_trace_threshold_interrupt+0x13/0x85
Jan 15 22:16:17 gemelos kernel: [<00200293>] ? sha256_transform+0x1a2d/0x24a2
Jan 15 22:16:17 gemelos kernel: [<00200033>] ? sha256_transform+0x17cd/0x24a2
Jan 15 22:16:17 gemelos kernel: [<00200293>] ? sha256_transform+0x1a2d/0x24a2
Jan 15 22:16:17 gemelos kernel: [<00200033>] ? sha256_transform+0x17cd/0x24a2
Jan 15 22:16:17 gemelos kernel: [<00200293>] ? sha256_transform+0x1a2d/0x24a2
Jan 15 22:16:17 gemelos kernel: Code: 89 f9 83 ec 18 89 d7 8b 51 04 8b 01 85 d2 89 45 e8 89 d0 89 55 ec 75 2e 8b 4d e8 89 f3 89 fe 39 ce 73 04 31 f6 eb 10 89 f0 31 d2 <f7> f1 31 d2 89 c6 89 f8 f7 f1 89 d7 89 d8 89 fa 89 f3 f7 f1 89
Jan 15 22:16:17 gemelos kernel: EIP: [<00249241>] div64_u64+0x36/0x106 SS:ESP 0068:c230fc48
Jan 15 22:16:17 gemelos kernel: divide error: 0000 [#2]
Jan 15 22:16:17 gemelos kernel: ---[ end trace 16e28ee794763227 ]---
Jan 15 22:16:17 gemelos kernel: grsec: banning user with uid 60 until system restart for suspicious kernel crash
Jan 15 22:16:17 gemelos kernel: SMP
Jan 15 22:16:17 gemelos kernel: CPU: 4 PID: 18516 Comm: apache2 Tainted: G D 3.17.7-hardened-r1ww7_r10b #1
Jan 15 22:16:17 gemelos kernel: Hardware name: Dell Inc. PowerEdge R210 II/03X6X0, BIOS 2.7.0 11/15/2013
Jan 15 22:16:17 gemelos kernel: task: ee0eced0 ti: ee0ed234 task.ti: ee0ed234
Jan 15 22:16:17 gemelos kernel: EIP: 0060:[<00249241>] EFLAGS: 00210246 CPU: 4
Jan 15 22:16:17 gemelos kernel: EAX: 0000003a EBX: ffff6647 ECX: 00000000 EDX: 00000000
Jan 15 22:16:17 gemelos kernel: ESI: 0000003a EDI: 0000003a EBP: c23ebc58 ESP: c23ebc34
Jan 15 22:16:17 gemelos kernel: DS: 0068 ES: 0068 FS: 00d8 GS: 007b SS: 0068
Jan 15 22:16:17 gemelos kernel: CR0: 80050033 CR2: a18c2000 CR3: 01a04100 CR4: 001407f0
Jan 15 22:16:17 gemelos kernel: Stack:
Jan 15 22:16:17 gemelos kernel: 00000542 00000000 c23ebc60 00000000 00000000 00000000 00000000 00000000
Jan 15 22:16:17 gemelos kernel: 00000000 c23ebc94 000c1e69 00000000 00000000 02a70000 00000000 00000000
Jan 15 22:16:17 gemelos kernel: 00000000 000002a7 0000003b 00000000 00000001 00000065 00000000 ee3d49c4
Jan 15 22:16:17 gemelos kernel: Call Trace:
Jan 15 22:16:17 gemelos kernel: [<000c1e69>] bdi_position_ratio+0x181/0x1dd
Jan 15 22:16:17 gemelos kernel: [<000c2fc5>] balance_dirty_pages_ratelimited+0x43f/0x739
Jan 15 22:16:17 gemelos kernel: [<00498fe8>] ? nft_target_init+0x6b/0x17b
Jan 15 22:16:17 gemelos kernel: [<00498fe8>] ? nft_target_init+0x6b/0x17b
Jan 15 22:16:17 gemelos kernel: [<001a784d>] ? __ext4_journal_stop+0x53/0x6c
Jan 15 22:16:17 gemelos kernel: [<00017ffe>] ? intel_pmu_hw_config+0xa7/0xca
Jan 15 22:16:17 gemelos kernel: [<000bb9ff>] generic_perform_write+0x172/0x1af
Jan 15 22:16:17 gemelos kernel: [<000bcbc0>] __generic_file_write_iter+0x444/0x4c5
Jan 15 22:16:17 gemelos kernel: [<00200246>] ? sha256_transform+0x19e0/0x24a2
Jan 15 22:16:17 gemelos kernel: [<00180245>] ext4_file_write_iter+0x3b2/0x473
Jan 15 22:16:17 gemelos kernel: [<000ee35d>] new_sync_write+0x5c/0x83
Jan 15 22:16:17 gemelos kernel: [<000ee301>] ? do_sync_readv_writev+0x70/0x70
Jan 15 22:16:17 gemelos kernel: [<000eee57>] vfs_write+0xe8/0x1c8
Jan 15 22:16:17 gemelos kernel: [<000ef391>] SyS_pwrite64+0x52/0x79
Jan 15 22:16:17 gemelos kernel: [<00510b09>] syscall_call+0x7/0x7
Jan 15 22:16:17 gemelos kernel: [<00200246>] ? sha256_transform+0x19e0/0x24a2
Jan 15 22:16:17 gemelos kernel: [<00510b29>] ? restore_all_pax+0xc/0xc
Jan 15 22:16:17 gemelos kernel: [<0051007b>] ? ldsem_down_read+0x3b/0x163
Jan 15 22:16:17 gemelos kernel: [<00010000>] ? print_cpu_msr+0x3b/0x55
Jan 15 22:16:17 gemelos kernel: [<00200202>] ? sha256_transform+0x199c/0x24a2
Jan 15 22:16:17 gemelos kernel: [<00200033>] ? sha256_transform+0x17cd/0x24a2
Jan 15 22:16:17 gemelos kernel: [<00200286>] ? sha256_transform+0x1a20/0x24a2
Jan 15 22:16:17 gemelos kernel: Code: 89 f9 83 ec 18 89 d7 8b 51 04 8b 01 85 d2 89 45 e8 89 d0 89 55 ec 75 2e 8b 4d e8 89 f3 89 fe 39 ce 73 04 31 f6 eb 10 89 f0 31 d2 <f7> f1 31 d2 89 c6 89 f8 f7 f1 89 d7 89 d8 89 fa 89 f3 f7 f1 89
Jan 15 22:16:17 gemelos kernel: EIP: [<00249241>] div64_u64+0x36/0x106 SS:ESP 0068:c23ebc34
Jan 15 22:16:17 gemelos kernel: divide error: 0000 [#3]
Jan 15 22:16:17 gemelos kernel: ---[ end trace 16e28ee794763228 ]---
Jan 15 22:16:17 gemelos kernel: grsec: banning user with uid 81 until system restart for suspicious kernel crash
Jan 15 22:16:17 gemelos kernel: SMP
Jan 15 22:16:17 gemelos kernel: CPU: 6 PID: 18483 Comm: mysqld Tainted: G D 3.17.7-hardened-r1ww7_r10b #1
Jan 15 22:16:17 gemelos kernel: Hardware name: Dell Inc. PowerEdge R210 II/03X6X0, BIOS 2.7.0 11/15/2013
Jan 15 22:16:17 gemelos kernel: task: ee0a4e10 ti: ee0a5174 task.ti: ee0a5174
Jan 15 22:16:17 gemelos kernel: EIP: 0060:[<00249241>] EFLAGS: 00210246 CPU: 6
Jan 15 22:16:17 gemelos kernel: EAX: 0000003a EBX: ffff64e5 ECX: 00000000 EDX: 00000000
Jan 15 22:16:17 gemelos kernel: ESI: 0000003a EDI: 0000003a EBP: ebe7bbfc ESP: ebe7bbd8
Jan 15 22:16:17 gemelos kernel: DS: 0068 ES: 0068 FS: 00d8 GS: 007b SS: 0068
Jan 15 22:16:17 gemelos kernel: CR0: 80050033 CR2: a3400000 CR3: 01a04180 CR4: 001407f0
Jan 15 22:16:17 gemelos kernel: Stack:
Jan 15 22:16:17 gemelos kernel: 00000542 00000000 ebe7bc04 00000000 00000000 00000000 00000000 00000000
Jan 15 22:16:17 gemelos kernel: 00000000 ebe7bc38 000c1e69 00000000 00000000 02a70000 00000000 00000000
Jan 15 22:16:17 gemelos kernel: 00000000 000002a7 0000003b 00000000 00000001 00000065 00000000 ee3d49c4
Jan 15 22:16:17 gemelos kernel: Call Trace:
Jan 15 22:16:17 gemelos kernel: [<000c1e69>] bdi_position_ratio+0x181/0x1dd
Jan 15 22:16:17 gemelos kernel: [<000c2fc5>] balance_dirty_pages_ratelimited+0x43f/0x739
Jan 15 22:16:17 gemelos kernel: [<00498fe8>] ? nft_target_init+0x6b/0x17b
Jan 15 22:16:17 gemelos kernel: [<00498fe8>] ? nft_target_init+0x6b/0x17b
Jan 15 22:16:17 gemelos kernel: [<001a784d>] ? __ext4_journal_stop+0x53/0x6c
Jan 15 22:16:17 gemelos kernel: [<00017ffe>] ? intel_pmu_hw_config+0xa7/0xca
Jan 15 22:16:17 gemelos kernel: [<00010000>] ? print_cpu_msr+0x3b/0x55
Jan 15 22:16:17 gemelos kernel: [<000bb9ff>] generic_perform_write+0x172/0x1af
Jan 15 22:16:17 gemelos kernel: [<000bcbc0>] __generic_file_write_iter+0x444/0x4c5
Jan 15 22:16:17 gemelos kernel: [<00200246>] ? sha256_transform+0x19e0/0x24a2
Jan 15 22:16:17 gemelos kernel: [<00010000>] ? print_cpu_msr+0x3b/0x55
Jan 15 22:16:17 gemelos kernel: [<00010000>] ? print_cpu_msr+0x3b/0x55
Jan 15 22:16:17 gemelos kernel: [<00010000>] ? print_cpu_msr+0x3b/0x55
Jan 15 22:16:17 gemelos kernel: [<00180245>] ext4_file_write_iter+0x3b2/0x473
Jan 15 22:16:17 gemelos kernel: [<00010000>] ? print_cpu_msr+0x3b/0x55
Jan 15 22:16:17 gemelos kernel: [<00010000>] ? print_cpu_msr+0x3b/0x55
Jan 15 22:16:17 gemelos kernel: [<00010000>] ? print_cpu_msr+0x3b/0x55
Jan 15 22:16:17 gemelos kernel: [<000ee35d>] new_sync_write+0x5c/0x83
Jan 15 22:16:17 gemelos kernel: [<00010000>] ? print_cpu_msr+0x3b/0x55
Jan 15 22:16:17 gemelos kernel: [<00010000>] ? print_cpu_msr+0x3b/0x55
Jan 15 22:16:17 gemelos kernel: [<00010000>] ? print_cpu_msr+0x3b/0x55
Jan 15 22:16:17 gemelos kernel: [<00010000>] ? print_cpu_msr+0x3b/0x55
Jan 15 22:16:17 gemelos kernel: [<00010000>] ? print_cpu_msr+0x3b/0x55
Jan 15 22:16:17 gemelos kernel: [<000ee301>] ? do_sync_readv_writev+0x70/0x70
Jan 15 22:16:17 gemelos kernel: [<000eee57>] vfs_write+0xe8/0x1c8
Jan 15 22:16:17 gemelos kernel: [<000ef286>] SyS_write+0x3f/0x7f
Jan 15 22:16:17 gemelos kernel: [<00010000>] ? print_cpu_msr+0x3b/0x55
Jan 15 22:16:17 gemelos kernel: [<00510b09>] syscall_call+0x7/0x7
Jan 15 22:16:17 gemelos kernel: [<00010000>] ? print_cpu_msr+0x3b/0x55
Jan 15 22:16:17 gemelos kernel: [<00200293>] ? sha256_transform+0x1a2d/0x24a2
Jan 15 22:16:17 gemelos kernel: [<00510b29>] ? restore_all_pax+0xc/0xc
Jan 15 22:16:17 gemelos kernel: [<00010000>] ? print_cpu_msr+0x3b/0x55
Jan 15 22:16:17 gemelos kernel: [<00010000>] ? print_cpu_msr+0x3b/0x55
Jan 15 22:16:17 gemelos kernel: [<00200293>] ? sha256_transform+0x1a2d/0x24a2
Jan 15 22:16:17 gemelos kernel: [<00010000>] ? print_cpu_msr+0x3b/0x55
Jan 15 22:16:17 gemelos kernel: [<00010000>] ? print_cpu_msr+0x3b/0x55
Jan 15 22:16:17 gemelos kernel: [<00200033>] ? sha256_transform+0x17cd/0x24a2
Jan 15 22:16:17 gemelos kernel: [<00200293>] ? sha256_transform+0x1a2d/0x24a2
Jan 15 22:16:17 gemelos kernel: Code: 89 f9 83 ec 18 89 d7 8b 51 04 8b 01 85 d2 89 45 e8 89 d0 89 55 ec 75 2e 8b 4d e8 89 f3 89 fe 39 ce 73 04 31 f6 eb 10 89 f0 31 d2 <f7> f1 31 d2 89 c6 89 f8 f7 f1 89 d7 89 d8 89 fa 89 f3 f7 f1 89
Jan 15 22:16:17 gemelos kernel: EIP: [<00249241>] div64_u64+0x36/0x106 SS:ESP 0068:ebe7bbd8
Jan 15 22:16:17 gemelos kernel: ---[ end trace 16e28ee794763229 ]---
Jan 15 22:16:17 gemelos kernel: grsec: banning user with uid 60 until system restart for suspicious kernel crash
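A note for reading traces like this one: call-trace entries prefixed with `?` are addresses found on the stack that the unwinder could not confirm as part of the call chain, so the entries without `?` are the ones to follow. The following is a minimal Python sketch that filters them. The helper name `reliable_frames` is hypothetical, and the regular expression assumes the `[<addr>] symbol+0xoff/0xsize` layout seen in this log.

```python
import re

# One call-trace entry: [<address>] optional "? " marker, symbol+0xoffset/0xsize
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\] (\? )?([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)

def reliable_frames(oops_text):
    """Return the symbols of trace entries that are not marked with '?'."""
    return [
        m.group(3)
        for m in FRAME_RE.finditer(oops_text)
        if m.group(2) is None  # group 2 is the "? " marker when present
    ]

# A few entries copied from the first oops above.
sample = """
[<000c1e69>] bdi_position_ratio+0x181/0x1dd
[<000c2fc5>] balance_dirty_pages_ratelimited+0x43f/0x739
[<00498fe8>] ? nft_target_init+0x6b/0x17b
[<000bb9ff>] generic_perform_write+0x172/0x1af
[<003c0000>] ? bnx2x_queue_comp_cmd+0xcf/0x12d
[<000ef286>] SyS_write+0x3f/0x7f
"""

if __name__ == "__main__":
    for name in reliable_frames(sample):
        print(name)
```

Applied to the sample, only the write path remains (bdi_position_ratio, balance_dirty_pages_ratelimited, generic_perform_write, SyS_write); the nft and bnx2x entries are dropped as unconfirmed.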
(In reply to Matthew Thode ( prometheanfire ) from comment #2)
> also, a disk testing tool like fio should reproduce it if you are willing to
> try.

I installed fio and can try that. Are there any recommended options to run it with?
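No options were recommended in this thread, so the following is only one possible starting point, sketched in Python as a helper that builds a fio command line. It assumes that buffered (non-direct) random writes through `pwrite` are the interesting load, because the traces go through SyS_pwrite64/SyS_write and balance_dirty_pages_ratelimited. The function name, the job name, the default sizes and the `/var/tmp/fio-test` directory are all made up for illustration.

```python
import shlex

def fio_buffered_write_cmd(directory, jobs=4, size="1g", runtime_s=600):
    """Build a fio command line for buffered random writes (an assumption,
    not something suggested in the thread)."""
    return [
        "fio",
        "--name=buffered-randwrite",   # arbitrary job name
        f"--directory={directory}",    # where fio creates its test files
        "--rw=randwrite",              # random writes
        "--bs=4k",
        f"--size={size}",              # per-job file size
        f"--numjobs={jobs}",           # several writers in parallel
        "--ioengine=psync",            # pwrite()-based engine
        "--direct=0",                  # go through the page cache
        "--time_based",
        f"--runtime={runtime_s}",      # seconds
        "--group_reporting",
    ]

if __name__ == "__main__":
    # Hypothetical scratch directory on the filesystem under test.
    print(shlex.join(fio_buffered_write_cmd("/var/tmp/fio-test")))
```

Printing the command rather than running it keeps the sketch harmless; the directory should point at a scratch location on the ext4 filesystem that sees the crashes.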
(In reply to William Waisse from comment #3)

I cross posted this second oops on the lkml : https://lkml.org/lkml/2015/1/18/60
(In reply to William Waisse from comment #5)
> I cross posted this second oops on the lkml :
> https://lkml.org/lkml/2015/1/18/60

Sorry, this was assigned to the wrong alias and I'm just seeing it. Upstream vanilla kernel developers will not be interested in a grsec-patched (i.e. heavily patched) kernel, although from the oops it doesn't look like grsec/PaX is causing it. The problem has something to do with your Broadcom card (bnx2x driver), and I have had a report on IRC about a similar oops. Maybe pipacs upstream might see something else there.

You can try two things to narrow it down:
1) try hardened-sources 3.18.2-r1, which has the very latest grsec/PaX patch
2) try the vanilla equivalent and see if you hit the same oops.
Adding a few details:

3.17.7-hardened
i686-pc-linux-gnu-4.8.3
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5716 Gigabit Ethernet (rev 20)
it looks like a divide-by-0 bug somewhere in ext4/vfs, it's not clear what code path is triggering it. if you still have vmlinux around you could try to resolve the address reported for generic_perform_write via addr2line. also check 3.18 as we've moved off of 3.17 already.
Is it this?

addr2line -e vmlinux -fip 001721af
ext3_fill_super at /usr/src/linux/fs/ext3/super.c:1961 (discriminator 1)
(In reply to William Waisse from comment #9)
> addr2line -e vmlinux -fip 001721af
> ext3_fill_super at /usr/src/linux/fs/ext3/super.c:1961 (discriminator 1)

the theory is fine but i don't know where that address comes from, it's certainly not for generic_perform_write as that would then show up in the addr2line output as well ;). did you make sure that the address you took from the oops is for the same kernel whose vmlinux image you passed to addr2line? also enabling CONFIG_DEBUG_INFO/CONFIG_DEBUG_INFO_REDUCED will produce better output.
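For reference, in a trace entry such as `[<000bb9ff>] generic_perform_write+0x172/0x1af` the bracketed value is the address, `0x172` is the offset into the function and `0x1af` is the function's length. The value 001721af happens to match the offset and length digits run together (172, 1af), which may be where it came from; that is only a guess. A minimal Python sketch of picking the right value follows; the helper names are hypothetical and the `vmlinux` path is whatever image matches the crashed kernel.

```python
import re

# [<address>] optional "? " marker, then symbol+0xoffset/0xsize
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\] (?:\? )?([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)

def parse_frame(line):
    """Split one call-trace entry into its address, symbol, offset and size."""
    m = FRAME_RE.search(line)
    if m is None:
        raise ValueError("not a call-trace entry: %r" % line)
    address = int(m.group(1), 16)
    offset = int(m.group(3), 16)
    return {
        "address": address,                  # this is what addr2line wants
        "symbol": m.group(2),
        "offset": offset,                    # bytes into the function
        "size": int(m.group(4), 16),         # length of the function
        "function_start": address - offset,  # derived, for cross-checking
    }

def addr2line_args(line, vmlinux="vmlinux"):
    """Build the addr2line invocation for the address in a trace entry."""
    frame = parse_frame(line)
    return ["addr2line", "-e", vmlinux, "-fip", "%08x" % frame["address"]]

if __name__ == "__main__":
    entry = "[<000bb9ff>] generic_perform_write+0x172/0x1af"
    print(" ".join(addr2line_args(entry)))
```

If the image matches the kernel that crashed, the addr2line output for that address should name generic_perform_write itself, which is the check described in the comment above.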
(In reply to PaX Team from comment #10)
> (In reply to William Waisse from comment #9)
> > addr2line -e vmlinux -fip 001721af
> > ext3_fill_super at /usr/src/linux/fs/ext3/super.c:1961 (discriminator 1)
>
> the theory is fine but i don't know where that address comes from, it's
> certainly not for generic_perform_write as that would then show up in the
> addr2line output as well ;). did you make sure that the address you took
> from the oops is for the same kernel whose vmlinux image you passed to
> addr2line? also enabling CONFIG_DEBUG_INFO/CONFIG_DEBUG_INFO_REDUCED will
> produce better output.

I ran addr2line on the kernel I rebuilt with the same options and source code, but with CONFIG_DEBUG_INFO/CONFIG_DEBUG_INFO_REDUCED added.

The vmlinux in /usr/src/linux was overwritten when I rebuilt the kernel with the debug options.

addr2line won't work on the vmlinuz files I have in /boot after make install; it seems to work only on the vmlinux in the source tree.

gemelos boot # addr2line -e vmlinuz-3.17.7-hardened-r1ww7_r10b -fip 001721af
addr2line: vmlinuz-3.17.7-hardened-r1ww7_r10b: File format not recognized
gemelos boot # addr2line -e vmlinuz-3.17.7-hardened-r1ww7_r10b_debug -fip 001721af
addr2line: vmlinuz-3.17.7-hardened-r1ww7_r10b_debug: File format not recognized

I'm now waiting for another oops with the debug-enabled kernel.
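The "File format not recognized" errors are expected: addr2line reads object files such as the uncompressed ELF `vmlinux` left in the build tree, while the `vmlinuz` files installed in /boot are compressed boot images. A quick way to tell the two apart is to look for the 4-byte ELF magic at the start of the file. The Python sketch below does that; the function name is hypothetical and the demo uses small stand-in files rather than real kernel images.

```python
import os
import tempfile

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file

def is_elf(path):
    """Return True if the file starts with the ELF magic number."""
    with open(path, "rb") as f:
        return f.read(4) == ELF_MAGIC

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-ins only: a fake ELF header and some arbitrary non-ELF bytes.
        elf_like = os.path.join(tmp, "vmlinux")
        other = os.path.join(tmp, "vmlinuz")
        with open(elf_like, "wb") as f:
            f.write(ELF_MAGIC + b"\x01\x01\x01" + b"\x00" * 9)
        with open(other, "wb") as f:
            f.write(b"not an ELF image")
        print(elf_like, is_elf(elf_like))
        print(other, is_elf(other))
```

For symbol resolution, the image to keep is therefore the `vmlinux` from the exact build that is running, copied aside before the tree is rebuilt.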
I finally had another crash; the oops is a little bit different:

Feb 3 09:02:28 gemelos kernel: divide error: 0000 [#1] SMP DEBUG_PAGEALLOC
Feb 3 09:02:28 gemelos kernel: CPU: 4 PID: 15864 Comm: mysqld Not tainted 3.17.7-hardened-r1ww7_r10b_debug #3
Feb 3 09:02:28 gemelos kernel: Hardware name: Dell Inc. PowerEdge R210 II/03X6X0, BIOS 2.7.0 11/15/2013
Feb 3 09:02:28 gemelos kernel: task: d3c8e190 ti: d3c8e4f4 task.ti: d3c8e4f4
Feb 3 09:02:28 gemelos kernel: EIP: 0060:[<002477e8>] EFLAGS: 00210246 CPU: 4
Feb 3 09:02:28 gemelos kernel: EAX: 00000356 EBX: 00000000 ECX: 00000356 EDX: 00000000
Feb 3 09:02:28 gemelos kernel: ESI: 00000000 EDI: 00000000 EBP: d2a7dd18 ESP: d2a7dcec
Feb 3 09:02:28 gemelos kernel: DS: 0068 ES: 0068 FS: 00d8 GS: 007b SS: 0068
Feb 3 09:02:28 gemelos kernel: CR0: 80050033 CR2: 1e0db000 CR3: 01a04100 CR4: 001407f0
Feb 3 09:02:28 gemelos kernel: Stack:
Feb 3 09:02:28 gemelos kernel: 00000000 ffffffff 00000000 00000000 ffffc08b 00000356 00000356 000c46fa
Feb 3 09:02:28 gemelos kernel: 00000000 c000001d 00000001 d2a7dd4c 000c48cd 00000000 00000000 00000000
Feb 3 09:02:28 gemelos kernel: 00010000 ffffffff 0000001d 00000357 00000000 00000001 0000002f ecf429c4
Feb 3 09:02:28 gemelos kernel: Call Trace:
Feb 3 09:02:28 gemelos kernel: [<000c46fa>] ? pos_ratio_polynom+0x25/0x77
Feb 3 09:02:28 gemelos kernel: [<000c48cd>] bdi_position_ratio+0x181/0x1dc
Feb 3 09:02:28 gemelos kernel: [<00010000>] ? identify_cpu+0x73/0x32d
Feb 3 09:02:28 gemelos kernel: [<000c5a90>] balance_dirty_pages_ratelimited+0x366/0x656
Feb 3 09:02:28 gemelos kernel: [<000be51c>] generic_perform_write+0x151/0x192
Feb 3 09:02:28 gemelos kernel: [<0011036c>] ? ftrace_raw_output_writeback_single_inode_template+0x79/0x98
Feb 3 09:02:28 gemelos kernel: [<000bfc1a>] __generic_file_write_iter+0x42f/0x478
Feb 3 09:02:28 gemelos kernel: [<0011047c>] ? perf_trace_bdi_dirty_ratelimit+0x2a/0xf2
Feb 3 09:02:28 gemelos kernel: [<0011036c>] ? ftrace_raw_output_writeback_single_inode_template+0x79/0x98
Feb 3 09:02:28 gemelos kernel: [<00180e42>] ext4_file_write_iter+0x2dd/0x46a
Feb 3 09:02:28 gemelos kernel: [<000f0c89>] new_sync_write+0x5c/0x83
Feb 3 09:02:28 gemelos kernel: [<0011036c>] ? ftrace_raw_output_writeback_single_inode_template+0x79/0x98
Feb 3 09:02:28 gemelos kernel: [<000f0c2d>] ? new_sync_read+0x80/0x80
Feb 3 09:02:28 gemelos kernel: [<000f1509>] vfs_write+0xe8/0x1c5
Feb 3 09:02:28 gemelos kernel: [<000f1bb0>] SyS_pwrite64+0x4e/0x75
Feb 3 09:02:28 gemelos kernel: [<0011036c>] ? ftrace_raw_output_writeback_single_inode_template+0x79/0x98
Feb 3 09:02:28 gemelos kernel: [<00507d89>] syscall_call+0x7/0x7
Feb 3 09:02:28 gemelos kernel: [<0011036c>] ? ftrace_raw_output_writeback_single_inode_template+0x79/0x98
Feb 3 09:02:28 gemelos kernel: [<00110033>] ? ftrace_raw_output_writeback_pages_written+0x37/0x3e
Feb 3 09:02:28 gemelos kernel: [<00200293>] ? sha512_transform+0x528/0x1476
Feb 3 09:02:28 gemelos kernel: [<00200033>] ? sha512_transform+0x2c8/0x1476
Feb 3 09:02:28 gemelos kernel: [<00200296>] ? sha512_transform+0x52b/0x1476
Feb 3 09:02:28 gemelos kernel: [<00200296>] ? sha512_transform+0x52b/0x1476
Feb 3 09:02:28 gemelos kernel: Code: e5 57 56 53 8d 7d 08 83 ec 20 8b 37 8b 7f 04 89 45 e4 89 55 e8 85 ff 75 35 8b 4d e8 39 f1 89 4d ec 73 04 31 c9 eb 12 89 c8 31 d2 <f7> f6 31 d2 89 c1 8b 45 ec f7 f6 89 55 ec 8b 5d e4 8b 55 ec 89
Feb 3 09:02:28 gemelos kernel: EIP: [<002477e8>] div64_u64+0x2d/0x126 SS:ESP 0068:d2a7dcec
Feb 3 09:02:28 gemelos kernel: ---[ end trace 449da52219d682ad ]---

I will now try with hardened-sources 3.18.5...
so with the latest crash the address reported for generic_perform_write is

Feb 3 09:02:28 gemelos kernel: [<000be51c>] generic_perform_write+0x151/0x192

and I have the vmlinux for addr2line :

addr2line -e vmlinux -fip 00151192
reiserfs_write_dquot at /usr/src/linux/fs/reiserfs/super.c:2213

addr2line -e vmlinux -fip 000be51c
constant_test_bit at /usr/src/linux/./arch/x86/include/asm/bitops.h:311
 (inlined by) test_ti_thread_flag at /usr/src/linux/include/linux/thread_info.h:91
 (inlined by) test_tsk_thread_flag at /usr/src/linux/include/linux/sched.h:2828
 (inlined by) signal_pending at /usr/src/linux/include/linux/sched.h:2854
 (inlined by) fatal_signal_pending at /usr/src/linux/include/linux/sched.h:2864
 (inlined by) generic_perform_write at /usr/src/linux/mm/filemap.c:2526

is that what you need ? ( there's no reiserfs at all on this server )
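A note on reading the call-trace entries: in `[<000be51c>] generic_perform_write+0x151/0x192`, the bracketed value is the address to hand to addr2line, while `+0x151/0x192` is the offset into the function followed by the function's size. Gluing the two together as 00151192 does not give an address, which probably explains the unrelated reiserfs_write_dquot result. A minimal sketch of splitting such a line into its parts (the regular expression and the name `parse_frame` are illustrative assumptions, not an existing tool):

```python
import re

# Matches e.g. "[<000be51c>] generic_perform_write+0x151/0x192"
# and the "?"-marked (unreliable) variant "[<000c46fa>] ? pos_ratio_polynom+0x25/0x77".
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-fA-F]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+)"
)


def parse_frame(line):
    """Split one call-trace line into address, symbol, offset and size."""
    m = FRAME_RE.search(line)
    if m is None:
        return None  # not a call-trace entry (e.g. "Call Trace:" or register dump)
    address = int(m.group("addr"), 16)
    offset = int(m.group("offset"), 16)
    return {
        "address": address,                  # value to pass to addr2line
        "symbol": m.group("symbol"),
        "offset": offset,                    # bytes into the function
        "size": int(m.group("size"), 16),    # total size of the function
        "function_start": address - offset,
        "reliable": m.group("unreliable") is None,
    }
```

For the frame above, `parse_frame(...)["address"]` is 0xbe51c, which is the value that resolved correctly to mm/filemap.c:2526.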
can you resolve 2477e8?
addr2line -e vmlinux -fip 002477e8
div_u64_rem at /usr/src/linux/./arch/x86/include/asm/div64.h:53
 (inlined by) div_u64 at /usr/src/linux/include/linux/math64.h:100
 (inlined by) div64_u64 at /usr/src/linux/lib/div64.c:139
and this one: c48cd? what happens here is a division by 0 error, i have no idea how we would cause this though, we don't change this writeback code...
addr2line -e vmlinux -fip 00c48cd
bdi_position_ratio at /usr/src/linux/mm/page-writeback.c:824
so, the problem occurred in this code in mm/page-writeback.c:

820         span = (thresh - bdi_thresh + 8 * write_bw) * (u64)x >> 16;
821         x_intercept = bdi_setpoint + span;
822
823         if (bdi_dirty < x_intercept - span / 4) {
824                 pos_ratio = div64_u64(pos_ratio * (x_intercept - bdi_dirty),
825                                       x_intercept - bdi_setpoint + 1);

the divisor x_intercept-bdi_setpoint+1 was 0, which means that span+1 was 0 which is kinda impossible given the above code, so i wonder what could go so wrong. can you post your kernel config (in particular, i'm wondering if you have enabled the SIZE_OVERFLOW plugin)?
Created attachment 395562 [details] kernel config
so we think it's an upstream bug https://lkml.org/lkml/2014/4/29/497 that was fixed only on 64 bit archs. on 32 bit archs the function in question uses a 32 bit type (unsigned long) instead of u64 and therefore the truncation issue mentioned in the thread can very well happen. i'm wondering, could you run a vanilla kernel just for testing and reproduce this issue there as well?
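To make the 32 bit case concrete: if span ends up as 0xffffffff after being stored in a 32 bit unsigned long, then x_intercept - bdi_setpoint + 1 wraps around to 0. The sketch below emulates fixed-width unsigned arithmetic in Python by masking. It assumes unsigned long is 32 bits on the affected build, the input values are invented for illustration rather than taken from the oops, and the helper name `divisor` is hypothetical:

```python
def divisor(bdi_setpoint, span, bits=32):
    """Compute x_intercept - bdi_setpoint + 1 with `bits`-wide unsigned wraparound."""
    mask = (1 << bits) - 1
    span &= mask                                   # span stored in a fixed-width variable
    x_intercept = (bdi_setpoint + span) & mask     # may wrap past the top of the range
    return (x_intercept - bdi_setpoint + 1) & mask


# With 32-bit variables the divisor collapses to zero ...
zero_divisor = divisor(0x356, 0xFFFFFFFF, bits=32)
# ... while the same inputs in 64-bit variables stay non-zero.
wide_divisor = divisor(0x356, 0xFFFFFFFF, bits=64)
```

In this model `zero_divisor` is 0 and `wide_divisor` is 0x100000000, which matches the observation that the problem shows up only where the variables are 32 bits wide.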
well I try to never use a vanilla kernel but I will do that if asked by the pax team ! for the last 10 years I've only built hardened sources with grsec/pax, should I use gentoo-sources or vanilla-sources to build this insecure kernel ?
vanilla sources would be better as kernel developers don't like to deal with bugreports for patched trees.
ok, now running vmlinuz-3.18.5ww7_vanilla1_debug , hoping I won't be rooted by one more vanilla 0day for not running a secure grsec/pax kernel ;(
ok , it crashed again today with 3.18.5 vanilla , full oops on : http://pastebin.com/raw.php?i=sfvXTAEZ
thanks, so it's as i expected it, a vanilla bug not fixed on 32 bit archs. as the (un)lucky finder, you'll have the honours of reporting it to lkml and as soon as they have a fix, we'll take it into grsec (i could go ahead and blindly change all affected variables to u64 but i'd rather have the people familiar with this code produce a proper fix). also please CC me on the lkml submission.
(In reply to William Waisse from comment #24)
> ok , it crashed again today with 3.18.5 vanilla , full oops on :
>
> http://pastebin.com/raw.php?i=sfvXTAEZ

posted on https://lkml.org/lkml/2015/2/14/64

(In reply to PaX Team from comment #25)
> thanks, so it's as i expected it, a vanilla bug not fixed on 32 bit archs.
> as the (un)lucky finder, you'll have the honours of reporting it to lkml and
> as soon as they have a fix, we'll take it into grsec (i could go ahead and
> blindly change all affected variables to u64 but i'd rather have the people
> familiar with this code produce a proper fix). also please CC me on the lkml
> submission.

posted on https://lkml.org/lkml/2015/2/14/64
ok, waiting for the lkml to answer, I just tried to patch mm/page-writeback.c myself to avoid this division by zero, kernel booting . . . let's pray
now getting the same problem on another server , also 32 bits : http://pastebin.com/Rvid0BF8

that happened after I did full updates on this server, so the kernel bug seems to be triggered by new versions of userspace programs, probably related to writing on an ext4 filesystem while using a 32 bit kernel.

no more crashes on the server I manually "dirty patched" to avoid the divide by zero, so I'll just do the same thing on this other crashing server.

just in case someone needs to avoid those crashes, here is my diff of mm/page-writeback.c ( yes I agree this is probably a shameful dirty workaround, but I'm not exactly a kernel developer and at least I have had no more crashes since I patched that )

diff page-writeback.c page-writeback.c.save
581,584d580
< unsigned int divisor;
<
< divisor = limit - setpoint;
< if (divisor < 1 ) divisor=1;
587,588c583
< divisor);
< // limit - setpoint + 1);
---
> limit - setpoint + 1);
686d680
< unsigned long divisor;
827a822
>
829,830d823
< divisor=x_intercept - bdi_setpoint +1;
< if ( divisor < 1 ) divisor=1;
832c825
< divisor);
---
> x_intercept - bdi_setpoint + 1);
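The second hunk of the diff boils down to clamping the divisor before the division. A rough model of that guard in Python, assuming 32 bit unsigned variables (emulated by masking) and using hypothetical helper names; it only illustrates the workaround and is not a proposed upstream fix:

```python
MASK32 = 0xFFFFFFFF  # emulate a 32-bit unsigned long


def guarded_divisor(x_intercept, bdi_setpoint):
    """Mirror of: divisor = x_intercept - bdi_setpoint + 1; if (divisor < 1) divisor = 1;"""
    divisor = (x_intercept - bdi_setpoint + 1) & MASK32
    if divisor < 1:      # for an unsigned value this is the same as divisor == 0
        divisor = 1
    return divisor


def scaled_pos_ratio(pos_ratio, x_intercept, bdi_dirty, bdi_setpoint):
    """Model of the div64_u64() call with the clamped divisor."""
    numerator = pos_ratio * ((x_intercept - bdi_dirty) & MASK32)
    return numerator // guarded_divisor(x_intercept, bdi_setpoint)
```

The clamp removes the trap, but when it triggers the result is just the undivided numerator, so the computed ratio is presumably no longer meaningful. That fits the description of it as a workaround rather than a fix.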
@pax_team since the linux kernel developers don't seem to be very interested in this divide by zero oops . . . would you mind patching that ?

for now it's ok for me with my basic dirty custom patch workaround checking for divisor=0 , no more crashes. but manually patching the kernel sources every time I have to upgrade is . . . meh ;(

any ideas why I just had no answers on https://lkml.org/lkml/2015/2/14/64 even after reproducing on a vanilla kernel and providing a full debug stack ?

any ideas why this was fixed in the 64 bit kernel code but not for 32 bit ?

I have to say I'm a little bit stunned by the complete silence of the lkml about this kernel oops . . .
(In reply to William Waisse from comment #29)
> @pax_team since the linux kernel developers don't seem to be very interested
> in this divide by zero oops . . . would you mind patching that ?

the problem is that i don't know this code and what the correct patch would look like...

> any ideas why I just had no answers on https://lkml.org/lkml/2015/2/14/64
> even after reproducing on a vanilla kernel and providing a full debug stack ?

good question, maybe try to resend it every now and then and perhaps CC more people.

> any ideas why this was fixed in the 64 bit kernel code but not for 32 bit ?

i guess nobody working on the fix at the time thought of 32 bit archs.
we've dropped support for the 3.x series of hardened-sources. is this divide by zero problem in the 4.x series? otherwise, i'll close this bug obsolete.
(In reply to Anthony Basile from comment #31)
> we've dropped support for the 3.x series of hardened-sources. is this
> divide by zero problem in the 4.x series? otherwise, i'll close this bug
> obsolete.

yes, afaik this code survives until today and no one seems to have fixed it properly...
(In reply to Anthony Basile from comment #31)
> we've dropped support for the 3.x series of hardened-sources. is this
> divide by zero problem in the 4.x series? otherwise, i'll close this bug
> obsolete.

most probably yes, since the code didn't change and no one answered me on the LKML . . . it seems 32 bit kernels are no longer supported.

since I reported this bug I have just used a patched kernel with the small dirty hack I posted in the previous comment ( if (divisor <1) divisor=1 ). the server was no longer crashing, just pretty slow on disk accesses.

I'll try a non patched 4.x version soon to tell you if the crash/oops comes back.
ok, I've been running the 4.1.7-hardened kernel for a few weeks now, without my dirty patch , and I have had no division by zero oops. most probably the divide by zero was triggered somewhere else, in a hardware, filesystem or hardware raid driver, and the bug is ( or seems until now to be ) no longer here, so you can probably close the bug.