267961 – XFS filesystem randomly crash with " xfs_iflush: Bad inode %d magic number"

Summary: XFS filesystem randomly crash with " xfs_iflush: Bad inode %d magic number"

Status:	RESOLVED WORKSFORME

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	[OLD] Core system
Hardware:	AMD64 Linux

Importance:	High normal
Assignee:	Gentoo Linux bug wranglers

Reported:	2009-04-30 09:10 UTC by Il'ya
Modified:	2009-05-29 06:25 UTC
CC List:	0 users

Attachments
Current kernel config (config.gz, 16.71 KB, application/octet-stream) 2009-04-30 09:19 UTC, Il'ya
emerge --info output (emerge.info.gz, 5.85 KB, application/octet-stream) 2009-04-30 09:19 UTC, Il'ya
copy of /var/log/messages (messages.gz, 12.14 KB, application/octet-stream) 2009-04-30 09:20 UTC, Il'ya

Description Il'ya 2009-04-30 09:10:29 UTC

Here is an extract from syslog:
'
Apr 30 10:57:24 wrath Filesystem "sdc1": xfs_iflush: Bad inode 1229818808 magic number 0x0, ptr 0xffff8800d601e800
Apr 30 10:57:24 wrath xfs_force_shutdown(sdc1,0x8) called from line 3001 of file fs/xfs/xfs_inode.c.  Return address = 0xffffffff803c61dd
Apr 30 10:57:24 wrath Filesystem "sdc1": Corruption of in-memory data detected.  Shutting down filesystem: sdc1
Apr 30 10:57:24 wrath Please umount the filesystem, and rectify the problem(s)
Apr 30 10:57:54 wrath Filesystem "sdc1": xfs_log_force: error 5 returned.
... (about 20 lines of the previous error message)
Apr 30 11:09:54 wrath Filesystem "sdc1": xfs_log_force: error 5 returned.
'
After this point the mount point (mounted from sdc1) became unavailable.
Running umount and then mount again fixes the situation.
A complete overnight memtest+ run finished without any errors.
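
For anyone triaging a similar log, here is a minimal Python sketch that pulls the device, inode number and magic value out of `xfs_iflush` lines and counts the `xfs_log_force` errors that follow. The regular expressions are modelled only on the two message shapes quoted above and are an assumption; other kernel versions may word these messages differently. The function name `summarize` is hypothetical.

```python
import errno
import re

# Patterns modelled on the two message shapes quoted in this report.
# They are assumptions, not an official description of the XFS log format.
IFLUSH_RE = re.compile(
    r'Filesystem "(?P<dev>[^"]+)": xfs_iflush: Bad inode (?P<inode>\d+) '
    r'magic number (?P<magic>0x[0-9a-fA-F]+)'
)
LOG_FORCE_RE = re.compile(
    r'Filesystem "(?P<dev>[^"]+)": xfs_log_force: error (?P<err>\d+) returned'
)


def summarize(lines):
    """Return (bad_inodes, log_force_errors) found in an iterable of syslog lines.

    bad_inodes is a list of (device, inode, magic) tuples.
    log_force_errors maps (device, errno name) to the number of occurrences.
    """
    bad_inodes = []
    log_force_errors = {}
    for line in lines:
        match = IFLUSH_RE.search(line)
        if match:
            bad_inodes.append(
                (match.group("dev"),
                 int(match.group("inode")),
                 int(match.group("magic"), 16))
            )
            continue
        match = LOG_FORCE_RE.search(line)
        if match:
            # Translate the numeric error into its symbolic errno name.
            name = errno.errorcode.get(int(match.group("err")), "unknown")
            key = (match.group("dev"), name)
            log_force_errors[key] = log_force_errors.get(key, 0) + 1
    return bad_inodes, log_force_errors


if __name__ == "__main__":
    # /var/log/messages is the log file attached to this report.
    with open("/var/log/messages", errors="replace") as log:
        bad, errors = summarize(log)
    print(bad)
    print(errors)
```

Error 5 in the `xfs_log_force` lines is `EIO`, which is what one would expect once the filesystem has been shut down.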

Comment 1 Il'ya 2009-04-30 09:19:30 UTC

Created attachment 189934 [details]
Current kernel config

Comment 2 Il'ya 2009-04-30 09:19:57 UTC

Created attachment 189936 [details]
emerge --info output

Comment 3 Il'ya 2009-04-30 09:20:21 UTC

Created attachment 189938 [details]
copy of /var/log/messages

Comment 4 Il'ya 2009-05-04 14:15:05 UTC

bump

Comment 5 Csaba Tóth 2009-05-05 19:56:38 UTC

I use XFS on a lot of servers without any errors.

Have you tried to check or fix it with xfs_check and xfs_repair?

Comment 6 Robin Johnson archtester 2009-05-08 21:07:44 UTC

Your on-disk XFS is corrupted. If it's not a system-critical partition, umount it and run xfs_check and xfs_repair. If it is system-critical, boot off a livecd and do the same.

Comment 7 Il'ya 2009-05-08 21:46:32 UTC

(In reply to comment #6)
> Your on-disk XFS is corrupted. If it's not a system-critical partition, umount
> and use xfs_check and xfs_repair. If it is system-critical, boot off a livecd
> and do the same
> 

This partition holds non-critical multimedia data, $CCACHE_DIR, $PORTAGE_TMPDIR and a public spool. I have already remounted this partition several times and rebooted the machine. The identical problem has occurred only one more time.

wrath ~ # mount | fgrep sdb
/dev/sdb1 on /mnt/puzo type xfs (rw)
wrath ~ # umount /mnt/puzo/
wrath ~ # time xfs_check /dev/sdb1 

real	0m26.725s
user	0m2.710s
sys	0m0.997s
wrath ~ # xfs_repair -v /dev/sdb1 
Phase 1 - find and verify superblock...
        - block cache size set to 359360 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 6 tail block 6
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...

        XFS_REPAIR Summary    Sat May  9 01:42:27 2009

Phase		Start		End		Duration
Phase 1:	05/09 01:42:07	05/09 01:42:07	
Phase 2:	05/09 01:42:07	05/09 01:42:09	2 seconds
Phase 3:	05/09 01:42:09	05/09 01:42:25	16 seconds
Phase 4:	05/09 01:42:25	05/09 01:42:26	1 second
Phase 5:	05/09 01:42:26	05/09 01:42:26	
Phase 6:	05/09 01:42:26	05/09 01:42:27	1 second
Phase 7:	05/09 01:42:27	05/09 01:42:27	

Total run time: 20 seconds
done
wrath ~ #

I think this output tells us that everything is "OK" for now, doesn't it?

I'm sorry for my terrible English.

Comment 8 Robin Johnson archtester 2009-05-14 20:52:50 UTC

ok, now lastly mount it again and see if you still get any errors when you try to access whatever file was on that inode.
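
One way to find "whatever file was on that inode" is to walk the mounted filesystem and compare inode numbers. The following is a minimal Python sketch, not a tested recovery tool; the function name `find_by_inode` is hypothetical, and the mount point and inode number in the usage note are simply the ones quoted earlier in this report.

```python
import os


def find_by_inode(mount_point, inode):
    """Return every path under mount_point whose inode number equals inode.

    The walk stays on the filesystem that mount_point lives on, because
    inode numbers are only unique within a single filesystem.
    """
    root_dev = os.lstat(mount_point).st_dev
    matches = []
    for dirpath, dirnames, filenames in os.walk(mount_point):
        kept_dirs = []
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                # The entry vanished or is unreadable; skip it.
                continue
            if st.st_dev != root_dev:
                # Belongs to another filesystem mounted below mount_point.
                continue
            if st.st_ino == inode:
                matches.append(path)
            if name in dirnames:
                kept_dirs.append(name)
        # Prune directories on other filesystems so os.walk skips them.
        dirnames[:] = kept_dirs
    return matches
```

For this report the call would be `find_by_inode("/mnt/puzo", 1229818808)`. An empty result would mean no file currently uses that inode number, and several results would mean hard links to the same file. Reading each returned path afterwards is what would show whether the error comes back.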

Comment 9 Il'ya 2009-05-19 10:30:23 UTC

(In reply to comment #8)
> ok, now lastly mount it again and see if you still get any errors when you try
> to access whatever file was on that inode.
> 

I got no errors after the remount.
This bug is very rare for me, and the situation described in the "Description" field occurs under random conditions.

It was triggered only once during the last week.

Comment 10 Robin Johnson archtester 2009-05-29 06:25:00 UTC

If you still get similar errors, I'd suspect something is just fractionally bad in your hardware.