422491 – emerging updates to nfs-mounted file systems can cause running programs to segfault



Status:	UNCONFIRMED

Alias:	None

Product:	Portage Development
Classification:	Unclassified
Component:	Core
Hardware:	All Linux

Importance:	Normal normal
Assignee:	Portage team

URL:	http://nfs.sourceforge.net/#faq_d9
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2012-06-20 02:45 UTC by Myk Taylor
Modified:	2012-06-21 02:59 UTC
CC List:	0 users

See Also:
Package list:
Runtime testing required:	---

Description Myk Taylor 2012-06-20 02:45:15 UTC

I am not an expert on the Portage emerge process, but it seems to update files by overwriting them. My system is diskless, running entirely over NFS, and I've noticed that after an update, running processes (such as mythbackend) will sometimes segfault at random. According to the NFS FAQ (linked in the URL field; the relevant item is copied below), files must be installed with 'install -b' to avoid breaking cache coherency. Could this method of installation be used automatically when Portage detects that the target filesystem is network-mounted?

From the linked page:

D9. When I update shared executable files on my NFS exports, programs running on my clients all segfault. How come?

A. If you simply copy the new executable or library over an old version, you are violating the NFS cache consistency rules by changing a file that is being held open on your clients.

Copying over executables creates a window during which an NFS client's cache may hold parts of the old version and parts of the new version, all combined in the same file. The correct way to update executables and shared libraries on your NFS shares is to use the install program with the '-b' option. That renames the version of the executable that is in use, then creates a brand new file to contain the new version of the executable.

Reproducible: Always

Steps to Reproduce:
1. Network-mount an entire system; start some complex, long-running services (such as mythbackend)
2. emerge updates to libraries used by the service
3. wait for a segfault of the service

NFS volumes are exported from a Netgear ReadyNAS running RAIDiator firmware 4.2.19.

Comment 1 Zac Medico gentoo-dev 2012-06-20 03:29:06 UTC

When installing a package, portage already copies each file to a temporary file and then renames it after the copy is complete. For the relevant code, see /usr/lib/portage/pym/portage/util/movefile.py, where it calls copyfile and then rename.
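A simplified sketch of the copy-then-rename pattern described above. This is not Portage's actual movefile code; the helper name `replace_via_temp` and the `#new` suffix are assumptions for illustration. The temporary file is created in the destination directory because rename(2) cannot cross filesystems.

```python
import os
import shutil


def replace_via_temp(src: str, dest: str) -> None:
    """Copy src next to dest, then rename the copy over dest.

    Hypothetical helper illustrating copy-then-rename; it is not
    Portage's real movefile implementation.
    """
    # Keep the temporary file in the same directory as dest so the
    # final rename stays on one filesystem.
    tmp = dest + "#new"
    shutil.copy2(src, tmp)  # copy file data and metadata
    # On POSIX, rename() replaces dest atomically: another process sees
    # either the old file or the new one, never a missing dest.
    os.rename(tmp, dest)
```

On a local filesystem, a process that already has the old `dest` open keeps reading the old inode after the rename; whether an NFS client behaves the same way is exactly what this bug is debating.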

Comment 2 Myk Taylor 2012-06-21 02:09:21 UTC

The behavior the nfs faq is promoting is subtly different from what movefile does.  From what I can see, movefile basically performs the following actions:

1) copy file to temporary file in dest dir 
2) move temp file to

Comment 3 Myk Taylor 2012-06-21 02:13:28 UTC

The behavior the nfs faq is promoting is subtly different from what movefile does.  From what I can see, movefile basically performs the following actions:

1) copy file to temporary file in dest dir
2) rename temp file to dest filename

The FAQ is promoting:

1) move existing file to temp name (in same directory, presumably)
2) copy in updated file
3) remove temp file
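The three FAQ steps listed above can be sketched as follows. The helper name `replace_faq_style` and the `#old` suffix are assumptions, not taken from the FAQ or from Portage; the sketch also assumes `dest` already exists, since the first rename fails otherwise. Note that `dest` does not exist between steps 1 and 2.

```python
import os
import shutil


def replace_faq_style(src: str, dest: str) -> None:
    """Rename the in-use file aside, copy the new one in, drop the old name.

    Hypothetical sketch of the sequence the NFS FAQ recommends.
    """
    old = dest + "#old"
    # 1) Move the existing file to a temporary name in the same directory.
    #    Processes that already have it open keep using the same inode.
    os.rename(dest, old)
    # 2) Copy the updated file in as a brand-new file (a new inode).
    shutil.copy2(src, dest)
    # 3) Remove the temporary name. If this NFS client still has the file
    #    open, the client "silly renames" it to a .nfsXXXX name instead of
    #    deleting it on the server.
    os.remove(old)
```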

Combining the two behaviors (both of which are desirable) might look like this:

1) cp src to dest#new
2) mv dest to dest#old
3) mv dest#new dest
4) rm dest#old

Does that make sense?
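The four steps proposed in comment 3 map onto file operations roughly like this. The helper name `replace_combined` is hypothetical, and the sketch assumes `dest` already exists. The comment marks the gap in which nothing exists at `dest`.

```python
import os
import shutil


def replace_combined(src: str, dest: str) -> None:
    """Four-step replacement from comment 3 (hypothetical sketch)."""
    new, old = dest + "#new", dest + "#old"
    shutil.copy2(src, new)  # 1) cp src to dest#new
    os.rename(dest, old)    # 2) mv dest to dest#old
    # Between steps 2 and 3 nothing exists at `dest`, so any process
    # trying to open or exec it in that moment fails.
    os.rename(new, dest)    # 3) mv dest#new dest
    os.remove(old)          # 4) rm dest#old
```

Doing the copy first keeps the gap as short as a single rename, but unlike plain copy-then-rename it is not atomic.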

Comment 4 Zac Medico gentoo-dev 2012-06-21 02:27:25 UTC

(In reply to comment #3)

> Combining the two behaviors (both of which are desirable), might look like
> this:
> 
> 1) cp src to dest#new
> 2) mv dest to dest#old
> 3) mv dest#new dest
> 4) rm dest#old
> 
> Does that make sense?

Well, it's less desirable than the existing behavior, because there's a small period of time between step 2 and step 3 where the destination file is temporarily inaccessible. Imagine if it's something important, like /bin/bash.

Comment 5 Myk Taylor 2012-06-21 02:38:24 UTC

That is true -- I'd like the replacement to be atomic as well, but, at least for NFS, I'd rather have a well-functioning system. Could the suggested behavior be turned on only for remote filesystems, perhaps?
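Restricting the behavior to remote filesystems requires finding the filesystem type of each target path. On Linux, one way is to read /proc/mounts (fields: device, mount point, type, options, dump, pass) and pick the longest mount point containing the path. A minimal sketch, assuming Linux; the helper names `fs_type_for_path` and `is_nfs` are hypothetical, symlinks are not resolved, and escaped characters in mount points (such as spaces) are not decoded.

```python
import os


def fs_type_for_path(path: str, mounts_text: str) -> str:
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` uses the /proc/mounts line format:
    "device mountpoint fstype options dump pass".
    """
    path = os.path.abspath(path)
    best_mount, best_type = "", "unknown"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount, fstype = fields[1], fields[2]
        # A mount contains `path` if it is "/" or a whole-component prefix.
        inside = (
            mount == "/"
            or path == mount
            or path.startswith(mount.rstrip("/") + "/")
        )
        # Longest mount point wins; on ties the later line wins, since a
        # later mount over the same point hides the earlier one.
        if inside and len(mount) >= len(best_mount):
            best_mount, best_type = mount, fstype
    return best_type


def is_nfs(path: str) -> bool:
    """True if `path` lives on an NFS mount (Linux only)."""
    with open("/proc/mounts") as f:
        return fs_type_for_path(path, f.read()) in ("nfs", "nfs4")
```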

Comment 6 Myk Taylor 2012-06-21 02:43:43 UTC

Or maybe just for NFS -- I'm unsure how other remote filesystems handle deleted files. NFS relies on that '.nfs03243049343434' "silly rename" business (which our current file-overwriting implementation defeats).
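The "silly rename" exists to emulate a POSIX guarantee: a file that is unlinked while open stays readable through the open descriptor until it is closed. The sketch below shows that guarantee on a local filesystem; the helper name `read_after_unlink` is hypothetical. An NFS client can only preserve it for unlinks it issues itself, by renaming the file to a hidden .nfsXXXX name on the server rather than removing it.

```python
import os
import tempfile


def read_after_unlink() -> tuple:
    """Unlink an open file's name, then read it through the open handle.

    Returns (name_is_gone, data_read_after_unlink).
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"old contents")
    finally:
        os.close(fd)
    with open(path, "rb") as f:
        os.unlink(path)                      # the directory entry is gone...
        name_is_gone = not os.path.exists(path)
        data = f.read()                      # ...but the inode is still readable
    return name_is_gone, data
```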

Comment 7 Zac Medico gentoo-dev 2012-06-21 02:50:24 UTC

We can add a FEATURES=nfs-nonatomic-merge setting.

I'd be surprised if it helps, though, since I would expect the existing movefile behavior to work fine.
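FEATURES=nfs-nonatomic-merge is only proposed in this comment; whether it was ever implemented is not stated here. A minimal sketch of how such a flag might gate the non-atomic path, assuming FEATURES is a whitespace-separated list in which a leading '-' disables a token. The helper name `use_nonatomic_merge` is hypothetical, and Portage's '-*' reset is not handled.

```python
def use_nonatomic_merge(features: str, fstype: str) -> bool:
    """Decide whether to use the rename-aside merge for one file.

    `features` is a FEATURES-style string; 'nfs-nonatomic-merge' is the
    setting proposed in this bug, not a confirmed Portage feature.
    """
    enabled = set()
    for token in features.split():
        if token.startswith("-"):
            enabled.discard(token[1:])  # later '-foo' cancels earlier 'foo'
        else:
            enabled.add(token)
    # Only NFS targets take the non-atomic path; everything else keeps
    # the atomic copy-then-rename behavior.
    return "nfs-nonatomic-merge" in enabled and fstype in ("nfs", "nfs4")
```

Gating on both the flag and the filesystem type keeps the atomic behavior as the default everywhere else, which addresses the /bin/bash concern from comment 4.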

Comment 8 Myk Taylor 2012-06-21 02:59:59 UTC

Adding a feature would probably be best, since making the update non-atomic does have user-visible effects, as you rightly point out. I'm pretty sure the current implementation is not sufficient, though, since it does not allow NFS to create the "silly renamed" version of opened files, as detailed at http://nfs.sourceforge.net/#faq_d2