Bug 49971 - fam-2.7.0-r1 makes Courier-imap-3.0.2/Qmail server w/Enhanced IDLE deliver local messages multiple times.
Status: RESOLVED TEST-REQUEST
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Server
Hardware: x86 Linux
Importance: High normal
Assignee: Net-Mail Packages
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-05-04 07:34 UTC by Ryan Hadley
Modified: 2005-03-10 01:20 UTC
CC List: 0 users

See Also:
Package list:
Runtime testing required: ---


Description Ryan Hadley 2004-05-04 07:34:20 UTC
With fam-2.7.0-r1 my Courier-IMAP/Qmail server started doing weird stuff.

I have my IMAP server set to do Enhanced IDLE. Courier was compiled with fam support. If I connect to the server with an IDLE-enabled client (the new Thunderbird 0.6), messages addressed to me get stuck in the queue with this message:

deferral: Temporary_error_on_maildir_delivery._(#4.3.0)/

The funny thing is, it successfully delivered the message! So every time the queue is run, I get the message again.

I upgraded to fam-2.7.0-r1 to try out the dnotify patch.  fam-2.7.0 gives me no problems.  Could it be the dnotify patch that is causing this?

Kernel version: 2.6.5-gentoo-r1

Reproducible: Always
Steps to Reproduce:
Comment 1 foser (RETIRED) gentoo-dev 2004-05-04 11:53:12 UTC
Seems unlikely: it just notifies of changes and nothing else; with the patch this is only faster. Anyway, you could test it easily by upgrading and downgrading fam and checking the results.
Comment 2 Ryan Hadley 2004-05-04 11:59:51 UTC
Unlikely, but true. I have famd installed both with and without the dnotify patch. When I start famd with the dnotify patch, things break. When I start famd without it, things work.

I traced the problem down to this code in qmail-local.c (in the maildir_child function): 

 if ((fd = open(fnnewtph, O_RDONLY)) < 0 ||
     fsync(fd) < 0 || close(fd) < 0) goto fail;

For some reason, with the dnotify-patched famd and a client logged in using Enhanced IDLE on Courier-IMAP, qmail's call to open here returns less than 0.
Comment 3 Ryan Hadley 2004-05-04 12:08:23 UTC
Oh yeah, the maildir is on a local mount, not on an NFS mount.
Comment 4 Ryan Hadley 2004-05-04 12:41:53 UTC
One more update.

The open command is setting errno 9:
#define EBADF            9      /* Bad file number */
Comment 5 Ryan Hadley 2004-05-04 13:14:01 UTC
Ha, I need more sleep.  My newborn is wearing me out.

Of course, it was the close(fd) that set errno 9.

The open, however, sets errno 2:
#define ENOENT           2      /* No such file or directory */
Comment 6 Ryan Hadley 2004-05-04 13:25:42 UTC
rofl.

I figured out what's breaking.

Courier w/famd + dnotify is too good at what it's designed to do.  My client gets the notice of the file and moves it from:

new/1083701882.12737.....

to:

cur/1083701882.12737.....

All of this happens before Qmail gets a chance to make sure it can open, read-only, the file it just wrote.

Why does Qmail insist on checking that the file exists after it writes it? Can't it just trust the return status of the write? It seems that even without Enhanced IDLE and famd with dnotify, there'd be a very small chance that this could happen.
Comment 7 foser (RETIRED) gentoo-dev 2004-05-04 13:49:41 UTC
nice bit of investigation there. I'd say it's a bit of a needless check.

reassigning to the courier-imap maintainers
Comment 8 Ryan Hadley 2004-05-04 13:58:54 UTC
Shouldn't it be assigned to qmail?

The lines of code in question are added via qmail-link-sync.patch. In fact, that's just about all qmail-link-sync.patch adds to qmail-local.c... so someone must think the check is needed.
Comment 9 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2004-05-04 15:05:14 UTC
the read after write is to check that the file made it to the destination properly, and to guard against a write that reports success when it really didn't succeed. this can and does happen. i'm not certain of a solution that can both protect qmail's integrity and allow dnotify+famd to work properly, short of having qmail lock the file while it's writing, then unlock it once it has read the contents back.

btw i do qmail AND courier-imap, so either way, i'm the right guy for this.
Comment 10 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2004-05-04 15:06:02 UTC
mark as assigned to net-mail so i don't keep getting the notifications twice.
Comment 11 Ryan Hadley 2004-05-04 19:25:07 UTC
I'm not too good with C, but I read up a bit on what the fsync in qmail-link-sync.patch does:

"fsync copies all in-core parts of a file to disk, and waits until the device reports that all parts are on stable storage."

So that check doesn't seem to be there to make sure that the link succeeded, but to make qmail-local wait until all data for that link is out of core and on disk before continuing. As the name of the patch suggests, it makes qmail-local wait until all data for that specific file is synced (kind of like running the "sync" command from the command prompt).

If this is the case, would it be safe to assume that if the open fails with "No such file or directory", the link must have finished and been fully synced? Otherwise, other programs wouldn't have had a chance to move it? Or is that not a true statement?

i.e.:
 if ((fd = open(fnnewtph, O_RDONLY)) != -1)
 {
     if (fsync(fd) < 0 || close(fd) < 0) goto fail;
 }
 /* We don't care if open failed: the link beforehand succeeded, which is good for us.
    Just make sure that, if the file is still there, all data is synced to disk before continuing. */

Would fam w/dnotify even send out a notice to Courier before all data is out of core and synced on disk?

Again, I am not a C programmer. Sorry if my suggestions are way off base.
Comment 12 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2004-05-04 19:44:13 UTC
i'm not certain on that one either.
i'd suggest going and talking to linux-kernel or the dnotify author about the problem. point them to this bug and the logic of qmail.
Comment 13 Ryan Hadley 2004-05-07 06:41:01 UTC
New line of thinking.

The purpose of the patch, which adds this code:

+ if ((fd = open(fnnewtph, O_RDONLY)) < 0 ||
+     fsync(fd) < 0 || close(fd) < 0) goto fail;
+

Is to make sure that the data is out of core and synced to disk, for reliability reasons: for example, in case of a power outage.

But Courier is moving the file out of the new/ directory and into the cur/ directory before qmail has a chance to sync the data. So it's Courier's problem now, isn't it? Courier must have some method of moving files from new/ to cur/ and probably handles syncing the file itself.

The point is, even without the famd+dnotify patch, we eventually reach this same point: Courier takes control of the file. Courier doesn't care whether the data in new/ is synced to disk or still in core. It moves it to the cur/ directory and does its own thing with the data.

So the file has been successfully linked into the new/ directory by Qmail:

  if (link(fntmptph,fnnewtph) == -1) goto fail;

So we know the data is there. Maybe it's not out of core and onto disk yet, but it's there enough for other programs to use. So if something else has already taken over responsibility for the data, why should Qmail return an error?

So I say, make the patch add these 3 lines instead:

+ if ((fd = open(fnnewtph, O_RDONLY)) != -1) {
+     if (fsync(fd) < 0 || close(fd) < 0) goto fail;
+ }

I am using this on my server and it is working wonderfully.
Comment 14 Michael Hanselmann (hansmi) (RETIRED) gentoo-dev 2005-01-07 14:15:59 UTC
I've added the patch to qmail-1.03-r16. Could you test it, please?
Comment 15 Steven Brudenell 2005-03-10 01:20:43 UTC
I have a similar setup (courier-imap and qmail) and encountered a similar problem (my bounce message was 'User is over quota', but the behavior was otherwise identical). I verified that fam support in courier-imap was a trigger for the problem.

I have tested qmail-1.03-r16 in my setup, and it solves this issue. I can now use courier-imap with fam support.