From: Christopher Barry
To: "Linux-Cluster@Redhat.Com"
Date: Tue, 13 Nov 2007 17:53:20 -0500
In-Reply-To: <1194984842.5187.28.camel@localhost>
References: <1194984339.5187.24.camel@localhost> <1194984842.5187.28.camel@localhost>
Message-Id: <1194994401.5187.47.camel@localhost>
Subject: [Linux-cluster] Re: nanny segfault problem

On Tue, 2007-11-13 at 15:14 -0500, Christopher Barry wrote:
> script got scraped by my gateway - attached here as a textfile
>
>
> On Tue, 2007-11-13 at 15:05 -0500, Christopher Barry wrote:
> > Greetings All,
> >
> > running RHEL4U5
> >
> > I have a bunch of services on my cluster w/ access via redundant
> > directors.
> >
> > I've created a generic service checking script, which I'm specifying in
> > lvs.cf's 'send_program' config parameter.
> >
> > script is attached to this post. see that for how it works with the
> > symlinks described below.
> >
> > I create symlinks to the script for every service I want to check, with
> > their name containing the port to hit, as in:
> > /sbin/lvs-.sh
> >
> > so the symlink name to check ssh availability, for instance, is:
> > /sbin/lvs-22.sh
> >
> > The script works fine, and returns the first contiguous block of
> > [[:alnum:]] text data from the connection attempt for use with the
> > expect line of lvs.cf.
> >
> >
> > The problem is, when nanny is spawned by pulse, all of the nanny
> > processes segfault.
> >
> > > Nov 13 14:40:44 kop-sds-dir-01 lvs[17740]: create_monitor for ssh_access/kop-sds-01 running as pid 17749
> > > Nov 13 14:40:44 kop-sds-dir-01 nanny[17749]: making 10.32.12.11:22 available
> > > Nov 13 14:40:44 kop-sds-dir-01 kernel: nanny[17749]: segfault at 000000000000006c rip 000000335e570810 rsp 0000007fbfffe978 error 4
> >
> > this occurs almost instantly for every nanny process.
> >
> > Can anyone venture a guess as to what is happening?
> >
> > see my lvs.cf here:
> > http://nanny-error.pastebin.com/m592f7911
> >
> >

All,

More interesting developments:

If I start pulse with:
# pulse -v --nodaemon
everything (kinda) works.

# pulse -v
does not work at all, however.

Something differs between daemon mode and foreground mode, beyond the
backgrounding itself. I suspected a permissions issue, but I had already
changed the mode of my script to 4755.

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
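
For readers without the attachment, the symlink-per-port check described in the quoted message could look roughly like the sketch below. This is not the attached script: the function names, the 5-second read timeout, the assumption that the host address arrives as the first argument, and the assumption that the service sends a banner line first (as sshd does) are all illustrative guesses.

```shell
#!/bin/bash
# Hypothetical sketch only; this is NOT the script attached to the original post.

# Derive the port from the invocation name, e.g. /sbin/lvs-22.sh -> 22.
port_from_name() {
    local name
    name=$(basename "$1" .sh)      # strip directory and .sh suffix: lvs-22
    printf '%s\n' "${name#lvs-}"   # strip the lvs- prefix: 22
}

# Print the first contiguous run of alphanumeric characters read from stdin.
first_alnum_block() {
    grep -o -m 1 '[[:alnum:]]\{1,\}' | head -n 1
}

# Run the check only when invoked through an lvs-*.sh name with a host argument
# (assumption: the host address is passed as the first argument).
case "$(basename "$0")" in
    lvs-*.sh)
        if [ -n "${1:-}" ]; then
            port=$(port_from_name "$0")
            # /dev/tcp/HOST/PORT is a bash redirection feature, not a real file.
            exec 3<>"/dev/tcp/$1/$port" || exit 1
            # Assumption: the service sends a banner line first, as sshd does.
            read -r -t 5 line <&3 || exit 1
            printf '%s\n' "$line" | first_alnum_block
        fi
        ;;
esac
```

Because the port comes from the name the script was invoked under, one file can serve every service: a symlink named lvs-22.sh checks port 22, and the string it prints is what the expect line in lvs.cf would be compared against.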