After starting a job I see that directory permissions are wrong. Jobs are started on cluster nodes by pbs_mom, which runs as root; the job processes run under the id of the user who submitted the job. I don't know why user privileges are used to access the log directories ...

$ tracejob 430
/var/spool/PBS//server_priv/accounting/20070425: Permission denied
/var/spool/PBS//mom_logs/20070425: No such file or directory

Job: 430.foo.bar

04/25/2007 21:56:42  S  Job Queued at request of me@foo.bar, owner = me@foor.bar, job name = STDIN, queue = small
04/25/2007 21:56:42  S  Job Modified at request of Scheduler@foo.bar
04/25/2007 21:56:42  L  Job Run
04/25/2007 21:56:42  S  Job Run at request of Scheduler@foo.bar

Making the directories world-readable/writable makes the messages disappear, but still no /var/spool/PBS//server_priv/accounting/20070425 is created. So I need some advice.
Heh, I am puzzled. The directories exist on the server running pbs_mom and do contain the log. But I do not understand why running tracejob on a cluster node/client makes it look for the directory on the local system. :(
This is a non-bug then?
Yeah, this is a non-bug. As the help output for tracejob shows, it needs to know the path to PBS_SERVER_HOME, which implies it must be run on the server for it to work properly.
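For example (just a sketch; I'm assuming a Torque tracejob build that accepts -p <path to PBS_SERVER_HOME> and -n <days back>), on the server host you would run something like:

$ tracejob -n 1 -p /var/spool/PBS 430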
Closing as invalid. If this is wrong, please re-open.
I think the issue for me was/is that I run:

/usr/sbin/pbs_server -d /var/spool/torque -L /var/log/pbs_server.log

so /var/spool/torque/server_logs/ contains the logs split day by day. The '-L /var/log/pbs_server.log' comes from /etc/conf.d/torque.
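So on my setup (a sketch, again assuming tracejob accepts the -p option) the equivalent would be to point it at the server home passed via -d rather than the default /var/spool/PBS:

$ tracejob -p /var/spool/torque 430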