Torque/Maui jobs failing suddenly and without output? Check disk space
07 Jun 2012
Quick note: I just tracked down a problem with our Rocks cluster
(which uses Torque and Maui) where submitted jobs suddenly started
failing nearly every time, instantly and without any output -- even a
simple echo hello world failed with zero output. Turned out one of
the nodes had filled up / (which, in a default Rocks install, includes
/opt/torque and /tmp) completely with the output from a month-old job
that ran amok. This node happened to be allocated to most (but not
all...) jobs, and so caused a lot of disruption.
I don't know how best to monitor this...
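
One possible starting point, offered as a rough sketch rather than a tested monitoring setup: a small Python script that each node could run periodically (from cron, say) to flag a nearly full / or /tmp. The function names, the 90% threshold, and the list of paths are all assumptions made for illustration; nothing here comes from Rocks, Torque, or Maui.

```python
import shutil
import socket

def percent_used(path):
    """Return how full the filesystem holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total

def over_threshold(paths, threshold=90.0):
    """Return (path, percent) pairs for filesystems at or above `threshold`."""
    results = []
    for path in paths:
        pct = percent_used(path)
        if pct >= threshold:
            results.append((path, pct))
    return results

# Hypothetical defaults: the paths mentioned above and an arbitrary 90% limit.
for path, pct in over_threshold(["/", "/tmp"], threshold=90.0):
    print(f"{socket.gethostname()}: {path} is {pct:.0f}% full")
```

The percentage is computed from raw used and total bytes, so it may differ slightly from what df reports on filesystems that reserve blocks for root. How the warning reaches an admin (mail from cron, a log file, something else) is left open.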