5th January 2020

slurmctld showing nodes as 'drain'

Recently one node was changed from an octa-core to a hexa-core CPU. Since then, sinfo has shown this node in STATE drain. Stopping and restarting slurmctld did not resolve the issue. The log file /var/log/slurm-llnl/slurmctld.log showed

[2019-12-31T23:57:52.588] error: Node X appears to have a different slurm.conf than the slurmctld.  This could cause issues with communication and functionality.  Please review both files and make sure they are the same.  If this is expected ignore, and set DebugFlags=NO_CONF_HASH in your slurm.conf.

This hinted that cached state was the culprit. The slurm.conf entry

StateSaveLocation=/var/lib/slurm-llnl/slurmctld

shows where slurmctld stores its state. Removing the files in that directory and restarting slurmctld solved the problem.
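
The cleanup can be sketched as a small shell helper. This is only a sketch under assumptions: the function name move_state_aside, the backup naming scheme, and the systemd service name slurmctld are not from the original setup. It moves the files aside instead of deleting them, because the same directory also holds saved job state.

```shell
#!/bin/sh
# Sketch only: the default path is the StateSaveLocation from slurm.conf above;
# STATE_DIR and move_state_aside are hypothetical names for illustration.
STATE_DIR="${STATE_DIR:-/var/lib/slurm-llnl/slurmctld}"

# Move every entry of the given directory into a timestamped backup
# directory next to it, and print the backup path.
move_state_aside() {
    dir="${1%/}"                                  # drop a trailing slash, if any
    backup="${dir}.bak.$(date +%Y%m%d%H%M%S)"
    mkdir -p "$backup" || return 1
    # -mindepth/-maxdepth are supported by GNU and BSD find
    find "$dir" -mindepth 1 -maxdepth 1 -exec mv {} "$backup"/ \;
    echo "$backup"
}

# Intended use, as root, with the controller stopped
# (service name is an assumption):
#   systemctl stop slurmctld
#   move_state_aside "$STATE_DIR"
#   systemctl start slurmctld
```

If the node comes back in a sane state, the backup directory can be deleted afterwards.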




Categories: Linux
Tags:
Author: Elmar Klausmeier