Process Watcher

My VizReader server has crashed three times in roughly as many weeks now and I have no idea why. Nothing unusual shows up in the logs.


It could be a hardware problem or a software problem.

If it’s a software problem, there should be a way to manage the situation. What if we could start a control process that kills any other process that gets too greedy or out of control, and notifies us when it does so?

It doesn’t take much googling to arrive at ps-watcher if you’re a Linux/Ubuntu user. Another tutorial here.

This is how I start it so that it checks every minute:

ps-watcher --config /opt/picolisp/ps-watcher.cfg --sleep 60

And this is my ps-watcher.cfg:

[picolisp]
	trigger = elapsed2secs('$etime') > 10*MINS && $pcpu > 90
	occurs  = every
	action  = <<EOT
	 echo "$command (pid $pid) used $pcpu% CPU, elapsed running time $etime" | /bin/mail root
	 kill -TERM $pid
EOT

[.?]
	trigger = elapsed2secs('$etime') > 10*MINS && $pmem > 25
	occurs  = every
	action  = <<EOT
	 echo "$command (pid $pid) used $pmem% of memory, elapsed running time $etime" | /bin/mail root
	 kill -TERM $pid
EOT

[picolisp]
	trigger = $count > 80
	occurs  = every
	action  = <<EOT
	 echo "Too many picolisp instances: $count" | /bin/mail root
	 /opt/picolisp/restart.sh
EOT

The first section will kill any PicoLisp process that has been running for more than 10 minutes and has averaged over 90% CPU during that time (ps reports %CPU as CPU time divided by elapsed time, so it is a lifetime average rather than a momentary spike). The reason for that check is that my PL processes are mostly concerned with reading from and writing to disk, and that shouldn’t eat much processing power. If it does, something is probably wrong.

The second section will kill any process, PicoLisp or not, that has been running for more than 10 minutes and is using more than 25% of RAM (250MB in my case).
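The first two sections share the same shape: an age check plus a threshold. Here is a rough Python sketch of that logic, just to make the trigger easier to reason about. It is not ps-watcher's own code: `elapsed_to_secs` and `should_kill` are made-up names standing in for `elapsed2secs` and the trigger expression, and the sketch assumes `$etime` arrives in ps's `[[dd-]hh:]mm:ss` format.

```python
import re

MINS = 60  # assumption: mirrors the MINS constant used in the config above


def elapsed_to_secs(etime: str) -> int:
    """Convert an elapsed time in ps's [[dd-]hh:]mm:ss format to seconds."""
    match = re.fullmatch(r"(?:(?:(\d+)-)?(\d+):)?(\d+):(\d+)", etime.strip())
    if match is None:
        raise ValueError(f"unrecognised etime: {etime!r}")
    # Missing day/hour fields come back as None and count as zero.
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def should_kill(etime: str, value: float, limit: float,
                min_age: int = 10 * MINS) -> bool:
    """True when the process is older than min_age seconds AND value > limit."""
    return elapsed_to_secs(etime) > min_age and value > limit


if __name__ == "__main__":
    print(should_kill("15:00", 95.0, 90.0))       # old enough, 95% CPU
    print(should_kill("09:59", 99.0, 90.0))       # too young, left alone
    print(should_kill("1-02:00:00", 30.0, 25.0))  # a day old, 30% of memory
```

The age check matters because a freshly started process can legitimately sit near 100% CPU for a short while; only processes that are both old and over the limit get killed.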

And finally we restart the PL server if more than 80 PL processes are running at the same time. This is the most probable culprit, as we could end up with a hell of a lot of them waiting for some other PL process to finish writing to disk.
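The count check can be sketched the same way. This assumes, as I understand the ps-watcher documentation, that `$count` is the number of running processes whose command matches the section name treated as a regular expression. The function names and the sample command list are hypothetical.

```python
import re


def count_matching(commands, pattern):
    """Count the command names that match the section's regular expression."""
    regex = re.compile(pattern)
    return sum(1 for command in commands if regex.search(command))


def needs_restart(commands, pattern="picolisp", limit=80):
    """True when more than `limit` matching processes are running."""
    return count_matching(commands, pattern) > limit


if __name__ == "__main__":
    # Hypothetical snapshot: 81 PicoLisp workers plus two unrelated processes.
    snapshot = ["picolisp"] * 81 + ["sshd", "cron"]
    print(count_matching(snapshot, "picolisp"), needs_restart(snapshot))
```

Note that the comparison is strict, so exactly 80 processes does not trigger a restart; the 81st does.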

