My VizReader server has crashed 3 times in roughly as many weeks now and I have no idea why; nothing unusual shows up in the logs.
It could be a hardware problem, or a software problem.
If it’s a software problem, though, there should be a way to manage the situation. What if we could start a control process that kills any other process that gets too greedy or out of control, and at the same time notifies us of what it’s doing?
This is how I start ps-watcher so that it checks every minute:
ps-watcher --config /opt/picolisp/ps-watcher.cfg --sleep 60
And this is my ps-watcher.cfg:
[picolisp]
  trigger = elapsed2secs('$etime') > 10*MINS && $pcpu > 90
  occurs = every
  action = <<EOT
  echo "$command used $pcpu% CPU, elapsed time $etime" | /bin/mail root
  kill -TERM $pid
EOT

[.?]
  trigger = elapsed2secs('$etime') > 10*MINS && $pmem > 25
  occurs = every
  action = <<EOT
  echo "$command used $pmem% of memory, elapsed time $etime" | /bin/mail root
  kill -TERM $pid
EOT

[picolisp]
  trigger = $count > 80
  occurs = every
  action = <<EOT
  echo "Too many picolisp instances: $count" | /bin/mail root
  /opt/picolisp/restart.sh
EOT
The first section kills any PicoLisp process that has been running for more than 10 minutes and has been using over 90% of the CPU throughout that time. The reason for that check is that my PL processes mostly read from and write to disc, which shouldn’t eat much processing power. If it happens, something is probably wrong.
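To make the first trigger concrete, here is a rough Python sketch of the check it expresses. It is only an illustration, not ps-watcher's own code: the names elapsed_to_secs and cpu_trigger are made up for this example, and it assumes the elapsed time arrives in the usual ps format of [[dd-]hh:]mm:ss.

```python
import re

MINS = 60  # seconds per minute, mirroring the MINS constant used in the config

# ps prints elapsed time as [[dd-]hh:]mm:ss
_ETIME = re.compile(r"^(?:(?:(\d+)-)?(\d+):)?(\d+):(\d+)$")


def elapsed_to_secs(etime: str) -> int:
    """Convert a ps-style elapsed time string to a number of seconds."""
    m = _ETIME.match(etime.strip())
    if m is None:
        raise ValueError(f"unrecognised etime: {etime!r}")
    # Missing days/hours groups come back as None and count as zero.
    days, hours, minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def cpu_trigger(etime: str, pcpu: float) -> bool:
    """Hypothetical equivalent of: elapsed2secs('$etime') > 10*MINS && $pcpu > 90"""
    return elapsed_to_secs(etime) > 10 * MINS and pcpu > 90
```

With this sketch, a process reported as running for 12:34 at 95% CPU would fire the trigger, while one at 09:59 would not, however busy it is.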
The second section kills any process, PicoLisp or not, that has been running for more than 10 minutes and is using more than 25% of RAM (250MB in my case).
And finally, we restart the PL server if more than 80 PL processes are running at the same time. This is the most probable culprit, as we could get a hell of a lot of them waiting for some other PL process to finish writing to disc.
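The counting check can be sketched the same way. This is a minimal illustration under assumptions, not how ps-watcher is implemented: count_matching, too_many and running_commands are hypothetical helpers, and the sketch assumes that a section name is treated as a regular expression matched against command names.

```python
import re
import subprocess


def running_commands():
    """Return the command name of every running process, using ps -eo comm=."""
    out = subprocess.run(
        ["ps", "-eo", "comm="], capture_output=True, text=True, check=True
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def count_matching(commands, pattern):
    """Count the command names that match the given regular expression."""
    rx = re.compile(pattern)
    return sum(1 for name in commands if rx.search(name))


def too_many(commands, pattern="picolisp", limit=80):
    """Hypothetical equivalent of the [picolisp] section's: $count > 80"""
    return count_matching(commands, pattern) > limit
```

Calling too_many(running_commands()) on the server would then answer the same question as the last section of the config, without mailing anyone or restarting anything.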