Creating Reliable Unattended Scripts
Most software developers are good at creating applications that behave well when failures occur. They know how to handle unexpected exceptions, provide meaningful error messages, and make the program fail gracefully so that there’s minimal data loss.
When it comes to creating scripts that run unattended, it’s suddenly more difficult to come up with a solid process. All too often I’ve seen cases where an overnight backup would fail for days or weeks in a row with no one noticing. Other times a script would lock up for no apparent reason. Troubleshooting unattended processes is inherently difficult because there are never any witnesses.
No one wants to be the person who must explain why you don’t have current backups because your process failed, and you didn’t know it (Trust me on that!). So here are some tips and tricks that will help you sleep comfortably while all your unattended scripts are doing their thing.
Standard Rules Still Apply
In unattended processes, you still need to handle all exceptions, expected and unexpected, and provide appropriate error messages. This usually means using try-catch constructs. The difference is that error messages can’t just be displayed on the screen. Execution information and warnings need to be written to a log file. Fatal errors should be sent to the log and to email.
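A minimal sketch of that structure: a top-level try-catch that logs everything and escalates fatal errors by email. The printlog() helper matches the one described later in this article; notifyOperator() is a hypothetical stand-in for whatever email mechanism you use.

```php
<?php
// Sketch of an unattended script's top-level error handling.
// printlog() and notifyOperator() are illustrative helpers,
// not part of any library.
function printlog($msg) {
    print(date('Y-m-d H:i:s: ') . $msg . "\r\n");
}

function notifyOperator($subject, $body) {
    // In a real script this would call mail() or an SMTP library;
    // here it just logs, so the sketch runs anywhere.
    printlog("EMAIL [$subject]: $body");
}

try {
    printlog('Starting purchase order export');
    // ... the actual work goes here ...
    printlog('Export completed without error');
} catch (Exception $e) {
    // Fatal errors go to the log AND to email.
    printlog('FATAL: ' . $e->getMessage());
    notifyOperator('PO export failed', $e->getMessage());
    exit(1);
}
```

The key point is that nothing is ever only displayed: every message lands in the log, and fatal ones also go out by email.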
Your Email-Only App
You can think of email as the GUI for your unattended app. If there’s something the operator needs to know, it needs to be sent in an email. Otherwise, you risk not getting their attention.
Often with an interactive app, the only way you know it worked is because it finished with no error messages. This isn’t good enough for an unattended app. If no error email arrives it may mean the process still failed but the email server is down. Or maybe the network is disconnected. For these reasons, it’s important to send a success email when the process runs to completion. When the email doesn’t arrive, you know you’ve got a problem.
Even then you may not be covered. It’s possible the process finished without error, but the results were not quite what you expected. A good success email will describe exactly what was accomplished:
The purchase order and receipt export completed without error. 9 purchase orders were sent for date 2018-10-22 in file PO_Export.txt.
This is a real email notification I received recently, and I noticed the date was incorrect. This alerted me to the fact that the server was using UTC time instead of Eastern Daylight time. Without the email I’m not sure when I would have noticed the problem.
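A success notification like that one might be assembled along these lines. This is a sketch with a placeholder recipient address, using PHP’s built-in mail(); the counts, date, and file name would come from the actual export run.

```php
<?php
// Build and send a detailed success summary. The values here are
// hard-coded for illustration; the recipient is a placeholder.
$count = 9;
$date  = '2018-10-22';
$file  = 'PO_Export.txt';

$body = "The purchase order and receipt export completed without error.\r\n"
      . "$count purchase orders were sent for date $date in file $file.";

// mail() returns false (with a warning) if no mail transport is
// configured on the machine; a production script should log that.
$sent = mail('ops@example.com', 'PO export succeeded', $body);
```

Because the body states exactly what was done, a wrong count or wrong date in the message is itself a useful alarm.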
With logging, it’s also important to record successes as well as the failures. If your process ftp’s a file, then it’s worthwhile to log an entry saying “file such-n-such successfully ftp’d”. That way, you may or may not capture an ftp error in the log, but if the success message isn’t there you at least know that the process failed prior to completing the ftp.
Good logging should also anticipate problems. For example, if your process writes a file to a network drive, you’ll want to record “writing file <drive>\folder\file” in the log prior to executing the command. That way, if the file never appears, you’ll at least know your script reached the point where it tried to create the file. You’ll also know exactly what the file should have been called and where it should have appeared – all useful information when troubleshooting.
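A sketch of that log-before-and-after pattern. The printlog() helper is repeated here so the example stands alone; the target path is a temp file in this sketch but would be the network share in production.

```php
<?php
// Log before attempting a file write, and again after, so that a
// missing file can be traced to exactly where the process stopped.
function printlog($msg) {
    print(date('Y-m-d H:i:s: ') . $msg . "\r\n");
}

$data   = "...export contents...";
// A network share in production; a temp path here so the sketch runs.
$target = sys_get_temp_dir() . '/PO_Export.txt';

printlog("writing file $target");
if (file_put_contents($target, $data) === false) {
    printlog("ERROR: failed to write $target");
} else {
    printlog("file $target written successfully");
}
```

If the file never appears, the "writing file" entry proves the script got that far and records exactly what the file should have been called.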
Every log entry should start with a timestamp. It’s simple, but it can help illuminate difficult problems. Timestamps will tell you when your processes run longer than you expect and can highlight when one process is interfering with another. If you see your process slowing to a crawl every night at 7 pm you may want to check when the nightly virus scan runs. Or the backup. Or maybe that’s when everyone in IT stops to play Halo.
When I set up logging, I usually create a one-line function called printlog that wraps a simple print command and allows me to send it a message. The function adds a timestamp:
function printlog($msg) {
    print(date('Y-m-d H:i:s: ') . $msg . "\r\n");
}
By using the print command I’m basically telling it to send all my messages to the STDOUT stream, which goes to the monitor by default. Then, when I set up the batch file that launches the script I tell it to redirect STDOUT to a file – that becomes the log:
php POExport.php >> log.txt
The advantage of this is that my log will capture everything that goes to STDOUT. Overnight processes usually launch other processes (like encryption or ftp), and those processes often write messages to the screen. Capturing STDOUT will also capture these messages, but that’s still not the whole story.
Most programs don’t send their error messages to STDOUT, they send them to STDERR. By default, STDERR also goes to your screen, so it looks as if everything you want to capture is going to the monitor and that redirecting the output will send it all to your log file. Not the case. You need to redirect STDERR (stream 2) to STDOUT (stream 1) to get everything. Like this:
php POExport.php >> log.txt 2>&1
The method of error tracking I’ve presented here is rudimentary. There are logging libraries available that provide built-in message classifications (like INFO and WARNING), manage alerts, or integrate with log management systems. But it’s the principles that are important. Anticipate what you’ll need for troubleshooting, then capture it all in a log. Use email to call in the cavalry when things go wrong, and also to let them know when everything is fine.
Then go home and get a good night’s sleep.