Friday, February 29, 2008

A bug-fix.

I was writing a small PHP script to recursively delete all files and directories in a particular location that are more than a few hours old. I thought the whole thing wouldn't take more than 30 minutes. I also decided to use the Standard PHP Library (SPL) and its RecursiveDirectoryIterator class for directory scanning, instead of the usual opendir(), readdir(), closedir() approach.

After writing the script, I tested it by having it print messages like "Deleting file..." instead of actually deleting any files or directories. The script's logic was to check each file's or directory's ctime, which is the inode change time (not the file creation time, as I had earlier thought), and treat the entry as a candidate for deletion if it was more than a few hours old. Everything seemed to be working fine, so I uncommented the unlink() and rmdir() calls to test the real thing. Now the script started reporting different things, and wasn't even considering some directories for deletion. I was very surprised by this behaviour. I initially thought it had something to do with open directory handles, so I went back to the opendir(), readdir(), closedir() approach. However, that approach yielded the same results. It's in desperate times like these that you lose reason and start tinkering with things you know are not at fault, in the hope of fixing the error. I inserted lots of debug output to pinpoint the error, but PHP kept complaining that rmdir() was trying to delete a non-empty directory.
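For reference, a dry run of this kind could look roughly like the minimal sketch below, using SPL. It is not the original script: the function name findOldEntries() and its parameters are hypothetical, and it only reports paths without deleting anything. Because a dry run never modifies the tree, every ctime it reads is still the original one, which would explain why this stage appeared to work.

```php
<?php
// Hypothetical dry-run sketch (not the original script): walk a tree with SPL
// and report entries whose ctime is older than a cutoff, deleting nothing.

/**
 * Returns the paths that would be deleted, deepest entries first.
 * $root and $maxAgeSeconds are illustrative parameters.
 */
function findOldEntries(string $root, int $maxAgeSeconds): array
{
    $cutoff = time() - $maxAgeSeconds;

    // SKIP_DOTS drops "." and ".."; CHILD_FIRST yields a directory's
    // contents before the directory itself, the order a deleter would need.
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::CHILD_FIRST
    );

    $old = [];
    foreach ($iterator as $path => $info) {
        // SplFileInfo::getCTime() is the inode change time, not creation time.
        if ($info->getCTime() < $cutoff) {
            echo "Deleting {$path}...\n"; // report only; nothing is removed
            $old[] = $path;
        }
    }
    return $old;
}
```

Swapping the echo for unlink()/rmdir() calls is exactly the step that changes the picture, since each deletion can alter the ctime of a directory the loop has not reached yet.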

The directory structure was as follows:

`-- aaa
    `-- zzz
        `-- hello.txt

2 directories, 1 file

The output indicated that the file hello.txt was being deleted, but the directory zzz wasn't even being considered for deletion. Further, the script was trying to delete aaa, which was still non-empty because it contained zzz. I couldn't understand why this was happening. After some more brainstorming, I realized that calling unlink() on hello.txt changed the ctime of the directory zzz: removing an entry from a directory modifies that directory's inode. Since zzz's ctime was now recent, zzz was no longer considered for deletion. However, the ctime of aaa stayed unchanged, because none of aaa's own entries had been removed, so aaa still qualified and rmdir() failed on it. This was happening because I was calling filectime() while scanning the directories, instead of pre-computing the ctimes of all the entries before deleting anything.
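The side effect is easy to reproduce. The minimal sketch below assumes a POSIX-style filesystem and a throwaway directory under sys_get_temp_dir() (both assumptions for illustration). It rebuilds the aaa/zzz/hello.txt tree, deletes hello.txt, and compares the ctimes of zzz and aaa before and after. PHP caches stat() results, so the sketch calls clearstatcache() before each measurement to be safe.

```php
<?php
// Sketch: deleting a file updates its parent directory's ctime, but not the
// grandparent's. The temp location and function name are hypothetical.

function parentCtimeChangesOnUnlink(): array
{
    $root = sys_get_temp_dir() . '/ctime_demo_' . uniqid();
    mkdir("$root/aaa/zzz", 0777, true);
    file_put_contents("$root/aaa/zzz/hello.txt", "hello\n");

    clearstatcache(); // make sure filectime() performs a fresh stat()
    $before = ['aaa' => filectime("$root/aaa"), 'zzz' => filectime("$root/aaa/zzz")];

    // filectime() reports whole seconds, so wait long enough for the
    // timestamp to tick over before making the change.
    sleep(2);
    unlink("$root/aaa/zzz/hello.txt"); // removes an entry from zzz only

    clearstatcache();
    $after = ['aaa' => filectime("$root/aaa"), 'zzz' => filectime("$root/aaa/zzz")];

    // Clean up the demo tree, children first.
    rmdir("$root/aaa/zzz");
    rmdir("$root/aaa");
    rmdir($root);

    return ['before' => $before, 'after' => $after];
}

$result = parentCtimeChangesOnUnlink();
printf("zzz ctime: %d -> %d\n", $result['before']['zzz'], $result['after']['zzz']);
printf("aaa ctime: %d -> %d\n", $result['before']['aaa'], $result['after']['aaa']);
```

On such a filesystem, zzz's ctime should move forward while aaa's stays put, which is the same asymmetry that made zzz look "too new" to delete while aaa still looked old.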

So I switched to scandir() to read a directory's contents in one go, and recorded the ctimes of all its entries before doing any further processing on that directory, such as scanning it recursively or deleting the files and directories within it. This fixed the bug, and the script started behaving as expected. Even though this may look silly, I learnt several things from it, the most important being to make sure I know the side effects of any action I perform.
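A minimal sketch of this "snapshot first, then act" fix is below. The function name deleteOlderThan() and its $cutoff parameter (a Unix timestamp) are hypothetical, and the emptiness check before rmdir() is an extra safeguard not described above, so that a directory still holding newer files is left alone instead of triggering a warning.

```php
<?php
// Sketch: list a directory with scandir(), record every entry's ctime, and
// only then recurse into or delete entries. Names are hypothetical.

function deleteOlderThan(string $dir, int $cutoff): void
{
    $names = scandir($dir);
    if ($names === false) {
        return; // unreadable directory: leave it alone
    }

    // 1. Snapshot: record type and ctime of each entry before anything
    //    inside this directory is modified.
    $entries = [];
    foreach ($names as $name) {
        if ($name === '.' || $name === '..') {
            continue;
        }
        $path = $dir . DIRECTORY_SEPARATOR . $name;
        $entries[$path] = [
            'isDir' => is_dir($path) && !is_link($path), // don't follow symlinks
            'ctime' => filectime($path),
        ];
    }

    // 2. Act on the snapshot; unlink()/rmdir() calls made from here on can
    //    no longer influence the decisions for entries of this directory.
    foreach ($entries as $path => $entry) {
        if ($entry['isDir']) {
            deleteOlderThan($path, $cutoff); // empty the subdirectory first
            $remaining = scandir($path);
            $isEmpty = $remaining !== false && count($remaining) === 2; // only "." and ".."
            if ($entry['ctime'] < $cutoff && $isEmpty) {
                rmdir($path);
            }
        } elseif ($entry['ctime'] < $cutoff) {
            unlink($path);
        }
    }
}
```

For instance, deleteOlderThan('/path/to/location', time() - 3 * 3600) would remove everything whose ctime is more than three hours old below that placeholder path. For the aaa/zzz/hello.txt tree, zzz's ctime is recorded while aaa is being listed, before hello.txt is touched, so zzz still qualifies once it has been emptied, and aaa is removed only after zzz is gone.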