As the forum went offline again tonight (0201, Nov 25), I investigated further and found the cause of the emptied-out Settings.php file, thanks to the people at Catskill Technologies.
Background: The Settings.php file is, for some odd reason, used to log the timestamp of the last database error (why this is so, only the developers of SMF will know, but it's really bad programming).
The design of SMF's mechanism for logging database errors is really bad. What happens is that the entire Settings.php file is completely rewritten just to update the timestamp of the last database error, which is assigned to the PHP variable $db_last_error. The original file's settings are read in, then the file is emptied/truncated (this is supposedly done to get around a glitch in some ancient version of a Unix-based server). Finally, the Settings.php file is written out anew, with all the original settings, except that $db_last_error has been updated with the current timestamp.
The problem with this approach is that there is a window in time where the original Settings.php file has been emptied but not yet rewritten.
This sets up a race condition: if another user also encounters a database error during that window, their error-handling code will read in the empty Settings.php file and try to process that! What are the odds of this happening? Apparently, quite good. If one user encounters a database error, how likely is it that the problem will resolve itself before the next user attempts a database operation and also gets an error? Not very.
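For the technically curious, here is a small simulation of how I understand the race. This is only a sketch under assumptions and not SMF's actual code: the helper withTimestamp(), the temp file and the '/path/to/Sources' value are made up for illustration, and the two "requests" are interleaved by hand in a single script to show one plausible ordering.

```php
<?php
// Illustrative simulation only; NOT SMF's real code.
// A temp file stands in for Settings.php; its contents are placeholders.
$file = tempnam(sys_get_temp_dir(), 'settings');
file_put_contents($file, "<?php\n\$sourcedir = '/path/to/Sources';\n\$db_last_error = 0;\n");

// Hypothetical helper: return the settings text with a new timestamp.
function withTimestamp(string $settings, int $time): string
{
    return preg_replace('/(\$db_last_error\s*=\s*)\d+;/', '${1}' . $time . ';', $settings);
}

// Request A hits a database error: it reads the settings, then truncates.
$a = file_get_contents($file);
file_put_contents($file, '');

// Request B hits its own database error inside the window
// and reads an empty file.
$b = file_get_contents($file);

// Request A finishes its rewrite; for a moment the file is healthy again.
file_put_contents($file, withTimestamp($a, 1234567890));

// Request B now truncates and "rewrites" what it read: nothing.
file_put_contents($file, '');
file_put_contents($file, withTimestamp($b, 1234567891));

$final = file_get_contents($file);
var_dump($final === ''); // bool(true): all settings are gone
unlink($file);
```

In this ordering the settings are gone for good, because the second request has nothing but an empty string to write back.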
The symptoms of an emptied-out Settings.php file vary, including error messages that "$sourcedir" is missing, as well as errors opening files such as /QueryString.php (most of the path, supplied by $sourcedir, is missing). The different ways this error can manifest itself are confusing even to experienced support team members and are undocumented, and the error thoroughly crashes a forum, as has now happened to us twice in one day. It is a catastrophe for its victims, but the developers' response is to tell them to find a better host, one who never has database errors (an arrogance on the developers' part that makes me cringe: if something is buggy, fix the bloody bug and don't tell your bug victims to look for a workaround).
What's the solution?
- One simple solution is to make the Settings.php file read-only. On a Linux server, this involves setting its permissions to 444 or lower. The idea is that the application can't write to the file and hence can't empty it out accidentally. The drawback is that the database error timestamp can't be logged, and the user may receive an error log entry reporting that the file was unwritable. We will not adopt this solution, for the reasons mentioned.
- A second solution would be to fix the code in the Settings.php file update function that appears to be a check for an empty Settings.php file, and to either forget about updating the file or wait until it can read in a good version. Evidently that code is broken, but rewriting the whole database timestamp code for the SMF developers is beyond my technical abilities.
- Our solution, provided by the programmers of Catskill Technologies, is to separate out the database timestamp line into its own file. Only that file, with its single line of code $db_last_error = 1234567890; gets rewritten. If this separate file gets corrupted (emptied out) by the race condition, it's only a minor problem: we will implement a check in Settings.php to see whether $db_last_error was defined after the file was supposedly included. If not, the code assumes the file was empty and rewrites it with a value of 0. That may not be the optimal solution, as another user's SMF process may be busy writing out the revised file (with a non-zero value), but at least it will prevent the forum crash problem.
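To give a rough idea of the last option, here is a minimal sketch of how the separate-file approach could look. It is not the actual patch: the file name db_last_error_demo.php and the helpers writeLastError() and loadLastError() are names I made up for this post, and in the real change the check would sit in Settings.php itself.

```php
<?php
// Hypothetical sketch of the separate-file idea; names are illustrative
// assumptions, not the code from the actual patch.

// Rewrite ONLY the small timestamp file, never Settings.php.
function writeLastError(string $errorFile, int $timestamp): void
{
    file_put_contents($errorFile, "<?php\n\$db_last_error = " . $timestamp . ";\n");
}

// Include the timestamp file, then check whether the variable was defined.
function loadLastError(string $errorFile): int
{
    if (is_file($errorFile)) {
        // Variables set by the included file land in this function's scope.
        include $errorFile;
    }
    if (!isset($db_last_error)) {
        // File missing or emptied out: assume 0 and rewrite the file.
        $db_last_error = 0;
        writeLastError($errorFile, 0);
    }
    return $db_last_error;
}

// Demo: simulate the timestamp file having been emptied by the race.
$errorFile = sys_get_temp_dir() . '/db_last_error_demo.php';
file_put_contents($errorFile, '');
$db_last_error = loadLastError($errorFile);
var_dump($db_last_error); // int(0), and the file has been rewritten
```

The point of the design is that the worst case is now a lost timestamp, while the settings the forum needs to run stay in a file that is no longer rewritten on every database error.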
I won't bother you with the detailed code changes required. I am writing this here so that you trusting WaT users can see that we tech people are working, even if our work is usually invisible and we keep in the background.
For tech-savvy users or administrators of other SMF forums with the same problem, this text might be a good starting point for letting the SMF developers know that their approach to customer service is suboptimal: a solution for the bug exists, and they would just have to move their ass a bit.