Welcome To The War & Tactics Forum => About The WaT Forum: Rules, Updates, News => Topic started by: TechAdmin on 24 November 2011, 22:15:35

Title: WaT Mishap NOV 24, 2011
Post by: TechAdmin on 24 November 2011, 22:15:35
The WaT forum was offline and not reachable from the outside for a large portion of today.

I got wind of the problem at 2025 forum time, double-checked, and found an email from Koen, sent at 1853, saying that WaT was down.

I informed our host (http://hostsuar.com) by "urgent" ticket at 2038. At 2043 they had our forum fixed and the problem solved. Now, *that's* what I call service! Show me any other host who is equally responsive; personally I have not met any. Hats off and a big thank you to Carlos from the hostsuar team.

What had happened was that the main starting file (Settings.php) of our forum software had for some reason gone blank, empty. Hostsuar overwrote the empty file with a copy from the last backup, which solved the problem.

What is not clear is how the file could go blank in the first place, as we haven't modified or worked on forum files for at least a week. We are investigating to hopefully prevent this from happening again.

I am sorry for any inconvenience caused and that I could not react earlier. I hope the forum was not offline too long.


Title: Re: WaT Mishap NOV 24, 2011
Post by: TechAdmin on 25 November 2011, 07:52:42
As the forum went offline again tonight at 0201, NOV 25, I have investigated deeper and found the cause for the emptying out of the Settings.php file, thanks to the people at Catskill Technologies (http://www.catskilltech.com/freeSW/SMF/faqs/):

Background: The Settings.php file for some odd reason is used to log the timestamp of the last database error (why this is so only the developers of SMF will know, but it's really bad programming).

The design of SMF's mechanism for logging database errors is flawed. What happens is that the entire Settings.php file is completely rewritten just to update the timestamp of the last database error, which is assigned to the PHP variable $db_last_error:

- The original file's settings are read in, then the file is emptied/truncated (this is supposedly done to get around a glitch in some ancient version of a Unix-based server). Finally, the Settings.php file is written out anew, with all the original settings, except that $db_last_error has been updated with the current timestamp.

The problem with this approach is that there is a window of time in which the original Settings.php file has been emptied but not yet rewritten.

This sets up a race condition: if another user also encounters a database error during that window, their error handling code will read in the empty Settings.php file and try to process that! What are the odds of this happening? Apparently, quite good. If one user encounters a database error, how likely is it that the problem will resolve itself before the next user attempts a database operation and also gets an error? Not very.
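To make the window easier to see, here is a minimal sketch in Python (not SMF's actual PHP code). It steps through the read, truncate, rewrite sequence by hand and lets a second "process" read the file in the middle. The file contents, the /forum/Sources path and the function names are made up for illustration, and it assumes the second process simply writes back whatever it managed to read.

```python
import os
import tempfile


def read_settings(path):
    """Return the file's full text, as an error handler would read it in."""
    with open(path, "r", encoding="utf-8") as fh:
        return fh.read()


def demo_race():
    """Replay the truncate-then-rewrite sequence with a reader in the window."""
    path = os.path.join(tempfile.mkdtemp(), "Settings.php")
    # Hypothetical, heavily shortened settings file.
    original = "<?php\n$sourcedir = '/forum/Sources';\n$db_last_error = 0;\n?>"
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(original)

    # Process A: read the settings in, then empty the file.
    settings_a = read_settings(path)
    fh_a = open(path, "w", encoding="utf-8")  # mode "w" truncates to 0 bytes

    # Process B hits its own database error inside the window
    # and reads the file while it is empty.
    settings_b = read_settings(path)

    # Process A finishes: all settings written back with a new timestamp.
    fh_a.write(settings_a.replace("$db_last_error = 0;",
                                  "$db_last_error = 1234567890;"))
    fh_a.close()
    after_a = read_settings(path)

    # Process B now rewrites the file from the empty text it read.
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(settings_b)

    return settings_b, after_a, read_settings(path)
```

Calling demo_race() returns what process B read, the file after A's rewrite, and the final file. In this replay B reads an empty string, A's rewrite looks fine for a moment, and the final file is empty again, which matches what we saw on the server.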

The symptoms of an emptied out Settings.php file vary, including error messages that "$sourcedir" is missing, as well as errors opening files such as /QueryString.php (most of the path, supplied by $sourcedir, is missing). The different ways in which this error can manifest itself are confusing even to experienced support team members, are undocumented, and thoroughly crash a forum, as has now happened to us twice in one day. It is a catastrophe for its victims, but the developer response is to tell them to find a better host, one who never has database errors (an arrogance of the developers that makes me cringe: if something is buggy, fix the bloody bug and don't tell your bug victims to look for a workaround).

What's the solution?

- One simple solution is to make the Settings.php file read-only. On a Linux server, this involves setting its permissions to 444 or lower. The idea is that the application can't write to the file and hence can't empty it out accidentally. The drawback is that the database error timestamp can't be logged, and the user may receive an error log entry reporting that the file was unwritable. We will not adopt this solution for the reasons mentioned.

- A second solution would be to fix the code in the function that updates the Settings.php file. It appears to contain a check for an empty Settings.php file, which should either skip the update or wait until it can read in a good version. Evidently that code is broken, but rewriting the whole database timestamp code for the SMF developers is beyond my technical abilities.

- Our solution, provided by the programmers of Catskill Technologies (http://www.catskilltech.com/freeSW/SMF/fixes1.1/index.html), is to separate out the database timestamp line into its own file. Only that file, with its single line of code $db_last_error = 1234567890; gets rewritten. If this separate file gets corrupted (emptied out) by the race condition, it's only a minor problem: we will implement a check in Settings.php to see whether $db_last_error is defined after the separate file has supposedly been included. If it is not, the code assumes the separate file was empty and rewrites it with a 0 value. That may not be the optimal solution, as another user's SMF process may be busy writing out the revised file (with a non-zero value), but at least it will prevent the forum crash problem.
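For the technically curious, the idea behind the separate-file fix can be sketched roughly like this. It is written in Python for illustration only and is not the Catskill Technologies patch, which is PHP. The file name db_last_error.php and the function names are my own assumptions.

```python
import os
import re
import tempfile

# Matches the single line the separate file is expected to hold.
LINE_RE = re.compile(r"\$db_last_error\s*=\s*(\d+)\s*;")


def write_db_last_error(path, timestamp):
    """Rewrite only the small timestamp file, never the main settings."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(f"<?php\n$db_last_error = {int(timestamp)};\n?>")


def load_db_last_error(path):
    """Return the stored timestamp; repair the file with 0 if it is unusable."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            match = LINE_RE.search(fh.read())
    except FileNotFoundError:
        match = None
    if match is None:
        # Missing or emptied out (for example by the race): reset to 0.
        write_db_last_error(path, 0)
        return 0
    return int(match.group(1))


def demo():
    """Show recovery from an emptied file, then a normal update."""
    path = os.path.join(tempfile.mkdtemp(), "db_last_error.php")  # hypothetical name
    open(path, "w", encoding="utf-8").close()  # simulate the emptied-out file
    recovered = load_db_last_error(path)
    write_db_last_error(path, 1234567890)
    return recovered, load_db_last_error(path)
```

The point of the design is that the worst case is now a lost timestamp rather than a lost configuration: the main settings are never rewritten, so an emptied timestamp file costs nothing more than a reset to 0.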

I won't bother you with the detailed code changes required. I am writing this here so that you trusting WaT users can see we Tech people are working, even if our work is usually invisible and we keep in the background :)  For tech-savvy users or administrators of other SMF forums with the same problem, this text might be a good starting point to let the SMF developers know that their approach to customer service is suboptimal, as a solution for the bug exists and they would just have to move their ass a bit.