Title: JUL 19 - Forum Down TA AAR (Login Denied To All) and Lessons Learned
Post by: TechAdmin on 20 July 2009, 16:49:44
(copied this from the tech room to make it accessible for external viewing when I call in help)

This is the TA AAR on the events that disrupted the forum today. So far they have no explanation or established cause, nor are they fully understood.

Part 1: Chain of Events

On JUL 19, 2009 we had, for the first time, serious problems that were beyond my tech capabilities:

As I won't have time today or in the upcoming week, I had worked rather frantically over the last few days to get us to the desired (functional) forum endstate. I reached that goal on Friday night GMT, whereupon I proceeded to publish us to Google on Saturday, JUL 18.

When I got up Sunday morning at 0030 GMT (0230 Local) due to age-related bed flight syndrome I

- logged on as TechAdmin
- created an XML sitemap using the sitemap link at the bottom of the pages
- submitted it to Google and checked Google's site entries about us: I found that the pages they displayed had the right titles, but spam gibberish in their content summary (we still had a lot of infected files from the KB virus attack, which I suspected to be the source).

Next step, as it was Sunday and I had time on my hands:

- I started (in cPanel) manually cleaning the malicious code from the first line of the files, and by 0430 GMT had managed to clean all 600+ files we functionally need (except chat, I leave those 300 to Koen :) ).
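
For anyone facing the same cleanup, here is a minimal sketch of how the first-line removal could be scripted instead of done by hand. The actual injected code is not shown in this post, so the `eval(base64_decode(` signature below is purely an assumption, as are the function and variable names.

```python
import re

# ASSUMPTION: hypothetical signature of the injected line. The real
# malicious code is not quoted in this post, so adapt the pattern.
INJECTED = re.compile(r"eval\s*\(\s*base64_decode\s*\(")

def strip_injected_first_line(text):
    """Return (cleaned_text, changed). Only the first line is inspected."""
    first, _sep, rest = text.partition("\n")
    if INJECTED.search(first):
        # Drop the whole first line and keep everything after it.
        return rest, True
    return text, False

# Example with a made-up infected file body.
infected = '<?php eval(base64_decode("AAAA")); ?>\n<?php\necho "hello";\n'
cleaned, changed = strip_injected_first_line(infected)
clean_again, changed_again = strip_injected_first_line(cleaned)
```

This only works if the injected code sits alone on line 1. If it shares that line with legitimate code, dropping the whole line would delete real code, so keep a copy of every file before rewriting it.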

Job done, wanted to log out. Did not work: it came up with a blank page and a session ID URL (like http://www.warandtactics.com/smf/index.php?action=logout;sesc=7cd3f45956a7a080609f5a6247406a70); reloading the page showed me still logged in (fully functional).

Cleared cache, same story.

Tried other forum features, all worked smoothly; it was just that I could not log out. (I also could not get back into the Admin section to write a notification because of the same problem: after one hour you need to specifically log in to the Admin section again, and my password entry was ignored.)

Tried to log in as Rattler from another browser, same problem, no login (but, of course, a different URL: http://www.warandtactics.com/smf/index.php?PHPSESSID=3b4829553bc1ed249ab94fae1084a7a9&action=login2).

Next decided to force my logout as admin in two steps:

- First killed the session cookie (sessionID); the resulting logout attempt gave me an error message: "Session ID could not be verified" (this means that no new cookie was transmitted to the browser, an interesting find and physically the primary source of the prob)

- Second, killed the normal SMF80 cookie; logout worked.

Part 2: Measures taken

- went to sleep (important!) as I anticipated that I would need a clear head to try and remember what *exactly* I had done in those 4 hours, and as I was exhausted and close to panic

- When up, sat down and wrote a memory protocol of the steps I had taken in those 4 hours to establish whether I had touched any files for reasons other than virus cleaning. Made a list of the file groups that I remembered I had cleaned. I could definitely establish that I had *not* touched ./Sources/login php (which is the routine that handles the login together with the login template and the theme template), as I had previously cleaned all Source files up to the letter "M".

- ran a quick scan on the few (20) files on this list that could have to do with the problem; all looked OK at first glance

- Informed Forum Owner and Co-Admin via email about the problem and explained that I was out of my depth and probably needed external help (forgot to ask him to send an email to all members explaining the situation). I laid out the worst-case scenario, which would have meant a clean re-install and about a month's work to reach the same functionality as now (but without losing any of the data we have generated up to now)

- Koen confirmed the problem with IE on Vista and XP; he could not log on either (but we both could log on to the chatbox on TinyPortal).

- Established various possible scenarios and started researching in the free SMF support forums; found similar problems, but not the same one

Then my time ran out for today. Still on the list, in this order:

- file a support request explaining the problem in detail (as I am doing now)
- go over all 600 edited files to see whether I had accidentally deleted more than the virus code (one week's work)
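
The second item could be shortened a lot with a script, assuming pre-cleaning copies of the files are still available somewhere (that is an assumption, as is every name below). A rough sketch of the check: a cleaned file should equal its infected copy minus exactly the first line, and anything else gets reported as a diff.

```python
import difflib

def unexpected_changes(before, after):
    """Compare a cleaned file against its pre-cleaning copy.

    `before` is the infected text, `after` the manually cleaned text.
    Returns unified-diff lines for every change OTHER than the removal
    of the first line. An empty list means only line 1 was removed.
    """
    expected = before.splitlines()[1:]   # infected copy minus its first line
    actual = after.splitlines()
    return list(difflib.unified_diff(expected, actual, lineterm=""))

# Made-up file contents for illustration.
before = 'INJECTED LINE\n<?php\necho "a";\necho "b";\n'
good = '<?php\necho "a";\necho "b";\n'
bad = '<?php\necho "b";\n'               # one real line deleted by accident

good_report = unexpected_changes(before, good)
bad_report = unexpected_changes(before, bad)
```

Running this over each pair of files would leave only the files with a non-empty report to be inspected by hand.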

Part 3: Miraculous Solution of the Problem

Just before going to bed (and on the hunch that such things are possible) I decided to look at the forum again at 1200 GMT and found I could log in and access the Admin area without problem (I need to reload every page to actually see it, but the URLs are OK, so maybe it is just a display prob that I can solve).

I frankly have no clue why we are suddenly able to log on again (and error logging was not enabled during the events), but as I had not yet done anything to resolve the issue, my guess (and it is a wild guess) is that the problem was on the host side (server) and not related to my software handling. The bad thing is that the host never informed us about this (on my Spanish host I would have gone to the host's blog and found out why my stuff didn't work, if he was the cause).

Put up the notification, now writing the AAR.

On the to-do list:

- backup db again, also all files (now, during session)

- run maintenance cycle

- try to log out, and go to sleep. More tomorrow.

Part 4: Lessons Learned

- Admins who anticipate working longer than one hour should temporarily disable Admin Security until the end of the job; in this case it would at least have allowed me to put up a notification of the problem for guests and members

- When doing *any* admin work, enable error logging (was disabled, so now that we can log on again I have no error log entries to see what was happening)

- If we ever go and make a serious job out of this (or even money) we need to subscribe ($50 a year) to the SMF express support where the guys actually handle your files if needed: "Charter Membership". The forums are great (with great and helpful people), but in the paid version they actually do installs, talk to your host, etc. when you as TA are out of your depth.

- A host that does not inform us about problems on his servers or that has no support via ticket is not suitable for us. If I had had the possibility to communicate with the host via email, we could have cleared up much earlier where the problem lay

- *Always* back up the database (incl. table structure) when entering the Admin section, whatever other pressing stuff you might have to do: if everything had gone haywire permanently and a new clean install had been necessary, we would not have lost a lot. The last backup of the db was just 6 hours old, meaning 2 posts lost, and all members and profiles as well as the forum setup would have been restorable.
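
To make the backup habit cheap, the dump command can be generated with a timestamp so old backups are never overwritten. A minimal sketch, assuming the forum database is MySQL and `mysqldump` is available on the machine; the database name, user, and directory are hypothetical placeholders, and the password is assumed to come from a MySQL option file rather than the command line.

```python
from datetime import datetime, timezone

def backup_command(db_name, user, out_dir, now=None):
    """Build a mysqldump command line with a timestamped output file.

    mysqldump writes CREATE TABLE statements by default, so the dump
    contains the table structure as well as the data.
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%d-%H%M%S")
    out_file = f"{out_dir}/{db_name}-{stamp}.sql"
    return ["mysqldump", f"--user={user}", f"--result-file={out_file}", db_name]

# Hypothetical names; replace with the real database, user and path.
cmd = backup_command("smf", "forum", "/backups",
                     datetime(2009, 7, 19, 4, 30, 0, tzinfo=timezone.utc))
```

The list can be handed to `subprocess.run(cmd, check=True)` where shell access exists; on a cPanel-only host the built-in backup tools would have to do the same job.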


Title: Re: JUL 19 - Forum Down TA AAR (Login Denied For All) and Lessons Learned
Post by: Koen on 21 July 2009, 11:45:14
copy from my other post:

latest report from me but forgive me that maybe I'm not so easy to read as our TechAdmin...

*when you try to log in you'll be directed to a blank page with a URL like this one:
http://www.warandtactics.com/smf/index.php?PHPSESSID=aa424a97b48559a2daa39c9ac1f17c20&action=login2

delete the action=login2 at the end and hit the refresh button of your browser, this should log you in!

The same procedure works for Internet Explorer and Firefox !
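
For the technically curious, the URL edit described above can be written down as a small sketch: parse the query string, drop the `action` parameter, and rebuild the URL. The function name is made up for illustration. Unlike the manual edit, this also drops the trailing `&`.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def drop_action(url):
    """Return `url` without its `action` query parameter."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key != "action"]
    return urlunsplit(parts._replace(query=urlencode(kept)))

blank_page = ("http://www.warandtactics.com/smf/index.php"
              "?PHPSESSID=aa424a97b48559a2daa39c9ac1f17c20&action=login2")
fixed = drop_action(blank_page)
```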

Please report back

Title: Re: JUL 19 - Forum Down TA AAR (Login Denied To All) and Lessons Learned
Post by: TechAdmin on 22 July 2009, 02:31:47
All FIXED (for anyone who wants to see what happened: http://www.simplemachines.org/community/index.php?topic=325760.msg2170286#msg2170286), Forum back to normal...


Title: Re: JUL 19 - Forum Down TA AAR (Login Denied To All) and Lessons Learned
Post by: Tanker on 22 July 2009, 04:59:21
Time for some well-earned sleep, eh TA?

Title: Re: JUL 19 - Forum Down TA AAR (Login Denied To All) and Lessons Learned
Post by: TechAdmin on 22 July 2009, 12:22:47
...indeed, and it was short (I start work at 0600J and went to bed at 0245J)...


Title: Re: JUL 19 - Forum Down TA AAR (Login Denied To All) and Lessons Learned
Post by: Tanker on 23 July 2009, 06:15:12