Almost all night
I am very lucky that my job normally does not involve a lot of after-hours work like I was used to when I worked in IT up north. Our environment is normally very, very stable, and any fires that I have to put out are normally small in nature. Well, yesterday I had an all-out inferno break out.
As I have mentioned here, I am working through the process of migrating our office from an on-site Exchange server to Office 365. I have hit various bumps in the road along the way but have been able to work through them. Well, yesterday I was at the point where I was trying to convert our user accounts to “mail enabled” accounts, which allows them to connect to both our on-site Exchange server and the Office 365 server at the same time during the transition. I followed the steps on the Office 365 site, exporting some info from Office 365 into a local CSV file and then feeding that CSV file to a PowerShell script on my Exchange server.
So I logged onto my Exchange server and opened an Exchange command prompt to run the script. I fired it off and saw a LOT of red text scrolling up the screen, indicating that certain commands were getting errors. I didn’t think a lot about it; sometimes scripts will do that if they are trying to modify something that doesn’t need modifying. After the script ran I got sidetracked into something else, and then my phone rang for the first time. It was a user saying they couldn’t get into their email. Weird.
So I hopped back onto my Exchange server to take a look, since I already had it open. When I refreshed the view, ALL of our user mailboxes were gone except four or five of them. I swallowed hard, knowing that somehow that script had just trashed my Exchange environment. My head raced as I quickly tried to come up with a plan of action. It seemed to me that the script had screwed with the Exchange side of our Active Directory environment, so I thought that if I restored my main domain controller from the prior night’s backup I would be able to erase the problem.
This set off a totally separate disaster. In order to do an authoritative restore of AD you need to stop DS on the other domain controllers. Once I did that, it crippled the network, as there was no longer any authentication going on, which broke a number of apps. I decided to bail on that idea because the restore was going slowly. I cancelled the DC restore and rebooted it. My heart sank when the server did not come up.
Instead of booting, the DC would just come to a screen saying the OS failed to start with a 0xc000007b error. This set off hours of frustrating and time-consuming troubleshooting, which involved multiple restore attempts, each of which took at least an hour only to fail. I was on the phone with our backup appliance vendor trying to determine why it was not working. I knew I was in trouble when the tech basically said he had exhausted all of his ideas. By the end of the normal work day I was still heavily in the weeds. The network was limping along since I had brought the other DCs back online, but there was absolutely no email access for anybody for the majority of the day.
So as I was working on the DC problem, in between failed restore attempts I was looking more closely at the Exchange server and its missing mailboxes. I realized that the mailboxes were in the “disconnected” state, which at first was sort of a relief since I could reconnect them manually. However, I did not realize at the time that not all of the missing mailboxes were there. I later realized almost 50 of them were still missing.
So I finally gave up on getting the domain controller back online. I seized the FSMO roles and forced them onto one of the other DCs I have on the network. I then blew away the original DC, reinstalled Windows Server 2012 and re-added it to the domain as a new DC. I then turned my full attention to the email situation after grabbing a dinner that consisted of a large coffee and a bagel from the Dunkin Donuts across the street.
So I figured that if I did a restore of the mailboxes from the prior night, that should get me up and running. I had never had to restore Exchange from my Unitrends appliance before. My first couple of attempts failed, so I got on an online chat with a tech around 9 PM. He remoted into my machine and helped me get the restore going, which of course involved a lot of waiting while it ran. When the restore completed I remounted the mailbox database and refreshed my view, hoping to see all of my mailboxes intact.
You can imagine my feeling when I saw fewer mailboxes, dramatically fewer. Instead of 56 mailboxes in a disconnected state, I had 7 now, meaning almost 100 mailboxes were gone. The Unitrends tech again had no more answers: the backup had restored correctly, so the issue had to be something inside Exchange. My brain was pretty much fried at this point. The rollercoaster of fix and fail attempts was frustrating, to say the least. However, I was in it for the long haul. I broke it, so I had to fix it.
So I did some more Google searching and found another Exchange PowerShell command that refreshes the database view: “Clean-MailboxDatabase -Identity <database name>”. After running the command and refreshing my view in the Exchange 2010 console, I now had 105 mailboxes in the disconnected state. I breathed my first sigh of relief in 12 hours. I still had a lot of manual clicking and typing to do to reconnect each mailbox to the correct user account, one by one. By the time I finished up it had rolled into Thursday morning. I turned off the lights in my office around 12:15 AM.
By the time I got home, showered and got into bed it was approaching 1:30. I had to get out the door as early as possible today in case more unexpected problems arose. Luckily my phone has been pretty quiet all morning, meaning my crazy collection of fixes more or less worked. It was one of those true tests of determination. Being able to continue pushing forward through a forest of failure until you discover the path to success is a trait some people simply don’t possess. It is one of the few good traits I have: I don’t give up easily.