Novell NSS RAID Data Recovery

Novell NSS: Parsing the LEAF

About three years ago I received a RAID that had a Novell OS on it.  The RAID was close to 2 TB and consequently was sporting a very hefty NSS file system.  There were several problems with this recovery, but after several weeks of trial and error and a little luck, the client received their data and we, in return, received our recovery fee.  The following is a summary of the steps taken for this recovery.  These steps cover not only the RAID recovery but also the file system recovery and, ultimately, the client’s recovered data.  The steps also offer something else: a method for parsing the on-disk structure of a file system.  In other words, how I systematically found the few data elements I needed within an NSS file record in order to recover the data.

One last note before I get into the actual nuts and bolts of the recovery.  I am a 30-year ‘C’, Intel 80x86, Motorola 68000, and MOS 6502 assembler coder.  I have been a ‘bit fiddler’ for a long time, and I am used to getting down on the wires. The methods I describe are not for everybody, but I have used them over the years and have found them successful.

As a RAID recovery specialist you will find the following conversation the norm:

ME: “What exactly happened to the array?”

ADMIN: “Well, I am not exactly sure, when I came in this morning it was down.”

ME: “I see, what did you do to get the array back online?”

ADMIN: “Uh, I replaced the amber lit drive, I did a rebuild, and the rebuild froze.”

ME: “Did you make images of the disks before you tried the rebuild?”

ADMIN: “No. Why? Was I supposed to?”

In all my years of data recovery, only one (1) admin had the wherewithal to make images of his disks before doing anything to the RAID.  If you are an admin reading this and your RAID degrades or goes down, follow these steps:

1. If it is not off, shut it off.  If management whines (they always whine), ask them if they have the $26,000 it is going to cost to recover the array.

2. Don’t turn it back on. See step 1.

3. Find a nice piece of data imaging software (e.g. WinHex) and make images of all the drives that you can.  Some may be damaged, don’t worry about them; just make sure you image the good ones.

4. Don’t leave the bad ones in the array.  They may still be recoverable, and continuing to use them can, and will, exacerbate the physical problem.

Now that you have images, you can do what you want.  If you can’t get the array back, at least the people you send it to will have the original starting point, and a better chance at your recovery.
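For admins who want to see what "making an image" boils down to, here is a minimal Python sketch. It is not a substitute for a proper imaging tool; it only illustrates the idea of copying a drive chunk by chunk and zero-filling whatever will not read, so that every offset in the image still lines up with the original. The function name `image_drive`, the chunk size, and the device paths in the usage note are all illustrative assumptions.

```python
import os

def image_drive(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy src to dst in chunks, zero-filling any range that fails to read.

    Returns a list of (offset, length) tuples for the unreadable ranges.
    Assumes the source supports seeking to its end to report its size.
    """
    bad_ranges = []
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        size = src.seek(0, os.SEEK_END)
        offset = 0
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                src.seek(offset)
                data = src.read(want)
            except OSError:
                data = b""  # treat a failed read as an unreadable chunk
            if len(data) < want:
                # Record the gap and pad with zeros to keep offsets aligned.
                bad_ranges.append((offset + len(data), want - len(data)))
                data += b"\x00" * (want - len(data))
            dst.write(data)
            offset += want
    return bad_ranges
```

A call might look like `image_drive("/dev/sdb", "/mnt/images/disk1.img")` (both paths hypothetical). The one rule that matters is that the source is only ever opened for reading.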

I am sorry about the RAID recovery 101 lesson, but it ties in with this recovery.  This exact thing happened. A drive went down, they rebooted, saw the amber light on one drive, replaced the drive, rebooted, started a rebuild, and the rebuild froze. However, the rebuild also used a drive that had ‘magically’ come back online, and that drive’s stale data corrupted the entire front of the array. I guess they should have made images.

At this juncture I had a RAID with a stale drive embedded in the stripe in the beginning part of the array.  I had written some software that tells me where the break point is in the corruption and where the good data starts.  Once that was determined, I used another piece of generic RAID software for finding stale data to get the stale drive out.  Once I knew where the stale drive was, I built a virtual drive from the remaining drives and put that drive into the array. Phew!
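To give a feel for what such tools do, here is a minimal Python sketch. It is not the in-house software, and it assumes a single-parity XOR array (RAID 5 style), which is my assumption for illustration. In such an array, the chunks of every healthy stripe row XOR to zero, so the row where that starts holding again marks the break point, and a missing member is simply the XOR of the survivors. The function names and the in-memory `bytes` images are illustrative; a real array would be read from disk images in pieces.

```python
def xor_blocks(blocks):
    """XOR equal-length byte strings together and return the result."""
    acc = 0
    for block in blocks:
        acc ^= int.from_bytes(block, "big")
    return acc.to_bytes(len(blocks[0]), "big")

def first_consistent_stripe(members, stripe_size):
    """Return the first stripe row after which every row passes the parity check.

    members: one bytes object per drive image, parity included.
    Returns None if the last row is itself inconsistent.
    """
    rows = min(len(m) for m in members) // stripe_size
    first_good = None
    for row in range(rows):
        lo = row * stripe_size
        chunks = [m[lo:lo + stripe_size] for m in members]
        if any(xor_blocks(chunks)):      # non-zero XOR: this row is inconsistent
            first_good = None
        elif first_good is None:         # start of a new run of good rows
            first_good = row
    return first_good

def rebuild_member(survivors):
    """Reconstruct the one missing member as the XOR of all the others."""
    return xor_blocks(survivors)
```

Note that the parity check only says a row is inconsistent. It does not say which member is the stale one; that takes a separate test, such as leaving each drive out in turn and checking whether the rebuilt data makes sense.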

The next step was to destripe the array onto a server and then try to build the file system. I used another in-house tool to calculate the stripe, as well as verify the drive order.  Once that was done I destriped the array onto an in-house server.
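For readers who have never destriped an array by hand, the sketch below shows the idea in Python. It assumes a RAID 5 with the left-symmetric layout (parity rotates from the last drive toward the first, and data in each row starts on the drive after the parity drive). That layout is only an assumption for illustration; a controller may use a different rotation, which is exactly why stripe size and drive order have to be verified first. The function name and in-memory images are illustrative as well.

```python
def destripe_left_symmetric(members, stripe_size):
    """Reassemble the logical volume from RAID 5 member images.

    members: one bytes object per drive, in verified drive order.
    Assumes the left-symmetric parity rotation described above.
    """
    n = len(members)
    rows = min(len(m) for m in members) // stripe_size
    volume = bytearray()
    for row in range(rows):
        parity_disk = (n - 1) - (row % n)      # parity walks backwards
        lo = row * stripe_size
        for k in range(n - 1):                 # n - 1 data chunks per row
            disk = (parity_disk + 1 + k) % n   # data starts after parity, wraps
            volume += members[disk][lo:lo + stripe_size]
    return bytes(volume)

# Three drives, one-byte stripes, '.' standing in for parity:
#   drive 0: A D .
#   drive 1: B . E
#   drive 2: . C F
print(destripe_left_symmetric([b"AD.", b"B.E", b".CF"], 1))  # b'ABCDEF'
```

Get the drive order or the rotation wrong and the output still looks like data, just shuffled at stripe boundaries, so the result needs to be checked against something recognizable before trusting it.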

The final step was to try to mount the file system using NSS file reading and repairing software. I had some commercial software, as well as a piece of in-house software that I had been working on for reading NSS.  None of the commercial software would read the file system. My piece of half-coded in-house software also failed.  Not good.

So there I was: I had what appeared to be a good destripe, but no way to rebuild the corrupted file system.  The array was worth a good deal of money, so the boss wanted me to figure something out. I did have a few strikes against me, but I had one big advantage.  The client was a sweetheart: very understanding, and willing to wait four to six weeks for the data. So I cleared my schedule, told my boss that if he wanted this array he should leave me alone for six weeks, and decided to buckle down and decipher the on-disk format.

In my next installment, I will show you the steps I took to decipher the NSS record format.  Until then, back up your data!!!!