Thursday, May 28, 2020

[SOLVED] MongoDB will not start - invariant failure - database has corrupted collection files

One of my MongoDB servers suddenly failed to start after a series of abrupt shutdowns (it lost power too many times).

When it tried to start, I noticed errors like this in the log:

"invariant failure"

Then I tried to repair the database using the following command (it does not really fix the problem, but it provides more clues about what is wrong):

(your dbpath may be different)

mongod --repair --dbpath=/data_local/mongodb

The above command attempts to repair many bad indexes, collections, etc., but eventually it stopped while repairing one collection, with an error message like this:

MongoDB wiredTiger Error: collection-<blah-blah>.wt does not appear to be a WiredTiger file
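When the repair output is long, the name of the offending file can be pulled out with a small filter. This is only a sketch: the function name find_corrupt_collections is made up for illustration, and it assumes the message contains the file name in the form collection-<digits and dashes>.wt on the same line as the "does not appear to be a WiredTiger file" text, as in the error above.

```shell
# Hypothetical helper: read repair output on stdin and print each distinct
# collection file named in a "does not appear to be a WiredTiger file" line.
find_corrupt_collections() {
    sed -n '/does not appear to be a WiredTiger file/s/.*\(collection-[0-9-]*\.wt\).*/\1/p' |
        sort -u
}
```

It could be used like this, if the repair output is piped into it: mongod --repair --dbpath=/data_local/mongodb 2>&1 | find_corrupt_collections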

So, one of my collection files had been corrupted. My MongoDB version is 3.2, and it does not come with the 'wt' tool (the WiredTiger command line utility to repair, salvage, etc. WiredTiger files). I had to install the 'wt' tool from wiredtiger.com. Please follow my guide How To Install WiredTiger Command Line Tool To Fix MongoDB Collections.

After installing the 'wt' command line tool, I proceeded to try to fix (salvage) my collection file. Before doing that, I prepared a separate working/salvage directory (I used /data_local/mongo_salvage). I put my collection file (*.wt) and a few other needed files there:

(note my collection file name is collection-1217--8628481310102098565.wt, change to whatever your collection file name is)

-rw-r--r-- 1 root      root      4738772992 Feb  9 14:06 collection-1217--8628481310102098565.wt
-rw-r--r-- 1 root      root         1155072 Feb  9 14:05 _mdb_catalog.wt
-rw-r--r-- 1 root      root        26935296 Feb  9 14:05 sizeStorer.wt
-rw-r--r-- 1 root      root              95 Feb  9 14:05 storage.bson
-rw-r--r-- 1 root      root              46 Feb  9 14:04 WiredTiger
-rw-r--r-- 1 root      root              21 Feb  9 14:04 WiredTiger.lock
-rw-r--r-- 1 root      root             916 Feb  9 14:04 WiredTiger.turtle
-rw-r--r-- 1 root      root        10436608 Feb  9 14:04 WiredTiger.wt
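Copying those files by hand is easy to get wrong, so here is a rough sketch of how the staging step could be scripted. The function name stage_for_salvage and its arguments are my own invention, the file list is simply the one shown in the listing above, and it assumes mongod is not running while the files are copied.

```shell
# Hypothetical helper: copy one collection file plus the WiredTiger
# metadata files from the listing above into a salvage directory.
stage_for_salvage() {
    dbpath=$1      # e.g. /data_local/mongodb
    salvage=$2     # e.g. /data_local/mongo_salvage
    collection=$3  # e.g. collection-1217--8628481310102098565.wt

    mkdir -p "$salvage" || return 1
    for f in "$collection" _mdb_catalog.wt sizeStorer.wt storage.bson \
             WiredTiger WiredTiger.lock WiredTiger.turtle WiredTiger.wt; do
        if [ -f "$dbpath/$f" ]; then
            # -p keeps the original timestamps and permissions
            cp -p "$dbpath/$f" "$salvage/" || return 1
        else
            echo "warning: $dbpath/$f not found, skipping" >&2
        fi
    done
}
```

For my setup the call would be: stage_for_salvage /data_local/mongodb /data_local/mongo_salvage collection-1217--8628481310102098565.wt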


Then I went to that salvage directory and tried to salvage the collection file:

(note my collection file name is collection-1217--8628481310102098565.wt, change to whatever your collection file name is)


cd /data_local/mongo_salvage
./wt -v -h /data_local/mongo_salvage -C "extensions=[./ext/compressors/snappy/.libs/libwiredtiger_snappy.so]" -R salvage -F collection-1217--8628481310102098565.wt

The 'wt' tool did not return any error, which means it worked.
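Rather than relying on the absence of error output, the exit status can be checked explicitly. The wrapper below is a sketch only: the names salvage_collection, WT and SNAPPY_EXT are hypothetical, the wt flags are exactly the ones from the command above, and it assumes wt reports failure through a non-zero exit status, as command line tools normally do.

```shell
# Hypothetical overridable paths; the defaults match the command above.
WT=${WT:-./wt}
SNAPPY_EXT=${SNAPPY_EXT:-./ext/compressors/snappy/.libs/libwiredtiger_snappy.so}

# Run the salvage and report success or failure based on the exit status.
salvage_collection() {
    salvage=$1     # e.g. /data_local/mongo_salvage
    collection=$2  # e.g. collection-1217--8628481310102098565.wt

    if "$WT" -v -h "$salvage" -C "extensions=[$SNAPPY_EXT]" -R salvage -F "$collection"; then
        echo "salvage OK: $collection"
    else
        status=$?
        echo "salvage FAILED (exit $status): $collection" >&2
        return "$status"
    fi
}
```

Run it from the salvage directory, for example: salvage_collection /data_local/mongo_salvage collection-1217--8628481310102098565.wt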

Now all I have to do is copy the salvaged file over my original corrupted collection file. BUT before I do that, let's be safe and make a copy of the corrupted file first.

cp /data_local/mongodb/collection-1217--8628481310102098565.wt /data_local/mongo_salvage/collection-1217--8628481310102098565-ORIGINAL-CORRUPTED.wt

Finally we are ready to overwrite it with the good salvaged version:

cp /data_local/mongo_salvage/collection-1217--8628481310102098565.wt /data_local/mongodb/
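The backup and overwrite steps can be combined so that the overwrite only happens once the backup is verified. This is a sketch under my own assumptions: the function name replace_with_salvaged is made up, the backup is named with the same -ORIGINAL-CORRUPTED suffix used above, and mongod is assumed to be stopped.

```shell
# Hypothetical helper: back up the corrupted file, verify the backup,
# then overwrite the original with the salvaged copy.
replace_with_salvaged() {
    dbpath=$1      # e.g. /data_local/mongodb
    salvage=$2     # e.g. /data_local/mongo_salvage
    collection=$3  # e.g. collection-1217--8628481310102098565.wt
    backup="$salvage/${collection%.wt}-ORIGINAL-CORRUPTED.wt"

    # Refuse to clobber a backup left by an earlier run.
    if [ -e "$backup" ]; then
        echo "backup already exists: $backup" >&2
        return 1
    fi

    cp -p "$dbpath/$collection" "$backup" || return 1
    # Only continue if the backup is byte-for-byte identical.
    cmp -s "$dbpath/$collection" "$backup" || return 1

    cp "$salvage/$collection" "$dbpath/$collection"
}
```

The second run for the same file fails on purpose, so the copy of the original corrupted file is never replaced by the salvaged one.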

Then, we re-execute the same repair command:

mongod --repair --dbpath=/data_local/mongodb


In my case, I had 1 additional collection that was also corrupted. But I was relieved to see that the above procedure worked (made progress - woohoo!). So, I just repeated the process for the other corrupted collection file.

SUCCESS! Finally, after salvaging 2 collection files and copying them over the original corrupted .wt files, I was able to run mongod --repair all the way through without it borking out midway, and I was able to start my MongoDB system!

I hope these instructions are helpful for somebody. If so, please kindly let me know via a comment.

