Monday, August 11, 2014

Converting Outlook PST files into Thunderbird format

This turns out to be a process many people are looking for but few people actually have answers for. Many of the solutions that I found work well if you have a very simple mailbox. If, however, you have multiple PST files, each with many sub-folders and sub-sub-folders whose structure you wish to keep, the choices get very limited very quickly. I was successful in accomplishing this, so I thought I would document the process I used, both for myself and for others looking to do the same.

Here is the scenario:

I have 16 Microsoft Outlook .pst files. I am an email pack-rat and I archive off each year's worth of email into its own .pst file. So I have:

~/pst$ ls
Archive 1998.pst  Archive 2002.pst  Archive 2006.pst  Archive 2010.pst
Archive 1999.pst  Archive 2003.pst  Archive 2007.pst  Archive 2011.pst
Archive 2000.pst  Archive 2004.pst  Archive 2008.pst  Archive 2012.pst
Archive 2001.pst  Archive 2005.pst  Archive 2009.pst  Archive 2013.pst


I want Thunderbird to display these as individual folders, each named after its .pst file, and I want them in the local mailbox (Local Folders). As you can see, I started by placing all of the .pst files into a folder named ~/pst.

Next, I use a utility named readpst to convert the .pst files to mbox format.

~/pst$ mkdir archives
~/pst$ find . -name "*.pst" -print0 \
| xargs -0I{} readpst -u -r -o archives '{}'

I broke the line up above so that it is readable. This will create a directory for each of the pst files in the "archives" directory.
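
Before running the real conversion, it can help to see exactly which commands the pipeline will generate. The following is a minimal sketch that assumes nothing beyond find and xargs: it builds a scratch directory with two empty stand-in files (they are not real .pst files) and puts echo in front of readpst, so nothing is actually converted.

```shell
# Dry run: build a scratch directory with two empty stand-in .pst files
# (names with spaces, like the real ones) and print the readpst commands
# that would be run, without invoking readpst itself.
demo=$(mktemp -d)
touch "$demo/Archive 1998.pst" "$demo/Archive 1999.pst"
mkdir "$demo/archives"

# -print0 and -0 keep each file name intact even though it contains a space.
dry_run=$(cd "$demo" && find . -name "*.pst" -print0 \
  | xargs -0 -I{} echo readpst -u -r -o archives "{}")

printf '%s\n' "$dry_run"
```

Each printed line should carry one complete file name, space included. If a name shows up split in two, the quoting is wrong somewhere.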

~/pst$ ls archives/
Archive 1998  Archive 2001  Archive 2004  Archive 2007  Archive 2010  Archive 2013
Archive 1999  Archive 2002  Archive 2005  Archive 2008  Archive 2011 
Archive 2000  Archive 2003  Archive 2006  Archive 2009  Archive 2012

If you look at the structure of the directories you will find that they are almost right for simply copying into the Thunderbird directory structure. But they are not absolutely right. So we will make it right.

Thunderbird uses a structure that has either two or three entries, all at the same directory level, to represent a folder with its contents:

Thunderbird directory structure

folder-name.sbd
      This directory is at the same level as the two following files and only exists if there are sub directories.

folder-name   
      Will be an mbox formatted file with all of the messages and attachments

folder-name.msf
      Will be a file that indexes the mbox formatted file above. If this file does not exist when Thunderbird starts, Thunderbird will create it.


The directory structure of the readpst output is:

folder-name
      A directory containing sub-directories and the mbox file for this folder

mbox
      The mbox formatted file for this folder. Note that it is inside the folder-name directory, not at the same level as the directory, which is where Thunderbird needs it.
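
To make the difference between the two layouts concrete, here is a small sketch that maps a readpst mbox path to the path Thunderbird expects. The function name to_thunderbird_path and the folder names are made up for illustration, and the function assumes a relative path with no leading "./".

```shell
# Map a readpst-style mbox path to the Thunderbird-style path.
# Hypothetical helper; assumes a relative path without a leading "./".
to_thunderbird_path() {
  # 1. Drop the trailing "/mbox": the folder's own name becomes the file name.
  # 2. Every remaining "/" separates a parent directory from its child,
  #    and each parent directory gains the .sbd extension.
  printf '%s\n' "$1" | sed -e 's|/mbox$||' -e 's|/|.sbd/|g'
}

to_thunderbird_path 'Archive 1998/Inbox/Projects/mbox'
to_thunderbird_path 'Archive 1998/mbox'
```

The first call prints "Archive 1998.sbd/Inbox.sbd/Projects" and the second prints "Archive 1998": a top-level folder becomes a plain mbox file with no .sbd in its own name.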

Converting readpst output to Thunderbird format


Converting readpst output to Thunderbird directory structure only requires a little shuffling and renaming.
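
The shuffling comes down to three steps: rename every directory to NAME.sbd, make sure every directory holds an mbox, and move each mbox up one level under the directory's original name. The sketch below tries those steps on a scratch tree before any real data is touched. The folder names are invented, and it uses find -depth with -exec as one possible way to do it, not the only one.

```shell
# Scratch tree shaped like readpst -r output (hypothetical folder names).
work=$(mktemp -d)
mkdir -p "$work/Archive 1998/Inbox/Bob's notes" "$work/Archive 1998/Sent Items"
printf 'From a@example.com Mon Aug 11 00:00:00 2014\n\nhello\n' \
  > "$work/Archive 1998/Inbox/mbox"
cd "$work" || exit 1

# 1. Rename every directory to NAME.sbd. -depth visits children before
#    their parent, so a parent is never renamed out from under its children.
find . -depth -mindepth 1 -type d ! -name '*.sbd' \
  -exec sh -c 'mv "$1" "$1.sbd"' _ {} \;

# 2. Make sure every folder has an mbox, even an empty one.
find . -type d -name '*.sbd' -exec sh -c 'touch "$1/mbox"' _ {} \;

# 3. Move each mbox up one level, named after its directory minus .sbd.
find . -type f -path '*.sbd/mbox' \
  -exec sh -c 'd=${1%/mbox}; mv "$1" "${d%.sbd}"' _ {} \;

# Show the resulting layout.
find . | sort
```

Because each path reaches sh as a positional parameter rather than as generated command text, names containing quotes or spaces need no extra escaping.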

First, let's make a working directory and step into it.

~/pst$ mkdir archives/tbird
~/pst$ cd archives/tbird
~/pst/archives/tbird$

I entered the following commands into a text editor and then simply pasted them at the command line. If you prefer, you can save them as a script, mark it as executable, and run it. For me, copy-paste worked just fine.

for i in 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013
do

   echo "Processing Archive $i"
   

   # I copy the structure here just in case I mess up and have
   # to repeat the process for some reason. This is much faster
   # than rerunning readpst

   cp -rp ../Archive\ $i .



   # Thunderbird's directories all have a .sbd extension.
   # We reverse-sort so that deeper folders are renamed before
   # their parents; renaming a parent first would invalidate the
   # paths of everything below it and the later renames would fail.
   # The sed step escapes single quotes, which xargs would
   # otherwise treat as quoting characters.

   find . -type d \
   | sort -r \
   | grep -v '\.sbd$' \
   | grep -v '^\.$' \
   | sed -e "s/'/\\\'/g" \
   | xargs -I{} mv '{}' '{}.sbd'

   # Now we need to make sure that there is an mbox in
   # every folder, even if it is empty. It is required
   # for Thunderbird to present the folder in the application.

   find . -name "*.sbd" -type d -print0 \
   | xargs -0I{} touch '{}/mbox'


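   # Now the mbox is both named wrong and is inside the directory
   # instead of at the same level as it. So we move the file up
   # a level and rename it to the name of the directory minus the
   # .sbd. Note: the generated mv commands are run by sh, so folder
   # names containing double quotes, dollar signs, or backticks
   # would need extra escaping.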

   find . -name mbox \
   | sort -r \
   | sed -e "s'\.sbd/mbox''" \
   | sed -e 's|\(.*\)/\(.*$\)|mv "\1/\2.sbd/mbox" "\1/\2"|' \
   | sh

   # Make sure Thunderbird is not running. The profile directory
   # below is determined by: grep Path ~/.thunderbird/profiles.ini
   # This will move the updated directory structure into
   # Thunderbird's directory structure.

   mv * ~/.thunderbird/63lnvun6.default/Mail/Local\ Folders/
 

done
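
The profile directory in the final mv (63lnvun6.default) is specific to my installation, so yours will differ. Here is a minimal sketch of pulling it out of profiles.ini instead of typing it by hand. It reads a stand-in file so that it runs anywhere, and it assumes a single profile with a relative Path, which is the common case but not guaranteed.

```shell
# Stand-in profiles.ini so the sketch runs anywhere; on a real system
# set ini to "$HOME/.thunderbird/profiles.ini" instead.
ini=$(mktemp)
cat > "$ini" <<'EOF'
[Profile0]
Name=default
IsRelative=1
Path=63lnvun6.default
Default=1
EOF

# Take the first Path= line and strip the key, leaving the directory name.
profile=$(sed -n 's/^Path=//p' "$ini" | head -n 1)

# Assumption: Path is relative to ~/.thunderbird (IsRelative=1).
target="$HOME/.thunderbird/$profile/Mail/Local Folders"
printf '%s\n' "$target"
```

With more than one profile, check which Path belongs to the profile you actually use before moving anything into it.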

Now start Thunderbird and you should have 16 new folders. Rejoice!
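
If you want a rough sanity check that nothing was lost, you can count the messages in an mbox file by counting its "From " separator lines. The sketch below runs against a small made-up sample; on real data you would point grep at one of the converted files instead. Treat the number as approximate, since a body line that starts with an unescaped "From " would be counted too.

```shell
# Made-up two-message mbox so the sketch is self-contained.
sample=$(mktemp)
cat > "$sample" <<'EOF'
From alice@example.com Mon Aug 11 09:00:00 2014
Subject: first

body one

From bob@example.com Mon Aug 11 10:00:00 2014
Subject: second

body two
EOF

# Each message in an mbox file starts with a line beginning "From ".
count=$(grep -c '^From ' "$sample")
printf '%s messages\n' "$count"
```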