Adventures in migrating from static website + Blogger SFTP to WordPress
For many years now my main website has consisted of a set of statically generated webpages providing the overall structure. A couple of areas, notably my main blog, were then dynamically generated using Blogger. The reason I started using Blogger was that it had the ability to publish posts directly to my webserver using SSH/SFTP, thus allowing the dynamic parts of the site to seamlessly integrate with the static parts. Then a couple of weeks ago, Blogger announced that they were discontinuing support for SFTP publishing on March 26th. Needless to say, this rather ruined my website publishing architecture. After thinking about things for a couple of weeks though, I decided this decision of Blogger’s was a blessing in disguise, because the way I managed my website was completely outdated & needed to be brought into the modern world.
What I in fact needed was a very simple content management system that allowed publishing a small number of ‘static’ pages on the site, but with the majority of the content being blog postings. Categorization, tagging & external links would be desirable too. Of course it had to be open source software too, capable of running on both my Debian Lenny webserver & Fedora laptop. As many people are no doubt aware, this is exactly what WordPress provides. As a proof of concept I downloaded the latest WordPress, tried out the install process on my laptop & generally got a feel for its admin capabilities. It all looked perfect, so the decision was made.
Exporting content from Blogger
Over the years that I’ve been using Blogger, I’ve written a few hundred postings, many of them worthless trash, but a fair number of them have really useful & frequently visited content. My recent series of articles on libvirt features have been particularly popular. It is absolutely non-negotiable that all existing links to these postings continue to work. So the first step of the migration was to figure out how to export the content from Blogger into WordPress. The first thing I tried was WordPress’ own built-in import tool that can allegedly talk directly to Blogger and pull down all the postings & comments. The first problem I found with this is that it only works if your content is hosted on Blogger itself, i.e. if you were using SFTP publishing it always reports ‘0 posts’. I temporarily updated my blog settings to turn off SFTP and it at least detected all the posts at that point. I started the import process & it imported 3 posts and 70 comments and then gave up with no indication of what was wrong. I tried again, and the same thing happened. Searching the WordPress forums, it seems many people have hit this problem over the past 2 years with no reliable solution yet available.
Then I investigated whether Blogger had its own export capabilities. It does. It can export all your blog posts and comments in a single XML file. Unfortunately there is no apparent standard XML schema for blog import/export, so there didn’t seem to be much use for this export capability & I didn’t fancy writing my own XSL transform to convert it to WordPress’ native XML import schema. The nice thing about Blogger and WordPress being so widely used on the web is that if you have a problem, then the chances are that someone else has had the same problem already. In fact so many people have had this problem that someone has already written a tool to solve it: Google Blog Converters.
I downloaded it, fed it the Blogger exports and it generated some nice looking WordPress XML files. A closer look revealed one tiny flaw – it had unescaped a whole bunch of HTML tags in blog posts where I had been including snippets of example XML or HTML inside <pre> tags. Fortunately the code is all Python and it was easy to find the bogus line of code “content = unescape(text)” and replace it with just “content = text”. After that the files imported into WordPress perfectly, preserving all formatting and comments.
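To illustrate why that one line mattered, here is a minimal sketch. It assumes the converter’s unescape behaves roughly like Python’s standard html.unescape (I have not checked that it is the same function), and the post body is a made-up sample rather than text from my real export.

```python
from html import unescape

# Hypothetical post body: the XML example inside <pre> is entity-escaped
# so that a browser displays the tags literally instead of parsing them.
text = "<pre>&lt;domain type='kvm'&gt;&lt;/domain&gt;</pre>"

# Analogous to the original line "content = unescape(text)":
# the entities turn back into real tags, so the example XML would be
# treated as markup and vanish from the rendered post.
unescaped = unescape(text)

# Analogous to the fix "content = text": the entities are left alone.
preserved = text

print(unescaped)
print(preserved)
```

The first printed line contains a real <domain> element, which is exactly the damage seen in the generated WordPress XML; the second keeps the escaped form intact.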
Setting up URL redirects
Even though WordPress has a nice friendly URL scheme for articles based on their title, it is very slightly different from the scheme Blogger used for URLs. I was also merging several separate Blogger feeds into one, since WordPress has a nice categorization capability. It was thus inevitable that the URLs for all existing posts would have to change. The solution to this problem was pretty straightforward. Apache’s mod_rewrite engine can be told to load external files containing arbitrary key/value mappings, and then reference these maps in rewrite rules. It was a simple, albeit slightly tedious, process to write a map file that contained the old Blogger URL as the key and the new WordPress URL as the value. As an example, a tiny part of the map I created looks like this:
/diary/2008/04/presentation-is-everything /posts/2008/04/21/presentation-is-everything
/diary/2008/06/red-hat-summit-2008 /posts/2008/06/18/red-hat-summit-2008
/diary/2009/12/using-qcow2-disk-encryption-with /posts/2009/12/02/using-qcow2-disk-encryption-with-libvirt-in-fedora-12
Making use of this map requires just two rules in the httpd.conf file, one to load the map and the other to add a match for it. Those rules look like this:
RewriteMap blogger txt:/etc/apache2/blogger-rewrite.txt
RewriteRule ^/personal(/diary/.*) ${blogger:$1} [L,R=permanent]
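Since a hand-written map is easy to get wrong, it can help to check entries offline before reloading Apache. The following is only a rough sketch of how the two rules resolve a request; it is not how mod_rewrite is implemented. The names parse_map and redirect_target are made up for illustration, the sample entries are copied from the map excerpt above, and it does not model what Apache does when a key is missing from the map.

```python
import re

# Two sample entries copied from the map excerpt; a real check would
# read the whole map file instead.
MAP_TEXT = """\
/diary/2008/04/presentation-is-everything /posts/2008/04/21/presentation-is-everything
/diary/2008/06/red-hat-summit-2008 /posts/2008/06/18/red-hat-summit-2008
"""

def parse_map(text):
    """Parse whitespace-separated 'key value' lines, skipping blanks and # comments."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, value = line.split(None, 1)
        mapping[key] = value.split()[0]
    return mapping

# Same pattern as the RewriteRule: the captured group plays the role of $1.
RULE = re.compile(r"^/personal(/diary/.*)")

def redirect_target(path, mapping):
    """Return the new URL for an old path, or None if the rule or map has no match."""
    match = RULE.match(path)
    if match is None:
        return None
    return mapping.get(match.group(1))

mapping = parse_map(MAP_TEXT)
print(redirect_target("/personal/diary/2008/06/red-hat-summit-2008", mapping))
```

Running every old URL through a check like this and flagging any that come back as None is a cheap way to spot typos in the map.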
In summary, while the migration process from Blogger to WordPress was not entirely smooth, it went a lot better than I expected it to. Any web user following an old link to a post on my site now gets a permanent redirect to the new location, so no important links were broken during the migration. The new site I have is so much more flexible than the old one & the WordPress UI is very much nicer to use. Blogger’s UI is rather dated & not really on a par with the standard of Google’s other popular apps like GMail.