Converting Blogger to Wordpress

Saturday 24 April 2010

This is close to 15 years old. Be careful.

Until last weekend, Susan’s blog had been done with Blogger. We made use of the FTP feature to push all the content to static HTML files on her server. But Blogger is discontinuing FTP support, so we had to do something.

I’m a huge believer in keeping old URLs working, so I didn’t want to switch to a blogspot.com blog, or even move to blog.susansenator.com. Besides, Blogger had been seeming pretty creaky for a while, so I took the opportunity to try something better, namely Wordpress.

Creating the Wordpress blog was pretty simple. Our hosting provider offers one-click installation, which worked great. Making a Wordpress theme can be a big undertaking, but not if you’re just trying to mimic an existing simple blog layout. I downloaded a simple theme and started hacking away on it. The Wordpress docs are pretty good, definitely better than Blogger’s; that’s a recurring theme here.

Migrating all the content over was a bigger deal. Blogger offers a backup facility that gives you your entire blog as a giant XML file. Converting that to a Wordpress format was simple with blog converters, which include blogger2wordpress. It turned my 16 MB Blogger XML file into a 12 MB Wordpress XML file.

Wordpress can then import the XML file, but only up to a maximum size of 2 MB (why?). So I manually split the big XML file into 8 smaller XML files, which was tedious but not difficult. Importing each of them brought in all the old blog posts and comments. Nice. (For some reason, embedded YouTube videos are now just a URL in text, not sure why. If I had noticed that earlier I might have been able to do something about it.)
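The manual split could probably also be scripted. Here is a minimal sketch, not what I actually did: it assumes the file being split is the Wordpress-format XML, with posts stored as item elements inside a single channel element, and the function name split_items is made up for illustration. A real export declares XML namespaces, so you would likely need ET.register_namespace calls to keep the original prefixes; that part is left out here.

```python
import copy
import xml.etree.ElementTree as ET


def split_items(xml_text, max_bytes):
    """Split an RSS-style document into pieces of roughly max_bytes each.

    Every piece keeps the channel header and gets a distinct run of
    <item> elements. Assumes a <channel> directly under the root.
    """
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    items = channel.findall("item")
    for item in items:
        channel.remove(item)

    # root is now an empty skeleton: header elements but no posts.
    base_size = len(ET.tostring(root))

    # Group items so that skeleton + items stays under the limit.
    # A single item larger than the limit still gets its own piece.
    chunks = []
    current, size = [], base_size
    for item in items:
        item_size = len(ET.tostring(item))
        if current and size + item_size > max_bytes:
            chunks.append(current)
            current, size = [], base_size
        current.append(item)
        size += item_size
    if current:
        chunks.append(current)

    # Build one document per group from a copy of the skeleton.
    pieces = []
    for chunk in chunks:
        piece = copy.deepcopy(root)
        piece.find("channel").extend(chunk)
        pieces.append(ET.tostring(piece, encoding="unicode"))
    return pieces
```

Each returned string would then be written to its own file and imported separately. The size estimate is approximate, so it is worth leaving some headroom below the 2 MB limit.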

Now we have a Wordpress blog that works just like the Blogger blog did, except that everything has a different permalink than it did before. The first step to fix that is to change the permalink style Wordpress uses. It defaults to something horrendous like:

http://susansenator.com/blog/?p=123

Select “Month and name” under Permalink settings in the Wordpress installation. This makes Wordpress use nice URLs like:

http://susansenator.com/blog/2010/04/here-be-dragons/

Changing this setting will either add or require you to add a chunk of mod_rewrite rules to your Apache .htaccess file:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /blog/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /blog/index.php [L]
</IfModule>

But lots of other things are subtly different. Archive pages are named differently, Blogger had an index.html page for the blog, and so on. I manually added these rewrites to fix these issues:

# Blogger slugs have .html, wordpress does not.
RewriteRule ^blog/([0-9]{4})/([0-9]{2})/(.*)\.html?$ /blog/$1/$2/$3/ [R=301,L]

# Blogger archives are different.
RewriteRule ^blog/([0-9]{4})_([0-9]{2})_01_archive\.html /blog/$1/$2/ [R=301,L]

# Blogger feeds are now found at the wordpress feed
RewriteRule ^blog/atom.xml /blog/feed/ [R=301,L]
RewriteRule ^blog/rss.xml /blog/feed/ [R=301,L]

# Blogger had the old-style index.html.
RewriteRule ^blog/index.html /blog/ [R=301,L]
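A quick way to sanity-check rules like these before touching the server is to mimic them in Python. This is only a rough sketch: the patterns are transcribed from the rules above with $1 rewritten as \1, the redirect function is a made-up helper, and Python's re module is not identical to Apache's regex engine, so treat it as a spot check rather than proof.

```python
import re

# (pattern, substitution) pairs transcribed from the RewriteRules above.
RULES = [
    (r"^blog/([0-9]{4})/([0-9]{2})/(.*)\.html?$", r"/blog/\1/\2/\3/"),
    (r"^blog/([0-9]{4})_([0-9]{2})_01_archive\.html", r"/blog/\1/\2/"),
    (r"^blog/atom.xml", "/blog/feed/"),
    (r"^blog/rss.xml", "/blog/feed/"),
    (r"^blog/index.html", "/blog/"),
]


def redirect(path):
    """Return the redirect target for an old path, or None if no rule matches.

    Like mod_rewrite with the [L] flag, the first matching rule wins and
    the substitution replaces the whole path.
    """
    for pattern, target in RULES:
        match = re.search(pattern, path)
        if match:
            return match.expand(target)
    return None


if __name__ == "__main__":
    for old in ["blog/2010/04/here-be-dragons.html",
                "blog/2005_10_01_archive.html",
                "blog/atom.xml"]:
        print(old, "->", redirect(old))
```

Paths are written without a leading slash because that is how the patterns above are written.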

The thorniest problem, though, is that Blogger and Wordpress don’t agree on how to turn a post title into a slug. Both lowercase the text and change spaces to dashes, but Wordpress includes every word, while Blogger leaves out “a” and “the”, and maybe others.

The simplest way to solve the differing slug problem was to examine the wordpress.xml file. It had the title of the posts, and the Blogger slug, in the form of the post’s permalink. I could determine which posts would have a new slug under Wordpress, and create a redirect for them.

A quick Python program did the work:

import re
import sys

from lxml import etree

def items(f):
    """Yield (title, link) for every <item> in the Wordpress XML file f."""
    doc = etree.parse(f)
    for item in doc.xpath('.//item'):
        title = item.xpath('title/text()')
        link = item.xpath('link/text()')
        if title and link:
            yield (title[0], link[0])

# Regexes for turning a title into a Wordpress slug
slugify = [
    # Drop everything but nice word characters
    (r"[^-a-z0-9 ]", ""),
    # All spaces become dashes
    (r" ", "-"),
    # Multiple dashes become one
    (r"-+", "-"),
    ]

def do_file(f):
    for title, link in items(f):
        # Only posts on the blog itself need redirects.
        if "susansenator.com" not in link:
            continue
        # The Blogger slug is the last path component, minus ".html".
        slug = link.split('/')[-1].split('.')[0]
        wpslug = title.lower()
        for pat, rep in slugify:
            wpslug = re.sub(pat, rep, wpslug)
        if wpslug != slug:
            old_path = link.replace("http://susansenator.com/", "")
            new_path = old_path.rsplit('/', 1)[0] + "/" + wpslug

            # Escape the dots so they match literally in the rewrite pattern.
            print("RewriteRule ^%s /%s [R=301,L]" % (
                old_path.replace(".", r"\."),
                new_path
            ))

do_file(sys.argv[1])

This just looks at every post, extracts the Blogger slug from the post’s link, and computes the Wordpress slug. Where the two slugs differ, a rewrite rule is written. On Susan’s blog, this produced 446 rewrite rules, which went into .htaccess:

### These are posts that slugify differently under blogger and wordpress, to keep old permalinks working:
RewriteRule ^blog/2010/04/cheerful-feelings-upon-awakening-in\.html /blog/2010/04/cheerful-feelings-upon-awakening-in-the-country [R=301,L]
RewriteRule ^blog/2010/03/here-is-my-passover-album-on-facebook-i\.html /blog/2010/03/passover-pics [R=301,L]
RewriteRule ^blog/2010/03/reality-of-autism-rifts-and-what-obama\.html /blog/2010/03/the-reality-of-the-autism-rifts-and-what-obama-should-do [R=301,L]
# ... 440 skipped ...
RewriteRule ^blog/2005/10/autism-and-school-board\.html /blog/2005/10/autism-and-the-school-board [R=301,L]
RewriteRule ^blog/2005/10/speed-of-dark\.html /blog/2005/10/the-speed-of-dark [R=301,L]
RewriteRule ^blog/2005/10/adolescence-without-roadmap\.html /blog/2005/10/adolescence-without-a-roadmap [R=301,L]
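To see how one of these rules comes about, the slug step can be run on a single post. This is a small sketch using the same three regexes as the program above; the helper names wp_slug and rule_for are made up for illustration, and the title "The Speed of Dark" is my guess at the original title, inferred from the slug. The regexes only approximate what Wordpress really does when it builds a slug.

```python
import re

# Same approximation of Wordpress slugs as in the program above.
SLUGIFY = [
    (r"[^-a-z0-9 ]", ""),  # drop everything but nice word characters
    (r" ", "-"),           # all spaces become dashes
    (r"-+", "-"),          # multiple dashes become one
]


def wp_slug(title):
    """Approximate the slug Wordpress would build from a post title."""
    slug = title.lower()
    for pat, rep in SLUGIFY:
        slug = re.sub(pat, rep, slug)
    return slug


def rule_for(title, link):
    """Return a RewriteRule line if the two slugs differ, else None."""
    blogger_slug = link.split("/")[-1].split(".")[0]
    new_slug = wp_slug(title)
    if new_slug == blogger_slug:
        return None
    old_path = link.replace("http://susansenator.com/", "")
    new_path = old_path.rsplit("/", 1)[0] + "/" + new_slug
    return "RewriteRule ^%s /%s [R=301,L]" % (
        old_path.replace(".", r"\."), new_path)


if __name__ == "__main__":
    print(rule_for("The Speed of Dark",
                   "http://susansenator.com/blog/2005/10/speed-of-dark.html"))
```

Blogger dropped the leading "the" and Wordpress keeps it, so the slugs differ and a rule is produced.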

With the new super-sized .htaccess in place, the new blog is ready to go. All existing links work well, and no one misses a beat.

Comments

[gravatar]
Hi Ned, just wanted to warn you -- be sure to upgrade WordPress regularly; it has had a lot of security problems in the past.
[gravatar]
Thanks Dirkjan, my son Max has alerted me to the Wordpress Automatic Upgrade plugin (http://wordpress.org/extend/plugins/wordpress-automatic-upgrade/), and the hosting provider was already installing the latest version.
[gravatar]
Hugo Wetterberg 3:32 PM on 24 Apr 2010
About the YouTube url:s. Blogger was probably using OEmbed and as far as I know there should be ready to go solutions available for wordpress that gives you OEmbed support.
[gravatar]
Ned, I've been putting this off as I was not looking forward to trying to get my blogger blog onto wordpress. Thank you so much for all these useful details, I really appreciate them.
[gravatar]
Great post, Ned! I ran into troubles though using blogger2wordpress which your readers might also like to comment on.

I'm not sure what's happening, but it seems that every time I upload my blogger export file to http://blogger2wordpress.appspot.com/, it comes back with another blog's posts!?!?! I'm not sure what's happening, but it's either SPAM or it is mixing posts between website visitors.

For example, I uploaded content from our Green Education blog, and in the WXR output file, I also have the posts from http://blog.timoth.net/.

Odd.
[gravatar]
@Zac, I didn't use the online version of the converter. I downloaded the Python code and ran it on my machine. Sounds like they are having some difficulty keeping their users straight...
[gravatar]
I am having the same issue with the blogger2wordpress site: it is combining my posts with some other blogs, and loses half the posts... This is really hard to work around. Any help would be appreciated.
[gravatar]
I had been wanting to make the move from Blogger for quite some time now, so the FTP sunsetting just kinda forced us to get on with it. So, my wife & I just migrated our FTP-published Classic Blogger blog over to a self-hosted WordPress blog. She got all of the Blogger posts imported and they seem to be working just fine.

I've been really scratching my head over WordPress showing a path to new posts, yet these are just virtual pages within the WP database as opposed to the literal .HTML files we're used to seeing when we created new Blogger posts. Does this mean we'd be safe to delete the old Blogger < year \ month > structure now?

The biggest outstanding issue I've yet to sort out is that none of our old Blogger posts show up in the Archives dropdown widget. I'm a little antsy about rinking around with my .htaccess file any more than absolutely necessary but do I understand correctly that if I add the following lines:

# Blogger archives are different.
RewriteRule ^blog/([0-9]{4})_([0-9]{2})_01_archive\.html /blog/$1/$2/ [R=301,L]

...that this will address the Archives issue we're having?

Also, I'm still figuring out how to tweak the themes code portions to make a few changes here & there, but I can't get the pages styling (like the Hx font formatting, for example) to quite match the posts.
[gravatar]
@Rob, I don't know why your Archives are not appearing. That rewrite rule won't help: your archive picker is already using URLs like "/blog/2010/05". Wordpress tries to write rewrite rules like:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

that I think should be handling your archives. Maybe those aren't properly installed?
[gravatar]
You said "So I manually split the big XML file into 8 smaller XML files, which was tedious but not difficult. Importing each of them brought in all the old blog posts and comments. "
I'm struggling with splitting a large Blogger Export file --- can you share the logic you used? Do counts have to be adjusted as you make the splits. I'd appreciate any help!
[gravatar]
@SBA: I don't have the exported files any more, so the details here might be a little off, but there was nothing tricky about splitting the export file. Copy the export file so that you have a clean copy of the original. Then find the elements in the file that correspond to your blog posts, and remove enough of them to get the file down to a usable size. Do it again with another copy, removing different elements, so that you have as many pieces as you need, each below the size limit, and each with distinct sets of posts.
[gravatar]
Yes, I understand that part, but can't determine where a post really starts and ends! there are tags like
[gravatar]
oops, the comment editor left out sample tags "kind#post" signals a post but there are tags before that line...
[gravatar]
Each blog post is an ... element in the XML. Those are the ones you want to remove to get the file size down.
[gravatar]
Thanks, I'll try that tomorrow!
[gravatar]
Ned, you are the man. This will save me tons of time.
[gravatar]
Instead of making the huge htaccess file, you can modify your post slugs in your WP database (to match the Blogger ones). You can also opt to end your post URLs with .html. Then there won't be any need for redirections.
[gravatar]
I'm not sure to understand fully: I'm trying to convert my Blogger template to a Wordpress theme. Does this trick work for it?

Thank you
[gravatar]
Hello!

I know I'm a little late in on the party, but i was wondering if it'd be possible to create a website using WordPress, then use Blogger to post blogposts within the WordPress website?

I'm a non-techie person you see, and when my friend helped set up our current website, he also hacked the blog posting function. This means that our 'Blog' is now just a WordPress page, and it seems like we can't do simple posts anymore. That, plus we're more comfortable with using Blogger to blog :S

Would it be at all possible to dynamically extract Blogger's content into Wordpress, and do a blog post on a Wordpress site using Blogger?
[gravatar]
Hi,
thanks for sharing this. Since this is already an older entry, I am curious whether this method still works now with the new Wordpress 3.2.1?
[gravatar]
Nice information, but when I convert Blogspot to Wordpress, will I lose my page rank and the template style?
[gravatar]
i have problem with www.mobpk.com
[gravatar]
Hi everyone!
i recommend to check up this tut http://www.cms2cms.com/blog/import-blogger-into-wordpress-unbeaten-solution/
This way of conversion doesn't require any tech skills or module/plugin installation. It's enough to install wordpress and then the content will be automatedly converted from blogger to wp.
[gravatar]
Thanks, i want convert my blog
