Converting Blogger to Wordpress

Saturday 24 April 2010

Until last weekend, Susan's blog had been done with Blogger. We made use of the FTP feature to push all the content to static HTML files on her server. But Blogger is discontinuing FTP support, so we had to do something.

I'm a huge believer in keeping old URLs working, so I didn't want to switch to a blogspot.com blog, or even move to blog.susansenator.com. Besides, Blogger had been seeming pretty creaky for a while, so I took the opportunity to try something better, namely Wordpress.

Creating the Wordpress blog was pretty simple. Our hosting provider offers one-click installation, which worked great. Making a Wordpress theme can be a big undertaking, but not if you're just trying to mimic an existing simple blog layout. I downloaded a simple theme and started hacking away on it. The Wordpress docs are pretty good, and definitely better than Blogger's. That's a recurring theme here.

Migrating all the content over was a bigger deal. Blogger offers a backup facility that gives you your entire blog as a giant XML file. Converting that to Wordpress format was simple with blog converters, which include blogger2wordpress: it turned my 16MB Blogger XML file into a 12MB Wordpress XML file.

Wordpress can then import the XML file, but only up to a maximum size of 2MB (why?). So I manually split the big XML file into 8 smaller XML files, which was tedious but not difficult. Importing each of them brought in all the old blog posts and comments. Nice. (For some reason, embedded YouTube videos are now just a URL in text; I'm not sure why. If I had noticed that earlier, I might have been able to do something about it.)
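The split I did by hand could probably be scripted. Here is a minimal sketch, not what I actually ran: it assumes the posts are <item> elements under <channel>, and the name split_by_size and the 2,000,000-byte limit are made up for illustration. One caveat: the standard library's ElementTree may rename namespace prefixes on output unless they are registered with ET.register_namespace, so check a piece by eye before importing it.

```python
import copy
import xml.etree.ElementTree as ET


def split_by_size(root, container_path="channel", item_tag="item",
                  max_bytes=2_000_000):
    """Yield copies of `root`, each holding a distinct run of items.

    Each copy keeps everything that is not an item (title, settings, ...)
    and gets as many consecutive items as fit under `max_bytes`.
    """
    items = root.find(container_path).findall(item_tag)

    # Build an empty "shell": the whole document minus its items.
    shell = copy.deepcopy(root)
    shell_container = shell.find(container_path)
    for item in shell_container.findall(item_tag):
        shell_container.remove(item)
    overhead = len(ET.tostring(shell))

    def build(batch):
        piece = copy.deepcopy(shell)
        piece.find(container_path).extend([copy.deepcopy(i) for i in batch])
        return piece

    batch, size = [], overhead
    for item in items:
        item_size = len(ET.tostring(item))
        # Start a new piece when this item would push us over the limit.
        if batch and size + item_size > max_bytes:
            yield build(batch)
            batch, size = [], overhead
        batch.append(item)
        size += item_size
    if batch:
        yield build(batch)


# Hypothetical usage; "wordpress.xml" stands in for the converter's output.
# tree = ET.parse("wordpress.xml")
# for n, piece in enumerate(split_by_size(tree.getroot()), 1):
#     ET.ElementTree(piece).write("wordpress-%d.xml" % n,
#                                 encoding="utf-8", xml_declaration=True)
```

If the posts sit directly under the root element, passing container_path="." and the right item_tag should work the same way. A single item bigger than the limit still gets a piece of its own, so the limit is a target rather than a guarantee.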

Now we have a Wordpress blog that works just like the Blogger blog did, except that everything has a different permalink than it did before. The first step to fix that is to change the permalink style Wordpress uses. It defaults to something horrendous like:

http://susansenator.com/blog/?p=123

Select "Month and name" under Permalink settings in the Wordpress installation. This makes Wordpress use nice URLs like:

http://susansenator.com/blog/2010/04/here-be-dragons/

Changing this setting will either add or require you to add a chunk of mod_rewrite rules to your Apache .htaccess file:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /blog/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /blog/index.php [L]
</IfModule>

But lots of other things are subtly different. Archive pages are named differently, Blogger had an index.html page for the blog, and so on. I manually added these rewrites to fix these issues:

# Blogger slugs have .html, wordpress does not.
RewriteRule ^blog/([0-9]{4})/([0-9]{2})/(.*)\.html?$ /blog/$1/$2/$3/ [R=301,L]

# Blogger archives are different.
RewriteRule ^blog/([0-9]{4})_([0-9]{2})_01_archive\.html /blog/$1/$2/ [R=301,L]

# Blogger feeds are now found at the wordpress feed
RewriteRule ^blog/atom.xml /blog/feed/ [R=301,L]
RewriteRule ^blog/rss.xml /blog/feed/ [R=301,L]

# Blogger had the old-style index.html.
RewriteRule ^blog/index.html /blog/ [R=301,L]
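One way to sanity-check patterns like these before touching .htaccess is to replay them in Python. This is only a sketch under a few assumptions: the RULES list and the redirect function are names invented for the example, the patterns are copied from the rules above with $1 rewritten as \1, and paths are given without the leading slash, which is how mod_rewrite presents them in .htaccess context. Python's re module is close enough to Apache's regex engine for patterns this simple, but it is not a substitute for testing on the real server.

```python
import re

# (pattern, substitution) pairs mirroring the hand-written rules.
RULES = [
    (r"^blog/([0-9]{4})/([0-9]{2})/(.*)\.html?$", r"/blog/\1/\2/\3/"),
    (r"^blog/([0-9]{4})_([0-9]{2})_01_archive\.html", r"/blog/\1/\2/"),
    (r"^blog/atom.xml", r"/blog/feed/"),
    (r"^blog/rss.xml", r"/blog/feed/"),
    (r"^blog/index.html", r"/blog/"),
]


def redirect(path):
    """Return the redirect target for `path`, or None if no rule matches."""
    for pattern, target in RULES:
        match = re.search(pattern, path)
        if match:
            # Like RewriteRule, the substitution replaces the whole path.
            return match.expand(target)
    return None


print(redirect("blog/2010/04/here-be-dragons.html"))
print(redirect("blog/2005_10_01_archive.html"))
print(redirect("blog/2010/04/here-be-dragons/"))
```

The first two calls should print the new post and archive URLs, and the last should print None, since a path that is already in Wordpress style must not be redirected again.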

The thorniest problem, though, is that Blogger and Wordpress don't agree on how to turn a post title into a slug. Both lowercase the text and change spaces to dashes, but Wordpress includes every word, while Blogger leaves out "a" and "the", and maybe others.

The simplest way to solve the differing slug problem was to examine the wordpress.xml file. It had the title of the posts, and the Blogger slug, in the form of the post's permalink. I could determine which posts would have a new slug under Wordpress, and create a redirect for them.

A quick Python program did the work:

import re
import sys

from lxml import etree

def items(f):
    """Yield (title, link) for every <item> that has both."""
    doc = etree.parse(f)
    for item in doc.xpath('.//item'):
        title = item.xpath('title/text()')
        link = item.xpath('link/text()')
        if title and link:
            yield (title[0], link[0])

# Regexes for turning a title into a Wordpress slug
slugify = [
    # Drop everything but nice word characters
    (r"[^-a-z0-9 ]", ""),
    # All spaces become dashes
    (r" ", "-"),
    # Multiple dashes become one
    (r"-+", "-"),
    ]

def do_file(f):
    for title, link in items(f):
        # Only posts on this site have permalinks worth redirecting.
        if "susansenator.com" not in link:
            continue
        # The Blogger slug is the last path component, minus ".html".
        slug = link.split('/')[-1].split('.')[0]
        wpslug = title.lower()
        for pat, rep in slugify:
            wpslug = re.sub(pat, rep, wpslug)
        if wpslug != slug:
            old_path = link.replace("http://susansenator.com/", "")
            new_path = old_path.rsplit('/', 1)[0] + "/" + wpslug

            # Escape the dots so they match literally in the rule's regex.
            print("RewriteRule ^%s /%s [R=301,L]" % (
                old_path.replace(".", r"\."),
                new_path,
            ))

do_file(sys.argv[1])

This just looks at every post, extracts the Blogger slug from the post's link, and computes the Wordpress slug. Where the two slugs differ, a rewrite rule is written. On Susan's blog, this produced 446 rewrite rules, which went into .htaccess:

### These are posts that slugify differently under blogger and wordpress, to keep old permalinks working:
RewriteRule ^blog/2010/04/cheerful-feelings-upon-awakening-in\.html /blog/2010/04/cheerful-feelings-upon-awakening-in-the-country [R=301,L]
RewriteRule ^blog/2010/03/here-is-my-passover-album-on-facebook-i\.html /blog/2010/03/passover-pics [R=301,L]
RewriteRule ^blog/2010/03/reality-of-autism-rifts-and-what-obama\.html /blog/2010/03/the-reality-of-the-autism-rifts-and-what-obama-should-do [R=301,L]
# ... 440 skipped ...
RewriteRule ^blog/2005/10/autism-and-school-board\.html /blog/2005/10/autism-and-the-school-board [R=301,L]
RewriteRule ^blog/2005/10/speed-of-dark\.html /blog/2005/10/the-speed-of-dark [R=301,L]
RewriteRule ^blog/2005/10/adolescence-without-roadmap\.html /blog/2005/10/adolescence-without-a-roadmap [R=301,L]
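To see how one of these rules comes about, here is the same logic applied to a single post, as a small self-contained sketch. The function names wordpress_slug and rule_for are invented for the example, and the title "The Speed of Dark" is my guess from the new slug rather than something read out of the export file.

```python
import re

# The same three substitutions the script uses to mimic Wordpress slugs.
SLUGIFY = [
    (r"[^-a-z0-9 ]", ""),   # drop everything but nice word characters
    (r" ", "-"),            # spaces become dashes
    (r"-+", "-"),           # runs of dashes collapse to one
]


def wordpress_slug(title):
    """Approximate the slug Wordpress would derive from a post title."""
    slug = title.lower()
    for pat, rep in SLUGIFY:
        slug = re.sub(pat, rep, slug)
    return slug


def rule_for(title, link, site="http://susansenator.com/"):
    """Return a RewriteRule line, or None if both slugs already agree."""
    blogger_slug = link.split("/")[-1].split(".")[0]
    wp_slug = wordpress_slug(title)
    if wp_slug == blogger_slug:
        return None
    old_path = link.replace(site, "")
    new_path = old_path.rsplit("/", 1)[0] + "/" + wp_slug
    return "RewriteRule ^%s /%s [R=301,L]" % (
        old_path.replace(".", r"\."), new_path)


print(rule_for("The Speed of Dark",
               "http://susansenator.com/blog/2005/10/speed-of-dark.html"))
```

Blogger dropped "the" from the slug and Wordpress keeps it, so the two disagree and a rule is emitted. A post whose slugs already match needs nothing beyond the generic .html rule.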

With the new super-sized .htaccess in place, the new blog is ready to go. All existing links work well, and no one misses a beat.

Comments

[gravatar]
Dirkjan Ochtman 10:34 AM on 24 Apr 2010

Hi Ned, just wanted to warn you -- be sure to upgrade WordPress regularly; it has had a lot of security problems in the past.

[gravatar]
Ned Batchelder 10:49 AM on 24 Apr 2010

Thanks Dirkjan, my son Max has alerted me to the Wordpress Automatic Upgrade plugin (http://wordpress.org/extend/plugins/wordpress-automatic-upgrade/), and the hosting provider was already installing the latest version.

[gravatar]
Hugo Wetterberg 3:32 PM on 24 Apr 2010

About the YouTube URLs: Blogger was probably using OEmbed, and as far as I know there should be ready-to-go solutions available for Wordpress that give you OEmbed support.

[gravatar]
Bernard Farrell 7:32 AM on 26 Apr 2010

Ned, I've been putting this off as I was not looking forward to trying to get my blogger blog onto wordpress. Thank you so much for all these useful details, I really appreciate them.

[gravatar]
Zac 11:54 AM on 26 Apr 2010

Great post, Ned! I ran into troubles though using blogger2wordpress which your readers might also like to comment on.

I'm not sure what's happening, but it seems that every time I upload my Blogger export file to http://blogger2wordpress.appspot.com/, it comes back with another blog's posts!?!?! It's either spam or it is mixing posts between website visitors.

For example, I uploaded content from our Green Education blog, and in the WXR output file, I also have the posts from http://blog.timoth.net/.

Odd.

[gravatar]
Ned Batchelder 12:15 PM on 26 Apr 2010

@Zac, I didn't use the online version of the converter. I downloaded the Python code and ran it on my machine. Sounds like they are having some difficulty keeping their users straight...

[gravatar]
mellows 3:44 PM on 27 Apr 2010

I am having the same issue with the blogger2wordpress site: it is combining my posts with some other blog's, and loses half the posts... This is really hard to work around. Any help would be appreciated.

[gravatar]
Rob O. 6:55 AM on 9 May 2010

I had been wanting to make the move from Blogger for quite some time now, so the FTP sunsetting just kinda forced us to get on with it. So, my wife & I just migrated our FTP-published Classic Blogger blog over to a self-hosted WordPress blog. She got all of the Blogger posts imported and they seem to be working just fine.

I've been really scratching my head over WordPress showing a path to new posts, yet these are just virtual pages within the WP database as opposed to the literal .HTML files we're used to seeing when we created new Blogger posts. Does this mean we'd be safe to delete the old Blogger < year \ month > structure now?

The biggest outstanding issue I've yet to sort out is that none of our old Blogger posts show up in the Archives dropdown widget. I'm a little antsy about rinking around with my .htaccess file any more than absolutely necessary but do I understand correctly that if I add the following lines:

# Blogger archives are different.
RewriteRule ^blog/([0-9]{4})_([0-9]{2})_01_archive\.html /blog/$1/$2/ [R=301,L]

...that this will address the Archives issue we're having?

Also, I'm still figuring out how to tweak the themes code portions to make a few changes here & there, but I can't get the pages styling (like the Hx font formatting, for example) to quite match the posts.

[gravatar]
Ned Batchelder 9:52 AM on 9 May 2010

@Rob, I don't know why your Archives are not appearing. That rewrite rule won't help: your archive picker is already using URLs like "/blog/2010/05". Wordpress tries to write rewrite rules like:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

that I think should be handling your archives. Maybe those aren't properly installed?

[gravatar]
SBA 6:46 PM on 12 Jun 2010

You said "So I manually split the big XML file into 8 smaller XML files, which was tedious but not difficult. Importing each of them brought in all the old blog posts and comments. "
I'm struggling with splitting a large Blogger Export file --- can you share the logic you used? Do counts have to be adjusted as you make the splits. I'd appreciate any help!

[gravatar]
Ned Batchelder 7:03 PM on 12 Jun 2010

@SBA: I don't have the exported files any more, so the details here might be a little off, but there was nothing tricky about splitting the export file. Copy the export file so that you have a clean copy of the original. Then find the elements in the file that correspond to your blog posts, and remove enough of them to get the file down to a usable size. Do it again with another copy, removing different elements, so that you have as many pieces as you need, each below the size limit, and each with distinct sets of posts.

[gravatar]
SBA 7:14 PM on 12 Jun 2010

Yes, I understand that part, but can't determine where a post really starts and ends! there are tags like

[gravatar]
SBA 7:17 PM on 12 Jun 2010

oops, the comment editor left out sample tags "kind#post" signals a post but there are tags before that line...

[gravatar]
Ned Batchelder 7:49 PM on 12 Jun 2010

Each blog post is an <entry>...</entry> element in the XML. Those are the ones you want to remove to get the file size down.

[gravatar]
SBA 7:56 PM on 12 Jun 2010

Thanks, I'll try that tomorrow!

[gravatar]

Ned, you are the man. This will save me tons of time.

[gravatar]
Aneesh 12:30 PM on 12 Jul 2010

Instead of making the huge .htaccess file, you can modify your post slugs in your WP database to match the Blogger ones. You can also opt to end your post URLs with .html. Then there won't be any need for redirections.

[gravatar]
Claudia 2:27 PM on 28 Jul 2010

I'm not sure I fully understand: I'm trying to convert my Blogger template to a Wordpress theme. Does this trick work for that?

Thank you

[gravatar]
Chris 10:21 PM on 25 Oct 2010

Hello!

I know I'm a little late to the party, but I was wondering if it'd be possible to create a website using WordPress, then use Blogger to post blog posts within the WordPress website?

I'm a non-techie person you see, and when my friend helped set up our current website, he also hacked the blog posting function. This means that our 'Blog' is now just a WordPress page, and it seems like we can't do simple posts anymore. That, plus we're more comfortable with using Blogger to blog :S

Would it be at all possible to dynamically extract Blogger's content into Wordpress, and do a blog post on a Wordpress site using Blogger?

[gravatar]
John @ Koh Samui Travel 11:40 PM on 18 Oct 2011

Hi,
thanks for sharing this. Since this is already an older entry, I am curious whether this method still works now with the new Wordpress 3.2.1?

[gravatar]
phphunger 4:39 AM on 8 Aug 2012

Nice information, but when I convert Blogspot to Wordpress, will I lose my page rank and the template style?

[gravatar]
malik 8:44 PM on 29 Dec 2012

i have problem with www.mobpk.com

[gravatar]
kariplr 9:26 AM on 31 May 2013

Hi everyone!
I recommend checking out this tutorial: http://www.cms2cms.com/blog/import-blogger-into-wordpress-unbeaten-solution/
This way of converting doesn't require any tech skills or module/plugin installation. It's enough to install Wordpress, and then the content will be automatically converted from Blogger to WP.
