Saturday 24 April 2010 — This is close to 15 years old. Be careful.
Until last weekend, Susan’s blog had been done with Blogger. We made use of the FTP feature to push all the content to static HTML files on her server. But Blogger is discontinuing FTP support, so we had to do something.
I’m a huge believer in keeping old URLs working, so I didn’t want to switch to a blogspot.com blog, or even move to blog.susansenator.com. Besides, Blogger had been seeming pretty creaky for a while, so I took the opportunity to try something better, namely Wordpress.
Creating the Wordpress blog was pretty simple. Our hosting provider offers one-click installation, which worked great. Making a Wordpress theme can be a big undertaking, but not if you’re just trying to mimic an existing simple blog layout. I downloaded a simple theme and started hacking away on it. The Wordpress docs are pretty good, and definitely better than Blogger’s. That’s a recurring theme here.
Migrating all the content over was a bigger deal. Blogger offers a backup facility that gives you your entire blog as a giant XML file. Converting that to a Wordpress format was simple with blog converters, which include blogger2wordpress. It turned my 16 MB Blogger XML file into a 12 MB Wordpress XML file.
Wordpress can then import the XML file, but only up to a maximum size of 2 MB (why?). So I manually split the big XML file into 8 smaller XML files, which was tedious but not difficult. Importing each of them brought in all the old blog posts and comments. Nice. (For some reason, embedded YouTube videos are now just a URL in text, not sure why. If I had noticed that earlier I might have been able to do something about it.)
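The split described above was done by hand, but it could probably be scripted. Here is a minimal sketch, not the method actually used for this migration. It assumes every post (with its comments) sits in its own <item>...</item> element, and that the literal text "<item>" never appears inside post content. The function name split_wxr and the max_bytes parameter are made up for illustration.

```python
import re

def split_wxr(text, max_bytes):
    """Split export text into complete documents of roughly max_bytes each.

    Everything before the first <item> is repeated as a header in every
    piece, and everything after the last </item> is repeated as a footer.
    """
    first = text.index("<item>")
    last = text.rindex("</item>") + len("</item>")
    header, footer = text[:first], text[last:]
    # Each match is one whole post, plus any whitespace that follows it.
    items = re.findall(r"<item>.*?</item>\s*", text[first:last], flags=re.DOTALL)

    overhead = len((header + footer).encode("utf-8"))
    chunks, current, size = [], [], overhead
    for item in items:
        item_size = len(item.encode("utf-8"))
        # Start a new piece when adding this post would exceed the limit.
        if current and size + item_size > max_bytes:
            chunks.append(header + "".join(current) + footer)
            current, size = [], overhead
        current.append(item)
        size += item_size
    if current:
        chunks.append(header + "".join(current) + footer)
    return chunks
```

Calling split_wxr on the full export text with a limit a little under 2 MB, then writing each returned string to its own file, should give pieces that can be imported one at a time. A single post larger than the limit still ends up alone in an oversized piece, so the sizes are worth checking afterwards.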
Now we have a Wordpress blog that works just like the Blogger blog did, except that everything has a different permalink than it did before. The first step to fix that is to change the permalink style Wordpress uses. It defaults to something horrendous like:
http://susansenator.com/blog/?p=123
Select “Month and name” under Permalink settings in the Wordpress installation. This makes Wordpress use nice URLs like:
http://susansenator.com/blog/2010/04/here-be-dragons/
Changing this setting will either add or require you to add a chunk of mod_rewrite rules to your Apache .htaccess file:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /blog/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /blog/index.php [L]
</IfModule>
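To make the logic of that block concrete, here is a rough Python model of the decision it makes: any request under /blog/ that doesn't name a real file or directory gets handed to index.php. This is only an illustration of the two RewriteCond tests, not how Apache is implemented. The function name routes_to_wordpress and the docroot argument are hypothetical, and the sketch assumes the rules live in the blog directory's own .htaccess.

```python
import os

def routes_to_wordpress(docroot, url_path):
    """Return True if the rules would send url_path to /blog/index.php."""
    prefix = "/blog/"
    if not url_path.startswith(prefix):
        return False  # assumed: the rules only see requests under /blog/
    if not url_path[len(prefix):]:
        return False  # "RewriteRule ." needs at least one character to match
    filename = os.path.join(docroot, url_path.lstrip("/"))
    # RewriteCond !-f and !-d: skip anything that really exists on disk.
    return not os.path.isfile(filename) and not os.path.isdir(filename)
```

This is why the new permalinks work even though no matching files exist on the server: the post URL names nothing on disk, so Wordpress gets to answer it, while real files such as stylesheets and images are still served directly.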
But lots of other things are subtly different. Archive pages are named differently, Blogger had an index.html page for the blog, and so on. I manually added these rewrites to fix these issues:
# Blogger slugs have .html, wordpress does not.
RewriteRule ^blog/([0-9]{4})/([0-9]{2})/(.*)\.html?$ /blog/$1/$2/$3/ [R=301,L]
# Blogger archives are different.
RewriteRule ^blog/([0-9]{4})_([0-9]{2})_01_archive\.html /blog/$1/$2/ [R=301,L]
# Blogger feeds are now found at the wordpress feed
RewriteRule ^blog/atom.xml /blog/feed/ [R=301,L]
RewriteRule ^blog/rss.xml /blog/feed/ [R=301,L]
# Blogger had the old-style index.html.
RewriteRule ^blog/index.html /blog/ [R=301,L]
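Rules like these are easy to get subtly wrong, so it can help to try them against sample URLs before touching the server. The sketch below transcribes the five patterns into Python's re module, which behaves the same way as Apache's regex engine for patterns this simple (an assumption worth keeping in mind for anything fancier). The RULES list and the redirect function are names invented for this example.

```python
import re

# (pattern, target) pairs copied from the rewrite rules, with $1 written as \1.
RULES = [
    (r"^blog/([0-9]{4})/([0-9]{2})/(.*)\.html?$", r"/blog/\1/\2/\3/"),
    (r"^blog/([0-9]{4})_([0-9]{2})_01_archive\.html", r"/blog/\1/\2/"),
    (r"^blog/atom.xml", "/blog/feed/"),
    (r"^blog/rss.xml", "/blog/feed/"),
    (r"^blog/index.html", "/blog/"),
]

def redirect(path):
    """Return the redirect target for an old path, or None if no rule matches."""
    for pattern, target in RULES:
        match = re.search(pattern, path)
        if match:
            # The whole path is replaced by the target, as with RewriteRule.
            return match.expand(target)
    return None

print(redirect("blog/2010/04/here-be-dragons.html"))
print(redirect("blog/2010_04_01_archive.html"))
```

The first rule is applied first, mirroring the [L] flag that stops processing after a match. Paths that already use the new style, such as blog/2010/04/here-be-dragons/, match nothing and return None.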
The thorniest problem, though, is that Blogger and Wordpress don’t agree on how to turn a post title into a slug. Both lowercase the text and change spaces to dashes, but Wordpress includes every word, while Blogger leaves out “a” and “the”, and maybe others.
The simplest way to solve the differing slug problem was to examine the wordpress.xml file. It had the title of the posts, and the Blogger slug, in the form of the post’s permalink. I could determine which posts would have a new slug under Wordpress, and create a redirect for them.
A quick Python program did the work:
from lxml import etree
import re, sys

def items(f):
    """Yield (title, link) for every item in the Wordpress XML file."""
    doc = etree.parse(f)
    for item in doc.xpath('.//item'):
        title = item.xpath('title/text()')
        link = item.xpath('link/text()')
        if title and link:
            yield (title[0], link[0])

# Regexes for turning a title into a Wordpress slug
slugify = [
    # Drop everything but nice word characters
    (r"[^-a-z0-9 ]", ""),
    # All spaces become dashes
    (r" ", "-"),
    # Multiple dashes become one
    (r"-+", "-"),
]

def do_file(f):
    for title, link in items(f):
        if "susansenator.com" not in link:
            continue
        # The Blogger slug is the file name of the old permalink, minus .html
        slug = link.split('/')[-1].split('.')[0]
        wpslug = title.lower()
        for pat, rep in slugify:
            wpslug = re.sub(pat, rep, wpslug)
        if wpslug != slug:
            old_path = link.replace("http://susansenator.com/", "")
            new_path = old_path.rsplit('/', 1)[0] + "/" + wpslug
            # Escape dots so the old path is matched literally
            print("RewriteRule ^%s /%s [R=301,L]" % (
                old_path.replace(".", r"\."),
                new_path
            ))

do_file(sys.argv[1])
This just looks at every post, extracts the Blogger slug from the post’s link, and computes the Wordpress slug. Where the two slugs differ, a rewrite rule is written. On Susan’s blog, this produced 446 rewrite rules, which went into .htaccess:
### These are posts that slugify differently under blogger and wordpress, to keep old permalinks working:
RewriteRule ^blog/2010/04/cheerful-feelings-upon-awakening-in\.html /blog/2010/04/cheerful-feelings-upon-awakening-in-the-country [R=301,L]
RewriteRule ^blog/2010/03/here-is-my-passover-album-on-facebook-i\.html /blog/2010/03/passover-pics [R=301,L]
RewriteRule ^blog/2010/03/reality-of-autism-rifts-and-what-obama\.html /blog/2010/03/the-reality-of-the-autism-rifts-and-what-obama-should-do [R=301,L]
# ... 440 skipped ...
RewriteRule ^blog/2005/10/autism-and-school-board\.html /blog/2005/10/autism-and-the-school-board [R=301,L]
RewriteRule ^blog/2005/10/speed-of-dark\.html /blog/2005/10/the-speed-of-dark [R=301,L]
RewriteRule ^blog/2005/10/adolescence-without-roadmap\.html /blog/2005/10/adolescence-without-a-roadmap [R=301,L]
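To see where one of those rules comes from, here is the slug comparison run on a single post. The title "The Speed of Dark" is an assumption, guessed from the new slug in the rule above; the helper names wp_slug and blogger_slug are invented for this sketch, and the regexes are the same three used by the script.

```python
import re

# Same title-to-slug regexes as the migration script.
SLUGIFY = [
    (r"[^-a-z0-9 ]", ""),   # drop everything but nice word characters
    (r" ", "-"),            # all spaces become dashes
    (r"-+", "-"),           # multiple dashes become one
]

def wp_slug(title):
    """Approximate the slug Wordpress builds from a post title."""
    slug = title.lower()
    for pat, rep in SLUGIFY:
        slug = re.sub(pat, rep, slug)
    return slug

def blogger_slug(link):
    """Pull the Blogger slug out of an old permalink."""
    return link.split("/")[-1].split(".")[0]

old = blogger_slug("http://susansenator.com/blog/2005/10/speed-of-dark.html")
new = wp_slug("The Speed of Dark")  # assumed title
print(old, new, old != new)
```

The two slugs differ only by the leading "the", which is exactly the kind of mismatch that needs its own redirect. The three regexes are only an approximation of Wordpress's own slug logic, so unusual titles may deserve a manual check.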
With the new super-sized .htaccess in place, the new blog is ready to go. All existing links work well, and no one misses a beat.
Comments
I'm not sure what's happening, but it seems that every time I upload my blogger export file to http://blogger2wordpress.appspot.com/, it comes back with another blog's posts!?!?! I'm not sure what's happening, but it's either SPAM or it is mixing posts between website visitors.
For example, I uploaded content from our Green Education blog, and in the WRX output file, I also have the posts from http://blog.timoth.net/.
Odd.
I've been really scratching my head over WordPress showing a path to new posts, yet these are just virtual pages within the WP database as opposed to the literal .HTML files we're used to seeing when we created new Blogger posts. Does this mean we'd be safe to delete the old Blogger < year \ month > structure now?
The biggest outstanding issue I've yet to sort out is that none of our old Blogger posts show up in the Archives dropdown widget. I'm a little antsy about rinking around with my .htaccess file any more than absolutely necessary but do I understand correctly that if I add the following lines:
# Blogger archives are different.
RewriteRule ^blog/([0-9]{4})_([0-9]{2})_01_archive\.html /blog/$1/$2/ [R=301,L]
...that this will address the Archives issue we're having?
Also, I'm still figuring out how to tweak the themes code portions to make a few changes here & there, but I can't get the pages styling (like the Hx font formatting, for example) to quite match the posts.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
that I think should be handling your archives. Maybe those aren't properly installed?
I'm struggling with splitting a large Blogger Export file. Can you share the logic you used? Do counts have to be adjusted as you make the splits? I'd appreciate any help!
Thank you
I know I'm a little late to the party, but I was wondering if it'd be possible to create a website using WordPress, then use Blogger to post blog posts within the WordPress website?
I'm a non-techie person you see, and when my friend helped set up our current website, he also hacked the blog posting function. This means that our 'Blog' is now just a WordPress page, and it seems like we can't do simple posts anymore. That, plus we're more comfortable with using Blogger to blog :S
Would it be at all possible to dynamically extract Blogger's content into Wordpress, and do a blog post on a Wordpress site using Blogger?
thanks for sharing this. Since this is already an older entry, I am curious whether this method still works now with the new Wordpress 3.2.1?
i recommend to check up this tut http://www.cms2cms.com/blog/import-blogger-into-wordpress-unbeaten-solution/
This way of conversion doesn't require any tech skills or module/plugin installation. It's enough to install wordpress and then the content will be automatedly converted from blogger to wp.