Migration From Plone to WordPress

December 30, 2010

Migrating from Plone to WordPress is not as straightforward as it may seem. In fact, it was quite a PITA. One reason is that none of us (except for @kagesenshi) has deep technical expertise in the inner workings of Plone or Zope. There was one existing solution, but after reading through it, I knew it would be a PITA too.

So I decided that the best solution would be to parse Plone’s RSS feed and import the entries into WordPress’ MySQL database. This seems to be the most sensible, effective and headache-free solution, provided that your Plone entries (articles, news, etc.) combined do not number in the thousands.

The first thing you need to do is to aggregate the entire Plone site. The steps to do this are well documented here.

By default, the RSS feed publishes only 15 items. To parse all of the content, you have to change the RSS setting to, well, a very large number. This can be done by navigating to synPropertiesForm of the Plone site.

So now you will have all of the site’s content in RSS. Parsing the RSS is easy. My first attempt was to use ElementTree, a library that I always use when dealing with XML-ish stuff. Apparently it didn’t work properly, so I used feedparser instead.
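
As an aside, the post does not say why ElementTree failed. One common stumbling block (an assumption here, not something confirmed above) is XML namespaces: in an RSS 1.0 (RDF) feed every element lives in a namespace, so a plain `findall("item")` returns nothing. The sketch below uses a made-up sample feed and a hypothetical helper, `list_items`, to show a namespace-aware lookup with the Python 3 standard library. Check the root element of your own feed before relying on these namespace URIs.

```python
import xml.etree.ElementTree as ET

# Namespaces typical of RSS 1.0 (RDF) feeds; an assumption, verify against your feed
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A tiny made-up feed standing in for the real Plone export
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns="http://purl.org/rss/1.0/">
  <item rdf:about="http://example.org/news/hello-world">
    <title>Hello world</title>
    <link>http://example.org/news/hello-world</link>
    <dc:creator>mel</dc:creator>
  </item>
</rdf:RDF>"""


def list_items(xml_text):
    """Return (title, link, author) tuples for every item in an RSS 1.0 feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.findall("rss:item", NS):
        items.append((
            item.findtext("rss:title", default="", namespaces=NS),
            item.findtext("rss:link", default="", namespaces=NS),
            item.findtext("dc:creator", default="", namespaces=NS),
        ))
    return items


# A plain tag name does not match namespaced elements
print(len(ET.fromstring(SAMPLE).findall("item")))  # → 0
print(list_items(SAMPLE))
```

feedparser hides these details, which is one reason it is the more convenient choice for this job.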

Here’s a snippet of the code that does the job. The code reads an RSS file from disk; feedparser can also retrieve the RSS remotely.

import feedparser
import os
import sys
import MySQLdb as mdb

# get this from the db
# post_author id
# mel => 1
# sniffit => 2
# klks => 3

# Apparently Unicode is not handled properly even though utf8 is already set,
# so edit the RSS file and remove Unicode chars first

if __name__ == '__main__':
        # Expect the path to the saved RSS file as the only argument
        if len(sys.argv) < 2:
                print("Usage: %s <rss-file>" % sys.argv[0])
                sys.exit(1)

        rss_file = sys.argv[1]

        if not os.path.exists(rss_file):
                print("Error: Unable to find file: " + rss_file)
                sys.exit(1)

        dbname = "yourdb"
        dbpass = "yourpassword"
        dbhost = "localhost"
        dbuser = "youruser"

        conn = mdb.connect(dbhost, dbuser, dbpass, dbname, charset="utf8")
        cursor = conn.cursor()

        f = feedparser.parse(rss_file)
        l = len(f['entries'])

        print("Processing %d entries" % l)

        # Map Plone author names to WordPress user ids (see the list at the top)
        author_ids = {'mel': "1", 'sniffit': "2", 'klks': "3"}

        for post in f['entries']:
                title = post.title
                link = post.link
                content = post.description
                content = content.encode('utf8')
                postdate = post.date
                # Use the last path segment of the Plone URL as the WordPress slug
                slug = link.split("/")[-1]
                author = post.author
                # Unknown authors keep an empty id, as before
                post_author = author_ids.get(author, "")

                print("Importing post " + title)
                # Parameterized query: the driver handles quoting of the values
                q = "INSERT INTO wp_posts (post_date, post_date_gmt, post_content, post_title, post_name, post_author) VALUES (%s,%s,%s,%s,%s,%s)"
                cursor.execute(q, (postdate, postdate, content, title, slug, post_author))

        conn.commit()
        cursor.close()
        conn.close()
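
Two details in the script may need attention depending on your feed. The slug is taken from the last path segment of the link, which is empty if the link ends with a slash, and the date is passed to MySQL as the raw string from the feed. The sketch below shows two small hypothetical helpers, `slug_from_link` and `mysql_datetime` (neither is part of the original script), written for Python 3. It assumes you feed `mysql_datetime` the parsed date that feedparser normally exposes alongside the string (`post.date_parsed`, a `time.struct_time`).

```python
import time


def slug_from_link(link):
    """Take the last non-empty path segment of a URL as the post slug."""
    return link.rstrip("/").split("/")[-1]


def mysql_datetime(parsed_time):
    """Format a time.struct_time as 'YYYY-MM-DD HH:MM:SS' for a DATETIME column."""
    return time.strftime("%Y-%m-%d %H:%M:%S", parsed_time)


print(slug_from_link("http://example.org/news/hello-world/"))  # → hello-world
print(mysql_datetime(time.gmtime(0)))                          # → 1970-01-01 00:00:00
```

In the import loop these would replace the `slug = ...` and `postdate = ...` lines, for example `postdate = mysql_datetime(post.date_parsed)`.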

There are a few things that this script doesn’t do, namely tags and categories. These can be coded or done manually. Also, the HTML in the RSS as generated by Plone may contain some garbage or wayward HTML tags (which are pretty consistent in around 80% of the content), and this can be cleaned up manually or by code.
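
Cleaning up by code could look roughly like the sketch below, which uses only the Python 3 standard library. The tag and attribute names in `DROP_TAGS` and `DROP_ATTRS` are hypothetical placeholders, since the exact garbage depends on your Plone templates. `PostCleaner` and `clean_html` are illustrative names, not part of the original script.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical choices: adjust both sets to whatever your Plone output contains
DROP_TAGS = {"span", "font"}     # tags to unwrap (their inner text is kept)
DROP_ATTRS = {"class", "style"}  # attributes to strip from the remaining tags


class PostCleaner(HTMLParser):
    """Re-emit HTML while unwrapping DROP_TAGS and stripping DROP_ATTRS.

    HTML comments and doctype declarations are silently dropped.
    """

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; untouched
        HTMLParser.__init__(self, convert_charrefs=False)
        self.parts = []

    def _open_tag(self, tag, attrs, suffix):
        if tag in DROP_TAGS:
            return
        kept = ""
        for name, value in attrs:
            if name in DROP_ATTRS:
                continue
            if value is None:
                kept += " " + name
            else:
                kept += ' %s="%s"' % (name, escape(value, quote=True))
        self.parts.append("<%s%s%s>" % (tag, kept, suffix))

    def handle_starttag(self, tag, attrs):
        self._open_tag(tag, attrs, "")

    def handle_startendtag(self, tag, attrs):
        self._open_tag(tag, attrs, " /")

    def handle_endtag(self, tag):
        if tag not in DROP_TAGS:
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        self.parts.append("&#%s;" % name)


def clean_html(html_text):
    """Return html_text with the unwanted tags and attributes removed."""
    cleaner = PostCleaner()
    cleaner.feed(html_text)
    cleaner.close()
    return "".join(cleaner.parts)


sample = '<p class="documentDescription">Hello <span style="x">Plone</span> &amp; WP</p>'
print(clean_html(sample))  # → <p>Hello Plone &amp; WP</p>
```

Running each post’s content through `clean_html` before the INSERT would handle the consistent cases; anything unusual can still be fixed by hand in WordPress afterwards.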
