Migration From Plone to WordPress
Migrating from Plone to WordPress is not quite as straightforward as it may seem. In fact, it was quite a PITA. One reason was that none of us (except @kagesenshi) has deep technical expertise in the inner workings of Plone or Zope. There was one existing solution, but after reading through it, I knew it would be a PITA too.
So I decided that the best solution would be to parse Plone’s RSS feed and import the entries into WordPress’s MySQL database. This seemed the most sensible, effective and headache-free solution, provided that your Plone entries (articles, news, etc.) don’t number in the thousands.
The first thing you need to do is aggregate the entire Plone site into an RSS feed. The steps to do this are well documented here.
By default, the RSS feed publishes only 15 items. If you want to parse all the content, you have to change the feed’s setting to, well, a very large number. You can do this by navigating to the synPropertiesForm of the Plone site.
So now you will have all the site’s content in RSS. Parsing the RSS is easy. My first attempt was to use ElementTree, a library I always use whenever I deal with anything XML-ish. Apparently it didn’t work properly, so I used feedparser instead.
Here’s a snippet of the code that does the job. It reads an RSS file from disk, though feedparser can also retrieve the RSS remotely.
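```python
import os
import sys
import time

import feedparser
import MySQLdb as mdb  # provided by the mysqlclient package on Python 3

# Map Plone user names to WordPress post_author IDs.
# Get the IDs from the wp_users table in your WordPress database.
AUTHOR_IDS = {
    "mel": 1,
    "sniffit": 2,
    "klks": 3,
}

# Note: in my (Python 2) setup, Unicode wasn't handled properly even though
# utf8 was already set, so I edited the RSS file and removed Unicode chars first.


def main():
    if len(sys.argv) != 2:
        sys.exit("Usage: %s feed.rss" % sys.argv[0])
    path = sys.argv[1]  # sys.argv itself is a list; the file is argument 1
    if not os.path.exists(path):
        sys.exit("Error: Unable to find file: " + path)

    dbname = "yourdb"
    dbpass = "yourpassword"
    dbhost = "localhost"
    dbuser = "youruser"

    conn = mdb.connect(host=dbhost, user=dbuser, passwd=dbpass, db=dbname,
                       charset="utf8")
    cursor = conn.cursor()

    f = feedparser.parse(path)
    print("Processing %d entries" % len(f.entries))

    # post_excerpt, to_ping, pinged and post_content_filtered are NOT NULL
    # TEXT columns with no default in the WordPress schema, so strict-mode
    # MySQL rejects the insert unless they are given explicitly.
    q = ("INSERT INTO wp_posts (post_date, post_date_gmt, post_content, "
         "post_title, post_name, post_author, post_excerpt, to_ping, pinged, "
         "post_content_filtered) VALUES (%s, %s, %s, %s, %s, %s, '', '', '', '')")

    for post in f.entries:
        title = post.get("title", "")
        content = post.get("description", "")
        slug = post.get("link", "").rstrip("/").split("/")[-1]

        # Convert feedparser's UTC struct_time to 'YYYY-MM-DD HH:MM:SS'.
        # Using it for post_date as well (like the original) is only exact
        # if the blog's timezone is UTC.
        parsed = post.get("updated_parsed") or post.get("published_parsed")
        if not parsed:
            print("Skipping post without a date: " + title)
            continue
        postdate = time.strftime("%Y-%m-%d %H:%M:%S", parsed)

        # Unknown authors fall back to user 1 (assumed to be the admin).
        post_author = AUTHOR_IDS.get(post.get("author", ""), 1)

        print("Importing post " + title)
        cursor.execute(q, (postdate, postdate, content, title, slug, post_author))

    conn.commit()
    cursor.close()
    conn.close()


if __name__ == "__main__":
    main()
```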
There are a few things this script doesn’t do, namely importing tags and categories; these can be coded or done manually. Also, the HTML that Plone generates in the RSS may contain some garbage or wayward HTML tags (fairly consistently, in around 80% of the content), which can be cleaned up manually or with code.
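For the clean-up-by-code route, here is a minimal sketch using Python’s standard-library `html.parser`. It re-serialises each post’s HTML, dropping a configurable set of tags while keeping the text inside them, and discards HTML comments because no comment handler is defined. The tag list `UNWANTED_TAGS` and the `clean_html` helper are hypothetical: the post doesn’t say which tags were garbage, so fill in whatever your own export contains.

```python
import html
from html.parser import HTMLParser

# Hypothetical junk tags; replace with what your Plone export actually has.
UNWANTED_TAGS = {"font", "span"}


class TagStripper(HTMLParser):
    """Re-serialises HTML, dropping unwanted tags but keeping their text.

    Comments are silently dropped (no handle_comment override)."""

    def __init__(self, unwanted):
        super().__init__(convert_charrefs=True)
        self.unwanted = unwanted
        self.out = []

    @staticmethod
    def _attrs(attrs):
        # Rebuild attributes, re-escaping values; valueless attrs stay bare.
        return "".join(
            f' {name}="{html.escape(value, quote=True)}"' if value is not None
            else f" {name}"
            for name, value in attrs
        )

    def handle_starttag(self, tag, attrs):
        if tag not in self.unwanted:
            self.out.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        if tag not in self.unwanted:
            self.out.append(f"<{tag}{self._attrs(attrs)} />")

    def handle_endtag(self, tag):
        if tag not in self.unwanted:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Entities were decoded by convert_charrefs, so escape them again.
        self.out.append(html.escape(data, quote=False))


def clean_html(fragment, unwanted=UNWANTED_TAGS):
    parser = TagStripper(unwanted)
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(clean_html('<p><font face="Arial">Hello &amp; welcome</font><br/></p>'))
```

Run this way, the example prints `<p>Hello &amp; welcome<br /></p>`. In the import script, the natural place to call `clean_html` would be on `content` just before the INSERT. A real HTML parser is used rather than regular expressions because nested or unclosed junk tags are exactly what regexes tend to mishandle.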