This is Part 2 of the documentation of my migration from Plone to WordPress. Read the previous post for my reasons for choosing WordPress as my CMS.
As there is no direct migration path from Plone to any of the PHP-based content management systems, I chose a route that I had previously used quite successfully for other problems: going through an intermediate representation.
Script-based Export to JSON Data
Out of the box, Plone supports various methods for export, either directly or via third-party products. From my quick research it seems that many of these approaches either do not really apply to the current Plone 4 or require a substantial amount of Plone developer knowledge, which I don’t have. Other methods export the data to XML with quite complex data structures.
What I did find, though, are various approaches that export Plone data to JSON files. There is
- a post, Exporting Plone content as JSON, which presents a nice and simple Python script for JSON export,
- collective.jsonify, which until just recently did not support blob fields,
- collective.jsonmigrator, which is a tool for migrating Plone 2.x sites to Plone 4.0,
- collective.blueprint.jsonmigrator, which publishes a set of blueprints for the migration from Plone 2.x to Plone 4.0.
For my purpose I used the blueprint export script plone2.0_export.py from collective.blueprint.jsonmigrator as a starting point. Over a series of iterations I modified this script until the export ran without errors. As a result, I get a separate JSON file for every data object (articles, pages, images, files) in Plone. For objects with binary data, an additional base64-encoded data file is created, too.
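To make the output layout concrete, here is a minimal standalone sketch of the idea: one JSON file per object, plus a separate base64-encoded file when the object carries binary data. This is not the code from the export script. The function names, the file naming scheme and the `_datafile` key are assumptions made up for this illustration, and the sketch is written for Python 3 rather than for the Python that runs inside Plone.

```python
import base64
import json
import os
import tempfile


def export_item(item, out_dir, index):
    """Write one JSON file for `item`; binary data goes to a separate base64 file.

    `item` is a plain dict; an optional "data" key holds raw bytes.
    The file names and the "_datafile" key are hypothetical conventions.
    """
    # Everything except the binary payload is stored in the JSON file.
    meta = {key: value for key, value in item.items() if key != "data"}
    payload = item.get("data")
    if payload is not None:
        data_name = "%d.data.b64" % index
        with open(os.path.join(out_dir, data_name), "wb") as fh:
            fh.write(base64.b64encode(payload))
        # Remember where the payload went so an importer can find it.
        meta["_datafile"] = data_name
    json_path = os.path.join(out_dir, "%d.json" % index)
    with open(json_path, "w") as fh:
        json.dump(meta, fh, indent=2, sort_keys=True)
    return json_path


def load_item(json_path):
    """Read an exported item back, decoding the base64 data file if present."""
    with open(json_path) as fh:
        meta = json.load(fh)
    data_name = meta.get("_datafile")
    if data_name:
        data_path = os.path.join(os.path.dirname(json_path), data_name)
        with open(data_path, "rb") as fh:
            meta["data"] = base64.b64decode(fh.read())
    return meta


# Small round trip in a temporary directory.
demo_dir = tempfile.mkdtemp()
demo_path = export_item(
    {"title": "Logo", "portal_type": "Image", "data": b"\x89PNG"}, demo_dir, 1
)
demo_item = load_item(demo_path)
```

Keeping the binary payload out of the JSON file keeps the metadata small and readable, which helps when checking an export by hand before importing it elsewhere.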
Limitations?
Yes, there are. All items that are non-content objects in Plone (users, CSS modules, etc.) are skipped. Non-ASCII characters anywhere except the main body text are not supported. Workflow states and security permissions are disabled in my version and hence not exported.
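A rough way to picture the first two limitations is a pre-export check like the one below. It is only a sketch under assumptions, not what my script does internally: the set of portal types, the `text` key for the body and both helper names are hypothetical, and it is plain Python 3 working on dicts rather than on real Plone objects.

```python
# Hypothetical whitelist of content types worth exporting; adjust as needed.
CONTENT_TYPES = {"Document", "News Item", "Image", "File", "Folder"}


def is_exportable(item):
    """Return True if the item looks like a content object (by portal type)."""
    return item.get("portal_type") in CONTENT_TYPES


def non_ascii_fields(item, skip=("text",)):
    """List string fields, other than the body, that contain non-ASCII characters."""
    bad = []
    for key, value in item.items():
        if key in skip or not isinstance(value, str):
            continue
        try:
            value.encode("ascii")
        except UnicodeEncodeError:
            bad.append(key)
    return sorted(bad)
```

Running such a check over the site first would show which titles or descriptions need cleaning up before the export.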
You can get my version of the export script here: plone_export.zip
Copy the Python file to your Plone installation and add an External Method pointing to this script’s export_plone20 function.
FTP Access
In addition to the JSON export, I used FTP access to the Plone instance to get a direct copy of all data as present in the instance. This delivers the files without any metadata such as title or publication date, but with their file names as given in Plone. For articles or blog entries, text files are returned which contain title, teaser text, body text, author, publication date, etc.
For articles written in reStructuredText, the retrieved text files contained the raw reStructuredText source and not the corresponding rendered representation. Not that helpful.
HTTP Mirroring
Finally, I used the excellent wget tool to also get a direct export of the complete website as visible via HTTP. This returned not only the unchanged image files, as with FTP access, but also the automatically generated variants for thumbnail, preview and fullscreen views.
Next Steps
With the data exported, the next steps of the migration from Plone to WordPress are:
- importing the data into WordPress,
- and wrapping everything up.