Plone to WordPress: Importing Data into WordPress

This is Part 3 of the documentation of my migration from Plone to WordPress. Read the previous posts for my reasons for choosing WordPress as my CMS and for the intermediate data export from Plone to the file system.

General Approach

As outlined in my previous post, all data from the Plone site is exported to an intermediate data representation on the local file system in JSON format.

For the data import the XML-RPC capabilities of WordPress are used. This interface allows the script-controlled creation of content in any given WordPress site. For Python, there is the excellent python-wordpress-xmlrpc library, which comes with pretty good documentation.

For the import I developed a Python script which sequentially reads the JSON data from my local hard drive, analyzes each entity and creates the corresponding WordPress content objects.

Python-Script for Import

The import script I developed and used as part of my migration process is written in Python and uses the corresponding XML-RPC library. The basic functionality is as follows:

  • WordPress pages are created for all Plone Articles. For folderish items the default content item is used or – if no such item is present – an empty page with a user-defined text is created.
  • WordPress blog posts are created for all Plone NewsItems. Categories are defined according to the tags of the Plone NewsItem.
  • For all blobs, i.e. for file items and images, the files are uploaded and a corresponding landing page with a link to the uploaded file is generated. The creation of landing pages can be enabled or disabled via an option.
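
The mapping described above can be sketched roughly as follows. This is not the downloadable script: it talks to WordPress's wp.newPost XML-RPC method through Python's standard xmlrpc.client instead of python-wordpress-xmlrpc, and the JSON keys (`_type`, `title`, `text`, `subject`, `creation_date`), the Plone type names and the date format are assumptions about the export format made for illustration only.

```python
import json
import xmlrpc.client
from datetime import datetime

# Assumed mapping from Plone portal types to WordPress post types.
# The type names are illustrative, not taken from the real export.
POST_TYPES = {"Document": "page", "News Item": "post"}


def to_wp_content(item):
    """Translate one exported Plone item (a dict) into a wp.newPost struct."""
    content = {
        "post_type": POST_TYPES.get(item["_type"], "page"),
        "post_status": "publish",
        "post_title": item["title"],
        "post_content": item.get("text", ""),
    }
    if "creation_date" in item:
        # Assumed ISO-like format, e.g. "2012-05-01T10:30:00"
        created = datetime.strptime(item["creation_date"], "%Y-%m-%dT%H:%M:%S")
        content["post_date"] = xmlrpc.client.DateTime(created)
    if content["post_type"] == "post" and item.get("subject"):
        # Plone tags become WordPress categories
        content["terms_names"] = {"category": list(item["subject"])}
    return content


def import_file(path, url, user, password):
    """Read one JSON file and create the WordPress object; returns the new id."""
    with open(path, encoding="utf-8") as fh:
        item = json.load(fh)
    server = xmlrpc.client.ServerProxy(url)  # URL of the site's xmlrpc.php
    return server.wp.newPost(0, user, password, to_wp_content(item))
```

Keeping the translation step (`to_wp_content`) separate from the network call makes it possible to check the mapping against a few exported files before anything is written to the WordPress site.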

All content items are created with the correct creation, publication and modification dates. If wanted, a mapping table between Plone URL and WordPress URL (and id) is written in CSV format during the import. This file can be used to define redirections, e.g. with the Redirection plugin for WordPress.
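
As an illustration of such a mapping table, here is a minimal sketch using Python's csv module. The column names and the example URLs are made up for this example; the actual script may use a different layout.

```python
import csv
import io


def write_url_mapping(rows, out):
    """Write (plone_url, wordpress_url, wordpress_id) rows as CSV to `out`."""
    writer = csv.writer(out)
    writer.writerow(["plone_url", "wordpress_url", "wordpress_id"])
    for plone_url, wp_url, wp_id in rows:
        writer.writerow([plone_url, wp_url, wp_id])


# Usage: write to an in-memory buffer; a real run would pass an open file.
buffer = io.StringIO()
write_url_mapping([("/news/hello-world", "/2013/05/hello-world/", 42)], buffer)
csv_text = buffer.getvalue()
```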

The script is available for download:

python import script

Please configure the script according to your needs. The configuration options are at the top of the script.

Next Steps

With the data imported into WordPress, the final step of the migration from Plone to WordPress is to fix all things that need fixing. It is time to wrap things up.

Plone to WordPress: Exporting Data from Plone

This is Part 2 of the documentation of my migration from Plone to WordPress. Read the previous post for my reasons for choosing WordPress as my CMS.

As there is no direct path for a migration from Plone to any of the PHP-based content management systems, I chose to follow a route which I had previously used quite successfully for other problems, namely to use an intermediate representation.

Script-based Export to JSON Data

Out of the box Plone supports various methods for export, either directly or via third-party products. From my quick research it seems that many of these approaches do not really apply to the current Plone 4 or require a substantial amount of Plone developer knowledge which I don’t have. Other methods exported the data to XML with quite complex data structures.

What I did find, though, are various approaches that export Plone data to JSON files. There is

  1. a post Exporting Plone content as JSON which presents a nice and simple Python script for JSON export,
  2. collective.jsonify which until just recently did not support blob fields,
  3. collective.jsonmigrator which is a tool for migration from Plone 2.x sites to Plone 4.0,
  4. collective.blueprint.jsonmigrator which publishes a set of blueprints for the migration from Plone 2.x to Plone 4.0.

For my purpose I used the blueprint export script plone2.0_export.py from collective.blueprint.jsonmigrator as a starting point. In a series of iterations I modified this script until the export ran without errors. As a result, for every data object (articles, pages, images, files) in Plone I get a separate JSON file. For objects with binary data an additional base64-encoded data file is created, too.

Limitations?

Yes, there are. All items that are non-content objects in Plone (users, CSS modules, etc.) are skipped. Non-ASCII characters anywhere except the main body text are not supported. Workflow states and security permissions are disabled and hence not exported in my version.

You can get my version of the export script here: plone_export.zip

Copy the Python file to your Plone installation and add an external method pointing to this script’s export_plone20 function.

FTP Access

In addition to the JSON export I used the FTP access to the plone instance to get a direct copy of all data as present in the instance. This gives the files without any content information like title, publication date etc. yet with their file names as given in plone. For articles or blog entries text files are returned which contain title, teaser text, body text and author, publication date, etc.

For articles written in reStructuredText the retrieved text files contained the reStructuredText source and not the corresponding rendered representation. Not that helpful.

HTTP Mirroring

Finally, I used the excellent wget tool to also get a direct export of the complete website as visible via HTTP. This returned not only the unchanged image files as via FTP access but also the automatically generated variants for thumbnail, preview or fullscreen display.

Next Steps

With the data exported, the next steps of the migration from Plone to WordPress are:

  • Importing data into WordPress
  • and wrapping everything up.

Migrating from Plone to WordPress

For the last 7 years incunabulum.de used Plone as its CMS of choice. But not anymore. Over the last couple of weeks I migrated the website to WordPress.

Why?

Well, first let me stress that I never encountered any serious problems with my Plone site. Yet, over the last couple of years Plone has become more and more complex with a rapid succession of new technologies, complex XML-based configurations and layers upon layers of abstraction. All this added up to an application stack which is very difficult to understand and maintain. Starting with Plone 3, upgrades have become frustrating and error-prone. The constant introduction of new technologies is only manageable with continuous learning and continuous development work. Time which I do not want to invest.

Early this year I finally had a site which was running flawlessly yet which I could not update due to my limited knowledge. At the same time I could not reinstall the site in its current configuration as certain packages were not available any more. This gave me a system which I could neither restore in case of a server crash nor maintain for the future in one way or the other. Therefore, it was time to move on. The results you can see here.

The migration from Plone to WordPress was a four-step process, namely:

    1. Selecting WordPress as CMS
    2. Exporting the data from Plone
    3. Importing the data to WordPress
    4. Fixing things

The general migration process will be outlined here including all scripts developed for this purpose.

Duplicate Virtual Folders in Thunderbird

Personal note: If you encounter magically duplicating virtual folders in Thunderbird, check whether there are any blanks or non-ASCII characters in your folder name. If this is the case, switch the name to standard ASCII characters without blanks.

Explanation: In case of blanks or non-ASCII characters, Thunderbird escapes these characters in various ways upon restart, leading to said duplicate entries. You can find the various entries in the virtualFolders.dat file and the corresponding msf files in your IMAP mailbox. Deleting these entries solves the problem for a moment. Yet, unfortunately, only up to the next restart of Thunderbird.
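
For checking a list of folder names up front, a tiny helper along the following lines could be used. It only tests names that you type in or collect yourself; it does not read or modify any Thunderbird files, and the function name is made up for this note.

```python
def is_safe_folder_name(name):
    """True if the name is pure ASCII and contains no blanks or other whitespace."""
    return name.isascii() and not any(ch.isspace() for ch in name)


# Usage: filter the names that are likely to cause duplicates
candidates = ["ToDo", "To Do", "Ablage_März"]
problematic = [name for name in candidates if not is_safe_folder_name(name)]
```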

Inbox Zero in Outlook

One might like Microsoft's range of products or not, at work the mail system is Outlook. Period. And combined with Exchange this gives a quite capable system for collaborative work. But as always there is room for improvement. So, in the following, my customizations to Microsoft Outlook 2013 for Inbox Zero are shown.

All mail in one single folder.

This is the normal situation which many people work with. As a result, such a single folder contains mails that are read or unread: mails which are there just for archive purposes, mails that act as reminders for something and finally mails which need to be acted upon or answered.

Why Inbox Zero?

For me such a system only works well if the daily number of messages is low. Yet, with a busy work schedule and lots of mails as I encounter from time to time, this is seldom the case.

If the number of mails increases or if mails often need to be dealt with over a longer time frame – e.g. an answer needs to be sent next month – I repeatedly need to rescan hundreds of messages and hunt for the mails which are still important. This is time-consuming and annoying.

Enter Inbox Zero. Inbox Zero is an action-based concept with the objective to deal with all mails once. And only once. Ever. All mails that can neither be deleted nor be dealt with right away are either deferred and saved as action items or are delegated. So much for the basic idea.

My Interpretation of Inbox Zero

As with any theory, there is more than one way of implementing it. Over time, and inspired by the very clever Defer feature available in the ancient FIDO software Crosspoint, my personal take on Inbox Zero developed into the following:

  • I do not treat the Inbox as a physical folder but use Filtered Views or Smart Search Folders.
  • The actual Inbox View is periodically reviewed. Upon review:
    1. All newly arrived and hence unread mails are briefly read.
    2. All read mails that do not require any action or that are for information purposes only are moved elsewhere.
    3. All read mails that only require a quick answer or that can be dealt with in a couple of minutes are processed right away and are then moved elsewhere.
    4. All read mails that I will deal with today or that provide information which I will need to access today again are left in the Inbox.
    5. All read mails that I will deal with later on are marked as deferred with an appropriate due date.

As my primary Inbox View is configured such that only mails are shown that are

  • (unread)
  • or (read and not deferred)
  • or deferred and (due or overdue) and (not finished),

I always see only the information which is of importance to me right now. Deferred mails come back into view at due time and can then be processed or be deferred again. Completed mails are simply flagged as completed and are no longer shown.
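
Independent of Outlook, the three conditions of this view can be written down as a small predicate. The following is only a sketch of the rule as stated above; the `Mail` class and its field names are invented for illustration and implement the conditions literally.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Mail:
    read: bool
    deferred: bool = False
    due: Optional[date] = None
    finished: bool = False


def show_in_inbox(mail, today):
    """Return True if the mail belongs in the Inbox View on the given day."""
    if not mail.read:
        return True  # unread
    if not mail.deferred:
        return True  # read and not deferred
    # deferred: only shown once due or overdue, and only while not finished
    is_due = mail.due is not None and mail.due <= today
    return is_due and not mail.finished
```

A deferred mail with a due date in the future yields False and drops out of the view; once the due date is reached the same mail yields True again until it is marked as finished.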

Read mails left in the Inbox for easy reference need to be moved once this need for reference no longer exists, or they might be deferred. Periodically, all completed mails which are still present in the Inbox yet are not shown in the Inbox View are moved to the correct locations.

For me, this approach works even under heavy mail load. It drastically reduces the clutter in the Inbox and gives a nice overview over all relevant and required actions. The maintenance tasks, i.e. archiving of completed mail actions and moving reference mails, are either easily done or are done periodically in burst mode.

Inbox Zero in Outlook

So, how does this work in Outlook? Basically, I use a custom view to dynamically select only the mail items relevant to me right now. For this rather complex query the Outlook SQL DASL notation is the only option I am aware of.

After lots of experimenting I use a query as follows:

 (
  "urn:schemas:httpmail:read" = 0
  OR "urn:schemas:httpmail:messageflag" IS NULL 
  OR NOT "urn:schemas:httpmail:messageflag" > 0 
  OR ("urn:schemas:httpmail:reply-by" <= 'morgen' AND NOT "urn:schemas:httpmail:reply-by" IS NULL)
  OR ("urn:schemas:httpmail:messageflag" > 0 AND "urn:schemas:httpmail:reply-by" IS NULL)
 ) AND (
  "http://schemas.microsoft.com/mapi/proptag/0x10900003" IS NULL 
  OR NOT "http://schemas.microsoft.com/mapi/proptag/0x10900003" < 2
 )

Here, all mails are collected via the SQL OR statement that 1) are unread, 2) are without a message flag, 3) had a flag that was set and later cleared, 4) have a due date of tomorrow ('morgen' is German for 'tomorrow') or earlier, or 5) have a flag or category yet no due date. Of all these mails, only those are shown (AND statement) that A) have no flag status at all or B) are not marked as done or completed. For these last checks the unofficial proptag 0x10900003 is queried.

Voila. Save and assign to a custom folder representation and you are ready to do Inbox Zero the incunabulum way.

Endnote

For the assignment of due dates, categories etc. another blog post will follow.

And – btw – one of the first steps on any new computer is to disable the new mail notification actions. For me, email is an asynchronous means of communication where I decide – based on my current tasks and mood – when and how often I want to check my mail. For urgent matters, there is always the phone or personal communication.

Owl2Java Status Update

Yes, Owl2Java, my Java-based code generator for comfortable access to OWL ontologies from Java, is no longer maintained by me. As I am now working in a totally different field and have very limited spare time, I am simply not able to do any maintenance.

Nevertheless, from time to time it is quite interesting to see what others are doing with this public piece of code, see e.g. my previous post.

And today?

Well, things are calming down, which is not surprising for software that has been unmaintained for so long. Yet, there were some activities up to 2012 in the repositories here and there, and I even got some references in the odd paper.

And yes, my work even got referenced in a book, namely the proceedings from Semantic Web Rules.

But I guess that's about it. Dead is dead. Farewell.

Against the Cloud – Flickr Downloads

Well, in the new days there came clouds. Lots of clouds. And they started to swallow lots and lots… images, documents, you name it. Everything… But I like sunshine and not clouds. So, out they must come. From the cloud onto my own personal machine and under my control.

In this case, it was a picture set on Flickr to which I had access as a public guest. And yes, as is usually the case with clouds, this one makes no difference. Upload is easy, download is not. Neither as single image nor – preferably – as complete set.

So what to do?

As described here, Firefox, Greasemonkey and DownThemAll can be used together and form a perfect match. The Flickr Image Script for Greasemonkey adds links to the largest available size for each image. DownThemAll can be set to download only the images that these links point to. Additional Link here.

Done. Perfect. And yes, this is a personal note, too.

A Note on the Accuracy of Temperature Measurements

A long time ago (ok, ages ago) I installed a temperature sensor in the fuel return line directly behind the injection pump (ESP) as part of the vegetable oil conversion of my Passat.

And to do it properly, of course, it went in as an inline temperature sensor with a matching sensor fitting, so that in the end the sensor sits nicely centered in the T-piece in the return line. Not much should go wrong there, and everything should be super accurate …

Or maybe not?

The push-fit connector on this T-piece, sitting directly on the engine, was finally shaken loose after more than 100,000 km and had to be replaced because of the ever-worsening air ingress. Out of laziness, I built a replacement at home in the warmth from a 3/8-inch T-piece I had lying around (a 1/4-inch piece had been installed) plus a 1/4-inch adapter. Swapped in a jiffy. Done.

That the temperature sensor is now no longer centered in the cross-section but offset by about 8 mm toward the outside surely cannot be much of a problem. Or so I thought. But it is! On my usual comparison routes (10 hours parked, 3 curves, then 20 on cruise control) at identical outside temperature etc., I now see

  • suddenly 8 to 10 degrees less on the display at the motorway on-ramp,
  • 5 degrees less on long runs (cruise control at 150 km/h).

And all that only because I now measure 10 cm behind the injection pump at the edge of the flow instead of in the center. I would not have expected such effects here. Ergo: "Wer misst misst Mist" (he who measures, measures rubbish). Exactly. Rubbish.