Category Archives: it

Redirection in WordPress for the Plone URLs

The python-based import script which I used to automatically import my plone content into wordpress (see the corresponding article here) also generates an output file that stores, for every content item, i.e. for pages, posts and blobs, the mapping between plone URL, wordpress URL and wordpress ID.

From this rather extensive file a quick python script extracts the two sections relevant for redirections, i.e. the mapping plone URL <> wordpress URL

  • for pages and posts
  • for blobs aka media

The script also updates and reformats the input such that the two resulting files, pagePostMap.csv and blobMap.csv, can be directly imported into the Redirection Plugin of wordpress.
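To illustrate the idea, here is a minimal sketch of such a split. It is not the downloadable script itself. The column names (type, plone_url, wordpress_url) and the source,target header are assumptions made for this example; please check the Redirection plugin documentation for the exact CSV layout it expects.

```python
import csv
import io


def split_mapping(rows):
    """Split mapping rows into redirect pairs for pages/posts and for blobs.

    Each row is a dict with the (assumed) keys "type", "plone_url"
    and "wordpress_url".
    """
    page_post, blobs = [], []
    for row in rows:
        pair = (row["plone_url"], row["wordpress_url"])
        if row["type"] in ("page", "post"):
            page_post.append(pair)
        elif row["type"] == "blob":
            blobs.append(pair)
    return page_post, blobs


def to_csv(pairs):
    """Render (source, target) pairs as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["source", "target"])  # assumed header names
    writer.writerows(pairs)
    return out.getvalue()
```

The rows can come from csv.DictReader on the mapping file; the two strings returned by to_csv would then be written to pagePostMap.csv and blobMap.csv.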

Download the script here:

plone2wordpress redirections

 

WordPress – Fixing Things

Well, it took me a while to complete the final post about the migration from plone to wordpress. But here it is: I outline the finishing touches needed to get a fully working site. For background information, please read the previous posts about the intermediate data export from plone to the file system and the automated import into wordpress.

Content Listings

For folderish objects plone can automatically create content listings in various styles. These listings show the content of the folder or the result of some query definition. WordPress – in contrast – is not aware of a strict folder hierarchy and content and therefore is not able to generate these listings.

What to do?

For a website of moderate size like incunabulum.de I resorted to manually creating these content listings after the import into WordPress. For this purpose the export automatically detects objects with content listings and sets their content to a predefined statement (“Directory listing missing”). After the import, a quick search for this statement gives all pages where a manually created listing needs to be defined.

The folder listings are directly copied from the plone instance which during migration was running in parallel. There is hence no specific need to manually recreate these listings.

Trouble with Plone Acquisition

Generally speaking, Plone’s acquisition is a mechanism by which objects can acquire any property from any parent. This means that if, for example, a link references an image by name but the image is located in one of the parent folders, then plone automatically searches the parent folders one after another for said image.
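The lookup can be pictured with a small sketch. This is a strong simplification for illustration only and not Plone's actual implementation; the function name and the set of known paths are hypothetical.

```python
def resolve_acquired(link, existing_paths):
    """Return the real path an acquisition-style link points to, or None.

    The object name is looked up in the innermost folder first and then
    in each parent folder up to the site root.
    """
    parts = [p for p in link.split("/") if p]
    if not parts:
        return None
    *folders, name = parts
    for depth in range(len(folders), -1, -1):
        candidate = "/" + "/".join(folders[:depth] + [name])
        if candidate in existing_paths:
            return candidate
    return None
```

For a link /travel/rome/2014/logo.png and an image that really lives at /travel/logo.png, the function returns /travel/logo.png. This is the direct link that remains once the additional sub folder items are removed.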

Under plone the majority of posts were written manually using the reStructuredText syntax. Here, I used acquisition for many links to reduce link lengths and to avoid excessive typing.

Again – What to do?

There probably are some plone commands to automatically check all links and to convert links that rely on acquisition into direct links to the real object. But this would mean at least an additional export step and some fiddling with the plone code base.

For incunabulum.de I – once again – resorted to keeping the links as is upon export and import. After import broken links are detected using the excellent Broken Link Checker Plugin in WordPress and are fixed manually by deleting the additional sub folder items in the link.

All in all this required about 2 hours for my site (and therefore was probably more efficient than mucking around with plone python code).

Handling Files, Images and Image Sizes

For all images uploaded to a plone website, various sizes are available by adding e.g. <link>/image_thumb to a link. In wordpress things are handled differently via media elements.

As I did not want to manually change all links to specific image sizes and to other files etc., I created a static mirror of the plone website via wget and copied the downloaded version to a separate subdomain.

A set of rewrite rules takes care of the rest, see below. All links to images are transparently redirected to the archive version of plone.

 # BEGIN WordPress
 <IfModule mod_rewrite.c>
 RewriteEngine On
 RewriteBase /
 
 # Plone images are served from static mirror (mz, 2014.07)
 RedirectMatch (.*)image_(.*)         http://plone-archive.incunabulum.de$1image_$2
 RedirectMatch (.*)/view         http://plone-archive.incunabulum.de$1/image_preview

 # Plone - Map all jpg images in content directories to default plone variant image_preview
 # (the dot before jpg is escaped so that it only matches a literal dot)
 RedirectMatch /projects/(.*)\.jpg     http://plone-archive.incunabulum.de/projects/$1.jpg/image_preview
 RedirectMatch /restricted/(.*)\.jpg     http://plone-archive.incunabulum.de/restricted/$1.jpg/image_preview
 RedirectMatch /travel/(.*)\.jpg         http://plone-archive.incunabulum.de/travel/$1.jpg/image_preview
 RedirectMatch /work/(.*)\.jpg         http://plone-archive.incunabulum.de/work/$1.jpg/image_preview
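
To see what these rules do with a given URL, the matching can be replayed in a few lines of python. This is only a sketch for checking the patterns offline, assuming that the first matching rule wins; it covers three of the rules and is not part of the server configuration. The dot before jpg is escaped here so that it only matches a literal dot.

```python
import re

ARCHIVE = "http://plone-archive.incunabulum.de"

# (pattern, target template) pairs in the order of the RedirectMatch lines
RULES = [
    (r"(.*)image_(.*)", ARCHIVE + r"\1image_\2"),
    (r"(.*)/view", ARCHIVE + r"\1/image_preview"),
    (r"/projects/(.*)\.jpg", ARCHIVE + r"/projects/\1.jpg/image_preview"),
]


def redirect_target(path):
    """Return the redirect target for a URL path, or None if no rule matches."""
    for pattern, template in RULES:
        match = re.search(pattern, path)
        if match:
            return match.expand(template)
    return None
```

With these rules /travel/rome/pic.jpg/image_thumb is sent to the same path on the archive host, while /projects/a/b.jpg is sent to /projects/a/b.jpg/image_preview there.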

Plone to WordPress: Importing Data into WordPress

This is Part 3 of the documentation of my migration from plone to wordpress. Read the previous posts for my reasons to choose wordpress as CMS of choice and about the intermediate data export from plone to the file system.

General Approach

As outlined in my previous post all data from the plone site is exported to an intermediate data representation on the local file system in JSON format.

For the data import the excellent XML-RPC capabilities of wordpress are used. This interface allows the script-controlled creation of content in any given wordpress site. For python, there is the excellent python-wordpress-xmlrpc library, which comes with pretty good documentation.

For the import I developed a python script which sequentially reads the JSON-data from my local harddrive, analyzes each entity and creates the corresponding wordpress content objects.

Python-Script for Import

The import script I developed and used as part of my migration process is written in python and uses the corresponding xml-rpc library. The basic functionality is as follows:

  • WordPress pages are created for all Plone Articles. For folderish items the default content item is used or – if no item present – an empty page with a user-defined text is created.
  • WordPress blog posts are created for all Plone NewsItems. Categories are defined according to the tags of the plone newsitem.
  • For all blobs, i.e. for file items and images, the files are uploaded and a corresponding landing page with a link to the uploaded file is generated. The creation of landing pages can be enabled or disabled via an option.

All content items are created with the correct creation, publication and modification dates. If wanted, a mapping table between plone URL and wordpress URL (and ID) is written during import in CSV format. This file can be used to define redirections using e.g. the Redirection plugin for wordpress.
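
The core of such an import can be sketched as follows. This is not the downloadable script: it uses python's built-in xmlrpc.client instead of python-wordpress-xmlrpc, and the JSON field names (_type, title, text, subject, creation_date), the date format and the type names in TYPE_MAP are assumptions about the exported data. wp.newPost is the XML-RPC method wordpress provides for creating posts and pages.

```python
import xmlrpc.client
from datetime import datetime

# Assumed plone type names; everything unknown is imported as a page.
TYPE_MAP = {"Document": "page", "News Item": "post"}


def build_wp_content(item):
    """Translate one exported plone item (a dict) into a wp.newPost struct."""
    post_type = TYPE_MAP.get(item.get("_type"), "page")
    content = {
        "post_type": post_type,
        "post_status": "publish",
        "post_title": item.get("title", ""),
        "post_content": item.get("text", ""),
    }
    created = item.get("creation_date")
    if created:
        # Assumed date format of the export
        content["post_date"] = datetime.strptime(created, "%Y-%m-%d %H:%M:%S")
    if post_type == "post" and item.get("subject"):
        # plone tags become wordpress categories
        content["terms_names"] = {"category": list(item["subject"])}
    return content


def upload(item, url, user, password):
    """Create the content object in wordpress and return its new ID."""
    server = xmlrpc.client.ServerProxy(url)  # e.g. https://example.org/xmlrpc.php
    # Blog ID 0 is an assumption that fits a single-site installation.
    return server.wp.newPost(0, user, password, build_wp_content(item))
```

Keeping the translation in build_wp_content separate from the upload makes it possible to check the mapping on a few JSON files before anything is written to the wordpress site.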

The script is available for download:

python import script

Please configure the script according to your needs. The configuration options are at the top of the script.

Next Steps

With the data imported into wordpress, the final step of the migration from plone to wordpress is to fix all things that need fixing. It is time to wrap things up.

Plone to WordPress: Exporting Data from Plone

This is Part 2 of the documentation of my migration from plone to wordpress. Read the previous post for my reasons to choose wordpress as CMS of choice.

As there is no direct path for a migration from Plone to any of the PHP-based content management systems, I chose to follow a route which I had previously used quite successfully for other problems, namely to use an intermediate representation.

Script-based Export to JSON Data

Out of the box plone supports various methods for export either directly or via third-party products. From my quick research it seems as if many of these approaches do not really apply to the current plone 4 or require a substantial amount of plone developer knowledge, which I don’t have. Other methods exported the data to XML with quite complex data structures.

What I found, though, are various approaches that export plone data to JSON files. There is

  1. a post Exporting Plone content as JSON, which presents a nice and simple python script for JSON export,
  2. collective.jsonify, which up to just recently did not support blob fields,
  3. collective.jsonmigrator, which is a tool for migration from plone 2.x sites to plone 4.0,
  4. collective.blueprint.jsonmigrator, which publishes a set of blueprints for the migration from plone 2.x to plone 4.0.

For my purpose I used the blueprint export script plone2.0_export.py from collective.blueprint.jsonmigrator as starting point. In a series of iterations I modified this script until the export ran without errors. As a result, for every data object (articles, pages, images, files) in plone I get a separate JSON file. For objects with binary data an additional base64-encoded data file is created, too.
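
The resulting layout can be pictured with a short standalone sketch. It is not taken from the export script, which runs inside plone; the file naming scheme and the _datafile key are made up for this example.

```python
import base64
import json
from pathlib import Path


def export_item(item, out_dir, index, binary=None):
    """Write one item as <index>.json plus an optional base64 data file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    record = dict(item)
    if binary is not None:
        data_path = out_dir / f"{index}.data.b64"
        data_path.write_bytes(base64.b64encode(binary))
        # Remember which data file belongs to this item (hypothetical key)
        record["_datafile"] = data_path.name
    json_path = out_dir / f"{index}.json"
    json_path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return json_path
```

Storing the binary data next to the JSON file rather than inside it keeps the metadata files small and easy to inspect by hand.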

Limitations?

Yes, there are. All items that are non-content objects in plone (users, css module, etc.) are skipped. Non-ASCII characters anywhere except the main body text are not supported. Workflow states and security permissions are disabled and hence not exported in my version.

You can get my version of the export script here: plone_export.zip

Copy the python file to your plone installation and add an external method pointing to this script’s export_plone20 function.

FTP Access

In addition to the JSON export I used the FTP access to the plone instance to get a direct copy of all data as present in the instance. This gives the files without any content information like title, publication date etc. yet with their file names as given in plone. For articles or blog entries text files are returned which contain title, teaser text, body text and author, publication date, etc.

For articles written in reStructuredText the retrieved text files contained the reStructuredText source and not the corresponding (rendered) representation. Not that helpful.

HTTP Mirroring

Finally, I used the excellent wget tool to also get a direct export of the complete website as visible via http. This returned not only the unchanged image files as via ftp access but also the automatically generated variants for thumbnails, preview or fullscreen perspective.

Next Steps

With the data exported the next steps of the migration from plone to wordpress are:

  • Importing data into wordpress
  • and wrapping everything up.

Migrating from Plone to WordPress

Over the last 7 years incunabulum.de was using Plone as CMS of choice. But not anymore. Over the last couple of weeks I migrated the website to WordPress.

Why?

Well, first let me stress that I never encountered any serious problems with my Plone site. Yet, over the last couple of years Plone has become more and more complex with a rapid succession of new technologies, complex XML-based configurations and layers upon layers of abstraction. All this added up to an application stack which is very difficult to understand and maintain. Starting with Plone 3, upgrades have become frustrating and error-prone. The constant introduction of new technologies is only manageable with continuous learning and with continuous development work. Time which I do not want to invest.

Early this year I finally had a site which was running flawlessly yet which I could not update due to my limited knowledge. At the same time I could not reinstall the site in the current configuration as certain packages were not available any more. This gave me a system which I could neither restore in case of a server crash nor maintain for the future in one way or the other. Therefore, it was time to move on. The results you can see here.

The migration from Plone to WordPress was a four-step process, followed by an add-on step, namely:

    1. Selecting WordPress as CMS
    2. Exporting the data from Plone
    3. Importing the data to WordPress
    4. Fixing things
    5. Add-on: Redirections in WordPress for Plone URLs

The general migration process will be outlined here including all scripts developed for this purpose.

Duplicate Virtual Folders in Thunderbird

Personal note: If you encounter magically duplicating virtual folders in thunderbird, check whether there are any blanks or non-ASCII characters in your folder name. If this is the case, switch the name to standard ASCII characters without blanks.

Explanation: In case of blanks or non-ASCII characters, thunderbird escapes these characters in various ways upon restart, leading to said duplicate entries. You can find the various entries in the virtualFolders.dat file and the corresponding msf files in your IMAP mailbox. Deleting these entries solves the problem for a moment. Yet, unfortunately, only up to the next restart of thunderbird.

Inbox Zero in Outlook

One might like Microsoft's range of products or not; at work the mail system is Outlook. Period. And combined with Exchange this gives quite a capable system for collaborative work. But as always there is room for improvement. So, in the following, my customizations to Microsoft Outlook 2013 for Inbox Zero are shown.

All mail in one single folder.

This is the normal situation which many people work with. As a result, such a single folder contains mails that are read or unread. Mails which are there just for archive purposes, mails that act as reminder for something and finally mails which need to be acted upon or which need to be answered.

Why Inbox Zero?

For me such a system only works well if the daily number of messages is low. Yet, with a busy work schedule and lots of mails as I encounter from time to time, this is seldom the case.

If the number of mails increases or if mails often need to be dealt with in a longer time frame – e. g. an answer needs to be sent next month – I quickly need to rescan hundreds of messages and hunt for the mails which are still important. This is time consuming and annoying.

Enter Inbox Zero. Inbox Zero is an action-based concept with the objective to deal with all mails once. And only once. Ever. All mails that can neither be deleted nor be dealt with are either deferred and saved as action items or are delegated. So much about the basic idea.

My Interpretation of Inbox Zero

As with any theory, there is more than one way of implementation. Over time and inspired by the very clever Defer feature available in the ancient FIDO software Crosspoint, my personal take on Inbox Zero developed into the following:

  • I do not treat the Inbox as a physical folder but use Filtered Views or Smart Search Folders.
  • The actual Inbox View is periodically reviewed. Upon review:
    1. All newly arrived and hence unread mails are briefly read.
    2. All read mails that do not require any action or that are for information purposes only are moved elsewhere.
    3. All read mails that only require a quick answer or that can be dealt with in a couple of minutes are processed right away and are then moved elsewhere.
    4. All read mails that I will deal with today or that provide information which I will need to access today again are left in the Inbox.
    5. All read mails that I will deal with later on are marked as deferred with an appropriate due date.

As my primary Inbox View is configured such that only mails are shown that are

  • (unread)
  • or (read and not deferred)
  • or deferred and (due or overdue) and (not finished),

I always see only the information which is of importance to me right now. Deferred mails come back into view at due time and can then be processed or be deferred again. Completed mails are simply flagged as completed and are no longer shown.
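
Written down as code, the view condition looks like this. The sketch translates the three bullets above literally; the Mail class and its field names are made up for illustration, and a mail counts as deferred as soon as it has a due date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Mail:
    read: bool
    due: Optional[date] = None  # set when the mail is deferred
    finished: bool = False


def in_inbox_view(mail, today):
    """Return True if the mail is shown in the primary Inbox View."""
    if not mail.read:
        return True  # unread
    if mail.due is None:
        return True  # read and not deferred
    # deferred: only shown when due or overdue and not finished
    return mail.due <= today and not mail.finished
```

A read mail deferred to next month is therefore hidden until its due date arrives, and it disappears for good once it is flagged as finished.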

Read mails left in the Inbox for easy reference need to be moved once they are no longer needed for reference, or they can be deferred. Periodically, all completed mails which are still present in the Inbox yet are not shown in the Inbox View are moved to the correct locations.

For me, this approach works even under heavy mail load. It drastically reduces the clutter in the Inbox and gives a nice overview over all relevant and required actions. The maintenance tasks, i.e. archiving of completed mail actions and moving reference mails, are either easily done or are done periodically in burst mode.

Inbox Zero in Outlook

So, how does this work in Outlook? Basically, I use a custom view to dynamically select only the mail items relevant for me now. For this rather complex query the Outlook SQL (DASL) notation is the only option I am aware of.

After lots of experimenting I use a query as follows:

 (
  "urn:schemas:httpmail:read" = 0
  OR "urn:schemas:httpmail:messageflag" IS NULL 
  OR NOT "urn:schemas:httpmail:messageflag" > 0 
  OR ("urn:schemas:httpmail:reply-by" <= 'morgen' AND NOT "urn:schemas:httpmail:reply-by" IS NULL)
  OR ("urn:schemas:httpmail:messageflag" > 0 AND "urn:schemas:httpmail:reply-by" IS NULL)
 ) AND (
  "http://schemas.microsoft.com/mapi/proptag/0x10900003" IS NULL 
  OR NOT "http://schemas.microsoft.com/mapi/proptag/0x10900003" < 2
 )

Here, the SQL OR statement joins all mails that 1) are unread, 2) have no message flag, 3) had a flag that was set and then deleted, 4) have a due date of tomorrow ('morgen' in the query, German for tomorrow) or earlier, or 5) have a flag or category yet no due date. Of all these mails, only those are shown (AND statement) for which the unofficial proptag 0x10900003 A) is not set at all or B) does not mark the mail as done or completed.

Voila. Save and assign to a custom folder representation and you are ready to do Inbox Zero the incunabulum way.

Endnote

For the assignment of due dates, categories etc. another blog post will follow.

And – btw – one of the first steps on any new computer is to disable the new mail notification actions. For me, email is an asynchronous means of communication where I decide – based on my current tasks and mood – when and how often I want to check my mail. For urgent matters, there is always the phone or personal communication.