data import: improve performance

following #193 (closed)

we speed up the import process by caching the preprocessed documents, so they do not have to be preprocessed again on every import. this is also convenient for developers, who can inspect the intermediate format directly in the cache.
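a minimal sketch of the caching idea, not the code in this merge request: the function name `cached_preprocess`, the `cache_dir` layout, the `.cache` suffix and the content-hash key are all assumptions made for illustration. it stores each preprocessed document on disk and reuses it as long as the source is unchanged.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_preprocess(
    source: Path,
    cache_dir: Path,
    preprocess: Callable[[str], str],
) -> str:
    """Return the preprocessed form of `source`, reusing a cached copy if present.

    Hypothetical helper: `preprocess` stands in for whatever step produces
    the intermediate format during import.
    """
    raw = source.read_text(encoding="utf-8")

    # Key the cache entry on a hash of the content (assumption), so an edited
    # source document gets a new entry instead of a stale result.
    key = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    cache_file = cache_dir / f"{source.stem}-{key[:16]}.cache"

    if cache_file.exists():
        # Cache hit: skip the expensive preprocessing step.
        return cache_file.read_text(encoding="utf-8")

    # Cache miss: preprocess once and keep the intermediate format on disk,
    # where developers can also open and inspect it.
    result = preprocess(raw)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(result, encoding="utf-8")
    return result
```

in this sketch a changed source never overwrites an old entry, so the cache directory grows over time and would need occasional cleanup.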

Edited by Mathias Goebel