Commit 996cdd5f authored by Dennis Neumann

Add documentation

parent 81df187c
......@@ -2,17 +2,28 @@
<!--
This script produces Solr XML documents.
This script produces two types of Solr XML documents, representing letters and individual pages of those letters.
The letter documents contain many metadata fields and two text fields. The metadata fields are
values that are copied from the TEI files. The text fields are described next.
Field 'fulltext'
This is just plain text copied from the complete TEI file. Line breaks and page beginnings are
transformed to whitespace. This field can be useful for searching or for generating highlighting snippets.
Field 'fulltext_html'
This field contains the HTML representation of the text of a TEI document (e. g. a Goethe letter).
The Goethe letters are composed of different parts, for example 'opener', 'closer', 'salute'.
This field contains the HTML representation of the text of a TEI file (e. g. a Goethe letter).
A letter that consists of several pages is split on-the-fly into those pages.
However, the pages are all kept in this one field. The templates that are used here are the same
as the ones for the individual page documents (see below).
The letters are composed of different parts, for example 'opener', 'closer', 'salute'.
All those parts are represented here as <div>'s with the corresponding CSS classes.
The frontend viewer must decide how to format those parts and present them to the user.
Also, the original TEI files contain mark-up for many in-text parts, like dates, names, underlined words, etc.
The original TEI files contain mark-up for many in-text parts, like dates, names, underlined words, etc.
Most of these are also transformed to <div>'s with their own CSS classes.
Although the in-text parts are by nature inline elements, we use <div>'s here and not <span>'s.
The reason is that Solr seems to have problems when highlighting fields that contain <span>'s
......@@ -29,6 +40,12 @@ The text of such TEI elements is enclosed in HTML elements of class 'unknown-ele
Furthermore, a warning message is generated that contains data of the first occurrence of such a new element.
The second kind of document that is produced is the page document.
The resulting pages are in the HTML format.
As the TEI file is processed, the TEI XML structure is split into pages using
the page beginning elements (<pb/>).
Refer to the comments in the code for details of the algorithm used.
-->
......@@ -239,8 +256,15 @@ Furthermore, a warning message is generated that contains data of the first occu
<!-- %%%%%%%%%%%%% page splitting and HTML generating %%%%%%%%%%%%%%%%%%%%%%% -->
<!-- Here we start the page splitting algorithm.
In general, it groups all elements between two <pb/>'s and creates one page for each such group.
Note: We use a trick with a tunnel parameter. The templates that are run from here must use
that tunnel parameter to check whether they should actually execute. -->
<xsl:template match="text" mode="page_splitting">
<!-- We keep the parent of all pages (-> super-parent). -->
<xsl:variable name="context" select="." />
<!-- We make groups of only 'small' nodes, like text and <pb/>.
We need this granularity because a <pb/> can split two text nodes onto different pages. -->
<xsl:for-each-group select="descendant::node()[not(node())]" group-starting-with="pb">
<xsl:if test="self::pb">
<doc>
......@@ -263,7 +287,13 @@ Furthermore, a warning message is generated that contains data of the first occu
<xsl:call-template name="page-beginning-with-possible-link">
<xsl:with-param name="current-pb" select="." />
</xsl:call-template>
<!-- We run the templates of all the super-parent's children... -->
<xsl:apply-templates select="$context/*" mode="page_splitting">
<!-- ..., but we use a trick with a tunnel parameter to restrict the actually executed templates
to the ones relevant for the current page.
We choose the grouped nodes, as well as all their ancestors.
We need all the ancestors to construct correctly opened and closed elements
that otherwise would be cut in two by the <pb/>. -->
<xsl:with-param name="restricted-to" select="current-group()/ancestor-or-self::node()" tunnel="yes" />
</xsl:apply-templates>
</div>
......
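The page splitting described in the comments above can be illustrated outside XSLT. The following is a minimal Python sketch of the same idea, not the stylesheet's actual code: it groups the leaf nodes starting at each <pb/>, collects each group's ancestors-or-self as the "restricted-to" set, and re-renders the whole tree while skipping every node outside that set. The function names (`leaves`, `group_starting_with_pb`, `ancestors_or_self`, `render`, `split_pages`) and the simplified rule "every element becomes a <div> with its tag name as class" are assumptions made for illustration only.

```python
from xml.dom import minidom
from xml.dom.minidom import Node


def leaves(node):
    """Yield descendants that have no child nodes, in document order
    (roughly descendant::node()[not(node())])."""
    for child in node.childNodes:
        if child.hasChildNodes():
            yield from leaves(child)
        else:
            yield child


def is_pb(node):
    return node.nodeType == Node.ELEMENT_NODE and node.tagName == "pb"


def group_starting_with_pb(root):
    """Start a new group at every <pb/>, like group-starting-with="pb"."""
    groups = []
    for leaf in leaves(root):
        if is_pb(leaf) or not groups:
            groups.append([])
        groups[-1].append(leaf)
    return groups


def ancestors_or_self(nodes):
    """Identity set of the given nodes and all of their ancestors."""
    allowed = set()
    for node in nodes:
        while node is not None:
            allowed.add(id(node))
            node = node.parentNode
    return allowed


def render(node, allowed):
    """Render a node only if it is in the restricted set (the 'tunnel' check)."""
    if id(node) not in allowed:
        return ""
    if node.nodeType == Node.TEXT_NODE:
        return node.data
    if node.nodeType != Node.ELEMENT_NODE or is_pb(node):
        return ""
    inner = "".join(render(child, allowed) for child in node.childNodes)
    return f'<div class="{node.tagName}">{inner}</div>'


def split_pages(xml):
    """Return one HTML string per page of the given XML fragment."""
    text = minidom.parseString(xml).documentElement
    pages = []
    for group in group_starting_with_pb(text):
        if not is_pb(group[0]):
            continue  # nodes before the first <pb/> belong to no page
        allowed = ancestors_or_self(group)
        pages.append("".join(render(c, allowed) for c in text.childNodes))
    return pages


if __name__ == "__main__":
    sample = "<text><pb/><p>first <hi>bold<pb/>more</hi> end</p></text>"
    for page in split_pages(sample):
        print(page)
```

In the sample, the second <pb/> sits inside <hi>, which itself sits inside <p>. Because the ancestors of each group are included in the restricted set, both pages get their own correctly opened and closed "p" and "hi" wrappers instead of a cut-off element.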