Feature/#117 tokenize tei

Michelle Weidling requested to merge feature/#117-tokenize-tei into develop



This MR adds an in-memory tokenization of all relevant text nodes. Each word is wrapped in a tei:seg element with a unique ID that reflects its node identity.

The tokenization is applied before both the HTML creation and the annotation creation, so that words in the text panel can reference entities of the AnnotationAPI and vice versa.
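
To illustrate the idea, here is a minimal sketch in Python (used here purely for illustration; it is not the code of this MR). It builds a tokenized copy of an element in memory and wraps each whitespace-separated word in a tei:seg whose xml:id is derived from the position of the text node. The function names, the ID scheme (`t.1_0_w1` etc.) and the sample sentence are assumptions made for this example, and whitespace handling is deliberately simplified.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def _segments(text, prefix):
    """Return one tei:seg element per whitespace-separated word in text."""
    segs = []
    for i, word in enumerate(text.split(), start=1):
        # Hypothetical ID scheme: text-node position plus word index.
        seg = ET.Element(f"{{{TEI_NS}}}seg", {XML_ID: f"{prefix}_w{i}"})
        seg.text = word
        seg.tail = " "  # simplified: always one space after a word
        segs.append(seg)
    return segs


def tokenize(elem, path="t"):
    """Return a tokenized copy of elem; the input tree is not modified."""
    new = ET.Element(elem.tag, dict(elem.attrib))
    # Text node before the first child element.
    new.extend(_segments(elem.text or "", f"{path}_0"))
    for n, child in enumerate(elem, start=1):
        # Recurse into the child, then tokenize the text node that follows it.
        new.append(tokenize(child, f"{path}.{n}"))
        new.extend(_segments(child.tail or "", f"{path}_{n}"))
    return new


if __name__ == "__main__":
    # Invented sample input, not taken from the project's test data.
    sample = ET.fromstring(
        '<p xmlns="http://www.tei-c.org/ns/1.0">In the <hi>days of</hi> Ahikar</p>'
    )
    print(ET.tostring(tokenize(sample), encoding="unicode"))
```

Because the same ID is computed deterministically from the node position, a serialization step and an annotation step that both run on the tokenized tree would see identical IDs, which is what allows the two to reference each other.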

Compliance with “Definition of Done”

  • Unit tests passed
  • Code reviewed
  • Product Owner accepts the User Story


  • I provided my functions with appropriate documentation


Are we able to test this new feature?

  • Yes, everything can be done via unit tests.
  • Yes, you can test by following these steps:
    • build the repo locally
    • navigate to ${IP}/exist/restxq/textapi/ahikar/content/sample_teixml-82a.html
    • inspect the HTML with your developer tools. Each relevant word is wrapped in a separate xhtml:seg with an ID.


  • I added a statement to the CHANGELOG.

Version number

  • I bumped the version number in


Closes #117 (closed).

Logs and Screenshots

/cc Mathias Göbel, Frank Schneider, Michelle Weidling

Edited by Michelle Weidling