Feature

Summary

This MR adds an in-memory tokenization of all relevant text nodes. Each word is wrapped in a tei:seg with a unique ID that reflects the identity of its node.
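
A minimal Python sketch of the idea, assuming the TEI is held in an `xml.etree.ElementTree` tree. Every text node below an element is split on whitespace and each word becomes a `tei:seg`. This is not the XQuery code in this MR. The ID scheme (`n1.2_t0_w1` = element path, text-node index, word index), the helper names `tokenize` and `_segs`, and the choice to tokenize every text node rather than only the relevant ones are all hypothetical.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
SEG = f"{{{TEI_NS}}}seg"
WORD = re.compile(r"\S+")  # assumption: a "word" is any run of non-whitespace


def _segs(text: str, prefix: str) -> tuple[str, list[ET.Element]]:
    """Split one text node into tei:seg elements.

    Returns the whitespace before the first word plus the segs; whitespace
    between and after words is kept in each seg's tail.
    """
    segs: list[ET.Element] = []
    lead, pos = text, 0
    for i, m in enumerate(WORD.finditer(text)):
        if i == 0:
            lead = text[: m.start()]
        else:
            segs[-1].tail = text[pos : m.start()]
        seg = ET.Element(SEG, {XML_ID: f"{prefix}_w{i + 1}"})
        seg.text = m.group()
        segs.append(seg)
        pos = m.end()
    if segs:
        segs[-1].tail = text[pos:] or None
    return lead, segs


def tokenize(elem: ET.Element, path: str = "n1") -> None:
    """Wrap every word below `elem` in a tei:seg, in place.

    Hypothetical ID scheme: <element path>_t<text-node index>_w<word index>,
    so each ID is derived from the position of the word's text node.
    """
    children = list(elem)  # original child elements, before any segs exist
    new_children: list[ET.Element] = []

    # Text node 0 is elem.text, the text before the first child element.
    lead, segs = _segs(elem.text or "", f"{path}_t0")
    elem.text = lead or None
    new_children.extend(segs)

    for k, child in enumerate(children, start=1):
        tail = child.tail or ""  # text node k is the tail of child k
        tokenize(child, f"{path}.{k}")
        lead, segs = _segs(tail, f"{path}_t{k}")
        child.tail = lead or None
        new_children.append(child)
        new_children.extend(segs)

    elem[:] = new_children  # interleave original children and new segs


if __name__ == "__main__":
    ET.register_namespace("tei", TEI_NS)
    p = ET.fromstring(f'<p xmlns="{TEI_NS}">Hello <hi>big</hi> world</p>')
    tokenize(p)
    print(ET.tostring(p, encoding="unicode"))
```

Keeping whitespace in the tails means the text content of the tree is unchanged by tokenization; only the markup around each word is added.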

This tokenization is applied before both the HTML creation and the annotation creation, so that words in the text panel can reference entities of the AnnotationAPI and vice versa.
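
Because both outputs are generated from the same tokenized tree, a word carries the same ID in the HTML and in the annotation data. A minimal Python sketch of that linkage, assuming an already tokenized TEI fragment. `render_html`, `make_annotations`, the `anno-` prefix and the annotation fields are hypothetical and do not reproduce the actual HTML transformation or the AnnotationAPI format.

```python
from __future__ import annotations

import html
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
SEG = f"{{{TEI_NS}}}seg"


def words(tei: ET.Element) -> list[tuple[str, str]]:
    """(xml:id, word) for every tei:seg, in document order."""
    return [(s.get(XML_ID, ""), s.text or "") for s in tei.iter(SEG)]


def render_html(tei: ET.Element) -> str:
    """Hypothetical text-panel HTML: one span per seg, reusing its xml:id.

    Words are joined with single spaces for brevity; other markup is dropped.
    """
    return " ".join(
        f'<span id="{html.escape(i)}">{html.escape(w)}</span>'
        for i, w in words(tei)
    )


def make_annotations(tei: ET.Element, entities: dict[str, str]) -> list[dict[str, str]]:
    """Hypothetical annotation items whose target is the annotated word's id."""
    return [
        {"id": f"anno-{i}", "target": i, "body": entities[w]}
        for i, w in words(tei)
        if w in entities
    ]


if __name__ == "__main__":
    tei = ET.fromstring(
        f'<p xmlns="{TEI_NS}"><seg xml:id="n1_t0_w1">Ahikar</seg> '
        '<seg xml:id="n1_t0_w2">spoke</seg></p>'
    )
    page = render_html(tei)
    annos = make_annotations(tei, {"Ahikar": "person"})
    # Annotation -> text panel: every target is an id present in the HTML.
    for a in annos:
        print(a["target"], f'id="{a["target"]}"' in page)
    # Text panel -> annotation: look annotations up by the span id.
    by_target = {a["target"]: a for a in annos}
    print(by_target.get("n1_t0_w1"))
```

The ordering is the design point: if each transformation tokenized on its own with different rules, the IDs could diverge and the cross-references would no longer resolve.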

Compliance with “Definition of Done”

Unit tests passed
Code reviewed
Product Owner accepts the User Story

Documentation

I provided my functions with appropriate documentation

Tests

Are we able to test this new feature?

Yes, everything can be done via unit tests.
Yes, you can test by following these steps:
- build the repo locally
- navigate to ${IP}/exist/restxq/textapi/ahikar/content/sample_teixml-82a.html
- inspect the HTML with your browser's developer tools. Each relevant word is wrapped in a separate xhtml:seg with an ID.
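
Developer tools are the quickest check. For a repeatable one, a minimal Python sketch (standard library only) can scan saved page source and confirm that the IDs are unique. `check_ids` and `IdCollector` are hypothetical helpers, not part of this repo; filtering by tag name is optional because the check does not depend on which element wraps the words.

```python
from __future__ import annotations

from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Records (tag, id) for every start tag that carries an id attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.found: list[tuple[str, str]] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        # HTMLParser reports tag names in lowercase.
        for name, value in attrs:
            if name == "id" and value is not None:
                self.found.append((tag, value))


def check_ids(page: str, tag: str | None = None) -> list[str]:
    """Return the ids in document order, optionally only for `tag`.

    Raises ValueError if any (filtered) id occurs more than once.
    """
    parser = IdCollector()
    parser.feed(page)
    parser.close()
    ids = [i for t, i in parser.found if tag is None or t == tag]
    dupes = sorted(i for i, n in Counter(ids).items() if n > 1)
    if dupes:
        raise ValueError(f"duplicate ids: {dupes}")
    return ids
```

For example, save the source of the page above as `page.html` and call `check_ids(open("page.html", encoding="utf-8").read())`; passing a lowercase `tag=` narrows the check to one element name.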

Changelog

I added a statement to the CHANGELOG.

Version number

I bumped the version number in build.properties.

Closes

Closes #117.

Logs and Screenshots

/cc Mathias Göbel, Frank Schneider, Michelle Weidling

Edited Feb 19, 2021 by Michelle Weidling

Feature/#117 tokenize tei