Feature/#117 tokenize tei
Feature
Summary
This MR provides an in-memory tokenization of all relevant text nodes. Each word is wrapped in a `tei:seg` with a unique ID that reflects its node identity.
The tokenization is applied before both the HTML creation and the annotation creation, so that words in the text panel can reference entities of the AnnotationAPI and vice versa.
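To illustrate the idea for reviewers, here is a minimal Python sketch of this kind of in-memory tokenization. It is not the code of this MR. The function names and the counter-based ID scheme (`t1`, `t2`, ...) are assumptions made for illustration, whereas the actual IDs reflect node identity. The sketch also tokenizes every text node instead of only the relevant ones.

```python
import itertools
import re
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def split_into_segs(text, counter):
    """Return (leading_whitespace, [seg elements]) for one text node.

    Whitespace between words is kept as the tail of the preceding seg,
    so the text content of the document stays unchanged.
    """
    leading, segs = "", []
    for token in re.findall(r"\s+|\S+", text or ""):
        if token.isspace():
            if segs:
                segs[-1].tail = (segs[-1].tail or "") + token
            else:
                leading += token
        else:
            seg = ET.Element(f"{{{TEI_NS}}}seg")
            seg.text = token
            # Hypothetical ID scheme: a running counter in document order.
            seg.set(XML_ID, f"t{next(counter)}")
            segs.append(seg)
    return leading, segs


def tokenize(element, counter):
    """Wrap every word below `element` in a tei:seg, in place."""
    original_children = list(element)
    leading, new_children = split_into_segs(element.text, counter)
    element.text = leading or None
    for child in original_children:
        tokenize(child, counter)  # words inside the child come first
        leading, segs = split_into_segs(child.tail, counter)
        child.tail = leading or None
        new_children.append(child)
        new_children.extend(segs)
    element[:] = new_children


if __name__ == "__main__":
    sample = (
        '<ab xmlns="http://www.tei-c.org/ns/1.0">'
        "In the <persName>Ahikar</persName> story</ab>"
    )
    root = ET.fromstring(sample)
    tokenize(root, itertools.count(1))
    print(ET.tostring(root, encoding="unicode"))
```

Because the tokenized tree is built once and then handed to both consumers, the HTML serialization and the annotation creation see the same IDs, which is what makes the cross-references possible.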
Compliance with “Definition of Done”
- Unit tests passed
- Code reviewed
- Product Owner accepts the User Story
Documentation
- I provided my functions with appropriate documentation
Tests
Are we able to test this new feature?
- Yes, everything can be done via unit tests.
- Yes, you can test by following these steps:
  - build the repo locally
  - navigate to `${IP}/exist/restxq/textapi/ahikar/content/sample_teixml-82a.html`
  - investigate the HTML with your developer tools. Each relevant word is wrapped in a separate `xhtml:seg` with an ID.
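The manual inspection in the last step could also be scripted. The following Python sketch is an assumption about how such a check might look, not part of this MR. It takes the HTML as a string, for example saved from the URL above, and assumes the elements are serialized as `seg` or `xhtml:seg` with a plain `id` attribute. `SegCollector` and `check_segments` are hypothetical names.

```python
from collections import Counter
from html.parser import HTMLParser


class SegCollector(HTMLParser):
    """Collect the id attributes of all seg elements in an HTML page."""

    def __init__(self):
        super().__init__()
        self.ids = []
        self.missing = 0

    def handle_starttag(self, tag, attrs):
        # Assumption: the element is named 'seg', optionally with a prefix.
        if tag.split(":")[-1] != "seg":
            return
        seg_id = dict(attrs).get("id")
        if seg_id:
            self.ids.append(seg_id)
        else:
            self.missing += 1


def check_segments(html):
    """Report how many segs carry an ID, how many lack one, and duplicates."""
    collector = SegCollector()
    collector.feed(html)
    collector.close()
    duplicates = sorted(i for i, n in Counter(collector.ids).items() if n > 1)
    return {
        "count": len(collector.ids),
        "missing": collector.missing,
        "duplicates": duplicates,
    }


if __name__ == "__main__":
    page = '<div><seg id="a">In</seg> <seg id="b">the</seg></div>'
    print(check_segments(page))
```

A page that passes would report a non-zero `count`, zero `missing`, and an empty `duplicates` list.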
Changelog
- I added a statement to the CHANGELOG.
Version number
- I bumped the version number in `build.properties`.
Closes
Closes #117.
Logs and Screenshots
Edited by Michelle Weidling