## Recreate input data from TEI/bibl and compare with AnyStyle input data
To see how much information is lost, or which errors are introduced, when the AnyStyle data is translated to TEI, we compare the input data generated from the (lossless) AnyStyle markup with the input data "reverse-engineered" from the TEI, and save a character-level diff in the `html` directory.
The comparison uses a copy of the files stored in `./tei-bibl-corrected`, so that they are not overwritten when the previous cell is rerun and can be manually corrected to match the original data.
For easier viewing, the result is published on GitLab Pages (see the links in the output).
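The round trip described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the notebook's implementation: it uses `xml.etree.ElementTree` and a regular expression as stand-ins for `lxml` and `lib.string.remove_whitespace`, the helper names `footnotes_from_tei` and `html_diff` are hypothetical, and the sample TEI and AnyStyle strings are invented for the example.

```python
import difflib
import re
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def footnotes_from_tei(tei_xml: str) -> list[str]:
    """Return the whitespace-normalized text of each <note type="footnote">."""
    root = ET.fromstring(tei_xml)
    notes = root.findall(".//tei:note[@type='footnote']", TEI_NS)
    # itertext() concatenates the text of the note and of all nested elements
    return [re.sub(r"\s+", " ", "".join(n.itertext())).strip() for n in notes]


def html_diff(original: list[str], recreated: list[str]) -> str:
    """Render a side-by-side HTML page; intraline changes are highlighted."""
    return difflib.HtmlDiff(wrapcolumn=80).make_file(
        original,
        recreated,
        fromdesc="AnyStyle input",
        todesc="Recreated from TEI",
    )


# Hypothetical sample data, not taken from the corpus
sample_tei = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
  <note type="footnote" n="1">A. Author, <title>Some Title</title> (1999) 12.</note>
  <note type="footnote" n="2">B. Writer,
     Another Title (2001).</note>
</body></text></TEI>"""
anystyle_input = [
    "A. Author, Some Title (1999), 12.",  # the comma after the year is lost in the TEI
    "B. Writer, Another Title (2001).",
]

recreated = footnotes_from_tei(sample_tei)
report = html_diff(anystyle_input, recreated)
```

Writing `report` to a file such as `html/<id>.diff.html` would give one page per document. `HtmlDiff` aligns the two lists line by line and marks the changed characters within each pair, which suits footnote lists where most lines match and only a few characters differ.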
%% Cell type:code id:4c19609699dc79c tags:
``` python
from lxml import etree
import glob
import os
import json
import regex as re
from lib.string import remove_whitespace
from difflib import HtmlDiff
from IPython.display import display, HTML, Markdown


def tei_to_ground_truth_input(tei_xml_doc):
    """
    Extract the original footnote strings from the <note> elements in a given TEI document and return a list of strings
    """
    root = etree.fromstring(tei_xml_doc)
    ground_truth_list = []
    ns = {"tei": "http://www.tei-c.org/ns/1.0"}
    # iterate over the <note type="footnote"> elements
display(Markdown(f'Extracted and compared input data for {id} ([See diff](https://experiments-boulanger-27b5c1c5c975b0350675064f0f85580e618945eef.pages.gwdg.de/convert-anystyle-data/diffs/{id}.diff.html))'))
```
%% Output
Extracted and compared input data for 10.1111_1467-6478.00057 ([See diff](https://experiments-boulanger-27b5c1c5c975b0350675064f0f85580e618945eef.pages.gwdg.de/convert-anystyle-data/diffs/10.1111_1467-6478.00057.diff.html))
Extracted and compared input data for 10.1111_1467-6478.00080 ([See diff](https://experiments-boulanger-27b5c1c5c975b0350675064f0f85580e618945eef.pages.gwdg.de/convert-anystyle-data/diffs/10.1111_1467-6478.00080.diff.html))
Extracted and compared input data for 10.1515_zfrs-1980-0103 ([See diff](https://experiments-boulanger-27b5c1c5c975b0350675064f0f85580e618945eef.pages.gwdg.de/convert-anystyle-data/diffs/10.1515_zfrs-1980-0103.diff.html))
Extracted and compared input data for 10.1515_zfrs-1980-0104 ([See diff](https://experiments-boulanger-27b5c1c5c975b0350675064f0f85580e618945eef.pages.gwdg.de/convert-anystyle-data/diffs/10.1515_zfrs-1980-0104.diff.html))