## Recreate input data from TEI/bibl and compare with AnyStyle input data
To see how much information is lost, or which errors are introduced, when AnyStyle output is converted to TEI, we compare the input data generated from the (lossless) AnyStyle markup with the input data "reverse-engineered" from the TEI, and save a character-level diff in the `html` directory.
The comparison uses copies of the TEI files stored in `./tei-bibl-corrected`, so that they are not overwritten when the previous cell is re-run and so that they can be corrected by hand to match the original data.
For easier viewing, the results are published on GitLab Pages (see the links in the output).
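The character-level diff can be produced with `difflib.HtmlDiff`, which compares the two inputs line by line and, inside each pair of differing lines, highlights the changed characters. The following is a minimal sketch, assuming one footnote string per line; the function name `save_char_diff`, its parameters and the `{doc_id}.diff.html` file naming (modelled on the published links) are hypothetical and not taken from the notebook.

```python
import os
from difflib import HtmlDiff


def save_char_diff(doc_id, anystyle_lines, tei_lines, out_dir="html"):
    """Write a side-by-side HTML diff of two lists of footnote strings.

    Hypothetical helper: `doc_id`, `anystyle_lines`, `tei_lines` and
    `out_dir` are illustrative names, not the notebook's own variables.
    """
    os.makedirs(out_dir, exist_ok=True)
    # HtmlDiff aligns the inputs line by line and marks the differing
    # characters within each changed line; wrapcolumn keeps long footnotes readable
    html = HtmlDiff(wrapcolumn=100).make_file(
        anystyle_lines,
        tei_lines,
        fromdesc="AnyStyle input",
        todesc="Reconstructed from TEI",
    )
    path = os.path.join(out_dir, f"{doc_id}.diff.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return path
```

For example, `save_char_diff("example", ["1 See Smith (1999), p. 12."], ["1 See Smith (1999) p. 12."])` would write `html/example.diff.html` with the missing comma highlighted.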
%% Cell type:code id:4c19609699dc79c tags:
``` python
from lxml import etree
import glob
import os
import json
import regex as re
from lib.string import remove_whitespace
from difflib import HtmlDiff
from IPython.display import display, HTML, Markdown

def tei_to_ground_truth_input(tei_xml_doc):
    """
    Extract the original footnote strings from the <note> elements in a given TEI document and return a list of strings
    """
    root = etree.fromstring(tei_xml_doc)
    ground_truth_list = []
    ns = {"tei": "http://www.tei-c.org/ns/1.0"}
    # iterate over the <note type="footnote"> elements
    for note in root.findall('.//tei:note[@type="footnote"]', namespaces=ns):
        # join the note's own text with that of nested markup (e.g. <bibl>) into one string
        ground_truth_list.append("".join(note.itertext()))
    return ground_truth_list

# `id` holds the document identifier and is set by the per-file loop over
# ./tei-bibl-corrected, which is not shown here
display(Markdown(f'Extracted and compared input data for {id} ([See diff](https://experiments-boulanger-27b5c1c5c975b0350675064f0f85580e618945eef.pages.gwdg.de/convert-anystyle-data/diffs/{id}.diff.html))'))
```
%% Output
Extracted and compared input data for 10.1111_1467-6478.00057 ([See diff](https://experiments-boulanger-27b5c1c5c975b0350675064f0f85580e618945eef.pages.gwdg.de/convert-anystyle-data/diffs/10.1111_1467-6478.00057.diff.html))
Extracted and compared input data for 10.1111_1467-6478.00080 ([See diff](https://experiments-boulanger-27b5c1c5c975b0350675064f0f85580e618945eef.pages.gwdg.de/convert-anystyle-data/diffs/10.1111_1467-6478.00080.diff.html))
Extracted and compared input data for 10.1515_zfrs-1980-0103 ([See diff](https://experiments-boulanger-27b5c1c5c975b0350675064f0f85580e618945eef.pages.gwdg.de/convert-anystyle-data/diffs/10.1515_zfrs-1980-0103.diff.html))
Extracted and compared input data for 10.1515_zfrs-1980-0104 ([See diff](https://experiments-boulanger-27b5c1c5c975b0350675064f0f85580e618945eef.pages.gwdg.de/convert-anystyle-data/diffs/10.1515_zfrs-1980-0104.diff.html))
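The `ns` mapping in the code cell above matters because every TEI element lives in the `http://www.tei-c.org/ns/1.0` namespace: a path without the `tei:` prefix silently matches nothing. The sketch below illustrates this on a tiny hand-written TEI sample (illustrative only, not project data). It uses the standard library's `xml.etree.ElementTree` instead of lxml to stay self-contained; lxml elements accept the same `findall(path, namespaces)` and `itertext()` calls. The name `footnote_strings` is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hand-written sample: one footnote containing <bibl> markup, one non-footnote note
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<note type="footnote" n="1">See <bibl><author>Smith</author>, <title>Law</title> (1999)</bibl>.</note>
<note type="comment">not a footnote</note>
</body></text></TEI>"""


def footnote_strings(tei_xml):
    """Return the plain text of every <note type="footnote"> element."""
    root = ET.fromstring(tei_xml)
    # the "tei:" prefix is resolved through TEI_NS
    notes = root.findall('.//tei:note[@type="footnote"]', TEI_NS)
    # itertext() yields the note's text plus the text and tails of nested elements
    return ["".join(n.itertext()) for n in notes]


print(footnote_strings(SAMPLE))  # → ['See Smith, Law (1999).']
# without the namespace mapping, the unprefixed path finds no elements
print(ET.fromstring(SAMPLE).findall(".//note"))  # → []
```

Because `itertext()` flattens the `<bibl>` markup, the result approximates the original footnote string; whitespace between tags in real TEI files may still need normalizing before the diff.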