Commit c81b998f authored by Christian Boulanger

added references

parent 4365bdaa
%% Cell type:markdown id:4c77ab592c98dfd tags:
# Conversion to a simplified TEI <bibl> element
References:
- https://vault.tei-c.de/P5/3.0.0/doc/tei-p5-doc/en/html/CO.html#COBI
- https://vault.tei-c.de/P5/3.0.0/doc/tei-p5-doc/en/html/ref-bibl.html
- https://vault.tei-c.de/P5/3.0.0/doc/tei-p5-doc/en/html/ref-biblStruct.html
We use <bibl> here instead of <biblStruct> because it is more loosely structured and allows for a flatter data structure.
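To illustrate what "flatter" means in practice, here is a minimal sketch that reads the direct children of a <bibl> element into a flat list of rows. The sample record and the helper name `flatten_bibl` are hypothetical and only serve as an illustration; they are not taken from the notebook's data.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical sample record, made up for this illustration
sample = f"""<bibl xmlns="{TEI_NS}">
  <author>Jane Doe</author>
  <title level="a">An Example Article</title>
  <title level="j">Journal of Examples</title>
  <date>1999</date>
  <biblScope unit="page">1-10</biblScope>
</bibl>"""


def flatten_bibl(xml_text):
    """Return one (local_name, attributes, text) tuple per direct child of <bibl>."""
    bibl = ET.fromstring(xml_text)
    rows = []
    for child in bibl:
        # Strip the "{namespace}" prefix that ElementTree puts in front of tag names
        local_name = child.tag.split("}", 1)[-1]
        rows.append((local_name, dict(child.attrib), (child.text or "").strip()))
    return rows


rows = flatten_bibl(sample)
for row in rows:
    print(row)
```

Because all descriptive elements are siblings directly under <bibl>, a single loop is enough here; a <biblStruct> would typically require descending into nested elements such as <analytic>, <monogr> and <imprint>.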
## Convert TEI XSD to LinkML (unfinished)
%% Cell type:code id:ff140f40df428a8f tags:
``` python
import os

import xmlschema

# Download the TEI schema once and cache it locally for later runs
if not os.path.isdir("schema/tei"):
    schema = xmlschema.XMLSchema("https://www.tei-c.org/release/xml/tei/custom/schema/xsd/tei_all.xsd")
    schema.export(target='schema/tei', save_remote=True)
# Always load the schema from the local cache
schema = xmlschema.XMLSchema("schema/tei/tei_all.xsd")
```
%% Cell type:code id:572f566fc9784238 tags:
``` python
import xml.etree.ElementTree as ET
tree = ET.parse('schema/tei/tei_all.xsd')
root = tree.getroot()
```
%% Output
<Element '{http://www.w3.org/2001/XMLSchema}schema' at 0x00000201CFBCABB0>
%% Cell type:code id:8065a4946474e2fc tags:
``` python
import pandas as pd

namespaces = {'xs': 'http://www.w3.org/2001/XMLSchema'}
bibl_schema = schema.find("tei:bibl")
data_list = []
# Collect the documentation string of every element allowed inside <bibl>
for child_element in bibl_schema.iterchildren():
    name = child_element.local_name
    doc_node = root.find(f".//xs:element[@name='{name}']/xs:annotation/xs:documentation", namespaces=namespaces)
    if doc_node is not None:
        data_list.append({'name': name, 'documentation': doc_node.text})
df = pd.DataFrame(data_list)
df
```
%% Output
name documentation
0 g (character or glyph) represents a glyph, or a ...
1 hi (highlighted) marks a word or phrase as graphi...
2 q (quoted) contains material which is distinguis...
3 foreign (foreign) identifies a word or phrase as belon...
4 emph (emphasized) marks words or phrases which are ...
.. ... ...
152 writing (writing) contains a passage of written text r...
153 shift (shift) marks the point at which some paraling...
154 metamark contains or describes any kind of graphic or w...
155 notatedMusic encodes the presence of music notation in a te...
156 figure (figure) groups elements representing or conta...
[157 rows x 2 columns]
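One possible next step towards LinkML, given as a rough sketch and not as the notebook's actual method: turn each name/documentation pair into a LinkML slot and attach all slots to a single class. The function name `to_linkml_schema`, the class name `Bibl`, the schema `id`, the `range` and `multivalued` settings and the sample records are all assumptions made for illustration.

```python
import json


def to_linkml_schema(records, class_name="Bibl"):
    """Build a LinkML-style schema dict with one slot per child element."""
    slots = {}
    for record in records:
        slots[record["name"]] = {
            "description": record["documentation"],
            "range": "string",    # assumption: treat every element as plain text
            "multivalued": True,  # assumption: child elements may repeat
        }
    return {
        "id": "https://example.org/tei-bibl",  # placeholder identifier
        "name": "tei_bibl",
        "classes": {class_name: {"slots": list(slots)}},
        "slots": slots,
    }


# Hypothetical records standing in for the rows of the DataFrame
records = [
    {"name": "author", "documentation": "placeholder description of author"},
    {"name": "title", "documentation": "placeholder description of title"},
]

linkml_schema = to_linkml_schema(records)
# JSON is valid YAML, so this output can be saved as a .yaml schema file
print(json.dumps(linkml_schema, indent=2))
```

In the notebook, `df.to_dict("records")` would yield records in the same shape from the DataFrame built above.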