Commit c81b998f
authored 8 months ago by Christian Boulanger (project: experiments)

added references

parent 4365bdaa

Showing 1 changed file with 8 additions and 1 deletion

convert-anystyle-data/anystyle-to-tei.ipynb
View file @ c81b998f
...
...
@@ -3,7 +3,14 @@
  {
  "cell_type": "markdown",
  "source": [
- "# Conversion to a simplified TEI's <bibl> structure\n",
+ "# Conversion to a simplified TEI's <bibl> element\n",
  "\n",
+ "References: \n",
+ "- https://vault.tei-c.de/P5/3.0.0/doc/tei-p5-doc/en/html/CO.html#COBI\n",
+ "- https://vault.tei-c.de/P5/3.0.0/doc/tei-p5-doc/en/html/ref-bibl.html\n",
+ "- https://vault.tei-c.de/P5/3.0.0/doc/tei-p5-doc/en/html/ref-biblStruct.html\n",
+ "\n",
+ "We use <bibl> here instead of <biblStruct> because it is more loosely-structured and allows for a more flat datastructure. \n",
+ "\n",
  "## Convert TEI XSD to LinkML (unfinished)"
  ],
...
...
%% Cell type:markdown id:4c77ab592c98dfd tags:
- # Conversion to a simplified TEI's <bibl> structure
+ # Conversion to a simplified TEI's <bibl> element

References:
- https://vault.tei-c.de/P5/3.0.0/doc/tei-p5-doc/en/html/CO.html#COBI
- https://vault.tei-c.de/P5/3.0.0/doc/tei-p5-doc/en/html/ref-bibl.html
- https://vault.tei-c.de/P5/3.0.0/doc/tei-p5-doc/en/html/ref-biblStruct.html

We use <bibl> here instead of <biblStruct> because it is more loosely structured and allows for a flatter data structure.

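To illustrate what a flat structure can look like, here is a minimal sketch that builds a `<bibl>` whose parts are direct children, with no intermediate wrappers such as the `<monogr>` and `<imprint>` levels that `<biblStruct>` expects. The helper `make_bibl` and the sample reference are hypothetical and not part of the notebook.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialize TEI elements with a default namespace instead of an ns0: prefix.
ET.register_namespace("", TEI_NS)


def make_bibl(fields):
    """Build a flat <bibl>: every (tag, text) pair becomes a direct child."""
    bibl = ET.Element(f"{{{TEI_NS}}}bibl")
    for tag, text in fields:
        child = ET.SubElement(bibl, f"{{{TEI_NS}}}{tag}")
        child.text = text
    return bibl


# Hypothetical reference data, for illustration only.
bibl = make_bibl([
    ("author", "Doe, Jane"),
    ("title", "An Example Title"),
    ("date", "1999"),
])
xml_string = ET.tostring(bibl, encoding="unicode")
print(xml_string)
```

Because all fields sit at the same level, a parsed reference maps naturally onto a list of (tag, text) pairs, which is presumably what makes `<bibl>` convenient for converting parser output.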
## Convert TEI XSD to LinkML (unfinished)
%% Cell type:code id:ff140f40df428a8f tags:
```python
import xmlschema
import os

# cache for local use
if not os.path.isdir("schema/tei"):
    schema = xmlschema.XMLSchema("https://www.tei-c.org/release/xml/tei/custom/schema/xsd/tei_all.xsd")
    schema.export(target='schema/tei', save_remote=True)
schema = xmlschema.XMLSchema("schema/tei/tei_all.xsd")
```
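The cell above follows a download-once, load-locally pattern. A minimal, library-independent sketch of that pattern is shown below; `load_with_cache` and `fake_fetch` are hypothetical names, and the fetch step is stubbed out so the example runs without network access. Unlike the notebook cell, this sketch checks for the cached file rather than the directory, which is an assumption about what is safer when a directory may exist but be incomplete.

```python
import os
import tempfile


def load_with_cache(cache_dir, filename, fetch):
    """Return the cached file content, calling fetch() only on a cache miss."""
    path = os.path.join(cache_dir, filename)
    if not os.path.isfile(path):
        os.makedirs(cache_dir, exist_ok=True)
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(fetch())
    with open(path, encoding="utf-8") as fh:
        return fh.read()


calls = []


def fake_fetch():
    # Stand-in for a remote download; records how often it is invoked.
    calls.append(1)
    return "<schema/>"


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = os.path.join(tmp, "schema", "tei")
    first = load_with_cache(cache_dir, "tei_all.xsd", fake_fetch)
    second = load_with_cache(cache_dir, "tei_all.xsd", fake_fetch)
```

The second call reads from disk, so the stubbed download runs only once.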
%% Cell type:code id:572f566fc9784238 tags:
```python
import xml.etree.ElementTree as ET
tree = ET.parse('schema/tei/tei_all.xsd')
root = tree.getroot()
```
%% Output
<Element '{http://www.w3.org/2001/XMLSchema}schema' at 0x00000201CFBCABB0>
%% Cell type:code id:8065a4946474e2fc tags:
```python
import pandas as pd

namespaces = {'xs': 'http://www.w3.org/2001/XMLSchema'}
bibl_schema = schema.find("tei:bibl")
data_list = []
for child_element in bibl_schema.iterchildren():
    name = child_element.local_name
    doc_node = root.find(f".//xs:element[@name='{name}']/xs:annotation/xs:documentation", namespaces=namespaces)
    if doc_node is not None:
        data_list.append({'name': name, 'documentation': doc_node.text})
df = pd.DataFrame(data_list)
df
```
%% Output
             name                                      documentation
0               g  (character or glyph) represents a glyph, or a ...
1              hi  (highlighted) marks a word or phrase as graphi...
2               q  (quoted) contains material which is distinguis...
3         foreign  (foreign) identifies a word or phrase as belon...
4            emph  (emphasized) marks words or phrases which are ...
..            ...                                                ...
152       writing  (writing) contains a passage of written text r...
153         shift  (shift) marks the point at which some paraling...
154      metamark  contains or describes any kind of graphic or w...
155  notatedMusic  encodes the presence of music notation in a te...
156        figure  (figure) groups elements representing or conta...

[157 rows x 2 columns]
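The documentation lookup in the cell above relies on ElementTree's limited XPath support with a namespace map. The following sketch isolates that step on a tiny inline schema so it can be run without the TEI files; the XSD snippet, its documentation text, and the helper `lookup_doc` are made up for illustration and are not taken from `tei_all.xsd`.

```python
import xml.etree.ElementTree as ET

# Made-up miniature schema; the real tei_all.xsd is far larger.
XSD_SNIPPET = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="author">
    <xs:annotation>
      <xs:documentation>(author) names the author of a work.</xs:documentation>
    </xs:annotation>
  </xs:element>
  <xs:element name="undocumented"/>
</xs:schema>
"""

NS = {"xs": "http://www.w3.org/2001/XMLSchema"}
toy_root = ET.fromstring(XSD_SNIPPET)


def lookup_doc(root, name):
    """Return the first xs:documentation text for the named element, or None."""
    node = root.find(
        f".//xs:element[@name='{name}']/xs:annotation/xs:documentation", NS
    )
    return None if node is None else node.text


docs = {n: lookup_doc(toy_root, n) for n in ("author", "undocumented", "missing")}
```

Elements without an annotation, and names that do not occur at all, both yield `None`, which is why the notebook cell guards with `if doc_node is not None` before appending a row.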
...
...