From ab263aa0b54885b04813b6026bf5cd8680a26df2 Mon Sep 17 00:00:00 2001
From: Christian Boulanger <boulanger@lhlt.mpg.de>
Date: Mon, 30 Sep 2024 11:15:58 +0200
Subject: [PATCH] Update documentation

---
 convert-anystyle-data/anystyle-to-tei.ipynb | 25 +++++++++++++--------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/convert-anystyle-data/anystyle-to-tei.ipynb b/convert-anystyle-data/anystyle-to-tei.ipynb
index bc825fc..5297f32 100644
--- a/convert-anystyle-data/anystyle-to-tei.ipynb
+++ b/convert-anystyle-data/anystyle-to-tei.ipynb
@@ -3,7 +3,7 @@
   {
    "cell_type": "markdown",
    "source": [
-    "# Convert AnyStyle GS to TEI (`<bibl>`/`<biblStruct>`) GS \n",
+    "# Convert AnyStyle to TEI-bibl data \n",
     "\n",
     "References: \n",
     "- https://www.tei-c.org/release/doc/tei-p5-doc/en/html/CO.html#COBI (Overview)\n",
@@ -14,19 +14,28 @@
     "- https://grobid.readthedocs.io/en/latest/training/Bibliographical-references/ (Grobid examples using `<bibl>`)\n",
     "\n",
     "\n",
-    "We use `<bibl>` here instead of `<biblStruct>` because it is more loosely-structured and allows for a more flat datastructure. \n",
+    "We use `<bibl>` here for marking up the citation data. These annotations can then be further processed:\n",
+    "- [to Gold Standard based on `<biblStruct>`](tei-to-biblstruct-gs.ipynb)\n",
+    "- [to bibliographic data formats](tei-to-bibformats.ipynb)\n",
+    "- [to the prodigy annotation format](tei-to-prodigy.ipynb)\n",
     "\n",
-    "Todo:\n",
-    "- BiblStruct mit der übergeordneten <listBibl n=\"fußnote\" src=\"Input\">\n",
-    "\n",
-    "\n",
-    "## Collect metadata on TEI `<bibl>` tags"
+    "Code was written with assistance from ChatGPT 4. "
    ],
    "metadata": {
     "collapsed": false
    },
    "id": "4c77ab592c98dfd"
   },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "## Collect metadata on TEI `<bibl>` tags"
+   ],
+   "metadata": {
+    "collapsed": false
+   },
+   "id": "dd3645db958007fe"
+  },
   {
    "cell_type": "markdown",
    "source": [
@@ -79,8 +88,6 @@
     "import re\n",
     "from tqdm.notebook import tqdm\n",
     "\n",
-    "\n",
-    "# written by GPT-4\n",
     "def extract_headings_and_links(tag, doc_heading, doc_base_url):\n",
     "    # Extract heading numbers from the document\n",
     "    heading_numbers = re.findall(r'\\d+(?:\\.\\d+)*', doc_heading)\n",
-- 
GitLab