Conversion of AnyStyle training data to other formats

This subrepo contains code to convert the existing training data from the AnyStyle formats (XML, TTX) into other formats, either ones that can be used with other tools such as Prodigy or ones that are more standardized (such as LinkML).

Note: Automatically generating a LinkML schema from the converted JSONL files with the schema-automator tool pulls in a very large dependency tree. Use a virtual environment to avoid cluttering your Python installation.

Content of directories:

  • in: AnyStyle Ground Truth for document-level (ttx) and footnote-level (xml) reference information
  • jsonl: AnyStyle footnote GT converted to JSONL objects with "in" (the complete footnote as a string) and "out" (the structured data) fields
  • json: JSON files containing a flat list of objects with the structured data of the references in the footnotes
  • schema: LinkML schema, autogenerated from the json files, not yet annotated.
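
To illustrate how the jsonl and json directories relate, here is a minimal Python sketch that reads JSONL records with "in" and "out" fields and collects the structured data into one flat list. It is not the conversion code of this repository. The function name flatten_jsonl, the sample records, and the keys inside "out" are hypothetical, and the sketch assumes that "out" holds either a single object or a list of objects.

```python
import json
from typing import Any, Iterable


def flatten_jsonl(lines: Iterable[str]) -> list[dict[str, Any]]:
    """Collect the structured "out" data of every JSONL record into one flat list."""
    references: list[dict[str, Any]] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        out = record["out"]
        # Assumption: "out" is either one object or a list of objects.
        if isinstance(out, list):
            references.extend(out)
        else:
            references.append(out)
    return references


# Hypothetical sample records; the keys inside "out" are invented for illustration.
sample = [
    '{"in": "Doe, J. (2001). A Title.", '
    '"out": [{"author": "Doe, J.", "date": "2001", "title": "A Title"}]}',
    '{"in": "Ibid., p. 3.", "out": {"note": "Ibid.", "pages": "3"}}',
]

flat = flatten_jsonl(sample)
print(json.dumps(flat, indent=2))
```

To process a real file, pass an open file handle instead of the sample list, since iterating over a text file yields one line per record.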

Resources