
esutils: json for "custom" field may contain unwanted entries

the textgrid-metadata field may contain many entries that are not relevant for tgsearch, e.g. https://textgridlab.org/1.0/tgcrud-public/rest/textgrid:3b6d5.0/metadata

these are transformed to json and stored in elasticsearch, which produces json like

  "tei:item": [
    {
      "": "die Verwendung für einen anderen als den unter Nr. 1 genannten Zweck, insbesondere die Nutzung zu kommerziellen Zwecken,",
      "n": "3.1"
    },
    {
      "": "das Anfertigen von Kopien oder Vervielfältigungen jeder Art und deren Weitergabe.",
      "n": "3.2"
    }
  ]

this is a problem for reindexing in elasticsearch > 7, as empty field names are not allowed:

  "failures": [
    {
      "index": "textgrid-public-1687428527",
      "type": "_doc",
      "id": "3b6f6.0",
      "cause": {
        "type": "mapper_parsing_exception",
        "reason": "failed to parse",
        "caused_by": {
          "type": "illegal_argument_exception",
          "reason": "field name cannot be an empty string"
        }
      },
      "status": 400
    },
    {
      "index": "textgrid-public-1687428527",
      "type": "_doc",
      "id": "3b6d5.0",
      "cause": {
        "type": "mapper_parsing_exception",
        "reason": "failed to parse",
        "caused_by": {
          "type": "illegal_argument_exception",
          "reason": "field name cannot be an empty string"
        }
      },
      "status": 400
    }
  ]
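
to find the affected documents before reindexing, the stored json can be scanned for empty keys. the following is only a minimal sketch in python and not part of esutils: `find_empty_keys` is a hypothetical helper, and the sample document is a shortened stand-in for the real metadata.

```python
from typing import Any, Iterator


def find_empty_keys(node: Any, path: str = "") -> Iterator[str]:
    """Yield the path of every JSON object that has an empty-string key."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "":
                # Report the object that holds the empty key.
                yield path or "<root>"
            child = f"{path}.{key}" if path else key
            yield from find_empty_keys(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_empty_keys(item, f"{path}[{index}]")


# Shortened stand-in for the structure shown in this issue (assumption).
sample = {
    "tei:item": [
        {"": "first item text", "n": "3.1"},
        {"": "second item text", "n": "3.2"},
    ]
}

print(list(find_empty_keys(sample)))
```

every path it reports points at an object that elasticsearch would reject with "field name cannot be an empty string".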

we will just remove the generation of json from the custom field in esutils, as these entries are not searchable by tgsearch right now
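
the effect of that change on an indexed document can be illustrated as follows. this is a sketch under assumptions, not the esutils implementation: `strip_field` is a hypothetical helper, the key name `custom` is taken from the issue title, and the layout of the sample document is invented for illustration.

```python
from typing import Any


def strip_field(node: Any, field: str = "custom") -> Any:
    """Return a copy of node with every key named `field` removed at any depth."""
    if isinstance(node, dict):
        return {
            key: strip_field(value, field)
            for key, value in node.items()
            if key != field
        }
    if isinstance(node, list):
        return [strip_field(item, field) for item in node]
    return node


# Invented document layout (assumption): only the "custom" subtree
# carries the entries with empty field names.
doc = {
    "title": "example",
    "custom": {"tei:item": [{"": "item text", "n": "3.1"}]},
}

cleaned = strip_field(doc)
print(cleaned)
```

the function returns a new structure and leaves the input untouched, so the original metadata stays available if the custom entries are needed elsewhere.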