Add secondary speech tagger implementation
The NLP pipeline LLpro (paper) contains an ensemble of STWR taggers: aehrm/redewiedergabe-direct, -indirect, ... These are similar to the original models by the REDEWIEDERGABE project (https://github.com/redewiedergabe/tagger, Brunner et al.) but are based on the larger lkonle/fiction-gbert-large.
The associated merge request:
- refactors the current flair-based implementation into GenericFlairSpeechTagger so that it can take any ensemble of SequenceTaggers,
- defines flair_speech_tagger_large as a single spaCy component that essentially initializes a GenericFlairSpeechTagger with the large model variants as parameters.
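
To illustrate the idea behind the refactoring, here is a minimal sketch of an ensemble tagger that is parameterized by its models. It is not the code from the merge request: EnsembleSpeechTagger, TokenTagger, make_large_tagger and the load callback are hypothetical names, and a plain callable stands in for a loaded flair SequenceTagger so that the example runs without flair or spaCy installed. Only the two model names spelled out above are listed; the remaining ones are left out.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a loaded flair SequenceTagger: it maps a list of
# tokens to one boolean per token (True = token belongs to this speech type).
TokenTagger = Callable[[List[str]], List[bool]]


class EnsembleSpeechTagger:
    """Illustrative only; not the actual GenericFlairSpeechTagger."""

    def __init__(self, taggers: Dict[str, TokenTagger]):
        # One tagger per speech type, e.g. {"direct": ..., "indirect": ...}.
        self.taggers = taggers

    def __call__(self, tokens: List[str]) -> List[List[str]]:
        # Collect, for every token, the speech types whose tagger flagged it.
        labels: List[List[str]] = [[] for _ in tokens]
        for speech_type, tagger in self.taggers.items():
            flags = tagger(tokens)
            for token_labels, flag in zip(labels, flags):
                if flag:
                    token_labels.append(speech_type)
        return labels


def make_large_tagger(load: Callable[[str], TokenTagger]) -> EnsembleSpeechTagger:
    # Assumption: "-indirect" above abbreviates aehrm/redewiedergabe-indirect.
    # Further models of the ensemble are elided in the description and
    # therefore not listed here.
    models = {
        "direct": "aehrm/redewiedergabe-direct",
        "indirect": "aehrm/redewiedergabe-indirect",
    }
    return EnsembleSpeechTagger({name: load(path) for name, path in models.items()})
```

In this sketch the "large" variant differs from any other variant only in the model names passed to the generic class, which mirrors how flair_speech_tagger_large is described above.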
To support loading both kinds of models (i.e. the original one and the larger one), the transformers package version has to be pinned to <=4.30.2 due to incompatibilities between flair<=0.12.2 and newer transformers versions, cf. https://github.com/flairNLP/flair/issues/3284.
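
As a rough illustration of the pin, the following sketch checks whether a given transformers version string stays within the <=4.30.2 bound. The helper names release_tuple and transformers_pin_satisfied are hypothetical, and the parsing is deliberately simplified: it compares only the leading numeric release segments and ignores suffixes such as .post1 or .dev0, so it is not a full PEP 440 comparison.

```python
from typing import Tuple

# Upper bound stated in the description above.
MAX_TRANSFORMERS = (4, 30, 2)


def release_tuple(version: str) -> Tuple[int, ...]:
    """Return the leading numeric release segments, e.g. "4.30.2" -> (4, 30, 2)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            # Stop at the first non-numeric segment (e.g. "dev0", "post1").
            break
        parts.append(int(digits))
    return tuple(parts)


def transformers_pin_satisfied(version: str) -> bool:
    # Simplified check: numeric release segments only.
    return release_tuple(version) <= MAX_TRANSFORMERS
```

The installed version could be obtained with importlib.metadata.version("transformers") and passed to transformers_pin_satisfied; in practice the pin itself would live in the package's dependency specification.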