Add scene segmentation model
The Shared Task on Scene Segmentation defines "scene segmentation" as the segmentation of a (narrative) text into contiguous, non-overlapping segments, each of which is either a scene or a non-scene. In short, a scene is "a segment of a text where the story time and the discourse time are more or less equal, the narration focuses on one action and space and character constellations stay the same", whereas a non-scene is a non-scenic bridge between scenes, such as a reflection by the narrator or a passage narrated at accelerated speed. Several NLP models have been submitted to the shared task.
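To make the "contiguous and non-overlapping" requirement concrete, here is a minimal sketch of how a segmentation could be represented and checked. The names `Segment`, `segments_from_borders` and `is_valid_segmentation`, the use of character offsets, and the label strings `"Scene"` and `"Nonscene"` are assumptions made for illustration; they are not taken from the shared task's data format or from the model added in this merge request.

```python
from dataclasses import dataclass
from typing import List

# Illustrative label names; the actual label set is an assumption.
KINDS = ("Scene", "Nonscene")


@dataclass(frozen=True)
class Segment:
    begin: int  # character offset where the segment starts (inclusive)
    end: int    # character offset where the segment ends (exclusive)
    kind: str   # one of KINDS


def segments_from_borders(
    borders: List[int], kinds: List[str], text_length: int
) -> List[Segment]:
    """Build segments from sorted start offsets and one label per segment."""
    # Each segment ends where the next one begins; the last ends with the text.
    ends = borders[1:] + [text_length]
    return [Segment(b, e, k) for b, e, k in zip(borders, ends, kinds)]


def is_valid_segmentation(segments: List[Segment], text_length: int) -> bool:
    """Check that the segments cover the text contiguously and without overlap."""
    position = 0
    for segment in segments:
        # A segment must start exactly where the previous one ended
        # (no gap, no overlap) and must not be empty.
        if segment.begin != position or segment.end <= segment.begin:
            return False
        if segment.kind not in KINDS:
            return False
        position = segment.end
    # The last segment must end exactly at the end of the text.
    return position == text_length
```

For example, `segments_from_borders([0, 40, 55], ["Scene", "Nonscene", "Scene"], 100)` yields three segments that pass `is_valid_segmentation`, while two segments that overlap or leave a gap between them do not.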
The associated merge request adds the implementation and model used in the NLP pipeline LLpro (paper): a BERT-large model that was domain-adapted on literary text and then fine-tuned on the scene segmentation task (aehrm/stss-scene-segmenter). The architecture is based on a system by Kurfalı and Wirén, which participated in the above-mentioned Shared Task on Scene Segmentation.