# Searching in the TextGrid Repository

## The syntax of Apache Lucene

The TextGrid Repository uses the query syntax of Apache Lucene 2.9.4. The following paragraphs explain how this syntax works. The overview largely follows the summary on the [website of Apache Lucene](https://lucene.apache.org/core/5_1_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package_description).

In this syntax, search queries are built from search terms and so-called operators. Search terms can be narrowed down further by means of fields.


## Search terms and search fields

You can search for single words as well as for phrases, which are marked as belonging together by enclosing them in quotation marks, e.g. `"TextGrid Repository"`. Search terms can be restricted to certain fields:

    field-name:search-term

or

    field-name:"multipart phrase"

These are the different fields of TextGrid:

* `title` for the title of the work
* `edition.agent.value` for the author
* `language` for the language of the work
* `notes` for notes on the text
* `genre` for the genre
* `rightsHolder` for the rights holder of the digital version of the text
* `work.dateOfCreation.date`, `work.dateOfCreation.notBefore` and `work.dateOfCreation.notAfter` for dates of the work
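
The field syntax can be sketched as a small helper that assembles such query strings. This is only an illustration: `field_query` is a hypothetical name, not part of TextGrid or Lucene, and the search values are made up.

```python
def field_query(field: str, term: str) -> str:
    """Build `field:term`, quoting the term when it is a multipart phrase.

    Hypothetical helper for illustration; it only formats a string.
    """
    if any(ch.isspace() for ch in term):
        # Phrases must be enclosed in quotation marks to stay together.
        return f'{field}:"{term}"'
    return f"{field}:{term}"


print(field_query("title", "Faust"))
print(field_query("edition.agent.value", "Johann Wolfgang von Goethe"))
```

The first call yields a single-word field query, the second a quoted phrase restricted to the author field.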

The “Advanced Search” lets you choose the metadata fields to search in directly and connect them with operators to build search queries.

Search queries can be modified in several ways: with placeholders, with a vague search, by specifying distances between words, by searching within a defined range, and by assigning different relevance weights to search terms.

* **Placeholders (wildcards):** Within a single word, `?` replaces exactly one character and `*` stands for any number of characters, e.g. `Text?rid` or `*xtgrid`.
* **Vague (fuzzy) search:** Appending a `~` to a word makes the search vague, based on the Levenshtein distance. The `~` can be followed by a value between 0 and 1. The closer the value is to 1, the greater the required similarity, e.g. `TextGrid~0.8`. The default value is 0.5.
* **Distances (proximity search):** When searching for phrases, appending a `~` and a number to the phrase specifies the permitted distance between the words of the phrase, e.g. `"TextGrid Repository"~10`. The number states how many words may lie between the words of the phrase. The “Advanced Search” lets you enter this number directly in the search form.
* **Ranges:** Connecting two search values with `TO` finds all values of the field that lie between them. This applies to numerical values as well as to words; for words, alphabetical order is used. Ranges that include the given boundary values are written in `[]`, ranges that exclude them in `{}`. E.g. `edition.agent.value:[Aristophanes TO Zuckmayer]` searches for all author names between “Aristophanes” and “Zuckmayer”, including those two names.
* **Relevance (boosting):** Appending a `^` and a number to a search term or phrase marks it as more relevant, e.g. `TextGrid^5 Repository`. The default value is 1.
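
As a rough sketch, the modifiers listed above can be produced by small string helpers. The function names are hypothetical and exist only for this illustration; the helpers merely format query strings and do not talk to the TextGrid Repository.

```python
def fuzzy(term, similarity=None):
    """Append ~ and an optional similarity between 0 and 1 to a single word."""
    if similarity is None:
        return f"{term}~"
    if not 0 <= similarity <= 1:
        raise ValueError("similarity must lie between 0 and 1")
    return f"{term}~{similarity}"


def proximity(phrase, distance):
    """Quote a phrase and append ~distance (words allowed in between)."""
    return f'"{phrase}"~{distance}'


def range_query(field, low, high, inclusive=True):
    """Use [] to include the boundary values and {} to exclude them."""
    left, right = ("[", "]") if inclusive else ("{", "}")
    return f"{field}:{left}{low} TO {high}{right}"


def boost(term, factor):
    """Append ^factor to mark a term or phrase as more relevant."""
    return f"{term}^{factor}"


print(fuzzy("TextGrid", 0.8))
print(proximity("TextGrid Repository", 10))
print(range_query("edition.agent.value", "Aristophanes", "Zuckmayer"))
print(boost("TextGrid", 5) + " Repository")
```

The last three calls reproduce the examples given in the list; the similarity value 0.8 is an arbitrary choice.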

The following characters have a special meaning and must be escaped with a preceding `\` when they are meant literally: `+ - && || ! ( ) { } [ ] ^ " ~ * ? : \`
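
A minimal sketch of such escaping is shown here. It assumes that user input should be treated entirely as literal text, and it escapes every single `&` and `|` rather than only the pairs `&&` and `||`, which is a simplification. `escape_term` is a hypothetical name, not a TextGrid or Lucene function.

```python
# Characters with a special meaning in the query syntax.
# & and | are escaped one by one, which also covers && and ||.
SPECIAL_CHARS = set('+-!(){}[]^"~*?:\\&|')


def escape_term(text: str) -> str:
    """Prefix every special character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


print(escape_term("Faust: Eine Tragödie"))
print(escape_term("(1+1)"))
```

Escaping should be applied to the raw search term only, before modifiers such as `~` or `^` are appended, since those would otherwise be escaped as well.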

## Operators

Lucene uses logical connectives to combine search terms and phrases. The default connective is OR, which is equivalent to `||`. Logical connectives must be written in capital letters.

* **AND (equivalent to `&&`):** Finds texts that contain all of the search terms.
* **+:** The search term that follows must be contained in the text.
* **NOT (equivalent to `!` or `-`):** The search term that follows must not be contained in the text. Using this at the beginning of a search query can slow down the search.

Lucene supports brackets for grouping logical connectives, e.g. `TextGrid AND (Laboratory OR Repository)` finds all texts that contain the word “TextGrid” together with either “Laboratory” or “Repository”. Grouping can be combined with fields as well.
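
Grouping can be sketched with a small helper that joins clauses and wraps them in brackets. `group` is a hypothetical name used only for illustration, and the `genre` values in the second query are assumed examples, not a list of values known to exist in the repository.

```python
def group(operator, *clauses):
    """Join clauses with AND or OR and wrap the result in brackets."""
    if operator not in ("AND", "OR"):
        # Connectives must be written in capital letters.
        raise ValueError("operator must be AND or OR, in capital letters")
    return "(" + f" {operator} ".join(clauses) + ")"


# The example from the text above.
query = f"TextGrid AND {group('OR', 'Laboratory', 'Repository')}"
print(query)

# The same mechanism combined with fields (field values are assumptions).
fielded = f"title:Faust AND {group('OR', 'genre:drama', 'genre:verse')}"
print(fielded)
```

Wrapping every group in brackets keeps the intended precedence explicit, whatever the default connective is.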