OpenAI's GPT-4 works perfectly with a minimal, German-language prompt, and infers the meaning of the columns
to return the data we need:
```
Finde im folgenden Text die Herausgeber, Redaktion/Schriftleitung und Beirat der Zeitschrift '{journal_name}' und gebe sie im CSV-Format zurück mit den Spalten 'lastname', 'firstname', 'title', 'position', 'affiliation','role'. Die Spalte 'role' enthält entweder 'Herausgeber', 'Redaktion', 'Beirat', 'Schriftleitung' oder ist leer wenn nicht bestimmbar. Wenn keine passenden Informationen verfügbar sind, gebe nur den CSV-Header zurück. Setze alle Werte in den CSV-Spalten in Anführungszeichen.
```
In contrast, the open models performed miserably with such a prompt. We therefore use English and provide very detailed instructions.
%% Cell type:code id:23aef80911796078 tags:
``` python
# Prompt template for the open models; {journal_name} and {website_text} are filled in per journal.
template = """
In the following German text, which was scraped from a website, find the members of the editorial board or the advisory board of the journal '{journal_name}' as per the following rules:
- In German, typical labels for these roles are "Herausgeber", "Redaktion/Redakteur/Schriftleitung" and "Beirat".
- Return the data as comma-separated values, which can be saved to a `.csv` file. Put all values in the CSV rows in quotes.
- The CSV data must have the columns 'lastname', 'firstname', 'title', 'position', 'affiliation','role'.
- The column 'role' must contain either 'Herausgeber', 'Redaktion', 'Beirat' or is empty. Leave the column empty if you cannot determine the role. Use 'Redaktion' for the "Schriftleitung" role.
- The column 'title' should contain academic titles such as "Dr." or "Prof. Dr."
- The column 'position' should contain the job title
- The column 'affiliation' contains the institution or organization the person belongs to, or the city if one is mentioned
- If the journal is published ("herausgegeben von") by an association, institute or other organization, put its name in the column 'lastname'.
- If you cannot find any information, simply return the CSV header.
- You must not output any introduction, commentary or explanation such as 'Here is the CSV data for the members of the editorial board or the advisory board of the journal'. Only return the data.
{website_text}
"""
```
%% Cell type:markdown id:8f994c771cc9b4ef tags:
## ChatGPT-4
GPT-4 delivers an almost perfect [result](data/output/editors-openai-gpt-4.csv). A few problems remain, which could probably be resolved by adding more instructions to the prompt.
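
To illustrate how a prompt template with the two placeholders `{journal_name}` and `{website_text}` might be filled in, and how a well-formed reply could be parsed, here is a minimal sketch using only the standard library. The shortened template, the sample reply and the helper names `build_prompt` and `parse_reply` are assumptions made for illustration; the actual model call is left out.

```python
import csv
import io

# Shortened stand-in for the full prompt template (assumption, for illustration only).
TEMPLATE = "Find the editors of the journal '{journal_name}' in this text:\n{website_text}"

# Column names as required by the prompt.
COLUMNS = ["lastname", "firstname", "title", "position", "affiliation", "role"]


def build_prompt(journal_name: str, website_text: str) -> str:
    """Fill the two placeholders of the template."""
    return TEMPLATE.format(journal_name=journal_name, website_text=website_text)


def parse_reply(reply: str) -> list[dict]:
    """Parse a reply consisting of a CSV header followed by quoted rows."""
    reader = csv.DictReader(io.StringIO(reply.strip()), skipinitialspace=True)
    if reader.fieldnames != COLUMNS:
        raise ValueError(f"Unexpected CSV header: {reader.fieldnames}")
    return list(reader)


# Hypothetical reply, not taken from any real model output.
sample_reply = '''"lastname","firstname","title","position","affiliation","role"
"Mustermann","Erika","Prof. Dr.","","Universität Musterstadt","Herausgeber"
'''

prompt = build_prompt("Example Journal", "scraped website text goes here")
rows = parse_reply(sample_reply)
print(rows[0]["lastname"], rows[0]["role"])
```

Checking the header against the expected columns is one way to notice early when a model silently drops or renames a column.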
'Prof. Dr. Bernd von Garmissen, Göttingen Erb, Redaktion, Umwelt',
'Ltdr. Jose Martinez, Georg-August-Universität Göttingen, Göttingen',
'',
'',
"Note: The column 'Role' contains the following values: 'Herausgeber', 'Redaktion', 'Beirat'"]
%% Cell type:markdown id:ca33fb28f6772cbc tags:
## TheBloke/Llama-2-70B-chat-GPTQ via Hugging Face Inference Endpoint
The 70 billion parameter variant [does a bit better](data/output/editors-llama-2-70b-chat-gptq.csv) but, among other things, doesn't get the academic titles right. It also cannot be persuaded to [not comment on the CSV output](data/output/editors-llama-2-70b-chat-gptq.txt). Given that the model costs $13/h to run, that is not very impressive.
' Here is the CSV data for the members of the editorial board or the advisory board of the journal \'AUR - Agrar- und Umweltrecht\':\n\n"lastname","firstname","title","affiliation","role"\n"Busse","Christian", "Regierungsdirektor", "Bundesministerium für Ernährung und Landwirtschaft, Bonn", "Herausgeber"\n"Endres","Ewald", "Prof. Dr.", "Hochschule Weihenstephan-Triesdorf, Freising", "Redaktion"\n"Francois","Matthias", "Rechtsanwalt", "Bitburg", "Redaktion"\n"Garmissen","Bernd", "Rechtsanwalt", "Göttingen", "Redaktion"\n"Graß","Christiane", "Rechtsanwältin", "Bonn", "Redaktion"\n"Haarstrich","Jens", "Rechtsanwalt", "Peine", "Redaktion"\n"Köpl","Christian", "Ministerialrat", "Bayerisches Staatsministerium für Ernährung, Landwirtschaft und Forsten, München", ""\n"Martinez","Jose", "Prof. Dr.", "Institut für Landwirtschaftsrecht, Georg-August-Universität Göttingen, Göttingen", "Herausgeber"\n"Nies","Volkmar", "Ltd. Landwirtschaftsdirektor", "Landwirtschaftskammer NRW, Bonn", "Redaktion"\n"Stephany","Ralf", "Rechtsanwalt/Steuerberater", "Bonn", "Redaktion"\n"Wedemeyer","Harald", "Rechtsanwalt", "Landvolk Niedersachsen, Hannover", "Redaktion"\n"Schell","Irina Valeska", "", "Georg-August-Universität Göttingen, Göttingen", ""\n\nNote: The column \'role\' is empty for some members, as their role could not be determined.'
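
Since the model wraps its CSV in an introduction and a closing note, one possible workaround is to filter the raw reply before parsing. The following is a minimal sketch, assuming that every CSV row starts with a double quote and that no field spans several lines; the helper name `extract_csv_rows` is hypothetical, and the `raw` string is a shortened excerpt of the reply shown above.

```python
import csv
import io


def extract_csv_rows(raw: str) -> list[list[str]]:
    """Drop commentary lines and parse the remaining quoted CSV rows."""
    # Keep only lines that look like quoted CSV rows (assumption: they start with '"').
    csv_lines = [line.strip() for line in raw.splitlines() if line.strip().startswith('"')]
    # skipinitialspace=True tolerates the blank the model puts after some commas.
    reader = csv.reader(io.StringIO("\n".join(csv_lines)), skipinitialspace=True)
    return list(reader)


# Shortened excerpt of the model reply.
raw = (
    " Here is the CSV data for the members of the editorial board:\n\n"
    '"lastname","firstname","title","affiliation","role"\n'
    '"Busse","Christian", "Regierungsdirektor", '
    '"Bundesministerium für Ernährung und Landwirtschaft, Bonn", "Herausgeber"\n'
    "\nNote: The column 'role' is empty for some members."
)

rows = extract_csv_rows(raw)
header, data = rows[0], rows[1:]
print(header)
print(data[0])
```

Filtering like this removes the commentary but does not repair the content: in the excerpt the header has only five columns, so the `position` column requested in the prompt is still missing.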