Commit 70aae931 authored by parciak

Added readme for documentation. Added overridable cdstar and couchdb connection settings. Updated jsonld templates to latest documentation.
Signed-off-by: Marcel Parciak <marcel.parciak@gmail.com>
parent 269d1fd9
......@@ -8,12 +8,8 @@ stages:
variables:
CDSTAR_URI: "https://c109-199.cloud.gwdg.de/v3"
CDSTAR_USER: "medic1"
CDSTAR_PASS: "medIC!umG!9"
CDSTAR_VAULT: "medic"
COUCH_URI: "https://c109-199.cloud.gwdg.de:5984"
COUCH_USER: "annotation_agent_test"
COUCH_PASS: "aXu9vcAugzQFvZij0TQOfnnN"
COUCH_DB: "annotation_test"
build_image:
......@@ -131,4 +127,4 @@ publish_to_harbor:
- docker build -t harbor.umg.eu/medic/$CI_PROJECT_NAME:latest .
- docker tag harbor.umg.eu/medic/$CI_PROJECT_NAME:latest harbor.umg.eu/medic/$CI_PROJECT_NAME:$AGENTVERSION
- docker push harbor.umg.eu/medic/$CI_PROJECT_NAME:latest
- docker push harbor.umg.eu/medic/$CI_PROJECT_NAME:$AGENTVERSION
\ No newline at end of file
- docker push harbor.umg.eu/medic/$CI_PROJECT_NAME:$AGENTVERSION
......@@ -4,70 +4,82 @@ SPDX-FileCopyrightText: 2020 UMG MeDIC <marcel.parciak@med.uni-goettingen.de>
SPDX-License-Identifier: GPL-3.0-or-later
-->
An agent barebone for Active Workflow agents.
Annotation of CDSTAR archives and files for CouchDB.
# Active Workflow Barebone Agent
# Annotation Agent
This Agent complies to the [external REST-API](https://github.com/automaticmode/active_workflow/blob/master/docs/remote_agent_api.md) of [ActiveWorkflow](https://github.com/automaticmode/active_workflow).
# How to use
In the first step, you should rename the `app` directory to something meaningful (although this requires many modifications in each file). Do not forget to change it in setup.py, too.
After renaming, you can start to work on main.py, which is the main entrypoint for this agent. This barebone provides you with [pydantic](https://pydantic-docs.helpmanual.io/) models for Active Workflow requests and responses. In the models_activeworkflow.py file, the "Application specifics" models allow you to tailor the models to your needs. Namely, you should modify `PayloadInput`, `MessageOutput`, `OptionsCommon`, `MemoryCommon` and `CredentialsCommon`.
`config.py` holds the configuration according to [FastAPI](https://fastapi.tiangolo.com/) (i.e. it can be set via environment variables when starting uvicorn or the provided Dockerfile). Use it to make your agent configurable.
Use the `__version__.py` for versioning of your module, setup.py will try to access this value to determine the version. Also, this will add versions to the .gitlab-ci.yml for automatic creation of tagged images. For now, this is a manual process, i.e. you have to change the `VERSION` tuple in `__version__.py` to assign a version.
Keep in mind you may need to add lines to the Dockerfile if you expand your agent, e.g. by implementing a SQL backend (using [sqlalchemy](https://www.sqlalchemy.org/) and [alembic](https://alembic.sqlalchemy.org/en/latest/) of course). Also, this repo uses pipenv for package management, while the Dockerfile uses pip3 instead. Do not forget to freeze manually using pipenv or use a package like [pipenv_to_requirements](https://github.com/gsemet/pipenv-to-requirements).
Last but not least, change this Readme.md and add all necessary information to it. This Readme is used in the agent description part of the Active Workflow register response, meaning that an AW user will see it as the description of this agent.
## API Caller Configurations
### Register
This is what happens when you call register:
Register this agent by supplying a CDSTAR vault (`vault_id`) as well as a CouchDB database name (`couch_db`). This tells the agent 1) which vault to check for a posted `archive_id` and 2) which database to write all created JSON-LD documents to.
In addition, you may supply additional metadata for archives (`annotations_archive`) and files (`annotations_file`) which will be appended to the template and any automatically created values. All options may be overridden by a received message.
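The layering described above can be sketched as a plain dictionary merge. This is a minimal illustration, not the agent's actual code: the function name `merge_annotations` is hypothetical, and the shallow merge order (template values, then agent options, then message payload) is an assumption derived from the description.

```python
from typing import Any, Dict


def merge_annotations(
    template_values: Dict[str, Any],
    options_annotations: Dict[str, Any],
    message_annotations: Dict[str, Any],
) -> Dict[str, Any]:
    """Layer annotation dictionaries; later layers win on key conflicts.

    Hypothetical helper: assumes a shallow merge is sufficient.
    """
    merged: Dict[str, Any] = dict(template_values)  # copy, do not mutate input
    merged.update(options_annotations)  # agent-level annotations
    merged.update(message_annotations)  # per-message annotations override
    return merged


merged_example = merge_annotations(
    {"@type": "Dataset", "identifier": "a1b2c4d4e5f6"},
    {"name": "SAP Einwilligungen Export"},
    {"name": "SAP Einwilligungen Export aus Onkostar"},
)
```

With these inputs, the `name` from the message replaces the one from the options, while the template keys are kept.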
#### Example
```
{
"some": "option"
"vault_id": "medic",
"couch_db": "medic",
"annotations_archive": {
"name": "SAP Einwilligungen Export",
"abstract": "An export of patient consent data from SAP.",
"sourceOrganization": {
"@id": "https://medic.umg.eu/metadata/Organization#umg",
"@type": "Organization"
}
},
"annotations_file": {
"name": "SAP Einwilligungserklärung",
"abstract": "An exported patient consent"
}
}
```
### Receive
This is what happens when you call receive:
Receive starts a new annotation process, creating a JSON-LD metadata document for a CDSTAR archive and for each file it contains. The JSON-LD documents will be written to CouchDB as configured in the agent. Make sure both the CDSTAR and CouchDB users have sufficient rights for this process!
A message MUST contain an `archive_id`, pointing to the archive that shall be annotated. Additionally, any configuration setting explained in the *Register* section can be supplied here, effectively overriding the parameter set in the agent options.
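The override behaviour can be pictured as a lookup that consults the message payload first, then the agent options, then the configured defaults. The sketch below is an assumption about that order, not the agent's implementation: `resolve_setting` is a hypothetical name, plain dictionaries stand in for the request models, and `ValueError` stands in for the agent's own configuration error.

```python
from typing import Any, Dict


def resolve_setting(
    key: str,
    message_payload: Dict[str, Any],
    options: Dict[str, Any],
    defaults: Dict[str, Any],
) -> Any:
    """Return the first non-empty value for `key`, most specific source first."""
    for source in (message_payload, options, defaults):
        value = source.get(key)
        if value:  # empty strings and None count as "not set"
            return value
    # Stand-in for the agent's configuration error (assumption).
    raise ValueError(f"Setting {key} is required!")


message_payload = {"archive_id": "a1b2c4d4e5f6", "vault_id": "sources"}
options = {"vault_id": "medic", "couch_db": "medic"}
defaults = {"couch_uri": "http://localhost:5984"}

vault_id = resolve_setting("vault_id", message_payload, options, defaults)
couch_db = resolve_setting("couch_db", message_payload, options, defaults)
couch_uri = resolve_setting("couch_uri", message_payload, options, defaults)
```

Here `vault_id` comes from the message, `couch_db` from the options, and `couch_uri` from the defaults.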
#### Example
Full request:
```
{
"method": "receive",
"params": {
"message": {
"payload": {
"context": {
"test_val": 12
}
}
},
"options": {
"job_name": "DummyJob",
"job_version": "1.0",
"context": {
"test_val": 8
}
},
"memory": {
"jobs": []
},
"credentials": {}
}
"method": "receive",
"params": {
"message": {
"payload": {
"archive_id": "a1b2c4d4e5f6",
"vault_id": "sources",
"annotations_archive": { },
"annotations_file": {
"name": "SAP Einwilligungen Export aus Onkostar"
}
}
},
"options": {
"vault_id": "medic",
"couch_db": "medic",
"annotations_archive": {
"name": "SAP Einwilligungen Export",
"abstract": "An export of patient consent data from SAP.",
"sourceOrganization": {
"@id": "https://medic.umg.eu/metadata/Organization#umg",
"@type": "Organization"
}
},
"annotations_file": {}
},
"memory": {
"archives": []
},
"credentials": {}
}
}
```
......@@ -76,9 +88,9 @@ Full response:
{
"result": {
"errors": [],
"logs": ["Job DummyJob has been started with id 21."],
"logs": ["Starting annotation of CDSTAR archive a1b2c4d4e5f6."],
"memory": {
"jobs": [21]
"archives": [["medic", "a1b2c4d4e5f6"]]
},
"messages": []
}
......@@ -87,59 +99,54 @@ Full response:
### Check
This is what happens when you call check:
Checks the status of annotations in progress. The archives to check must be supplied in the memory, which is the only parameter that matters during a check request.
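A check can be thought of as partitioning the remembered archives into those still running and those that are done. The following is a rough sketch under stated assumptions: `check_archives` is a hypothetical function, and the `states` dictionary stands in for however the agent actually records annotation progress.

```python
from typing import Any, Dict, List, Tuple

ArchiveRef = Tuple[str, str]  # (vault_id, archive_id)


def check_archives(
    pending: List[ArchiveRef],
    states: Dict[ArchiveRef, Dict[str, Any]],
) -> Tuple[List[ArchiveRef], List[Dict[str, Any]]]:
    """Split pending archives into (still pending, completion messages)."""
    remaining: List[ArchiveRef] = []
    messages: List[Dict[str, Any]] = []
    for vault_id, archive_id in pending:
        stats = states.get((vault_id, archive_id), {})
        if stats.get("state") == "completed":
            # Emit one message per finished archive and forget it.
            messages.append({"archive_id": archive_id, "stats": stats})
        else:
            # Unknown or unfinished archives stay in memory for the next check.
            remaining.append((vault_id, archive_id))
    return remaining, messages
```

Archives without a recorded state are kept in memory rather than dropped, so a later check can still report them.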
#### Example
Full request:
```
{
"method": "check",
"params": {
"message": null
"options": {
"job_name": "DummyJob",
"job_version": "1.0",
"context": {
"test_val": 8
}
},
"memory": {
"jobs": [21]
},
"credentials": [],
}
"method": "check",
"params": {
"message": null,
"options": {
"vault_id": "medic",
"couch_db": "medic",
"annotations_archive": {
"name": "SAP Einwilligungen Export",
"abstract": "An export of patient consent data from SAP.",
"sourceOrganization": {
"@id": "https://medic.umg.eu/metadata/Organization#umg",
"@type": "Organization"
}
},
"annotations_file": {}
},
"memory": {
"archives": [["medic", "a1b2c4d4e5f6"]]
},
"credentials": {}
}
}
```
Full response:
```
{
"result": {
"errors": [],
"logs": [
"25|2pdwaA|QvVW2g6IdzHS|key.compareTo(25) != 0 failed\n"
],
"logs": [ ],
"memory": {
"jobs": []
},
"messages": [
{
"job_run_id": 21,
"job_id": 4,
"job_name": "DummyJob",
"job_version": "1.0",
"completed": true,
"returncode": 0,
"runtime": 1.036232,
"response": {
"data": [
{
"entity": 1,
"attribute": "wARNk5",
"value": "7riXpzHPFECz"
}
]
"archive_id": "32c44b2722f7",
"couch_uri": "",
"stats": {
"state": "completed",
"started_at": "2020-09-09T15:50:28.068312",
"completed_at": "2020-09-09T15:50:33.931664"
}
}
]
......@@ -154,4 +161,15 @@ Any non-active workflow based API endpoints can be explained here.
Configuration can be done using environment variables. It does not matter whether done locally or given to docker.
* `example_setting`: Some setting that configures your agent. Default: `"Unset"`
\ No newline at end of file
* `application_name`: Sets the name for this agent instance. Useful for multiple uses of this agent in one Active Workflow instance. Default: `"CDSTAR Annotation Agent"`
* `tmp_directory`: Temporary directory where all annotation progress files are stored. Needs to be writable by the agent. Default: `"/tmp"`
* `cdstar_uri`: URI of the CDSTAR instance which is checked for the archive, should end with `/v3`. Default: `"http://localhost:8082/v3"`
* `cdstar_user`: Username of the CDSTAR instance for basic auth. Default: `"someuser"`
* `cdstar_pass`: Password of the CDSTAR instance for basic auth. Default: `"somepass"`
* `vault_id`: The default vault_id to use when checking the metadata of a CDSTAR archive. Default: `"medic"`
* `couch_uri`: The URI of the CouchDB instance to write the JSON-LD metadata to. Default: `"http://localhost:5984"`
* `couch_user`: The user of the CouchDB instance for basic auth. Default: `"medic"`
* `couch_pass`: The password of the CouchDB instance for basic auth. Default: `"medic2020"`
* `couch_db`: Default database name in CouchDB that will be used to write JSON-LD metadata to. Default: `"medic"`
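To illustrate how such settings can be driven by environment variables, here is a standard-library sketch. It is not the agent's configuration code: `SketchSettings` and `load_settings` are hypothetical names, only a subset of the settings is shown, and the case-insensitive matching of variable names is an assumption (the upper-case names in the CI configuration suggest it).

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping, Optional


@dataclass
class SketchSettings:
    """Subset of the documented settings with their documented defaults."""

    application_name: str = "CDSTAR Annotation Agent"
    tmp_directory: str = "/tmp"
    cdstar_uri: str = "http://localhost:8082/v3"
    couch_uri: str = "http://localhost:5984"
    couch_db: str = "medic"


def load_settings(environ: Optional[Mapping[str, str]] = None) -> SketchSettings:
    """Build settings from environment variables, falling back to defaults."""
    env = os.environ if environ is None else environ
    # Assumption: names are matched case-insensitively (CDSTAR_URI == cdstar_uri).
    lowered = {name.lower(): value for name, value in env.items()}
    overrides = {
        field.name: lowered[field.name]
        for field in fields(SketchSettings)
        if field.name in lowered
    }
    return SketchSettings(**overrides)


example_settings = load_settings({"CDSTAR_URI": "https://cdstar.example.org/v3"})
```

Passing a mapping explicitly keeps the sketch testable; calling `load_settings()` without arguments reads the real process environment.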
Keep in mind that you need to change both `archive.jsonld.jinja` and `file.jsonld.jinja` in `./annotator/static/` for this agent to work properly for you. Depending on your use case, the `@id` URIs in these templates may differ from the `couch_uri` found in your settings. While you are at it, you can modify both templates to fit your needs. Check `./annotator/models_cdstar.py` for possible values to use in the templates. They refer to the fields of the `ArchiveInfo` and `FileInfo` classes respectively.
\ No newline at end of file
......@@ -2,6 +2,6 @@
#
# SPDX-License-Identifier: GPL-3.0-or-later
VERSION = (0, 2, 0)
VERSION = (0, 3, 0)
__version__ = ".".join(map(str, VERSION))
\ No newline at end of file
__version__ = ".".join(map(str, VERSION))
......@@ -9,7 +9,7 @@ from annotator import __version__
class BasicSettings(BaseSettings):
application_name: str = "CouchDB Annotation Agent"
application_name: str = "CDSTAR Annotation Agent"
application_version: str = __version__.__version__
tmp_directory: str = "/tmp"
cdstar_uri: str = "http://localhost:8082/v3"
......@@ -31,4 +31,4 @@ agent_description = ""
with open(
os.path.join(os.path.abspath(os.path.dirname(__file__)), os.pardir, "Readme.md")
) as readme:
agent_description = readme.read()
\ No newline at end of file
agent_description = readme.read()
......@@ -5,7 +5,7 @@
from annotator.errors import ActiveWorkflowError, ConfigurationError
import datetime
import os
from typing import Any, Dict, Optional, Union
from typing import Any, Dict, List, Optional, Union
from annotator import access_log, error_log
from annotator import config
......@@ -101,20 +101,32 @@ def receive(payload: awmodels.RequestReceive, background_tasks: BackgroundTasks)
response.result.memory.archives += payload.params.memory.archives
archive_id = payload.params.message.payload.archive_id
vault_id = get_setting_from_payload(payload, "vault_id")
if not vault_id:
# real error please
raise errors.ConfigurationError(
"There is a configuration error regarding the chosen CDSTAR vault. Please check the configuration of the agent."
)
couch_db = get_setting_from_payload(payload, "couch_db")
if not couch_db:
# real error please
raise errors.ConfigurationError(
"There is a configuration error regarding the chosen CouchDB database. Please check the configuration of the agent."
)
settings = {}
for key in [
"vault_id",
"cdstar_uri",
"cdstar_user",
"cdstar_pass",
"couch_db",
"couch_uri",
"couch_user",
"couch_pass",
]:
setting_for_key = get_setting_from_payload(payload, key)
if not setting_for_key:
raise errors.ConfigurationError(
f"You are missing a configuration parameter. Setting {key} is required!"
)
settings[key] = setting_for_key
vault_id = settings["vault_id"]
if not stores.is_valid_archive(vault_id, archive_id):
if not stores.is_valid_archive(
vault_id=vault_id,
archive_id=archive_id,
cdstar_uri=settings["cdstar_uri"],
cdstar_auth=(settings["cdstar_user"], settings["cdstar_pass"]),
):
response.result.errors.append(
f"Archive {archive_id} is not available in CDSTAR vault {vault_id}."
)
......@@ -140,17 +152,13 @@ def receive(payload: awmodels.RequestReceive, background_tasks: BackgroundTasks)
"There is something wrong with the configured temporary directory to save the annotation state. Please check the configuration of the agent."
)
if payload.params.message.payload.couch_db:
couch_db = payload.params.message.payload.couch_db
background_tasks.add_task(
run_annotation,
archive_id=archive_id,
vault_id=vault_id,
annotations_archive=annotations_archive,
annotations_file=annotations_file,
couch_db=couch_db,
metafile=metafile,
settings=settings,
)
response.result.logs.append(f"Starting annotation of CDSTAR archive {archive_id}")
......@@ -161,13 +169,17 @@ def receive(payload: awmodels.RequestReceive, background_tasks: BackgroundTasks)
def run_annotation(
archive_id: str,
vault_id: str,
annotations_archive: Dict[str, Any],
annotations_file: Dict[str, Any],
couch_db: str,
metafile: str,
settings: Dict[str, str],
) -> None:
archive, filelist = stores.get_cdstar_metadata(vault_id, archive_id)
archive, filelist = stores.get_cdstar_metadata(
vault_id=settings["vault_id"],
archive_id=archive_id,
cdstar_uri=settings["cdstar_uri"],
cdstar_auth=(settings["cdstar_user"], settings["cdstar_pass"]),
)
if not archive:
errmsg = f"Not able to retrieve archive {archive_id} in BackgroundTask. How is that possible?"
error_log.error(errmsg)
......@@ -186,7 +198,12 @@ def run_annotation(
jsonld_archive["hasPart"].append(utils.get_jsonld_reference(meta_file))
meta_file["isPartOf"] = utils.get_jsonld_reference(jsonld_archive)
if not stores.upload_archive(jsonld_archive, couch_db):
if not stores.upload_archive(
jsonld_archive,
couch_db=settings["couch_db"],
couch_uri=settings["couch_uri"],
couch_auth=(settings["couch_user"], settings["couch_pass"]),
):
errmsg = (
f"Could not upload archive annotations for archive {archive_id} to CouchDB."
)
......@@ -194,7 +211,12 @@ def run_annotation(
os.unlink(metafile)
return
if not stores.upload_files(jsonld_files, couch_db):
if not stores.upload_files(
jsonld_files,
couch_db=settings["couch_db"],
couch_uri=settings["couch_uri"],
couch_auth=(settings["couch_user"], settings["couch_pass"]),
):
errmsg = f"Could not upload file annotations for files of archive {archive_id} to CouchDB."
error_log.error(errmsg)
utils.write_state_to_metafile(utils.AnnotationState.incomplete, metafile)
......@@ -288,4 +310,4 @@ async def aw_exception_handler(request: Request, exc: errors.ConfigurationError)
return fastapi.responses.JSONResponse(
status_code=fastapi.status.HTTP_400_BAD_REQUEST,
content=resp.dict(exclude_none=True),
)
\ No newline at end of file
)
......@@ -80,7 +80,13 @@ class PayloadInput(BaseModel):
archive_id: str = Field(..., example="a1b2c3d4e5f6")
vault_id: str = Field(None, example="medic")
cdstar_uri: str = Field(None, example="http://localhost:8082/v3")
cdstar_user: str = Field(None, example="someuser")
cdstar_pass: str = Field(None, example="somepass")
couch_db: str = Field(None, example="medic")
couch_uri: str = Field(None, example="http://localhost:5984")
couch_user: str = Field(None, example="someuser")
couch_pass: str = Field(None, example="somepass")
annotations_archive: Dict[str, Any] = Field(
{}, example={"id": "something", "name": "Some Thing"}
)
......@@ -131,7 +137,13 @@ class OptionsCommon(BaseModel):
"""
vault_id: str = Field(None, example="medic")
cdstar_uri: str = Field(None, example="http://localhost:8082/v3")
cdstar_user: str = Field(None, example="someuser")
cdstar_pass: str = Field(None, example="somepass")
couch_db: str = Field(None, example="medic")
couch_uri: str = Field(None, example="http://localhost:5984")
couch_user: str = Field(None, example="someuser")
couch_pass: str = Field(None, example="somepass")
annotations_archive: Dict[str, Any] = Field(
{}, example={"id": "something", "name": "Some Thing"}
)
......@@ -292,4 +304,4 @@ class ResultCheck(ResultCommon):
class ResponseCheck(ResponseCommon):
result: ResultCheck = Field(ResultCheck(), example=ResultCheck())
\ No newline at end of file
result: ResultCheck = Field(ResultCheck(), example=ResultCheck())
{
"@context": "http://schema.org/",
"@type": "Dataset",
"@id": "{{ model.base_uri }}/{{ model.id }}",
"@id": "https://c109-199.cloud.gwdg.de:5984/annotation_test/{{ model.id }}",
"identifier": "{{ model.id }}",
{% if model.file_count is defined %}
"size": {
......@@ -20,5 +20,9 @@
"dateModified": "{{ model.modified.isoformat() }}",
{% endif %}
"alternateName": "{{ model.id }}",
"hasPart": []
"hasPart": [],
"maintainer": {
"@id": "https://medic.umg.eu/metadata/organization#umg",
"@type": "Organization"
}
}
\ No newline at end of file
{
"@context": "http://schema.org/",
"@type": "DataDownload",
"@id": "{{ model.base_uri }}/{{ model.id }}",
"@id": "https://c109-199.cloud.gwdg.de:5984/annotation_test/{{ model.id }}",
"identifier": "{{ model.id }}",
{% if model.size is defined %}
"size": {
......@@ -25,5 +25,9 @@
{% if model.modified is defined %}
"dateModified": "{{ model.modified.isoformat() }}",
{% endif %}
"alternateName": "{{ model.id }}"
"alternateName": "{{ model.id }}",
"maintainer": {
"@id": "https://medic.umg.eu/metadata/organization#umg",
"@type": "Organization"
}
}
\ No newline at end of file
......@@ -11,21 +11,14 @@ import cloudant.database as CloudantDatabase
import cloudant.document as CloudantDocument
from pycdstar3 import CDStar, CDStarVault
from annotator import config
from annotator import error_log
from annotator import models_cdstar as cdmodels
cdstar = CDStar(
config.BasicSettings().cdstar_uri,
auth=(config.BasicSettings().cdstar_user, config.BasicSettings().cdstar_pass),
)
def is_valid_archive(vault_id: str, archive_id: str) -> bool:
cdstar = CDStar(
config.BasicSettings().cdstar_uri,
auth=(config.BasicSettings().cdstar_user, config.BasicSettings().cdstar_pass),
)
def is_valid_archive(
vault_id: str, archive_id: str, cdstar_uri: str, cdstar_auth: Tuple[str, str]
) -> bool:
cdstar = CDStar(cdstar_uri, auth=cdstar_auth,)
try:
vault = CDStarVault(cdstar, vault_id)
......@@ -38,12 +31,9 @@ def is_valid_archive(vault_id: str, archive_id: str) -> bool:
def get_cdstar_metadata(
vault_id: str, archive_id: str,
vault_id: str, archive_id: str, cdstar_uri: str, cdstar_auth: Tuple[str, str]
) -> Tuple[Optional[cdmodels.ArchiveInfo], List[cdmodels.FileInfo]]:
cdstar = CDStar(
config.BasicSettings().cdstar_uri,
auth=(config.BasicSettings().cdstar_user, config.BasicSettings().cdstar_pass),
)
cdstar = CDStar(cdstar_uri, auth=cdstar_auth,)
try:
vault = CDStarVault(cdstar, vault_id)
......@@ -62,27 +52,36 @@ def get_cdstar_metadata(
return None, []
def upload_archive(archive_meta: Dict[str, Any], couch_db: str) -> bool:
return upload_jsonld(archive_meta, couch_db)
def upload_archive(
archive_meta: Dict[str, Any],
couch_db: str,
couch_uri: str,
couch_auth: Tuple[str, str],
) -> bool:
return upload_jsonld(archive_meta, couch_db, couch_uri, couch_auth)
def upload_files(files_metalist: List[Dict[str, Any]], couch_db: str) -> bool:
def upload_files(
files_metalist: List[Dict[str, Any]],
couch_db: str,
couch_uri: str,
couch_auth: Tuple[str, str],
) -> bool:
for file_meta in files_metalist:
if not upload_jsonld(file_meta, couch_db):
if not upload_jsonld(file_meta, couch_db, couch_uri, couch_auth):
return False
return True
def upload_jsonld(jsonld: Dict[str, Any], couch_db: str) -> bool:
def upload_jsonld(
jsonld: Dict[str, Any], couch_db: str, couch_uri: str, couch_auth: Tuple[str, str]
) -> bool:
if "identifier" not in jsonld.keys():
error_log.error(f"Supplied jsonld has no id for `upload_jsonld`: {jsonld}")
return False
try:
with cloudant.couchdb(
config.BasicSettings().couch_user,
config.BasicSettings().couch_pass,
url=config.BasicSettings().couch_uri,
use_basic_auth=True,
couch_auth[0], couch_auth[1], url=couch_uri, use_basic_auth=True,
) as client:
database: CloudantDatabase.CouchDatabase = client[couch_db]
if not database.exists():
......@@ -94,4 +93,4 @@ def upload_jsonld(jsonld: Dict[str, Any], couch_db: str) -> bool:
t, v, tb = sys.exc_info()
error_log.error("Could not connect to CouchDB.")
error_log.error(f"{t}: {v}\n{''.join(traceback.format_tb(tb))}")
return False
\ No newline at end of file
return False
......@@ -2,12 +2,9 @@
#
# SPDX-License-Identifier: GPL-3.0-or-later
from annotator.stores import cdstar
import io
import json
import os
import time
from typing import Any, Dict
import cloudant
import cloudant.database as CloudantDatabase
......@@ -209,4 +206,4 @@ def test_complete_workflow(cdstar_archive):
assert test_utils.is_valid_response(response_json)
# on success, respond with any kind of message to indicate an annotation run is done
assert len(response_json["result"]["messages"]) > 0
\ No newline at end of file
assert len(response_json["result"]["messages"]) > 0