Commit f2b55011 authored by mhellka

Massive rewrite of the CLI and lots of refactoring work.

- Changed the concept of CLI context and workspaces (see README).
- Moved 'pycdstar3.client' module to 'pycdstar3.api'.
- Moved data classes from 'pycdstar3.api' to 'pycdstar3.model'.
- Introduced optional high-level API handle classes for remote resources.
parent 86d54c54
......@@ -11,4 +11,5 @@ MANIFEST
.pytest_cache/
.tox/
.coverage
.cdstar/
......@@ -7,7 +7,8 @@ venv/touch-me: setup.py requirements.txt
venv/bin/pip install -U pip -r requirements.txt
touch venv/touch-me
docs: venv $(DOCS_ALL) $(SRC_ALL)
docs: build/docs/index.html
build/docs/index.html: venv $(DOCS_ALL) $(SRC_ALL)
venv/bin/sphinx-build -a docs/ build/docs/
dist: venv $(SRC_ALL)
......
......@@ -5,38 +5,69 @@ This is a client library and command-line toolbelt for accessing a
There is already a library called [pycdstar](https://pypi.org/project/pycdstar/) for accessing older versions of CDSTAR. The two libraries may be merged at some point, but for now, use [pycdstar](https://pypi.org/project/pycdstar/) for CDSTAR 2 and [pycdstar3](https://gitlab.gwdg.de/cdstar/pycdstar3) for CDSTAR 3.
## Command-Line interface
## Library
`pycdstar3` is also a command-line toolbox to upload, download or manage data in a CDSTAR repository.
The `pycdstar3.CDStar` class closely resembles the actual CDSTAR v3 REST API, one method per API endpoint. It provides basic connection pooling, transaction management, error handling and JSON parsing on top of the `requests` library, but tries to stay out of your way otherwise. Use this if you already know the CDSTAR API and need maximum control.
```Python
from pycdstar3 import CDStar
cdstar = CDStar("https://cdstar.gwdg.de/demo/v3/", auth=("USER", "PASS"))
with cdstar.begin(autocommit=True):  # Wrap everything in a transaction (optional)
    vault_name = "demo"
    archive_id = cdstar.create_archive(vault_name).id
    # put_file() expects a file-like object, byte string or byte iterator, not a path
    with open(__file__, "rb") as source:
        cdstar.put_file(vault_name, archive_id, "test.py", source)
    with cdstar.get_file(vault_name, archive_id, "test.py") as download:
        with open("/tmp/test.py", "wb") as target:
            for chunk in download.iter_content(buffsize=1024 * 64):
                target.write(chunk)
```
Please note that the command-line interface is made for humans, not scripts. The commands and output may change between releases without notice. If you want to automate CDSTAR, consider implementing your tools directly against the `pycdstar3` client library. Libraries for other languages may also be available. As a last resort, you can always develop directly against the stable [CDSTAR REST API](https://cdstar.gwdg.de/docs/dev/#endpoints) using the HTTP client of your choice.
For a more fluent and high-level approach, wrap your `CDStar` instance in a `CDStarVault`. This hides the HTTP layer behind an object-oriented API and offers convenient wrappers and methods for the most common operations.
### cdstar.conf
```Python
from pycdstar3 import CDStar, CDStarVault
cdstar = CDStar("https://cdstar.gwdg.de/demo/v3/", auth=("USER", "PASS"))
The `pycdstar3` client will look for a `cdstar.conf` in the current directory or its parent directories and try to load default values from it. The most important settings are the CDSTAR server URI and a default vault name. If these are defined, you can reference archives by ID only, instead of the full service URI. This saves a *lot* of typing. The `pycdstar3 init` command will help you create this file.
with cdstar.begin(autocommit=True):  # Wrap everything in a transaction (optional)
    vault = CDStarVault(cdstar, "demo")
    archive = vault.new_archive()
    archive.put("test.py", __file__)
### Referencing Vaults, Archives and Files
    with archive.file("test.py").download() as download:
        download.save_to("/tmp/")
```
Most `pycdstar3` client commands operate on a specific vault, archive or file. These can always be referenced by their full URI. If a `cdstar.conf` is present and default values for server and vault are defined, then also a couple of short forms can be used. The following syntax forms are supported:
## Command-Line interface
`pycdstar3` is also a command-line toolbox to upload, download or manage data in a CDSTAR repository.
* **Full URI**: If the reference starts with `http(s)://` then everything up to the first `/v3/` is used as the server URI, followed by a vault and optionally an archive ID and file path. The default server and vault settings are not used.
Please note that the command-line interface is made for humans, not scripts. The commands and output may change between releases. If you want to automate CDSTAR, consider implementing your tools directly against the `pycdstar3` client library.
Example: `pycdstar3 get https://example.com/v3/myvault/e497a76f/some/file.txt`
* **Vault Name**: If the reference starts with a forward slash and a vault name, then the default server from `cdstar.conf` is used, but the default vault is ignored. To reference the default vault, leave the vault name empty.
### Workspace mode
Example: `pycdstar3 get /myvault/e497a76f/some/file.txt`
Example: `pycdstar3 get //e497a76f/some/file.txt` (default vault)
* **Archive ID**: In the shortest form, the reference must start with an archive ID and both default server and vault settings must be present in your `cdstar.conf`.
Most `pycdstar3` commands require `--server` and `--vault` arguments to be set, or `CDSTAR_SERVER` and `CDSTAR_VAULT` environment variables to be defined. As an alternative, you can run `pycdstar3 init` to create a *workspace directory*. When within this directory (or any sub-directory), the client will switch to *workspace mode* and read default values for `--server`, `--vault` and other settings from workspace configuration. This is particularly useful if you are working with multiple servers or vaults and want to easily switch between them.
Example: `pycdstar3 get e497a76f/some/file.txt`
```sh
# Explicit
pycdstar3 --server https://user:pass@example.com/v3/ --vault myVault search 'dcAuthor=Einstein'
Some commands will accept a special string `new` as an archive ID and create a new archive on the fly.
# Environment Variables
export CDSTAR_SERVER=https://user:pass@example.com/v3/
export CDSTAR_VAULT=myVault
pycdstar3 search 'dcAuthor=Einstein'
# Workspace mode
pycdstar3 init # creates .cdstar/pycdstar.conf
pycdstar3 search 'dcAuthor=Einstein'
```
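Workspace mode applies in the directory where `pycdstar3 init` created `.cdstar/pycdstar.conf` and in any of its sub-directories, so the client has to search upwards from the current directory. A minimal sketch of such a lookup, assuming the marker file is exactly `.cdstar/pycdstar.conf`; `find_workspace` is a hypothetical helper, not part of the `pycdstar3` API.

```python
from pathlib import Path
from typing import Optional


def find_workspace(start: str) -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains
    `.cdstar/pycdstar.conf`, or None (hypothetical helper, not pycdstar3 API)."""
    current = Path(start).resolve()
    # Check the start directory first, then every ancestor up to the filesystem root.
    for candidate in (current, *current.parents):
        if (candidate / ".cdstar" / "pycdstar.conf").is_file():
            return candidate
    return None
```

Called as `find_workspace(str(Path.cwd()))`, it returns the workspace root or `None`, in which case the client would have to fall back to `--server`/`--vault` or the `CDSTAR_*` environment variables.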
### Command Usage
This is an (incomplete) list of (planned) commands. For a complete list, run `pycdstar3 -h` and for details, see `pycdstar3 COMMAND -h`.
* **`init`**: Ask for server address, vault, credentials and other config options and create a `cdstar.conf` file in the current directory.
* **`init`**: Create a new workspace.
* **`new`** Create a new (empty) archive.
* **`info`** Query information about vaults, archives or files.
* **`meta get/set/list`** Manage metadata attributes for archives or files.
......@@ -50,7 +81,6 @@ This is an (incomplete) list of (planned) commands. For a complete list, run `py
* **`scroll`** List all IDs in a vault.
* **`search`** Search in a vault.
## License
Copyright 2019 Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen
......
API reference
=====================================
.. automodule:: pycdstar3
API Client (low-level)
----------------------
.. autoclass:: pycdstar3.api.CDStar
:members:
Resource Handles (fluent API)
-----------------------------
.. autoclass:: pycdstar3.api.CDStarVault
:members:
.. autoclass:: pycdstar3.api.CDStarArchive
:members:
.. autoclass:: pycdstar3.api.CDStarFile
:members:
Data Classes and Wrappers
-------------------------
.. automodule:: pycdstar3.model
:members:
......@@ -36,6 +36,12 @@ extensions = [
'sphinx.ext.viewcode',
]
autodoc_default_options = {
'members': True,
'member-order': 'bysource',
'undoc-members': True
}
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
......
......@@ -5,7 +5,7 @@ Welcome to pycdstar3's documentation!
:maxdepth: 2
:caption: Contents:
pycdstar3/index
api
Indices and tables
......
`pycdstar3.client`
==================
.. automodule:: pycdstar3.client
:members:
API reference
=====================================
.. toctree::
:caption: Modules:
:glob:
*
__version__ = "3.0.dev0"
__url__ = "https://cdstar.gwdg.de/"
from pycdstar3.client import * # noqa: F401, F403
from pycdstar3.api import * # noqa: F403 F401
import pycdstar3.api
__all__ = pycdstar3.api.__all__
import os
PATH_TYPES = (str,)
if hasattr(os, "PathLike"):
PATH_TYPES = (str, os.PathLike) # pragma: no cover
class cached_property(object):
def __init__(self, func):
self.__doc__ = getattr(func, '__doc__')
self.func = func
def __get__(self, obj, cls):
if obj is None:
return self
value = obj.__dict__[self.func.__name__] = self.func(obj)
return value
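The `cached_property` helper above is a non-data descriptor (it defines `__get__` but no `__set__`). On first access it calls the wrapped function and stores the result in the instance `__dict__` under the same name; since instance attributes take precedence over non-data descriptors, later lookups never reach `__get__` again. The snippet repeats the class so it runs on its own and uses a hypothetical `Example` class to count calls.

```python
class cached_property(object):
    """Copy of the descriptor above, repeated so this example is self-contained."""

    def __init__(self, func):
        self.__doc__ = getattr(func, '__doc__')
        self.func = func

    def __get__(self, obj, cls):
        if obj is None:
            return self  # accessed on the class itself, not on an instance
        # Store the result under the same name; the instance attribute now
        # shadows this descriptor on every later lookup.
        value = obj.__dict__[self.func.__name__] = self.func(obj)
        return value


class Example:
    """Hypothetical class, used only to count how often the getter runs."""

    def __init__(self):
        self.calls = 0

    @cached_property
    def expensive(self):
        self.calls += 1
        return 42


ex = Example()
first = ex.expensive
second = ex.expensive
print(first, second, ex.calls)  # 42 42 1
```

Deleting the attribute (`del ex.expensive`) clears the cached value, and the next access recomputes it. On Python 3.8 and later, `functools.cached_property` offers the same caching behaviour from the standard library.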
"""
Client API implementation. Usually imported directly from :mod:`pycdstar3` and not from here.
"""
import os
from json import JSONDecodeError
import typing
import requests
from requests_toolbelt import MultipartEncoder
__all__ = ("CDStar", "FormUpdate", "ApiError")
from pycdstar3.model import ApiError, JsonObject, FileDownload, FormUpdate
from pycdstar3._utils import PATH_TYPES
__all__ = "CDStar", "CDStarVault", "FormUpdate", "ApiError"
PATH_TYPES = (str,)
if hasattr(os, "PathLike"):
PATH_TYPES = (str, os.PathLike) # pragma: no cover
def _fix_filename(name):
# silently strip leading slashes
name = name.lstrip("/")
# Fail hard on relative filenames
if name != os.path.normpath(name):
raise ValueError("Archive file name not in a normalized form: {} != {}".format(name, os.path.normpath(name)))
return name
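`_fix_filename()` strips leading slashes and then rejects any name that `normpath` would change, which catches `..` segments, `.` segments, doubled slashes and trailing slashes. One caveat: `os.path.normpath` is platform-specific, and on Windows it converts `/` to `\`, so `os.path.normpath("a/b")` returns `'a\\b'` there and every nested archive path would be rejected. The following variant, `fix_filename`, is a hypothetical sketch that uses `posixpath.normpath` so the check behaves the same on every operating system.

```python
import posixpath


def fix_filename(name: str) -> str:
    """Hypothetical variant of _fix_filename using POSIX path rules on every OS."""
    name = name.lstrip("/")  # silently strip leading slashes
    normalized = posixpath.normpath(name)
    # Anything normpath would rewrite ("..", ".", "//", trailing "/") is rejected.
    if name != normalized:
        raise ValueError(
            "Archive file name not in a normalized form: {} != {}".format(name, normalized))
    return name
```

For example, `fix_filename("/docs/readme.txt")` returns `"docs/readme.txt"`, while `fix_filename("a/../b")` raises `ValueError`.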
class CDStar:
""" Provide low-level methods for corresponding server-side REST endpoints.
If not documented otherwise, each method call triggers exactly one REST
request. There is no internal caching. The only state that is tracked by
this class is the running transaction, if any.
request and return a :class:`pycdstar3.model.JsonObject`, which offers dict-like and attribute access to json fields.
There is no internal caching. The only state that is tracked by this class is the running transaction, if any.
"""
def __init__(self, url, auth=None, _session=None):
......@@ -34,16 +45,16 @@ class CDStar:
"""
return CDStar(self.url, auth=self.auth, _session=self._session)
def raw(self, method, *path, expect_status=None, **options):
""" Send a raw HTTP request to the cdstar server.
def raw(self, method, *path, expect_status=None, **options) -> requests.Response:
""" Send a raw HTTP request to the cdstar server and return a raw response.
Authentication and transaction headers are added automatically.
Error responses are thrown as `ApiError`, unless the status code is
explicitly accepted as valid via `expect_status`.
Disclaimer: Avoid using this method if there is a more specific implementation available. If you find a feature missing from this
class, please submit a feature request instead of over-using this
method.
Disclaimer: Avoid using this method if there is a more specific implementation available.
If you find a feature missing from this class, please submit a feature request instead of
over-using this method.
"""
if self.auth:
......@@ -62,19 +73,18 @@ class CDStar:
raise ApiError(rs)
def rest(self, method, *path, expect_status=None, **options) -> dict:
def rest(self, method, *path, expect_status=None, **options) -> JsonObject:
""" Just like `raw()`, but expects the response to be JSON and returns
the parsed result instead of the raw response. Non-JSON responses
are errors.
Disclaimer: Avoid using this method if there is a more specific implementation available. If you find a feature missing from this
class, please submit a feature request instead of over-using this
method.
Disclaimer: Avoid using this method if there is a more specific implementation available.
If you find a feature missing from this class, please submit a feature request instead of
over-using this method.
"""
# TODO: Expect json errors or non-json responses and
# throw a better error message
return self.raw(method, *path, expect_status=expect_status, **options).json()
# TODO: Expect json errors or non-json responses and throw a better error message
return self.raw(method, *path, expect_status=expect_status, **options).json(object_hook=JsonObject)
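`rest()` now passes `object_hook=JsonObject` to `Response.json()`, which forwards keyword arguments to `json.loads`. The hook runs for every decoded JSON object, including nested ones, which is what lets `commit()` write `self._tx.id` instead of `self._tx['id']`. `JsonObject` itself lives in `pycdstar3.model` and is not shown in this diff, so `AttrDict` below is a hypothetical stand-in: a minimal sketch assuming read-only attribute access on top of a plain `dict` is all that is needed.

```python
import json


class AttrDict(dict):
    """Hypothetical stand-in for pycdstar3.model.JsonObject: a dict whose keys
    can also be read as attributes."""

    def __getattr__(self, name):
        # Only called when normal lookup fails, so real dict methods still win.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


tx = json.loads('{"id": "tx-1", "files": [{"name": "a.txt"}]}', object_hook=AttrDict)
print(tx.id, tx["id"], tx.files[0].name)  # tx-1 tx-1 a.txt
```

Because `__getattr__` is only a fallback, a JSON key that collides with a dict method (such as `items` or `keys`) is still reachable via `obj["items"]`, but not as an attribute.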
def begin(self, autocommit=False, readonly=False):
""" Start a new transaction and return self.
......@@ -101,27 +111,27 @@ class CDStar:
return self
@property
def tx(self):
def tx(self) -> JsonObject:
""" Return the current transaction handle, or None if no transaction is running. """
return self._tx
def commit(self) -> None:
def commit(self):
""" Commit the current transaction. """
if not self._tx:
raise RuntimeError("No transaction running")
try:
self.raw("POST", "_tx", self._tx['id'])
self.raw("POST", "_tx", self._tx.id)
self._tx = None
except Exception:
self.rollback()
raise
def rollback(self) -> None:
def rollback(self):
""" Rollback the current transaction, if any. Do nothing otherwise. """
try:
if self._tx:
self.raw("DELETE", "_tx", self._tx['id'])
self.raw("DELETE", "_tx", self._tx.id)
finally:
self._tx = None
......@@ -129,7 +139,7 @@ class CDStar:
""" If a transaction is running, keep it alive. Otherwise, do nothing. """
if not self._tx:
raise RuntimeError("No transaction running")
self._tx = self.rest("GET", "_tx", self._tx['id'])
self._tx = self.rest("GET", "_tx", self._tx.id)
def __enter__(self):
""" Expect a transaction to be already running. """
......@@ -144,7 +154,7 @@ class CDStar:
else:
self.rollback()
def exists(self, vault, archive=None, file=None):
def exists(self, vault, archive=None, file=None) -> bool:
""" Checks if a vault, archive or file exists """
if file:
return self.raw("HEAD", vault, archive, file, expect_status=[200, 404]).ok
......@@ -152,15 +162,15 @@ class CDStar:
return self.raw("HEAD", vault, archive, expect_status=[200, 404]).ok
return self.raw("HEAD", vault, expect_status=[200, 404]).ok
def service_info(self):
def service_info(self) -> JsonObject:
""" Get information about the cdstar service instance """
return self.rest('GET')
def vault_info(self, vault: str):
def vault_info(self, vault: str) -> JsonObject:
""" Get information about a vault """
return self.rest('GET', vault)
def archive_info(self, vault, archive, meta=False, files=False):
def archive_info(self, vault, archive, meta=False, files=False) -> JsonObject:
""" Get information about an archive """
query = {"info": "true"}
if meta:
......@@ -169,14 +179,14 @@ class CDStar:
query.setdefault("with", []).append("files")
return self.rest("GET", vault, archive, params=query)
def file_info(self, vault, archive, name, meta=False):
def file_info(self, vault, archive, name, meta=False) -> JsonObject:
""" Get information about a file """
query = {"info": "true"}
if meta:
query['with'] = "meta"
return self.rest("GET", vault, archive, _fix_filename(name), params=query)
def create_archive(self, vault, form: "FormUpdate" = None):
def create_archive(self, vault, form: FormUpdate = None) -> JsonObject:
""" Create a new archive. """
if form:
return self.rest("POST", vault, data=form.body,
......@@ -184,13 +194,13 @@ class CDStar:
else:
return self.rest("POST", vault)
def update_archive(self, vault, archive, form: "FormUpdate"):
def update_archive(self, vault, archive, form: FormUpdate) -> JsonObject:
""" Update an existing archive """
return self.rest("POST", vault, archive, data=form.body,
headers={'Content-Type': form.content_type})
def list_files(self, vault, archive, offset=0, limit=100, meta=False, order=None, reverse=False,
include_glob=None, exclude_glob=None):
include_glob=None, exclude_glob=None) -> JsonObject:
""" Request a FileList for an archive.
The FileList may be incomplete if more than `limit` files are in an archive. See iter_files() for a
......@@ -211,7 +221,7 @@ class CDStar:
return self.rest("GET", vault, archive, params=query)
def iter_files(self, vault, archive, offset=0, **args):
def iter_files(self, vault, archive, offset=0, **args) -> typing.Iterator[JsonObject]:
""" Yield all FileInfo entries of an archive.
This method may (lazily) issue more than one request if an archive contains more than `limit` files.
......@@ -225,23 +235,31 @@ class CDStar:
else:
break
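`list_files()` returns at most `limit` entries per call, and `iter_files()` hides the paging by requesting further pages lazily. The body of `iter_files()` is elided in this diff, so the sketch below shows the general offset/limit pattern rather than the library's code. `iter_paged` and its `fetch(offset, limit)` callable are hypothetical, and the stop rule (a page shorter than `limit` ends the listing) is an assumption; the real implementation may instead compare against a count reported by the server.

```python
from typing import Callable, Dict, Iterator, List


def iter_paged(fetch: Callable[[int, int], List[Dict]], limit: int = 100) -> Iterator[Dict]:
    """Yield items from an offset/limit paged listing, one request per page.
    `fetch(offset, limit)` is a hypothetical callable returning one page as a list."""
    offset = 0
    while True:
        page = fetch(offset, limit)  # only runs when the consumer asks for more
        yield from page
        if len(page) < limit:
            break  # a short (or empty) page means nothing is left
        offset += len(page)
```

When the total is an exact multiple of `limit`, this rule costs one extra request that returns an empty page.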
def put_file(self, vault, archive, name, source, type=None):
def put_file(self, vault, archive, name, source, type=None) -> JsonObject:
if isinstance(source, PATH_TYPES):
raise ValueError("Source must be a file-like object, byte string or iterator yielding byte strings.")
return self.rest("PUT", vault, archive, _fix_filename(name), data=source,
headers={'Content-Type': type or "application/x-autodetect"})
def get_file(self, vault, archive, name, offset=0) -> "FileDownload":
""" Return a stream-able response object representing the requested file. """
def get_file(self, vault, archive, name, offset=0) -> FileDownload:
""" Request a file and return a stream-able :class:`FileDownload`.
The request is issued with `stream=True`, which means it is still open and not fully read when this
method returns. The returned wrapper MUST be `close()`d after use, or wrapped in a `with` statement::
with cdstar.get_file(vault, id, "/file/name.txt") as dl:
dl.save_to("~/Downloads/")
"""
headers = {'Range': "bytes={}-".format(offset)} if offset > 0 else {}
name = _fix_filename(name)
rs = self.raw("GET", vault, archive, name, stream=True, headers=headers)
return FileDownload(vault, archive, name, rs)
def search(self, vault, q, order=None, limit=0, scroll=None, groups=None):
""" Perform a search and return the SearchResults document.
def search(self, vault, q, order=None, limit=0, scroll=None, groups=None) -> JsonObject:
""" Perform a search and return a single page of search results.
See iter_search() for a way to fetch more than `limit` results.
See iter_search() for a convenient way to fetch more than `limit` results.
"""
query = {"q": q}
if order:
......@@ -254,8 +272,8 @@ class CDStar:
query['groups'] = groups
return self.rest("GET", vault, params=query)
def iter_search(self, vault, q, scroll=None, **args):
""" Yield all SearchHit entries of a search.
def iter_search(self, vault, q, scroll=None, **args) -> typing.Iterator[JsonObject]:
""" Yield all search hits of a search.
This method may (lazily) issue more than one request if a search returns more than `limit` results.
"""
......@@ -268,197 +286,78 @@ class CDStar:
break
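Unlike `iter_files()`, `iter_search()` takes a `scroll` argument instead of an offset, which points to cursor-style paging: each result page hands back a token that the next request passes along. A minimal sketch, assuming hypothetical page fields named `hits` and `scroll` (the actual CDSTAR field names are not visible in this diff) and a hypothetical `search(scroll)` callable standing in for `CDStar.search()`.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_scroll(search: Callable[[Optional[str]], Dict]) -> Iterator:
    """Yield hits from a cursor-paged search. `search(scroll)` is a hypothetical
    callable returning a page dict with "hits" and an optional "scroll" token."""
    scroll = None  # the first request carries no cursor
    while True:
        page = search(scroll)
        hits = page.get("hits", [])
        yield from hits
        next_scroll = page.get("scroll")
        if not hits or not next_scroll:
            break  # no more results, or no cursor to continue from
        scroll = next_scroll
```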
class FileDownload:
""" Wrapper for streamed file downloads.
The file content can only be read once.
"""
def __init__(self, vault, archive, name, rs: requests.Response):
self.vault = vault
self.archive = archive
self.name = name
self.rs = rs
self.read = rs.raw.read
@property
def basename(self):
return os.path.basename(self.name)
@property
def type(self):
return self.rs.headers["Content-Type"]
@property
def is_partial(self):
return self.rs.status_code == 206
@property
def size(self):
return int(self.rs.headers["Content-Length"])
def __iter__(self):
""" Iterate over chunks of data (NOT lines) """
return self.iter_content()
def iter_content(self, buffsize=1024 * 64):
""" Iterate over chunks of data (NOT lines) """
return self.rs.iter_content(chunk_size=buffsize)
def save_as(self, target):
""" Save this download to a file (str, Path or file-like)"""
if not hasattr(target, 'write'):
with open(target, 'wb') as fp:
return self.save_as(fp)
for chunk in self.iter_content():
target.write(chunk)
def close(self):
self.rs.close()
def __enter__(self):
return self
def __exit__(self, exc_type, exc_value, traceback):
self.close()
def __repr__(self):
return "FileDownload({}/{} name={!r} size={})".format(self.vault, self.archive, self.name, self.size)
def readall(self):
""" Read the entire download into memory and return a single large byte object. """
return self.rs.content
# Design notes for the following resource handles:
# - The handle instances are really just a slim handle for a remote resource, NOT a wrapper, local copy or cache.
# They should not cache or store anything that might change remotely.
# - Only methods are allowed to trigger requests, preferably only one request per call.
# - Handles MUST implement exists()->bool and info()->JsonObject.
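The design notes above can be illustrated with a toy handle. The sketch uses a hypothetical in-memory `FakeApi` in place of `CDStar` so it runs without a server; `VaultHandle` mirrors the shape of `CDStarVault` but is not the library's class. It stores only identifiers, implements the required `exists()` and `info()`, and issues exactly one request per call, so changed remote state is never hidden behind a stale local copy.

```python
class FakeApi:
    """Hypothetical in-memory stand-in for CDStar that records every 'request'."""

    def __init__(self, vaults):
        self.vaults = vaults
        self.requests = []

    def exists(self, vault):
        self.requests.append(("HEAD", vault))
        return vault in self.vaults

    def vault_info(self, vault):
        self.requests.append(("GET", vault))
        return dict(self.vaults[vault])  # a fresh copy per call, like a new response


class VaultHandle:
    """Minimal handle following the design notes: identifiers only, no cached state."""

    __slots__ = ("api", "name")

    def __init__(self, api, name):
        self.api = api  # creating a handle triggers no request
        self.name = name

    def exists(self) -> bool:
        return self.api.exists(self.name)  # exactly one request

    def info(self) -> dict:
        return self.api.vault_info(self.name)  # exactly one request, never cached
```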
class ApiError(Exception):
def __init__(self, rs):
if rs.ok:
raise AssertionError("Not an error response: {!r}".format(rs))
if rs.headers.get("Content-Type") != "application/json":
raise AssertionError("Not a JSON response: {!r}".format(rs))
self.rs = rs
class CDStarVault:
""" Handle for a CDSTAR vault, providing a more fluent and object-oriented API on top of :class:`CDStar`.
try:
self.json = rs.json()
except JSONDecodeError:
raise AssertionError("Failed to decode server error response (invalid JSON) for: {!r}".format(rs.request))
This handle, as well as other handles returned by it, are just lightweight pointers to remote resources.
No remote state is cached locally and most method calls will trigger REST requests.
"""
__slots__ = "api", "name"
@property
def error(self):
return self.json['error']
def __init__(self, api: CDStar, vault: str):
self.api = api
self.name = vault
@property
def message(self):
return self.json['message']
def exists(self):
""" Checks if a vault exists """
return self.api.exists(self.name)
@property
def status(self):
return self.json['status']
def info(self) -> JsonObject:
""" Get information about a vault """
return self.api.vault_info(self.name)
@property
def detail(self):
return self.json.get('detail') or {}
def new_archive(self, *a, **ka) -> "CDStarArchive":
""" Create a new archive and return a handle to it. """
return CDStarArchive(self, self.api.create_archive(self.name, *a, **ka).id)
def __repr__(self):
return '{0.error}({0.status}): {0.message}'.format(self)
def archive(self, id: str) -> "CDStarArchive":
""" Return a handle for a specific archive in this vault.
__str__ = __repr__
The archive may or may not exist remotely. Check with `exists()`.
"""
return CDStarArchive(self, id)
def pretty(self):
err = "API Error: {} ({})\n".format(self.error, self.status)
err += "Message: {}\n".format(self.message)
if self.detail:
err += "Details:\n"
for k, v in self.detail.items():
err += " {}: {!r}\n".format(k, v)
return err
def search(self, *a, **ka) -> JsonObject:
""" Search in this vault. Return a single result page. """
return self.api.search(self.name, *a, **ka)
def iter_search(self, *a, **ka) -> typing.Iterator[JsonObject]:
""" Search in this vault. Return a result iterator, which lazily issues more requests on demand. """
return self.api.iter_search(self.name, *a, **ka)
def _fix_filename(name):
# silently strip leading slashes
name = name.lstrip("/")
# Fail hard on relative filenames
if name != os.path.normpath(name):
raise ValueError("Archive file name not in a normalized form: {} != {}".format(name, os.path.normpath(name)))
return name
class CDStarArchive:
""" Handle for a CDSTAR archive.
class FormUpdate:
""" Builder for CDSTAR POST multipart/form-data requests to upload multiple files or change aspects of an archive.
See :class:`CDStarVault` for details on how handles work.