Data ingest review and curation
We need a clearer process to ensure that errors or possible improvements identified after the first data ingestion are taken into account and resolved, either by a new ingest (if we are dealing with a mapping issue) or by the curation team.
A systematic review has been conducted for the TAPoR ingest (see here).
A human/manual review has been conducted for each source. See the dedicated ingest issue per source:
- TAPoR (#7)
- Programming Historian (#5)
- SSK (workflows and Zotero library) (#3)
- Switchboard tools (#6)
- DBLP (#8)
- EOSC catalogue (#2)
- Humanities Data (#63)
Main related issues:
- duplicates (#44 (closed))
- broken URLs (#1)
- licenses (#37 (moved))
- continuous ingest (is there a dedicated issue for this?)
Discussed during one of the recent T7.2 telcos: establish a manual curation procedure for new and updated sources, based on the reviews, to ensure that action follows upon the findings:
- SYSTEM_IMPORTER => suggested
- Bulk operations on items (esp. approve)
- Need to identify ingest-batches (specific hidden dynamic property?)
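The batch-identification idea above could work roughly as follows: every item of one ingest run is tagged with a shared hidden dynamic property, which then allows bulk operations (e.g. approve) scoped to that batch. A minimal sketch, assuming a hypothetical item model, the statuses `suggested`/`approved`, and the property name `__ingest-batch` (all of these are assumptions, not the actual Marketplace schema):

```python
from dataclasses import dataclass, field
from typing import Dict, List
import uuid

@dataclass
class Item:
    """Hypothetical stand-in for a Marketplace item."""
    label: str
    status: str = "suggested"  # assumed status for SYSTEM_IMPORTER items
    properties: Dict[str, str] = field(default_factory=dict)

def tag_ingest_batch(items: List[Item]) -> str:
    """Tag every item of one ingest run with a shared hidden batch id."""
    batch_id = str(uuid.uuid4())
    for item in items:
        # hidden dynamic property identifying the ingest batch (assumed name)
        item.properties["__ingest-batch"] = batch_id
    return batch_id

def bulk_approve(items: List[Item], batch_id: str) -> int:
    """Approve all still-suggested items belonging to the given batch."""
    approved = 0
    for item in items:
        if (item.properties.get("__ingest-batch") == batch_id
                and item.status == "suggested"):
            item.status = "approved"
            approved += 1
    return approved
```

The point of the sketch is that a curator reviewing one source can then act on the whole batch at once instead of item by item.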
notify @alex @dieter @nicolas.larrousse @edward.gray @matej.durco @klaus.illmayer @frank.fischer01