Identifiables
While writing the parser for TSV tables, I realized that it faces the same issues as the crawler:
- We designed the crawler so that it can be interrupted and called again without problems. The parser should behave the same way.
- We want to prevent the creation of duplicates in the DB.
- It might be necessary to create Entities that are used later on. E.g. one and the same file may be referred to multiple times in a table. You will want to insert the file, but only once.
This could be solved by using the concept of identifiables in both cases. A reminder: an identifiable is just an incomplete record. However, the set of properties it has should uniquely identify a record of the corresponding type. So before inserting a record, one can check whether a record matching its identifiable already exists and, if so, reuse it instead of inserting it again.
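A minimal sketch of the idea, assuming a toy in-memory store in place of the real DB. `Record`, `IDENTIFYING_PROPERTIES`, `make_identifiable`, `InMemoryDB` and `insert_if_missing` are hypothetical names for illustration, not the crawler's actual API. The identifiable is reduced to the record type plus the identifying properties. Before inserting, the store is searched for that identifiable, so a file referenced in several rows is inserted only once, and re-running the parser after an interruption inserts nothing new.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical stand-in for a DB record: a type plus named properties."""
    record_type: str
    properties: dict = field(default_factory=dict)


# Hypothetical registry: which properties identify a record of each type.
IDENTIFYING_PROPERTIES = {
    "File": ("path",),
    "Experiment": ("date", "project"),
}


def make_identifiable(record):
    """Reduce a record to its identifiable: the type plus only the identifying properties."""
    keys = IDENTIFYING_PROPERTIES[record.record_type]
    return (record.record_type, tuple((k, record.properties[k]) for k in keys))


class InMemoryDB:
    """Toy store keyed by identifiable; a real setup would query the DB instead."""

    def __init__(self):
        self._records = {}
        self.insert_count = 0

    def find(self, identifiable):
        return self._records.get(identifiable)

    def insert(self, record):
        self._records[make_identifiable(record)] = record
        self.insert_count += 1
        return record


def insert_if_missing(db, record):
    """Insert the record only if no record with the same identifiable exists yet."""
    existing = db.find(make_identifiable(record))
    if existing is not None:
        return existing  # reuse the existing record instead of creating a duplicate
    return db.insert(record)


# One file is referenced twice in the table; the table is parsed twice,
# as if the first run had been interrupted and the parser called again.
rows = [
    {"path": "/data/run1.dat", "size": 10},
    {"path": "/data/run1.dat", "size": 10},
    {"path": "/data/run2.dat", "size": 20},
]

db = InMemoryDB()
for _ in range(2):
    for row in rows:
        insert_if_missing(db, Record("File", dict(row)))

print(db.insert_count)  # 2
```

Only the identifying properties go into the lookup key, so two records that differ in a non-identifying property (here `size`) would still be treated as the same record.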