@ymoranv wrote:
"To make use of the AAI, we obviously need a key and a secret, those are known to the developers and server admins."
Do we have these already? Or what would we have to do?
Hm, interesting. But if the actual content is not indexed, just the site URL, then that's OK.
Speaking of which, the robots.txt on our production server at https://marketplace.sshopencloud.eu/robots.txt has the same Disallow rule; I expect this will stay the same while we're in Beta, right? (We just shouldn't forget to change that rule in the final version.)
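For reference, the Beta rule presumably looks something like the snippet below; I haven't checked the file verbatim, so the exact contents are my assumption. The fix for the final version would be to drop (or narrow) the Disallow line:

```
# Assumed current robots.txt during Beta (not verified verbatim):
User-agent: *
Disallow: /

# For the final version we would want crawling allowed, e.g.:
# User-agent: *
# Disallow:
```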
Thanks, Laure, for the comprehensive overview!
I would vote against including full conference proceedings. Take DH2011 as an example: we would probably ingest individual articles via the overview at https://dblp.org/db/conf/dihu/dh2011.html, so we get the metadata, but the URLs for this particular conference are no longer reachable (not even via the Internet Archive).
We could use a simple trick to get working URLs, via the "#page" fragment for PDFs. So, for this example paper, https://dblp.org/rec/conf/dihu/DalmauC11, which is on pages 114–115 of the proceedings, we would link to https://dh2011.stanford.edu/wp-content/uploads/2011/05/DH2011_BookOfAbs.pdf#page=134 (page numbers are shifted by 20 in this case due to the front matter of the PDF).
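Just to illustrate, here is a minimal sketch of how such deep links could be generated during ingest. The helper name and the constant-offset handling are mine, not anything we have implemented; the offset would have to be determined per proceedings PDF:

```python
# Hypothetical helper: build a deep link into a proceedings PDF
# from a printed page number, assuming a constant offset between
# printed page numbers and PDF page numbers.

PROCEEDINGS_PDF = "https://dh2011.stanford.edu/wp-content/uploads/2011/05/DH2011_BookOfAbs.pdf"
PAGE_OFFSET = 20  # printed page 114 corresponds to PDF page 134 in this PDF

def deep_link(printed_page: int, pdf_url: str = PROCEEDINGS_PDF, offset: int = PAGE_OFFSET) -> str:
    """Return a URL with a #page= fragment pointing at the PDF page for a printed page."""
    return f"{pdf_url}#page={printed_page + offset}"

print(deep_link(114))
# -> https://dh2011.stanford.edu/wp-content/uploads/2011/05/DH2011_BookOfAbs.pdf#page=134
```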