
Crossref DOI Resolution Test Crawl (May 2018)

Internet Archive Web Group

This collection contains web crawl data for a random sample of 500,000 Crossref DOI redirects, including the doi.org redirect requests themselves. The crawl is intended to gather rough statistics on the number of failing redirects and the number of host websites that block automated crawling, and to build a corpus of HTML landing pages for metadata extraction (e.g., "signposting" HTTP headers, linked-data HTML metadata, and semantic markup).



5 results
Internet Archive crawldata of web PDF content captured by wbgrp-svc285.us.archive.org:DOI-LANDING-TESTCRAWL-2018-05 from Fri May 4 03:47:28 PDT 2018 to Fri May 4 11:47:17 PDT 2018.
Topic: crawldata
Internet Archive crawldata of web PDF content captured by wbgrp-svc285.us.archive.org:DOI-LANDING-TESTCRAWL-2018-05 from Fri May 4 02:32:06 PDT 2018 to Thu May 3 22:33:51 PDT 2018.
Topic: crawldata
Configuration, Reports, and Logs for DOI-LANDING-TESTCRAWL-2018-05 crawl.
Internet Archive crawldata of web PDF content captured by wbgrp-svc285.us.archive.org:DOI-LANDING-TESTCRAWL-2018-05 from Fri May 4 14:20:49 PDT 2018 to Sat May 5 09:31:21 PDT 2018.
Topic: crawldata
Internet Archive crawldata of web PDF content captured by wbgrp-svc285.us.archive.org:DOI-LANDING-TESTCRAWL-2018-05 from Sat May 5 16:31:23 PDT 2018 to Mon May 7 14:29:26 PDT 2018.
Topic: crawldata