MAG-PDF-CRAWL-2020-07
collection
196 ITEMS
2M VIEWS
by Internet Archive Web Group

TARGETED-ARTICLE-CRAWL-2022-07
collection
0 ITEMS
85,950 VIEWS

SCIELO-CRAWL-2020-07
collection
41 ITEMS
219,673 VIEWS
by Internet Archive Web Group

OMICS-DOI-LANDING-CRAWL-2019-04
collection
4 ITEMS
14,368 VIEWS
by Internet Archive Web Group

This crawl started in April 2019 as an informal collaboration with Crossref. It covers a smallish number (100k) of DOI redirects and landing pages (plus PDF outlinks, and possibly a couple of further hops) for a single large publisher (OMICS, which has multiple subsidiaries). The intent is to get reasonably good captures that can be used as canonical preservation copies of the landing pages. A secondary goal is to get decent fulltext capture coverage.
Open Access Journal Test Crawl (2018)
collection
794 ITEMS
12.4M VIEWS
by Internet Archive Web Group

MAG-PDF-CRAWL-2020-03
collection
489 ITEMS
4.7M VIEWS
by Internet Archive Web Group

JOURNAL-HOMEPAGE-CRAWL-2022-03
collection
44 ITEMS
367,737 VIEWS

OA-DOI-CRAWL-2020-12
collection
191 ITEMS
1.7M VIEWS
by Internet Archive Web Group

Wide Web Targeted PDF Crawling (2017)
collection
922 ITEMS
3.4M VIEWS
by Internet Archive Web Group

UNPAYWALL-PDF-CRAWL-2020-05
collection
282 ITEMS
2M VIEWS
by Internet Archive Web Group

MAG-PDF-CRAWL-2021-08
collection
189 ITEMS
1M VIEWS

UNPAYWALL-PDF-CRAWL-2021-07
collection
174 ITEMS
1.2M VIEWS

OA-JOURNAL-CRAWL-2020-07
collection
1,923 ITEMS
11.7M VIEWS
by Internet Archive Web Group

Custom Crawl Services
by Internet Archive Web Group
data

eye 0

This item contains a copy of log files found on the Internet Archive (Web Group) machine `wbgrp-svc263.us.archive.org` on 2018-05-29, under the `/3` directory. These are logs of file transfer status between various crawler machines; they are not known to contain any sensitive metadata (e.g., personal information, IPs, or other security-sensitive information), but are being kept `access-restricted` anyway. This data is almost certainly unimportant and could be deleted; it is being preserved out...
TARGETED-ARTICLE-CRAWL-2022-04
collection
219 ITEMS
468,730 VIEWS

PubMed Central Crawl (2019-10)
collection
216 ITEMS
517,355 VIEWS
by Internet Archive Web Group

OA-JOURNAL-CRAWL-2019-08
collection
201 ITEMS
3M VIEWS
by Internet Archive Web Group

DOAJ-CRAWL-2020-11
collection
102 ITEMS
1M VIEWS
by Internet Archive Web Group

DOI-CRAWL-2022-02
collection
25 ITEMS
309,045 VIEWS

arXiv Content Crawl (2019-10)
collection
37 ITEMS
96,737 VIEWS
by Internet Archive Web Group

JOURNALS-PATCH-CRAWL-2022-01
collection
104 ITEMS
1.1M VIEWS

SEMSCHOLAR-DIRECT-PDF-CRAWL-2020-02
collection
1,011 ITEMS
1.8M VIEWS
by Internet Archive Web Group

collection

eye 2.1M

Internet Archive crawl of PDF URLs provided by Semantic Scholar.
Topic: pdf
UNPAYWALL-PDF-CRAWL-2019-04
collection
641 ITEMS
6.3M VIEWS
by Internet Archive Web Group

UNPAYWALL-PDF-CRAWL-2020-03
collection
344 ITEMS
2.2M VIEWS
by Internet Archive Web Group

OA-DOI-CRAWL-2020-02
collection
278 ITEMS
3.8M VIEWS
by Internet Archive Web Group

OAI-PMH-CRAWL-2020-06
collection
2,946 ITEMS
6.8M VIEWS
by Internet Archive Web Group

UNPAYWALL-PDF-CRAWL-2020-11
collection
199 ITEMS
2M VIEWS
by Internet Archive Web Group

TARGETED-ARTICLE-CRAWL-2022-03
collection
9 ITEMS
71,585 VIEWS

by Internet Archive Web Group
collection

eye 6,870

This collection contains web crawl data for a random selection of 500k (0.5 million) Crossref DOI redirects, including the doi.org redirect requests. The intent of this crawl is to gather loose statistics on the number of failing redirects, number of host websites that block automated crawling, and a corpus of HTML landing pages for metadata extraction (eg, "signposting" HTTP headers, linked data HTML metadata, semantic markup). Total size of (uncompressed) WARC data is 50 GB,...
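The loose statistics mentioned above (failing redirects, hosts that block automated crawling) could be tallied along the following lines. This is only a minimal sketch: the `(doi, final_url, status)` record layout, the `summarize_redirects` helper, and the choice of status codes counted as "blocked" are all assumptions for illustration, not part of the crawl's actual tooling.

```python
from collections import Counter
from urllib.parse import urlsplit

# Assumption: these HTTP status codes are treated as "crawler blocked".
BLOCKED_STATUSES = {401, 403, 429}


def summarize_redirects(records):
    """Tally outcomes for (doi, final_url, status) tuples (hypothetical layout)."""
    outcomes = Counter()
    blocking_hosts = set()
    for _doi, final_url, status in records:
        if 200 <= status < 300:
            outcomes["ok"] += 1
        elif status in BLOCKED_STATUSES:
            outcomes["blocked"] += 1
            # Remember which host refused the request.
            blocking_hosts.add(urlsplit(final_url).hostname)
        else:
            outcomes["failed"] += 1
    return outcomes, blocking_hosts
```

Calling `summarize_redirects` on records extracted from the crawl would return a counter of outcome classes plus the set of hosts that appeared to block the crawler.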
CiteSeerX URL Crawl 2017
data

eye 10

This item contains checksums and file-level metadata for most (if not all) files collected in this crawl. The tab-separated-value (.tsv) file is similar to a CDX file but contains additional hashes.
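A checksum file of this kind might be read and used to verify a downloaded payload roughly as follows. This is a sketch under stated assumptions: the column order in `COLUMNS`, the hex encoding of the hashes, and the helper names `read_manifest` and `payload_matches` are hypothetical, since the item's actual .tsv layout is not described here.

```python
import csv
import hashlib
import io

# Hypothetical column order; the real .tsv layout is not documented here.
COLUMNS = ("url", "timestamp", "mimetype", "size", "sha1", "sha256")


def read_manifest(tsv_text):
    """Parse tab-separated rows into dicts keyed by COLUMNS."""
    reader = csv.DictReader(
        io.StringIO(tsv_text),
        fieldnames=COLUMNS,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,
    )
    return list(reader)


def payload_matches(row, payload):
    """Check a payload against the row's SHA-1 and SHA-256 (assumed hex-encoded)."""
    return (
        hashlib.sha1(payload).hexdigest() == row["sha1"].lower()
        and hashlib.sha256(payload).hexdigest() == row["sha256"].lower()
    )
```

Checking more than one hash per file is the practical benefit of carrying the additional columns: a mismatch on any of them flags a corrupted or substituted file.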
MSAG-PDF-CRAWL-2017
collection
1,855 ITEMS
14.1M VIEWS
by Internet Archive Web Group

Microsoft Academic Graph public corpus (Feb 2016) PDF URLs, filtered to remove large sites (pubmed, citeseerx, arxiv) and already-crawled URLs.
Topics: papers, journals
CiteSeerX URL Crawl 2017
web

eye 8,583

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 05:01:45 PDT 2017 to Tue Jul 4 22:50:03 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 7,805

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 07:35:29 PDT 2017 to Wed Jul 5 00:48:05 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 5,619

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 05:45:37 PDT 2017 to Tue Jul 4 23:00:36 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 10,002

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 12:17:48 PDT 2017 to Wed Jul 5 05:29:23 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 7,495

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 08:33:05 PDT 2017 to Wed Jul 5 01:46:32 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 6,887

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 07:07:49 PDT 2017 to Wed Jul 5 00:18:34 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 6,120

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 06:37:37 PDT 2017 to Tue Jul 4 23:54:39 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 8,398

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 07:44:33 PDT 2017 to Wed Jul 5 00:57:33 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 8,787

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 11:48:17 PDT 2017 to Wed Jul 5 05:01:28 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 6,598

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 06:27:07 PDT 2017 to Tue Jul 4 23:40:46 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 5,776

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 10:26:12 PDT 2017 to Wed Jul 5 03:41:41 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 6,617

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 08:03:45 PDT 2017 to Wed Jul 5 01:16:31 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 6,148

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 09:55:35 PDT 2017 to Wed Jul 5 03:08:11 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 10,927

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 12:08:27 PDT 2017 to Wed Jul 5 05:22:22 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 8,207

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 11:59:15 PDT 2017 to Wed Jul 5 05:11:17 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 5,146

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 18:09:15 PDT 2017 to Wed Jul 5 11:23:47 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 9,290

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 07:42:06 PDT 2017 to Thu Jul 6 00:54:28 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 7,828

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 07:32:04 PDT 2017 to Thu Jul 6 00:46:06 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 7,176

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 02:20:32 PDT 2017 to Wed Jul 5 19:33:31 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 5,344

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 00:01:18 PDT 2017 to Wed Jul 5 17:59:10 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 5,676

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 06:57:33 PDT 2017 to Thu Jul 6 02:08:23 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 8,458

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 01:36:31 PDT 2017 to Wed Jul 5 18:48:11 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 6,762

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 03:01:58 PDT 2017 to Wed Jul 5 20:14:53 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 7,424

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 05:56:39 PDT 2017 to Wed Jul 5 23:08:54 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 6,379

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 09:48:37 PDT 2017 to Thu Jul 6 03:05:10 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 6,117

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 11:58:36 PDT 2017 to Thu Jul 6 05:13:01 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 11,097

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 08:53:13 PDT 2017 to Thu Jul 6 02:05:47 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 6,833

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 11:08:03 PDT 2017 to Thu Jul 6 04:22:39 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 5,601

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 14:52:50 PDT 2017 to Thu Jul 6 08:15:06 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 5,808

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 09:38:29 PDT 2017 to Thu Jul 6 02:51:41 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 5,206

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 12:19:36 PDT 2017 to Thu Jul 6 05:34:14 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 5,071

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 22:56:35 PDT 2017 to Thu Jul 6 17:34:26 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 4,517

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Fri Jul 7 03:24:06 PDT 2017 to Thu Jul 6 21:44:17 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 5,300

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 09:23:13 PDT 2017 to Wed Jul 5 02:37:37 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 5,757

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 22:33:37 PDT 2017 to Wed Jul 5 16:22:38 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 5,700

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 23:14:09 PDT 2017 to Wed Jul 5 17:01:18 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 4,473

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 16:02:30 PDT 2017 to Wed Jul 5 09:16:52 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 5,213

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 19:05:46 PDT 2017 to Wed Jul 5 12:18:30 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 7,656

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 14:59:25 PDT 2017 to Wed Jul 5 08:13:10 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 6,116

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 15:29:50 PDT 2017 to Wed Jul 5 08:40:44 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 5,306

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 17:01:13 PDT 2017 to Wed Jul 5 10:15:10 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 5,559

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 16:13:58 PDT 2017 to Wed Jul 5 09:29:32 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 5,790

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 15:19:45 PDT 2017 to Wed Jul 5 08:32:51 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 5,410

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 06:15:53 PDT 2017 to Wed Jul 5 23:27:06 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 6,030

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 02:41:55 PDT 2017 to Wed Jul 5 19:54:59 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 6,658

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 05:37:05 PDT 2017 to Wed Jul 5 22:50:05 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 7,931

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 03:28:50 PDT 2017 to Wed Jul 5 20:41:42 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 5,745

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 06:04:47 PDT 2017 to Wed Jul 5 23:17:34 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 5,316

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 02:26:07 PDT 2017 to Wed Jul 5 20:39:19 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 7,586

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 07:14:32 PDT 2017 to Thu Jul 6 00:28:30 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 6,894

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 01:56:59 PDT 2017 to Wed Jul 5 19:14:04 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 6,138

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 11:29:12 PDT 2017 to Thu Jul 6 04:42:47 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 4,701

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 16:43:44 PDT 2017 to Thu Jul 6 10:08:52 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 6,907

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 09:09:06 PDT 2017 to Thu Jul 6 02:22:13 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 4,596

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 17:09:19 PDT 2017 to Thu Jul 6 10:32:42 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 6,324

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 11:49:51 PDT 2017 to Thu Jul 6 05:02:19 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 4,166

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Fri Jul 7 00:14:00 PDT 2017 to Thu Jul 6 18:31:17 PDT 2017.
Topic: crawldata
Internet Archive Research Publication Crawls
by Wanfang Data
data

eye 6

Metadata and some fulltext PDFs from Wanfang Data, downloaded 2020-03-29 from http://subject.med.wanfangdata.com.cn/Channel/7
OAI-PMH-PATCH-CRAWL-2021-12
collection
75 ITEMS
463,039 VIEWS

PUBMEDCENTRAL-CRAWL-2020-02
collection
108 ITEMS
296,336 VIEWS
by Internet Archive Web Group

DATASET-CRAWL-2022-01
collection
2 ITEMS
5,412 VIEWS

DIRECT-OA-CRAWL-2019
collection
2,566 ITEMS
6.1M VIEWS
by Internet Archive Web Group

CiteSeerX URL Crawl 2017
web

eye 7,329

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 11:27:04 PDT 2017 to Wed Jul 5 04:42:02 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 7,490

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 07:17:51 PDT 2017 to Wed Jul 5 00:28:29 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 6,916

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 08:43:10 PDT 2017 to Wed Jul 5 01:56:51 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 6,424

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 08:23:44 PDT 2017 to Wed Jul 5 01:37:05 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 6,327

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 05:56:50 PDT 2017 to Tue Jul 4 23:09:37 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 5,051

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 19:28:55 PDT 2017 to Wed Jul 5 12:44:56 PDT 2017.
Topic: crawldata