
Internet Archive Research Publication Crawls

Internet Archive Web Group

A series of open web crawls targeting journal articles, technical memos, essays, datasets, and other research publications.




21,136
RESULTS


UNPAYWALL-PDF-CRAWL-2018-07
collection
1,241
ITEMS
14.3M
VIEWS
by Internet Archive Web Group

Web archive data from a crawl of open access PDF URLs provided by Unpaywall.
OAI-PMH-CRAWL-2020-06
collection
2,946
ITEMS
4.6M
VIEWS
by Internet Archive Web Group

OA-JOURNAL-CRAWL-2020-07
collection
1,923
ITEMS
9.4M
VIEWS
by Internet Archive Web Group

MSAG-PDF-CRAWL-2017
collection
1,855
ITEMS
11.5M
VIEWS
by Internet Archive Web Group

Microsoft Academic Graph public corpus (Feb 2016) PDF URLs, filtered to remove large sites (pubmed, citeseerx, arxiv) and already-crawled URLs.
Topics: papers, journals
Open Access Journal Test Crawl (2018)
collection
794
ITEMS
10.6M
VIEWS
by Internet Archive Web Group

UNPAYWALL-PDF-CRAWL-2019-04
collection
641
ITEMS
5.2M
VIEWS
by Internet Archive Web Group

MAG-PDF-CRAWL-2020-03
collection
489
ITEMS
3.6M
VIEWS
by Internet Archive Web Group

DIRECT-OA-CRAWL-2019
collection
2,566
ITEMS
5.1M
VIEWS
by Internet Archive Web Group

CORE-UPSTREAM-CRAWL-2018-11
collection
741
ITEMS
1.5M
VIEWS
by Internet Archive Web Group

Crawl of "upstream" URLs from the CORE (core.ac.uk) metadata dump. Only part of the seedlist was crawled.
OA-DOI-CRAWL-2020-02
collection
278
ITEMS
3.2M
VIEWS
by Internet Archive Web Group

JOURNALS-PATCH-CRAWL-2022-01
collection
104
ITEMS
579,093
VIEWS

UNPAYWALL-PDF-CRAWL-2020-03
collection
344
ITEMS
1.7M
VIEWS
by Internet Archive Web Group

DATACITE-DOI-CRAWL-2020-01
collection
1,417
ITEMS
3.6M
VIEWS
by Internet Archive Web Group

OA-DOI-CRAWL-2020-12
collection
191
ITEMS
1.4M
VIEWS
by Internet Archive Web Group

UNPAYWALL-PDF-CRAWL-2021-07
collection
174
ITEMS
900,124
VIEWS

MAG-PDF-CRAWL-2021-08
collection
189
ITEMS
652,030
VIEWS

MAG-PDF-CRAWL-2020-07
collection
196
ITEMS
1.5M
VIEWS
by Internet Archive Web Group

UNPAYWALL-PDF-CRAWL-2020-11
collection
199
ITEMS
1.6M
VIEWS
by Internet Archive Web Group

UNPAYWALL-PDF-CRAWL-2020-05
collection
282
ITEMS
1.6M
VIEWS
by Internet Archive Web Group

Wide Web Targeted PDF Crawling (2017)
collection
922
ITEMS
3M
VIEWS
by Internet Archive Web Group

TARGETED-ARTICLE-CRAWL-2022-04
collection
219
ITEMS
195,788
VIEWS

DOI-LANDING-CRAWL-2018-06
collection
279
ITEMS
3.2M
VIEWS
by Internet Archive Web Group

OA-JOURNAL-CRAWL-2019-08
collection
201
ITEMS
2.7M
VIEWS
by Internet Archive Web Group

PLATFORM-CRAWL-2020
collection
649
ITEMS
372,726
VIEWS
by Internet Archive Web Group

UNPAYWALL-PDF-CRAWL-2021-05
collection
123
ITEMS
837,487
VIEWS
by Internet Archive Web Group

SEMSCHOLAR-DIRECT-PDF-CRAWL-2020-02
collection
1,011
ITEMS
1.4M
VIEWS
by Internet Archive Web Group

collection

eye 1.9M

Internet Archive crawl of PDF URLs provided by Semantic Scholar.
Topic: pdf
DOAJ-CRAWL-2020-11
collection
102
ITEMS
851,272
VIEWS
by Internet Archive Web Group

OAI-PMH-PATCH-CRAWL-2021-12
collection
75
ITEMS
279,982
VIEWS

CiteSeerX URL Crawl 2017
collection
207
ITEMS
1.1M
VIEWS

A targeted crawl to fetch research publications from the public web which have been crawled by CiteSeerX but have not previously been crawled by the Internet Archive.
Topics: scholarly, papers, journal
JOURNAL-HOMEPAGE-CRAWL-2022-03
collection
44
ITEMS
215,083
VIEWS

DOI-CRAWL-2022-02
collection
25
ITEMS
162,934
VIEWS

PubMed Central Crawl (2019-10)
collection
216
ITEMS
398,184
VIEWS
by Internet Archive Web Group

PUBMEDCENTRAL-CRAWL-2020-02
collection
108
ITEMS
231,100
VIEWS
by Internet Archive Web Group

ARXIV-PUBMEDCENTRAL-CRAWL-2020-04
collection
60
ITEMS
100,370
VIEWS
by Internet Archive Web Group

TARGETED-ARTICLE-CRAWL-2022-03
collection
9
ITEMS
43,574
VIEWS

SCIELO-CRAWL-2020-07
collection
41
ITEMS
187,324
VIEWS
by Internet Archive Web Group

arXiv Content Crawl (2019-10)
collection
37
ITEMS
63,434
VIEWS
by Internet Archive Web Group

OA-JOURNAL-CRAWL-2020-07
web

eye 194,942

favorite 0

comment 0

Internet Archive crawldata of scholarly web journal content captured by wbgrp-svc282.us.archive.org:OA-JOURNAL-CRAWL-2020-07 from Sun Aug 2 19:00:58 PDT 2020 to Sun Aug 2 13:24:24 PDT 2020.
Topic: crawldata
UNPAYWALL-PDF-CRAWL-2022-04
collection
38
ITEMS
11,488
VIEWS

JOURNALS-PATCH-CRAWL-2022-01
web

eye 13,789

favorite 0

comment 0

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Wed Feb 9 06:43:52 PST 2022 to Wed Feb 9 06:06:53 PST 2022.
Topic: crawldata
DOAJ-CRAWL-2020-11
web

eye 89,901

favorite 0

comment 0

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc279.us.archive.org:DOAJ-CRAWL-2020-11 from Tue Nov 24 17:59:21 PST 2020 to Tue Nov 24 11:43:19 PST 2020.
Topic: crawldata
JOURNALS-PATCH-CRAWL-2022-01
web

eye 17,368

favorite 0

comment 0

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Sun Jan 16 16:05:54 PST 2022 to Sun Jan 16 16:33:31 PST 2022.
Topic: crawldata
JOURNALS-PATCH-CRAWL-2022-01
web

eye 10,593

favorite 0

comment 0

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Wed Feb 9 12:34:39 PST 2022 to Wed Feb 9 13:13:37 PST 2022.
Topic: crawldata
JOURNALS-PATCH-CRAWL-2022-01
web

eye 15,270

favorite 0

comment 0

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Wed Feb 2 04:32:21 PST 2022 to Wed Feb 2 06:24:58 PST 2022.
Topic: crawldata
DOI-CRAWL-2022-02
web

eye 18,761

favorite 0

comment 0

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Fri Mar 4 08:19:11 PST 2022 to Tue Mar 8 18:29:43 PST 2022.
Topic: crawldata
DOI-CRAWL-2022-02
web

eye 10,228

favorite 0

comment 0

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Sat Feb 26 14:02:15 PST 2022 to Sun Feb 27 05:47:42 PST 2022.
Topic: crawldata
DOI-CRAWL-2022-02
web

eye 20,293

favorite 0

comment 0

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Wed Feb 23 02:01:38 PST 2022 to Wed Feb 23 15:48:40 PST 2022.
Topic: crawldata
DOI-CRAWL-2022-02
web

eye 8,894

favorite 0

comment 0

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Tue Mar 8 20:50:17 PST 2022 to Wed Mar 9 18:29:43 PST 2022.
Topic: crawldata
OA-DOI-CRAWL-2020-12
web

eye 33,194

favorite 0

comment 0

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc279.us.archive.org:OA-DOI-CRAWL-2020-12 from Wed Dec 9 22:59:12 PST 2020 to Wed Dec 9 15:45:33 PST 2020.
Topic: crawldata
JOURNALS-PATCH-CRAWL-2022-01
web

eye 12,884

favorite 0

comment 0

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Sun Jan 16 23:09:54 PST 2022 to Sun Jan 16 23:27:17 PST 2022.
Topic: crawldata
DOI-CRAWL-2022-02
web

eye 9,108

favorite 0

comment 0

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Wed Mar 2 07:41:16 PST 2022 to Thu Mar 3 05:41:51 PST 2022.
Topic: crawldata
JOURNALS-PATCH-CRAWL-2022-01
web

eye 8,736

favorite 0

comment 0

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Wed Feb 9 19:49:10 PST 2022 to Wed Feb 9 17:48:49 PST 2022.
Topic: crawldata
DOI-CRAWL-2022-02
web

eye 8,709

favorite 1

comment 0

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Fri Feb 25 14:02:24 PST 2022 to Sat Feb 26 06:00:57 PST 2022.
Topic: crawldata
DOI-CRAWL-2022-02
web

eye 13,003

favorite 0

comment 0

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Wed Feb 23 18:50:55 PST 2022 to Thu Feb 24 11:23:51 PST 2022.
Topic: crawldata
JOURNAL-HOMEPAGE-CRAWL-2022-03
web

eye 21,725

favorite 0

comment 0

Internet Archive crawldata of scholarly web journal content captured by wbgrp-svc279.us.archive.org:JOURNAL-HOMEPAGE-CRAWL-2022-03 from Thu Mar 10 03:08:12 PST 2022 to Fri Mar 11 04:09:37 PST 2022.
Topic: crawldata
JOURNALS-PATCH-CRAWL-2022-01
web

eye 8,340

favorite 0

comment 0

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Fri Feb 4 02:18:39 PST 2022 to Fri Feb 4 01:48:51 PST 2022.
Topic: crawldata
DOI-CRAWL-2022-02
web

eye 8,676

favorite 0

comment 0

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Thu Feb 24 14:01:46 PST 2022 to Fri Feb 25 11:54:58 PST 2022.
Topic: crawldata
JOURNALS-PATCH-CRAWL-2022-01
web

eye 9,914

favorite 0

comment 0

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Sun Jan 16 08:47:48 PST 2022 to Sun Jan 16 09:46:08 PST 2022.
Topic: crawldata
JOURNALS-PATCH-CRAWL-2022-01
web

eye 8,261

favorite 0

comment 0

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Sat Feb 5 14:19:42 PST 2022 to Sat Feb 5 15:31:51 PST 2022.
Topic: crawldata
DOI-CRAWL-2022-02
web

eye 9,047

favorite 0

comment 0

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Sun Feb 27 13:18:39 PST 2022 to Mon Feb 28 05:15:19 PST 2022.
Topic: crawldata
DOI-CRAWL-2022-02
web

eye 7,965

favorite 0

comment 0

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Tue Mar 1 07:52:41 PST 2022 to Wed Mar 2 05:33:50 PST 2022.
Topic: crawldata
JOURNAL-HOMEPAGE-CRAWL-2022-03
web

eye 9,643

favorite 0

comment 0

Internet Archive crawldata of scholarly web journal content captured by wbgrp-svc279.us.archive.org:JOURNAL-HOMEPAGE-CRAWL-2022-03 from Wed Mar 30 20:42:30 PDT 2022 to Thu Mar 31 18:02:39 PDT 2022.
Topic: crawldata
UNPAYWALL-PDF-CRAWL-2018-07
web

eye 45,441

favorite 0

comment 0

Internet Archive crawldata of open access journal content captured by wbgrp-svc281.us.archive.org:UNPAYWALL-PDF-CRAWL-2018-07 from Sun Jul 29 09:54:12 PDT 2018 to Sun Jul 29 04:01:42 PDT 2018.
Topic: crawldata
DOI-CRAWL-2022-02
web

eye 7,770

favorite 0

comment 0

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Thu Mar 3 07:55:41 PST 2022 to Fri Mar 4 06:00:57 PST 2022.
Topic: crawldata
JOURNALS-PATCH-CRAWL-2022-01
web

eye 6,343

favorite 0

comment 0

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Fri Jan 28 18:32:02 PST 2022 to Fri Jan 28 18:24:26 PST 2022.
Topic: crawldata
JOURNALS-PATCH-CRAWL-2022-01
web

eye 7,803

favorite 0

comment 0

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Fri Feb 4 09:38:55 PST 2022 to Fri Feb 4 10:01:13 PST 2022.
Topic: crawldata
JOURNALS-PATCH-CRAWL-2022-01
web

eye 8,316

favorite 0

comment 0

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Tue Feb 1 19:10:45 PST 2022 to Tue Feb 1 22:31:42 PST 2022.
Topic: crawldata
DOI-CRAWL-2022-02
web

eye 6,919

favorite 0

comment 0

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Wed Mar 9 20:55:55 PST 2022 to Fri Mar 11 02:34:43 PST 2022.
Topic: crawldata
JOURNALS-PATCH-CRAWL-2022-01
web

eye 7,542

favorite 0

comment 0

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Thu Feb 3 11:41:05 PST 2022 to Thu Feb 3 11:55:15 PST 2022.
Topic: crawldata
JOURNALS-PATCH-CRAWL-2022-01
web

eye 7,268

favorite 0

comment 0

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Sat Feb 5 21:54:41 PST 2022 to Sat Feb 5 22:00:45 PST 2022.
Topic: crawldata
JOURNALS-PATCH-CRAWL-2022-01
web

eye 7,071

favorite 0

comment 0

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Sat Feb 5 06:11:43 PST 2022 to Sat Feb 5 08:08:47 PST 2022.
Topic: crawldata
DOI-CRAWL-2022-02
web

eye 7,774

favorite 0

comment 0

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Mon Feb 28 07:53:39 PST 2022 to Tue Mar 1 05:36:50 PST 2022.
Topic: crawldata
JOURNALS-PATCH-CRAWL-2022-01
web

eye 7,694

favorite 0

comment 0

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Thu Feb 3 04:12:12 PST 2022 to Thu Feb 3 03:46:54 PST 2022.
Topic: crawldata
JOURNAL-HOMEPAGE-CRAWL-2022-03
web

eye 12,464

favorite 0

comment 0

Internet Archive crawldata of scholarly web journal content captured by wbgrp-svc279.us.archive.org:JOURNAL-HOMEPAGE-CRAWL-2022-03 from Thu Mar 31 20:37:17 PDT 2022 to Fri Apr 1 13:06:40 PDT 2022.
Topic: crawldata