
Internet Archive Research Publication Crawls

Internet Archive Web Group

A series of open web crawls targeting journal articles, technical memos, essays, datasets, and other research publications.




21,136 results


OAI-PMH-CRAWL-2020-06
collection; 2,946 items; 4.7M views; by Internet Archive Web Group

UNPAYWALL-PDF-CRAWL-2018-07
collection; 1,241 items; 14.3M views; by Internet Archive Web Group

Web archive data from a crawl of open access PDF URLs provided by Unpaywall.
OA-JOURNAL-CRAWL-2020-07
collection; 1,923 items; 9.5M views; by Internet Archive Web Group

MSAG-PDF-CRAWL-2017
collection; 1,855 items; 11.5M views; by Internet Archive Web Group

Microsoft Academic Graph public corpus (Feb 2016) PDF URLs, filtered to remove large sites (pubmed, citeseerx, arxiv) and already-crawled URLs.
Topics: papers, journals
Open Access Journal Test Crawl (2018)
collection; 794 items; 10.6M views; by Internet Archive Web Group

UNPAYWALL-PDF-CRAWL-2019-04
collection; 641 items; 5.2M views; by Internet Archive Web Group

MAG-PDF-CRAWL-2020-03
collection; 489 items; 3.6M views; by Internet Archive Web Group

DIRECT-OA-CRAWL-2019
collection; 2,566 items; 5.1M views; by Internet Archive Web Group

CORE-UPSTREAM-CRAWL-2018-11
collection; 741 items; 1.5M views; by Internet Archive Web Group

Crawl of "upstream" URLs from the CORE (core.ac.uk) metadata dump. Only a partial seedlist of files was crawled.
JOURNALS-PATCH-CRAWL-2022-01
collection; 104 items; 590,763 views

OA-DOI-CRAWL-2020-02
collection; 278 items; 3.2M views; by Internet Archive Web Group

UNPAYWALL-PDF-CRAWL-2020-03
collection; 344 items; 1.7M views; by Internet Archive Web Group

DATACITE-DOI-CRAWL-2020-01
collection; 1,417 items; 3.6M views; by Internet Archive Web Group

OA-DOI-CRAWL-2020-12
collection; 191 items; 1.4M views; by Internet Archive Web Group

MAG-PDF-CRAWL-2021-08
collection; 189 items; 660,560 views

UNPAYWALL-PDF-CRAWL-2021-07
collection; 174 items; 908,257 views

MAG-PDF-CRAWL-2020-07
collection; 196 items; 1.5M views; by Internet Archive Web Group

UNPAYWALL-PDF-CRAWL-2020-11
collection; 199 items; 1.6M views; by Internet Archive Web Group

DOI-LANDING-CRAWL-2018-06
collection; 279 items; 3.2M views; by Internet Archive Web Group

TARGETED-ARTICLE-CRAWL-2022-04
collection; 219 items; 201,633 views

UNPAYWALL-PDF-CRAWL-2020-05
collection; 282 items; 1.6M views; by Internet Archive Web Group

Wide Web Targeted PDF Crawling (2017)
collection; 922 items; 3M views; by Internet Archive Web Group

OA-JOURNAL-CRAWL-2019-08
collection; 201 items; 2.7M views; by Internet Archive Web Group

PLATFORM-CRAWL-2020
collection; 649 items; 379,588 views; by Internet Archive Web Group

SEMSCHOLAR-DIRECT-PDF-CRAWL-2020-02
collection; 1,011 items; 1.4M views; by Internet Archive Web Group

UNPAYWALL-PDF-CRAWL-2021-05
collection; 123 items; 841,987 views; by Internet Archive Web Group

collection; 1.9M views

IA crawl of PDF URLs provided by Semantic Scholar.
Topic: pdf
OAI-PMH-PATCH-CRAWL-2021-12
collection; 75 items; 284,225 views

DOAJ-CRAWL-2020-11
collection; 102 items; 854,580 views; by Internet Archive Web Group

CiteSeerX URL Crawl 2017
collection; 207 items; 1.1M views

A targeted crawl to fetch research publications from the public web which have been crawled by CiteSeerX but have not previously been crawled by the Internet Archive.
Topics: scholarly, papers, journal
DOI-CRAWL-2022-02
collection; 25 items; 166,413 views

JOURNAL-HOMEPAGE-CRAWL-2022-03
collection; 44 items; 218,024 views

PubMed Central Crawl (2019-10)
collection; 216 items; 400,669 views; by Internet Archive Web Group

PUBMEDCENTRAL-CRAWL-2020-02
collection; 108 items; 232,578 views; by Internet Archive Web Group

ARXIV-PUBMEDCENTRAL-CRAWL-2020-04
collection; 60 items; 101,172 views; by Internet Archive Web Group

arXiv Content Crawl (2019-10)
collection; 37 items; 64,150 views; by Internet Archive Web Group

TARGETED-ARTICLE-CRAWL-2022-03
collection; 9 items; 44,165 views

OA-JOURNAL-CRAWL-2020-07
web; 195,493 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web journal content captured by wbgrp-svc282.us.archive.org:OA-JOURNAL-CRAWL-2020-07 from Sun Aug 2 19:00:58 PDT 2020 to Sun Aug 2 13:24:24 PDT 2020.
Topic: crawldata
SCIELO-CRAWL-2020-07
collection; 41 items; 187,797 views; by Internet Archive Web Group

UNPAYWALL-PDF-CRAWL-2022-04
collection; 38 items; 11,972 views

JOURNALS-PATCH-CRAWL-2022-01
web; 14,144 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Wed Feb 9 06:43:52 PST 2022 to Wed Feb 9 06:06:53 PST 2022.
Topic: crawldata
DOAJ-CRAWL-2020-11
web; 90,259 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc279.us.archive.org:DOAJ-CRAWL-2020-11 from Tue Nov 24 17:59:21 PST 2020 to Tue Nov 24 11:43:19 PST 2020.
Topic: crawldata
JOURNALS-PATCH-CRAWL-2022-01
web; 17,725 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Sun Jan 16 16:05:54 PST 2022 to Sun Jan 16 16:33:31 PST 2022.
Topic: crawldata
JOURNALS-PATCH-CRAWL-2022-01
web; 15,552 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Wed Feb 2 04:32:21 PST 2022 to Wed Feb 2 06:24:58 PST 2022.
Topic: crawldata
JOURNALS-PATCH-CRAWL-2022-01
web; 10,862 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Wed Feb 9 12:34:39 PST 2022 to Wed Feb 9 13:13:37 PST 2022.
Topic: crawldata
DOI-CRAWL-2022-02
web; 19,060 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Fri Mar 4 08:19:11 PST 2022 to Tue Mar 8 18:29:43 PST 2022.
Topic: crawldata
DOI-CRAWL-2022-02
web; 10,490 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Sat Feb 26 14:02:15 PST 2022 to Sun Feb 27 05:47:42 PST 2022.
Topic: crawldata
OA-DOI-CRAWL-2020-12
web; 33,464 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc279.us.archive.org:OA-DOI-CRAWL-2020-12 from Wed Dec 9 22:59:12 PST 2020 to Wed Dec 9 15:45:33 PST 2020.
Topic: crawldata
DOI-CRAWL-2022-02
web; 20,531 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Wed Feb 23 02:01:38 PST 2022 to Wed Feb 23 15:48:40 PST 2022.
Topic: crawldata
DOI-CRAWL-2022-02
web; 9,121 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Tue Mar 8 20:50:17 PST 2022 to Wed Mar 9 18:29:43 PST 2022.
Topic: crawldata
JOURNALS-PATCH-CRAWL-2022-01
web; 13,099 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Sun Jan 16 23:09:54 PST 2022 to Sun Jan 16 23:27:17 PST 2022.
Topic: crawldata
DOI-CRAWL-2022-02
web; 9,323 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Wed Mar 2 07:41:16 PST 2022 to Thu Mar 3 05:41:51 PST 2022.
Topic: crawldata
JOURNALS-PATCH-CRAWL-2022-01
web; 8,945 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Wed Feb 9 19:49:10 PST 2022 to Wed Feb 9 17:48:49 PST 2022.
Topic: crawldata
DOI-CRAWL-2022-02
web; 8,913 views; 1 favorite; 0 comments

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Fri Feb 25 14:02:24 PST 2022 to Sat Feb 26 06:00:57 PST 2022.
Topic: crawldata
DOI-CRAWL-2022-02
web; 13,188 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Wed Feb 23 18:50:55 PST 2022 to Thu Feb 24 11:23:51 PST 2022.
Topic: crawldata
JOURNALS-PATCH-CRAWL-2022-01
web; 10,122 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Sun Jan 16 08:47:48 PST 2022 to Sun Jan 16 09:46:08 PST 2022.
Topic: crawldata
UNPAYWALL-PDF-CRAWL-2018-07
web; 45,657 views; 0 favorites; 0 comments

Internet Archive crawldata of open access journal content captured by wbgrp-svc281.us.archive.org:UNPAYWALL-PDF-CRAWL-2018-07 from Sun Jul 29 09:54:12 PDT 2018 to Sun Jul 29 04:01:42 PDT 2018.
Topic: crawldata
DOI-CRAWL-2022-02
web; 8,867 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Thu Feb 24 14:01:46 PST 2022 to Fri Feb 25 11:54:58 PST 2022.
Topic: crawldata
DOI-CRAWL-2022-02
web; 8,152 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Tue Mar 1 07:52:41 PST 2022 to Wed Mar 2 05:33:50 PST 2022.
Topic: crawldata
JOURNALS-PATCH-CRAWL-2022-01
web; 8,513 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Fri Feb 4 02:18:39 PST 2022 to Fri Feb 4 01:48:51 PST 2022.
Topic: crawldata
JOURNALS-PATCH-CRAWL-2022-01
web; 8,441 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Sat Feb 5 14:19:42 PST 2022 to Sat Feb 5 15:31:51 PST 2022.
Topic: crawldata
JOURNALS-PATCH-CRAWL-2022-01
web; 7,925 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Thu Feb 3 04:12:12 PST 2022 to Thu Feb 3 03:46:54 PST 2022.
Topic: crawldata
DOI-CRAWL-2022-02
web; 7,963 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Thu Mar 3 07:55:41 PST 2022 to Fri Mar 4 06:00:57 PST 2022.
Topic: crawldata
DOI-CRAWL-2022-02
web; 9,225 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Sun Feb 27 13:18:39 PST 2022 to Mon Feb 28 05:15:19 PST 2022.
Topic: crawldata
JOURNALS-PATCH-CRAWL-2022-01
web; 7,985 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Fri Feb 4 09:38:55 PST 2022 to Fri Feb 4 10:01:13 PST 2022.
Topic: crawldata
DOI-CRAWL-2022-02
web; 7,225 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Fri Mar 11 02:31:16 PST 2022 to Sun Mar 13 07:29:43 PDT 2022.
Topic: crawldata
JOURNALS-PATCH-CRAWL-2022-01
web; 8,510 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Tue Feb 1 19:10:45 PST 2022 to Tue Feb 1 22:31:42 PST 2022.
Topic: crawldata
JOURNALS-PATCH-CRAWL-2022-01
web; 6,509 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Fri Jan 28 18:32:02 PST 2022 to Fri Jan 28 18:24:26 PST 2022.
Topic: crawldata
JOURNAL-HOMEPAGE-CRAWL-2022-03
web; 9,775 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web journal content captured by wbgrp-svc279.us.archive.org:JOURNAL-HOMEPAGE-CRAWL-2022-03 from Wed Mar 30 20:42:30 PDT 2022 to Thu Mar 31 18:02:39 PDT 2022.
Topic: crawldata
JOURNALS-PATCH-CRAWL-2022-01
web; 7,713 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Thu Feb 3 11:41:05 PST 2022 to Thu Feb 3 11:55:15 PST 2022.
Topic: crawldata
JOURNALS-PATCH-CRAWL-2022-01
web; 7,439 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Sat Feb 5 21:54:41 PST 2022 to Sat Feb 5 22:00:45 PST 2022.
Topic: crawldata
DOI-CRAWL-2022-02
web; 7,949 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Mon Feb 28 07:53:39 PST 2022 to Tue Mar 1 05:36:50 PST 2022.
Topic: crawldata
JOURNAL-HOMEPAGE-CRAWL-2022-03
web; 21,816 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web journal content captured by wbgrp-svc279.us.archive.org:JOURNAL-HOMEPAGE-CRAWL-2022-03 from Thu Mar 10 03:08:12 PST 2022 to Fri Mar 11 04:09:37 PST 2022.
Topic: crawldata
JOURNALS-PATCH-CRAWL-2022-01
web; 7,223 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Sat Feb 5 06:11:43 PST 2022 to Sat Feb 5 08:08:47 PST 2022.
Topic: crawldata
DOI-CRAWL-2022-02
web; 7,096 views; 0 favorites; 0 comments

Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Wed Mar 9 20:55:55 PST 2022 to Fri Mar 11 02:34:43 PST 2022.
Topic: crawldata