4.7M
4.7M
May 28, 2020
05/20
by
Internet Archive Web Group
14.3M
14M
Jul 17, 2018
07/18
by
Internet Archive Web Group
Web archive data from a crawl of open access PDF URLs provided by Unpaywall.
9.5M
9.5M
Jul 6, 2020
07/20
by
Internet Archive Web Group
11.5M
12M
Aug 4, 2017
08/17
by
Internet Archive Web Group
Microsoft Academic Graph public corpus (Feb 2016) PDF URLs, filtered to remove large sites (pubmed, citeseerx, arxiv) and already-crawled URLs.
Topics: papers, journals
10.6M
11M
Apr 9, 2018
04/18
by
Internet Archive Web Group
5.2M
5.2M
Apr 26, 2019
04/19
by
Internet Archive Web Group
3.6M
3.6M
Mar 5, 2020
03/20
by
Internet Archive Web Group
5.1M
5.1M
Feb 15, 2019
02/19
by
Internet Archive Web Group
1.5M
1.5M
Oct 31, 2018
10/18
by
Internet Archive Web Group
Crawl of "upstream" URLs from the CORE (core.ac.uk) metadata dump. Only a partial seedlist of files was crawled.
3.2M
3.2M
Feb 5, 2020
02/20
by
Internet Archive Web Group
1.7M
1.7M
Mar 5, 2020
03/20
by
Internet Archive Web Group
3.6M
3.6M
Jan 24, 2020
01/20
by
Internet Archive Web Group
1.4M
1.4M
Dec 9, 2020
12/20
by
Internet Archive Web Group
1.5M
1.5M
Jul 9, 2020
07/20
by
Internet Archive Web Group
1.6M
1.6M
Nov 2, 2020
11/20
by
Internet Archive Web Group
3.2M
3.2M
Jun 1, 2018
06/18
by
Internet Archive Web Group
1.6M
1.6M
May 4, 2020
05/20
by
Internet Archive Web Group
3M
3.0M
Sep 21, 2017
09/17
by
Internet Archive Web Group
2.7M
2.7M
Aug 1, 2019
08/19
by
Internet Archive Web Group
379,588
380K
Aug 27, 2020
08/20
by
Internet Archive Web Group
1.4M
1.4M
Feb 6, 2020
02/20
by
Internet Archive Web Group
841,987
842K
Apr 27, 2021
04/21
by
Internet Archive Web Group
IA crawl of PDF URLs provided by Semantic Scholar.
Topic: pdf
854,580
855K
Nov 24, 2020
11/20
by
Internet Archive Web Group
A targeted crawl to fetch research publications from the public web which have been crawled by CiteSeerX but have not previously been crawled by the Internet Archive.
Topics: scholarly, papers, journal
400,669
401K
Oct 12, 2019
10/19
by
Internet Archive Web Group
232,578
233K
Feb 5, 2020
02/20
by
Internet Archive Web Group
101,172
101K
Apr 24, 2020
04/20
by
Internet Archive Web Group
64,150
64K
Oct 12, 2019
10/19
by
Internet Archive Web Group
195,493
195K
Aug 2, 2020
08/20
by
Internet Archive
Internet Archive crawldata of scholarly web journal content captured by wbgrp-svc282.us.archive.org:OA-JOURNAL-CRAWL-2020-07 from Sun Aug 2 19:00:58 PDT 2020 to Sun Aug 2 13:24:24 PDT 2020.
Topic: crawldata
187,797
188K
Jul 6, 2020
07/20
by
Internet Archive Web Group
Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Wed Feb 9 06:43:52 PST 2022 to Wed Feb 9 06:06:53 PST 2022.
Topic: crawldata
90,259
90K
Nov 24, 2020
11/20
by
Internet Archive
Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc279.us.archive.org:DOAJ-CRAWL-2020-11 from Tue Nov 24 17:59:21 PST 2020 to Tue Nov 24 11:43:19 PST 2020.
Topic: crawldata
Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Sun Jan 16 16:05:54 PST 2022 to Sun Jan 16 16:33:31 PST 2022.
Topic: crawldata
Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Wed Feb 2 04:32:21 PST 2022 to Wed Feb 2 06:24:58 PST 2022.
Topic: crawldata
Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Wed Feb 9 12:34:39 PST 2022 to Wed Feb 9 13:13:37 PST 2022.
Topic: crawldata
19,060
19K
Mar 9, 2022
03/22
by
Internet Archive
Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Fri Mar 4 08:19:11 PST 2022 to Tue Mar 8 18:29:43 PST 2022.
Topic: crawldata
10,490
10K
Feb 27, 2022
02/22
by
Internet Archive
Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Sat Feb 26 14:02:15 PST 2022 to Sun Feb 27 05:47:42 PST 2022.
Topic: crawldata
33,464
33K
Dec 11, 2020
12/20
by
Internet Archive
Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc279.us.archive.org:OA-DOI-CRAWL-2020-12 from Wed Dec 9 22:59:12 PST 2020 to Wed Dec 9 15:45:33 PST 2020.
Topic: crawldata
20,531
21K
Feb 24, 2022
02/22
by
Internet Archive
Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Wed Feb 23 02:01:38 PST 2022 to Wed Feb 23 15:48:40 PST 2022.
Topic: crawldata
9,121
9.1K
Mar 10, 2022
03/22
by
Internet Archive
Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Tue Mar 8 20:50:17 PST 2022 to Wed Mar 9 18:29:43 PST 2022.
Topic: crawldata
Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Sun Jan 16 23:09:54 PST 2022 to Sun Jan 16 23:27:17 PST 2022.
Topic: crawldata
9,323
9.3K
Mar 3, 2022
03/22
by
Internet Archive
Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Wed Mar 2 07:41:16 PST 2022 to Thu Mar 3 05:41:51 PST 2022.
Topic: crawldata
Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Wed Feb 9 19:49:10 PST 2022 to Wed Feb 9 17:48:49 PST 2022.
Topic: crawldata
8,913
8.9K
Feb 26, 2022
02/22
by
Internet Archive
Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Fri Feb 25 14:02:24 PST 2022 to Sat Feb 26 06:00:57 PST 2022.
Topic: crawldata
13,188
13K
Feb 24, 2022
02/22
by
Internet Archive
Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Wed Feb 23 18:50:55 PST 2022 to Thu Feb 24 11:23:51 PST 2022.
Topic: crawldata
Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Sun Jan 16 08:47:48 PST 2022 to Sun Jan 16 09:46:08 PST 2022.
Topic: crawldata
Internet Archive crawldata of open access journal content captured by wbgrp-svc281.us.archive.org:UNPAYWALL-PDF-CRAWL-2018-07 from Sun Jul 29 09:54:12 PDT 2018 to Sun Jul 29 04:01:42 PDT 2018.
Topic: crawldata
8,867
8.9K
Feb 25, 2022
02/22
by
Internet Archive
Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Thu Feb 24 14:01:46 PST 2022 to Fri Feb 25 11:54:58 PST 2022.
Topic: crawldata
8,152
8.2K
Mar 2, 2022
03/22
by
Internet Archive
Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Tue Mar 1 07:52:41 PST 2022 to Wed Mar 2 05:33:50 PST 2022.
Topic: crawldata
Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Fri Feb 4 02:18:39 PST 2022 to Fri Feb 4 01:48:51 PST 2022.
Topic: crawldata
Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Sat Feb 5 14:19:42 PST 2022 to Sat Feb 5 15:31:51 PST 2022.
Topic: crawldata
Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Thu Feb 3 04:12:12 PST 2022 to Thu Feb 3 03:46:54 PST 2022.
Topic: crawldata
7,963
8.0K
Mar 9, 2022
03/22
by
Internet Archive
Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Thu Mar 3 07:55:41 PST 2022 to Fri Mar 4 06:00:57 PST 2022.
Topic: crawldata
9,225
9.2K
Feb 28, 2022
02/22
by
Internet Archive
Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Sun Feb 27 13:18:39 PST 2022 to Mon Feb 28 05:15:19 PST 2022.
Topic: crawldata
Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Fri Feb 4 09:38:55 PST 2022 to Fri Feb 4 10:01:13 PST 2022.
Topic: crawldata
7,225
7.2K
Mar 14, 2022
03/22
by
Internet Archive
Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Fri Mar 11 02:31:16 PST 2022 to Sun Mar 13 07:29:43 PDT 2022.
Topic: crawldata
Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Tue Feb 1 19:10:45 PST 2022 to Tue Feb 1 22:31:42 PST 2022.
Topic: crawldata
Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Fri Jan 28 18:32:02 PST 2022 to Fri Jan 28 18:24:26 PST 2022.
Topic: crawldata
Internet Archive crawldata of scholarly web journal content captured by wbgrp-svc279.us.archive.org:JOURNAL-HOMEPAGE-CRAWL-2022-03 from Wed Mar 30 20:42:30 PDT 2022 to Thu Mar 31 18:02:39 PDT 2022.
Topic: crawldata
Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Thu Feb 3 11:41:05 PST 2022 to Thu Feb 3 11:55:15 PST 2022.
Topic: crawldata
Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Sat Feb 5 21:54:41 PST 2022 to Sat Feb 5 22:00:45 PST 2022.
Topic: crawldata
7,949
7.9K
Mar 1, 2022
03/22
by
Internet Archive
Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Mon Feb 28 07:53:39 PST 2022 to Tue Mar 1 05:36:50 PST 2022.
Topic: crawldata
Internet Archive crawldata of scholarly web journal content captured by wbgrp-svc279.us.archive.org:JOURNAL-HOMEPAGE-CRAWL-2022-03 from Thu Mar 10 03:08:12 PST 2022 to Fri Mar 11 04:09:37 PST 2022.
Topic: crawldata
Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:JOURNALS-PATCH-CRAWL-2022-01 from Sat Feb 5 06:11:43 PST 2022 to Sat Feb 5 08:08:47 PST 2022.
Topic: crawldata
7,096
7.1K
Mar 11, 2022
03/22
by
Internet Archive
Internet Archive crawldata of scholarly web landing page content captured by wbgrp-svc206.us.archive.org:DOI-CRAWL-2022-02 from Wed Mar 9 20:55:55 PST 2022 to Fri Mar 11 02:34:43 PST 2022.
Topic: crawldata