Title
Date Archived
Creator
TARGETED-ARTICLE-CRAWL-2022-07
Aug 30, 2022
data

eye 0

favorite 0

comment 0

TARGETED-ARTICLE-CRAWL-2022-07
Aug 30, 2022
data

eye 0

favorite 0

comment 0

TARGETED-ARTICLE-CRAWL-2022-07
collection
0
ITEMS
72,691
VIEWS
Aug 1, 2022
collection

eye 72,691

UNPAYWALL-PDF-CRAWL-2022-04
Jul 6, 2022
data

eye 3

favorite 0

comment 0

DATASET-CRAWL-2022-01
May 17, 2022
data

eye 6

favorite 0

comment 0

DATASET-CRAWL-2022-01
May 17, 2022
data

eye 2

favorite 0

comment 0

TARGETED-ARTICLE-CRAWL-2022-04
May 14, 2022
data

eye 0

favorite 0

comment 0

JOURNAL-HOMEPAGE-CRAWL-2022-03
May 11, 2022
data

eye 3

favorite 0

comment 0

JOURNAL-HOMEPAGE-CRAWL-2022-03
May 11, 2022
data

eye 1

favorite 0

comment 0

TARGETED-ARTICLE-CRAWL-2022-04
May 11, 2022
data

eye 3

favorite 0

comment 0

UNPAYWALL-PDF-CRAWL-2022-04
collection
38
ITEMS
102,638
VIEWS
Apr 20, 2022
collection

eye 102,638

TARGETED-ARTICLE-CRAWL-2022-04
collection
219
ITEMS
449,946
VIEWS
Apr 20, 2022
collection

eye 449,946

DOI-CRAWL-2022-02
Apr 8, 2022
data

eye 0

favorite 0

comment 0

DOI-CRAWL-2022-02
Apr 8, 2022
data

eye 0

favorite 0

comment 0

TARGETED-ARTICLE-CRAWL-2022-03
Mar 18, 2022
data

eye 0

favorite 0

comment 0

TARGETED-ARTICLE-CRAWL-2022-03
Mar 18, 2022
data

eye 1

favorite 0

comment 0

TARGETED-ARTICLE-CRAWL-2022-03
collection
9
ITEMS
69,963
VIEWS
Mar 8, 2022
collection

eye 69,963

JOURNAL-HOMEPAGE-CRAWL-2022-03
collection
44
ITEMS
359,107
VIEWS
Mar 8, 2022
collection

eye 359,107

DOI-CRAWL-2022-02
collection
25
ITEMS
299,978
VIEWS
Feb 23, 2022
collection

eye 299,978

JOURNALS-PATCH-CRAWL-2022-01
collection
104
ITEMS
1.1M
VIEWS
Jan 13, 2022
collection

eye 1.1M

OAI-PMH-PATCH-CRAWL-2021-12
Jan 11, 2022
data

eye 0

favorite 0

comment 0

OAI-PMH-PATCH-CRAWL-2021-12
Jan 11, 2022
data

eye 1

favorite 0

comment 0

DATASET-CRAWL-2022-01
collection
2
ITEMS
5,348
VIEWS
Jan 5, 2022
collection

eye 5,348

OAI-PMH-PATCH-CRAWL-2021-12
collection
75
ITEMS
450,690
VIEWS
Dec 2, 2021
collection

eye 450,690

MAG-PDF-CRAWL-2021-08
collection
189
ITEMS
1M
VIEWS
Aug 11, 2021
collection

eye 1M

UNPAYWALL-PDF-CRAWL-2021-07
collection
174
ITEMS
1.2M
VIEWS
Jul 14, 2021
collection

eye 1.2M

UNPAYWALL-PDF-CRAWL-2021-05
May 4, 2021
data

eye 9

favorite 0

comment 0

OA-DOI-CRAWL-2020-12
Dec 30, 2020
data

eye 0

favorite 0

comment 0

DOAJ-CRAWL-2020-11
Dec 2, 2020
data

eye 2

favorite 0

comment 0

UNPAYWALL-PDF-CRAWL-2020-11
Nov 11, 2020 Internet Archive Web Group
data

eye 0

favorite 0

comment 0

UNPAYWALL-PDF-CRAWL-2020-11
Nov 11, 2020 Internet Archive Web Group
data

eye 0

favorite 0

comment 0

OAI-PMH-CRAWL-2020-06
Aug 5, 2020 Internet Archive Web Group
data

eye 2

favorite 0

comment 0

OAI-PMH-CRAWL-2020-06
Aug 5, 2020 Internet Archive Web Group
data

eye 0

favorite 0

comment 0

SCIELO-CRAWL-2020-07
Jul 28, 2020
data

eye 2

favorite 0

comment 0

SCIELO-CRAWL-2020-07
Jul 27, 2020
data

eye 0

favorite 0

comment 0

OAI-PMH-CRAWL-2020-06
collection
2,946
ITEMS
6.7M
VIEWS
May 28, 2020 Internet Archive Web Group
collection

eye 6.7M

UNPAYWALL-PDF-CRAWL-2020-05
May 15, 2020
data

eye 1

favorite 0

comment 0

ARXIV-PUBMEDCENTRAL-CRAWL-2020-04
data

eye 1

favorite 0

comment 0

UNPAYWALL-PDF-CRAWL-2020-05
May 5, 2020
data

eye 1

favorite 0

comment 0

UNPAYWALL-PDF-CRAWL-2020-05
collection
282
ITEMS
1.9M
VIEWS
May 4, 2020 Internet Archive Web Group
collection

eye 1.9M

ARXIV-PUBMEDCENTRAL-CRAWL-2020-04
Apr 27, 2020
data

eye 1

favorite 0

comment 0

UNPAYWALL-PDF-CRAWL-2020-03
Mar 21, 2020
data

eye 0

favorite 0

comment 0

SEMSCHOLAR-DIRECT-PDF-CRAWL-2020-02
Mar 5, 2020 Internet Archive Web Group
data

eye 1

favorite 0

comment 0

OA-DOI-CRAWL-2020-02
Feb 19, 2020
data

eye 1

favorite 0

comment 0

OA-DOI-CRAWL-2020-02
Feb 19, 2020
data

eye 1

favorite 0

comment 0

PUBMEDCENTRAL-CRAWL-2020-02
Feb 14, 2020
data

eye 0

favorite 0

comment 0

PUBMEDCENTRAL-CRAWL-2020-02
Feb 14, 2020
data

eye 1

favorite 0

comment 0

PubMed Central Crawl (2019-10)
Dec 24, 2019
data

eye 3

favorite 0

comment 0

PubMed Central Crawl (2019-10)
Dec 24, 2019
data

eye 3

favorite 0

comment 0

arXiv Content Crawl (2019-10)
Dec 24, 2019
data

eye 3

favorite 0

comment 0

arXiv Content Crawl (2019-10)
Dec 24, 2019
data

eye 3

favorite 0

comment 0

OA-JOURNAL-CRAWL-2019-08
collection
201
ITEMS
3M
VIEWS
Aug 1, 2019 Internet Archive Web Group
collection

eye 3M

UNPAYWALL-PDF-CRAWL-2018-07
Jul 2, 2019 Internet Archive Web Group
data

eye 3

favorite 0

comment 0

OMICS-DOI-LANDING-CRAWL-2019-04
Apr 29, 2019 Internet Archive Web Group
data

eye 0

favorite 0

comment 0

OMICS-DOI-LANDING-CRAWL-2019-04
Apr 29, 2019 Internet Archive Web Group
data

eye 4

favorite 0

comment 0

DIRECT-OA-CRAWL-2019
Apr 11, 2019 Internet Archive Web Group
data

eye 5

favorite 0

comment 0

DIRECT-OA-CRAWL-2019
Apr 11, 2019 Internet Archive Web Group
data

eye 4

favorite 0

comment 0

CORE-UPSTREAM-CRAWL-2018-11
Dec 1, 2018 Internet Archive Web Group
data

eye 3

favorite 0

comment 0

"Full" crawl logs (for every hit) from CORE-UPSTREAM-CRAWL-2018-11 crawl. See also 'CORE-UPSTREAM-CRAWL-2018-11-CRL' item for reports etc.
CORE-UPSTREAM-CRAWL-2018-11
Dec 1, 2018 Internet Archive Web Group
data

eye 6

favorite 0

comment 0

Crawl reports and logs for CORE-UPSTREAM-CRAWL-2018-11 crawl. See also 'CORE-UPSTREAM-CRAWL-2018-11-full_crawl_logs' item.
UNPAYWALL-PDF-CRAWL-2018-07
Oct 30, 2018 Internet Archive Web Group
data

eye 20

favorite 0

comment 0

UNPAYWALL-PDF-CRAWL-2018-07
Oct 30, 2018 Internet Archive Web Group
data

eye 21

favorite 0

comment 0

Custom Crawl Services
May 29, 2018 Internet Archive Web Group
data

eye 0

favorite 0

comment 0

This item contains a copy of log files found on the Internet Archive (Web Group) machine `wbgrp-svc263.us.archive.org` on 2018-05-29, under the `/3` directory. These are logs of file transfer status between various crawler machines; they are not known to contain any sensitive metadata (e.g., personal information, IPs, or other security-sensitive information), but are being kept `access-restricted` anyway. This data is almost certainly unimportant and could be deleted; it is being preserved out...
Internet Archive Research Publication Crawls
collection
21,177
ITEMS
121.2M
VIEWS
Dec 19, 2017 Internet Archive Web Group
collection

eye 121.2M

A series of open web crawls targeting journal articles, technical memos, essays, datasets, and other research publications. This collection contains WARC and CDX files that end up in Wayback (https://web.archive.org). See also bibliographic metadata corpuses at https://archive.org/details/ia_biblio_metadata
Wide Web Targeted PDF Crawling (2017)
data

eye 10

favorite 0

comment 0

This item contains checksums and file-level metadata for most (if not all) files collected in this crawl. The tab-separated-value (.tsv) file is similar to a CDX file but contains additional hashes.
Semantic Scholar PDF Seedlist Crawl (Summer 2017)
data

eye 7

favorite 0

comment 0

This item contains checksums and file-level metadata for most (if not all) files collected in this crawl. The tab-separated-value (.tsv) file is similar to a CDX file but contains additional hashes.
MSAG-PDF-CRAWL-2017
Sep 13, 2017
data

eye 10

favorite 0

comment 0

This item contains checksums and file-level metadata for most (if not all) files collected in this crawl. The tab-separated-value (.tsv) file is similar to a CDX file but contains additional hashes.
CiteSeerX URL Crawl 2017
Sep 8, 2017
data

eye 10

favorite 0

comment 0

This item contains checksums and file-level metadata for most (if not all) files collected in this crawl. The tab-separated-value (.tsv) file is similar to a CDX file but contains additional hashes.
CiteSeerX URL Crawl 2017
Jul 21, 2017
data

eye 4

favorite 0

comment 0

Configuration, Reports, and Logs for CITESEERX-CRAWL-2017 crawl.