MAG-PDF-CRAWL-2020-07
Jul 1, 2020 Internet Archive Web Group

OA-JOURNAL-CRAWL-2020-07
Jul 1, 2020 Internet Archive Web Group

OA-JOURNAL-CRAWL-2020-07
Jul 1, 2020 Internet Archive Web Group

MAG-PDF-CRAWL-2020-07
Jul 1, 2020 Internet Archive Web Group

Internet Archive Research Publication Crawls
Apr 6, 2020 Wanfang Data
Metadata and some fulltext PDFs from Wanfang Data, downloaded 2020-03-29 from http://subject.med.wanfangdata.com.cn/Channel/7

Internet Archive Research Publication Crawls
Mar 29, 2020 CNKI
Metadata about COVID-19 papers downloaded from: http://en.gzbd.cnki.net/GZBT/brief/Default.aspx

Internet Archive Research Publication Crawls
Mar 29, 2020 Wanfang Data
Metadata and some fulltext PDFs from Wanfang Data, downloaded 2020-03-29 from http://subject.med.wanfangdata.com.cn/Channel/7

UNPAYWALL-PDF-CRAWL-2019-04
Apr 1, 2019 Internet Archive Web Group

UNPAYWALL-PDF-CRAWL-2019-04
Apr 1, 2019 Internet Archive Web Group

Open Access Journal Test Crawl (2018)
2019 Internet Archive Web Group

UNPAYWALL-PDF-CRAWL-2018-07
Nov 11, 2018 Internet Archive Web Group

UNPAYWALL-PDF-CRAWL-2018-07
Nov 10, 2018 Internet Archive Web Group
See also the crawl logs item for this crawl.

DOI-LANDING-CRAWL-2018-06
Aug 1, 2018 Internet Archive Web Group

DOI-LANDING-CRAWL-2018-06
Jun 1, 2018 Internet Archive Web Group
This item contains output files related to the DOI-LANDING-CRAWL-2018-06 crawl of Crossref DOI redirect landing pages:
- list of Crossref DOI numbers attempted
- an index of DOI, URL, and final HTTP status codes

DOI-LANDING-CRAWL-2018-06
Jun 1, 2018 Internet Archive Web Group

Custom Crawl Services
2012 Internet Archive Web Group
This item contains a copy of log files found on the Internet Archive (Web Group) machine `wbgrp-svc263.us.archive.org` on 2018-05-29, under the `/3` directory. These are logs of file transfer status between various crawler machines; they are not known to contain any sensitive metadata (e.g., personal information, IPs, or other security-sensitive information), but are being kept `access-restricted` anyway. This data is almost certainly unimportant and could be deleted; it is being preserved out...

OA-DOI-CRAWL-2020-12

DIRECT-OA-CRAWL-2019
- Internet Archive Web Group

PUBMEDCENTRAL-CRAWL-2020-02

OAI-PMH-PATCH-CRAWL-2021-12

CORE-UPSTREAM-CRAWL-2018-11
- Internet Archive Web Group
"Full" crawl logs (for every hit) from CORE-UPSTREAM-CRAWL-2018-11 crawl. See also 'CORE-UPSTREAM-CRAWL-2018-11-CRL' item for reports etc.

DATASET-CRAWL-2022-01

DATASET-CRAWL-2022-01

OAI-PMH-CRAWL-2020-06
- Internet Archive Web Group

DOI-CRAWL-2022-02

OMICS-DOI-LANDING-CRAWL-2019-04
- Internet Archive Web Group

SEMSCHOLAR-DIRECT-PDF-CRAWL-2020-02
- Internet Archive Web Group
This item contains checksums and file-level metadata for most (if not all) files collected in this crawl. The tab-separated-value (.tsv) file is similar to a CDX file but contains additional hashes.

CiteSeerX URL Crawl 2017
Configuration, Reports, and Logs for CITESEERX-CRAWL-2017 crawl.

SCIELO-CRAWL-2020-07

OAI-PMH-CRAWL-2020-06
- Internet Archive Web Group

DOI-CRAWL-2022-02

UNPAYWALL-PDF-CRAWL-2018-07
- Internet Archive Web Group

OMICS-DOI-LANDING-CRAWL-2019-04
- Internet Archive Web Group

UNPAYWALL-PDF-CRAWL-2020-11
- Internet Archive Web Group

UNPAYWALL-PDF-CRAWL-2020-05

CORE-UPSTREAM-CRAWL-2018-11
- Internet Archive Web Group
Crawl reports and logs for CORE-UPSTREAM-CRAWL-2018-11 crawl. See also 'CORE-UPSTREAM-CRAWL-2018-11-full_crawl_logs' item.

UNPAYWALL-PDF-CRAWL-2018-07
- Internet Archive Web Group

UNPAYWALL-PDF-CRAWL-2020-03
This item contains checksums and file-level metadata for most (if not all) files collected in this crawl. The tab-separated-value (.tsv) file is similar to a CDX file but contains additional hashes.

UNPAYWALL-PDF-CRAWL-2021-05

SCIELO-CRAWL-2020-07

OA-DOI-CRAWL-2020-02
This item contains checksums and file-level metadata for most (if not all) files collected in this crawl. The tab-separated-value (.tsv) file is similar to a CDX file but contains additional hashes.

DIRECT-OA-CRAWL-2019
- Internet Archive Web Group

UNPAYWALL-PDF-CRAWL-2020-11
- Internet Archive Web Group

arXiv Content Crawl (2019-10)

DOAJ-CRAWL-2020-11

UNPAYWALL-PDF-CRAWL-2018-07
- Internet Archive Web Group

OA-DOI-CRAWL-2020-02

MSAG-PDF-CRAWL-2017
This item contains checksums and file-level metadata for most (if not all) files collected in this crawl. The tab-separated-value (.tsv) file is similar to a CDX file but contains additional hashes.