
341
UPLOADS


UNPAYWALL-PDF-CRAWL-2018-07
- Internet Archive Web Group
data

eye 1

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 49

favorite 0

comment 0

Bulk Bibliographic Metadata
- Internet Archive Web Group
data

eye 4

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 4

favorite 0

comment 0

Bulk Bibliographic Metadata
collection
223
ITEMS
17,195
VIEWS
- Internet Archive Web Group
collection

eye 17,195

This collection contains both external ("upstream") metadata dumps and Internet Archive generated databases and reports on our holdings of papers, books, and other documents.
Bulk Bibliographic Metadata
data

eye 27

favorite 0

comment 0

Contains a TSV file with SHA1, file size, wayback URLs, and metadata extracted from PDF by GROBID. Not intended for external use, but might be of interest. DOES NOT CONTAIN FULLTEXT CONTENT.
Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 17

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 26

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 34

favorite 0

comment 0

Bulk Bibliographic Metadata
- Internet Archive Web Group
data

eye 56

favorite 0

comment 0

This item contains sqlite3 database snapshots, URL crawl status, and other metadata useful for doing analytics on journal OA coverage, homepage status, etc., particularly in the context of https://fatcat.wiki. Source code: https://github.com/bnewbold/chocula
Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 11

favorite 0

comment 0

SCIELO-CRAWL-2020-07
collection
41
ITEMS
187,797
VIEWS
- Internet Archive Web Group
collection

eye 187,797

MSAG-PDF-CRAWL-2017
collection
1,855
ITEMS
11.5M
VIEWS
- Internet Archive Web Group
collection

eye 11.5M

Microsoft Academic Graph public corpus (Feb 2016) PDF URLs, filtered to remove large sites (pubmed, citeseerx, arxiv) and already-crawled URLs.
Topics: papers, journals
Open Access Journal Test Crawl (2018)
collection
794
ITEMS
10.6M
VIEWS
- Internet Archive Web Group
collection

eye 10.6M

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 05:01:45 PDT 2017 to Tue Jul 4 22:50:03 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 07:35:29 PDT 2017 to Wed Jul 5 00:48:05 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 08:33:05 PDT 2017 to Wed Jul 5 01:46:32 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 07:44:33 PDT 2017 to Wed Jul 5 00:57:33 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 05:45:37 PDT 2017 to Tue Jul 4 23:00:36 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 06:37:37 PDT 2017 to Tue Jul 4 23:54:39 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 07:07:49 PDT 2017 to Wed Jul 5 00:18:34 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 06:27:07 PDT 2017 to Tue Jul 4 23:40:46 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 08:03:45 PDT 2017 to Wed Jul 5 01:16:31 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 12:17:48 PDT 2017 to Wed Jul 5 05:29:23 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 11:48:17 PDT 2017 to Wed Jul 5 05:01:28 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 12:08:27 PDT 2017 to Wed Jul 5 05:22:22 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 09:55:35 PDT 2017 to Wed Jul 5 03:08:11 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 11:59:15 PDT 2017 to Wed Jul 5 05:11:17 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 10:26:12 PDT 2017 to Wed Jul 5 03:41:41 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 18:09:15 PDT 2017 to Wed Jul 5 11:23:47 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 00:01:18 PDT 2017 to Wed Jul 5 17:59:10 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 01:36:31 PDT 2017 to Wed Jul 5 18:48:11 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 02:20:32 PDT 2017 to Wed Jul 5 19:33:31 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 03:01:58 PDT 2017 to Wed Jul 5 20:14:53 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 09:48:37 PDT 2017 to Thu Jul 6 03:05:10 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 11:58:36 PDT 2017 to Thu Jul 6 05:13:01 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 07:32:04 PDT 2017 to Thu Jul 6 00:46:06 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 08:53:13 PDT 2017 to Thu Jul 6 02:05:47 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 11:08:03 PDT 2017 to Thu Jul 6 04:22:39 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 09:38:29 PDT 2017 to Thu Jul 6 02:51:41 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 06:57:33 PDT 2017 to Thu Jul 6 02:08:23 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 12:19:36 PDT 2017 to Thu Jul 6 05:34:14 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 07:42:06 PDT 2017 to Thu Jul 6 00:54:28 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 05:56:39 PDT 2017 to Wed Jul 5 23:08:54 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Fri Jul 7 03:24:06 PDT 2017 to Thu Jul 6 21:44:17 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 22:56:35 PDT 2017 to Thu Jul 6 17:34:26 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 14:52:50 PDT 2017 to Thu Jul 6 08:15:06 PDT 2017.
Topic: crawldata
MAG-PDF-CRAWL-2020-07
collection
196
ITEMS
1.5M
VIEWS
- Internet Archive Web Group
collection

eye 1.5M

OMICS-DOI-LANDING-CRAWL-2019-04
collection
4
ITEMS
13,812
VIEWS
- Internet Archive Web Group
collection

eye 13,812

This crawl started in April 2019 as an informal collaboration with Crossref. It covers a smallish number (100k) of DOI redirects and landing pages (plus PDF outlinks, and maybe a couple of other hops) for a single large publisher (OMICS, which has multiple subsidiaries). The intent is to get reasonably good captures that can be used as canonical preservation copies of the landing pages. A secondary goal is to get decent fulltext capture coverage.
Web PDF GROBID Corpus (July 2019)
collection
10
ITEMS
17
VIEWS
- Internet Archive Web Group
collection

eye 17

Custom Crawl Services
- Internet Archive Web Group
data

eye 0

favorite 0

comment 0

This item contains a copy of log files found on the Internet Archive (Web Group) machine `wbgrp-svc263.us.archive.org` on 2018-05-29, under the `/3` directory. These are logs of file transfer status between various crawler machines; they are not known to contain any sensitive metadata (e.g., personal information, IPs, or other security-sensitive information), but are being kept `access-restricted` anyway. This data is almost certainly unimportant and could be deleted; it is being preserved out...
UNPAYWALL-PDF-CRAWL-2019-04
- Internet Archive Web Group
data

eye 2

favorite 0

comment 0

DOAJ-CRAWL-2020-11
collection
102
ITEMS
854,580
VIEWS
- Internet Archive Web Group
collection

eye 854,580

DOI-LANDING-CRAWL-2018-06
collection
279
ITEMS
3.2M
VIEWS
- Internet Archive Web Group
collection

eye 3.2M

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 07:54:24 PDT 2017 to Wed Jul 5 01:08:02 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 06:58:20 PDT 2017 to Wed Jul 5 00:11:16 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 09:03:33 PDT 2017 to Wed Jul 5 02:16:39 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 17:47:56 PDT 2017 to Wed Jul 5 11:02:06 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 13:27:22 PDT 2017 to Wed Jul 5 06:40:32 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 16:49:18 PDT 2017 to Wed Jul 5 10:04:13 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 10:48:22 PDT 2017 to Wed Jul 5 04:00:37 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 02:31:51 PDT 2017 to Wed Jul 5 19:45:00 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 11:19:38 PDT 2017 to Thu Jul 6 04:33:05 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Fri Jul 7 06:46:26 PDT 2017 to Fri Jul 14 15:21:22 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 06:56:02 PDT 2017 to Thu Jul 6 00:08:40 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 12:53:52 PDT 2017 to Thu Jul 6 06:07:14 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 08:35:43 PDT 2017 to Thu Jul 6 01:46:47 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 11:31:07 PDT 2017 to Thu Jul 6 08:54:41 PDT 2017.
Topic: crawldata
CORE-UPSTREAM-CRAWL-2018-11
collection
741
ITEMS
1.5M
VIEWS
- Internet Archive Web Group
collection

eye 1.5M

Crawl of "upstream" URLs from the CORE (core.ac.uk) metadata dump. Only a partial seedlist of files was crawled.
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 18:19:45 PDT 2017 to Wed Jul 5 11:33:54 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 18:52:51 PDT 2017 to Wed Jul 5 12:06:48 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 22:05:23 PDT 2017 to Wed Jul 5 15:42:16 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 18:41:03 PDT 2017 to Wed Jul 5 11:56:32 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 02:09:00 PDT 2017 to Wed Jul 5 19:24:01 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 19:41:55 PDT 2017 to Wed Jul 5 12:59:15 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 15:07:36 PDT 2017 to Thu Jul 6 08:28:04 PDT 2017.
Topic: crawldata
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 13:45:57 PDT 2017 to Thu Jul 6 07:00:09 PDT 2017.
Topic: crawldata
Web PDF Training Sets
- Internet Archive Web Group
data

eye 41

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 46

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 18

favorite 0

comment 0

This item contains bulk metadata exported from https://fatcat.wiki. With the exception of the 'abstracts' file (for which no aggregate license or copyright claims can be made; downstream users are responsible for their use), all metadata here is licensed CC-0 (public domain release) and may be used for any purpose. Downstream users are strongly encouraged to provide attribution and link here to the snapshot, as well as give credit to upstream sources (including Crossref, ORCID, DOAJ, the ISSN...
Web PDF Training Sets
- Internet Archive Web Group
data

eye 32

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 16

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 55

favorite 1

comment 0

Bulk Bibliographic Metadata
- Internet Archive Web Group
data

eye 18

favorite 1

comment 0

Lists of URLs for PDFs on the web (and preserved in the Wayback Machine) that are likely to contain research materials.
Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 197

favorite 3

comment 0

See: https://guide.fatcat.wiki/reference_graph.html License: CC-0
Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 32

favorite 0

comment 0

OAI-PMH-CRAWL-2020-06
- Internet Archive Web Group
data

eye 2

favorite 0

comment 0

UNPAYWALL-PDF-CRAWL-2019-04
collection
641
ITEMS
5.2M
VIEWS
- Internet Archive Web Group
collection

eye 5.2M

UNPAYWALL-PDF-CRAWL-2020-03
collection
344
ITEMS
1.7M
VIEWS
- Internet Archive Web Group
collection

eye 1.7M

SEMSCHOLAR-DIRECT-PDF-CRAWL-2020-02
collection
1,011
ITEMS
1.4M
VIEWS
- Internet Archive Web Group
collection

eye 1.4M

arXiv Content Crawl (2019-10)
collection
37
ITEMS
64,150
VIEWS
- Internet Archive Web Group
collection

eye 64,150

UNPAYWALL-PDF-CRAWL-2018-07
- Internet Archive Web Group
data

eye 1

favorite 0

comment 0

See also the crawl logs item for this crawl.
MAG-PDF-CRAWL-2020-07
- Internet Archive Web Group
data

eye 0

favorite 0

comment 0

OA-JOURNAL-CRAWL-2020-07
- Internet Archive Web Group
data

eye 2

favorite 0

comment 0

OMICS-DOI-LANDING-CRAWL-2019-04
- Internet Archive Web Group
data

eye 4

favorite 0

comment 0

SEMSCHOLAR-DIRECT-PDF-CRAWL-2020-02
- Internet Archive Web Group
data

eye 1

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 6

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 10

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 32

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 16:02:30 PDT 2017 to Wed Jul 5 09:16:52 PDT 2017.
Topic: crawldata