Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 34

favorite 0

comment 0

OMICS-DOI-LANDING-CRAWL-2019-04
collection
4
ITEMS
13,968
VIEWS
by Internet Archive Web Group
collection

eye 13,968

This crawl started in April 2019 as an informal collaboration with Crossref. It covers a smallish number (100k) of DOI redirects and landing pages (plus PDF outlinks, and maybe a couple of other hops) for a single large publisher (OMICS, which has multiple subsidiaries). The intent is to get reasonably good captures that can be used as canonical preservation copies of the landing pages. A secondary goal is to get decent fulltext capture coverage.
Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 49

favorite 0

comment 0

Bulk Bibliographic Metadata
by Internet Archive Web Group
data

eye 4

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 11

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 16

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 55

favorite 1

comment 0

UNPAYWALL-PDF-CRAWL-2019-04
by Internet Archive Web Group
data

eye 2

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 18

favorite 0

comment 0

This item contains bulk metadata exported from https://fatcat.wiki. With the exception of the 'abstracts' file (for which no aggregate license or copyright claims can be made; downstream users are responsible for their use), all metadata here is licensed CC-0 (public domain release) and may be used for any purpose. Downstream users are strongly encouraged to provide attribution and link here to the snapshot, as well as give credit to upstream sources (including Crossref, ORCID, DOAJ, the ISSN...
Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 46

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 200

favorite 3

comment 0

See: https://guide.fatcat.wiki/reference_graph.html (License: CC-0)
UNPAYWALL-PDF-CRAWL-2019-04
collection
641
ITEMS
5.6M
VIEWS
by Internet Archive Web Group
collection

eye 5.6M

arXiv Content Crawl (2019-10)
collection
37
ITEMS
72,947
VIEWS
by Internet Archive Web Group
collection

eye 72,947

UNPAYWALL-PDF-CRAWL-2019-04
by Internet Archive Web Group
data

eye 0

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 10

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 22

favorite 1

comment 0

Web PDF GROBID Corpus (June 2019)
collection
10
ITEMS
50
VIEWS
by Internet Archive Web Group
collection

eye 50

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 19

favorite 0

comment 0

See README.md
Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 16

favorite 0

comment 0

Bulk Bibliographic Metadata
by Internet Archive Web Group
data

eye 10

favorite 1

comment 0

Lists of URLs for PDFs on the web (also preserved in the Wayback Machine) that are likely to contain research materials.
Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 12

favorite 0

comment 0

PubMed Central Crawl (2019-10)
collection
216
ITEMS
431,481
VIEWS
by Internet Archive Web Group
collection

eye 431,481

Community Texts
by Internet Archive Web Group
texts

eye 5

favorite 0

comment 0

UNPAYWALL-PDF-CRAWL-2021-05
collection
123
ITEMS
906,382
VIEWS
by Internet Archive Web Group
collection

eye 906,382

Open Access Journal Test Crawl (2018)
by Internet Archive Web Group
data

eye 8

favorite 0

comment 0

Bulk Bibliographic Metadata
by Internet Archive Web Group
data

eye 28

favorite 0

comment 0

This dump includes all tables (including the OAuth authentication tables, which could be a privacy concern but not a security concern). At this time only IA staff have accounts, so the snapshot, which is intended mostly for disaster recovery, is still public.
Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 21

favorite 0

comment 0

Bulk Bibliographic Metadata
by Internet Archive Web Group
data

eye 23

favorite 0

comment 0

This item contains some bulk research affiliation datasets from Internet Archive cataloging efforts. These are mostly strings included in research papers that indicate the institutional affiliations of specific authors (e.g., a home department, university, or company) at the time of publication. These datasets might be useful for efforts to build complete indices of research organizations, or to test normalization code that maps raw strings to organization identifiers. Attribution and links...
Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 76

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 27

favorite 1

comment 0

This item contains a complete PostgreSQL SQL database snapshot from https://fatcat.wiki, in binary 'pg_dump tar mode' format. With the exception of the 'abstracts' table (for which no aggregate license or copyright claims can be made; downstream users are responsible for their use), all metadata here is licensed CC-0 (public domain release) and may be used for any purpose. Downstream users are strongly encouraged to provide attribution and link here to the snapshot, as well as give credit to...
Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 28

favorite 0

comment 0

Bulk Bibliographic Metadata
by Internet Archive Web Group
data

eye 39

favorite 0

comment 0

This item contains a complete PostgreSQL SQL database snapshot from https://fatcat.wiki, in binary 'pg_dump tar mode' format. With the exception of the 'abstracts' table (for which no aggregate license or copyright claims can be made; downstream users are responsible for their use), all metadata here is licensed CC-0 (public domain release) and may be used for any purpose. Downstream users are strongly encouraged to provide attribution and link here to the snapshot, as well as give credit to...
DIRECT-OA-CRAWL-2019
collection
2,566
ITEMS
5.3M
VIEWS
by Internet Archive Web Group
collection

eye 5.3M