DIRECT-OA-CRAWL-2019
collection · 2,566 items · 5.6M views
by Internet Archive Web Group

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data · 18 views
This item contains bulk metadata exported from https://fatcat.wiki. With the exception of the 'abstracts' file (for which no aggregate license or copyright claims can be made; downstream users are responsible for their use), all metadata here is licensed CC-0 (public domain release) and may be used for any purpose. Downstream users are strongly encouraged to provide attribution and link here to the snapshot, as well as give credit to upstream sources (including Crossref, ORCID, DOAJ, the ISSN...

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data · 16 views

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data · 21 views

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data · 34 views

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data · 16 views

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data · 22 views · 1 favorite

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data · 27 views · 1 favorite
This item contains a complete PostgreSQL SQL database snapshot from https://fatcat.wiki, in binary 'pg_dump tar mode' format. With the exception of the 'abstracts' table (for which no aggregate license or copyright claims can be made; downstream users are responsible for their use), all metadata here is licensed CC-0 (public domain release) and may be used for any purpose. Downstream users are strongly encouraged to provide attribution and link here to the snapshot, as well as give credit to...
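Because a pg_dump tar-mode archive is an ordinary tar file, its contents can be inspected with standard tools before the snapshot is restored with `pg_restore`. The sketch below assumes Python's standard `tarfile` module and a locally downloaded copy of the snapshot. It also assumes the usual tar-mode layout: a `toc.dat` table of contents, numbered `.dat` files holding table data, and a `restore.sql` script. The helper names `list_dump_members` and `data_files` are hypothetical and are not part of the snapshot's documentation.

```python
import tarfile


def list_dump_members(path: str) -> list[str]:
    """List the member names of a pg_dump tar-mode archive.

    A tar-mode dump is a plain tar file, so the standard library can read
    its index without PostgreSQL installed. Nothing is extracted to disk.
    """
    # "r:*" lets tarfile detect compression transparently, in case the
    # downloaded copy was gzipped after the fact; pg_dump's own tar output
    # is uncompressed.
    with tarfile.open(path, mode="r:*") as archive:
        return archive.getnames()


def data_files(names: list[str]) -> list[str]:
    """Keep only the per-table data members.

    Assumption: table data lives in numbered '*.dat' members, and
    'toc.dat' is the table of contents rather than table data.
    """
    return [n for n in names if n.endswith(".dat") and n != "toc.dat"]
```

For example, `data_files(list_dump_members("fatcat_snapshot.tar"))` (hypothetical filename) would list only the table-data members. An actual restore into a database would still go through `pg_restore`.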
Bulk Bibliographic Metadata
by Internet Archive Web Group
data · 28 views
This dump includes all tables, including the OAuth authentication tables, which could be a privacy (but not a security) concern. At this time only IA staff have accounts, so the snapshot, which is intended mostly for disaster recovery, remains public.

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data · 12 views

Bulk Bibliographic Metadata
by Internet Archive Web Group
data · 39 views
This item contains a complete PostgreSQL SQL database snapshot from https://fatcat.wiki, in binary 'pg_dump tar mode' format. With the exception of the 'abstracts' table (for which no aggregate license or copyright claims can be made; downstream users are responsible for their use), all metadata here is licensed CC-0 (public domain release) and may be used for any purpose. Downstream users are strongly encouraged to provide attribution and link here to the snapshot, as well as give credit to...

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data · 19 views
See README.md

Bulk Bibliographic Metadata
by Internet Archive Web Group
data · 4 views

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data · 28 views

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data · 11 views

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data · 10 views

Bulk Bibliographic Metadata
by Internet Archive Web Group
data · 23 views
This item contains bulk research-affiliation datasets from Internet Archive cataloging efforts. These are mostly strings, found in research papers, that indicate the institutional affiliations of specific authors (e.g., a home department, university, or company) at the time of publication. They may be useful for efforts to build complete indices of research organizations, or for testing normalization code that maps raw strings to organization identifiers. Attribution and links...

Open Access Journal Test Crawl (2018)
by Internet Archive Web Group
data · 8 views

OMICS-DOI-LANDING-CRAWL-2019-04
collection · 4 items · 14,146 views
by Internet Archive Web Group
This crawl started in April 2019 as an informal collaboration with Crossref. It covers a relatively small number (100k) of DOI redirects and landing pages, plus PDF outlinks and possibly a few other hops, for a single large publisher (OMICS, which has multiple subsidiaries). The intent is to get captures good enough to serve as canonical preservation copies of the landing pages; a secondary goal is decent full-text capture coverage.

PubMed Central Crawl (2019-10)
collection · 216 items · 456,372 views
by Internet Archive Web Group

Community Texts
by Internet Archive Web Group
texts · 5 views

UNPAYWALL-PDF-CRAWL-2019-04
collection · 641 items · 5.9M views
by Internet Archive Web Group

UNPAYWALL-PDF-CRAWL-2019-04
by Internet Archive Web Group
data · 0 views

UNPAYWALL-PDF-CRAWL-2019-04
by Internet Archive Web Group
data · 2 views

Web PDF GROBID Corpus (June 2019)
collection · 10 items · 50 views
by Internet Archive Web Group

arXiv Content Crawl (2019-10)
collection · 37 items · 79,989 views
by Internet Archive Web Group