Bulk Bibliographic Metadata
by Allen Institute for Artificial Intelligence
data

eye 51

favorite 0

comment 0

This is a snapshot of the AI2 (Semantic Scholar) "Open Research Corpus", as downloaded June 26th, 2017. These files were originally downloaded from: http://labs.semanticscholar.org/corpus/ Note the restrictions in the 'license.txt' file. 'index.html' is a backup of the landing page, which includes field content. 'papers-2017-02-21-sample.zip' is a subset of the data useful for exploration. Semantic Scholar is a project of the Allen Institute for Artificial Intelligence.
Bulk Bibliographic Metadata
by Allen Institute for Artificial Intelligence
data

eye 23

favorite 0

comment 0

This is a snapshot of the AI2 (Semantic Scholar) "Open Research Corpus". These files were originally downloaded from: http://labs.semanticscholar.org/corpus/ Note the restrictions in the 'license.txt' file. 'index.html' is a backup of the landing page, which includes field content. 'papers-*-sample.zip' is a subset of the data useful for exploration. Semantic Scholar is a project of the Allen Institute for Artificial Intelligence.
Bulk Bibliographic Metadata
by Allen Institute for Artificial Intelligence
data

eye 40

favorite 0

comment 0

This is a snapshot of the AI2 (Semantic Scholar) "Open Research Corpus", as released May 3rd, 2018. These files were originally downloaded from AWS S3, via: http://labs.semanticscholar.org/corpus/ Note the restrictions in the 'license.txt' file. 'index.html' is a backup of the landing page, which includes field content. 'sample-S2-records.gz' is a subset of the data useful for exploration. Semantic Scholar is a project of the Allen Institute for Artificial Intelligence.
Bulk Bibliographic Metadata
data

eye 20

favorite 0

comment 0

This item contains work-level metadata about papers on academia.edu, obtained through their OAI-PMH interface.
Bulk Bibliographic Metadata
data

eye 6

favorite 0

comment 0

Mirrored from: https://isaw.nyu.edu/publications/awol-index/ Note creator request: The content of The AWOL Index is derived from: Charles E. Jones, AWOL - The Ancient World Online (ISSN 2156-2253), 2009-. That content is re-used and re-mixed here under the terms of AWOL's Creative Commons Attribution Share-Alike 3.0 Unported license. The production and publication of The AWOL Index contributes significant additional value both to the content itself and to its presentation...
Bulk Bibliographic Metadata
by CORE.ac.uk
data

eye 19

favorite 0

comment 0

Mirrored from: https://core.ac.uk/documentation/dataset Dataset created for Deduplication of Scholarly Documents using Locality Sensitive Hashing and Word Embeddings (LREC 2020) (62 MB compressed, 204 MB in total) License: Open Data Commons Attribution (ODC-By) license.
Bulk Bibliographic Metadata
by CORE
data

eye 17

favorite 0

comment 0

This item contains mappings between CORE (https://core.ac.uk/) internal identifiers (simple integer numbers) and DOIs. This listing (a simple two-column TSV file) is derived from their publicly available metadata corpus.
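A minimal sketch of loading such a two-column TSV into a lookup table. The column order (CORE identifier first, DOI second) and the sample rows are assumptions for illustration; check the actual file before relying on them. DOIs are lowercased because DOI matching is case-insensitive.

```python
import csv
import io


def load_core_doi_mapping(tsv_file):
    """Read a two-column TSV stream into a dict of {core_id: doi}.

    Assumes (hypothetically) that column 1 is the integer CORE id
    and column 2 is the DOI.
    """
    mapping = {}
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 2:
            continue  # skip blank or malformed lines
        core_id, doi = row
        if not core_id.isdigit():
            continue  # skip a header row, if one is present
        mapping[int(core_id)] = doi.strip().lower()
    return mapping


# Made-up sample rows standing in for the real file contents.
sample = io.StringIO("12345\t10.1000/xyz123\n67890\t10.5555/ABC\n")
mapping = load_core_doi_mapping(sample)
```

For the real file, an open file handle (for example from `open(path, newline="")`) can be passed in place of the `io.StringIO` sample.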
Bulk Bibliographic Metadata
data

eye 63

favorite 0

comment 0

Downloaded from https://core.ac.uk/services "The data aggregated from repositories by the CORE system can be accessed in two ways, through the CORE API or by downloading the data to your computer. The former option is practical if you want to build a service on top of CORE while the latter is something we recommend to those who would like to analyse the CORE dataset and/or apply some computationally intensive batch processes. If you use CORE in your work, we kindly request you to cite one...
Bulk Bibliographic Metadata
data

eye 93

favorite 0

comment 0

Downloaded from https://core.ac.uk/services "The data aggregated from repositories by the CORE system can be accessed in two ways, through the CORE API or by downloading the data to your computer. The former option is practical if you want to build a service on top of CORE while the latter is something we recommend to those who would like to analyse the CORE dataset and/or apply some computationally intensive batch processes. If you use CORE in your work, we kindly request you to cite one...
Bulk Bibliographic Metadata
data

eye 50

favorite 0

comment 0

Downloaded from https://core.ac.uk/services "The data aggregated from repositories by the CORE system can be accessed in two ways, through the CORE API or by downloading the data to your computer. The former option is practical if you want to build a service on top of CORE while the latter is something we recommend to those who would like to analyse the CORE dataset and/or apply some computationally intensive batch processes. If you use CORE in your work, we kindly request you to cite one...
Bulk Bibliographic Metadata
by CORE.ac.uk
data

eye 76

favorite 0

comment 0

Mirrored from: https://core.ac.uk/documentation/dataset CORE Dataset to Microsoft Academic Graph (MAG) mapping (80MB compressed, 173 MB in total) - 8.9M items License: Open Data Commons Attribution (ODC-By) license.
Bulk Bibliographic Metadata
by Cariniana
data

eye 21

favorite 0

comment 0

Downloaded from, e.g.: https://cariniana.ibict.br/index.php/preservacao-de-publicacoes-digitais/periodicos-eletronicos
Bulk Bibliographic Metadata
by Internet Archive Web Group
data

eye 58

favorite 0

comment 0

This item contains sqlite3 database snapshots, URL crawl status, and other metadata useful for doing analytics on journal OA coverage, homepage status, etc., particularly in the context of https://fatcat.wiki. Source code: https://github.com/bnewbold/chocula
Bulk Bibliographic Metadata
by CiteSeerX Group at PSU
data

eye 187

favorite 0

comment 0

This is a mirror of a CiteSeerX database dump, downloaded from S3. It's hosted here for easy Internet Archive analytics access, and so we don't need to re-pay S3 download fees. See also: http://csxstatic.ist.psu.edu/about/data
Bulk Bibliographic Metadata
by Crossref
data

eye 74

favorite 0

comment 0

'crossref-works.json.xz' is the original file. 'works_crossref.elasticsearch.json.gz' contains a subset of metadata for most (but not all) works, restructured to be loaded directly into an Elasticsearch index. DOI: 10.6084/m9.figshare.4816720.v1 Via: https://figshare.com/articles/Metadata_for_all_DOIs_in_Crossref_JSON_MongoDB_exports_of_all_works_from_the_Crossref_API/4816720
Bulk Bibliographic Metadata
by Crossref
data

eye 647

favorite 2

comment 0

This file is a snapshot dump of the Crossref DOI metadata API, containing entries for over 94 million DOIs. Compared to the previous 2017-03 version (see archive.org item "crossref_doi_dump_201703"), this snapshot has a few million more works, but the corpus size is much larger (29 GB compressed vs. 7 GB compressed) as it now contains significantly more citation data, due to the efforts of the Initiative for Open Citations (I4OC) project. This was generated by running the scripts...
Bulk Bibliographic Metadata
by Crossref
data

eye 484

favorite 1

comment 0

This file is a snapshot dump of the Crossref DOI metadata API, containing entries for over 99 million DOIs. This was generated by running the scripts at: https://github.com/greenelab/crossref (git commit: 768a49ba1d8ba1971f00471950514716a9f699c8) The script completed on 2018-09-20. Format is xz-compressed JSON (one JSON object per line).
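A minimal sketch of streaming a dump in this format (xz-compressed, one JSON object per line) without decompressing it to disk first. The file name and the `DOI` key in the demo records are assumptions for illustration; the demo writes its own tiny file rather than reading the real snapshot.

```python
import json
import lzma
import os
import tempfile


def iter_records(path):
    """Yield one parsed JSON object per non-empty line of an .xz file."""
    with lzma.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def _demo():
    # Build a tiny stand-in file (hypothetical name and contents).
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.json.xz")
        with lzma.open(path, mode="wt", encoding="utf-8") as out:
            out.write(json.dumps({"DOI": "10.1000/a"}) + "\n")
            out.write(json.dumps({"DOI": "10.1000/b"}) + "\n")
        return [record["DOI"] for record in iter_records(path)]


demo_dois = _demo()
```

Because `iter_records` is a generator, memory use stays roughly constant regardless of the dump size, which matters for files in the tens of gigabytes.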
Bulk Bibliographic Metadata
by Crossref
data

eye 342

favorite 1

comment 0

This file is a snapshot dump of the Crossref DOI metadata API, containing entries for over 107 million DOIs. This was generated by running the scripts at: https://github.com/greenelab/crossref (git commit: 768a49ba1d8ba1971f00471950514716a9f699c8) The script started on 2019-09-09 and completed on 2019-10-06. Format is xz-compressed JSON (one JSON object per line).
Bulk Bibliographic Metadata
by Crossref
data

eye 89

favorite 0

comment 0

Mirrored via torrent from academic torrents: https://academictorrents.com/details/0c6c3fbfdc13f0169b561d29354ea8b188eb9d63 https://www.crossref.org/blog/free-public-data-file-of-112-million-crossref-records/
Bulk Bibliographic Metadata
by Crossref
data

eye 2,893

favorite 1

comment 0

Mirrored via torrent from academic torrents: https://academictorrents.com/details/0c6c3fbfdc13f0169b561d29354ea8b188eb9d63
Bulk Bibliographic Metadata
by Crossref
data

eye 43

favorite 0

comment 1

Metadata from the Crossref DOI registrar about "titles" (aka, individual Journals), in CSV format. Originally fetched from: https://wwwold.crossref.org/titlelist/titleFile.csv
( 1 reviews )
Bulk Bibliographic Metadata
data

eye 10

favorite 0

comment 0

Bulk Bibliographic Metadata
data

eye 39

favorite 1

comment 0

Bulk Bibliographic Metadata
data

eye 20

favorite 1

comment 0

Bulk Bibliographic Metadata
by DOAJ
data

eye 8

favorite 0

comment 0

Bulk Bibliographic Metadata
by Directory of Open Access Journals
data

eye 48

favorite 0

comment 0

From: https://doaj.org/public-data-dump
Bulk Bibliographic Metadata
by Directory of Open Access Journals
data

eye 47

favorite 0

comment 0

Downloaded from https://doaj.org/csv and the OAI-PMH interface.
Bulk Bibliographic Metadata
by Directory of Open Access Journals
data

eye 92

favorite 0

comment 0

Downloaded from https://doaj.org/csv and the OAI-PMH interface. File names encode the date when data was downloaded.
Bulk Bibliographic Metadata
by Directory of Open Access Journals
data

eye 48

favorite 0

comment 0

Downloaded from https://doaj.org/csv and the OAI-PMH interface. File names encode the date when data was downloaded.
Bulk Bibliographic Metadata
data

eye 34

favorite 0

comment 0

Downloaded from: https://zenodo.org/record/1438356
Bulk Bibliographic Metadata
by Datacite
data

eye 66

favorite 0

comment 0

This item contains snapshots of the Datacite OAI-PMH metadata feed, as captured with the tool 'metha'.
Bulk Bibliographic Metadata
by EZB
data

eye 16

favorite 0

comment 0

See README for details. Scraped from: http://ezb.uni-regensburg.de/ezeit/services/collections.phtml?bibid=AAAAA&colors=1&lang=en http://ezb.uni-regensburg.de/ezeit/services/xmloutput.phtml?bibid=AAAAA&colors=1&lang=de#6.2
Bulk Bibliographic Metadata
by EuropePMC
data

eye 48

favorite 0

comment 0

Data mirrored from https://europepmc.org/downloads Contains a mapping between PubMed IDs (PMID), PubMedCentral IDs (PMCID), and DOI numbers, for over 29 million works.
Bulk Bibliographic Metadata
by EuropePMC
data

eye 28

favorite 1

comment 0

Data mirrored from https://europepmc.org/downloads Contains a mapping between PubMed IDs (PMID), PubMedCentral IDs (PMCID), and DOI numbers, for over 29 million works.
Mirrored from:  https://www.arc.gov.au/excellence-research-australia/era-2018-journal-list
Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 19

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 28

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 17

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 19

favorite 0

comment 0

This item contains bulk metadata exported from https://fatcat.wiki. With the exception of the 'abstracts' file (for which no aggregate license or copyright claims can be made; downstream users are responsible for their use), all metadata here is licensed CC-0 (public domain release) and may be used for any purpose. Downstream users are strongly encouraged to provide attribution and link here to the snapshot, as well as give credit to upstream sources (including Crossref, ORCID, DOAJ, the ISSN...
Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 16

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 21

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 34

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 14

favorite 0

comment 0

See README.md
Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 16

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 22

favorite 1

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 14

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 9

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 27

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 55

favorite 1

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 78

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 46

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 73

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 37

favorite 1

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 9

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 9

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 11

favorite 0

comment 0

by Internet Archive Web Group
collection

eye 1,464

This collection holds database snapshots (SQL) and bulk metadata exports (JSON and TSV) from https://fatcat.wiki (an Internet Archive service).
Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 27

favorite 1

comment 0

This item contains a complete PostgreSQL SQL database snapshot from https://fatcat.wiki, in binary 'pg_dump tar mode' format. With the exception of the 'abstracts' table (for which no aggregate license or copyright claims can be made; downstream users are responsible for their use), all metadata here is licensed CC-0 (public domain release) and may be used for any purpose. Downstream users are strongly encouraged to provide attribution and link here to the snapshot, as well as give credit to...
Bulk Bibliographic Metadata
by Internet Archive Web Group
data

eye 28

favorite 0

comment 0

This dump includes all tables (including oauth authentication tables which could be a privacy, but not security, concern). At this time only IA staff have accounts, so the snapshot, which is intended mostly for disaster recovery, is still public.
Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 12

favorite 0

comment 0

Bulk Bibliographic Metadata
by Internet Archive Web Group
data

eye 39

favorite 0

comment 0

This item contains a complete PostgreSQL SQL database snapshot from https://fatcat.wiki, in binary 'pg_dump tar mode' format. With the exception of the 'abstracts' table (for which no aggregate license or copyright claims can be made; downstream users are responsible for their use), all metadata here is licensed CC-0 (public domain release) and may be used for any purpose. Downstream users are strongly encouraged to provide attribution and link here to the snapshot, as well as give credit to...
Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 19

favorite 0

comment 0

See README.md
Bulk Bibliographic Metadata
by Internet Archive Web Group
data

eye 4

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 28

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 12

favorite 0

comment 0

See README.md
Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 11

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 10

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 22

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 83

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 5

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 4

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 33

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 17

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 50

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 27

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 4

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 12

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 6

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 17

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 58

favorite 0

comment 0

This item contains an example corpus of citations between scholarly documents, as extracted from the fatcat (https://fatcat.wiki) corpus as of the 2020-08-05 bulk release export. This corpus itself was generated from a fatcat-scholar "intermediate" fulltext dump which is not public, using software in the fatcat-scholar repository in mid-September 2020. See also the README for some more notes, and the "sample" file.
Bulk Bibliographic Metadata
by Allen Institute for Artificial Intelligence
data

eye 15

favorite 0

comment 0

This is a mirror of the Semantic Scholar Graph of References in Context (GORC) dataset. Use of this dataset is under terms of the Semantic Scholar Dataset License: http://web.archive.org/web/20200118202545/http://api.semanticscholar.org/corpus/legal/ See also: https://github.com/allenai/s2-gorc https://arxiv.org/abs/1911.02782
Bulk Bibliographic Metadata
data

eye 12

favorite 0

comment 0

Downloaded from: https://grid.ac/downloads
Bulk Bibliographic Metadata
by Internet Archive Web Group
data

eye 28

favorite 0

comment 0

Contains a TSV file with SHA1, file size, wayback URLs, and metadata extracted from PDF by GROBID. Not intended for external use, but might be of interest. DOES NOT CONTAIN FULLTEXT CONTENT.
Bulk Bibliographic Metadata
by Internet Archive Web Group
data

eye 6

favorite 0

comment 0

Bulk Bibliographic Metadata
by Internet Archive Web Group
data

eye 23

favorite 0

comment 0

This item contains some bulk research affiliation datasets from Internet Archive cataloging efforts. These are mostly strings included in research papers that indicate the institutional affiliations of specific authors (eg, with a home department, university, or company) at the time of publication. These might be useful datasets for efforts to build complete indices of research organizations, or to test normalization code that maps raw strings to organization identifiers. Attribution and links...
Bulk Bibliographic Metadata
by Internet Archive Web Group
data

eye 10

favorite 1

comment 0

URL lists to PDFs on the web (and preserved in the wayback machine) which are likely to contain research materials.
Bulk Bibliographic Metadata
by Internet Archive Web Group
data

eye 18

favorite 1

comment 0

URL lists to PDFs on the web (and preserved in the wayback machine) which are likely to contain research materials.
Bulk Bibliographic Metadata
by ISSN
data

eye 361

favorite 1

comment 0

Unlike most ISSN metadata, this mapping file is publicly available.
Bulk Bibliographic Metadata
by Bruns A, Lenke C, Schmidt C, Taubert NC
data

eye 20

favorite 0

comment 0

ISSN-GOLD-OA provides a matching list of ISSNs for Gold Open Access (OA) journals. The intention was to compile a matching table that is as complete as possible by using different publicly available sources. The data set offers a basis for various journal-related issues in bibliometric studies on Gold OA. The list is an updated version of ISSN-GOLD-OA. For a detailed description of the method, data sources used and the definition of the table fields, please refer to the original...
Bulk Bibliographic Metadata
by Internet Archive Web Group
data

eye 7

favorite 0

comment 0

This item contains hash lists of PDF files crawled from the public web specifically to preserve the scholarly record. It does not contain hashes of *all* PDFs the archive has ever seen, only a subset. Not all of these hashes are necessarily journal articles or other research outputs, but we have reason to believe the large majority are.
Bulk Bibliographic Metadata
by Internet Archive
data

eye 7

favorite 0

comment 0

This item contains KBART files of Internet Archive "serials" (aka, journals, magazines, conference proceedings, other periodicals) preservation holdings. They include both digitized content in archive.org, and web archived content ("fatcat").
Bulk Bibliographic Metadata
data

eye 182

favorite 0

comment 0

Manifest of Internet Archive's identified scholarly works in digital form (eg, journal articles). See README.html for details.
Bulk Bibliographic Metadata
data

eye 114

favorite 0

comment 0

Manifest of Internet Archive's identified scholarly works in digital form (eg, journal articles). See README.html for details.
Bulk Bibliographic Metadata
data

eye 140

favorite 0

comment 0

Manifest of Internet Archive's identified scholarly works in digital form (eg, journal articles). See README.html for details.
Bulk Bibliographic Metadata
data

eye 249

favorite 0

comment 0

Manifest of Internet Archive's identified scholarly works in digital form (eg, journal articles). See README.html for details.
Bulk Bibliographic Metadata
by Internet Archive Web Group
data

eye 8

favorite 0

comment 0

Snapshot of Internet Archive (petabox) file-level metadata (eg, PDF hashes) for files under the 'journals' collection as of December 2018. Note: includes a small number of items not actually under the 'journals' collection hierarchy due to how the input item list was generated, and a small fraction (estimate 500?) of items didn't dump successfully. A bit sloppy!
Bulk Bibliographic Metadata
data

eye 204

favorite 1

comment 0

As downloaded from: https://www.jstor.org/dfr/about/sample-datasets "The Early Journal Content (EJC) on JSTOR includes public domain journal articles published in the United States before 1923 and articles published in other countries before 1870, and includes discourse and scholarship in the arts and humanities, economics and politics, and in mathematics and other sciences. The EJC dataset includes full-text OCR and article-level metadata."
Bulk Bibliographic Metadata
by JURN
data

eye 12

favorite 0

comment 0

JURN is a scholarly web search engine implemented as a custom Google search index. A subset of resources is included in a directory at: http://www.jurn.org/directory/ This item contains snapshots of the directory in the form of TSV files. At least to start, these contain only title + URL, but we hope to reconcile them with, or look them up against, ISSN numbers.
Bulk Bibliographic Metadata
by Japan Link Center
data

eye 45

favorite 0

comment 0

Downloaded from http://japanlinkcenter.org/top/material/material_metadata.html
Bulk Bibliographic Metadata
by Japan Link Center
data

eye 29

favorite 0

comment 0

Downloaded from http://japanlinkcenter.org/top/material/material_metadata.html