Bulk Bibliographic Metadata
Mar 10, 2022 OurResearch
data

eye 147

favorite 0

comment 1

This is an archive of the "beta" pre-release of the OpenAlex bibliographic metadata corpus. It was downloaded from an AWS S3 "requester pays" bucket, then the individual files were compressed with gzip (pigz command), which reduced on-disk size significantly. Downloads of some files needed to be restarted, which seems to have worked OK, but could potentially have introduced corruption. This initial snapshot is dated in file names as "2021-10-11", and that date is used...
( 1 reviews )
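Because the OpenAlex item above warns that restarted downloads may have introduced corruption, a reader might want to check each gzip file before use. The following is a minimal sketch, not part of the original item: the function name `gzip_is_intact` and the chunk size are illustrative assumptions. It reads each file to the end so that the gzip trailer (CRC32 and length) is verified.

```python
import gzip
import zlib


def gzip_is_intact(path, chunk_size=1 << 20):
    """Return True if the gzip file at `path` decompresses cleanly to the end.

    `gzip_is_intact` is a hypothetical helper name used for illustration.
    """
    try:
        with gzip.open(path, "rb") as fh:
            # Read everything so the trailing CRC32 and size are checked.
            while fh.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # Bad headers or CRC mismatches raise OSError (gzip.BadGzipFile),
        # truncated streams raise EOFError, corrupt deflate data zlib.error.
        return False
    return True
```

This only detects damage to the compressed stream itself; it cannot tell whether the content matched the upstream file before compression.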
Bulk Bibliographic Metadata
Feb 19, 2019 Crossref
data

eye 44

favorite 0

comment 1

Metadata from the Crossref DOI registrar about "titles" (aka, individual Journals), in CSV format. Originally fetched from: https://wwwold.crossref.org/titlelist/titleFile.csv
( 1 reviews )
Bulk Bibliographic Metadata
data

eye 117

favorite 0

comment 0

A mirror of the Unpaywall (aka oaDOI.org) metadata corpus, primarily consisting of public open access flags for a large number of Crossref-registered DOIs (identifiers representing published journal articles and other works). For more information see: http://unpaywall.org/products/snapshot
Bulk Bibliographic Metadata
data

eye 10

favorite 0

comment 0

Bulk Bibliographic Metadata
data

eye 11

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 58

favorite 0

comment 0

This item contains an example corpus of citations between scholarly documents, as extracted from the fatcat (https://fatcat.wiki) corpus as of the 2020-08-05 bulk release export. This corpus itself was generated from a fatcat-scholar "intermediate" fulltext dump which is not public, using software in the fatcat-scholar repository in mid-September 2020. See also the README for some more notes, and the "sample" file.
Bulk Bibliographic Metadata
- Directory of Open Access Journals
data

eye 50

favorite 0

comment 0

Downloaded from https://doaj.org/csv and the OAI-PMH interface. File names encode the date when data was downloaded.
Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 80

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 28

favorite 1

comment 0

This item contains a complete PostgreSQL SQL database snapshot from https://fatcat.wiki, in binary 'pg_dump tar mode' format. With the exception of the 'abstracts' table (for which no aggregate license or copyright claims can be made; downstream users are responsible for their use), all metadata here is licensed CC-0 (public domain release) and may be used for any purpose. Downstream users are strongly encouraged to provide attribution and link here to the snapshot, as well as give credit to...
Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 26

favorite 0

comment 0

Bulk Bibliographic Metadata
data

eye 33

favorite 0

comment 0

Bulk Bibliographic Metadata
- Internet Archive Web Group
data

eye 39

favorite 0

comment 0

This item contains a complete PostgreSQL SQL database snapshot from https://fatcat.wiki, in binary 'pg_dump tar mode' format. With the exception of the 'abstracts' table (for which no aggregate license or copyright claims can be made; downstream users are responsible for their use), all metadata here is licensed CC-0 (public domain release) and may be used for any purpose. Downstream users are strongly encouraged to provide attribution and link here to the snapshot, as well as give credit to...
Bulk Bibliographic Metadata
data

eye 7

favorite 0

comment 0

Bulk Bibliographic Metadata
data

eye 27

favorite 0

comment 0

Bulk Bibliographic Metadata
- Jan Szczepanski
data

eye 21

favorite 0

comment 0

Downloaded from: https://www.ebsco.com/sites/g/files/nabnos191/files/acquiadam-assets/Jan-Szczepanski-Open-Access-Journals-2018_0.docx
Bulk Bibliographic Metadata
data

eye 47

favorite 0

comment 0

A mirror of the Unpaywall (aka oaDOI.org) metadata corpus, primarily consisting of public open access flags for a large number of Crossref-registered DOIs (identifiers representing published journal articles and other works). For more information see: http://unpaywall.org/products/snapshot
Bulk Bibliographic Metadata
data

eye 20

favorite 0

comment 0

This item contains work-level metadata about papers on academia.edu, obtained through their OAI-PMH interface.
Bulk Bibliographic Metadata
data

eye 7

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 29

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 10

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 40

favorite 1

comment 0

Bulk Bibliographic Metadata
- Allen Institute for Artificial Intelligence
data

eye 185

favorite 0

comment 0

Semantic Scholar Open Research Corpus is licensed under ODC-BY. When using the Semantic Scholar Open Research Corpus (“S2 ORC”) in a product or service, or including data in a redistribution, please cite the following paper: Waleed Ammar et al. 2018. Construction of the Literature Graph in Semantic Scholar. NAACL https://www.semanticscholar.org/paper/09e3cf5704bcb16e6657f6ceed70e93373a54618 This site is provided by The Allen Institute for Artificial Intelligence (“AI2”) as a service...
Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 5

favorite 0

comment 0

Bulk Bibliographic Metadata
- Internet Archive Web Group
data

eye 89

favorite 0

comment 0

This is a mapping between DOIs (Crossref), PubMed PMID and PMCID (NIH), CORE record identifiers (core.ac.uk), and Wikidata QIDs. See README and scripts for details.
Bulk Bibliographic Metadata
- JURN
data

eye 13

favorite 0

comment 0

JURN is a scholarly web search engine implemented as a custom Google search index. A subset of resources is included in a directory at: http://www.jurn.org/directory/ This item contains snapshots of the directory in the form of TSV files. At least to start, these contain only title + URL, but we hope to reconcile them with, or look them up by, ISSN.
Bulk Bibliographic Metadata
data

eye 9

favorite 0

comment 0

This item contains a set of "Keeper's Reports" summarizing journal content preservation coverage from major archival services and networks (Portico, LOCKSS, CLOCKSS). See README for links to where these files were downloaded from.
Bulk Bibliographic Metadata
- CiteSeerX Group at PSU
data

eye 193

favorite 0

comment 0

This is a mirror of a CiteSeerX database dump, downloaded from S3. It's hosted here for easy Internet Archive analytics access, and so we don't need to re-pay S3 download fees. See also: http://csxstatic.ist.psu.edu/about/data
Bulk Bibliographic Metadata
data

eye 9

favorite 0

comment 0

Snapshot of Internet Archive (petabox) file-level metadata (eg, PDF hashes) for files under the 'journals' collection as of December 2018. Note: includes a small number of items not actually under the 'journals' collection hierarchy due to how the input item list was generated, and a small fraction (estimate 500?) of items didn't dump successfully. A bit sloppy!
Bulk Bibliographic Metadata
- Internet Archive Web Group
data

eye 15

favorite 0

comment 0

This is a derivative of https://archive.org/download/ia_papers_manifest_2018-01-25, which contains JSON objects that can be inserted into a fatcat catalog.
Bulk Bibliographic Metadata
- Internet Archive Web Group
data

eye 7

favorite 0

comment 0

Bulk Bibliographic Metadata
- Japan Link Center
data

eye 29

favorite 0

comment 0

Downloaded from http://japanlinkcenter.org/top/material/material_metadata.html
Bulk Bibliographic Metadata
data

eye 26

favorite 0

comment 0

This item contains an annual copy of the ORCID public data file, as originally downloaded from: https://orcid.org/content/download-file More details about this content and its use are available at: https://orcid.org/content/orcid-public-data-file This dataset is available under the public domain (CC-0). The DOI of this dataset is: https://doi.org/10.14454/07243.2013.001
Bulk Bibliographic Metadata
- Internet Archive Web Group
data

eye 9

favorite 0

comment 0

About 1 million unique PDFs from Global Wayback before year 2000.
Bulk Bibliographic Metadata
data

eye 15

favorite 0

comment 0

This is the 2020 "baseline" PubMed/MEDLINE bibliographic metadata corpus, originally published in December 2019. Downloaded from https://www.nlm.nih.gov/databases/download/pubmed_medline.html
Bulk Bibliographic Metadata
data

eye 3

favorite 0

comment 0

Bulk Bibliographic Metadata
- Microsoft Academic
data

eye 64

favorite 0

comment 0

This is an updated snapshot of the Microsoft Academic Graph corpus. Microsoft generously makes this corpus available at no cost under the ODC-BY "open data license" ( https://opendatacommons.org/licenses/by/1.0/ ). See the link for details; at a minimum this license requires downstream users to acknowledge the creator. You can read more about the corpus, including how to obtain updated copies on Microsoft Azure, a schema reference, etc, at the following URLs and in the following...
Manifest of Internet Archive's identified scholarly works in digital form (eg, journal articles). See README.html for details.
Bulk Bibliographic Metadata
data

eye 23

favorite 0

comment 0

Bulk Bibliographic Metadata
data

eye 17

favorite 0

comment 0

See README for details. Scraped from: http://ezb.uni-regensburg.de/ezeit/services/collections.phtml?bibid=AAAAA&colors=1&lang=en http://ezb.uni-regensburg.de/ezeit/services/xmloutput.phtml?bibid=AAAAA&colors=1&lang=de#6.2
Mirrored from:  https://github.com/njahn82/vanished_journals/tree/master/data
Bulk Bibliographic Metadata
data

eye 655

favorite 2

comment 0

This file is a snapshot dump of the Crossref DOI metadata API, containing entries for over 94 million DOIs. Compared to the previous 2017-03 version (see archive.org item "crossref_doi_dump_201703"), this snapshot has a few million more works, but the corpus size is much larger (29 GB compressed vs. 7 GB compressed) as it now contains significantly more citation data, due to the efforts of the Initiative for Open Citations (I4OC) project. This was generated by running the scripts...
Bulk Bibliographic Metadata
- Internet Archive Web Group
data

eye 30

favorite 0

comment 0

This dump includes all tables (including oauth authentication tables which could be a privacy, but not security, concern). At this time only IA staff have accounts, so the snapshot, which is intended mostly for disaster recovery, is still public.
Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 20

favorite 0

comment 0

See README.md
Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 86

favorite 0

comment 0

Bulk Bibliographic Metadata
- Microsoft Academic
data

eye 1,014

favorite 2

comment 0

This is an updated snapshot of the Microsoft Academic Graph corpus. Microsoft generously makes this corpus available at no cost under the ODC-BY "open data license" ( https://opendatacommons.org/licenses/by/1.0/ ). See the link for details; at a minimum this license requires downstream users to acknowledge the creator. You can read more about the corpus, including how to obtain updated copies on Microsoft Azure, a schema reference, etc, at the following URLs and in the following...
Bulk Bibliographic Metadata
- ROAD: Directory of Open Access Scholarly Resources
data

eye 145

favorite 0

comment 0

This is a backup of ROAD/ISSN metadata from http://road.issn.org/en/contenu/download-road-records Dumps in both MARC XML and RDF format are included; see sub-directory for date of download. See also earlier July 2017 dump at: https://archive.org/download/road-issn-2017 These files are under the Creative Commons Attribution-NonCommercial 4.0 International Public License (aka, CC-BY-NC).
Topic: metadata
Bulk Bibliographic Metadata
- ISSN
data

eye 429

favorite 1

comment 0

Unlike most ISSN metadata, this mapping file is publicly available.
Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 19

favorite 0

comment 0

Bulk Bibliographic Metadata
data

eye 22

favorite 0

comment 0

Bulk Bibliographic Metadata
data

eye 20

favorite 0

comment 0

Mirrored from: https://core.ac.uk/documentation/dataset Dataset created for Deduplication of Scholarly Documents using Locality Sensitive Hashing and Word Embeddings (LREC 2020) (62 MB compressed, 204 MB in total) License: Open Data Commons Attribution (ODC-By) license.
Downloaded from https://core.ac.uk/services "The data aggregated from repositories by the CORE system can be accessed in two ways, through the CORE API or by downloading the data to your computer. The former option is practical if you want to build a service on top of CORE while the latter is something we recommend to those who would like to analyse the CORE dataset and/or apply some computationally intensive batch processes. If you use CORE in your work, we kindly request you to cite one...
Bulk Bibliographic Metadata
data

eye 344

favorite 1

comment 0

This file is a snapshot dump of the Crossref DOI metadata API, containing entries for over 107 million DOIs. This was generated by running the scripts at: https://github.com/greenelab/crossref (git commit: 768a49ba1d8ba1971f00471950514716a9f699c8) The script started on 2019-09-09 and completed on 2019-10-06. Format is xz-compressed JSON (one JSON object per line).
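Since the Crossref dump above is described as xz-compressed JSON with one object per line, it can be read as a stream without decompressing it to disk first. The sketch below uses only the Python standard library. The helper name `iter_json_lines` and the file name in the usage note are hypothetical, and the `DOI` key is assumed from the Crossref API record format rather than stated in the item description.

```python
import json
import lzma


def iter_json_lines(path):
    """Yield one parsed JSON object per non-empty line of an xz file.

    `iter_json_lines` is an illustrative helper, not part of the dump tooling.
    """
    with lzma.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines rather than failing on them
                yield json.loads(line)
```

A usage example, assuming a local file called `crossref-works.json.xz` (a made-up name): `sum(1 for rec in iter_json_lines("crossref-works.json.xz") if "DOI" in rec)` counts the records that carry a DOI while keeping only one record in memory at a time.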
Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 20

favorite 0

comment 0

See README.md
Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 10

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 22

favorite 1

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 29

favorite 0

comment 0

Bulk Bibliographic Metadata
data

eye 385

favorite 2

comment 0

Snapshot as of 2019-04-15, containing SQL dumps for multiple databases: the complete Library Genesis database, the comic book database, the fiction database, the 'compact' Library Genesis database, and scientific magazines. The SQL dumps were generated by MySQL/MariaDB. *** THIS ITEM DOES NOT CONTAIN ANY BOOKS *** Upstream does not provide checksums, so all checksums should be taken with some doubt. Upstream archived the databases with the RAR archiver; file names have been changed to include the creation date.
Bulk Bibliographic Metadata
data

eye 485

favorite 1

comment 0

This file is a snapshot dump of the Crossref DOI metadata API, containing entries for over 99 million DOIs. This was generated by running the scripts at: https://github.com/greenelab/crossref (git commit: 768a49ba1d8ba1971f00471950514716a9f699c8) The script completed on 2018-09-20. Format is xz-compressed JSON (one JSON object per line).
Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 0

favorite 0

comment 0

Bulk Bibliographic Metadata
data

eye 52

favorite 0

comment 0

A mirror of the Unpaywall (aka oaDOI.org) metadata corpus, primarily consisting of public open access flags for a large number of Crossref-registered DOIs (identifiers representing published journal articles and other works). For more information see: http://unpaywall.org/products/snapshot
Bulk Bibliographic Metadata
- ORCID, Inc.
data

eye 100

favorite 1

comment 0

This item contains an annual copy of the ORCID public data file, as originally downloaded from: https://orcid.figshare.com/articles/ORCID_Public_Data_File_2019/9988322 This dump contains over 7.31M summary entities (ORCIDs). More details about this content and its use are available at: https://orcid.org/content/orcid-public-data-file This dataset is available under the public domain (CC-0).
Bulk Bibliographic Metadata
- ROAD: Directory of Open Access Scholarly Resources
data

eye 90

favorite 0

comment 0

This is a backup of ROAD/ISSN metadata, downloaded July 3rd, 2017 from http://road.issn.org/en/contenu/download-road-records Dumps in both MARC XML and RDF format are included. These files are under the Creative Commons Attribution-NonCommercial 4.0 International Public License (aka, CC-BY-NC).
Topic: metadata
Bulk Bibliographic Metadata
data

eye 55

favorite 0

comment 0

This item contains a set of "Keeper's Reports" summarizing journal content preservation coverage from major archival services and networks (Portico, LOCKSS, CLOCKSS).
Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 46

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 19

favorite 0

comment 0

This item contains bulk metadata exported from https://fatcat.wiki. With the exception of the 'abstracts' file (for which no aggregate license or copyright claims can be made; downstream users are responsible for their use), all metadata here is licensed CC-0 (public domain release) and may be used for any purpose. Downstream users are strongly encouraged to provide attribution and link here to the snapshot, as well as give credit to upstream sources (including Crossref, ORCID, DOAJ, the ISSN...
Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 33

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 17

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 56

favorite 1

comment 0

Downloaded from: https://grid.ac/downloads
Standard paper bibliographic metadata corpora (eg, Crossref, PubMed, arXiv) transformed into simple tab-separated and JSON formats.
Bulk Bibliographic Metadata
data

eye 141

favorite 0

comment 0

This is a mirror of the RDF dump posted at:  http://ma-graph.org/rdf-dumps/ The license provided with this metadata is: Open Data Commons Attribution License (ODC-By) v1.0
Bulk Bibliographic Metadata
- Harshdeep Singh, Robert West, & Giovanni Colavizza
data

eye 12

favorite 0

comment 0

Mirrored from: https://zenodo.org/record/3940692 Harshdeep Singh, Robert West, & Giovanni Colavizza. (2020). Wikipedia Citations: A comprehensive dataset of citations with identifiers extracted from English Wikipedia (Version 0.2) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3940692
Bulk Bibliographic Metadata
- Bruns A, Lenke C, Schmidt C, Taubert NC
data

eye 20

favorite 0

comment 0

ISSN-GOLD-OA provides a matching list of ISSN for Gold Open Access (OA) journals. The intention was to compile a matching table that is as complete as possible by using different publicly available sources. The data set offers a basis for various journal-related issues in bibliometric studies on Gold OA. The list is an updated version of ISSN-GOLD-OA. For a detailed description of the method, data sources used and the definition of the table fields, please refer to the original...
Bulk Bibliographic Metadata
data

eye 24

favorite 0

comment 0

OAI-PMH metadata collected from the arxiv.org endpoint, using the arXivRaw schema. Collected in two batches: up through ~2017, then up through May 22nd, 2019.
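For readers unfamiliar with OAI-PMH harvesting, the requests behind a collection like the one above are plain HTTP GETs with a `verb` and a `metadataPrefix`. The sketch below only builds request URLs and makes no network calls. The endpoint URL and the function name `list_records_url` are assumptions for illustration; the item description names only the arXivRaw schema, not the exact endpoint or tooling used.

```python
from urllib.parse import urlencode

# Assumed endpoint; the item description does not state the URL that was used.
ASSUMED_BASE_URL = "http://export.arxiv.org/oai2"


def list_records_url(base_url=ASSUMED_BASE_URL, metadata_prefix="arXivRaw",
                     from_date=None, until_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is exclusive: it replaces all other
        # arguments except the verb when fetching the next page.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date
        if until_date:
            params["until"] = until_date
    return base_url + "?" + urlencode(params)
```

A harvest in two batches, as the description mentions, would correspond to two calls with different `from_date` and `until_date` values, each followed by repeated requests using the `resumptionToken` returned in every response until none is returned.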
Bulk Bibliographic Metadata
- Directory of Open Access Journals
data

eye 49

favorite 0

comment 0

Downloaded from https://doaj.org/csv and the OAI-PMH interface.
Bulk Bibliographic Metadata
data

eye 10

favorite 0

comment 0

Bulk Bibliographic Metadata
data

eye 36

favorite 0

comment 0

A mirror of the Unpaywall (aka oaDOI.org) metadata corpus, primarily consisting of public open access flags for a large number of Crossref-registered DOIs (identifiers representing published journal articles and other works). For more information see: http://unpaywall.org/products/snapshot
Fatcat Database Snapshots and Bulk Metadata Exports
- Internet Archive Web Group
data

eye 201

favorite 3

comment 0

See: https://guide.fatcat.wiki/reference_graph.html License: CC-0
Bulk Bibliographic Metadata
data

eye 10

favorite 0

comment 0

This item contains snapshots of the PubMed Central OA subset file manifests, linked from https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist
Bulk Bibliographic Metadata
- Internet Archive Web Group
data

eye 19

favorite 1

comment 0

URL lists to PDFs on the web (and preserved in the wayback machine) which are likely to contain research materials.
Bulk Bibliographic Metadata
data

eye 14

favorite 0

comment 0

Downloaded from http://japanlinkcenter.org/top/material/material_metadata.html
Bulk Bibliographic Metadata
data

eye 19

favorite 0

comment 0

This item contains a snapshot of the "Norwegian Register for Scientific Journals, Series and Publishers", as downloaded from https://dbh.nsd.uib.no/publiseringskanaler/AlltidFerskListe. As the name indicates, this is a registry of international Journals (aka "titles", or "serials"); the scope is not limited to Norwegian or Nordic publications.
Bulk Bibliographic Metadata
- Microsoft Academic Search
data

eye 323

favorite 0

comment 0

This is a copy of the Microsoft Academic Graph corpus of scholarly publications and citations, based on crawls from the open web. Metadata (authors, DOI numbers, journals, citations, keywords, affiliations, etc) is included for more than 125 million publications. The corpus is a single 27GB zipfile that extracts into about 96GB of flat tab-separated text files, cross-referenced using identifier columns. Schema information can be found in the `readme.txt` file, and usage restrictions can be...
Bulk Bibliographic Metadata
- Microsoft Academic
data

eye 145

favorite 1

comment 0

This is an updated snapshot of the Microsoft Academic Graph corpus. Microsoft generously makes this corpus available at no cost under the ODC-BY "open data license" ( https://opendatacommons.org/licenses/by/1.0/ ). See the link for details; at a minimum this license requires downstream users to acknowledge the creator. You can read more about the corpus, including how to obtain updated copies on Microsoft Azure, a schema reference, etc, at the following URLs and in the following...