The Dataset Collection

The Dataset Collection consists of large data archives from both sites and individuals.




9,027
RESULTS


Title
Date Reviewed
Creator
Academic Torrents
Mar 30, 2022
data

eye 21

favorite 1

comment 1

PDF files from various sources, often with added OCR. They were all published before 1923 in international journals. I'm not providing legal advice, but if you consider them simultaneously published in the USA, they should all be in the public domain in the USA. Yet, publishers apply indiscriminate copyright statements to the contrary, which may constitute copyfraud, and lock nearly all of them behind paywalls or other hurdles, hoping to milk some more profit for who knows how many centuries. You...
(1 review)
Topics: openaccess, openscience, publicdomain, papers
Source: http://academictorrents.com/details/70ecab072b2792c9239ab8197d3f52cc1d075be1
The Dataset Collection
Jul 6, 2021 legacycollector.org
software

eye 9,102

favorite 9

comment 2

To browse the repository, click here. This website is a repository for web content that has been deemed "legacy", has been removed by its original publishers, and might otherwise be difficult or cumbersome to get. Since I started this at the end of 2018, in response to Mozilla removing all legacy extensions from its add-ons site, with plans to expand to include more, similar "legacy" content, a few things have changed needing me to re-evaluate both the need for this site and my...
Rating: 5 stars (2 reviews)
The Dataset Collection
Jan 7, 2021
software

eye 870

favorite 4

comment 1

Apple Developer Discs 1989 2009
Rating: 5 stars (1 review)
The Dataset Collection
Dec 19, 2020 Stack Exchange, Inc.
data

eye 258,069

favorite 12

comment 39

This is an anonymized dump of all user-contributed content on the Stack Exchange network. Each site is formatted as a separate archive consisting of XML files zipped via 7-zip using bzip2 compression. Each site archive includes Posts, Users, Votes, Comments, PostHistory and PostLinks. For complete schema information, see the included readme.txt. All user content contributed to the Stack Exchange network is CC BY-SA 4.0 licensed, intended to be shared and remixed. We even provide all our data...
Rating: 4 stars (39 reviews)
Topic: Stack Exchange Data Dump
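As a rough illustration of working with one of the XML files described above, here is a minimal Python sketch that streams an extracted file and tallies posts by type. It assumes each record is a `<row>` element carrying its fields as attributes and that posts have a `PostTypeId` attribute; both are assumptions, not taken from this listing, so check the included readme.txt for the actual schema. The sample data is a hypothetical stand-in, and extracting the 7-zip archive is left to an external tool.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_post_types(xml_source):
    """Stream <row> elements and tally their PostTypeId attribute.

    xml_source may be a file path or a binary file-like object.
    The <row>/PostTypeId layout is an assumption about the dump schema.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "row":
            counts[elem.get("PostTypeId", "unknown")] += 1
            elem.clear()  # drop attributes of rows already counted
    return counts


# Tiny in-memory stand-in for an extracted Posts.xml (hypothetical content).
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" Score="5" />
  <row Id="2" PostTypeId="2" Score="3" />
  <row Id="3" PostTypeId="2" Score="0" />
</posts>"""

sample_counts = count_post_types(io.BytesIO(SAMPLE))
print(dict(sample_counts))
```

Streaming with `iterparse` avoids loading a whole file into memory, which matters because the larger site archives can be many gigabytes once extracted.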
The Dataset Collection
Nov 16, 2020 Mike Hoye (Compiled by)
data

eye 132

favorite 0

comment 1

To browse the Barbiephonic/Dora Collection, click here. From the original blog post: I have a funny story about the recent Hello Barbie networked-device security failure. This is doubly a repost – it started its current incarnation as a twitter rant, and longtime readers may remember it from the dim recesses of history, but the time has come for me to tell it again. Back in 2007 Mattel had a site where they’d charge parents two bucks to have one of Mattel’s franchise characters give...
Rating: 4 stars (1 review)
The Dataset Collection
Sep 9, 2020
data

eye 218

favorite 1

comment 1

Teletext Compilation Collection 2020 07
Rating: 5 stars (1 review)
The Dataset Collection
Mar 16, 2020 All the Music, LLC
audio

eye 22,460

favorite 37

comment 9

From: https://www.vice.com/en_uk/article/wxepzw/musicians-algorithmically-generate-every-possible-melody-release-them-to-public-domain : Musicians Algorithmically Generate Every Possible Melody, Release Them to Public Domain Damien Riehl and Noah Rubin generated and saved every possible melody to a hard drive, then turned it back around to the commons. From: https://www.dailymail.co.uk/sciencetech/article-8042979/Musician-uses-computer-algorithm-compose-melody-thats-possible-key-C.html :...
Rating: 3 stars (9 reviews)
The Dataset Collection
Dec 20, 2019 Gwern Branwen
data

eye 37,668

favorite 15

comment 1

Dark Net Markets (DNM) are online markets typically hosted as Tor hidden services whose users transact in Bitcoin or other cryptocoins, usually for drugs or other illegal/regulated goods; the most famous DNM was Silk Road 1, which pioneered the business model. From 2013-2015, I scraped/mirrored on a weekly or daily basis all existing English-language DNMs as part of my research into their usage, lifetimes/characteristics, & legal riskiness; in addition, I made or obtained copies of as many...
Rating: 5 stars (1 review)
Topics: Tor, Bitcoin, drugs, Silk Road, Evolution, Agora, black-markets, dark net markets
Bulk Bibliographic Metadata
Feb 19, 2019 Crossref
data

eye 43

favorite 0

comment 1

Metadata from the Crossref DOI registrar about "titles" (aka, individual Journals), in CSV format. Originally fetched from: https://wwwold.crossref.org/titlelist/titleFile.csv
(1 review)
Academic Torrents
movies

eye 2,171

favorite 2

comment 1

Rating: 5 stars (1 review)
Source: http://academictorrents.com/details/47d9877fa4f33d109721c65d066a26c3c5e12e0d
The Dataset Collection
Aug 17, 2018 Various
software

eye 1,394

favorite 7

comment 1

66,000 .SWF files, banner ads put into websites in the 2003-2004 era of the Web. Requires a flash player to view. The files have been saved with simple numbers, so no obvious metadata exists. The ads themselves range across a wide variety of products, services, companies and public service, with the .SWF file being self-encapsulated (not requiring any servers or outside data, although some have active URL clickthroughs to sites likely all dead).  Files have been separated by month released...
Rating: 5 stars (1 review)
MusicBrainz Data Dumps
Mar 30, 2018
data

eye 534

favorite 0

comment 1

MusicBrainz Database Dump 20141004-015144
Rating: 5 stars (1 review)
Dumps of DISCOGS.ORG Metadata (2008-Present)
Nov 5, 2017 DISCOGS.ORG
software

eye 690

favorite 0

comment 1

This is the monthly dump of DISCOGS.ORG data, released into the public domain. This dump was generated and archived automatically. The official name is discogs_20130103 and the data is from 2013-01-03.
(1 review)
The Dataset Collection
Mar 16, 2017
data

eye 728

favorite 3

comment 1

A large collection of Minecraft modifications. The files directory is packaged as files.zip for ease of transfer, but should be unpacked before use.
Rating: 4 stars (1 review)
The Dataset Collection
Sep 30, 2016 SilenceROM
software

eye 940,739

favorite 4

comment 1

SilenceROM LIII Changelog *CCM/Hybrid/Nox Adjustments *Tweaked Super Favourites *Updated source file *Updated applications *Updated SilenceROM Wizard *Tweaked Database +TorrentRelease Repo +Renegades TV Guide :Preconfigured +Dragon Streams +DubStop ####################### SilenceROM LII Changelog *SilenceROM now can be installed via Wizard ! I made the wizard from whufclee's original code. Thanks whufclee! ! Benefit of installing via wizard; preconfigured system settings. ! This is not possible...
Rating: 5 stars (1 review)
Topics: SilenceROM, Community Build, Kodi, Helix, CCM, Hybrid, Speed, Stability, Live TV, Sports, Movies,...
The Dataset Collection
Jul 22, 2015
data

eye 56,517

favorite 12

comment 3

(Here is the original Reddit comment announcing this collection of data and what the processes were.) This is an archive of Reddit comments from October of 2007 until May of 2015 (complete month). This reflects 14 months of work and a lot of API calls. This dataset includes nearly every publicly available Reddit comment. Approximately 350,000 comments out of ~1.65 billion were unavailable due to Reddit API issues. Q: How are the files structured? Each file is compressed with bzip2 compression....
Rating: 5 stars (3 reviews)
The Dataset Collection
May 27, 2015 Internet Archive
data

eye 23,914

favorite 9

comment 1

Culled from various sources, this collection includes over one million JPG, PNG and GIF album covers. The resolution ranges from "thumbnail" through to very large sizes. Filenames vary in usefulness, although a good number indicate at least the name of the original album. This dataset is for experimentation and image processing research only. At 148 GB, the collection is large but not unmanageable (there is a torrent available) and allows a developer or artist to work with the...
Rating: 5 stars (1 review)
Topics: dataset, big data, album covers, covers, cover art, cover photos
OpenStreetMap datasets
data

eye 4

favorite 0

comment 0

This is an XML dump of user-contributed notes on OpenStreetMap, generated on December 03, 2017.
Topics: osm, dumps, notes
OpenStreetMap datasets
data

eye 2

favorite 0

comment 0

This is an XML dump of user-contributed notes on OpenStreetMap, generated on December 20, 2017.
Topics: osm, dumps, notes
The Dataset Collection
data

eye 1

favorite 0

comment 0

Part of an August 2021 download of roughly 40 % of the Flickr images referenced in the YFCC100M dataset.
The Dataset Collection
data

eye 1

favorite 0

comment 0

Part of an August 2021 download of roughly 40 % of the Flickr images referenced in the YFCC100M dataset.
These are the Stata and R data and replication code for "The Media Matters: Muslim American Portrayals and the Effects on Mass Attitudes." CC0 Waiver
Source: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DAPNOX&version=1.0
This is an agreement (“Agreement”) between you the downloader (“Downloader”) and the owner of the materials (“User”) governing the use of the materials (“Materials”) to be downloaded. I. Acceptance of this Agreement By downloading or otherwise accessing the Materials, Downloader represents his/her acceptance of the terms of this Agreement.   II. Modification of this Agreement Users may modify the terms of this Agreement at any time. However, any modifications to this Agreement...
Source: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/IAH6Z6&version=6.1
Compressed iSALE model output from the "Formation of the Orientale lunar multi-ring basin" to be published in Science. CC0 Waiver
Source: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/BH9UXW&version=1.0
The Dataset Collection
data

eye 1

favorite 0

comment 0

Preprocessed edgelist data of ta1-fivedirections-e3-official. CC0 Waiver
Source: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/NVQFY0&version=1.0
Dumps of DISCOGS.ORG Metadata (2008-Present)
software

eye 46

favorite 0

comment 0

This is the monthly dump of DISCOGS.ORG data, released into the public domain. This dump was generated and archived automatically. The official name is discogs_20110107 and the data is from 2011-01-07.
Dumps of DISCOGS.ORG Metadata (2008-Present)
software

eye 22

favorite 0

comment 0

This is the monthly dump of DISCOGS.ORG data, released into the public domain. This dump was generated and archived automatically. The official name is discogs_20110701 and the data is from 2011-07-01.
Minecraft Archive Project: MINECRAFTFORUM.NET
MusicBrainz Database Dump 20141220-014845
MusicBrainz Database Dump 20171118-001439
Bulk Bibliographic Metadata
data

eye 30

favorite 0

comment 0

This item contains an annual copy of the ORCID public data file, as originally downloaded from: https://orcid.org/content/download-file. More details about this content and its use are available at: https://orcid.org/content/orcid-public-data-file. This dataset is available under the public domain (CC-0). The DOI of this dataset is: https://doi.org/10.6084/m9.figshare.1582705
OpenStreetMap datasets
data

eye 10

favorite 0

comment 0

This is an XML dump of user-contributed notes on OpenStreetMap, generated on November 24, 2015.
Topics: osm, dumps, notes
OpenStreetMap datasets
data

eye 11

favorite 0

comment 0

This is an XML dump of user-contributed notes on OpenStreetMap, generated on April 29, 2015.
Topics: osm, dumps, notes
OpenStreetMap datasets
data

eye 5

favorite 0

comment 0

This is an XML dump of user-contributed notes on OpenStreetMap, generated on June 26, 2015.
Topics: osm, dumps, notes
OpenStreetMap datasets
data

eye 3

favorite 0

comment 0

This is an XML dump of user-contributed notes on OpenStreetMap, generated on April 04, 2016.
Topics: osm, dumps, notes
OpenStreetMap datasets
data

eye 8

favorite 0

comment 0

This is an XML dump of user-contributed notes on OpenStreetMap, generated on September 04, 2016.
Topics: osm, dumps, notes
OpenStreetMap datasets
data

eye 14

favorite 0

comment 0

This is an XML dump of user-contributed notes on OpenStreetMap, generated on January 26, 2017.
Topics: osm, dumps, notes
OpenStreetMap datasets
data

eye 5

favorite 0

comment 0

This is an XML dump of user-contributed notes on OpenStreetMap, generated on June 25, 2017.
Topics: osm, dumps, notes
OpenStreetMap datasets
data

eye 4

favorite 0

comment 0

This is an XML dump of user-contributed notes on OpenStreetMap, generated on May 06, 2017.
Topics: osm, dumps, notes
OpenStreetMap datasets
data

eye 6

favorite 0

comment 0

This is an XML dump of user-contributed notes on OpenStreetMap, generated on October 12, 2017.
Topics: osm, dumps, notes
OpenStreetMap datasets
data

eye 4

favorite 0

comment 0

This is an XML dump of user-contributed notes on OpenStreetMap, generated on October 09, 2019.
Topics: osm, dumps, notes
OpenStreetMap datasets
data

eye 3

favorite 0

comment 0

This is an XML dump of user-contributed notes on OpenStreetMap, generated on December 29, 2018.
Topics: osm, dumps, notes
OpenStreetMap datasets
data

eye 1

favorite 0

comment 0

This is an XML dump of user-contributed notes on OpenStreetMap, generated on May 07, 2019.
Topics: osm, dumps, notes
OpenStreetMap datasets
data

eye 4

favorite 0

comment 0

This is an XML dump of user-contributed notes on OpenStreetMap, generated on April 22, 2018.
Topics: osm, dumps, notes
OpenStreetMap datasets
data

eye 5

favorite 0

comment 0

This is an XML dump of user-contributed notes on OpenStreetMap, generated on November 28, 2017.
Topics: osm, dumps, notes
OpenStreetMap datasets
data

eye 1

favorite 0

comment 0

This is an XML dump of user-contributed notes on OpenStreetMap, generated on November 20, 2020.
Topics: osm, dumps, notes
OpenStreetMap datasets
data

eye 1

favorite 0

comment 0

This is an XML dump of user-contributed notes on OpenStreetMap, generated on November 07, 2020.
Topics: osm, dumps, notes
OpenStreetMap datasets
data

eye 1

favorite 0

comment 0

This is an XML dump of user-contributed notes on OpenStreetMap, generated on July 29, 2020.
Topics: osm, dumps, notes
OpenStreetMap datasets
data

eye 1

favorite 0

comment 0

This is an XML dump of user-contributed notes on OpenStreetMap, generated on October 16, 2020.
Topics: osm, dumps, notes
OpenStreetMap datasets
data

eye 1

favorite 0

comment 0

This is an XML dump of user-contributed notes on OpenStreetMap, generated on June 10, 2021.
Topics: osm, dumps, notes
OpenStreetMap datasets
data

eye 1

favorite 0

comment 0

This is an XML dump of user-contributed notes on OpenStreetMap, generated on August 03, 2021.
Topics: osm, dumps, notes
OpenStreetMap datasets
data

eye 2

favorite 0

comment 0

This is an XML dump of user-contributed notes on OpenStreetMap, generated on July 17, 2020.
Topics: osm, dumps, notes
OpenStreetMap datasets
data

eye 2

favorite 0

comment 0

This is an XML dump of user-contributed notes on OpenStreetMap, generated on March 13, 2020.
Topics: osm, dumps, notes
OpenStreetMap datasets
data

eye 1

favorite 0

comment 0

This is an XML dump of user-contributed notes on OpenStreetMap, generated on September 13, 2021.
Topics: osm, dumps, notes
OpenStreetMap datasets
data

eye 1

favorite 0

comment 0

This is an XML dump of user-contributed notes on OpenStreetMap, generated on November 22, 2021.
Topics: osm, dumps, notes
OpenStreetMap datasets
data

eye 1

favorite 0

comment 0

This is an XML dump of user-contributed notes on OpenStreetMap, generated on December 15, 2021.
Topics: osm, dumps, notes
OpenStreetMap datasets
data

eye 1

favorite 0

comment 0

This is an XML dump of user-contributed notes on OpenStreetMap, generated on December 09, 2021.
Topics: osm, dumps, notes
OpenStreetMap datasets
data

eye 1

favorite 0

comment 0

This is an XML dump of user-contributed notes on OpenStreetMap, generated on December 01, 2021.
Topics: osm, dumps, notes
OpenStreetMap datasets
data

eye 1

favorite 0

comment 0

This is an XML dump of user-contributed notes on OpenStreetMap, generated on February 19, 2022.
Topics: osm, dumps, notes
OpenStreetMap datasets
data

eye 1

favorite 0

comment 0

This is an XML dump of user-contributed notes on OpenStreetMap, generated on April 09, 2022.
Topics: osm, dumps, notes
OpenStreetMap datasets
data

eye 3

favorite 0

comment 0

This is an XML dump of user-contributed notes on OpenStreetMap, generated on January 15, 2022.
Topics: osm, dumps, notes
OpenStreetMap datasets
data

eye 3

favorite 0

comment 0

This is a full dump of OpenStreetMap (OSM) data in the Protocolbuffer Binary Format (PBF), generated on February 10, 2020.
Topics: osm, dumps, pbf
OpenStreetMap datasets
data

eye 1

favorite 0

comment 0

This is a full dump of OpenStreetMap (OSM) data in the Protocolbuffer Binary Format (PBF), generated on November 01, 2021.
Topics: osm, dumps, pbf
OpenStreetMap datasets
data

eye 1

favorite 0

comment 0

This is a full dump of OpenStreetMap (OSM) data in the Protocolbuffer Binary Format (PBF), generated on November 22, 2021.
Topics: osm, dumps, pbf
OpenStreetMap datasets
data

eye 8

favorite 0

comment 0

This is a full dump of OpenStreetMap (OSM) data in the Protocolbuffer Binary Format (PBF), generated on September 19, 2016.
Topics: osm, dumps, pbf
OpenStreetMap datasets
data

eye 8

favorite 0

comment 0

This is a full history dump of OpenStreetMap (OSM) data in the Protocolbuffer Binary Format (PBF), generated on July 25, 2016.
Topics: osm, dumps, pbf, history
OpenStreetMap datasets
data

eye 5

favorite 0

comment 0

This is a full dump of OpenStreetMap (OSM) data, generated on August 08, 2016.
Topics: osm, dumps, planet
This is a full history dump of OpenStreetMap (OSM) data in the Protocolbuffer Binary Format (PBF), generated on September 17, 2018.
Topics: osm, dumps, pbf, history
OpenStreetMap datasets
data

eye 20

favorite 0

comment 0

This is a full dump of OpenStreetMap (OSM) data, generated on August 15, 2016.
Topics: osm, dumps, planet
OpenStreetMap datasets
data

eye 5

favorite 0

comment 0

This is a full dump of OpenStreetMap (OSM) data, generated on January 16, 2017.
Topics: osm, dumps, planet
OpenStreetMap datasets
data

eye 1

favorite 0

comment 0

This is a full history dump of OpenStreetMap (OSM) data in the Protocolbuffer Binary Format (PBF), generated on November 15, 2021.
Topics: osm, dumps, pbf, history
OpenStreetMap datasets
data

eye 10

favorite 0

comment 0

This is a full dump of OpenStreetMap (OSM) data, generated on March 07, 2016.
Topics: osm, dumps, planet
OpenStreetMap datasets
data

eye 1

favorite 0

comment 0

This is a full dump of OpenStreetMap (OSM) data, generated on August 23, 2021.
Topics: osm, dumps, planet
OpenStreetMap datasets
data

eye 12

favorite 0

comment 0

This is a full dump of OpenStreetMap (OSM) data, generated on August 14, 2017.
Topics: osm, dumps, planet
OpenStreetMap datasets
data

eye 1

favorite 0

comment 0

This is a full dump of OpenStreetMap (OSM) data, generated on May 24, 2021.
Topics: osm, dumps, planet