451
UPLOADS


Rescue Crawls
Collection: 2 items, 679 views

Rescue crawls conducted by the public for sites that have announced that they are closing.
Ferguson Tweets
Collection: 212 items, 2.2M views

IDs of tweets that mention Ferguson, Missouri, posted between August 10 and August 27, 2014, following the death of Michael Brown. The tweets were collected by Ed Summers, who then extracted the URLs from them; those URLs were crawled by the Internet Archive. For more information, read Summers's article at inkdroid.org, with an update here. Photo: "Memorial to Michael Brown" by Jamelle Bouie
Mercator Crawl
Collection: 1 item, 87 views

Crawl done with the DEC/HP-labs 'Mercator' crawler and converted to ARC format. This data is currently not publicly accessible.
Accelovation Crawl
Collection: 1,324 items, 91.2M views

Web crawl snapshots generously donated by Accelovation. This data is currently not publicly accessible. From the site: Accelovation is pioneering the delivery of Insight Discovery™ software solutions that help companies move from innovation idea to product reality faster and with more success. Our solutions are used by leading firms in the Fortune 500 and beyond – companies from a diverse set of industries ranging from consumer packaged goods to high tech, foods to chemicals, and...
web_is_m
Collection: 1 item, 13,714 views

Crawl performed by Internet Archive. This data is currently not publicly accessible.
web_sm_or
Collection: 16 items, 3.6M views

Crawl performed by Internet Archive. This data is currently not publicly accessible.
web_pop
Collection: 13 items, 3.7M views

Crawl performed by Internet Archive. This data is currently not publicly accessible.
web_sm_prin
Collection: 1 item, 144,488 views

Crawl performed by Internet Archive. This data is currently not publicly accessible.
web_mon
Collection: 3,809 items, 150.7M views

Crawl performed by Internet Archive. This data is currently not publicly accessible.
Alexa Crawl BK
Collection: 1 item, 103,387 views

Crawl BK from Alexa Internet. This data is currently not publicly accessible.
Alexa MP3.com Crawl
Collection: 43 items, 141,994 views

MP3.com Crawl from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl DJ
Collection: 341 items, 86M views

Crawl DJ from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl CRC
Collection: 32 items, 27.9M views

Crawl CRC from Alexa Internet. This data is currently not publicly accessible.
collection

eye 88,526

Demo crawl for the National Oceanic and Atmospheric Administration (NOAA). This data is currently not publicly accessible. From Wikipedia: The National Oceanic and Atmospheric Administration (NOAA) is a scientific agency within the United States Department of Commerce focused on the conditions of the oceans and the atmosphere. NOAA warns of dangerous weather, charts seas and skies, guides the use and protection of ocean and coastal resources, and conducts research to improve understanding and...
September 11th
Collection: 1 item, 890,747 views

Data related to September 11th, 2001, collected by Internet Archive. This data is currently not publicly accessible. From Wikipedia: The September 11 attacks (also referred to as September 11, September 11th, or 9/11) were a series of four coordinated terrorist attacks launched by the Islamic terrorist group al-Qaeda upon the United States in New York City and the Washington, D.C. areas on September 11, 2001.
Yahoo! Video Crawl
Collection: 4,484 items, 54,311 views

Pages captured from Yahoo! Video prior to removal of user uploads. Crawl started February 2011. This data is currently not publicly accessible. From Wikipedia: Yahoo! Video is a video sharing website on which users could upload and share videos. The service is owned and created by Yahoo! Yahoo! Video began as an internet-wide video search engine and added the ability to upload and share video clips in June 2006. A re-designed site was launched in February 2008 that changed the focus to...
National Science Digital Library
Collection: 3 items, 55,761 views

Demo crawl for the National Science Digital Library. This data is currently not publicly accessible. From Wikipedia: The United States' National Science Digital Library (NSDL) is an open-access online digital library and collaborative network of disciplinary and grade-level focused education providers. NSDL's mission is to provide quality digital learning collections to the science, technology, engineering, and mathematics (STEM) education community, both formal and informal, institutional and...
web_osi
Collection: 677 items, 32.2M views

Crawl performed by Internet Archive. This data is currently not publicly accessible.
Alexa 1996 Election Crawl
Collection: 1 item, 46,094 views

1996 Election Crawl from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl DL
Collection: 413 items, 101.4M views

Crawl DL from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl ARC
Collection: 79 items, 25.5M views

Crawl ARC from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl Robot
Collection: 1 item, 104,037 views

Crawl Robot from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl ST
Collection: 1 item, 925,563 views

Crawl ST from Alexa Internet. This data is currently not publicly accessible.
Mayoral Crawls
Collection: 1 item, 284,165 views

Mayoral crawls performed by Internet Archive. This data is currently not publicly accessible.
web_eg
Collection: 32 items, 4.2M views

Crawl performed by Internet Archive. This data is currently not publicly accessible.
web_ind
Collection: 91 items, 8.7M views

Crawl performed by Internet Archive. This data is currently not publicly accessible.
web_ma
Collection: 1,085 items, 76.8M views

Crawl performed by Internet Archive. This data is currently not publicly accessible.
2004 Election
Collection: 178 items, 14.4M views

2004 Election crawl performed by Internet Archive. This data is currently not publicly accessible.
To Crawl
Collection: 1 item, 142,660 views

Data collected by Internet Archive. This data is currently not publicly accessible.
Alexa Crawl DX
Collection: 1,442 items, 179.5M views

Crawl DX from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl Short
Collection: 5 items, 8M views

Crawl Short from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl AUG
Collection: 80 items, 50.6M views

Crawl AUG from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl Test
Collection: 6 items, 15M views

Crawl Test from Alexa Internet. This data is currently not publicly accessible.
collection

eye 21,900

National Library of Ireland Crawls
Collection: 2,623 items, 35M views

Crawls performed by Internet Archive on behalf of the National Library of Ireland. This data is currently not publicly accessible.
web_tran
Collection: 4,192 items, 136.7M views

Crawl performed by Internet Archive. This data is currently not publicly accessible.
2004 Indian Ocean earthquake and tsunami
Collection: 42 items, 7M views

Data related to the 2004 Indian Ocean earthquake and tsunami collected by Internet Archive. This data is currently not publicly accessible.
VOX.com Crawl September 2010
Collection: 28 items, 1.3M views

Crawl of vox.com, September 2010. This was an attempt to preserve vox.com content as much as possible in the wake of service closure, September 30, 2010.
Topic: webwidecrawl
Brookings Institute Crawl
Collection: 1 item, 166,974 views

Crawl data gathered by Internet Archive on behalf of the Brookings Institute. This data is currently not publicly accessible.
Alexa Crawl DH
Collection: 141 items, 44.3M views

Crawl DH from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl GR
Collection: 74 items, 16.7M views

Crawl GR from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl TS
Collection: 1 item, 10,717 views

Crawl TS from Alexa Internet. This data is currently not publicly accessible.
collection

eye 1.1M

Nigerian Election
Collection: 1 item, 39,924 views

Data related to Nigerian elections, 2001, collected by Internet Archive. This data is currently not publicly accessible.
Alexa Crawl DZ
Collection: 1,207 items, 152.7M views

Crawl DZ from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl EH
Collection: 1,218 items, 181.6M views

Crawl EH from Alexa Internet. This data is currently not publicly accessible.
UK Government Site Crawl
Collection: 107 items, 6.3M views

Collaborative closure crawl of British government sites performed by Internet Archive. This data is currently not publicly accessible. From Wikipedia: GOV.UK is a United Kingdom public sector information website, created by the Government Digital Service to provide a single point of access to HM Government services.
Target Product Crawl
Collection: 4 items, 446 views

Target product crawl data collected by Alexa Internet. This data is currently not publicly accessible.
web_wk
Collection: 9,973 items, 321.6M views

Crawl performed by Internet Archive. This data is currently not publicly accessible.
web_el
Collection: 925 items, 67.5M views

Crawl performed by Internet Archive. This data is currently not publicly accessible.
web_leg
Collection: 58 items, 9.3M views

Crawl performed by Internet Archive. This data is currently not publicly accessible.
Bibliotheque Nationale de France Domain Crawls
Collection: 1,653 items, 192M views

Crawls of the French domain space performed by Internet Archive on behalf of the Bibliotheque Nationale de France. This data is currently not publicly accessible.
Edu & Gov Crawl, June 2010
Collection: 704 items, 22.4M views

TEST COLLECTION: Crawl of .edu and .gov sites started in June 2010.
Topic: crawldata
web_oso
Collection: 150 items, 13.1M views

Crawl performed by Internet Archive. This data is currently not publicly accessible.
University of Michigan
Collection: 5 items, 1.8M views

Data collected by Internet Archive on behalf of the University of Michigan. This data is currently not publicly accessible. From Wikipedia: The University of Michigan, frequently referred to as simply Michigan, is a public research university located in Ann Arbor, Michigan, United States. It is the state's oldest university and the flagship campus of the University of Michigan.
FS Fed US
Collection: 3 items, 18,820 views

Data collected in 2005 by Internet Archive. This data is currently not publicly accessible.
Hurricane Katrina
Collection: 112 items, 11M views

Data related to Hurricane Katrina collected in 2005 by Internet Archive. This data is currently not publicly accessible. From Wikipedia: Hurricane Katrina was the deadliest and most destructive Atlantic hurricane of the 2005 Atlantic hurricane season. It was the costliest natural disaster, as well as one of the five deadliest hurricanes, in the history of the United States. Among recorded Atlantic hurricanes, it was the sixth strongest overall. At least 1,833 people died in the hurricane and...
collection

eye 239,190

Alexa 2002 Election Crawl
Collection: 24 items, 20.2M views

2002 Election Crawl from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl TO
Collection: 1 item, 2.2M views

Crawl TO from Alexa Internet. This data is currently not publicly accessible.
NDIIPP Reality
Collection: 1 item, 6,010 views

Immersive gaming environments R&D project for National Digital Internet Infrastructure Preservation Program. This data is currently not publicly accessible. From Wikipedia: The National Digital Information Infrastructure and Preservation Program (NDIIPP) is an archival program led by the Library of Congress to archive and provide access to digital resources. The U.S. Congress established the program in 2000. The Library was chosen because of its role as one of the leading providers of...
Open Sky
Collection: 1 item, 3,124 views

Demo crawl of scientific data. This data is currently not publicly accessible.
Alexa Traffic
Collection: 89 items, 1,041 views

Sanitized traffic files from Alexa Internet: just base URLs (no parameters) and time/date. This data is currently not publicly accessible. Covers the period from December 2001 to February 2009.
Product DB
Collection: 1 item, 8,997 views

Product DB data collected by Alexa Internet. This data is currently not publicly accessible.
Swiss National Library
Collection: 12 items, 430,729 views

Data collected by Internet Archive on behalf of the Swiss National Library. This data is currently not publicly accessible.
Standards
Collection: 1 item, 965 views

Standards crawl data collected by Internet Archive. This data is currently not publicly accessible.
web_con
Collection: 1,507 items, 74.1M views

Crawl performed by Internet Archive. This data is currently not publicly accessible.
web_dar
Collection: 112 items, 9.1M views

Crawl performed by Internet Archive. This data is currently not publicly accessible.
Alexa 2000 Election Crawl
Collection: 4 items, 349,263 views

2000 Election Crawl from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl F2
Collection: 1 item, 285 views

Crawl F2 from Alexa Internet. This data is currently not publicly accessible.
collection

eye 259,840

NDIIPP Youtube Crawl
Collection: 90 items, 3.2M views

Youtube crawl performed by Internet Archive on behalf of the National Digital Internet Infrastructure Preservation Program. This data is currently not publicly accessible.
Alexa Crawl RECY
Collection: 1 item, 210,708 views

Crawl RECY from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl Image
Collection: 92 items, 57.8M views

Crawl Image from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl EI
Collection: 1,408 items, 218.1M views

Crawl EI from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl Title
Collection: 1 item, 471,024 views

Crawl Title from Alexa Internet. This data is currently not publicly accessible.
web_is
Collection: 5 items, 2.4M views

Crawl performed by Internet Archive. This data is currently not publicly accessible.
web_sm_sing
Collection: 3 items, 1.5M views

Crawl performed by Internet Archive. This data is currently not publicly accessible.
web_sup
Collection: 88 items, 9.3M views

Crawl performed by Internet Archive. This data is currently not publicly accessible.
web_iq
Collection: 2,637 items, 268.2M views

Crawl performed by Internet Archive. This data is currently not publicly accessible.
Inktomi 2001
Collection: 1 item, 118,292 views

Data collected in 2001. This data is currently not publicly accessible. From Wikipedia: Inktomi Corporation was a California company that provided software for Internet service providers. It was founded in 1996 by UC Berkeley professor Eric Brewer and graduate student Paul Gauthier. The company was initially founded based on the real-world success of the web search engine they developed at the university. After the bursting of the dot-com bubble, Inktomi was acquired by Yahoo!
National Library of Sweden
Collection: 310 items, 33M views

Data collected by Internet Archive on behalf of the National Library of Sweden. This data is currently not publicly accessible.
NL TV
Collection: 1 item, 89,495 views

Data collected in 2005. This data is currently not publicly accessible.
Wikipedia Dumps
by Wikipedia
web

eye 99

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web

eye 104

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web

eye 100

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web

eye 100

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web

eye 102

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web

eye 99

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web

eye 108

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web

eye 96

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web

eye 83

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web

eye 102

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web

eye 97

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web

eye 92

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web

eye 78

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web

eye 83

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web

eye 109

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web

eye 96

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web

eye 109

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010