web: 128.4M views, 5 favorites, 1 comment
Rating: 5 out of 5 stars (1 review)
WARCs from internal crawl testing.
Topics: web, cctld
web: 793,689 views, 0 favorites, 0 comments
Source: top_domains-00300
web: 371,417 views, 0 favorites, 0 comments
web: 368,706 views, 0 favorites, 0 comments
web: 396,068 views, 0 favorites, 0 comments
web: 364,354 views, 0 favorites, 0 comments
web: 403,531 views, 0 favorites, 0 comments
web: 370,625 views, 0 favorites, 0 comments
web: 415,152 views, 0 favorites, 0 comments
web: 381,210 views, 0 favorites, 0 comments
web: 394,194 views, 0 favorites, 0 comments
web: 381,333 views, 0 favorites, 0 comments
This collection depicts the events surrounding the 2011 Earthquake and Tsunami in Japan and the post-disaster reconstruction. Content includes blogs, social commentary, television/online news sites and aid organizations, with content in both English and Japanese.
Topics: earthquake, tsunami, Japan
web: 370,970 views, 0 favorites, 0 comments
web: 348,575 views, 0 favorites, 0 comments
Internet Archive crawldata from feed-driven WordPress Crawl, captured by crawl108.us.archive.org:no404 from Tue Feb 5 00:15:10 PST 2019 to Tue Feb 5 02:11:07 PST 2019.
Topics: no404, wordpress, crawldata
web: 325,106 views, 0 favorites, 0 comments
Feb 6, 2019, by Archive-It
web: 291,515 views, 0 favorites, 0 comments
Aug 4, 2020, by Archive-It
web: 30.6M views, 0 favorites, 0 comments
web: 36.8M views, 1 favorite, 0 comments
web: 299,472 views, 0 favorites, 0 comments
web: 246,298 views, 0 favorites, 0 comments
12.2M
Aug 4, 2017, by Internet Archive Web Group
Microsoft Academic Graph public corpus (Feb 2016) PDF URLs, filtered to remove large sites (pubmed, citeseerx, arxiv) and already-crawled URLs.
Topics: papers, journals
15M
Jul 17, 2018, by Internet Archive Web Group
Web archive data from a crawl of open access PDF URLs provided by Unpaywall.
10.1M
Jul 6, 2020, by Internet Archive Web Group
web: 22.4M views, 2 favorites, 0 comments
web: 21.2M views, 0 favorites, 0 comments
web: 22.3M views, 1 favorite, 0 comments
46.5M
Jun 21, 2011, by Archive-It
This collection documents the events in Northern Africa and the Middle East starting in January 2011. Content includes blogs, social media and news sites about Egypt, Yemen, Libya, Sudan and other countries. Countries are separated into site groups. Archived content is in Arabic, English, and French.
Topics: North Africa, Middle East, blogs, social media
Oct 1, 2020, by Internet Archive
web: 148,183 views, 0 favorites, 0 comments
Internet Archive crawldata from Twitter Outlinks Crawl, captured by crawl502.us.archive.org:twitter_outlinks from Wed Sep 30 14:06:27 PDT 2020 to Wed Sep 30 22:47:56 PDT 2020.
Topics: twitter, crawldata
web: 183,337 views, 0 favorites, 0 comments
Oct 7, 2020, by Internet Archive
web: 144,557 views, 0 favorites, 0 comments
Internet Archive crawldata from Twitter Outlinks Crawl, captured by crawl420.us.archive.org:twitter_outlinks from Tue Oct 6 23:30:26 PDT 2020 to Tue Oct 6 18:41:38 PDT 2020.
Topics: twitter, crawldata
Oct 1, 2020, by Internet Archive
web: 147,897 views, 0 favorites, 0 comments
Internet Archive crawldata from Twitter Outlinks Crawl, captured by crawl424.us.archive.org:twitter_outlinks from Thu Oct 1 06:35:23 PDT 2020 to Thu Oct 1 01:13:50 PDT 2020.
Topics: twitter, crawldata
Oct 1, 2020, by Internet Archive
web: 139,562 views, 0 favorites, 0 comments
Internet Archive crawldata from Twitter Outlinks Crawl, captured by crawl421.us.archive.org:twitter_outlinks from Thu Oct 1 06:42:17 PDT 2020 to Thu Oct 1 00:36:26 PDT 2020.
Topics: twitter, crawldata
Oct 1, 2020, by Internet Archive
web: 148,333 views, 0 favorites, 0 comments
Internet Archive crawldata from Twitter Outlinks Crawl, captured by crawl421.us.archive.org:twitter_outlinks from Thu Oct 1 09:18:03 PDT 2020 to Thu Oct 1 03:20:14 PDT 2020.
Topics: twitter, crawldata
Oct 8, 2020, by Internet Archive
web: 150,216 views, 0 favorites, 0 comments
Internet Archive crawldata from Twitter Outlinks Crawl, captured by crawl423.us.archive.org:twitter_outlinks from Thu Oct 8 07:42:33 PDT 2020 to Thu Oct 8 03:34:40 PDT 2020.
Topics: twitter, crawldata
web: 154,010 views, 0 favorites, 0 comments
Oct 5, 2020, by Internet Archive
web: 148,050 views, 0 favorites, 0 comments
Internet Archive crawldata from Twitter Outlinks Crawl, captured by crawl428.us.archive.org:twitter_outlinks from Sun Oct 4 23:09:21 PDT 2020 to Sun Oct 4 16:56:14 PDT 2020.
Topics: twitter, crawldata
Sep 28, 2020, by Internet Archive
web: 144,165 views, 0 favorites, 0 comments
Internet Archive crawldata from Twitter Outlinks Crawl, captured by crawl428.us.archive.org:twitter_outlinks from Mon Sep 28 19:14:38 PDT 2020 to Mon Sep 28 13:22:35 PDT 2020.
Topics: twitter, crawldata
Oct 1, 2020, by Internet Archive
web: 146,872 views, 0 favorites, 0 comments
Internet Archive crawldata from Twitter Outlinks Crawl, captured by crawl420.us.archive.org:twitter_outlinks from Thu Oct 1 01:05:38 PDT 2020 to Wed Sep 30 19:57:57 PDT 2020.
Topics: twitter, crawldata
Oct 6, 2020, by Internet Archive
web: 154,117 views, 0 favorites, 0 comments
Internet Archive crawldata from Twitter Outlinks Crawl, captured by crawl421.us.archive.org:twitter_outlinks from Tue Oct 6 11:31:35 PDT 2020 to Tue Oct 6 06:17:21 PDT 2020.
Topics: twitter, crawldata
ArchiveBot is an Archive Team service that quickly grabs smaller at-risk or critical sites to bring copies into the Internet Archive Wayback Machine.
web: 189,424 views, 0 favorites, 0 comments
web: 179,653 views, 0 favorites, 0 comments
web: 185,821 views, 0 favorites, 0 comments
Oct 1, 2020, by Archive-It
web: 122,791 views, 0 favorites, 0 comments
Oct 1, 2020, by Archive-It
web: 122,009 views, 0 favorites, 0 comments
web: 202,506 views, 0 favorites, 0 comments