Custom Crawl Services

Internet Archive

Large-scale web harvests and national domain crawls performed for National Libraries, National Archives, preservation partners, research initiatives, and as part of special projects and custom crawling and research services.



167,187
RESULTS


NLA_2010
NLA_2010
collection
180
ITEMS
14M
VIEWS
collection

eye 14M

This crawl was a domain-scale harvest of .au performed for the National Library of Australia in 2010.
Topics: nla, web, 2010
NARA 111th Congressional Crawl
NARA 111th Congressional Crawl
collection
216
ITEMS
15.1M
VIEWS
collection

eye 15.1M

This crawl of online resources of the 111th Congress of the United States was performed in Fall of 2010 and Winter of 2011 on behalf of NARA.
Topics: nara, 111th, congress, web
NLS_elec2011
NLS_elec2011
collection
280
ITEMS
15.1M
VIEWS
collection

eye 15.1M

This crawl was performed on behalf of the National Library of Spain (BNE) in Fall of 2011 to archive the National elections in Spain.
Topics: elections, web, 2011, spain, bne
Fed Site Closure Crawls
Fed Site Closure Crawls
collection
1,858
ITEMS
15.3M
VIEWS
collection

eye 15.3M

These are crawls performed on US Federal Government Web sites prior to their removal or merger with other resources.
Topics: federal, web, closures
Fed Site Closures 2011
Fed Site Closures 2011
collection
1,855
ITEMS
15.3M
VIEWS
collection

eye 15.3M

This crawl was performed in Fall of 2011 to archive Federal government web sites that were either slated for removal or for merger with other online resources.
Topics: federal, web, 2011
NLIL_2014
NLIL_2014
collection
971
ITEMS
14M
VIEWS
by dominic@archive.org
collection

eye 14M

This crawl of the .il domain was performed in 2014 on behalf of the National Library of Israel (NLIL).
Topics: nlil, israel, web, 2014
NLNZ_2020
NLNZ_2020
collection
2,107
ITEMS
6M
VIEWS
collection

eye 6M

WEWA domain crawls
WEWA domain crawls
collection
6,436
ITEMS
9.2M
VIEWS
collection

eye 9.2M

WARCs from Whole Earth Web Archive (WEWA) Domain Crawls
Topic: web
BNL 2022 Spring Domain Crawl
BNL 2022 Spring Domain Crawl
collection
442
ITEMS
1.7M
VIEWS
collection

eye 1.7M

016-2022-Spring domain crawl of the Luxembourg web domain (.lu) performed by Internet Archive March - May 2022 on behalf of the National Library of Luxembourg / Bibliothèque nationale de Luxembourg.
Topic: crawldata
Indonesia 2017 Domain Crawl
Indonesia 2017 Domain Crawl
collection
667
ITEMS
9.5M
VIEWS
collection

eye 9.5M

Crawls performed by the Internet Archive of the .id (Indonesia) web domain. This data is not currently publicly accessible.
Topics: web, 2017
BNL 2017 Winter Domain Crawl
BNL 2017 Winter Domain Crawl
collection
1,181
ITEMS
8.9M
VIEWS
collection

eye 8.9M

Domain crawl of the Luxembourg web domain (.lu) performed by Internet Archive on behalf of the National Library of Luxembourg / Bibliothèque nationale de Luxembourg in December 2017 and January 2018.
Topics: web, 2017, 2018, luxembourg, BNL
KB Curated List Crawl 2019
KB Curated List Crawl 2019
collection
1,287
ITEMS
7M
VIEWS
collection

eye 7M

KB Curated List Crawl 2019. This data is not currently publicly accessible.
Topic: web
Wide Web Targeted PDF Crawling (2017)
Wide Web Targeted PDF Crawling (2017)
collection
922
ITEMS
3M
VIEWS
by Internet Archive Web Group
collection

eye 3M

MAG-PDF-CRAWL-2020-07
MAG-PDF-CRAWL-2020-07
collection
196
ITEMS
1.6M
VIEWS
by Internet Archive Web Group
collection

eye 1.6M

DOI-LANDING-CRAWL-2018-06
DOI-LANDING-CRAWL-2018-06
collection
279
ITEMS
3.3M
VIEWS
by Internet Archive Web Group
collection

eye 3.3M

OA-DOI-CRAWL-2020-12
OA-DOI-CRAWL-2020-12
collection
191
ITEMS
1.4M
VIEWS
by Internet Archive Web Group
collection

eye 1.4M

UNPAYWALL-PDF-CRAWL-2020-11
UNPAYWALL-PDF-CRAWL-2020-11
collection
199
ITEMS
1.6M
VIEWS
by Internet Archive Web Group
collection

eye 1.6M

nlnz_2010
collection
167
ITEMS
10.5M
VIEWS
collection

eye 10.5M

This data is currently not publicly accessible.
National Library of Ireland Domain Crawl 2007
National Library of Ireland Domain Crawl 2007
collection
62
ITEMS
11.6M
VIEWS
collection

eye 11.6M

Crawl of the Ireland web domain, .ie, performed for the National Library of Ireland in 2007. This data is currently not publicly accessible.
UNPAYWALL-PDF-CRAWL-2020-05
UNPAYWALL-PDF-CRAWL-2020-05
collection
282
ITEMS
1.7M
VIEWS
by Internet Archive Web Group
collection

eye 1.7M

MAG-PDF-CRAWL-2021-08
MAG-PDF-CRAWL-2021-08
collection
189
ITEMS
704,655
VIEWS
collection

eye 704,655

NLS_humanidades
NLS_humanidades
collection
296
ITEMS
11.7M
VIEWS
collection

eye 11.7M

This crawl was performed in 2011 and 2012 on behalf of the National Library of Spain (BNE) to archive digital humanities web sites and online resources in Spain.
Topics: bne, spain, web, humanities, humanidades, 2011, 2012
UNPAYWALL-PDF-CRAWL-2021-07
UNPAYWALL-PDF-CRAWL-2021-07
collection
174
ITEMS
950,686
VIEWS
collection

eye 950,686

nlnzweb2015
nlnzweb2015
collection
1,071
ITEMS
13.1M
VIEWS
collection

eye 13.1M

This collection includes content harvested from the Web on behalf of the National Library & Archives New Zealand in January 2015.
Topics: new zealand, web, domain
BNL 2018 Summer Domain Crawl
BNL 2018 Summer Domain Crawl
collection
739
ITEMS
7.3M
VIEWS
collection

eye 7.3M

Domain crawl of the Luxembourg web domain (.lu) performed by Internet Archive on behalf of the National Library of Luxembourg / Bibliothèque nationale de Luxembourg in July 2018.
Topic: web
National Library of Luxembourg Crawl Fall 2016
National Library of Luxembourg Crawl Fall 2016
collection
817
ITEMS
10.7M
VIEWS
collection

eye 10.7M

Domain crawl of the Luxembourg web domain (.lu) performed by Internet Archive on behalf of the National Library of Luxembourg / Bibliothèque nationale de Luxembourg in September and October of 2016.
Topic: Luxembourg
BNL 2017 Summer Domain Crawl
BNL 2017 Summer Domain Crawl
collection
944
ITEMS
7.9M
VIEWS
collection

eye 7.9M

Domain crawl of the Luxembourg web domain (.lu) performed by Internet Archive on behalf of the National Library of Luxembourg / Bibliothèque nationale de Luxembourg in June and July of 2017.
Topics: BNL, web, 2017
NARA 116th Congressional Crawl
NARA 116th Congressional Crawl
collection
2,383
ITEMS
3.4M
VIEWS
collection

eye 3.4M

This crawl of online resources of the 116th US Congress was performed on behalf of The United States National Archives & Records
Topic: crawldata
BNL 2021 Winter Domain Crawl
BNL 2021 Winter Domain Crawl
collection
1,449
ITEMS
1.6M
VIEWS
collection

eye 1.6M

015-2021-winter domain crawl of the Luxembourg web domain (.lu) performed by Internet Archive December 2021 - January 2022 on behalf of the National Library of Luxembourg / Bibliothèque nationale de Luxembourg.
Topic: crawldata
Olympics Crawl 2014
Olympics Crawl 2014
collection
1,339
ITEMS
11.9M
VIEWS
collection

eye 11.9M

These crawls were performed by IA on behalf of the IIPC in Winter 2014 during and prior to the 2014 Winter Olympics and Paralympic Games held in Sochi, Russia.
Topics: olympics 2014, web, sport, olympic games
OA-JOURNAL-CRAWL-2019-08
OA-JOURNAL-CRAWL-2019-08
collection
201
ITEMS
2.7M
VIEWS
by Internet Archive Web Group
collection

eye 2.7M

BNL 2019 Winter Domain Crawl
BNL 2019 Winter Domain Crawl
collection
1,189
ITEMS
4.9M
VIEWS
collection

eye 4.9M

Domain crawl of the Luxembourg web domain (.lu) performed by Internet Archive on behalf of the National Library of Luxembourg / Bibliothèque nationale de Luxembourg in January 2019.
Topic: web
PLATFORM-CRAWL-2020
PLATFORM-CRAWL-2020
collection
649
ITEMS
412,460
VIEWS
by Internet Archive Web Group
collection

eye 412,460

TARGETED-ARTICLE-CRAWL-2022-04
TARGETED-ARTICLE-CRAWL-2022-04
collection
219
ITEMS
233,180
VIEWS
collection

eye 233,180

collection

eye 6.6M

Domain crawl of the Luxembourg web domain (.lu) performed by Internet Archive on behalf of the National Library of Luxembourg / Bibliothèque nationale de Luxembourg from December 2016 to January 2017.
Topic: Luxembourg
BNL 2021 Summer Domain Crawl
BNL 2021 Summer Domain Crawl
collection
1,514
ITEMS
1.9M
VIEWS
collection

eye 1.9M

013-2021-summer domain crawl of the Luxembourg web domain (.lu) performed by Internet Archive July-August 2021 on behalf of the National Library of Luxembourg / Bibliothèque nationale de Luxembourg.
Topic: crawldata
SEMSCHOLAR-DIRECT-PDF-CRAWL-2020-02
SEMSCHOLAR-DIRECT-PDF-CRAWL-2020-02
collection
1,011
ITEMS
1.4M
VIEWS
by Internet Archive Web Group
collection

eye 1.4M

UNPAYWALL-PDF-CRAWL-2021-05
UNPAYWALL-PDF-CRAWL-2021-05
collection
123
ITEMS
869,386
VIEWS
by Internet Archive Web Group
collection

eye 869,386

NLAgov_2010
NLAgov_2010
collection
630
ITEMS
8M
VIEWS
collection

eye 8M

This crawl was performed on the .gov.au domain in 2010 on behalf of the National Library of Australia.
Topics: nla, gov.web, 2010
BNL 2021 Autumn Domain Crawl
BNL 2021 Autumn Domain Crawl
collection
1,251
ITEMS
1.5M
VIEWS
collection

eye 1.5M

014-2021-autumn domain crawl of the Luxembourg web domain (.lu) performed by Internet Archive October-November 2021 on behalf of the National Library of Luxembourg / Bibliothèque nationale de Luxembourg.
Topic: crawldata
collection

eye 1.9M

IA crawl of PDF URLs provided by Semantic Scholar.
Topic: pdf
UNT Web
UNT Web
collection
35
ITEMS
6.8M
VIEWS
collection

eye 6.8M

This collection contains all collaborative crawl data contributed by University of North Texas (UNT).
Topics: UNT, web, texas, eot
Olympics Crawl 2010
Olympics Crawl 2010
collection
21
ITEMS
5M
VIEWS
collection

eye 5M

These crawls were performed by IA on behalf of the IIPC in Winter 2010 during and prior to the 2010 Winter Olympics held in Vancouver, BC, Canada.
Topics: winter, olympics, 2010, IIPC, web
CiteSeerX URL Crawl 2017
CiteSeerX URL Crawl 2017
collection
207
ITEMS
1.1M
VIEWS
collection

eye 1.1M

A targeted crawl to fetch research publications from the public web which have been crawled by CiteSeerX but have not previously been crawled by the Internet Archive.
Topics: scholarly, papers, journal
DOAJ-CRAWL-2020-11
DOAJ-CRAWL-2020-11
collection
102
ITEMS
876,890
VIEWS
by Internet Archive Web Group
collection

eye 876,890

by Internet Archive
collection

eye 5.5M

This collection includes all resources harvested from the online presence of the Legislative branch of the US Federal government as part of the NARA 112th Congressional Web Harvest Test Crawl. The crawl was performed from October 16th through November 5th 2012.
Topics: NARA, 112th, Congress
BNL 2021 Spring Domain Crawl
BNL 2021 Spring Domain Crawl
collection
1,203
ITEMS
1.6M
VIEWS
collection

eye 1.6M

Domain crawl of the Luxembourg web domain (.lu) performed by Internet Archive on behalf of the National Library of Luxembourg / Bibliothèque nationale de Luxembourg in April-May 2021.
Topic: crawldata
IMLS Museum Universe 00000
IMLS Museum Universe 00000
collection
610
ITEMS
8.3M
VIEWS
collection

eye 8.3M

Crawl 00000 of the IMLS Museum Universe Data File.
Topic: museum imls universe
OAI-PMH-PATCH-CRAWL-2021-12
OAI-PMH-PATCH-CRAWL-2021-12
collection
75
ITEMS
306,005
VIEWS
collection

eye 306,005

Election Crawl 2012
web

eye 303,596

Internet Archive crawldata uploaded by crawling119.us.archive.org:COL-ELECTION2012 from Wed Jun 27 09:45:33 PDT 2012 to Sat Dec 15 18:04:51 PST 2012.
Topic: crawldata
collection

eye 4.2M

Data collected by Internet Archive on behalf of the Fundacao para a Computacao Cientifica Nacional of Portugal. This data is currently not publicly accessible.
nlnz_2008
collection
97
ITEMS
4.9M
VIEWS
collection

eye 4.9M

This data is currently not publicly accessible.
BNL 2019 Summer Domain Crawl
BNL 2019 Summer Domain Crawl
collection
751
ITEMS
2.5M
VIEWS
collection

eye 2.5M

Domain crawl of the Luxembourg web domain (.lu) performed by Internet Archive on behalf of the National Library of Luxembourg / Bibliothèque nationale de Luxembourg in August 2019.
Topic: web
DOI-CRAWL-2022-02
DOI-CRAWL-2022-02
collection
25
ITEMS
184,278
VIEWS
collection

eye 184,278

BNL 2020-21 Winter Domain Crawl
BNL 2020-21 Winter Domain Crawl
collection
676
ITEMS
1.8M
VIEWS
by Internet Archive
collection

eye 1.8M

Domain crawl of the Luxembourg web domain (.lu) performed by Internet Archive on behalf of the National Library of Luxembourg / Bibliothèque nationale de Luxembourg in December 2020.
Topic: web
BNL 2020 Summer Domain Crawl
BNL 2020 Summer Domain Crawl
collection
866
ITEMS
1.8M
VIEWS
by Internet Archive
collection

eye 1.8M

Domain crawl of the Luxembourg web domain (.lu) performed by Internet Archive on behalf of the National Library of Luxembourg / Bibliothèque nationale de Luxembourg in August 2020.
Topic: web
BNL 2022 Spring Domain Crawl
web

eye 10,900

Internet Archive crawldata from the National Library of Luxembourg, captured by wbgrp-crawl041.us.archive.org:LUX-017-2022-06-27 from Tue 28 Jun 2022 04:36:24 PM PDT to Tue 28 Jun 2022 11:30:17 AM PDT.
Topic: crawldata
JOURNAL-HOMEPAGE-CRAWL-2022-03
JOURNAL-HOMEPAGE-CRAWL-2022-03
collection
44
ITEMS
233,538
VIEWS
collection

eye 233,538

collection

eye 3.4M

Crawl of the Ireland web domain, .ie, performed for the National Library of Ireland in 2007. This data is currently not publicly accessible.
Nara 110th Congressional Crawl
Nara 110th Congressional Crawl
collection
107
ITEMS
3.1M
VIEWS
collection

eye 3.1M

The end of term harvest of the 110th Congress of the United States was performed on behalf of NARA in Fall of 2008 and early winter of 2009.
Topics: nara, 110th, congress, web
NDIIPP Youtube Crawl
NDIIPP Youtube Crawl
collection
90
ITEMS
3.2M
VIEWS
collection

eye 3.2M

YouTube crawl performed by Internet Archive on behalf of the National Digital Information Infrastructure and Preservation Program (NDIIPP). This data is currently not publicly accessible.
Election Crawl 2012
web

eye 82,901

Internet Archive crawldata uploaded by crawling119.us.archive.org:COL-ELECTION2012 from Tue Jun 26 17:12:45 PDT 2012 to Sat Dec 15 12:04:57 PST 2012.
Topic: crawldata
PubMed Central Crawl (2019-10)
PubMed Central Crawl (2019-10)
collection
216
ITEMS
412,975
VIEWS
by Internet Archive Web Group
collection

eye 412,975

Olympics Crawl 2012
web

eye 759,056

Internet Archive crawldata uploaded by selenium-101.us.archive.org:COL-OLYMPICS2012 from Fri May 4 19:37:06 PDT 2012 to Sat Jan 5 15:04:01 PST 2013.
Topic: crawldata
BNL 2020 Spring Domain Crawl
BNL 2020 Spring Domain Crawl
collection
630
ITEMS
1.6M
VIEWS
collection

eye 1.6M

Domain crawl of the Luxembourg web domain (.lu) performed by Internet Archive on behalf of the National Library of Luxembourg / Bibliothèque nationale de Luxembourg in April 2020.
Topic: web
Election Crawl 2012
web

eye 250,873

Internet Archive crawldata uploaded by crawling119.us.archive.org:COL-ELECTION2012 from Wed Jun 27 07:44:54 PDT 2012 to Sat Dec 15 05:51:18 PST 2012.
Topic: crawldata
Election Crawl 2012
web

eye 987,881

Internet Archive crawldata uploaded by crawling119.us.archive.org:COL-ELECTION2012 from Wed Nov 14 09:59:52 PST 2012 to Thu Dec 13 01:06:59 PST 2012.
Topic: crawldata
Malta 2018 Domain Crawl
Malta 2018 Domain Crawl
collection
194
ITEMS
1.7M
VIEWS
collection

eye 1.7M

Crawls performed by the Internet Archive of the .mt (Malta) web domain. This data is not currently publicly accessible.
Topic: web
BNL 2020 Winter Domain Crawl
BNL 2020 Winter Domain Crawl
collection
687
ITEMS
1.6M
VIEWS
collection

eye 1.6M

Domain crawl of the Luxembourg web domain (.lu) performed by Internet Archive on behalf of the National Library of Luxembourg / Bibliothèque nationale de Luxembourg in January 2019.
Topic: web
NARA 113th Congressional Test Crawl
collection
494
ITEMS
2.4M
VIEWS
collection

eye 2.4M

This crawl of online resources of the 113th US Congress was performed on behalf of NARA.
Data crawled by Fundacao para a Computacao Cientifica Nacional on behalf of Internet Archive from Mon Aug 30 00:00:00 PDT 2010 to Mon Aug 30 00:00:00 PDT 2010
Topic: crawldata
Election Crawl 2012
web

eye 96,235

Internet Archive crawldata uploaded by crawling119.us.archive.org:COL-ELECTION2012 from Mon May 28 02:28:16 PDT 2012 to Thu Jan 17 12:11:47 PST 2013.
Topic: crawldata
Election Crawl 2012
web

eye 1M

Internet Archive crawldata uploaded by crawling119.us.archive.org:COL-ELECTION2012 from Thu Aug 30 09:20:28 PDT 2012 to Thu Dec 13 00:02:32 PST 2012.
Topic: crawldata
Olympics Crawl 2012
web

eye 103,385

Internet Archive crawldata uploaded by selenium-101.us.archive.org:COL-OLYMPICS2012 from Fri Jun 22 01:29:26 PDT 2012 to Thu Dec 20 01:16:19 PST 2012.
Topic: crawldata
NLNZ Domain Crawl 2018
web

eye 193,261

Internet Archive crawldata from New Zealand Winter 2018 domain crawl, captured by wbgrp-crawl006.us.archive.org:NLNZ-NZ-CRAWL-007-HOMEPAGES from Thu Feb 1 02:58:53 PST 2018 to Thu Feb 1 00:09:58 PST 2018.
Topic: crawldata