Crawls performed by Internet Archive on behalf of the National Library of Australia. This data is currently not publicly accessible.
99.6M
100M
Dec 19, 2017
12/17
by
Internet Archive Web Group
A series of open web crawls targeting journal articles, technical memos, essays, datasets, and other research publications. This collection contains WARC and CDX files that end up in Wayback (https://web.archive.org). See also bibliographic metadata corpuses at https://archive.org/details/ia_biblio_metadata
WARCs from internal crawl testing.
Topics: web, cctld
This collection contains collaborative Election crawls performed by IA.
Topics: elections, web
This crawl was performed in Summer & Fall of 2012 to archive the US Federal Elections.
Topics: US, federal, elections, web, 2012
Data collected by Internet Archive on behalf of the National Library of Spain. This data is currently not publicly accessible.
Crawls of the French domain space performed by Internet Archive on behalf of Bibliotheque Nationale de France. This data is currently not publicly accessible.
National Archives and Records Administration crawl performed by Internet Archive. This data is currently not publicly accessible.
National Library of Luxembourg
Topic: Luxembourg
Domain crawl of the Australian web domain (.au) performed by Internet Archive on behalf of the National Library of Australia in March-April, 2022.
Topic: crawldata
National Library of Australia crawl. This data is currently not publicly accessible.
Data collected by Internet Archive on behalf of the National Library of Israel. This data is currently not publicly accessible.
Topic: nlil
This data is currently not publicly accessible.
This collection includes all collaborative Olympic crawls performed by IA for the IIPC.
Topics: olympics, IIPC, web
Data collected by Internet Archive on behalf of the National Library of Sweden. This data is currently not publicly accessible.
This data is currently not publicly accessible.
These crawls were performed by IA on behalf of the IIPC in Summer 2012 during and prior to the 2012 Summer Olympics held in London, UK.
Topics: London, olympics, web, 2012, IIPC
This data is currently not publicly accessible.
This data is currently not publicly accessible.
4.7M
4.7M
May 28, 2020
05/20
by
Internet Archive Web Group
14.3M
14M
Jul 17, 2018
07/18
by
Internet Archive Web Group
Web archive data from a crawl of open access PDF URLs provided by Unpaywall.
9.5M
9.5M
Jul 6, 2020
07/20
by
Internet Archive Web Group
Crawls performed by the Internet Archive in 2017 on behalf of the National Library of Australia.
Topic: nla web 2017
11.5M
12M
Aug 4, 2017
08/17
by
Internet Archive Web Group
Microsoft Academic Graph public corpus (Feb 2016) PDF URLs, filtered to remove large sites (pubmed, citeseerx, arxiv) and already-crawled URLs.
Topics: papers, journals
This crawl of the .au domain was performed on behalf of the National Library of Australia in 2016.
Topics: nla, australia, web
These crawls of the .es domain were performed in 2011 on behalf of the National Library of Spain (BNE).
Topics: bne, spain, web, 2011
Crawls performed by Internet Archive on behalf of the National Library of Ireland. This data is currently not publicly accessible.
ccTLD crawl for .br domain
Topics: br, web, 2018, cctld
This crawl of online resources of the 114th US Congress was performed on behalf of The United States National Archives & Records Administration (NARA).
Domain crawl of the Australian web domain (.au) performed by Internet Archive on behalf of the National Library of Australia in March-April, 2021.
Topic: crawldata
This crawl of online resources of the 112th US Congress was performed in Fall of 2012 and early winter of 2013 on behalf of NARA.
Topics: nara, 112th, web
Domain crawl of the New Zealand web domain (.nz) performed by Internet Archive on behalf of the National Library of New Zealand in January-March, 2022.
Topic: crawldata
10.6M
11M
Apr 9, 2018
04/18
by
Internet Archive Web Group
This crawl of the .au domain was performed on behalf of the National Library of Australia in 2015.
Topics: nla, web, 2015
Domain crawl of the New Zealand web domain (.nz) performed by Internet Archive on behalf of the National Library of New Zealand in January-February, 2018.
Topics: web, nlnz, 2018
This data is currently not publicly accessible.
This data is currently not publicly accessible.
This crawl of the .au domain was performed on behalf of the National Library of Australia in 2014.
Topics: nla, web, 2014
Crawls performed by the Internet Archive in 2018 on behalf of the National Library of Australia.
Topics: nla, web, 2018
5.2M
5.2M
Apr 26, 2019
04/19
by
Internet Archive Web Group
This crawl of online resources of the 115th US Congress was performed on behalf of The United States National Archives & Records Administration (NARA).
Topic: crawldata
This data is currently not publicly accessible.
Crawls performed by the Internet Archive in 2019 on behalf of the National Library of Australia.
Topics: nla, web, 2019
This crawl of the .au domain was performed on behalf of the National Library of Australia in Spring of 2013.
Topics: nla, web, 2013
2017 domain crawl for National Library of Ireland.
Topics: ireland, web
Web archive data collected by Internet Archive from a domain harvest of the Ukrainian ccTLD.
Topic: crawldata
Domain crawl of the New Zealand web domain (.nz) performed by Internet Archive on behalf of the National Library of New Zealand in January-March, 2021.
Topic: crawldata
This data is currently not publicly accessible.
Topics: bne, spain, web, 2013
This crawl of the .es domain was performed in 2012 on behalf of the National Library of Spain (BNE).
Topics: bne, spain, web, 2012
2015 crawl of museum websites listed in the IMLS Museum Universe Data File. More about the IMLS MUDF can be found at https://www.imls.gov/research-evaluation/data-collection/museum-universe-data-file
Topic: AIT
This data is currently not publicly accessible.
Crawls performed by the Internet Archive in 2020 on behalf of the National Library of Australia.
Topics: nla, web, 2020
This data is currently not publicly accessible.
3.6M
3.6M
Mar 5, 2020
03/20
by
Internet Archive Web Group
016-2022-Spring domain crawl of the Luxembourg web domain (.lu) performed by Internet Archive March - May 2022 on behalf of the National Library of Luxembourg / Bibliothèque nationale de Luxembourg.
Topic: crawldata
26.5M
27M
Oct 3, 2013
10/13
by
dominic@archive.org
This crawl of the .il domain was performed in 2013 on behalf of the National Library of Israel (NLIL).
Topics: nlil, israel, web, 2013
This data is currently not publicly accessible.
Crawl 00001 of the IMLS Museum Universe Data File.
This data is currently not publicly accessible.
5.1M
5.1M
Feb 15, 2019
02/19
by
Internet Archive Web Group
This crawl of the .nz domain was performed on behalf of the National Library of New Zealand in Spring of 2017.
Topics: nlnz, web, 2017
Domain crawl of the Israel web domain (.il) performed by Internet Archive in October-November 2021 on behalf of the National Library of Israel.
Topic: crawldata
This collection includes content harvested from the Web on behalf of the National Library & Archives New Zealand in January 2016.
Topics: new zealand, web, domain
Crawls performed by the Internet Archive in 2020 on behalf of the National Library of Israel.
Topic: web
Domain crawl of the New Zealand web domain (.nz) performed by Internet Archive on behalf of the National Library of New Zealand in January-February, 2019.
Topics: web, nlnz, 2019
1.5M
1.5M
Oct 31, 2018
10/18
by
Internet Archive Web Group
Crawl of "upstream" URLs from CORE (core.ac.uk) metadata dump. Only a partial seedlist of files crawled.
This crawl of the .il domain was performed in 2015 on behalf of the National Library of Israel (NLIL).
Topics: nlil, israel, web, 2015
Data collected by Internet Archive on behalf of Biblioteca Nazionale Centrale di Firenze. This data is currently not publicly accessible.
This collection includes content harvested from the Web on behalf of the National Library & Archives New Zealand in February 2013.
Topics: web, domain
3.2M
3.2M
Feb 5, 2020
02/20
by
Internet Archive Web Group
This crawl was a domain scale harvest of .au performed for the National Library of Australia in 2010.
Topics: nla, web, 2010
WARCs from Whole Earth Web Archive (WEWA) Domain Crawls
Topic: web
This crawl was performed on behalf of the National Library of Spain (BNE) in Fall of 2011 to archive the National elections in Spain.
Topics: elections, web, 2011, spain, bne