
9,549
UPLOADS


Internet Archive Research Publication Crawls
collection
21,054
ITEMS
92.1M
VIEWS
by Internet Archive Web Group
collection

eye 92.1M

A series of open web crawls targeting journal articles, technical memos, essays, datasets, and other research publications. This collection contains WARC and CDX files that end up in Wayback (https://web.archive.org). See also bibliographic metadata corpuses at https://archive.org/details/ia_biblio_metadata
UNPAYWALL-PDF-CRAWL-2018-07
collection
1,241
ITEMS
13.5M
VIEWS
by Internet Archive Web Group
collection

eye 13.5M

Web archive data from a crawl of open access PDF URLs provided by Unpaywall.
UNPAYWALL-PDF-CRAWL-2019-04
collection
641
ITEMS
4.7M
VIEWS
by Internet Archive Web Group
collection

eye 4.7M

OA-JOURNAL-CRAWL-2020-07
collection
1,923
ITEMS
8.8M
VIEWS
by Internet Archive Web Group
collection

eye 8.8M

Open Access Journal Test Crawl (2018)
collection
794
ITEMS
10.1M
VIEWS
by Internet Archive Web Group
collection

eye 10.1M

MSAG-PDF-CRAWL-2017
collection
1,855
ITEMS
10.9M
VIEWS
by Internet Archive Web Group
collection

eye 10.9M

Microsoft Academic Graph public corpus (Feb 2016) PDF URLs, filtered to remove large sites (pubmed, citeseerx, arxiv) and already-crawled URLs.
Topics: papers, journals
MAG-PDF-CRAWL-2020-03
collection
489
ITEMS
3.3M
VIEWS
by Internet Archive Web Group
collection

eye 3.3M

OAI-PMH-CRAWL-2020-06
collection
2,946
ITEMS
4M
VIEWS
by Internet Archive Web Group
collection

eye 4M

OA-DOI-CRAWL-2020-02
collection
278
ITEMS
3M
VIEWS
by Internet Archive Web Group
collection

eye 3M

DATACITE-DOI-CRAWL-2020-01
collection
1,417
ITEMS
3.4M
VIEWS
by Internet Archive Web Group
collection

eye 3.4M

DIRECT-OA-CRAWL-2019
collection
2,566
ITEMS
4.8M
VIEWS
by Internet Archive Web Group
collection

eye 4.8M

MAG-PDF-CRAWL-2020-07
collection
196
ITEMS
1.4M
VIEWS
by Internet Archive Web Group
collection

eye 1.4M

UNPAYWALL-PDF-CRAWL-2020-11
collection
199
ITEMS
1.5M
VIEWS
by Internet Archive Web Group
collection

eye 1.5M

UNPAYWALL-PDF-CRAWL-2020-05
collection
282
ITEMS
1.5M
VIEWS
by Internet Archive Web Group
collection

eye 1.5M

JOURNALS-PATCH-CRAWL-2022-01
collection
104
ITEMS
421,004
VIEWS
collection

eye 421,004

UNPAYWALL-PDF-CRAWL-2020-03
collection
344
ITEMS
1.6M
VIEWS
by Internet Archive Web Group
collection

eye 1.6M

MAG-PDF-CRAWL-2021-08
collection
189
ITEMS
514,508
VIEWS
collection

eye 514,508

DOI-LANDING-CRAWL-2018-06
collection
279
ITEMS
3.1M
VIEWS
by Internet Archive Web Group
collection

eye 3.1M

UNPAYWALL-PDF-CRAWL-2021-07
collection
174
ITEMS
781,776
VIEWS
collection

eye 781,776

CORE-UPSTREAM-CRAWL-2018-11
collection
741
ITEMS
1.3M
VIEWS
by Internet Archive Web Group
collection

eye 1.3M

Crawl of "upstream" URLs from the CORE (core.ac.uk) metadata dump. Only a partial seedlist of files was crawled.
OA-DOI-CRAWL-2020-12
collection
191
ITEMS
1.3M
VIEWS
by Internet Archive Web Group
collection

eye 1.3M

UNPAYWALL-PDF-CRAWL-2021-05
collection
123
ITEMS
762,118
VIEWS
by Internet Archive Web Group
collection

eye 762,118

Wide Web Targeted PDF Crawling (2017)
collection
922
ITEMS
2.9M
VIEWS
by Internet Archive Web Group
collection

eye 2.9M

OA-JOURNAL-CRAWL-2019-08
collection
201
ITEMS
2.6M
VIEWS
by Internet Archive Web Group
collection

eye 2.6M

JOURNAL-HOMEPAGE-CRAWL-2022-03
collection
44
ITEMS
164,013
VIEWS
collection

eye 164,013

SEMSCHOLAR-DIRECT-PDF-CRAWL-2020-02
collection
1,011
ITEMS
1.3M
VIEWS
by Internet Archive Web Group
collection

eye 1.3M

TARGETED-ARTICLE-CRAWL-2022-04
collection
219
ITEMS
52,885
VIEWS
collection

eye 52,885

DOAJ-CRAWL-2020-11
collection
102
ITEMS
794,540
VIEWS
by Internet Archive Web Group
collection

eye 794,540

DOI-CRAWL-2022-02
collection
25
ITEMS
114,931
VIEWS
collection

eye 114,931

collection

eye 1.8M

IA crawl of PDF URLs provided by Semantic Scholar.
Topic: pdf
OAI-PMH-PATCH-CRAWL-2021-12
collection
75
ITEMS
225,830
VIEWS
collection

eye 225,830

CiteSeerX URL Crawl 2017
collection
207
ITEMS
1.1M
VIEWS
collection

eye 1.1M

A targeted crawl to fetch research publications from the public web which have been crawled by CiteSeerX but have not previously been crawled by the Internet Archive.
Topics: scholarly, papers, journal
Tianchi V700 KTV
collection
3,697
ITEMS
90,385
VIEWS
collection

eye 90,385

Music, Instrumentals and Wistful Backgrounds and Music to Sing Korean Hits To.
Topic: karaoke, North Korea
SCIELO-CRAWL-2020-07
collection
41
ITEMS
177,960
VIEWS
by Internet Archive Web Group
collection

eye 177,960

TARGETED-ARTICLE-CRAWL-2022-03
collection
9
ITEMS
34,895
VIEWS
collection

eye 34,895

PLATFORM-CRAWL-2020
collection
649
ITEMS
296,895
VIEWS
by Internet Archive Web Group
collection

eye 296,895

PubMed Central Crawl (2019-10)
collection
216
ITEMS
354,935
VIEWS
by Internet Archive Web Group
collection

eye 354,935

PUBMEDCENTRAL-CRAWL-2020-02
collection
108
ITEMS
207,111
VIEWS
by Internet Archive Web Group
collection

eye 207,111

arXiv Content Crawl (2019-10)
collection
37
ITEMS
55,463
VIEWS
by Internet Archive Web Group
collection

eye 55,463

ARXIV-PUBMEDCENTRAL-CRAWL-2020-04
collection
60
ITEMS
87,662
VIEWS
by Internet Archive Web Group
collection

eye 87,662

Tor Project Archives
collection
3,858
ITEMS
22,457
VIEWS
by The Tor Project
collection

eye 22,457

Archived versions of Tor Browser Bundle software and other Tor Project artifacts. This item is maintained by the Tor Project organization for historical interest and research use, not as a primary installation mechanism. Please visit https://torproject.org/ to download and install Tor software.
Open Science Framework
collection
95,324
ITEMS
6,788
VIEWS
by Center for Open Science
collection

eye 6,788

Top-level collection for content mirrored from Open Science Framework (OSF, https://osf.io) repositories into Internet Archive.
OSF Registrations
collection
95,321
ITEMS
6,547
VIEWS
by Center for Open Science
collection

eye 6,547

Top-level collection for archiving Open Science Framework (OSF) Registrations into Internet Archive. Part of a collaboration with Center for Open Science.
Movies
by "Paywall The Movie"
movies

eye 5,654

favorite 3

comment 0

"Paywall: The Business of Scholarship" is a documentary film released in 2018 about the scholarly publishing industry and the Open Access movement. More information available from https://paywallthemovie.com/paywall Website blurb: "Paywall: The Business of Scholarship is a documentary which focuses on the need for open access to research and science, questions the rationale behind the $25.2 billion a year that flows into for-profit academic publishers, examines the 35-40% profit...
Topics: Open Access, Copyright, Publishing
UNPAYWALL-PDF-CRAWL-2022-04
collection
0
ITEMS
148
VIEWS
collection

eye 148

DATASET-CRAWL-2022-01
collection
2
ITEMS
3,901
VIEWS
collection

eye 3,901

OMICS-DOI-LANDING-CRAWL-2019-04
collection
4
ITEMS
13,556
VIEWS
by Internet Archive Web Group
collection

eye 13,556

This crawl started in April 2019 as an informal collaboration with Crossref. It covers a smallish number (100k) of DOI redirects and landing pages (plus PDF outlinks, and maybe a couple of other hops) for a single large publisher (OMICS, which has multiple subsidiaries). The intent is to get reasonably good captures that can be used as canonical preservation copies of the landing pages. A secondary goal is to get decent fulltext capture coverage.
Dat Early Days Collection
collection
4
ITEMS
5,880
VIEWS
collection

eye 5,880

'dat' is a distributed web data archiving and transfer tool, originally developed by Code for Science, a grant-funded US non-profit. This collection preserves a selection of early and experimental dat archives. Note that important dat metadata is contained in a '.dat/' subdirectory, which is not displayed under "download" file listings by default, but can be browsed and downloaded from archive.org over HTTP(S) as expected.
Topics: dat, distributed web
Bulk Bibliographic Metadata
collection
223
ITEMS
16,678
VIEWS
by Internet Archive Web Group
collection

eye 16,678

This collection contains both external ("upstream") metadata dumps and Internet Archive generated databases and reports on our holdings of papers, books, and other documents.
Dat Early Days Collection
movies

eye 5,145

favorite 0

comment 0

This item contains a set of relatively small (but not unimportant!) "dat" distributed web archives.
CiteSeerX URL Crawl 2017
web

eye 10,786

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 12:57:06 PDT 2017 to Wed Jul 5 06:10:16 PDT 2017.
Topic: crawldata
arXiv.org Bulk Content
collection
6,767
ITEMS
164,515
VIEWS
by arxiv.org
collection

eye 164,515

This collection contains PDF and source file (LaTeX) copies of content from the arxiv.org pre-print server, in the bulk-access format they provide via AWS S3. More information available at: https://arxiv.org/help/bulk_data_s3 Note that direct access to the internal PDF files is possible, e.g.: https://archive.org/download/arXiv_pdf_0001_001/arXiv_pdf_0001_001.tar/0001%2Fastro-ph0001001.pdf However, we strongly prefer folks access these files via the individual items associated with each...
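The direct-access URL in the description above appears to follow the pattern item name, then tar file name, then the percent-encoded path of the file inside the tar. A minimal Python sketch of building such a URL follows; the helper name `tar_member_url` is hypothetical, and the pattern is inferred only from the single example URL given, so treat it as an assumption rather than a documented API.

```python
from urllib.parse import quote


def tar_member_url(item: str, tar_name: str, member_path: str) -> str:
    """Build a download URL for one file inside a tar archive (hypothetical helper)."""
    # Percent-encode the member path, including "/" (safe=""), to match the
    # "0001%2Fastro-ph0001001.pdf" form seen in the example URL.
    encoded = quote(member_path, safe="")
    return f"https://archive.org/download/{item}/{tar_name}/{encoded}"


url = tar_member_url(
    "arXiv_pdf_0001_001",
    "arXiv_pdf_0001_001.tar",
    "0001/astro-ph0001001.pdf",
)
print(url)
```

The description states a strong preference for access via the individual items, so a helper like this is best kept for occasional spot checks.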
CiteSeerX URL Crawl 2017
web

eye 3,750

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 19:18:30 PDT 2017 to Wed Jul 5 12:32:14 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 7,987

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 07:54:24 PDT 2017 to Wed Jul 5 01:08:02 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 5,123

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 11:31:07 PDT 2017 to Thu Jul 6 08:54:41 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 6,943

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 11:38:04 PDT 2017 to Wed Jul 5 04:54:20 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 5,100

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 05:04:31 PDT 2017 to Thu Jul 6 00:05:17 PDT 2017.
Topic: crawldata
linux.conf.au 2018
collection
183
ITEMS
7,767
VIEWS
collection

eye 7,767

linux.conf.au is a conference about the Linux operating system, and all aspects of the thriving ecosystem of Free and Open Source Software that has grown up around it. Run since 1999, in a different Australian or New Zealand city each year, by a team of local volunteers, LCA invites more than 500 people to learn from the people who shape the future of Open Source. For more information on the conference see https://linux.conf.au/
Topic: linux
CiteSeerX URL Crawl 2017
web

eye 7,786

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 12:17:48 PDT 2017 to Wed Jul 5 05:29:23 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 11,161

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 12:27:34 PDT 2017 to Wed Jul 5 05:39:37 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 4,003

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 13:45:57 PDT 2017 to Thu Jul 6 07:00:09 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 9,629

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 01:25:08 PDT 2017 to Wed Jul 5 18:40:27 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 8,418

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 09:01:08 PDT 2017 to Thu Jul 6 02:12:56 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 5,074

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 09:09:11 PDT 2017 to Thu Jul 6 04:53:41 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 5,884

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 03:47:41 PDT 2017 to Wed Jul 5 20:59:49 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 8,246

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 08:35:43 PDT 2017 to Thu Jul 6 01:46:47 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 7,516

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 12:45:37 PDT 2017 to Wed Jul 5 05:59:07 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 5,849

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 11:39:06 PDT 2017 to Thu Jul 6 04:51:08 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 6,182

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 01:48:00 PDT 2017 to Wed Jul 5 19:01:22 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 7,177

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 07:05:25 PDT 2017 to Thu Jul 6 00:16:46 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 9,820

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 08:45:15 PDT 2017 to Thu Jul 6 01:55:13 PDT 2017.
Topic: crawldata
by Internet Archive Web Group
collection

eye 6,352

This collection contains web crawl data for a random selection of 500k (0.5 million) Crossref DOI redirects, including the doi.org redirect requests. The intent of this crawl is to gather loose statistics on the number of failing redirects, number of host websites that block automated crawling, and a corpus of HTML landing pages for metadata extraction (eg, "signposting" HTTP headers, linked data HTML metadata, semantic markup). Total size of (uncompressed) WARC data is 50 GB,...
CiteSeerX URL Crawl 2017
web

eye 6,298

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 07:32:04 PDT 2017 to Thu Jul 6 00:46:06 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 7,307

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 05:01:45 PDT 2017 to Tue Jul 4 22:50:03 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 10,613

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 13:06:40 PDT 2017 to Wed Jul 5 06:20:59 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 5,543

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 07:26:10 PDT 2017 to Wed Jul 5 00:38:51 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 6,594

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 11:06:40 PDT 2017 to Wed Jul 5 04:21:44 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 4,219

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 17:01:13 PDT 2017 to Wed Jul 5 10:15:10 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 6,017

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 05:56:39 PDT 2017 to Wed Jul 5 23:08:54 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 8,201

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 06:46:05 PDT 2017 to Wed Jul 5 23:58:46 PDT 2017.
Topic: crawldata
Military Industrial Powerpoint Complex
texts

eye 4,593

favorite 0

comment 0

This item is part of the Military Industrial Powerpoint Complex project, a special project for the Internet Archive's 20th Anniversary in which IA staff extracted all the Powerpoint files from the .mil web domain collected in IA's web archive and converted them to searchable, browsable PDFs. This item contains the specific PDFs from the asec.navy.mil site. Read more about the project on the Military Industrial Powerpoint Complex collection page.
CiteSeerX URL Crawl 2017
web

eye 4,829

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 14:27:18 PDT 2017 to Wed Jul 5 07:42:31 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 8,224

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 08:18:25 PDT 2017 to Thu Jul 6 01:29:26 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 6,443

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 03:38:19 PDT 2017 to Wed Jul 5 20:50:10 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 5,859

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 06:07:30 PDT 2017 to Tue Jul 4 23:19:25 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 4,812

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 13:25:18 PDT 2017 to Thu Jul 6 06:39:00 PDT 2017.
Topic: crawldata
The Dataset Collection
by Weiwei Zhang, Jian Sun, and Xiaoou Tang
data

eye 7,608

favorite 6

comment 0

This dataset was mirrored from http://137.189.35.203/WebUI/CatDatabase/catData.html, which circa May 2017 is a dead link. The original page is available in Wayback: https://web.archive.org/web/20150520175645/http://137.189.35.203/WebUI/CatDatabase/catData.html The CAT dataset includes 10,000 cat images. For each image, we annotate the head of the cat with nine points: two for the eyes, one for the mouth, and six for the ears. The detailed configuration of the annotation was shown in Figure 6 of the original paper:...
Topics: cats, datasets, computer vision
CiteSeerX URL Crawl 2017
web

eye 5,459

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 02:51:41 PDT 2017 to Wed Jul 5 20:04:28 PDT 2017.
Topic: crawldata
Military Industrial Powerpoint Complex
texts

eye 8,739

favorite 1

comment 1

This item is part of the Military Industrial Powerpoint Complex project, a special project for the Internet Archive's 20th Anniversary in which IA staff extracted all the Powerpoint files from the .mil web domain collected in IA's web archive and converted them to searchable, browsable PDFs. This item contains the specific PDFs from the dlmso.dla.mil site. Read more about the project on the Military Industrial Powerpoint Complex collection page.
Community Audio
by musicForProgramming
audio

eye 7,970

favorite 8

comment 0

Collection of episodes from musicforprogramming.net. "A series of mixes intended for listening while '+task+' to aid concentration and increase productivity (also compatible with other activities)."
CiteSeerX URL Crawl 2017
web

eye 5,909

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 06:16:47 PDT 2017 to Tue Jul 4 23:30:24 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 4,570

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 10:18:41 PDT 2017 to Thu Jul 6 03:32:30 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 3,935

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 13:35:38 PDT 2017 to Thu Jul 6 06:49:41 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 4,112

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 12:53:52 PDT 2017 to Thu Jul 6 06:07:14 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 6,428

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 10:58:23 PDT 2017 to Wed Jul 5 04:10:54 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 6,613

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 06:35:45 PDT 2017 to Wed Jul 5 23:47:46 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 4,246

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 23:14:09 PDT 2017 to Wed Jul 5 17:01:18 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 6,172

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 07:35:29 PDT 2017 to Wed Jul 5 00:48:05 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 6,778

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 03:11:26 PDT 2017 to Wed Jul 5 20:24:18 PDT 2017.
Topic: crawldata
CiteSeerX URL Crawl 2017
web

eye 6,523

favorite 0

comment 0

Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 07:51:37 PDT 2017 to Thu Jul 6 01:02:46 PDT 2017.
Topic: crawldata