92.1M
92M
Dec 19, 2017
12/17
by
Internet Archive Web Group
A series of open web crawls targeting journal articles, technical memos, essays, datasets, and other research publications. This collection contains WARC and CDX files that end up in the Wayback Machine (https://web.archive.org). See also the bibliographic metadata corpora at https://archive.org/details/ia_biblio_metadata
13.5M
13M
Jul 17, 2018
07/18
by
Internet Archive Web Group
Web archive data from a crawl of open access PDF URLs provided by Unpaywall.
4.7M
4.7M
Apr 26, 2019
04/19
by
Internet Archive Web Group
8.8M
8.8M
Jul 6, 2020
07/20
by
Internet Archive Web Group
10.1M
10M
Apr 9, 2018
04/18
by
Internet Archive Web Group
10.9M
11M
Aug 4, 2017
08/17
by
Internet Archive Web Group
Microsoft Academic Graph public corpus (Feb 2016) PDF URLs, filtered to remove large sites (pubmed, citeseerx, arxiv) and already-crawled URLs.
Topics: papers, journals
3.3M
3.3M
Mar 5, 2020
03/20
by
Internet Archive Web Group
4M
4.0M
May 28, 2020
05/20
by
Internet Archive Web Group
3M
3.0M
Feb 5, 2020
02/20
by
Internet Archive Web Group
3.4M
3.4M
Jan 24, 2020
01/20
by
Internet Archive Web Group
4.8M
4.8M
Feb 15, 2019
02/19
by
Internet Archive Web Group
1.4M
1.4M
Jul 9, 2020
07/20
by
Internet Archive Web Group
1.5M
1.5M
Nov 2, 2020
11/20
by
Internet Archive Web Group
1.5M
1.5M
May 4, 2020
05/20
by
Internet Archive Web Group
1.6M
1.6M
Mar 5, 2020
03/20
by
Internet Archive Web Group
3.1M
3.1M
Jun 1, 2018
06/18
by
Internet Archive Web Group
1.3M
1.3M
Oct 31, 2018
10/18
by
Internet Archive Web Group
Crawl of "upstream" URLs from a CORE (core.ac.uk) metadata dump. Only part of the seedlist was crawled.
1.3M
1.3M
Dec 9, 2020
12/20
by
Internet Archive Web Group
762,118
762K
Apr 27, 2021
04/21
by
Internet Archive Web Group
2.9M
2.9M
Sep 21, 2017
09/17
by
Internet Archive Web Group
2.6M
2.6M
Aug 1, 2019
08/19
by
Internet Archive Web Group
1.3M
1.3M
Feb 6, 2020
02/20
by
Internet Archive Web Group
794,540
795K
Nov 24, 2020
11/20
by
Internet Archive Web Group
IA crawl of PDF URLs provided by Semantic Scholar.
Topic: pdf
A targeted crawl to fetch research publications from the public web which have been crawled by CiteSeerX but have not previously been crawled by the Internet Archive.
Topics: scholarly, papers, journal
Music, Instrumentals and Wistful Backgrounds and Music to Sing Korean Hits To.
Topics: karaoke, North Korea
177,960
178K
Jul 6, 2020
07/20
by
Internet Archive Web Group
296,895
297K
Aug 27, 2020
08/20
by
Internet Archive Web Group
354,935
355K
Oct 12, 2019
10/19
by
Internet Archive Web Group
207,111
207K
Feb 5, 2020
02/20
by
Internet Archive Web Group
55,463
55K
Oct 12, 2019
10/19
by
Internet Archive Web Group
87,662
88K
Apr 24, 2020
04/20
by
Internet Archive Web Group
22,457
22K
Jun 17, 2019
06/19
by
The Tor Project
Archived versions of Tor Browser Bundle software and other Tor Project artifacts. This item is maintained by the Tor Project organization for historical interest and research use, not as a primary installation mechanism. Please visit https://torproject.org/ to download and install Tor software.
6,788
6.8K
Jun 2, 2021
06/21
by
Center for Open Science
Top-level collection for content mirrored from Open Science Framework (OSF, https://osf.io) repositories into Internet Archive.
6,547
6.5K
Jun 2, 2021
06/21
by
Center for Open Science
Top-level collection for archiving Open Science Framework (OSF) Registrations into Internet Archive. Part of a collaboration with Center for Open Science.
5,654
5.7K
Sep 6, 2018
09/18
by
"Paywall The Movie"
movies
eye 5,654
favorite 3
comment 0
"Paywall: The Business of Scholarship" is a documentary film released in 2018 about the scholarly publishing industry and the Open Access movement. More information available from https://paywallthemovie.com/paywall Website blurb: "Paywall: The Business of Scholarship is a documentary which focuses on the need for open access to research and science, questions the rationale behind the $25.2 billion a year that flows into for-profit academic publishers, examines the 35-40% profit...
Topics: Open Access, Copyright, Publishing
13,556
14K
Apr 26, 2019
04/19
by
Internet Archive Web Group
This crawl started in April 2019 as an informal collaboration with Crossref. It covers a smallish number (100k) of DOI redirects and landing pages (plus PDF outlinks, and maybe a couple of other hops) for a single large publisher (OMICS, which has multiple subsidiaries). The intent is to get reasonably good captures that can be used as canonical preservation copies of the landing pages. A secondary goal is to get decent fulltext capture coverage.
'dat' is a distributed web data archiving and transfer tool, originally developed by Code for Science, a grant-funded US non-profit. This collection preserves a selection of early and experimental dat archives. Note that important dat metadata is contained in a '.dat/' subdirectory, which is not displayed under "download" file listings by default, but can be browsed and downloaded from archive.org over HTTP(S) as expected.
Topics: dat, distributed web
16,678
17K
Dec 14, 2017
12/17
by
Internet Archive Web Group
This collection contains both external ("upstream") metadata dumps and Internet Archive generated databases and reports on our holdings of papers, books, and other documents.
This item contains a set of relatively small (but not unimportant!) "dat" distributed web archives.
10,786
11K
Jul 11, 2017
07/17
by
Internet Archive
web
eye 10,786
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 12:57:06 PDT 2017 to Wed Jul 5 06:10:16 PDT 2017.
Topic: crawldata
164,515
165K
Sep 6, 2017
09/17
by
arxiv.org
This collection contains PDF and source file (LaTeX) copies of content from the arxiv.org pre-print server, in the bulk-access format they provide via AWS S3. More information available at: https://arxiv.org/help/bulk_data_s3 Note that direct access to the internal PDF files is possible, e.g.: https://archive.org/download/arXiv_pdf_0001_001/arXiv_pdf_0001_001.tar/0001%2Fastro-ph0001001.pdf However, we strongly prefer that folks access these files via the individual items associated with each...
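The direct-access URL quoted in the description above follows a visible pattern: item identifier, tar file name, then the member path with its slash percent-encoded. A minimal sketch of building such a URL, assuming that pattern holds for other items (the function name `tar_member_url` is hypothetical and not part of any archive.org tooling):

```python
from urllib.parse import quote


def tar_member_url(item: str, tar_name: str, member_path: str) -> str:
    """Build a download URL for one file stored inside a tar in an item.

    Assumption: the URL layout is inferred from the single example in the
    item description; it is not taken from official documentation.
    """
    # safe="" makes quote() encode "/" as "%2F", matching the example URL.
    encoded_member = quote(member_path, safe="")
    return f"https://archive.org/download/{item}/{tar_name}/{encoded_member}"


url = tar_member_url(
    "arXiv_pdf_0001_001",
    "arXiv_pdf_0001_001.tar",
    "0001/astro-ph0001001.pdf",
)
print(url)
```

As the description notes, access via the individual items is strongly preferred, so this is only useful when the tar-internal path is what you have.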
3,750
3.8K
Jul 12, 2017
07/17
by
Internet Archive
web
eye 3,750
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 19:18:30 PDT 2017 to Wed Jul 5 12:32:14 PDT 2017.
Topic: crawldata
7,987
8.0K
Jul 11, 2017
07/17
by
Internet Archive
web
eye 7,987
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 07:54:24 PDT 2017 to Wed Jul 5 01:08:02 PDT 2017.
Topic: crawldata
5,123
5.1K
Jul 12, 2017
07/17
by
Internet Archive
web
eye 5,123
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 11:31:07 PDT 2017 to Thu Jul 6 08:54:41 PDT 2017.
Topic: crawldata
6,943
6.9K
Jul 11, 2017
07/17
by
Internet Archive
web
eye 6,943
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 11:38:04 PDT 2017 to Wed Jul 5 04:54:20 PDT 2017.
Topic: crawldata
5,100
5.1K
Jul 12, 2017
07/17
by
Internet Archive
web
eye 5,100
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 05:04:31 PDT 2017 to Thu Jul 6 00:05:17 PDT 2017.
Topic: crawldata
linux.conf.au is a conference about the Linux operating system, and all aspects of the thriving ecosystem of Free and Open Source Software that has grown up around it. Run since 1999, in a different Australian or New Zealand city each year, by a team of local volunteers, LCA invites more than 500 people to learn from the people who shape the future of Open Source. For more information on the conference see https://linux.conf.au/
Topic: linux
7,786
7.8K
Jul 11, 2017
07/17
by
Internet Archive
web
eye 7,786
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 12:17:48 PDT 2017 to Wed Jul 5 05:29:23 PDT 2017.
Topic: crawldata
11,161
11K
Jul 11, 2017
07/17
by
Internet Archive
web
eye 11,161
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 12:27:34 PDT 2017 to Wed Jul 5 05:39:37 PDT 2017.
Topic: crawldata
4,003
4.0K
Jul 12, 2017
07/17
by
Internet Archive
web
eye 4,003
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 13:45:57 PDT 2017 to Thu Jul 6 07:00:09 PDT 2017.
Topic: crawldata
9,629
9.6K
Jul 11, 2017
07/17
by
Internet Archive
web
eye 9,629
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 01:25:08 PDT 2017 to Wed Jul 5 18:40:27 PDT 2017.
Topic: crawldata
8,418
8.4K
Jul 11, 2017
07/17
by
Internet Archive
web
eye 8,418
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 09:01:08 PDT 2017 to Thu Jul 6 02:12:56 PDT 2017.
Topic: crawldata
5,074
5.1K
Jul 12, 2017
07/17
by
Internet Archive
web
eye 5,074
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 09:09:11 PDT 2017 to Thu Jul 6 04:53:41 PDT 2017.
Topic: crawldata
5,884
5.9K
Jul 11, 2017
07/17
by
Internet Archive
web
eye 5,884
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 03:47:41 PDT 2017 to Wed Jul 5 20:59:49 PDT 2017.
Topic: crawldata
8,246
8.2K
Jul 11, 2017
07/17
by
Internet Archive
web
eye 8,246
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 08:35:43 PDT 2017 to Thu Jul 6 01:46:47 PDT 2017.
Topic: crawldata
7,516
7.5K
Jul 11, 2017
07/17
by
Internet Archive
web
eye 7,516
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 12:45:37 PDT 2017 to Wed Jul 5 05:59:07 PDT 2017.
Topic: crawldata
5,849
5.8K
Jul 12, 2017
07/17
by
Internet Archive
web
eye 5,849
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 11:39:06 PDT 2017 to Thu Jul 6 04:51:08 PDT 2017.
Topic: crawldata
6,182
6.2K
Jul 11, 2017
07/17
by
Internet Archive
web
eye 6,182
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 01:48:00 PDT 2017 to Wed Jul 5 19:01:22 PDT 2017.
Topic: crawldata
7,177
7.2K
Jul 11, 2017
07/17
by
Internet Archive
web
eye 7,177
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 07:05:25 PDT 2017 to Thu Jul 6 00:16:46 PDT 2017.
Topic: crawldata
9,820
9.8K
Jul 11, 2017
07/17
by
Internet Archive
web
eye 9,820
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 08:45:15 PDT 2017 to Thu Jul 6 01:55:13 PDT 2017.
Topic: crawldata
6,352
6.4K
May 7, 2018
05/18
by
Internet Archive Web Group
This collection contains web crawl data for a random selection of 500k (0.5 million) Crossref DOI redirects, including the doi.org redirect requests. The intent of this crawl is to gather loose statistics on the number of failing redirects, number of host websites that block automated crawling, and a corpus of HTML landing pages for metadata extraction (e.g., "signposting" HTTP headers, linked data HTML metadata, semantic markup). Total size of (uncompressed) WARC data is 50 GB,...
6,298
6.3K
Jul 11, 2017
07/17
by
Internet Archive
web
eye 6,298
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 07:32:04 PDT 2017 to Thu Jul 6 00:46:06 PDT 2017.
Topic: crawldata
7,307
7.3K
Jul 14, 2017
07/17
by
Internet Archive
web
eye 7,307
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 05:01:45 PDT 2017 to Tue Jul 4 22:50:03 PDT 2017.
Topic: crawldata
10,613
11K
Jul 11, 2017
07/17
by
Internet Archive
web
eye 10,613
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 13:06:40 PDT 2017 to Wed Jul 5 06:20:59 PDT 2017.
Topic: crawldata
5,543
5.5K
Jul 11, 2017
07/17
by
Internet Archive
web
eye 5,543
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 07:26:10 PDT 2017 to Wed Jul 5 00:38:51 PDT 2017.
Topic: crawldata
6,594
6.6K
Jul 11, 2017
07/17
by
Internet Archive
web
eye 6,594
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 11:06:40 PDT 2017 to Wed Jul 5 04:21:44 PDT 2017.
Topic: crawldata
4,219
4.2K
Jul 12, 2017
07/17
by
Internet Archive
web
eye 4,219
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 17:01:13 PDT 2017 to Wed Jul 5 10:15:10 PDT 2017.
Topic: crawldata
6,017
6.0K
Jul 11, 2017
07/17
by
Internet Archive
web
eye 6,017
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 05:56:39 PDT 2017 to Wed Jul 5 23:08:54 PDT 2017.
Topic: crawldata
8,201
8.2K
Jul 11, 2017
07/17
by
Internet Archive
web
eye 8,201
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 06:46:05 PDT 2017 to Wed Jul 5 23:58:46 PDT 2017.
Topic: crawldata
This item is part of the Military Industrial Powerpoint Complex project, a special project for the Internet Archive's 20th Anniversary in which IA staff extracted all the Powerpoint files from the .mil web domain collected in IA's web archive and converted them to searchable, browsable PDFs. This item contains the specific PDFs from the asec.navy.mil site. Read more about the project on the Military Industrial Powerpoint Complex collection page.
4,829
4.8K
Jul 11, 2017
07/17
by
Internet Archive
web
eye 4,829
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 14:27:18 PDT 2017 to Wed Jul 5 07:42:31 PDT 2017.
Topic: crawldata
8,224
8.2K
Jul 11, 2017
07/17
by
Internet Archive
web
eye 8,224
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 08:18:25 PDT 2017 to Thu Jul 6 01:29:26 PDT 2017.
Topic: crawldata
6,443
6.4K
Jul 11, 2017
07/17
by
Internet Archive
web
eye 6,443
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 03:38:19 PDT 2017 to Wed Jul 5 20:50:10 PDT 2017.
Topic: crawldata
5,859
5.9K
Jul 14, 2017
07/17
by
Internet Archive
web
eye 5,859
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 06:07:30 PDT 2017 to Tue Jul 4 23:19:25 PDT 2017.
Topic: crawldata
4,812
4.8K
Jul 12, 2017
07/17
by
Internet Archive
web
eye 4,812
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 13:25:18 PDT 2017 to Thu Jul 6 06:39:00 PDT 2017.
Topic: crawldata
7,608
7.6K
May 30, 2017
05/17
by
Weiwei Zhang, Jian Sun, and Xiaoou Tang
data
eye 7,608
favorite 6
comment 0
This dataset is mirrored from http://137.189.35.203/WebUI/CatDatabase/catData.html, which was a dead link as of May 2017. The original page is available in Wayback: https://web.archive.org/web/20150520175645/http://137.189.35.203/WebUI/CatDatabase/catData.html The CAT dataset includes 10,000 cat images. For each image, we annotate the head of cat with nine points, two for eyes, one for mouth, and six for ears. The detail configuration of the annotation was shown in Figure 6 of the original paper:...
Topics: cats, datasets, computer vision
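The nine-point head annotation described above (two eye points, one mouth point, six ear points) can be modelled as a small record. This is only a sketch: the class name, field names, and the ordering of points (eyes, then mouth, then ears) are assumptions for illustration, since the description does not state the on-disk format.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates


@dataclass
class CatHeadAnnotation:
    """Hypothetical container for one image's nine annotated points."""
    eyes: List[Point]   # two points
    mouth: Point        # one point
    ears: List[Point]   # six points


def group_points(points: List[Point]) -> CatHeadAnnotation:
    """Split nine points into eyes, mouth, and ears.

    Assumption: points arrive in the order eyes, mouth, ears. The real
    dataset may order them differently.
    """
    if len(points) != 9:
        raise ValueError(f"expected 9 points, got {len(points)}")
    return CatHeadAnnotation(eyes=points[0:2], mouth=points[2], ears=points[3:9])


# Made-up coordinates, for illustration only.
example = group_points([(i, i + 1) for i in range(9)])
print(example)
```

Check the dataset's own files and Figure 6 of the original paper for the actual point order before relying on this grouping.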
5,459
5.5K
Jul 11, 2017
07/17
by
Internet Archive
web
eye 5,459
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 02:51:41 PDT 2017 to Wed Jul 5 20:04:28 PDT 2017.
Topic: crawldata
This item is part of the Military Industrial Powerpoint Complex project, a special project for the Internet Archive's 20th Anniversary in which IA staff extracted all the Powerpoint files from the .mil web domain collected in IA's web archive and converted them to searchable, browsable PDFs. This item contains the specific PDFs from the dlmso.dla.mil site. Read more about the project on the Military Industrial Powerpoint Complex collection page.
Rating: 5 stars (1 review)
7,970
8.0K
Jun 14, 2017
06/17
by
musicForProgramming
audio
eye 7,970
favorite 8
comment 0
Collection of episodes from musicforprogramming.net. "A series of mixes intended for listening while '+task+' to aid concentration and increase productivity (also compatible with other activities)."
5,909
5.9K
Jul 14, 2017
07/17
by
Internet Archive
web
eye 5,909
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 06:16:47 PDT 2017 to Tue Jul 4 23:30:24 PDT 2017.
Topic: crawldata
4,570
4.6K
Jul 12, 2017
07/17
by
Internet Archive
web
eye 4,570
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 10:18:41 PDT 2017 to Thu Jul 6 03:32:30 PDT 2017.
Topic: crawldata
3,935
3.9K
Jul 12, 2017
07/17
by
Internet Archive
web
eye 3,935
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 13:35:38 PDT 2017 to Thu Jul 6 06:49:41 PDT 2017.
Topic: crawldata
4,112
4.1K
Jul 12, 2017
07/17
by
Internet Archive
web
eye 4,112
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 12:53:52 PDT 2017 to Thu Jul 6 06:07:14 PDT 2017.
Topic: crawldata
6,428
6.4K
Jul 11, 2017
07/17
by
Internet Archive
web
eye 6,428
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 10:58:23 PDT 2017 to Wed Jul 5 04:10:54 PDT 2017.
Topic: crawldata
6,613
6.6K
Jul 11, 2017
07/17
by
Internet Archive
web
eye 6,613
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 06:35:45 PDT 2017 to Wed Jul 5 23:47:46 PDT 2017.
Topic: crawldata
4,246
4.2K
Jul 12, 2017
07/17
by
Internet Archive
web
eye 4,246
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 23:14:09 PDT 2017 to Wed Jul 5 17:01:18 PDT 2017.
Topic: crawldata
6,172
6.2K
Jul 11, 2017
07/17
by
Internet Archive
web
eye 6,172
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 07:35:29 PDT 2017 to Wed Jul 5 00:48:05 PDT 2017.
Topic: crawldata
6,778
6.8K
Jul 11, 2017
07/17
by
Internet Archive
web
eye 6,778
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 03:11:26 PDT 2017 to Wed Jul 5 20:24:18 PDT 2017.
Topic: crawldata
6,523
6.5K
Jul 11, 2017
07/17
by
Internet Archive
web
eye 6,523
favorite 0
comment 0
Internet Archive crawldata of uncrawled CiteseerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 07:51:37 PDT 2017 to Thu Jul 6 01:02:46 PDT 2017.
Topic: crawldata