Custom Crawl Services
by Internet Archive Web Group
data

This item contains a copy of log files found on the Internet Archive (Web Group) machine `wbgrp-svc263.us.archive.org` on 2018-05-29, under the `/3` directory. These are logs of file transfer status between various crawler machines; they are not known to contain any sensitive metadata (e.g., personal information, IPs, or other security-sensitive information), but are being kept `access-restricted` anyway. This data is almost certainly unimportant and could be deleted; it is being preserved out...
Wide Web Targeted PDF Crawling (2017)
collection
922 items, 3.7M views
by Internet Archive Web Group

MSAG-PDF-CRAWL-2017
collection
1,855 items, 15.1M views
by Internet Archive Web Group

Microsoft Academic Graph public corpus (Feb 2016) PDF URLs, filtered to remove large sites (pubmed, citeseerx, arxiv) and already-crawled URLs.
Topics: papers, journals