Microsoft Academic Graph public corpus (Feb 2016) PDF URLs, filtered to remove large sites (pubmed, citeseerx, arxiv) and already-crawled URLs.
Topics: papers, journals
A targeted crawl to fetch research publications from the public web which have been crawled by CiteSeerX but have not previously been crawled by the Internet Archive.
Topics: scholarly, papers, journal