to an academic publisher's website which points to a paper. The paper needs
to be downloaded and the metadata should be stored.
Returns a tuple of (paper, json_path, pdf_path, logpath).
:param url: url to fetch and examine
:type url: str
"""
# store logs in tempfile
(templogpath, loghandler) = loghijack()
if paper is None:
    paper = Paper.create({})
# clean up url if necessary
url = run_url_fixers(url)
# whether or not metadata has already been populated
populated_metadata = False
for (url2, response) in iterdownload(url, paper=paper):
    if is_response_pdf(response):
        log.debug("Got pdf.")
        pdfcontent = remove_watermarks(response.content)
        paper.pdf = pdfcontent
        store(paper)
        break
    paper.html = response.content
    # Was not pdf. Attempt to parse the HTML based on normal expected
    # HTML elements. The HTML elements may say that the actual pdf url