Python - Errors with Beautiful Soup output
I'm trying to scrape data from a webpage on GameSpot using BeautifulSoup. However, the result is different from what I see in the page source viewer. First off, a lot of errors are produced. For instance, I have

    r = requests.get(link)
    soup = bs4.BeautifulSoup(r.text)

and yet soup.title gives
<title>404: not found - gamespot</title>.
The data I want to scrape does not appear. Is this because the webpage contains JavaScript? If so, how can I get around it?
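You're sending a plain HTTP request to the server, so you only get back the static HTML. You need something that processes the JavaScript content.
A headless browser with JavaScript support, such as Ghost, would be a good choice.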
    from ghost import Ghost
    from bs4 import BeautifulSoup

    ghost = Ghost()
    ghost.open(link)
    page, resources = ghost.evaluate('document.documentElement.innerHTML;')
    soup = BeautifulSoup(page)

.evaluate('document.documentElement.innerHTML') will show the dynamically generated content, not the static markup you'd see when looking at the page source.
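To see why a plain HTTP fetch misses the data, here is a minimal sketch using only the standard library. The sample page, the "reviews" element, and the names VisibleTextParser and static_text are all made up for illustration; they are not taken from GameSpot or from Ghost. The page ships an empty container that a script would fill in, and a static parser never runs that script:

```python
from html.parser import HTMLParser

# Hypothetical page: the server sends an empty container, and the script
# would fill it in only when a browser actually executes the JavaScript.
SAMPLE_HTML = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <div id="reviews"></div>
    <script>
      document.getElementById("reviews").innerHTML = "<p>Great game</p>";
    </script>
  </body>
</html>
"""


class VisibleTextParser(HTMLParser):
    """Collect text outside <script>/<style>, as a static parser sees it."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside script/style
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Script source is handed over as raw text; it is never executed.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def static_text(html):
    """Return the text a non-JavaScript client would find in the markup."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.chunks


if __name__ == "__main__":
    print(static_text(SAMPLE_HTML))
```

Only the title text comes back; the review paragraph exists solely inside the script source, so it would show up only after something executes the JavaScript, which is the job the headless browser does above.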