python - Errors with Beautiful Soup output
I'm trying to scrape data from a webpage on GameSpot using BeautifulSoup. However, the result is different from what I see in the page source viewer. First off, a lot of errors are produced. For instance, I have
r = requests.get(link)
soup = bs4.BeautifulSoup(r.text)
and yet soup.title gives
<title>404: Not Found - GameSpot</title>
The data I want to scrape does not appear. Is this because the webpage contains JavaScript alongside the HTML? If so, how can I get around this?
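You're only sending an HTTP request to the server, so all you get back is the raw response; nothing runs the page's JavaScript. To see the generated content, you need to process the JavaScript as well.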
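To illustrate the point, here is a minimal sketch on a made-up page (not GameSpot's actual markup). It uses the standard library's html.parser rather than BeautifulSoup so that it runs without extra installs; the page, the "reviews" id, and the DivText class are all assumptions for the example. The container is empty in the raw HTML because only a browser would run the script that fills it.

```python
from html.parser import HTMLParser

# Hypothetical page: the <div> is empty in the raw HTML and would only
# be filled in once a browser executes the script.
RAW_HTML = """
<html><head><title>Demo</title></head>
<body>
<div id="reviews"></div>
<script>
document.getElementById("reviews").textContent = "Loaded by JavaScript";
</script>
</body></html>
"""


class DivText(HTMLParser):
    """Collect the text found inside <div id="reviews">."""

    def __init__(self):
        super().__init__()
        self.inside = False
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "div" and dict(attrs).get("id") == "reviews":
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.inside = False

    def handle_data(self, data):
        # Script source also arrives here, but by then we have left the div.
        if self.inside:
            self.text.append(data)


parser = DivText()
parser.feed(RAW_HTML)
static_text = "".join(parser.text).strip()
print(repr(static_text))  # prints '' because the script never ran
```

A static parser, BeautifulSoup included, sees the same empty element, which is why the data is missing from the soup.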
A headless browser with JavaScript support, such as Ghost, would be a good choice.
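from bs4 import BeautifulSoup
from ghost import Ghost

# Uses the Ghost.py API as given in this answer; newer releases may
# expose these methods on a session object instead.
ghost = Ghost()
ghost.open(link)
# Read the DOM after the page's scripts have run.
page, resources = ghost.evaluate('document.documentElement.innerHTML;')
soup = BeautifulSoup(page)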
.evaluate('document.documentElement.innerHTML') will show you the dynamically generated content, not the static content you'd see by taking a look at the source.
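Separately, the 404 title in the question suggests the server may have returned an error page rather than the page you wanted, so it can be worth checking the response before parsing it. A minimal sketch, assuming a hypothetical helper named looks_like_error_page, where status_code would come from r.status_code and title from soup.title.string:

```python
def looks_like_error_page(status_code, title):
    """Return True if the response is probably an error page.

    Hypothetical helper for illustration; not part of requests or bs4.
    """
    if status_code >= 400:
        return True
    # Some sites answer 200 but still serve a "not found" page,
    # so check the title text as well.
    return title is not None and "not found" in title.lower()


print(looks_like_error_page(404, "404: Not Found - GameSpot"))    # True
print(looks_like_error_page(200, "Some Game Review - GameSpot"))  # False
```

If this returns True, the URL or the request itself is the first thing to look at, before worrying about JavaScript.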