python - Errors with Beautiful Soup output


I'm trying to scrape data from a webpage on GameSpot using BeautifulSoup. However, the result is very different from what I see in the page source viewer. First off, a lot of errors are produced. For instance, I have

import requests
import bs4

r = requests.get(link)
soup = bs4.BeautifulSoup(r.text, "html.parser")

and yet soup.title gives

<title>404: Not Found - GameSpot</title>

The data I want to scrape does not appear. Is this because the webpage contains JavaScript alongside the HTML? If so, how can I get around this?

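You're only sending an HTTP request to the server, so you get back the response exactly as the server sent it. To see content that the page builds in the browser, you need something that processes the JavaScript.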

A headless browser with JavaScript support, such as Ghost, would be a good choice.

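from bs4 import BeautifulSoup
from ghost import Ghost

ghost = Ghost()
ghost.open(link)
# Evaluate JavaScript inside the loaded page to get the rendered DOM as HTML
page, resources = ghost.evaluate('document.documentElement.innerHTML;')
soup = BeautifulSoup(page, "html.parser")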

.evaluate('document.documentElement.innerHTML') will show you the dynamically generated content, not the static source you'd see by looking at the page source.
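To make the static-versus-dynamic distinction concrete, here is a minimal sketch. It makes some assumptions: the page in STATIC_HTML, the "scores" id and the helper names TextById and static_text are all made up for illustration, and it uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages. Any parser that does not execute JavaScript, BeautifulSoup included, would see the same thing: the element exists in the markup the server sent, but it is empty, because the value is only written into it when a browser runs the script.

```python
from html.parser import HTMLParser

# Hypothetical page: the server sends an empty <div>; a script fills it in later.
STATIC_HTML = """
<html>
  <head><title>Example page</title></head>
  <body>
    <div id="scores"></div>
    <script>
      document.getElementById("scores").textContent = "9.5";
    </script>
  </body>
</html>
"""


class TextById(HTMLParser):
    """Collect the text inside the first element that has a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.tag = None      # tag name of the matched element
        self.depth = 0       # greater than 0 while inside the matched element
        self.found = False   # only the first match is collected
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            # Track nested elements with the same tag name as the match
            if tag == self.tag:
                self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True
            self.tag = tag
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag == self.tag:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def static_text(html, element_id):
    """Return the text of the element as it appears in the unexecuted markup."""
    parser = TextById(element_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


if __name__ == "__main__":
    # The script is never run, so the <div> is still empty here.
    print(repr(static_text(STATIC_HTML, "scores")))
```

If the data you want shows up in the browser's element inspector but a check like this comes back empty on r.text, the content is being generated client-side, and a JavaScript-capable tool such as the headless browser above is the kind of thing you need.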

