csv - Repeating data after web scraping using Python and Beautiful Soup 4
I am trying to scrape data from the Garmin golf site. I want the name of each golf course and its address after running the script. I have noticed that my code repeats the first page's data over and over again. I also noticed that the page numbers on the website do not start at 1, but are at 10 for the second page. How do I go about extracting the data from the website and getting all of it, instead of a repeat of the first page?
    import csv
    import codecs
    import requests
    from bs4 import BeautifulSoup

    courses_list = []

    for i in range(10):
        # i is passed straight into per_page, so it only takes the values 0..9
        url = "http://sites.garmin.com/clsearch/courses?browse=1&country=us&lang=en&per_page={}".format(i)
        r = requests.get(url)
        soup = BeautifulSoup(r.content)

        # each search result sits in a <div class="result">
        g_data2 = soup.find_all("div", {"class": "result"})

        for item in g_data2:
            try:
                name = item.contents[3].find_all("div", {"class": "name"})[0].text
                print name
            except:
                name = ''
            try:
                address = item.contents[3].find_all("div", {"class": "location"})[0].text
            except:
                address = ''

            course = [name, address]
            courses_list.append(course)

    # append every collected [name, address] pair to the CSV file
    with open('g_final.csv', 'a') as file:
        writer = csv.writer(file)
        for row in courses_list:
            writer.writerow([s.encode("utf-8") for s in row])
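You have already discovered the problem yourself: the per_page value in the URL does not increase by 1 from one page to the next.
So change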
url = "http://...?browse=1&country=us&lang=en&per_page={}".format(i)
to
url = "http://...?browse=1&country=us&lang=en&per_page={}".format(i*20)
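To see what the change does to the requested URLs, here is a minimal sketch that only builds the URLs and makes no network requests. The step of 20 is taken from the fix above; the names `BASE_URL`, `PAGE_STEP` and `page_urls` are hypothetical helpers introduced for illustration, and the full URL is the one from the question.

```python
# URL template from the question; per_page is filled in per iteration.
BASE_URL = ("http://sites.garmin.com/clsearch/courses"
            "?browse=1&country=us&lang=en&per_page={}")

# Assumption: per_page advances by 20 per page, as in the fix above.
PAGE_STEP = 20


def page_urls(num_pages, step=PAGE_STEP):
    """Return one URL per page, with per_page = 0, step, 2*step, ..."""
    return [BASE_URL.format(i * step) for i in range(num_pages)]


if __name__ == "__main__":
    # Print the first three URLs to check the per_page values by eye.
    for url in page_urls(3):
        print(url)
```

Each URL from `page_urls(10)` can then be passed to `requests.get` in place of the URL built inside the original loop.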