python - How to select nodes in HTML with lxml?
I have the HTML code from http://chem.sis.nlm.nih.gov/chemidplus/rn/75-07-0. Following my previous post about setting up an XPath query for HTML parsing, I want to create logic to process it, since many of the other pages are similar but not the same. Given this markup:
    <div id="names">
      <h2>names and synonyms</h2>
      <div class="ds">
        <button class="toggle1col" title="toggle display between 1 column of wider results and multiple columns.">↔</button>
        <h3>name of substance</h3>
        <ul>
          <li id="ds2"><div>acetaldehyde</div></li>
        </ul>
        <h3>mesh heading</h3>
        <ul>
          <li id="ds3"><div>acetaldehyde</div></li>
        </ul>
      </div>
In my Python script I want to select the nodes "name of substance" and "mesh heading", check whether they exist, and if they do, select the data in them; otherwise return an empty string. Is there a way to do this in Python, the way that in JavaScript I would use node mynode = doc.documentnode.selectnode(/[text()="name of substance"/)?
    from lxml import html
    import requests
    import csv

    page = requests.get('http://chem.sis.nlm.nih.gov/chemidplus/rn/75-07-0')
    tree = html.fromstring(page.text)

    if (name of substance there):
        chem_name = tree.xpath('//*[text()="name of substance"]/..//div')[0].text_content()
    else:
        chem_name = []

    if (mesh heading there):
        mesh_name = tree.xpath('//*[text()="mesh heading"]/..//div')[1].text_content()
    else:
        mesh_name = []

    names1 = [chem_name, mesh_name]
    with open('testchem.csv', 'wb') as myfile:
        wr = csv.writer(myfile)
        wr.writerow(names1)
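A minimal sketch of the existence check the pseudocode asks for: in lxml, tree.xpath() returns a list of matching nodes, so an empty list means the node is not there, roughly what a select-single-node call returning null signals in other libraries. The helper name select_single_node is hypothetical, and the sample markup is a cut-down copy of the snippet above. The search text is passed as the XPath variable $label through a keyword argument to xpath(), which lxml supports and which avoids quoting the text inside the expression.

```python
from lxml import html

# Cut-down copy of the question's snippet; only "name of substance" is present.
SNIPPET = """<div id="names"><div class="ds">
<h3>name of substance</h3>
<ul><li id="ds2"><div>acetaldehyde</div></li></ul>
</div></div>"""


def select_single_node(tree, expr, **variables):
    """Hypothetical helper: first node matched by expr, or None if nothing matches."""
    # For node-set expressions lxml always returns a list, possibly empty.
    nodes = tree.xpath(expr, **variables)
    return nodes[0] if nodes else None


tree = html.fromstring(SNIPPET)
node = select_single_node(tree, '//*[text()=$label]', label="name of substance")
missing = select_single_node(tree, '//*[text()=$label]', label="mesh heading")

print(node.tag if node is not None else None)  # h3
print(missing)  # None
```

A None result is the point at which the script would fall back to an empty string.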
You can check whether "name of substance" or "mesh heading" appears in the text of the webpage and, if it does, select the contents:
    from lxml import html
    import requests
    import csv

    page = requests.get('http://chem.sis.nlm.nih.gov/chemidplus/rn/75-07-0')
    tree = html.fromstring(page.text)

    if "name of substance" in page.text:
        chem_name = tree.xpath('//*[text()="name of substance"]/..//div')[0].text_content()
    else:
        chem_name = ""

    if "mesh heading" in page.text:
        # ..//div lists every <div> under the headings' shared parent, so [1]
        # is the MeSH entry only when the "name of substance" entry precedes it.
        mesh_name = tree.xpath('//*[text()="mesh heading"]/..//div')[1].text_content()
    else:
        mesh_name = ""

    names1 = [chem_name, mesh_name]
    # Python 3: open CSV files in text mode with newline='' (not 'wb').
    with open('testchem.csv', 'w', newline='') as myfile:
        wr = csv.writer(myfile)
        wr.writerow(names1)
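A more defensive variant, offered as a sketch rather than as a correction of the answer: instead of taking the n-th <div> under the shared parent, it follows each <h3> to the <ul> immediately after it (the following-sibling axis), so each field is found independently of whether the other heading exists. The function names section_text and names_row and the sample markup are assumptions for illustration; the structure mirrors the question's snippet (an <h3> followed by a <ul> of <li><div> entries).

```python
import csv
import io

from lxml import html

# Sample markup modelled on the question's snippet; only "mesh heading" is present,
# which is the case where indexing [1] into ..//div would raise IndexError.
SAMPLE = """<div id="names"><div class="ds">
<h3>mesh heading</h3>
<ul><li id="ds3"><div>acetaldehyde</div></li></ul>
</div></div>"""


def section_text(tree, heading):
    """Text of the first <div> in the <ul> right after the <h3> labelled heading, else ""."""
    divs = tree.xpath(
        '//h3[text()=$heading]/following-sibling::ul[1]//div', heading=heading
    )
    return divs[0].text_content().strip() if divs else ""


def names_row(page_html):
    """Build the CSV row [name of substance, mesh heading] from raw HTML."""
    tree = html.fromstring(page_html)
    return [section_text(tree, "name of substance"), section_text(tree, "mesh heading")]


row = names_row(SAMPLE)
print(row)  # ['', 'acetaldehyde']

# io.StringIO stands in for open('testchem.csv', 'w', newline='') so this runs offline.
buf = io.StringIO()
csv.writer(buf).writerow(row)
```

Against the live page you would pass page.text from requests to names_row, assuming its headings match the text used in the XPath exactly, since text() comparison is case-sensitive.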