python - How to select nodes in HTML with lxml?


I have the HTML from http://chem.sis.nlm.nih.gov/chemidplus/rn/75-07-0 (see my previous post on how to set up an XPath query for HTML parsing) and want to create logic to process it, since many of the other pages are similar but not the same. Starting with:

    <div id="names">
      <h2>names and synonyms</h2>
      <div class="ds">
        <button class="toggle1col" title="toggle display between 1 column of wider results and multiple columns.">&#8596;</button>
        <h3>name of substance</h3>
        <ul>
          <li id="ds2"><div>acetaldehyde</div></li>
        </ul>
        <h3>mesh heading</h3>
        <ul>
          <li id="ds3"><div>acetaldehyde</div></li>
        </ul>
      </div>

In my Python script I want to select the nodes "name of substance" and "mesh heading", check whether they exist, and if so select the data in them; otherwise return an empty string. Is there a way to do this in Python, like in JavaScript where you would use Node mynode = doc.documentnode.selectnode(/[text()="name of substance"/)?

    from lxml import html
    import requests
    import csv

    page = requests.get('http://chem.sis.nlm.nih.gov/chemidplus/rn/75-07-0')
    tree = html.fromstring(page.text)

    # Pseudocode: the two conditions below are the part I don't know how to write.
    if (name of substance is there):
        chem_name = tree.xpath('//*[text()="name of substance"]/..//div')[0].text_content()
    else:
        chem_name = []

    if (mesh heading is there):
        mesh_name = tree.xpath('//*[text()="mesh heading"]/..//div')[1].text_content()
    else:
        mesh_name = []

    names1 = [chem_name, mesh_name]
    with open('testchem.csv', 'wb') as myfile:
        wr = csv.writer(myfile)
        wr.writerow(names1)

You can check whether "name of substance" or "mesh heading" appears in the text of the web page, and if so select the contents.

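    from lxml import html
    import requests
    import csv

    page = requests.get('http://chem.sis.nlm.nih.gov/chemidplus/rn/75-07-0')
    tree = html.fromstring(page.text)

    # Only run the XPath query if the heading text occurs somewhere in the page.
    if "name of substance" in page.text:
        chem_name = tree.xpath('//*[text()="name of substance"]/..//div')[0].text_content()
    else:
        chem_name = ""

    # Both values sit under the same parent <div>, so the mesh heading
    # entry is the second <div> returned by this query.
    if "mesh heading" in page.text:
        mesh_name = tree.xpath('//*[text()="mesh heading"]/..//div')[1].text_content()
    else:
        mesh_name = ""

    names1 = [chem_name, mesh_name]
    # On Python 3 the csv module needs a text-mode file opened with newline='';
    # on Python 2 use open('testchem.csv', 'wb') instead.
    with open('testchem.csv', 'w', newline='') as myfile:
        wr = csv.writer(myfile)
        wr.writerow(names1)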
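The substring check combined with fixed list indexes relies on both headings being present: if a page had "mesh heading" but not "name of substance", the index 1 lookup would likely raise an IndexError. As a minimal sketch of an alternative, the heading can be tested and its value selected in a single XPath query by taking the first ul that follows the matching h3. The helper name section_text, the SAMPLE fragment (trimmed from the HTML in the question), and the "systematic name" heading are made up for illustration, and the real pages may be structured differently.

```python
from lxml import html

# Fragment trimmed from the question; a real run would parse
# requests.get(url).text instead.
SAMPLE = """<div id="names">
  <h2>names and synonyms</h2>
  <div class="ds">
    <h3>name of substance</h3>
    <ul><li id="ds2"><div>acetaldehyde</div></li></ul>
    <h3>mesh heading</h3>
    <ul><li id="ds3"><div>acetaldehyde</div></li></ul>
  </div>
</div>"""


def section_text(tree, heading):
    """Return the first entry listed under the given <h3>, or "" if absent.

    Hypothetical helper: it assumes each heading is an <h3> directly
    followed by a <ul> whose items wrap their text in a <div>.
    """
    matches = tree.xpath(
        # $heading is an XPath variable that lxml fills from the keyword argument.
        '//h3[normalize-space(.)=$heading]/following-sibling::ul[1]//div',
        heading=heading,
    )
    # An empty result list means the heading is not on the page.
    return matches[0].text_content().strip() if matches else ""


tree = html.fromstring(SAMPLE)
chem_name = section_text(tree, "name of substance")
mesh_name = section_text(tree, "mesh heading")
missing = section_text(tree, "systematic name")  # heading not in the sample
```

Because xpath() returns an empty list when nothing matches, the existence test and the selection happen in one step, and each heading is looked up independently of the others.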
