Click here to Skip to main content
15,896,497 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
See more:
i am learning web scraping and i made this code to scrape from ChemSpider but it is slow , how can i improve it?

Python
from urllib.request import urlopen
from bs4 import BeautifulSoup as soup

search=input()
def scrape_search(search):
    my_url="http://www.chemspider.com/Search.aspx?q="+str(search)
    uClient=urlopen(my_url)
    page_html=uClient.read()
    uClient.close()
    page_soup=soup(page_html,"html.parser")
    target=page_soup.findAll("div",{"class":"results-wrapper table"})
    target=target[0]
    base_url="http://www.chemspider.com/Chemical-Structure."
    results=target.div.table.tbody.findAll("tr")
    scraped_data=[{"ID":None,"URL":None,"img_url":None,"Molecular Formula":None,"Molecular Weight":None,"Name":None} for i in range(0,len(results))]
    for i in range(0,len(results)):
        result=results[i].findAll("td")
        scraped_data[i]["ID"]=result[0].a.text.strip()
        scraped_data[i]["URL"]=base_url+str(scraped_data[i]["ID"])+".html"
        scraped_data[i]["img_url"]="http://www.chemspider.com/ImagesHandler.ashx?id="+str(scraped_data[i]["ID"])+"&w=250&h=250"
        scraped_data[i]["Molecular Formula"]=result[2].text.strip()
        names=result[2].findAll("<sub>")
        for name in names:
            scraped_data[i]["Molecular Formula"]+=str(name.sub).strip()
        scraped_data[i]["Molecular Weight"]=result[3].text.strip()
        scraped_data[i]["Name"]=scrape_id_page(base_url+str(scraped_data[i]["ID"])+".html")
    return scraped_data
def scrape_id_page(url):
    uClient=urlopen(url)
    page_html=uClient.read()
    uClient.close()
    page_soup=soup(page_html,"html.parser")
    target=page_soup.findAll("span",{"id":"ctl00_ctl00_ContentSection_ContentPlaceHolder1_RecordViewDetails_rptDetailsView_ctl00_WrapTitle"})
    return target[0].text.strip()
print(scrape_search(search))


What I have tried:

how can i improve my code?

is scrappy faster than BeautifulSoup?
Posted
Updated 8-Jul-18 11:12am

1 solution

Quote:
how can i improve my code?

You need to understand how your code is spending time.
The tool is a program profiler.
26.4. The Python Profilers — Python 2.7.15 documentation[^]
python - How can you profile a script? - Stack Overflow[^]
Profiling Python Like a Boss - The Zapier Engineering Blog | Zapier[^]
 
Share this answer
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900