Click here to Skip to main content
14,241,417 members
Rate this:
Please Sign up or sign in to vote.
Hello guys,

I have a problem with the following:

I testing Python with selenium web driver.

I need download data of different web page, the schema is the same in all pages, the difference is in the URL, since the last value is variable, it can be a number between 1 and 100. These URLs are found in the a directory within a text file.

So, is there any way to review all those URLs and extract the data from each of them?

NOTE: The web pages is dynamic and are updated every five minutes with js and json.

But I get the following:

C:\Users\JDani\AppData\Local\Programs\Python\Python37-32\python.exe C:/Users/JDani/.PyCharmCE2019.1/config/scratches/scratch_7.py 
1 None , None 2 None , None 3 None , None 4 None , None 5 None


Thanks in advance

What I have tried:

''' Try with following code '''
baseurl = requests.get('http://mi.dominio.net/Report?server=xxx.xxx.xxx.xx&PC=')
valid_url = '1,2,3,4,5'

for n in range(len(valid_url)):
    url = f'{baseurl}{valid_url[n]}'
    driver.get(url)

    print(url)          
    print(pages.title)

''' after save data in text file '''


This is what I have, could you help me or any suggest?

<pre lang="Python"><pre>from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import os
from pandas import ExcelWriter
import pandas as pd

mipath = "C:/test"

desired_capabilities = DesiredCapabilities.CHROME
desired_capabilities["pageLoadStrategy"] = "none"
driver = webdriver.Chrome('/Users/JDan/Documents/Proyect/chromedriver/chromedriver.exe')
wait = WebDriverWait(driver, 20)

''' Try with following code '''
baseurl = requests.get('http://mi.dominio.net/Report?server=xxx.xxx.xxx.xx&PC=')
valid_url = '1,2,3,4,5'

for n in range(len(valid_url)):
    url = f'{baseurl}{valid_url[n]}'
    driver.get(url)

    print(url)          
    print(pages.title)

''' after save data in text file '''
try:
    wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME, "CutterValue")))
    '''os.stat(mipath)'''
except TimeoutException:
    print('Nope')
else:
    '''os.mkdir(mipath)'''
    driver.execute_script("window.stop();")
    content = driver.find_elements_by_class_name("CutterValue")
    codes = [element.text for element in content]
    '''Save data in Text file'''
    file = open(mipath + "/mytext.txt", "a")
    file.write('\n' + str(codes))
    file.close()

    '''print(codes)'''
    driver.close()

'''Save data in Excel file'''

df = pd.DataFrame(codigos)
writer = ExcelWriter('./a.xlsx')

df.to_excel(writer,'Sheet1')
writer.save()
Posted
Updated 6-Jun-19 8:03am
v3

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)




CodeProject, 503-250 Ferrand Drive Toronto Ontario, M3C 3G8 Canada +1 416-849-8900 x 100