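This article explains in detail how to click the "Load more" button on Google Trends and print all the titles with Selenium and Python. It also covers: ElementNotVisibleException ("element not interactable") when trying to click a button with Selenium and Python; clicking a button with Python Selenium; lazy-loaded images, Selenium and PhantomJS in Python scraping; and how to click every "More" button of each item to scrape data from a dropdown with Selenium.
Contents:
- How to click the "Load more" button on Google Trends and print all titles with Selenium and Python
- ElementNotVisibleException: Message: element not interactable when trying to click a button with Selenium and Python
- Clicking a button with Python Selenium
- Python scraping: lazy-loaded images, Selenium and PhantomJS
- Selenium: how to click all the "More" buttons of each item to scrape data from a dropdown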
How to click the "Load more" button on Google Trends and print all titles with Selenium and Python
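This time I want to click a button to load more real-time searches. Here is the link to the site:
https://trends.google.com/trends/trendingsearches/realtime?geo=AR&category=all
The button is at the end of the page and has the following code:
<div ng-if="ctrl.shouldShowLoadingMoreItemsSpinner()" ng-click="ctrl.loadMoreFeedItems()" role="button" tabindex="0">Load more</div>
Since some AngularJS is involved, I don't know how to go about this... Any tips or help?
Thanks, everyone!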
Answer 1
To click the LOAD MORE button so that more real-time searches are loaded, and then print them, you can use the following solution:
Code block:
# -*- coding: UTF-8 -*-
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

TITLE_XPATH = "//div[@class='title']"
LOAD_MORE_XPATH = "//div[@class='feed-load-more-button'][@ng-click='ctrl.loadMoreFeedItems()']"

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
# Selenium 4: pass the driver path through Service and the options through options=
driver = webdriver.Chrome(service=Service(r"C:\Utility\BrowserDrivers\chromedriver.exe"), options=options)
driver.get("https://trends.google.com/trends/trendingsearches/realtime?geo=AR&category=all")
# Number of titles rendered before the first click
myLength = len(WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, TITLE_XPATH))))
while True:
    # Scroll to the bottom so the "Load more" button is rendered
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    try:
        WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, LOAD_MORE_XPATH))).click()
        # Wait until the click has actually appended new titles
        WebDriverWait(driver, 20).until(lambda d: len(d.find_elements(By.XPATH, TITLE_XPATH)) > myLength)
        myLength = len(driver.find_elements(By.XPATH, TITLE_XPATH))
    except TimeoutException:
        # No button left (or nothing new loaded): the feed is exhausted
        break
# Collect the titles after the loop, so this works even if no click succeeded
for title in driver.find_elements(By.XPATH, TITLE_XPATH):
    print(title.text)
driver.quit()
Console output:
Nikola Kalinic • 2018 World Cup • Zlatko Dalic
Vicentico • Valeria Bertuccelli • Florián Fernandez Capello
Mauricio Macri • Juan José Aranguren • Argentina • Jorge Lanata
Goalkeeper • 2018 World Cup • Mohamed El-Shenawy • Ahmed El-Shenawy
Argentina • Manuel Belgrano • María Josefa Ezcurra • Mercedes Tomasa de San Martín y Escalada
South Korea national football team • Russia • Zlatan Ibrahimovic
Italian nationality law • Mar del Plata
Hirving Lozano • 2018 World Cup • Earthquake
Central Bank of Argentina • Luis Caputo • Federico Sturzenegger • Nicolás Dujovne
Porsche Mission E • Electric vehicle • Car • Electric car
Intelligence • Intelligence quotient • Research
Call for bids • National Flag Memorial • Monument • Rosario • Pablo Javkin
Journalist • Club Atlético Belgrano • Córdoba • Manuel Belgrano
France • Spain • Immigration
Natti Natasha • Zum Zum • Daddy Yankee • R.K.M & Ken-Y • Francisco José Arcángel Ramos
Mercosur • Pacific Alliance • Pact • Paraguay
World Cup • England national football team • Romelu Lukaku • Dries Mertens
House arrest • Detention • Robbery
Argentina • Emerging markets • MSCI • Market • Finance • Morgan Stanley
Eurozone • Budget • France • Bruno Le Maire • Germany • Finance
Intelligence • Intelligence quotient • Research
Brazil national football team • Tite • Philippe Coutinho • Rostov Arena
2018 World Cup • Aleksandar Kolarov • Guillermo Ochoa
2018 World Cup • Spain
Sloth • Giant ground sloth • Fossil • San Pedro
Government • Inflation • Alfredo Cornejo • Mendoza • Ministry of Finance
Corrientes • Cannabis • Tractor • Argentine Naval Prefecture
Argentine Chamber of Deputies • Chubut Province • Peronism • Front for Victory
Jorge Rial • Father’s Day • Intrusos en el espectáculo
Debt • Debt relief • Refinancing • Chaco Province
Polygraph
Malaria • Paraguay • Americas
General Confederation of Labour • Unemployment • Trade union
Earthquake • Japan
Angela Merkel • Europe • Germany • Emmanuel Macron • European Union
Colombia • Ivan Duque • Gustavo Petro
Lynx • Wildcat • Cat
Rafael Márquez • 2018 World Cup • Antonio Carbajal
Superior Council for Private Enterprise • Nicaraguan Social Security Institute
2018 World Cup • William Kvist • Jefferson Farfán
La Plata • Controlled-access highway • Buenos Aires Province • Ensenada
Leonardo Mayer • Kevin Anderson • ATP World Tour 500 series • Association of Tennis Professionals
Cristiano Ronaldo • 2018 World Cup • Spain
Casino • Mendoza • Hyatt
Mauricio Macri • Jorge Lanata • Argentina • Alejandro Wiebe
Drug
Buenos Aires Province • Judiciary
2018 World Cup • Lionel Messi • Russia • Diego Maradona • Mario Kempes
Santiago • Natural environment • Air pollution
Chimbas Department
Chaco Province • Judiciary • Unemployment
Hailey Rhode Baldwin • Justin Bieber
2018 World Cup • Russia
Mauricio Macri • Luis Caputo • Politician • Argentine Chamber of Deputies
Trade war • Agriculture • Donald Trump • China
2018 World Cup • Dennis te Kloese • Juan Carlos Osorio
Aaron Ramsey • 2018 World Cup • Arsenal F.C. • Football player • Gareth Bale
2018 World Cup • Aleksandar Kolarov
Iñaki Urdangarin • Cristina Federica, Infanta of Spain • Spain • Luis Roldán
Argentina • Electric car
San Lorenzo de Almagro • Liga Nacional de Básquet • Corrientes
Christina Aguilera • Jimmy Fallon • Liberation • New York City
Intelligence quotient • Intelligence • Flynn effect • Research
María Eugenia Vidal • Republican Proposal • Horacio Rodríguez Larreta
Natural environment • Pollution • Plastic pollution
Funding • International Finance Corporation • World Bank • Córdoba
Face • Zygomatic bone • Rosario
Cristóbal López • Administración Federal de Ingresos Públicos • Ricardo Echegaray
Club Atlético Belgrano • Lucas Bernardi • Superliga Argentina de Fútbol
Juan José Aranguren • Mauricio Macri • Energy • YPF
Greater Buenos Aires • Motor coach
Stranger Things • MTV Movie & TV Awards
Peru • Lima • Earthquake • 2018 World Cup • Geophysics Institute of Peru
Juan Carlos Osorio • 2018 World Cup
2018 World Cup • Player • Russia • Lionel Messi
Raffaella Carrà
America’s Got Talent • Janis Joplin • Howie Mandel
Homicide • Detention
La Plata • Julio Garro • Víctor Manuel Fernández • María Eugenia Vidal
Mariano Arcioni • Chubut Province • Payment
Angela Merkel • Horst Seehofer • Germany • Government • Human migration
Israel • Benjamin Netanyahu • Syria • Iran • Ali Khamenei • Vladimir Putin
Yerba mate • Cannabis • Lomas de Zamora • Detention • Mate
Senate • Argentine Chamber of Deputies
2010 FIFA World Cup • South Africa
Handball • Chile • Argentina national football team
2018 World Cup • Russia • Terrorism • Attack
Abortion • Conscientious objector
Rosario • Trade
Natalie Weber • Mauro Zárate • Pampita
Water • Cipolletti • Pressure • Cleaning
Chimbas
NATO • Military exercise • Russia
Federico Sturzenegger • Economist • Argentina • Arnaldo Bocco • Martín Redrado
Nolle prosequi
2018 World Cup • Russia • Paolo Guerrero • Christian Cueva
Ricardo Darín • Valeria Bertuccelli • Érica Rivas
Joachim Löw • 2018 World Cup
Israel • Golan Heights • Syria • Donald Trump
Meningitis • Bacteria • Salta • Streptococcus pneumoniae
ANSES • Subsidy
Argentine Chamber of Deputies • Radical Civic Union • Cambiemos •
Martín Demichelis • 2018 World Cup • Russia
Santiago de Compostela • Pilgrim • Galicia • Spain
Refugee • Pope Francis • Human migration
Stranger Things • Father’s Day • Joe Keery
Ivan Duque • President of Colombia • Latin America
General Confederation of Labour • Unemployment • General strike • Buenos Aires Province
Juan Carlos Osorio • Faustino Asprilla • 2018 World Cup
Germán Burgos • Club Atlético River Plate • Diego Simeone • Atlético Madrid
Nissan Navara • Pickup truck • Automotive industry
Baobab • Research • Tree
Susana Giménez • Alejandro Wiebe • Argentina • Telefe
Wanda Nara • Maxi López • Mauro Icardi
Damir Skomina • 2018 World Cup • Colombia • Referee • Russia • Mehdi Abid Charef
Diabetes mellitus • Visual perception • Diabetic retinopathy
Berisso • Threat • La Plata • Search and seizure • School
Game of Thrones • HBO • San Diego Comic-Con • Spin-off • George R. R. Martin
Argentina • Duet • Traveling Wilburys
Blood donation • Uruguay • Maldonado
2018 World Cup Group F • Mexico • Hirving Lozano
Colombian presidential election, 2018 • Colombia • Juan Manuel Santos
Colombia national football team
Desertification • United Nations Convention to Combat Desertification
Saski Baskonia • Liga ACB • Pablo Laso • Real Madrid C.F.
Rosario de la Frontera • Salta • Spa town
Neymar • 2018 World Cup • Philippe Coutinho • Tite • Russia
Pocito Department
Argentine peso • Depreciation • Central Bank of Argentina
San Salvador de Jujuy • Buenos Aires International Book Fair • Fair •
Córdoba • Shock
Iceland • Immigration • Icelanders
Mirtha Legrand
Traffic collision • Wound
2018 World Cup • Russia • Vikings
Mauricio Macri • Businessperson • Argentina • Economic development
Ricardo Darín • Valeria Bertuccelli • Érica Rivas • Vicentico
María Eugenia Vidal • Martiniano Molina • Quilmes
Argentina women’s national field hockey team • Julieta Jankunas • Argentina
Prince Harry • Catherine, Duchess of Cambridge • British royal family
Sergej Milinkovic-Savic • SS Lazio • 2018 World Cup
Organ donation • Organ • Organ transplantation
Sex education • San Fernando del Valle de Catamarca • Argentine Chamber of Deputies
Margarita Stolbizer • Sergio Massa • Peronism • Justicialist Party • Elisa Carrió
Google Maps • Waze • Information • Radar
Locomotive • Japan
Spain • Human migration • Immigration • France • Valencia • Carmen Calvo Poyato
Joaquín Sabina • Madrid
Jorge Lorenzo • Marc Márquez • MotoGP • Dani Pedrosa
Manuel Belgrano
2018 World Cup • Russia • Mohamed Salah • Denis Cheryshev
María Eugenia Vidal • Teacher
Remand
Magistrate • Judiciary • Research
Laura Bush • Donald Trump • Immigration • George W. Bush • Melania Trump
Sebastián Piñera • Chile • Michelle Bachelet
School • Gender identity • Discrimination • National Institute Against Discrimination, Xenophobia and Racism
Engineering • San Miguel de Tucumán
School • Buenos Aires Province • Tariff
Edgardo Bauza • Rosario Central • Marco Ruben • Rosario
Bicameralism
Russia • 2018 World Cup • Lionel Messi
Rawson Department, San Juan
El Litoral • Corrientes Province
Abortion • PH: Podemos hablar • Andy Kusnetzoff • Charlotte Caniggia
The Shining • Ewan McGregor • Doctor Sleep • Danny Torrance • Stephen King
Paulo Ferrari • Rosario Central • Rosario • Superliga Argentina de Fútbol
Carbon monoxide
Ground frost • Fog • Rain and snow mixed • Cold • Posadas
Argentine rock • Russia • Luis Alberto Spinetta • Gustavo Cerati • Charly García
Incredibles 2 • Brad Bird • Pixar
Carbohydrate • Dieting • Weight loss
Uber • Mendoza Province • Government • System • Statute
Paragliding
Jorge Sampaoli • Pedro Pasculli • 2018 World Cup • Paulo Dybala • Russia
Posadas • Cold
Luis Miguel • Mexico
Unidentified flying object • Russia • Phenomenon
Lisandro Magallán • Boca Juniors • AFC Ajax • Wílmar Barrios
Tandil • Fossil • Glyptodon
Harry Kane • 2018 World Cup • Gareth Southgate
Light welterweight • Boxing • Almirante Brown Partido • Mariano Cascallares
New Jersey
Small and medium-sized enterprises • Argentina • CAME - Argentina Confederation of Medium Enterprises
Season • Ryan Murphy • American Horror Story: Murder House • Sarah Paulson
Conflagration • Posadas
Robot • Old age • China
2018 World Cup • Russia • Terrorism • Islamic State of Iraq and the Levant
Christian Cueva • 2018 World Cup
Argentina national football team • Volleyball • Argentina
National Electoral Institute • Candidate • National Action Party • Mexico
Horoscope • Astrological sign
Mauricio Macri • María Eugenia Vidal • Ensenada • Ambulance • Cambiemos
Trade • Tax deduction • Debt • Macroeconomics
Cristiano Ronaldo • Pelé • 2018 World Cup • Miroslav Klose • Uwe Seeler
Martín Miguel de Güemes • Juan Manuel Urtubey • Salta Province • Argentina
Compressed natural gas • La Pampa Province • General Pico • Camuzzi Gas Pampeana
Santa Fe • Provincial Hospital Dr. José María Cullen • Baleada
National University of La Plata • Vocational school • School • Vocational education
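The loop in the answer above (scroll, click, wait for the title count to grow, stop on timeout) can be factored into a reusable helper. This is a minimal sketch, assuming a Selenium 4 style driver that exposes find_elements(by, value) and execute_script(...); the function name load_all_items and its timeout, poll and max_rounds parameters are hypothetical. It polls with time.monotonic() instead of WebDriverWait so that it has no hard dependency, and it stops when the button disappears or a click adds nothing within the timeout.

```python
import time


def load_all_items(driver, item_locator, button_locator,
                   timeout=20.0, poll=0.5, max_rounds=100):
    """Click a "Load more" button until it disappears or stops adding items.

    item_locator and button_locator are (by, value) tuples, for example
    (By.XPATH, "//div[@class='title']"). Returns the final list of items.
    """
    items = driver.find_elements(*item_locator)
    for _ in range(max_rounds):
        buttons = driver.find_elements(*button_locator)
        if not buttons:
            break  # no button left: the feed is exhausted
        # Bring the button into view before clicking, like the scroll in the answer
        driver.execute_script("arguments[0].scrollIntoView(true);", buttons[0])
        buttons[0].click()
        # Poll until more items are rendered (a hand-rolled explicit wait)
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            new_items = driver.find_elements(*item_locator)
            if len(new_items) > len(items):
                break
            time.sleep(poll)
        else:
            break  # the click added nothing within the timeout
        items = new_items
    return items
```

With a real browser it would be called as load_all_items(driver, (By.XPATH, "//div[@class='title']"), (By.XPATH, "//div[@class='feed-load-more-button']")), after which each element's .text can be printed as in the answer.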
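ElementNotVisibleException: Message: element not interactable when trying to click a button with Selenium and Python
I have a page whose source looks like the code below. After an action is performed, "Undo" and "Close" buttons are displayed in it, and I am trying to click the "Close" button. I have tried all three code blocks below, but none of them works. Can anyone point out what I am doing wrong, or suggest something else to try?
HTML source: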
<div><div><i></i><div>Your stuff is going to <span>place</span> is on its way.</div><div><button> Undo</button></div><div><button> Close</button></div></div></div>
Code attempts:
#driver.find_element_by_id("gh69ID1m3_xtdTUQuwadU").click()
driver.find_element_by_css_selector('.c-button.c-button--blue').click()
#driver.find_element_by_link_text('Close').click()
Error:
---------------------------------------------------------------------------
ElementNotVisibleException                Traceback (most recent call last)
<ipython-input-15-6d570be770d7> in <module>()
      1 #driver.find_element_by_id("gh69ID1m3_xtdTUQuwadU").click()
----> 2 driver.find_element_by_css_selector('.c-button.c-button--blue').click()
      3 #driver.find_element_by_link_text('Close').click()

~/anaconda/envs/py36/lib/python3.6/site-packages/selenium/webdriver/remote/webelement.py in click(self)
     78     def click(self):
     79         """Clicks the element."""
---> 80         self._execute(Command.CLICK_ELEMENT)
     81 
     82     def submit(self):

~/anaconda/envs/py36/lib/python3.6/site-packages/selenium/webdriver/remote/webelement.py in _execute(self, command, params)
    626             params = {}
    627         params['id'] = self._id
--> 628         return self._parent.execute(command, params)
    629 
    630     def find_element(self, by=By.ID, value=None):

~/anaconda/envs/py36/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py in execute(self, driver_command, params)
    318         response = self.command_executor.execute(driver_command, params)
    319         if response:
--> 320             self.error_handler.check_response(response)
    321             response['value'] = self._unwrap_value(
    322                 response.get('value', None))

~/anaconda/envs/py36/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py in check_response(self, response)
    240                 alert_text = value['alert'].get('text')
    241             raise exception_class(message, screen, stacktrace, alert_text)
--> 242         raise exception_class(message, screen, stacktrace)
    243 
    244     def _value_or_default(self, obj, key, default):

ElementNotVisibleException: Message: element not interactable
  (Session info: chrome=72.0.3626.109)
  (Driver info: chromedriver=2.42.591059 (a3d9684d10d61aa0c45f6723b327283be1ebaad8),platform=Mac OS X 10.12.6 x86_64)
Answer 1
The element with the text Close is a dynamic element, so to locate it you have to induce WebDriverWait for the element to be clickable. You can use either of the following solutions:
Using CSS_SELECTOR:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.action-bar button.c-button.c-button--blue"))).click()
Using XPATH:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[contains(@class, 'action-bar')]//button[@class='c-button c-button--blue' and normalize-space()='Close']"))).click()
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
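To see why the explicit wait helps, here is a minimal, dependency-free sketch of roughly what WebDriverWait(...).until(EC.element_to_be_clickable(locator)) does: poll the locator until a matching element is displayed and enabled, then return it. It assumes a driver exposing Selenium's find_elements(by, value) and elements exposing is_displayed() and is_enabled(); the function name is hypothetical. Unlike Selenium's condition, which only checks the first match, this sketch returns the first *clickable* match, which may help when a hidden element matching the same selector appears earlier in the DOM.

```python
import time


def wait_until_clickable(driver, locator, timeout=20.0, poll=0.5):
    """Return the first element matching locator that is displayed and enabled.

    locator is a (by, value) tuple, e.g. ("css selector", "button.c-button").
    Raises TimeoutError if nothing becomes clickable within timeout seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        for element in driver.find_elements(*locator):
            # "clickable" here means visible (is_displayed) and is_enabled
            if element.is_displayed() and element.is_enabled():
                return element
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{locator!r} not clickable after {timeout} s")
        time.sleep(poll)
```

Usage would mirror the answer: wait_until_clickable(driver, (By.XPATH, "//button[normalize-space()='Close']")).click().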
Clicking a button with Python Selenium
I am new to Python Selenium and I am trying to click buttons with the following HTML structure:
<div>
  <div onclick="submitForm('mTF')">
    <input type="button"></input>
    <div></div>
    <span> Search </span>
  </div>
  <div onclick="submitForm('rMTF')">
    <input type="button"></input>
    <span> Reset </span>
  </div>
</div>
I want to be able to click both the Search and the Reset button above (separately, of course).
I have tried several things, for example:
driver.find_element_by_css_selector('.button .c_button .s_button').click()
or,
driver.find_element_by_name('s_image').click()
or,
driver.find_element_by_class_name('s_image').click()
However, I always seem to end up with a NoSuchElementException, for example:
selenium.common.exceptions.NoSuchElementException: Message: u'Unable to locate element: {"method":"name","selector":"s_image"}' ;
I wonder whether I can somehow use the HTML onclick attribute to make Selenium click the button?
Any ideas that point me in the right direction would be great. Thanks.
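Answer 1
For Python, import ActionChains:
from selenium.webdriver import ActionChains
and perform the click through an action chain (browser is the WebDriver instance, element the located button):
ActionChains(browser).click(element).perform()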
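On the follow-up question about the onclick attribute: XPath can match it directly, e.g. //div[@onclick="submitForm('mTF')"]. Because the attribute value itself contains single quotes, the XPath string literal has to be quoted carefully. A small sketch (both helper names are hypothetical) that builds such a locator and falls back to XPath 1.0's concat() when a value contains both quote types:

```python
def xpath_literal(value):
    """Quote value as an XPath 1.0 string literal."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # Contains both quote types: glue the pieces back together with concat()
    pieces = value.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in pieces) + ")"


def onclick_xpath(handler, tag="div"):
    """XPath matching <tag onclick="handler">, e.g. the Search/Reset divs."""
    return f"//{tag}[@onclick={xpath_literal(handler)}]"
```

Then driver.find_element(By.XPATH, onclick_xpath("submitForm('mTF')")).click() would click Search, and the "submitForm('rMTF')" variant would click Reset.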
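Python scraping: lazy-loaded images, Selenium and PhantomJS
1. What is image lazy loading?
Web pages often use images, and images consume a lot of bandwidth. Normally the browser parses the whole HTML document and then loads the <img src="xxx"> tags from top to bottom. On a long page, the images further down have in fact already been loaded by the browser; if the user never scrolls down, they never see those images, and the bandwidth spent on them is wasted.
That is why high-traffic e-commerce sites such as Taobao and JD.com, whose product pages must contain many images, load images "on demand": an image is loaded only when the user scrolls and it comes into view. On a fast connection the user does not even notice the lazy loading, so it saves bandwidth without hurting the browsing experience.
- Case study: scraping image data from Chinaz (站长素材) at http://sc.chinaz.com/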
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import requests
from lxml import etree

if __name__ == "__main__":
    url = 'http://sc.chinaz.com/tupian/gudianmeinvtupian.html'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
    }
    # Fetch the page text
    response = requests.get(url=url, headers=headers)
    response.encoding = 'utf-8'
    page_text = response.text
    # Parse the page (extract the image links)
    # Create an etree object
    tree = etree.HTML(page_text)
    div_list = tree.xpath('//div[@id="container"]/div')
    # Extract each image's URL and name
    for div in div_list:
        image_url = div.xpath('.//img/@src')
        image_name = div.xpath('.//img/@alt')
        print(image_url)   # print the image link
        print(image_name)  # print the image name
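Running it shows that we can get the image names, but the image links come back empty. The XPath expression itself checks out, so where does the problem come from?
- The concept of image lazy loading:
Image lazy loading is a web-page optimization technique. An image is a network resource: like any other static resource it costs a request, and loading every image on the page at once greatly increases the time to render the first screen. To address this, the front end and back end cooperate so that an image is loaded only when it appears in the browser's current viewport. This technique of reducing the number of image requests for the first screen is called "image lazy loading".
- How do websites usually implement image lazy loading?
In the page source, the img tag initially stores the real image link in a "pseudo-attribute" (commonly src2, original, ...) instead of the src attribute. When the image scrolls into the visible area, a script dynamically copies the pseudo-attribute into src, which makes the browser load the image.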
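The mechanism can be modelled in a few lines. The sketch below is a plain-Python simulation, not browser code (real pages do this in JavaScript, for example on scroll events or with IntersectionObserver): each image starts with its real URL in src2 and no src, and reveal() copies src2 into src only for images that overlap the current viewport. The dict keys top and height and the function name are hypothetical.

```python
def reveal(images, scroll_y, viewport_height):
    """Copy src2 into src for every image that overlaps the viewport.

    images: list of dicts with 'top', 'height' (pixels) and 'src2'.
    Returns the number of images that started loading in this call.
    """
    view_top, view_bottom = scroll_y, scroll_y + viewport_height
    newly_loaded = 0
    for img in images:
        visible = img["top"] < view_bottom and img["top"] + img["height"] > view_top
        if visible and "src" not in img:
            img["src"] = img["src2"]  # only now would the browser fetch the image
            newly_loaded += 1
    return newly_loaded
```

A scraper that only downloads the HTML sees the state before any reveal() call, which is why @src came back empty and the real URL has to be read from the pseudo-attribute.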
- Follow-up analysis of the Chinaz case: a closer look at the page structure shows that the image links are stored in the src2 pseudo-attribute
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import requests
from lxml import etree

if __name__ == "__main__":
    url = 'http://sc.chinaz.com/tupian/gudianmeinvtupian.html'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
    }
    # Fetch the page text
    response = requests.get(url=url, headers=headers)
    response.encoding = 'utf-8'
    page_text = response.text
    # Parse the page (extract the image links)
    # Create an etree object
    tree = etree.HTML(page_text)
    div_list = tree.xpath('//div[@id="container"]/div')
    # Extract each image's URL and name
    for div in div_list:
        image_url = div.xpath('.//img/@src2')  # read the src2 pseudo-attribute
        image_name = div.xpath('.//img/@alt')
        print(image_url)   # print the image link
        print(image_name)  # print the image name
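Reading src2 works for this page, but other sites use different pseudo-attributes. A more defensive approach reads the first non-empty attribute from a list of candidates. This is a minimal sketch using the standard library's html.parser instead of lxml; the candidate list combines src2 and original from the text with data-src, which is an added assumption, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser

LAZY_ATTRS = ("src2", "original", "data-src", "src")  # checked in this order


class LazyImageParser(HTMLParser):
    """Collect (url, alt) for every <img>, preferring lazy-load attributes."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        # First candidate attribute that is present and non-empty
        url = next((attrs[a] for a in LAZY_ATTRS if attrs.get(a)), None)
        if url:
            self.images.append((url, attrs.get("alt", "")))


def extract_images(html):
    parser = LazyImageParser()
    parser.feed(html)
    parser.close()
    return parser.images
```

Feeding it page_text from the example above would return (url, name) pairs regardless of which of these attributes the site uses.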
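2. Selenium
- What is Selenium?
Selenium is a browser-automation tool. Its Python package (a third-party library) exposes an API for controlling a real browser so that the browser carries out actions automatically.
- Environment setup
1. Install Selenium: pip install selenium
2. Get the driver for a browser (Chrome in this example)
2.1 ChromeDriver download page: http://chromedriver.storage.googleapis.com/index.html (this index only goes up to ChromeDriver 114; newer drivers are published through Chrome for Testing)
2.2 The driver version must match the browser version; you can use the version mapping table at http://blog.csdn.net/huilan_same/article/details/51896672 to pick the right one. With Selenium 4.6 or later, Selenium Manager can download a matching driver automatically.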
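Step 2.2 can be checked from a running session. Since ChromeDriver 73 the driver's major version follows Chrome's, so comparing major versions is a quick sanity check (older drivers, such as the chromedriver 2.42 in the traceback earlier in this article, used a separate numbering and need the mapping table instead). A sketch, assuming the capability keys browserVersion (or the legacy version) and chrome.chromedriverVersion that ChromeDriver reports in driver.capabilities; verify them on your own setup. The function name is hypothetical.

```python
def versions_match(capabilities):
    """Return True if Chrome and ChromeDriver share the same major version.

    capabilities: the dict from driver.capabilities of a Chrome session.
    """
    browser = capabilities.get("browserVersion") or capabilities.get("version", "")
    # chromedriverVersion looks like "114.0.5735.90 (<commit hash> ...)"
    driver = capabilities.get("chrome", {}).get("chromedriverVersion", "")
    browser_major = browser.split(".")[0]
    driver_major = driver.split(" ")[0].split(".")[0]
    return bool(browser_major) and browser_major == driver_major
```

Usage: print(versions_match(driver.capabilities)) right after creating the driver.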
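- Demo: run the following code to see it in action
Example 1
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time

# Selenium 4: pass the driver path through a Service object
bro = webdriver.Chrome(service=Service(r'G:\爬虫练习\day03\chromedriver_win32\chromedriver.exe'))
# Open the browser and send the request
bro.get('https://www.baidu.com')
time.sleep(3)
# Locate the search box
my_text = bro.find_element(By.ID, 'kw')
# Type a query into the search box
my_text.send_keys('吴秀波')
time.sleep(3)
# Locate the search button
my_button = bro.find_element(By.ID, 'su')
my_button.click()
time.sleep(10)
bro.quit()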
Example 2
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from time import sleep

# The argument is the path to your browser driver; the r prefix stops backslashes being treated as escape characters
driver = webdriver.Chrome(service=Service(r'G:\爬虫练习\day03\chromedriver_win32\chromedriver.exe'))
# Open the Baidu home page with get()
driver.get("http://www.baidu.com")
# Find the "设置" (Settings) link on the page and click it
driver.find_elements(By.LINK_TEXT, '设置')[0].click()
sleep(2)
# In the opened settings, click "搜索设置" (Search settings) to show 50 results per page
driver.find_elements(By.LINK_TEXT, '搜索设置')[0].click()
sleep(2)
# Select "50 results per page" in the dropdown
m = driver.find_element(By.ID, 'nr')
sleep(2)
m.find_element(By.XPATH, './/option[3]').click()
sleep(2)
# Click "save settings"
driver.find_elements(By.CLASS_NAME, "prefpanelgo")[0].click()
sleep(2)
# Handle the alert that pops up: accept() confirms, dismiss() cancels
driver.switch_to.alert.accept()
sleep(2)
# Find Baidu's search box and type 星星
driver.find_element(By.ID, 'kw').send_keys('星星')
sleep(2)
# Click the search button
driver.find_element(By.ID, 'su').click()
sleep(2)
# On the results page, find the "美女_百度图片" link and open it
driver.find_elements(By.LINK_TEXT, '美女_百度图片')[0].click()
sleep(3)
# Close the browser
driver.quit()
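Basic usage
# Import the package
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Create a browser object; the browser is controlled through it
browser = webdriver.Chrome(service=Service(r'path\to\chromedriver.exe'))  # placeholder driver path
# Send a request for the target page
url = 'https://www.baidu.com'
browser.get(url)
# Then locate elements with the methods below and operate on them
# (the Selenium 3 names in parentheses were removed in Selenium 4.3):
# browser.find_element(By.ID, value)           find a node by id   (find_element_by_id)
# browser.find_elements(By.NAME, value)        find by name        (find_elements_by_name)
# browser.find_elements(By.XPATH, value)       find by XPath       (find_elements_by_xpath)
# browser.find_elements(By.TAG_NAME, value)    find by tag name    (find_elements_by_tag_name)
# browser.find_elements(By.CLASS_NAME, value)  find by class name  (find_elements_by_class_name)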
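Several snippets in this article, like much older Selenium material, use the find_element(s)_by_* helpers, which were removed in Selenium 4.3 in favour of find_element(by, value) and find_elements(by, value). When porting such code, a small shim like the one below (hypothetical, not part of Selenium) maps the old method names onto the new calls; the strategy strings are the values of the corresponding By constants (By.ID == "id", By.XPATH == "xpath", and so on).

```python
# Old method suffix -> Selenium By strategy string (same values as the By constants)
LEGACY_STRATEGIES = {
    "id": "id",
    "name": "name",
    "xpath": "xpath",
    "tag_name": "tag name",
    "class_name": "class name",
    "css_selector": "css selector",
    "link_text": "link text",
    "partial_link_text": "partial link text",
}


def find_legacy(driver, method_name, value):
    """Emulate e.g. find_elements_by_xpath(value) with the Selenium 4 API."""
    for prefix, plural in (("find_elements_by_", True), ("find_element_by_", False)):
        if method_name.startswith(prefix):
            strategy = LEGACY_STRATEGIES[method_name[len(prefix):]]
            finder = driver.find_elements if plural else driver.find_element
            return finder(strategy, value)
    raise ValueError(f"not a legacy finder: {method_name}")
```

For example, find_legacy(browser, "find_elements_by_class_name", "s_btn") is equivalent to browser.find_elements(By.CLASS_NAME, "s_btn").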
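3. PhantomJS
PhantomJS is a headless (UI-less) browser, and automating it works the same way as automating Chrome above. Because there is no window to watch, you can take screenshots with the driver's save_screenshot method to see what the automation is doing. Note that webdriver.PhantomJS only exists in Selenium 3.x and earlier; it was removed in Selenium 4, so the PhantomJS examples below need Selenium 3.
Example
from selenium import webdriver
import time

# Path to the PhantomJS executable (placeholder; requires Selenium 3.x)
path = r'path\to\phantomjs.exe'
browser = webdriver.PhantomJS(path)
# Open Baidu
url = 'http://www.baidu.com/'
browser.get(url)
time.sleep(3)
browser.save_screenshot(r'phantomjs\baidu.png')
# Find the input box
my_input = browser.find_element_by_id('kw')
# Type text into the box
my_input.send_keys('美女')
time.sleep(3)
# Take a screenshot
browser.save_screenshot(r'phantomjs\meinv.png')
# Find the search button
button = browser.find_elements_by_class_name('s_btn')[0]
button.click()
time.sleep(3)
browser.save_screenshot(r'phantomjs\show.png')
time.sleep(3)
browser.quit()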
[Key point] Selenium + PhantomJS is the go-to solution for scrapers: some sites build their content with dynamically loaded JavaScript, so an ordinary scraper that only downloads the HTML cannot get that content. For example, Douban Movies loads more movie entries as you scroll down.
Combined example:
- Goal: scrape as many movie entries from Douban as possible
from selenium import webdriver
import time

if __name__ == '__main__':
    url = 'https://movie.douban.com/typerank?type_name=%E6%81%90%E6%80%96&type=20&interval_id=100:90&action='
    # Before scraping, let the page at url dynamically load more data
    path = r'C:\Users\Administrator\Desktop\爬虫授课\day05\ziliao\phantomjs-2.1.1-windows\bin\phantomjs.exe'
    # Create a headless browser object (Selenium 3.x only)
    bro = webdriver.PhantomJS(path)
    # Request the url
    bro.get(url)
    time.sleep(3)
    # Screenshot
    bro.save_screenshot('1.png')
    # Run JS that moves the scroll bar down by n pixels (this makes the page load more movies)
    js = 'document.body.scrollTop=2000'
    bro.execute_script(js)  # executes a string of JavaScript code
    time.sleep(4)
    bro.save_screenshot('2.png')
    time.sleep(2)
    # Grab the content of the current page
    html_source = bro.page_source  # HTML source of the browser's current page
    with open('./source.html', 'w', encoding='utf-8') as fp:
        fp.write(html_source)
    bro.quit()
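The example above scrolls only once (document.body.scrollTop=2000), so it captures just one extra batch. To get "as many as possible", a common pattern is to keep scrolling to the bottom until document.body.scrollHeight stops growing. A minimal sketch, assuming any driver with execute_script (PhantomJS under Selenium 3, or Chrome/headless Chrome under Selenium 4); the function name and the pause and max_scrolls parameters are hypothetical.

```python
import time


def scroll_until_stable(driver, pause=2.0, max_scrolls=50):
    """Scroll to the bottom repeatedly until the page height stops growing.

    Returns the number of scrolls that actually loaded new content.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    grown = 0
    for _ in range(max_scrolls):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to fetch and render the next batch
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height <= last_height:
            break  # nothing new appeared: assume we reached the end
        last_height = new_height
        grown += 1
    return grown
```

After it returns, bro.page_source can be saved exactly as in the example.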
4. Headless Chrome
PhantomJS is no longer updated or maintained, so headless Chrome, i.e. Chrome running without a UI, is the recommended replacement.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
import time

# Create an options object that makes Chrome start without a UI
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
# Driver path
path = r'C:\Users\ZBLi\Desktop\1801\day05\ziliao\chromedriver.exe'
# Create the browser object (Selenium 4: service= and options= replace executable_path= and chrome_options=)
browser = webdriver.Chrome(service=Service(path), options=chrome_options)
# Open the page
url = 'http://www.baidu.com/'
browser.get(url)
time.sleep(3)
browser.save_screenshot('baidu.png')
browser.quit()
Selenium: how to click all the "More" buttons of each item to scrape data from a dropdown
To click all the elements with the text More, induce WebDriverWait for visibility_of_all_elements_located() and then click each returned element, using either of the following Locator Strategies:
- Using CSS_SELECTOR:
driver.get("https://www.narpm.org/find/property-managers/?submitted=true&toresults=1&resultsperpage=10&a=managers&orderby=&fname=&lname=&company=&chapter=S005&city=&state=&xRadius=")
for more in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.row div.arrow"))):
    more.click()
- Using XPATH:
driver.get("https://www.narpm.org/find/property-managers/?submitted=true&toresults=1&resultsperpage=10&a=managers&orderby=&fname=&lname=&company=&chapter=S005&city=&state=&xRadius=")
for more in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[contains(@class,'row')]//div[contains(@class,'arrow') and contains(.,'More')]"))):
    more.click()
- Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
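Clicking an element can re-render part of the page (for example, expanding a dropdown), and the elements returned earlier by visibility_of_all_elements_located may then go stale (StaleElementReferenceException). A defensive variant re-locates the list before each click and addresses the buttons by index. A minimal sketch, assuming a driver with find_elements(by, value) and execute_script; the function name is hypothetical.

```python
def click_each(driver, locator):
    """Click every element matching locator, re-finding the list before each click.

    Returns how many elements were clicked.
    """
    total = len(driver.find_elements(*locator))
    clicked = 0
    for index in range(total):
        elements = driver.find_elements(*locator)  # fresh references every time
        if index >= len(elements):
            break  # the page changed and fewer buttons remain
        # Scroll the button into view so the click lands on it
        driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", elements[index])
        elements[index].click()
        clicked += 1
    return clicked
```

It would be called as click_each(driver, (By.CSS_SELECTOR, "div.row div.arrow")) after driver.get(...); the WebDriverWait from the answer can still be used first to make sure the buttons have rendered.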