如何下载XML文件以避免弹出窗口这种类型的文件可能会通过ChromeDriver和Chrome（使用Python中的Selenium）损坏您的计算机

25-02-16 22

对于想了解如何下载XML文件以避免弹出窗口这种类型的文件可能会通过ChromeDriver和Chrome的读者，本文将提供新的信息，我们将详细介绍使用Python中的Selenium损坏您的计算机，并

对于想了解如何下载XML文件以避免弹出窗口这种类型的文件可能会通过ChromeDriver和Chrome的读者，本文将提供新的信息，我们将详细介绍使用Python中的Selenium损坏您的计算机，并且为您提供关于3-Python爬虫-动态HTML/Selenium+PhantomJS/chrome无头浏览器-chromedriver、centos7.4 64位安装 google-chrome 与 chromedriver 运行 Python selenium 项目、chromedriver之"此类型文件可能会损害您的计算机"提示的处理、ChromeDriver仅在BMP错误中支持字符，同时使用Selenium Python将带有ChromeDriver Chrome的Emoji发送给Tkinter的label（）文本框的有价值信息。

本文目录一览：

如何下载XML文件以避免弹出窗口这种类型的文件可能会通过ChromeDriver和Chrome（使用Python中的Selenium）损坏您的计算机
3-Python爬虫-动态HTML/Selenium+PhantomJS/chrome无头浏览器-chromedriver
centos7.4 64位安装 google-chrome 与 chromedriver 运行 Python selenium 项目
chromedriver之"此类型文件可能会损害您的计算机"提示的处理
ChromeDriver仅在BMP错误中支持字符，同时使用Selenium Python将带有ChromeDriver Chrome的Emoji发送给Tkinter的label（）文本框

如何下载XML文件以避免弹出窗口这种类型的文件可能会通过ChromeDriver和Chrome（使用Python中的Selenium）损坏您的计算机

我想从供应商那里下载每日xml文件。我设法登录并单击链接接受下载以开始使用chromedriver下载。

但是我看到弹出窗口“此类型的文件可能会损害您的计算机”。页面的MIME 是text / html，我不确定链接是否是text / javascript

我尝试了所有建议的解决方案，例如

print(''Starting..'')prefs = {''download.default_directory'': ''C:\\Users\MainDesk\Downloads'',''download.prompt_for_download'': False,''download.extensions_to_open'': ''xml'',''safebrowsing.enabled'': False}options = Options()options.add_experimental_option(''prefs'',prefs)browser = webdriver.Chrome(options=options, executable_path=''C:\\chromedriver.exe'')

如何自动保存文件？

另外，我尝试进入Chrome的“设置”并关闭“询问以保存文件”

我正在Windows 7，Python 3.7和Visual Studio以及最新版本的chromedriver 上运行脚本

无法自动执行我的下载？

答案1

小编典典

关于一些信息的网页从那里您试图下载该XML文件可能是有帮助的调试与弹出窗口的问题
文本“这种类型的文件可能会损害您的计算机以更好的方式。

但是，这里有一个示例程序，可从该网页下载xml文件：

代码块：

        from selenium import webdriver    from selenium.webdriver.common.by import By    from selenium.webdriver.support.ui import WebDriverWait    from selenium.webdriver.support import expected_conditions as EC    prefs = {    ''download.default_directory'': ''C:/Utility/Downloads/'',    ''download.prompt_for_download'': False,    ''download.extensions_to_open'': ''xml'',    ''safebrowsing.enabled'': True    }    options = webdriver.ChromeOptions()    options.add_experimental_option(''prefs'',prefs)    options.add_argument("start-maximized")    # options.add_argument("disable-infobars")    options.add_argument("--disable-extensions")    options.add_argument("--safebrowsing-disable-download-protection")    options.add_argument("safebrowsing-disable-extension-blacklist")    driver = webdriver.Chrome(options=options, executable_path=r''C:\Utility\BrowserDrivers\chromedriver.exe'')    driver.get("http://www.landxmlproject.org/file-cabinet")    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//span[text()=''MntnRoad.xml'']//following::span[1]//a[text()=''Download'']"))).click()  * Browser Snapshot:[![xml_download](https://i.stack.imgur.com/hc1W6.png)](https://i.stack.imgur.com/hc1W6.png)

3-Python爬虫-动态HTML/Selenium+PhantomJS/chrome无头浏览器-chromedriver

动态HTML

爬虫跟反爬虫

动态HTML介绍

JavaScrapt
jQuery
Ajax
DHTML
Python采集动态数据
- 从Javascript代码入手采集
- Python第三方库运行JavaScript，直接采集你在浏览器看到的页面

Selenium + PhantomJS

Selenium: web自动化测试工具
- 自动加载页面
- 获取数据
- 截屏
- 安装： pip install selenium==2.48.0
- 官网： http://selenium-python.readthedocs.io/index.html
PhantomJS(幽灵)
- 基于Webkit 的无界面的浏览器
- 官网： http://phantomjs.org/download.html
Selenium 库有有一个WebDriver的API
WebDriver可以跟页面上的元素进行各种交互，用它可以来进行爬取
案例 v36
chrome + chromedriver
- 下载安装chrome：下载+安装
- 下载安装chromedriver：
Selenium操作主要分两大类：
- 得到UI元素
  - find_element_by_id
  - find_elements_by_name
  - find_elements_by_xpath
  - find_elements_by_link_text
  - find_elements_by_partial_link_text
  - find_elements_by_tag_name
  - find_elements_by_class_name
  - find_elements_by_css_selector
- 基于UI元素操作的模拟
  - 单击
  - 右键
  - 拖拽
  - 输入
  - 可以通过导入ActionsChains类来做到
- 案例37

案例v36

''''''
通过webdriver操作进行查找1
''''''



from selenium import webdriver
import time




# 通过Keys模拟键盘
from selenium.webdriver.common.keys import  Keys

# 操作哪个浏览器就对哪个浏览器建一个实例
# 自动按照环境变量查找相应的浏览器
driver = webdriver.PhantomJS()

# 如果浏览器没有在相应环境变量中，需要指定浏览器位置


driver.get("http://www.baidu.com")


# 通过函数查找title标签
print("Title: {0}".format(driver.title))

案例v37

from selenium import webdriver
import time
from selenium.webdriver.common.keys import  Keys

# 可能需要手动添加路径
driver = webdriver.Chrome()

url = "http://www.baidu.com"


driver.get(url)


text = driver.find_element_by_id(''wrapper'').text
print(text)
print(driver.title)
# 得到页面的快照
driver.save_screenshot(''index.png'')

# id="kw" 的是百度的输入框，我们得到输入框的ui元素后直接输入“大熊猫"
driver.find_element_by_id(''kw'').send_keys(u"大熊猫")

# id="su"是百度搜索的按钮，click模拟点击
driver.find_element_by_id(''su'').click()

time.sleep(5)
driver.save_screenshot("daxiongmao.png")


#获取当前页面的cookie
print(driver.get_cookies())

# 模拟输入两个按键 ctrl+ a
driver.find_element_by_id(''kw'').send_keys(Keys.CONTROL, ''a'')
#ctr+x 是剪切快捷键
driver.find_element_by_id(''kw'').send_keys(Keys.CONTROL, ''x'')

driver.find_element_by_id(''kw'').send_keys(u''航空母舰'')
driver.save_screenshot(''hangmu.png'')

driver.find_element_by_id(''su'').send_keys(Keys.RETURN)

time.sleep(5)
driver.save_screenshot(''hangmu2.png'')


# 清空输入框 , clear
driver.find_element_by_id(''kw'').clear()
driver.save_screenshot(''clear.png'')

# 关闭浏览器
driver.quit()

selenium自动化测试工具：

-用selenium登录的时候，先要get到页面； -get到页面之后就可以准备输入了， -selenium可以模拟输入； -selenium调没有界面的chrome或者PhantomJS -保存快照，手动输入，对静态验证码的一大杀手。

centos7.4 64位安装 google-chrome 与 chromedriver 运行 Python selenium 项目

centos7.4 实例

利用 yum 命令安装 google-chrome 超级简单（安装最新版）：

yum install https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm

chromedriver 也下载最新版就好，和Mac 版本一样，我依旧选择这个最新版本：

http://chromedriver.storage.googleapis.com/index.html?path=70.0.3538.16/

Python3.7 之前安装过：

Centos7 环境下 Python2.7 换成 Python3.7 运行 scrapy 应用所遇到的问题记录

然后配置一下，就能跑起来 selenium 项目了，当然该 pip install xxx 安装模块就缺什么安装什么了

# 设置 webdriver 参数
        options = webdriver.ChromeOptions()
        # 设置谷歌浏览器的一些选项
        #  proxy 代理 options 选项
        options.add_argument(r''--proxy--server=http:\\'' + ip)
        uas = get_uas()
        # 设置user-agent
        options.add_argument(''user-agent='' + choice(uas))
        # 以 headless 方案运行
        options.add_argument(''--headless'')
        options.add_argument(''--no-sandbox'')
        # options.add_argument(''--disable-dev-shm-usage'')
        # 禁用图片访问
        # prefs = {"profile.managed_default_content_settings.images": 2}
        # options.add_experimental_option("prefs", prefs)

如果不配置 # 以 headless 方案运行

会抛出下面异常

[root@iZwz9117d4ssv2v5oe81m3Z selenium_baidu]# python testRun.py
package_util 初始化
testRun.py:32: DeprecationWarning: use options instead of chrome_options
  driver = webdriver.Chrome(path, chrome_options=options)
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/python/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/usr/python/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "testRun.py", line 32, in get_url
    driver = webdriver.Chrome(path, chrome_options=options)
  File "/usr/python/lib/python3.7/site-packages/selenium/webdriver/chrome/webdriver.py", line 81, in __init__
    desired_capabilities=desired_capabilities)
  File "/usr/python/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
    self.start_session(capabilities, browser_profile)
  File "/usr/python/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/usr/python/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/usr/python/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally
  (unknown error: DevToolsActivePort file doesn''t exist)
  (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
  (Driver info: chromedriver=70.0.3538.16 (16ed95b41bb05e565b11fb66ac33c660b721f778),platform=Linux 3.10.0-693.2.2.el7.x86_64 x86_64)

selenium 代码正确运行：

chromedriver之

chromedriver之"此类型文件可能会损害您的计算机"提示的处理

背景

最近在使用selenium进行自动文件下载时，突然出现了一个报错：

图片描述

下载进行不下去了。

思路

经过各种谷歌、百度，均告诉我在要增加params，关闭浏览器安全选项，配置如下：

chromeOptions = webdriver.ChromeOptions()
prefs = {"profile.default_content_settings.popups": 0,
          "download.default_directory": path,
          "download.prompt_for_download": False,
          # "download.directory_upgrade": ''true'',
          "safebrowsing.enabled": True}
chromeOptions.add_experimental_option("prefs", prefs)

事实证明，可能以前的版本是可行的，现在的真心不行。

上面配置重点是"safebrowsing.enabled": True。在MacOS的环境下，哪怕不配也是没有问题的，Windows就不行了。

最后在谷歌上找到一篇相关文章，大意是说这个是无解的，可能是windows系统安全的问题，

对于这个解释我还是比较认可的，所以在mac上就不会提示。

Let’s start frankly: you can’t disable this feature. You can merely tweak the download settings in order to avoid it.
https://windowsreport.com/typ...

那么问题来了，既然这样，有什么曲线救国的办法呢？

当chromedriver弹出这个提示的时候，其实文件已经下载完成，如下图：

图片描述

我们只需要将文件名修改为正确的名字和后缀即可(比如test.txt)，直接无视警告提醒。思路如下：

找到最新下载的文件：通过对下载目录的文件按照创建时间排序，找到最新的
判断是否该文件是否已下载完成：通过判断时间间隔前后该文件是否有大小的变化

结论

根据上面思路，实现的关键代码如下：

def sort_file():
    global path
    dir_lists = os.listdir(path)
    dir_lists.sort(key=lambda fn: os.path.getmtime(os.path.join(path, fn)))
    return dir_lists[-1]
  
def changeName(path, oldname, newname):
    old_path = os.path.join(path, oldname)
    new_path = os.path.join(path, newname + ''.txt'')
    if os.path.exists(old_path):
        if os.path.exists(new_path):
            os.remove(new_path)
        os.rename(old_path, new_path)
        print (''rename done!'' + newname)
    else:
        print (''no file found!'')
  
def download():
  ...
  temp_filename = sort_file()
      if u''未确认'' in temp_filename:
          temp_filesize_old = os.path.getsize(os.path.join(path, temp_filename))
          while True:
              time.sleep(1)
              temp_filesize_new = os.path.getsize(os.path.join(path, temp_filename))
              if temp_filesize_old == temp_filesize_new:
                  changeName(path, temp_filename, ip)
                  return
              else:
                  temp_filesize_old = temp_filesize_new
      else:
          print(u''下载失败'')

需要注意的是，在文件重命名的时候，先检查下文件是否已经存在，先删除，在创建。

以上。

如果有更好的思路，欢迎分享。

ChromeDriver仅在BMP错误中支持字符，同时使用Selenium Python将带有ChromeDriver Chrome的Emoji发送给Tkinter的label（）文本框

如何解决ChromeDriver仅在BMP错误中支持字符，同时使用Selenium Python将带有ChromeDriver Chrome的Emoji发送给Tkinter的label（）文本框？

这个错误讯息…

selenium.common.exceptions.WebDriverException: Message: unkNown error: ChromeDriver only supports characters in the BMP

…暗示ChromeDriver无法通过方法发送表情符号信号。 send_keys()

ChromeDriver仅支持BMP中的字符，这是Chromium团队的一个已知问题，因为ChromeDriver仍不支持 Unicode字符。因此，它是无法发送任何字符以外通过ChromeDriver。结果，任何发送SMP字符（例如CJK，Emojis，Symbols等）的尝试都会引发错误。

Alternative

一个潜在的替代方法是使用_GeckoDriver_ / Firefox.

Code Block:
```
from selenium import webdriver
```
from selenium.webdriver.support.ui import webdriverwait from selenium.webdriver.common.by import By from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox(executable_path=r’C:\Utility\browserDrivers\geckodriver.exe’) driver.get(''https://www.google.com/’)

Chineese Character

webdriverwait(driver, 20).until(EC.element_to_be_clickable((By.NAME, “q”))).send_keys(“
解决方法

我正在自动化whatsapp消息，并希望通过
tkinter窗口发送出去。在此tkinter窗口中，我
借助.label（）创建了一个消息框，并能够通过硒连接到whatsapp网站。

目前，我已经可以发送消息了，但是没有表情符号。当我
包含表情符号时，出现此错误“ Chromedriver仅支持
BMP中的字符”。如何包含表情符号？

今天的关于如何下载XML文件以避免弹出窗口这种类型的文件可能会通过ChromeDriver和Chrome和使用Python中的Selenium损坏您的计算机的分享已经结束，谢谢您的关注，如果想了解更多关于3-Python爬虫-动态HTML/Selenium+PhantomJS/chrome无头浏览器-chromedriver、centos7.4 64位安装 google-chrome 与 chromedriver 运行 Python selenium 项目、chromedriver之"此类型文件可能会损害您的计算机"提示的处理、ChromeDriver仅在BMP错误中支持字符，同时使用Selenium Python将带有ChromeDriver Chrome的Emoji发送给Tkinter的label（）文本框的相关知识，请在本站进行查询。

本文标签：