Downloading large files in Python with requests (downloading files via a Python POST request)

25-03-11 11

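This article covers downloading large files in Python with requests and downloading files via a Python POST request. It also touches on related topics: how to avoid OOM when downloading large files in golang/python, downloading large files in Python, which way of downloading large files in Python is faster, and downloading a file directly into memory with requests.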

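Contents:

Downloading large files in Python with requests (downloading files via a Python POST request)
golang/python: how to avoid OOM when downloading large files
Downloading large files in Python
Python: which way of downloading large files is faster?
Python: downloading a file directly into memory with requests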

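Downloading large files in Python with requests (downloading files via a Python POST request)

Requests is a really nice library. I would like to use it to download big files (>1GB). The problem is that the whole file cannot be kept in memory; I need to read it in chunks. This is the problem with the following code: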

import requests

def DownloadFile(url):
    local_filename = url.split('/')[-1]
    # No stream=True here, so the whole body is read into memory first
    r = requests.get(url)
    f = open(local_filename, 'wb')
    for chunk in r.iter_content(chunk_size=512 * 1024):
        if chunk:  # filter out keep-alive new chunks
            f.write(chunk)
    f.close()
    return

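For some reason it does not work this way: it still loads the response into memory before saving it to a file.

Update

If you need a small client (Python 2.x/3.x) that can download big files from FTP, you can find it here. It supports multithreading and reconnects (it does monitor connections), and it also tunes socket parameters for the download task.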

Answer 1

小编典典

With the following streaming code, Python's memory usage stays bounded regardless of the size of the downloaded file:

import requests

def download_file(url):
    local_filename = url.split('/')[-1]
    # NOTE the stream=True parameter below
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                # If you have chunk encoded response uncomment if
                # and set chunk_size parameter to None.
                #if chunk:
                f.write(chunk)
    return local_filename

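Note that the number of bytes returned by iter_content is not guaranteed to be exactly chunk_size; according to the original answer it is often larger, and it can differ from one iteration to the next.

See body-content-workflow and Response.iter_content for further reference.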
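
The title also mentions POST requests, which the answer above does not show. The following is a minimal sketch, assuming the server returns the file body in response to a POST. The names `save_stream` and `download_via_post`, the `payload` parameter and the 8192-byte chunk size are illustrative choices, not part of the original answer.

```python
def save_stream(chunks, path):
    """Write an iterable of byte chunks to path; return the bytes written."""
    written = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
                written += len(chunk)
    return written


def download_via_post(url, payload, path, chunk_size=8192):
    """Stream the body of a POST response to disk (hypothetical helper)."""
    import requests  # imported lazily so save_stream works without it

    # stream=True defers reading the body until iter_content is consumed
    with requests.post(url, data=payload, stream=True) as r:
        r.raise_for_status()
        return save_stream(r.iter_content(chunk_size=chunk_size), path)
```

Keeping the file-writing loop separate from the HTTP call means the same loop can serve GET and POST downloads alike.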

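golang/python: how to avoid OOM when downloading large files

Scenario: in a high-frequency system, the agent sends refresh and pre-cache requests to the ATS server; the request methods include GET, PURGE and so on. Pre-cache requests are usually for small files, but one day the server suddenly ran out of memory (OOM). The culprit turned out to be concurrent GETs of large files, which brought the server down. The first version was written in Python and the second in Go; this section records how each language handles downloading large files.

This article is still being updated and revised; for the latest version see the original at http://dmwan.cc

The first approach is Python with the requests library: read the response as a stream and write it to the null device.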

res = self.session.request(method, url, data=body, headers=header,
                           timeout=timeout, proxies=proxies, stream=True)
with open("/dev/null", 'wb') as f:
    for chunk in res.iter_content(chunk_size=1024):
        if chunk:  # filter out keep-alive new chunks
            f.write(chunk)
            f.flush()
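
If the goal is only to pull the body through without keeping it, the write to the null device could arguably be skipped. A small sketch, assuming `chunks` is whatever `res.iter_content(...)` yields; `drain` is a hypothetical name used here for illustration:

```python
def drain(chunks):
    """Consume an iterable of byte chunks, keeping only a running byte count."""
    total = 0
    for chunk in chunks:
        total += len(chunk)  # each chunk becomes garbage once counted
    return total

# With the response from the snippet above this would be called as:
#     drain(res.iter_content(chunk_size=1024))
```

Only one chunk is referenced at a time, so memory use stays near `chunk_size` however large the body is.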

The second approach is Go: use io.Copy() to copy the response to the null device.

func downLoadFile(url string) (int64, error) {
	// Open for writing; a read-only handle fails with
	// "write /dev/null: bad file descriptor".
	out, err := os.OpenFile("/dev/null", os.O_WRONLY, 0)
	if err != nil {
		return 0, err
	}
	defer out.Close()
	resp, err := http.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	// io.Copy streams the body instead of loading it all at once.
	return io.Copy(out, resp.Body)
}

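Why does this approach not cause OOM? There are two reasons. First, resp.Body is only a reader; no data has actually been read yet. Second, io.Copy copies through a buffer of limited size (32 KB by default, as the 32*1024 in the source below shows), so the body is never read into memory all at once. Here is the standard library source: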

func Copy(dst Writer, src Reader) (written int64, err error) {
	return copyBuffer(dst, src, nil)
}

// copyBuffer is the actual implementation of Copy and CopyBuffer.
// if buf is nil, one is allocated.
func copyBuffer(dst Writer, src Reader, buf []byte) (written int64, err error) {
	// If the reader has a WriteTo method, use it to do the copy.
	// Avoids an allocation and a copy.
	if wt, ok := src.(WriterTo); ok {
		return wt.WriteTo(dst)
	}
	// Similarly, if the writer has a ReadFrom method, use it to do the copy.
	if rt, ok := dst.(ReaderFrom); ok {
		return rt.ReadFrom(src)
	}
	if buf == nil {
		buf = make([]byte, 32*1024) // controls the buffer size used on each iteration; the default is 32 KB
	}
	for {
		nr, er := src.Read(buf)
		if nr > 0 {
			nw, ew := dst.Write(buf[0:nr])
			if nw > 0 {
				written += int64(nw)
			}
			if ew != nil {
				err = ew
				break
			}
			if nr != nw {
				err = ErrShortWrite
				break
			}
		}
		if er != nil {
			if er != EOF {
				err = er
			}
			break
		}
	}
	return written, err
}
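
The same bounded-buffer loop can be sketched in Python. This is an illustration of the idea rather than a translation of the Go source: `copy_buffer` is a hypothetical name, and `src` and `dst` are assumed to be binary file-like objects with `read` and `write` methods.

```python
import io


def copy_buffer(dst, src, buf_size=32 * 1024):
    """Copy src to dst in reads of at most buf_size bytes; return bytes copied."""
    written = 0
    while True:
        buf = src.read(buf_size)  # at most buf_size bytes are held at once
        if not buf:  # empty bytes object signals end of stream
            break
        dst.write(buf)
        written += len(buf)
    return written
```

The standard library's `shutil.copyfileobj(fsrc, fdst)` performs the same kind of chunked copy, so in practice it can be used directly.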

Downloading large files in Python

1. When downloading, you will want to see the progress

A good progress bar:

tqdm

Install:

sudo python3 -m pip install tqdm

1.1 Add the interpreter line (shebang)

#!/usr/bin/python3
from tqdm import tqdm

import requests

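The previous step installed tqdm for python3, so tqdm is only available to python3.
Without the interpreter line, the script runs under the default Python 2.7,
and tqdm was not installed for Python 2.7.

2. The download code

This file is about 700 MB.
Streaming it this way was much faster than downloading with Chrome,
and downloads through Chrome kept failing.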

#!/usr/bin/python3
from tqdm import tqdm

import requests

url = 'https://static.realm.io/downloads/swift/realm-swift-10.1.1.zip'

# Streaming, so we can iterate over the response.
response = requests.get(url, stream=True)
total_size_in_bytes = int(response.headers.get('content-length', 0))
block_size = 1024  # 1 Kibibyte
progress_bar = tqdm(total=total_size_in_bytes, unit='iB', unit_scale=True)

path = '/Users/xx/Desktop/Papr-develop/realm-swift-10.1.1.zip'

print("Total size:")
print(total_size_in_bytes)

with open(path, 'wb') as file:
    for data in response.iter_content(block_size):
        progress_bar.update(len(data))
        file.write(data)

progress_bar.close()

if total_size_in_bytes != 0 and progress_bar.n != total_size_in_bytes:
    print("ERROR, something went wrong")
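
The write loop and the final size check in the script can be pulled into small reusable functions. This is a sketch under assumptions: `copy_with_progress` and `is_complete` are hypothetical names, and `on_progress` stands for any callable that accepts a byte count, such as tqdm's `progress_bar.update`.

```python
def copy_with_progress(chunks, fileobj, on_progress):
    """Write chunks to fileobj, reporting each chunk's size to on_progress."""
    received = 0
    for data in chunks:
        fileobj.write(data)
        received += len(data)
        on_progress(len(data))
    return received


def is_complete(expected, received):
    """True when no size was advertised (0) or the sizes match."""
    return expected == 0 or received == expected
```

In the script above this would be called as `copy_with_progress(response.iter_content(block_size), file, progress_bar.update)`, followed by `is_complete(total_size_in_bytes, received)` on the returned count.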

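Python: which way of downloading large files is faster?

We usually download with the requests library, because it is so convenient to use.

Method 1

With the following streaming code, Python's memory usage does not grow with the size of the downloaded file: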

import requests

def download_file(url):
    local_filename = url.split('/')[-1]
    # Note the stream=True argument
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    return local_filename

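If the response uses chunked transfer encoding, set chunk_size to None and keep the if check. If you also need to decode the bytes to text, decode incrementally so that a multi-byte character split across two chunks is still handled correctly:

import codecs
import requests

def download_file(url):
    local_filename = url.split('/')[-1]
    # An incremental decoder copes with characters split across chunks
    decoder = codecs.getincrementaldecoder("utf-8")()
    # Note the stream=True argument
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'w', encoding='utf-8') as f:
            for chunk in r.iter_content(chunk_size=None):
                if chunk:
                    f.write(decoder.decode(chunk))
            # Flush any bytes still buffered in the decoder
            f.write(decoder.decode(b"", final=True))
    return local_filename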

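iter_content^[1] can also do the decoding itself: pass decode_unicode=True and it decodes using the response's encoding.

Note that the number of bytes returned by iter_content is not guaranteed to be exactly chunk_size; according to the original answer it is often larger, and it can differ from one iteration to the next.

Method 2

Use Response.raw^[2] and shutil.copyfileobj^[3]: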

import requests
import shutil

def download_file(url):
    local_filename = url.split('/')[-1]
    with requests.get(url, stream=True) as r:
        with open(local_filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)

    return local_filename

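This streams the file to disk without using excessive memory, and the code is simpler.

Note: according to the documentation, Response.raw does not decode the content (for example gzip or deflate encodings), so if you need decoding you can replace the r.raw.read method manually before copying (this requires import functools):

r.raw.read = functools.partial(r.raw.read, decode_content=True)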

Speed

Method 2 is faster. According to the original author, where method 1 reaches 2-3 MB/s, method 2 can reach nearly 40 MB/s.

References
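
These figures are the original author's and will depend on the network, the server and the disk. One way to compare the two copy styles on your own machine is a small timing harness. The sketch below is an assumption-laden illustration: `chunk_loop` and `timed_copy` are hypothetical names, and it copies between in-memory buffers, so it measures only the per-chunk loop overhead and not network throughput.

```python
import io
import shutil
import time


def chunk_loop(src, dst, chunk_size=8192):
    """Method 1 style: copy in fixed-size reads."""
    for chunk in iter(lambda: src.read(chunk_size), b""):
        dst.write(chunk)


def timed_copy(copy_fn, data):
    """Run copy_fn(src, dst) over in-memory data; return (bytes, seconds)."""
    src, dst = io.BytesIO(data), io.BytesIO()
    start = time.perf_counter()
    copy_fn(src, dst)
    elapsed = time.perf_counter() - start
    return len(dst.getvalue()), elapsed
```

Calling `timed_copy(chunk_loop, data)` and `timed_copy(shutil.copyfileobj, data)` with the same `data` gives two timings to compare; for a real measurement the source would be a streamed response rather than a `BytesIO`.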

[1]iter_content: https://requests.readthedocs.io/en/latest/api/#requests.Response.iter_content

[2]Response.raw: https://requests.readthedocs.io/en/latest/api/#requests.Response.raw

[3]shutil.copyfileobj: https://docs.python.org/3/library/shutil.html#shutil.copyfileobj

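Python: downloading a file directly into memory with requests

The goal is to download a file from the Internet and create a file object, or a file-like object, from it without it ever touching the hard drive. This is just for my own knowledge: I want to know whether it is possible or practical, particularly because I would like to see if I can avoid having to write a line of code that deletes the file.

Normally, this is how I download something from the web and map it to memory: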

import requests
import mmap

u = requests.get("http://www.pythonchallenge.com/pc/def/channel.zip")

with open("channel.zip", "wb") as f:  # I want to eliminate this, as this writes to disk
    f.write(u.content)

with open("channel.zip", "r+b") as f:  # and this as well, because it reads from disk
    mm = mmap.mmap(f.fileno(), 0)
    mm.seek(0)
    print(mm.readline())
    mm.close()  # question: if I do not include this, does this become a memory leak?
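
One possible way to avoid the disk entirely is to wrap the downloaded bytes in `io.BytesIO`, which gives a file-like object backed by memory. This is a sketch and not an answer taken from the source; `zip_names_from_bytes` is a hypothetical helper, and it assumes the payload is a ZIP archive, as the URL in the question suggests.

```python
import io
import zipfile


def zip_names_from_bytes(data):
    """List the members of a ZIP archive held entirely in memory."""
    buffer = io.BytesIO(data)  # file-like object backed by RAM, not disk
    with zipfile.ZipFile(buffer) as archive:
        return archive.namelist()

# With requests this could be used as (URL taken from the question):
#     u = requests.get("http://www.pythonchallenge.com/pc/def/channel.zip")
#     names = zip_names_from_bytes(u.content)
```

Because nothing is written to disk, there is no file to delete afterwards; the trade-off is that the whole body sits in memory, so this suits small files rather than the multi-gigabyte downloads discussed earlier.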
