Downloading a page's source code over HTTP with Python (Python source download sites)

25-02-06 21


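This article looks at downloading a page's source code over HTTP with Python. It also collects related notes on printing the page source with Appium in Python, a lightweight HTTPServer in about 300 lines of Python that supports file upload and download, garbled page source when fetching with HttpURLConnection, and HTTPX, a next-generation HTTP client for Python 3.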

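Contents:

Downloading a page's source code over HTTP with Python (Python source download sites)
(Appium - Python) Printing the page source
A lightweight HTTPServer in 300 lines of Python with file upload and download
Garbled page source when fetching with HttpURLConnection
HTTPX | A next-generation HTTP client for Python 3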

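Downloading a page's source code over HTTP with Python (Python source download sites)

Hello, I was wondering whether it is possible to connect to an HTTP host (for example google.com) and download the source of a web page?

Thanks in advance.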
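
The question above has no answer in the original post. One possible approach, sketched here with only the Python 3 standard library, is to open the URL with `urllib.request` and decode the bytes using the charset the server declares. The function name `fetch_page_source`, the User-Agent string, and the UTF-8 fallback are assumptions made for this illustration, not part of the original question.

```python
import urllib.request


def fetch_page_source(url, timeout=10.0):
    """Return the body served at `url` as text (hypothetical helper)."""
    # Some sites reject requests without a User-Agent; this value is an arbitrary example.
    request = urllib.request.Request(
        url, headers={"User-Agent": "Mozilla/5.0 (example-fetcher)"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        raw = response.read()
        # Use the charset from the Content-Type header; fall back to UTF-8 (an assumption).
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")
```

Calling `fetch_page_source("http://www.google.com")` should return the HTML of that page as a string, provided the host is reachable from your network.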

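(Appium - Python) Printing the page source

How can I print the page source with Appium in Python?

I am running into an "element no longer exists in the DOM" error, and I am trying to print the Android page to see whether the element is still in the DOM (it may not be).

How should I print the page in Python? Refreshing the source in Appium produces an error with response status 13, so I cannot use that.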
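
One thing worth trying, as a sketch rather than a confirmed fix for the status 13 error: the WebDriver object used by the Appium Python client exposes a `page_source` property that returns the current UI hierarchy as an XML string. The helper names `dump_page_source` and `source_mentions` below are hypothetical, and `driver` is assumed to be an already connected session.

```python
def dump_page_source(driver, path="page_source.xml"):
    """Save the current UI hierarchy to `path` and return it (hypothetical helper)."""
    source = driver.page_source  # XML string describing the current screen
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(source)
    return source


def source_mentions(driver, needle):
    """Return True if `needle` (for example a resource-id) appears in the page source."""
    return needle in driver.page_source
```

With a live session, `print(dump_page_source(driver))` prints the hierarchy, and `source_mentions(driver, "com.example:id/login")` (a made-up resource-id) gives a quick check of whether the element is still present.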


A lightweight HTTPServer in 300 lines of Python with file upload and download

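#!/usr/bin/env python
# coding=utf-8
# http://my.oschina.net/leejun2005/blog/71444

"""
    Overview: a lightweight file-sharing server written in Python (based on the
    built-in SimpleHTTPServer module) that supports file upload and download.
    It only needs Python installed (2.6 to 2.7 recommended; 3.x is not supported).
    Change to the directory you want to share and run:
        python SimpleHTTPServerWithUpload.py
    or: python SimpleHTTPServerWithUpload.py filename
"""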

"""Simple HTTP Server With Upload.

This module builds on BaseHTTPServer by implementing the standard GET
and HEAD requests in a fairly straightforward manner.

"""

__version__ = "0.2"
__all__ = ["SimpleHTTPRequestHandler"]
__home_page__ = ""

import os, sys, platform, socket, struct, json
import posixpath
import BaseHTTPServer
from SocketServer import ThreadingMixIn
import threading
import urllib, urllib2
import cgi
import shutil
import mimetypes
import re
import time

reload(sys)
sys.setdefaultencoding("utf-8")

try:
    from cStringIO import StringIO
except ImportError:
    from StringIO import StringIO

def echoRed(s):
    return "%s[31;1m%s%s[0m" % (chr(27), s, chr(27))

def get_ip_address(ifname=None):
    if sys.platform == 'win32':
        return socket.getaddrinfo(socket.gethostname(), None, socket.AF_INET, socket.SOCK_DGRAM)[-1][4][0]
    else:
        import fcntl
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        return socket.inet_ntoa(fcntl.ioctl(
            s.fileno(),
            0x8915,  # SIOCGIFADDR
            struct.pack('256s', ifname[:15])
        )[20:24])

class GetWanIp:
    def getip(self):
        try:
            myip = get_ip_address(ifname="eth0")
            # myip = self.visit("http://ipinfo.io")
        except Exception, e:
            print str(e)
            myip = "127.0.0.1"
        return myip
    def visit(self ,url):
        # req = urllib2.Request(url)
        # values = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537',
        #            'Referer': 'http://ip.taobao.com/ipSearch.php',
        #            'ip': 'myip'
        #         }
        # data = urllib.urlencode(values)
        import requests
        content = requests.get('http://ipinfo.io').content
        print content
        json_content = json.loads(content)
        return json_content["ip"]

def showTips():
    print ""
    print '----------------------------------------------------------------------->> '
    try:
        port = 12345
        file_name = sys.argv[1]
        print '-------->> Please browse files or dirs with Chrome:     http://' + GetWanIp().getip() + ':' + str(port)
        print "-------->> Also, you can download the file with wget:    " + echoRed("wget http://" + GetWanIp().getip() + ':' + str(port) + "/" + file_name)
    except Exception, e:
        # No filename argument was given: only show the browse URL.
        print 'You have not given a filename, please use Chrome:     http://' + GetWanIp().getip() + ':' + str(port)
        print "You can give a filename and download the file with wget, Usage: " + echoRed("pywget filename")

    if not 1024 < port < 65535:  port = 8080
    print '----------------------------------------------------------------------->> '

    print ""
    # serveraddr = ('', port)
    return ('', port)


serveraddr = showTips()


def sizeof_fmt(num):
    for x in ['bytes', 'KB', 'MB', 'GB']:
        if num < 1024.0:
            return "%3.1f%s" % (num, x)
        num /= 1024.0
    return "%3.1f%s" % (num, 'TB')


def modification_date(filename):
    return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(os.path.getmtime(filename)))


class SimpleHTTPRequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    """Simple HTTP request handler with GET/HEAD/POST commands.

    This serves files from the current directory and any of its
    subdirectories.  The MIME type for files is determined by
    calling the .guess_type() method. It can also receive files
    uploaded by the client.

    The GET/HEAD/POST requests are identical except that the HEAD
    request omits the actual contents of the file.

    """

    server_version = "SimpleHTTPWithUpload/" + __version__

    def do_GET(self):
        """Serve a GET request."""
        # print "....................", threading.currentThread().getName()
        f = self.send_head()
        if f:
            self.copyfile(f, self.wfile)
            f.close()

    def do_HEAD(self):
        """Serve a HEAD request."""
        f = self.send_head()
        if f:
            f.close()

    def do_POST(self):
        """Serve a POST request."""
        r, info = self.deal_post_data()
        print r, info, "by: ", self.client_address
        f = StringIO()
        f.write('<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">')
        f.write("<html>\n<title>Upload Result Page</title>\n")
        f.write("<body>\n<h2>Upload Result Page</h2>\n")
        f.write("<hr>\n")
        if r:
            f.write("<strong>Success:</strong>")
        else:
            f.write("<strong>Failed:</strong>")
        f.write(info)
        f.write("<br><a href=\"%s\">back</a>" % self.headers['referer'])
        f.write("<hr><small>Powered By: bones7456, check new version at ")
        f.write("<a href=\"http://li2z.cn/?s=SimpleHTTPServerWithUpload\">")
        f.write("here</a>.</small></body>\n</html>\n")
        length = f.tell()
        f.seek(0)
        self.send_response(200)
        self.send_header("Content-type", "text/html")
        self.send_header("Content-Length", str(length))
        self.end_headers()
        if f:
            self.copyfile(f, self.wfile)
            f.close()

    def deal_post_data(self):
        # The multipart boundary is the value after "boundary=" in Content-Type.
        boundary = self.headers.plisttext.split("=")[1]
        remainbytes = int(self.headers['content-length'])
        line = self.rfile.readline()
        remainbytes -= len(line)
        if not boundary in line:
            return (False, "Content does NOT begin with boundary")
        # The next line is the Content-Disposition header carrying the file name.
        line = self.rfile.readline()
        remainbytes -= len(line)
        fn = re.findall(r'Content-Disposition.*name="file"; filename="(.*)"', line)
        if not fn:
            return (False, "Can't find out file name...")
        path = self.translate_path(self.path)
        osType = platform.system()
        try:
            if osType == "Linux":
                fn = os.path.join(path, fn[0].decode('gbk').encode('utf-8'))
            else:
                fn = os.path.join(path, fn[0])
        except Exception, e:
            return (False, "Please avoid Chinese characters in the file name, or use IE to upload files with Chinese names.")
        # Never overwrite an existing file: append "_" until the name is free.
        while os.path.exists(fn):
            fn += "_"
        # Skip the part's Content-Type line and the blank line before the data.
        line = self.rfile.readline()
        remainbytes -= len(line)
        line = self.rfile.readline()
        remainbytes -= len(line)
        try:
            out = open(fn, 'wb')
        except IOError:
            return (False, "Can't create file to write, do you have permission to write?")

        # Write with a one-line delay so the trailing CRLF before the
        # closing boundary can be stripped from the last data line.
        preline = self.rfile.readline()
        remainbytes -= len(preline)
        while remainbytes > 0:
            line = self.rfile.readline()
            remainbytes -= len(line)
            if boundary in line:
                preline = preline[0:-1]
                if preline.endswith('\r'):
                    preline = preline[0:-1]
                out.write(preline)
                out.close()
                return (True, "File '%s' upload success!" % fn)
            else:
                out.write(preline)
                preline = line
        return (False, "Unexpected end of data.")

    def send_head(self):
        """Common code for GET and HEAD commands.

        This sends the response code and MIME headers.

        Return value is either a file object (which has to be copied
        to the outputfile by the caller unless the command was HEAD,
        and must be closed by the caller under all circumstances), or
        None, in which case the caller has nothing further to do.

        """
        path = self.translate_path(self.path)
        f = None
        if os.path.isdir(path):
            if not self.path.endswith('/'):
                # redirect browser - doing basically what apache does
                self.send_response(301)
                self.send_header("Location", self.path + "/")
                self.end_headers()
                return None
            for index in "index.html", "index.htm":
                index = os.path.join(path, index)
                if os.path.exists(index):
                    path = index
                    break
            else:
                return self.list_directory(path)
        ctype = self.guess_type(path)
        try:
            # Always read in binary mode. Opening files in text mode may cause
            # newline translations, making the actual size of the content
            # transmitted *less* than the content-length!
            f = open(path, 'rb')
        except IOError:
            self.send_error(404, "File not found")
            return None
        self.send_response(200)
        self.send_header("Content-type", ctype)
        fs = os.fstat(f.fileno())
        self.send_header("Content-Length", str(fs[6]))
        self.send_header("Last-Modified", self.date_time_string(fs.st_mtime))
        self.end_headers()
        return f

    def list_directory(self, path):
        """Helper to produce a directory listing (absent index.html).

        Return value is either a file object, or None (indicating an
        error).  In either case, the headers are sent, making the
        interface the same as for send_head().

        """
        try:
            list = os.listdir(path)
        except os.error:
            self.send_error(404, "No permission to list directory")
            return None
        list.sort(key=lambda a: a.lower())
        f = StringIO()
        displaypath = cgi.escape(urllib.unquote(self.path))
        f.write('<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">')
        f.write("<html>\n<title>Directory listing for %s</title>\n" % displaypath)
        f.write("<body>\n<h2>Directory listing for %s</h2>\n" % displaypath)
        f.write("<hr>\n")
        f.write("<form ENCTYPE=\"multipart/form-data\" method=\"post\">")
        f.write("<input name=\"file\" type=\"file\"/>")
        f.write("<input type=\"submit\" value=\"upload\"/>")
        f.write("&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp")
        f.write("<input type=\"button\" value=\"HomePage\" onClick=\"location='/'\">")
        f.write("</form>\n")
        # Open one table for the whole listing; each entry becomes a row.
        f.write("<hr>\n<table>\n")
        for name in list:
            fullname = os.path.join(path, name)
            colorName = displayname = linkname = name
            # Append / for directories or @ for symbolic links
            if os.path.isdir(fullname):
                colorName = '<span>' + name + '/</span>'
                displayname = name
                linkname = name + "/"
            if os.path.islink(fullname):
                colorName = '<span>' + name + '@</span>'
                displayname = name
                # Note: a link to a directory displays with @ and links with /
            # Stat the real filesystem path rather than the HTML-escaped display path.
            f.write(
                '<tr><td width="60%%"><a href="%s">%s</a></td><td width="20%%">%s</td><td width="20%%">%s</td></tr>\n'
                % (urllib.quote(linkname), colorName,
                   sizeof_fmt(os.path.getsize(fullname)), modification_date(fullname)))
        f.write("</table>\n<hr>\n</body>\n</html>\n")
        length = f.tell()
        f.seek(0)
        self.send_response(200)
        self.send_header("Content-type", "text/html")
        self.send_header("Content-Length", str(length))
        self.end_headers()
        return f

    def translate_path(self, path):
        """Translate a /-separated PATH to the local filename syntax.

        Components that mean special things to the local file system
        (e.g. drive or directory names) are ignored.  (XXX They should
        probably be diagnosed.)

        """
        # abandon query parameters
        path = path.split('?', 1)[0]
        path = path.split('#', 1)[0]
        path = posixpath.normpath(urllib.unquote(path))
        words = path.split('/')
        words = filter(None, words)
        path = os.getcwd()
        for word in words:
            drive, word = os.path.splitdrive(word)
            head, word = os.path.split(word)
            if word in (os.curdir, os.pardir): continue
            path = os.path.join(path, word)
        return path

    def copyfile(self, source, outputfile):
        """Copy all data between two file objects.

        The SOURCE argument is a file object open for reading
        (or anything with a read() method) and the DESTINATION
        argument is a file object open for writing (or
        anything with a write() method).

        The only reason for overriding this would be to change
        the block size or perhaps to replace newlines by CRLF
        -- note however that this the default server uses this
        to copy binary data as well.

        """
        shutil.copyfileobj(source, outputfile)

    def guess_type(self, path):
        """Guess the type of a file.

        Argument is a PATH (a filename).

        Return value is a string of the form type/subtype,
        usable for a MIME Content-type header.

        The default implementation looks the file's extension
        up in the table self.extensions_map, using application/octet-stream
        as a default; however it would be permissible (if
        slow) to look inside the data to make a better guess.

        """

        base, ext = posixpath.splitext(path)
        if ext in self.extensions_map:
            return self.extensions_map[ext]
        ext = ext.lower()
        if ext in self.extensions_map:
            return self.extensions_map[ext]
        else:
            return self.extensions_map['']

    if not mimetypes.inited:
        mimetypes.init()  # try to read system mime.types
    extensions_map = mimetypes.types_map.copy()
    extensions_map.update({
        '': 'application/octet-stream',  # Default
        '.py': 'text/plain',
        '.c': 'text/plain',
        '.h': 'text/plain',
    })


class ThreadingServer(ThreadingMixIn, BaseHTTPServer.HTTPServer):
    pass


def test(HandlerClass=SimpleHTTPRequestHandler,
         ServerClass=BaseHTTPServer.HTTPServer):
    BaseHTTPServer.test(HandlerClass, ServerClass)


if __name__ == '__main__':
    # test()

    # Single-threaded
    # srvr = BaseHTTPServer.HTTPServer(serveraddr, SimpleHTTPRequestHandler)

    # Multi-threaded
    srvr = ThreadingServer(serveraddr, SimpleHTTPRequestHandler)

    srvr.serve_forever()
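
The listing above targets Python 2 only. For readers on Python 3.7 or later, the download half can be approximated with the standard library alone, as in the minimal sketch below. It is not a port of the script: it does not handle uploads, and the function name `start_file_server` and its defaults are assumptions made for this illustration.

```python
import functools
import http.server
import threading


def start_file_server(directory=".", host="127.0.0.1", port=0):
    """Serve `directory` over HTTP in a background thread and return the server.

    port=0 asks the OS for a free port; read it back from server.server_address.
    """
    # SimpleHTTPRequestHandler accepts a `directory` keyword since Python 3.7.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # ThreadingHTTPServer handles each request in its own thread, like the
    # ThreadingMixIn-based ThreadingServer class in the listing above.
    server = http.server.ThreadingHTTPServer((host, port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Call `server.shutdown()` followed by `server.server_close()` to stop it. Upload support would still need a custom `do_POST`, as in the Python 2 handler.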

REF:

[1] httpserver

httpserver
=======================================
This httpserver is an enhanced version of SimpleHTTPServer.
It is written in Python and uses some code from bottle [https://github.com/defnull/bottle].
It supports resuming downloads, lets you set the document root, gives
friendlier error hints, and handles MIME types gracefully.

https://github.com/lerry/httpserver/blob/master/httpserver.py

[2] py2_SimpleHTTPServerWithUpload

https://github.com/jJayyyyyyy/cs/blob/master/just%20for%20fun/file_transfer/http/py2_SimpleHTTPServerWithUpload/SimpleHTTPServerWithUpload.py

[3] py3_SimpleHTTPServerWithUpload.py

https://github.com/jJayyyyyyy/cs/blob/master/just%20for%20fun/file_transfer/http/py3_SimpleHTTPServerWithUpload.py

[4] Rewriting SimpleHTTPServerWithUpload in Python 3

https://jjayyyyyyy.github.io/2016/10/07/reWrite_SimpleHTTPServerWithUpload_with_python3.html

[5] A SimpleHTTPServer that supports file upload

http://buptguo.com/2015/11/07/simplehttpserver-with-upload-file/

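Garbled page source when fetching with HttpURLConnection

This problem usually has one of two causes:
1. In most cases reported online, the request URL contains Chinese characters that were sent as-is, without being URL-encoded before sending and decoded on the receiving side.

2. Some sites compress their responses with GZIP to save bandwidth. Browsers decompress automatically, but code that reads the raw stream without decompressing it ends up with garbled text.

In that case, decoding the stream as GZIP fixes the problem.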
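
The note above is about Java's HttpURLConnection, but both causes apply to any HTTP client. The sketch below shows the same two fixes in Python 3 using only the standard library. The helper names `quote_url` and `decode_body` and the UTF-8 default are assumptions made for this illustration.

```python
import gzip
import urllib.parse


def quote_url(url):
    """Percent-encode non-ASCII characters in the path and query of `url`."""
    parts = urllib.parse.urlsplit(url)
    # Keep "%" safe so sequences that are already percent-encoded are not encoded twice.
    path = urllib.parse.quote(parts.path, safe="/%")
    query = urllib.parse.quote(parts.query, safe="=&%")
    return urllib.parse.urlunsplit(
        (parts.scheme, parts.netloc, path, query, parts.fragment)
    )


def decode_body(raw, content_encoding=None, charset="utf-8"):
    """Turn raw response bytes into text, undoing gzip compression if declared."""
    # `content_encoding` is the value of the response's Content-Encoding header.
    if (content_encoding or "").lower() == "gzip":
        raw = gzip.decompress(raw)
    return raw.decode(charset, errors="replace")
```

In practice you would pass the response's `Content-Encoding` header and the charset from its `Content-Type` header to `decode_body`.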

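HTTPX | A next-generation HTTP client for Python 3

Introduction

HTTPX is a project that has recently become popular on GitHub. According to its official site, its main features are:

As convenient to use as requests, with everything requests offers.

Support for both HTTP/1.1 and HTTP/2.

The ability to make requests directly to WSGI or ASGI applications.

Strict timeouts everywhere.

Full type annotations.

100% test coverage.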
