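The urllib.request.urlopen() function opens a target URL and returns a response object.
Its signature is: urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)
url: the URL to open (a string or a Request object)
data: the data to submit in a POST request. It must be of type bytes, so a str has to be converted first with bytes() or str.encode(). If this argument is passed, the request is sent as POST instead of GET.
timeout: the timeout for the request, in seconds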
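The timeout argument is easiest to see together with error handling. The following is a minimal sketch rather than part of the original example: fetch_status is a hypothetical helper name, and the 5-second default is an arbitrary assumption. It returns the status code on success, the error code when the server answers with an HTTP error, and None when the URL cannot be opened or the timeout expires.

```python
import socket
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if it cannot be fetched.

    fetch_status is a hypothetical helper used only for illustration.
    """
    try:
        # The response object is a context manager, so the connection
        # is closed automatically when the block exits.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # The server replied, but with an error status such as 404 or 500.
        return exc.code
    except (urllib.error.URLError, socket.timeout) as exc:
        # URLError covers unknown schemes, DNS failures and refused connections.
        # A timeout while waiting for the response can surface as socket.timeout.
        print(f"request to {url} failed: {exc}")
        return None
```

A call such as fetch_status('https://www.baidu.com', timeout=3) would be expected to return 200 when the site is reachable within three seconds.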
An example:
import urllib.request
response = urllib.request.urlopen('https://www.baidu.com')
# response body (decode() defaults to UTF-8)
print(response.read().decode())
# response status code
print(response.status)
# response headers
print(response.getheaders())
Output:
<html>
<head>
<script>
location.replace(location.href.replace("https://","http://"));
</script>
</head>
<body>
<noscript><meta http-equiv="refresh" content="0;url=http://www.baidu.com/"></noscript>
</body>
</html>
200
[('Accept-Ranges', 'bytes'), ('Cache-Control', 'no-cache'), ('Content-Length', '227'), ('Content-Type', 'text/html'), ('Date', 'Wed, 14 Aug 2019 08:47:12 GMT'), ('Etag', '"5d4be0b4-e3"'), ('Last-Modified', 'Thu, 08 Aug 2019 08:43:32 GMT'), ('P3p', 'CP=" OTI DSP COR IVA OUR IND COM "'), ('Pragma', 'no-cache'), ('Server', 'BWS/1.1'), ('Set-Cookie', 'BD_NOT_HTTPS=1; path=/; Max-Age=300'), ('Set-Cookie', 'BIDUPSID=DE34D47F2FC8B2BDA02B9CD97ECB0DD5; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com'), ('Set-Cookie', 'PSTM=1565772432; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com'), ('Strict-Transport-Security', 'max-age=0'), ('X-Ua-Compatible', 'IE=Edge,chrome=1'), ('Connection', 'close')]
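The example above calls decode() with no argument, which assumes the body is UTF-8. As a hedged sketch, the charset can instead be read from the Content-Type header when the server declares one. fetch_text is a hypothetical helper name, and falling back to UTF-8 with errors="replace" is an assumption made for illustration, not something the original example does.

```python
import urllib.request


def fetch_text(url, timeout=10.0):
    """Fetch url and decode the body using the declared charset.

    fetch_text is a hypothetical helper; the UTF-8 fallback is an assumption.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # response.headers is an email.message.Message, so
        # get_content_charset() returns the charset parameter of the
        # Content-Type header, or None if the header does not declare one.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

In the headers shown above, Content-Type is 'text/html' with no charset parameter, so this sketch would fall back to UTF-8 for that response.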
Sending a request with data:
import urllib.parse
import urllib.request
# urlopen() requires bytes, so URL-encode the form fields and convert the result
data = bytes(urllib.parse.urlencode({'world': 'hello'}), encoding='utf8')
response = urllib.request.urlopen('http://httpbin.org/post', data=data)
print(response.read().decode())
Output:
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "world": "hello"
  },
  "headers": {
    "Accept-Encoding": "identity",
    "Content-Length": "11",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "Python-urllib/3.7"
  },
  "json": null,
  "origin": "111.194.50.110, 111.194.50.110",
  "url": "https://httpbin.org/post"
}
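To see what is actually sent as the POST body, the encoding step can be run on its own, without any network access. This is a small sketch; encode_form is a hypothetical helper name introduced here for illustration.

```python
import urllib.parse


def encode_form(fields):
    """Encode a dict of form fields as application/x-www-form-urlencoded bytes.

    encode_form is a hypothetical helper used only for illustration.
    """
    # urlencode() returns a str such as 'world=hello';
    # encode() turns it into the bytes that urlopen() expects for data.
    return urllib.parse.urlencode(fields).encode("utf-8")


body = encode_form({"world": "hello"})
print(body)       # b'world=hello'
print(len(body))  # 11
```

The length of 11 bytes matches the Content-Length header echoed back in the output above.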
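urlopen() can issue the most basic requests, but its few simple parameters are not enough to build a complete request. If the request needs additional information such as headers, build it with the more powerful Request class and pass the resulting object to urlopen(). This makes Request a practical choice for simulating browser requests and scraping data.
An example: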
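# Build the request with the Request class
import urllib.parse
from urllib import request
# fake_useragent is a third-party package (pip install fake-useragent)
from fake_useragent import UserAgent
url = 'http://httpbin.org/post'
# send a random browser User-Agent instead of the default Python-urllib one
headers = {
    'User-Agent': UserAgent().random
}
# named form_data so that the built-in dict is not shadowed
form_data = {
    'name': 'ccc'
}
data = bytes(urllib.parse.urlencode(form_data), encoding='utf8')
req = request.Request(url=url, data=data, headers=headers, method='POST')
response = request.urlopen(req)
print(response.read().decode())
Output: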
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "name": "ccc"
  },
  "headers": {
    "Accept-Encoding": "identity",
    "Content-Length": "8",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.15 (KHTML, like Gecko) Chrome/24.0.1295.0 Safari/537.15"
  },
  "json": null,
  "origin": "111.194.50.110, 111.194.50.110",
  "url": "https://httpbin.org/post"
}
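A Request object can also be inspected before it is sent, which helps when checking that the method, body and headers are what you intended. The sketch below makes no network call. build_post_request is a hypothetical helper name, and the fixed User-Agent string "example-client/0.1" is an assumption standing in for the random value from fake_useragent.

```python
import urllib.parse
from urllib import request


def build_post_request(url, fields, user_agent):
    """Build (but do not send) a form-encoded POST request.

    build_post_request is a hypothetical helper used only for illustration.
    """
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return request.Request(
        url=url,
        data=body,
        headers={"User-Agent": user_agent},
        method="POST",
    )


req = build_post_request(
    "http://httpbin.org/post", {"name": "ccc"}, "example-client/0.1"
)
print(req.get_method())              # POST
print(req.data)                      # b'name=ccc'
# Request stores header names capitalized, so the key is 'User-agent'.
print(req.get_header("User-agent"))  # example-client/0.1
```

Passing req to request.urlopen() would then send it, as in the example above. The 8-byte body b'name=ccc' corresponds to the Content-Length of 8 in the echoed headers.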