Learning Python web scraping: request parameters in the requests module (part 1)

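I. Overview of all the parameters

1. Below is the request function from the source code of the requests library, with every parameter it accepts. There are quite a few, and we will not need all of them, so based on my own experience I will go through the ones I use most often.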


def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`.

    :param method: method for the new :class:`Request` object: ``GET``, ``OPTIONS``, ``HEAD``, ``POST``, ``PUT``, ``PATCH``, or ``DELETE``.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary, list of tuples or bytes to send
        in the query string for the :class:`Request`.
    :param data: (optional) Dictionary, list of tuples, bytes, or file-like
        object to send in the body of the :class:`Request`.
    :param json: (optional) A JSON serializable Python object to send in the body of the :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
    :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
        ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
        or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
        defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
        to add for the file.
    :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How many seconds to wait for the server to send data
        before giving up, as a float, or a :ref:`(connect timeout, read
        timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
    :param verify: (optional) Either a boolean, in which case it controls whether we verify
            the server's TLS certificate, or a string, in which case it must be a path
            to a CA bundle to use. Defaults to ``True``.
    :param stream: (optional) if ``False``, the response content will be immediately downloaded.
    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response

    Usage::

      >>> import requests
      >>> req = requests.request('GET', 'https://httpbin.org/get')
      >>> req
      <Response [200]>
    """

    # By using the 'with' statement we are sure the session is closed, thus we
    # avoid leaving sockets open which can trigger a ResourceWarning in some
    # cases, and look like a memory leak in others.
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)
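
To see where these keyword arguments end up without sending anything over the network, one option is to build a requests.Request object and call prepare() on it. The sketch below is only an illustration: the page parameter and the User-Agent value are made up for the example, and the httpbin URL is the one used in the docstring above.

```python
import requests

# Build a request object but do not send it; prepare() applies the same
# encoding that requests would apply before sending.
req = requests.Request(
    method='GET',
    url='https://httpbin.org/get',
    params={'page': '1'},                   # hypothetical query parameter
    headers={'User-Agent': 'demo-client'},  # hypothetical header value
)
prepared = req.prepare()

print(prepared.method)                 # GET
print(prepared.url)                    # https://httpbin.org/get?page=1
print(prepared.headers['User-Agent'])  # demo-client
print(prepared.body)                   # None, a GET without data has no body
```

The same prepared object is what a Session would send, so this is a convenient way to inspect a request before running it against a real site.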

II. Query parameters

1. Let me pick an arbitrary example. Take the following URL:

https://pic.sogou.com/pics?query=%E9%A3%8E%E6%99%AF&mood=7&dm=0&mode=1

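2. In a URL like this, the part before the ? is the address, and the key=value pairs after it are the query parameters.

3. These query parameters can also be built with the params keyword argument. For example, the two snippets below mean the same thing.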


import requests
url = 'https://pic.sogou.com/pics?query=%E9%A3%8E%E6%99%AF&mood=7&dm=0&mode=1'
response = requests.get(url=url)
print(response.request.url)



import requests
url = 'https://pic.sogou.com/pics'  # a trailing ? is optional
params = {
    'query': '风景',
    'mood': '7',
    'dm': '0',
    'mode': '1',
}
response = requests.get(url=url, params=params)
print(response.request.url)
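
A quick way to check that the two forms really produce the same URL, without contacting the site, is to prepare both requests and compare the results. This is a minimal sketch; the variable names are mine and are not part of the original snippets.

```python
import requests

full_url = 'https://pic.sogou.com/pics?query=%E9%A3%8E%E6%99%AF&mood=7&dm=0&mode=1'
base_url = 'https://pic.sogou.com/pics'
params = {'query': '风景', 'mood': '7', 'dm': '0', 'mode': '1'}

# prepare() builds the final URL without sending the request
url_from_string = requests.Request('GET', full_url).prepare().url
url_from_params = requests.Request('GET', base_url, params=params).prepare().url

print(url_from_string)
print(url_from_params)
print(url_from_string == url_from_params)  # True
```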

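4. You have probably noticed that the value of query looks different. It has not actually changed; it is just written differently in the two places. The reason is URL encoding: a URL may only contain ASCII characters, so Chinese characters are percent-encoded automatically, with each byte of their UTF-8 representation written as % followed by two hexadecimal digits.

5. The functions for encoding and decoding are as follows: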

import requests

# requests.utils.quote percent-encodes the given string
print(requests.utils.quote('风景'))
# requests.utils.unquote decodes a percent-encoded string
print(requests.utils.unquote('%E9%A3%8E%E6%99%AF'))
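
To make the "% followed by hex digits" rule concrete, the sketch below rebuilds the encoded form by hand from the UTF-8 bytes of the string and compares it with the standard library's urllib.parse.quote (as far as I know, requests.utils.quote is that same function re-exported). The helper name percent_encode_all is hypothetical, and it deliberately encodes every byte, so it only matches quote for text that contains no ASCII characters.

```python
from urllib.parse import quote, unquote


def percent_encode_all(text):
    """Encode every UTF-8 byte of text as %XX (illustrative helper)."""
    return ''.join('%{:02X}'.format(byte) for byte in text.encode('utf-8'))


word = '风景'
manual = percent_encode_all(word)

print(manual)                   # %E9%A3%8E%E6%99%AF
print(manual == quote(word))    # True
print(unquote(manual) == word)  # True
```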

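III. Request parameters (the data sent by a POST request is not part of the URL, so you cannot find it in the address bar; look at it in code)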

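1. When sending POST requests we often build the parameters this way, for example when the page loads its data dynamically. I will use KFC as the example here.

2. First open the KFC website and click the restaurant lookup link at the bottom. Then open the developer tools, click Network, and then Fetch/XHR. Type the place whose KFC stores you want to look up into the search box and click search. Open the request that shows up, as in the figures.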

[Figures: screenshots of the request in the browser developer tools]

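3. The Form Data section holds the request parameters that were sent. We build them with the data argument. Here is the code, which is the most direct way to show it.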

import requests
url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx'
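params = {'op': 'keyword'}  # query parameter, goes into the URL
data = {
    'cname': '',
    'pid': '',
    'keyword': '黄石',
    'pageIndex': '1',
    'pageSize': '10',
}  # request parameters sent in the body

# the data keyword argument carries the form fields of the POST request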
response = requests.post(url=url, params=params, data=data)
json_data = response.json()
print(json_data)
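
To see how params and data are split between the URL and the body, you can again prepare the request instead of sending it. This is a minimal sketch that reuses the URL and fields from the code above; nothing is sent to the KFC server, and parse_qs is used only to decode the body back into readable form.

```python
import requests
from urllib.parse import parse_qs

url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx'
params = {'op': 'keyword'}
data = {'cname': '', 'pid': '', 'keyword': '黄石', 'pageIndex': '1', 'pageSize': '10'}

# Build the request without sending it
prepared = requests.Request('POST', url, params=params, data=data).prepare()

print(prepared.url)                      # params end up in the query string
print(prepared.headers['Content-Type'])  # application/x-www-form-urlencoded
print(prepared.body)                     # data ends up form-encoded in the body

# Decode the body again to confirm that the fields survived the encoding
decoded = parse_qs(prepared.body, keep_blank_values=True)
print(decoded['keyword'])                # ['黄石']
```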

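IV. Using the proxies parameter

1. This parameter sets up a proxy. Later in our scraping work we run the risk of having our IP blocked, and at that point we need to fetch the data through someone else's IP.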


import requests
# This function fetches a proxy IP. You will not be able to run it; it only shows the usage
def get_proxy():
    url = 'http://zltiqu.pyhttp.taolop.com/getip?count=1&neek=13873&type=2&yys=0&port=2&sb=&mr=2&sep=0&ts=1'
    response = requests.get(url=url)
    json_data = response.json()
    # print(json_data)

    ip_port = json_data['data'][0]['ip'] + ":" + str(json_data['data'][0]['port'])
    # print(ip_port)

    proxies = {
        "http": "http://" + ip_port,
        "https": "http://" + ip_port,
    }
    return proxies  # return the proxy mapping

proxies = get_proxy()
print(proxies)
url = 'https://www.ku6.com/index'
# the proxies keyword argument sends the request through the proxy; a poor-quality proxy raises requests.exceptions.ProxyError
response = requests.post(url=url, proxies=proxies)
print(response.status_code)
print(response.text)
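
Since a bad proxy raises an exception rather than returning a response, it usually helps to wrap the call. The following is only a sketch under assumptions: build_proxies and fetch_via_proxy are hypothetical helper names, the 10-second timeout is an arbitrary choice, the request uses GET (my assumption for fetching a page), and the address in the usage comment is a placeholder, not a working proxy.

```python
import requests


def build_proxies(ip, port):
    """Return the mapping that requests expects for its proxies argument."""
    address = 'http://{}:{}'.format(ip, port)
    return {'http': address, 'https': address}


def fetch_via_proxy(url, proxies, timeout=10):
    """Fetch url through the proxy; return None if the proxy fails or stalls."""
    try:
        return requests.get(url, proxies=proxies, timeout=timeout)
    except requests.exceptions.ProxyError:
        # The proxy refused the connection or could not reach the target
        return None
    except requests.exceptions.Timeout:
        # No answer arrived within the time limit
        return None


# Usage (placeholder address, replace it with a real proxy):
# response = fetch_via_proxy('https://www.ku6.com/index', build_proxies('127.0.0.1', 8080))
# if response is not None:
#     print(response.status_code)
```

Returning None keeps the caller simple: it can ask for a fresh proxy and retry without having to know which exception was raised.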

2. That is how a proxy is used. Later on I will also cover how to find proxy IPs.


Reposted from blog.csdn.net/m0_74459049/article/details/130763679