Using Python to collect Dongchedi's comprehensive word-of-mouth data

Get into the habit of writing together! This is day 5 of my participation in the "Nuggets Daily New Plan · April Update Challenge".

All tutorials, source code, and software in this article are for technical research only. They do not involve deleting, modifying, adding to, or interfering with the functions of any computer information system, nor do they affect its normal operation. Do not use the code for illegal purposes; in case of infringement, it will be deleted!


Requirement

Collect the comprehensive word-of-mouth statistics for every Dongchedi car series.

Operating environment

  • Windows 10
  • Google Nexus 5X (rooted)
  • Python 3.9
  • Charles

Requirement analysis

First, check whether the web version exposes a usable data interface: open the word-of-mouth page of any car model, press F12, and look at the Network tab. Searching the requests for keywords shown on the page turns up no obvious data interface. The data could still be parsed straight from the page with requests or Selenium, but that is not the preferred approach, so I decided to analyze the app instead. PS: Configuring the phone and the packet-capture environment is not repeated here; if you are interested, see my earlier article on APP packet-capture environment configuration.

Download the Dongchedi app and install it on the phone. Open Postern on the phone and Charles on the PC.

With packet capture ready, open the app and go to the word-of-mouth page of any car model. As before, search the captured traffic for keywords shown on the page. The last two matches are clearly not needed, while the first four all come from the same interface, so its response should contain the data we want. Double-click it to see the details.
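The keyword search can also be scripted if the Charles session is exported as a HAR (HTTP Archive) file. The sketch below is a hypothetical helper, not part of the original workflow. It assumes the standard HAR layout (`log.entries[].request.url` and `response.content.text`) and groups matching entries by host and path, which makes it easy to see when several hits come from one interface. The sample data and `example.com` host are invented.

```python
import base64
import json
from collections import defaultdict
from urllib.parse import urlsplit


def find_keyword(har: dict, keyword: str) -> dict:
    """Group HAR entries whose response body contains keyword by host + path."""
    hits = defaultdict(list)
    for entry in har.get("log", {}).get("entries", []):
        content = entry.get("response", {}).get("content", {})
        text = content.get("text") or ""
        # HAR may store a body base64-encoded; decode it before searching
        if content.get("encoding") == "base64":
            text = base64.b64decode(text).decode("utf-8", errors="replace")
        if keyword in text:
            parts = urlsplit(entry["request"]["url"])
            hits[parts.netloc + parts.path].append(entry["request"]["url"])
    return dict(hits)


# A tiny hand-made HAR for illustration; example.com is a placeholder host
sample_har = {"log": {"entries": [
    {"request": {"url": "https://example.com/get_detail/?series_id=1"},
     "response": {"content": {"text": '{"tag_name": "spacious"}'}}},
    {"request": {"url": "https://example.com/get_detail/?series_id=2"},
     "response": {"content": {
         "text": base64.b64encode(b'{"tag_name": "spacious"}').decode(),
         "encoding": "base64"}}},
    {"request": {"url": "https://example.com/other/"},
     "response": {"content": {"text": "{}"}}},
]}}
print(json.dumps(find_keyword(sample_har, "spacious"), indent=2))
```

For a real export, load the file with `json.load` (for example from a hypothetical `session.har`) and pass the result in place of `sample_har`.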

At first glance the response matches the page: its data structure and values closely mirror what the page displays. The Charles viewer is too small for comfortable reading, so I copied the data into an online JSON viewer for analysis. After carefully comparing it with the page, this interface turns out to be the comprehensive word-of-mouth interface we need:

https://*******/get_detail/?series_id=4182&car_id=0&only_owner=0&year_id=all&iid=2467735824764398&device_id=40011211486215&ac=wifi&channel=dcd-yd-11zh-and-74&aid=36&app_name=automobile&version_code=693&version_name=6.9.3&device_platform=android&os=android&ab_client=a1%2Cc2%2Ce1%2Cf2%2Cg2%2Cf7&ab_group=3167590%2C3577236%2C3333988&ssmix=a&device_type=Nexus+5X&device_brand=google&language=zh&os_api=27&os_version=8.1.0&manifest_version_code=693&resolution=1080*1794&dpi=420&update_version_code=6931&_rticket=1648907286543&cdid=f3163204-7faf-45d7-89c4-e82215c3216c&city_name=%E8%81%8A%E5%9F%8E&gps_city_name=%E8%81%8A%E5%9F%8E&selected_city_name&rom_version=27&longi_lati_type=1&longi_lati_time=1648907102913&content_sort_mode=0&total_memory=1.77&cpu_name=Qualcomm+Technologies%2C+Inc+MSM8992&overall_score=4.873&cpu_score=4.8872&host_abi=

Yes, you read that right: it really is that long. To verify the interface, request this URL directly in a browser; installing a browser extension that renders JSON is recommended. I was lazy and parsed the JSON with an online tool instead, and it matches the data captured by Charles. Analysis shows that series_id is the ID of the car series, so changing this parameter selects a different series.
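Only series_id needs to change, so the rest of the captured query string can be reused. The following is a minimal sketch with the standard `urllib.parse` module. `with_series_id` is a hypothetical helper, `example.com` stands in for the masked host, and 1234 is an arbitrary ID. Because `urlencode` re-encodes every value, a few characters may be encoded differently from the captured URL (for example, `*` becomes `%2A`). The sketch assumes the server accepts standard percent-encoding.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_series_id(url: str, series_id: int) -> str:
    """Return url with its series_id query parameter replaced."""
    parts = urlsplit(url)
    # keep_blank_values keeps empty parameters such as selected_city_name
    query = [
        (key, str(series_id) if key == "series_id" else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


# example.com stands in for the masked host; only a few parameters are kept
base = "https://example.com/get_detail/?series_id=4182&car_id=0&only_owner=0&year_id=all"
print(with_series_id(base, 1234))
```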

Get all car series IDs

Getting the car series IDs is simple: first get the brand ID, then use it to request that brand's list of car series. Note that this is a POST interface.

import requests


def get_series(self, brand_id):
    """
    Get all car series of a brand
    brand_id: brand ID
    """
    # url: the POST endpoint that returns a brand's series list
    # (captured in Charles; the article does not reproduce it)
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36'}
    param = {
        'offset': 0,
        'limit': 1000,
        'is_refresh': 1,
        'city_name': '北京',  # Beijing
        'brand': brand_id,
    }
    # This interface expects a POST with form-encoded data
    response = requests.post(url=url, data=param, headers=headers)
    rep_json = response.json()
    if rep_json['status'] == 'success':
        return rep_json['data']['series']
    else:
        raise Exception("get car series has exception!")

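The article does not show how the brand IDs themselves are obtained. Given a list of them, a small loop can call `get_series` once per brand and gather unique series IDs for the `series_id` parameter. This is a hypothetical helper. In particular, the `series_id` key inside each returned series object is a guess, so it is exposed as the `id_key` parameter. The fetch function is passed in so that the network call stays separate and the loop can be tested with fake data.

```python
from typing import Callable, Iterable


def collect_series_ids(
    brand_ids: Iterable[int],
    fetch_series: Callable[[int], list],
    id_key: str = "series_id",  # assumed field name; check the real response
) -> list:
    """Call fetch_series (e.g. get_series) per brand and gather unique series IDs."""
    seen = set()
    ids = []
    for brand_id in brand_ids:
        for series in fetch_series(brand_id):
            series_id = series.get(id_key)
            # Skip entries without the key and series already collected
            if series_id is not None and series_id not in seen:
                seen.add(series_id)
                ids.append(series_id)
    return ids


# Fake responses standing in for get_series, keyed by brand ID
fake = {1: [{"series_id": 10}, {"series_id": 11}], 2: [{"series_id": 11}, {"name": "x"}]}
print(collect_series_ids([1, 2], fake.__getitem__))
```

With the real class, you would pass the bound method (for example `spider.get_series`, where `spider` is a hypothetical instance) as `fetch_series`.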

Obtain the comprehensive word-of-mouth score of the car series

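    def get_score(self, series_id):
        """
        Get the comprehensive rating of a car series
        series_id: car series ID
        """
        # url: the get_detail interface shown above, with its series_id
        # parameter set to this series_id (_parse_url is a request helper
        # from the author's class that the article does not show)
        response = self._parse_url(url).json()
        tag_list = response.get('data').get('tab_info').get('tag_list')
        # Merits (pros): tags with positive sentiment, formatted as "name(count)"
        merits = [f"{i.get('tag_name')}({i.get('count')})" for i in tag_list if i.get('sentiment') == 1]
        # Defects (cons): tags with negative sentiment
        defects = [f"{i.get('tag_name')}({i.get('count')})" for i in tag_list if i.get('sentiment') == -1]
        return [merits, defects]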
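The goal is word-of-mouth statistics across every series, so it can help to store the tags as rows instead of the `name(count)` strings that `get_score` builds. The sketch below is one possible layout, not the author's. It reads only the fields `get_score` uses (`tag_name`, `count`, `sentiment`), takes the already-parsed JSON so it needs no network access, and writes CSV with the standard `csv` module. The column names and sample values are invented for illustration.

```python
import csv
import io


def tags_to_rows(series_id, detail: dict) -> list:
    """Flatten the tag_list of one parsed get_detail response into rows."""
    data = detail.get("data") or {}
    tab_info = data.get("tab_info") or {}
    rows = []
    for tag in tab_info.get("tag_list") or []:
        # Same convention as get_score: sentiment 1 = merit, -1 = defect;
        # any other value is skipped
        kind = {1: "merit", -1: "defect"}.get(tag.get("sentiment"))
        if kind:
            rows.append([series_id, tag.get("tag_name"), tag.get("count"), kind])
    return rows


def write_csv(rows: list) -> str:
    """Render rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["series_id", "tag", "count", "type"])  # hypothetical column names
    writer.writerows(rows)
    return buf.getvalue()


# Made-up sample shaped like the fields get_score reads
sample_detail = {"data": {"tab_info": {"tag_list": [
    {"tag_name": "spacious", "count": 120, "sentiment": 1},
    {"tag_name": "noisy", "count": 15, "sentiment": -1},
    {"tag_name": "average", "count": 3, "sentiment": 0},
]}}}
print(write_csv(tags_to_rows(4182, sample_detail)), end="")
```

Rows from several series can be concatenated before calling `write_csv`, and the returned text can then be written to a file.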

Running result


Download

download.csdn.net/download/qq…


This article is for learning and exchange only; in case of infringement, it will be deleted!


Origin juejin.im/post/7082779776011730981