Anti-crawling guide: the surprising tool the "All or Nothing" scammers use to steal user information

Table of contents

What is a web crawler

Illegal data theft by crawlers and platform anti-crawling

Full-process anti-crawling solution

Verification codes in the AI era


"All or Nothing" is currently in theaters. The film tells the story of programmer Pan Sheng who was abducted to an overseas "company" by the lure of a high salary from an overseas online fraud team, and was forced to engage in fraud activities by Lu Bingkun and An Juncai. Finally, after helping Anna, a Chinese who was also deceived, to escape, The story of successful rescue through the cooperation between the Chinese police and the Foreign Immigration Bureau.

In the film, Pan Sheng is drawn in by the offer of overseas work and unwittingly lands in an overseas fraud factory. The first task he is forced to perform after joining the online telecom fraud company is to use crawler software to harvest the email addresses of subtitle-group members and send them links to online gambling sites.

At the end of the film, Manager Lu, the ringleader of the fraud company, is sentenced to death; Anna receives a two-year prison sentence for fraud; and Pan Sheng, who provided a list of more than 2,000 victims, is credited with meritorious service and exempted from criminal liability by the court.


What is a web crawler

Web crawlers, also known as web spiders or web robots, are programs or scripts that automatically fetch information and data from the network according to certain rules. In plain terms, a crawler simulates human behavior, replacing manual operations with a program: it jumps from one link to the next, traversing web pages as if crawling across the Internet. A crawler can jump, open, and browse far faster than a human, and it reaches far deeper into a site's pages, hence the name "web crawler."
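
To make the traversal concrete, here is a minimal sketch of such a link-following crawler. It assumes Python with the third-party requests and BeautifulSoup libraries, neither of which the article itself names; a real crawler is far more elaborate.

```python
# Minimal link-following crawler sketch. Assumes the third-party
# requests and BeautifulSoup libraries; neither is named in the article.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def crawl(start_url, max_pages=10):
    """Fetch a page, collect its links, and keep jumping to new ones."""
    seen, queue = set(), [start_url]
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            resp = requests.get(url, timeout=5)
        except requests.RequestException:
            continue  # skip unreachable pages and move on
        soup = BeautifulSoup(resp.text, "html.parser")
        for a in soup.find_all("a", href=True):
            queue.append(urljoin(url, a["href"]))  # the next link to "crawl" to
    return seen
```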

Web crawlers can illegally harvest content, images, reviews, and personal information from the Internet. The stolen data is not only sold commercially; it may also be used by black- and gray-market operations to build fake websites and run phishing scams, causing significant financial losses to individuals and businesses.


Illegal data theft by crawlers and platform anti-crawling

Malicious crawling and platform anti-crawling form a dynamic contest of attack and defense, which can be roughly divided into three stages.

The first stage is restricting IPs and accounts. Initially, a website's anti-crawling measures simply denied any access that did not originate from a browser: when a malicious crawler connected, it received a 403 response code or a "Sorry, cannot access" prompt.
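
As an illustration, a first-stage check might look like the following sketch: a hypothetical Flask service that rejects requests whose User-Agent header does not resemble a browser. The framework and the marker list are assumptions for illustration, not any particular platform's implementation.

```python
# Sketch of the first-stage defence: refuse requests whose User-Agent
# does not look like a browser. Flask and the marker list are
# illustrative assumptions, not any particular platform's rules.
from flask import Flask, abort, request

app = Flask(__name__)

BROWSER_MARKERS = ("Mozilla", "Chrome", "Safari", "Edge")

@app.before_request
def block_non_browsers():
    ua = request.headers.get("User-Agent", "")
    if not any(marker in ua for marker in BROWSER_MARKERS):
        abort(403)  # the 403 / "Sorry, cannot access" response described above

@app.route("/")
def index():
    return "Welcome"
```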

The second stage is verification-code interception. To bypass the first-stage defenses, crawlers set their own Headers to impersonate a browser and scrape static pages at scale across multiple threads. In response, websites and platforms restrict and intercept accounts and devices that frequently switch User-Agent strings or rotate proxy IPs: when the same IP or device exceeds a visit threshold within a given period, the system automatically limits its access, and when a visitor's request count climbs too high, the request is redirected to a verification-code page and browsing can continue only after the correct code is entered.
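
A sliding-window rate limit of the kind described above could be sketched like this; the 100-requests-per-minute threshold and the in-memory store are illustrative assumptions.

```python
# Sliding-window rate limit per IP, with a CAPTCHA diversion once the
# threshold is exceeded. The 100-requests-per-minute limit and the
# in-memory store are assumptions for the sketch.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_REQUESTS = 100

recent_hits = defaultdict(deque)  # ip -> timestamps of recent requests

def check_visitor(ip):
    """Return 'allow', or 'captcha' when the visitor is too frequent."""
    now = time.time()
    window = recent_hits[ip]
    window.append(now)
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()  # drop hits that fell out of the window
    return "captcha" if len(window) > MAX_REQUESTS else "allow"
```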

The third stage is technical protection through dynamic web pages. As anti-crawling technology advanced, crawlers upgraded too: they can automatically recognize and fill in verification codes to bypass secondary verification, and they run multiple accounts in parallel with IP proxy tools to evade account and IP restrictions. In response, many websites and platforms adopted dynamic web page technology. With dynamic pages, URLs are not fixed; the backend interacts with front-end users in real time to handle queries, submissions, and other actions, and different users visiting the same URL at different times receive different pages. Compared with traditional static pages, dynamic pages effectively shield important data and curb malicious crawling by web crawlers.
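
The sketch below illustrates, under assumed details, why such pages frustrate static scrapers: the same URL returns different markup on every request, so hard-coded parsing rules stop working.

```python
# Illustrative sketch (assumed details, not a specific platform's code)
# of why dynamic pages frustrate static scrapers: the same URL returns
# different markup on every request, so hard-coded parsing rules break.
import secrets
import time

from flask import Flask

app = Flask(__name__)

@app.route("/prices")
def prices():
    # Element IDs and the timestamp change per request; the real data
    # would be fetched afterwards by front-end code tied to this token.
    token = secrets.token_hex(4)
    stamp = int(time.time())
    return f'<div id="p-{token}" data-ts="{stamp}">loading…</div>'
```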

To bypass these newer defenses, crawlers turn to browser-automation tools such as Selenium and PhantomJS to simulate human operations end to end. At this point, crawler attacks have become intelligent and complex enough that simply limiting visit counts and encrypting front-end pages no longer offers effective protection.
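
For illustration of the attacking side, a headless-browser crawl with Selenium might look like the following sketch (PhantomJS development was suspended in 2018, so Chrome is used here; the target URL is a placeholder):

```python
# Sketch of the headless-browser crawling described above, using
# Selenium with Chrome (PhantomJS development was suspended in 2018).
# The target URL is a placeholder.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

options = Options()
options.add_argument("--headless=new")  # full browser, no visible window
driver = webdriver.Chrome(options=options)
driver.get("https://example.com")
# Because a real browser executes the JavaScript, even dynamically
# rendered content becomes visible to the crawler.
links = [a.get_attribute("href") for a in driver.find_elements(By.TAG_NAME, "a")]
driver.quit()
```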


Full-process anti-crawling solution

Illegal crawling is becoming ever more intelligent and complex, and simply limiting visit counts and encrypting front-end page display cannot provide effective protection. Human-machine recognition technology must improve so that black-market actors can be identified and intercepted and the cost of their illegal activity raised. Dingxiang's full-process, multi-layered prevention and control measures effectively block malicious crawling and keep e-commerce websites secure.

The approach regularly probes the operating environment of the platform and its apps, hardens the app and client, and encrypts communication links to ensure end-to-end security across the whole chain. At the same time, the Dingxiang defense cloud, risk-control engine, and intelligent model platform are deployed to build a multi-dimensional defense system.

The Dingxiang risk-control engine identifies malicious crawler behavior from the requests seen in business query scenarios, the device-fingerprint information collected on the client, and user behavior data, and, guided by security prevention and control strategies, effectively recognizes and intercepts malicious crawling. As business conditions, crawling risks, and anti-crawling strategies change, the Dingxiang intelligent model platform helps enterprises build their own risk-control models and update security policies in real time, intercepting all kinds of malicious crawling risks.
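
As a toy illustration of such rule-based human-machine recognition, consider the sketch below; the signals, weights, and thresholds are invented for the example and are not Dingxiang's actual model.

```python
# Toy rule-based human-machine scoring. The signals, weights, and
# thresholds are invented for this sketch; a real engine combines
# device fingerprints, behavior data, and trained models.
def risk_score(event):
    score = 0
    if event.get("requests_per_minute", 0) > 120:
        score += 40  # request rate no human reaches
    if event.get("distinct_user_agents", 1) > 3:
        score += 30  # frequent User-Agent switching
    if event.get("is_proxy_ip", False):
        score += 20  # known proxy or datacenter IP
    if event.get("mouse_events", 0) == 0:
        score += 10  # no pointer activity on the page
    return score

def decide(event, block_at=70, challenge_at=40):
    s = risk_score(event)
    if s >= block_at:
        return "block"
    return "captcha" if s >= challenge_at else "allow"
```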


Verification codes in the AI era

Verification codes (CAPTCHAs) are an important technology for preventing data theft and have therefore become a prime cracking target for black- and gray-market operators. The Dingxiang CAPTCHA defends by verifying environmental information, providing double security: an endless supply of generated verification images plus verification of the environment in which each challenge is solved.

First, the Dingxiang CAPTCHA, built on AIGC technology, can continuously generate new verification images, which greatly raises the cost of recognition and cracking for black- and gray-market operators and makes the verification elements far harder to identify. Using deep learning and neural networks, it generates images and elements that are difficult to predict or repeat, and it mixes dynamically changing factors such as timestamps or random numbers into the verification process, further raising the difficulty of cracking and effectively resisting machine attacks.
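
A minimal sketch of binding each generated image to one-time, time-limited parameters might look like this; the field names and the 120-second lifetime are assumptions for illustration.

```python
# Sketch of binding each generated CAPTCHA image to one-time,
# time-limited parameters. Field names and the 120-second lifetime
# are assumptions; image generation itself is out of scope here.
import secrets
import time

CHALLENGE_TTL = 120  # seconds a challenge stays valid (assumed value)

def issue_challenge():
    return {
        "challenge_id": secrets.token_urlsafe(16),  # unpredictable, single use
        "nonce": secrets.token_hex(8),  # random factor mixed into the image
        "issued_at": time.time(),       # timestamp as a dynamic factor
    }

def is_fresh(challenge):
    return time.time() - challenge["issued_at"] <= CHALLENGE_TTL
```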

Second, the Dingxiang CAPTCHA integrates real-time stream computation with scenario-specific strategies, combining machine-learning-trained human-machine models with correlation analysis of historical data. Through graphical algorithms and AI models it builds machine-learning models of the behavioral trajectories users generate, and, together with dimensions such as access frequency, geographic location, and historical records, it returns fast and accurate human-versus-machine judgments. During verification it collects identifiable environmental information, applies configured rules and strategies, and flags potentially malicious requests for secondary verification or interception: for example, checking whether the environment recorded when verification was completed matches the environment reported with the token, intercepting IP addresses responsible for repeated attacks, and limiting the number of CAPTCHA attempts.
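
The consistency check described above could be sketched as follows; the compared fields and the limits are illustrative assumptions.

```python
# Sketch of the environment-consistency check described above. The
# compared fields and the limits are illustrative assumptions.
FAILED_IP_LIMIT = 5  # interceptions before an IP is blocked outright
MAX_ATTEMPTS = 3     # allowed CAPTCHA inputs per challenge

def verify_token(token_record, reported_env, failures_by_ip):
    ip = reported_env.get("ip")
    if failures_by_ip.get(ip, 0) >= FAILED_IP_LIMIT:
        return "intercept"  # IP has attacked repeatedly
    if token_record["attempts"] >= MAX_ATTEMPTS:
        return "intercept"  # too many verification attempts
    solved_env = token_record["solved_env"]
    for field in ("ip", "device_fingerprint", "user_agent"):
        if solved_env.get(field) != reported_env.get(field):
            return "second_verification"  # environment changed since solving
    return "pass"
```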

Origin: blog.csdn.net/dingxiangtech/article/details/133134180