Information Security - Application Security - Customized White-Box Detection | Sharing Our Approach to Governing Unauthorized-Access Vulnerabilities

Table of contents

1. Background

2. Challenges

3. Governance objectives

4. Solutions

4.1 System Architecture

4.2 Authentication function

4.3 Alarm identification

4.4 Authentication points

5. The future direction of white box detection

6. Ultra vires governance

7. Summary


1. Background

In vulnerability scanning, the mainstream approaches are black-box scanning and white-box scanning. Source code security testing, that is, white-box scanning, is a very important part of the secure software development lifecycle (SDLC). Most traditional vulnerability types can be detected with tools, but unauthorized-access (privilege escalation) vulnerabilities are much harder for tools to catch. After strengthening its black-box tooling, Wuheng Lab also explored customized scanning to strengthen its white-box tooling. Below, we share how Wuheng Lab uses white-box tools to govern unauthorized-access vulnerabilities.

2. Challenges

Wuheng Lab reviewed its historical unauthorized-access vulnerabilities and found that the businesses involved are diverse and complex. Different businesses use different authentication models, and authentication is implemented in several ways (in function code, in middleware, in gateway plug-ins, in upstream and downstream RPC services, etc.). As a result, white-box scanning faces the following challenges when governing unauthorized-access vulnerabilities in business code:

First: for authentication implemented in code, the white-box engine needs to know which functions or code fragments perform authentication, and then determine whether the API's function call chain contains a corresponding authentication function. Authentication performed in a gateway or in upstream and downstream services is invisible to a traditional white-box engine and can only be covered through custom development. However, the white-box engine is a shared middle-platform tool, so it is not appropriate to build many service-specific authentication rules into the engine itself. Therefore, Wuheng Lab's business SDLC team built a system on top of the optimized middle-platform white-box engine that integrates data from all of these parties, to improve white-box detection capability.

Second: in many businesses, most code repositories are RPC repositories. An RPC repository exposes its RPC functions as HTTP interfaces by registering APIs in a gateway, and when a business has many gateways, large gateways may also nest smaller ones. Because requests pass through this gateway layer, the engine's API identification and taint-tracking capabilities are limited to some extent.

Third: some interfaces are public or otherwise do not require authentication. A traditional white-box engine cannot perform the semantic analysis needed to decide whether an interface requires authentication.

3. Governance objectives

Wuheng Lab's governance goal is to build a system that integrates business data from all parties to decide whether an API should raise an alarm. The system judges whether an API carries an unauthorized-access risk by answering three questions: should the API be authenticated, does it call an authentication function, and does it call the correct authentication functions. Each question brings its own requirement. First, should it be authenticated? The system needs semantic analysis capability to determine whether the API is a public interface or otherwise does not require authentication. Second, is authentication called at all? The system needs complete data on all authentication functions. Third, is the authentication correct? Once the system has the API's authentication data, it needs rules that specify which authentication functions the API should call, to catch incomplete authentication (for example, an API that should call authentication functions A and B but actually calls only A) that leads to an unauthorized-access vulnerability.

Here, "authentication function" is used in a broad sense. It covers authentication functions in code, authentication middleware in code, gateway authentication plug-ins, authentication functions in upstream and downstream RPC repositories, and so on.

4. Solutions

4.1 System Architecture

This section describes the system architecture in detail, as shown in the figure below. The first component is the API function-name identification module, which collects the API data of all gateways. For repositories whose gateway uses wildcard (pan-)routing while the actual API mapping is done in the code repository, the system provides script plug-ins that crawl each repository's API data; these plug-ins run before every scan. The union of the data from these three modules forms the complete API data for each repository.
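The union step can be sketched as follows. This is a minimal illustration, assuming each source reports records of the hypothetical form (repository, HTTP path, handler function); the article does not name the third source, so `third_source_apis` below is only a placeholder. Freezing the dataclass makes records hashable, so identical reports from several sources collapse into one entry.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class ApiRecord:
    """One HTTP API mapped to its handler function (hypothetical schema)."""
    repo: str
    http_path: str
    handler: str  # function name in the code repository


def merge_api_sources(*sources: List[ApiRecord]) -> Set[ApiRecord]:
    """Return the union of the API records reported by every data source."""
    merged: Set[ApiRecord] = set()
    for source in sources:
        merged.update(source)  # duplicates across sources collapse here
    return merged


# Made-up data: gateway collection, script plug-in, and a placeholder third source.
gateway_apis = [ApiRecord("repo-a", "/work/c", "HandleC")]
plugin_apis = [ApiRecord("repo-a", "/work/d", "HandleD")]
third_source_apis = [ApiRecord("repo-a", "/work/c", "HandleC")]

all_apis = merge_api_sources(gateway_apis, plugin_apis, third_source_apis)
print(sorted(r.http_path for r in all_apis))  # ['/work/c', '/work/d']
```

A set union is the simplest choice; a real system would likely also need to reconcile conflicting handler mappings reported by different sources.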

4.2 Authentication function

Wuheng Lab helps business R&D engineers identify their authentication functions and label the scenario each one covers. Once this is done, the authentication functions are imported into the platform, which then calls the scan interface of the middle-platform white-box engine to start a scan.

During the scan, the white-box engine checks whether each given authentication function name appears in the API's function call chain, and finally produces a mapping table between APIs and authentication functions. Beyond that, the middle-platform white-box engine keeps iterating and abstracting common rules that apply across business lines. These rules generate risk labels that mark whether an API is risky or whether authentication is performed somewhere along the entire call chain. Because these rules are general-purpose, relatively few labels are generated so far, and they may fit specific services less well. After the scan, the middle-platform white-box engine sends its results back to the platform, which adds them to each API's authentication function data (with the sources white box_authentication function and white box_risk label, respectively). The platform's alarm identification module then performs alarm identification.
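The call-chain check can be pictured as a reachability search. The sketch below is a simplification, assuming the engine has already produced a call graph as a plain dictionary from caller to callees; the real engine's data model and its taint analysis are not described in the article, and all function names here are hypothetical. A breadth-first walk from each API handler collects every known authentication function it can reach, which yields the API-to-authentication-function table.

```python
from collections import deque
from typing import Dict, List, Set


def auth_functions_on_chain(call_graph: Dict[str, List[str]], entry: str,
                            auth_functions: Set[str]) -> Set[str]:
    """Breadth-first walk from an API handler; return reachable auth functions."""
    found: Set[str] = set()
    seen = {entry}
    queue = deque([entry])
    while queue:
        func = queue.popleft()
        if func in auth_functions:
            found.add(func)
        for callee in call_graph.get(func, []):
            if callee not in seen:  # the seen set also guards against cycles
                seen.add(callee)
                queue.append(callee)
    return found


def build_api_auth_table(call_graph: Dict[str, List[str]],
                         api_handlers: Dict[str, str],
                         auth_functions: Set[str]) -> Dict[str, Set[str]]:
    """Map each HTTP path to the authentication functions its handler reaches."""
    return {path: auth_functions_on_chain(call_graph, handler, auth_functions)
            for path, handler in api_handlers.items()}


# Hypothetical call graph: HandleC -> loadUser -> CheckLogin; HandleD reaches none.
graph = {
    "HandleC": ["loadUser", "queryDB"],
    "loadUser": ["CheckLogin"],
    "HandleD": ["queryDB"],
}
table = build_api_auth_table(graph, {"/work/c": "HandleC", "/work/d": "HandleD"},
                             {"CheckLogin", "CheckAdmin"})
print(table)  # {'/work/c': {'CheckLogin'}, '/work/d': set()}
```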

4.3 Alarm identification

Next, we describe the decision logic inside the alarm identification engine in detail. It consists of five modules.

The first is the public interface module, which identifies whether the current interface is public. If it is, a pseudo authentication function named public interface is added to the API's authentication function data, with the source set to public_api.

The second is the empty implementation module. During operation, we found that some interfaces are committed to the master branch in advance while their bodies are still empty. Because such an interface has not been implemented yet, there is no need to run alarm identification on it. This module therefore decides whether the current interface is an empty implementation based on the function call stack count provided by the middle-platform white-box engine. Once R&D implements the interface, its call stack data changes, and the interface is re-activated so that alarm identification runs on it again. These first two modules mainly answer the question of whether authentication is required.
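The two modules above can be sketched as small state updates on a per-API record. This is a hedged illustration only: the threshold `EMPTY_IMPL_MAX_STACK` and the `ApiState` fields are hypothetical, since the article does not say how small a call stack must be to count as empty. The public interface module appends the pseudo record with source public_api; the empty implementation module skips an API while its stack is tiny and reports a re-activation when the stack count changes and grows past the threshold.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical threshold: a handler whose call stack has at most this many
# entries is treated as an empty implementation.
EMPTY_IMPL_MAX_STACK = 1


@dataclass
class ApiState:
    path: str
    auth_records: List[Tuple[str, str]] = field(default_factory=list)  # (name, source)
    needs_check: bool = True          # False while treated as an empty implementation
    last_stack_size: Optional[int] = None


def mark_public(api: ApiState, is_public: bool) -> None:
    """Public interface module: add a pseudo auth function with source public_api."""
    record = ("public interface", "public_api")
    if is_public and record not in api.auth_records:
        api.auth_records.append(record)


def update_empty_impl(api: ApiState, stack_size: int) -> bool:
    """Empty implementation module, driven by the white-box call stack count.

    Sets api.needs_check and returns True when a previously empty API has just
    been re-activated because its call stack data changed.
    """
    was_empty = not api.needs_check
    changed = api.last_stack_size is not None and stack_size != api.last_stack_size
    api.needs_check = stack_size > EMPTY_IMPL_MAX_STACK
    api.last_stack_size = stack_size
    return was_empty and changed and api.needs_check


api = ApiState("/work/e")
print(update_empty_impl(api, 1), api.needs_check)  # False False (empty, skipped)
print(update_empty_impl(api, 6), api.needs_check)  # True True (re-activated)
mark_public(api, True)
print(api.auth_records)  # [('public interface', 'public_api')]
```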

The third is the gateway authentication plug-in module. After each scan of a code repository, it pulls the authentication plug-in data configured for each API on the gateway and adds it to the API's authentication function data, with the source set to gateway plugin.

The fourth is the RPC authentication module. Wuheng Lab developed a separate system for this: given a set of upstream and downstream authentication functions, it returns whether the API calls the specified authentication functions upstream or downstream. The platform calls this system on every scan and stores the returned data, with the source set to rpc. These two middle modules mainly answer the question of whether authentication is present.
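Both middle modules follow the same pattern: query an external system, then append the evidence to the API's authentication data with a source tag. The sketch below assumes the gateway and the RPC analysis system can be wrapped as plain callables; `fetch_gateway_plugins`, `check_rpc_auth`, and the source spellings "gateway_plugin" and "whitebox_auth_function" are hypothetical stand-ins, not the platform's real interfaces.

```python
from typing import Callable, Dict, Iterable, List, Tuple

AuthRecord = Tuple[str, str]  # (auth function or plug-in name, source)


def add_external_auth(
    auth_table: Dict[str, List[AuthRecord]],
    fetch_gateway_plugins: Callable[[str], Iterable[str]],
    check_rpc_auth: Callable[[str, List[str]], Iterable[str]],
    rpc_auth_functions: List[str],
) -> None:
    """Merge gateway and RPC authentication evidence into the table in place."""
    for api, records in auth_table.items():
        # Third module: gateway authentication plug-ins configured for this API.
        for plugin in fetch_gateway_plugins(api):
            records.append((plugin, "gateway_plugin"))
        # Fourth module: specified upstream/downstream RPC auth functions it reaches.
        for func in check_rpc_auth(api, rpc_auth_functions):
            records.append((func, "rpc"))


# Stub data standing in for the gateway and the RPC analysis system.
gateway_plugins = {"/space/list": ["project_detect"]}
rpc_calls = {"/work/c": ["CheckTenant", "LogAccess"]}

table = {"/work/c": [("CheckLogin", "whitebox_auth_function")], "/space/list": []}
add_external_auth(
    table,
    lambda api: gateway_plugins.get(api, []),
    lambda api, funcs: [f for f in rpc_calls.get(api, []) if f in funcs],
    ["CheckTenant"],
)
print(table)
```

Passing the external lookups in as callables keeps the merge logic testable without a live gateway or RPC system.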

The last is the multi-step authentication module, which abstracts many customized detection plug-ins from business characteristics to verify whether authentication is correct. Each detection plug-in lets security operators configure detection rules that fit the characteristics of a business. The platform provides many such plug-ins, including plug-ins based on http_path, on file_path, and on input parameters.

(1) Detection plug-in based on http_path. Most applications have a management backend whose HTTP interfaces share a fixed path prefix, such as /admin/xxx. A management-backend interface must check whether the current user is an administrator of the application; without that check, there may be a vertical privilege escalation risk. Based on this rule, the system checks whether interfaces under /admin/xxx call a role authentication function and flags them as risky if they do not. As a practical example, a security operator configured the following rule in the system's http_path plug-in: all interfaces under /work/ in repository A must call the authentication function that checks whether the user is an administrator. After a scan, the system found that interface c (HTTP path /work/c) in repository A did not call that function, so the interface was flagged as an alarm.
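The http_path rule from the example reduces to a prefix match plus a membership test. This is a minimal sketch, assuming the API-to-authentication-function table is a dictionary of sets; `PathRule` and the function name `CheckAdmin` are hypothetical stand-ins for the operator-configured rule and the administrator check.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class PathRule:
    repo: str
    path_prefix: str    # e.g. "/work/"
    required_auth: str  # e.g. a hypothetical administrator check


def http_path_violations(rule: PathRule, repo: str,
                         api_auth: Dict[str, Set[str]]) -> List[str]:
    """Return the APIs under rule.path_prefix that never call rule.required_auth."""
    if repo != rule.repo:
        return []  # the rule is scoped to one repository
    return sorted(path for path, funcs in api_auth.items()
                  if path.startswith(rule.path_prefix)
                  and rule.required_auth not in funcs)


rule = PathRule("A", "/work/", "CheckAdmin")
apis = {"/work/b": {"CheckAdmin"}, "/work/c": {"CheckLogin"}, "/public/x": set()}
print(http_path_violations(rule, "A", apis))  # ['/work/c']
```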

(2) Detection plug-in based on input parameters. This plug-in requires an API that has a specified parameter name to call a specific authentication function. For example, in a cross-space scenario, if an API has a project_key parameter, it must go through the gateway authentication plug-in named project_detect; otherwise it is flagged as an alarm.
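The input-parameter rule has the same shape, keyed on parameter names instead of path prefixes. The sketch assumes the platform knows each API's parameter names and its authentication evidence (including gateway plug-ins such as project_detect) as sets; the function and variable names are hypothetical.

```python
from typing import Dict, List, Set


def param_rule_violations(param_name: str, required_auth: str,
                          api_params: Dict[str, Set[str]],
                          api_auth: Dict[str, Set[str]]) -> List[str]:
    """APIs that accept `param_name` but lack the `required_auth` evidence."""
    return sorted(path for path, params in api_params.items()
                  if param_name in params
                  and required_auth not in api_auth.get(path, set()))


params = {"/space/list": {"project_key", "page"}, "/space/get": {"project_key"}, "/me": set()}
auth = {"/space/list": {"project_detect"}, "/space/get": set()}
print(param_rule_violations("project_key", "project_detect", params, auth))  # ['/space/get']
```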

The platform has many more detection plug-ins, which we will not cover one by one here. In short, the platform provides a set of detection plug-ins, and security operators can customize detection rules for specific repositories as needed.

4.4 Authentication points

As described above, the alarm identification engine consists of several parts, and the results of different parts interact and can interfere with alarm identification. We therefore introduce the concept of authentication points (an authentication score). An API's score is calculated by multiplying each kind of data by its proportional coefficient and summing the results. The system has a default authentication standard score and default proportional coefficients, and both can be customized per repository. If an API's total authentication score is lower than its repository's standard score, the API is considered at risk of unauthorized access. The coefficient of each module is continuously tuned to improve accuracy. Currently, the default standard score of every repository is 1, and the coefficient of every kind of authentication function is 1. This means that, by default, an API with any authentication function at all is not flagged as an alarm.
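One way to write this scoring rule compactly is shown below, in our own notation rather than the system's: $K$ is the set of evidence sources (code authentication functions, middleware, public_api, gateway plug-ins, rpc, and so on), $n_k(a)$ is the number of authentication records API $a$ has from source $k$, $w_k$ is that source's proportional coefficient, $R(a)$ is the set of multi-step rules that apply to $a$ with (usually negative) rule scores $s_r$, and $T_{\text{repo}}$ is the repository's authentication standard score. With the defaults $w_k = 1$ and $T_{\text{repo}} = 1$, any single authentication record keeps an API out of the alarm list unless a violated rule pulls the score down.

```latex
S(a) = \sum_{k \in K} w_k \, n_k(a) \;+ \sum_{r \in R(a)} s_r \, \mathbf{1}\left[\, a \text{ violates } r \,\right],
\qquad
\text{alarm}(a) \iff S(a) < T_{\text{repo}}
```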

For example, suppose repository A's authentication standard score is 1 and the proportional coefficient of its public interface data is also 1. If interface a calls no authentication function but is identified as a public interface, its total authentication score is 0 (number of authentication functions) × 1 (authentication function coefficient) + 1 (public interface) × 1 (public interface coefficient) = 1. Since this is greater than or equal to the standard score of 1, the interface is judged a non-alarm.

The scores of the rules in the multi-step detection module's plug-ins are also an important part of an API's total authentication score. These rule scores are generally negative: if a rule's condition is not met, the total score is reduced. Continuing the role authentication example above, a security operator wrote a rule requiring every interface under the /work/ path of repository A to call authentication function A2 (A2 contains the internal role authentication logic), and configured the rule's score as -5. If the /work/c interface of repository A calls authentication function A1 but not A2, its score is 1 (A1 is called) + (-5) (A2 is not called) = -4, which is below the repository's standard score of 1, so it is judged an alarm. The fix is for developers to add a call to A2 in interface c. After the fix, the /work/c score becomes 1 + 1 (both A1 and A2 are called) = 2, which is greater than or equal to the standard score of 1, so it is judged a non-alarm.
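The two worked examples can be reproduced with a small scoring function. This is a sketch under our own assumptions: evidence is a list of (name, source) pairs, per-source coefficients default to 1, and a multi-step rule is modeled as "APIs under a path prefix must call a given function". `MultiStepRule`, the path `/open/a`, and the source label "whitebox_auth_function" are hypothetical; the real platform's rule model may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

DEFAULT_STANDARD_SCORE = 1.0  # default authentication standard score per repository
DEFAULT_WEIGHT = 1.0          # default proportional coefficient per evidence source


@dataclass
class MultiStepRule:
    """APIs under `prefix` must call `required_auth`; otherwise `score` is added."""
    prefix: str
    required_auth: str
    score: float  # usually negative


def auth_score(path: str, auth_records: List[Tuple[str, str]],
               rules: List[MultiStepRule],
               weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted sum of authentication evidence plus the scores of violated rules."""
    weights = weights or {}
    score = sum(weights.get(source, DEFAULT_WEIGHT) for _name, source in auth_records)
    called = {name for name, _source in auth_records}
    for rule in rules:
        if path.startswith(rule.prefix) and rule.required_auth not in called:
            score += rule.score
    return score


def is_alarm(score: float, standard: float = DEFAULT_STANDARD_SCORE) -> bool:
    return score < standard


# Interface a: no authentication function, but identified as a public interface.
score_a = auth_score("/open/a", [("public interface", "public_api")], [])
# Interface /work/c before and after the fix, with the A2 rule scored at -5.
rules = [MultiStepRule("/work/", "A2", -5.0)]
before = auth_score("/work/c", [("A1", "whitebox_auth_function")], rules)
after = auth_score("/work/c", [("A1", "whitebox_auth_function"),
                               ("A2", "whitebox_auth_function")], rules)
print(score_a, is_alarm(score_a))  # 1.0 False
print(before, is_alarm(before))    # -4.0 True
print(after, is_alarm(after))      # 2.0 False
```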

Next, we describe how security operations handles an alarm once it is generated. Security operators do not file a security ticket directly. At the current stage, the SDLC contact for each sub-business line triages the alarm and files a security ticket only after confirming the risk. To support this work, the system also ingests test traffic and associates each API with its test traffic.

An API's traffic and its authentication function data are the reference material security operators use to judge risk. Authentication function here is still meant in the broad sense, covering authentication functions, authentication middleware, public_api, gateway authentication plug-ins, upstream and downstream RPC authentication functions, and so on. At first, security operators located the corresponding interface from the Origin and Referer headers in the traffic and then tested it. Later, the team brainstormed and developed two plug-ins integrated with Burp. One sends the alarm traffic to Burp, where security operators test it manually. The other is automated: besides sending the alarm traffic to Burp, it supports configuring tenant pools and role pools, so that after some basic configuration, unauthorized-access scans can run in batches. These two plug-ins greatly improve operational efficiency.
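The batch mode of the automated plug-in can be approximated by replaying one alarm request under every (tenant, role) identity and comparing responses. The sketch below does not use Burp's extension API; it assumes a generic `send` callable, a credential carried in a `Cookie` header, and a "same successful response as the owner" heuristic, all of which are our assumptions. A match is only a hint for manual review, not a confirmed vulnerability.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Identity = Tuple[str, str]   # (tenant, role)
Response = Tuple[int, str]   # (status code, body)


def batch_unauthorized_check(send: Callable[[Dict[str, str]], Response],
                             headers: Dict[str, str], owner: Identity,
                             tenants: List[str], roles: List[str],
                             credentials: Dict[Identity, str]) -> List[Identity]:
    """Replay one alarm request as every other (tenant, role) identity.

    Returns identities that received the owner's successful response, a
    heuristic hint of horizontal or vertical unauthorized access.
    """
    baseline = send({**headers, "Cookie": credentials[owner]})
    suspects = []
    for ident in product(tenants, roles):
        if ident == owner:
            continue
        status, body = send({**headers, "Cookie": credentials[ident]})
        if status == 200 and (status, body) == baseline:
            suspects.append(ident)
    return suspects


creds = {("t1", "admin"): "c1a", ("t1", "user"): "c1u",
         ("t2", "admin"): "c2a", ("t2", "user"): "c2u"}


def fake_send(h: Dict[str, str]) -> Response:
    # Hypothetical vulnerable endpoint: checks login only, ignores tenant and role.
    return (200, "tenant t1 secret") if h.get("Cookie") in creds.values() else (401, "")


result = batch_unauthorized_check(fake_send, {"Host": "example.test"}, ("t1", "admin"),
                                  ["t1", "t2"], ["admin", "user"], creds)
print(result)  # [('t1', 'user'), ('t2', 'admin'), ('t2', 'user')]
```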

5. The future direction of white box detection

Going forward, we will continue to optimize the customized white-box unauthorized-access detection system in the following three directions.

  1. Improving accuracy. Besides mining alarm and false-alarm data for similarities, the system explores AI methods and also combines gray-box data to reduce false positives. The focus is on the gray box. The gray-box team built a custom interface for us that returns which of an API's parameters can actually affect the id parameter of the underlying SQL statement. If none of the parameters in an alarm can affect the SQL statement, the system does not treat it as an alarm (cases where data is read from Redis or a message queue are not considered for now). You may object that this can still produce wrong verdicts: for example, an interface that requires role authentication would be judged a non-alarm even though it never calls the role authentication function. The answer lies in the authentication score, where many factors act on a single API. The proportional coefficient of gray-box parameter data defaults to 1, and since every repository's standard score also defaults to 1, by default an API that the gray box considers not to need authentication is a non-alarm. For APIs that require role authentication, security operators add rules to the system's multi-step detection module to rebalance the authentication score, so that the alarm decision comes out correct.

  2. The system's multi-step authentication detection is not yet mature. Next, we will enrich this capability, analyze the permission model of each business line in depth, and work with business R&D to promote schemes such as interface labeling and mandatory authentication by default, so that multi-step detection is rolled out in every business with as few false positives as possible.

  3. For some key business lines, the system currently uses a bot to post new alarms to the team's group chat. One improvement in progress is adding a checkpoint at the release stage of the code pipeline: if a white-box unauthorized-access vulnerability is present at release time, the business must fix it before the release can proceed, shifting vulnerability discovery left.
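The gray-box balancing described in the first item can be sketched as one more term in the score. This is a hedged illustration: the gray-box query interface is represented here by a plain set `sql_id_params` (the parameters it reports as able to affect the SQL id), and the coefficient and standard-score constants mirror the stated defaults of 1. A violated multi-step rule (for example, a -5 role authentication rule) is passed in as a penalty that outweighs the gray-box credit.

```python
from typing import Iterable, Set

AUTH_FUNCTION_WEIGHT = 1  # default coefficient for authentication functions
GRAY_BOX_WEIGHT = 1       # default coefficient for gray-box parameter data
STANDARD_SCORE = 1        # default authentication standard score


def gray_box_safe(alarm_params: Set[str], sql_id_params: Set[str]) -> bool:
    """True when no parameter in the alarm can affect the SQL statement's id."""
    return not (alarm_params & sql_id_params)


def is_alarm(auth_functions: Set[str], alarm_params: Set[str],
             sql_id_params: Set[str], rule_penalties: Iterable[int] = ()) -> bool:
    score = AUTH_FUNCTION_WEIGHT * len(auth_functions)
    if gray_box_safe(alarm_params, sql_id_params):
        score += GRAY_BOX_WEIGHT
    score += sum(rule_penalties)  # violated multi-step rules, usually negative
    return score < STANDARD_SCORE


print(is_alarm(set(), {"page"}, {"doc_id"}))        # False: param cannot reach SQL id
print(is_alarm(set(), {"doc_id"}, {"doc_id"}))      # True: param reaches SQL id
print(is_alarm(set(), {"page"}, {"doc_id"}, [-5]))  # True: role rule outweighs gray box
```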

6. Ultra vires governance

Here we borrow the concept of the Capability Maturity Model (CMM) and apply it to permission control, referring to a repository's permission maturity as its CMM level. There are five levels:

  • CMM0 means there is no permission control.

  • CMM1 means there is no unified permission component, and developers write permission logic themselves as needed.

  • CMM2 means there is a unified permission component, but each developer must manually write code to call it. If the component does not meet a development requirement, a new authentication function is added to the component.

  • CMM3 builds on CMM2 by enforcing authentication on all interfaces by default; interfaces that do not require authentication must be explicitly whitelisted.

  • CMM4 builds on CMM3 by using other systems to cross-verify whether authentication is correct.

Why introduce this concept? After running scans and operations on the system for a while, we found that some repositories were unsuitable for system scanning because their authentication functions were too scattered. Wuheng Lab therefore defined the applicable scope of white-box scanning as CMM2 and above. For repositories with a low CMM level, we prioritize governing permissions at the architecture level and upgrading them to CMM2 or above; otherwise, vulnerabilities can only be found and fixed one at a time. Raising the level is the key to governance. After the upgrade, customized white-box unauthorized-access detection catches what slips through, blocking vulnerabilities so that they are fixed before going online. This not only raises R&D's security awareness but also shifts security vulnerabilities left, a win for both sides.
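The level definitions and the CMM2-and-above scope can be expressed as a small classifier. This is a sketch under our own assumptions: the four boolean repository traits are hypothetical inputs (the article does not say how levels are assessed), and the function only encodes the level ladder and the applicability cut-off stated above.

```python
from enum import IntEnum


class PermissionCMM(IntEnum):
    CMM0 = 0  # no permission control
    CMM1 = 1  # ad-hoc permission logic, no unified component
    CMM2 = 2  # unified component, called manually
    CMM3 = 3  # authentication enforced by default, explicit whitelist
    CMM4 = 4  # CMM3 plus cross-verification by other systems


def classify(has_permission_logic: bool, has_unified_component: bool,
             default_mandatory_auth: bool, cross_verified: bool) -> PermissionCMM:
    """Map hypothetical repository traits to a permission CMM level."""
    if not has_permission_logic:
        return PermissionCMM.CMM0
    if not has_unified_component:
        return PermissionCMM.CMM1
    if not default_mandatory_auth:
        return PermissionCMM.CMM2
    if not cross_verified:
        return PermissionCMM.CMM3
    return PermissionCMM.CMM4


def white_box_applicable(level: PermissionCMM) -> bool:
    """White-box unauthorized-access scanning targets CMM2 and above."""
    return level >= PermissionCMM.CMM2


level = classify(True, False, False, False)
print(level.name, white_box_applicable(level))  # CMM1 False
```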

At present, most business lines are connected to white-box scanning. Data tracked over a period of time shows that white-box scanning discovers more unauthorized-access vulnerabilities, both in absolute number and as a proportion, than any other discovery method.

7. Summary

The whole system is a set of detection rules that Wuheng Lab developed around business characteristics. Going from 0 to 1, we took many detours and hit many pitfalls, and the scheme still has plenty of room for improvement; multi-step authentication detection in particular still has many pitfalls ahead. We will continue to iterate on and optimize the system quickly. If you have good ideas, you are welcome to join Wuheng Lab and build customized unauthorized-access detection with us.

Wuheng Lab is a professional offensive and defensive security research lab made up of senior ByteDance security researchers, dedicated to protecting ByteDance's products and businesses. Through vulnerability discovery, live-fire exercises, combating black-market (underground industry) activity, emergency response, and other means, it continuously raises the company's infrastructure and business security levels and minimizes the impact of security incidents on the business and the company. Wuheng Lab hopes to keep sharing its research results with the industry to help companies avoid security risks, and to work with industry peers to contribute to the development of the cybersecurity industry.


Origin blog.csdn.net/philip502/article/details/131501397