Tencent senior: 9 difficult problems in software testing (part 1)

Foreword:

How much testing is enough? How do we evaluate how effective a test is? With so many test cases, which ones can safely be deleted later? Software testing raises a great many problems like these. Alibaba researcher Zheng Ziying has summarized 18 such problems along with his views on them; I hope they inspire fellow testers.

More than ten years ago, at my previous company, I saw an internal website with a list titled "Hard Problems in Test". It contained about 30 or 40 problems, contributed by testers across various departments. Unfortunately the list was later lost, and I regret not saving a copy. I have wanted to find it again many times since, because the problems on it pointed out how much room the testing profession still has to develop in its own professional depth. That list convinced me at the time that software testing is no less difficult than software development, and may even be a little harder.

If I were to rebuild such a "Hard Problems in Test" list today, I would add the problems below. Without further ado, here are the 9 software testing problems we will discuss today.

Main text:

1. Test adequacy

How do we answer the question "Have we tested enough?" (for both new code and existing code)? Code coverage is the starting point for measuring test adequacy, but it is far from the end. To answer the question, we must at least consider whether all scenarios, all states, all state-transition paths, all event sequences, all possible configurations, all possible data, and so on have been tested. Even then, we may never be 100% sure that we have tested enough; perhaps the best we can do is approach "enough" asymptotically.

2. Test effectiveness

How do we evaluate a test suite's ability to find bugs? Effectiveness (the ability to find bugs) and adequacy (whether we have tested enough) are two orthogonal attributes. Effectiveness can be evaluated through forward analysis, for example by checking whether the test cases verify all of the data persisted by the SUT during the test. A more general approach is mutation testing: inject different "artificial bugs" into the code under test and count how many of them the test suite detects. We have already applied mutation testing at engineering scale. The remaining work focuses on: 1) how to avoid the "pesticide paradox", where the suite becomes desensitized to the same mutants; 2) injecting faults not only into the code under test, but also, more comprehensively, into configuration and data.

3. Test case slimming

The advertising industry has an old saying: I know that half of my advertising budget is wasted, but I don't know which half.

Software testing faces a similar confusion: with so many test cases taking so much time to run, I know a lot of that time is wasted, but I don't know which part. The waste takes several forms:

Redundant steps: time is wasted on repeated steps. Every case does similar data preparation, and every case walks through the same intermediate flow just to reach the step it actually tests.
Equivalence classes: for a payment scenario, should I test every permutation of country, currency, merchant, payment channel, and card network? That would be far too expensive; but if I don't, I worry that some merchant in some country exercises a piece of logic I have missed. For a specific business we can do manual analysis, but is there a more general, complete, and reliable equivalence-class analysis technique?
Suppose I have N test cases, and I suspect that M of them could be deleted, with the remaining N-M cases having the same effect as the original N. How do I determine whether such M cases exist, and if so, which M they are?
I once took part in an internal promotion review for a P9 candidate on the quality track. One judge asked the candidate: "With so many test cases, how will you delete them in the future?" The question sounds simple but is actually very hard. In principle, if we could measure test adequacy and test effectiveness very well and very cheaply, we could delete cases through a large number of continuous trials. That is an engineering approach; there may also be theoretical, derivation-based approaches.
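One engineering-style sketch of finding the M deletable cases, assuming a per-case coverage measurement is available: a greedy set-cover pass keeps only the cases that add something the kept set does not already cover. All case names and coverage sets below are hypothetical.

```python
# Greedy test-suite reduction sketch: keep cases in order of how much
# new coverage they add; whatever is left adds nothing and is a
# candidate for deletion.

coverage = {
    "tc_pay_domestic": {"branch_1", "branch_2"},
    "tc_pay_foreign":  {"branch_1", "branch_3"},
    "tc_pay_refund":   {"branch_2", "branch_3"},  # fully covered by others
    "tc_pay_timeout":  {"branch_4"},
}

kept, covered = [], set()
while True:
    # Pick the case that adds the most still-uncovered branches.
    best = max(coverage, key=lambda tc: len(coverage[tc] - covered))
    gain = coverage[best] - covered
    if not gain:
        break  # nothing adds new coverage any more
    kept.append(best)
    covered |= gain

redundant = sorted(set(coverage) - set(kept))
print("kept:", kept)
print("redundant (candidate M):", redundant)
```

The same loop works with any per-case signal (mutants killed, assertions exercised) in place of branches; the quality of the answer is only as good as the adequacy and effectiveness measurements feeding it, which is exactly the dependency described above.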

4. Test layering

Many teams wonder whether to run full-link (end-to-end) regression, and to what extent. The core question is: can we avoid it? If the boundaries between systems are agreed precisely and completely enough, it should be possible to change one system's code without running integration tests with upstream and downstream systems: as long as each system verifies its own code against the boundary conventions, no regression should escape.

Many people, including me, believe this is possible, but we can neither prove it nor dare to skip integration testing entirely in practice. We lack fully reproducible success stories, and we lack a complete methodology telling development and QA teams what they must do to achieve regression without integrating with upstream and downstream systems.
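A minimal sketch of what "verifying your own code against the boundary conventions" could look like: the upstream system is replaced by a stub that honours an agreed contract, and every response is checked against that contract; the same contract check can then be run against the real upstream in its own pipeline. The endpoint, fields, and stub below are all hypothetical.

```python
# Contract-based boundary test sketch: the consumer tests against a
# stub instead of the live upstream, and the contract is the single
# source of truth for what the boundary promises.

contract = {  # agreed with the upstream team
    "endpoint": "/order/create",
    "response_fields": {"order_id": str, "status": str},
}

def fake_upstream(request):
    """Stub standing in for the real upstream service in tests."""
    return {"order_id": "o-123", "status": "CREATED"}

def check_against_contract(response, contract):
    """Fail if the response is missing a promised field or has a wrong type."""
    for field, typ in contract["response_fields"].items():
        assert field in response, f"missing field: {field}"
        assert isinstance(response[field], typ), f"bad type for: {field}"

resp = fake_upstream({"amount": 100})
check_against_contract(resp, contract)
print("response satisfies the boundary contract")
```

The open problem the section describes is not this mechanism itself but making the contracts complete enough, and the discipline around them strong enough, that skipping integration becomes genuinely safe.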

Sometimes I feel like a citizen of Königsberg, walking around the city, trying to find a route that crosses each of the seven bridges exactly once. But perhaps one day someone like Euler will appear in front of me and prove, theoretically, that it is impossible.

5. Reducing analysis omissions

Analysis omissions cause many failures: a corner case was not considered or handled during development design; a particular scenario was forgotten during test analysis; a compatibility assessment concluded there was no compatibility problem, but one showed up anyway. And very often these omissions are unknown unknowns: I don't even know what I don't know. Is there a set of methods and techniques that can reduce analysis omissions and convert unknown unknowns into knowns?

6. Automatic test case generation

Fuzz testing, model-based testing (MBT), record-and-replay, and traffic replay/forking are all means of generating test cases automatically. Some are relatively mature (such as single-system record-and-replay and traffic replay), some are being explored by multiple teams (such as fuzzing), and some have not yet been practiced successfully at scale (such as MBT). We have also explored using NLP to generate test cases from PRDs. In automatic case generation, the hard part is often not generating the test steps but generating the test oracle. In any case, automatic test case generation is a very large area, and there is still much to be done in this direction.
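One common way around the oracle problem, when a trusted (if slow) reference implementation exists, is differential testing: generated inputs are fed to both implementations and the reference serves as the oracle. Both functions below are illustrative.

```python
import random

# Differential-testing sketch: random inputs are cheap to generate;
# a slow but trusted reference implementation supplies the expected
# results, so no hand-written oracle is needed.

def fast_dedupe(items):
    """Code under test: order-preserving dedupe via dict keys."""
    return list(dict.fromkeys(items))

def reference_dedupe(items):
    """Trusted but O(n^2) oracle: first occurrence wins."""
    out = []
    for x in items:
        if x not in out:
            out.append(x)
    return out

random.seed(42)  # reproducible generation
for _ in range(1000):
    case = [random.randint(0, 5) for _ in range(random.randint(0, 10))]
    assert fast_dedupe(case) == reference_dedupe(case), case
print("1000 generated cases agree with the oracle")
```

When no reference implementation exists, metamorphic relations (e.g. "deduping twice equals deduping once") can play the same role, which is one reason oracle generation remains the harder half of the problem.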

7. Automatic troubleshooting

This covers both online and offline problems. Even for relatively elementary problems, automatic troubleshooting schemes often have two limitations. First, they are not general enough; each is more or less customized. Second, they rely heavily on manually accumulated rules (politely called "expert experience"), essentially recording and replaying the steps of a manual investigation. But no two problems are exactly alike, and if a problem varies even slightly, the recorded troubleshooting steps may no longer work. Some newer techniques, such as automatic comparison of call traces, are very helpful for troubleshooting and automatic defect localization.

8. Automatic defect repair

Alibaba's Precfix and Facebook's SapFix are currently the best-known industrial practices. In general, though, existing solutions all have limitations and shortcomings of one kind or another. This field is still at a relatively early stage, with a long way to go.

9. Test data preparation

An important test case design principle is that test cases should not depend on one another: the result of one test case should not be affected by the results (or even the execution) of other test cases. Based on this principle, the traditional best practice is to make each test case self-sufficient: any background process a case needs triggered, it triggers itself; any test data a case needs, it prepares itself; and so on. However, if every case prepares all of its data from scratch, execution becomes quite inefficient. How can we reduce data-preparation time without violating the principle that test cases must not depend on one another?

What I envision is a more capable data bank. After a test case finishes, the data it generated is handed to the data bank: for example, a member in some country who has passed KYC and linked a card, a successfully paid transaction, or a merchant that has completed onboarding. When the next test case starts, it asks the data bank: "I need a merchant meeting these conditions, do you have one?" If the merchant left over from the previous case happens to qualify, the bank "lends" it to this case. And once lent, that merchant will not be lent to any other case until it is returned.
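A toy sketch of the lend/return mechanics (the class, its API, and the merchant attributes are all hypothetical, since the design has never been put into practice):

```python
# Data-bank sketch: finished test cases deposit the objects they
# created; later cases borrow a matching object exclusively and
# return it when done.

class DataBank:
    def __init__(self):
        self._pool = []  # entries: {"obj": attribute-dict, "on_loan": bool}

    def deposit(self, obj):
        self._pool.append({"obj": obj, "on_loan": False})

    def borrow(self, **conditions):
        """Lend the first free object matching all conditions, else None."""
        for entry in self._pool:
            if entry["on_loan"]:
                continue  # already lent to another test case
            if all(entry["obj"].get(k) == v for k, v in conditions.items()):
                entry["on_loan"] = True
                return entry["obj"]
        return None  # no match: the case must create its own data

    def give_back(self, obj):
        for entry in self._pool:
            if entry["obj"] is obj:
                entry["on_loan"] = False

bank = DataBank()
bank.deposit({"type": "merchant", "country": "SG", "signed": True})

m = bank.borrow(type="merchant", country="SG")
assert m is not None
# While on loan, the same merchant is not lent out a second time.
assert bank.borrow(type="merchant", country="SG") is None
bank.give_back(m)
assert bank.borrow(type="merchant", country="SG") is m
print("data bank lend/return works")
```

Exclusive lending is what preserves the independence principle: a case never sees data another running case is mutating, even though the data itself is reused across cases.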

After running for a while, the data bank can learn what data each test case needs and what data it produces. Because this knowledge is learned rather than manually annotated, it also applies to the existing stock of cases in old systems. With this knowledge, the data bank can make two optimizations:

When an execution batch starts, the data bank looks at what data the later cases in the batch will need and prepares it ahead of time, so that by the time those cases run, qualifying data is already waiting for them.
Knowing what data each case needs and produces, the data bank can arrange the execution order of cases to maximize test-data reuse, reducing both the amount of test data and the preparation overhead.

When the data bank "lends" test data to a case, there can be many different modes: exclusive or shared, with shared further divided into shared-read, shared-write, or read-only for all borrowers (for example, a merchant may be used by multiple cases to test the payment and settlement of orders, but none of those cases may modify the merchant itself, such as by re-signing its contract).

If resources such as feature switches and scheduled jobs are also managed by the data bank as a kind of generalized test data, then test cases can run in parallel as much as possible. For example, suppose N cases all need to modify the same switch value: running those N in parallel would cause interference, so they must run serially; but any one of the N can run in parallel with cases outside the N. Since the data bank knows exactly how each case uses each resource, plus data such as each case's average running time, it can schedule a batch of cases optimally and precisely: parallel wherever possible, serial wherever necessary, and it can even keep re-scheduling the remaining unexecuted cases while the batch is running.
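The switch example can be sketched as a simple grouping pass: cases that write the same switch go into one serial lane, and everything else stays freely parallel. For simplicity this sketch assumes each case writes at most one resource; all names are hypothetical.

```python
from collections import defaultdict

# Scheduling sketch: partition cases by the shared resource they
# write. Cases in the same lane must run serially; cases in different
# lanes (or with no writes) may run in parallel.

writes = {
    "tc_a": {"switch_x"},
    "tc_b": {"switch_x"},  # conflicts with tc_a on switch_x
    "tc_c": {"switch_y"},
    "tc_d": set(),         # touches no shared resource
}

lanes = defaultdict(list)  # one serial lane per contended resource
parallel = []              # cases with no write conflicts at all
for tc, resources in writes.items():
    if resources:
        lanes[min(resources)].append(tc)  # each case writes one resource here
    else:
        parallel.append(tc)

print("serial lanes:", dict(lanes))
print("freely parallel:", parallel)
```

A real scheduler would also handle cases writing multiple resources (which couples lanes together) and weigh average running times, as described above; this only shows the basic conflict partitioning.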

Such a data bank would be universally applicable; the differences between businesses lie only in the concrete business objects and resources, which could be supplied as plug-ins. If such a general-purpose data bank existed and were easy to adopt, it could significantly improve the testing efficiency of a great many small and medium software teams. So far, though, this fuller data-bank design is only an idea; I have never had the chance to put it into practice.

Written at the end:

When you can't hold on any longer, you may tell yourself "I'm so tired", but never admit "I can't" in your heart. Don't choose comfort at the age when you should be striving hardest; having nothing yet is itself the reason to strive. We try to grow, stumbling all the way and getting bruised all over, and one day you will stand in the brightest place and live the way you once longed to.

Stay true to your original aspiration and keep persevering; I believe you will eventually bloom into a flower of your own.

You can follow my WeChat public account, Program Yuanyifei; I look forward to communicating and learning with you all.


Origin: blog.51cto.com/15086761/2634512