Ugh, I am so frustrated after today. The very fact that I am sitting here the night before my holiday break writing this blog post should indicate my level of frustration. Basically, one of my managers asked me to turn off one of my automated checks until it is more stable.

Although I’ve heard such silly comments before, after about 20 minutes of discussion I realized there was no convincing him that turning off an automated check is never a good idea, at any point in time. As a result, I am here, at night, venting the bitter rage of my wounded heart upon the web.

So, what is the Problem?

Our group has a dedicated team that is responsible for making sure our applications stay up 24/7. This is basically a team of two people: an Ops person and another engineer who is rotated weekly. Whenever this team receives production failure notices, they must look into them, figure out whether they are critical to our applications, and either take action to fix the problem or contact the application’s Subject Matter Expert (SME). Often, this team needs to wake up in the middle of the night, and even on weekends, to look at production failures. The job sucks!

I have an automated functional GUI check that runs every single hour to make sure that a user can log in to our application. This automated check pulls up our website, logs in, and then validates that the application loaded successfully. One night, this test failed and the support team needed to wake up at 2am to take a look and make sure that our application was working. When they manually reproduced the steps, the application was fine and there were no problems. Therefore, the core set of issues my manager has with this automated check is that (and I’m paraphrasing):

It’s unstable
It fails all the time
It’s unfair that someone has to wake up in the middle of the night to find out that there is no issue

Hence, the automated check needs to be turned off!

Unstable Automated Check, Really?

As a result of these ludicrous accusations, I went digging into the automated check failure to figure out the root cause. It was actually pretty simple to find. My automated checks run on a sexy technology stack: Selenium WebDriver driving browsers in the BrowserStack cloud. Thanks to BrowserStack, every functional GUI test has a video recording with Dev Tools open, text logs, and screenshot logs. My first step was to look at the video to see why the test actually failed.

Upon further analysis, I saw that there was an iframe that took 26 seconds to load! Obviously, the automated check was waiting for the login fields to show up so that the test could proceed. But because the iframe took so long to load, the wait ran out and the test threw an exception.
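
The failure mode described here, a wait for the login fields that gives up before a slow iframe finishes loading, can be sketched without a browser. What follows is a minimal illustration under stated assumptions, not the author's actual test code: `wait_until`, `CheckTimeout`, `FakeClock`, `login_fields_visible_after`, and `run_login_check` are hypothetical names, the 25-second limit is the timeout the post implies later on, and a simulated clock stands in for Selenium and the real page.

```python
import time


class CheckTimeout(Exception):
    """Raised when the awaited condition is still false at the deadline."""


def wait_until(condition, timeout_s, poll_s=0.5,
               clock=time.monotonic, sleep=time.sleep):
    """Poll `condition` until it is truthy; return the seconds waited.

    Raises CheckTimeout if `timeout_s` elapses first.
    """
    start = clock()
    while True:
        if condition():
            return clock() - start
        if clock() - start >= timeout_s:
            raise CheckTimeout(f"condition not met within {timeout_s}s")
        sleep(poll_s)


class FakeClock:
    """Simulated time source so the sketch runs instantly (assumption)."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def login_fields_visible_after(ready_at_s, clock):
    """Hypothetical stand-in for 'the login fields in the iframe are visible'."""
    return lambda: clock() >= ready_at_s


def run_login_check(iframe_load_s, timeout_s=25):
    """Return 'pass' or 'fail' for a simulated iframe load time in seconds."""
    fake = FakeClock()
    try:
        wait_until(login_fields_visible_after(iframe_load_s, fake.time),
                   timeout_s, clock=fake.time, sleep=fake.sleep)
        return "pass"
    except CheckTimeout:
        return "fail"
```

Under these assumptions, `run_login_check(3)` passes and `run_login_check(26)` fails. The check logic is identical in both runs; only the simulated load time of the application differs.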

My question for you:

Is this the fault of the test for being “unstable” or is this the fault of the application for having these intermittent problems?

My argument is that this is not the fault of the check, but rather a flaw in the application. The application does not function in an efficient manner, so the check failed. I don’t care if this automated check fails only once in one hundred test runs; it’s still an issue with the application. Just because we cannot reproduce it manually doesn’t mean we should turn off the automated check. In fact, as the images so efficiently captured by my check indicate, I have conclusive evidence showing the problem within the application.

The crucial question is:

Should an application take this much time to load?

If the answer is yes, then our team has no business being in software development if it produces such slow applications and actually thinks they’re acceptable. I don’t know of any application in the world that takes this long to load. Furthermore, the user encountering this slow load time may have been trying the application for the first time. And which users in today’s world are going to sit around and wait 26 seconds for an iframe to load? Very few.

Even worse, this could have been an actual paying client, in which case, this could have resulted in complaints, frustration, and more lost resources. Either way, this hiccup amounts to a loss of income, which is not what any company wants.

Given all these considerations, it’s pretty evident that this load time is not acceptable. It’s a bug that needs to be fixed!

Does the Automated Check Fail Regularly?

Because of a well-constructed automation framework, it is extremely easy to figure out how “flaky” an automated check is. I just looked in the cloud at all of the test run sessions for that specific test and then counted how many of those failed.

Overall, it failed about once every 8 runs. I would definitely prefer that this automated check didn’t fail so often. Considering the evident instability of the application, there’s not much that I can do in this case. Two possible, but ultimately inadequate, solutions come to mind.
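
The arithmetic behind that figure is simple enough to sketch. The snippet below is only an illustration: `failure_rate` and `expected_alerts_per_day` are hypothetical helpers, and `sample_history` is made-up data shaped to match the one-in-eight figure, not the real BrowserStack session list. Combined with the hourly schedule mentioned earlier, a one-in-eight failure rate works out to about three alerts per day on average.

```python
def failure_rate(outcomes):
    """Fraction of runs that failed; `outcomes` holds booleans (True = passed)."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("no runs recorded")
    failures = sum(1 for passed in outcomes if not passed)
    return failures / len(outcomes)


def expected_alerts_per_day(rate, runs_per_day=24):
    """Average number of failing runs per day for a check that runs hourly."""
    return rate * runs_per_day


# Made-up history: seven passing runs and one failing run.
sample_history = [True] * 7 + [False]

rate = failure_rate(sample_history)        # 0.125
alerts = expected_alerts_per_day(rate)     # 3.0
```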

Increase the Timeout?

Some automation engineers might suggest increasing how long the check waits for the application’s iframe before throwing an exception. Initially, that seems like a valid solution, except that it’s not.

Rather than fixing the functionality of the application to make it more stable, this solution simply covers up its failures by extending the time allowed for the application to load. Is 25 seconds not enough time for a single page element to load? In today’s high-speed internet world, this should be more than enough.

Add a Retry?

Another proposition that I heard was to add a retry before throwing the exception. Again, this does nothing to fix the functionality of the app; it just covers the problem up by creating an unrealistic use case. I would argue that very few real-world users would ever reload a page if it is taking too long to come up. They would rather save their 25 seconds and go to another site that isn’t wasting their valuable time. So, why would I add a retry to the check just so that it passes more frequently?

But what if a user did try again and reload the page? I know I do that sometimes. Still, this is not the kind of user experience that we want to be creating for our end users. Can you imagine if all your browsing sessions included constantly pressing the Refresh button on your browser? How bad of an experience would that be?

It happened to me recently. I was trying to shop on Amazon on Black Friday and the app was super slow. I knew that they were probably under an insane load, so I refreshed the browser to give them another chance. The app still did not load. So, I left. Instead, I went on Google and started searching for deals on the items. As a result, Amazon lost income.

Anyway, the goal of this automated functional GUI check is to simulate user behavior on our application in an automated fashion. Thus, if the user isn’t going to retry on our application, then the automated check shouldn’t retry either.
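
To make the “covering it up” point concrete, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that each attempt fails independently at the one-in-eight rate reported above; real slow loads are probably correlated, so treat the numbers as indicative only. `alert_probability` is a hypothetical helper, not part of any framework.

```python
def alert_probability(per_attempt_failure, attempts):
    """Chance that every attempt fails, assuming independent attempts."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return per_attempt_failure ** attempts


RUNS_PER_DAY = 24      # the check runs hourly
FAILURE_RATE = 1 / 8   # observed rate quoted in the post

no_retry = alert_probability(FAILURE_RATE, attempts=1)    # 0.125
one_retry = alert_probability(FAILURE_RATE, attempts=2)   # 0.015625

alerts_without_retry = no_retry * RUNS_PER_DAY   # 3.0 per day
alerts_with_retry = one_retry * RUNS_PER_DAY     # 0.375 per day
```

Under that assumption, a single retry cuts alerts from roughly three a day to roughly one every three days, yet the first attempt, which is what a user experiences, is still slow one time in eight. The retry changes what the team sees, not what the application does.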

In fact, my manager’s misunderstanding of the purpose of automated functional testing is what led to this conversation in the first place. First, the automated check failed and sent out a text alert to the support team. Second, the support team saw the screenshots, opened the app and then tried to recreate the failure manually. Guess what, it passed for them! Thus, we might conclude, with my manager, that the automated check is unstable. However, the video, screenshots and logs recorded during the run tell us a completely different story.

In conclusion, yes, I would love for this check to fail less often, but I cannot come up with a more reasonable solution to this problem other than fixing the damn code base.

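Translator’s note: When automating checks for back-end services and API endpoints, I run all of the production checks once a minute, and they also fail from time to time, yet everything looks normal when I try the same thing by hand. So, reading this far, I feel the author’s pain.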

Should the Support Team Acknowledge Inconsistent Failures?

Yes, the support team should wake up and deal with these inconsistent alerts in the middle of the night!

Look, I understand that this really sucks. In fact, I have to do it all the time, and I hate it. I’m a huge health freak and it makes me really angry to know that I have to sacrifice my sleep for something like that. I don’t even sacrifice my sleep for my wife, for whom I would sacrifice almost anything. Therefore, I feel the pain maybe even more than most, especially when I need to wake up as a result of instabilities caused by other people’s applications. Ugh, my blood boils.

However, I also understand that it is my job to do so once in a while. And the Ops team needs to do this all the time. That’s their job; it’s what they get paid for.

And as much as it sucks to destroy my health because other teams wrote bad code, it’s what needs to be done. We can be angry about it, but we should channel our anger towards the right source. The automated check is NOT the source of our frustration in this case. The automated check just reveals the unpleasant truth. The check simply shows us that the code is unstable and it must be fixed.

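Translator’s note: I don’t quite agree with the author here that the team should get up in the middle of the night to investigate or fix this kind of automated check failure. Problems like this should be caught as far as possible before release. Beyond that, once the routine automated runs have already surfaced the problem and the team has discussed it, decided whether to fix it immediately or defer the fix, or weighed how many users are affected (for example, by counting how many people use the application at 2am), the check should not keep tirelessly reporting the same problem. Repeated reports also waste people’s effort and time.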

The root cause of the problem is poorly written and poorly tested code. Since I started testing this app, I can confidently say that it’s garbage! It’s tough to hear, but it’s true.

Therefore, rather than complaining about the check, why don’t we complain about the code? Why don’t we wake the developer responsible for this problem, make him find the root cause and fix the instability? Why is everyone suffering except the person who was responsible for the problem?

Why must a QA Engineer defend the validity of his automated checks to his manager? Why must an Ops person wake up in the middle of the night to try and resolve these intermittent problems? This isn’t fair to anyone, and problems begin in a company when people don’t feel like they are treated fairly.

Had I coded or tested this functionality, I would man up, get out of bed, and fix my error. First, because it’s annoying to the end user. And second, because I wouldn’t want to burden my teammates with my mistakes. Would you?

Ultimately, the problem does not lie with a “flaky” check as suggested by management. Rather, the problem is in the faulty code of the application, which could be easily found and fixed. Killing the messenger (aka the automated check) for telling you the truth is not going to make the truth disappear. Instead, it will just delay the truth’s appearance until it comes back as something worse, like a client complaint.

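Translator’s note: Setting aside the author’s complaining, I largely agree that for problems like this, improving the application at the code level is the better approach.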

One More Thought

Let’s think about this. Even if all of my assertions are complete nonsense (which happens sometimes, I admit), are we willing to lose the functionality coverage of one of the most important components of our application, the login?

Because of a “flaky” automated check, do we really want to stop monitoring to make sure that a user can always successfully log in and load the application? The user can have issues at many points throughout the flow of this use case:

The iframe may not load
They may not be able to log in
Maybe they logged in, but the application doesn’t load
Maybe the application loads, but not completely

If the iframe loading can be unstable, imagine the other negative user experience problems that can occur.

To turn off this test, or even to cover up its complaints, would mean losing some of the most critical monitoring for our application. If a user cannot log in to our app, they can’t use it and they can’t pay us. Therefore, it makes no sense to turn off or pad the check just so that it passes and allows our team members to sleep peacefully under a blanket of false comfort.

What have we learned?

So through this entire tirade, there was actually some really good information that I wanted to convey. Regardless of whether I was successful or not, I will re-emphasize those points here:

Usually, a “flaky” check is not actually flaky; rather, the code running the application is the unstable culprit. If you have a moderate understanding of automated software testing and follow automation testing best practices like the Page Object Pattern or KISS, then you can be almost sure that your tests are mostly stable. Obviously, if you are doing silly things like implicit waits, waits that are unrealistically short, or other automation anti-patterns, your tests are probably much less stable.
If you can prove that your automated check isn’t “flaky”, then the only solution to the failures is to fix the code.
Often we are faced with tough situations where a manager may disagree with our opinion. In that case, do your best to present all of the facts regarding why the automated check is failing in the first place. Convey to your superior that covering up inconsistent failures does not actually fix the bugs. Rather, people need to be held responsible for the code that they produce.
Test early, test often. Although I did not specifically talk about this topic, this is one of the widely accepted ways to prevent bad code from happening. By making the code more testable, testing it earlier in the lifecycle, and testing more often, we can have fewer production issues and more sleep. Now that sounds like a win for everyone!

What are your thoughts? Do you agree or disagree with my assertions? Have you faced such issues before and, if so, how did you handle them?

About the Author

Nikolay Advolodkin is infinitely passionate about three things: self-improvement, IT & computers, and making an impact. He teaches people the art of automated software testing and quality assurance worldwide. His goal is to help develop higher quality software in a more efficient manner. Nikolay is the creative mind behind ultimateqa.com. Follow Nikolay on his Twitter page for all of the latest updates: @Nikolay_A00.

Original link

https://simpleprogrammer.com/your-automation-test-sucks/

Your Automation Test Sucks
