Put the knife down and take a green herb, dude. (c) Ruffin Bailey 2001-2021

Today, I ran across a clickbaitily titled post, "Just Say No to More End-to-End Tests", published on the Google Testing Blog back on April 22, 2015, detailing a fictional end-to-end (e2e) testing run. The scenario was very clearly fictionalized, but you can forgive most of it, as they're simply trying to illustrate a few points.

Here's their "analysis" of the test, which apparently took over a week to complete and had some serious errors with the testing apparatus as well.

What Went Well
Customer-impacting bugs were identified and fixed before they reached the customer.

What Went Wrong
The team completed their coding milestone a week late (and worked a lot of overtime).
Finding the root cause for a failing end-to-end test is painful and can take a long time.
Partner failures and lab failures ruined the test results on multiple days.
Many smaller bugs were hidden behind bigger bugs.
End-to-end tests were flaky at times.
Developers had to wait until the following day to know if a fix worked or not.
...
Although end-to-end tests do a better job of simulating real user scenarios, this advantage quickly becomes outweighed by all the disadvantages of the end-to-end feedback loop:

[Criteria] | Unit | End-to-end
Fast | ✓ | X
Reliable | ✓ | X
Isolates Failures | ✓ | X
Simulates a Real User | X | ✓

The author uses this cluster of an e2e test to argue for a preponderance of unit and integration tests...

Integration Tests
Unit tests do have one major disadvantage: even if the units work well in isolation, you do not know if they work well together. But even then, you do not necessarily need end-to-end tests. For that, you can use an integration test. An integration test takes a small group of units, often two units, and tests their behavior as a whole, verifying that they coherently work together.
...

Testing Pyramid
Even with both unit tests and integration tests, you probably still will want a small number of end-to-end tests to verify the system as a whole. To find the right balance between all three test types, the best visual aid to use is the testing pyramid. Here is a simplified version of the testing pyramid from the opening keynote of the 2014 Google Test Automation Conference:

...
As a good first guess, Google often suggests a 70/20/10 split: 70% unit tests, 20% integration tests, and 10% end-to-end tests. The exact mix will be different for each team, but in general, it should retain that pyramid shape.
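
To put numbers on that quoted split, here's a minimal sketch in Python. The function name `pyramid_split` and the idea of a fixed test "budget" are my own assumptions for illustration, not anything from the Google post.

```python
def pyramid_split(total_tests, percents=(70, 20, 10)):
    """Split a test budget into (unit, integration, e2e) counts.

    The default percentages are the 70/20/10 guess quoted above;
    everything else here is a hypothetical illustration.
    """
    unit = total_tests * percents[0] // 100
    integration = total_tests * percents[1] // 100
    # Give any rounding remainder to e2e so the counts always sum to the total.
    e2e = total_tests - unit - integration
    return unit, integration, e2e


print(pyramid_split(200))  # (140, 40, 20)
```

So a team with room for 200 tests would, by that guess, write about 140 unit tests and only 20 end-to-end tests.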

That's a lot of info to digest.

Here's the problem: "Simulates a Real User" isn't a single point. Their chart really isn't a scorecard, though that's how it's presented.

My quick list of critiques:

1. How many jobs have given you permission to create good unit tests?
2. Of those jobs, how many programmers actually created worthwhile, real-world, success & fail case tests?
3. Of the two unit tests you have left from the filters of 1 & 2, how many jobs then also factored creating integration tests into your schedule?
4. Did you use TDD? Because otherwise the tests you made in 1-3 are crap. No, honestly. No Scottishness at all.

Look, more typical situations in my experience are that you either have lip-service unit tests [only] or that you have no testing at all.

Guess what's most valuable if you have to pull teeth to convince management that testing is important? In my experience, it's an "end to end" test. You'll get the most return on your testing resource dollar with smoke tests.

I do like to call it a smoke test, and I've had pretty good results using Selenium to automate a browser from C#. Which browser? Well, you get your biggest bang for the buck by just using one. It'd be great to test in IE, Firefox, Edge, Chrome, and even macOS & iOS Safari, but each additional browser brings diminishing returns. In my experience, the first test in whatever browser is going to catch something like 75% of what's wrong.

I recommend using either 1.) The lowest common denominator for browser functionality, usually IE, or 2.) Whatever's the easiest to get to behave for your tests. Timing can be a pain in Selenium.
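
The usual cure for timing pain is to poll for a condition instead of sleeping for a fixed time, which is the idea behind Selenium's explicit waits (`WebDriverWait`). Here's a minimal sketch of that polling pattern in plain Python rather than my usual C#, with no Selenium involved. The helper `wait_until` and the fake `element_is_ready` check are hypothetical names I made up for illustration.

```python
import time


def wait_until(condition, timeout=5.0, interval=0.1):
    """Poll condition() until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Hypothetical stand-in for "is the element on the page yet?":
# it only reports ready on the third poll.
attempts = {"count": 0}


def element_is_ready():
    attempts["count"] += 1
    return attempts["count"] >= 3


wait_until(element_is_ready, timeout=2.0, interval=0.01)
print(attempts["count"])  # 3
```

The point of polling is that the test moves on as soon as the page is ready and only pays the full timeout when something is actually broken.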

Recently, I've used Selenium's Chrome web driver. That's dangerous. Chrome seems the most fault tolerant of the browsers, and on a developer's phat box, slowdowns join browser incompatibilities on the list of problems that are often hard to notice.

But one browser, going through an automated master set of user stories, will quickly ensure that the 80% of your website that's your real bread and butter works and wasn't screwed up by whatever whizbang gizmo your latest release just pushed out the door.
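
Here's a rough sketch of what "an automated master set of user stories" can look like structurally, in plain Python. The runner `run_smoke_test` and the three stories are hypothetical; in a real suite each step would drive the browser through Selenium instead of being a stub.

```python
def run_smoke_test(stories):
    """Run each (name, step) pair in order; return a list of (name, error) failures."""
    failures = []
    for name, step in stories:
        try:
            step()
        except Exception as exc:
            # Keep going so one broken story doesn't hide the others.
            failures.append((name, str(exc)))
    return failures


# Hypothetical user stories. Real ones would click through the site.
def log_in():
    pass


def add_item_to_cart():
    pass


def check_out():
    raise AssertionError("order confirmation page never loaded")


stories = [
    ("log in", log_in),
    ("add item to cart", add_item_to_cart),
    ("check out", check_out),
]

failures = run_smoke_test(stories)
for name, error in failures:
    print(f"FAILED: {name}: {error}")
```

Each story is independent in this sketch. If your stories build on each other (you can't check out without logging in), you'd probably want to stop at the first failure instead.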

If there is an error, sure, it can be hard to immediately identify the exact cause, which the Testing Blog seems to think is the end of the world. And perhaps a unit test to cover each error you do discover and track down will be useful. Do that.
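
As a sketch of that last step, suppose the smoke test turned up a checkout crash that you tracked down to a coupon pushing the total below zero. That bug, the function `apply_coupon`, and its fix are all invented for illustration. The regression test that pins it down might look something like this in Python's built-in `unittest`:

```python
import unittest


def apply_coupon(total, discount):
    """Hypothetical function under test: apply a coupon, never going below zero."""
    # Hypothetical fix: the buggy version returned total - discount,
    # which could go negative and break checkout.
    return max(total - discount, 0.0)


class ApplyCouponRegressionTest(unittest.TestCase):
    def test_discount_larger_than_total_clamps_to_zero(self):
        # The exact case the smoke test stumbled into.
        self.assertEqual(apply_coupon(10.0, 25.0), 0.0)

    def test_ordinary_discount_still_subtracts(self):
        # Make sure the fix didn't break the normal path.
        self.assertEqual(apply_coupon(10.0, 2.5), 7.5)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ApplyCouponRegressionTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The e2e test tells you something is wrong; the unit test written afterwards keeps that specific thing from going wrong again.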

But, again, unless you're doing TDD, asking developers to write useful unit tests ahead of time almost never produces tests that cover what your users are actually going to do, and it doesn't prevent bugs your own developers and QA are going to catch anyway. To get useful unit tests, you're going to have to spend even more time getting developers to code review others' unit tests, and you barely had time to green light unit tests to begin with. (How do I know that? If you had plenty of time, you'd be using TDD.)

And if you've gotten so far down the line that you can't tell if an error is coming from the front or back end, as in the Testing Blog's worst-case scenario, well, you have worse problems.

My suggestion if you're losing days looking for bugs your e2e tests turn up, and more days with your test lab going down?

Forget the users; your development process is broken. It's time to start arguing for TDD.

Update: Interesting to look at the slide linked in the quote, above:

And the notes address what I'm saying exactly (and argue against it):

Automated API tests are the best bang for the buck wrt tests that are easy to write and stable and a reliable signal. Integration Tests are always valuable; but we need to ration the number of integration tests one writes as they can be more difficult to debug and have a high noise factor due to all the moving pieces. Automated component testing allows one to test a particular component (server) in isolation. you mock everything else out and ensure this component behaves as intended.

As evident; the cone needs to be inverted in a lot of teams and companies. There is too much focus on Automated UI testing. At Google; we realized that half a decade back and have been putting in the required effort in the Integration testing and unit testing layer.

Interesting. I'm going to guess Google's coders do a much better job writing tests. If you want to convince me, talk more about how tests are written, and less about how they're better than automated UI testing if you have no testing now.

But wow, look at some later slides:

Push on Amber

Daily pushes to prod
Stable top of tree
Smarter regression testing
Critical tests cannot be bypassed

Comments to that slide:

In an ideal world; we should push on green; but we live in a practical world. I want baby carriers with beer holders; but they don’t make them. so I improvise and use my denim back pocket as my beer holder.

Not great. I've heard what's important when coding is to know which bugs to ship, but which bugs in tests to ignore? What are we doing? Later on...

How tests have become a nightmare. due to flakiness. stability.
I remember when I joined a team about 7 years back; I was excited; to be in server world @ Google. had been on client stuff before. When I asked around folks told me they had 1000s of end2end automated tests. I was like w00t! Turns out only 100s of them were providing real enough signal. some were flaky; some were broken forever.

EXACTLY. I think the Testing Blog grossly oversimplified the take-home from this fellow's presentation. I'll stop bombarding pixels now.

Labels: coding, Google, selenium, tdd, testing