WebKit Layout Tests: Theory

This post originally appeared on the WebKit blog, but now appears to be gone. I've republished it here.

When I began WebKit development, one of the things I was curious about was how testing is handled. As a former web developer, I was aware both of how many bugs browser rendering engines can have (though things are certainly getting better) and of how hard increasingly complex web pages push those engines. Living with a bug for years is definitely something to avoid, so enforcing spec compliance and preventing regressions both seem key.

The WebKit solution to this is layout tests. At their most basic, layout tests are web pages (the simpler the better) that are checked into the WebKit repository along with their expected renderings (golden files), stored either as text or as images. A test harness (run-webkit-tests) uses an app that embeds WebKit (DumpRenderTree) to load each of the 20,000+ tests, compares the rendered result against the corresponding golden file, and reports tests that fail the comparison, crash, time out, or otherwise behave unexpectedly. The WebKit project has builders that run this process continuously on every platform WebKit has been ported to, which makes it easy to spot changes that break things (and to revert them when they do).
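To make the comparison step concrete, here is a minimal Python sketch of how a harness might classify each test once the driver has run it. It is not run-webkit-tests' actual code: the names `Outcome`, `RunResult`, `classify`, and `unexpected_results` are hypothetical, the driver's report is assumed to be just a text dump (or `None` on a crash) plus a timeout flag, and image (pixel) results are ignored entirely.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Outcome(Enum):
    PASS = "pass"
    TEXT_MISMATCH = "text mismatch"
    CRASH = "crash"
    TIMEOUT = "timeout"


@dataclass
class RunResult:
    """What a hypothetical DumpRenderTree-style driver reported for one test."""
    output: Optional[str]  # text dump of the page, or None if the driver crashed
    timed_out: bool = False


def classify(result: RunResult, expected: str) -> Outcome:
    """Compare one test's actual text output with its golden file contents."""
    if result.timed_out:
        return Outcome.TIMEOUT
    if result.output is None:
        return Outcome.CRASH
    if result.output != expected:
        return Outcome.TEXT_MISMATCH
    return Outcome.PASS


def unexpected_results(results: Dict[str, RunResult],
                       goldens: Dict[str, str]) -> Dict[str, Outcome]:
    """Return only the tests whose outcome was something other than PASS."""
    report: Dict[str, Outcome] = {}
    for name in sorted(results):
        outcome = classify(results[name], goldens[name])
        if outcome is not Outcome.PASS:
            report[name] = outcome
    return report


if __name__ == "__main__":
    goldens = {name: "PASS\n" for name in ("a.html", "b.html", "c.html", "d.html")}
    results = {
        "a.html": RunResult("PASS\n"),
        "b.html": RunResult("FAIL\n"),
        "c.html": RunResult(None),
        "d.html": RunResult(None, timed_out=True),
    }
    for name, outcome in unexpected_results(results, goldens).items():
        print(f"{name}: {outcome.value}")
```

Reporting only the non-passing tests mirrors what makes the builders useful: with tens of thousands of tests, the interesting output is the short list of things that changed.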

Developers are also encouraged to run the tests before committing changes. The easiest way is to use the commit queue, which does this automatically. Otherwise, running the full suite on a workstation is quite feasible: it currently takes around 15 minutes, and that will drop to ~4 minutes or less with Dirk Pranke’s multi-process test runner.

With judicious use of test data, layout tests verify the behavior of many things, from JavaScript engine spec compliance to repaint behavior and the WebSocket protocol implementation. For tests that need network access, such as the WebSocket ones, the test harness starts a local server (Apache, lighttpd, or a WebSocket server) and runs the tests from it. The local HTTP server is also useful for simulating network-related edge cases; it amuses me that I’ve had to learn and use more PHP in the past 6 months on WebKit than I did in 6 years of web development.
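WebKit's own edge-case endpoints are PHP scripts served by Apache or lighttpd; the Python sketch below only illustrates the idea of a local server whose misbehavior a test can request. The handler, `start_server`, and the `status`, `body`, and `delay` query parameters are all hypothetical, built on the standard library's `http.server`.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse


class EdgeCaseHandler(BaseHTTPRequestHandler):
    """Lets a test pick the status code, body, and delay via the query string."""

    def do_GET(self) -> None:
        query = parse_qs(urlparse(self.path).query)
        status = int(query.get("status", ["200"])[0])
        body = query.get("body", ["PASS"])[0].encode("utf-8")
        delay = float(query.get("delay", ["0"])[0])

        time.sleep(delay)  # simulate a slow network before any bytes are sent
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep the harness output quiet


def start_server(port: int = 0) -> ThreadingHTTPServer:
    """Start the server on a background thread; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), EdgeCaseHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    from urllib.error import HTTPError
    from urllib.request import urlopen

    server = start_server()
    base = f"http://127.0.0.1:{server.server_address[1]}/"
    with urlopen(base + "?body=hello&delay=0.1") as response:
        print(response.status, response.read().decode())
    try:
        urlopen(base + "?status=404")
    except HTTPError as error:
        print("got", error.code)
    server.shutdown()
```

Driving the edge case from the URL keeps each test page self-describing: the request it makes spells out exactly which network condition it is exercising.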

For simpler tests that are more in the unit test style (i.e., using assertions), there is a helper framework that makes them easy to set up. The golden file is then just a series of “success” statements.
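WebKit's actual helper is JavaScript loaded by the test page; the Python class below is only an analogue of the idea, and the names `AssertionLog` and `should_be`, as well as the exact PASS/FAIL wording, are hypothetical. Each check appends one line, so when everything passes the dump is nothing but PASS lines, and that dump is exactly what the golden file stores.

```python
from typing import Any, List


class AssertionLog:
    """Collects one line per check, in the style of a layout test's text dump."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def should_be(self, description: str, actual: Any, expected: Any) -> None:
        if actual == expected:
            self.lines.append(f"PASS {description} is {expected!r}")
        else:
            self.lines.append(f"FAIL {description} should be {expected!r}. Was {actual!r}.")

    def dump(self) -> str:
        return "\n".join(self.lines) + "\n"


def run_checks() -> str:
    log = AssertionLog()
    log.should_be("len('abc')", len("abc"), 3)
    log.should_be("'a,b'.split(',')", "a,b".split(","), ["a", "b"])
    log.should_be("2 ** 10", 2 ** 10, 1024)
    return log.dump()


# The golden file: since every check passes, it is just the PASS lines.
EXPECTED = (
    "PASS len('abc') is 3\n"
    "PASS 'a,b'.split(',') is ['a', 'b']\n"
    "PASS 2 ** 10 is 1024\n"
)

if __name__ == "__main__":
    print("matches golden file:", run_checks() == EXPECTED)
```

Because a failing assertion produces a FAIL line instead of a PASS line, any regression changes the text dump, and the harness's ordinary text comparison catches it without any special support for assertions.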

Since the layout test infrastructure tests not just rendering and layout but also the JavaScript bindings and interactions with the network stack, runs order-of-magnitude performance tests, and much more, the name “layout test” is increasingly inaccurate, which comes up for discussion occasionally. That same flexibility makes the layout test model work well for importing third-party test suites. As part of the layout tests, we run the Sputnik JavaScript conformance suite, Philip Taylor’s <canvas> suite, an HTML5 parser suite, and tests from other browser makers.

Layout tests generally accompany every check-in, especially those that fix bugs (to make sure the bugs do not reappear). This also means that the first step in fixing a bug is reducing a possibly complex page that triggers it to something simpler. If you file a bug that gets the NeedsReduction label and you’re the author of the page that exhibits it, you’re much better positioned than a WebKit developer to create a minimal reduction. An issue is much easier to investigate when it boils down to reloading a page and looking for an alert, or the magical word “PASS”. It also means that if you provide a good reduced test case, you can achieve a kind of immortality: your test will be run hundreds of times a day.

A follow-up post discusses some realities of the layout test system. To learn even more about layout tests, see the WebKit wiki.
