Perils of Measuring Long Intervals with Performance.now() #

Date: Tuesday, November 29, 2016

I recently ran into an interesting quirk when using Performance.now() to measure long-ish intervals in Quip's web app. Since it does not seem to be broadly known, I thought I would document it.

To mitigate the possibility of self-induced DDoS attacks, I recently added duplicate request detection to Quip's model layer data loading mechanism. Since we pretty aggressively cache loaded data, repeated requests for the same objects would indicate that something is going wrong. I therefore added a client-side check for the exact same request being issued within 60 seconds of the first occurrence. If it triggered, it would send a diagnostic report to our error tracking system.

This turned up some legitimate bugs (e.g. two independent code paths racing to trigger loads) as well as some false positives (e.g. retries of requests that failed should be allowed). After pushing the fixes and tweaks to the detection system, I was left with a few puzzling reports. The report claimed that a duplicate request had occurred within a very short interval, but based on other events it looked like the requests had been several minutes (or even hours) apart.

When I looked at the reports more carefully, I saw that the long time interval was always bracketed by a disconnect and reconnect of the Web Socket that we use for sending real-time updates. I hypothesized that this may have been a laptop whose lid was closed and later re-opened. Taking a look at how I measured elapsed time between requests, I saw that this was computing the delta between to high-resolution timestamps returned by Performance.now(). I was then able to reproduce this scenario locally by comparing wall-clock elapsed time with high resolution elapsed time while my computer was asleep (to see it in action, see this simple test bed). I initially did this in Chrome, but Safari and Firefox seem to have the same behavior.

Performance.now() behavior across sleep intervals

The fix was switch to using Date.now(), which otherwise worked equally well for this use-case. We didn't actually need the high-resolution guarantees of Performance.now() — the reason why it was used in the first place is because the code already had a timestamp with it in place that was used for measuring load request-to-response time. The same code runs in on our desktop app (where load times can be sub-millisecond) and so the high resolution was needed for that use case.

I am definitely not the first to run into this; I have found a few off-hand mentions of this behavior. For example, see this Stack Overflow comment and this post on elm-dev. Curiously, neither the currently published version of the time specification nor the latest draft seem to indicate that this may be a possibility. Per the spec, Peformance.now() is supposed to return the difference between the time of the call and the time origin, and presumably the origin is fixed.

As to the specifics of why this happens, I spelunked through Chrome's codebase a bit. The Performance.now implementation calls monotonicallyIncreasingTime which uses base::TimeTicks::Now which uses the CLOCK_MONOTONIC POSIX clock. I wasn't able to find any specific gotchas about macOS's implementation of that clock, but Apple does have a tech note that says that “timers are not accurate across a sleep and wake cycle of the hardware” so this is not a surprise at that low level. Within the Chrome project it is also known that base::TimeTicks is unreliable across sleep intervals. Though it's common to think of the browser environment as being very high level and abstracted away from the hardware and operating system, small nuances such as this one do sometimes leak through.

Avoiding Incremental Rendering in Hybrid Desktop Apps #

Date: Wednesday, February 17, 2016

Labels: Quip Post a Comment

I previously described how adding native popovers and modal dialogs to Quip’s hybrid desktop apps helped them to blend in and avoid the “uncanny valley” of web-based apps that don’t quite feel right. Another area that we focused on was the experience of creating a new window, especially during application startup. This is the first impression for a user, and thus informs how they will perceive the rest of the app.

In theory launching should be fast — not much happens when a new window is created:

An NSWindow is instantiated.
A WebView is added to the window.
The web view is directed to load the HTML file from the app’s bundle. That file serves as the “shell” for the HTML, JavaScript and CSS that are used for the UI.
Once a signal from the JS that it has been initialized is received, the native side instructs it to render the desktop (or other initial object, when restoring the previous application state).

Here’s a short screen recording showing the application launch sequence for a small account:

End-to-end it doesn’t feel too slow, but there’s a lot of flashing and incremental rendering of the UI, which definitely feels “webby” as opposed to native. To make it a bit easier to understand what is going on, here’s a version that has been slowed down¹ by a factor of 3:

The visual progression can be broken down into 4 stages:

Blank window appears
Basic app “chrome” appears (without any user data)
Data and images begin to appear
Rendering is complete

In a native app, none of the intermediate stages are visible, thus they have much of a “snappy” feeling when creating a new window. Incremental rendering is normally desirable on the web if waiting on resources that require a network round-trip. In this case the overall time is short enough that intermediate states which only appear for a few frames (such as the one shown below) are distracting rather than giving a sense of progress.

Intermediate state

To avoid showing an entirely blank window (stage 1), instead of showing the window immediately, we modified the new window sequence to keep the window hidden until the ReactDOM.render() call for the application’s root view finished. This does mean that there is a slightly longer delay between the app being launched and the window appearing. However, since other things are happening (dock icon bouncing, menu bar changing), and the delay is on the order one to two hundred milliseconds, it's not noticeable.

The initial rendered view was very empty since it didn’t have any of the data for the user’s desktop or inbox. On the web we “seed” this data in the initial HTML response to avoid the extra network round-trip to fetch it. We had assumed that loading data from the local database is so fast that such an optimization was unnecessary, and instead it could be loaded on demand like everything else. It was indeed quite fast, but it still took tens of milliseconds, which led to frames that appeared incomplete. Once we added the same “seeding” capability to the desktop app (the request to render the initial object also included all the data necessary for the view), almost everything appeared at once, skipping over most of stage 3.

The reason why I said almost everything appeared at once is because some images were still taking a few extra frames to show up. The odd part was that these images were local static assets, and thus should have been readily available (e.g. the folder and sidebar icons in the screenshot above). Further, we inline the images as data: URIs into our main stylesheet (an optimization originally meant for the website, but carried over to the desktop app since it didn’t seem like it would hurt). Thus loading of the images should not involve any more I/O once the stylesheet was loaded and parsed. Evidently that was not the case — even when using data: URIs there was an asynchronous “fetch” and decompress step for rendering the image.

When experimenting with creating additional windows, we noticed that in those cases the images do appear instantaneously. We thus concluded that they must still end up in WebKit’s memory cache. We wanted to simulate this behavior for even the first window, and in the absence of an explicit cache hinting API, we added a “cache warmer” to invoke early on during application startup. It creates a WebView and loads a simple static HTML file that references our stylesheet and has dummy markup corresponding to our most common views. The view doesn’t need to be added to a window or otherwise shown, and it gets disposed of 10 seconds after startup.

Once all those changes are implemented no incremental rendering is visible (slowed down version). None of modifications that were necessary were particularly complex, but they do show the extra attention to detail that is necessary to get a hybrid web app to feel more native.

During development I often end up taking screen recordings and then stepping through them frame-by-frame to get a better understanding of the rendering sequence. I usually use ScreenFlow, but QuickTime Player can do this too.

Grafting Local Static Resources onto Production #

Date: Tuesday, January 26, 2016

Labels: Hammers, Quip Post a Comment

tl;dr: Using the webRequest Chrome extension API it is possible to “graft” development/localhost JavaScript and CSS assets on a production web service, thus allowing rapid debugging iteration against real production data sets. Demo site and extension.

During the summer of 2015 I was investigating an annoying bug in Quip where our message list would not stay “bottom-anchored” in some circumstances¹. Unfortunately I was only able to trigger it on our live production site, not on my local development setup. Though Chrome’s developer tools are quite nice, I did not have the necessary ability to rapidly iterate on the code in order to further investigate the bug. I had in the past pushed alternate builds our staging site to debug such production-only issues, but that would still take several minutes to see the results of every change.

My next thought was that I could instead try to reproduce the bug in our soon-to-be-released desktop app. The app can use local (minimally processed) JavaScript while running against production data. Unfortunately the bug did not manifest itself in our Mac app. I chalked this up to rendering engine differences (the bug was only visible in Chrome, and our Mac app uses a WebKit-backed WebView). I then tried our Windows app (which uses the same rendering engine as Chrome via the Chromium Embedded Framework), but it didn’t happen there it either. I was forced to conclude that the bug was due to some specific behavior in our website when running against production data, not something in the shared React-based UI.

As I was wishing for a way to use JavaScript and CSS from my laptop with production data (for security reasons my local Quip server cannot connect to the production databases) I remembered that Gmail used to have exactly such a mode. As I recall it, you could start a local ~~Caribou~~Gmail server, go to your (work) Gmail instance and append a special URL parameter that would cause the JavaScript from the local server to be requested instead². With most of Gmail’s behavior being driven by the client-side JavaScript (with the server serving as an API endpoint) this meant that it was possible to try out pretty complex changes on your own data without having to “deploy” them.

I considered adding this mode to Quip, but that seemed scary, security-wise, since it was effectively intentional cross-site scripting. It also would have meant waiting for the next day’s production push (and I wanted to solve the problem as soon as possible). However, it then occurred to me that I didn’t actually need to have the server change it behavior; I could instead write a Chrome extension which (via the webRequest API) would “graft” the local JavaScript and CSS files from my local server onto the production site when loaded in my browser.

I had hoped that the extension could modify the HTML that is initially served and replace the JavaScript and CSS URLs, but it turns out the webRequest API cannot modify the HTTP response body. What did work was to intercept the JavaScript and CSS requests before they were sent to our CDN and redirect them to paths on my local server. Chrome would initially flag this as being insecure (since we use HTTPS in production, and the redirected URLs were over plain HTTP), but it is possible to convince it to load the resources anyway.

Once I had the necessary tooling and ability to iterate quickly, fixing the bug that prompted all this was pretty straightforward (it was caused by the “mount point” system that we used to incrementally migrate our website to React, but that’s a whole other blog post). Since then it’s come in handy in debugging other hard-to-recreate problems, and for measuring JavaScript performance against more realistic data. It did briefly break when we added a Content Security Policy (CSP) — since we were loading scripts from an unknown domain the browser was correctly blocking the “grafted” response. However, the webRequest API also allows the extension to edit the response headers, thus it was straightforward to have it intercept the main HTML page request and strip the CSP header.

The extension that I wrote to accomplish this is very barebones and hardcodes a bunch of Quip-specific logic and URLs, thus is not easily shared. However, I have recreated a simplified version of it and put it in my web experiments repository. There is also a demo site that it can be applied to.

Yet more developer time spent faking something that should be a built-in capability, further confirming Bret’s observation.
For a bit more history: back in 2005 I was using Greasemonkey to hack Gmail left and right. When I talked to the Gmail team about this approach (versus working in my 20% time to add those features to Gmail directly) I rationalized it as “Greasemonkey lets me do UI experiments on the real email in my account with minimal lag, instead of needing to wait for code reviews and production pushes.” Darick Tong (a Gmail engineer) took this feedback to heart and added the custom JavaScript mode. Unfortunately by that point I had mostly moved on from Gmail hacking (Reader was keeping me plenty busy, JavaScript-wise), so I never got to actually use it.

Multiple Windows in Hybrid React Desktop Apps #

Date: Wednesday, January 06, 2016

Labels: Quip Post a Comment

Quip’s desktop apps are hybrid apps: both the Windows and Mac apps are composed of a web React-based UI talking to our C++ Syncer library, along with some platform-specific glue code. While this allows us to support two additional platforms with a small team we knew that architecting the apps in this way would run the risk of not “fitting in” with other Mac OS X or Windows applications.

To see if we could mitigate this problem, we started to enumerate what makes an app feel like “just a web view.” There were obvious things like responsiveness, working offline, and using platform conventions (e.g. a menu bar on the Mac). After thinking about it more, we also observed that native apps often have multiple windows, whereas web apps are bound to a single browser tab. This wasn’t just a matter of multiple windows showing Quip documents (something that did not seem too difficult to accomplish), but also all of the child windows that a native app would have (especially popovers and sheets on Mac OS X).

We did have equivalents in our React “parts” library that we used in our website, but visually they didn’t match the native versions, and they felt very much bound by the enclosing rectangle of the web view. This was especially apparent with popovers that were triggered near the edge of the web view; one would expect them to “spill out” of the window but instead they would awkwardly position themselves “inward” to avoid getting clipped by the edges. This triggered an “uncanny valley” effect where the illusion of a native app would be broken.

All of this was very much on my mind about a year ago as I was building up the foundations of our desktop apps. Though seemingly just a “polish” kind of detail, it seemed like supporting child windows would require some design trade-offs that would be easier to make early on, rather than retrofit later. My first thought was that we should define all of the popover and dialog content in a very declarative manner, thus allowing to later change how they were rendered. For modal dialogs this worked pretty well, since their content tended to be pretty simple.

For example, a deletion confirmation dialog would be represented as:

parts.spec.newPanel()
    .addSection(parts.spec.newTitleSection(_("Delete Chat Room")))
    .addSection(parts.spec.newSection()
        .addMessage(_("If you delete this chat room, you will…")))
    .addSection(parts.spec.newSection()
        .addDefaultButton(_("Hide"), function() {
            ...
        })
        .addButton(_("Delete for all %(count)s people", {"count": count}), function() {
            ...
        })
        .addCancelButton(_("Cancel"), function() {}))

It was then a pretty straightforward mapping to generate an NSAlert instance that would fit right in on a Mac app (we support multiple modal dialog “runners”: one that creates a React version and another that passes it off to the platform native code).

Native alert dialog

However, once we began to implement more and more popovers, the sheer variety of the kinds of controls used within them became apparent.

Popovers used in Quip

In addition to all these widgets popovers also often had custom validation logic (e.g. the “Create” button in the “New Folder” popovers is grayed out while the title is empty) and dynamic updates (rows representing users show presence). A declarative system would need to be quite complex in order to support all of these concepts. Furthermore, implementing each control multiple times was going to slow down development significantly, and the benefits seemed minimal (for example there are no platform conventions to follow for how to show a user row — that is a Quip-specific concept).

Since we already had a separation between the content of popovers and their frames (indeed, any React-based popover could be shown in a modal dialog instead), I wondered if it was possible to show the contents in a separate web view contained within an NSPopover instead. That way the frame would be native, and it could be positioned outside the main web view, but we would not have to re-implement all of our widgets.

To see if such an approach was feasible, I needed to answer two questions: “Could React render into another window?” and “Could we create another web view that was visible in the main window's JavaScript context?”

The reason why I wanted React to render into another window from the child was that this way all JavaScript execution would still happen in a single window, which would make this code path much more similar to regular popover rendering. To test this out, I tried something along these lines (in the web version of the app):

var otherWindow = window.open();
var container = otherWidow.document.createElement("div");
otherWindow.document.body.append(container);
ReactDOM.render(<parts.PanelContents .../>, container);

To my pleasant surprise this worked (see a simple test harness here), including event handlers (that would be invoked for events in the child window, but the handler would run in the parent window). It turned out that the React team had added support for rendering into other windows quite early, though they had iframes in mind. The only (in retrospect obvious) gotcha was that the child window also needed to have the stylesheet injected into it.

I then turned to the second question — was it possible to intercept the window.open() call when running in the native app and create a popover instead? Since the call seemingly did nothing, I assumed that it was up to the application to handle it. After going through the 6 delegates that WebView supports I came across the webView:createWebViewWithRequest: method of WebUIDelegate. That was indeed invoked for window.open calls, and it was up to the application to create the child WebView and put it somewhere in the view hierarchy. Thus it was a matter of creating the view, a view controller to own it, and setting that as the contentViewController of a NSPopover that was also created then. I’ve created a simple Xcode project that shows all of the pieces working together.

Once I had determined that there were no fundamental roadblocks, a bunch more supporting code was necessary (e.g. for passing around the popover anchor bounds and content size to the native side and for notifying the JS side that a popover was dismissed), but no more showstopper issues appeared. The same “runner” concept was used to allow popovers to be shown in a React frame on the web but a native frame on the Mac (Windows does not have a native popover container window and we have not yet implemented a custom version there).

As I was wrapping up, I was reminded of a Chrome tech talk that I had watched in early 2010 when I was first joining the Chrome team. A surprising amount of time (i.e. greater than 0 in a 30 minute overview talk) was spent discussing the implementation details of <select> popup menu rendering. I hadn’t really thought of select menus as being a particularly tricky part of HTML to implement, but the fact that they extend outside of the main window led to extra layers in the class hierarchy (RenderWidget vs. RenderView) as well as other complexity. Just as most web developers don’t really think about <select> as being all that special, I’m sure most of our users don’t think twice about our popovers sticking out outside of the main window. However, every time I open one I think of all of the things that had to be wired up for it appear the way it does and I smile a little bit.

persistent.info

Perils of Measuring Long Intervals with Performance.now() #

Avoiding Incremental Rendering in Hybrid Desktop Apps #

Grafting Local Static Resources onto Production #

Multiple Windows in Hybrid React Desktop Apps #

Blog Archive

Labels

About Me