The following is an account of the final development stages of an Android phone, which for obvious reasons will not be named. The bug itself was a very simple matter, and so was the eventual fix. But the entire process of discovering what exactly was happening was quite fun. (Yeah, right! Tell that to my manager :P)
The Android phone I was working on supported connecting an external display/TV via MHL (Mobile High-Definition Link, i.e. HDMI-over-USB). Once connected, the entire UI would be displayed on the TV. The phone battery would also charge while connected.
The issue in question was initially raised as an application issue. It so happened that the charging status was not being displayed properly on the lockscreen.
1. The issue...
With the device locked, when an external MHL cable was connected to the Android phone, the charging-icon would be updated. But if the cable was removed immediately, the charging-icon would continue to be displayed. This did not happen every time. But sometimes, even up to a minute after the cable was removed, the status would continue to be displayed as "charging".
The application developers banged their heads over the weekend and finally pushed the issue onto the underlying kernel drivers, stating that they were updating the charging-status as-&-when they got an update from the framework, which in turn depends on the fuel-gauge/battery-driver to obtain the battery-status.
It was now the kernel developers' turn to use the "stress-reduction kit". After hours of logging almost every single instruction in every single interrupt routine, it was quite evident that the battery driver was not at fault. It was promptly reporting the connect/disconnect events when (and only when) an MHL cable was inserted/removed. The Android framework was getting the events, and eventually the lockscreen application too.
So now the question was: if EVERYTHING was working as it was supposed to, why was the charging-status not being displayed correctly?
2. The peculiar observation...
The peculiar thing about the disappearing charging-icon was that it almost never stayed wrong for the same amount of time. Every time we tested it by plugging in the cable, if the icon disappeared, it stayed missing for a varying period of time and then appeared onscreen again.
3. What it meant...
We finally got onto the right track after we saw that the icon ALWAYS re-appeared onscreen just as the clock on the lock-screen updated itself. As it turned out, the culprit was the display driver. When the MHL cable was plugged in, there was some amount of tinkering going on in the background to handle the multiple displays and/or switch from the primary (mobile LCD) to the secondary (external HDTV over MHL). As is the norm, the display was double-buffered to improve performance and prevent onscreen flickering and tearing.
If the MHL cable was plugged in just as the display driver was initiating a swapbuffer() (i.e. a page-flip operation that picks the back-buffer to display onscreen), the device would then initiate another swapbuffer(), which meant the stale buffer ended up onscreen. To add to the misery, the "smart" display driver was programmed to skip redundant swapbuffer() calls, i.e. unless the display contents had changed since the previous call to swapbuffer(), it would not refresh the display unnecessarily. This meant that after plugging in the MHL cable, once the wrong screen (the one without the charging-icon) was displayed, it would not be refreshed unless something else changed onscreen.
Usually the onscreen clock forced a refresh of the buffers when the time was updated. As it showed the time only down to the minute, the display could sometimes stay "stale" for as long as (but no longer than) a minute. An additional forced refresh in the MHL-cable detection routine fixed the issue properly.
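The sequence of events can be replayed with a toy model. This is only a sketch in Python, not the actual driver code: the names ToyDisplay, update(), hotplug_glitch() and the force flag are all made up for illustration, and the "glitch" simply stands in for the extra page-flip that happened during cable detection.

```python
class ToyDisplay:
    """Toy double-buffered display that skips redundant swaps (hypothetical)."""

    def __init__(self):
        self.state = {"clock": "10:00", "charging": False}
        self.front = self.compose()   # what is visible onscreen
        self.back = self.front        # what the UI draws into
        self.dirty = False            # True when back holds undisplayed changes

    def compose(self):
        # Render the current lockscreen state into a "frame" (just a string).
        suffix = " [charging]" if self.state["charging"] else ""
        return self.state["clock"] + suffix

    def update(self, **changes):
        # The UI changed: redraw into the back buffer and mark it dirty.
        self.state.update(changes)
        self.back = self.compose()
        self.dirty = True

    def swap_buffers(self, force=False):
        if force:
            self.back = self.compose()   # forced refresh: redraw, then flip
        elif not self.dirty:
            return False                 # "smart" driver: skip redundant flip
        self.front, self.back = self.back, self.front
        self.dirty = False
        return True

    def hotplug_glitch(self):
        # Stand-in for the extra flip during MHL detection: the stale
        # buffer comes back onscreen and nothing is marked dirty.
        self.front, self.back = self.back, self.front


def plug_in_cable(display, forced_refresh):
    display.update(charging=True)
    display.swap_buffers()       # correct frame is shown...
    display.hotplug_glitch()     # ...and immediately replaced by the stale one
    display.swap_buffers()       # skipped: nothing "changed" since last swap
    if forced_refresh:
        display.swap_buffers(force=True)   # the fix in the detection routine
    return display.front
```

Without the forced refresh, the model keeps showing the frame without the charging-icon until something else (here, the clock) calls update() and triggers a real swap. With it, the correct frame is onscreen right after the cable is detected.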
A simple example of double-buffering is shown below:
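As a rough illustration in code, here is a minimal Python sketch of the idea. The class and method names (DoubleBuffer, draw, swap_buffers, scanout) are hypothetical and are not taken from any real display driver; a "frame" is just a fixed-width row of characters.

```python
class DoubleBuffer:
    """Minimal double-buffering sketch (illustrative names only)."""

    def __init__(self, width):
        self.front = [" "] * width   # buffer currently shown onscreen
        self.back = [" "] * width    # buffer the renderer draws into

    def draw(self, text):
        # Rendering touches only the back buffer, so a half-drawn
        # frame is never visible onscreen.
        for i in range(len(self.back)):
            self.back[i] = text[i] if i < len(text) else " "

    def swap_buffers(self):
        # The page-flip: the finished back buffer becomes the front
        # buffer, and the old front buffer is reused for drawing.
        self.front, self.back = self.back, self.front

    def scanout(self):
        # What the panel would show right now.
        return "".join(self.front)
```

Until swap_buffers() is called, scanout() keeps returning the old frame, no matter how much has been drawn into the back buffer.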
4. Could Triple-buffering have prevented this issue?
Triple-buffering involves 2 back-buffers. At any given moment, the display-driver can immediately pick one that is not being updated by the graphics h/w and display it as the front-buffer.
Triple buffering itself has 2 variants:
(A) Triple-buffering with no-sync.
In this method the back-buffers are alternately updated by the graphics-h/w as fast as it can. At each Vsync, the display driver picks one of the buffers which is currently not being written to and swaps it with the front-buffer.
(B) Triple-buffering with Vsync.
In this method, the back-buffers are updated by the graphics h/w as fast as it can. But the updates stop once both back-buffers have been updated and neither has been displayed in the front-buffer yet. The display-driver, as usual, swaps one of the back-buffers with the front-buffer at each Vsync. At this point the previous front-buffer, which is now a back-buffer, is considered "stale" and the graphics h/w fills it up with the updated frame.
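The difference between the two variants can be seen in a small simulation. This is a sketch under stated assumptions, not a model of any particular GPU or driver: the function simulate() and its parameters are made up for illustration, frames are just increasing numbers, the renderer is assumed to finish a fixed number of frames per Vsync interval, variant (A) is assumed to display the newest finished frame, and variant (B) is assumed to display the oldest pending frame so that frames appear in order.

```python
def simulate(vsyncs, frames_per_vsync, wait_for_display):
    """Return the frame number shown at each Vsync (toy model).

    wait_for_display=False models variant (A), no sync.
    wait_for_display=True  models variant (B), with Vsync.
    """
    back = [None, None]   # frame held by each back buffer; None = stale/free
    next_frame = 1
    shown = []
    for _ in range(vsyncs):
        for _ in range(frames_per_vsync):
            if None in back:
                slot = back.index(None)        # a stale buffer is available
            elif wait_for_display:
                break                          # (B): both pending, renderer stalls
            else:
                slot = back.index(min(back))   # (A): overwrite the older frame
            back[slot] = next_frame
            next_frame += 1
        pending = [f for f in back if f is not None]
        if pending:
            pick = min(pending) if wait_for_display else max(pending)
            shown.append(pick)
            # The flip: the picked buffer goes onscreen and the old
            # front buffer comes back as a stale back buffer.
            back[back.index(pick)] = None
    return shown
```

With a renderer that can finish 3 frames per Vsync interval, simulate(3, 3, False) shows frames [3, 6, 9]: the freshest frame is always displayed, but the frames in between are rendered and then thrown away. simulate(3, 3, True) shows [1, 2, 3]: no frame is wasted, but the renderer spends part of each interval stalled.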
Triple-buffering could potentially have corrected the issue, as one of the back-buffers would hold the properly updated screen data and, even if it was not picked up right away, it would be picked up in the very next swapbuffer() call. Also, in double-buffering, the graphics h/w has to wait for access to the back-buffer until swapbuffer() completes the flip operation between the front and back buffers. This is not the case in triple-buffering, which allows the graphics h/w to run at full throttle, thereby reducing the time that either of the back-buffers contains stale display data.
Further reading: A detailed description of double/triple buffering.