GUI unit tests can benefit from verifying the appearance of widgets against an "approved" snapshot, since not all the minute details of a widget may be easy to test directly. The recipe in this tip is specific to PyQt widgets: it either generates an image of a PyQt widget or compares the current widget to a saved image. It has been tested with the following configuration: PyQt 5.7, Python 3.5, and Windows 7 64-bit. It will likely also work with any combination of PyQt 5.x and Python 3.x, on any platform supported by that combination.
The basic requirements of a widget appearance test function are the following:
- It should be easy to generate a snapshot of the widget the first time, or whenever a widget has been intentionally modified.
- If the image already exists, the test function should compare the current widget appearance against it.
- If the appearance has not changed (beyond a specified tolerance, if any), the function should return True.
- If the appearance has changed beyond a certain tolerance (which defaults to 0), the test function should save the actual snapshot of the widget and generate a difference image, to help understand what has been broken. It should then return False.
- The test function should delete any previous actual and difference images saved to the filesystem: if the test succeeds, they are obsolete (and leaving them around would be misleading); if the test fails, they will be overwritten.
- The test function should allow the check to be repeated at a given interval for a short period of time, since some multi-threaded apps may delay the final appearance of a widget.
- Several types of appearance differences should be checked:
  - very slight change in colors over a large area. Example: background color change
  - medium change in colors, in a small area. Example: speckling effect around characters in widgets embedded in a graphics scene
  - large change in colors, in a very small area. Example: text change by even one character
- It should be easy to save the widget snapshot in the same folder as a test.
- The name of the file to compare against, or to save as reference, should be easy to provide.
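The retry requirement is not implemented in the simplified listing of this tip. A minimal sketch of it follows, assuming the check can be wrapped in a no-argument callable that returns a bool; poll_until, timeout_s and interval_s are hypothetical names, not part of PyQt or of the original code.

```python
import time
from typing import Callable


def poll_until(check: Callable[[], bool], timeout_s: float = 2.0,
               interval_s: float = 0.1,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call check() until it returns True or timeout_s has elapsed.

    check() is always called at least once; the result of the last attempt is returned.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        # Assumption: in a Qt test, `sleep` may be swapped for a function that
        # also processes pending events, so the widget gets a chance to repaint.
        sleep(interval_s)
```

In a PyQt test one could, for instance, pass `sleep=lambda s: QTest.qWait(int(s * 1000))` (QTest lives in PyQt5.QtTest and qWait takes milliseconds) and `check=lambda: check_widget_snapshot(widget, __file__, 'label')`; this pairing is a suggestion, not something the original recipe prescribes.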
Using the Code
The function requires a PyQt widget, a path to a folder or file, and a name for the snapshot. Optionally, it can take a tolerance on the RMS difference (in %) and a tolerance on the percentage of pixels that may differ.
import logging
from math import sqrt
from pathlib import Path

from PyQt5.QtGui import QColor, QImage, QPixmap
from PyQt5.QtWidgets import QWidget

log = logging.getLogger(__name__)


def check_widget_snapshot(widget: QWidget, ref_path: str, fig_name: str,
                          rms_tol_perc: float = 0.0, num_tol_perc: float = 100.0) -> bool:
    """
    Get the difference between a widget's appearance and a file stored on disk. If the file doesn't
    exist, the file is created.

    :param widget: the widget for which appearance must be verified
    :param ref_path: the path to folder containing reference image (can be a file name too, then the
        parent is used as folder -- useful for tests where __file__ can be passed)
    :param fig_name: the name of the reference image (will be saved in ref_path folder if it doesn't
        exist, or read if it does)
    :param rms_tol_perc: percentage difference allowed between widget's snapshot and the reference, i.e.
        a floating point value in range 0..100
    :param num_tol_perc: the maximum percentage of mismatched pixels allowed, relative to total number
        of pixels in the reference image; two pixels differ if any of their R, G, B and A values differ
    :return: True if reference didn't exist (and was hence generated); True if widget matches reference;
        True if not matched but RMS diff is at most rms_tol_perc % and the percentage of different
        pixels is at most num_tol_perc %; False otherwise

    Example:

        # file some_test.py in folder 'testing'
        app = QApplication([])
        widget = QLabel('test')
        assert check_widget_snapshot(widget, __file__, 'label', rms_tol_perc=0.5, num_tol_perc=4)

    The first time the test is run, it will save a snapshot of the widget to label.png in
    Path(__file__).parent and return True. The next time it is run, the widget snapshot will be compared
    to the image saved. If the image colors have changed by more than 0.5% when averaged over all pixels,
    or if more than 4% of pixels have changed in any way, check_widget_snapshot() will save a snapshot
    of the widget to label_actual.png and the difference image to label_diff.png and will return False,
    thus causing the test to fail. Otherwise (no differences, or differences within stated bounds),
    check_widget_snapshot() will just return True and the test will pass.
    """
    actual_pixmap = widget.grab()
    image = actual_pixmap.toImage()
    rms_tol_perc = float(rms_tol_perc)
    num_tol_perc = float(num_tol_perc)

    # accept either a folder or a file (e.g. the test's __file__) as the reference location
    ref_path = Path(ref_path)
    if ref_path.is_file():
        ref_path = ref_path.parent
    ref_pixmap_path = (ref_path / fig_name).with_suffix('.png')

    # first run: save the current appearance as the reference
    if not ref_pixmap_path.exists():
        log.info('Generating ref snapshot %s in %s for widget %s',
                 ref_pixmap_path.name, ref_pixmap_path.parent, widget)
        actual_pixmap.save(str(ref_pixmap_path))
        return True

    ref_pixmap = QPixmap(str(ref_pixmap_path))
    ref_image = ref_pixmap.toImage()
    if ref_image == image:
        return True

    # build a "white - |ref - widget|" diff image while accumulating the per-pixel RMS difference
    diff_image = QImage(ref_image.width(), ref_image.height(), ref_image.format())
    DIFF_REF_PIX = 255
    DIFF_MAX = 255
    REF_RGBA = [DIFF_REF_PIX] * 4
    diff_rms = 0
    num_diffs = 0
    for i in range(ref_image.width()):
        for j in range(ref_image.height()):
            pixel = image.pixelColor(i, j).getRgb()
            ref_pixel = ref_image.pixelColor(i, j).getRgb()
            if pixel != ref_pixel:
                # RMS of the four channel differences, each scaled to 0..1
                diff = [pow((x - y) / DIFF_MAX, 2) for x, y in zip(pixel, ref_pixel)]
                diff_rms += sqrt(sum(diff) / len(pixel))
                diff = [DIFF_REF_PIX - abs(x - y) for x, y in zip(pixel, ref_pixel)]
                diff_image.setPixelColor(i, j, QColor(*diff))
                num_diffs += 1
            else:
                # identical pixels are shown as white in the diff image
                diff_image.setPixelColor(i, j, QColor(*REF_RGBA))

    total_num_pixels = ref_image.width() * ref_image.height()
    diff_rms /= total_num_pixels
    diff_rms_perc = diff_rms * 100
    num_diffs_perc = num_diffs * 100 / total_num_pixels
    log.info("Widget %s vs ref %s in %s:",
             widget.objectName() or repr(widget), ref_pixmap_path.name, ref_pixmap_path.parent)
    log.info("    RMS diff=%s%% (rms_tol_perc=%s%%), # pixels changed=%s%% (num_tol_perc=%s%%)",
             diff_rms_perc, rms_tol_perc, num_diffs_perc, num_tol_perc)
    if diff_rms_perc <= rms_tol_perc and num_diffs_perc <= num_tol_perc:
        return True

    # beyond tolerances: save the actual snapshot and the diff image next to the reference
    actual_pix_path = ref_pixmap_path.with_name(fig_name + '_actual.png')
    log.warning("    Snapshot has changed beyond tolerances, saving actual and diff images to folder %s:",
                actual_pix_path.parent)
    actual_pixmap.save(str(actual_pix_path))
    diff_pix_path = ref_pixmap_path.with_name(fig_name + '_diff.png')
    assert actual_pix_path.parent == diff_pix_path.parent
    diff_pixmap = QPixmap.fromImage(diff_image)
    diff_pixmap.save(str(diff_pix_path))
    log.warning("    Actual image saved to %s", actual_pix_path.name)
    log.warning("    White - |ref - widget| image saved to %s", diff_pix_path.name)
    return False
The docstring gives an example of use. If the RMS difference percentage is larger than tolerated, or if the percentage of pixels that differ between the widget snapshot and the reference exceeds num_tol_perc, two PNG images are saved: the actual widget snapshot, and the absolute difference between the two images, relative to white (identical pixels appear white).
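The listing does not yet delete stale actual and difference images, although the requirements ask for it. A minimal sketch of the file naming used above plus that cleanup step, written with pathlib only; snapshot_paths and remove_stale_outputs are hypothetical helpers, and calling remove_stale_outputs at the start of check_widget_snapshot() is an assumption about where it would fit.

```python
from pathlib import Path
from typing import Tuple


def snapshot_paths(ref_path, fig_name: str) -> Tuple[Path, Path, Path]:
    """Return the (reference, actual, diff) PNG paths for a snapshot name.

    ref_path may be a folder or a file (e.g. a test's __file__); for a file,
    its parent folder is used, mirroring check_widget_snapshot().
    """
    folder = Path(ref_path)
    if folder.is_file():
        folder = folder.parent
    ref = (folder / fig_name).with_suffix('.png')
    actual = ref.with_name(fig_name + '_actual.png')
    diff = ref.with_name(fig_name + '_diff.png')
    return ref, actual, diff


def remove_stale_outputs(ref_path, fig_name: str) -> None:
    """Delete actual/diff images left over from a previous run, if present."""
    _, actual, diff = snapshot_paths(ref_path, fig_name)
    for path in (actual, diff):
        if path.exists():
            path.unlink()
```

Running the cleanup before every comparison means that after a passing test no outdated label_actual.png or label_diff.png remains, and after a failing test the files on disk always belong to the latest run.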
Points of Interest
There are many ways to compute a scalar that represents the difference between two images. For widget appearance regression testing, it is likely sufficient to average, over all pixels, the RMS of each pixel's channel differences scaled to the range 0..1:
$$d_{\text{pixel}} = \frac{1}{2}\sqrt{\left(\frac{r_1-r_2}{255}\right)^2 + \left(\frac{g_1-g_2}{255}\right)^2 + \left(\frac{b_1-b_2}{255}\right)^2 + \left(\frac{a_1-a_2}{255}\right)^2}$$
where r, g, b, a are the integer values (0-255) of the red, green, blue and alpha channels of the pixel, and subscripts 1 and 2 denote the two images. The factor 1/2 is the square root of 1/4, i.e. the mean over the four channels, so this produces a number between 0 and 1. The total difference over the N pixels of the image is then:
$$d_{\text{total}} = \frac{1}{N}\sum_{\text{pixels}} d_{\text{pixel}}$$
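The two formulas translate directly into Python. A minimal sketch operating on plain lists of (R, G, B, A) tuples instead of QImage objects, so it runs without PyQt; pixel_diff and total_diff are illustrative names, not functions from the original code.

```python
from math import sqrt
from typing import Sequence, Tuple

RGBA = Tuple[int, int, int, int]


def pixel_diff(p1: RGBA, p2: RGBA) -> float:
    """RMS of the channel differences, each scaled by 255; result in 0..1."""
    return sqrt(sum(((a - b) / 255) ** 2 for a, b in zip(p1, p2)) / len(p1))


def total_diff(img1: Sequence[RGBA], img2: Sequence[RGBA]) -> float:
    """Average of pixel_diff over all pixels of two equally sized flat pixel lists."""
    assert len(img1) == len(img2)
    return sum(pixel_diff(a, b) for a, b in zip(img1, img2)) / len(img1)
```

For instance, turning one pixel of a four-pixel black image white (alpha unchanged) gives sqrt(3)/2, about 0.866, for that pixel, and about 0.2165 for the whole image.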
rms_tol_perc is a "percentage difference", and in such an average, very localized but strong differences get washed out. For example, a difference of 1 in each of the RGBA values (0-255) of every pixel gives a percentage difference of about 0.4% (1/255) over the entire image, whereas a difference of 255 on R, G, B and A for 1000 pixels of a 1000x1000 pixel image gives a percentage difference of only 0.1%. Yet the latter may well represent a more severe regression. In GUI widgets, this is a common case: the widget layout, words, or content have changed, so that much of the widget remains unchanged but some portions have shifted or been replaced in various "pockets" of the widget.
It is useful to average the RMS difference over the differing pixels only, rather than over the total number of pixels, because identical pixels don't carry useful information for regression testing. In the previous example of 1000 completely different pixels, i.e. color = (0, 0, 0, 0) in one image and (255, 255, 255, 255) in the other, check_widget_snapshot() would indicate that 0.1% of all pixels were unmatched, on average by 100%. For the image whose pixels are all just one notch down from the reference image (color = (R, G, B, A) in the reference image and (R-1, G-1, B-1, A-1) in the actual one), the function would indicate that 100% of the image pixels were different, on average by 0.4%.
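The listing divides the accumulated difference by the total number of pixels. A sketch that reports both averages, plus the percentage of changed pixels, from only the pixel pairs that differ (so a 1000x1000 image need not be materialized); diff_stats is a hypothetical helper, and it reproduces the numbers of the two examples in the text.

```python
from math import sqrt
from typing import List, Tuple

RGBA = Tuple[int, int, int, int]


def pixel_diff(p1: RGBA, p2: RGBA) -> float:
    # same per-pixel formula as in the text: RMS of scaled channel differences
    return sqrt(sum(((a - b) / 255) ** 2 for a, b in zip(p1, p2)) / len(p1))


def diff_stats(total_pixels: int,
               changed: List[Tuple[RGBA, RGBA]]) -> Tuple[float, float, float]:
    """Summarize a comparison given only the (reference, actual) pairs that differ.

    Returns, all in percent: (mean difference over all pixels,
    share of pixels changed, mean difference over changed pixels only).
    """
    diffs = [pixel_diff(a, b) for a, b in changed]
    total = sum(diffs)
    mean_all = 100 * total / total_pixels
    changed_perc = 100 * len(diffs) / total_pixels
    mean_changed = 100 * total / len(diffs) if diffs else 0.0
    return mean_all, changed_perc, mean_changed
```

For 1000 fully inverted pixels in a 1000x1000 image this returns (0.1, 0.1, 100.0); for a 100x100 image where every pixel is one notch off it returns about (0.39, 100.0, 0.39). The per-changed-pixel average tells the two cases apart, while the all-pixel average ranks the milder change as the larger one.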
If the tolerances (RMS and percentage of pixels changed) are set to non-zero values because of noise (such as speckling in a graphics view with embedded widgets, due to machine differences), another measure is necessary to catch real differences such as a character changing in a label. Such a change is a large color difference in only a few pixels, and it could get diluted among the noise pixels. A tolerance on the maximum difference of any single pixel can guard against such changes going unnoticed.
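A sketch of how a maximum-difference tolerance complements the two existing ones. It works on lists of differing (reference, actual) RGBA pairs rather than images; within_tolerances and max_tol_perc are hypothetical names, not the parameter names of the GitHub version.

```python
from math import sqrt
from typing import List, Tuple

RGBA = Tuple[int, int, int, int]


def pixel_diff(p1: RGBA, p2: RGBA) -> float:
    # per-pixel RMS of channel differences scaled to 0..1, as in the text
    return sqrt(sum(((a - b) / 255) ** 2 for a, b in zip(p1, p2)) / len(p1))


def within_tolerances(pairs: List[Tuple[RGBA, RGBA]], total_pixels: int,
                      rms_tol_perc: float, num_tol_perc: float,
                      max_tol_perc: float) -> bool:
    """True only if mean difference, share of changed pixels and largest
    single-pixel difference are all within their tolerances (percent)."""
    diffs = [pixel_diff(a, b) for a, b in pairs]
    rms_perc = 100 * sum(diffs) / total_pixels
    num_perc = 100 * len(diffs) / total_pixels
    max_perc = 100 * max(diffs, default=0.0)
    return (rms_perc <= rms_tol_perc and num_perc <= num_tol_perc
            and max_perc <= max_tol_perc)
```

Take a 100x100 image with 500 noise pixels two levels off and five pixels flipped from black to white. The mean difference (about 0.08%) and the share of changed pixels (5.05%) stay within loose tolerances of 0.5% and 10%. Only the largest single-pixel difference (about 86.6%, against roughly 0.7% for the noise) exposes the change.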
Another measure of the difference between two images is the Structural Similarity Index (SSIM). However, this measure is significantly more complicated to compute, and available implementations are based on numpy. Instead, check_widget_snapshot() could have a parameter that allows a "difference computation" to be passed in, a form of dependency injection.
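A minimal sketch of that dependency injection, again on flat lists of RGBA tuples: the comparison takes the metric as a callable, with the mean per-pixel RMS difference as the default. images_match, mean_rms_diff_perc and DiffFn are hypothetical names. An SSIM-based metric (for example one returning 100 * (1 - SSIM) from a numpy-based implementation) could be injected the same way, but it is not shown here.

```python
from math import sqrt
from typing import Callable, Sequence, Tuple

RGBA = Tuple[int, int, int, int]
Image = Sequence[RGBA]
# a difference function maps (reference, actual) to a score in percent
DiffFn = Callable[[Image, Image], float]


def mean_rms_diff_perc(ref: Image, actual: Image) -> float:
    """Default metric: per-pixel RMS difference averaged over all pixels, in percent."""
    total = sum(sqrt(sum(((a - b) / 255) ** 2 for a, b in zip(p, q)) / len(p))
                for p, q in zip(ref, actual))
    return 100 * total / len(ref)


def images_match(ref: Image, actual: Image, tol_perc: float,
                 diff_fn: DiffFn = mean_rms_diff_perc) -> bool:
    """Compare two images with an injectable difference metric."""
    return diff_fn(ref, actual) <= tol_perc


def changed_pixels_perc(ref: Image, actual: Image) -> float:
    """Alternative metric: percentage of pixels that differ in any channel."""
    return 100 * sum(p != q for p, q in zip(ref, actual)) / len(ref)
```

With one pixel out of four turned from black to white, the default metric gives about 21.65% and passes a 22% tolerance, while injecting changed_pixels_perc gives 25% and fails it; the comparison code itself does not change.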
The check_widget_snapshot() function is available on GitHub and is significantly more comprehensive than the above: it supports dependency injection for the difference computation, and implements all the requirements mentioned at the beginning of this article.
- First version: 2016/09/28
- Version 2: additional tolerance for number of differences
- Version 3: more requirements, additional tolerance for max difference, dependency injection of difference computation