Test Reduction Using Plackett-Burman

by George Field on July 17, 2014

In order to train the machine learner that is central to MAGEEC, a large data set is required relating energy measurements and compiler optimisation passes. Our benchmarking suite, BEEBS, is comprised of 93 applications (correct at the time of writing), and is continuing to expand. Similarly, our current count of trivially optional optimization passes in GCC (for an AVR platform) is 120. If we consider a full factorial design, whereby we run a test for every possible configuration of compiler passes, for each benchmark, clearly the experiment will be enormous in scope. Therefore, it is in our interest to analyse the effects of each compiler pass on a per architecture basis, in order to determine which have negligible effect on energy consumption. To achieve this, as part of my contribution to the MAGEEC project this Summer, I will be working on implementing a Plackett-Burman design.

An exampe matrix for Plackett-Burman design.

An example matrix for the L=2, N=8 case of Plackett-Burman design.

An Overview of Plackett-Burman Design

A Plackett-Burman design is a form of experimental design, developed in order to significantly reduce the number of tests needed to determine which of the independent variables, known as factors, have the most significant effect on the measured outcome. Plackett-Burman is particularly suited to two level factors – i.e. situations where we’re interested in two possible values for every factor – in this case, running or skipping a pass. The number of levels in a design is referred to as L, while the number of factors takes the symbol n. Knowing L and n, we must then choose a number of tests to run, denoted N.

There are some requirements for a suitable value of N. Primarily, N must be greater than n, and N must be divisable by L². A further restriction is that, at least in our problem space with L = 2, N should not be a power of 2. If a power of 2 is used, our approach will be identical to a fractional factorial design. Once a suitable N has been selected, a matrix representing all of the tests to be ran must be created. For our case of L = 2, this matrix is actually a Hadamard matrix, provided we use the values 1 and -1 to denote each value. The intuitive description of the desired matrix is that it should, for every pair of factors, feature an equal number of every possible combination of levels. For example, the accompanying table shows a possible matrix for the case where N = 12, as described in the original publication. This property allows Plackett-Burman to identify main effects with a limited number of tests.

Plan Going Forward

In order to employ Plackett-Burman design, there are two problems I will need to tackle. The first is implementing a system that can run the entire set of tests fully automatically – due to the large number of tests that will need to be performed, it will not be practical to have someone supervise the process. Any tests that fail to compile should neither hang nor halt the process; rather, they should skip the erroneous step and log it for future response. For this I will be utilising DejaGNU. The second step is creating the Plackett-Burman design itself. As we have a large set of factors, I will need to generate a proportionally large Hadamard matrix to determine what tests to run. The original paper described designs up to only N=100. Therefore, I will need to investigate a method of generating a sufficiently large matrix, or source a pre-computed matrix of an appropriate size. Short term, I hope to have prepared a baseline implementation of the framework using DejaGNU, to act as a proof of concept before continuing with development.