[MAGEEC] [BEEBS] Plackett-Burman Initial Review

George Field george.field at bristol.ac.uk
Thu Aug 14 12:15:24 BST 2014


Hi all,

Apologies for the lack of replies on my part - I have been away the past
two days.

Unfortunately, it appears that my data is actually useless! I have just
checked the binaries produced from my tests - and they are all identical,
for all benchmarks.

The very small percentage changes which make up the majority of my results,
as well as the sporadic behaviour Simon pointed out, must simply be due to
variability in the measurements. This is something I will need to try to
filter out, or at least ignore, in future tests. The simplest approach
would be to repeat each measurement many more times and average away the
error - but that would dramatically increase the time needed.
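As a rough sketch of the kind of filtering I have in mind (the readings and the 10% threshold below are made up, purely for illustration):

```python
import statistics

# Hypothetical repeated energy readings (joules) for one benchmark;
# the fourth reading is a first-run outlier of the kind seen with 2dfir.
readings = [0.0121, 0.0120, 0.0122, 0.0192, 0.0121]

# Filter relative to the median (robust to the outlier itself),
# keeping readings within 10% of it, then average the rest.
median = statistics.median(readings)
filtered = [r for r in readings if abs(r - median) <= 0.1 * median]
print(round(statistics.mean(filtered), 4))  # 0.0121
```

A median-based filter seems safer than mean/standard deviation here, since a single large outlier inflates the standard deviation enough to hide itself.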

For the more interesting anomalies, take a look at the following excerpts
from the raw data produced by the make process:
https://gist.github.com/ks07/fa4c0f6cf124ac7db66f

I've included three benchmarks: 2dfir, sglib-arraybinsearch, and sha. I
chose sha as Andrew identified it as having the most instructions per
iteration, and it shows very little variation between tests. Looking at
2dfir, a weird pattern emerges - the first measurements are almost always
outliers. I suspect benchmarks are being affected by (erroneous?)
benchmarks run previously - is there anything I can change in the
configuration to make sure that the 328p is in a sane state before it is
flashed and measured by DejaGNU? It appears that wikisort is having
problems (platformrun complains about multiple measurements), which has the
knock-on effect of breaking the 2dfir measurement that happens afterwards.
Oddly, I'm struggling to recreate this effect on 2dfir when running
wikisort followed by 2dfir manually - perhaps there is another explanation?

Thankfully, the large energy changes seen for sglib-arraybinsearch can be
explained by expected variation in the measurement process. With its
currently very short runtime, this benchmark has a very small energy usage,
so the percentage error is far greater than for the others. I can fix this
trivially by increasing the iteration count.

Regardless, I believe I can say, at the very least, that the first 12
passes taken from the top of the 'optional list' have no effect on the
binaries produced from BEEBS. It would be good to get confirmation of this
- if someone could compile the benchmarks for AVR using my 'PASSES_TO_RUN'
files and just compare the resulting binaries, that would be much
appreciated. (Not currently available online.) In the meantime, I will pull
in the latest BEEBS changes and check again - possibly picking a new set of
passes to test. (Any suggestions on which passes might be particularly
effective?)

Thanks,
George