<div dir="ltr"><div><div><div>Hi all,<br><br></div>Apologies for the lack of replies on my part - I have been away the past two days.<br><br></div>Unfortunately, it appears that my data is actually useless! I have just checked the binaries produced from my tests - and they are all identical, for all benchmarks.<br>

<br></div><div>The very small percentage changes which make up the majority of my results, as well as the sporadic nature Simon pointed out, must simply be due to variability in the measurements. This is something I will need to try to filter out, or at least ignore, when doing future tests. Obviously the simplest way would be to repeat each measurement many more times to hopefully average away any error, but this would dramatically increase the time needed.<br>
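<br>As a rough illustration of the averaging idea, here is a minimal Python sketch that only treats a change as real when the difference between two sets of repeated readings exceeds their combined standard error. The function names, the threshold k and the idea of passing in plain lists of energy readings are all assumptions for illustration; none of this is part of BEEBS or the measurement scripts.<br>

```python
import math
import statistics


def summarise(samples):
    """Return (mean, standard error of the mean) for repeated readings."""
    mean = statistics.mean(samples)
    # stdev() is the sample standard deviation; it needs at least 2 readings.
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, sem


def is_significant(baseline, candidate, k=2.0):
    """True if the two means differ by more than k combined standard errors.

    k=2.0 is an arbitrary illustrative threshold, not a tuned value.
    """
    mean_b, sem_b = summarise(baseline)
    mean_c, sem_c = summarise(candidate)
    return abs(mean_c - mean_b) > k * math.hypot(sem_b, sem_c)
```

Since the standard error shrinks with the square root of the number of repeats, halving the noise needs roughly four times as many runs, which is where the extra time goes.<br>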

<br></div><div>For the more interesting anomalies, take a look at the following excerpts from the raw data produced by the make process:<br><a href="https://gist.github.com/ks07/fa4c0f6cf124ac7db66f">https://gist.github.com/ks07/fa4c0f6cf124ac7db66f</a><br>

<br></div><div>I've included three benchmarks: 2dfir, sglib-arraybinsearch, and sha. I chose sha as Andrew identified it as having the most instructions per iteration, and it shows very little variation between tests. Looking at 2dfir, a weird pattern emerges: the first measurements are almost always outliers. I think benchmarks are being affected by (erroneous?) benchmarks run previously. Is there anything I can change in the configuration to make sure that the 328p is in a sane state before it is flashed and measured by DejaGNU? It appears that wikisort is having problems (complaints about multiple measurements from platformrun), which has the knock-on effect of breaking the 2dfir measurement that happens afterwards. Note that, oddly, I'm struggling to recreate this effect on 2dfir when running wikisort followed by 2dfir manually, so perhaps there is another explanation?<br>
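<br>Until the cause is understood, one way to spot such outliers automatically is to compare each reading against the median of its series. The following is only a sketch: the function name and the threshold k are hypothetical, and it assumes the readings for one benchmark are available as a list of numbers in measurement order.<br>

```python
import statistics


def flag_outliers(samples, k=5.0):
    """Return indices of readings far from the median of the series.

    A reading is flagged when it lies more than k median absolute
    deviations (MAD) from the median. k=5.0 is an illustrative guess.
    """
    med = statistics.median(samples)
    mad = statistics.median(abs(x - med) for x in samples)
    if mad == 0:
        # Most readings are identical; flag anything that differs at all.
        return [i for i, x in enumerate(samples) if x != med]
    return [i for i, x in enumerate(samples) if abs(x - med) > k * mad]
```

If index 0 is flagged consistently for a benchmark, that would support the idea that the first run is being disturbed by whatever ran before it.<br>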

<br></div><div>Thankfully, sglib-arraybinsearch, which also showed some large energy changes, can be explained by the variation expected in the measurement process. With its currently very short runtime, this benchmark simply has a very small energy usage, meaning the percentage error is far greater than with the others. I can fix this trivially by increasing the iteration count.<br>

<br></div><div>Regardless, I believe I can say, at the very least, that the first 12 passes taken from the top of the 'optional list' have no effect on the binaries produced from BEEBS. It would be good to get confirmation of this: if someone could compile the benchmarks for AVR using my 'PASSES_TO_RUN' files and just compare the resulting binaries, that would be much appreciated. (The files are not currently available online.) In the meantime, I will pull in the latest BEEBS changes and check again, possibly picking a new set of passes to test. (Any suggestions on which passes might be particularly effective?)<br>
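<br>For anyone willing to check, comparing two build trees can be done by hashing every file and listing the ones that differ. This is a minimal sketch, assuming each set of passes was built into its own directory; the function names are hypothetical and the script is not part of BEEBS.<br>

```python
import hashlib
from pathlib import Path


def digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def differing_binaries(dir_a, dir_b):
    """Return sorted relative paths that differ between two build trees.

    A path is reported if its contents differ, or if it exists in only
    one of the two trees.
    """
    a, b = Path(dir_a), Path(dir_b)
    files_a = {p.relative_to(a) for p in a.rglob("*") if p.is_file()}
    files_b = {p.relative_to(b) for p in b.rglob("*") if p.is_file()}
    # Start with files present on one side only.
    differing = files_a.symmetric_difference(files_b)
    for rel in files_a.intersection(files_b):
        if digest(a / rel) != digest(b / rel):
            differing.add(rel)
    return sorted(str(p) for p in differing)
```

An empty list from differing_binaries for a baseline build and a build with the extra passes would confirm that those passes changed nothing. Note that embedded timestamps or build paths, if any, would show up as differences even when the code itself is identical.<br>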

<br></div><div>Thanks,<br>George<br></div></div>