<div dir="ltr">First, a great piece of work from George in generating this data.<div><br></div><div>I agree it looks very interesting, and I really look forward to digging into what's causing the sporadic nature of the effect (almost equally often negative and positive) of the four most effective optimisations. Clearly, we have a lot to learn about what's going on here. Could we look behind the scenes at the actual generated instruction streams, both with and without the optimisation turned on, for some of these largest changes, to gain some of this understanding?</div>
<div><br></div><div>Simon</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On 11 August 2014 18:46, James Pallister <span dir="ltr">&lt;<a href="mailto:James.Pallister@bristol.ac.uk" target="_blank">James.Pallister@bristol.ac.uk</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>Hi George,<div class=""><br>
<br>
<blockquote type="cite"><br>
I was aiming for a clustered bar chart, but instead settled on
stacked columns, as there were far too many data points and the
chart was cluttered. To explain the chart: I've plotted the
percentage change between the average energy usage of each
benchmark with each pass enabled and disabled. Thus, a negative
value shows that the pass reduced the energy usage of a
benchmark. In terms of the chart produced - bands below the x
axis are where the benchmark had reduced energy, whereas those
above used more.</blockquote></div>
Looks good (although slightly difficult to interpret - lots of
data points). It seems that all of the optimizations are having an
effect on the energy consumption, so we can't exclude any of
these. Perhaps a box-and-whisker plot for each pass would give a good
idea of the distribution of results.<div class=""><br>
<br>
<blockquote type="cite">One thing I noticed is that 2dfir had
disproportionately large magnitudes in the energy changes.
Therefore, I excluded 2dfir from the chart linked earlier. I
will see if this changes after I've pulled in recent beebsv2
changes.<br>
</blockquote></div>
Looking at the raw data, are we certain this is correct? These
measurements seem really large (could there have been an anomalous
reading?). It may be worth repeating the experiment for this
benchmark to see whether you get the same results.<div class=""><br>
<br>
<blockquote type="cite">I believe comparing the means of enabled
vs disabled is the way to determine main effects. However, I'm
not sure how to determine whether or not the difference is
statistically significant - if you look at the raw data (<a href="https://github.com/ks07/beebs/blob/plb/plb/rudimentary_analysis.txt" target="_blank">https://github.com/ks07/beebs/blob/plb/plb/rudimentary_analysis.txt</a>),
a large portion of the energy changes are very small (for
example, 1.953e-14%).</blockquote></div>
We should be able to look at the raw data (i.e. the non-averaged data
from each run) and apply the Mann-Whitney U test to work out whether
the two distributions are significantly different.<br>
<br>
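A minimal sketch of such a test, using the normal approximation in pure
Python. The function name and the per-run readings are hypothetical and
not taken from the raw data; ties get midranks, and no tie or continuity
correction is applied:<br>
<pre>
```python
import math

def mann_whitney_u(xs, ys):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U, p). Ties get midranks; no tie or continuity correction
    is applied to the variance, so treat p as approximate.
    """
    n1, n2 = len(xs), len(ys)
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    rank_sum_x = 0.0
    i = 0
    while i != len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j != len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u1 = rank_sum_x - n1 * (n1 + 1) / 2.0
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return u, p

# Hypothetical per-run energy readings for one benchmark and one pass.
enabled = [9.1, 9.3, 9.2, 9.0, 9.4, 9.2, 9.1, 9.3]
disabled = [10.0, 10.2, 9.9, 10.1, 10.3, 10.0, 9.8, 10.1]
u, p = mann_whitney_u(enabled, disabled)
print(u, p)
```
</pre>
With only a handful of runs per group the normal approximation is rough,
so a library routine such as scipy.stats.mannwhitneyu would be the more
robust choice in practice.<br>
<br>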
From your raw data, here is a Hinton diagram:<br>
<img src="cid:part2.04010000.08070306@bristol.ac.uk" alt=""><br>
Black indicates a decrease in energy and white an increase; the size
of each square is taken from the delta % column. The benchmarks run
horizontally and the passes vertically. I'd have to agree that we
should exclude the gdb-* tests. I've also excluded the 2dfir benchmark.
sglib-arraybinsearch also benefits a lot from the optimizations
(may also be worth reinvestigating).<br>
<br>
Interesting data :)<br>
<br>
James<div><div class="h5"><br>
<br>
On 11/08/14 16:20, George Field wrote:<br>
</div></div></div>
<blockquote type="cite"><div><div class="h5">
<div dir="ltr">
<div>
<div>
<div>
<div>Hi all,<br>
<br>
</div>
I've just finished doing a bit of analysis on a small
subset of GCC passes for BEEBS. I still need to pull in
some of the recent changes to BEEBS, namely deleting the
benchmarks that are no longer part of the suite (the gdb-*
benchmarks seem to be skewing the results somewhat) - but
the results are still interesting.<br>
<br>
</div>
I've run 16 tests 3 times, testing the first 12 optional
passes. Possibly the most interesting thing I've produced
from the energy measurements is the following graph:<br>
<br>
<a href="https://raw.githubusercontent.com/ks07/beebs/plb/plb/main_effects_test.png" target="_blank">https://raw.githubusercontent.com/ks07/beebs/plb/plb/main_effects_test.png</a><br>
<br>
</div>
I was aiming for a clustered bar chart, but instead settled on
stacked columns, as there were far too many data points and
the chart was cluttered. To explain the chart: I've plotted
the percentage change between the average energy usage of each
benchmark with each pass enabled and disabled. Thus, a
negative value shows that the pass reduced the energy usage of
a benchmark. In terms of the chart produced - bands below the
x axis are where the benchmark had reduced energy, whereas
those above used more.<br>
<br>
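A minimal sketch of that calculation, assuming the mean with the pass
disabled is the baseline (the helper name and the readings are
hypothetical, not taken from the analysis scripts):<br>
<pre>
```python
from statistics import mean

def percent_change(enabled, disabled):
    """Percentage change in mean energy when a pass is enabled.

    Negative values mean the pass reduced energy usage.
    Assumes the disabled mean is the baseline.
    """
    base = mean(disabled)
    return 100.0 * (mean(enabled) - base) / base

# Hypothetical energy readings (arbitrary units), three runs each.
enabled_runs = [9.0, 9.5, 10.0]
disabled_runs = [10.0, 10.0, 10.0]
delta = percent_change(enabled_runs, disabled_runs)
print(round(delta, 2))
```
</pre>
With these made-up readings the result is -5.0, i.e. a band below the x
axis.<br>
<br>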
</div>
<div>You'll notice that no GCC pass was universally good or bad
wrt the energy usage of our benchmarks. However, it's clear
that the majority of passes have a tendency to either improve
or impair the energy usage, on average.<br>
<br>
</div>
<div>Another, more detailed look at the main effects shows the 3
best and 3 worst passes for each benchmark.<br>
<br>
<a href="https://github.com/ks07/beebs/blob/plb/plb/best_passes.txt" target="_blank">https://github.com/ks07/beebs/blob/plb/plb/best_passes.txt</a><br>
<br>
</div>
<div>In this file, you'll see the name of the benchmark,
followed by the best 3 passes and the percentage change each makes
to its energy usage. Following that are the 3 worst.<br>
<br>
</div>
<div>One thing I noticed is that 2dfir had disproportionately
large magnitudes in the energy changes. Therefore, I excluded
2dfir from the chart linked earlier. I will see if this
changes after I've pulled in recent beebsv2 changes.<br>
<br>
</div>
<div>I believe comparing the means of enabled vs disabled is the
way to determine main effects. However, I'm not sure how to
determine whether or not the difference is statistically
significant - if you look at the raw data (<a href="https://github.com/ks07/beebs/blob/plb/plb/rudimentary_analysis.txt" target="_blank">https://github.com/ks07/beebs/blob/plb/plb/rudimentary_analysis.txt</a>),
a large portion of the energy changes are very small (for
example, 1.953e-14%).<br>
<br>
</div>
<div>Thanks,<br>
</div>
<div>George<br>
</div>
</div>
<br>
<br>
</div></div><pre>_______________________________________________
mageec mailing list
<a href="mailto:mageec@mageec.org" target="_blank">mageec@mageec.org</a>
<a href="http://mageec.org/cgi-bin/mailman/listinfo/mageec" target="_blank">http://mageec.org/cgi-bin/mailman/listinfo/mageec</a>
</pre>
</blockquote>
<br>
</div>
<br></blockquote></div><br></div>