[MAGEEC] [mageec-magicians] Re: Structure of the Results Database

Fri Aug 30 12:14:48 BST 2013

Hello again everyone,

Thanks for everyone's feedback so far. As per James' recommendation I've
removed the raw_results table and put results about a specific run in the
runs table, and summary data about a test in the test table.
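
To illustrate the kind of per-run figures that could go in the runs table
in place of the raw trace, here is a minimal Python sketch that reduces a
power trace to the time, energy, average power and peak power suggested
further down the thread. The function name, the RunSummary type and the
units (seconds and watts) are assumptions for illustration only; nothing
here is taken from the test framework itself.

```python
from typing import NamedTuple, Sequence, Tuple


class RunSummary(NamedTuple):
    """Per-run figures that could be stored instead of the full trace."""
    duration_s: float
    energy_j: float
    average_power_w: float
    peak_power_w: float


def summarise_trace(samples: Sequence[Tuple[float, float]]) -> RunSummary:
    """Reduce (timestamp, power) samples to a single summary row.

    Assumes timestamps are in seconds, power is in watts, and the
    samples are already sorted by timestamp.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to integrate a trace")

    duration = samples[-1][0] - samples[0][0]
    if duration <= 0:
        raise ValueError("timestamps must be increasing")

    # Integrate power over time with the trapezoidal rule to get energy.
    energy = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        energy += 0.5 * (p0 + p1) * (t1 - t0)

    peak = max(power for _, power in samples)
    return RunSummary(duration, energy, energy / duration, peak)
```

With this approach a run of 200,000+ samples collapses to one row, at the
cost of no longer being able to re-examine the trace from the database.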

Also, after a discussion with Kerstin about how useful the database
structure would be for the machine learning it was realised that the
database wouldn't be storing any information about compiler passes.
Therefore the diagram has been changed so that flag sets and feature
vectors are associated with a pass instead of a test. A test can also have
multiple passes.
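
As a rough sketch of how these relationships might look as tables, the
Python snippet below builds an SQLite schema. Only the names tests, runs,
flags, test_id, run_id and flag_id come from this thread; the passes,
flags_passes and features tables and every other column name are
assumptions made for illustration, and the diagram remains the reference.

```python
import sqlite3

# Hypothetical schema: one test has many runs and many passes; each pass
# has its own flag set and feature vector. Summary columns for a test are
# left out because the thread does not name them.
SCHEMA = """
CREATE TABLE tests (
    test_id   INTEGER PRIMARY KEY,
    platform  TEXT NOT NULL,
    compiler  TEXT NOT NULL,
    benchmark TEXT NOT NULL
);
CREATE TABLE runs (
    run_id        INTEGER PRIMARY KEY,
    test_id       INTEGER NOT NULL REFERENCES tests(test_id),
    duration      REAL,
    energy        REAL,
    average_power REAL,
    peak_power    REAL
);
CREATE TABLE passes (
    pass_id INTEGER PRIMARY KEY,
    test_id INTEGER NOT NULL REFERENCES tests(test_id),
    name    TEXT NOT NULL
);
CREATE TABLE flags (
    flag_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE flags_passes (
    pass_id INTEGER NOT NULL REFERENCES passes(pass_id),
    flag_id INTEGER NOT NULL REFERENCES flags(flag_id),
    PRIMARY KEY (pass_id, flag_id)
);
CREATE TABLE features (
    pass_id  INTEGER NOT NULL REFERENCES passes(pass_id),
    position INTEGER NOT NULL,
    value    REAL NOT NULL,
    PRIMARY KEY (pass_id, position)
);
"""


def create_database(path=":memory:"):
    """Create the sketch schema and return the open connection."""
    conn = sqlite3.connect(path)
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

Keeping the feature vector as one row per element is just one option; a
single serialised column would be simpler but harder to query.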

Here's the new diagram: http://mageec.org/wiki/File:Results_ERD_Proposal.png
I'll link to this page from now on so that it's easier to see differences
between different versions of the diagram.

Thanks,
Ashley

On 29 August 2013 10:50, Ashley Whetter <aw0455 at my.bristol.ac.uk> wrote:

> Right you are. I've updated the diagram:
> http://mageec.org/w/images/1/19/Results_ERD_Proposal.png
> I've also renamed "time" to "timestamp" and "energy" to "power".
>
> Ashley
>
>
> On 29 August 2013 10:35, Munaaf Ghumran <mg0950.2010 at my.bristol.ac.uk> wrote:
>
>> I agree with James that the flags_tests and run tables might need test_id
>> as shared foreign keys.
>>
>> Other than that, nothing I can see that stands out, seems good!
>>
>> Moon
>>
>>
>> On 28 August 2013 19:08, James Pallister <James.Pallister at bristol.ac.uk> wrote:
>>
>>>  Hi,
>>>
>>>
>>>  A test (and test id) refers to a single combination of a platform,
>>> compiler, benchmark, and flag set.
>>> A run (and run id) refers to a single run of a test. A test can have
>>> multiple runs.
>>>
>>> In the ERD, do the flags_tests and runs tables need test_id?
>>>
>>>
>>> The raw_results table is the bottleneck. We record 200,000+ individual
>>> results for a single run, so this table will get really big really quickly.
>>>
>>> I'm guessing this is the power trace directly from the measurement
>>> board? If so, the fields should be run_id, timestamp and power.
>>>
>>> We may not want to store the entire trace in the database - with
>>> millions of measurements, the database might get unmanageably large - might
>>> be better if the raw_results was just the time, energy, average power, peak
>>> power, etc for that specific run.
>>>
>>>
>>> We could split this out into a different results table, but it's a
>>> one-to-one relationship
>>>
>>> The join between tests and runs should be changed from one-to-many in
>>> the diagram to one-to-one.
>>>
>>>
>>> Looks good from the energy measurement side :)
>>>
>>>
>>> James
>>>
>>>
>>> On 28/08/13 18:24, Simon Hollis wrote:
>>>
>>> Hi Ashley,
>>>
>>> Thanks very much for starting this discussion. This is a really good
>>> starting point.
>>>
>>> What I would like is that if everybody who has an interest in the
>>> structure of this database can provide their feedback on the structure that
>>> Ashley has proposed and see if it will work for their anticipated needs.
>>>
>>> As I see it there are at least three interests we need to support:
>>> Energy Measurement; the MAGEEC framework; ML.
>>>
>>> Perhaps all sides could outline the suitability of the proposed
>>> structure for their needs?
>>>
>>> P.S. for Magicians: If you received this message, but not Ashley's
>>> original one, it is because you are not yet signed up for the external
>>> mageec at mageec.org mailing list. Please do so!
>>>
>>>
>>> On 28/08/13 17:21, Ashley Whetter wrote:
>>>
>>> Hey everyone,
>>>
>>>  At the last meeting we looked at the ER diagram (
>>> http://mageec.org/wiki/Database) for the database that would store the
>>> results that would be recorded by the test framework and used by the plugin.
>>>
>>>  I've taken some of the comments made at the meeting and made a more
>>> detailed ER diagram to discuss. (
>>> http://mageec.org/w/images/1/19/Results_ERD_Proposal.png)
>>>
>>>  A test (and test id) refers to a single combination of a platform,
>>> compiler, benchmark, and flag set.
>>> A run (and run id) refers to a single run of a test. A test can have
>>> multiple runs.
>>>
>>>  Currently the flag table is only really storing a flag name (eg
>>> "-fgcse", "-fno-gcse", etc). I've kept this as a separate table, though,
>>> because eventually we'll want to start storing values for flags that aren't
>>> just on or off. We could add this value field now, keep the table as is for
>>> now, or get rid of flag_id altogether and just use the flag name in
>>> flags_tests instead.
>>>
>>>  I've put summary data in the runs table. We could split this out into
>>> a different results table, but as it's a one-to-one relationship it would add
>>> an unnecessary overhead of joining the runs and results table when we want
>>> to search it. This isn't such a problem if we search for results by run_id
>>> though.
>>>
>>>  The raw_results table is the bottleneck. We record 200,000+ individual
>>> results for a single run, so this table will get really big really quickly.
>>>
>>>
>>>  Ashley
>>>
>>>
>>> _______________________________________________
>>> mageec mailing list
>>> mageec at mageec.org
>>> http://mageec.org/cgi-bin/mailman/listinfo/mageec
>>>
>>>
>>>
>>>
>>
>