Data gather

This page documents what the data gathering stage of MAGEEC does. The purpose of this stage is to generate a number of binaries, each with a different set of features, and run them on the target hardware to collect energy data. A single metric will likely be presented to the machine learner; however, we also want to store all of the raw data, both for reference and so that it can be reused.

The framework is written in Python, which controls the GDB servers, GDB and the target platforms.

Sequence of events to gather one data point:

  1. Python initialises and gathers data about the target platform from the command line.
  2. Python calls the compiler with the plugin set to 'data gather' mode. The plugin then compiles with the specified options, adding the options used and the features found to a backend database, with a unique test ID generated from the resulting binary.
  3. The compiler returns the test ID to the Python script for future reference.
  4. Python connects to the GDB servers and runs the program, retrieving raw data about the energy taken to run the program.
  5. Python loads the plugin, allowing it to have access to the results interface. This allows it to submit raw data, and summary data (for the machine learner).
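
Steps 2 to 5 above can be sketched as a small driver. This is a minimal sketch under stated assumptions, not the MAGEEC implementation: the plugin file name `mageec.so`, the `mode=gather` plugin argument, and the helper names `build_compile_command`, `gather_data_point`, `compile_fn`, `measure_fn` and `submit_fn` are all hypothetical. Only the `-fplugin=` and `-fplugin-arg-<name>-<key>=<value>` forms are standard GCC options. The compile, measure and submit steps are passed in as callables so the flow can be exercised without a compiler or target hardware.

```python
from typing import Callable, List, Sequence, Tuple


def build_compile_command(source: str, output: str, flags: Sequence[str],
                          plugin: str = "mageec.so") -> List[str]:
    """Build a GCC command line that loads the plugin in an assumed gather mode."""
    return [
        "gcc",
        f"-fplugin={plugin}",               # standard GCC option to load a plugin
        "-fplugin-arg-mageec-mode=gather",  # hypothetical plugin argument
        *flags,                             # the options under test
        source,
        "-o", output,
    ]


def gather_data_point(
    source: str,
    output: str,
    flags: Sequence[str],
    compile_fn: Callable[[List[str]], str],
    measure_fn: Callable[[str], Tuple[List[float], bool]],
    submit_fn: Callable[[str, List[float], bool], None],
) -> str:
    """Gather one data point and return its test ID."""
    command = build_compile_command(source, output, flags)
    test_id = compile_fn(command)          # steps 2-3: compile, receive test ID
    raw_energy, good = measure_fn(output)  # step 4: run on target, read energy
    submit_fn(test_id, raw_energy, good)   # step 5: hand results to the plugin
    return test_id
```

In a real driver, `compile_fn` would invoke the compiler (for example via `subprocess.run`), `measure_fn` would talk to the GDB servers, and `submit_fn` would call into the wrapped plugin.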

The Python interface is a Boost.Python wrapper around the MAGEEC plugin, which gives a unified point of access to the results and the machine learner. Two functions need to be exposed for this:

   // test_id is the id returned by the compiler, generated from the binary
   // metrics/raw_metrics are the data collected.
   // good is whether the program executed correctly
   add_result(test_id, metrics..., good);
   add_raw_result(test_id, raw_metrics..., good);

Summary data is submitted separately from the raw data, so that the machine learner does not need to handle platform-specific details such as the number of runs required, the number of energy measurement points, and how these are transformed into a single metric.
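
To illustrate the split between raw and summary data, the sketch below reduces several runs to one number and submits both kinds of result. Everything here is an assumption made for illustration: `ResultsStub` is a stand-in for the wrapped plugin that only records calls, the argument layout of `add_result` and `add_raw_result` is simplified to a single metric value or list, and the reduction (mean of per-run totals) is just one plausible way to form a single metric, since the page does not specify one.

```python
from statistics import mean
from typing import List, Sequence, Tuple


class ResultsStub:
    """Stand-in for the wrapped plugin: records calls instead of using a database."""

    def __init__(self) -> None:
        self.summary: List[Tuple[str, float, bool]] = []
        self.raw: List[Tuple[str, List[float], bool]] = []

    def add_result(self, test_id: str, metric: float, good: bool) -> None:
        self.summary.append((test_id, metric, good))

    def add_raw_result(self, test_id: str, raw_metrics: Sequence[float],
                       good: bool) -> None:
        self.raw.append((test_id, list(raw_metrics), good))


def summarise(runs: Sequence[Sequence[float]]) -> float:
    """Assumed reduction: total the samples of each run, then average the totals."""
    return mean(sum(run) for run in runs)


def submit(results: ResultsStub, test_id: str,
           runs: Sequence[Sequence[float]], good: bool) -> None:
    """Store every run as raw data, then one summary value for the machine learner."""
    for run in runs:
        results.add_raw_result(test_id, run, good)
    results.add_result(test_id, summarise(runs), good)
```

Keeping the reduction in the Python layer means the platform-specific choices (how many runs, how many measurement points) never reach the machine learner, which only sees the value passed to `add_result`.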