Latest revision as of 14:44, 3 September 2013
Deliverable 5.2: Selection of Core Machine Learning Algorithms
Status: Ongoing, options identified
Through experimentation, primarily with the WEKA framework, and discussions with the MILEPOST team, we have identified the following candidate core machine learning algorithms:
- Decision tree (J48)
- k-nearest neighbours (KNN)
- Support vector machine (SVM)
- PART
- RIDOR
- Association rules
- Neural networks
Of these, the following have been discarded as unsuitable:
- RIDOR - a ripple-down rule learner; J48 outperforms it
- PART - a rule learner built from partial decision trees; J48 also outperforms it
- Association rules - likely to over-fit our data
- Neural networks
So far, evaluation has used the benchmarks available in the BEEBS framework as a sample. These will later be extended to form the training set.
For validation, the following methods are good candidates:
- 10-fold cross-validation
- Leave-one-out validation (LOOV)
Of these, the following has been discarded as unsuitable:
- Leave-one-out validation
Separately, SVM is likely to take too long to train on our data, which counts against it in the algorithm choice below.
Previous research indicates that LOOV is less suitable than 10-fold cross-validation. The 10-fold method is also faster here, because it trains 10 models, whereas LOOV trains one model per training instance. Both give approximately unbiased error estimates, but 10-fold cross-validation has slightly lower variance, which is preferable (Efron, 1983).
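The cost difference can be made concrete with a short sketch. The following Python code implements k-fold cross-validation; leave-one-out is simply the case where k equals the number of instances n. It is a minimal sketch, not WEKA's implementation, and it rests on these assumptions:
- A plain k-nearest-neighbour classifier stands in for our real models.
- A hypothetical two-cluster toy dataset stands in for BEEBS-derived data.
- The folds are shuffled but not stratified.

The number of held-out rounds is k for 10-fold but n for LOOV. For eager learners such as J48 or SVM, that is also the number of models trained.

```python
import math
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds; k == n gives leave-one-out."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def knn_predict(train, query, k=3):
    """Majority label among the k training points closest to `query` (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_validate(data, folds, k_neighbours=3):
    """Hold out each fold once, predict it from the remaining data, and average fold accuracies."""
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [row for i, row in enumerate(data) if i not in held_out]
        correct = sum(knn_predict(train, data[i][0], k_neighbours) == data[i][1] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Hypothetical toy data: two well-separated clusters of 2-D feature vectors (24 instances).
data = [((i * 0.1, 0.0), "low") for i in range(12)] + [((10 + i * 0.1, 10.0), "high") for i in range(12)]

ten_fold = k_fold_indices(len(data), 10)
leave_one_out = k_fold_indices(len(data), len(data))
print(len(ten_fold), cross_validate(data, ten_fold))            # prints: 10 1.0
print(len(leave_one_out), cross_validate(data, leave_one_out))  # prints: 24 1.0
```

On this easy toy data both methods report the same accuracy. The difference is that LOOV needs 24 rounds instead of 10, and that gap grows with the size of the training set.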
Work on combining algorithms is under way, and so far the J48+KNN combination performs well.
This leaves a choice between J48, KNN and SVM, to be made by the end of September. The choice will take into account the following:
- what our training data will be
- how long each algorithm takes to train
- practicality, performance and reliability

Research currently favours J48 because of its simplicity, the ease of translating a trained tree into C code, and its very fast model creation and evaluation of test data.
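The point about translating J48 into C can be illustrated with a minimal Python sketch. A learned tree whose numeric splits take the form "feature <= threshold" (as C4.5/J48 splits on numeric attributes) maps directly onto nested C if/else statements. The assumptions here are:
- The dictionary tree format, the feature indices, the thresholds, the class ids and the `classify` function name are all hypothetical.
- None of this is WEKA's output format.
- Nominal (multi-way) splits are not handled.

```python
def tree_to_c(node, indent="    "):
    """Recursively emit nested C if/else statements for a binary decision tree."""
    if "label" in node:  # leaf: return the predicted class id
        return f"{indent}return {node['label']};\n"
    inner = indent + "    "
    return (
        f"{indent}if (features[{node['feature']}] <= {node['threshold']}) {{\n"
        + tree_to_c(node["left"], inner)
        + f"{indent}}} else {{\n"
        + tree_to_c(node["right"], inner)
        + f"{indent}}}\n"
    )


def emit_classifier(tree, name="classify"):
    """Wrap the generated body in a C function taking a feature array and returning a class id."""
    return f"int {name}(const double *features)\n{{\n" + tree_to_c(tree) + "}\n"


# Hypothetical tree: split on feature 0, then on feature 3 in the right branch.
tree = {
    "feature": 0, "threshold": 12.5,
    "left": {"label": 1},
    "right": {"feature": 3, "threshold": 0.25, "left": {"label": 0}, "right": {"label": 2}},
}

print(emit_classifier(tree))
```

Each root-to-leaf path becomes a short chain of comparisons. As a result, the generated function needs no runtime library, which is consistent with the simplicity argument above.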