ZEMCH 2015 - International Conference Proceedings

for D2 contained a total of 9606 samples over the 56 days instead of the initial 80600 (see Figure 3 and Figure 4).

3.3.2 Chunk Data Approach (CDA): The initial results obtained when training the models on D2 with the TA approach were poor, with accuracies of only 40%-50%. Comparing our results with those reported in publications associated with the dataset, we noted that they achieved much higher accuracies even when using similar methodologies (e.g. HMM). This suggested that the real difference lay in the preprocessing of the data rather than in the choice of algorithm. Therefore, we also followed the preprocessing techniques they suggested. Under this approach, the data was processed in chunks instead of timeslices. Each chunk contained all the sensor events that occurred while an activity was active. The CDA also implied removing all the unlabelled data from D2, and the final number of samples was 600, which is the number of activity occurrences in the whole dataset.

Figure 3: Dataset 2 activity occurrence. Of the more than 80000 samples generated by the TA approach, more than 70000 (white area) are unlabelled or ‘idle’. The coloured portion was the only information processed for this dataset.

Figure 4: In this case, the unlabelled region corresponds to the red region, and is not as large as in D2 (between 7% and 12% of the total for any scenario). Therefore, for this dataset we can include the activity ‘idle’ to retain as much information as possible without incurring the classification failure that occurred with the other dataset.

ZEMCH 2015 | International Conference | Bari - Lecce, Italy
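
The chunking step described in Section 3.3.2 can be illustrated with a minimal sketch. It is not the implementation used in this work: the record types, the sensor and activity names, and the choice of per-sensor event counts as the chunk representation are all assumptions made for illustration. The sketch only shows the idea of grouping sensor events by the activity interval in which they occur and discarding events that fall in unlabelled periods.

```python
from collections import Counter
from typing import NamedTuple


class SensorEvent(NamedTuple):
    time: float   # hypothetical unit: seconds since start of recording
    sensor: str   # hypothetical sensor identifier


class Activity(NamedTuple):
    start: float
    end: float
    label: str


def build_chunks(events, activities):
    """Return one (label, sensor_counts) pair per activity occurrence.

    Events outside every labelled interval are discarded, which
    corresponds to removing the unlabelled ('idle') data.
    """
    chunks = []
    for act in activities:
        # Count the events of each sensor fired while the activity was active.
        counts = Counter(
            ev.sensor for ev in events if act.start <= ev.time <= act.end
        )
        chunks.append((act.label, counts))
    return chunks


# Toy data (hypothetical, not taken from the dataset).
events = [
    SensorEvent(5, "kitchen_door"),
    SensorEvent(12, "fridge"),
    SensorEvent(40, "hall_motion"),    # falls in an unlabelled period
    SensorEvent(65, "bathroom_door"),
]
activities = [
    Activity(0, 20, "prepare_breakfast"),
    Activity(60, 90, "use_toilet"),
]

chunks = build_chunks(events, activities)
```

With this representation the number of samples equals the number of activity occurrences, which is consistent with the reduction to 600 samples reported for D2.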
