ZEMCH 2015 - International Conference Proceedings | Página 250
for D2 contained a total of 9606 samples over the 56 days instead of the initial 80600 (see Figure
3 and Figure 4).
3.3.2 Chunk Data Approach (CDA):
The initial results obtained when training the models on D2 with the TA approach were poor,
with accuracies of only 40%-50%. Comparing our results with those reported in publications
associated with the dataset, we noted that they achieved much higher accuracies even when using
similar methodologies (e.g. HMM). This suggested that the real difference lay in the preprocessing
of the data rather than in the choice of algorithm. Therefore, we also followed the preprocessing
techniques they suggested. Under this approach, the data was processed in chunks instead of
timeslices. Each chunk contained all the sensor events that occurred while an activity was active.
The CDA also implied removing all the unlabelled data from D2, and the final number of samples
was 600, which is the total number of activity occurrences across the whole dataset.
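The chunking step described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the names `Chunk` and `chunk_events`, the tuple layouts for sensor events and activity annotations, and the toy sensor and activity labels are all assumptions made for illustration. The sketch groups sensor events by the labelled activity interval in which they occur and discards events that fall outside every labelled interval.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Chunk:
    """One sample in the chunk-based view: a single activity occurrence."""
    label: str
    start: int
    end: int
    sensors: List[str]


def chunk_events(
    events: List[Tuple[int, str]],
    activities: List[Tuple[int, int, str]],
) -> List[Chunk]:
    """Build one chunk per activity occurrence.

    events:     (timestamp, sensor_id) pairs (hypothetical layout).
    activities: (start, end, label) annotations (hypothetical layout).
    Events outside every annotated interval are unlabelled and are dropped.
    """
    chunks = []
    for start, end, label in activities:
        # Collect every sensor event fired while this activity was active.
        fired = [sensor for t, sensor in events if start <= t <= end]
        chunks.append(Chunk(label, start, end, fired))
    return chunks


# Toy data: timestamps are arbitrary integer time units.
events = [
    (5, "door"),
    (12, "fridge"),
    (15, "cupboard"),
    (40, "toilet_flush"),
    (70, "door"),
]
activities = [
    (10, 20, "prepare_breakfast"),
    (35, 45, "use_toilet"),
]

chunks = chunk_events(events, activities)
for chunk in chunks:
    print(chunk.label, chunk.sensors)
```

With this representation the number of samples equals the number of annotated activity occurrences, which is why the chunk-based view of a dataset is much smaller than its timeslice view.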
Figure 3: Dataset 2 activity occurrence. Of the more than 80000 samples generated by the TA approach,
more than 70000 (white area) are unlabelled or ‘idle’. Only the coloured portion was
processed for this dataset.
Figure 4: In this case, the unlabelled portion corresponds to the red region and is not as large as in D2
(between 7% and 12% of the total for any scenario). Therefore, for this dataset we can include the
activity ‘idle’ to retain as much information as possible without incurring the classification failure
that occurred with the other dataset.
ZEMCH 2015 | International Conference | Bari - Lecce, Italy