Data collection methodology

As researchers, we value standardization, reproducibility and comparability. To increase reproducibility, we have to reduce the room for variance and potential errors in the collected data. To increase comparability, we have to support common data formats for storing our data.

Reduced overhead for integrating new Sensors

Common approaches for data collection in other frameworks and custom applications

When dealing with data collection from Smartphones or Wearables, many things can go wrong along the way. First, we have to understand how to access a certain sensor via the corresponding native APIs. Typically, we have to manage permissions associated with the sensors we want to use (e.g., Microphone access). Afterward, depending on the scenario, we have to serialize the collected data and upload it to a server. Additionally, we have to add measures to store data offline, to prevent loss of data when no network connection is available. All of this becomes even more complicated if it must also work while the app is minimized and running in the background. Increasing restrictions on background sensing imposed by the Android and iOS operating systems complicate this process further.

The minimal steps required for data collection from a Smartphone to a server are:

  1. Sensor access: Use the native API to retrieve data from the sensor.
  2. Data serialization: Serialize the data to a common format, e.g. XML, CSV, MP3 (for audio files) or PNG (for images).
  3. Data upload: Upload the data to the server, either using file uploads or REST-API calls.
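The three steps above can be sketched in a few lines of Python. This is only an illustration: `read_accelerometer` is a hypothetical stand-in for a native sensor API call, and `upload` is an assumed callback that a real application would replace with a file upload or REST call.

```python
import csv
import io
import time
from typing import Callable, List, Tuple

Sample = Tuple[float, float, float, float]  # (timestamp, x, y, z)


def read_accelerometer() -> Sample:
    """Step 1 (hypothetical): stands in for a call to the native sensor API."""
    return (time.time(), 0.0, 0.0, 9.81)


def serialize_csv(samples: List[Sample]) -> str:
    """Step 2: serialize the samples to CSV, one row per sample."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["timestamp", "x", "y", "z"])
    writer.writerows(samples)
    return buffer.getvalue()


def collect_and_upload(n_samples: int, upload: Callable[[str], None]) -> None:
    """Steps 1-3 chained together in one function, as in a naive implementation."""
    samples = [read_accelerometer() for _ in range(n_samples)]
    upload(serialize_csv(samples))
```

Note that every step is tied to this one sensor and this one format, so a second sensor would need its own copy of the whole chain.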

See the figure below:

Additional steps might involve:

  • Local data storage and synchronization: Store data offline (temporarily), when no connection to a server is available, and synchronize when a connection is reestablished.
  • Scheduling data collection: Query sensors at specific times of the day or at fixed sampling rates (e.g., 20 Hz).
  • Background data recording: Ensure uninterrupted sensor access, data recording, and uploading while the app is minimized. This involves measures to prevent the application from being suspended or terminated (e.g., via Services and/or WakeLocks) and to make sure that the application has the necessary permissions to query a certain sensor from the background. For example, on recent Android versions, it is only possible to record audio data in the background if the recording happens in a foreground service, which has to have been started while the app was in the foreground.
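As an illustration of local storage and synchronization, the sketch below keeps payloads in a buffer until an upload succeeds. The class name `OfflineBuffer` and the `upload` callback (assumed to return `True` on success) are hypothetical, and an in-memory queue stands in for on-device storage.

```python
from collections import deque
from typing import Callable, Deque


class OfflineBuffer:
    """Keeps payloads locally until they have been uploaded successfully."""

    def __init__(self, upload: Callable[[bytes], bool]) -> None:
        self._upload = upload  # assumed to return True on success, False otherwise
        self._pending: Deque[bytes] = deque()

    def submit(self, payload: bytes) -> None:
        """Store the payload first, then try to upload everything pending."""
        self._pending.append(payload)
        self.synchronize()

    def synchronize(self) -> int:
        """Upload pending payloads oldest first and stop at the first failure.

        Returns the number of payloads uploaded by this call.
        """
        uploaded = 0
        while self._pending:
            if not self._upload(self._pending[0]):
                break  # still offline: keep the payload for the next attempt
            self._pending.popleft()
            uploaded += 1
        return uploaded

    def pending_count(self) -> int:
        return len(self._pending)
```

A payload is removed only after its upload has been confirmed, so a failed attempt never loses data. A real application would persist the buffer to disk and call `synchronize` whenever connectivity returns.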

With naive approaches, all of these steps are done manually whenever a new sensor is integrated (see components with a dashed outline in the figure above). In many applications, sensor data collection and upload are implemented as one monolithic piece of software. This introduces room for error. Asynchronous data upload (which must not block the sensors while uploading) and synchronization are especially common pitfalls.
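One common way to keep uploads from blocking sensor access is to hand payloads to a background worker through a queue. The following is a minimal sketch of that pattern, not taken from any particular framework; `AsyncUploader` and the `upload` callback are hypothetical names.

```python
import queue
import threading
from typing import Callable, Optional


class AsyncUploader:
    """Uploads payloads on a worker thread so the caller never waits for the network."""

    def __init__(self, upload: Callable[[bytes], None]) -> None:
        self._upload = upload
        self._queue: "queue.Queue[Optional[bytes]]" = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def enqueue(self, payload: bytes) -> None:
        """Called from the sensor side; returns immediately."""
        self._queue.put(payload)

    def _run(self) -> None:
        while True:
            payload = self._queue.get()
            if payload is None:  # sentinel value: stop the worker
                break
            self._upload(payload)

    def close(self) -> None:
        """Upload everything that is already queued, then stop the worker."""
        self._queue.put(None)
        self._worker.join()
```

The sensor loop only calls `enqueue`, so a slow or failing upload delays the worker thread but not the sampling.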

CLAID's approach to data collection

CLAID, on the other hand, employs a reflection system that allows arbitrary data types to be serialized to common formats automatically. Data can be passed between Modules via Channels, and data from Channels can be serialized to files directly. We provide a DataSaverModule, which uses this feature to store recordings in the file system. Additionally, we provide Modules for data upload and file synchronization. See how data collection works with CLAID below:

Adding support for a new Sensor only requires implementing the corresponding Module that realizes the sensor access (retrieving data from the sensor via the native API). This data can then be used by other Modules (e.g., to store or upload it) asynchronously.
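To give a feel for what reflection-based serialization makes possible, here is a small Python sketch that uses `dataclasses` introspection. It is not CLAID's API or implementation; `to_plain`, `serialize_json` and the two sample types are hypothetical and only illustrate the idea that a new data type needs no hand-written serialization code.

```python
import json
from dataclasses import dataclass, fields, is_dataclass
from typing import Any


def to_plain(value: Any) -> Any:
    """Recursively convert dataclass instances to dicts by inspecting their fields."""
    if is_dataclass(value) and not isinstance(value, type):
        return {f.name: to_plain(getattr(value, f.name)) for f in fields(value)}
    if isinstance(value, (list, tuple)):
        return [to_plain(item) for item in value]
    return value


def serialize_json(value: Any) -> str:
    """One generic serializer that works for every dataclass-based data type."""
    return json.dumps(to_plain(value), sort_keys=True)


@dataclass
class AccelerometerSample:
    timestamp: float
    x: float
    y: float
    z: float


@dataclass
class HeartRateSample:
    # A newly added sensor type: no serialization code has to be written for it.
    timestamp: float
    bpm: int
```

For example, `serialize_json(HeartRateSample(1.0, 60))` produces a JSON object with the fields `bpm` and `timestamp`, even though the serializer knows nothing about heart rate data.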

TL;DR: Integrating a new Sensor with CLAID mostly only requires implementing the Sensor Module that realizes access to the Sensor, while serialization and data upload are handled asynchronously by other Modules.

Increased flexibility

As reasoned above, researchers and developers often focus on achieving reliable sensor data collection. Making the implemented software components (such as sensor access) reusable in different contexts is often not taken into account. Since CLAID is based on Modules, it is possible to connect individual Modules as required and build arbitrarily complex applications. Consider the following figure:

As you can see, in comparison to many other approaches, CLAID makes it possible to use the output of any Module in any other Module. For example, Sensor data can not only be stored and uploaded to a server, but can also be used directly by Machine Learning and Visualization Modules.
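The idea of several consumers sharing one Module's output can be sketched with a simple publish/subscribe channel. The `Channel` and `RunningMean` classes below are hypothetical and do not reproduce CLAID's actual interfaces. For brevity, delivery here is synchronous, whereas the text above describes asynchronous handling.

```python
from typing import Any, Callable, List


class Channel:
    """Hypothetical named channel that forwards each posted value to every subscriber."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._subscribers: List[Callable[[Any], None]] = []

    def subscribe(self, callback: Callable[[Any], None]) -> None:
        self._subscribers.append(callback)

    def post(self, value: Any) -> None:
        for callback in self._subscribers:
            callback(value)


class RunningMean:
    """Stand-in for an analysis Module that consumes the same data as the storage side."""

    def __init__(self) -> None:
        self.count = 0
        self.total = 0.0

    def on_sample(self, value: float) -> None:
        self.count += 1
        self.total += value

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0
```

A usage example: subscribing both a list's `append` (acting as storage) and `RunningMean.on_sample` to the same channel lets each posted sample be stored and analysed, without the publishing side knowing about either consumer.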