Data collection on Mobile Devices

This tutorial is available on GitHub

In CLAID, Modules that provide access to sensors are termed Sensor Modules or Collectors. They post data to a Channel at fixed intervals, at specific times, or in response to certain events. Data posted to Channels can be used by other Modules and can be serialized to multiple formats, such as CSV, XML or binary, using the DataSaverModule. Special formats, e.g., MP3 for audio, are available on a per-type basis. Channels also work across the network boundary, between connected instances of CLAID. Hence, Modules can also use Channels to synchronize data across one or multiple devices or servers, as done by our DataSyncModule. You can specify which Modules to load and how to connect them using configuration files. A simple data collection pipeline with CLAID looks as follows:

Note that this is just the most basic configuration. Multiple Modules can use the data produced by a Collector. For example, you could let a machine learning Module analyze incoming data while a DataSaverModule is recording it. To learn more about how CLAID tackles data collection and how it differs from other existing frameworks, refer to the page describing our data-collection-methodology. This tutorial covers Collectors and the DataSaverModule. The DataUploadModule will be part of the next tutorial.

In summary, using CLAID for data collection involves the following steps:

  1. Create a configuration file and specify one or multiple Collectors to load, as well as their properties such as sampling rate and output channel(s).
  2. Specify a DataSaverModule and its properties, such as storage format and path.
  3. Run the configuration.

We will demonstrate this with three examples of data collection with CLAID in the following sections. They should work the same way across Android, iOS and WearOS. Please note that data collection according to this tutorial will only work as long as the app is active (in the foreground) and the device is unlocked. We will cover continuous data collection in the background in subsequent tutorials.

A note on existing Collectors

Please note that while we provide packages containing Collectors for common sensors on Android, iOS and WearOS, it is not our goal to support all possible sensors out of the box. Instead, CLAID aims to streamline and standardize the process of data collection, making it easier to build complex data collection applications with various and complex sensors. Please read our philosophy on data collection in the introduction to this tutorial and on the separate page regarding our data-collection-methodology.

Installing the required packages

With CLAID, we provide existing packages for data collection. For data collection on mobile phones, we require the following packages:

  • DataCollection: This package, among others, contains the DataSaverModule, which allows recording data to files.
  • MobileDataCollectors: This package provides Sensor Modules that allow collecting data from common internal smartphone sensors such as the accelerometer, microphone or location.
  • MobileDataTypes: This package provides common data types that are required by the mobile data collectors. These datatypes include, among others, AccelerometerData, AudioData and LocationData.
  • Permissions: This package is required for Modules to manage permissions under Android and iOS.

You can learn more about the functionality and Modules they provide in our package overview. To install these packages, use the following:

On Linux and macOS:

claid install DataCollection MobileDataCollectors MobileDataTypes Permissions

On Windows (do not use PowerShell!):

%claid% install DataCollection MobileDataCollectors MobileDataTypes Permissions

Make sure to include these packages in your application. As discussed in the previous tutorial, we can do so in the CLAIDPackages.cmake file:

Including the packages in CLAIDPackages.cmake
CLAID_Include(ModuleAPI)
CLAID_Include(JavaCLAID)
CLAID_Include(DataCollection)
CLAID_Include(MobileDataTypes)
CLAID_Include(MobileDataCollectors)
CLAID_Include(Permissions)

Example 1: Recording the Battery Level to XML files

For starters, we will discuss a very simple example as a warm-up. The goal is to record the battery level of the device periodically using CLAID. A corresponding BatteryCollector is available via the MobileDataCollectors package.

To use the collector, we first have to create a configuration file for CLAID. On Android and WearOS, this file can either be part of the assets folder of the APK or be placed anywhere on the sdcard. On iOS, it is possible to make this file part of the bundle or to place it in the application's shared storage. You can learn more about this and the different ways to load a configuration with CLAID in Tutorial Series 03. Since we used the claid manager to set up a project in the previous tutorial, a new empty config was already created for us (in the assets folder for Android, in the bundle for iOS). You can see it on the screenshot to the right.

Here, we can now include the Modules we want to use. For this example, we will require two Modules:

  • BatteryCollector: Module reading out the battery level and charging state and posting the data to a channel.
  • DataSaverModule: Module saving all data arriving on the Channel that the BatteryCollector posts to, into one or multiple files on the device.

To use those Modules with CLAID, we specify them in a configuration file and assign values to their required properties. Many Collectors offer multiple options or "modes" to collect data: periodically, at certain times, or based on certain external events. In the latter case, other Modules can typically request data from a Collector. Different modes can be specified in the configuration as well. Note that CLAID's Collectors are designed to be extensible. If you feel a data collection mode is missing, you can add it to the code of the Collector.

For this example, we use the configuration file shown below. This configuration records the battery level and charging state at a frequency of 2Hz to an XML file. With the file name format used here, the recorded battery levels will be written automatically to one XML file for each day. Check out the step-by-step explanations below for more details.

XML configuration
<JavaModule class="BatteryCollector.BatteryCollector">
    <PeriodicMonitoring>
        <rate>2Hz</rate> <!-- You can also specify seconds (s), milliseconds (ms) etc. -->
    </PeriodicMonitoring>
    <outputChannel>BatteryData</outputChannel>
</JavaModule>

<Module class="claid::DataSaverModule">
    <save>
        <what>BatteryData</what>
        <storagePath>/sdcard/CLAIDTutorial/BatteryData</storagePath>
        <fileNameFormat>Data_%d-%m-%y.xml</fileNameFormat>
        <serializer class="claid::XMLSerializer"></serializer>
    </save>
</Module>
Step-by-step explanation of the configuration file

<JavaModule class="BatteryCollector.BatteryCollector">
Specifies that we want to load a Module at runtime. The JavaModule tag indicates that this Module is implemented in Java. On Android, Modules that have to use native Android APIs are implemented in Java, while common Modules can be written in C++. Using the class attribute, we specify the name of the Module to load. This name is the fully qualified Java class name, i.e., "Package"."Class". In this case, we want to load the BatteryCollector Module, which is part of the Java package AndroidCollectors. Note that AndroidCollectors is not the name of the CLAID package. It is simply the Java package declaration as done in the code of the BatteryCollector.

<PeriodicMonitoring>
    <rate>2Hz</rate>
</PeriodicMonitoring>
This specifies that we want to sample the battery data periodically at 2Hz. You can also specify a period instead of a frequency, for example "1s" or "100ms". The BatteryCollector also supports other modes.
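To make the relation between frequency and period concrete, here is a minimal Python sketch. The helper period_seconds is hypothetical and written only for illustration; it is not CLAID's parser, and it assumes that only the units Hz, s and ms mentioned above are used.

```python
import re

# Hypothetical helper for illustration only; this is NOT part of CLAID.
_UNIT_TO_SECONDS = {"s": 1.0, "ms": 1e-3}

def period_seconds(spec: str) -> float:
    """Return the sampling period in seconds for a spec like '2Hz', '1s' or '100ms'."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(Hz|ms|s)\s*", spec)
    if match is None:
        raise ValueError(f"unsupported rate specification: {spec!r}")
    value, unit = float(match.group(1)), match.group(2)
    if value <= 0:
        raise ValueError("rate must be positive")
    if unit == "Hz":
        return 1.0 / value  # frequency -> period
    return value * _UNIT_TO_SECONDS[unit]

if __name__ == "__main__":
    for spec in ("2Hz", "1s", "100ms", "50Hz"):
        print(f"{spec:>6} -> one sample every {period_seconds(spec):.3f} s")
```

A rate of 2Hz therefore corresponds to a period of 0.5s, so both ways of writing the value describe the same sampling schedule.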

<outputChannel>BatteryData</outputChannel>
Name of the Channel to which the BatteryCollector will post collected data.

<Module class="claid::DataSaverModule">
Indicates that we want to load a Module at runtime. The Module tag is used for Modules written in C++. Using the class attribute, we specify the name of the Module to load, in this case the "DataSaverModule". The naming convention for C++ Modules is "namespace::Class". We want to use the DataSaverModule to store the retrieved battery information.

<save>
The save tag is used to describe what data to store (from which Channel), where, and in which format. You can specify multiple descriptions for the same DataSaverModule, so that it can store data from multiple Channels (or from the same Channel in multiple formats or at multiple destinations).

<what>BatteryData</what>
Indicates the Channel that we want to store data from. In this case, the "BatteryData" Channel as specified for the BatteryCollector Module above.

<storagePath>/sdcard/CLAIDTutorial/BatteryData</storagePath>
Root storage folder for the collected data. On Android, this will store incoming data in the local storage of the phone. Note that under Android /sdcard/ refers to the INTERNAL storage (not an SD card!), while ext_sdcard/ refers to an inserted SD card.

<fileNameFormat>Data_%d-%m-%y.xml</fileNameFormat>
Format describing how to name recorded files. You can use time format identifiers to automatically include the current date and time in the file names (e.g., %y.%m.%d will be year.month.day of when the data was recorded).

Note: You can also use this to create multiple files automatically. For example, if you want to store battery data in separate files on a per-minute basis, you can use: BatteryData_Minute_%M.xml The DataSaverModule will then automatically create a new file every minute and store all data arriving within that minute in the corresponding file.

You can also use this to specify subfolders. For example, to store data in a separate subfolder (under storagePath) for every hour, with one file per minute, use: Data-%H/%M.xml Every hour, a new folder will be created under /sdcard/CLAIDTutorial/BatteryData/, and each subfolder will contain 60 files, one for each minute.

You should prefer few larger files, preferably distributed across different folders, over many small files in a single folder. If there are too many files in one folder, the file system slows down significantly, and at some point you will no longer be able to collect data because storing each file takes too long.
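The following Python sketch shows how such a file name format maps sample timestamps to files. It assumes that the time format identifiers behave like the ones of the C strftime function, which Python's datetime.strftime also uses. The function target_file is hypothetical and only mimics the naming logic described above; it is not the DataSaverModule's implementation.

```python
from datetime import datetime
import posixpath

STORAGE_PATH = "/sdcard/CLAIDTutorial/BatteryData"  # storagePath from the example

def target_file(file_name_format: str, timestamp: datetime) -> str:
    """Expand the time identifiers and prepend the storage path (illustrative only)."""
    return posixpath.join(STORAGE_PATH, timestamp.strftime(file_name_format))

if __name__ == "__main__":
    samples = [
        datetime(2024, 3, 5, 14, 7, 30),
        datetime(2024, 3, 5, 14, 8, 2),
        datetime(2024, 3, 5, 15, 0, 0),
    ]
    for fmt in ("Data_%d-%m-%y.xml", "BatteryData_Minute_%M.xml", "Data-%H/%M.xml"):
        print(fmt)
        for ts in samples:
            print("   ", ts.isoformat(), "->", target_file(fmt, ts))
```

With Data_%d-%m-%y.xml all three samples end up in the same daily file, while Data-%H/%M.xml places them in three different files spread over two hourly folders. Note that a format containing only %M yields the same name again one hour later, so including the hour or the date keeps file names unique.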

<serializer class="claid::XMLSerializer"></serializer>
Serializer used to convert the data to a certain format. CLAID's reflection system allows arbitrary data types to be serialized automatically to different formats such as XML, CSV or binary.

The properties of the two Modules are described further in the following (you can also check out the Step-by-step explanation of the configuration file above, for more details and examples!):

Description of properties for the Modules
  • PeriodicMonitoring: Specifies a time interval either as a period (e.g., 0.1s) or as a frequency (e.g., 2Hz).
  • outputChannel: The Channel to which collected data is posted.
  • save: List that can be used to specify what data (from which Channels) to save and where.
    • what: Name of the Channel whose incoming data shall be saved.
    • storagePath: Path to a folder where data shall be stored.
    • tmpStoragePath: Path to a folder where temporary files can be stored. Whenever a new file is started by the DataSaverModule, it copies all files from this folder to the storagePath folder.
    • fileNameFormat: Format describing how to name recorded files. You can use time format identifiers to automatically include the current date and time in the file names (e.g., %y.%m.%d will be year.month.day of when the data was recorded). If, according to this format, two subsequent samples belong to different files, the DataSaverModule will automatically create new files accordingly. You can also specify subfolders this way: %y.%m.%d/%M-%S.xml would store data in a folder year.month.day and create a file for every minute-second combination.
    • serializer: Specifies which serializer shall be used to serialize incoming data. Common options are XML, CSV or binary.
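The file rollover described for fileNameFormat can be sketched as follows. This is a simplified illustration under the assumption that a new file is started whenever the expanded name of a sample differs from the name of the previous sample; split_into_files is a hypothetical function, not CLAID code.

```python
from datetime import datetime, timedelta

def split_into_files(timestamps, file_name_format):
    """Group consecutive samples by the file name their timestamp expands to."""
    files = []           # [file_name, sample_count] in order of creation
    current_name = None
    for ts in timestamps:
        name = ts.strftime(file_name_format)
        if name != current_name:   # format expands differently -> start a new file
            files.append([name, 0])
            current_name = name
        files[-1][1] += 1
    return [(name, count) for name, count in files]

if __name__ == "__main__":
    # Eight samples at 2Hz, starting two seconds before midnight.
    start = datetime(2024, 3, 5, 23, 59, 58)
    timestamps = [start + timedelta(seconds=0.5 * i) for i in range(8)]
    for name, count in split_into_files(timestamps, "%y.%m.%d/%M-%S.xml"):
        print(f"{name}: {count} samples")
```

At 2Hz, the format %y.%m.%d/%M-%S.xml produces one file for every two samples and switches to a new folder at midnight, which shows how quickly a fine-grained format creates many small files.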

You can now rebuild and run the application in Android Studio or Xcode. Using a file explorer on your PC, you should be able to see new files being created on the device under the storage path we specified in the configuration.

I do not see any files

In case you cannot see the files in the folder that we specified under storagePath in the configuration above, there might be a problem with storage access permissions. Please check whether all permissions have been granted to the application. You can typically do so by going to the settings page of your device and finding our CLAID application (AndroidCLAID, iOSCLAID or WearOSCLAID) there.
Since Android 11, the OS has become increasingly strict in isolating the data of individual apps from one another. In order to place files in shared storage spaces such as /sdcard/, the application now needs to act as a "storage manager", meaning it has to request advanced permissions separately. CLAID should handle this automatically by forwarding you to the corresponding settings page. In case it didn't work, you can learn more about this here.

Note that under Android /sdcard/ refers to the INTERNAL storage (not an SD card!), while ext_sdcard/ refers to an inserted SD card.

Example 2: Recording Accelerometer Data to CSV files

For the second example, we will use something a little more advanced than just collecting the battery level. Here, our goal is to measure acceleration using the accelerometer of the device. The sampling frequency of a smartphone accelerometer typically ranges up to roughly 300Hz. Therefore, the accelerometer allows us to test data recording at higher sampling rates. An AccelerometerCollector Module is also available with CLAID via the MobileDataCollectors package.

Consider the configuration below. With this configuration, the AccelerometerCollector will output samples at 50Hz, which are saved to a CSV file by the DataSaverModule.

XML configuration
<JavaModule class="AccelerometerCollector.AccelerometerCollector">
    <Periodic>
        <rate>50Hz</rate>
    </Periodic>
    <outputChannel>AccelerometerData</outputChannel>
</JavaModule>

<Module class="claid::DataSaverModule">
    <save>
        <what>AccelerometerData</what>
        <storagePath>/sdcard/CLAIDTutorial/AccelerometerData</storagePath>
        <fileNameFormat>Data_%d-%m-%y.csv</fileNameFormat>
        <serializer class="claid::CSVSerializer"></serializer>
    </save>
</Module>
Step-by-step explanation of the configuration file

<JavaModule class="AccelerometerCollector.AccelerometerCollector">
Specifies that we want to load a Module at runtime. The JavaModule tag indicates that this Module is implemented in Java. On Android, Modules that have to use native Android APIs are implemented in Java, while common Modules can be written in C++. Using the class attribute, we specify the name of the Module to load. This name is the fully qualified Java class name, i.e., "Package"."Class". In this case, we want to load the AccelerometerCollector Module, which is part of the Java package AndroidCollectors. Note that AndroidCollectors is not the name of the CLAID package. It is simply the Java package declaration as done in the code of the AccelerometerCollector.

<Periodic>
    <rate>50Hz</rate>
</Periodic>
This specifies that we want to sample accelerometer measurements periodically at 50Hz. You can also specify 0.02s as a period instead of using a frequency. The AccelerometerCollector also supports other modes.

<outputChannel>AccelerometerData</outputChannel>
Name of the Channel to which the AccelerometerCollector will post collected data.

<Module class="claid::DataSaverModule">
Indicates that we want to load a Module at runtime. The Module tag is used for Modules written in C++. Using the class attribute, we specify the name of the Module to load, in this case the "DataSaverModule". The naming convention for C++ Modules is "namespace::Class". We want to use the DataSaverModule to store the recorded accelerometer data.

<save>
The save tag is used to describe what data to store (from which Channel), where, and in which format. You can specify multiple descriptions for the same DataSaverModule, so that it can store data from multiple Channels (or from the same Channel in multiple formats or at multiple destinations).

<what>AccelerometerData</what>
Indicates the Channel that we want to store data from. In this case, the "AccelerometerData" Channel as specified for the AccelerometerCollector Module above.

<storagePath>/sdcard/CLAIDTutorial/AccelerometerData</storagePath>
Root storage folder for the collected data. On Android, this will store incoming data in the local storage of the phone. Note that under Android /sdcard/ refers to the INTERNAL storage (not an SD card!), while ext_sdcard/ refers to an inserted SD card.

<fileNameFormat>Data_%d-%m-%y.csv</fileNameFormat>
Format describing how to name recorded files. You can use time format identifiers to automatically include the current date and time in the file names (e.g., %y.%m.%d will be year.month.day of when the data was recorded). You can also use this to create multiple files automatically. For example, if you want to store accelerometer data in separate files on a per-minute basis, you can use: AccelerometerData_Minute_%M.csv The DataSaverModule will then automatically create a new file every minute and store all data arriving within that minute in the corresponding file. You can also use this to specify subfolders. For example, to store data in a separate subfolder (under storagePath) for every hour, with one file per minute, use: Data-%H/%M.csv Every hour, a new folder will be created under /sdcard/CLAIDTutorial/AccelerometerData/, and each subfolder will contain 60 files, one for each minute. Note: You should prefer few larger files, preferably distributed across different folders, over many small files in a single folder. If there are too many files in one folder, the file system slows down significantly, and at some point you will no longer be able to collect data because storing each file takes too long.

<serializer class="claid::CSVSerializer"></serializer>
Serializer used to convert the data to a certain format. CLAID's reflection system allows arbitrary data types to be serialized automatically to different formats such as XML, CSV or binary.
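To get a feeling for how much data a 50Hz Collector produces, the following Python sketch counts the samples that end up in one file for different file granularities. It assumes that every sample becomes one entry in the file; the actual size in bytes depends on the serializer and is not estimated here.

```python
def samples_per_file(rate_hz: float, seconds_per_file: int) -> int:
    """Number of samples written to one file at a constant sampling rate."""
    return int(rate_hz * seconds_per_file)

RATE_HZ = 50  # rate from the accelerometer example

if __name__ == "__main__":
    granularities = (
        ("one file per day, e.g. Data_%d-%m-%y.csv", 24 * 60 * 60),
        ("one file per hour", 60 * 60),
        ("one file per minute", 60),
    )
    for label, seconds in granularities:
        print(f"{label}: {samples_per_file(RATE_HZ, seconds):,} samples")
```

A daily file at 50Hz holds more than four million samples, so for high-rate sensors a per-hour or per-minute format (ideally distributed over subfolders) may be easier to handle.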

Example 3: Recording Audio Data to MP3 files

In example 3, we cover recording audio files via the Microphone. The provided MicrophoneCollector supports different recording modes. For example, you can record periodically in chunks (for example every 6 seconds) or at certain times of the day. In this tutorial, we will record audio data continuously, in chunks of 60 seconds. This means that every 60 seconds, one file is created. We will store the audio data in MP3 format. Different audio file formats are available. Check the configuration file below.

XML configuration
<JavaModule class="AndroidCollectors.MicrophoneCollector">
    <samplingRate>44100</samplingRate>
    <channels>MONO</channels>

    <ContinuousChunkRecording>
        <length>60s</length>
    </ContinuousChunkRecording>
    <outputChannel>AudioData</outputChannel>
</JavaModule>

<Module class="claid::DataSaverModule">
    <save>
        <what>AudioData</what>
        <storagePath>/sdcard/CLAIDTutorial/AudioData</storagePath>
        <fileNameFormat>Data_%d-%m-%y_%M.mp3</fileNameFormat>
        <serializer class="claid::MP3Serializer"></serializer>
    </save>
</Module>
Step-by-step explanation of the configuration file

<JavaModule class="AndroidCollectors.MicrophoneCollector">
Specifies that we want to load a Module at runtime. The JavaModule tag indicates that this Module is implemented in Java. On Android, Modules that have to use native Android APIs are implemented in Java, while common Modules can be written in C++. Using the class attribute, we specify the name of the Module to load. This name is the fully qualified Java class name, i.e., "Package"."Class". In this case, we want to load the MicrophoneCollector Module, which is part of the Java package AndroidCollectors. Note that AndroidCollectors is not the name of the CLAID package. It is simply the Java package declaration as done in the code of the MicrophoneCollector.

<ContinuousChunkRecording>
    <length>60s</length>
</ContinuousChunkRecording>
Recording audio data can be done in multiple ways with the MicrophoneCollector. When we record audio continuously from the microphone, we typically do not want one large audio file in the end, but are interested in smaller "chunks". Using "ContinuousChunkRecording" as the recording mode, the MicrophoneCollector will continuously record data in chunks of equal length (here: 60 seconds), which will be posted to a Channel. Other modes of the MicrophoneCollector allow recordings to be started or stopped at certain times or by external events.
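As a rough orientation, the Python sketch below computes how many audio frames one 60-second chunk contains with the settings of this example. The 16-bit sample width is an assumption made for illustration (the configuration does not state it), and the result refers to raw, uncompressed audio before MP3 encoding.

```python
SAMPLING_RATE_HZ = 44100   # <samplingRate>44100</samplingRate>
CHANNELS = 1               # <channels>MONO</channels>
CHUNK_SECONDS = 60         # <length>60s</length>
BYTES_PER_SAMPLE = 2       # ASSUMPTION: 16-bit PCM, not stated in the tutorial

frames_per_chunk = SAMPLING_RATE_HZ * CHUNK_SECONDS
raw_bytes_per_chunk = frames_per_chunk * CHANNELS * BYTES_PER_SAMPLE
chunks_per_day = (24 * 60 * 60) // CHUNK_SECONDS

if __name__ == "__main__":
    print(f"frames per chunk:    {frames_per_chunk:,}")
    print(f"raw bytes per chunk: {raw_bytes_per_chunk:,}")
    print(f"chunks per day:      {chunks_per_day:,}")
```

Continuous recording in 60-second chunks thus yields 1440 chunks (and, with the file name format of this example, up to 1440 files) per day.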

<outputChannel>AudioData</outputChannel>
Name of the Channel to which the MicrophoneCollector will post recorded data.

<Module class="claid::DataSaverModule">
Indicates that we want to load a Module at runtime. The Module tag is used for Modules written in C++. Using the class attribute, we specify the name of the Module to load, in this case the "DataSaverModule". The naming convention for C++ Modules is "namespace::Class". We want to use the DataSaverModule to store the recorded audio data.

<save>
The save tag is used to describe what data to store (from which Channel), where, and in which format. You can specify multiple descriptions for the same DataSaverModule, so that it can store data from multiple Channels (or from the same Channel in multiple formats or at multiple destinations).

<what>AudioData</what>
Indicates the Channel that we want to store data from. In this case, the "AudioData" Channel as specified for the MicrophoneCollector Module above.

<storagePath>/sdcard/CLAIDTutorial/AudioData</storagePath>
Root storage folder for the collected data. On Android, this will store incoming data in the local storage of the phone. Note that under Android /sdcard/ refers to the INTERNAL storage (not an SD card!), while ext_sdcard/ refers to an inserted SD card.

<fileNameFormat>Data_%d-%m-%y_%M.mp3</fileNameFormat>
Format describing how to name recorded files. You can use time format identifiers to automatically include the current date and time in the file names (e.g., %y.%m.%d will be year.month.day of when the data was recorded). You can also use this to create multiple files automatically. For example, if you want to store audio data in separate files on a per-minute basis, you can use: MicrophoneData_Minute_%M.mp3 The DataSaverModule will then automatically create a new file every minute and store all data arriving within that minute in the corresponding file. You can also use this to specify subfolders. For example, to store data in a separate subfolder (under storagePath) for every hour, with one file per minute, use: Data-%H/%M.mp3 Every hour, a new folder will be created under /sdcard/CLAIDTutorial/AudioData/, and each subfolder will contain 60 files, one for each minute. Note: You should prefer few larger files, preferably distributed across different folders, over many small files in a single folder. If there are too many files in one folder, the file system slows down significantly, and at some point you will no longer be able to collect data because storing each file takes too long.

<serializer class="claid::MP3Serializer"></serializer>
Serializer used to convert the data to a certain format. CLAID's reflection system allows arbitrary data types to be serialized automatically to different formats such as XML, CSV or binary. Keep in mind that specifying MP3 for data other than audio data is invalid.

Bonus: Using all sensors simultaneously

To use the sensors from the previous examples simultaneously, you can combine them into one configuration file. The important step is to also combine the save instructions for the DataSaverModule. You can either combine the save tags of the individual examples in one DataSaverModule, or instantiate multiple DataSaverModules. The latter is preferable if you expect the individual Collectors to produce a lot of data. The DataSaverModule saves all data in its own thread. If you have too many Collectors producing a lot of data, one DataSaverModule might not be fast enough. To be sure, you can use one DataSaverModule per Collector, but this increases CPU load. Make sure you have installed the required packages discussed in the individual examples.

Combined XML configuration
<JavaModule class="BatteryCollector.BatteryCollector">
    <Periodic>
        <rate>2Hz</rate>
    </Periodic>
    <outputChannel>BatteryData</outputChannel>
</JavaModule>

<JavaModule class="AndroidCollectors.AccelerometerCollector">
    <Periodic>
        <rate>50Hz</rate>
    </Periodic>
    <outputChannel>AccelerometerData</outputChannel>
</JavaModule>

<JavaModule class="AndroidCollectors.MicrophoneCollector">
    <samplingRate>44100</samplingRate>
    <channels>MONO</channels>

    <ContinuousChunkRecording>
        <length>60s</length>
    </ContinuousChunkRecording>
    <outputChannel>AudioData</outputChannel>
</JavaModule>

<Module class="claid::DataSaverModule">
    <save>
        <what>BatteryData</what>
        <storagePath>/sdcard/CLAIDTutorial/BatteryData</storagePath>
        <fileNameFormat>Data_%d-%m-%y.xml</fileNameFormat>
        <serializer class="claid::XMLSerializer"></serializer>
    </save>

    <save>
        <what>AccelerometerData</what>
        <storagePath>/sdcard/CLAIDTutorial/AccelerometerData</storagePath>
        <fileNameFormat>Data_%d-%m-%y.csv</fileNameFormat>
        <serializer class="claid::CSVSerializer"></serializer>
    </save>

    <save>
        <what>AudioData</what>
        <storagePath>/sdcard/CLAIDTutorial/AudioData</storagePath>
        <fileNameFormat>Data_%d-%m-%y_%M.mp3</fileNameFormat>
        <serializer class="claid::MP3Serializer"></serializer>
    </save>
</Module>

Common issues

Common issues when storing data to files

If you run the examples above and notice that no files are created, this can have multiple causes. Keep the following aspects in mind, even if you build data collection applications without CLAID:

Android & WearOS:

  • Since Android 11, Android enforces scoped storage. With scoped storage, apps can only access their own app-specific directory by default. Accessing all files on the internal storage (/sdcard/) has become more complicated, as only "externalStorageManager apps" are allowed to do so. This requires the user to allow an app to act as storage manager on a separate settings page. On Android 10, we additionally have to specify the requestLegacyExternalStorage flag.
  • In general, check whether storage permissions have been granted on the app's settings page.

iOS:

All systems:

  • Make sure not to store many small files in few directories. If you, for example, store the data of each second in a separate file, this will clutter the file system, resulting in increased response times. It might happen that you cannot store files anymore because the file system reacts too slowly. Avoid this situation, as deleting all the files to fix the problem would be very slow as well.
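To judge a file name format before deploying it, you can count how many files and folders it would create. The Python sketch below simulates one sample per second for 24 hours and groups the resulting file names by folder. It assumes strftime-style format identifiers; files_per_directory is a hypothetical helper and not part of CLAID.

```python
from collections import Counter
from datetime import datetime, timedelta
import posixpath

def files_per_directory(file_name_format: str, hours: int = 24) -> Counter:
    """Count distinct files per folder if one sample arrives every second."""
    start = datetime(2024, 3, 5)
    paths = {
        (start + timedelta(seconds=s)).strftime(file_name_format)
        for s in range(hours * 3600)
    }
    # Files without a subfolder are counted under "."
    return Counter(posixpath.dirname(path) or "." for path in paths)

if __name__ == "__main__":
    for fmt in ("%H-%M-%S.xml", "%H/%M-%S.xml", "%H/%M.xml", "Data_%d-%m-%y.xml"):
        counts = files_per_directory(fmt)
        print(f"{fmt:>18}: {len(counts):>2} folder(s), "
              f"up to {max(counts.values()):>5} files per folder")
```

A per-second format without subfolders puts 86400 files into a single folder per day, whereas an hourly subfolder with one file per minute keeps every folder at 60 files.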