Data Collection Layer
Structure and composition of the data collection layer
The data collection layer is a key component of the IoT platform that provides integration with devices. It includes a set of specialized applications, each designed to interact with a specific type of IoT device (meters, sensors, etc.). These applications function as network services, handling incoming connections through dedicated ports and managing data transfer between devices and the platform. This architecture ensures scalability and flexibility when working with heterogeneous equipment.
Note
For added security, resilience, and ease of installation and upgrades, the applications are packaged in Docker containers. The container set is managed with Docker Compose, which allows related containers to be operated as a single unit.
The structure and composition of the data collection layer applications are summarized below:
```mermaid
flowchart TD
    A(Household energy meters) <-->|Data transfer| B
    C(Industrial energy meters) <-->|Data transfer| D
    E(Smart sensors) <-->|Data transfer| F
    subgraph X["Data Polling Subsystem"]
        B("Data collection daemon (Type-J)") <--> I{"Balancer (Pgbouncer)"}
        D("Data collection daemon (Type-T)") <--> I
        F("Data collection daemon (DDT)") <--> I
    end
    I <--> J[(PostgreSQL DBMS)]
```
Exchanges with end devices are performed over a TCP connection; each collection daemon implements its own application-layer exchange protocol for the particular type of IoT device it serves.
Architecture Features
- Each container includes an application specialized for a particular protocol.
- Lightweight base images are used (`alpine` in the base configuration) to minimize overhead; however, images can also be built for the customer's OS.
- To manage dependencies between services (such as database access), `depends_on` and `healthcheck` sections are specified in Docker Compose.
- The collection daemons run in multiple instances (replicas) to distribute the load; each collection daemon can serve tens of thousands of connections simultaneously.
- Balancer (HAProxy, Nginx Stream) directs traffic to available daemons using Round Robin or Least Connections algorithms.
- In cloud environments, you can add automatic scaling based on metrics (CPU, number of connections).
- All communication with devices is logged in detail to files, allowing analysis when problems or failures occur.
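The Compose arrangement described above can be sketched as follows. This is a hypothetical fragment: the service names, image tags, and health-check command are illustrative assumptions, not the platform's actual configuration.

```yaml
# Hypothetical docker-compose fragment illustrating the features listed above
services:
  daemon-type-j:
    image: iot/daemon-type-j:latest   # application image built on an alpine base
    deploy:
      replicas: 3                     # multiple instances to distribute the load
    depends_on:
      db:
        condition: service_healthy    # start only after the DBMS passes its health check
  db:
    image: postgres:16-alpine
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      retries: 5
```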
Data Exchange Principles
Below is a diagram of a communication session between an IoT device and the collection daemon:
```mermaid
sequenceDiagram
    autonumber
    participant D as IoT device
    participant I as Data collection daemon
    Note over I,D: Stages of interaction
    D->>I: Session start, authentication
    I-->>D: Sending request parameters
    D->>I: Data acquisition
    I-->>D: Sending the next session time
```
Principles of data exchange organization:
- **Initiation of a communication session.** A communication session is always initiated by the IoT device. After establishing a connection to the data collection daemon, the device transmits an identification packet.
- **Authentication procedure.** The data collection daemon authenticates the device. If authentication succeeds, it performs:
  - retrieval of the saved device configuration from the database;
  - sending of the current interface settings to the device.
- **Data transfer.** Based on the received request, the device transfers to the collection server:
  - current device readings;
  - archived records (if any).
- **Scheduling of the next session.** After data reception is complete, the collection daemon:
  - generates the next session timestamp;
  - transmits it to the device;
  - initiates connection termination.
- **Data processing and storage.** The received information is:
  - aggregated into structured JSON objects;
  - stored in an intermediate PostgreSQL database;
  - formatted according to the protocol specification of the particular daemon.
- **Control system integration.** The data becomes available to the upper level of the system (the management subsystem) for:
  - further analytical processing;
  - visualization in management interfaces;
  - generation of automated reports.
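The session stages above can be sketched as a daemon-side flow. This is a minimal illustration only: the packet layout, function names, and the in-memory stand-in for the database are assumptions, and the real daemons implement their own binary protocols per device type.

```python
import json

# Stand-in for the intermediate PostgreSQL database of device configurations
DEVICE_DB = {
    "meter-001": {"settings": "1", "day_event": 1706140800, "net_address": 58},
}

def authenticate(identity_packet):
    """Authentication stage: check the identifier from the first packet."""
    device_id = identity_packet.get("device_id")
    return device_id if device_id in DEVICE_DB else None

def run_session(identity_packet, readings, now):
    """Run one session: authenticate, return settings, accept data,
    schedule the next session, and prepare the stored JSON object."""
    device_id = authenticate(identity_packet)
    if device_id is None:
        return {"status": "rejected"}
    config = DEVICE_DB[device_id]            # retrieve saved configuration
    stored = json.dumps({"device_id": device_id,
                         "readings": readings})  # aggregate into a JSON object
    next_session = now + 3600                # here simply "one hour later"
    return {"status": "ok", "config": config,
            "stored": stored, "next_session": next_session}
```

In the real daemons the next-session timestamp comes from the stored CRON schedules rather than a fixed offset, and the result row is written to PostgreSQL instead of being returned.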
Interaction with the management layer of the IoT platform
Subsystem Interconnection Architecture
The data collection daemons integrate with the top-level system (the management layer) through an intermediate database. For this interaction, a Docker container with a deployed PostgreSQL DBMS is used.
Note
The management subsystem stores all the necessary configuration of the polled IoT devices in this database. Data collection daemons use these settings to establish communication with the devices and perform data exchange operations.
The results of successful interaction with devices are automatically saved by the daemons to a specialized table in the same database, ensuring transparent data transfer between the management layer and the devices.
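The source does not describe the layout of that table; a hypothetical PostgreSQL schema for such a readings table, with all names assumed, might look like:

```sql
-- Hypothetical schema; the actual table and column names are not specified
CREATE TABLE device_readings (
    id          bigserial   PRIMARY KEY,
    device_id   text        NOT NULL,
    payload     jsonb       NOT NULL,              -- structured JSON object from the daemon
    received_at timestamptz NOT NULL DEFAULT now() -- time the daemon saved the row
);
```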
The scheme of subsystems interaction is presented below:
```mermaid
flowchart LR
    A@{ shape: procs, label: "Data Polling Subsystem"} -->|Readings data| B[(PostgreSQL DBMS)]
    B -->|Configuration| A
    B -->|Readings data| C@{ shape: procs, label: "Control Subsystem"}
    C -->|Configuration| B
```
Device Configuration
The device configuration includes the following information:
- Dates of the last readings for each archive (hourly, daily, monthly). These parameters indicate the date from which the corresponding archive should be read out.
- Metering device code (for IoT devices using Type-T protocol). This parameter is required only for the collection daemons serving the Type-T protocol and indicates to the collection server which algorithm to use to work with the device.
- Serial Port Speed (for IoT devices running Type-T protocol). This parameter tells the IoT device how fast it should communicate with the meter. The collection daemon sends it at the beginning of an exchange session.
- Communication Schedule. The management subsystem saves all schedules to an intermediate database, and the collection daemon calculates the nearest date from the received schedules and sends it to the device.
- Device Control Commands. The management commands, depending on the type of IoT device and its capabilities, include:
- A command to update the software of the telemetry part of the IoT device. Upon receiving this command, the device downloads new firmware from the server and performs the update.
- A command to read out the configuration of the metering device.
- A command to close or open the valve (if a valve is present and supported by the IoT device software).
- A command to set the operating parameters of the metering device (depends on the model).
- A command to reboot the IoT device.
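On the daemon side, such commands can be thought of as a lookup table. The numeric codes below are purely illustrative, since the source specifies only the command semantics, not their encoding:

```python
# Hypothetical command codes; the real protocol encoding is not specified
COMMANDS = {
    1: "update_firmware",   # update the telemetry software of the IoT device
    2: "read_config",       # read out the metering device configuration
    3: "valve_control",     # close/open the valve, if supported
    4: "set_parameters",    # set metering device operating parameters
    5: "reboot",            # reboot the IoT device
}

def decode_command(code):
    """Map a received command code to its action name."""
    return COMMANDS.get(code, "unknown")
```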
The main part of the device configuration is stored in an intermediate database, in JSON format.
An example of the configuration is shown below:
```json
{"settings": "1", "day_event": 1706140800, "hour_event": 1706140800, "month_event": 1673857740, "net_address": 58}
```

where `day_event`, `hour_event`, and `month_event` are timestamps in `unixtime` format of the most recently saved records of the daily, hourly, and monthly archives, respectively.
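A daemon reading this configuration converts those `unixtime` markers into the dates from which each archive should be read. A minimal sketch of that step:

```python
import json
from datetime import datetime, timezone

# The example configuration row as stored in the intermediate database
raw = ('{"settings": "1", "day_event": 1706140800, "hour_event": 1706140800,'
       ' "month_event": 1673857740, "net_address": 58}')
config = json.loads(raw)

# Convert the unixtime markers into read-from dates for each archive
archives = {name: datetime.fromtimestamp(config[name], tz=timezone.utc)
            for name in ("day_event", "hour_event", "month_event")}
print(archives["month_event"].isoformat())  # 2023-01-16T08:29:00+00:00
```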
Schedules are stored separately, also in JSON format. At the same time, the schedule string itself is formed and processed in CRON format. Using this approach provides flexibility in setting and processing of any schedules.
Below is an example of storing several schedules for the device:
```json
[{"crontab": "10 * * * *", "schedule_id": 6}, {"crontab": "40 * * * *", "schedule_id": 7}]
```
According to these schedules, the device should come online at minutes 10 and 40 of every hour. The daemon itself determines the most suitable schedule at the moment of communication with the device and transmits the nearest session time.
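The nearest-session calculation can be sketched as follows. This is not a full CRON implementation: it interprets only the minute field (a number or comma-separated list) and assumes the remaining fields are `*`, as in the stored example above.

```python
from datetime import datetime, timedelta

def next_session(schedules, now):
    """Return the nearest upcoming session time from crontab-style schedules
    whose minute field is literal and whose other fields are '*'."""
    candidates = []
    for entry in schedules:
        minute_field = entry["crontab"].split()[0]
        for m in minute_field.split(","):
            cand = now.replace(minute=int(m), second=0, microsecond=0)
            if cand <= now:
                cand += timedelta(hours=1)  # already passed: next occurrence is next hour
            candidates.append(cand)
    return min(candidates)

schedules = [{"crontab": "10 * * * *", "schedule_id": 6},
             {"crontab": "40 * * * *", "schedule_id": 7}]
print(next_session(schedules, datetime(2024, 1, 25, 12, 15)))  # 2024-01-25 12:40:00
```

A production daemon would additionally handle the hour, day, month, and weekday fields, but the principle of choosing the minimum over all candidate times is the same.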