Collaborative Sensing in Cellular Networks

One of the potential solutions for tackling the problem of RF congestion is to integrate sensing and communication infrastructures in a manner where they use the same frequency band in a controlled way. In our latest paper, which has been submitted to the IEEE Journal on Selected Areas in Communications (JSAC), we proposed a mechanism to perform multi-radar sensing in cellular networks via NR sidelink (link to the arXiv version). In this article I want to briefly discuss the idea and give some insights into collaborative sensing using multiple FMCW mmWave radars (the working principles of this type of radar are discussed in the previous post).

To begin with, let’s briefly discuss the sidelink capability, recently introduced by 3GPP, which allows channel resources to be reserved in the system. The LTE/NR sidelink feature (PC5 interface) was first introduced in LTE Releases 12 and 13 for device-to-device communication. Specifically, a user equipment (UE) device may request resources for sidelink communication from the corresponding gNB. According to the standard, the sidelink communication then occurs via the Physical Sidelink Shared Channel (PSSCH). In this work we utilize this functionality to perform sensing, as shown in Fig. 1. For more details about the sidelink implementation, you can read our paper on arXiv.
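To make the resource-request flow concrete, here is a toy Python abstraction of it. This is emphatically not the actual 3GPP RRC/MAC signaling; all class, field, and method names are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class SidelinkGrant:
    """Toy stand-in for a PSSCH resource grant (fields are illustrative)."""
    slot: int        # reserved time slot
    subchannel: int  # reserved frequency sub-channel

class GNB:
    """Toy base station handing out non-overlapping sidelink resources."""
    def __init__(self):
        self.next_slot = 0

    def request_resources(self, ue_id: str) -> SidelinkGrant:
        # Grant each requesting UE its own slot so sensing devices never collide.
        grant = SidelinkGrant(slot=self.next_slot, subchannel=0)
        self.next_slot += 1
        print(f"gNB: granted slot {grant.slot} to {ue_id}")
        return grant

# Each radar-equipped UE reserves its own resource before it starts sensing.
gnb = GNB()
grants = {f"radar_{i}": gnb.request_resources(f"radar_{i}") for i in range(8)}
```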

Fig. 1: Integration of sensing capabilities into next generation cellular communication systems exploiting sidelink device-to-device communication

Now we assume that we already have a resource allocated by the base station to perform sensing. In this paper we use 8 radars simultaneously to sense the environment from 8 different angles, which is beneficial for two reasons. First, the relative orientation of the RF interface and the subject (e.g. in gesture sensing) or (moving) object (e.g. in environmental perception) significantly impacts the observed electromagnetic pattern and hence the recognition accuracy. Second, we can combat the shadowing effect, which occurs when the area of interest is partially shadowed or even completely blocked from a single radar's viewpoint. We demonstrate this impact in an experimental study with 15 subjects and more than 25,000 data samples.

Fig. 2: Data processing and aggregation pipeline for multi-radar RF sensing through NR-sidelink

Fig. 2 shows the data processing pipeline for collaborative sensing in NR sidelink. Each UE (in this case, an FMCW mmWave radar) performs independent sensing in the reserved resource; the UEs then apply pre-processing, segmentation, and feature extraction steps, and finally use their communication capabilities to share the features and identify the correct label of the gesture performed by the participant in the environment. In this work we study a gesture recognition scenario using the proposed mechanism. Although using multiple NR sidelink-based radars to sense the environment can potentially address the shadowing effect and the radar's low resolution along the z-axis, it also gives rise to issues caused by interference between the radars, including noise-floor degradation, blind spots at certain ranges or directions, and ghost objects. How do we address this problem?
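As a rough skeleton, the per-radar stages of Fig. 2 and the final fusion could be organized as below. Every function is a placeholder standing in for the actual processing; names, shapes, and the max-based fusion are assumptions for illustration.

```python
import numpy as np

def preprocess(iq_frames: np.ndarray) -> np.ndarray:
    """Placeholder pre-processing, e.g. crude static-clutter removal."""
    return iq_frames - iq_frames.mean(axis=0)

def segment(frames: np.ndarray) -> np.ndarray:
    """Placeholder segmentation: cut a fixed-length gesture window."""
    return frames[:64]

def extract_features(segmented: np.ndarray) -> np.ndarray:
    """Placeholder for the per-radar encoder (see Fig. 5 later)."""
    return segmented.reshape(-1)[:128]

def classify(features: list) -> int:
    """Placeholder fusion and classification over the shared features."""
    fused = np.stack(features).max(axis=0)  # e.g. element-wise max pooling
    return int(fused.argmax())              # stand-in label decision

# Each of the 8 radars runs the first three stages locally in its reserved
# resource, then shares its feature vector over sidelink for joint labeling.
per_radar_features = [
    extract_features(segment(preprocess(np.random.randn(128, 256))))
    for _ in range(8)
]
label = classify(per_radar_features)
```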

Fig. 3: Crossing and parallel chirp interference between transmitted chirp (victim) and aggressor chirp

Two different types of interference can occur with FMCW radars: crossing chirp interference and parallel chirp interference. As shown in Fig. 3.a, crossing interference occurs when one radar's chirp (referred to as the aggressor in the following) crosses the chirp of another radar (referred to as the victim, since it falls victim to the aggressor's interference). This type of interference typically increases the noise floor, resulting in a reduced Signal to Noise Ratio (SNR) of the real targets, thereby affecting detection and creating momentary blind spots. The glitch duration in crossing interference is given by:

\text{glitch duration} = \frac{\text{bandwidth}}{\left|\text{slope}_{\text{aggressor}} - \text{slope}_{\text{victim}}\right|} \qquad (1)

According to Equation 1, the glitch duration for two crossing interferers is typically short and affects only a few samples. Parallel interference is shown in Fig. 3.b. This type of interference occurs when the aggressor chirp and the victim chirp have the same slope. If the delay between the chirp start times of different radars is within a microsecond, the aggressor chirp stays within the bandwidth of the victim's entire chirp. This type of interference results in ghost objects at random distances with random velocities that do not exist in the environment but are detected by the radar. Since such interference occurs only when the NR sidelink-operated radars start nearly simultaneously, its probability is small. During the experiments we employed the radar's built-in interference detection capability to avoid any interference issues in the recorded dataset.
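To get a feel for Equation 1, here is a quick numeric sketch in Python. The chirp parameters are assumptions chosen to be typical for IWR1443-class radars, not the exact settings used in our experiments.

```python
# Back-of-the-envelope estimate of the crossing-chirp glitch duration
# (Equation 1) and the number of affected ADC samples.
# All parameter values below are illustrative assumptions.

bandwidth = 15e6         # receiver bandwidth in Hz (assumed)
slope_victim = 60e12     # victim chirp slope in Hz/s (60 MHz/us, assumed)
slope_aggressor = 30e12  # aggressor chirp slope in Hz/s (30 MHz/us, assumed)
sample_rate = 10e6       # ADC sampling rate in samples/s (assumed)

glitch_duration = bandwidth / abs(slope_aggressor - slope_victim)
affected_samples = glitch_duration * sample_rate

print(f"glitch duration: {glitch_duration * 1e6:.2f} us")  # 0.50 us
print(f"affected ADC samples: {affected_samples:.0f}")     # 5 samples
```

With these numbers the glitch lasts only half a microsecond and corrupts about 5 ADC samples, which is why crossing interference mainly raises the noise floor rather than creating persistent artifacts.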

Now, let’s move on to the actual gesture recognition model we introduced in the paper to recognize gestures performed by a participant using 8 radars. To this end, we introduced two different approaches, orientation independent and orientation tracking, shown in Fig. 4. The main difference is that in orientation tracking each angle has a separate feature-extraction learner, while in the orientation-independent approach a single feature extractor is shared across all angles.

Fig. 4: Schematic view of the two proposed approaches: (a) orientation independent and (b) orientation tracking
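Structurally, the difference between the two approaches boils down to one shared feature extractor versus one per angle. Below is a minimal PyTorch sketch of that distinction; the layer sizes are illustrative assumptions, and the linear layers merely stand in for the actual encoder of Fig. 5.

```python
import torch
import torch.nn as nn

NUM_ANGLES, IN_DIM, EMB_DIM = 8, 128, 64  # illustrative sizes, not the paper's

class OrientationIndependent(nn.Module):
    """One feature extractor shared by all angles (Fig. 4a)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(IN_DIM, EMB_DIM)  # stand-in for the encoder

    def forward(self, x):          # x: (NUM_ANGLES, IN_DIM)
        return self.encoder(x)     # the same weights process every angle

class OrientationTracking(nn.Module):
    """A separate feature extractor per angle (Fig. 4b)."""
    def __init__(self):
        super().__init__()
        self.encoders = nn.ModuleList(
            nn.Linear(IN_DIM, EMB_DIM) for _ in range(NUM_ANGLES)
        )

    def forward(self, x):          # x: (NUM_ANGLES, IN_DIM)
        return torch.stack([enc(xi) for enc, xi in zip(self.encoders, x)])

x = torch.randn(NUM_ANGLES, IN_DIM)
print(OrientationIndependent()(x).shape)  # torch.Size([8, 64])
print(OrientationTracking()(x).shape)     # torch.Size([8, 64])
```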

What is the encoder block in Fig. 4? The radar we use in this work is the IWR1443 by Texas Instruments (TI). A pre-processing pipeline is applied to the I/Q data to generate point clouds. Consequently, the encoder block shown in Fig. 4 takes in 4D point clouds and produces a representation vector. The architecture of the graph-based encoder, implemented following the Message Passing Neural Network (MPNN) paradigm, is shown in Fig. 5.

Fig. 5: Schematic of the proposed encoder. (a) Point cloud generated by the radar for a gesture; (b) the graph representation of the gesture after applying the proposed K-NN to reflect the temporal dependency; (c) edge features for each incident edge at a central point i are calculated in this step; (d) the representations of all the nodes are calculated by applying an aggregation function over the edge features
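To make the encoder more tangible, here is a minimal PyTorch sketch of an MPNN/EdgeConv-style encoder over a 4D point cloud, following the steps in Fig. 5. The kNN here uses plain Euclidean distance over (x, y, z, t) and the layer sizes are assumptions; the paper's K-NN construction and exact architecture differ in detail.

```python
import torch
import torch.nn as nn

class EdgeConvEncoder(nn.Module):
    """MPNN-style encoder for 4D radar point clouds (sketch of Fig. 5).

    For each point i we find its k nearest neighbours (step b), compute
    edge features from [p_i, p_j - p_i] (step c), and aggregate them with
    a permutation-invariant max (step d).
    """

    def __init__(self, in_dim=4, hidden=64, out_dim=128, k=8):
        super().__init__()
        self.k = k
        # MLP applied to each concatenated [p_i, p_j - p_i] edge input
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, pts):         # pts: (N, 4) points of one gesture
        d = torch.cdist(pts, pts)   # pairwise distances, (N, N)
        idx = d.topk(self.k + 1, largest=False).indices[:, 1:]  # kNN, drop self
        neigh = pts[idx]                            # (N, k, 4)
        center = pts.unsqueeze(1).expand_as(neigh)  # (N, k, 4)
        edges = torch.cat([center, neigh - center], dim=-1)   # (N, k, 8)
        node_feat = self.edge_mlp(edges).max(dim=1).values    # per-node features
        return node_feat.max(dim=0).values          # (out_dim,) gesture embedding

# usage: encode a cloud of 200 four-dimensional points into a 128-d vector
cloud = torch.randn(200, 4)
print(EdgeConvEncoder()(cloud).shape)  # torch.Size([128])
```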

In Fig. 4, for pooling purposes, we propose four different mechanisms: max pooling, vote pooling, attention pooling, and orientation tracking, which are discussed in the paper. Now, it is time to briefly discuss the experiment setting. The experimental setup is shown in Fig. 6. As you can see, we have eight radars equally spaced on a circle around the participant at a distance of 1.5 m. The participant always faces the radar labeled 0°. We collected 10 repetitions of each of the 21 gesture classes from 15 participants, recorded from 8 different angles.
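Of the four pooling mechanisms, attention pooling is the easiest to sketch: each per-angle embedding receives a learned relevance weight, and the fused representation is their weighted sum. The sketch below is a plausible minimal reading, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Fuse per-angle gesture embeddings with learned attention weights.

    Minimal sketch: input (num_angles, dim), output (dim,).
    """

    def __init__(self, dim=128):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # scalar relevance score per angle

    def forward(self, z):                        # z: (num_angles, dim)
        w = torch.softmax(self.score(z), dim=0)  # weights over angles, sum to 1
        return (w * z).sum(dim=0)                # weighted sum of embeddings

# usage: fuse the 8 per-angle embeddings into one gesture representation
z = torch.randn(8, 128)
print(AttentionPool()(z).shape)  # torch.Size([128])
```

A side benefit of such learned weights is interpretability: they indicate which viewing angles matter most for a given gesture.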

Fig. 6: Experimental setup for data collection

As shown in Table I, two of our proposed models (attention pooling and orientation tracking) outperform the state of the art. In particular, the orientation tracking approach achieves 100% accuracy. In this experiment we assumed that all 8 angles are available in both the training and inference phases.

TABLE I: Comparison with the state of the art on the collected dataset when all the angles are available in the inference phase. The best results per column are denoted in bold typeface.

In another experiment we studied the effect of having fewer than 8 angles available in the inference phase: the models are trained on all 8 angles but perform inference on fewer. As shown in Fig. 7, the accuracy drops for all the models as more angles are removed, but for most of the models the drop is dramatic. For example, for PointLSTM and PointGest the accuracy drops from 98% to almost 10% when only a single angle is available in the inference phase. For our best-performing model, orientation tracking, however, the accuracy stays around 80% even with a single angle.

The performance of the proposed models compared to the baselines
Fig. 7: The performance of the proposed models compared to the baselines when not all the angles are available in the inference phase. Different numbers of angles are randomly removed in the inference phase.

In Fig. 8, the importance of each angle for each gesture in the orientation tracking approach is shown in a polar chart. For most gestures the radars in front (0°, 45°, and 315°) have the highest impact, while the radar at the back (180°) has the least importance. However, for a few gestures like (q), (r), and (u), where the hands are visible from a back view, the impact of the radar at 180° increases. Moreover, for the gestures that happen on one side of the body, e.g., (d) and (n), the radars on the same side are of higher importance than the radars on the other side.

Fig. 8: Gesture set used in the experiments: (a) ‘lateral raise’, (b) ‘two-hand lateral-raise’, (c) ‘two-hand lateral-to-front’, (d) ‘lateral-to-front’, (e) ‘two-hand inward circles’, (f) ‘left-arm circle’, (g) ‘two-hand outward circles’, (h) ‘right-arm circle’, (i) ‘lift’, (j) ‘pull’, (k) ‘two-hand pull’, (l) ‘push’, (m) ‘two-hand push’, (n) ‘push-down’, (o) ‘swipe right’, (p) ‘swipe left’, (q) ‘throw’, (r) ‘two-hand throw’, (s) ‘circle counter-clockwise’, (t) ‘circle clockwise’, (u) ‘arms swing’. To the right of each gesture, the importance of the radars for the orientation tracking (best) model is shown in a polar chart

In conclusion, we have proposed a mechanism for RF convergence in cellular communication systems. In particular, we suggest integrating RF sensing with NR sidelink device-to-device communication, which has been part of the 3GPP cellular communication standards since Release 12. We specifically investigated a common issue of NR sidelink-based RF sensing: its angle and rotation dependence. In particular, we discussed transformations of mmWave point-cloud data that achieve rotational invariance, as well as distributed processing based on such rotation-invariant inputs at angle- and distance-diverse devices. Further, to process the distributed data, we proposed a graph-based encoder that captures spatio-temporal features of the data, along with four approaches for multi-angle learning. The approaches are compared on a newly recorded and openly available dataset comprising 15 subjects performing 21 gestures, recorded from 8 angles. We were able to show that our data aggregation and processing toolchain outperforms state-of-the-art point-cloud-based gesture recognition approaches for angle-diverse gesture recordings.

Dariush Salami

I am a Ph.D. candidate in networking and machine learning at Aalto University and an Early Stage Researcher (ESR) in the ITN WindMill project. I graduated from Amirkabir University of Technology in Artificial Intelligence (AI). My master's thesis was about fraud detection in the communication industry. Now I am working on Radio Frequency (RF) sensing using mmWave FMCW radars. Machine Learning (ML), graph-based neural networks, machine vision, and Natural Language Processing (NLP) are some of the topics I am interested in.