Following this series of research articles, today we will briefly discuss some of our recent contributions of clustering applied to Wireless Networks. By definition, clustering consists of grouping a set of elements according to some criteria (often the similarity among these elements). We will start with a general review of the topic and then will discuss recent applications to wireless communication.
A simple example
Even if you have never heard of any of the (possible) fancy mathematical definitions of clustering, surely you have already used it in your day-to-day life. Let us consider a simple scenario. Imagine that you and your family had lunch, and today is your day to wash and dry the dishes. You just finished doing so and, now, you also have to put them in their respective places. As a house rule, you should keep the mugs on the second shelf, such that the white ones are on the right-hand side and the dark ones on the left. You should also put the plates, cups, and all the other items on their respective shelves. You organize them like this so that it becomes easier for you to find them for your next meal.
Intrinsically, while organizing these items, you are categorizing (i.e., clustering) them according to their specific features or purposes. For instance, while organizing the cups, you arrange them according to visual features, while you distinguish the other elements based only on their purpose. This is a simple example, but we can see the difference between raw/observed features (e.g., color, texture, format) and hidden/latent features (e.g., purpose). The latter ones are usually considered as a “summary” of a set of features, for instance, in our scenario, a mug is expected to have a specific shape and is used to drink something.
A more solid definition
Jumping towards a more theoretical point-of-view, clustering analysis is the task of grouping elements such that the (overall) similarity between the elements in the same cluster (or group) is larger than the (overall) similarity between elements of different groups. Let be a -dimensional set of points drawn from different processes. The optimal goal of clustering is to form groups:
where is some pre-defined similarity measure. A trivial solution consists of exhaustively searching for the solution, but one can readily see that when or grows large this is not practical. Hence, several clustering solutions have been proposed in the literature, for instance, means spectral clustering, hierarchical clustering, statical methods and deep clustering solutions, to name a few.
The Curse of Dimensionality
A common problem that persistently appears in most recent clustering tasks is the curse of dimensionality. The simple explanation of the problem is that the larger the observation space is, i.e., the larger , the sparser the data in it becomes. As a consequence, it becomes harder to define the similarity in this large dimensional space as all observations seem to be unrelated (far away from each other). In this scenario, a common solution is trying to map data from this large dimensional space into a lower one where data has better properties. A typical example is the fact that face-clustering problems can be seen as lower-dimensional (when compared to the number of modern digital cameras) subspace clustering problems .
Applications to Wireless Networks
Among the several applications of clustering to Wireless Networking, we focus on two publications originating from the Windmill ITN project, namely: Pilot Assignment for Cell-Free Massive MIMO and User Clustering for Rate Splitting.
Pilot Assignment in Cell-Free mMIMO
Cell-free massive Multiple-Input Multiple-Output (MIMO) type communication has been proposed as a distributed solution in which Access Points (APs) jointly serve a few User Equipments (UEs). This type of network is usually designed in a user-centric fashion, in which the UE is surrounded and served by multiple APs. Hence, because the number of orthogonal pilots is limited, the same pilot may be assigned to two different but relatively close links (a connection between a UE and an AP), this is known as pilot contamination and can potentially limit the network performance . An open challenge is how to come up with appropriate pilot reuse patterns.
In the cell-free mMIMO context, the pilot assignment task can be seen as a maximally diverse clustering problem, i.e., defining clusters such that the similarity of its elements (pilots) is minimal. With that in mind, our colleagues in , have proposed a heuristic algorithm for repulsive clustering that is directly inspired by the needs of Cell-free mMIMO systems. The proposed method is compared against random and greed pilot assignments, as well as Oracle pilot assignments (without contamination) and shown to have higher per-user throughput than the compared methods.
User Clustering for Hierarchical Rate Splitting
Still, in the mMIMO context, clustering has also been applied to the context of precoding. One of the pioneer ideas is the so-called two-tier precoding schemes that consider (as the name suggests) two-step precoding. The first precoder divides users into orthogonal groups to mitigate inter-group interferences while the second precoder reduces the intra-group interference. More recently, the idea of Rate Splitting has also been proposed. In its more simplistic method (1-layer rate splitting), each message is divided into a private (intended for a unique receiver) and a common part (intended for all the receivers). The idea is to superpose both messages at the transmitter side and the receiver side, and retrieve the message in two steps: 1) obtain the common part by considering the desired private message as interference and the undesired ones as noise and; 2) after removing the common message from the received signal obtain the desired private message.
For this scheme to work, it is necessary to ensure that every receiver is capable of retrieving the common message. In the presence of large number of receivers, it is often the case that this results in large portion of the power being designed to the common message. In these conditions, relying on multiple common streams (generalized rate splitting) leads to higher multiplexing gains, but at the cost of high complexity at the decoder caused by the several layers of successive interference cancellations. To tackle the increasing complexity of generalized RS while having a small loss in multiplexing, the authors in  consider a 2-layer Hierarchical Rate Splitting (HRS) transmission mechanism. In this scenario, users are considered to be divided into groups and required to decode three messages: two common messages and a private one.
In our recent work  accepted at EUSIPCO 2022, we propose using a shallow neural network to learn and cluster the users based on the instantaneous noisy channel such that the resulting cluster maximizes the communication rate obtained after applying 2-layer HRS transmission. The proposed technique is able to achieve a rate comparable with current works while being less complex compared to other techniques.
 Vidal, R. (2011). Subspace clustering. IEEE Signal Processing Magazine, 28(2), 52-68.
 Kimes, P. K., Liu, Y., Neil Hayes, D., & Marron, J. S. (2017). Statistical significance for hierarchical clustering. Biometrics, 73(3), 811-821.
 Mohebi S, Zanella A, Zorzi M. Repulsive Clustering Based Pilot Assignment for Cell-Free Massive MIMO Systems. arXiv preprint arXiv:2203.12403. 2022 Mar 23.
 Pereira R, Deshpande AA, Vaca-Rubio CJ, Mestre X, Zanella A, Gregoratti D, de Carvalho E, Popovski P. User Clustering for Rate Splitting using Machine Learning. Accepted for publication on IEEE European Signal Processing Conference (EUSIPCO), Aug. 2022.
 T. L. Marzetta, “Noncooperative cellular wireless with unlimited numbers of base station antennas,” IEEE Trans. on Wireless Communications, vol. 9, no. 11, pp. 3590–3600, Nov. 2010.
 Dai M, Clerckx B, Gesbert D, Caire G. A rate splitting strategy for massive MIMO with imperfect CSIT. IEEE Transactions on Wireless Communications. 2016 Mar 16;15(7):4611-24.