# Europe Dataset (Base Stations Big Data)

• ### Data Description

• ###### ITALIAN DATASET DESCRIPTION

We initially focus on the Emilia-Romagna region of Italy, which is covered by four different cellular operators (referred to as A, B, C and D in the following). Table 1 reports the main features of the considered dataset. Across all operators, more than 4900 BSs are deployed. The number of deployed BSs is similar for operators A and B, and slightly lower for operators C and D. In addition, operator D reuses part of the cellular infrastructure of the two largest operators to guarantee coverage in the zones not covered by its own BSs. As for the morphological characteristics of the area, the whole region spans over 22000 km^2 and includes rural areas, towns and one metropolitan area. This is also reflected in the number of subscribers, which exceeds 6.5 million in total, with the largest number of them living in the metropolitan area. Finally, the average BS density (i.e., the number of BSs deployed by each operator divided by the area of the whole region) is always lower than one BS per km^2, because fewer BSs are deployed in rural areas than in urban ones. The density is larger for operators A and B, and slightly lower for the other operators.

• ###### CROATIAN DATASET DESCRIPTION

In addition to the Italian dataset, we have considered the set of BS sites with freestanding masts in Croatia. In particular, more than 2600 BSs are deployed in an area of around 56000 km^2. The database is composed of the BS sites owned by the telecom operators currently active in Croatia, serving more than 4.6 million users in total. The morphological characteristics of the country include one large metropolitan area around the capital Zagreb, several rural zones, and one coastal zone that includes most of the tourist attractions. In addition to the BS sites actually deployed in the network, the positions of planned BS sites to be installed in the future are also provided, covering a vast region in the north of the country. Moreover, Table 2 reports the characteristics of each scenario in terms of number of considered BSs, size of the area, and average BS density.
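
As a rough cross-check of the figures quoted for the two datasets, the aggregate average density can be computed directly. The sketch below uses only the rounded totals given in the text ("more than" and "around" values), so the results are approximate, and the names `datasets` and `average_density` are illustrative.

```python
# Rounded totals quoted in the text; treat the results as approximations.
datasets = {
    "Emilia-Romagna (all operators)": {"bs": 4900, "area_km2": 22000},
    "Croatia (freestanding masts)": {"bs": 2600, "area_km2": 56000},
}


def average_density(num_bs, area_km2):
    """Average BS density in BSs per km^2."""
    return num_bs / area_km2


densities = {
    name: average_density(d["bs"], d["area_km2"]) for name, d in datasets.items()
}
for name, value in densities.items():
    print(f"{name}: {value:.3f} BS/km^2")
```

Since the aggregate Italian value is already well below one BS per km^2, the density of each single operator, which counts only a subset of those BSs over the same area, must be lower still.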

• Model Description

The mathematical models adopted here are the same as those used for the China dataset.

• Case-Studies Results

Given the BS positions in each scenario, we compute the empirical spatial distribution of the BS density. We sample each scenario with small square areas of fixed size: we randomly select 10000 squares and, for each square, count the number of BSs falling into it. This number, divided by the area of the square, is the BS density. From the BS densities, we derive the empirical PDF. This spatial distribution is then used as the reference against which the candidate distributions (i.e., Poisson, GP, Weibull, Lognormal and $$\alpha$$-Stable) are compared. For each candidate distribution, we estimate the unknown parameters by applying the Maximum Likelihood Estimation (MLE) criterion. For the parameters of the $$\alpha$$-Stable distribution, we adopted a procedure similar to one reported in the literature, because a closed-form expression for the PDF of this distribution does not always exist.
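
The sampling procedure described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes BS positions are already projected onto planar coordinates in km, it keeps every square fully inside a rectangular bounding box, and it uses synthetic uniformly scattered sites in place of a real BS database. The function names are hypothetical.

```python
import random
from collections import Counter


def sample_counts(bs_xy, region, side_km, n_samples=10000, seed=0):
    """Count the BSs falling into randomly placed squares of side `side_km`.

    bs_xy  : list of (x, y) BS positions in km (planar coordinates assumed)
    region : (x_min, y_min, x_max, y_max) bounding box of the scenario, in km
    """
    rng = random.Random(seed)
    x_min, y_min, x_max, y_max = region
    counts = []
    for _ in range(n_samples):
        # Draw the lower-left corner so that the square stays inside the region.
        x0 = rng.uniform(x_min, x_max - side_km)
        y0 = rng.uniform(y_min, y_max - side_km)
        counts.append(sum(1 for x, y in bs_xy
                          if x0 <= x < x0 + side_km and y0 <= y < y0 + side_km))
    return counts


def empirical_distribution(counts, side_km):
    """Map each observed density (BSs per km^2) to its relative frequency."""
    area = side_km * side_km
    freq = Counter(counts)
    return {k / area: n / len(counts) for k, n in sorted(freq.items())}


# Synthetic stand-in for a real BS database: 300 uniformly scattered sites.
site_rng = random.Random(42)
region = (0.0, 0.0, 60.0, 60.0)
bs_xy = [(site_rng.uniform(0, 60), site_rng.uniform(0, 60)) for _ in range(300)]

# Fewer samples than the 10000 used in the text, to keep the demo fast.
counts = sample_counts(bs_xy, region, side_km=10.0, n_samples=2000)
pdf = empirical_distribution(counts, side_km=10.0)
for density, prob in pdf.items():
    print(f"{density:.2f} BS/km^2 -> {prob:.4f}")
```

The linear scan over all BSs for every square is the simplest choice; for databases with thousands of sites and 10000 samples, a spatial index or a grid of precomputed bins would be a reasonable optimization.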

We initially focus on the urban area of the Italian scenario. As a showcase, we compute the PDF of the BS density with a sample area of size 10*10 km^2. Moreover, we have taken into account the BSs from all the operators in order to maximize the number of BSs under consideration. Fig. 3 reports the empirical PDF (i.e., the real one) together with the fitted candidate distributions. Interestingly, the best fit is obtained with the $$\alpha$$-Stable distribution, while the other ones perform consistently worse.
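
For two of the candidates, the MLE step has a well-known closed form, which makes the fitting easy to illustrate: the Poisson rate is the sample mean of the counts, and the Lognormal parameters are the mean and the (biased) standard deviation of the log-values. The sketch below shows only these two cases on toy inputs. The other candidates, including the $$\alpha$$-Stable distribution, generally require numerical optimization and are not covered here. Dropping zero densities before taking logarithms is an assumption of this sketch, not something stated in the text.

```python
import math


def poisson_mle(counts):
    """MLE of the Poisson rate: the sample mean of the per-square BS counts."""
    return sum(counts) / len(counts)


def lognormal_mle(values):
    """MLE of (mu, sigma) for a Lognormal: mean and biased std of the logs."""
    # Zeros have no logarithm; discarding them is an assumption of this sketch.
    logs = [math.log(v) for v in values if v > 0]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / len(logs))
    return mu, sigma


toy_counts = [0, 1, 1, 2, 3, 5, 8]             # toy BS counts, not real data
toy_values = [1.0, math.e, math.e ** 2]        # toy positive densities

lam = poisson_mle(toy_counts)
mu, sigma = lognormal_mle(toy_values)
print(f"Poisson rate: {lam:.4f}")
print(f"Lognormal mu: {mu:.4f}, sigma: {sigma:.4f}")
```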

Next, we have computed the Root Mean Square Error (RMSE) of the different fittings against the empirical PDF. This metric captures the fitting accuracy of each candidate distribution. To check that the results generalize, we have also varied the sample area between 5*5 km^2 and 11*11 km^2. Recall that for each sample area size we randomly select 10000 samples in the scenario. Fig. 4 illustrates the obtained results. The $$\alpha$$-Stable distribution is the best fit for all the considered sample areas, with an RMSE always lower than 0.3. On the other hand, the Poisson distribution exhibits an RMSE always larger than 0.5, thus confirming our intuition that it is not suitable to capture the spatial density distribution of real BSs.
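
A minimal sketch of the RMSE computation is given below, assuming the empirical and fitted PDFs have been evaluated on the same grid of density values. The text does not state how the PDFs are binned or normalized before the comparison, so the toy numbers here are purely illustrative and are not meant to reproduce the values in Fig. 4.

```python
import math


def rmse(empirical, fitted):
    """Root mean square error between two PDFs evaluated on the same grid."""
    if len(empirical) != len(fitted) or not empirical:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(e - f) ** 2 for e, f in zip(empirical, fitted)]
    return math.sqrt(sum(squared) / len(squared))


# Toy values, not taken from the dataset.
empirical_pdf = [0.1, 0.4, 0.3, 0.2]
fitted_pdf = [0.1, 0.3, 0.4, 0.2]

error = rmse(empirical_pdf, fitted_pdf)
print(f"RMSE = {error:.4f}")
```

Repeating this computation for each candidate distribution and each sample area size yields one curve per candidate, which is the kind of comparison the figure reports.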

Furthermore, we have investigated the impact of single operators. Fig. 5(a) reports the RMSE values for each single operator. Recall that operators A and B have the largest number of BSs, while operator D tends to exploit the BSs of the other operators to provide user coverage. The $$\alpha$$-Stable distribution is the best fit for operators A, B and C. For operator D, on the contrary, the $$\alpha$$-Stable RMSE is lower than that of the Poisson distribution but higher than those of the other candidates. This is because operator D does not spread its own BSs in the same way as the other operators, which results in a different density distribution. Moreover, the RMSE tends to increase from left to right (i.e., towards operators with fewer BSs). To give more insight, Fig. 5(b) provides the results when multiple operators are considered to compute the BS density.

Interestingly, the RMSE of the $$\alpha$$-Stable fitting tends to be almost constant, while the RMSEs of the other distributions tend to increase when the number of operators is decreased. In particular, operator A, which is also the largest one, has deployed its BSs in the scenario so as to always provide coverage to users with its own BSs. On the contrary, operator D tends to lease the infrastructure from the other operators. The case with the single operator A better matches a complete BS deployment in the targeted region, resulting in a low RMSE with the $$\alpha$$-Stable model.

Next, we have taken into account the impact of the various cellular networking technologies on the BS density. Together with each BS position, our dataset includes information about the technology, which can be GSM, UMTS, LTE, or not specified. Each BS entry in the BS database includes a list of the supported technologies. Specifically, by manually checking the BS database, we have found that the UMTS service is always provided in the considered region, except for the BSs for which the technology is not specified. At the same time, when the LTE service is provided, GSM and UMTS services are also available. Therefore, we have considered the following categories: GSM/UMTS, GSM/UMTS/LTE, or the entire dataset (i.e., including the BSs for which the technology is not specified). For each category, we have then computed the empirical PDF as well as the distribution fitting. Table 2 describes the obtained RMSE values. Once again, these results confirm that the $$\alpha$$-Stable fitting reaches the highest accuracy in this scenario, while all the other distributions have an RMSE at least twice as large.
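
The grouping by technology can be sketched as follows. The record layout (`id`, `tech`) and the function name `categorize` are hypothetical, since the actual database schema is not described. The sketch also assumes the first two categories are treated as disjoint labels, with the entire dataset being the union of all records. The text does not say whether the GSM/UMTS category also includes LTE-capable BSs.

```python
def categorize(technologies):
    """Map the list of technologies supported by a BS to an analysis category."""
    techs = {t.upper() for t in technologies}
    if "LTE" in techs:
        # Per the text, LTE sites also offer GSM and UMTS.
        return "GSM/UMTS/LTE"
    if "UMTS" in techs:
        return "GSM/UMTS"
    return "unspecified"


# Hypothetical records standing in for entries of the BS database.
bs_db = [
    {"id": 1, "tech": ["GSM", "UMTS"]},
    {"id": 2, "tech": ["GSM", "UMTS", "LTE"]},
    {"id": 3, "tech": ["UMTS"]},
    {"id": 4, "tech": []},
]

groups = {}
for bs in bs_db:
    groups.setdefault(categorize(bs["tech"]), []).append(bs["id"])

entire_dataset = [bs["id"] for bs in bs_db]  # includes unspecified entries
print(groups)
print(entire_dataset)
```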

We then move our attention to the Italian rural scenario. Unlike the previous case, this scenario has no big towns, and the BS distribution over the territory is rather sparse. In order to evaluate the behavior of the different distributions, we have computed the RMSE for different sample area sizes and for different technologies, as reported in Fig. 6. As expected, the Poisson distribution does not adhere to the empirical distribution, resulting in the highest RMSE. The other distributions tend to have a lower RMSE. Among them, the best candidate is the Lognormal distribution in most of the cases. On the contrary, the $$\alpha$$-Stable distribution exhibits a higher RMSE than the Lognormal one (but lower than the Poisson one). This is confirmed across the different technologies and the different area sizes. Therefore, the $$\alpha$$-Stable distribution is useful to capture the BS spatial behavior in urban maps. When the BS distribution becomes sparser than in a city, as in this case, other types of distributions are the best candidates (e.g., the Lognormal one in this case).

As a next step, we have investigated the Croatian scenarios. Fig. 7 reports the obtained results in terms of RMSE for the different distributions. In this case, we have also varied the size of the sample area. In particular, since the BSs are rather sparse in the rural, coastal and urban scenarios (without planned BSs), we have adopted a larger sample area size than in the Italian cases (i.e., ranging between 14*14 km^2 and 20*20 km^2). On the contrary, we have adopted a sample area comparable with the Italian case for the urban scenario with future planned BSs, since the BS density is quite similar in these two cases. Focusing on the obtained results, the best fit for the coastal case is the Weibull distribution (reported in Fig. 7(a)), while the Lognormal one tends to achieve comparable RMSE values when the rural case is considered (see Fig. 7(b)). However, when the urban scenarios are considered (Fig. 7(c) and Fig. 7(d)), the distribution achieving the lowest RMSE is the $$\alpha$$-Stable. This further confirms our intuition that the $$\alpha$$-Stable model matches the BS spatial distribution in urban areas. Moreover, our results also imply that the definition of a universal model, covering all kinds of urban, coastal, and rural scenarios, is still an open issue.

• CONCLUSIONS AND FUTURE WORK

We have studied the BS spatial distributions across different scenarios obtained from Italy and Croatia, considering urban, coastal, and rural zones. We have compared the real distribution against different candidate ones. Our results show that the distribution best matching the real one in urban scenarios is the $$\alpha$$-Stable model. This is confirmed across different sample area sizes, operators, and technologies. On the contrary, the Lognormal and Weibull distributions tend to fit the real one better in coastal and rural scenarios. We believe that this work can be used to derive useful guidelines for BS deployment. As a next step, we will complement these findings with a detailed study of spatial and temporal variations of user traffic. Moreover, we will extend our analysis to other countries (such as in Asia and in North America). Finally, we will study the possibility of deriving a universal model covering rural, urban and coastal zones.

• Luca Chiaraviglio, Francesca Cuomo, Maurizio Maisto, Andrea Gigli, Josip Lorincz, Yifan Zhou, Zhifeng Zhao, Chen Qi, Honggang Zhang, “What is the Best Spatial Distribution to Model Base Station Density? A Deep Dive in Two European Mobile Networks,” IEEE Access, Apr. 2016.

• Luca Chiaraviglio, Francesca Cuomo, Andrea Gigli, Maurizio Maisto, Yifan Zhou, Zhifeng Zhao, Honggang Zhang, “A Reality Check of Base Station Spatial Distribution in Mobile Networks,” IEEE INFOCOM 2016 (Poster), San Francisco, Apr. 2016.

• Rongpeng Li, Zhifeng Zhao, Yi Zhong, Chen Qi, Honggang Zhang, “The Stochastic Geometry Analyses of Cellular Networks with alpha-Stable Self-Similarity,” arxiv.org/abs/1709.05733v1, Sep. 2017.