# China Dataset (Base Stations Big Data)

• ### Background

• ##### Data description

To reach credible results, we collected a large volume of real base station (BS) records from China Mobile in a well-developed eastern province of China. The dataset covers over 47,000 BSs of GSM cellular networks serving over 40 million subscribers, and includes all BS-related records such as location information (longitude, latitude, etc.) and BS type (macrocell or microcell). Based on coverage area and location information, we divide the dataset into disjoint subsets. By matching the geographical landforms with local maps, we then classify these subsets into urban and rural areas, as depicted in Fig. 1.

Fig. 1. An illustration of the deployment of base stations in three typical cities with geographical landforms, namely City A, City B, and City C.

• ##### Mathematical model

Heavy-tailed distributions are widely used to explain a number of natural phenomena, including the Internet topology. Mathematically, heavy-tailed distributions are probability distributions whose tails are not exponentially bounded; in other words, their tails are heavier than that of the exponential distribution.
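Stated formally (this is the standard textbook definition, not a formula taken from the dataset page), a random variable $$X$$ has a heavy right tail if its tail probability decays more slowly than any exponential:

$$
\limsup_{x \to \infty} e^{\lambda x} \, \Pr[X > x] = \infty \quad \text{for all } \lambda > 0 .
$$

For example, a Pareto-type tail $$\Pr[X > x] \propto x^{-a}$$ satisfies this condition for every $$a > 0$$, whereas an exponential tail $$e^{-\mu x}$$ does not, because $$e^{\lambda x} e^{-\mu x} \to 0$$ whenever $$\lambda < \mu$$.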

Many statistical distributions are known to be heavy-tailed. Among them, the generalized Pareto (GP), Weibull, and log-normal distributions are one-tailed and have closed-form probability density functions (PDFs) (see Table II). Another well-known heavy-tailed distribution is the $$\alpha$$-Stable distribution, which characterizes the distribution of normalized sums of a relatively large number of independent, identically distributed random variables. However, with few exceptions, the $$\alpha$$-Stable distribution lacks a closed-form expression of the PDF and is generally specified by its characteristic function, as presented in the model description page.
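For orientation, one commonly used parameterization of that characteristic function is sketched below. The symbol names ($$\sigma$$ for scale, $$\beta$$ for skewness, $$\delta$$ for location) are assumptions made here, and the model description page may use a different but equivalent convention:

$$
\Phi(\omega) = \exp\left\{ j\delta\omega - \sigma^{\alpha}|\omega|^{\alpha}\left[ 1 - j\beta\,\mathrm{sgn}(\omega)\,\Theta(\omega,\alpha) \right] \right\},
\qquad
\Theta(\omega,\alpha) =
\begin{cases}
\tan\dfrac{\pi\alpha}{2}, & \alpha \neq 1, \\[2mm]
-\dfrac{2}{\pi}\ln|\omega|, & \alpha = 1,
\end{cases}
$$

with $$0 < \alpha \le 2$$, $$-1 \le \beta \le 1$$, $$\sigma > 0$$, and $$\delta \in \mathbb{R}$$. Two of the few closed-form exceptions are the Gaussian distribution ($$\alpha = 2$$) and the Cauchy distribution ($$\alpha = 1$$, $$\beta = 0$$).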

TABLE II: The List of Candidate Distributions and Estimated Parameters.

• ### Statistical Pattern of Base Stations with Large-scale Identification

Using the large amount of BS location data, we pick one city and repeatedly draw sample areas of a fixed size at random positions within it. We then compute the spatial density in 10,000 such sample areas and obtain the empirical density distribution by counting and sorting the number of BSs in each sample area. Next, we estimate the unknown parameters of the candidate distributions (except the $$\alpha$$-Stable distribution) by maximum likelihood estimation (MLE). For the $$\alpha$$-Stable distribution, we estimate the relevant parameters using quantile methods, build the corresponding model to generate random variables, and finally compare the induced PDF with the empirical one.
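The sampling and fitting steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the clustered synthetic layout (`synthetic_bs_locations`) is a hypothetical stand-in for the real BS coordinates, only 1,000 windows are drawn to keep the run short, and the log-normal is used as the fitted candidate because its MLE has a closed form (the other candidates generally need numerical optimization).

```python
import math
import random

def sample_densities(points, region, side_km, n_samples, rng):
    """Count BSs in randomly placed square windows and return BSs per km^2."""
    x_min, y_min, x_max, y_max = region
    area = side_km * side_km
    densities = []
    for _ in range(n_samples):
        # Lower-left corner chosen so the window stays inside the region.
        x0 = rng.uniform(x_min, x_max - side_km)
        y0 = rng.uniform(y_min, y_max - side_km)
        count = sum(1 for (x, y) in points
                    if x0 <= x < x0 + side_km and y0 <= y < y0 + side_km)
        densities.append(count / area)
    return densities

def lognormal_mle(samples):
    """Closed-form MLE (mu, sigma) of a log-normal; zero densities are skipped."""
    logs = [math.log(s) for s in samples if s > 0]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return mu, sigma

def synthetic_bs_locations(rng, n_clusters=30, per_cluster=40, size_km=40.0):
    """Hypothetical clustered layout standing in for the real BS coordinates."""
    points = []
    for _ in range(n_clusters):
        cx, cy = rng.uniform(0, size_km), rng.uniform(0, size_km)
        for _ in range(per_cluster):
            x, y = rng.gauss(cx, 1.5), rng.gauss(cy, 1.5)
            if 0 <= x <= size_km and 0 <= y <= size_km:
                points.append((x, y))
    return points

rng = random.Random(0)  # fixed seed so the sketch is reproducible
bs_points = synthetic_bs_locations(rng)
densities = sample_densities(bs_points, (0.0, 0.0, 40.0, 40.0), 4.0, 1000, rng)
empirical = sorted(densities)            # sorted densities give the empirical distribution
mu_hat, sigma_hat = lognormal_mle(densities)
```

With the real dataset, `bs_points` would hold projected BS coordinates in kilometres, and `n_samples` would be raised to 10,000 to match the procedure described above.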

We first take City B as an example and compute the PDF of BS density for a sample area size of 4×4 km². After fitting the distributions in Table II to this PDF, we compare the empirical BS density distribution with the candidate ones in Fig. 2 and Fig. 3. As depicted in Fig. 3, the statistical pattern of BSs clearly exhibits heavy-tailed characteristics. Moreover, among all candidate distributions, the $$\alpha$$-Stable distribution matches the empirical PDF most precisely. Table III provides a numerical comparison in terms of root mean square error (RMSE). The $$\alpha$$-Stable distribution has the minimum RMSE value (0.0279), while the Poisson distribution has the maximum one (0.2537), which further supports this conclusion. The estimated parameters of all fitted candidate distributions are listed in Table II.

Fig. 2. The log-log comparison between the practical BS density distribution in City B and the candidate ones, when the sample area size equals 4×4 km².

Fig. 3. The results after fitting the BS density distribution in City B to candidate distributions, when sample area sizes vary. (a) 3×3 km²; (b) 4×4 km²; (c) 5×5 km².
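To make the RMSE criterion concrete, the sketch below compares an empirical count distribution with a Poisson fit. It is an illustration only: the `counts` list is hypothetical, and the binning used here (one bin per integer BS count) is an assumption, since the page does not state how the PDFs were discretized.

```python
import math
from collections import Counter

def empirical_pmf(counts):
    """Relative frequency of each observed BS count, over 0..max(counts)."""
    freq = Counter(counts)
    n = len(counts)
    return [freq.get(k, 0) / n for k in range(max(counts) + 1)]

def poisson_pmf(k, lam):
    """Poisson probability mass, evaluated in log space for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def rmse(observed, predicted):
    """Root mean square error between two equally long sequences."""
    if len(observed) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))

# Hypothetical per-window BS counts; the real values come from the dataset.
counts = [0, 0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 7, 9, 14, 22]
pmf_emp = empirical_pmf(counts)
lam_hat = sum(counts) / len(counts)      # the Poisson MLE is the sample mean
pmf_poi = [poisson_pmf(k, lam_hat) for k in range(len(pmf_emp))]
error = rmse(pmf_emp, pmf_poi)
```

The same `rmse` helper applies to any candidate: evaluate the fitted distribution on the same bins as the empirical one and pass both sequences in.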

To examine the geographical impact on the fitting results, we further analyze the density distribution of BSs in City A and City C using a sample area size of 4×4 km². Because of geographical irregularity, the gap between the $$\alpha$$-Stable distribution and the empirical PDF is more noticeable for City A and City C than for City B. Nevertheless, as shown in Table III and Fig. 4, the $$\alpha$$-Stable distribution still matches the empirical one in both cities, with RMSE values of 0.0177 and 0.0451, respectively, both lower than those of the other candidate distributions. The same conclusion holds for sample area sizes of 3×3 km² and 5×5 km², as Table III confirms. Based on these analyses, we reach the following remark.

The spatial pattern of deployed BSs exhibits strong heavy-tailed characteristics. Based on the large-scale identification, the $$\alpha$$-Stable distribution is the most precise of the candidates. In contrast, the popular Poisson distribution is an inappropriate model for the BS density distribution in terms of root mean square error.

Fig. 4. The comparison between the BS density distribution and the $$\alpha$$-Stable distribution in City A and City C, when the sample area size equals 4×4 km².

• ### Conclusions and Future Works

Based on the practical BS deployment information of an operational cellular network, we carried out a thorough investigation of the statistical pattern of BS density. Our study showed that the distribution of BS density exhibits strong heavy-tailed characteristics. Furthermore, we found that the widely adopted Poisson distribution diverges severely from the realistic distribution. Instead, the $$\alpha$$-Stable distribution, which has also been found in the traffic dynamics of broadband and cellular networks, matches the practical one most precisely. Moreover, our study could contribute to the understanding of the evolution trend of BS deployment, as well as the long-term impact of human social activities.

Currently, the lack of a closed-form expression for the $$\alpha$$-Stable distribution makes it difficult to reach tractable solutions and might hinder its application in analyses of network performance (e.g., coverage and rate). We therefore plan to address these meaningful yet more challenging issues concerning applications of the $$\alpha$$-Stable distribution in future work.

• Yifan Zhou, Rongpeng Li, Zhifeng Zhao, Xuan Zhou, and Honggang Zhang, “On the $$\alpha$$-Stable Distribution of Base Stations in Cellular Networks,” IEEE Communications Letters, vol. 19, no. 10, pp. 1750-1753, Aug. 2015.

• Yifan Zhou, Zhifeng Zhao, Yves Louet, Qianlan Ying, Rongpeng Li, Xuan Zhou, Xianfu Chen, and Honggang Zhang, “Large-scale Spatial Distribution Identification of Base Stations in Cellular Networks,” IEEE Access, vol. 3, pp. 2987-2999, Dec. 2015.

• Zhifeng Zhao, Meng Li, Rongpeng Li, and Yifan Zhou, “Temporal-Spatial Distribution Nature of Traffic and Base Stations in Cellular Networks,” IET Communications, Aug. 2017.

• Rongpeng Li, Zhifeng Zhao, Yi Zhong, Chen Qi, and Honggang Zhang, “The Stochastic Geometry Analyses of Cellular Networks with alpha-Stable Self-Similarity,” arxiv.org/abs/1709.05733v1, Sep. 2017.