NN-Descent on High-Dimensional DataBrankica BratićUniversity of Novi SadFaculty of SciencesNovi Sad, [email protected] E. HouleNational Institute of InformaticsTokyo, [email protected] OriaVladimir KurbalijaUniversity of Novi SadFaculty of SciencesNovi Sad, [email protected]š RadovanovićNew Jersey Inst. of TechnologyDept. of Computer ScienceNewark, New Jersey, [email protected] of Novi SadFaculty of SciencesNovi Sad, [email protected] neighbor graphs (K-NNGs) are used in many data-miningand machine-learning algorithms. Naive construction of K-NNGshas a complexity of O(n 2 ), which could be a problem for largescale data sets. In order to achieve higher efficiency, many exactand approximate algorithms have been developed, including theNN-Descent algorithm of Dong, Charikar and Li. Empirical evidence suggests that the practical complexity of this algorithm isin Õ(n 1.14 ), which is a significant improvement over brute forceconstruction. However, NN-Descent has a major drawback — itproduces good results only on data of low intrinsic dimensionality.This paper presents an experimental analysis of this behavior, andinvestigates possible solutions. We link the quality of performanceof NN-Descent with the phenomenon of hubness, defined as thetendency of intrinsically high-dimensional data to contain hubs —points with high in-degrees in the K-NNG. We propose two approaches to alleviate the observed negative influence of hubs onNN-Descent performance.A K-nearest neighbor graph (K-NNG) is a directed graph whosevertices represent elements upon which some distance function isdefined. Vertex v is unidirectionally connected to vertex u only ifvertex u is one of the K elements that are nearest to point v withrespect to the defined distance function. K-NNGs are used in manymachine-learning and data-mining algorithms , including classification , similarity search , clustering and outlier detectionproblems [2, 3, 12], manifold learning [1, 20, 22]. They are also usedin other areas, such as robot motion planning  and computergraphics .Naive brute-force computation of a K-NNG entails n · (n 1)distance computations, which leads to quadratic time complexity.Numerous methods for K-NNG construction have been developedin order to decrease the computational cost, but many of themintroduce certain restrictions and therefore do not apply to the general case. For example, many efficient methods are developed forEuclidean or other Lp distance metrics [4, 6], whereas others canbe applied to more general metric spaces . In order to bypassall those restrictions while still preserving efficiency, Dong, Charikar and Li created the NN-Descent  algorithm, that efficientlyproduces highly accurate K-NNG approximations independentlyof the underlying distance function. As reported by the authors,empirical complexity of NN-Descent is in Õ(n 1.14 ). Although theresults produced by this algorithm are mostly excellent, there isstill one major drawback — NN-Descent often produces inaccurateresults when it is used on data of high intrinsic dimensionality.Park et al.  developed a modification of NN-Descent basedon a greedy filtering method, with which they obtained improvedresults for high-dimensional data together with the cosine similarity measure. Houle et al.  introduced NNF-Descent (NearestNeighbor Feature Descent), an NN-Descent-based feature sparsification method optimized for image databases. It uses the Laplacianscore in order to identify noisy values, which are then replacedwith zeros. Debatty et al.  presented a method for constructing aK-NNG for large text data sets. This method applies NN-Descentonly to smaller buckets of data, leading to a lower execution time.Although several improvements have been proposed for NNDescent, so far none of them have given an explanation for thevariability of its performance, nor have they given a universal solution to the problem. In this paper we will provide an explanationCCS CONCEPTS Information systems Information retrieval; Theory ofcomputation Design and analysis of algorithms;KEYWORDSNN-Descent, k-nearest neighbor graph, hubnessACM Reference Format:Brankica Bratić, Michael E. Houle, Vladimir Kurbalija, Vincent Oria, and Miloš Radovanović. 2018. NN-Descent on High-Dimensional Data. In WIMS’18: 8th International Conference on Web Intelligence, Mining and Semantics, June 25–27, 2018, Novi Sad, Serbia. ACM, New York, NY, USA, 8 ssion to make digital or hard copies of all or part of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor profit or commercial advantage and that copies bear this notice and the full citationon the first page. Copyrights for components of this work owned by others than ACMmust be honored. Abstracting with credit is permitted. To copy otherwise, or republish,to post on servers or to redistribute to lists, requires prior specific permission and/or afee. Request permissions from [email protected] ’18, June 25–27, 2018, Novi Sad, Serbia 2018 Association for Computing Machinery.ACM ISBN 978-1-4503-5489-9/18/06. . . UCTION
WIMS ’18, June 25–27, 2018, Novi Sad, Serbiafor this variation in terms of hubness, a measure of the variation innearest neighbor relationships . We also propose two approaches that improve the quality of NN-Descent approximations forhigh-dimensional datasets.Section 2 will describe the necessary background by providingan overview of the NN-Descent algorithm and the phenomenon ofhubness. In Section 3 we show how a minor modification of the NNDescent algorithm can produce estimates of hubness. Experimentalresults presented in Section 4 demonstrate the empirical effect ofhubness on NN-Descent. Section 5 proposes methods that to someextent overcome the problems of NN-Descent on high-dimensionaldata. Finally, Section 6 gives an overview of the conclusions andresults, and indicates directions for future work.2BACKGROUNDIn this section we will review the NN-descent algorithm (Section 2.1)and the hubness phenomenon (Section 2.2), which, as we will demonstrate later, has a significant impact on NN-Descent.2.1B. Bratić et al.Algorithm 1: Outline of NN-Descent algorithm.input : dataset D, distance function dist, neighborhood size Koutput : K-NNG G1 foreach data point u D do2Initialize G by randomly generating a tentative K-NN listfor u with an assigned distance of ;3 end4 repeat5foreach data point u D do6Check different pairs of u’s neighbors (v, w) in u’sK-NN and R-NN (reverse nearest neighbor) lists, andcompute dist(v, w);7Use ⟨v, dist(v, w)⟩ to update w’s K-NN list, and use⟨w, dist(v, w)⟩ to update v’s K-NN list;8end9 until G converges;10 return G.NN-DescentThe main purpose of the NN-Descent algorithm is to create a goodapproximation of the true K-NNG, and to create it as fast as possible. The basic assumption made by NN-Descent can be summarizedas “my neighbor’s neighbor is also likely to be my neighbor.” Thealgorithm starts by creating a random K-NNG, and then iterativelyimproves the neighbor lists by using the current neighborhood relationships to suggest new neighbor candidates. During the executionof NN-Descent, for a data point u, we will refer to the current listof its K nearest neighbors as the NN list of u. A reverse neighborof u is a data point which has u in its own NN list; we will refer tothe current list of reverse neighbors of u as its RNN list.NN-Descent is organized into stages, during each of which asingle improvement iteration is applied to each of the data points.The algorithm iteration for u examines its NN and RNN lists tosee whether any points in these lists should be added to the NNlists of any of the others. If this is the case, the algorithm makesthe necessary changes, and increases the number of total updates.After every stage has completed, the algorithm tests a terminationcondition based on the number of updates that were applied during the stage. If the number of updates is smaller than a suppliedthreshold, the algorithm terminates; otherwise, it continues withthe next stage. An outline of NN-Descent is shown as Algorithm 1.The speed of NN-Descent is strongly influenced by the K value.As K becomes larger, the algorithm becomes slower. To be moreprecise, the time complexity has a quadratic dependence on K, sinceduring each iteration, points appearing in NN or RNN lists of thesame point are evaluated as candidates for each others NN lists. As away to reduce the number of combinations of points considered, theauthors introduced sampling . Sampling reduces the numberof evaluations by taking a random selection of NN and RNN lists,and then operates only on the points from those lists. Samplingis controlled by a parameter ρ that takes a value from 0 to 1. Thealgorithm takes ρ · K points from both NN and RNN lists.The second drawback of NN-Descent is that quality of approximation it produces is highly influenced by the intrinsic dimensionality of the data set, in that the quality of the approximationdecreases as the intrinsic dimensionality increases. The reason forthis has not been adequately explained. Although low-dimensionalembeddings have often been used as a general method of reducingthe complexity of data indexing, no solutions have yet been proposed that allow the NN-Descent strategy to work more effectivelywith high-dimensional data. In this paper we will explain the influence of high dimensionality on NN-Descent, and will also proposetwo approaches that are designed to overcome this challenge tosome extent.2.2HubnessHubness is an aspect of the curse of dimensionality pertaining tonearest neighbors which has come to the attention of the researchcommunity only relatively recently . Let D Rd be a set ofdata points and let the k-occurrences of point x D be the numberof times x occurs in k-nearest-neighbor lists of other points from D.Nk (x) is then the number of k-occurrences of point x D — thatis, the in-degree of node x in the K-NNG. As the dimensionality ofthe data increases, the distribution of Nk (x) becomes considerablyskewed. As a consequence, some data points, which we will refer toas hubs, are included in many more k-nearest-neighbor lists thanother points.In the remainder of the discussion we will refer to the numberof k-occurrences of point x D as its hubness value. If a data setcontains many points that are hubs — that is, if the distribution ofhubness values is highly skewed — it can be said that the hubnessphenomenon is present therein. It has been shown that hubness,as a phenomenon, appears in high-dimensional data as an inherent property of intrinsically high dimensionality, and is not anartifact of finite samples nor a peculiarity of specific data sets .It was shown that hubness influences various data-mining andmachine-learning algorithms [16–19, 23], often in a negative way.Practical computation of (exact) hubness values entails the creationof the K-NNG, from which hubness values can be easily obtainedby extracting the in-degrees of nodes.
NN-Descent on High-Dimensional Data3HUBNESS ESTIMATION USINGNN-DESCENTApproximate hubness values for each data point could be veryeasily calculated during NN-Descent, with minimal impact on algorithm performance. At the very beginning, we initialize the hubnessvalues of each data set point to zero. Then, during algorithm execution, we increase the hubness value of a given point by one if thatpoint is added to the NN list of some other point, and analogously,we decrease the hubness value by one if the point is removed fromsome NN list.Some of the results that we obtained after this small upgrade ofNN-Descent algorithm are illustrated in Figure 1, which shows thecorrelation between estimated and exact hubness values of datapoints. For the purposes of the experimentation in this paper, wecreated several synthetic data sets drawn from a Gaussian distribution with standard deviation 1 and mean (0, 0, . . . , 0). The data setseach have n 10000 points, but differ in their dimensionality d.During the analysis we also varied the size of the nearest neighborlists, represented by the parameter K. We considered the values 10,20 and 50 for K, and the values 2, 10, 20 and 100 for parameter d.As a distance measure we used the Euclidean distance. In Figure 1we show results only for the data set of highest dimensionality(d 100), since the hubness phenomenon is known to strengthenas the dimensionality of the data increases. As can be seen, thecorrelation between real and estimated hubness values is very high.A shortcoming of NN-Descent is that estimated hubness values arelower then they should be for points with low true hubness values,and greater then they should be for points with high true hubness.The precision improves as K increases, but even for small K values,the algorithm produces strong correlations.4WIMS ’18, June 25–27, 2018, Novi Sad, Serbia(a)(b)INFLUENCE OF HUBNESS ON NN-DESCENTNN-Descent is known to produce large amount of incorrect Knearest neighbors when applied upon high-dimensional data. Sincehubness is a phenomenon that appears in high-dimensional data,and at the same time was shown to influence many other datamining algorithms, in this section we investigate whether it alsoinfluences the performance of NN-Descent.For a given data set point, we will define the hit count to be thenumber of K-nearest neighbors produced by NN-Descent that areat the same time one of the point’s K-nearest neighbors in the exactK-NNG. In other words, the hit count tells us how many correctselections NN-Descent has made for the neighborhood list of agiven data point. The first part of our analysis will be to examinethe correlation between hit count values and hubness values of datapoints. In orde