A cover tree is a tree specifically designed to speed up nearest-neighbor computation in high-dimensional spaces.
class DualTreeTraverser
A dual-tree cover tree traverser; see dual_tree_traverser.hpp. class SingleTreeTraverser
A single-tree cover tree traverser; see single_tree_traverser.hpp for implementation.
typedef arma::mat Mat
CoverTree (const arma::mat &dataset, const double base=2.0, MetricType *metric=NULL)
Create the cover tree with the given dataset and given base. CoverTree (const arma::mat &dataset, MetricType &metric, const double base=2.0)
Create the cover tree with the given dataset and the given instantiated metric. CoverTree (const arma::mat &dataset, const double base, const size_t pointIndex, const int scale, CoverTree *parent, const double parentDistance, arma::Col< size_t > &indices, arma::vec &distances, size_t nearSetSize, size_t &farSetSize, size_t &usedSetSize, MetricType &metric=NULL)
Construct a child cover tree node. CoverTree (const arma::mat &dataset, const double base, const size_t pointIndex, const int scale, CoverTree *parent, const double parentDistance, const double furthestDescendantDistance, MetricType *metric=NULL)
Manually construct a cover tree node; no tree assembly is done in this constructor, and children must be added manually (use Children()). CoverTree (const CoverTree &other)
Create a cover tree from another tree. ~CoverTree ()
Delete this cover tree node and its children. double Base () const
Get the base. double & Base ()
Modify the base; don't do this, you'll break everything. void Centroid (arma::vec ¢roid) const
Get the centroid of the node and store it in the given vector. const CoverTree & Child (const size_t index) const
Get a particular child node. CoverTree & Child (const size_t index)
Modify a particular child node. const std::vector< CoverTree * > & Children () const
Get the children. std::vector< CoverTree * > & Children ()
Modify the children manually (maybe not a great idea). const arma::mat & Dataset () const
Get a reference to the dataset. size_t Descendant (const size_t index) const
Get the index of a particular descendant point. size_t DistanceComps () const
size_t & DistanceComps ()
double FurthestDescendantDistance () const
Get the distance from the center of the node to the furthest descendant. double & FurthestDescendantDistance ()
Modify the distance from the center of the node to the furthest descendant. double FurthestPointDistance () const
Get the distance to the furthest point. This is always 0 for cover trees. bool IsLeaf () const
double MaxDistance (const CoverTree *other) const
Return the maximum distance to another node. double MaxDistance (const CoverTree *other, const double distance) const
Return the maximum distance to another node given that the point-to-point distance has already been calculated. double MaxDistance (const arma::vec &other) const
Return the maximum distance to another point. double MaxDistance (const arma::vec &other, const double distance) const
Return the maximum distance to another point given that the distance from the center to the point has already been calculated. MetricType & Metric () const
Get the instantiated metric. double MinDistance (const CoverTree *other) const
Return the minimum distance to another node. double MinDistance (const CoverTree *other, const double distance) const
Return the minimum distance to another node given that the point-to-point distance has already been calculated. double MinDistance (const arma::vec &other) const
Return the minimum distance to another point. double MinDistance (const arma::vec &other, const double distance) const
Return the minimum distance to another point given that the distance from the center to the point has already been calculated. double MinimumBoundDistance () const
Get the minimum distance from the center to any bound edge (this is the same as furthestDescendantDistance). size_t NumChildren () const
Get the number of children. size_t NumDescendants () const
Get the number of descendant points. size_t NumPoints () const
CoverTree * Parent () const
Get the parent node. CoverTree *& Parent ()
Modify the parent node. double ParentDistance () const
Get the distance to the parent. double & ParentDistance ()
Modify the distance to the parent. size_t Point () const
Get the index of the point which this node represents. size_t Point (const size_t) const
For compatibility with other trees; the argument is ignored. math::Range RangeDistance (const CoverTree *other) const
Return the minimum and maximum distance to another node. math::Range RangeDistance (const CoverTree *other, const double distance) const
Return the minimum and maximum distance to another node given that the point-to-point distance has already been calculated. math::Range RangeDistance (const arma::vec &other) const
Return the minimum and maximum distance to another point. math::Range RangeDistance (const arma::vec &other, const double distance) const
Return the minimum and maximum distance to another point given that the point-to-point distance has already been calculated. int Scale () const
Get the scale of this node. int & Scale ()
Modify the scale of this node. Be careful... const StatisticType & Stat () const
Get the statistic for this node. StatisticType & Stat ()
Modify the statistic for this node. std::string ToString () const
Returns a string representation of this object.
static bool HasSelfChildren ()
Returns true: this tree does have self-children.
void ComputeDistances (const size_t pointIndex, const arma::Col< size_t > &indices, arma::vec &distances, const size_t pointSetSize)
Fill the vector of distances with the distances between the point specified by pointIndex and each point in the indices array. void CreateChildren (arma::Col< size_t > &indices, arma::vec &distances, size_t nearSetSize, size_t &farSetSize, size_t &usedSetSize)
Create the children for this node. void MoveToUsedSet (arma::Col< size_t > &indices, arma::vec &distances, size_t &nearSetSize, size_t &farSetSize, size_t &usedSetSize, arma::Col< size_t > &childIndices, const size_t childFarSetSize, const size_t childUsedSetSize)
size_t PruneFarSet (arma::Col< size_t > &indices, arma::vec &distances, const double bound, const size_t nearSetSize, const size_t pointSetSize)
void RemoveNewImplicitNodes ()
Take a look at the last child (the most recently created one) and remove any implicit nodes that have been created. size_t SortPointSet (arma::Col< size_t > &indices, arma::vec &distances, const size_t childFarSetSize, const size_t childUsedSetSize, const size_t farSetSize)
Assuming that the list of indices and distances is sorted as [ childFarSet | childUsedSet | farSet | usedSet ], resort the sets so the organization is [ childFarSet | farSet | childUsedSet | usedSet ]. size_t SplitNearFar (arma::Col< size_t > &indices, arma::vec &distances, const double bound, const size_t pointSetSize)
Split the given indices and distances into a near and a far set, returning the number of points in the near set.
double base
The base used to construct the tree. std::vector< CoverTree * > children
The list of children; the first is the self-child. const arma::mat & dataset
Reference to the matrix which this tree is built on. size_t distanceComps
double furthestDescendantDistance
Distance to the furthest descendant. bool localMetric
Whether or not we need to destroy the metric in the destructor. MetricType * metric
The metric used for this tree. size_t numDescendants
The number of descendant points. CoverTree * parent
The parent node (NULL if this is the root of the tree). double parentDistance
Distance to the parent. size_t point
Index of the point in the matrix which this node represents. int scale
Scale level of the node. StatisticType stat
The instantiated statistic.
A cover tree is a tree specifically designed to speed up nearest-neighbor computation in high-dimensional spaces.
Each non-leaf node references a point and has a nonzero number of children, including a 'self-child' which references the same point. A leaf node represents only one point.
The tree can be thought of as a hierarchy with the root node at the top level and the leaf nodes at the bottom level. Each level in the tree has an assigned 'scale' i. The tree follows these three conditions:
nesting: the level C_i is a subset of the level C_{i - 1}.
covering: all node in level C_{i - 1} have at least one node in the level C_i with distance less than or equal to b^i (exactly one of these is a parent of the point in level C_{i - 1}.
separation: all nodes in level C_i have distance greater than b^i to all other nodes in level C_i.
The value 'b' refers to the base, which is a parameter of the tree. These three properties make the cover tree very good for fast, high-dimensional nearest-neighbor search.
The theoretical structure of the tree contains many 'implicit' nodes which only have a 'self-child' (a child referencing the same point, but at a lower scale level). This practical implementation only constructs explicit nodes -- non-leaf nodes with more than one child. A leaf node has no children, and its scale level is INT_MIN.
For more information on cover trees, see
@inproceedings{ author = {Beygelzimer, Alina and Kakade, Sham and Langford, John}, title = {Cover trees for nearest neighbor}, booktitle = {Proceedings of the 23rd International Conference on Machine Learning}, series = {ICML '06}, year = {2006}, pages = {97--104] }
For information on runtime bounds of the nearest-neighbor computation using cover trees, see the following paper, presented at NIPS 2009:
@inproceedings{ author = {Ram, P., and Lee, D., and March, W.B., and Gray, A.G.}, title = {Linear-time Algorithms for Pairwise Statistical Problems}, booktitle = {Advances in Neural Information Processing Systems 22}, editor = {Y. Bengio and D. Schuurmans and J. Lafferty and C.K.I. Williams and A. Culotta}, pages = {1527--1535}, year = {2009} }
The CoverTree class offers three template parameters; a custom metric type can be used with MetricType (this class defaults to the L2-squared metric). The root node's point can be chosen with the RootPointPolicy; by default, the FirstPointIsRoot policy is used, meaning the first point in the dataset is used. The StatisticType policy allows you to define statistics which can be gathered during the creation of the tree.
Template Parameters:
MetricType Metric type to use during tree construction.
RootPointPolicy Determines which point to use as the root node.
StatisticType Statistic to be used during tree creation.
Definition at line 103 of file cover_tree.hpp.
Definition at line 106 of file cover_tree.hpp.
Create the cover tree with the given dataset and given base. The dataset will not be modified during the building procedure (unlike BinarySpaceTree).
The last argument will be removed in mlpack 1.1.0 (see #274 and #273).
Parameters:
dataset Reference to the dataset to build a tree on.
base Base to use during tree building (default 2.0).
Create the cover tree with the given dataset and the given instantiated metric. Optionally, set the base. The dataset will not be modified during the building procedure (unlike BinarySpaceTree).
Parameters:
dataset Reference to the dataset to build a tree on.
metric Instantiated metric to use during tree building.
base Base to use during tree building (default 2.0).
Construct a child cover tree node. This constructor is not meant to be used externally, but it could be used to insert another node into a tree. This procedure uses only one vector for the near set, the far set, and the used set (this is to prevent unnecessary memory allocation in recursive calls to this constructor). Therefore, the size of the near set, far set, and used set must be passed in. The near set will be entirely used up, and some of the far set may be used. The value of usedSetSize will be set to the number of points used in the construction of this node, and the value of farSetSize will be modified to reflect the number of points in the far set after the construction of this node.
If you are calling this manually, be careful that the given scale is as small as possible, or you may be creating an implicit node in your tree.
Parameters:
dataset Reference to the dataset to build a tree on.
base Base to use during tree building.
pointIndex Index of the point this node references.
scale Scale of this level in the tree.
parent Parent of this node (NULL indicates no parent).
parentDistance Distance to the parent node.
indices Array of indices, ordered [ nearSet | farSet | usedSet ]; will be modified to [ farSet | usedSet ].
distances Array of distances, ordered the same way as the indices. These represent the distances between the point specified by pointIndex and each point in the indices array.
nearSetSize Size of the near set; if 0, this will be a leaf.
farSetSize Size of the far set; may be modified (if this node uses any points in the far set).
usedSetSize The number of points used will be added to this number.
Manually construct a cover tree node; no tree assembly is done in this constructor, and children must be added manually (use Children()). This constructor is useful when the tree is being 'imported' into the CoverTree class after being created in some other manner.
Parameters:
dataset Reference to the dataset this node is a part of.
base Base that was used for tree building.
pointIndex Index of the point in the dataset which this node refers to.
scale Scale of this node's level in the tree.
parent Parent node (NULL indicates no parent).
parentDistance Distance to parent node point.
furthestDescendantDistance Distance to furthest descendant point.
metric Instantiated metric (optional).
Create a cover tree from another tree. Be careful! This may use a lot of memory and take a lot of time.
Parameters:
other Cover tree to copy from.
Delete this cover tree node and its children.
Get the base.
Definition at line 262 of file cover_tree.hpp.
References mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::base.
Modify the base; don't do this, you'll break everything.
Definition at line 264 of file cover_tree.hpp.
References mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::base.
Get the centroid of the node and store it in the given vector.
Definition at line 343 of file cover_tree.hpp.
References mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::dataset, and mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::point.
Get a particular child node.
Definition at line 238 of file cover_tree.hpp.
References mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::children.
Modify a particular child node.
Definition at line 240 of file cover_tree.hpp.
References mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::children.
Get the children.
Definition at line 246 of file cover_tree.hpp.
References mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::children.
Modify the children manually (maybe not a great idea).
Definition at line 248 of file cover_tree.hpp.
References mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::children.
Fill the vector of distances with the distances between the point specified by pointIndex and each point in the indices array. The distances of the first pointSetSize points in indices are calculated (so, this does not necessarily need to use all of the points in the arrays).
Parameters:
pointIndex Point to build the distances for.
indices List of indices to compute distances for.
distances Vector to store calculated distances in.
pointSetSize Number of points in arrays to calculate distances for.
Create the children for this node.
Get a reference to the dataset.
Definition at line 227 of file cover_tree.hpp.
References mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::dataset.
Get the index of a particular descendant point.
Definition at line 479 of file cover_tree.hpp.
References mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::distanceComps.
Definition at line 480 of file cover_tree.hpp.
References mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::distanceComps.
Get the distance from the center of the node to the furthest descendant.
Definition at line 332 of file cover_tree.hpp.
References mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::furthestDescendantDistance.
Modify the distance from the center of the node to the furthest descendant.
Definition at line 336 of file cover_tree.hpp.
References mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::furthestDescendantDistance.
Get the distance to the furthest point. This is always 0 for cover trees.
Definition at line 329 of file cover_tree.hpp.
Returns true: this tree does have self-children.
Definition at line 316 of file cover_tree.hpp.
Definition at line 234 of file cover_tree.hpp.
References mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::children.
Return the maximum distance to another node.
Return the maximum distance to another node given that the point-to-point distance has already been calculated.
Return the maximum distance to another point.
Return the maximum distance to another point given that the distance from the center to the point has already been calculated.
Get the instantiated metric.
Definition at line 346 of file cover_tree.hpp.
References mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::metric.
Return the minimum distance to another node.
Return the minimum distance to another node given that the point-to-point distance has already been calculated.
Return the minimum distance to another point.
Return the minimum distance to another point given that the distance from the center to the point has already been calculated.
Get the minimum distance from the center to any bound edge (this is the same as furthestDescendantDistance).
Definition at line 340 of file cover_tree.hpp.
References mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::furthestDescendantDistance.
Get the number of children.
Definition at line 243 of file cover_tree.hpp.
References mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::children.
Get the number of descendant points.
Definition at line 235 of file cover_tree.hpp.
Get the parent node.
Definition at line 319 of file cover_tree.hpp.
References mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::parent.
Modify the parent node.
Definition at line 321 of file cover_tree.hpp.
References mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::parent.
Get the distance to the parent.
Definition at line 324 of file cover_tree.hpp.
References mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::parentDistance.
Modify the distance to the parent.
Definition at line 326 of file cover_tree.hpp.
References mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::parentDistance.
Get the index of the point which this node represents.
Definition at line 230 of file cover_tree.hpp.
References mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::point.
For compatibility with other trees; the argument is ignored.
Definition at line 232 of file cover_tree.hpp.
References mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::point.
Return the minimum and maximum distance to another node.
Return the minimum and maximum distance to another node given that the point-to-point distance has already been calculated.
Return the minimum and maximum distance to another point.
Return the minimum and maximum distance to another point given that the point-to-point distance has already been calculated.
Take a look at the last child (the most recently created one) and remove any implicit nodes that have been created.
Get the scale of this node.
Definition at line 257 of file cover_tree.hpp.
References mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::scale.
Modify the scale of this node. Be careful...
Definition at line 259 of file cover_tree.hpp.
References mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::scale.
Assuming that the list of indices and distances is sorted as [ childFarSet | childUsedSet | farSet | usedSet ], resort the sets so the organization is [ childFarSet | farSet | childUsedSet | usedSet ]. The size_t parameters specify the sizes of each set in the array. Only the ordering of the indices and distances arrays will be modified (not their actual contents).
The size of any of the four sets can be zero and this method will handle that case accordingly.
Parameters:
indices List of indices to sort.
distances List of distances to sort.
childFarSetSize Number of points in child far set (childFarSet).
childUsedSetSize Number of points in child used set (childUsedSet).
farSetSize Number of points in far set (farSet).
Split the given indices and distances into a near and a far set, returning the number of points in the near set. The distances must already be initialized. This will order the indices and distances such that the points in the near set make up the first part of the array and the far set makes up the rest: [ nearSet | farSet ].
Parameters:
indices List of indices; will be reordered.
distances List of distances; will be reordered.
bound If the distance is less than or equal to this bound, the point is placed into the near set.
pointSetSize Size of point set (because we may be sorting a smaller list than the indices vector will hold).
Get the statistic for this node.
Definition at line 267 of file cover_tree.hpp.
References mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::stat.
Modify the statistic for this node.
Definition at line 269 of file cover_tree.hpp.
References mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::stat.
Returns a string representation of this object.
The base used to construct the tree.
Definition at line 362 of file cover_tree.hpp.
Referenced by mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::Base().
The list of children; the first is the self-child.
Definition at line 356 of file cover_tree.hpp.
Referenced by mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::Child(), mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::Children(), mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::IsLeaf(), and mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::NumChildren().
Reference to the matrix which this tree is built on.
Definition at line 350 of file cover_tree.hpp.
Referenced by mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::Centroid(), and mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::Dataset().
Definition at line 483 of file cover_tree.hpp.
Referenced by mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::DistanceComps().
Distance to the furthest descendant.
Definition at line 377 of file cover_tree.hpp.
Referenced by mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::FurthestDescendantDistance(), and mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::MinimumBoundDistance().
Whether or not we need to destroy the metric in the destructor.
Definition at line 380 of file cover_tree.hpp.
The metric used for this tree.
Definition at line 383 of file cover_tree.hpp.
Referenced by mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::Metric().
The number of descendant points.
Definition at line 368 of file cover_tree.hpp.
The parent node (NULL if this is the root of the tree).
Definition at line 371 of file cover_tree.hpp.
Referenced by mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::Parent().
Distance to the parent.
Definition at line 374 of file cover_tree.hpp.
Referenced by mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::ParentDistance().
Index of the point in the matrix which this node represents.
Definition at line 353 of file cover_tree.hpp.
Referenced by mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::Centroid(), and mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::Point().
Scale level of the node.
Definition at line 359 of file cover_tree.hpp.
Referenced by mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::Scale().
The instantiated statistic.
Definition at line 365 of file cover_tree.hpp.
Referenced by mlpack::tree::CoverTree< MetricType, RootPointPolicy, StatisticType >::Stat().
Generated automatically by Doxygen for MLPACK from the source code.