SYNOPSIS

Public Member Functions

RASearch (const typename TreeType::Mat &referenceSet, const typename TreeType::Mat &querySet, const bool naive=false, const bool singleMode=false, const MetricType metric=MetricType())

Initialize the RASearch object, passing both a query and reference dataset. RASearch (const typename TreeType::Mat &referenceSet, const bool naive=false, const bool singleMode=false, const MetricType metric=MetricType())

Initialize the RASearch object, passing only one dataset, which is used as both the query and the reference dataset. RASearch (TreeType *referenceTree, TreeType *queryTree, const typename TreeType::Mat &referenceSet, const typename TreeType::Mat &querySet, const bool singleMode=false, const MetricType metric=MetricType())

Initialize the RASearch object with the given datasets and pre-constructed trees. RASearch (TreeType *referenceTree, const typename TreeType::Mat &referenceSet, const bool singleMode=false, const MetricType metric=MetricType())

Initialize the RASearch object with the given reference dataset and pre-constructed tree. ~RASearch ()

Delete the RASearch object. void ResetQueryTree ()

This function recursively resets the RAQueryStat of the queryTree to set 'bound' to WorstDistance and the 'numSamplesMade' to 0. void Search (const size_t k, arma::Mat< size_t > &resultingNeighbors, arma::mat &distances, const double tau=5, const double alpha=0.95, const bool sampleAtLeaves=false, const bool firstLeafExact=false, const size_t singleSampleLimit=20)

Compute the rank approximate nearest neighbors and store the output in the given matrices. std::string ToString () const

Private Member Functions

void ResetRAQueryStat (TreeType *treeNode)

Private Attributes

bool hasQuerySet

Indicates if a separate query set was passed. MetricType metric

Instantiation of kernel. bool naive

Indicates if naive random sampling on the set is being used. size_t numberOfPrunes

Total number of pruned nodes during the neighbor search. std::vector< size_t > oldFromNewQueries

Permutations of query points during tree building. std::vector< size_t > oldFromNewReferences

Permutations of reference points during tree building. arma::mat queryCopy

Copy of query dataset (if we need it, because tree building modifies it). const arma::mat & querySet

Query dataset (may not be given). TreeType * queryTree

Pointer to the root of the query tree (might not exist). arma::mat referenceCopy

Copy of reference dataset (if we need it, because tree building modifies it). const arma::mat & referenceSet

Reference dataset. TreeType * referenceTree

Pointer to the root of the reference tree. bool singleMode

Indicates if single-tree search is being used (opposed to dual-tree). bool treeOwner

If true, this object created the trees and is responsible for them.

Detailed Description

template<typename SortPolicy = NearestNeighborSort, typename MetricType = mlpack::metric::SquaredEuclideanDistance, typename TreeType = tree::BinarySpaceTree<bound::HRectBound<2, false>, RAQueryStat<SortPolicy> >>class RASearch< SortPolicy, MetricType, TreeType >

The RASearch class: This class provides a generic manner to perform rank-approximate search via random-sampling.

If the 'naive' option is chosen, this rank-approximate search will be done by randomly sampled from the whole set. If the 'naive' option is not chosen, the sampling is done in a stratified manner in the tree as mentioned in the algorithms in Figure 2 of the following paper:

{ram2009rank, title={{Rank-Approximate Nearest Neighbor Search: Retaining Meaning and Speed in High Dimensions}}, author={{Ram, P. and Lee, D. and Ouyang, H. and Gray, A. G.}}, booktitle={{Advances of Neural Information Processing Systems}}, year={2009} }

RASearch is currently known to not work with ball trees (#356).

Template Parameters:

SortPolicy The sort policy for distances; see NearestNeighborSort.

MetricType The metric to use for computation.

TreeType The tree type to use.

Definition at line 71 of file ra_search.hpp.

Constructor & Destructor Documentation

template<typename SortPolicy = NearestNeighborSort, typename MetricType = mlpack::metric::SquaredEuclideanDistance, typename TreeType = tree::BinarySpaceTree<bound::HRectBound<2, false>, RAQueryStat<SortPolicy> >> \fBRASearch\fP< SortPolicy, MetricType, TreeType >::\fBRASearch\fP (const typename TreeType::Mat &referenceSet, const typename TreeType::Mat &querySet, const boolnaive = \fCfalse\fP, const boolsingleMode = \fCfalse\fP, const MetricTypemetric = \fCMetricType()\fP)

Initialize the RASearch object, passing both a query and reference dataset. Optionally, perform the computation in naive mode or single-tree mode, and set the leaf size used for tree-building. An initialized distance metric can be given, for cases where the metric has internal data (i.e. the distance::MahalanobisDistance class).

This method will copy the matrices to internal copies, which are rearranged during tree-building. You can avoid this extra copy by pre-constructing the trees and passing them using a diferent constructor.

Parameters:

referenceSet Set of reference points.

querySet Set of query points.

naive If true, the rank-approximate search will be performed by directly sampling the whole set instead of using the stratified sampling on the tree.

singleMode If true, single-tree search will be used (as opposed to dual-tree search).

leafSize Leaf size for tree construction (ignored if tree is given).

metric An optional instance of the MetricType class.

template<typename SortPolicy = NearestNeighborSort, typename MetricType = mlpack::metric::SquaredEuclideanDistance, typename TreeType = tree::BinarySpaceTree<bound::HRectBound<2, false>, RAQueryStat<SortPolicy> >> \fBRASearch\fP< SortPolicy, MetricType, TreeType >::\fBRASearch\fP (const typename TreeType::Mat &referenceSet, const boolnaive = \fCfalse\fP, const boolsingleMode = \fCfalse\fP, const MetricTypemetric = \fCMetricType()\fP)

Initialize the RASearch object, passing only one dataset, which is used as both the query and the reference dataset. Optionally, perform the computation in naive mode or single-tree mode, and set the leaf size used for tree-building. An initialized distance metric can be given, for cases where the metric has internal data (i.e. the distance::MahalanobisDistance class).

If naive mode is being used and a pre-built tree is given, it may not work: naive mode operates by building a one-node tree (the root node holds all the points). If that condition is not satisfied with the pre-built tree, then naive mode will not work.

Parameters:

referenceSet Set of reference points.

naive If true, the rank-approximate search will be performed by directly sampling the whole set instead of using the stratified sampling on the tree.

singleMode If true, single-tree search will be used (as opposed to dual-tree search).

leafSize Leaf size for tree construction (ignored if tree is given).

metric An optional instance of the MetricType class.

template<typename SortPolicy = NearestNeighborSort, typename MetricType = mlpack::metric::SquaredEuclideanDistance, typename TreeType = tree::BinarySpaceTree<bound::HRectBound<2, false>, RAQueryStat<SortPolicy> >> \fBRASearch\fP< SortPolicy, MetricType, TreeType >::\fBRASearch\fP (TreeType *referenceTree, TreeType *queryTree, const typename TreeType::Mat &referenceSet, const typename TreeType::Mat &querySet, const boolsingleMode = \fCfalse\fP, const MetricTypemetric = \fCMetricType()\fP)

Initialize the RASearch object with the given datasets and pre-constructed trees. It is assumed that the points in referenceSet and querySet correspond to the points in referenceTree and queryTree, respectively. Optionally, choose to use single-tree mode. Naive mode is not available as an option for this constructor; instead, to run naive computation, construct a tree with all of the points in one leaf (i.e. leafSize = number of points). Additionally, an instantiated distance metric can be given, for cases where the distance metric holds data.

There is no copying of the data matrices in this constructor (because tree-building is not necessary), so this is the constructor to use when copies absolutely must be avoided.

Note:

Because tree-building (at least with BinarySpaceTree) modifies the ordering of a matrix, be sure you pass the modified matrix to this object! In addition, mapping the points of the matrix back to their original indices is not done when this constructor is used.

Parameters:

referenceTree Pre-built tree for reference points.

queryTree Pre-built tree for query points.

referenceSet Set of reference points corresponding to referenceTree.

querySet Set of query points corresponding to queryTree.

singleMode Whether single-tree computation should be used (as opposed to dual-tree computation).

metric Instantiated distance metric.

template<typename SortPolicy = NearestNeighborSort, typename MetricType = mlpack::metric::SquaredEuclideanDistance, typename TreeType = tree::BinarySpaceTree<bound::HRectBound<2, false>, RAQueryStat<SortPolicy> >> \fBRASearch\fP< SortPolicy, MetricType, TreeType >::\fBRASearch\fP (TreeType *referenceTree, const typename TreeType::Mat &referenceSet, const boolsingleMode = \fCfalse\fP, const MetricTypemetric = \fCMetricType()\fP)

Initialize the RASearch object with the given reference dataset and pre-constructed tree. It is assumed that the points in referenceSet correspond to the points in referenceTree. Optionally, choose to use single-tree mode. Naive mode is not available as an option for this constructor; instead, to run naive computation, construct a tree with all the points in one leaf (i.e. leafSize = number of points). Additionally, an instantiated distance metric can be given, for the case where the distance metric holds data.

There is no copying of the data matrices in this constructor (because tree-building is not necessary), so this is the constructor to use when copies absolutely must be avoided.

Note:

Because tree-building (at least with BinarySpaceTree) modifies the ordering of a matrix, be sure you pass the modified matrix to this object! In addition, mapping the points of the matrix back to their original indices is not done when this constructor is used.

Parameters:

referenceTree Pre-built tree for reference points.

referenceSet Set of reference points corresponding to referenceTree.

singleMode Whether single-tree computation should be used (as opposed to dual-tree computation).

metric Instantiated distance metric.

template<typename SortPolicy = NearestNeighborSort, typename MetricType = mlpack::metric::SquaredEuclideanDistance, typename TreeType = tree::BinarySpaceTree<bound::HRectBound<2, false>, RAQueryStat<SortPolicy> >> \fBRASearch\fP< SortPolicy, MetricType, TreeType >::~\fBRASearch\fP ()

Delete the RASearch object. The tree is the only member we are responsible for deleting. The others will take care of themselves.

Member Function Documentation

template<typename SortPolicy = NearestNeighborSort, typename MetricType = mlpack::metric::SquaredEuclideanDistance, typename TreeType = tree::BinarySpaceTree<bound::HRectBound<2, false>, RAQueryStat<SortPolicy> >> void \fBRASearch\fP< SortPolicy, MetricType, TreeType >::ResetQueryTree ()

This function recursively resets the RAQueryStat of the queryTree to set 'bound' to WorstDistance and the 'numSamplesMade' to 0. This allows a user to perform multiple searches on the same pair of trees, possibly with different levels of approximation without requiring to build a new pair of trees for every new (approximate) search.

template<typename SortPolicy = NearestNeighborSort, typename MetricType = mlpack::metric::SquaredEuclideanDistance, typename TreeType = tree::BinarySpaceTree<bound::HRectBound<2, false>, RAQueryStat<SortPolicy> >> void \fBRASearch\fP< SortPolicy, MetricType, TreeType >::ResetRAQueryStat (TreeType *treeNode)\fC [private]\fP

Parameters:

treeNode The node of the tree whose RAQueryStat is reset and whose children are to be explored recursively.

template<typename SortPolicy = NearestNeighborSort, typename MetricType = mlpack::metric::SquaredEuclideanDistance, typename TreeType = tree::BinarySpaceTree<bound::HRectBound<2, false>, RAQueryStat<SortPolicy> >> void \fBRASearch\fP< SortPolicy, MetricType, TreeType >::Search (const size_tk, arma::Mat< size_t > &resultingNeighbors, arma::mat &distances, const doubletau = \fC5\fP, const doublealpha = \fC0.95\fP, const boolsampleAtLeaves = \fCfalse\fP, const boolfirstLeafExact = \fCfalse\fP, const size_tsingleSampleLimit = \fC20\fP)

Compute the rank approximate nearest neighbors and store the output in the given matrices. The matrices will be set to the size of n columns by k rows, where n is the number of points in the query dataset and k is the number of neighbors being searched for.

Note that tau, the rank-approximation parameter, specifies that we are looking for k neighbors with probability alpha of being in the top tau percent of nearest neighbors. So, as an example, if our dataset has 1000 points, and we want 5 nearest neighbors with 95% probability of being in the top 5% of nearest neighbors (or, the top 50 nearest neighbors), we set k = 5, tau = 5, and alpha = 0.95.

The method will fail (and issue a failure message) if the value of tau is too low: tau must be set such that the number of points in the corresponding percentile of the data is greater than k. Thus, if we choose tau = 0.1 with a dataset of 1000 points and k = 5, then we are attempting to choose 5 nearest neighbors out of the closest 1 point -- this is invalid.

Parameters:

k Number of neighbors to search for.

resultingNeighbors Matrix storing lists of neighbors for each query point.

distances Matrix storing distances of neighbors for each query point.

tau The rank-approximation in percentile of the data. The default value is 5%.

alpha The desired success probability. The default value is 0.95.

sampleAtLeaves Sample at leaves for faster but less accurate computation. This defaults to 'false'.

firstLeafExact Traverse to the first leaf without approximation. This can ensure that the query definitely finds its (near) duplicate if there exists one. This defaults to 'false' for now.

singleSampleLimit The limit on the largest node that can be approximated by sampling. This defaults to 20.

template<typename SortPolicy = NearestNeighborSort, typename MetricType = mlpack::metric::SquaredEuclideanDistance, typename TreeType = tree::BinarySpaceTree<bound::HRectBound<2, false>, RAQueryStat<SortPolicy> >> std::string \fBRASearch\fP< SortPolicy, MetricType, TreeType >::ToString () const

Member Data Documentation

template<typename SortPolicy = NearestNeighborSort, typename MetricType = mlpack::metric::SquaredEuclideanDistance, typename TreeType = tree::BinarySpaceTree<bound::HRectBound<2, false>, RAQueryStat<SortPolicy> >> bool \fBRASearch\fP< SortPolicy, MetricType, TreeType >::hasQuerySet\fC [private]\fP

Indicates if a separate query set was passed.

Definition at line 279 of file ra_search.hpp.

template<typename SortPolicy = NearestNeighborSort, typename MetricType = mlpack::metric::SquaredEuclideanDistance, typename TreeType = tree::BinarySpaceTree<bound::HRectBound<2, false>, RAQueryStat<SortPolicy> >> MetricType \fBRASearch\fP< SortPolicy, MetricType, TreeType >::metric\fC [private]\fP

Instantiation of kernel.

Definition at line 287 of file ra_search.hpp.

template<typename SortPolicy = NearestNeighborSort, typename MetricType = mlpack::metric::SquaredEuclideanDistance, typename TreeType = tree::BinarySpaceTree<bound::HRectBound<2, false>, RAQueryStat<SortPolicy> >> bool \fBRASearch\fP< SortPolicy, MetricType, TreeType >::naive\fC [private]\fP

Indicates if naive random sampling on the set is being used.

Definition at line 282 of file ra_search.hpp.

template<typename SortPolicy = NearestNeighborSort, typename MetricType = mlpack::metric::SquaredEuclideanDistance, typename TreeType = tree::BinarySpaceTree<bound::HRectBound<2, false>, RAQueryStat<SortPolicy> >> size_t \fBRASearch\fP< SortPolicy, MetricType, TreeType >::numberOfPrunes\fC [private]\fP

Total number of pruned nodes during the neighbor search.

Definition at line 295 of file ra_search.hpp.

template<typename SortPolicy = NearestNeighborSort, typename MetricType = mlpack::metric::SquaredEuclideanDistance, typename TreeType = tree::BinarySpaceTree<bound::HRectBound<2, false>, RAQueryStat<SortPolicy> >> std::vector<size_t> \fBRASearch\fP< SortPolicy, MetricType, TreeType >::oldFromNewQueries\fC [private]\fP

Permutations of query points during tree building.

Definition at line 292 of file ra_search.hpp.

template<typename SortPolicy = NearestNeighborSort, typename MetricType = mlpack::metric::SquaredEuclideanDistance, typename TreeType = tree::BinarySpaceTree<bound::HRectBound<2, false>, RAQueryStat<SortPolicy> >> std::vector<size_t> \fBRASearch\fP< SortPolicy, MetricType, TreeType >::oldFromNewReferences\fC [private]\fP

Permutations of reference points during tree building.

Definition at line 290 of file ra_search.hpp.

template<typename SortPolicy = NearestNeighborSort, typename MetricType = mlpack::metric::SquaredEuclideanDistance, typename TreeType = tree::BinarySpaceTree<bound::HRectBound<2, false>, RAQueryStat<SortPolicy> >> arma::mat \fBRASearch\fP< SortPolicy, MetricType, TreeType >::queryCopy\fC [private]\fP

Copy of query dataset (if we need it, because tree building modifies it).

Definition at line 264 of file ra_search.hpp.

template<typename SortPolicy = NearestNeighborSort, typename MetricType = mlpack::metric::SquaredEuclideanDistance, typename TreeType = tree::BinarySpaceTree<bound::HRectBound<2, false>, RAQueryStat<SortPolicy> >> const arma::mat& \fBRASearch\fP< SortPolicy, MetricType, TreeType >::querySet\fC [private]\fP

Query dataset (may not be given).

Definition at line 269 of file ra_search.hpp.

template<typename SortPolicy = NearestNeighborSort, typename MetricType = mlpack::metric::SquaredEuclideanDistance, typename TreeType = tree::BinarySpaceTree<bound::HRectBound<2, false>, RAQueryStat<SortPolicy> >> TreeType* \fBRASearch\fP< SortPolicy, MetricType, TreeType >::queryTree\fC [private]\fP

Pointer to the root of the query tree (might not exist).

Definition at line 274 of file ra_search.hpp.

template<typename SortPolicy = NearestNeighborSort, typename MetricType = mlpack::metric::SquaredEuclideanDistance, typename TreeType = tree::BinarySpaceTree<bound::HRectBound<2, false>, RAQueryStat<SortPolicy> >> arma::mat \fBRASearch\fP< SortPolicy, MetricType, TreeType >::referenceCopy\fC [private]\fP

Copy of reference dataset (if we need it, because tree building modifies it).

Definition at line 262 of file ra_search.hpp.

template<typename SortPolicy = NearestNeighborSort, typename MetricType = mlpack::metric::SquaredEuclideanDistance, typename TreeType = tree::BinarySpaceTree<bound::HRectBound<2, false>, RAQueryStat<SortPolicy> >> const arma::mat& \fBRASearch\fP< SortPolicy, MetricType, TreeType >::referenceSet\fC [private]\fP

Reference dataset.

Definition at line 267 of file ra_search.hpp.

template<typename SortPolicy = NearestNeighborSort, typename MetricType = mlpack::metric::SquaredEuclideanDistance, typename TreeType = tree::BinarySpaceTree<bound::HRectBound<2, false>, RAQueryStat<SortPolicy> >> TreeType* \fBRASearch\fP< SortPolicy, MetricType, TreeType >::referenceTree\fC [private]\fP

Pointer to the root of the reference tree.

Definition at line 272 of file ra_search.hpp.

template<typename SortPolicy = NearestNeighborSort, typename MetricType = mlpack::metric::SquaredEuclideanDistance, typename TreeType = tree::BinarySpaceTree<bound::HRectBound<2, false>, RAQueryStat<SortPolicy> >> bool \fBRASearch\fP< SortPolicy, MetricType, TreeType >::singleMode\fC [private]\fP

Indicates if single-tree search is being used (opposed to dual-tree).

Definition at line 284 of file ra_search.hpp.

template<typename SortPolicy = NearestNeighborSort, typename MetricType = mlpack::metric::SquaredEuclideanDistance, typename TreeType = tree::BinarySpaceTree<bound::HRectBound<2, false>, RAQueryStat<SortPolicy> >> bool \fBRASearch\fP< SortPolicy, MetricType, TreeType >::treeOwner\fC [private]\fP

If true, this object created the trees and is responsible for them.

Definition at line 277 of file ra_search.hpp.

Author

Generated automatically by Doxygen for MLPACK from the source code.