K-nearest neighbours

instance-based learning and non-generalising learning
- does not attempt to construct a model
- simply stores instances of the training data
Classification is computed from a simple majority vote of the nearest neighbours of each point.
Two different implementations:
- KNeighborsClassifier
- RadiusNeighborsClassifier

KNeighborsClassifier vs RadiusNeighborsClassifier

Untitled

from sklearn.neighbors import KNeighborsClassifier
kneighbor_classifier = KNeighborsClassifier()

kneighbor_classifier.fit(X_train, y_train)

n_neighbors (default = 5)
- Specify the number of nearest neighbors K
- value should be int
weights
- uniform (default)
- distance
  - weigh points by the inverse of their distance
  - closer neighbors of a query point will have a greater influence than neighbors which are further away
- own weight values
  - parameter also accepts a user-defined function which takes an array of distances as input, and returns an array of the same shape containing the weights.
algorithm
- ball_tree
- kd_tree
- brute
- auto (default)

For 'ball_tree' and 'kd_tree' algorithms, there are some other parameters to be set.

leaf_size (default = 30)
- can affect the speed of the construction and query, as well as the memory required to store the tree
metric
- Distance metric to use for the tree
- It is either string or callable function
  - “euclidean”, “manhattan”, “chebyshev”, “minkowski” (default), “wminkowski”, “seuclidean”, “mahalanobis”
p (default = 2)
- Power parameter for the Minkowski metric

The number of neighbors is specified within a fixed radius r of each training point using radius parameter.