Feature Extraction

<aside> 💡 sklearn.feature_extraction

</aside>

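DictVectorizer converts a list of mappings from feature name to feature value into a matrix. String-valued features are one-hot encoded, one column per distinct name=value pair.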


FeatureHasher is a high-speed, low-memory vectorizer that uses the feature hashing technique.
Instead of building a hash table of the features, as the vectorizers do, it applies a hash function to the features to determine their column index in sample matrices directly.
This results in increased speed and reduced memory usage, at the expense of inspectability; the hasher does not remember what the input features looked like and has no inverse_transform method.
The output is a scipy.sparse matrix.
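A small sketch of feature hashing with `FeatureHasher` follows. The `records` data and the choice of `n_features=8` are assumptions for illustration only; real use would normally pick a much larger number of columns to keep hash collisions rare.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction import FeatureHasher

# Hypothetical input: one mapping of feature name -> feature value per sample.
records = [
    {"city": "Paris", "temp": 20.0},
    {"city": "Oslo", "temp": 5.0},
]

# n_features fixes the number of output columns up front.
hasher = FeatureHasher(n_features=8, input_type="dict")

# The hasher is stateless, so transform can be called without fitting.
X = hasher.transform(records)

print(X.shape)                              # (number of samples, n_features)
print(issparse(X))                          # the result is a scipy.sparse matrix
print(hasattr(hasher, "inverse_transform")) # column indices cannot be mapped back
```

Because the column index comes straight from the hash, two different features can land in the same column; that is the price paid for not storing a vocabulary.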

Data Cleaning

<aside> 💡 sklearn.impute

</aside>

SimpleImputer fills missing values column by column, using one of the following strategies: 'mean', 'median', 'most_frequent' or 'constant'.
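As a rough illustration of two of these strategies, the sketch below applies `SimpleImputer` to a small made-up array. The data and the `fill_value=0.0` choice are assumptions for the example.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data with one missing value in each column.
X = np.array([
    [1.0, 2.0],
    [np.nan, 4.0],
    [7.0, np.nan],
])

# 'mean' replaces each NaN with the mean of the observed values in its column.
mean_imputer = SimpleImputer(strategy="mean")
X_mean = mean_imputer.fit_transform(X)

# 'constant' replaces every NaN with the given fill_value.
const_imputer = SimpleImputer(strategy="constant", fill_value=0.0)
X_const = const_imputer.fit_transform(X)

print(X_mean)
print(X_const)
```

Here the first column's observed values are 1 and 7, so its missing entry becomes 4; the second column's observed values are 2 and 4, so its missing entry becomes 3.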

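KNNImputer uses the k-nearest neighbours approach to fill missing values in a dataset.
The missing value of an attribute in a specific example is filled with the mean value of the same attribute over the n_neighbors closest neighbours that have a value for it (with the default uniform weights).
By default, the nearest neighbours are decided using a Euclidean distance that ignores missing coordinates (the nan_euclidean metric).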
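The neighbour-based filling can be sketched with `KNNImputer` on a tiny made-up array. The data and `n_neighbors=2` are assumptions chosen so the result is easy to check by hand.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Hypothetical data: the second row is missing its second attribute.
X = np.array([
    [1.0, 2.0],
    [2.0, np.nan],
    [3.0, 6.0],
    [10.0, 20.0],
])

# Defaults: uniform weights and a NaN-aware Euclidean distance.
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)

print(X_filled)
```

On the first attribute, the rows closest to the incomplete row are the first and third, so the missing entry is filled with the mean of their second attribute, (2 + 6) / 2 = 4. The distant fourth row does not contribute.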