How to impute missing class labels using nearest neighbours in Python?

This recipe helps you impute missing class labels using nearest neighbours in Python

Recipe Objective

Have you ever tried to impute calss labels? We can impute class labels by K nearest neighbours by training it on known data and predicting the class labels.

So this is the recipe on how we can impute missing class labels using nearest neighbours in Python.

List of Classification Algorithms in Machine Learning

Step 1 - Import the library

import numpy as np from sklearn.neighbors import KNeighborsClassifier

We have imported numpy and KNeighborsClassifier which is needed.

Step 2 - Setting up the Data

We have created a feature matrix using array and we will use this to train the KNN model. X = np.array([[0, 2.10, 1.45], [2, 1.18, 1.33], [0, 1.22, 1.27], [1, 1.32, 1.97], [1, -0.21, -1.19]]) We have created a matrix with missing class labels. X_with_nan = np.array([[np.nan, 0.87, 1.31], [np.nan, 0.37, 1.91], [np.nan, 0.54, 1.27], [np.nan, -0.67, -0.22]])

Step 3 - Predicting the Class Labels

We are training the KNeighborsClassifier with parameters K equals to 3 and weights equals to distance. We have used the matrix X to train the model. clf = KNeighborsClassifier(3, weights="distance") trained_model = clf.fit(X[:,1:], X[:,0]) We have predicted the class labels of matrix "X_with_nan". imputed_values = trained_model.predict(X_with_nan[:,1:]) print(imputed_values) So finally we have filled the null values with the predicted output of model. X_with_imputed = np.hstack((imputed_values.reshape(-1,1), X_with_nan[:,1:])) print(); print(X_with_imputed) So the output comes as

[2. 1. 2. 1.]

[[ 2.    0.87  1.31]
 [ 1.    0.37  1.91]
 [ 2.    0.54  1.27]
 [ 1.   -0.67 -0.22]]

Download Materials


What Users are saying..

profile image

Jingwei Li

Graduate Research assistance at Stony Brook University
linkedin profile url

ProjectPro is an awesome platform that helps me learn much hands-on industrial experience with a step-by-step walkthrough of projects. There are two primary paths to learn: Data Science and Big Data.... Read More

Relevant Projects

Build Customer Propensity to Purchase Model in Python
In this machine learning project, you will learn to build a machine learning model to estimate customer propensity to purchase.

Build a Text Generator Model using Amazon SageMaker
In this Deep Learning Project, you will train a Text Generator Model on Amazon Reviews Dataset using LSTM Algorithm in PyTorch and deploy it on Amazon SageMaker.

Build a Churn Prediction Model using Ensemble Learning
Learn how to build ensemble machine learning models like Random Forest, Adaboost, and Gradient Boosting for Customer Churn Prediction using Python

NLP Project for Beginners on Text Processing and Classification
This Project Explains the Basic Text Preprocessing and How to Build a Classification Model in Python

Build CI/CD Pipeline for Machine Learning Projects using Jenkins
In this project, you will learn how to create a CI/CD pipeline for a search engine application using Jenkins.

Recommender System Machine Learning Project for Beginners-4
Collaborative Filtering Recommender System Project - Comparison of different model based and memory based methods to build recommendation system using collaborative filtering.

Build a Speech-Text Transcriptor with Nvidia Quartznet Model
In this Deep Learning Project, you will leverage transfer learning from Nvidia QuartzNet pre-trained models to develop a speech-to-text transcriptor.

Build Real Estate Price Prediction Model with NLP and FastAPI
In this Real Estate Price Prediction Project, you will learn to build a real estate price prediction machine learning model and deploy it on Heroku using FastAPI Framework.

Medical Image Segmentation Deep Learning Project
In this deep learning project, you will learn to implement Unet++ models for medical image segmentation to detect and classify colorectal polyps.

Langchain Project for Customer Support App in Python
In this LLM Project, you will learn how to enhance customer support interactions through Large Language Models (LLMs), enabling intelligent, context-aware responses. This Langchain project aims to seamlessly integrate LLM technology with databases, PDF knowledge bases, and audio processing agents to create a comprehensive customer support application.