How to Convert Dict to Array in Python using DictVectorizer?

Learn the efficient way to convert a Python dict to array using the powerful DictVectorizer. Explore step-by-step instructions & examples by ProjectPro

Python provides powerful tools for data manipulation and analysis, and dictionaries are a common way to hold records. Machine learning and numerical code, however, generally expects arrays, so converting dictionaries to arrays is often necessary. One useful tool for this task is the DictVectorizer class from the scikit-learn library. This tutorial shows how to convert a dictionary to an array in Python using DictVectorizer, and also covers converting dictionary keys to an array and an array back to a dictionary.

What is DictVectorizer?

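DictVectorizer is a class in scikit-learn (sklearn.feature_extraction) that transforms a list of dictionaries, each mapping feature names to values, into a NumPy array or SciPy sparse matrix suitable for machine learning algorithms. Numeric values are copied into a column named after their key, while each distinct string value of a categorical feature becomes its own binary column. By default the result is a sparse matrix; passing sparse=False returns a dense NumPy array.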
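As a rough illustration of the sparse and dense output modes, the sketch below vectorizes two small records. The records and the names 'color' and 'size' are made up for this example.

```python
import numpy as np
from scipy.sparse import issparse
from sklearn.feature_extraction import DictVectorizer

# Hypothetical records: one string-valued feature and one numeric feature
records = [{'color': 'red', 'size': 2},
           {'color': 'blue', 'size': 5}]

# Default behaviour: the result is a SciPy sparse matrix
sparse_result = DictVectorizer().fit_transform(records)
print(issparse(sparse_result))              # True

# sparse=False: the result is a dense NumPy array
dense_result = DictVectorizer(sparse=False).fit_transform(records)
print(isinstance(dense_result, np.ndarray))  # True

# Columns are sorted by name: 'color=blue', 'color=red', 'size'
print(sparse_result.toarray().tolist())      # [[0.0, 1.0, 2.0], [1.0, 0.0, 5.0]]
```

The sparse form tends to pay off when there are many categorical values and most entries are zero; for small data the dense form is easier to inspect.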

DictVectorizer Tutorial: Converting Dict to Array in Python

The DictVectorizer class transforms a list of dictionaries into a NumPy array, making it easier to work with machine learning algorithms that require numerical input. Follow the step-by-step guide below to convert a dict to an array in Python using DictVectorizer:

Step 1: Importing Required Libraries

from sklearn.feature_extraction import DictVectorizer

Step 2: Creating a Sample Dictionary

data = [{'feature1': 10, 'feature2': 20},
        {'feature1': 15, 'feature2': 25},
        {'feature1': 18, 'feature2': 30}]

Step 3: Creating a DictVectorizer Instance

vectorizer = DictVectorizer(sparse=False)

Step 4: Fitting and Transforming the Data

array_representation = vectorizer.fit_transform(data)

Step 5: Examining the Result

print(array_representation)

Convert dict to array in Python

In the above code, each dictionary represents a data point, and DictVectorizer converts it into a numerical array. The sparse=False parameter ensures that the output is a dense array instead of a sparse matrix.

How to Convert Python Dict Keys to Array?

Sometimes, you may only be interested in extracting the keys from a dictionary and converting them into an array. This can be achieved using the keys() method and converting it to a list. 

Here's how you can do it:

Convert Python dict keys to array

In this example, keys_array will contain the keys of the dictionary as elements in the array.
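A minimal sketch of this approach is shown here. The dictionary sample_dict is a made-up example, and the NumPy conversion at the end is an optional extra for cases where a real ndarray is needed rather than a list.

```python
import numpy as np

# Hypothetical dictionary used only for illustration
sample_dict = {'feature1': 10, 'feature2': 20, 'feature3': 30}

# keys() returns a view; list() turns it into a plain Python list
keys_array = list(sample_dict.keys())
print(keys_array)   # ['feature1', 'feature2', 'feature3']

# Optional: wrap the list in a NumPy array
keys_np = np.array(keys_array)
print(keys_np.shape)  # (3,)
```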

How to Convert Python Array to Dict?

Converting an array back to a dictionary involves creating a new dictionary and populating it with the array elements. Check out the example below:

Python array to dict

The above example involves initializing a new dictionary where each key is taken from the array, and the values are set to None. You can then update the values based on your requirements.
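One way to write this is with dict.fromkeys, as in the sketch below. The names keys_array and values are hypothetical, and the zip variant is an alternative for when the values are already known.

```python
# Hypothetical array (list) of keys
keys_array = ['feature1', 'feature2', 'feature3']

# Build a dictionary whose values all start as None
new_dict = dict.fromkeys(keys_array)
print(new_dict)  # {'feature1': None, 'feature2': None, 'feature3': None}

# Update individual values as needed
new_dict['feature1'] = 10
print(new_dict['feature1'])  # 10

# Alternative: pair the keys with a matching array of values
values = [10, 20, 30]
paired = dict(zip(keys_array, values))
print(paired)  # {'feature1': 10, 'feature2': 20, 'feature3': 30}
```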

How to Convert a Dictionary into a Matrix? 

Step 1 - Import the library

from sklearn.feature_extraction import DictVectorizer

DictVectorizer is the only import this example needs.

Step 2 - Setting up the Data

We have created a list of dictionaries that together use three features named 'Pen', 'Pencil' and 'Eraser'. Each dictionary is one sample and assigns values to only some of the features.

    data_dict = [{'Pen': 2, 'Pencil': 4},
                 {'Pen': 4, 'Pencil': 3},
                 {'Pen': 1, 'Eraser': 2},
                 {'Pen': 2, 'Eraser': 2}]
    print(data_dict)

Step 3 - Converting Dictionary into Matrix

To convert the list of dictionaries into a matrix, we use DictVectorizer. It builds a matrix in which each column is a feature and each row is one sample (one dictionary) from the list; a feature that is absent from a sample is stored as 0. Finally, we print the feature names with get_feature_names_out, which replaces the get_feature_names method removed in scikit-learn 1.2.

    dictvectorizer = DictVectorizer(sparse=False)
    features = dictvectorizer.fit_transform(data_dict)
    print(features)
    # get_feature_names() was removed in scikit-learn 1.2; tolist() yields a plain list of str
    feature_name = dictvectorizer.get_feature_names_out().tolist()
    print(feature_name)

The output is:

[{'Pen': 2, 'Pencil': 4}, {'Pen': 4, 'Pencil': 3}, {'Pen': 1, 'Eraser': 2}, {'Pen': 2, 'Eraser': 2}]
[[0. 2. 4.]
 [0. 4. 3.]
 [2. 1. 0.]
 [2. 2. 0.]]
['Eraser', 'Pen', 'Pencil']
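To check which column belongs to which feature, or to go from the matrix back to dictionaries, the fitted vectorizer exposes a vocabulary_ attribute and an inverse_transform method. The sketch below reuses the same sample data; note that inverse_transform only returns non-zero entries, so features stored as 0 do not reappear.

```python
from sklearn.feature_extraction import DictVectorizer

data_dict = [{'Pen': 2, 'Pencil': 4},
             {'Pen': 4, 'Pencil': 3},
             {'Pen': 1, 'Eraser': 2},
             {'Pen': 2, 'Eraser': 2}]

dictvectorizer = DictVectorizer(sparse=False)
features = dictvectorizer.fit_transform(data_dict)

# Mapping from feature name to column index
print(dictvectorizer.vocabulary_)

# Map each row back to a dictionary of its non-zero features
restored = dictvectorizer.inverse_transform(features)
for row in restored:
    print(row)
```

The restored values are floats, because the dense matrix is stored as floating point by default.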

DictVectorizer Examples and Use Cases

Here are a few examples to better understand how to use DictVectorizer and why it can be beneficial.

  • Handling Categorical Data

DictVectorizer automatically handles categorical data by converting it into binary features. For example, if a 'city' key holds string values, the vectorizer creates one binary column per distinct city, named in the form 'city=<value>', while numeric keys such as 'age' keep a single column.

  • Dealing with Missing Values

If a key is missing from one of the dictionaries, DictVectorizer fills the corresponding column with zero for that row in the resulting array.

Dealing with missing values

The missing 'city' and 'age' values will be filled with zeros.

DictVectorizer vs. Other Vectorizers 

Let's now briefly compare DictVectorizer with two other scikit-learn tools, OneHotEncoder and CountVectorizer.

DictVectorizer vs. OneHotEncoder 

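While both are used for handling categorical data, DictVectorizer is more flexible with mixed data types: it one-hot encodes string values and passes numeric values through unchanged. OneHotEncoder is designed specifically for categorical variables and treats every column it receives as categorical, so numeric columns usually need to be routed around it, for example with a ColumnTransformer.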
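To make the difference concrete, the sketch below encodes the same hypothetical records both ways. OneHotEncoder is given only the 'city' column here, since it would otherwise treat 'age' as a category too.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical records with one categorical and one numeric feature
records = [{'city': 'Paris', 'age': 30},
           {'city': 'London', 'age': 25},
           {'city': 'Paris', 'age': 41}]

# DictVectorizer: takes the dictionaries directly, keeps 'age' numeric
dv = DictVectorizer(sparse=False)
dv_out = dv.fit_transform(records)
print(dv_out.tolist())
# [[30.0, 0.0, 1.0], [25.0, 1.0, 0.0], [41.0, 0.0, 1.0]]

# OneHotEncoder: expects a 2D table of categorical columns
cities = [[r['city']] for r in records]
ohe = OneHotEncoder()
ohe_out = ohe.fit_transform(cities).toarray()
print(ohe_out.tolist())
# [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
```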

DictVectorizer vs. CountVectorizer

CountVectorizer converts raw text documents into token-count features, whereas DictVectorizer is more general-purpose: it works on dictionaries of named features, whose values may be numeric or strings.

DictVectorizer in Action with ProjectPro! 

Converting dictionaries to arrays is a critical step in data preparation for machine learning models, and Python's scikit-learn library offers a powerful solution with DictVectorizer. As the examples and comparisons with alternative vectorization techniques show, DictVectorizer is versatile and effective, particularly for managing mixed data types and categorical features during preprocessing. Beyond this, practical experience with real-world projects is essential for true mastery, and ProjectPro can help you gain it. With a repository of more than 270 projects focused on data science and big data, ProjectPro facilitates a seamless transition from theory to application. Engaging with ProjectPro not only enhances theoretical understanding but also cultivates essential practical skills, making it a valuable resource for aspiring data scientists.

