How to randomly sample a Pandas DataFrame?

This recipe helps you randomly sample a Pandas DataFrame

Recipe Objective

While working on a dataset we sometimes need to randomly select fixed or random number of rows for some test. So how to select random rows.

This data science python source code does the following:
1. Creates data dictionary
2. Converts dictionary into pandas dataframe
3. Randomly selects subsets from datasample.

So this is the recipe on How we can randomly sample a Pandas DataFrame.

Step 1 - Import the library

import pandas as pd import numpy as np

We have only imported pandas and numpy which is needed.

Step 2 - Setting up the Data

We have created a dictionary of data and passed it in pd.DataFrame to make a dataframe with columns 'first_name', 'last_name', 'age', 'Comedy_Score' and 'Rating_Score'. raw_data = {'first_name': ['Sheldon', 'Raj', 'Leonard', 'Howard', 'Amy'], 'last_name': ['Copper', 'Koothrappali', 'Hofstadter', 'Wolowitz', 'Fowler'], 'age': [42, 38, 36, 41, 35], 'Comedy_Score': [9, 7, 8, 8, 5], 'Rating_Score': [25, 25, 49, 62, 70]} df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age', 'Comedy_Score', 'Rating_Score']) print(df)

Step 3 - Selecting random subsets

We can select random subsets of rows by df.take and passing random permutation of number from the length of df. We have done this twice for 2 and 4 samples to select. print(df.take(np.random.permutation(len(df))[:2])) print(df.take(np.random.permutation(len(df))[:4])) df1 = df.sample(3) print(df1) So the output comes as

  first_name     last_name  age  Comedy_Score  Rating_Score
0    Sheldon        Copper   42             9            25
1        Raj  Koothrappali   38             7            25
2    Leonard    Hofstadter   36             8            49
3     Howard      Wolowitz   41             8            62
4        Amy        Fowler   35             5            70

  first_name     last_name  age  Comedy_Score  Rating_Score
0    Sheldon        Copper   42             9            25
1        Raj  Koothrappali   38             7            25

  first_name     last_name  age  Comedy_Score  Rating_Score
2    Leonard    Hofstadter   36             8            49
4        Amy        Fowler   35             5            70
1        Raj  Koothrappali   38             7            25
3     Howard      Wolowitz   41             8            62

  first_name     last_name  age  Comedy_Score  Rating_Score
1        Raj  Koothrappali   38             7            25
0    Sheldon        Copper   42             9            25
3     Howard      Wolowitz   41             8            62

Download Materials


What Users are saying..

profile image

Gautam Vermani

Data Consultant at Confidential
linkedin profile url

Having worked in the field of Data Science, I wanted to explore how I can implement projects in other domains, So I thought of connecting with ProjectPro. A project that helped me absorb this topic... Read More

Relevant Projects

Stock Price Prediction Project using LSTM and RNN
Learn how to predict stock prices using RNN and LSTM models. Understand deep learning concepts and apply them to real-world financial data for accurate forecasting.

Learn How to Build a Logistic Regression Model in PyTorch
In this Machine Learning Project, you will learn how to build a simple logistic regression model in PyTorch for customer churn prediction.

Loan Eligibility Prediction using Gradient Boosting Classifier
This data science in python project predicts if a loan should be given to an applicant or not. We predict if the customer is eligible for loan based on several factors like credit score and past history.

Loan Default Prediction Project using Explainable AI ML Models
Loan Default Prediction Project that employs sophisticated machine learning models, such as XGBoost and Random Forest and delves deep into the realm of Explainable AI, ensuring every prediction is transparent and understandable.

Ensemble Machine Learning Project - All State Insurance Claims Severity Prediction
In this ensemble machine learning project, we will predict what kind of claims an insurance company will get. This is implemented in python using ensemble machine learning algorithms.

Azure Text Analytics for Medical Search Engine Deployment
Microsoft Azure Project - Use Azure text analytics cognitive service to deploy a machine learning model into Azure Databricks

Learn Hyperparameter Tuning for Neural Networks with PyTorch
In this Deep Learning Project, you will learn how to optimally tune the hyperparameters (learning rate, epochs, dropout, early stopping) of a neural network model in PyTorch to improve model performance.

Build OCR from Scratch Python using YOLO and Tesseract
In this deep learning project, you will learn how to build your custom OCR (optical character recognition) from scratch by using Google Tesseract and YOLO to read the text from any images.

Classification Projects on Machine Learning for Beginners - 1
Classification ML Project for Beginners - A Hands-On Approach to Implementing Different Types of Classification Algorithms in Machine Learning for Predictive Modelling

Image Classification Model using Transfer Learning in PyTorch
In this PyTorch Project, you will build an image classification model in PyTorch using the ResNet pre-trained model.