Pandas DataFrame.sample() | Python

Last Updated : 11 Jul, 2025

The Pandas DataFrame.sample() function randomly selects rows or columns from a DataFrame. It is particularly helpful with large datasets, where we want to test or analyze a small random subset. The n and frac parameters set the number or proportion of items to sample, and random_state makes the selection reproducible.

Example: Sampling a Single Random Row

In this example, we load a dataset and generate a single random row using the sample() method by setting n=1.

Python
import pandas as pd

# Load dataset
d = pd.read_csv("employees.csv")

# Sample one random row
r_row = d.sample(n=1)

# Display the result
r_row

Output

[Image: one row of the DataFrame]

Calling sample(n=1) selects one random row from the DataFrame. Because no random_state is set, a different row may be returned on each run.

Syntax

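DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, ignore_index=False)

Parameters:

  • n: int value. Number of rows (or columns, if axis=1) to return. Defaults to 1 if frac is None. n cannot be used with frac.
  • frac: Float value. Fraction of items to return, i.e. about frac * (length of the DataFrame) rows. frac cannot be used with n.
  • replace: Boolean value. Samples with replacement if True, so the same row can be drawn more than once.
  • weights: str or array-like, optional. Sampling probability for each item. The default, None, gives every item equal probability; weights that do not sum to 1 are normalized.
  • random_state: int or numpy.random.RandomState, optional (recent pandas versions also accept a numpy.random.Generator). If set to a particular integer, the same rows are returned on every run.
  • axis: 0 or 'index' for rows and 1 or 'columns' for columns.
  • ignore_index: Boolean value, default False. If True, the result is relabeled 0, 1, …, n - 1 (pandas 1.3 and later).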

Return Type: New object of same type as caller.
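The weights and axis parameters are not covered by the examples in this article, so here is a minimal sketch of both. The small in-memory DataFrame and its column names are assumptions made for illustration; they stand in for employees.csv and are not taken from it.

```python
import pandas as pd

# Hypothetical data standing in for employees.csv
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cara", "Dev"],
    "team": ["Ops", "Ops", "Eng", "Eng"],
    "salary": [50, 60, 70, 80],
})

# axis=1 samples columns instead of rows: all 4 rows, 2 random columns
two_cols = df.sample(n=2, axis=1, random_state=0)
print(two_cols.shape)

# weights: rows with weight 0 can never be drawn, so only the
# last two rows ("Cara" and "Dev") can appear in this sample
weighted = df.sample(n=3, replace=True, weights=[0, 0, 1, 1], random_state=0)
print(weighted["name"].tolist())
```

The weights here sum to 2 rather than 1; pandas normalizes them before sampling.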

To download the CSV file used, Click Here.

Examples of Pandas DataFrame.sample()

Example 1: Sample 25% of the DataFrame

In this example, we generate a random sample consisting of 25% of the entire DataFrame by using the frac parameter.

Python
import pandas as pd
d = pd.read_csv("employees.csv")

# Sample 25% of the data
sr = d.sample(frac=0.25)

# Verify the number of rows
print(f"Original rows: {len(d)}")
print(f"Sampled rows (25%): {len(sr)}")

# Display the result
sr

Output

[Image: 25% of the DataFrame]

As the output shows, the sample contains 25% of the DataFrame's rows (rounded to a whole number), selected at random. Because no random_state is set, each run returns a different sample.
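The relationship between frac and the number of rows returned can be checked without the CSV file. The sketch below uses a hypothetical 8-row DataFrame (an assumption, not the article's dataset) and also shows that a fraction above 1 is only allowed when sampling with replacement.

```python
import pandas as pd

# Hypothetical 8-row frame; any DataFrame would do
df = pd.DataFrame({"id": range(8)})

# 25% of 8 rows gives 2 rows, drawn without replacement
quarter = df.sample(frac=0.25)
print(len(df), len(quarter))

# Without replacement, no row can appear twice
print(quarter["id"].is_unique)

# frac > 1 (upsampling) requires replace=True: 1.5 * 8 = 12 rows
upsampled = df.sample(frac=1.5, replace=True)
print(len(upsampled))
```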

Example 2: Sampling with Replacement and a Fixed Random State

This example demonstrates how to sample multiple rows with replacement (i.e., allowing the same row to be drawn more than once) and how to make the result reproducible with a fixed random seed.

Python
import pandas as pd
d = pd.read_csv("employees.csv")

# Sample 3 rows with replacement and fixed seed
sd = d.sample(n=3, replace=True, random_state=42)

sd

Output

[Image: sampling with replacement]

Setting replace=True allows the same row to be sampled more than once, which makes it suitable for bootstrapping. Setting random_state=42 makes the result reproducible across runs, which is very useful during testing and debugging.
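Both claims can be verified on a small frame. This is a minimal sketch under assumed data: the 10-row DataFrame and the score column are hypothetical, and the bootstrap step is one common way to use replace=True, not something the article's dataset prescribes.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"id": range(10), "score": [3, 7, 1, 9, 4, 6, 2, 8, 5, 0]})

# The same seed yields the same rows in the same order
first = df.sample(n=3, replace=True, random_state=42)
second = df.sample(n=3, replace=True, random_state=42)
print(first.equals(second))

# A bootstrap resample: as many rows as the original, drawn with replacement
boot = df.sample(frac=1, replace=True, random_state=42)
print(len(boot) == len(df))
print(boot["score"].mean())  # one bootstrap estimate of the mean
```

Repeating the bootstrap step with different seeds gives a distribution of estimates, which is the usual reason to sample with replacement.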
