PromptBase
Data Cleaning

GPT-3.5-TURBO
Data Preprocessor
$4.99
Over 1 month ago

Prompt Details

Model
Chat - GPT-3.5 (gpt-3.5-turbo)
Token size
88 ($0.00010 / call)
Example input
[Dataset Type]: Tabular Datasets, Time Series Datasets, Spatial Datasets, Text Datasets, Image Datasets, Audio Datasets, Video Datasets, Graph Datasets
Example output
# Import necessary libraries
import pandas as pd

# Step 1: Load the dataset
# Replace 'your_dataset.csv' with the actual file path or URL
dataset_path = 'your_dataset.csv'
df = pd.read_csv(dataset_path)

# Step 2: Explore the dataset
# df.info() prints columns, data types, and non-null counts itself and
# returns None, so it is called directly rather than wrapped in print()
print("Dataset Overview:")
df.info()

# Step 3: Handling Missing Values
# Check for missing values in each column
missing_values = df.isnull().sum()

# Display columns with missing values
print("\nColumns with Missing Values:")
print(missing_values[missing_values > 0])

# Option 1: Remove rows with missing values
df_cleaned_rows = df.dropna()

# Option 2: Impute missing values
# Replace missing values with the mean, median, or a specific value
# Example: Fill missing values in 'column_name' with the mean
# The result is assigned back to the column; calling fillna(..., inplace=True)
# on a column selection is deprecated and may leave df unchanged
df['column_name'] = df['column_name'].fillna(df['column_name'].mean())

# Step 4: Handling Duplicate Values
# Check for duplicate rows
duplicate_rows = df[df.duplicated()]

# Display duplicate rows
print("\nDuplicate Rows:")
print(duplicate_rows)

# Remove duplicate rows
df_cleaned_duplicates = df.drop_duplicates()

# Step 5: Save the cleaned dataset
# Save the imputed, de-duplicated dataset to a new file
df_cleaned_duplicates.to_csv('cleaned_dataset.csv', index=False)

# Summary
print("\nCleaning Summary:")
print(f"Original dataset shape: {df.shape}")
print(f"Dataset shape after removing rows with missing values: {df_cleaned_rows.shape}")
print(f"Dataset shape after removing duplicate rows: {df_cleaned_duplicates.shape}")

# End of the cleaning process
print("\nCleaning process completed.")

Explanation of the Steps:

Load the Dataset: Use the pd.read_csv() function from the pandas library to load the dataset into a DataFrame.

Explore the Dataset: Use df.info() to get an overview of the dataset, including column names, data types, and non-null counts, which reveal where values are missing.

Handling Missing Values: Identify columns with missing values using df.isnull().sum(). Then choose between two alternatives: remove rows with missing values (df.dropna()), or impute them with the mean, the median, or a specific value. In the script, the two options produce separate results: df_cleaned_rows holds the row-dropped version, while df itself carries the imputed column.

Handling Duplicate Values: Identify and display duplicate rows using df[df.duplicated()]. Remove duplicate rows with df.drop_duplicates().

Save the Cleaned Dataset: Save the imputed, de-duplicated dataset to a new CSV file using df_cleaned_duplicates.to_csv().

Cleaning Summary: Display the shapes of the dataset at different stages to summarize the cleaning process. The row-dropped shape and the de-duplicated shape are each measured against df, not against each other.
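The steps above can also be tried without a CSV file. The following is a minimal sketch, not part of the purchased prompt's output: the helper name clean_frame, the strategy parameter, and the sample city/temp data are all hypothetical, chosen only for illustration. It assumes numeric gaps should be imputed, rows that still have gaps should be dropped, and duplicates removed last.

```python
import pandas as pd


def clean_frame(df: pd.DataFrame, strategy: str = "median") -> pd.DataFrame:
    """Hypothetical helper: impute numeric gaps, drop rows that are
    still incomplete, then drop duplicate rows."""
    cleaned = df.copy()  # leave the caller's DataFrame untouched
    numeric_cols = cleaned.select_dtypes(include="number").columns
    for col in numeric_cols:
        if strategy == "mean":
            fill_value = cleaned[col].mean()
        elif strategy == "median":
            fill_value = cleaned[col].median()
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
        # Assign the filled column back instead of relying on inplace=True
        cleaned[col] = cleaned[col].fillna(fill_value)
    # Gaps in non-numeric columns cannot be averaged, so drop those rows
    cleaned = cleaned.dropna()
    return cleaned.drop_duplicates().reset_index(drop=True)


# Made-up sample data: one duplicate row, one missing temp, one missing city
sample = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Rome", None, "Lima"],
    "temp": [3.0, 3.0, None, 20.0, 18.0],
})
result = clean_frame(sample)
print(result)
```

With this sample, the missing temp is filled with the column median, the row without a city is dropped, and the repeated Oslo row is collapsed, leaving three rows. Working on a copy keeps the original frame available for the before/after shape comparison used in the summary step.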