Prompt Details
Model
Chat (GPT-5.4)
Token size
184
Example input
[Industry or Niche]: E-commerce Sales Data
[Missing Value Strategy]: Impute with Median
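The [Missing Value Strategy] input decides how gaps get filled. The sketch below shows roughly how that setting might map onto pandas calls. It dispatches on hypothetical strategy names ("median", "mean", "drop"). These names and the `impute` helper are assumptions for illustration, not values the prompt is documented to accept.

```python
import numpy as np
import pandas as pd

def impute(df: pd.DataFrame, strategy: str) -> pd.DataFrame:
    """Fill or drop missing numeric values (hypothetical strategy names)."""
    out = df.copy()  # leave the caller's DataFrame untouched
    num = out.select_dtypes(include=[np.number]).columns
    if strategy == "median":
        out[num] = out[num].fillna(out[num].median())
    elif strategy == "mean":
        out[num] = out[num].fillna(out[num].mean())
    elif strategy == "drop":
        # Drop any row with a missing value in a numeric column
        out = out.dropna(subset=num)
    else:
        raise ValueError(f"Unknown strategy: {strategy!r}")
    return out

# Made-up sample: one missing price and one very large price
sample = pd.DataFrame({"price": [10.0, np.nan, 30.0, 1000.0], "sku": ["a", "b", "c", "d"]})
print(impute(sample, "median")["price"].tolist())
```

On this sample the median fill inserts 30.0. The mean fill inserts about 346.67 because the 1000.0 value pulls it upward. This robustness is one reason median imputation suits skewed sales figures.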
Example output
import pandas as pd
import numpy as np

# 1. Data Cleaning
def clean_ecommerce_data(df):
    df = df.copy()  # avoid mutating the caller's DataFrame
    # Remove exact duplicates first, so imputed values cannot create new duplicates
    df.drop_duplicates(inplace=True)
    # Handle missing values: impute each numeric column with its median
    numeric_cols = df.select_dtypes(include=[np.number]).columns
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
    # Standardize dates to ISO 8601 in UTC; naive timestamps are assumed to be UTC,
    # and unparseable values become NaT instead of raising
    if 'purchase_date' in df.columns:
        dates = pd.to_datetime(df['purchase_date'], errors='coerce', utc=True)
        df['purchase_date'] = dates.dt.strftime('%Y-%m-%dT%H:%M:%SZ')
    return df

# 2. Outlier Detection (IQR Method)
def remove_outliers(df):
    numeric_cols = df.select_dtypes(include=[np.number]).columns
    Q1 = df[numeric_cols].quantile(0.25)
    Q3 = df[numeric_cols].quantile(0.75)
    IQR = Q3 - Q1
    # Flag a row if any numeric value lies outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
    is_outlier = ((df[numeric_cols] < (Q1 - 1.5 * IQR)) | (df[numeric_cols] > (Q3 + 1.5 * IQR))).any(axis=1)
    return df[~is_outlier].copy()

# 3. Reporting
def summarize(df):
    # Mean, median, and standard deviation for each numeric column
    return df.describe().T[['mean', '50%', 'std']].rename(columns={'50%': 'median'})

def run_pipeline(df):
    df_cleaned = remove_outliers(clean_ecommerce_data(df))
    print(summarize(df_cleaned))
    print("Data pipeline executed successfully. Median imputation and IQR outlier filtering applied.")
    return df_cleaned
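To see which rows the IQR rule in step 2 keeps, here is a small self-contained sketch on made-up order totals. The data and the `order_total` column name are illustrative assumptions. Rows outside the fences [Q1 - 1.5·IQR, Q3 + 1.5·IQR] are dropped, and pandas' default linear interpolation is used for the quartiles.

```python
import pandas as pd

# Illustrative order totals; 950.0 is a deliberate outlier
orders = pd.DataFrame({"order_total": [20.0, 22.0, 25.0, 27.0, 30.0, 950.0]})

q1 = orders["order_total"].quantile(0.25)
q3 = orders["order_total"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Keep rows inside the fences (inclusive), the same test remove_outliers applies
kept = orders[orders["order_total"].between(lower, upper)]
print(f"Q1={q1}, Q3={q3}, IQR={iqr}, fences=({lower}, {upper})")
print(kept["order_total"].tolist())
```

Here Q1 = 22.75 and Q3 = 29.25, so the fences are 13.0 and 39.0, and only the 950.0 order is dropped. In `remove_outliers` the same check runs per numeric column. A row is removed if any one of its values falls outside that column's fences.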
Instantly transform messy CSV data into clean datasets. This prompt has the model act as a Senior Data Analyst and write production-oriented Python (pandas/NumPy) code. The code handles missing values, detects outliers, normalizes formats, and generates statistical summaries tailored to your industry. It is meant to save hours of manual data wrangling.
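The format-normalization and summary steps described above can be sketched on a tiny made-up table. This sketch assumes a `purchase_date` column, as in the example output. It also treats timezone-naive timestamps as UTC, an assumption you would need to confirm for real data. Passing `errors="coerce"` turns unparseable dates into missing values rather than raising an error.

```python
import pandas as pd

# Made-up rows; the third date is deliberately malformed
df = pd.DataFrame({
    "purchase_date": ["2024-03-01 14:05:00", "2024-03-02 09:30:00", "not a date"],
    "revenue": [120.0, 80.0, 100.0],
})

# Parse as UTC (naive values assumed UTC); bad strings become NaT
dates = pd.to_datetime(df["purchase_date"], errors="coerce", utc=True)
# ISO 8601 with a literal "Z" is accurate here because the values are UTC
df["purchase_date"] = dates.dt.strftime("%Y-%m-%dT%H:%M:%SZ")

# Mean / median / standard deviation per numeric column
summary = df.describe().T[["mean", "50%", "std"]].rename(columns={"50%": "median"})
print(df["purchase_date"].tolist())
print(summary)
```

The malformed date comes out as a missing value, so you can count or inspect it afterwards instead of losing the whole run to a parsing error.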
Added 1 week ago
