Prompt Details
Model
Chat - GPT-3.5 (gpt-3.5-turbo)
Token size
59 ($0.00010 / call)
Example input
I want you to act as a data analyst. I have a dataset of [name of dataset]. My dataset has the following columns: [name of column 1], [name of column 2], [name of column 3], etc.
Please write Python code to clean the data by removing missing values, duplicates, and outliers.
Example output
import pandas as pd
import numpy as np
# Load the dataset
df = pd.read_csv('reference_channels.csv')
# Check for missing values
print('Missing values before cleaning:\n', df.isnull().sum())
# Drop missing values
df.dropna(inplace=True)
# Check for duplicates
print('Number of duplicates before cleaning:', df.duplicated().sum())
# Drop duplicates
df.drop_duplicates(inplace=True)
# Identify and remove outliers using z-score
outliers = np.abs(df['MKT_TAG'] - df['MKT_TAG'].mean()) / df['MKT_TAG'].std() > 3
df = df[~outliers]
# Check for missing values after cleaning
print('Missing values after cleaning:\n', df.isnull().sum())
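The example output above checks a single column (MKT_TAG) for outliers. As a rough sketch only, the same three steps could be wrapped in a reusable function that applies the z-score rule to every numeric column. The function name clean_dataframe, the z_threshold parameter, and the sample data are illustrative assumptions, not part of the original prompt output.

```python
import numpy as np
import pandas as pd


def clean_dataframe(df: pd.DataFrame, z_threshold: float = 3.0) -> pd.DataFrame:
    """Hypothetical helper: drop missing rows, duplicate rows, and z-score outliers."""
    # Steps 1 and 2: remove rows with missing values, then exact duplicate rows
    cleaned = df.dropna().drop_duplicates()

    # Step 3: compute absolute z-scores for every numeric column
    numeric = cleaned.select_dtypes(include="number")
    # A zero standard deviation would divide by zero, so mark it as NaN instead
    std = numeric.std().replace(0, np.nan)
    z_scores = ((numeric - numeric.mean()) / std).abs()

    # Comparisons against NaN are False, so constant columns never flag a row
    keep = ~(z_scores > z_threshold).any(axis=1)
    return cleaned[keep].reset_index(drop=True)


# Small made-up dataset: 20 ordinary rows, one extreme value,
# one duplicate row, and one row with a missing value
rows = [{"label": f"r{i}", "value": float(i)} for i in range(20)]
rows.append({"label": "big", "value": 1000.0})
rows.append({"label": "r3", "value": 3.0})
rows.append({"label": "missing", "value": np.nan})
sample = pd.DataFrame(rows)

result = clean_dataframe(sample)
print("Rows before cleaning:", len(sample))
print("Rows after cleaning:", len(result))
```

Note that a threshold of 3 can only flag values in reasonably large samples, since the largest possible z-score in a sample of n rows is (n - 1) / sqrt(n). For small or heavily skewed datasets, a different rule such as the interquartile range may be a better fit.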