PromptBase
🤖 GPT

Python Data Cleaning

5.0 stars (1 review)
39 views
GPT-3.5-TURBO
Tested
Tips
1. 📚 Read the Book: Load messy potion book (dataset)
2. 🕵️ Explore Chaos: Check types of info on each page (column)
3. 🔄 Magical Conversions: Convert strange numbers and dates
4. 🧾 Categorize Ingredients: Assign numbers to ingredients (Label Encoding)
5. 🗝️ Decode Spells: Translate codes using one-hot encoding
6. 🕰️ Time Traveling: Extract and understand date information
7. 📊 Present Cleaned Book: Display organized pages (cleaned dataset)
8. 💾 Save Tidy Book: Export cleaned book (CSV file)
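
Steps 4 and 5 name two different encodings, and the difference is easy to miss. The sketch below contrasts them on a tiny in-memory column. The `ingredient` column and its values are invented for illustration, and it uses pandas category codes as a stand-in for label encoding rather than the scikit-learn `LabelEncoder` shown in the example output.

```python
import pandas as pd

# Hypothetical ingredient column, invented for illustration only.
df = pd.DataFrame({"ingredient": ["newt", "sage", "newt", "moss"]})

# Label encoding: one integer per distinct value.
# Category codes follow the sorted order of the distinct values.
label_codes = df["ingredient"].astype("category").cat.codes

# One-hot encoding: one 0/1 indicator column per distinct value.
one_hot = pd.get_dummies(df["ingredient"], prefix="ingredient", dtype=int)

print(label_codes.tolist())
print(one_hot)
```

Label encoding keeps a single column but implies an ordering between values, while one-hot encoding avoids that at the cost of one extra column per distinct value.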
$3.99
After purchasing, you will gain access to the prompt file which you can use with GPT or the app builder. You'll receive 20 free generation credits with this purchase. By purchasing this prompt, you agree to our terms of service.
Over 1 month ago

Prompt Details

Model
Chat - GPT-3.5 (gpt-3.5-turbo)
Token size
119 ($0.00010 / call)
Example input
**[Dataset Type]:** Tabular Datasets

**[Details]:**
- File: [your_dataset.csv]
- Columns: Numeric, Date, Text, ...

**[Expectations]:**
- Python script for data cleaning.
- Handle conversions, categorical variables, text encoding, time-related features using pandas.
Example output
Certainly! Below is an extended Python script that focuses on refining data cleaning skills, with an emphasis on converting data types and addressing advanced considerations. The script includes comments to explain each step and provides insights and techniques for handling different scenarios such as dealing with categorical variables, encoding text data, and transforming time-related features. The example uses the popular pandas library for data manipulation and cleaning.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Load your dataset (replace 'your_dataset.csv' with your actual file)
df = pd.read_csv('your_dataset.csv')

# Display the initial state of the dataset
print("Initial Dataset:")
print(df.head())

# Explore the data types of each column
print("\nData Types:")
print(df.dtypes)

# -----------------------------------------------------------------------------
# Data Type Conversion
# -----------------------------------------------------------------------------
# Convert specific columns to desired data types; values that cannot be
# parsed become NaN / NaT because of errors='coerce'
df['NumericColumn'] = pd.to_numeric(df['NumericColumn'], errors='coerce')
df['DateColumn'] = pd.to_datetime(df['DateColumn'], errors='coerce')

# -----------------------------------------------------------------------------
# Handling Categorical Variables
# -----------------------------------------------------------------------------
# Identify categorical columns, leaving out the text column that is one-hot
# encoded below so that it is not label-encoded first
text_column = 'TextColumn'
categorical_columns = (
    df.select_dtypes(include=['object']).columns.drop(text_column, errors='ignore')
)

# Use Label Encoding for categorical columns. A fresh encoder is fitted per
# column, and values are cast to str so missing values do not break the fit.
for col in categorical_columns:
    df[col] = LabelEncoder().fit_transform(df[col].astype(str))

# -----------------------------------------------------------------------------
# Encoding Text Data
# -----------------------------------------------------------------------------
# Example: One-Hot Encoding for a specific text column
df = pd.get_dummies(df, columns=[text_column], prefix=text_column)

# -----------------------------------------------------------------------------
# Transforming Time-Related Features
# -----------------------------------------------------------------------------
# Extract features from DateColumn
df['Year'] = df['DateColumn'].dt.year
df['Month'] = df['DateColumn'].dt.month
df['Day'] = df['DateColumn'].dt.day
df['Weekday'] = df['DateColumn'].dt.weekday

# Drop the original DateColumn if needed
df = df.drop('DateColumn', axis=1)

# -----------------------------------------------------------------------------
# Display the cleaned dataset
# -----------------------------------------------------------------------------
print("\nCleaned Dataset:")
print(df.head())

# Save the cleaned dataset to a new CSV file
df.to_csv('cleaned_dataset.csv', index=False)
print("\nCleaned dataset saved to 'cleaned_dataset.csv'")
```

Make sure to replace `'your_dataset.csv'` with the actual path or URL of your dataset file. Additionally, adjust column names and data types according to your dataset. This script covers basic data type conversions, handling categorical variables using Label Encoding, one-hot encoding text data, and transforming time-related features. Feel free to customize it further based on your specific dataset and cleaning requirements.
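
The script above needs a CSV file to run, so the effect of `errors='coerce'` is hard to see from the listing alone. The following is a minimal sketch on an in-memory frame. The rows are invented for illustration, the column names simply reuse the script's placeholders, and the explicit `format` argument is an assumption about how the dates are written.

```python
import pandas as pd

# Hypothetical messy rows, invented for illustration only.
df = pd.DataFrame({
    "NumericColumn": ["10", "3.5", "n/a"],
    "DateColumn": ["2024-01-08", "not a date", "2023-12-25"],
})

# Unparseable values become NaN / NaT instead of raising an error.
df["NumericColumn"] = pd.to_numeric(df["NumericColumn"], errors="coerce")
df["DateColumn"] = pd.to_datetime(
    df["DateColumn"], format="%Y-%m-%d", errors="coerce"
)

# Count how many values in each column could not be parsed.
unparsed = df[["NumericColumn", "DateColumn"]].isna().sum()
print(unparsed)

# Date parts are extracted through the .dt accessor (Monday is weekday 0).
df["Year"] = df["DateColumn"].dt.year
df["Weekday"] = df["DateColumn"].dt.weekday
print(df)
```

Coercion hides bad values rather than removing them: the row with the unparsed date carries NaN in `Year` and `Weekday`, so it is worth checking the `unparsed` counts before exporting the cleaned file.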
Reviews for this prompt (1)
- Jan 8, 2024, 5.0 stars, Verified Purchase: "Good"