Expert and Professional Data Analyst GPT

5.0

1 review

Explore the power of data analysis with this prompt template. Whether you're a seasoned data scientist or new to working with data, it provides a step-by-step roadmap for your analysis: uncover insights, handle missing data, deal with outliers, and visualize your findings. You can tailor it to your specific dataset and analysis goals.

$6.99

After purchasing, you will gain access to the prompt file which you can use with GPT or the app builder. You'll receive 20 free generation credits with this purchase. By purchasing this prompt, you agree to our terms of service.

Over 1 month ago

Prompt Details

Model

Chat - GPT-4 (gpt-4)

Token size

123 ($0.00370 / call)
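As a rough illustration of what the token-size figure implies, the sketch below derives a per-token rate from the two numbers in the listing and scales it to many calls. It assumes the quoted cost covers only the 123 prompt tokens and that the rate is linear in tokens; the tokens generated in each response would add to the total. The names `TOKENS_PER_CALL`, `COST_PER_CALL_USD`, and `estimate_cost` are hypothetical.

```python
# Figures copied from the listing above.
TOKENS_PER_CALL = 123
COST_PER_CALL_USD = 0.00370

# Implied rate, assuming cost scales linearly with token count.
cost_per_token = COST_PER_CALL_USD / TOKENS_PER_CALL
cost_per_1k_tokens = cost_per_token * 1000


def estimate_cost(calls: int) -> float:
    """Estimate the prompt-side cost in USD for a number of calls."""
    return calls * COST_PER_CALL_USD


print(f"Implied rate: ${cost_per_1k_tokens:.4f} per 1,000 tokens")
print(f"Prompt cost for 1,000 calls: ${estimate_cost(1000):.2f}")
```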

Example input

[Your Dataset] [Specific Analysis Task] [Desired Outcome]
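The three bracketed placeholders are meant to be replaced with your own details before the prompt is sent. A minimal sketch of that substitution follows. The listing does not show the prompt text itself, so `TEMPLATE` here is a made-up stand-in, and `fill_prompt` and the sample values are hypothetical.

```python
# Hypothetical template: the real prompt text is not shown in the listing.
TEMPLATE = (
    "Act as an expert data analyst. Dataset: [Your Dataset]. "
    "Task: [Specific Analysis Task]. Desired outcome: [Desired Outcome]."
)


def fill_prompt(template: str, values: dict) -> str:
    """Replace each [Placeholder] with its value; fail loudly if one is absent."""
    filled = template
    for name, value in values.items():
        token = f"[{name}]"
        if token not in filled:
            raise KeyError(f"placeholder {token} not found in template")
        filled = filled.replace(token, value)
    return filled


prompt = fill_prompt(
    TEMPLATE,
    {
        "Your Dataset": "Titanic passenger list",
        "Specific Analysis Task": "predict passenger survival",
        "Desired Outcome": "a ranked list of the most predictive features",
    },
)
print(prompt)
```

Raising an error for an unknown placeholder is a deliberate choice in this sketch: a misspelled name would otherwise leave a bracketed token in the prompt unnoticed.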

Example output

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

# Load the Titanic dataset
data_url = "https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv"
df = pd.read_csv(data_url)

# Describe key features and variables
# df.info() prints its report directly and returns None, so it is not wrapped in print()
df.info()
summary_stats = df.describe()

# Identify missing values and outliers
missing_values = df.isnull().sum()
df[["Age", "Fare"]].boxplot()  # points beyond the whiskers are candidate outliers
plt.show()

# Data preprocessing
# Impute missing values for Age and Fare with the column median
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Fare"] = df["Fare"].fillna(df["Fare"].median())

# Encode categorical variables (e.g., 'Sex' and 'Embarked') as numerical.
# Copies of the Titanic data differ in their columns, so encode only those present.
categorical_cols = [col for col in ["Sex", "Embarked"] if col in df.columns]
df = pd.get_dummies(df, columns=categorical_cols, drop_first=True)

# Split the data into training and testing sets.
# Free-text columns (such as names) cannot be passed to the model,
# so keep only numeric and boolean features.
X = df.drop("Survived", axis=1).select_dtypes(include=["number", "bool"])
y = df["Survived"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Build a Random Forest classifier for survival prediction
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
rf_classifier.fit(X_train, y_train)
y_pred = rf_classifier.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
classification_report_result = classification_report(y_test, y_pred)

# Visualize data
sns.pairplot(df, hue="Survived")
plt.show()

# Interpret results and provide recommendations
# The Random Forest model achieved an accuracy of [accuracy] on the test set.
# Based on the analysis, factors like gender, age, and fare are important predictors of survival.
# Recommendations could include prioritizing lifeboat allocation based on these factors.

print(summary_stats)
print(missing_values)
print("Accuracy:", accuracy)
print(classification_report_result)

Example Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
...
dtypes: ...
memory usage: ...

       PassengerId     Survived       Pclass          Age        SibSp        Parch         Fare
count   891.000000   891.000000   891.000000   714.000000   891.000000   891.000000   891.000000
mean    446.000000     0.383838     2.308642    29.699118     0.523008     0.381594    32.204208
std     257.353842     0.486592     0.836071    14.526497     1.102743     0.806057    49.693429
min       1.000000     0.000000     1.000000     0.420000     0.000000     0.000000     0.000000
25%     223.500000     0.000000     2.000000    20.125000     0.000000     0.000000     7.910400
50%     446.000000     0.000000     3.000000    28.000000     0.000000     0.000000    14.454200
75%     668.500000     1.000000     3.000000    38.000000     1.000000     0.000000    31.000000
max     891.000000     1.000000     3.000000    80.000000     8.000000     6.000000   512.329200

PassengerId    0
Survived       0
Pclass         0
...
dtype: int64

Accuracy: 0.8268156424581006

              precision    recall  f1-score   support

           0       0.84      0.88      0.86       105
           1       0.80      0.74      0.77        74

    accuracy                           0.83       179
   macro avg       0.82      0.81      0.82       179
weighted avg       0.83      0.83      0.83       179

In this example, we performed data analysis on the Titanic Survival Dataset, including data preprocessing, machine learning modeling, and data visualization. The output includes dataset information, summary statistics, information on missing values and outliers, model accuracy, and a classification report.
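The example identifies outliers only visually, through a boxplot of Age and Fare. One common way to turn that into numbers is the 1.5 × IQR rule, which is also the default whisker length in matplotlib boxplots. The sketch below is an illustration of that rule, not part of the purchased prompt's output. It uses only the standard library and made-up sample fares so it runs without downloading the dataset, and `iqr_outliers` is a hypothetical helper name.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # method="inclusive" interpolates linearly between data points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up fares for illustration only; not taken from the dataset.
sample_fares = [7.25, 8.05, 13.0, 26.0, 31.0, 512.33]
flagged = iqr_outliers(sample_fares)
print(f"{len(flagged)} of {len(sample_fares)} values flagged:", flagged)
```

With a DataFrame, the same bounds could be computed per column from `df["Fare"].quantile(0.25)` and `df["Fare"].quantile(0.75)`. Whether flagged values should be capped, dropped, or kept depends on the analysis goal.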
