Data Science Project Generator

Name: Data Science Project Generator GPT Prompt
Brand: PromptBase
Price: 4.99 USD
Availability: InStock
Author: basemanprompt

GPT-4O

4 examples

Data Science Project Blueprint Generator

After purchasing, you will gain access to the prompt file which you can use with GPT or the app builder. You'll receive 20 free generation credits with this purchase. By purchasing this prompt, you agree to our terms of service.

Over 1 month ago

Prompt Details

Model

Chat - GPT-4o (gpt-4o)

Token size

75 ($0.00375 / call)

Example input

[specific problem or research question]: customer churn prediction
[specific industry or target group]: telecommunication companies

Example output

### Project Blueprint: Customer Churn Prediction for Telecommunication Companies

#### **1. Objectives**

- **Primary Objective:** Accurately predict customer churn to enable proactive retention strategies.
- **Secondary Objectives:**
  - Identify key factors contributing to churn.
  - Segment customers based on churn risk and associated attributes.
  - Develop actionable recommendations to reduce churn and improve customer satisfaction.

---

#### **2. Data Sources**

- **Customer Data:**
  - Demographic details (age, gender, location, etc.).
  - Contract information (type, duration, payment method).
  - Account status (tenure, billing history, overdue payments).
- **Service Usage Data:**
  - Call data records (duration, frequency, types of calls).
  - Internet usage (bandwidth, data consumption patterns).
  - Subscription details (plans, upgrades, or downgrades).
- **Customer Interaction Data:**
  - Complaint records.
  - Customer support tickets and resolution times.
  - Satisfaction survey responses.
- **External Data (Optional):**
  - Market trends and competition analysis.
  - Social media sentiment analysis related to the company.

---

#### **3. Recommended Preprocessing Steps**

1. **Data Cleaning:**
   - Handle missing values using imputation techniques (mean/median/mode for numerical, most frequent category for categorical).
   - Remove duplicate or inconsistent records.
   - Normalize inconsistent formats (e.g., date formats).
2. **Feature Engineering:**
   - Create derived metrics such as average revenue per user (ARPU), churn score, etc.
   - Encode categorical variables using one-hot encoding or label encoding.
   - Bin continuous variables into meaningful groups if necessary (e.g., age brackets, tenure buckets).
3. **Outlier Detection:**
   - Use boxplots or z-score methods to identify outliers and decide on treatment strategies (cap, remove, or transform).
4. **Data Balancing:**
   - Address class imbalance using techniques like SMOTE (Synthetic Minority Oversampling Technique) or cost-sensitive learning.
5. **Scaling:**
   - Standardize or normalize numerical features for algorithms sensitive to feature scales (e.g., Logistic Regression, SVM).

---

#### **4. Key Analysis Methods**

1. **Exploratory Data Analysis (EDA):**
   - Visualize customer distributions, correlations, and patterns using tools like Matplotlib, Seaborn, or Tableau.
   - Identify trends in churn rates by demographics, service usage, and other attributes.
2. **Feature Selection:**
   - Use statistical methods (e.g., Chi-square test, ANOVA) and feature importance techniques (e.g., from Random Forest, SHAP values).
3. **Predictive Modeling:**
   - Train machine learning models such as:
     - Logistic Regression (baseline model for interpretability).
     - Decision Trees or Random Forests.
     - Gradient Boosting Machines (XGBoost, LightGBM, or CatBoost).
     - Neural Networks (for complex, high-dimensional data).
4. **Model Evaluation:**
   - Evaluate model performance using metrics like accuracy, precision, recall, F1-score, and ROC-AUC.
   - Perform cross-validation to ensure robustness and avoid overfitting.
5. **Churn Risk Segmentation:**
   - Use clustering algorithms (e.g., K-Means, DBSCAN) to group customers based on risk levels and contributing factors.

---

#### **5. Tools and Technologies**

- **Programming Languages:**
  - Python (libraries: Pandas, NumPy, Scikit-learn, TensorFlow, Keras, XGBoost, SHAP).
  - R (for statistical analysis and visualization).
- **Data Visualization:**
  - Tableau, Power BI, or Python (Matplotlib, Seaborn, Plotly).
- **Big Data Processing (if applicable):**
  - Apache Spark or Hadoop for large-scale datasets.
- **Cloud Platforms:**
  - AWS (SageMaker), Google Cloud (Vertex AI), or Microsoft Azure (ML Studio).
- **Version Control and Collaboration:**
  - Git, GitHub, or GitLab.

---

#### **6. Presentation and Insights Communication**

1. **Executive Summary:**
   - Present key findings in simple terms, focusing on high-impact insights.
   - Highlight actionable recommendations to reduce churn.
2. **Data Visualization:**
   - Use dashboards to present churn trends, segment analysis, and predictive model outputs interactively.
   - Include visual explanations for model interpretability (e.g., SHAP plots or LIME results).
3. **Customer Profiles:**
   - Show representative profiles for high-risk churn customers with recommended actions.
4. **Impact Simulation:**
   - Provide "what-if" analysis to demonstrate the potential business impact of implementing retention strategies.
5. **Technical Report (Optional):**
   - Share detailed methodology, data preprocessing steps, and model performance metrics for technical stakeholders.

---

This blueprint provides a structured approach to developing a churn prediction system for telecommunications. Each step ensures a balance between technical rigor and practical business application.
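
To make the preprocessing and baseline-modeling steps of the blueprint concrete, here is a minimal Python sketch using scikit-learn. It is an illustration under stated assumptions, not part of the prompt's output: the data is synthetic, the column names (`tenure_months`, `monthly_charge`, `contract_type`, `churned`) and the helper functions are hypothetical, and class imbalance is handled with cost-sensitive learning (`class_weight="balanced"`) rather than SMOTE.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names, chosen only for this illustration.
NUMERIC_COLS = ["tenure_months", "monthly_charge"]
CATEGORICAL_COLS = ["contract_type"]


def make_synthetic_churn(n_rows=2000, seed=0):
    """Build a toy, imbalanced churn table (synthetic, not real telecom data)."""
    rng = np.random.default_rng(seed)
    tenure = rng.integers(1, 73, size=n_rows).astype(float)
    charge = rng.normal(65.0, 20.0, size=n_rows)
    contract = rng.choice(["monthly", "one_year", "two_year"], size=n_rows)
    # Assumed relationship: short tenure and monthly contracts raise churn odds.
    logit = -1.5 - 0.04 * tenure + 0.02 * (charge - 65.0) + 1.2 * (contract == "monthly")
    churned = rng.random(n_rows) < 1.0 / (1.0 + np.exp(-logit))
    df = pd.DataFrame({
        "tenure_months": tenure,
        "monthly_charge": charge,
        "contract_type": contract,
        "churned": churned.astype(int),
    })
    # Blank out about 5% of charges so the imputation step has work to do.
    df.loc[rng.random(n_rows) < 0.05, "monthly_charge"] = np.nan
    return df


def build_pipeline():
    """Median imputation + scaling, one-hot encoding, then logistic regression."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, NUMERIC_COLS),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL_COLS),
    ])
    # class_weight="balanced" is the cost-sensitive alternative to oversampling.
    model = LogisticRegression(class_weight="balanced", max_iter=1000)
    return Pipeline([("preprocess", preprocess), ("model", model)])


def evaluate_baseline(df, n_splits=5, seed=0):
    """Return per-fold ROC-AUC scores from stratified cross-validation."""
    X = df[NUMERIC_COLS + CATEGORICAL_COLS]
    y = df["churned"]
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_score(build_pipeline(), X, y, cv=cv, scoring="roc_auc")


if __name__ == "__main__":
    scores = evaluate_baseline(make_synthetic_churn())
    print(f"Mean ROC-AUC over {len(scores)} folds: {scores.mean():.3f}")
```

Keeping imputation, encoding, and scaling inside the `Pipeline` means each cross-validation fold fits those steps on its training split only, which avoids leaking information from the held-out data into the evaluation.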