Certainly! Below is a Python code snippet that demonstrates how to train a Random Forest model using the `scikit-learn` library. This example includes data preprocessing, training, and evaluation of the model. For simplicity, let's use the well-known Iris dataset.
```python
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
from sklearn.datasets import load_iris
# Load dataset
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target)
# Split dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
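# Standardize the features. The scaler is fit on the training set only and then
# applied to the test set, so no test-set statistics leak into training.
# (Random Forests do not require feature scaling; this step is optional here.)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)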
# Train the Random Forest model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train_scaled, y_train)
# Predict on the test set
y_pred = model.predict(X_test_scaled)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred, target_names=iris.target_names)
# Print evaluation metrics
print(f'Accuracy: {accuracy:.4f}')
print('Confusion Matrix:')
print(conf_matrix)
print('Classification Report:')
print(class_report)
# Feature importance
feature_importances = pd.DataFrame(
    model.feature_importances_,
    index=iris.feature_names,
    columns=['Importance'],
).sort_values('Importance', ascending=False)
print('Feature Importances:')
print(feature_importances)
```
### Explanation:
1. **Imports and Data Loading**:
- Imports `numpy`, `pandas`, and the required `scikit-learn` modules.
- Loads the Iris dataset (150 samples, 4 numeric features, 3 classes) into a pandas DataFrame `X` of features and a Series `y` of class labels.
2. **Data Splitting**:
- Splits the data into training (70%) and test (30%) sets. `stratify=y` keeps the class proportions of each split equal to those of the full dataset, and `random_state=42` makes the split reproducible.
3. **Data Preprocessing**:
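- Standardizes each feature to zero mean and unit variance with `StandardScaler`. The scaler is fit on the training set only and then applied to the test set, which avoids leaking test-set statistics into training. Random Forests do not actually need scaled features, since each split compares a single feature against a threshold; the step is included to show the general pattern.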
4. **Model Training**:
- Initializes a `RandomForestClassifier` with 100 trees (`n_estimators=100`) and a fixed `random_state`, then trains it on the scaled training data.
5. **Model Prediction and Evaluation**:
- Predicts the labels for the test set.
- Calculates and prints the accuracy, confusion matrix, and classification report.
- Displays the impurity-based feature importances (`feature_importances_`), which indicate how much each feature contributed to the forest's splits.
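A quick way to confirm what `stratify=y` does in step 2 is to compare class proportions in the full dataset and in each split. This sketch repeats the split settings from the main snippet. Because Iris has 50 samples per class, the stratified 30% test split should hold 15 samples of each class and the training split 35, so every proportion in the table comes out to one third.

```python
# Check that stratify=y preserves the class proportions of the full dataset
# in both the training and test splits (same split settings as the main snippet).
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fraction of samples belonging to each class, per split
proportions = pd.DataFrame({
    'full': y.value_counts(normalize=True).sort_index(),
    'train': y_train.value_counts(normalize=True).sort_index(),
    'test': y_test.value_counts(normalize=True).sort_index(),
})
print(proportions)
```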
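On the scaling in step 3: a Random Forest splits on thresholds of one feature at a time, and a per-feature affine transform such as `StandardScaler` preserves the ordering of values within each feature, so the trees can find equivalent splits with or without it. The sketch below uses the same split and hyperparameters as the main snippet, trains one forest on raw features and one on standardized features, and measures how often their test predictions agree. You should expect agreement on all, or nearly all, test samples.

```python
# Compare a Random Forest trained on raw features with one trained on
# standardized features; tree splits are per-feature thresholds, so the
# predictions should (almost) always agree.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Model trained on raw (unscaled) features
raw_model = RandomForestClassifier(n_estimators=100, random_state=42)
raw_model.fit(X_train, y_train)
raw_pred = raw_model.predict(X_test)

# Model trained on standardized features (scaler fit on the training set only)
scaler = StandardScaler().fit(X_train)
scaled_model = RandomForestClassifier(n_estimators=100, random_state=42)
scaled_model.fit(scaler.transform(X_train), y_train)
scaled_pred = scaled_model.predict(scaler.transform(X_test))

# Fraction of test samples on which the two models predict the same class
agreement = np.mean(raw_pred == scaled_pred)
print(f'Prediction agreement: {agreement:.2%}')
```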
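The accuracy and the confusion matrix computed in step 5 are directly related. Rows of scikit-learn's confusion matrix are true classes and columns are predicted classes, so correct predictions lie on the diagonal and accuracy equals the trace divided by the total count. The labels below are a small made-up example, not output from the Iris model.

```python
# Show that accuracy can be read off a confusion matrix as trace / total.
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels for illustration only (not Iris predictions)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)  # [[1 1 0]
           #  [0 2 0]
           #  [1 0 1]]

# Diagonal entries are the correctly classified samples
accuracy_from_cm = np.trace(cm) / cm.sum()
print(accuracy_from_cm, accuracy_score(y_true, y_pred))
```

Here 4 of the 6 samples sit on the diagonal, so both values equal 4/6.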
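`feature_importances_` on a `RandomForestClassifier` is impurity-based (mean decrease in impurity) and normalized to sum to 1. The scikit-learn documentation warns that these importances are derived from training data and can be misleading for high-cardinality features, and points to permutation importance as an alternative. The sketch below is a swapped-in complement, not part of the original snippet: it compares the impurity-based values with `sklearn.inspection.permutation_importance` computed on the held-out test set, and trains on unscaled features for brevity.

```python
# Compare impurity-based (MDI) importances with permutation importances
# measured on the test set.
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Impurity-based importances, normalized so they sum to 1
mdi = pd.Series(model.feature_importances_, index=iris.feature_names)

# Permutation importances: mean drop in test accuracy when one feature's
# values are shuffled, averaged over n_repeats shuffles
perm = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)
perm_mean = pd.Series(perm.importances_mean, index=iris.feature_names)

comparison = pd.DataFrame({'MDI': mdi, 'Permutation': perm_mean})
print(comparison.sort_values('MDI', ascending=False))
```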
This snippet is a solid starting point for training and evaluating a Random Forest classifier: it holds out a stratified test set, fits preprocessing on the training data only, and fixes `random_state` so results are reproducible.
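If you want the scaler and the model to travel together, scikit-learn's `Pipeline` (built here with `make_pipeline`) fits each preprocessing step only on the data passed to `.fit()` and reuses those fitted statistics in `.predict()`. That makes the train-only scaling from the main snippet harder to get wrong, for example when the pipeline is later handed to cross-validation utilities. A minimal sketch with the same split and hyperparameters:

```python
# Bundle scaling and the Random Forest into one estimator so the scaler is
# always fit on the training data passed to .fit().
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

pipeline = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=100, random_state=42),
)
pipeline.fit(X_train, y_train)      # scaler statistics come from X_train only
y_pred = pipeline.predict(X_test)   # X_test is transformed with those statistics

acc = accuracy_score(y_test, y_pred)
print(f'Accuracy: {acc:.4f}')
```

`make_pipeline` names each step after its lowercased class name, so the fitted forest is available as `pipeline.named_steps['randomforestclassifier']`, for example to read its `feature_importances_`.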