Example input
Here's how this code works:
First, it imports the necessary libraries for data manipulation, K-Means clustering, and visualization.
Next, it loads your dataset and selects the columns you want to use for clustering.
It then defines the number of clusters you want to form.
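The code performs K-Means clustering on your data using the KMeans class from scikit-learn, whose fit_predict() method returns a cluster label for each row.
Finally, it visualizes the resulting clusters and their centroids using the plt.scatter() function from matplotlib, plotting the first two selected features.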
Example output
# Import libraries
import pandas as pd
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
# Load the data
data = pd.read_csv('raw_data.csv')
# Select the columns you want to use for clustering
X = data[['Feature 1', 'Feature 2', 'Feature 3', 'Feature 4']].values
# Define the number of clusters you want to form
num_clusters = 3
# Perform K-Means clustering
kmeans = KMeans(n_clusters=num_clusters, init='k-means++', max_iter=300, n_init=10, random_state=0)
y_kmeans = kmeans.fit_predict(X)
# Visualize the clusters using the first two selected features
colors = ['red', 'blue', 'green']  # extend this list if num_clusters > 3
for i in range(num_clusters):
    mask = y_kmeans == i  # rows assigned to cluster i
    plt.scatter(X[mask, 0], X[mask, 1], s=100, c=colors[i % len(colors)], label=f'Cluster {i + 1}')
# Plot the centroids on the same two axes
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=300, c='yellow', label='Centroids')
plt.title('Clusters')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.show()
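For intuition about what the clustering step does, here is a minimal pure-Python sketch of the assign-and-update loop behind K-Means. It is not scikit-learn's implementation. The function name kmeans_sketch, the helper sq_dist, and the toy points are hypothetical. Initialization uses a simple deterministic farthest-first rule as a stand-in for k-means++, and there are no n_init restarts.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_sketch(points, k, max_iter=300):
    """Illustrative K-Means loop; returns (labels, centroids)."""
    # Assumption: farthest-first initialization, a deterministic
    # simplification (scikit-learn's default is k-means++).
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical toy data: two well-separated groups in 2-D.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centroids = kmeans_sketch(points, k=2)
print(labels)
print(centroids)
```

In the script above, kmeans.fit_predict(X) plays the role of the returned labels, and kmeans.cluster_centers_ plays the role of the returned centroids.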