DBSCAN clustering algorithm:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
# Generate sample data
np.random.seed(0)
X = np.concatenate([np.random.randn(50, 2), np.random.randn(50, 2) + np.array([10, 10])])
# Fit the DBSCAN model
dbscan = DBSCAN(eps=1, min_samples=10)
labels = dbscan.fit_predict(X)
# Plot the data points with different colors for each cluster
plt.scatter(X[:,0], X[:,1], c=labels)
# Add a title and axis labels
plt.title("DBSCAN Clustering")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
# Show the plot
plt.show()
In this example, we generate 100 data points with numpy's randn function: 50 points drawn from a standard normal distribution centered at the origin, and another 50 shifted by 10 units in both the x and y dimensions. We then cluster them with DBSCAN, with eps set to 1 and min_samples set to 10. The eps parameter is the maximum distance between two samples for one to count as being in the neighborhood of the other; it is not a bound on the distance between points within a cluster. The min_samples parameter is the number of samples (including the point itself) that must fall within a point's eps-neighborhood for that point to be a core point, that is, part of a dense region. The fit_predict method fits the model and returns the cluster label of each data point in one step; points that do not belong to any cluster are labeled -1 as noise. Finally, the data points are plotted with matplotlib's scatter function, with each point colored by its cluster label.
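The labels can also be inspected numerically rather than only by color. The following is a minimal sketch that reuses the same data and parameters; the variable names (n_clusters, n_noise, n_core, n_border) are illustrative choices, not part of the original example. It relies on scikit-learn's convention that noise points get the label -1 and that the fitted model exposes the indices of core points in core_sample_indices_.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Same synthetic data as in the example above
np.random.seed(0)
X = np.concatenate([np.random.randn(50, 2), np.random.randn(50, 2) + np.array([10, 10])])

# fit() stores the labels on the model; fit_predict() would return them directly
db = DBSCAN(eps=1, min_samples=10).fit(X)
labels = db.labels_

# Noise points carry the label -1, so exclude it when counting clusters
n_clusters = len(set(labels.tolist()) - {-1})
n_noise = int(np.sum(labels == -1))

# Core points have at least min_samples points within eps (themselves included);
# the remaining clustered points are border points
n_core = len(db.core_sample_indices_)
n_border = len(labels) - n_noise - n_core

print("clusters:", n_clusters)
print("noise points:", n_noise)
print("core points:", n_core)
print("border points:", n_border)
```

If many points come out as noise, increasing eps or lowering min_samples is the usual first adjustment to try.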
DBSCAN is a density-based clustering algorithm that can find clusters of arbitrary shape and flags points in low-density regions as noise. Because a single eps and min_samples apply to the whole dataset, it tends to struggle when clusters have very different densities. Unlike KMeans, DBSCAN does not require the user to specify the number of clusters; the number follows from the data and the chosen eps and min_samples.
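To make the contrast with KMeans concrete, here is a small sketch on two interleaved half-moons, a shape that centroid-based clustering typically cannot separate. The dataset (make_moons), the parameter values (eps=0.2, min_samples=5), and the use of the adjusted Rand index as an agreement score are all assumptions chosen for illustration; they are not taken from the example above.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaved half-moons; y_true records which moon each point came from
X_moons, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)

# DBSCAN is not told how many clusters to find
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X_moons)

# KMeans must be given the number of clusters up front
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_moons)

# Adjusted Rand index: 1.0 means perfect agreement with the true grouping
ari_dbscan = adjusted_rand_score(y_true, dbscan_labels)
ari_kmeans = adjusted_rand_score(y_true, kmeans_labels)

print("DBSCAN clusters found:", len(set(dbscan_labels.tolist()) - {-1}))
print("DBSCAN ARI:", round(ari_dbscan, 3))
print("KMeans ARI:", round(ari_kmeans, 3))
```

The eps value matters here: if it is too large the two moons merge into one cluster, and if it is too small each moon breaks into fragments or noise.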
Matplotlib: