Scikit-Learn for K-Means Clustering
Learning Objectives
- Gain a high-level understanding of what machine learning is, and when you should or shouldn't use it.
- Apply `scikit-learn` and other popular data science Python libraries to a data science project: K-means clustering on the Iris dataset.
- Develop proficiency in data analysis and data visualization.
Before we get started, here are some tips for beginners:
- We'll be covering a wide range of data science libraries. Pause and revisit concepts as needed. Repetition will help solidify understanding of topics.
- If you don't understand a specific topic, or are running into issues, there are many sources out there that can help you:
- You can read the documentation for a specific library or function.
- Googling concepts or visiting StackOverflow is another great way to learn about common issues you'll encounter.
- Finally, ChatGPT can be helpful for co-coding and getting started with ideas, but be wary of its answers.
- Don't be afraid to explore different solutions and make mistakes. There are many paths to the same destination.
1. What is machine learning?
Data is being generated and captured in every industry, from e-commerce websites to hospitals, because it contains information we can use to generate insights and make predictions about upcoming events. To leverage all this data, we can use machine learning and artificial intelligence.
Machine learning uses math to learn patterns in data and stores those patterns in a model. New data can then be fed into the model to produce predictions. Note that a machine learning model learns these patterns without any rules being hard-coded into it. This is powerful, because the model learns through examples and data rather than human intervention.
The Machine Learning Workflow

In a machine learning workflow, there are 6 general steps (a compact code sketch of the full loop follows the list):

1. **Get data** related to a particular subject area or problem. You can source it from a database, generate it, or receive it from someone else.
2. **Prepare your data.** When you first get your data, it is likely not in a format that computers can use for learning; you need to process it first.
3. **Create a model.** A model is a mathematical representation that learns patterns in data. Different types of models suit different learning tasks; you can use a known model architecture or design a custom model from first principles.
4. **Train your model.** Below are some of the kinds of learning tasks machines can generally do:
   - Supervised learning: predict numeric values or categories based on associated attributes.
   - Unsupervised learning: group similar objects together.
   - Semi-supervised learning: group data into labels (unsupervised) and make predictions (supervised).
   - Reinforcement learning: have a model interact with an environment and learn the best actions to take.
5. **Evaluate your model's performance.** With a new model, we need to see if it is good at its task. We usually hold out a subset of the data to validate and measure its performance.
6. **Use your model for something.** You've got this cool model that can make predictions. Use it to make a difference in the world!
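Here is that sketch: a minimal, hedged walkthrough of steps 1–6 with scikit-learn. It uses a simple supervised model purely for illustration (this notebook's project uses unsupervised K-means instead), and the dataset and model choices are illustrative, not prescriptive.

from sklearn.datasets import load_iris                     # step 1: get data
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler           # step 2: prepare data
from sklearn.linear_model import LogisticRegression        # step 3: create a model
from sklearn.metrics import accuracy_score                 # step 5: evaluate

X, y = load_iris(return_X_y=True)                                   # 1. get data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_train)                              # 2. prepare data
model = LogisticRegression(max_iter=1000)                           # 3. create a model
model.fit(scaler.transform(X_train), y_train)                       # 4. train
predictions = model.predict(scaler.transform(X_test))               # 5. evaluate...
print(accuracy_score(y_test, predictions))                          # ...and 6. use it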
When can you use machine learning?
Machine learning can save time, effort, and capital: you have a computer do the heavy lifting for tedious, repetitive, and mundane tasks instead of a human.
Below are some situations when you would want to use machine learning:
- There are patterns in the data that are predictable.
- The patterns are too complex for humans to reliably make decisions or take action on, but can be learned by computers.
- There is data that is representative of the problem you want to solve.
- Incoming or new data (likely) shares the patterns with your dataset.
Check-in 1: What are some situations where machine learning is NOT useful, or where you wouldn't want to use it?
When NOT to use machine learning
While machine learning is useful for many things, it's not a catch-all. There are many situations where you wouldn't want to, or couldn't, apply machine learning.
Here are some scenarios where you wouldn't want to use ML:
- The event you're trying to predict doesn't have a distinct pattern.
- The problem you're trying to solve is simple.
- It's not cost effective to implement ML.
- It's unethical to apply ML for a given problem.
2. Project Overview
Now that we've discussed what machine learning is at a high level, let's go over the data science project we'll use to demonstrate how to examine and visualize data, train models, and evaluate model performance.
The Iris Dataset
In this project, we'll be using the Iris dataset, a well-curated dataset that contains the sepal and petal length/width data (in cm) for three species of Iris flowers. Below is an image that shows what the sepal and petal are on a plant:

And here are images of the flowers that are described in the Iris dataset.
*(Images of the three species: Iris setosa, Iris versicolor, and Iris virginica.)*
Unsupervised Learning and Clustering
Unsupervised learning algorithms try to learn patterns within unlabeled data, i.e. data where no category has been assigned to each entity. Because labeling data requires a lot of time and money, unsupervised algorithms are ideal when you don't have labels or want to examine the underlying characteristics of a dataset.

One task within unsupervised learning is clustering. The goal is to take several data points (represented as gray dots above), identify similarities between them using a model (the black box), and group them together based on how similar they are relative to other clusters (here, red and blue).
Project Goal
In this notebook, we'll demonstrate the entire data science workflow that you can use in your other machine learning projects. We'll specifically cover:
- Basics of machine learning and data visualization Python libraries
- Downloading the Iris dataset and data processing
- Training a K-means clustering algorithm
- Evaluating how well K-means clustering separates out the Iris flowers
3. Training a K-means clustering algorithm on the Iris Dataset
Overview of `scikit-learn` and relevant data science libraries

`scikit-learn` is a popular Python library used for several common machine learning tasks. It is built on other data science libraries such as `NumPy`, `SciPy`, and `matplotlib`, and is well integrated into many data science environments and workflows. We'll go over how to train a machine learning model called K-means clustering using `scikit-learn` and other common data science libraries for data analysis on the Iris dataset.
A short description of each library is provided below, with a link to its documentation. I highly recommend reading the documentation before continuing, but we'll also cover usage of these libraries in the project (a one-line taste of each follows the list):
- `numpy`: a library for scientific computing with arrays and matrices.
- `pandas`: a data manipulation and analysis library.
- `scikit-learn`: a machine learning library, built on top of other libraries like `numpy`.
- `matplotlib`: a plotting and data visualization library.
- `seaborn`: a data visualization library built on top of `matplotlib` that makes it easier to produce visually appealing statistical plots.
Installing ML/Data Science Libraries
To install the libraries within this notebook, run the code block below. If you want to create a virtual environment for your notebook, follow the instructions in the README file.
!pip install --upgrade pip
!pip install scikit-learn
!pip install matplotlib
!pip install seaborn
!pip install numpy
Requirement already satisfied: pip in ./venv/lib/python3.10/site-packages (23.2)
Requirement already satisfied: scikit-learn in ./venv/lib/python3.10/site-packages (1.3.0)
Requirement already satisfied: matplotlib in ./venv/lib/python3.10/site-packages (3.7.2)
Requirement already satisfied: seaborn in ./venv/lib/python3.10/site-packages (0.12.2)
Requirement already satisfied: numpy in ./venv/lib/python3.10/site-packages (1.25.1)
(... transitive dependency lines trimmed ...)
4. The Iris Dataset
Background
The Iris dataset is a well-known dataset for practicing machine learning concepts such as clustering and classification. We are interested in how a clustering algorithm will group the three species of Iris flowers in the dataset: Setosa, Versicolour, and Virginica. These designations are known as labels.
The dataset has four attributes, or features, associated with each Iris sample: the sepal length, sepal width, petal length, and petal width, all in cm. There are 50 samples per species, for a total of 150 samples.
Load the Iris dataset
Let's first load the Python libraries we just installed into the notebook:
from sklearn import datasets
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Now we'll import the Iris dataset from `scikit-learn` using the `datasets` module.
iris = datasets.load_iris(as_frame=True)
Let's go over what the code above does:

- The `datasets` module contains a function called `load_iris` that fetches the Iris dataset.
- The optional argument `as_frame=True` returns a pandas `DataFrame`, which is a common data structure for handling tabular data.
  - A `DataFrame` contains rows and columns:
    - Rows represent a single entity (e.g. a flower).
    - Columns represent attributes of that entity (e.g. sepal length).
- The variable `iris` is a dictionary-like object that contains several attributes of the Iris dataset. We can call its keys to get these attributes. Relevant keys include:
  - `data`: the pandas dataframe of features.
  - `target`: the numeric labels representing the Iris species of each flower in `data`.
  - `feature_names`: the feature names for the columns in `data`.
  - `target_names`: the Iris species names associated with the numeric labels in `target`.

More information about what `load_iris` can return can be found in the scikit-learn documentation.
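Before printing each object in full, it can help to peek at the available keys and shapes; a quick sketch:

print(iris.keys())          # dict-like keys: 'data', 'target', 'target_names', ...
print(iris.data.shape)      # (150, 4): 150 flowers, 4 features
print(iris.feature_names)   # names of the four feature columns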
We'll also print the contents of each relevant object:
print(f"Contents of data: \n {iris.data}")
Contents of data:
     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                  5.1               3.5                1.4               0.2
1                  4.9               3.0                1.4               0.2
2                  4.7               3.2                1.3               0.2
3                  4.6               3.1                1.5               0.2
4                  5.0               3.6                1.4               0.2
..                 ...               ...                ...               ...
145                6.7               3.0                5.2               2.3
146                6.3               2.5                5.0               1.9
147                6.5               3.0                5.2               2.0
148                6.2               3.4                5.4               2.3
149                5.9               3.0                5.1               1.8

[150 rows x 4 columns]
print(f"Contents of target: \n {iris.target}")
Contents of target:
0      0
1      0
2      0
3      0
4      0
      ..
145    2
146    2
147    2
148    2
149    2
Name: target, Length: 150, dtype: int64
print(f"Contents of target_names. \nNote that 0 = setosa, 1 = versicolor, and 2 = virginica: \n {iris.target_names}")
Contents of target_names. Note that 0 = setosa, 1 = versicolor, and 2 = virginica: ['setosa' 'versicolor' 'virginica']
Now that we have loaded the dataset into this notebook, we'll examine the data in more depth.
5. Exploratory Data Analysis
Before we think about using the data for a machine learning project, we should understand the properties of the dataset. This process of understanding the dataset by examining and visualizing its contents is called exploratory data analysis (EDA).
There are many benefits to performing EDA, including:
- Summarizing the main characteristics for communicating the properties of the data to others
- Identifying obvious patterns and relationships
- Detecting anomalies and outliers in the data
- Formulating hypotheses, based on data
We'll go over two data visualization methods that will help with EDA: box plots and pairplots.
Box plots
First, let's examine the distribution of each feature using a box plot. Box plots are useful for visualizing the distribution of a feature within a dataset, including the median value, quartiles, and potential outliers.
The diagram below shows the components of a boxplot:
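If you'd like these same statistics as numbers rather than a picture, pandas' `describe()` reports them directly:

# Numeric counterpart to a box plot: the 25%/50%/75% rows are the quartiles,
# and min/max bound the whiskers (ignoring outlier handling).
print(iris.data.describe())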
The code below draws boxplots for the different features in the Iris dataset. You don't need to know all the components of the code below; just know that it makes the plots more readable and interpretable.
sns.set()
plt.figure(figsize=(12, 4))
for index, feature in enumerate(iris.feature_names):
    plt.subplot(1, 4, index + 1)
    sns.boxplot(x=iris.target_names[iris.target], y=iris.data[feature], palette='Set3')
    plt.xlabel('Species')
    plt.ylabel(feature)
plt.tight_layout()
plt.show()
By eyeballing the distributions, you can clearly see differences between the feature distributions across the three species. In general, that's great, because it suggests that the targets (species) are separable by the values of the features.
In our case, that means the three species are likely to group into distinct clusters. However, note that differing distributions do not guarantee that the data will be separable by any given clustering algorithm.
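To back up the visual impression with numbers, we can compare mean feature values per species; this mirrors the grouping the box plots use:

# Group the feature table by each row's species name and compare means.
species = pd.Series(iris.target_names[iris.target], name='species')
print(iris.data.groupby(species).mean())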
Exercise 2: What other distinguishing characteristics do you notice about the feature distributions?
Potential answers include:

- `setosa` tends to be very different from `versicolor` and `virginica` in terms of its data distribution.
  - We can hypothesize that `setosa` will be much easier to distinguish from the other two Iris species.
  - We may also hypothesize that, because there is distribution overlap between `versicolor` and `virginica`, the algorithm may swap labels between those two species.
- Sepal width is the only feature that looks different from the distribution patterns seen in the other features.
  - Additionally, there is a lot of distribution overlap in this feature.
  - We could hypothesize that this feature is not informative if the goal is to cluster the flowers by species.
Pairplots
A pairplot plots the pairwise relationships between features in a dataset.

- It returns an N × N grid of histograms and scatter plots, where N is the number of features.
- Each row and column of the grid corresponds to a feature.
- A histogram is shown where the two axes contain the same feature (e.g. x = sepal_width and y = sepal_width).
- A scatter plot is shown where the two axes contain different features, showing their pairwise relationship (e.g. x = sepal_width and y = sepal_length).

Note that the off-diagonal plots (the plots on either side of the histograms) show duplicate information, with the axes flipped.
To get the pairplot code to work, we first need to combine the data objects into a single dataframe. We'll store the result in the variable `df`.
df = pd.concat([iris.data, iris.target], axis=1)
df.head()
| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |
Now we can show the pairwise distribution of each feature, separated by the `target` variable, using the `pairplot` function in Seaborn.
sns.pairplot(data=df, hue='target')
/Users/scottcampit/Projects/intro-to-ml/venv/lib/python3.10/site-packages/seaborn/axisgrid.py:118: UserWarning: The figure layout has changed to tight self._figure.tight_layout(*args, **kwargs)
<seaborn.axisgrid.PairGrid at 0x17e7f8d00>
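As an aside, since the off-diagonal panels mirror each other (as noted above), seaborn's `pairplot` accepts `corner=True` to draw only the lower triangle and halve the visual clutter:

# Same plot, but without the redundant upper-triangle panels.
sns.pairplot(data=df, hue='target', corner=True)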
6. Training a K-Means Clustering Algorithm
Intuition for K-Means Clustering
Things that belong to the same category tend to have similar attributes. In other words, members of the same category tend to have small differences in feature values between one another, while things in different categories have larger differences.

- For example, consider two Iris Setosa plants. Their sepal length, sepal width, petal length, and petal width are more likely to be close to each other's than to those of an Iris Versicolour or Iris Virginica.

Using this intuition, we can now understand the big picture of what the K-means algorithm tries to learn. K-means aims to find the best way to group samples together by minimizing the difference between a given sample (a single Iris Setosa plant) and the average of the samples in a cluster (the average of all Iris Setosa plants).
There's more to the algorithm in a mathematical sense, but that is outside the scope of this lesson; you're well equipped to apply K-means clustering to real data. (For the curious, a sketch of the core update step follows.)
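Here is a minimal NumPy sketch of one K-means update step (Lloyd's algorithm). This is an illustration, not scikit-learn's actual implementation, which adds smart initialization and multiple restarts:

import numpy as np

def kmeans_step(X, centroids):
    """One assign-then-update iteration; X is (n_samples, n_features)."""
    # 1. Assign each sample to the nearest centroid (Euclidean distance).
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    assignments = distances.argmin(axis=1)
    # 2. Move each centroid to the mean of its assigned samples
    #    (for simplicity, this sketch assumes no cluster ends up empty).
    new_centroids = np.array([X[assignments == k].mean(axis=0)
                              for k in range(len(centroids))])
    return assignments, new_centroids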
Training a K-Means Algorithm using `scikit-learn`
Let's train our first ML model! Luckily, scikit-learn has already implemented the K-means clustering algorithm for us. You can load it using the following code:
from sklearn.cluster import KMeans
An important note about K-means: we need to tell it how many groups to look for before it learns how to assign each sample to a group. Since there are three species of Iris flowers, we should tell K-means that there are 3 potential clusters in the data:
model = KMeans(n_clusters=3)
The trained K-means model will live in the object we call `model`. This object provides the method `fit_predict()`: given the training data, it fits the model and outputs a cluster prediction for each sample. We'll store the model's predictions in the variable `kmeans_pred`.
kmeans_pred = model.fit_predict(iris.data)
/Users/scottcampit/Projects/intro-to-ml/venv/lib/python3.10/site-packages/sklearn/cluster/_kmeans.py:1412: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning super()._check_params_vs_input(X, default_n_init=10)
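The `FutureWarning` above just announces an upcoming change to the default `n_init`. To silence it (and make runs reproducible) you can set the parameters explicitly; the values below are one reasonable choice, not the only one:

# n_init=10 matches the current default; random_state fixes the random initialization.
model = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans_pred = model.fit_predict(iris.data)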
The cluster IDs that K-means assigns are arbitrary and don't necessarily correspond to the label encoding in the original dataset. The code below maps each predicted cluster to the true label it most often coincides with. You don't need to worry about the details.
import numpy as np

# Get the cluster label K-means assigned to each sample
predicted_labels = model.labels_

# Create a mapping from predicted clusters to true labels:
# each cluster is mapped to the true label that occurs most often within it
label_mapping = {}
for cluster in np.unique(predicted_labels):
    cluster_labels = iris.target[predicted_labels == cluster]
    mapped_label = np.argmax(np.bincount(cluster_labels))
    label_mapping[cluster] = mapped_label

# Map the predicted labels to the true labels
mapped_predicted_labels = np.array([label_mapping[cluster] for cluster in predicted_labels])

# Print the mapped predicted labels
print("Mapped Predicted Labels:", mapped_predicted_labels)
Mapped Predicted Labels: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 2 2 2 1 2 2 2 2 2 2 1 1 2 2 2 2 1 2 1 2 1 2 2 1 1 2 2 2 2 2 1 2 2 2 2 1 2 2 2 1 2 2 2 1 2 2 1]
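With the labels aligned, we can also quantify agreement numerically. Below is a short sketch (an addition, not part of the original analysis): a raw match rate, plus scikit-learn's `adjusted_rand_score`, which compares clusterings while ignoring label permutations:

from sklearn.metrics import adjusted_rand_score

# Fraction of samples whose mapped cluster label matches the true species.
match_rate = np.mean(mapped_predicted_labels == iris.target)
print(f"Agreement with true labels: {match_rate:.2%}")

# Permutation-invariant comparison of raw cluster IDs vs. true labels.
print(f"Adjusted Rand index: {adjusted_rand_score(iris.target, predicted_labels):.3f}")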
7. Evaluating K-Means Clustering Through Visual Inspection
There are several metrics you can use to evaluate clustering algorithms. However, one intuitive way to assess how well the model did is visual inspection. The code below generates pairplots that help us compare the quality of our predictions against the true labels. We'll use dataframe objects for the pair plots.
First, we'll generate a dataframe for the K-means predictions.
df2 = pd.concat([iris.data, pd.Series(mapped_predicted_labels, name='prediction')], axis=1)
df2.head()
| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | prediction |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |
Visualize the original data.
sns.pairplot(data=df, hue='target')
/Users/scottcampit/Projects/intro-to-ml/venv/lib/python3.10/site-packages/seaborn/axisgrid.py:118: UserWarning: The figure layout has changed to tight self._figure.tight_layout(*args, **kwargs)
<seaborn.axisgrid.PairGrid at 0x2aef5a530>
Now visualize the prediction data.
sns.pairplot(df2, hue='prediction')
/Users/scottcampit/Projects/intro-to-ml/venv/lib/python3.10/site-packages/seaborn/axisgrid.py:118: UserWarning: The figure layout has changed to tight self._figure.tight_layout(*args, **kwargs)
<seaborn.axisgrid.PairGrid at 0x2bd475060>
If we look at the scatter plots side by side, they look very similar, with some minor differences in the label distribution. That is expected: we cannot expect a machine learning algorithm to perfectly capture the patterns in real data.
If, on the other hand, the algorithm had fit the data perfectly, that would hint at a common machine learning problem called overfitting, where the model "memorizes" the patterns in the training data and cannot generalize to new data. Conversely, if the model had clustered the data poorly, that would be underfitting, where the model fails to capture the underlying patterns of the data (for example, because it is too simple).
8. Challenges with evaluating K-means clustering
You can see that unsupervised learning and clustering can be powerful for grouping data without any labels. However, we should mention the challenges with evaluating unsupervised learning models:
- Lack of objective evaluation metrics
- While we had the true labels in the example above, in real data, we tend to not have labeled data. That makes it hard to measure the correctness of an algorithm.
- Subjective to interpretation
- Since there are no labels, interpreting clusters becomes more of an art than a science.
- Determining the optimal number of clusters (K-means specific)
  - In the exercise above, we knew there were 3 species of flowers, so we set the number of clusters to 3. With real data, it is much harder to determine the optimal number of clusters for K-means; one common heuristic is sketched below.
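That heuristic (an addition here, not from the exercise above) is the elbow method: fit K-means for a range of K values and plot the inertia, looking for the point where adding clusters stops reducing it substantially.

# Elbow method: within-cluster sum of squared distances (inertia) vs. K.
inertias = []
ks = range(1, 10)
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(iris.data)
    inertias.append(km.inertia_)

plt.plot(list(ks), inertias, marker='o')
plt.xlabel('Number of clusters (K)')
plt.ylabel('Inertia')
plt.show()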
While these are common challenges when evaluating K-means clustering, it remains a powerful algorithm to know when faced with unlabeled data, and it can still provide a lot of insight into the patterns within your dataset.