Commit d402cd0

Fix SettingWithCopy warning by pandas (TheAlgorithms#2346)
* Fix SettingWithCopy warning in pandas TheAlgorithms#2282
* Update k_means_clust.py
* Update k_means_clust.py
* Update k_means_clust.py
* Update k_means_clust.py
* Update k_means_clust.py
* Update k_means_clust.py
1 parent ee28dee commit d402cd0

1 file changed, +2 -15 lines changed
machine_learning/k_means_clust.py (+2 -15)
@@ -1,13 +1,10 @@
 """README, Author - Anurag Kumar(mailto:anuragkumarak95@gmail.com)
-
 Requirements:
 - sklearn
 - numpy
 - matplotlib
-
 Python:
 - 3.5
-
 Inputs:
 - X , a 2D numpy array of features.
 - k , number of clusters to create.
@@ -16,20 +13,16 @@
 - maxiter , maximum number of iterations to process.
 - heterogeneity , empty list that will be filled with hetrogeneity values if passed
 to kmeans func.
-
 Usage:
 1. define 'k' value, 'X' features array and 'hetrogeneity' empty list
-
 2. create initial_centroids,
 initial_centroids = get_initial_centroids(
 X,
 k,
 seed=0 # seed value for initial centroid generation,
 # None for randomness(default=None)
 )
-
 3. find centroids and clusters using kmeans function.
-
 centroids, cluster_assignment = kmeans(
 X,
 k,
@@ -38,19 +31,14 @@
 record_heterogeneity=heterogeneity,
 verbose=True # whether to print logs in console or not.(default=False)
 )
-
-
 4. Plot the loss function, hetrogeneity values for every iteration saved in
 hetrogeneity list.
 plot_heterogeneity(
 heterogeneity,
 k
 )
-
 5. Transfers Dataframe into excel format it must have feature called
 'Clust' with k means clustering numbers in it.
-
-
 """
 import warnings
 
@@ -222,7 +210,6 @@ def ReportGenerator(
 in order to run the function following libraries must be imported:
 import pandas as pd
 import numpy as np
-
 >>> data = pd.DataFrame()
 >>> data['numbers'] = [1, 2, 3]
 >>> data['col1'] = [0.5, 2.5, 4.5]
@@ -287,10 +274,10 @@ def ReportGenerator(
 .T.reset_index()
 .rename(index=str, columns={"level_0": "Features", "level_1": "Type"})
 ) # rename columns
-
+# calculate the size of cluster(count of clientID's)
 clustersize = report[
 (report["Features"] == "dummy") & (report["Type"] == "count")
-] # calculate the size of cluster(count of clientID's)
+].copy() # avoid SettingWithCopyWarning
 clustersize.Type = (
 "ClusterSize" # rename created cluster df to match report column names
 )
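
The last hunk is the substance of this commit. `report[...]` selects rows with a boolean mask, and assigning to `clustersize.Type` afterwards is the pattern pandas flags with `SettingWithCopyWarning`: the selection may be a copy of `report`, in which case the assignment would not propagate back to it. A minimal sketch of the before/after behavior follows. The `report` frame is a hypothetical stand-in (its column names and values are assumptions, not taken from `ReportGenerator`), and the helpers `count_copy_warnings`, `relabel_without_copy` and `relabel_with_copy` are illustrative only.

```python
import warnings

import pandas as pd


def count_copy_warnings(func):
    """Call func() and return (result, number of SettingWithCopyWarning warnings).

    The category is matched by class name instead of via
    pd.errors.SettingWithCopyWarning, so the helper still runs on pandas
    versions where that warning class has been removed.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        result = func()
    hits = [w for w in caught if w.category.__name__ == "SettingWithCopyWarning"]
    return result, len(hits)


# Hypothetical stand-in for the summary table built in ReportGenerator:
# one row per (feature, statistic) pair, one column per cluster.
report = pd.DataFrame(
    {
        "Features": ["dummy", "dummy", "col1"],
        "Type": ["count", "mean", "mean"],
        "cluster_0": [3.0, 1.0, 2.5],
        "cluster_1": [2.0, 1.0, 4.5],
    }
)
mask = (report["Features"] == "dummy") & (report["Type"] == "count")


def relabel_without_copy():
    # Pre-fix pattern: assign into a boolean-mask selection of `report`.
    clustersize = report[mask]
    clustersize.Type = "ClusterSize"
    return clustersize


def relabel_with_copy():
    # Post-fix pattern from this commit: take an explicit .copy() first,
    # so clustersize is an independent frame and the assignment is unambiguous.
    clustersize = report[mask].copy()
    clustersize.Type = "ClusterSize"
    return clustersize


fixed, fixed_warnings = count_copy_warnings(relabel_with_copy)
unfixed, unfixed_warnings = count_copy_warnings(relabel_without_copy)

print(fixed["Type"].tolist())  # ['ClusterSize']
print(fixed_warnings)  # 0
print(report["Type"].tolist())  # ['count', 'mean', 'mean']: report is unchanged
print(unfixed_warnings)  # version-dependent, see the note below
```

On pandas releases that still emit `SettingWithCopyWarning` under default settings (the 2.x series, for example), `unfixed_warnings` is typically 1. With Copy-on-Write, which is the default from pandas 3.0, neither path warns. The `.copy()` call then no longer silences anything, but it still states explicitly that `clustersize` is meant to be an independent frame.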
