Binning Method
Binning is a technique for smoothing data and dealing with noisy data. In this approach, the data is first sorted, and the sorted values are then distributed into a number of buckets, or bins. Because binning methods consult the neighborhood of each value (the other values in the same bin), they perform local smoothing.
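A minimal sketch of the sort-then-partition step follows. It assumes equal-frequency bins of a fixed size, which is how both the example and the implementation in this article divide the data. The function name make_bins is hypothetical, and the input is simply the price values from the example below listed in an arbitrary order so that the sorting step has something to do.

```python
def make_bins(values, bin_size):
    """Sort the values and split them into consecutive bins of bin_size items.

    Equal-frequency binning: every bin holds bin_size values, except that the
    last bin is shorter when len(values) is not a multiple of bin_size.
    """
    ordered = sorted(values)
    return [ordered[i:i + bin_size] for i in range(0, len(ordered), bin_size)]


# Price values from the example, deliberately out of order (illustrative input)
prices = [21, 4, 29, 8, 24, 15, 34, 21, 9, 26, 25, 28]
for number, bin_values in enumerate(make_bins(prices, 4), start=1):
    print(f"Bin {number}: {bin_values}")
```

With a bin size of 4 this prints the same three bins that the example below works with.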
Once the values are in bins, smoothing can be done in three ways:
Smoothing by bin means: Each value in a bin is replaced by the mean of the bin.
Smoothing by bin medians: Each value in a bin is replaced by the median of the bin.
Smoothing by bin boundaries: The minimum and maximum values in a bin are taken as the bin boundaries, and each value in the bin is replaced by whichever boundary is closer to it.
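The three rules can be sketched as small functions that each smooth a single bin. This is an illustrative sketch using Python's standard statistics module; the function names are hypothetical. For the boundary rule it assumes that a value exactly halfway between the two boundaries goes to the maximum, which is what the comparison in the Python implementation later in this article does.

```python
from statistics import mean, median


def smooth_by_means(bin_values):
    """Replace every value in the bin with the bin mean."""
    m = mean(bin_values)
    return [m] * len(bin_values)


def smooth_by_medians(bin_values):
    """Replace every value in the bin with the bin median."""
    m = median(bin_values)
    return [m] * len(bin_values)


def smooth_by_boundaries(bin_values):
    """Replace every value with the closer of the bin's minimum and maximum.

    Ties (a value equally far from both boundaries) go to the maximum.
    """
    low, high = min(bin_values), max(bin_values)
    return [low if (v - low) < (high - v) else high for v in bin_values]


bin_values = [4, 8, 9, 15]
print(smooth_by_means(bin_values))       # [9, 9, 9, 9]
print(smooth_by_medians(bin_values))     # [8.5, 8.5, 8.5, 8.5]
print(smooth_by_boundaries(bin_values))  # [4, 4, 4, 15]
```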
Example:
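Sorted data for price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34
Partition into three equal-frequency bins of four values each:
- Bin 1: 4, 8, 9, 15
- Bin 2: 21, 21, 24, 25
- Bin 3: 26, 28, 29, 34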
Smoothing by bin means (the means are 9, 22.75 and 29.25, rounded to the nearest dollar):
- Bin 1: 9, 9, 9, 9
- Bin 2: 23, 23, 23, 23
- Bin 3: 29, 29, 29, 29
Smoothing by bin boundaries:
- Bin 1: 4, 4, 4, 15
- Bin 2: 21, 21, 25, 25
- Bin 3: 26, 26, 26, 34
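Smoothing by bin medians (each bin holds an even number of values, so the median is the average of the two middle values):
- Bin 1: 8.5, 8.5, 8.5, 8.5
- Bin 2: 22.5, 22.5, 22.5, 22.5
- Bin 3: 28.5, 28.5, 28.5, 28.5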
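The whole worked example can be checked with a short NumPy sketch. Because the data is already sorted and splits evenly into three bins of four, reshaping it to a 3 × 4 array puts one bin in each row, and the three smoothing methods become row-wise operations. The variable names are illustrative.

```python
import numpy as np

# Sorted price data from the example, split into 3 equal-frequency bins of 4
prices = np.array([4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34], dtype=float)
bins = prices.reshape(3, 4)  # row k holds bin k + 1

# Smoothing by bin means: repeat each row's mean across the row
by_means = np.repeat(bins.mean(axis=1, keepdims=True), 4, axis=1)

# Smoothing by bin medians: repeat each row's median across the row
by_medians = np.repeat(np.median(bins, axis=1, keepdims=True), 4, axis=1)

# Smoothing by bin boundaries: pick the nearer of the row minimum and maximum
# (ties go to the maximum)
low = bins.min(axis=1, keepdims=True)
high = bins.max(axis=1, keepdims=True)
by_boundaries = np.where(bins - low < high - bins, low, high)

print(by_means)
print(by_medians)
print(by_boundaries)
```

The unrounded means are 9, 22.75 and 29.25, and the medians are 8.5, 22.5 and 28.5, while the boundary rows match the values listed above.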
Implementation in Python
import numpy as np
from sklearn.datasets import load_iris

# Load the Iris data set (150 samples, 4 feature columns)
dataset = load_iris()
a = dataset.data

# Take the second column (index 1, sepal width) and sort it
b = np.sort(a[:, 1])

# Create 30 bins of 5 values each (equal-frequency binning)
bin1 = np.zeros((30, 5))  # smoothed by bin means
bin2 = np.zeros((30, 5))  # smoothed by bin boundaries
bin3 = np.zeros((30, 5))  # smoothed by bin medians

# Bin mean: replace every value with the mean of its bin
for i in range(0, 150, 5):
    k = i // 5
    mean = (b[i] + b[i+1] + b[i+2] + b[i+3] + b[i+4]) / 5
    for j in range(5):
        bin1[k, j] = mean
print("Bin Mean: \n", bin1)

# Bin boundaries: replace every value with the nearer of the bin's
# minimum b[i] and maximum b[i+4]; ties go to the maximum
for i in range(0, 150, 5):
    k = i // 5
    for j in range(5):
        if (b[i+j] - b[i]) < (b[i+4] - b[i+j]):
            bin2[k, j] = b[i]
        else:
            bin2[k, j] = b[i+4]
print("Bin Boundaries: \n", bin2)

# Bin median: each bin holds 5 sorted values, so the median is the middle one
for i in range(0, 150, 5):
    k = i // 5
    for j in range(5):
        bin3[k, j] = b[i+2]
print("Bin Median: \n", bin3)
Output:
Bin Mean:
 [[2.18 2.18 2.18 2.18 2.18]
 [2.34 2.34 2.34 2.34 2.34]
 [2.48 2.48 2.48 2.48 2.48]
 [2.52 2.52 2.52 2.52 2.52]
 [2.62 2.62 2.62 2.62 2.62]
 [2.7 2.7 2.7 2.7 2.7 ]
 [2.74 2.74 2.74 2.74 2.74]
 [2.8 2.8 2.8 2.8 2.8 ]
 [2.8 2.8 2.8 2.8 2.8 ]
 [2.86 2.86 2.86 2.86 2.86]
 [2.9 2.9 2.9 2.9 2.9 ]
 [2.96 2.96 2.96 2.96 2.96]
 [3. 3. 3. 3. 3. ]
 [3. 3. 3. 3. 3. ]
 [3. 3. 3. 3. 3. ]
 [3. 3. 3. 3. 3. ]
 [3.04 3.04 3.04 3.04 3.04]
 [3.1 3.1 3.1 3.1 3.1 ]
 [3.12 3.12 3.12 3.12 3.12]
 [3.2 3.2 3.2 3.2 3.2 ]
 [3.2 3.2 3.2 3.2 3.2 ]
 [3.26 3.26 3.26 3.26 3.26]
 [3.34 3.34 3.34 3.34 3.34]
 [3.4 3.4 3.4 3.4 3.4 ]
 [3.4 3.4 3.4 3.4 3.4 ]
 [3.5 3.5 3.5 3.5 3.5 ]
 [3.58 3.58 3.58 3.58 3.58]
 [3.74 3.74 3.74 3.74 3.74]
 [3.82 3.82 3.82 3.82 3.82]
 [4.12 4.12 4.12 4.12 4.12]]
Bin Boundaries:
 [[2. 2.3 2.3 2.3 2.3]
 [2.3 2.3 2.3 2.4 2.4]
 [2.4 2.5 2.5 2.5 2.5]
 [2.5 2.5 2.5 2.5 2.6]
 [2.6 2.6 2.6 2.6 2.7]
 [2.7 2.7 2.7 2.7 2.7]
 [2.7 2.7 2.7 2.8 2.8]
 [2.8 2.8 2.8 2.8 2.8]
 [2.8 2.8 2.8 2.8 2.8]
 [2.8 2.8 2.9 2.9 2.9]
 [2.9 2.9 2.9 2.9 2.9]
 [2.9 2.9 3. 3. 3. ]
 [3. 3. 3. 3. 3. ]
 [3. 3. 3. 3. 3. ]
 [3. 3. 3. 3. 3. ]
 [3. 3. 3. 3. 3. ]
 [3. 3. 3. 3.1 3.1]
 [3.1 3.1 3.1 3.1 3.1]
 [3.1 3.1 3.1 3.1 3.2]
 [3.2 3.2 3.2 3.2 3.2]
 [3.2 3.2 3.2 3.2 3.2]
 [3.2 3.2 3.3 3.3 3.3]
 [3.3 3.3 3.3 3.4 3.4]
 [3.4 3.4 3.4 3.4 3.4]
 [3.4 3.4 3.4 3.4 3.4]
 [3.5 3.5 3.5 3.5 3.5]
 [3.5 3.6 3.6 3.6 3.6]
 [3.7 3.7 3.7 3.8 3.8]
 [3.8 3.8 3.8 3.8 3.9]
 [3.9 3.9 3.9 4.4 4.4]]
Bin Median:
 [[2.2 2.2 2.2 2.2 2.2]
 [2.3 2.3 2.3 2.3 2.3]
 [2.5 2.5 2.5 2.5 2.5]
 [2.5 2.5 2.5 2.5 2.5]
 [2.6 2.6 2.6 2.6 2.6]
 [2.7 2.7 2.7 2.7 2.7]
 [2.7 2.7 2.7 2.7 2.7]
 [2.8 2.8 2.8 2.8 2.8]
 [2.8 2.8 2.8 2.8 2.8]
 [2.9 2.9 2.9 2.9 2.9]
 [2.9 2.9 2.9 2.9 2.9]
 [3. 3. 3. 3. 3. ]
 [3. 3. 3. 3. 3. ]
 [3. 3. 3. 3. 3. ]
 [3. 3. 3. 3. 3. ]
 [3. 3. 3. 3. 3. ]
 [3. 3. 3. 3. 3. ]
 [3.1 3.1 3.1 3.1 3.1]
 [3.1 3.1 3.1 3.1 3.1]
 [3.2 3.2 3.2 3.2 3.2]
 [3.2 3.2 3.2 3.2 3.2]
 [3.3 3.3 3.3 3.3 3.3]
 [3.3 3.3 3.3 3.3 3.3]
 [3.4 3.4 3.4 3.4 3.4]
 [3.4 3.4 3.4 3.4 3.4]
 [3.5 3.5 3.5 3.5 3.5]
 [3.6 3.6 3.6 3.6 3.6]
 [3.7 3.7 3.7 3.7 3.7]
 [3.8 3.8 3.8 3.8 3.8]
 [4.1 4.1 4.1 4.1 4.1]]
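The loops above hard-code 150 values and a bin size of 5, so every bin has exactly five entries. The following is a hedged sketch of one way to generalize them to any input length and bin size. The function name smooth_by_bins is hypothetical, and the choice to let the last bin be shorter when the length is not a multiple of the bin size is an assumption, not something the original code handles.

```python
import numpy as np


def smooth_by_bins(values, bin_size=5):
    """Equal-frequency binning followed by the three smoothing methods.

    Returns (by_means, by_boundaries, by_medians) as 1-D arrays aligned with
    the sorted input. The last bin is shorter when len(values) is not a
    multiple of bin_size.
    """
    b = np.sort(np.asarray(values, dtype=float))
    # Split at every multiple of bin_size to get consecutive bins
    bins = np.split(b, np.arange(bin_size, len(b), bin_size))
    by_means, by_boundaries, by_medians = [], [], []
    for chunk in bins:
        low, high = chunk[0], chunk[-1]  # chunk is sorted, so these are min/max
        by_means.append(np.full(len(chunk), chunk.mean()))
        by_medians.append(np.full(len(chunk), np.median(chunk)))
        # Nearer boundary wins; ties go to the maximum, as in the loops above
        by_boundaries.append(np.where(chunk - low < high - chunk, low, high))
    return (np.concatenate(by_means),
            np.concatenate(by_boundaries),
            np.concatenate(by_medians))


means, boundaries, medians = smooth_by_bins(
    [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34], bin_size=4)
print(means)
print(boundaries)
print(medians)
```

Applied to the Iris column used above, for example smooth_by_bins(dataset.data[:, 1], bin_size=5), the three returned arrays should match bin1.ravel(), bin2.ravel() and bin3.ravel() up to floating-point rounding, because 150 values split into exactly 30 bins of 5.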