Binning Method for Data Smoothing in Python

 Binning Method

Binning is a technique for smoothing noisy data. In this approach, the data is first sorted, and the sorted values are then distributed into a number of buckets, or bins. Because each value is smoothed using only the values around it in the same bin, binning methods perform local smoothing.
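The two steps described above, sorting and then splitting the sorted values into bins, can be sketched as follows. This is a minimal, standard-library-only sketch; the helper name `equal_frequency_bins` is hypothetical, and it assumes equal-frequency (equal-depth) binning, where each bin receives the same number of consecutive sorted values.

```python
def equal_frequency_bins(values, n_bins):
    """Sort the values and split them into n_bins consecutive bins.

    Hypothetical helper: assumes equal-frequency (equal-depth) binning.
    When len(values) is not a multiple of n_bins, the first bins get
    one extra value each.
    """
    ordered = sorted(values)
    size, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for index in range(n_bins):
        end = start + size + (1 if index < extra else 0)
        bins.append(ordered[start:end])
        start = end
    return bins


# Unsorted version of the price data (in dollars) used in the example.
prices = [21, 4, 28, 9, 34, 15, 24, 21, 8, 26, 25, 29]
for number, bin_values in enumerate(equal_frequency_bins(prices, 3), start=1):
    print(f"Bin {number}: {bin_values}")
```

Equal-width binning, where every bin covers the same range of values rather than the same number of values, is a common alternative; the examples in this post use equal-frequency bins.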

Smoothing can be accomplished in three ways:

Smoothing by bin means: Each value in a bin is replaced by the mean value of the bin.

Smoothing by bin medians: Each value in a bin is replaced by the median value of the bin.

Smoothing by bin boundaries: The minimum and maximum values in a given bin are taken as the bin boundaries. Each value in the bin is then replaced by the closest boundary value.
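Each of the three rules works on one bin at a time. The sketch below expresses them as small functions using only Python's standard `statistics` module; the function names are hypothetical. The tie rule in `smooth_by_boundaries` (a value exactly halfway between the two boundaries goes to the maximum) is an assumption chosen to match the comparison in the NumPy program later in this post, since the definition itself does not specify one.

```python
from statistics import mean, median


def smooth_by_mean(bin_values):
    """Replace every value in the bin with the bin mean."""
    m = mean(bin_values)
    return [m] * len(bin_values)


def smooth_by_median(bin_values):
    """Replace every value in the bin with the bin median.

    statistics.median averages the two middle values when the bin
    holds an even number of values.
    """
    m = median(bin_values)
    return [m] * len(bin_values)


def smooth_by_boundaries(bin_values):
    """Replace every value with the nearer of the bin's min and max.

    Assumption: a value equidistant from both boundaries is mapped
    to the maximum.
    """
    low, high = min(bin_values), max(bin_values)
    return [low if v - low < high - v else high for v in bin_values]


first_bin = [4, 8, 9, 15]  # first bin of the sorted price data
print(smooth_by_mean(first_bin))
print(smooth_by_median(first_bin))
print(smooth_by_boundaries(first_bin))
```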

Example:

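Sorted data for price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34

Partition into equal-frequency bins of four values each:

      - Bin 1: 4, 8, 9, 15

      - Bin 2: 21, 21, 24, 25

      - Bin 3: 26, 28, 29, 34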

Smoothing by bin means (rounded to the nearest dollar):

      - Bin 1: 9, 9, 9, 9

      - Bin 2: 23, 23, 23, 23

      - Bin 3: 29, 29, 29, 29

Smoothing by bin boundaries:

      - Bin 1: 4, 4, 4, 15

      - Bin 2: 21, 21, 25, 25

      - Bin 3: 26, 26, 26, 34

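Smoothing by bin medians (each bin holds an even number of values, so the median is the average of the two middle values):

      - Bin 1: 8.5, 8.5, 8.5, 8.5

      - Bin 2: 22.5, 22.5, 22.5, 22.5

      - Bin 3: 28.5, 28.5, 28.5, 28.5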
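As a worked check of the mean and boundary rows, here is the arithmetic for Bin 2 (21, 21, 24, 25), the bin whose mean is not a whole number and therefore has to be rounded:

```latex
\begin{aligned}
\text{mean} &= \frac{21 + 21 + 24 + 25}{4} = \frac{91}{4} = 22.75 \approx 23 \\
\text{boundaries:}\quad \min &= 21,\ \max = 25 \\
24 - 21 = 3 &> 1 = 25 - 24 \;\Rightarrow\; 24 \mapsto 25
\end{aligned}
```

The two 21s already equal the minimum boundary, so they stay 21, giving 21, 21, 25, 25.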


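Implementation in Python

The program below applies the three methods to the second feature column of the Iris data set (sepal width), using 30 equal-frequency bins of 5 values each.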

import numpy as np
from sklearn.datasets import load_iris

# load the iris data set (150 samples, 4 feature columns)
dataset = load_iris()
a = dataset.data

# take the 2nd of the 4 columns (index 1: sepal width in cm)
b = np.zeros(150)
for i in range(150):
    b[i] = a[i, 1]

b = np.sort(b)  # sort the array

# create bins: 30 bins of 5 sorted values, one result array per method
bin1 = np.zeros((30, 5))
bin2 = np.zeros((30, 5))
bin3 = np.zeros((30, 5))

# Bin mean: replace every value in a bin with the bin's mean
for i in range(0, 150, 5):
    k = i // 5
    mean = (b[i] + b[i+1] + b[i+2] + b[i+3] + b[i+4]) / 5
    for j in range(5):
        bin1[k, j] = mean
print("Bin Mean: \n", bin1)

# Bin boundaries: replace every value with the closer of the bin's
# minimum b[i] and maximum b[i+4]; ties go to the maximum
for i in range(0, 150, 5):
    k = i // 5
    for j in range(5):
        if (b[i+j] - b[i]) < (b[i+4] - b[i+j]):
            bin2[k, j] = b[i]
        else:
            bin2[k, j] = b[i+4]
print("Bin Boundaries: \n", bin2)

# Bin median: each bin holds 5 sorted values, so the median is the middle one, b[i+2]
for i in range(0, 150, 5):
    k = i // 5
    for j in range(5):
        bin3[k, j] = b[i+2]
print("Bin Median: \n", bin3)

Output:

Bin Mean: 
 [[2.18 2.18 2.18 2.18 2.18]
 [2.34 2.34 2.34 2.34 2.34]
 [2.48 2.48 2.48 2.48 2.48]
 [2.52 2.52 2.52 2.52 2.52]
 [2.62 2.62 2.62 2.62 2.62]
 [2.7  2.7  2.7  2.7  2.7 ]
 [2.74 2.74 2.74 2.74 2.74]
 [2.8  2.8  2.8  2.8  2.8 ]
 [2.8  2.8  2.8  2.8  2.8 ]
 [2.86 2.86 2.86 2.86 2.86]
 [2.9  2.9  2.9  2.9  2.9 ]
 [2.96 2.96 2.96 2.96 2.96]
 [3.   3.   3.   3.   3.  ]
 [3.   3.   3.   3.   3.  ]
 [3.   3.   3.   3.   3.  ]
 [3.   3.   3.   3.   3.  ]
 [3.04 3.04 3.04 3.04 3.04]
 [3.1  3.1  3.1  3.1  3.1 ]
 [3.12 3.12 3.12 3.12 3.12]
 [3.2  3.2  3.2  3.2  3.2 ]
 [3.2  3.2  3.2  3.2  3.2 ]
 [3.26 3.26 3.26 3.26 3.26]
 [3.34 3.34 3.34 3.34 3.34]
 [3.4  3.4  3.4  3.4  3.4 ]
 [3.4  3.4  3.4  3.4  3.4 ]
 [3.5  3.5  3.5  3.5  3.5 ]
 [3.58 3.58 3.58 3.58 3.58]
 [3.74 3.74 3.74 3.74 3.74]
 [3.82 3.82 3.82 3.82 3.82]
 [4.12 4.12 4.12 4.12 4.12]]
Bin Boundaries: 
 [[2.  2.3 2.3 2.3 2.3]
 [2.3 2.3 2.3 2.4 2.4]
 [2.4 2.5 2.5 2.5 2.5]
 [2.5 2.5 2.5 2.5 2.6]
 [2.6 2.6 2.6 2.6 2.7]
 [2.7 2.7 2.7 2.7 2.7]
 [2.7 2.7 2.7 2.8 2.8]
 [2.8 2.8 2.8 2.8 2.8]
 [2.8 2.8 2.8 2.8 2.8]
 [2.8 2.8 2.9 2.9 2.9]
 [2.9 2.9 2.9 2.9 2.9]
 [2.9 2.9 3.  3.  3. ]
 [3.  3.  3.  3.  3. ]
 [3.  3.  3.  3.  3. ]
 [3.  3.  3.  3.  3. ]
 [3.  3.  3.  3.  3. ]
 [3.  3.  3.  3.1 3.1]
 [3.1 3.1 3.1 3.1 3.1]
 [3.1 3.1 3.1 3.1 3.2]
 [3.2 3.2 3.2 3.2 3.2]
 [3.2 3.2 3.2 3.2 3.2]
 [3.2 3.2 3.3 3.3 3.3]
 [3.3 3.3 3.3 3.4 3.4]
 [3.4 3.4 3.4 3.4 3.4]
 [3.4 3.4 3.4 3.4 3.4]
 [3.5 3.5 3.5 3.5 3.5]
 [3.5 3.6 3.6 3.6 3.6]
 [3.7 3.7 3.7 3.8 3.8]
 [3.8 3.8 3.8 3.8 3.9]
 [3.9 3.9 3.9 4.4 4.4]]
Bin Median: 
 [[2.2 2.2 2.2 2.2 2.2]
 [2.3 2.3 2.3 2.3 2.3]
 [2.5 2.5 2.5 2.5 2.5]
 [2.5 2.5 2.5 2.5 2.5]
 [2.6 2.6 2.6 2.6 2.6]
 [2.7 2.7 2.7 2.7 2.7]
 [2.7 2.7 2.7 2.7 2.7]
 [2.8 2.8 2.8 2.8 2.8]
 [2.8 2.8 2.8 2.8 2.8]
 [2.9 2.9 2.9 2.9 2.9]
 [2.9 2.9 2.9 2.9 2.9]
 [3.  3.  3.  3.  3. ]
 [3.  3.  3.  3.  3. ]
 [3.  3.  3.  3.  3. ]
 [3.  3.  3.  3.  3. ]
 [3.  3.  3.  3.  3. ]
 [3.  3.  3.  3.  3. ]
 [3.1 3.1 3.1 3.1 3.1]
 [3.1 3.1 3.1 3.1 3.1]
 [3.2 3.2 3.2 3.2 3.2]
 [3.2 3.2 3.2 3.2 3.2]
 [3.3 3.3 3.3 3.3 3.3]
 [3.3 3.3 3.3 3.3 3.3]
 [3.4 3.4 3.4 3.4 3.4]
 [3.4 3.4 3.4 3.4 3.4]
 [3.5 3.5 3.5 3.5 3.5]
 [3.6 3.6 3.6 3.6 3.6]
 [3.7 3.7 3.7 3.7 3.7]
 [3.8 3.8 3.8 3.8 3.8]
 [4.1 4.1 4.1 4.1 4.1]]
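The nested loops in the program above can also be written without explicit Python loops by reshaping the sorted array into one row per bin. This is a sketch, not the post's original method: the function name `smooth_bins` is hypothetical, and it assumes, as the program does, that the number of values is an exact multiple of the bin size. It keeps the program's boundary tie rule (ties go to the bin maximum) and uses `np.median`, which returns the middle value for an odd bin size such as 5 and averages the two middle values for an even bin size. To stay self-contained it is demonstrated on the price data from the example; for the program's data the call would be `smooth_bins(a[:, 1], 5)`.

```python
import numpy as np


def smooth_bins(values, bin_size):
    """Equal-frequency binning followed by the three smoothing methods.

    Assumes len(values) is a multiple of bin_size. Returns three arrays
    of shape (n_bins, bin_size): smoothed by mean, by median, by boundaries.
    """
    bins = np.sort(np.asarray(values, dtype=float)).reshape(-1, bin_size)

    # Row-wise statistics kept as column vectors so they broadcast over each bin.
    means = bins.mean(axis=1, keepdims=True)
    medians = np.median(bins, axis=1, keepdims=True)
    low = bins[:, :1]    # minimum of each sorted bin
    high = bins[:, -1:]  # maximum of each sorted bin

    by_mean = np.repeat(means, bin_size, axis=1)
    by_median = np.repeat(medians, bin_size, axis=1)
    # Nearer boundary wins; a tie goes to the maximum, as in the loop version.
    by_boundaries = np.where(bins - low < high - bins, low, high)
    return by_mean, by_median, by_boundaries


prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]
by_mean, by_median, by_boundaries = smooth_bins(prices, 4)
print("Bin Mean:\n", by_mean)
print("Bin Median:\n", by_median)
print("Bin Boundaries:\n", by_boundaries)
```

Reshaping only works for equal-frequency bins of a fixed size; unequal bins would need a list of arrays instead of a single 2-D array.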











