
Posts

Hadoop Mode of Operations

Aim:
(i) Set up and install Hadoop in its three operating modes: Standalone, Pseudo-distributed and Fully distributed.
(ii) Use web-based tools to monitor your Hadoop setup.

Ans) Hadoop is an open-source framework, written in Java, that stores, analyzes and processes large amounts of data in a distributed environment across clusters of computers. It processes distributed data using a simplified programming model (MapReduce). It is used by companies such as Google, Facebook, Yahoo, YouTube and Twitter. It was developed by Doug Cutting at Yahoo in 2006 and was inspired by the Google File System and Google's MapReduce algorithm. Hadoop stores its data in HDFS (Hadoop Distributed File System), which runs on top of the Linux file system.

Operational modes of configuring a Hadoop cluster
Hadoop can run in one of three supported modes:
1) Local (Standalone) mode - By default, Hadoop is configured to run in a single-node, non-distributed mode, as a single Java process. This is u…
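The three modes differ mainly in where Hadoop looks for its file system. The sketch below renders the relevant `core-site.xml` and `hdfs-site.xml` properties for each mode. The property names `fs.defaultFS` and `dfs.replication` and the standalone and pseudo-distributed values follow the Apache Hadoop single-node setup guide; the helper names, the placeholder host `namenode-host`, port 9000 and the replication factor of 3 in fully distributed mode are illustrative assumptions, not a tested cluster configuration.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-mode settings. "namenode-host" is a placeholder hostname.
MODE_SETTINGS = {
    # Standalone: Hadoop's default, local file system, single Java process.
    "standalone": {"core-site": {"fs.defaultFS": "file:///"}, "hdfs-site": {}},
    # Pseudo-distributed: all daemons on one machine, HDFS served from localhost.
    "pseudo": {
        "core-site": {"fs.defaultFS": "hdfs://localhost:9000"},
        "hdfs-site": {"dfs.replication": "1"},  # only one DataNode exists
    },
    # Fully distributed: a NameNode on its own host, DataNodes on other machines.
    "fully": {
        "core-site": {"fs.defaultFS": "hdfs://namenode-host:9000"},
        "hdfs-site": {"dfs.replication": "3"},  # assumed, the HDFS default
    },
}


def site_xml(properties):
    """Render a Hadoop *-site.xml <configuration> document from a dict."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    for mode, files in MODE_SETTINGS.items():
        print(f"== {mode} ==")
        for filename, props in files.items():
            print(f"{filename}.xml:", site_xml(props))
```

In practice these properties are edited by hand in `etc/hadoop/` of the Hadoop installation; generating them from one table simply makes the differences between the modes easy to compare.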

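The "simplified programming model" referred to in the Hadoop post is MapReduce: a job supplies a map function and a reduce function, and the framework handles grouping the intermediate keys between them. The following is a local, single-process Python simulation of those phases for a word count. It is not Hadoop's Java API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle/sort: group every value emitted for the same key, sorted by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reducer: combine all grouped values for one key."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce over a list of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop stores data in HDFS", "Hadoop processes data with MapReduce"]
    print(word_count(docs))
    # → {'data': 2, 'hadoop': 2, 'hdfs': 1, 'in': 1, 'mapreduce': 1, 'processes': 1, 'stores': 1, 'with': 1}
```

On a real cluster the mappers and reducers run in parallel on different nodes and the shuffle moves data over the network, but the contract between the three phases is the same.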
Credit Card Fraud Detection project

Before starting on the code, it is recommended to work in a Jupyter notebook. If Jupyter is not installed on your machine, you can use Google Colab. You can download the dataset from this link. If the link is not working, go to this link and log in to Kaggle to download the dataset.

Importing all the necessary libraries

```python
# Import the necessary packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import gridspec
```

Loading the dataset

```python
# Load the dataset from the CSV file using pandas.
# On Colab, the easiest way is to mount Google Drive
# and copy the path of the CSV file.
data = pd.read_csv("credit.csv")
```

Understanding the dataset

```python
# Take a peek at the first five rows
data.head()
```

Describing the data

```python
# Print the shape of the data and its summary statistics.
# Uncomment the next line to work on a 10% random sample:
# data = data.sample(frac=0.1, random_state=48)
print(data.shape)
print(data.describe())
```

Imbalance in the data

```python
# Determine the number of fraud cases in the dataset
fraud = data[data['Class'] == 1]
vali…
```
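To see why the class imbalance matters, it helps to count both classes and compare them. This is a minimal sketch that assumes, as in the post, that the `Class` column is 1 for fraud and 0 otherwise. Because `credit.csv` is not available here, it uses a tiny made-up DataFrame; the helper `class_imbalance` and the keys it returns are hypothetical names, not part of the original project.

```python
import pandas as pd


def class_imbalance(data: pd.DataFrame, label: str = "Class") -> dict:
    """Count fraud (label == 1) and genuine (label == 0) rows and their ratio."""
    fraud = data[data[label] == 1]
    genuine = data[data[label] == 0]
    return {
        "fraud": len(fraud),
        "genuine": len(genuine),
        # Fraud cases per genuine case; guard against a frame with no genuine rows.
        "fraud_ratio": len(fraud) / len(genuine) if len(genuine) else float("inf"),
    }


# Toy stand-in for credit.csv: 8 genuine transactions and 2 fraudulent ones.
toy = pd.DataFrame({
    "Amount": [10.0, 25.5, 3.2, 99.0, 1.0, 7.5, 12.0, 40.0, 500.0, 730.0],
    "Class": [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
})
stats = class_imbalance(toy)
print(stats)  # → {'fraud': 2, 'genuine': 8, 'fraud_ratio': 0.25}
print(toy["Class"].value_counts())
```

On the real dataset the fraud ratio is far smaller than in this toy frame, which is why accuracy alone is a misleading metric for this problem.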