Preface
Chapter 1: Big Data Analytics with Java
Why data analytics on big data?
Big data for analytics
Big data - a bigger pay package for Java developers
Basics of Hadoop - a Java sub-project
Distributed computing on Hadoop
HDFS concepts
Design and architecture of HDFS
Main components of HDFS
HDFS simple commands
Apache Spark
Concepts
Transformations
Actions
Spark Java API
Spark samples using Java 8
Loading data
Data operations - cleansing and munging
Analyzing data - count, projection, grouping, aggregation, and max/min
Actions on RDDs
Paired RDDs
Saving data
Collecting and printing results
Executing Spark programs on Hadoop
Apache Spark sub-projects
Spark machine learning modules
Mahout - a popular Java ML library
Deeplearning4j - a deep learning library
Summary
Chapter 2: First Steps in Data Analysis
Datasets
Data cleaning and munging
Basic analysis of data with Spark SQL
Building SparkConf and context
Dataframe and datasets
Load and parse data
Analyzing data - the Spark-SQL way
Spark SQL for data exploration and analytics
Market basket analysis - Apriori algorithm
Implementation of the Apriori algorithm in Apache Spark
Efficient market basket analysis using FP-Growth algorithm
Running FP-Growth on Apache Spark
Summary
Chapter 3: Data Visualization
Data visualization with Java JFreeChart
Using charts in big data analytics
Time Series chart
All India seasonal and annual average temperature series dataset
Simple single Time Series chart
Multiple Time Series on a single chart window
Bar charts
Histograms
When would you use a histogram?
How to make histograms using JFreeChart?
Line charts
Scatter plots
Box plots
Advanced visualization technique
Prefuse
IVTK Graph toolkit
Other libraries
Summary
Chapter 4: Basics of Machine Learning
What is machine learning?
Real-life examples of machine learning
Types of machine learning
A small sample case study of supervised and unsupervised learning
Steps for machine learning problems
Choosing the machine learning model
What are the feature types that can be extracted from the datasets?
How do you select the best features to train your models?
How do you run machine learning analytics on big data?
Getting and preparing data in Hadoop
Training and storing models on big data
Apache Spark machine learning API
Summary
Chapter 5: Regression on Big Data
Linear regression
What is simple linear regression?
Where is linear regression used?
Logistic regression
Which mathematical functions does logistic regression use?
Where is logistic regression used?
Predicting heart disease using logistic regression
Summary
Chapter 6: Naive Bayes and Sentiment Analysis
Conditional probability
Bayes theorem
Naive Bayes algorithm
Advantages of Naive Bayes
Disadvantages of Naive Bayes
Sentimental analysis
Concepts for sentimental analysis
Tokenization
Stop words removal
Stemming
N-grams
Term presence and Term Frequency
TF-IDF
Bag of words
Dataset
Data exploration of text data
Sentimental analysis on this dataset
SVM or Support Vector Machine
Summary
Chapter 7: Decision Trees
What is a decision tree?
Building a decision tree
Choosing the best features for splitting the datasets
Dataset
Data exploration
Cleaning and munging the data
Training and testing the model
Summary
Chapter 8: Ensembling on Big Data
Ensembling
Types of ensembling
Bagging
Boosting
Advantages and disadvantages of ensembling
Random forests
Gradient boosted trees (GBTs)
Classification problem and dataset used
Data exploration
Training and testing our random forest model
Training and testing our gradient boosted tree model
Summary
Chapter 9: Recommendation Systems
Recommendation systems and their types
Content-based recommendation systems
Dataset
Content-based recommender on MovieLens dataset
Collaborative recommendation systems
Advantages
Disadvantages
Alternating least square - collaborative filtering
Summary
Chapter 10: Clustering and Customer Segmentation on Big Data
Clustering
Types of clustering
Hierarchical clustering
K-means clustering
Bisecting k-means clustering
Customer segmentation
Dataset
Data exploration
Clustering for customer segmentation
Changing the clustering algorithm
Summary
Chapter 11: Massive Graphs on Big Data
Refresher on graphs
Representing graphs
Common terminology on graphs
Common algorithms on graphs
Plotting graphs
Massive graphs on big data
Graph analytics
GraphFrames
Building a graph using GraphFrames
Graph analytics on airports and their flights
Datasets
Graph analytics on flights data
Summary
Chapter 12: Real-Time Analytics on Big Data
Real-time analytics
Big data stack for real-time analytics
Real-time SQL queries on big data
Real-time data ingestion and storage
Real-time data processing
Real-time SQL queries using Impala
Flight delay analysis using Impala
Apache Kafka
Spark Streaming
Trending videos
Summary
Chapter 13: Deep Learning Using Big Data
Introduction to neural networks
Perceptron
Problems with perceptrons
Sigmoid neuron
Multi-layer perceptrons
Accuracy of multi-layer perceptrons
Deep learning
Advantages and use cases of deep learning
Flower species classification using multi-layer perceptrons
Deeplearning4j
Handwritten digit recognition using CNN
Diving into the code:
Summary
Index