Preface
Part Ⅰ. Gentle Overview of Big Data and Spark
1. What Is Apache Spark?
Apache Spark's Philosophy
Context: The Big Data Problem
History of Spark
The Present and Future of Spark
Running Spark
Downloading Spark Locally
Launching Spark's Interactive Consoles
Running Spark in the Cloud
Data Used in This Book
2. A Gentle Introduction to Spark
Spark's Basic Architecture
Spark Applications
Spark's Language APIs
Spark's APIs
Starting Spark
The SparkSession
DataFrames
Partitions
Transformations
Lazy Evaluation
Actions
Spark UI
An End-to-End Example
DataFrames and SQL
Conclusion
3. A Tour of Spark's Toolset
Running Production Applications
Datasets: Type-Safe Structured APIs
Structured Streaming
Machine Learning and Advanced Analytics
Lower-Level APIs
SparkR
Spark's Ecosystem and Packages
Conclusion
Part Ⅱ. Structured APIs: DataFrames, SQL, and Datasets
4. Structured API Overview
DataFrames and Datasets
Schemas
Overview of Structured Spark Types
DataFrames Versus Datasets
Columns
Rows
Spark Types
Overview of Structured API Execution
Logical Planning
Physical Planning
Execution
Conclusion
5. Basic Structured Operations
Schemas
Columns and Expressions
Columns
Expressions
Records and Rows
Creating Rows
DataFrame Transformations
Creating DataFrames
select and selectExpr
Converting to Spark Types (Literals)
Adding Columns
……
6. Working with Different Types of Data
7. Aggregations
8. Joins
9. Data Sources
10. Spark SQL
11. Datasets
Part Ⅲ. Low-Level APIs
12. Resilient Distributed Datasets (RDDs)
13. Advanced RDDs
14. Distributed Shared Variables
Part Ⅳ. Production Applications
15. How Spark Runs on a Cluster
16. Developing Spark Applications
17. Deploying Spark
18. Monitoring and Debugging
19. Performance Tuning
Part Ⅴ. Streaming
20. Stream Processing Fundamentals
21. Structured Streaming Basics
22. Event-Time and Stateful Processing
23. Structured Streaming in Production
Part Ⅵ. Advanced Analytics and Machine Learning
24. Advanced Analytics and Machine Learning Overview
25. Preprocessing and Feature Engineering
26. Classification
27. Regression
28. Recommendation
29. Unsupervised Learning
30. Graph Analytics
31. Deep Learning
Part Ⅶ. Ecosystem
32. Language Specifics: Python (PySpark) and R (SparkR and sparklyr)
33. Ecosystem and Community
Index