Simon Walkowiak is a cognitive neuroscientist and a managing director of Mind Project Ltd, a Big Data and Predictive Analytics consultancy based in London, United Kingdom. As a former data curator at the UK Data Service (UKDS,
Preface

Chapter 1: The Era of Big Data
  Big Data - The monster re-defined
  Big Data toolbox - dealing with the giant
  Hadoop - the elephant in the room
  Databases
  Hadoop Spark-ed up
  R - The unsung Big Data hero
  Summary

Chapter 2: Introduction to R Programming Language and Statistical Environment
  Learning R
  Revisiting R basics
  Getting R and RStudio ready
  Setting the URLs to R repositories
  R data structures
  Vectors
  Scalars
  Matrices
  Arrays
  Data frames
  Lists
  Exporting R data objects
  Applied data science with R
  Importing data from different formats
  Exploratory Data Analysis
  Data aggregations and contingency tables
  Hypothesis testing and statistical inference
  Tests of differences
  Independent t-test example (with power and effect size estimates)
  ANOVA example
  Tests of relationships
  An example of Pearson's r correlations
  Multiple regression example
  Data visualization packages
  Summary

Chapter 3: Unleashing the Power of R from Within
  Traditional limitations of R
  Out-of-memory data
  Processing speed
  To the memory limits and beyond
  Data transformations and aggregations with the ff and ffbase packages
  Generalized linear models with the ff and ffbase packages
  Logistic regression example with ffbase and biglm
  Expanding memory with the bigmemory package
  Parallel R
  From bigmemory to faster computations
  An apply() example with the big.matrix object
  A for() loop example with the ffdf object
  Using apply() and for() loop examples on a data.frame
  A parallel package example
  A foreach package example
  The future of parallel processing in R
  Utilizing Graphics Processing Units with R
  Multi-threading with Microsoft R Open distribution
  Parallel machine learning with H2O and R
  Boosting R performance with the data.table package and other tools
  Fast data import and manipulation with the data.table package
  Data import with data.table
  Lightning-fast subsets and aggregations on data.table
  Chaining, more complex aggregations, and pivot tables with data.table
  Writing better R code
  Summary

Chapter 4: Hadoop and MapReduce Framework for R
  Hadoop architecture
  Hadoop Distributed File System
  MapReduce framework
  A simple MapReduce word count example
  Other Hadoop native tools
  Learning Hadoop
  A single-node Hadoop in Cloud
  Deploying Hortonworks Sandbox on Azure
  A word count example in Hadoop using Java
  A word count example in Hadoop using the R language
  RStudio Server on a Linux RedHat/CentOS virtual machine
  Installing and configuring RHadoop packages
  HDFS management and MapReduce in R - a word count example
  HDInsight - a multi-node Hadoop cluster on Azure
  Creating your first HDInsight cluster
  Creating a new Resource Group
  Deploying a Virtual Network
  Creating a Network Security Group
  Setting up and configuring an HDInsight cluster
  Starting the cluster and exploring Ambari
  Connecting to the HDInsight cluster and installing RStudio Server
  Adding a new inbound security rule for port 8787
  Editing the Virtual Network's public IP address for the head node
  Smart energy meter readings analysis example - using R on HDInsight cluster
  Summary

Chapter 5: R with Relational Database Management Systems (RDBMSs)
  Relational Database Management Systems (RDBMSs)
  A short overview of used RDBMSs
  Structured Query Language (SQL)
  SQLite with R
  Preparing and importing data into a local SQLite database
  Connecting to SQLite from RStudio
  MariaDB with R on an Amazon EC2 instance
  Preparing the EC2 instance and RStudio Server for use
  Preparing MariaDB and data for use
  Working with MariaDB from RStudio
  PostgreSQL with R on Amazon RDS
  Launching an Amazon RDS database instance
  Preparing and uploading data to Amazon RDS
  Remotely querying PostgreSQL on Amazon RDS from RStudio
  Summary

Chapter 6: R with Non-Relational (NoSQL) Databases
  Introduction to NoSQL databases
  Review of leading non-relational databases
  MongoDB with R
  Introduction to MongoDB
  MongoDB data models
  Installing MongoDB with R on Amazon EC2
  Processing Big Data using MongoDB with R
  Importing data into MongoDB and basic MongoDB commands
  MongoDB with R using the rmongodb package
  MongoDB with R using the RMongo package
  MongoDB with R using the mongolite package
  HBase with R
  Azure HDInsight with HBase and RStudio Server
  Importing the data to HDFS and HBase
  Reading and querying HBase using the rhbase package
  Summary

Chapter 7: Faster than Hadoop - Spark with R
  Spark for Big Data analytics
  Spark with R on a multi-node HDInsight cluster
  Launching HDInsight with Spark and R/RStudio
  Reading the data into HDFS and Hive
  Getting the data into HDFS
  Importing data from HDFS to Hive
  Bay Area Bike Share analysis using SparkR
  Summary

Chapter 8: Machine Learning Methods for Big Data in R
  What is machine learning?
  Supervised and unsupervised machine learning methods
  Classification and clustering algorithms
  Machine learning methods with R
  Big Data machine learning tools
  GLM example with Spark and R on the HDInsight cluster
  Preparing the Spark cluster and reading the data from HDFS
  Logistic regression in Spark with R
  Naive Bayes with H2O on Hadoop with R
  Running an H2O instance on Hadoop with R
  Reading and exploring the data in H2O
  Naive Bayes on H2O with R
  Neural Networks with H2O on Hadoop with R
  How do Neural Networks work?
  Running Deep Learning models on H2O
  Summary

Chapter 9: The Future of R - Big, Fast, and Smart Data
  The current state of Big Data analytics with R
  Out-of-memory data on a single machine
  Faster data processing with R
  Hadoop with R
  Spark with R
  R with databases
  Machine learning with R
  The future of R
  Big Data
  Fast data
  Smart data
  Where to go next
  Summary

Index