  • New genuine copy: Hadoop Application Architectures (Hadoop应用架构), ISBN 9787564170011, Southeast University Press
    • Author: Mark Grover et al.
    • Publisher: Southeast University Press
    • Publication date: 2016-10-01

    Product details
    • Author: Mark Grover et al.
    • Publisher: Southeast University Press
    • Copyright provided by: Southeast University Press
    • Publication date: 2016-10-01
    • Edition: 1
    • Printing date: 2017-02-01
    • Print run: n/a
    • Word count: 490,000
    • Pages: 371
    • Format: small 16mo (小16开)
    • Trim size: n/a
    • Binding: paperback, perfect bound
    • Language: English
    • ISBN: 9787564170011
    • List price: 89.00
    • External ID: 8898542

    Foreword
    Preface
    Part Ⅰ. Architectural Considerations for Hadoop Applications
    1. Data Modeling in Hadoop
    Data Storage Options
    Standard File Formats
    Hadoop File Types
    Serialization Formats
    Columnar Formats
    Compression
    HDFS Schema Design
    Location of HDFS Files
    Advanced HDFS Schema Design
    HDFS Schema Design Summary
    HBase Schema Design
    Row Key
    Timestamp
    Hops
    Tables and Regions
    Using Columns
    Using Column Families
    Time-to-Live
    Managing Metadata
    What Is Metadata?
    Why Care About Metadata?
    Where to Store Metadata?
    Examples of Managing Metadata
    Limitations of the Hive Metastore and HCatalog
    Other Ways of Storing Metadata
    Conclusion
    2. Data Movement
    Data Ingestion Considerations
    Timeliness of Data Ingestion
    Incremental Updates
    Access Patterns
    Original Source System and Data Structure
    Transformations
    Network Bottlenecks
    Network Security
    Push or Pull
    Failure Handling
    Level of Complexity
    Data Ingestion Options
    File Transfers
    Considerations for File Transfers versus Other Ingest Methods
    Sqoop: Batch Transfer Between Hadoop and Relational Databases
    Flume: Event-Based Data Collection and Processing
    Kafka
    Data Extraction
    Conclusion
    3. Processing Data in Hadoop
    MapReduce
    MapReduce Overview
    Example for MapReduce
    When to Use MapReduce
    Spark
    Spark Overview
    Overview of Spark Components
    Basic Spark Concepts
    Benefits of Using Spark
    Spark Example
    When to Use Spark
    Abstractions
    Pig
    Pig Example
    When to Use Pig
    Crunch
    Crunch Example
    When to Use Crunch
    Cascading
    Cascading Example
    When to Use Cascading
    Hive
    Hive Overview
    Example of Hive Code
    When to Use Hive
    Impala
    Impala Overview
    Speed-Oriented Design
    Impala Example
    When to Use Impala
    Conclusion
    4. Common Hadoop Processing Patterns
    Pattern: Removing Duplicate Records by Primary Key
    Data Generation for Deduplication Example
    Code Example: Spark Deduplication in Scala
    Code Example: Deduplication in SQL
    Pattern: Windowing Analysis
    Data Generation for Windowing Analysis Example
    Code Example: Peaks and Valleys in Spark
    Code Example: Peaks and Valleys in SQL
    Pattern: Time Series Modifications
    Use HBase and Versioning
    Use HBase with a RowKey of RecordKey and StartTime
    Use HDFS and Rewrite the Whole Table
    Use Partitions on HDFS for Current and Historical Records
    Data Generation for Time Series Example
    Code Example: Time Series in Spark
    Code Example: Time Series in SQL
    Conclusion
    5. Graph Processing on Hadoop
    What Is a Graph?
    What Is Graph Processing?
    How Do You Process a Graph in a Distributed System?
    The Bulk Synchronous Parallel Model
    BSP by Example
    Giraph
    Read and Partition the Data
    Batch Process the Graph with BSP
    Write the Graph Back to Disk
    Putting It All Together
    When Should You Use Giraph?
    GraphX
    Just Another RDD
    GraphX Pregel Interface
    vprog()
    sendMessage()
    mergeMessage()
    Which Tool to Use?
    Conclusion
    6. Orchestration
    Why We Need Workflow Orchestration
    The Limits of Scripting
    The Enterprise Job Scheduler and Hadoop
    Orchestration Frameworks in the Hadoop Ecosystem
    Oozie Terminology
    Oozie Overview
    Oozie Workflow
    Workflow Patterns
    Point-to-Point Workflow
    Fan-Out Workflow
    Capture-and-Decide Workflow
    Parameterizing Workflows
    Classpath Definition
    Scheduling Patterns
    Frequency Scheduling
    Time and Data Triggers
    Executing Workflows
    Conclusion
    7. Near-Real-Time Processing with Hadoop
    Stream Processing
    Apache Storm
    Storm High-Level Architecture
    Storm Topologies
    Tuples and Streams
    Spouts and Bolts
    Stream Groupings
    Reliability of Storm Applications
    Exactly-Once Processing
    Fault Tolerance
    Integrating Storm with HDFS
    Integrating Storm with HBase
    Storm Example: Simple Moving Average
    Evaluating Storm
    Trident
    Trident Example: Simple Moving Average
    Evaluating Trident
    Spark Streaming
    Overview of Spark Streaming
    Spark Streaming Example: Simple Count
    Spark Streaming Example: Multiple Inputs
    Spark Streaming Example: Maintaining State
    Spark Streaming Example: Windowing
    Spark Streaming Example: Streaming versus ETL Code
    Evaluating Spark Streaming
    Flume Interceptors
    Which Tool to Use?
    Low-Latency Enrichment, Validation, Alerting, and Ingestion
    NRT Counting, Rolling Averages, and Iterative Processing
    Complex Data Pipelines
    Conclusion
    Part Ⅱ. Case Studies
    8. Clickstream Analysis
    Defining the Use Case
    Using Hadoop for Clickstream Analysis
    Design Overview
    Storage
    Ingestion
    The Client Tier
    The Collector Tier
    Processing
    Data Deduplication
    Sessionization
    Analyzing
    Orchestration
    Conclusion
    9. Fraud Detection
    Continuous Improvement
    Taking Action
    Architectural Requirements of Fraud Detection Systems
    Introducing Our Use Case
    High-Level Design
    Client Architecture
    Profile Storage and Retrieval
    Caching
    HBase Data Definition
    Delivering Transaction Status: Approved or Denied?
    Ingest
    Path Between the Client and Flume
    Near-Real-Time and Exploratory Analytics
    Near-Real-Time Processing
    Exploratory Analytics
    What About Other Architectures?
    Flume Interceptors
    Kafka to Storm or Spark Streaming
    External Business Rules Engine
    Conclusion
    10. Data Warehouse
    Using Hadoop for Data Warehousing
    Defining the Use Case
    OLTP Schema
    Data Warehouse: Introduction and Terminology
    Data Warehousing with Hadoop
    High-Level Design
    Data Modeling and Storage
    Ingestion
    Data Processing and Access
    Aggregations
    Data Export
    Orchestration
    Conclusion
    A. Joins in Impala
    Index

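    Mark Grover is a committer on Apache Bigtop and a committer and Project Management Committee member on Apache Sentry.

    Ted Malaska is a senior solutions architect at Cloudera, helping clients use Hadoop and its ecosystem.

    Jonathan Seidman is a solutions architect at Cloudera, working with partners to integrate their solutions into Cloudera's software stack.

    Gwen Shapira is a solutions architect at Cloudera with 15 years of experience designing scalable data architectures for customers.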



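    Get expert guidance on architecting end-to-end data management solutions with Apache Hadoop. While many sources stop at explaining how to use the various components of the Hadoop ecosystem, this practice-focused book walks you through the architectural considerations needed to tie those components together into a complete application tailored to your use case.

    To reinforce those lessons, the second part of the book provides detailed architecture case studies covering some of the most common Hadoop use cases.

    Whether you are designing a new Hadoop application or planning to integrate Hadoop into your existing data infrastructure, Hadoop Application Architectures (English edition), by Mark Grover, Ted Malaska, Jonathan Seidman, and Gwen Shapira, offers skilled guidance through the entire process.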

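    • Factors to consider when using Hadoop to store and model data
    • Best-practice guidance for moving data into and out of the system
    • Data processing frameworks, including MapReduce, Spark, and Hive
    • Common Hadoop processing patterns, such as removing duplicate records and windowing analysis
    • Giraph, GraphX, and other tools for large-scale graph processing on Hadoop
    • Workflow orchestration and scheduling tools, such as Apache Oozie
    • Near-real-time stream processing with Apache Storm, Apache Spark Streaming, and Apache Flume
    • Architecture examples for clickstream analysis, fraud detection, and data warehousing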
