Edward Capriolo Book

Language: en
Pages: 351

Programming Hive

Author(s): Edward Capriolo, Dean Wampler, Jason Rutherglen

Categories: Computers

Type: Book

Published: 2012-09-26

Publisher: "O'Reilly Media, Inc."

Need to move a relational database application to Hadoop? This comprehensive guide introduces you to Apache Hive, Hadoop’s data warehouse infrastructure. You’ll quickly learn how to use Hive’s SQL dialect—HiveQL—to summarize, query, and analyze large datasets stored in Hadoop’s distributed filesystem. This example-driven guide shows you how to set up and configure Hive in your environment, provides a detailed overview of Hadoop and MapReduce, and demonstrates how Hive works within the Hadoop ecosystem. You’ll also find real-world case studies that describe how companies have used Hive to solve unique problems involving petabytes of data. Use Hive to create, alter, and drop data...

Language: en
Pages: 350

Programming Hive

Author(s): Edward Capriolo, Dean Wampler, Jason Rutherglen

Categories: Computers

Type: Book

Published: 2012-09-19

Publisher: "O'Reilly Media, Inc."

Language: en
Pages: 251

Big Data Analytics: Applications, Hadoop Technologies and Hive

Author(s): Dr.P.Pushpa, Dr.V.Thamilarasi, Dr. S. Lakshmi Prabha, Mrs.Sudha Nagarajan

Categories: Computers

Type: Book

Published: 2024-04-22

Publisher: Leilani Katie Publication

Dr.P.Pushpa, Lecturer, School of Software Engineering, East China University of Technology, Nanchang, Jiangxi, China. Dr.V.Thamilarasi, Assistant Professor, Department of Computer Science, Sri Sarada College for Women (Autonomous), Salem, Tamil Nadu, India. Dr. S. Lakshmi Prabha, Associate Professor, Department of Computer Science, Seethalakshmi Ramaswami College, Tiruchirappalli, Tamil Nadu, India. Mrs.Sudha Nagarajan, Assistant Professor, Department of Computer Science, Excel College for Commerce and Science, Komarapalayam, Namakkal, Tamil Nadu, India.

Language: en
Pages: 391

Delta Lake: The Definitive Guide

Author(s): Denny Lee, Tristen Wentling, Scott Haines, Prashanth Babu

Categories: Computers

Type: Book

Published: 2024-10-30

Publisher: "O'Reilly Media, Inc."

Ready to simplify the process of building data lakehouses and data pipelines at scale? In this practical guide, learn how Delta Lake is helping data engineers, data scientists, and data analysts overcome key data reliability challenges with modern data engineering and management techniques. Authors Denny Lee, Tristen Wentling, Scott Haines, and Prashanth Babu (with contributions from Delta Lake maintainer R. Tyler Croy) share expert insights on all things Delta Lake, including how to run batch and streaming jobs concurrently and accelerate the usability of your data. You'll also uncover how ACID transactions bring reliability to data lakehouses at scale. This book helps you: understand key data reliability challenges and how Delta Lake solves them; explain the critical role of Delta transaction logs as a single source of truth; learn the Delta Lake ecosystem with technologies like Apache Flink, Kafka, and Trino; architect data lakehouses with the medallion architecture; and optimize Delta Lake performance with features like deletion vectors and liquid clustering.

Language: en
Pages: 307

Cassandra High Performance Cookbook

Author(s): Edward Capriolo

Categories: Computers

Type: Book

Published: 2011

Publisher: Packt Pub Limited

This is a cookbook, and all tasks are approached as recipes. A recipe describes a task and outlines the steps necessary to complete it. Some recipes in the book are examples of writing code. An example of this is a recipe that stores and accesses the entries of a phone book in Cassandra. The recipe consists of a description of the program; a full code example is given, the example is run, the output is displayed, and finally the "How it works" section describes the process or code in greater detail. Other recipes in the book describe a task. An example of this is a recipe that takes a snapshot backup of data in Cassandra. This recipe contains a description of the process, it then shows ...

Language: en
Pages: 756

Hadoop

Author(s): Tom E. White

Categories: Computers

Type: Book

Published: 2015

Publisher: "O'Reilly Media, Inc."

"Offers information on how to build and maintain reliable, scalable, distributed systems with Apache Hadoop covering such topics as MapReduce, HDFS, YARN, Avro for data serialization, Parquet for nested data, and data ingestion tools Flume and Sqoop."

Language: en
Pages: 505

Professional Hadoop Solutions

Author(s): Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich

Categories: Computers

Type: Book

Published: 2013-09-12

Publisher: John Wiley & Sons

The go-to guidebook for deploying Big Data solutions with Hadoop. Today's enterprise architects need to understand how the Hadoop frameworks and APIs fit together, and how they can be integrated to deliver real-world solutions. This book is a practical, detailed guide to building and implementing those solutions, with code-level instruction in the popular Wrox tradition. It covers storing data with HDFS and HBase, processing data with MapReduce, and automating data processing with Oozie. Hadoop security, running Hadoop with Amazon Web Services, best practices, and automating Hadoop processes in real time are also covered in depth. With in-depth code examples in Java and XML and the latest on ...

Language: en
Pages: 288

Data Analytics with Hadoop

Author(s): Benjamin Bengfort, Jenny Kim

Categories: Computers

Type: Book

Published: 2016-06

Publisher: "O'Reilly Media, Inc."

Ready to use statistical and machine-learning techniques across large data sets? This practical guide shows you why the Hadoop ecosystem is perfect for the job. Instead of deployment, operations, or software development usually associated with distributed computing, you’ll focus on particular analyses you can build, the data warehousing techniques that Hadoop provides, and higher order data workflows this framework can produce. Data scientists and analysts will learn how to perform a wide range of techniques, from writing MapReduce and Spark applications with Python to using advanced modeling and data management with Spark MLlib, Hive, and HBase. You’ll also learn about the analytical pr...

Language: en
Pages: 331

Learning Apache Drill

Author(s): Charles Givre, Paul Rogers

Categories: Computers

Type: Book

Published: 2018-11-02

Publisher: O'Reilly Media

Get up to speed with Apache Drill, an extensible distributed SQL query engine that reads massive datasets in many popular file formats such as Parquet, JSON, and CSV. Drill reads data in HDFS or in cloud-native storage such as S3 and works with Hive metastores along with distributed databases such as HBase, MongoDB, and relational databases. Drill works everywhere: on your laptop or in your largest cluster. In this practical book, Drill committers Charles Givre and Paul Rogers show analysts and data scientists how to query and analyze raw data using this powerful tool. Data scientists today spend about 80% of their time just gathering and cleaning data. With this book, you’ll learn how Dri...

Language: en
Pages: 633

Architecting Modern Data Platforms

Author(s): Jan Kunigk, Ian Buss, Paul Wilkinson, Lars George

Categories: Computers

Type: Book

Published: 2018-12-05

Publisher: O'Reilly Media

There’s a lot of information about big data technologies, but splicing these technologies into an end-to-end enterprise data platform is a daunting task not widely covered. With this practical book, you’ll learn how to build big data infrastructure both on-premises and in the cloud and successfully architect a modern data platform. Ideal for enterprise architects, IT managers, application architects, and data engineers, this book shows you how to overcome the many challenges that emerge during Hadoop projects. You’ll explore the vast landscape of tools available in the Hadoop and big data realm in a thorough technical primer before diving into: Infrastructure: Look at all component lay...
