Tathagata Das Book

Language: en
Pages: 400

Learning Spark

Author(s): Jules S. Damji, Brooke Wenig, Tathagata Das, Denny Lee

Categories: Computers

Type: Book

Published: 2020-07-16

Publisher: O'Reilly Media

Data is bigger, arrives faster, and comes in a variety of formats, and it all needs to be processed at scale for analytics or machine learning. But how can you process such varied workloads efficiently? Enter Apache Spark. Updated to include Spark 3.0, this second edition shows data engineers and data scientists why structure and unification in Spark matter. Specifically, this book explains how to perform simple and complex data analytics and employ machine learning algorithms. Through step-by-step walk-throughs, code snippets, and notebooks, you’ll be able to: learn the Python, SQL, Scala, or Java high-level Structured APIs; understand Spark operations and the SQL engine; inspect, tune, and debug Spark operations with Spark configurations and the Spark UI; connect to data sources such as JSON, Parquet, CSV, Avro, ORC, Hive, S3, or Kafka; perform analytics on batch and streaming data using Structured Streaming; build reliable data pipelines with open source Delta Lake and Spark; and develop machine learning pipelines with MLlib and productionize models using MLflow.

Language: en
Pages: 276

Learning Spark

Author(s): Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia

Categories: Computers

Type: Book

Published: 2015-01-28

Publisher: O'Reilly Media, Inc.

This book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. You'll learn how to express parallel jobs with just a few lines of code, and cover applications from simple batch jobs to stream processing and machine learning.

Language: en
Pages: 275

High-Performance Big Data Computing

Author(s): Dhabaleswar K. Panda, Xiaoyi Lu, Dipti Shankar

Categories: Computers

Type: Book

Published: 2022-08-02

Publisher: MIT Press

An in-depth overview of an emerging field that brings together high-performance computing, big data processing, and deep learning. Over the last decade, the exponential explosion of data known as big data has changed the way we understand and harness the power of data. The emerging field of high-performance big data computing, which brings together high-performance computing (HPC), big data processing, and deep learning, aims to meet the challenges posed by large-scale data processing. This book offers an in-depth overview of high-performance big data computing and the associated technical issues, approaches, and solutions. The book covers basic concepts and necessary background knowledge, ...

Language: en
Pages: 383

Delta Lake: The Definitive Guide

Author(s): Denny Lee, Tristen Wentling, Scott Haines, Prashanth Babu

Categories: Computers

Type: Book

Published: 2024-10-30

Publisher: O'Reilly Media, Inc.

Ready to simplify the process of building data lakehouses and data pipelines at scale? In this practical guide, learn how Delta Lake is helping data engineers, data scientists, and data analysts overcome key data reliability challenges with modern data engineering and management techniques. Authors Denny Lee, Tristen Wentling, Scott Haines, and Prashanth Babu (with contributions from Delta Lake maintainer R. Tyler Croy) share expert insights on all things Delta Lake, including how to run batch and streaming jobs concurrently and accelerate the usability of your data. You'll also uncover how ACID transactions bring reliability to data lakehouses at scale. This book helps you: understand key data reliability challenges and how Delta Lake solves them; explain the critical role of Delta transaction logs as a single source of truth; learn the Delta Lake ecosystem with technologies like Apache Flink, Kafka, and Trino; architect data lakehouses with the medallion architecture; and optimize Delta Lake performance with features like deletion vectors and liquid clustering.

Language: en
Pages: 453

Stream Processing with Apache Spark

Author(s): Gerard Maas, François Garillot

Categories: Computers

Type: Book

Published: 2019-06-05

Publisher: O'Reilly Media

Before you can build analytics tools to gain quick insights, you first need to know how to process data in real time. With this practical guide, developers familiar with Apache Spark will learn how to put this in-memory framework to use for streaming data. You’ll discover how Spark enables you to write streaming jobs in almost the same way you write batch jobs. Authors Gerard Maas and François Garillot help you explore the theoretical underpinnings of Apache Spark. This comprehensive guide features two sections that compare and contrast the streaming APIs Spark now supports: the original Spark Streaming library and the newer Structured Streaming API. Learn fundamental stream processing co...

Language: en
Pages: 1128

Advances in Radiation Oncology in Lung Cancer

Author(s): Branislav Jeremić

Categories: Science

Type: Book

Published: 2023-08-14

Publisher: Springer Nature

This is the third, completely updated edition of a comprehensive book in which many of the world’s leading lung cancer specialists discuss the recent advances in the radiation oncology of lung cancer and reflect on the latest research findings in lung cancer and other intrathoracic malignancies. Lung cancer remains the major cancer killer in both sexes worldwide, despite significant progress in recent decades in both diagnostic and treatment approaches. New biological and technological advances in this field are now incorporated more quickly into the overall decision-making process and are bringing fast and substantial improvements in both survival and quality of life of lung cancer pati...

Language: en
Pages: 290

Big Data Analytics with Spark

Author(s): Mohammed Guller

Categories: Computers

Type: Book

Published: 2015-12-29

Publisher: Apress

Big Data Analytics with Spark is a step-by-step guide for learning Spark, an open source, fast, general-purpose cluster computing framework for large-scale data analysis. You will learn how to use Spark for different types of big data analytics projects, including batch, interactive, graph, and stream data analysis as well as machine learning. In addition, this book will help you become a much sought-after Spark expert. Spark is one of the hottest Big Data technologies. The amount of data generated today by devices, applications, and users is exploding. Therefore, there is a critical need for tools that can analyze large-scale data and unlock value from it. Spark is a powerful tech...

Language: en
Pages: 454

Fundamentals of Data Engineering

Author(s): Joe Reis, Matt Housley

Categories: Computers

Type: Book

Published: 2022-06-22

Publisher: O'Reilly Media, Inc.

Data engineering has grown rapidly in the past decade, leaving many software engineers, data scientists, and analysts looking for a comprehensive view of this practice. With this practical book, you'll learn how to plan and build systems to serve the needs of your organization and customers by evaluating the best technologies available through the framework of the data engineering lifecycle. Authors Joe Reis and Matt Housley walk you through the data engineering lifecycle and show you how to stitch together a variety of cloud technologies to serve the needs of downstream data consumers. You'll understand how to apply the concepts of data generation, ingestion, orchestration, transformation, ...

Language: en
Pages: 556

Mastering Java Machine Learning

Author(s): Dr. Uday Kamath, Krishna Choppella

Categories: Computers

Type: Book

Published: 2017-07-11

Publisher: Packt Publishing Ltd

Become an advanced practitioner with this progressive set of master classes on application-oriented machine learning. About This Book: comprehensive coverage of key topics in machine learning with an emphasis on both the theoretical and practical aspects; more than 15 open source Java tools in a wide range of techniques, with code and practical usage; more than 10 real-world case studies in machine learning highlighting techniques ranging from data ingestion up to analyzing the results of experiments, all preparing the user for the practical, real-world use of tools and data analysis. Who This Book Is For: This book will appeal to anyone with a serious interest in topics in Data Science or those...

Language: en
Pages: 283

Cost-Effective Data Pipelines

Author(s): Sev Leonard

Categories: Computers

Type: Book

Published: 2023-07-13

Publisher: O'Reilly Media, Inc.

The low cost of getting started with cloud services can easily evolve into a significant expense down the road. That's challenging for teams developing data pipelines, particularly when rapid changes in technology and workload require a constant cycle of redesign. How do you deliver scalable, highly available products while keeping costs in check? With this practical guide, author Sev Leonard provides a holistic approach to designing scalable data pipelines in the cloud. Intermediate data engineers, software developers, and architects will learn how to navigate cost/performance trade-offs and how to choose and configure compute and storage. You'll also pick up best practices for code develop...
