You may have to Search all our reviewed books and magazines, click the sign up button below to create a free account.
Before you can build analytics tools to gain quick insights, you first need to know how to process data in real time. With this practical guide, developers familiar with Apache Spark will learn how to put this in-memory framework to use for streaming data. You’ll discover how Spark enables you to write streaming jobs in almost the same way you write batch jobs. Authors Gerard Maas and François Garillot help you explore the theoretical underpinnings of Apache Spark. This comprehensive guide features two sections that compare and contrast the streaming APIs Spark now supports: the original Spark Streaming library and the newer Structured Streaming API. Learn fundamental stream processing co...
Streaming data is a big deal in big data these days. As more and more businesses seek to tame the massive unbounded data sets that pervade our world, streaming systems have finally reached a level of maturity sufficient for mainstream adoption. With this practical guide, data engineers, data scientists, and developers will learn how to work with streaming data in a conceptual and platform-agnostic way. Expanded from Tyler Akidau’s popular blog posts "Streaming 101" and "Streaming 102", this book takes you from an introductory level to a nuanced understanding of the what, where, when, and how of processing real-time data streams. You’ll also dive deep into watermarks and exactly-once proc...
This book presents an end-to-end architecture for demand-based data stream gathering, processing, and transmission. The Internet of Things (IoT) consists of billions of devices which form a cloud of network connected sensor nodes. These sensor nodes supply a vast number of data streams with massive amounts of sensor data. Real-time sensor data enables diverse applications including traffic-aware navigation, machine monitoring, and home automation. Current stream processing pipelines are demand-oblivious, which means that they gather, transmit, and process as much data as possible. In contrast, a demand-based processing pipeline uses requirement specifications of data consumers, such as failu...
Organizations today often struggle to balance business requirements with ever-increasing volumes of data. Additionally, the demand for leveraging large-scale, real-time data is growing rapidly among the most competitive digital industries. Conventional system architectures may not be up to the task. With this practical guide, you’ll learn how to leverage large-scale data usage across the business units in your organization using the principles of event-driven microservices. Author Adam Bellemare takes you through the process of building an event-driven microservice-powered organization. You’ll reconsider how data is produced, accessed, and propagated across your organization. Learn power...
Develop production-ready ETL pipelines by leveraging Python libraries and deploying them for suitable use cases Key Features Understand how to set up a Python virtual environment with PyCharm Learn functional and object-oriented approaches to create ETL pipelines Create robust CI/CD processes for ETL pipelines Purchase of the print or Kindle book includes a free PDF eBook Book DescriptionModern extract, transform, and load (ETL) pipelines for data engineering have favored the Python language for its broad range of uses and a large assortment of tools, applications, and open source components. With its simplicity and extensive library support, Python has emerged as the undisputed choice for d...
Software development today is embracing events and streaming data, which optimizes not only how technology interacts but also how businesses integrate with one another to meet customer needs. This phenomenon, called flow, consists of patterns and standards that determine which activity and related data is communicated between parties over the internet. This book explores critical implications of that evolution: What happens when events and data streams help you discover new activity sources to enhance existing businesses or drive new markets? What technologies and architectural patterns can position your company for opportunities enabled by flow? James Urquhart, global field CTO at VMware, g...
Due to the increasing need to solve complex problems, high-performance computing (HPC) is now one of the most fundamental infrastructures for scientific development in all disciplines, and it has progressed massively in recent years as a result. HPC facilitates the processing of big data, but the tremendous research challenges faced in recent years include: the scalability of computing performance for high velocity, high variety and high volume big data; deep learning with massive-scale datasets; big data programming paradigms on multi-core; GPU and hybrid distributed environments; and unstructured data processing with high-performance computing. This book presents 19 selected papers from the TopHPC2017 congress on Advances in High-Performance Computing and Big Data Analytics in the Exascale era, held in Tehran, Iran, in April 2017. The book is divided into 3 sections: State of the Art and Future Scenarios, Big Data Challenges, and HPC Challenges, and will be of interest to all those whose work involves the processing of Big Data and the use of HPC.
This book focuses on big data in business intelligence, data management, machine learning, cloud computing, and smart cities. It also provides an interdisciplinary platform to present and discuss recent innovations, trends, and concerns in the fields of big data and analytics. Big Data Analysis for Green Computing: Concepts and Applications presents the latest technologies and covers the major challenges, issues, and advances of big data and data analytics in green computing. It explores basic as well as high-level concepts. It also includes the use of machine learning using big data and discusses advanced system implementation for smart cities. The book is intended for business and management educators, management researchers, doctoral scholars, university professors, policymakers, and higher academic research organizations.
Learn how easy it is to apply sophisticated statistical and machine learning methods to real-world problems when you build on top of the Google Cloud Platform (GCP). This hands-on guide shows developers entering the data science field how to implement an end-to-end data pipeline, using statistical and machine learning methods and tools on GCP. Through the course of the book, you’ll work through a sample business decision by employing a variety of data science approaches. Follow along by implementing these statistical and machine learning solutions in your own project on GCP, and discover how this platform provides a transformative and more collaborative way of doing data science. You’ll ...