Incomplete data is part of life and of almost all areas of scientific study. Users tend to skip certain fields when they fill out online forms; participants choose to ignore sensitive questions on surveys; sensors fail, resulting in the loss of certain readings; publicly viewable satellite map services have missing data in many mobile applications; and in privacy-preserving applications, the data is deliberately left incomplete in order to protect sensitive attribute values. Query processing is a fundamental problem in computer science, and is useful in a variety of applications. In this book, we mostly focus on query processing over incomplete databases, which involves finding ...
Generative adversarial networks (GANs) were introduced by Ian Goodfellow and his co-authors, including Yoshua Bengio, in 2014, and were referred to by Yann LeCun (Facebook’s AI research director) as “the most interesting idea in the last 10 years in ML.” GANs’ potential is huge, because they can learn to mimic any distribution of data, which means they can be taught to create worlds similar to our own in any domain: images, music, speech, prose. They are robot artists in a sense, and their output is remarkable, poignant even. In 2018, Christie’s sold a portrait that had been generated by a GAN for $432,000. Although image generation has been challenging, GAN image generation has p...
This book constitutes the proceedings of the Second International Conference on Spatial Data and Intelligence, SpatialDI 2021, which was held during April 22-24, 2021 in Hangzhou, China. The 14 full papers and 7 short papers presented in this volume were carefully reviewed and selected from 72 submissions. They are organized in the topical sections named: traffic management, data science, and city analysis.
Entity Resolution (ER) lies at the core of data integration and cleaning and, thus, a bulk of the research examines ways for improving its effectiveness and time efficiency. The initial ER methods primarily target Veracity in the context of structured (relational) data that are described by a schema of well-known quality and meaning. To achieve high effectiveness, they leverage schema, expert, and/or external knowledge. Part of these methods are extended to address Volume, processing large datasets through multi-core or massive parallelization approaches, such as the MapReduce paradigm. However, these early schema-based approaches are inapplicable to Web Data, which abound in voluminous, noi...
Data profiling refers to the activity of collecting data about data, i.e., metadata. Most IT professionals and researchers who work with data have engaged in data profiling, at least informally, to understand and explore an unfamiliar dataset or to determine whether a new dataset is appropriate for a particular task at hand. Data profiling results are also important in a variety of other situations, including query optimization, data integration, and data cleaning. Simple metadata are statistics, such as the number of rows and columns, schema and datatype information, the number of distinct values, statistical value distributions, and the number of null or empty values in each column. More...
Graph data modeling and querying arise in many practical application domains, such as social and biological networks, where the primary focus is on concepts and their relationships and on the rich patterns in these complex webs of interconnectivity. In this book, we present a concise unified view of the basic challenges that arise over the complete life cycle of formulating and processing queries on graph databases. To that end, we present all major concepts relevant to this life cycle, formulated in terms of a common and unifying ground: the property graph data model, the predominant data model adopted by modern graph database systems. We aim especially to give a coherent and in-depth perspective on current graph querying and an outlook for future developments. Our presentation is self-contained, covering the relevant topics: graph data models, graph query languages and graph query specification, graph constraints, and graph query processing. We conclude by indicating major open research challenges towards the next generation of graph data management systems.
Large-scale data analytics using machine learning (ML) underpins many modern data-driven applications. ML systems provide means of specifying and executing these ML workloads in an efficient and scalable manner. Data management is at the heart of many ML systems due to data-driven application characteristics, data-centric workload characteristics, and system architectures inspired by classical data management techniques. In this book, we follow this data-centric view of ML systems and aim to provide a comprehensive overview of data management in ML systems for the end-to-end data science or ML lifecycle. We review multiple interconnected lines of work: (1) ML support in database (DB) systems...
The topic of using views to answer queries has been popular for a few decades now, as it cuts across domains such as query optimization, information integration, data warehousing, website design and, recently, database-as-a-service and data placement in cloud systems. This book assembles foundational work on answering queries using views in a self-contained manner, with an effort to choose material that constitutes the backbone of the research. It presents efficient algorithms and covers the following problems: query containment; rewriting queries using views in various logical languages; equivalent rewritings and maximally contained rewritings; and computing certain answers in the data-inte...
This book is a gentle introduction to dominance-based query processing techniques and their applications. The book aims to present fundamental as well as some advanced issues in the area in a precise, but easy-to-follow, manner. Dominance is an intuitive concept that can be used in many different ways in diverse application domains. The concept of dominance is based on the values of the attributes of each object. An object p dominates another object q if p is better than q. This goodness criterion may differ from one user to another. However, all decisions boil down to the minimization or maximization of attribute values. In this book, we will explore algorithms and applications related to dominance-based query processing. The concept of dominance has a long history in finance and multi-criteria optimization. However, the introduction of the concept to the database community in 2001 inspired many researchers to contribute to the area. Therefore, many algorithmic techniques have been proposed for the efficient processing of dominance-based queries, such as skyline queries, k-dominant queries, and top-k dominating queries, just to name a few.
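As a rough illustration of the dominance idea described in this blurb, the sketch below computes a skyline with a naive pairwise comparison. It assumes the common Pareto-style definition (an object dominates another if it is no worse in every attribute and strictly better in at least one) and assumes that smaller values are better for every attribute. The function names `dominates` and `skyline` and the hotel data are hypothetical, chosen for illustration only; they are not taken from the book.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """Return True if p dominates q, assuming smaller is better everywhere.

    Assumed definition: p is no worse than q in every attribute and
    strictly better in at least one attribute.
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Return the points not dominated by any other point (naive O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical hotels described by (price, distance to the beach in km).
hotels = [(50.0, 8.0), (80.0, 2.0), (60.0, 9.0), (90.0, 3.0), (65.0, 5.0)]
result = skyline(hotels)
print(result)
```

Here (60.0, 9.0) is dominated by (50.0, 8.0) and (90.0, 3.0) is dominated by (80.0, 2.0), so the remaining three hotels form the skyline. Efficient skyline algorithms avoid the quadratic scan used here; this version only makes the definition concrete.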
The three-volume set LNCS 13245, 13246 and 13247 constitutes the proceedings of the 27th International Conference on Database Systems for Advanced Applications, DASFAA 2022, held online in April 2022. The 72 full papers and 76 short papers presented in this three-volume set were carefully reviewed and selected from 543 submissions. Additionally, 13 industrial papers, 9 demo papers and 2 PhD consortium papers are included. The conference was planned to take place in Hyderabad, India, but it was held virtually due to the COVID-19 pandemic.