You may have to Search all our reviewed books and magazines, click the sign up button below to create a free account.
Written for practitioners of data mining, data cleaning and database management. Presents a technical treatment of data quality including process, metrics, tools and algorithms. Focuses on developing an evolving modeling strategy through an iterative data exploration loop and incorporation of domain knowledge. Addresses methods of detecting, quantifying and correcting data quality issues that can have a significant impact on findings and decisions, using commercially available tools as well as new algorithmic approaches. Uses case studies to illustrate applications in real life scenarios. Highlights new approaches and methodologies, such as the DataSphere space partitioning and summary based analysis techniques. Exploratory Data Mining and Data Cleaning will serve as an important reference for serious data analysts who need to analyze large amounts of unfamiliar data, managers of operations databases, and students in undergraduate or graduate level courses dealing with large scale data analys is and data mining.
Describes statistical intervals to quantify sampling uncertainty,focusing on key application needs and recently developed methodology in an easy-to-apply format Statistical intervals provide invaluable tools for quantifying sampling uncertainty. The widely hailed first edition, published in 1991, described the use and construction of the most important statistical intervals. Particular emphasis was given to intervals—such as prediction intervals, tolerance intervals and confidence intervals on distribution quantiles—frequently needed in practice, but often neglected in introductory courses. Vastly improved computer capabilities over the past 25 years have resulted in an explosion of the ...
The big data era is upon us: data are being generated, analyzed, and used at an unprecedented scale, and data-driven decision making is sweeping through all aspects of society. Since the value of data explodes when it can be linked and fused with other data, addressing the big data integration (BDI) challenge is critical to realizing the promise of big data. BDI differs from traditional data integration along the dimensions of volume, velocity, variety, and veracity. First, not only can data sources contain a huge volume of data, but also the number of data sources is now in the millions. Second, because of the rate at which newly collected data are made available, many of the data sources a...
Historical scholarship is currently undergoing a digital turn. All historians have experienced this change in one way or another, by writing on word processors, applying quantitative methods on digitalized source materials, or using internet resources and digital tools. Digital Histories showcases this emerging wave of digital history research. It presents work by historians who – on their own or through collaborations with e.g. information technology specialists – have uncovered new, empirical historical knowledge through digital and computational methods. The topics of the volume range from the medieval period to the present day, including various parts of Europe. The chapters apply an...
This book is a significant contribution to the subject of mining time-changing data streams and addresses the design of learning algorithms for this purpose. It introduces new contributions on several different aspects of the problem, identifying research opportunities and increasing the scope for applications. It also includes an in-depth study of stream mining and a theoretical analysis of proposed methods and algorithms. The first section is concerned with the use of an adaptive sliding window algorithm (ADWIN). Since this has rigorous performance guarantees, using it in place of counters or accumulators, it offers the possibility of extending such guarantees to learning and mining algori...
None
This book constitutes the refereed proceedings of the 11th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2007, held in Nanjing, China, May 2007. It covers new ideas, original research results and practical development experiences from all KDD-related areas including data mining, machine learning, data warehousing, data visualization, automatic scientific discovery, knowledge acquisition and knowledge-based systems.
In the last years, Linked Data initiatives have encouraged the publication of large graph-structured datasets using the Resource Description Framework (RDF). Due to the constant growth of RDF data on the web, more flexible data management infrastructures must be able to efficiently and effectively exploit the vast amount of knowledge accessible on the web. This book presents flexible query processing strategies over RDF graphs on the web using the SPARQL query language. In this work, we show how query engines can change plans on-the-fly with adaptive techniques to cope with unpredictable conditions and to reduce execution time. Furthermore, this work investigates the application of crowdsourcing in query processing, where engines are able to contact humans to enhance the quality of query answers. The theoretical and empirical results presented in this book indicate that flexible techniques allow for querying RDF data sources efficiently and effectively.
Data obtained from foreign data sources often come with only superficial structural information, such as relation names and attribute names. Other types of metadata that are important for effective integration and meaningful querying of such data sets are missing. In particular, relationships among attributes, such as foreign keys, are crucial metadata for understanding the structure of an unknown database. The discovery of such relationships is difficult, because in principle for each pair of attributes in the database each pair of data values must be compared. A precondition for a foreign key is an inclusion dependency (IND) between the key and the foreign key attributes. We present with S...
This book constitutes the refereed proceedings of the 8th International Conference on Intelligent Data Analysis, IDA 2009, held in Lyon, France, August 31 – September 2, 2009. The 33 revised papers, 18 full oral presentations and 15 poster and short oral presentations, presented were carefully reviewed and selected from almost 80 submissions. All current aspects of this interdisciplinary field are addressed; for example interactive tools to guide and support data analysis in complex scenarios, increasing availability of automatically collected data, tools that intelligently support and assist human analysts, how to control clustering results and isotonic classification trees. In general the areas covered include statistics, machine learning, data mining, classification and pattern recognition, clustering, applications, modeling, and interactive dynamic data visualization.