Data Mining E-book Downloads - 相识电子书

Tag: Data Mining

Data Mining Techniques (数据挖掘技术)

Author: [US] MichaelJ.A.B

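This book is a classic in the data mining field and has been a steady bestseller for years. Covering both technology and applications, it gives a comprehensive, systematic introduction to the business context of data mining, data mining techniques, and their application in business settings. Since the first edition was published in 1997, the data mining community has changed enormously: most of the core algorithms remain the same, but the software in which the algorithms are embedded, the databases to which they are applied, and the business problems they are used to solve have all evolved. The second edition shows how to use basic data mining methods and techniques to solve common business problems. The book covers the core data mining techniques, including decision trees, neural networks, collaborative filtering, association rules, link analysis, clustering, and survival analysis. It also presents data mining best practices, recent developments in data mining, and some challenging research topics, offering considerable technical depth and breadth. The companion website www.data-miners.com/companion provides exercises for each chapter and data for testing the various data mining techniques. The writing is concise and fresh, with vivid explanations of how complex concepts are applied in practice, making this an indispensable data mining textbook.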
Data Mining

Author: Ian H. Witten, Eibe F

As with any burgeoning technology that enjoys commercial attention, the use of data mining is surrounded by a great deal of hype. Exaggerated reports tell of secrets that can be uncovered by setting algorithms loose on oceans of data. But there is no magic in machine learning, no hidden power, no alchemy. Instead there is an identifiable body of practical techniques that can extract useful information from raw data. This book describes these techniques and shows how they work. The book is a major revision of the first edition that appeared in 1999. While the basic core remains the same, it has been updated to reflect the changes that have taken place over five years, and now has nearly double the references. The highlights for the new edition include thirty new technique sections; an enhanced Weka machine learning workbench, which now features an interactive interface; comprehensive information on neural networks; a new section on Bayesian networks; plus much more; algorithmic methods at the heart of successful data mining-including tried and true techniques as well as leading edge methods; performance improvement techniques that work by transforming the input or output; and, downloadable Weka, a collection of machine learning algorithms for data mining tasks, including tools for data pre-processing, classification, regression, clustering, association rules, and visualization-in a new, interactive interface.
Social Computing (社会计算)

Author: Lei Tang, Huan Liu

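The past decade has witnessed the birth of the participatory Web and social media, which connect people in all kinds of creative ways. Today, thousands upon thousands of users are busy playing, tagging, working, and socializing online, and collaboration, communication, and intelligence are taking unprecedented new forms. The emergence of social media has changed business models, influenced how people communicate opinions and emotions, and opened up countless opportunities to study human interaction and collective behavior at large scale. This book introduces the characteristics of social media from a data mining perspective, reviews representative work in social media computing, and describes the challenges that social media pose. It introduces basic concepts and uses easy-to-follow examples to present state-of-the-art, effective evaluation methods. In particular, it discusses graph-based community detection techniques and extends them in important ways to handle the dynamic, heterogeneous networks found in social media. It also shows how the community patterns that are discovered can be used for social media mining. The concepts, algorithms, and methods in this book can help people make better use of social media and support the building of socially intelligent systems. The book is an introductory text on community detection and mining in social media, suitable for students, researchers, and practitioners in data-centric social media disciplines. The book's website, http://dmml.asu.edu/cdm/, provides lecture slides, all figures from the book, the main references, some small datasets used in the book, and source code for several representative algorithms.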
Head First Data Analysis (深入浅出数据分析)

Author: 迈克尔•米尔顿 (Michael Mil

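In a lively format reminiscent of a serialized novel, "Head First Data Analysis" (《深入浅出数据分析》) vividly presents the techniques a good data analyst should know: the basic steps of data analysis, experimental methods, optimization methods, hypothesis testing, Bayesian statistics, subjective probability, heuristics, histograms, regression, error handling, relational databases, and data cleaning techniques. After the main text, three appendices cover the top ten things left out of data analysis, the R tool, and the ToolPak tool, giving readers a bridge to further study beyond the core material. The book is full of twists and written with wit. Whether you are a workplace veteran or new to the field, and whether you read every word closely or just browse, you can follow the text through several workplace scenarios and experience the fun and the challenges of data analysis.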
Introduction to Network Science (网络科学导论)

Author: 汪小帆 (Wang Xiaofan), 李翔 (Li Xiang), 陈关荣 (Chen Guanrong)

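Gaining a scientific understanding of the quantitative and qualitative properties of complex networks of all kinds has become an extremely important and challenging topic for scientific research in the network age, and network science is an emerging interdisciplinary field that takes up this challenge. This book aims to introduce the basic concepts, ideas, and methods of network science systematically, so that any reader with a background in advanced mathematics can follow it and become able to apply network science methods to the analysis of real networks. To that end, the book avoids dwelling on mathematical and physical derivations and focuses instead on the ways of thinking and research approaches of network science. After an overview of the background and significance of network science, the book is divided into four parts that cover basic network concepts, network topological properties, network topology models, and network dynamics in detail. The book is suitable as a network science textbook for graduate students and senior undergraduates, and as a reference for researchers and students in the natural sciences, engineering, and the social sciences.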
Mining of Massive Datasets

Author: Anand Rajaraman, Jeff

The popularity of the Web and Internet commerce provides many extremely large datasets from which information can be gleaned by data mining. This book focuses on practical algorithms that have been used to solve key problems in data mining and which can be used on even the largest datasets. It begins with a discussion of the map-reduce framework, an important tool for parallelizing algorithms automatically. The authors explain the tricks of locality-sensitive hashing and stream processing algorithms for mining data that arrives too fast for exhaustive processing. The PageRank idea and related tricks for organizing the Web are covered next. Other chapters cover the problems of finding frequent itemsets and clustering. The final chapters cover two applications: recommendation systems and Web advertising, each vital in e-commerce. Written by two authorities in database and Web technologies, this book is essential reading for students and practitioners alike.
Scaling up Machine Learning

Author: Bekkerman, Ron; Bile

This book presents an integrated collection of representative approaches for scaling up machine learning and data mining methods on parallel and distributed computing platforms. Demand for parallelizing learning algorithms is highly task-specific: in some settings it is driven by the enormous dataset sizes, in others by model complexity or by real-time performance requirements. Making task-appropriate algorithm and platform choices for large-scale machine learning requires understanding the benefits, trade-offs and constraints of the available options. Solutions presented in the book cover a range of parallelization platforms from FPGAs and GPUs to multi-core systems and commodity clusters, concurrent programming frameworks including CUDA, MPI, MapReduce and DryadLINQ, and learning settings (supervised, unsupervised, semi-supervised and online learning). Extensive coverage of parallelization of boosted trees, SVMs, spectral clustering, belief propagation and other popular learning algorithms and deep dives into several applications make the book equally useful for researchers, students and practitioners.
Algorithms of the Intelligent Web (智能Web算法)

Author: Haralambos Marmanis,

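This book covers five important classes of intelligent algorithms: search, recommendation, clustering, classification, and combining classifiers. Using concrete cases, it discusses their roles in Web applications and the issues to watch out for. Apart from the overview in Chapter 1 and the integrated application of all the techniques in Chapter 7, Chapters 2 to 6 each introduce one of the five classes of algorithms through code examples. The book is aimed at a broad general readership, especially engineers and students interested in algorithms, so it does not demand much background knowledge. Because the examples and ideas in the book apply widely, it is also of some value to technical managers, product managers, and executives who want to understand these technologies better from a business perspective.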
Statistical Analysis with R

Author: John M. Quick

This is a practical, step-by-step guide that will help you quickly become proficient in data analysis using R. The book is packed with clear examples, screenshots, and code so that you can carry out your data analysis without hurdles. If you are a data analyst, business or information technology professional, student, educator, researcher, or anyone else who wants to learn to analyze data effectively, then this book is for you. No prior experience with R is necessary. Knowledge of other programming languages, software packages, or statistics may be helpful, but is not required.
Clementine Data Mining Methods and Applications (Clementine数据挖掘方法及应用)

Author: 薛薇 (Xue Wei), 陈欢歌 (Chen Huange)

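"Clementine Data Mining Methods and Applications" (《Clementine数据挖掘方法及应用》) follows the practical data mining process as its main thread. Through vivid application cases, and from the perspective of carrying out data mining, it systematically introduces classic data mining methods and the entire process of doing data mining with Clementine, presenting methods from easy to difficult and problems from simple to deep. The book tries to explain the core ideas and basic principles of data mining methods in the most accessible way, alongside instructions for operating the Clementine software, so that readers can grasp the essence of each method intuitively, learn to use Clementine quickly, and apply it in data mining practice. For the reader's convenience, all data and cases in the book match the contents of the accompanying CD. The book is suitable for readers in every application area of data analysis, and especially for people working in business management, finance and economics, banking and insurance, social research, and humanities and education. It can also serve as a data mining textbook for undergraduate and graduate students in computer science, finance and economics, and management programs. Data mining is the most active, leading-edge area of data analysis today. Clementine makes full use of the computing and graphics capabilities of computer systems and integrates data mining methods, applications, and tools into a single whole, which the publisher describes as the most comprehensive and most powerful data mining software product and the ideal tool for solving data mining problems.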
Data Mining with R

Author: Luis Torgo

The versatile capabilities and large set of add-on packages make R an excellent alternative to many existing and often expensive data mining tools. Exploring this area from the perspective of a practitioner, Data Mining with R: Learning with Case Studies uses practical examples to illustrate the power of R and data mining. Assuming no prior knowledge of R or data mining/statistical techniques, the book covers a diverse set of problems that pose different challenges in terms of size, type of data, goals of analysis, and analytical tools. To present the main data mining processes and techniques, the author takes a hands-on approach that utilizes a series of detailed, real-world case studies: predicting algae blooms; predicting stock market returns; detecting fraudulent transactions; and classifying microarray samples. With these case studies, the author supplies all necessary steps, code, and data. Web resource: a supporting website mirrors the do-it-yourself approach of the text. It offers a collection of freely available R source files that encompass all the code used in the case studies. The site also provides the data sets from the case studies as well as an R package of several functions.
Head First Data Analysis

Author: Michael Milton

Today, interpreting data is a critical decision-making factor for businesses and organizations. If your job requires you to manage and analyze all kinds of data, turn to "Head First Data Analysis", where you'll quickly learn how to collect and organize data, sort the distractions from the truth, find meaningful patterns, draw conclusions, predict the future, and present your findings to others. Whether you're a product developer researching the market viability of a new product or service, a marketing manager gauging or predicting the effectiveness of a campaign, a salesperson who needs data to support product presentations, or a lone entrepreneur responsible for all of these data-intensive functions and more, the unique approach in "Head First Data Analysis" is by far the most efficient way to learn what you need to know to convert raw data into a vital business tool. You'll learn how to: determine which data sources to use for collecting information; assess data quality and distinguish signal from noise; build basic data models to illuminate patterns, and assimilate new information into the models; cope with ambiguous information; design experiments to test hypotheses and draw conclusions; use segmentation to organize your data within discrete market groups; visualize data distributions to reveal new relationships and persuade others; predict the future with sampling and probability models; clean your data to make it useful; and, communicate the results of your analysis to your audience. Using the latest research in cognitive science and learning theory to craft a multi-sensory learning experience, "Head First Data Analysis" uses a visually rich format designed for the way your brain works, not a text-heavy approach that puts you to sleep.
Neural Networks and Learning Machines (神经网络与机器学习)

Author: (Canada) Haykin (海金)

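"Neural Networks and Learning Machines (English edition, 3rd edition)" (《神经网络与机器学习(英文版第3版)》) is highly readable. The author handles difficult material with ease, exploring and analyzing the basic models and main learning theories of neural networks in depth, and uses a large number of experiment reports, worked examples, and exercises to help readers learn. Neural networks are an important branch of computational intelligence and machine learning and have achieved great success in many fields. Among the many books on neural networks, the most influential is Simon Haykin's 《神经网络原理》 (renamed 《神经网络与机器学习》 in its fourth edition). In this book the author draws on recent advances in neural networks and machine learning and, starting from both theory and practical application, gives a comprehensive and systematic introduction to the basic models, methods, and techniques of neural networks, bringing neural networks and machine learning together. The book not only emphasizes mathematical analysis and theory but also pays close attention to applications of neural networks in practical engineering problems such as pattern recognition, signal processing, and control systems. This edition has been extensively revised from the previous one and provides an up-to-date treatment of neural networks and machine learning, two increasingly important disciplines.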
Visualizing Data (可视化数据)

Author: Ben Fry

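This is a book about computational information design. It gives a very detailed treatment of the process, from acquiring raw data through to understanding it. The book uses Processing, an open source programming environment developed by the author, which is very simple and easy to use. For programmers familiar with Java, the later chapters also explain how to use Processing together with Java. The book is written for people who have a data set, are curious about how to explore it, and are thinking about how to communicate it. As we deal with more and more information, the number of people who need to visualize data is growing very quickly. More importantly, this readership now extends beyond specialists in the visualization field. By exposing a wider range of people to visualization ideas, we should see some truly amazing results in the coming decades.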
Machine Learning (机器学习)

Author: Peter Flach (弗拉赫)

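This book is one of the most comprehensive machine learning textbooks. It first introduces the ingredients of machine learning (tasks, models, features) and machine learning tasks, then analyzes in detail logical models (tree models, rule models), geometric models (linear models and distance-based models), and probabilistic models, and goes on to discuss features, model ensembles, and what machine learning researchers call "experiments". The author not only uses established terminology but also introduces some new concepts, and provides a large number of carefully chosen examples and illustrations.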
Data Visualization (数据可视化)

Author: 陈为 (Chen Wei), 沈则潜 (Shen Zeqian)

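The book has 16 chapters in 4 parts. The foundations part sets out the basic theory and concepts of data visualization, starting from human perception and cognition and introducing data models and visualization fundamentals. The spatiotemporal data part covers visualization methods for data that carry spatial coordinates or time information; such data are either collected by devices in real physical space or generated by scientific computing simulations. The non-spatiotemporal data part describes the visualization of unstructured, non-geometric abstract data, which exist in real physical space and are also the basic form of expression in social space and networked information space. The user part covers the methods, techniques, and tools that visualization of all kinds of data needs in practice, such as interaction and visualization evaluation methods, as well as visualization and application systems in specific domains. Written from a researcher's perspective, the book introduces the definition, methods, utility, and tools of data visualization. It can serve as a guide for beginners and as a reference for visualization research and for the use of visualization tools.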
Data Mining and Data-Driven Operations in Practice (数据挖掘与数据化运营实战)

Author: 卢辉 (Lu Hui)

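"Data Mining and Data-Driven Operations in Practice: Ideas, Methods, Techniques, and Applications" (《数据挖掘与数据化运营实战:思路、方法、技巧与应用》) is currently one of the more comprehensive and systematic works on data mining as practiced in data-driven operations. It is also one of the few data mining books that weaves in a large number of real application cases and scenarios. For each type of analysis and mining problem that arises in data-driven operations, it offers a matching collection of analytical approaches and techniques, giving readers a "menu-style" practical toolkit. Drawing on extensive project experience in data-driven operations, the author uses plain, "non-technical" language and many lively cases to organize, summarize, and share the ideas, methods, techniques, and applications of data analysis and mining, helping readers understand and master a practice of data mining that takes "the business as the core, ideas as the focus, and analysis techniques as the support". The book has 19 chapters in three parts. The foundations part (Chapters 1 to 4) systematically introduces the background of data analysis and mining and of data-driven operations, the central role of "coordination and cooperation" in data-driven operations, and the types of analysis projects commonly seen in practice. The practice part (Chapters 6 to 13) mainly covers practical tips for the analysis and mining techniques commonly used in practice and walks through many real cases from start to finish. The mindset part (Chapter 5 and Chapters 14 to 19) mainly summarizes and explores how to cultivate and improve a data analyst's sense of responsibility, awareness, and thinking, and introduces some effective project quality control practices and classic methodologies.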
Data Mining with R (数据挖掘与R语言)

Author: (Portugal) Luis Torgo

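"If you want to learn how to use a free software package developed by experts in statistics and data mining, this is the book to choose. It includes a large number of practical cases that fully demonstrate the breadth and depth of the R software." (Bernhard Pfahringer, University of Waikato, New Zealand) Using many concrete cases that supply the necessary steps, code, and data, this book describes the main data mining processes and techniques in detail and covers a wide range of challenging problems in terms of data size, data type, goals of analysis, and analytical tools. The book's supporting website (http://www.liaad.up.pt/~ltorgo/DataMiningWithR/) provides all the code for the case studies, the data sets, and an R package of functions. Features of the book: the main data mining techniques are covered through carefully selected cases; the code and methods provided can easily be copied or adapted for your own problems; no prior knowledge of R, data mining, or statistical techniques is required; a brief introduction to the basics of R and MySQL is included; and the book gives a basic understanding of the characteristics, drawbacks, and analysis goals of data mining techniques.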
Learning Theory of Probabilistic Graphical Models and Its Applications (概率图模型学习理论及其应用)

Author: 赵悦 (Zhao Yue)

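"Learning Theory of Probabilistic Graphical Models and Its Applications" (《概率图模型学习理论及其应用》) is a Chinese-language monograph that systematically treats the basic theory, learning algorithms, and applications of probabilistic graphical models. Its contents include: basic concepts of probabilistic graphical models; learning theory for probabilistic graphical models with complete data sets; learning theory for probabilistic graphical models with incomplete data sets; learning of undirected probabilistic graphical models; new learning methods; and applications of probabilistic graphical models in computer vision, personal credit risk assessment, and speech recognition. The book starts from examples, moves from the simple to the advanced, combines intuition with rigor, and provides detailed references.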
Mahout in Action (Mahout实战)

Author: [US] Sean Owen, [US] Ro

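Computer systems that learn and evolve by collecting data are enormously powerful. Mahout, Apache's open source machine learning project, distills the core algorithms of recommender systems, classification, and clustering into a scalable, ready-to-use library. With Mahout, you can immediately apply in your own projects the machine learning techniques used by Amazon, Netflix, and other Internet companies. The book is written by core members of the Mahout project and, according to the publisher, is officially recommended by Apache. Drawing on years of practical experience, the authors present a rich set of application cases and explain in detail how Mahout addresses them. The book also gives particular attention to scalability and shows how to use the Apache Hadoop framework to meet the challenges of big data. What the book covers: using group data to make individual recommendations; finding logical clusters within your data; and filtering and refining with on-the-fly classification.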