欢迎来到相识电子书!
标签:机器学习
-
Artificial Intelligence
For one or two-semester, undergraduate or graduate-level courses in Artificial Intelligence. The long-anticipated revision of this best-selling text offers the most comprehensive, up-to-date introduction to the theory and practice of artificial intelligence. Click on "Features" tab below for more information Resources: Visit the author's website http://aima.cs.berkeley.edu/ to access both student and instructor resources including Power Point slides, syllabus. homework and exams, and solutions text problems. -
机器学习实战
机器学习是人工智能研究领域中一个极其重要的研究方向,在现今的大数据时代背景下,捕获数据并从中萃取有价值的信息或模式,成为各行业求生存、谋发展的决定性手段,这使得这一过去为分析师和数学家所专属的研究领域越来越为人们所瞩目。 本书第一部分主要介绍机器学习基础,以及如何利用算法进行分类,并逐步介绍了多种经典的监督学习算法,如k近邻算法、朴素贝叶斯算法、Logistic回归算法、支持向量机、AdaBoost集成方法、基于树的回归算法和分类回归树(CART)算法等。第三部分则重点介绍无监督学习及其一些主要算法:k均值聚类算法、Apriori算法、FP-Growth算法。第四部分介绍了机器学习算法的一些附属工具。 全书通过精心编排的实例,切入日常工作任务,摒弃学术化语言,利用高效的可复用Python代码来阐释如何处理统计数据,进行数据分析及可视化。通过各种实例,读者可从中学会机器学习的核心算法,并能将其运用于一些策略性任务中,如分类、预测、推荐。另外,还可用它们来实现一些更高级的功能,如汇总和简化等。 -
机器学习
这本书为机器学习技术提供了一些非常棒的案例研究。它并不想成为一本关于机器学习的工具书或者理论书籍,它注重的是一个学习的过程,因而对于任何有一些编程背景和定量思维的人来说,它都是不错的选择。 ——Max Shron OkCupid 机器学习是计算机科学和人工智能中非常重要的一个研究领域,近年来,机器学习不但在计算机科学的众多领域中大显身手,而且成为一些交叉学科的重要支撑技术。本书比较全面系统地介绍了机器学习的方法和技术,不仅详细阐述了许多经典的学习方法,还讨论了一些有生命力的新理论、新方法。 全书案例既有分类问题,也有回归问题;既包含监督学习,也涵盖无监督学习。本书讨论的案例从分类讲到回归,然后讨论了聚类、降维、最优化问题等。这些案例包括分类:垃圾邮件识别,排序:智能收件箱,回归模型:预测网页访问量,正则化:文本回归,最优化:密码破解,无监督学习:构建股票市场指数,空间相似度:用投票记录对美国参议员聚类,推荐系统:给用户推荐R语言包,社交网络分析:在Twitter上感兴趣的人,模型比较:给你的问题找到最佳算法。各章对原理的叙述力求概念清晰、表达准确,突出理论联系实际,富有启发性,易于理解。在探索这些案例的过程中用到的基本工具就是R统计编程语言。R语言非常适合用于机器学习的案例研究,因为它是一种用于数据分析的高水平、功能性脚本语言。 本书主要内容: ·开发一个朴素贝叶斯分类器,仅仅根据邮件的文本信息来判断这封邮件是否是垃圾邮件; ·使用线性回归来预测互联网排名前1000网站的PV; ·利用文本回归理解图书中词与词之间的关系; ·通过尝试破译一个简单的密码来学习优化技术; ·利用无监督学习构建股票市场指数,用于衡量整体市场行情的好坏; ·根据美国参议院的投票情况,从统计学的角度对美国参议员聚类; ·通过K近邻算法构建向用户推荐R语言包; ·利用Twitter数据来构建一个“你可能感兴趣的人”的推荐系统; ·模型比较:给你的问题找到最佳算法。 -
利用Python进行数据分析
【名人推荐】 “科学计算和数据分析社区已经等待这本书很多年了:大量具体的实践建议,以及大量综合应用方法。本书在未来几年里肯定会成为Python领域中技术计算的权威指南。” ——Fernando Pérez 加州大学伯克利分校 研究科学家, IPython的创始人之一 【内容简介】 还在苦苦寻觅用Python控制、处理、整理、分析结构化数据的完整课程?本书含有大量的实践案例,你将学会如何利用各种Python库(包括NumPy、pandas、matplotlib以及IPython等)高效地解决各式各样的数据分析问题。 由于作者Wes McKinney是pandas库的主要作者,所以本书也可以作为利用Python实现数据密集型应用的科学计算实践指南。本书适合刚刚接触Python的分析人员以及刚刚接触科学计算的Python程序员。 •将IPython这个交互式Shell作为你的首要开发环境。 •学习NumPy(Numerical Python)的基础和高级知识。 •从pandas库的数据分析工具开始。 •利用高性能工具对数据进行加载、清理、转换、合并以及重塑。 •利用matplotlib创建散点图以及静态或交互式的可视化结果。 •利用pandas的groupby功能对数据集进行切片、切块和汇总操作。 •处理各种各样的时间序列数据。 •通过详细的案例学习如何解决Web分析、社会科学、金融学以及经•济学等领域的问题。 -
Learning from Data
An interdisciplinary framework for learning methodologies—covering statistics, neural networks, and fuzzy logic, this book provides a unified treatment of the principles and methods for learning dependencies from data. It establishes a general conceptual framework in which various learning methods from statistics, neural networks, and fuzzy logic can be applied—showing that a few fundamental principles underlie most new methods being proposed today in statistics, engineering, and computer science. Complete with over one hundred illustrations, case studies, and examples making this an invaluable text. -
信息论、推理与学习算法
本书是英国剑桥大学卡文迪许实验室的著名学者David J.C.MacKay博士总结多年教学经验和科研成果,于2003年推出的一部力作。本书作者不仅透彻地论述了传统信息论的内容和最新编码算法,而且以高度的学科驾驭能力,匠心独具地在一个统一框架下讨论了贝叶斯数据建模、蒙特卡罗方法、聚类算法、神经网络等属于机器学习和推理领域的主题,从而很好地将诸多学科的技术内涵融会贯通。本书注重理论与实际的结合,内容组织科学严谨,反映了多门学科的内在联系和发展趋势。同时,本书还包含了丰富的例题和近400道习题(其中许多习题还配有详细的解答),便于教学或自学,适合作为信息科学与技术相关专业高年级本科生和研究生教材,对相关专业技术人员也不失为一本有益的参考书。... -
Bayesian Reasoning and Machine Learning
Machine learning methods extract value from vast data sets quickly and with modest resources. They are established tools in a wide range of industrial applications, including search engines, DNA sequencing, stock market analysis, and robot locomotion, and their use is spreading rapidly. People who know the methods have their choice of rewarding jobs. This hands-on text opens these opportunities to computer science students with modest mathematical backgrounds. It is designed for final-year undergraduates and master's students with limited background in linear algebra and calculus. Comprehensive and coherent, it develops everything from basic reasoning to advanced techniques within the framework of graphical models. Students learn more than a menu of techniques, they develop analytical and problem-solving skills that equip them for the real world. Numerous examples and exercises, both computer based and theoretical, are included in every chapter. Resources for students and instructors, including a MATLAB toolbox, are available online. -
这就是搜索引擎
搜索引擎作为互联网发展中至关重要的一种应用,已经成为互联网各个领域的制高点,其重要性不言而喻。搜索引擎领域也是互联网应用中不多见的以核心技术作为其命脉的领域,搜索引擎各个子系统是如何设计的?这成为广大技术人员和搜索引擎优化人员密切关注的内容。 本书的最大特点是内容新颖全面而又通俗易懂。对于实际搜索引擎所涉及的各种核心技术都有全面细致的介绍,除了作为搜索系统核心的网络爬虫、索引系统、排序系统、链接分析及用户分析外,还包括网页反作弊、缓存管理、网页去重技术等实际搜索引擎必须关注的技术,同时用相当大的篇幅讲解了云计算与云存储的核心技术原理。另外,本书也密切关注搜索引擎发展的前沿技术:Google的咖啡因系统及Megastore等云计算新技术、百度的暗网抓取技术阿拉丁计划、内容农场作弊、机器学习排序等。诸多新技术在相关章节都有详细讲解,同时对于社会化搜索、实时搜索及情境搜索等搜索引擎的未来发展方向做了技术展望。为了增进读者的理解,全书大量引入形象的图片来讲解算法原理,相信读者会发现原来搜索引擎的核心技术理解起来比原先想象的要简单得多。 -
数学之美 (第二版)
几年前,“数学之美”系列文章原刊载于谷歌黑板报,获得上百万次点击,得到读者高度评价。读者说,读了“数学之美”,才发现大学时学的数学知识,比如马尔可夫链、矩阵计算,甚至余弦函数原来都如此亲切,并且栩栩如生,才发现自然语言和信息处理这么有趣。 在纸本书的创作中,作者吴军博士几乎把所有文章都重写了一遍,为的是把高深的数学原理讲得更加通俗易懂,让非专业读者也能领略数学的魅力。读者通过具体的例子学到的是思考问题的方式 —— 如何化繁为简,如何用数学去解决工程问题,如何跳出固有思维不断去思考创新。 第二版增加了针对大数据和机器学习的内容,以便满足人们对当下技术的学习需求;同时,根据专家和读者的反馈更正了一些错漏,并更新了部分内容。 《数学之美》第一版荣获国家图书馆第八届文津图书奖; 入选广电总局“2014年向全国青少年推荐百种优秀图书书目”; 荣获2012-2013年度全行业优秀畅销书; 《浪潮之巅》、《文明之光》作者吴军博士最新力作,李开复作序推荐,Google黑板报百万点击! 新版增加了大数据和机器学习等最新内容,以满足人们对当下技术的学习需求;同时,根据专家和读者的反馈更正了错漏,并更新了部分内容 -
社交网站的数据挖掘与分析
Facebook、Twitter和LinkedIn产生了大量宝贵的社交数据,但是你怎样才能找出谁通过社交媒介正在进行联系?他们在讨论些什么?或者他们在哪儿?这本简洁而且具有可操作性的书将揭示如何回答这些问题甚至更多的问题。你将学到如何组合社交网络数据、分析技术,如何通过可视化帮助你找到你一直在社交世界中寻找的内容,以及你闻所未闻的有用信息。 每个独立的章节介绍了在社交网络的不同领域挖掘数据的技术,这些领域包括博客和电子邮件。你所需要具备的就是一定的编程经验和学习基本的Python工具的意愿。 •获得对社交网络世界的直观认识 •使用GitHub上灵活的脚本来获取从诸如Twitter、Facebook和LinkedIn之类的社交网络API中的数据 •学习如何应用便捷的Python工具来交叉分析你所收集的数据 •通过XHTML朋友圈探讨基于微格式的社交联系 •应用诸如TF-IDF、余弦相似性、搭配分析、文档摘要、派系检测之类的先进挖掘技术 •通过基于HTML5和JavaScript工具包的网络技术建立交互式可视化 -
Introduction to Information Retrieval
Class-tested and coherent, this groundbreaking new textbook teaches classic web information retrieval, including web search and the related areas of text classification and text clustering from basic concepts. Written from a computer science perspective by three leading experts in the field, it gives an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections. All the important ideas are explained using examples and figures, making it perfect for introductory courses in information retrieval for advanced undergraduates and graduate students in computer science. Based on feedback from extensive classroom experience, the book has been carefully structured in order to make teaching more natural and effective. Although originally designed as the primary text for a graduate or advanced undergraduate course in information retrieval, the book will also create a buzz for researchers and professionals alike. Contents 1. Information retrieval using the Boolean model; 2. The dictionary and postings lists; 3. Tolerant retrieval; 4. Index construction; 5. Index compression; 6. Scoring and term weighting; 7. Vector space retrieval; 8. Evaluation in information retrieval; 9. Relevance feedback and query expansion; 10. XML retrieval; 11. Probabilistic information retrieval; 12. Language models for information retrieval; 13. Text classification and Naive Bayes; 14. Vector space classification; 15. Support vector machines and kernel functions; 16. Flat clustering; 17. Hierarchical clustering; 18. Dimensionality reduction and latent semantic indexing; 19. Web search basics; 20. Web crawling and indexes; 21. Link analysis. Reviews “This is the first book that gives you a complete picture of the complications that arise in building a modern web-scale search engine. You'll learn about ranking SVMs, XML, DNS, and LSI. You'll discover the seedy underworld of spam, cloaking, and doorway pages. You'll see how MapReduce and other approaches to parallelism allow us to go beyond megabytes and to efficiently manage petabytes." -Peter Norvig, Director of Research, Google Inc. "Introduction to Information Retrieval is a comprehensive, up-to-date, and well-written introduction to an increasingly important and rapidly growing area of computer science. Finally, there is a high-quality textbook for an area that was desperately in need of one." -Raymond J. Mooney, Professor of Computer Sciences, University of Texas at Austin “Through compelling exposition and choice of topics, the authors vividly convey both the fundamental ideas and the rapidly expanding reach of information retrieval as a field.” -Jon Kleinberg, Professor of Computer Science, Cornell University -
集体智慧编程
想要探寻搜索排名、产品推荐、社会化书签和在线匹配背后的力量吗?这本颇具魅力的书籍向你展现如何创建Web 2.0应用程序,从参与性?Internet应用程序产生的大量数据中挖掘金矿。运用本书中介绍的先进算法,你可以编写聪明的程序,以访问其他网站那些有趣的数据集,从自有应用程序的用户中收集数据,或者分析和理解你所发现的数据。 《集体智慧编程》将你带入机器学习和统计的世界,并且阐释了如何从你和他人每天收集的信息中获得关于用户体验、市场营销、个性品味及人类行为的结论。每个算法的描述都十分简明清晰,相关代码均可以立即用于你的网站、博客、Wiki或特定应用程序。本书讲解了下列主题: 可以让在线零售商推荐产品或媒体的协作过滤技术 用于在大数据集中发现同类项组的聚类方法 从数以百万计可能方案中选择问题最佳解决方案的最优化算法 贝叶斯过滤,用在基于单词类型和其他特征的垃圾信息过滤中 支持向量(support-vector)机器,用于在线交友网站中的速配 用于问题解决的演化智能——计算机如何通过多次玩同样的游戏,改进自身代码并获得技能提升 每一章都包含了相关练习,可通过扩展使算法变得更强大。超越简单的数据库支持应用程序模式,让 Internet数据财富为你所用。 -
Statistical Decision Theory and Bayesian Analysis
In this new edition the author has added substantial material on Bayesian analysis, including lengthy new sections on such important topics as empirical and hierarchical Bayes analysis, Bayesian calculation, Bayesian communication, and group decision making. With these changes, the book can be used as a self-contained introduction to Bayesian analysis. In addition, much of the decision-theoretic portion of the text was updated, including new sections covering such modern topics as minimax multivariate (Stein) estimation. -
Introduction to Statistical Relational Learning (Adaptive Computation and Machine Learning)
Handling inherent uncertainty and exploiting compositional structure are fundamental to understanding and designing large-scale systems. Statistical relational learning builds on ideas from probability theory and statistics to address uncertainty while incorporating tools from logic, databases and programming languages to represent structure. In Introduction to Statistical Relational Learning, leading researchers in this emerging area of machine learning describe current formalisms, models, and algorithms that enable effective and robust reasoning about richly structured systems and data. The early chapters provide tutorials for material used in later chapters, offering introductions to representation, inference and learning in graphical models, and logic. The book then describes object-oriented approaches, including probabilistic relational models, relational Markov networks, and probabilistic entity-relationship models as well as logic-based formalisms including Bayesian logic programs, Markov logic, and stochastic logic programs. Later chapters discuss such topics as probabilistic models with unknown objects, relational dependency networks, reinforcement learning in relational domains, and information extraction. By presenting a variety of approaches, the book highlights commonalities and clarifies important differences among proposed approaches and, along the way, identifies important representational and algorithmic issues. Numerous applications are provided throughout.Lise Getoor is Assistant Professor in the Department of Computer Science at the University of Maryland. Ben Taskar is Assistant Professor in the Computer and Information Science Department at the University of Pennsylvania. -
All of Statistics
WINNER OF THE 2005 DEGROOT PRIZE! This book is for people who want to learn probability and statistics quickly. It brings together many of the main ideas in modern statistics in one place. The book is suitable for students and researchers in statistics, computer science, data mining and machine learning. This book covers a much wider range of topics than a typical introductory text on mathematical statistics. It includes modern topics like nonparametric curve estimation, bootstrapping and classification, topics that are usually relegated to follow-up courses. The reader is assumed to know calculus and a little linear algebra. No previous knowledge of probability and statistics is required. The text can be used at the advanced undergraduate and graduate level. -
Programming Collective Intelligence
Want to tap the power behind search rankings, product recommendations, social bookmarking, and online matchmaking? This fascinating book demonstrates how you can build Web 2.0 applications to mine the enormous amount of data created by people on the Internet. With the sophisticated algorithms in this book, you can write smart programs to access interesting datasets from other web sites, collect data from users of your own applications, and analyze and understand the data once you've found it. Programming Collective Intelligence takes you into the world of machine learning and statistics, and explains how to draw conclusions about user experience, marketing, personal tastes, and human behavior in general -- all from information that you and others collect every day. Each algorithm is described clearly and concisely with code that can immediately be used on your web site, blog, Wiki, or specialized application. This book explains: * Collaborative filtering techniques that enable online retailers to recommend products or media * Methods of clustering to detect groups of similar items in a large dataset * Search engine features -- crawlers, indexers, query engines, and the PageRank algorithm * Optimization algorithms that search millions of possible solutions to a problem and choose the best one * Bayesian filtering, used in spam filters for classifying documents based on word types and other features * Using decision trees not only to make predictions, but to model the way decisions are made * Predicting numerical values rather than classifications to build price models * Support vector machines to match people in online dating sites * Non-negative matrix factorization to find the independent features in a dataset * Evolving intelligence for problem solving -- how a computer develops its skill by improving its own code the more it plays a game Each chapter includes exercises for extending the algorithms to make them more powerful. Go beyond simple database-backed applications and put the wealth of Internet data to work for you. "Bravo! I cannot think of a better way for a developer to first learn these algorithms and methods, nor can I think of a better way for me (an old AI dog) to reinvigorate my knowledge of the details." -- Dan Russell, Google "Toby's book does a great job of breaking down the complex subject matter of machine-learning algorithms into practical, easy-to-understand examples that can be directly applied to analysis of social interaction across the Web today. If I had this book two years ago, it would have saved precious time going down some fruitless paths." -- Tim Wolters, CTO, Collective Intellect -
数据科学实战
• 统计推断、探索性数据分析(EDA)及数据科学工作流程 • 算法 • 垃圾邮件过滤、朴素贝叶斯和数据清理 • 逻辑回归 • 金融建模 • 推荐引擎和因果关系 • 数据可视化 • 社交网络与数据新闻 • 数据工程、MapReduce、Pregel和Hadoop -
模式分类
《模式分类》(英文版)(第2版)简明易读,新增的图表使得许多统计和数学题材非常生动。最终以完美和谐的形式,引导读者深入新的主题。
热门标签
下载排行榜
- 1 梦的解析:最佳译本
- 2 李鸿章全传
- 3 淡定的智慧
- 4 心理操控术
- 5 哈佛口才课
- 6 俗世奇人
- 7 日瓦戈医生
- 8 笑死你的逻辑学
- 9 历史老师没教过的历史
- 10 1分钟和陌生人成为朋友