欢迎来到相识电子书!

标签:R

  • ggplot2:数据分析与图形艺术

    作者:哈德利·威克姆 (Hadley Wick

    中译本序 每当我们看到一个新的软件,第一反应会是:为什么又要发明一个新软件?ggplot2是R世界里相对还比较年轻的一个包,在它之前,官方R已经有自己的基础图形系统(graphics包)和网格图形系统(grid包),并且Deepayan Sarkar也开发了lattice包,看起来R的世界对图形的支持已经足够强大了。那么我们不禁要问,为什么还要发明一套新的系统? 设计理念 打个比方,想想我们小时候怎样学中文的。最开始的时候我们要识字,不认识字就没法阅读和写作,但我们并不是一直按照一个个汉字学习的,而是通过句子和具体的场景故事学习的。为什么不在小学时背六年字典呢?那样可能认识所有的汉字。原因很简单,光有单字,我们不会说话,也无法阅读和写作。缺的是什么?答案是对文字的组织能力,或者说语法。 R的基础图形系统基本上是一个“纸笔模型”,即:一块画布摆在面前,你可以在这里画几个点,在那里画几条线,指哪儿画哪儿。后来lattice包的出现稍微改善了这种情况,你可以说,我要画散点图或直方图,并且按照某个分类变量给图中的元素上色,此时数据才在画图中扮演了一定的中心角色,我们不用去想具体这个点要用什么颜色(颜色会根据变量自动生成)。然而,lattice继承了R语言的一个糟糕特征,就是参数设置铺天盖地,足以让人窒息,光是一份xyplot()函数的帮助文档,恐怕就够我们消磨一天时间了,更重要的是,lattice仍然面向特定的统计图形,像基础图形系统一样,有直方图、箱线图、条形图等等,它没有一套可以让数据分析者说话的语法。 那么数据分析者是怎样说话的呢?他们从来不会说这条线用#FE09BE颜色,那个点用三角形状,他们只会说,把图中的线用数据中的职业类型变量上色,或图中点的形状对应性别变量。有时候他们画了一幅散点图,但马上他们发现这幅图太拥挤,最好是能具体看一下里面不同收入阶层的特征,所以他们会说,把这幅图拆成七幅小图,每幅图对应一个收入阶层。然后发现散点图的趋势不明显,最好加上回归直线,看看回归模型反映的趋势是什么,或者发现图中离群点太多,最好做一下对数变换,减少大数值对图形的主导性。 从始至终,数据分析者都在数据层面上思考问题,而不是拿着水彩笔和调色板在那里一笔一划作图,而计算机程序员则倾向于画点画线。Leland Wilkinson的著作在理论上改善了这种状况,他提出了一套图形语法,让我们在考虑如何构建一幅图形的时候不再陷在具体的图形元素里面,而是把图形拆分为一些互相独立并且可以自由组合的成分。这套语法提出来之后他自己也做了一套软件,但显然这套软件没有被广泛采用;幸运的是,Hadley Wickham在R语言中把这套想法巧妙地实现了。 为了说明这种语法的想法,我们考虑图形中的一个成分:坐标系。常见的坐标系有两种:笛卡尔坐标系和极坐标系。在语法中,它们属于一个成分,可自由拆卸替换。笛卡尔坐标系下的条形图实际上可以对应极坐标系下的饼图,因为条形图的高可以对应饼图的角度,本质上没什么区别。因此在ggplot2中,从一幅条形图过渡到饼图,只需要加极少量的代码,把坐标系换一下就可以了。如果我们用纸笔模型,则可以想象,这完全是不同的两幅图,一幅图里面要画的是矩形,另一幅图要画扇形。 更多的细节在本书中会介绍,这里我们只是简略说明用语法画图对用纸笔画图来说在思维上的优越性;前者是说话,后者是说字。 发展历程 ggplot2是Hadley在爱荷华州立大学博士期间的作品,也是他博士论文的主题之一,实际上ggplot2还有个前身ggplot,但后来废弃了,某种程度上这也是Hadley写软件的特征,熟悉他的人就知道这不是他第一个“2”版本的包了(还有reshape2)。带2的包和原来的包在语法上会有很大的改动,基本上不兼容。尽管如此,他的R代码风格在R社区可谓独树一帜,尤其是他的代码结构很好,可读性很高,ggplot2是R代码抽象的一个杰作。读者若感兴趣,可以在GitHub网站上浏览他的包:https://github.com/hadley。在用法方面,ggplot2也开创了一种奇特而绝妙的语法,那就是加号:一幅图形从背后的设计来说,是若干图形语法的叠加,从外在的代码来看,也是若干R对象的相加。这一点精妙尽管只是ggplot2系统的很小一部分,但我个人认为没有任何程序语言可比拟,它对作为泛型函数的加号的扩展只能用两个字形容:绝了。 至2013年2月26日,ggplot2的邮件列表(http://groups.google.com/group/ggplot2 )订阅成员已达3394人,邮件总数为15185封,已经成为一个丰富、活跃的用户社区。未来ggplot2的发展也将越来越依赖于用户的贡献,这也是很多开源软件最终的走向。 关于版本更新 原书面世之时,ggplot2的版本号是0.8.3,译者开始翻译此书时是0.9.0版本;该版本较之0.8.3,内部做了一些大改动。此后,ggplot2频繁升级,目前版本号是0.9.3,当然这也给本书的翻译过程带来了相当大的麻烦。因为译者不但要修正原书中大量过时的代码、重新画图,还要修正过时的理念,以及处理数次版本更新的影响。所幸,在翻译过程中,译者得到了本书审校殷腾飞博士、ggplot2开发者Hadley Wickham和Wistong Chang的大力帮助。 如果你是老用户,那么可能需要阅读下面的小节。之后ggplot2有过多次更新,尤其是0.9.0之后,ggplot2的绘图速度和帮助文档有了质的飞跃。关于0.9的更新,读者可以从https://github.com/downloads/hadley/ggplot2/guide-col.pdf下载一份细致的说明文档,但原文档比较长,而且有些内部更新问题我们也不一定需要了解,因此这里给一段概述。 ggplot2的帮助文档大大扩充了,过去头疼的问题之一就是一个函数里面不知道有哪些可能的参数,例如theme()函数,现在已经有了详细说明。 新增图例向导函数guide_legend()和guide_colorbar(),前者可以用来指导图例的排版,例如可以安排图例中元素排为n行m列;后者增强了连续变量图例的展示,例如当我们把颜色映射到一个连续变量上时,过去生成的图例是离散的,现在可以用这个函数生成连续颜色的图例(渐变色)。 新增几何对象函数geom_map()(让地图语法变得更简单),geom_raster()(更高效的geom_tile()),geom_dotplot()(一维点图,展示变量密度分布)和geom_violin()(小提琴,实为密度曲线)。 新增统计变换函数stat_summary2d()(在二维网格上计算数据密度),stat_summary_hex()(在六边形“蜂巢”上计算数据密度),stat_bindot()(一维点图密度),stat_ydensity()(密度曲线,用于小提琴图)。 facet_grid()支持X轴和Y轴其中一者可以有自由的刻度(根据数据范围而定),以往要么所有切片使用同样的坐标轴刻度,要么所有都自由。 geom_boxplot()开始支持画箱线图的凹槽(notch),就像R基础图形系统中的boxplot()函数。 新增函数ggmissing()用来展示缺失值的分布,ggorder()按照数据观察顺序先后画折线图,ggstructure()展示数据热图。 另外这次更新涉及到一些函数参数名称的变化,如果旧代码在这个版本中报错说有未使用的参数,那么用户需要再次查看帮助文档,确保输入的参数在函数中存在。在所有这些表面的更新背后,实际上ggplot2很大程度上被重写了,例如开始使用R自带的S3泛型函数设计,以及将过去ggplot2的功能继续模块化为一些独立的包,一个典型的例子就是标度部分的功能被抽象到scales包中,从数据映射到颜色、大小等外观属性可以由这个包直接完成。这种分拆也使得其他开发者可使用过去ggplot2内部的一些功能函数。 0.9.1版本主要解决了0.9.0版本中的一些漏洞。ggplot2在2012年9月4日发布了新的版本0.9.2,其中一些特性和更新有必要提及: 采用了全新的主题(theme)系统,opts()函数已被标记为“不推荐使用”(deprecated),将在未来版本中被取消,取而代之的是theme()函数,主题元素(theme element)由属性列表构成,支持继承,主题之间可以直接进行合并等操作。详情参见wiki页面:https://github.com/wch/ggplot2/wiki/New-theme-system 。 依赖于新的gtable包。 用来更方便地调整修改ggplot2图形中的图元,ggplotGrob()会返回一个gtable类,这个对象可以利用gtable包中提供的函数和接口进行操作。 所有“模板”类型的图形函数,比如plotmatrix(),ggorder()等等,已被标记为“不推荐使用”(deprecated),将在未来版本中取消。 在本书出版之际,ggplot2更新到了版本0.9.3,修复了0.9.2的一些漏洞,其主要更新包括 不再支持plotmatrix()函数。 geom_polygon()提速,比如世界地图的绘制快了12倍左右。 新增部分主题,比如theme_minimal(),theme_classic()。 本书的所有代码和图片都是针对新版本0.9.3的,在内容方面也根据版本更新对原文做了适当的增删填补,以满足读者的需求。 本书把影响正文阅读的彩图集中放在附录后面,读者可以随时翻阅。 致谢 在听说我们翻译完这本书之后,本书原著Hadley很高兴,给我们发邮件说: I am excited and honoured to have my book translated to Chinese. ggplot2 has become far more popular than I ever imagined, and I'm excited that this translation will allow many more people to learn ggplot2. I'm very grateful that Yihui and his team of translators (Nan Xiao, Tao Gao, Yixuan Qiu, Weicheng Zhu, Taiyun Wei and Lanfeng Pan) made this possible. One of the biggest improvements to ggplot2 since the book was first written is the ggplot2 mailing list. This is a very friendly environment where you can get help with your visualisations, and improve your own knowledge of ggplot2 by helping others solve their problems. I'd strongly encourage you to join the mailing list, even if you think your English is not very good -- we are very friendly people. 我们感谢这本书的译者,包括邱怡轩(第1~2章)、主伟呈(第3~4章)、肖楠(第5~6章)、高涛(第7~8章)、潘岚锋(第9章)、魏太云(第10章、附录以及翻译过程的协调安排和全书的LaTeX排版工作)。所有译者均来自于统计之都(http://cos.name )。 爱荷华州立大学的殷腾飞博士、中国人民大学统计学院的孟生旺教授、浙江大学的张政同学通读了译稿,提出了很多有用的建议,殷腾飞博士还提供了大多数新版本中的解决方案,并担任本书的审校。肖凯老师和余光创博士分别对第1~4章、第8~10章以及附录提出了很多修改意见,此外,中国人民大学的陈妍、李晓矛、谢漫锜三位同学、中国再保险公司的李皞先生、百度公司的韩帅先生、eBay公司的陈丽云女士、Mango Solutions公司的李舰先生、京东商城的刘思喆先生、首钢总公司的邓一硕先生、新华社的陈堰平先生在此书的翻译过程中也曾提过不少宝贵的建议,在此一并表示感谢。 为了更好地服务社区,我们还建立了翻译主页:https://github.com/cosname/ggplot2-translation ,读者可以在这里得到最新的勘误和书中的代码,也可以随时提出任何问题。 谢益辉 2013年2月26日
  • 数据挖掘与R语言

    作者:(葡)Luis Torgo

    “如果你想学习如何用一款统计专家和数据挖掘专家所开发的免费软件包,那就选这本书吧。本书包括大量实际案例,它们充分体现了R软件的广度和深度。” —— Bernhard Pfahringer, 新西兰怀卡托大学 本书利用大量给出必要步骤、代码和数据的具体案例,详细描述了数据挖掘的主要过程和技术,广泛涵盖数据大小、数据类型、分析目标、分析工具等方面的各种具有挑战性的问题。 本书的支持网站(http://www.liaad.up.pt/~ltorgo/DataMiningWithR/)给出了案例研究的所有代码、数据集以及R函数包。 本书特色 通过仔细选择的案例涵盖了主要的数据挖掘技术。 给出的代码和方法可以方便地复制或者改编后应用于自己的问题。 不要求读者具有R、数据挖掘或统计技术的基础知识。 包含R和MySQL基础知识的简介。 提供了对数据挖掘技术的特性、缺点和分析目标的基本理解。
  • Reproducible Research with R and RStudio

    作者:Gandrud, Christopher

    Bringing together computational research tools in one accessible source, Reproducible Research with R and RStudio guides you in creating dynamic and highly reproducible research. Suitable for researchers in any quantitative empirical discipline, it presents practical tools for data collection, data analysis, and the presentation of results. With straightforward examples, the book takes you through a reproducible research workflow, showing you how to use: R for dynamic data gathering and automated results presentation knitr for combining statistical analysis and results into one document LaTeX for creating PDF articles and slide shows, and Markdown and HTML for presenting results on the web Cloud storage and versioning services that can store data, code, and presentation files; save previous versions of the files; and make the information widely available Unix-like shell programs for compiling large projects and converting documents from one markup language to another RStudio to tightly integrate reproducible research tools in one place Whether you're an advanced user or just getting started with tools such as R and LaTeX, this book saves you time searching for information and helps you successfully carry out computational research. It provides a practical reproducible research workflow that you can use to gather and analyze data as well as dynamically present results in print and on the web. Supplementary files used for the examples and a reproducible research project are available on the author's website.
  • 空间数据分析与R语言实践

    作者:Roger S.Bivand,Edzer

    《空间数据分析与R语言实践》较全面地介绍了R应用于空间数据分析的原理和方法。在介绍R中空间数据类、方法、空间对象、空间点类、空间线类、空间面类及空间网格的基础上,首先介绍了空间数据的可视化、空间数据的导入导出、空间数据的处理及定制多点数据、六角形网格、时空网格及大型网格数据类的方法;然后介绍了空间点模式分析、插值与地统计分析、面数据和空间自相关分析和面数据建模;最后介绍了空间数据分析在疾病数据制图及分析中的应用。
  • Dynamic Documents with R and Knitr, Second Edition

    作者:Xie, Yihui

    Features · Provides an authoritative and comprehensive guide to the knitr package in R · Emphasizes reproducible research · Covers both simple examples and full applications, from homework to websites to books · Describes a wide range of options, useful tricks, and solutions · Explains the internal design and extensions of the knitr package · Offers demos and other information about the package on the author's website Summary Quickly and Easily Write Dynamic Documents Suitable for both beginners and advanced users, Dynamic Documents with R and knitr, Second Edition makes writing statistical reports easier by integrating computing directly with reporting. Reports range from homework, projects, exams, books, blogs, and web pages to virtually any documents related to statistical graphics, computing, and data analysis. The book covers basic applications for beginners while guiding power users in understanding the extensibility of the knitr package. New to the Second Edition · A new chapter that introduces R Markdown v2 · Changes that reflect improvements in the knitr package · New sections on generating tables, defining custom printing methods for objects in code chunks, the C/Fortran engines, the Stan engine, running engines in a persistent session, and starting a local server to serve dynamic documents Boost Your Productivity in Statistical Report Writing and Make Your Scientific Computing with R Reproducible Like its highly praised predecessor, this edition shows you how to improve your efficiency in writing reports. The book takes you from program output to publication-quality reports, helping you fine-tune every aspect of your report.
  • Time Series Analysis and Its Applications

    作者:Robert H. Shumway,Da

    Time Series Analysis and Its Applications presents a balanced and comprehensive treatment of both time and frequency domain methods with accompanying theory. Numerous examples using non-trivial data illustrate solutions to problems such as evaluating pain perception experiments using magnetic resonance imaging or monitoring a nuclear test ban treaty. The book is designed to be useful as a text for graduate level students in the physical, biological and social sciences and as a graduate level text in statistics. Some parts may also serve as an undergraduate introductory course. Theory and methodology are separated to allow presentations on different levels. Material from the earlier 1988 Prentice-Hall text Applied Statistical Time Series Analysis has been updated by adding modern developments involving categorical time sries analysis and the spectral envelope, multivariate spectral methods, long memory series, nonlinear models, longitudinal data analysis, resampling techniques, ARCH models, stochastic volatility, wavelets and Monte Carlo Markov chain integration methods. These add to a classical coverage of time series regression, univariate and multivariate ARIMA models, spectral analysis and state-space models. The book is complemented by ofering accessibility, via the World Wide Web, to the data and an exploratory time series analysis program ASTSA for Windows that can be downloaded as Freeware. Robert H. Shumway is Professor of Statistics at the University of California, Davis. He is a Fellow of the American Statistical Association and a member of the Inernational Statistical Institute. He won the 1986 American Statistical Association Award for Outstanding Statistical Application and the 1992 Communicable Diseases Center Statistics Award; both awards were for joint papers on time series applications. He is the author of a previous 1988 Prentice-Hall text on applied time series analysis and is currenlty a Departmental Editor for the Journal of Forecasting. David S. Stoffer is Professor of Statistics at the University of Pittsburgh. He has made seminal contributions to the analysis of categorical time series and won the 1989 American Statistical Association Award for Outstanding Statistical Application in a joint paper analyzing categorical time series arising in infant sleep-state cycling. He is currently an Associate Editor of the Journal of Forecasting and has served as an Associate Editor for the Journal fo the American Statistical Association.
  • Discovering Statistics Using R

    作者:Andy Field,Jeremy Mi

  • R Inferno

    作者:Patrick Burns

    An essential guide to the trouble spots and oddities of R. In spite of the quirks exposed here, R is the best computing environment for most data analysis tasks. R is free, open-source, and has thousands of contributed packages. It is used in such diverse fields as ecology, finance, genomics and music. If you are using spreadsheets to understand data, switch to R. You will have safer -- and ultimately, more convenient -- computations.
  • The Art of R Programming

    作者:Norman Matloff

  • Modern Applied Statistics with S

    作者:W.N. Venables,B.D. R

  • R in a Nutshell

    作者:Joseph Adler

    R is rapidly becoming the standard for developing statistical software, and R in a Nutshell provides a quick and practical way to learn this increasingly popular open source language and environment. You'll not only learn how to program in R, but also how to find the right user-contributed R packages for statistical modeling, visualization, and bioinformatics.
  • A Beginner's Guide to R

    作者:Alain F. Zuur,Elena

    Based on their extensive experience with teaching R and statistics to applied scientists, the authors provide a beginner's guide to R. To avoid the difficulty of teaching R and statistics at the same time, statistical methods are kept to a minimum. The text covers how to download and install R, import and manage data, elementary plotting, an introduction to functions, advanced plotting, and common beginner mistakes. This book contains everything you need to know to get started with R.
  • ggplot2

    作者:Hadley Wickham

    This book describes ggplot2, a new data visualization package for R that uses the insights from Leland Wilkison''s Grammar of Graphics to create a powerful and flexible system for creating data graphics. With ggplot2, it''s easy to: * produce handsome, publication-quality plots, with automatic legends created from the plot specification * superpose multiple layers (points, lines, maps, tiles, box plots to name a few) from different data sources, with automatically adjusted common scales * add customisable smoothers that use the powerful modelling capabilities of R, such as loess, linear models, generalised additive models and robust regression * save any ggplot2 plot (or part thereof) for later modification or reuse * create custom themes that capture in-house or journal style requirements, and that can easily be applied to multiple plots * approach your graph from a visual perspective, thinking about how each component of the data is represented on the final plot. This book will be useful to everyone who has struggled with displaying their data in an informative and attractive way. You will need some basic knowledge of R (i.e. you should be able to get your data into R), but ggplot2 is a mini-language specifically tailored for producing graphics, and you''ll learn everything you need in the book. After reading this book you''ll be able to produce graphics customized precisely for your problems, and you''ll find it easy to get graphics out of your head and on to the screen or page.
  • R语言与Bioconductor生物信息学应用

    作者:高山,欧剑虹,肖凯

  • Graphical Models with R

    作者:H. Jsgaard, S. Ren;

    Graphical models in their modern form have been around since the late 1970s and appear today in many areas of the sciences. Along with the ongoing developments of graphical models, a number of different graphical modeling software programs have been written over the years. In recent years many of these software developments have taken place within the R community, either in the form of new packages or by providing an R interface to existing software. This book attempts to give the reader a gentle introduction to graphical modeling using R and the main features of some of these packages. In addition, the book provides examples of how more advanced aspects of graphical modeling can be represented and handled within R. Topics covered in the seven chapters include graphical models for contingency tables, Gaussian and mixed graphical models, Bayesian networks and modeling high dimensional data.
  • 多元统计分析及R语言建模

    作者:王斌会

    《多元统计分析及R语言建模(第2版)》共分14章,主要内容有:多元数据的收集和整理、多元数据的直观显示、线性与非线性模型及广义线性模型、判别分析、聚类分析、主成分分析、因子分析、对应分析、典型相关分析等常见的主流方法。《多元统计分析及R语言建模(第2版)》还参考国内外大量文献,系统地介绍了这些年在经济管理等领域应用颇广的一些较新方法,可作为统计学专业本科生和研究生的多元分析课程教材。
  • Introducing Monte Carlo Methods with R (Use R)

    作者:Christian Robert,Geo

    Computational techniques based on simulation have now become an essential part of the statistician's toolbox. It is thus crucial to provide statisticians with a practical understanding of those methods, and there is no better way to develop intuition and skills for simulation than to use simulation to solve statistical problems. Introducing Monte Carlo Methods with R covers the main tools used in statistical simulation from a programmer's point of view, explaining the R implementation of each simulation technique and providing the output for better understanding and comparison. While this book constitutes a comprehensive treatment of simulation methods, the theoretical justification of those methods has been considerably reduced, compared with Robert and Casella (2004). Similarly, the more exploratory and less stable solutions are not covered here. This book does not require a preliminary exposure to the R programming language or to Monte Carlo methods, nor an advanced mathematical background. While many examples are set within a Bayesian framework, advanced expertise in Bayesian statistics is not required. The book covers basic random generation algorithms, Monte Carlo techniques for integration and optimization, convergence diagnoses, Markov chain Monte Carlo methods, including Metropolis {Hastings and Gibbs algorithms, and adaptive algorithms. All chapters include exercises and all R programs are available as an R package called mcsm. The book appeals to anyone with a practical interest in simulation methods but no previous exposure. It is meant to be useful for students and practitioners in areas such as statistics, signal processing, communications engineering, control theory, econometrics, finance and more. The programming parts are introduced progressively to be accessible to any reader.
  • Introductory Statistics with R

    作者:Peter Dalgaard

    This book provides an elementary-level introduction to R, targeting both non-statistician scientists in various fields and students of statistics. The main mode of presentation is via code examples with liberal commenting of the code and the output, from the computational as well as the statistical viewpoint. Brief sections introduce the statistical methods before they are used. A supplementary R package can be downloaded and contains the data sets. All examples are directly runnable and all graphics in the text are generated from the examples. The statistical methodology covered includes statistical standard distributions, one- and two-sample tests with continuous data, regression analysis, one-and two-way analysis of variance, regression analysis, analysis of tabular data, and sample size calculations. In addition, the last four chapters contain introductions to multiple linear regression analysis, linear models in general, logistic regression, and survival analysis.
  • 多元统计分析及R语言建模

    作者:王斌会

    《暨南大学研究生教材•多元统计分析及R语言建模》共分15章,主要内容有:多元数据的收集和整理、多元数据的直观显示、线性与非线性模型及广义线性模型、判别分析、聚类分析、主成分分析、因子分析、对应分析、典型相关分析等常见的主流方法。《暨南大学研究生教材•多元统计分析及R语言建模》还参考国内外大量文献,系统地介绍了这些年在经济管理等领域应用颇广的一些较新方法,可作为统计学专业本科生和研究生的多元分析课程教材。《暨南大学研究生教材•多元统计分析及R语言建模》还可作为非统计学专业研究生的量化分析教材。
  • Data Analysis Using Regression and Multilevel/Hierarchical Models

    作者:Andrew Gelman,Jennif

    Data Analysis Using Regression and Multilevel/Hierarchical Models is a comprehensive manual for the applied researcher who wants to perform data analysis using linear and nonlinear regression and multilevel models. The book introduces a wide variety of models, whilst at the same time instructing the reader in how to fit these models using available software packages. The book illustrates the concepts by working through scores of real data examples that have arisen from the authors' own applied research, with programming codes provided for each one. Topics covered include causal inference, including regression, poststratification, matching, regression discontinuity, and instrumental variables, as well as multilevel logistic regression and missing-data imputation. Practical tips regarding building, fitting, and understanding are provided throughout. Author resource page: http://www.stat.columbia.edu/~gelman/arm/