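Author: Xie Jiabiao
Has worked in data mining and modeling for 9 years across consulting, e-commerce, teleshopping, electric power, and gaming, and is familiar with the characteristics of data in each of these fields. Has extensive hands-on experience in data mining with R. Co-author of 《R语言与数据挖掘》 and 《数据实践之美》, among other books.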
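R is an excellent data mining tool with first-rate capabilities for data processing, data mining, and data visualization, and it is an essential tool for data practitioners. Its in-memory design, however, has long been criticized. Although many excellent extension packages have greatly improved R's performance in recent years, it can still fall short when faced with enterprise-scale data mining.
That concern is now much less pressing. Since Microsoft acquired the commercial version of R, it has done a great deal of integration and optimization. The product used to be free only for university students; now data practitioners in industry can also download Microsoft R Server (MRS) for free and use it to process big data. MRS is 100% compatible with open-source R and can make full use of the 10,000+ extension packages on CRAN to meet a wide range of data mining needs.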
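For installing Microsoft R Server, Chen Yanping has already written a very detailed guide. Interested readers can follow the link below and install MRS by following that document.
Note: make sure you end up on the page https://my.visualstudio.com and choose the latest version, Microsoft R Server 9.10, for download. Microsoft thoughtfully provides installers for different operating systems, as shown in the figure below:
After a successful installation following Yanping's steps, the following will appear on your computer:
This indicates that MRS has been installed successfully. Clicking RGui opens an interface similar to the standard R one:
We can also call MRS from RStudio, as follows:
Simply select MRS.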
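Now that MRS is installed, let's walk through a simple case to help you get started quickly.
If the dataset is not large, it can be imported directly into R or MRS and processed there (MRS is faster than R). We import the ccFraud.csv dataset, which has ten million records. The import times for MRS and R compare as follows:
As shown, importing the data took 31 seconds with MRS and 1 minute with R, so MRS was roughly twice as fast.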
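A comparison like the one above can be reproduced with system.time. The sketch below is only an illustration under stated assumptions: it writes a small synthetic CSV (a hypothetical stand-in for ccFraud.csv, with made-up values), times base R's read.csv, and times rxImport only when the RevoScaleR package is actually available, so it also runs on plain open-source R.

```r
# Sketch: time a CSV import in base R; the RevoScaleR call is guarded so the
# script also runs where Microsoft R Server is not installed.
set.seed(1)
tmp_csv <- tempfile(fileext = ".csv")   # hypothetical stand-in for ccFraud.csv
toy <- data.frame(custID  = 1:1000,
                  gender  = sample(1:2, 1000, replace = TRUE),
                  balance = sample(0:41485, 1000, replace = TRUE))
write.csv(toy, tmp_csv, row.names = FALSE)

# Elapsed seconds for base R's CSV reader
t_base <- system.time(df_base <- read.csv(tmp_csv))["elapsed"]

# Elapsed seconds for rxImport, only if RevoScaleR is available
if (requireNamespace("RevoScaleR", quietly = TRUE)) {
  t_mrs <- system.time(
    df_mrs <- RevoScaleR::rxImport(inData = tmp_csv)
  )["elapsed"]
  print(c(base = unname(t_base), mrs = unname(t_mrs)))
} else {
  print(c(base = unname(t_base)))
}
```

On a file this small the two timings will not differ meaningfully; the gap reported in the post is for ten million rows.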
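For large datasets, MRS can first save the data to disk in the .xdf format. The resulting data object can be used by most functions in the RevoScaleR package (data processing, data transformation, modeling, and so on). This is done with the rxImport function: set its outFile argument to the file name you want to save to.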
> # Import the CSV dataset
> readpath <- "D:/MRS/Data"
> infile <- file.path(readpath, "ccFraud.csv")
> ccFraud_xdf <- rxImport(inData = infile,
+                         outFile = "ccFraud.xdf",
+                         overwrite = TRUE)
Rows Read: 500000, Total Rows Processed: 500000, Total Chunk Time: 1.393 seconds
Rows Read: 500000, Total Rows Processed: 1000000, Total Chunk Time: 1.440 seconds
Rows Read: 500000, Total Rows Processed: 1500000, Total Chunk Time: 1.462 seconds
Rows Read: 500000, Total Rows Processed: 2000000, Total Chunk Time: 1.462 seconds
Rows Read: 500000, Total Rows Processed: 2500000, Total Chunk Time: 1.475 seconds
Rows Read: 500000, Total Rows Processed: 3000000, Total Chunk Time: 1.408 seconds
Rows Read: 500000, Total Rows Processed: 3500000, Total Chunk Time: 1.454 seconds
Rows Read: 500000, Total Rows Processed: 4000000, Total Chunk Time: 1.381 seconds
Rows Read: 500000, Total Rows Processed: 4500000, Total Chunk Time: 1.417 seconds
Rows Read: 500000, Total Rows Processed: 5000000, Total Chunk Time: 1.429 seconds
Rows Read: 500000, Total Rows Processed: 5500000, Total Chunk Time: 1.440 seconds
Rows Read: 500000, Total Rows Processed: 6000000, Total Chunk Time: 1.425 seconds
Rows Read: 500000, Total Rows Processed: 6500000, Total Chunk Time: 1.452 seconds
Rows Read: 500000, Total Rows Processed: 7000000, Total Chunk Time: 1.456 seconds
Rows Read: 500000, Total Rows Processed: 7500000, Total Chunk Time: 1.406 seconds
Rows Read: 500000, Total Rows Processed: 8000000, Total Chunk Time: 1.379 seconds
Rows Read: 500000, Total Rows Processed: 8500000, Total Chunk Time: 1.434 seconds
Rows Read: 500000, Total Rows Processed: 9000000, Total Chunk Time: 1.409 seconds
Rows Read: 500000, Total Rows Processed: 9500000, Total Chunk Time: 1.422 seconds
Rows Read: 500000, Total Rows Processed: 10000000, Total Chunk Time: 1.384 seconds
>

Once the import finishes, we use the rxGetInfo function to inspect the structure of the .xdf file (with getVarInfo set to TRUE) and to view the first ten rows of the data (numRows = 10).
> rxGetInfo("ccFraud.xdf", getVarInfo = TRUE, numRows = 10)
File name: C:\Program Files\Microsoft\R Server\R_SERVER\library\RevoScaleR\rxLibs\x64\ccFraud.xdf
Number of observations: 1e+07
Number of variables: 9
Number of blocks: 20
Compression type: zlib
Variable information:
Var 1: custID, Type: integer, Low/High: (1, 1e+07)
Var 2: gender, Type: integer, Low/High: (1, 2)
Var 3: state, Type: integer, Low/High: (1, 51)
Var 4: cardholder, Type: integer, Low/High: (1, 2)
Var 5: balance, Type: integer, Low/High: (0, 41485)
Var 6: numTrans, Type: integer, Low/High: (0, 100)
Var 7: numIntlTrans, Type: integer, Low/High: (0, 60)
Var 8: creditLine, Type: integer, Low/High: (1, 75)
Var 9: fraudRisk, Type: integer, Low/High: (0, 1)
Data (10 rows starting with row 1):
   custID gender state cardholder balance numTrans numIntlTrans creditLine fraudRisk
1       1      1    35          1    3000        4           14          2         0
2       2      2     2          1       0        9            0         18         0
3       3      2     2          1       0       27            9         16         0
4       4      1    15          1       0       12            0          5         0
5       5      1    46          1       0       11           16          7         0
6       6      2    44          2    5546       21            0         13         0
7       7      1     3          1    2000       41            0          1         0
8       8      1    10          1    6016       20            3          6         0
9       9      2    32          1    2428        4           10         22         0
10     10      1    23          1       0       18           56          5         0
>

When saving the .xdf file, we can also change variable types with arguments such as stringsAsFactors, colClasses, and colInfo. For example, we use colInfo to convert gender from a numeric variable to a factor with levels "F" and "M", and colClasses to convert fraudRisk from a numeric variable to a factor.
> # Change the storage type of variables
> ccFraud_xdf <- rxImport(inData = infile,
+                         outFile = "ccFraud.xdf",
+                         colClasses = c(fraudRisk = "factor"),
+                         colInfo = list("gender" = list(type = "factor",
+                                                        levels = c("1", "2"),
+                                                        newLevels = c("F", "M"))),
+                         overwrite = TRUE)
Rows Read: 500000, Total Rows Processed: 500000, Total Chunk Time: 1.871 seconds
Rows Read: 500000, Total Rows Processed: 1000000, Total Chunk Time: 1.832 seconds
Rows Read: 500000, Total Rows Processed: 1500000, Total Chunk Time: 1.825 seconds
Rows Read: 500000, Total Rows Processed: 2000000, Total Chunk Time: 1.808 seconds
Rows Read: 500000, Total Rows Processed: 2500000, Total Chunk Time: 2.018 seconds
Rows Read: 500000, Total Rows Processed: 3000000, Total Chunk Time: 2.061 seconds
Rows Read: 500000, Total Rows Processed: 3500000, Total Chunk Time: 2.158 seconds
Rows Read: 500000, Total Rows Processed: 4000000, Total Chunk Time: 1.917 seconds
Rows Read: 500000, Total Rows Processed: 4500000, Total Chunk Time: 1.852 seconds
Rows Read: 500000, Total Rows Processed: 5000000, Total Chunk Time: 1.795 seconds
Rows Read: 500000, Total Rows Processed: 5500000, Total Chunk Time: 1.829 seconds
Rows Read: 500000, Total Rows Processed: 6000000, Total Chunk Time: 1.793 seconds
Rows Read: 500000, Total Rows Processed: 6500000, Total Chunk Time: 1.849 seconds
Rows Read: 500000, Total Rows Processed: 7000000, Total Chunk Time: 1.806 seconds
Rows Read: 500000, Total Rows Processed: 7500000, Total Chunk Time: 1.773 seconds
Rows Read: 500000, Total Rows Processed: 8000000, Total Chunk Time: 1.813 seconds
Rows Read: 500000, Total Rows Processed: 8500000, Total Chunk Time: 1.812 seconds
Rows Read: 500000, Total Rows Processed: 9000000, Total Chunk Time: 1.850 seconds
Rows Read: 500000, Total Rows Processed: 9500000, Total Chunk Time: 1.824 seconds
Rows Read: 500000, Total Rows Processed: 10000000, Total Chunk Time: 1.828 seconds
>
> # Inspect the structure of ccFraud_xdf
> rxGetInfo(ccFraud_xdf, getVarInfo = TRUE, numRows = 5)
File name: C:\Program Files\Microsoft\R Server\R_SERVER\library\RevoScaleR\rxLibs\x64\ccFraud.xdf
Number of observations: 1e+07
Number of variables: 9
Number of blocks: 20
Compression type: zlib
Variable information:
Var 1: custID, Type: integer, Low/High: (1, 1e+07)
Var 2: gender
       2 factor levels: F M
Var 3: state, Type: integer, Low/High: (1, 51)
Var 4: cardholder, Type: integer, Low/High: (1, 2)
Var 5: balance, Type: integer, Low/High: (0, 41485)
Var 6: numTrans, Type: integer, Low/High: (0, 100)
Var 7: numIntlTrans, Type: integer, Low/High: (0, 60)
Var 8: creditLine, Type: integer, Low/High: (1, 75)
Var 9: fraudRisk
       2 factor levels: 0 1
Data (5 rows starting with row 1):
  custID gender state cardholder balance numTrans numIntlTrans creditLine fraudRisk
1      1      F    35          1    3000        4           14          2         0
2      2      M     2          1       0        9            0         18         0
3      3      M     2          1       0       27            9         16         0
4      4      F    15          1       0       12            0          5         0
5      5      F    46          1       0       11           16          7         0
>

The structure shows that the variable types have changed and that the factor levels of gender have changed from 1 and 2 to F and M.
We can also compute descriptive statistics on the .xdf file with the rxSummary function.
> # Descriptive statistics with the rxSummary function
> rxSummary(~., ccFraud_xdf) # summarize all variables
Rows Read: 500000, Total Rows Processed: 500000, Total Chunk Time: 0.069 seconds
Rows Read: 500000, Total Rows Processed: 1000000, Total Chunk Time: 0.072 seconds
Rows Read: 500000, Total Rows Processed: 1500000, Total Chunk Time: 0.078 seconds
Rows Read: 500000, Total Rows Processed: 2000000, Total Chunk Time: 0.079 seconds
Rows Read: 500000, Total Rows Processed: 2500000, Total Chunk Time: 0.080 seconds
Rows Read: 500000, Total Rows Processed: 3000000, Total Chunk Time: 0.081 seconds
Rows Read: 500000, Total Rows Processed: 3500000, Total Chunk Time: 0.081 seconds
Rows Read: 500000, Total Rows Processed: 4000000, Total Chunk Time: 0.080 seconds
Rows Read: 500000, Total Rows Processed: 4500000, Total Chunk Time: 0.077 seconds
Rows Read: 500000, Total Rows Processed: 5000000, Total Chunk Time: 0.080 seconds
Rows Read: 500000, Total Rows Processed: 5500000, Total Chunk Time: 0.080 seconds
Rows Read: 500000, Total Rows Processed: 6000000, Total Chunk Time: 0.085 seconds
Rows Read: 500000, Total Rows Processed: 6500000, Total Chunk Time: 0.080 seconds
Rows Read: 500000, Total Rows Processed: 7000000, Total Chunk Time: 0.078 seconds
Rows Read: 500000, Total Rows Processed: 7500000, Total Chunk Time: 0.082 seconds
Rows Read: 500000, Total Rows Processed: 8000000, Total Chunk Time: 0.084 seconds
Rows Read: 500000, Total Rows Processed: 8500000, Total Chunk Time: 0.082 seconds
Rows Read: 500000, Total Rows Processed: 9000000, Total Chunk Time: 0.079 seconds
Rows Read: 500000, Total Rows Processed: 9500000, Total Chunk Time: 0.086 seconds
Rows Read: 500000, Total Rows Processed: 10000000, Total Chunk Time: 0.078 seconds
Computation time: 1.661 seconds.
Call:
rxSummary(formula = ~., data = ccFraud_xdf)

Summary Statistics Results for: ~.
Data: ccFraud_xdf (RxXdfData Data Source)
File name: ccFraud.xdf
Number of valid observations: 1e+07

 Name         Mean         StdDev       Min Max      ValidObs MissingObs
 custID       5.000001e+06 2.886751e+06 1   10000000 1e+07    0
 state        2.466127e+01 1.497012e+01 1         51 1e+07    0
 cardholder   1.030004e+00 1.705991e-01 1          2 1e+07    0
 balance      4.109920e+03 3.996847e+03 0      41485 1e+07    0
 numTrans     2.893519e+01 2.655378e+01 0        100 1e+07    0
 numIntlTrans 4.047190e+00 8.602970e+00 0         60 1e+07    0
 creditLine   9.134469e+00 9.641974e+00 1         75 1e+07    0

Category Counts for gender
Number of categories: 2
Number of valid observations: 1e+07
Number of missing observations: 0

 gender Counts
 F      6178231
 M      3821769

Category Counts for fraudRisk
Number of categories: 2
Number of valid observations: 1e+07
Number of missing observations: 0

 fraudRisk Counts
 0         9403986
 1          596014
>

Like the ordinary summary function, rxSummary returns the mean, standard deviation, minimum, maximum, number of valid observations, and number of missing observations for numeric variables, and frequency counts for factor variables.
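The "Rows Read" lines in the log show the file being processed in blocks of 500,000 rows rather than all at once. The sketch below illustrates the general idea of computing such summary statistics block by block in base R, keeping only running totals in memory. It is an assumption-laden illustration on synthetic data, not RevoScaleR's actual implementation; the vector x and the block size are made up.

```r
# Sketch of block-wise summary statistics (general idea only; this is not
# how RevoScaleR is implemented internally).
set.seed(42)
x <- rnorm(10000, mean = 5, sd = 2)               # stand-in for one numeric column
chunks <- split(x, ceiling(seq_along(x) / 2500))  # 4 blocks of 2,500 values each

# Running accumulators: count, sum, sum of squares, min, max
acc <- list(n = 0, s = 0, ss = 0, lo = Inf, hi = -Inf)
for (ch in chunks) {
  acc$n  <- acc$n + length(ch)
  acc$s  <- acc$s + sum(ch)
  acc$ss <- acc$ss + sum(ch^2)
  acc$lo <- min(acc$lo, min(ch))
  acc$hi <- max(acc$hi, max(ch))
}

# Combine the accumulators into the final statistics
chunk_mean <- acc$s / acc$n
chunk_sd   <- sqrt((acc$ss - acc$n * chunk_mean^2) / (acc$n - 1))
print(c(mean = chunk_mean, sd = chunk_sd, min = acc$lo, max = acc$hi))
```

Because only a handful of totals are carried between blocks, memory use does not grow with the number of rows, which is what makes this style of computation suitable for data stored on disk.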
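Beyond these simple operations, ScaleR also includes a rich set of data processing functions and algorithms, as shown below:
Finally, let's use the rxLogit function to build a logistic regression model (the glm function in R can also be used) and use the summary function to view the model information.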
> # Logistic regression model
> ccFraudglm <- rxLogit(fraudRisk ~ gender + cardholder + balance + numTrans +
+                       numIntlTrans + creditLine, data = ccFraud_xdf)
Rows Read: 500000, Total Rows Processed: 500000, Total Chunk Time: 0.080 seconds
Rows Read: 500000, Total Rows Processed: 1000000, Total Chunk Time: 0.081 seconds
Rows Read: 500000, Total Rows Processed: 1500000, Total Chunk Time: 0.080 seconds
Rows Read: 500000, Total Rows Processed: 2000000, Total Chunk Time: 0.071 seconds
Rows Read: 500000, Total Rows Processed: 2500000, Total Chunk Time: 0.071 seconds
Rows Read: 500000, Total Rows Processed: 3000000, Total Chunk Time: 0.068 seconds
Rows Read: 500000, Total Rows Processed: 3500000, Total Chunk Time: 0.075 seconds
Rows Read: 500000, Total Rows Processed: 4000000, Total Chunk Time: 0.070 seconds
Rows Read: 500000, Total Rows Processed: 4500000, Total Chunk Time: 0.075 seconds
Rows Read: 500000, Total Rows Processed: 5000000, Total Chunk Time: 0.069 seconds
Rows Read: 500000, Total Rows Processed: 5500000, Total Chunk Time: 0.073 seconds
Rows Read: 500000, Total Rows Processed: 6000000, Total Chunk Time: 0.072 seconds
Rows Read: 500000, Total Rows Processed: 6500000, Total Chunk Time: 0.067 seconds
Rows Read: 500000, Total Rows Processed: 7000000, Total Chunk Time: 0.077 seconds
Rows Read: 500000, Total Rows Processed: 7500000, Total Chunk Time: 0.074 seconds
Rows Read: 500000, Total Rows Processed: 8000000, Total Chunk Time: 0.073 seconds
Rows Read: 500000, Total Rows Processed: 8500000, Total Chunk Time: 0.070 seconds
Rows Read: 500000, Total Rows Processed: 9000000, Total Chunk Time: 0.077 seconds
Rows Read: 500000, Total Rows Processed: 9500000, Total Chunk Time: 0.069 seconds
Rows Read: 500000, Total Rows Processed: 10000000, Total Chunk Time: 0.071 seconds

Starting values (iteration 1) time: 1.558 secs.
Rows Read: 500000, Total Rows Processed: 500000, Total Chunk Time: 0.061 seconds
Rows Read: 500000, Total Rows Processed: 1000000, Total Chunk Time: 0.182 seconds
Rows Read: 500000, Total Rows Processed: 1500000, Total Chunk Time: 0.193 seconds
Rows Read: 500000, Total Rows Processed: 2000000, Total Chunk Time: 0.184 seconds
Rows Read: 500000, Total Rows Processed: 2500000, Total Chunk Time: 0.190 seconds
Rows Read: 500000, Total Rows Processed: 3000000, Total Chunk Time: 0.193 seconds
Rows Read: 500000, Total Rows Processed: 3500000, Total Chunk Time: 0.193 seconds
Rows Read: 500000, Total Rows Processed: 4000000, Total Chunk Time: 0.192 seconds
Rows Read: 500000, Total Rows Processed: 4500000, Total Chunk Time: 0.188 seconds
Rows Read: 500000, Total Rows Processed: 5000000, Total Chunk Time: 0.197 seconds
Rows Read: 500000, Total Rows Processed: 5500000, Total Chunk Time: 0.188 seconds
Rows Read: 500000, Total Rows Processed: 6000000, Total Chunk Time: 0.195 seconds
Rows Read: 500000, Total Rows Processed: 6500000, Total Chunk Time: 0.195 seconds
Rows Read: 500000, Total Rows Processed: 7000000, Total Chunk Time: 0.200 seconds
Rows Read: 500000, Total Rows Processed: 7500000, Total Chunk Time: 0.188 seconds
Rows Read: 500000, Total Rows Processed: 8000000, Total Chunk Time: 0.193 seconds
Rows Read: 500000, Total Rows Processed: 8500000, Total Chunk Time: 0.185 seconds
Rows Read: 500000, Total Rows Processed: 9000000, Total Chunk Time: 0.186 seconds
Rows Read: 500000, Total Rows Processed: 9500000, Total Chunk Time: 0.196 seconds
Rows Read: 500000, Total Rows Processed: 10000000, Total Chunk Time: 0.192 seconds

Iteration 2 time: 3.866 secs.
Rows Read: 500000, Total Rows Processed: 500000, Total Chunk Time: 0.070 seconds
Rows Read: 500000, Total Rows Processed: 1000000, Total Chunk Time: 0.189 seconds
Rows Read: 500000, Total Rows Processed: 1500000, Total Chunk Time: 0.195 seconds
Rows Read: 500000, Total Rows Processed: 2000000, Total Chunk Time: 0.205 seconds
Rows Read: 500000, Total Rows Processed: 2500000, Total Chunk Time: 0.194 seconds
Rows Read: 500000, Total Rows Processed: 3000000, Total Chunk Time: 0.188 seconds
Rows Read: 500000, Total Rows Processed: 3500000, Total Chunk Time: 0.199 seconds
Rows Read: 500000, Total Rows Processed: 4000000, Total Chunk Time: 0.194 seconds
Rows Read: 500000, Total Rows Processed: 4500000, Total Chunk Time: 0.199 seconds
Rows Read: 500000, Total Rows Processed: 5000000, Total Chunk Time: 0.194 seconds
Rows Read: 500000, Total Rows Processed: 5500000, Total Chunk Time: 0.187 seconds
Rows Read: 500000, Total Rows Processed: 6000000, Total Chunk Time: 0.191 seconds
Rows Read: 500000, Total Rows Processed: 6500000, Total Chunk Time: 0.190 seconds
Rows Read: 500000, Total Rows Processed: 7000000, Total Chunk Time: 0.195 seconds
Rows Read: 500000, Total Rows Processed: 7500000, Total Chunk Time: 0.184 seconds
Rows Read: 500000, Total Rows Processed: 8000000, Total Chunk Time: 0.201 seconds
Rows Read: 500000, Total Rows Processed: 8500000, Total Chunk Time: 0.188 seconds
Rows Read: 500000, Total Rows Processed: 9000000, Total Chunk Time: 0.208 seconds
Rows Read: 500000, Total Rows Processed: 9500000, Total Chunk Time: 0.216 seconds
Rows Read: 500000, Total Rows Processed: 10000000, Total Chunk Time: 0.191 seconds

Iteration 3 time: 3.950 secs.
Rows Read: 500000, Total Rows Processed: 500000, Total Chunk Time: 0.068 seconds
Rows Read: 500000, Total Rows Processed: 1000000, Total Chunk Time: 0.197 seconds
Rows Read: 500000, Total Rows Processed: 1500000, Total Chunk Time: 0.191 seconds
Rows Read: 500000, Total Rows Processed: 2000000, Total Chunk Time: 0.191 seconds
Rows Read: 500000, Total Rows Processed: 2500000, Total Chunk Time: 0.187 seconds
Rows Read: 500000, Total Rows Processed: 3000000, Total Chunk Time: 0.190 seconds
Rows Read: 500000, Total Rows Processed: 3500000, Total Chunk Time: 0.192 seconds
Rows Read: 500000, Total Rows Processed: 4000000, Total Chunk Time: 0.201 seconds
Rows Read: 500000, Total Rows Processed: 4500000, Total Chunk Time: 0.193 seconds
Rows Read: 500000, Total Rows Processed: 5000000, Total Chunk Time: 0.190 seconds
Rows Read: 500000, Total Rows Processed: 5500000, Total Chunk Time: 0.193 seconds
Rows Read: 500000, Total Rows Processed: 6000000, Total Chunk Time: 0.211 seconds
Rows Read: 500000, Total Rows Processed: 6500000, Total Chunk Time: 0.191 seconds
Rows Read: 500000, Total Rows Processed: 7000000, Total Chunk Time: 0.189 seconds
Rows Read: 500000, Total Rows Processed: 7500000, Total Chunk Time: 0.199 seconds
Rows Read: 500000, Total Rows Processed: 8000000, Total Chunk Time: 0.198 seconds
Rows Read: 500000, Total Rows Processed: 8500000, Total Chunk Time: 0.190 seconds
Rows Read: 500000, Total Rows Processed: 9000000, Total Chunk Time: 0.188 seconds
Rows Read: 500000, Total Rows Processed: 9500000, Total Chunk Time: 0.192 seconds
Rows Read: 500000, Total Rows Processed: 10000000, Total Chunk Time: 0.181 seconds

Iteration 4 time: 3.914 secs.
Rows Read: 500000, Total Rows Processed: 500000, Total Chunk Time: 0.061 seconds
Rows Read: 500000, Total Rows Processed: 1000000, Total Chunk Time: 0.191 seconds
Rows Read: 500000, Total Rows Processed: 1500000, Total Chunk Time: 0.197 seconds
Rows Read: 500000, Total Rows Processed: 2000000, Total Chunk Time: 0.201 seconds
Rows Read: 500000, Total Rows Processed: 2500000, Total Chunk Time: 0.199 seconds
Rows Read: 500000, Total Rows Processed: 3000000, Total Chunk Time: 0.189 seconds
Rows Read: 500000, Total Rows Processed: 3500000, Total Chunk Time: 0.204 seconds
Rows Read: 500000, Total Rows Processed: 4000000, Total Chunk Time: 0.192 seconds
Rows Read: 500000, Total Rows Processed: 4500000, Total Chunk Time: 0.196 seconds
Rows Read: 500000, Total Rows Processed: 5000000, Total Chunk Time: 0.202 seconds
Rows Read: 500000, Total Rows Processed: 5500000, Total Chunk Time: 0.194 seconds
Rows Read: 500000, Total Rows Processed: 6000000, Total Chunk Time: 0.185 seconds
Rows Read: 500000, Total Rows Processed: 6500000, Total Chunk Time: 0.188 seconds
Rows Read: 500000, Total Rows Processed: 7000000, Total Chunk Time: 0.195 seconds
Rows Read: 500000, Total Rows Processed: 7500000, Total Chunk Time: 0.207 seconds
Rows Read: 500000, Total Rows Processed: 8000000, Total Chunk Time: 0.199 seconds
Rows Read: 500000, Total Rows Processed: 8500000, Total Chunk Time: 0.192 seconds
Rows Read: 500000, Total Rows Processed: 9000000, Total Chunk Time: 0.186 seconds
Rows Read: 500000, Total Rows Processed: 9500000, Total Chunk Time: 0.202 seconds
Rows Read: 500000, Total Rows Processed: 10000000, Total Chunk Time: 0.209 seconds

Iteration 5 time: 3.963 secs.
Rows Read: 500000, Total Rows Processed: 500000, Total Chunk Time: 0.066 seconds
Rows Read: 500000, Total Rows Processed: 1000000, Total Chunk Time: 0.191 seconds
Rows Read: 500000, Total Rows Processed: 1500000, Total Chunk Time: 0.198 seconds
Rows Read: 500000, Total Rows Processed: 2000000, Total Chunk Time: 0.187 seconds
Rows Read: 500000, Total Rows Processed: 2500000, Total Chunk Time: 0.191 seconds
Rows Read: 500000, Total Rows Processed: 3000000, Total Chunk Time: 0.200 seconds
Rows Read: 500000, Total Rows Processed: 3500000, Total Chunk Time: 0.189 seconds
Rows Read: 500000, Total Rows Processed: 4000000, Total Chunk Time: 0.197 seconds
Rows Read: 500000, Total Rows Processed: 4500000, Total Chunk Time: 0.187 seconds
Rows Read: 500000, Total Rows Processed: 5000000, Total Chunk Time: 0.197 seconds
Rows Read: 500000, Total Rows Processed: 5500000, Total Chunk Time: 0.187 seconds
Rows Read: 500000, Total Rows Processed: 6000000, Total Chunk Time: 0.189 seconds
Rows Read: 500000, Total Rows Processed: 6500000, Total Chunk Time: 0.188 seconds
Rows Read: 500000, Total Rows Processed: 7000000, Total Chunk Time: 0.190 seconds
Rows Read: 500000, Total Rows Processed: 7500000, Total Chunk Time: 0.208 seconds
Rows Read: 500000, Total Rows Processed: 8000000, Total Chunk Time: 0.204 seconds
Rows Read: 500000, Total Rows Processed: 8500000, Total Chunk Time: 0.184 seconds
Rows Read: 500000, Total Rows Processed: 9000000, Total Chunk Time: 0.193 seconds
Rows Read: 500000, Total Rows Processed: 9500000, Total Chunk Time: 0.186 seconds
Rows Read: 500000, Total Rows Processed: 10000000, Total Chunk Time: 0.197 seconds

Iteration 6 time: 3.903 secs.
Rows Read: 500000, Total Rows Processed: 500000, Total Chunk Time: 0.061 seconds
Rows Read: 500000, Total Rows Processed: 1000000, Total Chunk Time: 0.192 seconds
Rows Read: 500000, Total Rows Processed: 1500000, Total Chunk Time: 0.198 seconds
Rows Read: 500000, Total Rows Processed: 2000000, Total Chunk Time: 0.198 seconds
Rows Read: 500000, Total Rows Processed: 2500000, Total Chunk Time: 0.204 seconds
Rows Read: 500000, Total Rows Processed: 3000000, Total Chunk Time: 0.193 seconds
Rows Read: 500000, Total Rows Processed: 3500000, Total Chunk Time: 0.196 seconds
Rows Read: 500000, Total Rows Processed: 4000000, Total Chunk Time: 0.201 seconds
Rows Read: 500000, Total Rows Processed: 4500000, Total Chunk Time: 0.197 seconds
Rows Read: 500000, Total Rows Processed: 5000000, Total Chunk Time: 0.195 seconds
Rows Read: 500000, Total Rows Processed: 5500000, Total Chunk Time: 0.181 seconds
Rows Read: 500000, Total Rows Processed: 6000000, Total Chunk Time: 0.193 seconds
Rows Read: 500000, Total Rows Processed: 6500000, Total Chunk Time: 0.198 seconds
Rows Read: 500000, Total Rows Processed: 7000000, Total Chunk Time: 0.194 seconds
Rows Read: 500000, Total Rows Processed: 7500000, Total Chunk Time: 0.199 seconds
Rows Read: 500000, Total Rows Processed: 8000000, Total Chunk Time: 0.191 seconds
Rows Read: 500000, Total Rows Processed: 8500000, Total Chunk Time: 0.192 seconds
Rows Read: 500000, Total Rows Processed: 9000000, Total Chunk Time: 0.201 seconds
Rows Read: 500000, Total Rows Processed: 9500000, Total Chunk Time: 0.198 seconds
Rows Read: 500000, Total Rows Processed: 10000000, Total Chunk Time: 0.201 seconds

Iteration 7 time: 3.956 secs.
Rows Read: 500000, Total Rows Processed: 500000, Total Chunk Time: 0.060 seconds
Rows Read: 500000, Total Rows Processed: 1000000, Total Chunk Time: 0.189 seconds
Rows Read: 500000, Total Rows Processed: 1500000, Total Chunk Time: 0.199 seconds
Rows Read: 500000, Total Rows Processed: 2000000, Total Chunk Time: 0.215 seconds
Rows Read: 500000, Total Rows Processed: 2500000, Total Chunk Time: 0.231 seconds
Rows Read: 500000, Total Rows Processed: 3000000, Total Chunk Time: 0.220 seconds
Rows Read: 500000, Total Rows Processed: 3500000, Total Chunk Time: 0.202 seconds
Rows Read: 500000, Total Rows Processed: 4000000, Total Chunk Time: 0.195 seconds
Rows Read: 500000, Total Rows Processed: 4500000, Total Chunk Time: 0.213 seconds
Rows Read: 500000, Total Rows Processed: 5000000, Total Chunk Time: 0.211 seconds
Rows Read: 500000, Total Rows Processed: 5500000, Total Chunk Time: 0.191 seconds
Rows Read: 500000, Total Rows Processed: 6000000, Total Chunk Time: 0.188 seconds
Rows Read: 500000, Total Rows Processed: 6500000, Total Chunk Time: 0.209 seconds
Rows Read: 500000, Total Rows Processed: 7000000, Total Chunk Time: 0.190 seconds
Rows Read: 500000, Total Rows Processed: 7500000, Total Chunk Time: 0.195 seconds
Rows Read: 500000, Total Rows Processed: 8000000, Total Chunk Time: 0.202 seconds
Rows Read: 500000, Total Rows Processed: 8500000, Total Chunk Time: 0.222 seconds
Rows Read: 500000, Total Rows Processed: 9000000, Total Chunk Time: 0.224 seconds
Rows Read: 500000, Total Rows Processed: 9500000, Total Chunk Time: 0.213 seconds
Rows Read: 500000, Total Rows Processed: 10000000, Total Chunk Time: 0.223 seconds

Iteration 8 time: 4.194 secs.

Elapsed computation time: 29.316 secs.
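For readers without MRS, the same kind of model can be fitted on an in-memory data frame with base R's glm and inspected with summary. The sketch below is a minimal illustration on synthetic data: the data frame toy, its coefficients, and the reduced formula are all made up for the example and are not taken from ccFraud.

```r
# Sketch: logistic regression with base R's glm on synthetic data.
# All values below are hypothetical; they only mimic the shape of ccFraud.
set.seed(123)
n <- 2000
toy <- data.frame(
  gender   = factor(sample(c("F", "M"), n, replace = TRUE)),
  balance  = runif(n, 0, 40000),
  numTrans = rpois(n, 30)
)
# Hypothetical data-generating process for the binary outcome
p <- plogis(-3 + 0.00005 * toy$balance + 0.02 * toy$numTrans)
toy$fraudRisk <- factor(rbinom(n, 1, p))

# family = binomial() gives the logit link, matching what rxLogit fits
fit <- glm(fraudRisk ~ gender + balance + numTrans,
           data = toy, family = binomial())

# Coefficients, standard errors, z values and p-values
print(summary(fit))

# With RevoScaleR loaded, the model from the post would be inspected the
# same way: summary(ccFraudglm)
```

The practical difference is where the data lives: glm needs the whole data frame in memory, while rxLogit reads the .xdf file block by block on each iteration, as the log above shows.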
The prediction results are then appended as the last column of the data.
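A minimal sketch of that step follows, using base R on synthetic data so that it runs without MRS. The data frame toy and the column name fraudRisk_Pred are illustrative choices. In RevoScaleR the corresponding function is rxPredict; the call shown in the comment is an assumption about how it would be applied here, not code taken from the post.

```r
# Sketch: append predicted probabilities as the last column (base R, toy data).
set.seed(7)
n <- 1000
toy <- data.frame(balance = runif(n, 0, 40000), numTrans = rpois(n, 30))
# Hypothetical outcome, generated only so that a model can be fitted
toy$fraudRisk <- rbinom(n, 1,
                        plogis(-3 + 0.00005 * toy$balance + 0.02 * toy$numTrans))

fit <- glm(fraudRisk ~ balance + numTrans, data = toy, family = binomial())

# type = "response" returns probabilities rather than log-odds
toy$fraudRisk_Pred <- predict(fit, newdata = toy, type = "response")
print(head(toy))

# Assumed RevoScaleR counterpart for the model and file from the post:
# rxPredict(modelObject = ccFraudglm, data = ccFraud_xdf,
#           outData = ccFraud_xdf, overwrite = TRUE)
```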

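That's all for tonight. The goal was to introduce some basic usage of Microsoft R Server. If you are struggling to analyze and model enterprise big data, you can download and install MRS and try it on your real business problems. Beyond the simple functions above, Microsoft has also developed a dedicated MicrosoftML package, which provides new machine learning capabilities with higher speed, performance, and scalability, particularly for large volumes of text data or high-dimensional categorical data.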










