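I have a data frame with two columns, and I want to delete every row in which either value is less than 0 or greater than a specified number (for the sake of argument, let's call it 2000).
Here is the data frame: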
structure(list(xx = c(134.697838289433, 222.004361198059, 131.230956160172,
206.658871436917, 111.25078650042, 241.965831417648, 171.46912254679,
116.860666678254, 196.894985820028, 135.309699618638, 133.082437475133,
185.509376072318, 718.998297748551, 745.902984215293, 752.655615982603,
633.199684348903, 764.983924278636, 694.856525559398, 773.56532078895,
757.32358575657, 709.924023536199, 658.863564702233, 733.076690816291,
745.9306541374, 788.134444412421, 759.445624288787, 796.989170170713,
632.952543475636, 746.103571612919, 715.296116988119, 766.899107551248,
628.268453830605, 658.574104878488, 689.916530654021, 820.841422812349,
709.097957368612, 793.109262845978, 716.713801941779, 726.83260343463,
746.547080776193, 759.644057119419, 757.41275593749, 723.539527360327,
839.816318612061, 795.655016954661, 766.245386324182, 756.300015395758,
808.255074043333, 745.915083305187, 685.465492956583, 694.567959198318,
786.919467838804, 699.521900871042, 749.041223560884, 700.079697765533,
753.805501259023, 745.080253997501, 846.982894686656, 775.66384433188,
809.39649823454, 841.009469183585, 790.987061753069, 792.441925234251,
1377.97739642236, 1353.19738061511, 1259.94435540633, 1276.25060187203,
1331.26106031956, 1227.68481147557, 1345.95561236514, 1309.51489973952,
1285.62680259649, 1329.46388049714, 1256.00394500077, 1294.0505313591,
1349.09440181876, 1294.72661682462, 1339.38577920408, 1277.114896541,
1267.54884404031, 1291.32793111573, 1254.85565551553, 1298.78499697743,
1283.89664572036, 1273.92831816666, 1310.221891323, 1327.89682404014,
1310.81394400863, 595.342571560588, 689.892254230306, 562.390766853428,
736.319251501976, 609.577261412134, 641.591997384705, 682.957658696869,
580.320759093636, 560.64984978551, 643.487033739876, 688.457314818318,
631.156743281308, 659.535909106305), yy = c(1169.70954243065,
1259.830208937, 1172.21661417439, 1097.62724268622, 1198.15024522658,
1231.90665701131, 1211.36196331211, 1152.4207367321, 1287.57553021171,
1120.61366993258, 1234.70366243878, 1258.47454705197, 893.983957068268,
994.99854601335, 916.330965835536, 947.536265806389, 950.345051732045,
934.313361799171, 1018.76942964176, 918.182358835366, 1005.51128858608,
967.577307930044, 997.239384198691, 995.866808447868, 962.292293255127,
864.624084608006, 895.091604672023, 906.22162647536, 1024.45206885923,
908.693026118345, 923.625774785301, 931.801569764776, 1007.88553380827,
848.55309782664, 927.608364899483, 1024.60765786828, 1085.64295260059,
1057.90632135992, 1195.30607038065, 1151.39888340311, 1168.2831257626,
1137.15375447446, 1145.42393212912, 1108.89072769468, 1075.15451622384,
1129.91711324634, 1191.94330388541, 1132.41649984784, 1210.89342724886,
1100.60339252755, 1083.5987922884, 1056.69487941162, 1150.2707936581,
1055.75678264632, 1055.53323667429, 1049.79655119467, 1166.86598024805,
1141.82593378866, 1066.37755267981, 1160.55793904653, 1162.65728735716,
1060.29360609309, 1107.40480300404, 1825.01445883899, 1802.95011068891,
1692.84948509132, 1675.97166713074, 1758.10341887143, 1788.48414279738,
1680.15824054313, 1756.01930833023, 1706.98458587119, 1770.57687329296,
1692.21991398915, 1835.60585163662, 1790.6487914694, 1787.52076839767,
1704.25313427813, 1735.96312434652, 1813.02044772293, 1847.21159474717,
1725.63580525853, 1841.32016678, 1713.80845602987, 1770.39756152819,
1747.72988313376, 1778.13110060636, 1786.3871288087, 6.01666671271317,
19.2497357431764, 9.6964112500295, -3.23929433528044, 89.4863211231715,
86.0082947221296, 42.7982120490919, 2.19886414532234, 12.8780844043502,
30.694893442471, 7.58386594976601, 83.8385161493349, 36.4551491976192
)), row.names = 100:200, class = "data.frame")
First, I created a function that removes the points meeting the condition:
routliers <- function(x) {
  if (x > 2000 | x < 0) {
    rm(x)
  }
}
Then I use the function above with apply() to remove points row by row (the data frame from the dput() above is called cds):
cds<-data.frame(apply(cds,1,routliers))
But this removed all of the points:
length(cds)
[1] 0
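Interestingly, if I replace rm() with print(), apply() does print the points I want, but then I get the error message "arguments imply differing number of rows: 0, 2". I'm also not sure whether the function passed to apply() is applied to both columns, because print() never showed a data point that meets the condition only in the second column. The first column holds the x coordinates and the second the y coordinates. I think the error "arguments imply differing number of rows: 0, 2" means that the function only tests the first value in each row.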
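The suspicion above can be checked. apply() converts the data frame to a matrix and passes each row to the function as a length-2 vector, so x > 2000 | x < 0 is a length-2 logical. Before R 4.2.0, if() used only its first element (with a warning); since R 4.2.0 this is an error. In addition, rm(x) only deletes the function's local copy of the row and never touches cds, so in R versions where the code runs at all, every call returns NULL and data.frame() ends up with an empty result, consistent with length(cds) being 0. The minimal sketch below returns a TRUE/FALSE flag per row instead and collapses both comparisons with any(). The toy data frame pts, the function is_outlier_row and its limit parameter are hypothetical names for illustration only.

```r
# Hypothetical toy data: row 2 is out of range only in yy, row 3 only in xx
pts <- data.frame(xx = c(134.7, 222.0, 2500.0, 736.3),
                  yy = c(1169.7, -3.2, 1259.8, 893.9))

# Hypothetical helper: return ONE flag per row instead of trying to delete anything.
# any() collapses the two element-wise comparisons (x and y) into a single TRUE/FALSE.
is_outlier_row <- function(row, limit = 2000) {
  any(row < 0 | row > limit)
}

# apply() passes each row as a length-2 numeric vector, so both values are tested
flags <- apply(pts, 1, is_outlier_row)

# Keep the rows that were NOT flagged; here rows 2 and 3 are dropped
kept <- pts[!flags, ]
print(kept)
```

The design choice is to separate "decide" from "subset": the function only answers whether a row is bad, and the actual removal happens once, with ordinary logical indexing on the data frame.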
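How can I write code that removes a row if one or more of its data points meet my condition?
This is easy when the columns are separate vectors (x <- x[!condition]), but then I can't easily put them back together, so I'd rather do it on the data frame of points.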
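With separate vectors, the pairing only breaks if x and y are filtered with different conditions. A minimal sketch, assuming hypothetical vectors x and y of equal length: build one shared logical index from both vectors and subset both with it, so x[i] stays paired with y[i] and the two can be recombined with data.frame().

```r
# Hypothetical coordinate vectors (x[i] pairs with y[i])
x <- c(134.7, 222.0, 2500.0, 736.3)
y <- c(1169.7, -3.2, 1259.8, 893.9)
limit <- 2000

# ONE logical index built from both vectors: TRUE only if both coordinates are in range
keep <- x >= 0 & x <= limit & y >= 0 & y <= limit

# Subsetting both vectors with the same index keeps the pairs aligned,
# so they can be put back together into a data frame
pts <- data.frame(xx = x[keep], yy = y[keep])
print(pts)
```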
Please check whether this code works for you, where df is the data you shared:
#Code
new <- df[!rowSums(df < 0 | df>2000) > 0, ]
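Why this one-liner works: df < 0 | df > 2000 compares every cell at once and returns a logical matrix with the same shape as the data frame; rowSums() counts the TRUE cells in each row (TRUE counts as 1); and because ! binds more loosely than > in R, !rowSums(...) > 0 parses as !(rowSums(...) > 0), i.e. "keep rows with no offending cell". The sketch below spells out the intermediate steps on a hypothetical four-row data frame pts rather than the question's data. In the question's data, only one value (a yy of about -3.24) is out of range and none exceeds 2000, so only that row should be dropped.

```r
# Hypothetical toy data: row 2 is out of range in yy, row 3 in xx
pts <- data.frame(xx = c(134.7, 222.0, 2500.0, 736.3),
                  yy = c(1169.7, -3.2, 1259.8, 893.9))

# Step 1: element-wise test on the whole data frame -> 4 x 2 logical matrix
bad <- pts < 0 | pts > 2000

# Step 2: number of out-of-range values in each row
n_bad <- rowSums(bad)

# Step 3: same filter as the one-liner above, with the parentheses made explicit
new <- pts[!(n_bad > 0), ]
print(new)
```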
Or this:
#Code 2
new <- df[which(apply(df,1,function(x) sum(x<0 | x>2000))==0),]
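The second version does the same count row by row: apply() converts the data frame to a matrix and hands each row to the anonymous function, sum() counts the out-of-range values in that row, and which(... == 0) turns the rows with a count of zero into integer row positions. As a point of comparison that is not part of the answer, base R's subset() can express the same filter with the column names spelled out, which may be easier to read when there are only two columns. A sketch on hypothetical toy data:

```r
# Hypothetical toy data: row 2 is out of range in yy, row 3 in xx
pts <- data.frame(xx = c(134.7, 222.0, 2500.0, 736.3),
                  yy = c(1169.7, -3.2, 1259.8, 893.9))

# Per-row count of out-of-range values (sum() of a logical vector is an integer)
counts <- apply(pts, 1, function(x) sum(x < 0 | x > 2000))

# Integer positions of the rows with no out-of-range value
rows_ok <- which(counts == 0)
new2 <- pts[rows_ok, ]

# Alternative, not from the answer: name the columns explicitly with subset()
new3 <- subset(pts, xx >= 0 & xx <= 2000 & yy >= 0 & yy <= 2000)

print(new2)
print(new3)
```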