How do I retrieve a specific value from a list of data frames based on a given condition?

沙菲克·拉哈曼

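I have a list of data frames (sample below), where the data is a list of hospitals in each state.

  • outcome_split is a list containing one data frame per state.
  • State AL has had a rank column added, which ranks all the hospitals in that particular state; similarly (using a for loop) I will be adding a rank variable to all the data frames in the list.
  • I'm trying to create a function that, given an outcome (heart attack, heart failure, etc.) and a rank (a number), returns the name of the hospital and the US state matching the rank that was entered.

As mentioned above, the second element has the rank variable, so I tried calling that element and matching the specified rank. I'm a beginner, and I think I'm confusing '==' and '='.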

    > outcome_split[[2]][, "hospital name"]["rank"==2]
    character(0)
    > outcome_split[[2]][, "hospital name"]["rank"=7]
    [1] "BIBB MEDICAL CENTER"

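I want to return the hospital name that matches the specified rank, but I'm not sure how to do this. As mentioned, I'm confusing '==' and '=': '==' returns character(0), while '=' returns a hospital name from the second element, but that result is based on row position 7 rather than on the rank variable. The hospital it returns exists, but it is not ranked 7th.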

> outcome_split[[2]][, c("hospital name","rank")]
                                       hospital name rank
1                        ANDALUSIA REGIONAL HOSPITAL   52
2                          ATHENS-LIMESTONE HOSPITAL    9
3                          ATMORE COMMUNITY HOSPITAL   53
4                        BAPTIST MEDICAL CENTER EAST    2
5                       BAPTIST MEDICAL CENTER SOUTH   46
6                   BAPTIST MEDICAL CENTER-PRINCETON    8
7                                BIBB MEDICAL CENTER   54
8                       BIRMINGHAM VA MEDICAL CENTER   26
9                           BROOKWOOD MEDICAL CENTER   30
10                    BRYAN W WHITFIELD MEM HOSP INC   55

Sample data:

outcome_split <- structure(list(AK = structure(list(`hospital name` = c("PROVIDENCE ALASKA MEDICAL CENTER", 
"MAT-SU REGIONAL MEDICAL CENTER", "BARTLETT REGIONAL HOSPITAL", 
"FAIRBANKS MEMORIAL HOSPITAL", "ALASKA REGIONAL HOSPITAL", "YUKON KUSKOKWIM DELTA REG HOSPITAL", 
"CENTRAL PENINSULA GENERAL HOSPITAL", "ALASKA NATIVE MEDICAL CENTER", 
"MT EDGECUMBE HOSPITAL", "PROVIDENCE VALDEZ MEDICAL CENTER", 
"PROVIDENCE SEWARD HOSPITAL", "SITKA COMMUNITY HOSPITAL", "PROVIDENCE KODIAK ISLAND MEDICAL CTR", 
"CORDOVA COMMUNITY MEDICAL CENTER", "NORTON SOUND REGIONAL HOSPITAL", 
"PEACEHEALTH KETCHIKAN MEDICAL             CENTER", "SOUTH PENINSULA HOSPITAL"
), state = c("AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK", 
"AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK"), `heart attack` = c("13.4", 
"17.7", "Not Available", "15.5", "14.5", "Not Available", "Not Available", 
"15.7", "Not Available", "Not Available", "Not Available", "Not Available", 
"Not Available", "Not Available", "Not Available", "Not Available", 
"Not Available"), `heart failure` = c("12.4", "11.4", "11.6", 
"15.6", "13.4", "11.2", "11.6", "11.6", "Not Available", "Not Available", 
"Not Available", "Not Available", "Not Available", "Not Available", 
"Not Available", "11.4", "10.8"), pneumonia = c("10.5", "12.1", 
"11.6", "13.4", "12.5", "9.7", "13.8", "15.5", "14.2", "Not Available", 
"Not Available", "11.5", "12.0", "Not Available", "11.6", "11.3", 
"12.2")), .Names = c("hospital name", "state", "heart attack", 
"heart failure", "pneumonia"), row.names = 99:115, class = "data.frame"), 
    AL = structure(list(`hospital name` = c("ANDALUSIA REGIONAL HOSPITAL", 
    "ATHENS-LIMESTONE HOSPITAL", "ATMORE COMMUNITY HOSPITAL", 
    "BAPTIST MEDICAL CENTER EAST", "BAPTIST MEDICAL CENTER SOUTH", 
    "BAPTIST MEDICAL CENTER-PRINCETON", "BIBB MEDICAL CENTER", 
    "BIRMINGHAM VA MEDICAL CENTER", "BROOKWOOD MEDICAL CENTER", 
    "BRYAN W WHITFIELD MEM HOSP INC", "BULLOCK COUNTY HOSPITAL", 
    "CALLAHAN EYE FOUNDATION HOSPITAL", "CHEROKEE MEDICAL CENTER", 
    "CHILTON MEDICAL CENTER", "CITIZENS BAPTIST MEDICAL CENTER", 
    "CLAY COUNTY HOSPITAL", "COMMUNITY HOSPITAL INC", "COOPER GREEN MERCY HOSPITAL", 
    "COOSA VALLEY MEDICAL CENTER", "CRENSHAW COMMUNITY HOSPITAL", 
    "CRESTWOOD MEDICAL CENTER", "CULLMAN REGIONAL MEDICAL CENTER", 
    "D C H REGIONAL MEDICAL CENTER", "D W MCMILLAN MEMORIAL HOSPITAL", 
    "DALE MEDICAL CENTER", "DECATUR GENERAL HOSPITAL", "DEKALB REGIONAL MEDICAL CENTER", 
    "EAST ALABAMA MEDICAL CENTER AND SNF", "ELBA GENERAL HOSPITAL", 
    "ELIZA COFFEE MEMORIAL HOSPITAL", "ELMORE COMMUNITY HOSPITAL", 
    "EVERGREEN MEDICAL CENTER", "FAYETTE MEDICAL CENTER", "FLORALA MEMORIAL HOSPITAL", 
    "FLOWERS HOSPITAL", "GADSDEN REGIONAL MEDICAL CENTER", "GEORGE H. LANIER MEMORIAL HOSPITAL", 
    "GEORGIANA HOSPITAL", "GREENE COUNTY HOSPITAL", "GROVE HILL MEMORIAL HOSPITAL", 
    "HALE COUNTY HOSPITAL", "HELEN KELLER MEMORIAL HOSPITAL", 
    "HIGHLANDS MEDICAL CENTER", "HILL HOSPITAL OF SUMTER COUNTY", 
    "HUNTSVILLE HOSPITAL", "INFIRMARY WEST", "J PAUL JONES HOSPITAL", 
    "JACK HUGHSTON MEMORIAL HOSPITAL", "JACKSON HOSPITAL & CLINIC INC", 
    "JACKSON MEDICAL CENTER", "JACKSONVILLE MEDICAL CENTER", 
    "L V STABLER MEMORIAL HOSPITAL", "LAKE MARTIN COMMUNITY HOSPITAL", 
    "LAKELAND COMMUNITY HOSPITAL", "LAWRENCE MEDICAL CENTER", 
    "MARION REGIONAL MEDICAL CENTER", "MARSHALL MEDICAL CENTER NORTH", 
    "MARSHALL MEDICAL CENTER SOUTH", "MEDICAL CENTER BARBOUR", 
    "MEDICAL CENTER ENTERPRISE", "MEDICAL WEST, AN AFFILIATE OF UAB HEALTH SYSTEM", 
    "MIZELL MEMORIAL HOSPITAL", "MOBILE INFIRMARY", "MONROE COUNTY HOSPITAL", 
    "NORTH BALDWIN INFIRMARY", "NORTHEAST ALABAMA REGIONAL MED CENTER", 
    "NORTHWEST MEDICAL CENTER", "PARKWAY MEDICAL CENTER", "PICKENS COUNTY MEDICAL CENTER", 
    "PRATTVILLE BAPTIST HOSPITAL", "PROVIDENCE HOSPITAL", "RED BAY HOSPITAL", 
    "RIVERVIEW REGIONAL MEDICAL CENTER", "RUSSELL HOSPITAL", 
    "RUSSELLVILLE HOSPITAL", "SHELBY BAPTIST MEDICAL CENTER", 
    "SHOALS HOSPITAL", "SOUTH BALDWIN REGIONAL MEDICAL CENTER", 
    "SOUTHEAST ALABAMA MEDICAL CENTER", "SPRINGHILL MEDICAL CENTER", 
    "ST VINCENT'S BIRMINGHAM", "ST VINCENT'S EAST", "ST VINCENT'S ST CLAIR", 
    "ST VINCENTS BLOUNT", "STRINGFELLOW MEMORIAL HOSPITAL", "THOMAS HOSPITAL", 
    "TRINITY MEDICAL CENTER", "TROY REGIONAL MEDICAL CENTER", 
    "TUSCALOOSA VA MEDICAL CENTER", "UNIV OF S AL CHILDREN'S & WOMEN'S HOS", 
    "UNIV OF SOUTH ALABAMA MEDICAL CENTER", "UNIVERSITY OF ALABAMA HOSPITAL", 
    "VA CENTRAL ALABAMA HEALTHCARE SYSTEM - MONTGOMERY", "VAUGHAN REG MED CENTER PARKWAY CAMPUS", 
    "WALKER BAPTIST MEDICAL CENTER", "WASHINGTON COUNTY HOSPITAL", 
    "WEDOWEE HOSPITAL", "WIREGRASS MEDICAL CENTER"), state = c("AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL"), `heart attack` = c("Not Available", 
    "15.0", "Not Available", "14.2", "17.8", "14.9", "Not Available", 
    "16.1", "16.5", "Not Available", "Not Available", "Not Available", 
    "Not Available", "Not Available", "17.3", "16.7", "17.1", 
    "Not Available", "15.2", "Not Available", "13.3", "17.1", 
    "15.8", "15.7", "17.3", "16.8", "18.0", "16.3", "Not Available", 
    "18.1", "Not Available", "Not Available", "16.7", "Not Available", 
    "15.2", "16.7", "15.4", "14.5", "Not Available", "Not Available", 
    "Not Available", "19.6", "15.0", "Not Available", "15.2", 
    "Not Available", "Not Available", "Not Available", "17.5", 
    "Not Available", "Not Available", "Not Available", "Not Available", 
    "Not Available", "15.6", "Not Available", "Not Available", 
    "18.5", "Not Available", "16.6", "15.3", "Not Available", 
    "19.3", "Not Available", "Not Available", "15.6", "Not Available", 
    "15.8", "Not Available", "14.6", "15.2", "Not Available", 
    "16.9", "17.1", "Not Available", "15.9", "Not Available", 
    "15.8", "14.3", "16.0", "16.2", "17.7", "Not Available", 
    "Not Available", "16.4", "14.7", "16.8", "Not Available", 
    "Not Available", "Not Available", "Not Available", "15.0", 
    "Not Available", "14.7", "17.0", "Not Available", "Not Available", 
    "Not Available"), `heart failure` = c("10.1", "11.7", "10.8", 
    "9.6", "11.8", "11.4", "14.0", "10.4", "13.5", "11.7", "12.3", 
    "Not Available", "12.1", "11.5", "14.9", "12.6", "12.3", 
    "Not Available", "11.7", "13.8", "13.8", "12.1", "11.2", 
    "14.8", "11.8", "10.9", "16.6", "12.9", "Not Available", 
    "11.3", "11.3", "9.1", "11.7", "10.4", "12.0", "10.7", "8.8", 
    "10.8", "11.2", "10.4", "10.7", "12.6", "13.4", "Not Available", 
    "12.4", "12.5", "Not Available", "10.8", "10.2", "12.3", 
    "16.4", "11.1", "10.9", "13.6", "9.9", "11.5", "12.5", "15.2", 
    "13.5", "12.9", "11.4", "13.6", "10.7", "13.0", "11.5", "11.2", 
    "11.8", "10.5", "12.6", "14.8", "13.5", "12.6", "10.8", "11.6", 
    "14.8", "13.6", "13.6", "15.1", "11.4", "10.4", "10.6", "10.9", 
    "10.8", "13.0", "12.0", "12.8", "12.9", "11.2", "Not Available", 
    "Not Available", "12.5", "12.5", "12.2", "12.0", "10.8", 
    "Not Available", "10.4", "10.6"), pneumonia = c("11.1", "12.1", 
    "13.0", "10.2", "14.3", "11.6", "13.6", "11.0", "13.0", "9.1", 
    "12.1", "Not Available", "14.7", "11.2", "12.1", "11.8", 
    "11.6", "Not Available", "11.4", "15.8", "10.4", "12.1", 
    "11.3", "12.6", "9.9", "11.9", "15.8", "12.1", "12.0", "13.4", 
    "11.2", "12.0", "12.9", "12.1", "11.3", "14.6", "10.3", "11.3", 
    "11.5", "12.1", "11.5", "15.0", "12.9", "Not Available", 
    "14.1", "13.1", "11.4", "10.9", "14.7", "9.3", "19.2", "13.0", 
    "10.8", "10.7", "9.8", "10.0", "8.7", "13.9", "15.0", "12.9", 
    "12.1", "14.9", "12.5", "15.6", "14.6", "13.2", "13.1", "11.9", 
    "12.4", "14.2", "10.6", "11.6", "12.7", "14.9", "11.5", "10.7", 
    "12.8", "9.8", "10.9", "13.8", "12.6", "16.2", "11.4", "15.3", 
    "12.0", "13.1", "13.9", "11.1", "Not Available", "Not Available", 
    "Not Available", "12.7", "11.3", "14.0", "11.9", "Not Available", 
    "13.9", "12.3"), rank = c(52L, 9L, 53L, 2L, 46L, 8L, 54L, 
    26L, 30L, 55L, 56L, 57L, 58L, 59L, 42L, 32L, 39L, 60L, 12L, 
    61L, 1L, 40L, 21L, 20L, 43L, 35L, 47L, 28L, 62L, 48L, 63L, 
    64L, 33L, 65L, 13L, 34L, 17L, 4L, 66L, 67L, 68L, 51L, 10L, 
    69L, 14L, 70L, 71L, 72L, 44L, 73L, 74L, 75L, 76L, 77L, 18L, 
    78L, 79L, 49L, 80L, 31L, 16L, 81L, 50L, 82L, 83L, 19L, 84L, 
    22L, 85L, 5L, 15L, 86L, 37L, 41L, 87L, 24L, 88L, 23L, 3L, 
    25L, 27L, 45L, 89L, 90L, 29L, 6L, 36L, 91L, 92L, 93L, 94L, 
    11L, 95L, 7L, 38L, 96L, 97L, 98L)), class = "data.frame", .Names = c("hospital name", 
    "state", "heart attack", "heart failure", "pneumonia", "rank"
    ), row.names = c(NA, -98L))), .Names = c("AK", "AL"))
smci

Your rank column isn't in sorted order; see below where I arrange by rank.

Selecting is a one-liner with dplyr (or with data.table):

require(dplyr)

outcome_split[[2]] %>% filter(rank == 2) %>% select('hospital name')

                hospital name
1 BAPTIST MEDICAL CENTER EAST

outcome_split[[2]] %>% filter(rank == 7) %>% select('hospital name')
                      hospital name
1 VAUGHAN REG MED CENTER PARKWAY CAMPUS

# Here's the hospital order when we arrange by 'rank':
outcome_split[[2]] %>% arrange(rank) %>% select('hospital name', 'rank') %>% head(7)
                          hospital name rank
1              CRESTWOOD MEDICAL CENTER    1
2           BAPTIST MEDICAL CENTER EAST    2
3      SOUTHEAST ALABAMA MEDICAL CENTER    3
4                    GEORGIANA HOSPITAL    4
5           PRATTVILLE BAPTIST HOSPITAL    5
6                       THOMAS HOSPITAL    6
7 VAUGHAN REG MED CENTER PARKWAY CAMPUS    7

# ... and here was your original order
outcome_split[[2]] %>% select('hospital name', 'rank') %>% head(7)
                     hospital name rank
1      ANDALUSIA REGIONAL HOSPITAL   52
2        ATHENS-LIMESTONE HOSPITAL    9
3        ATMORE COMMUNITY HOSPITAL   53
4      BAPTIST MEDICAL CENTER EAST    2
5     BAPTIST MEDICAL CENTER SOUTH   46
6 BAPTIST MEDICAL CENTER-PRINCETON    8
7              BIBB MEDICAL CENTER   54
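As for the '==' versus '=' confusion raised in the question, here is a minimal base R sketch of what each attempt actually evaluates to. The small `al` data frame is a stand-in built from four rows of the AL sample data, and the variable names are only illustrative.

```r
# Stand-in for outcome_split[[2]]: four rows copied from the AL sample data.
al <- data.frame(
  `hospital name` = c("ANDALUSIA REGIONAL HOSPITAL",
                      "ATHENS-LIMESTONE HOSPITAL",
                      "ATMORE COMMUNITY HOSPITAL",
                      "BAPTIST MEDICAL CENTER EAST"),
  rank = c(52L, 9L, 53L, 2L),
  check.names = FALSE, stringsAsFactors = FALSE
)
hosp <- al[, "hospital name"]   # a plain character vector of names

# "rank" == 2 compares the literal string "rank" with 2. That is a single
# FALSE, and indexing with FALSE selects nothing.
attempt_eq <- hosp["rank" == 2]

# Inside [ ], argument names are ignored, so "rank" = 3 is simply position 3.
attempt_assign <- hosp["rank" = 3]

# Probably what was intended: compare the rank column itself, row by row,
# and use the resulting logical vector to pick rows.
intended <- al[al$rank == 2, "hospital name"]

print(attempt_eq)
print(attempt_assign)
print(intended)
```

Neither of the first two forms ever looks at the rank column, which is why the question got the 7th row rather than the hospital ranked 7th.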

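By the way, to save yourself trouble, use underscores instead of spaces in column names; then we don't need quotes around 'hospital_name' etc.

names(outcome_split[[2]]) <- gsub(' ', '_', names(outcome_split[[2]])) renames them to "hospital_name" "state" "heart_attack" "heart_failure" "pneumonia" "rank"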

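Or you could use make.names(), which replaces any character other than letters, digits, underscores and dots with a dot (so 'hospital name' becomes 'hospital.name'). Use gsub() if you want finer control.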
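A quick sketch comparing the two renaming approaches on the column names from the sample data (the variable names `dotted` and `underscored` are only illustrative):

```r
cols <- c("hospital name", "state", "heart attack",
          "heart failure", "pneumonia", "rank")

# make.names() turns each space into a dot to produce syntactically valid names.
dotted <- make.names(cols)

# gsub() gives explicit control over the replacement character.
underscored <- gsub(" ", "_", cols, fixed = TRUE)

print(dotted)
print(underscored)
```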

You can collapse the list of dfs into one big df:

# AK has no rank column yet; add one so both data frames have the same columns
outcome_split[[1]]$rank <- NA
do.call(function(...) rbind(..., make.row.names=FALSE), outcome_split)

does that. Now your dplyr selection is simply %>% filter(state=='AL', rank==2) %>% select('hospital name')
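Putting the pieces together, here is one possible base R sketch of the lookup function the question asks for. `find_by_rank` and its arguments are hypothetical names, not part of the original answer, and `toy` is a small stand-in for outcome_split built from a few rows of the sample data. It assumes every data frame has "hospital name" and "state" columns and that a missing rank column should be treated as NA.

```r
# Toy stand-in for outcome_split: AK has no rank column, AL does.
toy <- list(
  AK = data.frame(`hospital name` = c("PROVIDENCE ALASKA MEDICAL CENTER",
                                      "MAT-SU REGIONAL MEDICAL CENTER"),
                  state = "AK",
                  check.names = FALSE, stringsAsFactors = FALSE),
  AL = data.frame(`hospital name` = c("ATHENS-LIMESTONE HOSPITAL",
                                      "BAPTIST MEDICAL CENTER EAST",
                                      "BIBB MEDICAL CENTER"),
                  state = "AL", rank = c(9L, 2L, 54L),
                  check.names = FALSE, stringsAsFactors = FALSE)
)

# Hypothetical helper: return hospital name and state for a given rank,
# optionally restricted to one state.
find_by_rank <- function(dfs, rank_wanted, state_wanted = NULL) {
  # Give every data frame a rank column so they can be stacked with rbind().
  dfs <- lapply(dfs, function(d) {
    if (!"rank" %in% names(d)) d$rank <- NA_integer_
    d
  })
  all_df <- do.call(rbind, c(dfs, make.row.names = FALSE))

  # Rows with a missing rank never match.
  keep <- !is.na(all_df$rank) & all_df$rank == rank_wanted
  if (!is.null(state_wanted)) keep <- keep & all_df$state == state_wanted
  all_df[keep, c("hospital name", "state")]
}

res <- find_by_rank(toy, 2)
none <- find_by_rank(toy, 7)
print(res)
```

An unmatched rank returns a zero-row data frame rather than an error, which lets the caller decide how to handle "no such rank".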
