Removing specific characters/strings/character sequences in Python

Low-level programmer

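I am building a long list of what appear to be tuples, which I later want to convert into a DataFrame, but some recurring character sequences prevent this. Here is a sample of part of the output: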

0,"GAME_ID                      21900001
EVENTNUM                            2
EVENTMSGTYPE                       12
EVENTMSGACTIONTYPE                  0
PERIOD                              1
WCTIMESTRING                  8:04 PM
PCTIMESTRING                    12:00
HOMEDESCRIPTION                      
NEUTRALDESCRIPTION                   
VISITORDESCRIPTION                   
SCORE                             NaN
SCOREMARGIN                       NaN
PERSON1TYPE                         0
PLAYER1_ID                          0
PLAYER1_NAME                      NaN
PLAYER1_TEAM_ID                   NaN
PLAYER1_TEAM_CITY                 NaN
PLAYER1_TEAM_NICKNAME             NaN
PLAYER1_TEAM_ABBREVIATION         NaN
PERSON2TYPE                         0
PLAYER2_ID                          0
PLAYER2_NAME                      NaN
PLAYER2_TEAM_ID                   NaN
PLAYER2_TEAM_CITY                 NaN
PLAYER2_TEAM_NICKNAME             NaN
PLAYER2_TEAM_ABBREVIATION         NaN
PERSON3TYPE                         0
PLAYER3_ID                          0
PLAYER3_NAME                      NaN
PLAYER3_TEAM_ID                   NaN
PLAYER3_TEAM_CITY                 NaN
PLAYER3_TEAM_NICKNAME             NaN
PLAYER3_TEAM_ABBREVIATION         NaN
VIDEO_AVAILABLE_FLAG                0
DESCRIPTION                          
TIME_ELAPSED                        0
TIME_ELAPSED_PERIOD                 0
Name: 0, dtype: object"

The desired output would be:

GAME_ID                      21900001
EVENTNUM                            2
EVENTMSGTYPE                       12
EVENTMSGACTIONTYPE                  0
PERIOD                              1
WCTIMESTRING                  8:04 PM
PCTIMESTRING                    12:00
HOMEDESCRIPTION                      
NEUTRALDESCRIPTION                   
VISITORDESCRIPTION                   
SCORE                             NaN
SCOREMARGIN                       NaN
PERSON1TYPE                         0
PLAYER1_ID                          0
PLAYER1_NAME                      NaN
PLAYER1_TEAM_ID                   NaN
PLAYER1_TEAM_CITY                 NaN
PLAYER1_TEAM_NICKNAME             NaN
PLAYER1_TEAM_ABBREVIATION         NaN
PERSON2TYPE                         0
PLAYER2_ID                          0
PLAYER2_NAME                      NaN
PLAYER2_TEAM_ID                   NaN
PLAYER2_TEAM_CITY                 NaN
PLAYER2_TEAM_NICKNAME             NaN
PLAYER2_TEAM_ABBREVIATION         NaN
PERSON3TYPE                         0
PLAYER3_ID                          0
PLAYER3_NAME                      NaN
PLAYER3_TEAM_ID                   NaN
PLAYER3_TEAM_CITY                 NaN
PLAYER3_TEAM_NICKNAME             NaN
PLAYER3_TEAM_ABBREVIATION         NaN
VIDEO_AVAILABLE_FLAG                0
DESCRIPTION                          
TIME_ELAPSED                        0
TIME_ELAPSED_PERIOD                 0

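How can I get rid of the 0 and the " at the start, and the junk that follows TIME_ELAPSED_PERIOD at the end? The int at the start and the int in the bottom line both increase by 1 until the program finishes, which will probably run past 320,000, so the code needs to handle a range of int values. I think this is easiest to do after the list has been created, so there should be no need to show any of my code; systematic manipulation of the characters should do the trick. Thanks!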

卡西克·莫汉拉

If your input data is in list form, you can try the following to meet your requirements:

inputlist = Your_list_to_be_corrected  # Assign your input list here

# Remove the rows in the list that look like 'Name: 0, dtype: object"'
inputlist = [x for x in inputlist if "dtype: object" not in x]

# Fix the rows containing GAME_ID by dropping the leading int and special characters
sep = 'GAME_ID'
for index, element in enumerate(inputlist):
    if sep in element:
        # Keep everything after the first GAME_ID and put the key back in front
        inputlist[index] = sep + element.split(sep, 1)[1]
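
The loop above treats each list element as a single line. If each element is instead one whole multi-line record (the leading `0,"` through the trailing `Name: 0, dtype: object"`), a regular expression may be a simpler fit. The following is only a minimal sketch under that assumption: the names `LEADING`, `TRAILING` and `clean_record` are hypothetical, and the patterns are inferred from the sample in the question, so they may need adjusting for the real data.

```python
import re

# Hypothetical patterns, inferred from the sample output in the question.
# \d+ matches any run of digits, so 0 and 319999 are handled the same way.
LEADING = re.compile(r'^\s*\d+,"')                             # e.g. 0,"
TRAILING = re.compile(r'\n?Name: \d+, dtype: object"?\s*$')    # e.g. Name: 0, dtype: object"


def clean_record(text):
    """Strip the leading index prefix and the trailing footer from one record."""
    text = LEADING.sub("", text, count=1)
    return TRAILING.sub("", text, count=1)


# Shortened stand-in for one record; a real record has many more rows.
sample = (
    '0,"GAME_ID                      21900001\n'
    'EVENTNUM                            2\n'
    'TIME_ELAPSED_PERIOD                 0\n'
    'Name: 0, dtype: object"'
)

cleaned = clean_record(sample)
print(cleaned)

# Applying it to a whole list of records (assumed to be strings):
records = [sample, sample.replace('0,"', '319999,"', 1).replace("Name: 0", "Name: 319999")]
cleaned_records = [clean_record(r) for r in records]
```

Because both patterns are anchored (`^` at the start, `$` at the end), digits and quotes that appear inside the record itself are left alone.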
