I have a column in a DataFrame that contains different kinds of strings:
Additional Information |
IP=192.168.1.1, MAC ADDR=00:0a:95:9d:68:16, USER=kwfinn
IP=192.168.0.1, MAC ADDR=00:0a:95:9d:68:17, USER=wattray
Undefined System Error
Specific groupname=CUSTGR1
IP=192.168.1.2, MAC ADDR=00:1B:44:11:3A:B7, USER=stwnck
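What I want to do is create new columns from the corresponding values in the column above, namely the IP address and the MAC address.
The expected output looks like this: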
Additional Information                                   | IP Address  | MAC Address       |
IP=192.168.1.1, MAC ADDR=00:0a:95:9d:68:16, USER=kwfinn  | 192.168.1.1 | 00:0a:95:9d:68:16 |
IP=192.168.0.1, MAC ADDR=00:0a:95:9d:68:17, USER=wattray | 192.168.0.1 | 00:0a:95:9d:68:17 |
Undefined System Error                                   |             |                   |
Specific groupname=CUSTGR1                               |             |                   |
IP=192.168.1.2, MAC ADDR=00:1B:44:11:3A:B7, USER=stwnck  | 192.168.1.2 | 00:1B:44:11:3A:B7 |
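The problem is that I cannot handle the rows that do not contain an IP and a MAC. I tried splitting with np.where and looking for partial matches, but without success.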
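For anyone who wants to reproduce this, the sample column can be rebuilt roughly as follows. This is only a sketch: the variable name df is an assumption chosen to match the name used in the answer, and the real data presumably has more rows and columns.

```python
import pandas as pd

# Sample data copied from the question; "df" is an assumed variable name.
df = pd.DataFrame({
    "Additional Information": [
        "IP=192.168.1.1, MAC ADDR=00:0a:95:9d:68:16, USER=kwfinn",
        "IP=192.168.0.1, MAC ADDR=00:0a:95:9d:68:17, USER=wattray",
        "Undefined System Error",
        "Specific groupname=CUSTGR1",
        "IP=192.168.1.2, MAC ADDR=00:1B:44:11:3A:B7, USER=stwnck",
    ]
})
```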
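The idea is to use a list comprehension that splits each string on ", " and then on "=", but only for values that are not missing (NaN or None) and that contain both ", " and "="; every other row produces an empty dictionary. Pass the resulting list to the DataFrame constructor, and finally use DataFrame.join to add the new columns to the original: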
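import pandas as pd

# Parse "key=value" pairs only for non-missing rows containing both '=' and ', ';
# every other row yields an empty dict, which becomes a row of NaN.
L = [dict(y.split("=", 1) for y in v.split(", "))
     if pd.notna(v) and ('=' in v) and (', ' in v)
     else {}
     for v in df['Additional Information']]

# Reuse the original index so the rows line up with df in the join
df1 = pd.DataFrame(L, index=df.index)
print(df1)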
IP MAC ADDR USER
0 192.168.1.1 00:0a:95:9d:68:16 kwfinn
1 192.168.0.1 00:0a:95:9d:68:17 wattray
2 NaN NaN NaN
3 NaN NaN NaN
4 192.168.1.2 00:1B:44:11:3A:B7 stwnck
df = df.join(df1[['IP','MAC ADDR']])
print (df)
Additional Information IP \
0 IP=192.168.1.1, MAC ADDR=00:0a:95:9d:68:16, US... 192.168.1.1
1 IP=192.168.0.1, MAC ADDR=00:0a:95:9d:68:17, US... 192.168.0.1
2 Undefined System Error NaN
3 Specific groupname=CUSTGR1 NaN
4 IP=192.168.1.2, MAC ADDR=00:1B:44:11:3A:B7, US... 192.168.1.2
MAC ADDR
0 00:0a:95:9d:68:16
1 00:0a:95:9d:68:17
2 NaN
3 NaN
4 00:1B:44:11:3A:B7
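As an alternative to the list comprehension above, the two values can also be pulled out directly with Series.str.extract, which returns NaN for rows where the pattern does not match. This is a minimal sketch and not part of the original answer: the regular expressions and the column names "IP Address" and "MAC Address" (taken from the expected output in the question) are assumptions, and the patterns only look for the literal prefixes "IP=" and "MAC ADDR=".

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Additional Information": [
        "IP=192.168.1.1, MAC ADDR=00:0a:95:9d:68:16, USER=kwfinn",
        "IP=192.168.0.1, MAC ADDR=00:0a:95:9d:68:17, USER=wattray",
        "Undefined System Error",
        "Specific groupname=CUSTGR1",
        "IP=192.168.1.2, MAC ADDR=00:1B:44:11:3A:B7, USER=stwnck",
    ]
})

col = df["Additional Information"]

# expand=False with a single capture group returns a Series;
# rows without a match become NaN.
df["IP Address"] = col.str.extract(r"\bIP=([^,\s]+)", expand=False)
df["MAC Address"] = col.str.extract(r"\bMAC ADDR=([0-9A-Fa-f:]+)", expand=False)

print(df[["IP Address", "MAC Address"]])
```

If empty strings are preferred over NaN, as in the expected output, calling .fillna("") on the two new columns should do it.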