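I'm trying to use locals() or *args to iterate over multiple function arguments. However, the function arguments are column names in a DataFrame. How can I change the following so that the float_format function iterates over a variable number of arguments?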
#! /usr/bin/env python3
import pandas as pd

def float_format(a, b, c, d, e, f): #Change to single *args function argument?
    for x in range(len(data[a])):
        data[a][x] = data[a][x].replace(' Mbps', '')
    for x in range(len(data[b])):
        data[b][x] = data[b][x].replace(' Mbps', '')
    for x in range(len(data[c])):
        data[c][x] = data[c][x].replace(' Mbps', '')
    for x in range(len(data[d])):
        data[d][x] = data[d][x].replace(' Mbps', '')
    for x in range(len(data[e])):
        data[e][x] = data[e][x].replace(' Mbps', '')
    for x in range(len(data[f])):
        data[f][x] = data[f][x].replace(' Mbps', '')

file = r'Original_File.xls'
data = pd.read_excel(file, header=[2])
float_format('Average Receive bps',
             'Peak Receive bps',
             'Received Bandwidth',
             'Average Transmit bps',
             'Peak Transmit bps',
             'Transmit Bandwidth')
data.to_excel('results.xlsx', 'w+')
So if I try:
def float_format(*iterate):
    for arg in iterate:
        for x in range(len(data[iterate])):
            data[iterate][x] = data[iterate][x].replace(' Mbps', '')
I get a traceback error when the function runs.
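The traceback in the `*iterate` attempt comes from indexing with `data[iterate]`: `iterate` is the whole tuple of column names, so pandas looks for a single column labelled with that tuple and typically raises a `KeyError`. Indexing with the loop variable instead fixes it. The sketch below is a minimal illustration, assuming the DataFrame is passed in as an extra first parameter `data` (a hypothetical change; the original reads a global) and swapping the element-by-element loop for the vectorised `Series.str.replace`. The sample values are copied from the example df.

```python
import pandas as pd

def float_format(data, *columns):
    """Remove the ' Mbps' suffix from every named column of `data`, in place.

    Passing `data` explicitly is an assumption of this sketch; the question
    uses a module-level global instead.
    """
    for col in columns:
        # Index with one column name at a time, not with the whole `columns` tuple
        data[col] = data[col].str.replace(' Mbps', '', regex=False)

data = pd.DataFrame({'Peak Transmit bps': ['0.56 Mbps', '0.37 Mbps'],
                     'Transmit Bandwidth': ['10.00 Mbps', '10.00 Mbps']})
float_format(data, 'Peak Transmit bps', 'Transmit Bandwidth')
print(data)
```

With the six column names in a list `cols`, the call becomes `float_format(data, *cols)`. The columns are still strings afterwards; append `.astype(float)` inside the loop if numeric values are needed.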
Example df:
>>> data
Display Name Interface Name ... Peak Transmit bps Transmit Bandwidth
0 1951 - LAB - FW1 port1 ... 0.56 Mbps 10.00 Mbps
1 1951 - LAB - FW1 port1 ... 0.37 Mbps 10.00 Mbps
2 1951 - LAB - FW1 port1 ... 0.34 Mbps 10.00 Mbps
3 1951 - LAB - FW1 port1 ... 0.36 Mbps 10.00 Mbps
4 1951 - LAB - FW1 port1 ... 0.83 Mbps 10.00 Mbps
5 1951 - LAB - FW1 port1 ... 0.55 Mbps 10.00 Mbps
6 1951 - LAB - FW1 port1 ... 0.27 Mbps 10.00 Mbps
7 1951 - LAB - FW1 port1 ... 0.41 Mbps 10.00 Mbps
8 1951 - LAB - FW1 port2 ... 0.00 Mbps 1000.00 Mbps
9 1951 - LAB - FW1 port2 ... 0.00 Mbps 1000.00 Mbps
10 1951 - LAB - FW1 port2 ... 0.00 Mbps 1000.00 Mbps
11 1951 - LAB - FW1 port2 ... 0.00 Mbps 1000.00 Mbps
12 1951 - LAB - FW1 port2 ... 0.00 Mbps 1000.00 Mbps
13 1951 - LAB - FW1 port2 ... 0.00 Mbps 1000.00 Mbps
14 1951 - LAB - FW1 port2 ... 0.19 Mbps 1000.00 Mbps
15 1951 - LAB - FW1 port2 ... 0.31 Mbps 1000.00 Mbps
There's no need for *args or anything similar here; we can use the vectorised operations that pandas provides.
import numpy as np
import pandas as pd
df_1 = pd.DataFrame(data={'col_1': np.random.randint(0, 10, 10),
                          'col_2': np.random.randint(0, 50, 10),
                          'col_3': np.random.randint(0, 5, 10)})
df_1[['col_1', 'col_3']] = df_1[['col_1', 'col_3']].astype(str) + ' Mbps'
print(df_1)
print(df_1.dtypes)
Output:
col_1 col_2 col_3
0 1 Mbps 45 0 Mbps
1 2 Mbps 34 1 Mbps
2 6 Mbps 46 2 Mbps
3 7 Mbps 2 1 Mbps
4 6 Mbps 36 0 Mbps
5 9 Mbps 36 3 Mbps
6 4 Mbps 39 1 Mbps
7 4 Mbps 26 1 Mbps
8 1 Mbps 10 1 Mbps
9 6 Mbps 1 1 Mbps
col_1 object
col_2 int64
col_3 object
dtype: object
Series.str.extract()
Using a loop
cols_to_change = ['col_1', 'col_3']
for col_name in cols_to_change:
    df_1[col_name] = df_1[col_name].str.extract(r"(\d+) Mbps", expand=False).astype(int)
Using DataFrame.apply()
cols_to_change = ['col_1', 'col_3']
df_1[cols_to_change] = df_1[cols_to_change].apply(lambda col: col.str.extract(r"(\d+) Mbps", expand=False)).astype(int)
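One caveat when applying the `str.extract()` approach to the question's data: the pattern `(\d+) Mbps` only matches whole numbers, and because `str.extract` searches within each string, a value such as `'0.56 Mbps'` silently yields `'56'` instead of failing. A minimal sketch that allows an optional fractional part and casts to `float` rather than `int`, using sample values copied from the question's example df:

```python
import pandas as pd

# Sample values copied from the question's example DataFrame
df = pd.DataFrame({'Peak Transmit bps': ['0.56 Mbps', '0.00 Mbps'],
                   'Transmit Bandwidth': ['10.00 Mbps', '1000.00 Mbps']})

# The integer-only pattern matches just the digits after the decimal point
print(df['Peak Transmit bps'].str.extract(r"(\d+) Mbps", expand=False).tolist())
# ['56', '00']

cols_to_change = ['Peak Transmit bps', 'Transmit Bandwidth']
for col_name in cols_to_change:
    # (?:\.\d+)? is a non-capturing, optional fractional part
    df[col_name] = df[col_name].str.extract(r"(\d+(?:\.\d+)?) Mbps", expand=False).astype(float)
print(df)
```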
Series.str.slice()
Using a loop
cols_to_change = ['col_1', 'col_3']
for col_name in cols_to_change:
    df_1[col_name] = df_1[col_name].str.slice(stop=-5).astype(int)
Using DataFrame.apply()
cols_to_change = ['col_1', 'col_3']
df_1[cols_to_change] = df_1[cols_to_change].apply(lambda col: col.str.slice(stop=-5)).astype(int)
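Likewise, `str.slice(stop=-5)` removes the five-character `' Mbps'` suffix no matter what precedes it, but `.astype(int)` cannot parse strings such as `'0.83'` from the question's data. A sketch of the apply form with `pd.to_numeric` swapped in for the integer cast (sample values copied from the example df):

```python
import pandas as pd

# Sample values copied from the question's example DataFrame
df = pd.DataFrame({'Peak Transmit bps': ['0.83 Mbps', '0.00 Mbps'],
                   'Transmit Bandwidth': ['10.00 Mbps', '1000.00 Mbps']})

cols_to_change = ['Peak Transmit bps', 'Transmit Bandwidth']
# Drop the trailing ' Mbps' (5 characters), then parse each column as numbers;
# pd.to_numeric infers float64 because the strings contain a decimal point
df[cols_to_change] = df[cols_to_change].apply(
    lambda col: pd.to_numeric(col.str.slice(stop=-5)))
print(df.dtypes)
```

`pd.to_numeric` also accepts `errors='coerce'`, which turns unparseable cells into `NaN` instead of raising, which can help if some rows lack the suffix.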
Resulting DataFrame contents (from a separate run, so the random values differ from the output above):
col_1 col_2 col_3
0 9 40 3
1 4 8 3
2 6 49 4
3 4 38 4
4 6 25 4
5 3 8 3
6 3 27 3
7 0 45 1
8 7 24 4
9 3 29 2
dtypes:
col_1 int64
col_2 int64
col_3 int64
dtype: object
Let me know if you have any questions :)