I have a CSV file that looks like this:
CountryCode, NumberCalled, CallPrice, CallDuration
BS,+1234567,0.20250,29
BS,+19876544,0.20250,1
US,+121234,0.01250,4
US,+1543215,0.01250,39
US,+145678,0.01250,11
US,+18765678,None,0
I would like to parse the file and get some statistics out of the data:
CountryCode, NumberOfTimesCalled, TotalPrice, TotalCallDuration
US, 4, 1.555, 54
At the moment I have a dict set up like this:
CalledStatistics = {}
As I read each line from the CSV, what is the best way to put the data into the dict?
CalledStatistics['CountryCode'] = {'CallDuration', 'CallPrice', 'NumberOfTimesCalled'}
Will adding a second US row overwrite the first one, or will the data be added to what is already stored under the key 'CountryCode'?
Each of these assignments:
CalledStatistics['CountryCode'] = {'CallDuration', 'CallPrice', 'NumberOfTimesCalled'}
overwrites the value stored by the previous one.
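A minimal sketch of that behaviour, using made-up values rather than the CSV from the question. The names `stats`, `overwritten`, `fields` and `fields_type` are only for illustration. It also shows that braces without `key: value` pairs build a set, not a dict:

```python
# Hypothetical demo values, not read from the CSV in the question.
stats = {}
stats['US'] = {'CallDuration': 4, 'CallPrice': 0.0125}
stats['US'] = {'CallDuration': 39, 'CallPrice': 0.0125}  # replaces the previous value

# Only the second assignment survives under the key 'US'.
overwritten = stats['US']['CallDuration']

# Braces containing bare items (no "key: value" pairs) create a set.
fields = {'CallDuration', 'CallPrice', 'NumberOfTimesCalled'}
fields_type = type(fields).__name__

print(overwritten, fields_type)
```

So plain assignment to an existing key never accumulates anything; the accumulation has to be written explicitly.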
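To compute the totals you want, you can use a dict of dicts. For example, inside a for loop where the data for the current row is held in the variables country_code, call_duration and call_price, and the results are stored in collected_statistics:
(Edit: added the first line so that call_price becomes 0 when it is recorded as None. This code assumes consistent data, e.g. only integers. If other types can occur, they need to be converted to integers, or to any single numeric type, before Python can sum them.)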
call_price = call_price if call_price is not None else 0
if country_code not in collected_statistics:
    # First row for this country: start a new list for each field
    collected_statistics[country_code] = {'CallDuration': [call_duration],
                                          'CallPrice': [call_price]}
else:
    # Country already seen: append to the existing lists
    collected_statistics[country_code]['CallDuration'] += [call_duration]
    collected_statistics[country_code]['CallPrice'] += [call_price]
After the loop, for each country_code:
number_of_times_called[country_code] = len(collected_statistics[country_code]['CallDuration'])
total_call_duration[country_code] = sum(collected_statistics[country_code]['CallDuration'])
total_price[country_code] = sum(collected_statistics[country_code]['CallPrice'])
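If the individual values are not needed afterwards, the totals can also be kept as running sums instead of lists. The following is a sketch of that alternative, not the approach used above. It assumes `collections.defaultdict` and `decimal.Decimal` are acceptable, and the helper names `new_totals` and `accumulate` as well as the sample rows are hypothetical:

```python
from collections import defaultdict
from decimal import Decimal


def new_totals():
    # Starting values for a country that has not been seen yet.
    return {'NumberOfTimesCalled': 0,
            'TotalPrice': Decimal('0'),
            'TotalCallDuration': 0}


def accumulate(rows):
    """rows: iterable of (country_code, call_price, call_duration)."""
    totals = defaultdict(new_totals)
    for country_code, call_price, call_duration in rows:
        entry = totals[country_code]  # created on first access
        entry['NumberOfTimesCalled'] += 1
        # A missing price (None) counts as 0.
        entry['TotalPrice'] += Decimal(call_price) if call_price is not None else Decimal('0')
        entry['TotalCallDuration'] += int(call_duration)
    return dict(totals)


# Hypothetical sample rows, loosely based on the US lines of the question.
sample_rows = [('US', '0.01250', 4), ('US', '0.01250', 39), ('US', None, 0)]
result = accumulate(sample_rows)
print(result['US'])
```

This keeps memory use constant per country, at the cost of no longer being able to inspect the individual calls.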
OK, so here is a complete working script that handles the example you gave:
#!/usr/bin/env python3
import csv
import decimal

with open('CalledData', newline='') as csvfile:
    csv_r = csv.reader(csvfile, delimiter=',', quotechar='|')
    # btw this creates a dict, not a set
    collected_statistics = {}
    for row in csv_r:
        [country_code, number_called, call_price, call_duration] = row
        # Only to avoid the first line, but would be better to have a list of available
        # (and correct) codes, and check if the country_code belongs to this list:
        if country_code != 'CountryCode':
            call_price = call_price if call_price != 'None' else 0
            if country_code not in collected_statistics:
                collected_statistics[country_code] = {'CallDuration': [int(call_duration)],
                                                      'CallPrice': [decimal.Decimal(call_price)]}
            else:
                collected_statistics[country_code]['CallDuration'] += [int(call_duration)]
                collected_statistics[country_code]['CallPrice'] += [decimal.Decimal(call_price)]

for country_code in collected_statistics:
    print(str(country_code) + ":")
    print("number of times called: " + str(len(collected_statistics[country_code]['CallDuration'])))
    print("total price: " + str(sum(collected_statistics[country_code]['CallPrice'])))
    print("total call duration: " + str(sum(collected_statistics[country_code]['CallDuration'])))
With CalledData being a file that contains exactly the data you posted, it outputs:
$ ./test_script
BS:
number of times called: 2
total price: 0.40500
total call duration: 30
US:
number of times called: 4
total price: 0.03750
total call duration: 54
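For comparison, here is a variant sketch that reads the header with `csv.DictReader` instead of comparing the first field against 'CountryCode'. It assumes the header row has a space after each comma, as in the posted sample, which is why `skipinitialspace=True` is passed. The function name `summarize` is hypothetical, and `io.StringIO` stands in for the real file so the example is self-contained:

```python
import csv
import io
from decimal import Decimal


def summarize(csvfile):
    """Return {country_code: (calls, total_price, total_duration)}."""
    # skipinitialspace drops the blank after each comma in the header row.
    reader = csv.DictReader(csvfile, skipinitialspace=True)
    summary = {}
    for row in reader:
        code = row['CountryCode']
        calls, price, duration = summary.get(code, (0, Decimal('0'), 0))
        # The sample data uses the literal text "None" for a missing price.
        row_price = Decimal('0') if row['CallPrice'] == 'None' else Decimal(row['CallPrice'])
        summary[code] = (calls + 1, price + row_price, duration + int(row['CallDuration']))
    return summary


# In-memory copy of the sample data from the question.
sample = io.StringIO(
    "CountryCode, NumberCalled, CallPrice, CallDuration\n"
    "BS,+1234567,0.20250,29\n"
    "BS,+19876544,0.20250,1\n"
    "US,+121234,0.01250,4\n"
    "US,+1543215,0.01250,39\n"
    "US,+145678,0.01250,11\n"
    "US,+18765678,None,0\n"
)
summary = summarize(sample)
for code, (calls, price, duration) in summary.items():
    print(f"{code}, {calls}, {price}, {duration}")
```

To run it against the real file, pass the object returned by `open('CalledData', newline='')` to `summarize` instead of the `StringIO` buffer.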