Python: adding multiple data points to one dict from a CSV file

Mathew Jenkinson

I have a CSV file that looks like this:

CountryCode, NumberCalled, CallPrice, CallDuration
BS,+1234567,0.20250,29
BS,+19876544,0.20250,1
US,+121234,0.01250,4
US,+1543215,0.01250,39
US,+145678,0.01250,11
US,+18765678,None,0

I would like to analyze the file to get some statistics from the data:

CountryCode, NumberOfTimesCalled, TotalPrice, TotalCallDuration
US, 4, 1.555, 54

At the moment I have a dict set up:

CalledStatistics = {}

As I read each line from the CSV, what is the best way to put the data into the dict?

CalledStatistics['CountryCode'] = {'CallDuration', 'CallPrice', 'NumberOfTimesCalled'}

Will adding a second US row overwrite the first one, or will the data be added under the key 'CountryCode'?

Zezollo

Each of these assignments:

CalledStatistics['CountryCode'] = {'CallDuration', 'CallPrice', 'NumberOfTimesCalled'}

overwrites the previous one.
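A minimal sketch of that overwriting behaviour, using made-up values (the numbers and the `called_statistics` and `fields` names are illustrative only). It also shows that braces holding bare strings with no `key: value` pairs build a set rather than a dict:

```python
called_statistics = {}

# Assigning to the same key twice keeps only the last value.
called_statistics['US'] = {'CallDuration': 4, 'CallPrice': 0.0125}
called_statistics['US'] = {'CallDuration': 39, 'CallPrice': 0.0125}
print(called_statistics['US']['CallDuration'])  # 39
print(len(called_statistics))                   # 1

# Braces with only names (no key: value pairs) create a set, not a dict.
fields = {'CallDuration', 'CallPrice', 'NumberOfTimesCalled'}
print(type(fields).__name__)                    # set
```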

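To compute the sums you want, you can use a dict of dicts. Say that, inside a for loop, you have the data in the variables country_code, call_duration and call_price, and that you store it in collected_statistics. (Edit: I added the first line so that call_price becomes 0 when it is recorded as None. This code is meant to handle consistent data, for example only integers; if other types of data can occur, they need to be converted to integers, or to numbers of one common type, before Python can sum them.)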

call_price = call_price if call_price is not None else 0

if country_code not in collected_statistics:
    collected_statistics[country_code] = {'CallDuration' : [call_duration],
                                          'CallPrice' : [call_price]}
else:
    collected_statistics[country_code]['CallDuration'] += [call_duration]
    collected_statistics[country_code]['CallPrice'] += [call_price]

After the loop, for each country_code:

number_of_times_called[country_code] = len(collected_statistics[country_code]['CallDuration'])

total_call_duration[country_code] = sum(collected_statistics[country_code]['CallDuration'])
total_price[country_code] = sum(collected_statistics[country_code]['CallPrice'])
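If the individual values are not needed afterwards, the lists can be skipped and running totals kept instead. The following is only a sketch under that assumption: `rows`, `new_totals` and `totals` are hypothetical names, the rows are a hand-written subset of the sample data, and `collections.defaultdict` stands in for the explicit `if country_code not in ...` check.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical rows standing in for parsed CSV lines:
# (country_code, call_price, call_duration), all still strings.
rows = [
    ('BS', '0.20250', '29'),
    ('BS', '0.20250', '1'),
    ('US', '0.01250', '4'),
    ('US', 'None', '0'),
]

def new_totals():
    """Fresh counters for a country code seen for the first time."""
    return {'NumberOfTimesCalled': 0,
            'TotalPrice': Decimal('0'),
            'TotalCallDuration': 0}

totals = defaultdict(new_totals)

for country_code, call_price, call_duration in rows:
    entry = totals[country_code]  # created on first access
    entry['NumberOfTimesCalled'] += 1
    # Treat the literal string 'None' as a price of zero.
    entry['TotalPrice'] += Decimal('0') if call_price == 'None' else Decimal(call_price)
    entry['TotalCallDuration'] += int(call_duration)

for country_code, entry in totals.items():
    print(country_code, entry)
```

This trades the ability to inspect each call later for less memory and no second pass over the data.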

OK, so finally, here is a complete working script that handles the example you gave:

#!/usr/bin/env python3

import csv
import decimal

with open('CalledData', newline='') as csvfile:
    csv_r = csv.reader(csvfile, delimiter=',', quotechar='|')

    # btw this creates a dict, not a set
    collected_statistics = {}

    for row in csv_r:

        [country_code, number_called, call_price, call_duration] = row

        # Only to avoid the first line, but would be better to have a list of available
        # (and correct) codes, and check if the country_code belongs to this list:
        if country_code != 'CountryCode':

            call_price = call_price if call_price != 'None' else 0

            if country_code not in collected_statistics:
                collected_statistics[country_code] = {'CallDuration' : [int(call_duration)],
                                                      'CallPrice' : [decimal.Decimal(call_price)]}
            else:
                collected_statistics[country_code]['CallDuration'] += [int(call_duration)]
                collected_statistics[country_code]['CallPrice'] += [decimal.Decimal(call_price)]


    for country_code in collected_statistics:
        print(str(country_code) + ":")
        print("number of times called: " + str(len(collected_statistics[country_code]['CallDuration'])))
        print("total price: " + str(sum(collected_statistics[country_code]['CallPrice'])))
        print("total call duration: " + str(sum(collected_statistics[country_code]['CallDuration'])))

With CalledData being a file containing exactly the content you provided, it outputs:

$ ./test_script
BS:
number of times called: 2
total price: 0.40500
total call duration: 30
US:
number of times called: 4
total price: 0.03750
total call duration: 54
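As a possible variation, `csv.DictReader` can take care of the header row, so no comparison against 'CountryCode' is needed. This is a sketch rather than a replacement for the script: `SAMPLE` and `summarize` are hypothetical names, the data is read from an in-memory string instead of the CalledData file, and `skipinitialspace=True` is used because the header in the example has a space after each comma.

```python
import csv
import io
from decimal import Decimal

# The sample data from the question, held in memory for the demonstration.
SAMPLE = """CountryCode, NumberCalled, CallPrice, CallDuration
BS,+1234567,0.20250,29
BS,+19876544,0.20250,1
US,+121234,0.01250,4
US,+1543215,0.01250,39
US,+145678,0.01250,11
US,+18765678,None,0
"""

def summarize(csvfile):
    """Return {country_code: (calls, total_price, total_duration)}."""
    stats = {}
    # skipinitialspace strips the blank after each comma in the header.
    reader = csv.DictReader(csvfile, skipinitialspace=True)
    for row in reader:
        code = row['CountryCode']
        # Treat the literal string 'None' as a price of zero.
        price = Decimal('0') if row['CallPrice'] == 'None' else Decimal(row['CallPrice'])
        calls, total_price, total_duration = stats.get(code, (0, Decimal('0'), 0))
        stats[code] = (calls + 1,
                       total_price + price,
                       total_duration + int(row['CallDuration']))
    return stats

for code, (calls, price, duration) in summarize(io.StringIO(SAMPLE)).items():
    print(code, calls, price, duration)
```

To read the real file, pass `open('CalledData', newline='')` to `summarize` in place of the `io.StringIO` object.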

Edited on 2021-04-9
