TypeError: cannot concatenate 'str' and 'float' objects: pandas

Gingerbread

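I am trying to solve a data science problem using pandas. My dataset has the following columns: 'country', 'conversion', 'test', 'user_id', and so on. The 'country' column contains about 10 countries. The 'test' column takes the values 0 and 1 for the two test arms: 0 for control and 1 for experiment. 'conversion' also takes the values 0 and 1, indicating whether the person converted.

I want to group by country and, for each group, compute the p-value and the mean conversion for test == 0 and test == 1. I am trying the following function, but it raises "TypeError: cannot concatenate 'str' and 'float' objects". Can someone shed some light on this?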

def f(x):
    control = x.loc[(x.test==0)]
    test = x.loc[(x.test==1)]
    p_value = stats.ttest_ind(control,test)[0]
    control_mean = control['conversion'].mean()
    test_mean = test['conversion'].mean()
    return pd.Series({'p_value': p_value, 'conversion_test': test_mean, 'conversion_control': control_mean})

bycountry = data1.groupby('country').apply(f)
bycountry = bycountry.reset_index(level='None')
bycountry

Full error message:

TypeError                                 Traceback (most recent call last)
<ipython-input-495-bd6227878520> in <module>()
      7     return pd.Series({'p_value': p_value, 'conversion_test': test_mean, 'conversion_control': control_mean})
      8 
----> 9 bycountry = data1.groupby("country").apply(f)
     10 bycountry = bycountry.reset_index(level='None')
     11 bycountry

C:\Users\SnehaPriya\Anaconda2\lib\site-packages\pandas\core\groupby.pyc in apply(self, func, *args, **kwargs)
    649         # ignore SettingWithCopy here in case the user mutates
    650         with option_context('mode.chained_assignment', None):
--> 651             return self._python_apply_general(f)
    652 
    653     def _python_apply_general(self, f):

C:\Users\SnehaPriya\Anaconda2\lib\site-packages\pandas\core\groupby.pyc in _python_apply_general(self, f)
    653     def _python_apply_general(self, f):
    654         keys, values, mutated = self.grouper.apply(f, self._selected_obj,
--> 655                                                    self.axis)
    656 
    657         return self._wrap_applied_output(

C:\Users\SnehaPriya\Anaconda2\lib\site-packages\pandas\core\groupby.pyc in apply(self, f, data, axis)
   1525             # group might be modified
   1526             group_axes = _get_axes(group)
-> 1527             res = f(group)
   1528             if not _is_indexed_like(res, group_axes):
   1529                 mutated = True

C:\Users\SnehaPriya\Anaconda2\lib\site-packages\pandas\core\groupby.pyc in f(g)
    645         @wraps(func)
    646         def f(g):
--> 647             return func(g, *args, **kwargs)
    648 
    649         # ignore SettingWithCopy here in case the user mutates

<ipython-input-495-bd6227878520> in f(x)
      2     control = x.loc[(x.test==0)]
      3     test = x.loc[(x.test==1)]
----> 4     p_value = stats.ttest_ind(control,test)[0]
      5     control_mean = control['conversion'].mean()
      6     test_mean = test['conversion'].mean()

C:\Users\SnehaPriya\Anaconda2\lib\site-packages\scipy\stats\stats.pyc in ttest_ind(a, b, axis, equal_var, nan_policy)
   3865         return Ttest_indResult(np.nan, np.nan)
   3866 
-> 3867     v1 = np.var(a, axis, ddof=1)
   3868     v2 = np.var(b, axis, ddof=1)
   3869     n1 = a.shape[axis]

C:\Users\SnehaPriya\Anaconda2\lib\site-packages\numpy\core\fromnumeric.pyc in var(a, axis, dtype, out, ddof, keepdims)
   3098 
   3099     return _methods._var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
-> 3100                          keepdims=keepdims)

C:\Users\SnehaPriya\Anaconda2\lib\site-packages\numpy\core\_methods.pyc in _var(a, axis, dtype, out, ddof, keepdims)
     89     # Note that if dtype is not of inexact type then arraymean will
     90     # not be either.
---> 91     arrmean = umr_sum(arr, axis, dtype, keepdims=True)
     92     if isinstance(arrmean, mu.ndarray):
     93         arrmean = um.true_divide(

TypeError: cannot concatenate 'str' and 'float' objects

Output of df.dtypes:

user_id                      int64
date                datetime64[ns]
source                      object
device                      object
browser_language            object
ads_channel                 object
browser                     object
conversion                   int64
test                         int64
sex                         object
age                        float64
country                     object
dtype: object
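
The dtypes listing points at the likely cause: in `f`, `control` and `test` are whole DataFrame slices, so `stats.ttest_ind` receives the `object` columns (`source`, `device`, and so on) together with the numeric ones, and the variance computation in the traceback ends up trying to add strings to floats. A minimal sketch, on a small made-up frame (the values are hypothetical), of how to check which columns are non-numeric before passing data to a numeric routine:

```python
import pandas as pd

# Hypothetical toy data mirroring a few of the columns listed above.
df = pd.DataFrame({
    "country": ["Spain", "Spain", "Mexico", "Mexico"],
    "source": ["Ads", "SEO", "Ads", "Direct"],
    "conversion": [0, 1, 0, 0],
    "test": [0, 1, 0, 1],
})

# Columns that are not numeric (strings here); these break np.var / ttest_ind.
non_numeric_cols = df.select_dtypes(exclude="number").columns.tolist()

# Columns that are safe to hand to numeric routines.
numeric_cols = df.select_dtypes(include="number").columns.tolist()

print("non-numeric:", non_numeric_cols)
print("numeric:", numeric_cols)
```

Selecting a single numeric column such as `conversion` before calling the test avoids the string columns entirely.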

Gingerbread

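import pandas as pd
from scipy import stats

def f(x):
    # Keep only the numeric 'conversion' column for each arm, so that
    # ttest_ind never sees the string (object) columns.
    control = x.loc[(x.test==0)]
    control = control['conversion']
    test = x.loc[(x.test==1)]
    test = test['conversion']
    # ttest_ind returns (statistic, pvalue); index 1 is the p-value.
    p_value = stats.ttest_ind(control,test)[1]
    control_mean = control.mean()
    test_mean = test.mean()
    return pd.Series({'p_value': p_value, 'conversion_test': test_mean, 'conversion_control': control_mean})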

This worked!! Thanks again, @juanpa.arrivillaga!
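
For anyone who wants to try the approach end to end, here is a self-contained sketch on randomly generated data. The frame `data1` below is synthetic and the helper name `summarize` is made up for illustration; only the column names come from the question. Note that `stats.ttest_ind` returns the t-statistic first and the p-value second, so the sketch reads the `pvalue` attribute instead of indexing with `[0]`.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in for the real dataset (assumption: values are random).
rng = np.random.default_rng(0)
n = 400
data1 = pd.DataFrame({
    "country": rng.choice(["Spain", "Mexico"], size=n),
    "test": rng.integers(0, 2, size=n),
    "conversion": rng.integers(0, 2, size=n),
    "source": rng.choice(["Ads", "SEO"], size=n),  # string column, never passed to SciPy
})

def summarize(group):
    # Pass only the numeric 'conversion' values of each arm to the t-test.
    control = group.loc[group["test"] == 0, "conversion"]
    treated = group.loc[group["test"] == 1, "conversion"]
    result = stats.ttest_ind(control, treated)
    return pd.Series({
        "p_value": result.pvalue,
        "conversion_test": treated.mean(),
        "conversion_control": control.mean(),
    })

# Restrict the groups to the two columns the function needs.
bycountry = (
    data1.groupby("country")[["test", "conversion"]]
    .apply(summarize)
    .reset_index()
)
print(bycountry)
```

Restricting the grouped frame to `test` and `conversion` is a design choice: it keeps string columns out of the function and makes it explicit which data the test is run on.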

Edited on 2021-05-22
