I need to update a column's values based on these conditions:
i. if score > 3, set score to 1.
ii. if score <= 2, set score to 0.
iii. if score == 3, drop that row.
The scores range from 1 to 5.
I wrote the following code, but every value ends up changed to 0.
reviews.loc[reviews['Score'] > 3, 'Score'] = 1
reviews.loc[reviews['Score'] <= 2, 'Score'] = 0
reviews.drop(reviews[reviews['Score'] == 3].index, inplace = True)
Please point out what I am doing wrong.
There is a logic problem:
import pandas as pd
reviews = pd.DataFrame({'Score':range(6)})
print (reviews)
Score
0 0
1 1
2 2
3 3
4 4
5 5
First, all values greater than 3 are set to 1:
reviews.loc[reviews['Score'] > 3, 'Score'] = 1
print (reviews)
Score
0 0
1 1
2 2
3 3
4 1
5 1
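Then all values less than or equal to 2 are set to 0. The 1s just written by reviews['Score'] > 3 also satisfy this condition, so they are replaced as well: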
reviews.loc[reviews['Score'] <= 2, 'Score'] = 0
print (reviews)
Score
0 0
1 0
2 0
3 3
4 0
5 0
Finally, the rows equal to 3 are dropped, so only 0 values remain:
reviews.drop(reviews[reviews['Score'] == 3].index, inplace = True)
print (reviews)
Score
0 0
1 0
2 0
4 0
5 0
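One way to avoid this kind of overwrite, sketched here as an assumption rather than as part of the original answer, is to evaluate every condition against the original scores before assigning anything. The mask names high, low and neutral are made up for illustration, and the sample data uses the 1 to 5 range from the question:

```python
import pandas as pd

reviews = pd.DataFrame({'Score': range(1, 6)})

# Evaluate all three conditions on the ORIGINAL scores first,
# so a later assignment cannot change what an earlier mask selected.
high = reviews['Score'] > 3
low = reviews['Score'] <= 2
neutral = reviews['Score'] == 3

reviews.loc[high, 'Score'] = 1
reviews.loc[low, 'Score'] = 0
reviews = reviews[~neutral].copy()  # drop the rows that were 3

print(reviews['Score'].tolist())  # [0, 0, 1, 1]
```

Because the masks are fixed up front, the order of the two assignments no longer matters.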
You can change your solution as follows:
reviews = pd.DataFrame({'Score':range(6)})
print (reviews)
Score
0 0
1 1
2 2
3 3
4 4
5 5
First remove the 3 rows by keeping only the rows not equal to 3, using boolean indexing:
reviews = reviews[reviews['Score'] != 3].copy()
Then set the values to 0 and 1:
reviews['Score'] = (reviews['Score'] > 3).astype(int)
# alternative using NumPy (needs: import numpy as np)
reviews['Score'] = np.where(reviews['Score'] > 3, 1, 0)
print (reviews)
Score
0 0
1 0
2 0
4 1
5 1
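If this conversion is needed in more than one place, the filter-then-compare steps could be wrapped in a small function. The following is only a sketch: binarize_scores and its column and neutral parameters are hypothetical names, not part of pandas or of the original answer.

```python
import pandas as pd

def binarize_scores(df, column='Score', neutral=3):
    """Hypothetical helper: drop rows equal to `neutral`, then
    map scores above it to 1 and scores below it to 0."""
    out = df[df[column] != neutral].copy()  # boolean indexing, as above
    out[column] = (out[column] > neutral).astype(int)
    return out

reviews = pd.DataFrame({'Score': [1, 2, 3, 4, 5]})
result = binarize_scores(reviews)

print(result['Score'].tolist())   # [0, 0, 1, 1]
print(reviews['Score'].tolist())  # [1, 2, 3, 4, 5], the input is left untouched
```

Returning a new DataFrame instead of modifying the input in place makes it easy to compare the converted scores with the originals.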
EDIT1:
Your solution should be changed by swapping the lines: first set 0 and then 1, to avoid overwriting values:
reviews.loc[reviews['Score'] <= 2, 'Score'] = 0
reviews.loc[reviews['Score'] > 3, 'Score'] = 1
reviews.drop(reviews[reviews['Score'] == 3].index, inplace = True)
print (reviews)
Score
0 0
1 0
2 0
4 1
5 1
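The swapped order works because a 0 written by the first line can never satisfy Score > 3, whereas in the original order a freshly written 1 does satisfy Score <= 2. As a quick sanity check, the sketch below runs both the swapped-order version and the filter-then-compare version on some made-up scores in the 1 to 5 range and compares the results. The sample data and the variable names a and b are assumptions for illustration.

```python
import pandas as pd

original = pd.DataFrame({'Score': [5, 1, 3, 4, 2, 3, 5]})

# Version A: swapped order, set 0 first and then 1, then drop the 3s.
a = original.copy()
a.loc[a['Score'] <= 2, 'Score'] = 0
a.loc[a['Score'] > 3, 'Score'] = 1
a = a.drop(a[a['Score'] == 3].index)

# Version B: drop the 3s first, then compare against 3.
b = original[original['Score'] != 3].copy()
b['Score'] = (b['Score'] > 3).astype(int)

print(a['Score'].tolist())  # [1, 0, 1, 0, 1]
print(a['Score'].tolist() == b['Score'].tolist())
print(a.index.tolist() == b.index.tolist())
```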