第一次在 stackoverflow 上发帖,如果我做错了,请耐心等待:)
我正在尝试使用 geopy 计算两点之间的距离,但我无法使计算的实际应用发挥作用。
这是我正在使用的数据帧的头部(数据帧后面有一些缺失值,不确定这是否是问题或如何处理):
start lat start long end_lat end_long
0 38.902760 -77.038630 38.880300 -76.986200
2 38.895914 -77.026064 38.915400 -77.044600
3 38.888251 -77.049426 38.895914 -77.026064
4 38.892300 -77.043600 38.888251 -77.049426
我设置了一个函数:
def dist_calc(st_lat, st_long, fin_lat, fin_long):
from geopy.distance import vincenty
start = (st_lat, st_long)
end = (fin_lat, fin_long)
return vincenty(start, end).miles
当给定手动输入时,这个工作正常。
但是,当我尝试 apply() 函数时,我遇到了以下代码的问题:
distances = df.apply(lambda row: dist_calc(row[-4], row[-3], row[-2], row[-1]), axis=1)
我对python相当陌生,任何帮助将不胜感激!
编辑:错误信息:
distances = df.apply(lambda row: dist_calc2(row[-4], row[-3], row[-2], row[-1]), axis=1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/frame.py", line 4262, in apply
ignore_failures=ignore_failures)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/frame.py", line 4358, in _apply_standard
results[i] = func(v)
File "<stdin>", line 1, in <lambda>
File "<stdin>", line 5, in dist_calc2
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/geopy/distance.py", line 322, in __init__
super(vincenty, self).__init__(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/geopy/distance.py", line 115, in __init__
kilometers += self.measure(a, b)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/geopy/distance.py", line 414, in measure
u_sq = cos_sq_alpha * (major ** 2 - minor ** 2) / minor ** 2
UnboundLocalError: ("local variable 'cos_sq_alpha' referenced before assignment", 'occurred at index 10')
pandas 函数的默认设置通常用于导入这样的文本数据(pd.read_table() 等)会将前 2 列名称中的空格解释为分隔符,因此您最终会得到 6 列而不是 4 列,并且您的数据会错位:
In [23]: df = pd.read_clipboard()
In [24]: df
Out[24]:
start lat start.1 long end_lat end_long
0 0 38.902760 -77.038630 38.880300 -76.986200 NaN
1 2 38.895914 -77.026064 38.915400 -77.044600 NaN
2 3 38.888251 -77.049426 38.895914 -77.026064 NaN
3 4 38.892300 -77.043600 38.888251 -77.049426 NaN
In [25]: df.columns
Out[25]: Index(['start', 'lat', 'start.1', 'long', 'end_lat', 'end_long'], dtype='object')
注意列名是错误的,最后一列充满了 NaN 等等。如果我将你的函数应用到这种形式的数据框,我会得到和你一样的错误。
通常最好在将其作为数据框导入之前尝试解决此问题。我可以想到2种方法:
以下是案例 (2) 的示例:
In [35]: df = pd.read_clipboard(sep=r'\s{2,}|\s(?=-)', engine='python')
In [36]: df = df.rename_axis({'start lat': 'start_lat', 'start long': 'start_long'}, axis=1)
In [37]: df
Out[37]:
start_lat start_long end_lat end_long
0 38.902760 -77.038630 38.880300 -76.986200
2 38.895914 -77.026064 38.915400 -77.044600
3 38.888251 -77.049426 38.895914 -77.026064
4 38.892300 -77.043600 38.888251 -77.049426
指定分隔符必须包含 2 个以上的空格字符,或 1 个空格后跟连字符(减号)。然后我将列重命名为我假设的预期值。
从这一点来看,您的函数 / apply 工作正常,但我对其进行了一些更改:
例如:
In [51]: def dist_calc(row):
...: start = row[['start_lat','start_long']]
...: end = row[['end_lat', 'end_long']]
...: return vincenty(start, end).miles
...:
In [52]: df.apply(lambda row: dist_calc(row), axis=1)
Out[52]:
0 3.223232
2 1.674780
3 1.365851
4 0.420305
dtype: float64
本文收集自互联网,转载请注明来源。
如有侵权,请联系 [email protected] 删除。
我来说两句