Numpy / Scipy中“沿路径”的快速线性插值

8one6：

假设我有来自3个（已知）海拔的气象站的数据。具体来说，每个站点每分钟都会在其位置记录一次温度测量值。我想执行两种插值。而且我希望能够快速执行每一项。

因此，让我们设置一些数据：

import numpy as np
from scipy.interpolate import interp1d
import pandas as pd
import seaborn as sns

np.random.seed(0)
N, sigma = 1000., 5

basetemps = 70 + (np.random.randn(N) * sigma)
midtemps = 50 + (np.random.randn(N) * sigma)
toptemps = 40 + (np.random.randn(N) * sigma)
alltemps = np.array([basetemps, midtemps, toptemps]).T # note transpose!
trend = np.sin(4 / N * np.arange(N)) * 30
trend = trend[:, np.newaxis]

altitudes = np.array([500, 1500, 4000]).astype(float)

finaltemps = pd.DataFrame(alltemps + trend, columns=altitudes)
finaltemps.index.names, finaltemps.columns.names = ['Time'], ['Altitude']
finaltemps.plot()

太好了，所以我们的温度看起来像这样：

将所有时间插值到相同的高度：

我认为这很简单。假设我想每次将温度提高到1,000。我可以使用内置的scipy插值方法：

interping_function = interp1d(altitudes, finaltemps.values)
interped_to_1000 = interping_function(1000)

fig, ax = plt.subplots(1, 1, figsize=(8, 5))
finaltemps.plot(ax=ax, alpha=0.15)
ax.plot(interped_to_1000, label='Interped')
ax.legend(loc='best', title=finaltemps.columns.name)

这很好。让我们来看看速度：

%%timeit
res = interp1d(altitudes, finaltemps.values)(1000)
#-> 1000 loops, best of 3: 207 µs per loop

Interpolate "along a path":

So now I have a second, related problem. Say I know the altitude of a hiking party as a function of time, and I want to compute the temperature at their (moving) location by linearly interpolating my data through time. In particular, the times at which I know the location of the hiking party are the same times at which I know the temperatures at my weather stations. I can do this without too much effort:

location = np.linspace(altitudes[0], altitudes[-1], N)
interped_along_path = np.array([interp1d(altitudes, finaltemps.values[i, :])(loc) 
                                             for i, loc in enumerate(location)])

fig, ax = plt.subplots(1, 1, figsize=(8, 5))
finaltemps.plot(ax=ax, alpha=0.15)
ax.plot(interped_along_path, label='Interped')
ax.legend(loc='best', title=finaltemps.columns.name)

So this works really nicely, but its important to note that the key line above is using list comprehension to hide an enormous amount of work. In the previous case, scipy is creating a single interpolation function for us, and evaluating it once on a large amount of data. In this case, scipy is actually constructing N individual interpolating functions and evaluating each once on a small amount of data. This feels inherently inefficient. There is a for loop lurking here (in the list comprehension) and moreover, this just feels flabby.

Not surprisingly, this is much slower than the previous case:

%%timeit
res = np.array([interp1d(altitudes, finaltemps.values[i, :])(loc) 
                            for i, loc in enumerate(location)])
#-> 10 loops, best of 3: 145 ms per loop

So the second example runs 1,000 slower than the first. I.e. consistent with the idea that the heavy lifting is the "make a linear interpolation function" step...which is happening 1,000 times in the second example but only once in the first.

So, the question: is there a better way to approach the second problem? For example, is there a good way to set it up with 2-dimensinoal interpolation (which could perhaps handle the case where the times at which the hiking party locations are known are not the times at which the temperatures have been sampled)? Or is there a particularly slick way to handle things here where the times do line up? Or other?

Dietrich :

For a fixed point in time, you can utilize the following interpolation function:

g(a) = cc[0]*abs(a-aa[0]) + cc[1]*abs(a-aa[1]) + cc[2]*abs(a-aa[2])

where a is the hiker's altitude, aa the vector with the 3 measurement altitudes and cc is a vector with the coefficients. There are three things to note:

For given temperatures (alltemps) corresponding to aa, determining cc can be done by solving a linear matrix equation using np.linalg.solve().
g(a) is easy to vectorize for a (N,) dimensional a and (N, 3) dimensional cc (including np.linalg.solve() respectively).
g(a) is called a first order univariate spline kernel (for three points). Using abs(a-aa[i])**(2*d-1) would change the spline order to d. This approach could be interpreted a simplified version of a Gaussian Process in Machine Learning.

So the code would be:

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# generate temperatures
np.random.seed(0)
N, sigma = 1000, 5
trend = np.sin(4 / N * np.arange(N)) * 30
alltemps = np.array([tmp0 + trend + sigma*np.random.randn(N)
                     for tmp0 in [70, 50, 40]])

# generate attitudes:
altitudes = np.array([500, 1500, 4000]).astype(float)
location = np.linspace(altitudes[0], altitudes[-1], N)


def doit():
    """ do the interpolation, improved version for speed """
    AA = np.vstack([np.abs(altitudes-a_i) for a_i in altitudes])
    # This is slighty faster than np.linalg.solve(), because AA is small:
    cc = np.dot(np.linalg.inv(AA), alltemps)

    return (cc[0]*np.abs(location-altitudes[0]) +
            cc[1]*np.abs(location-altitudes[1]) +
            cc[2]*np.abs(location-altitudes[2]))


t_loc = doit()  # call interpolator

# do the plotting:
fg, ax = plt.subplots(num=1)
for alt, t in zip(altitudes, alltemps):
    ax.plot(t, label="%d feet" % alt, alpha=.5)
ax.plot(t_loc, label="Interpolation")
ax.legend(loc="best", title="Altitude:")
ax.set_xlabel("Time")
ax.set_ylabel("Temperature")
fg.canvas.draw()

Measuring the time gives:

In [2]: %timeit doit()
10000 loops, best of 3: 107 µs per loop

更新：我将原始列表推导替换为doit()30％的导入速度（对于N=1000）。

此外，按照比较的要求，我的机器上@moarningsun的基准代码块：

10 loops, best of 3: 110 ms per loop  
interp_checked
10000 loops, best of 3: 83.9 µs per loop
scipy_interpn
1000 loops, best of 3: 678 µs per loop
Output allclose:
[True, True, True]

注意，这N=1000是一个相对较小的数字。使用N=100000产生结果：

interp_checked
100 loops, best of 3: 8.37 ms per loop

%timeit doit()
100 loops, best of 3: 5.31 ms per loop

这表明该方法在规模上N要比该interp_checked方法更好。

本文收集自互联网，转载请注明来源。

如有侵权，请联系 [email protected] 删除。

编辑于 2020-08-8

我来说两句

0 条评论

登录后参与评论

TOP 榜单

文章