Incorrect residuals when coupling distributed and serial components

Timothy Brooks

I have noticed that the residual norm in the nonlinear solver has a slight dependence on the number of processors used when a distributed (parallel) component is coupled to a non-distributed (serial) component. I have attached an example script below.

'''
Simple example coupling a serial and distributed ImplicitComponent
'''

import numpy as np

import openmdao.api as om
from mpi4py import MPI
from openmdao.utils.array_utils import evenly_distrib_idxs

rank = MPI.COMM_WORLD.rank

size = 3
A = np.array([[1.0, 8.0, 0.0], [-1.0, 10.0, 2.0], [3.0, 100.5, 1.0]])

'''
This component solves the following quadratic equation in parallel:
    a_i0 * y_i^2 + a_i1 * y_i + a_i2 = x_i
    for i = {0,1,2}
where the coefficients are the components of the matrix A
'''
class DistribQuadtric(om.ImplicitComponent):
    def initialize(self):
        self.options['distributed'] = True
        self.options.declare('size', types=int, default=1,
            desc="Size of input and output vectors.")

    def setup(self):
        comm = self.comm
        rank = comm.rank

        size_total = self.options['size']

        # Distribute x and y vectors across each processor as evenly as possible
        sizes, offsets = evenly_distrib_idxs(comm.size, size_total)
        start = offsets[rank]
        end = start + sizes[rank]
        self.size_local = size_local = sizes[rank]

        # Get the local slice of A that this processor will be working with
        self.A_local = A[start:end,:]

        self.add_input('x', np.ones(size_local, float),
                       src_indices=np.arange(start, end, dtype=int))

        self.add_output('y', np.ones(size_local, float))

    def apply_nonlinear(self, inputs, outputs, residuals):
        x = inputs['x']
        y = outputs['y']
        r = residuals['y']
        for i in range(self.size_local):
            r[i] = self.A_local[i, 0] * y[i]**2 + self.A_local[i, 1] * y[i] \
            + self.A_local[i, 2] - x[i]

    def solve_nonlinear(self, inputs, outputs):
        x = inputs['x']
        y = outputs['y']
        for i in range(self.size_local):
            a = self.A_local[i, 0]
            b = self.A_local[i, 1]
            c = self.A_local[i, 2] - x[i]
            y[i] = (-b + np.sqrt(b**2 - 4*a*c))/(2*a)

'''
This component solves the following linear equation in serial:
    Ax = y
'''
class SerialLinear(om.ImplicitComponent):
    def initialize(self):

        self.options.declare('size', types=int, default=1,
                             desc="Size of input and output vectors.")

    def setup(self):
        size = self.options['size']

        self.add_input('y', np.ones(size, float))

        self.add_output('x', np.ones(size, float))

        self.A = A

    def apply_nonlinear(self, inputs, outputs, residuals):
        y = inputs['y']
        x = outputs['x']
        r = residuals['x']
        # Assign in place; rebinding r would leave the residual vector unchanged
        r[:] = y - self.A.dot(x)

    def solve_nonlinear(self, inputs, outputs):
        y = inputs['y']
        x = outputs['x']
        # Solve the linear system directly rather than forming the inverse
        x[:] = np.linalg.solve(self.A, y)

# Create a coupled problem between the linear and quadratic components
prob = om.Problem()
top_group = prob.model
top_group.add_subsystem("distributed_quad", DistribQuadtric(size=size))
top_group.add_subsystem("serial_linear", SerialLinear(size=size))

# Connect variables between components
top_group.connect('serial_linear.x', 'distributed_quad.x')
top_group.connect('distributed_quad.y', 'serial_linear.y')

# Need a nonlinear solver since the model is coupled
top_group.nonlinear_solver = om.NonlinearBlockGS(iprint=2, maxiter=20)

# Setup problem
prob.setup()

# Solve the coupled problem
prob.run_model()

# Print out solution
if prob.comm.rank == 0:
    print('x', prob['serial_linear.x'])
    print('y', prob['serial_linear.y'])

When this code is run on 1 processor, the printed output looks like this:

NL: NLBGS 0 ; 2.35754338 1
NL: NLBGS 1 ; 0.256315721 0.108721529
NL: NLBGS 2 ; 0.036527896 0.0154940504
NL: NLBGS 3 ; 0.00641965062 0.00272302545
NL: NLBGS 4 ; 0.0011292331 0.000478987198
NL: NLBGS 5 ; 0.000198654857 8.42635002e-05
NL: NLBGS 6 ; 3.49479079e-05 1.48238663e-05
NL: NLBGS 7 ; 6.14814792e-06 2.60786205e-06
NL: NLBGS 8 ; 1.08160237e-06 4.58783657e-07
NL: NLBGS 9 ; 1.90279057e-07 8.0710734e-08
NL: NLBGS 10 ; 3.34745201e-08 1.41988989e-08
NL: NLBGS 11 ; 5.8889481e-09 2.49791717e-09
NL: NLBGS 12 ; 1.03600386e-09 4.3944212e-10
NL: NLBGS 13 ; 1.8225669e-10 7.7307884e-11
NL: NLBGS Converged
('x', array([-0.01251987,  0.00136932, -0.11111688]))
('y', array([-0.00156529, -0.19602066, -0.01105954]))

But when it is run on 3 processors, the output is:

NL: NLBGS 0 ; 5.66931072 1
NL: NLBGS 1 ; 0.6855401 0.120921243
NL: NLBGS 2 ; 0.0993351375 0.0175215546
NL: NLBGS 3 ; 0.0174731006 0.00308205026
NL: NLBGS 4 ; 0.00307353315 0.000542135243
NL: NLBGS 5 ; 0.00054069662 9.537255e-05
NL: NLBGS 6 ; 9.51208366e-05 1.67782013e-05
NL: NLBGS 7 ; 1.67339624e-05 2.95167495e-06
NL: NLBGS 8 ; 2.94389363e-06 5.19268351e-07
NL: NLBGS 9 ; 5.17899477e-07 9.1351401e-08
NL: NLBGS 10 ; 9.11105862e-08 1.60708401e-08
NL: NLBGS 11 ; 1.60284752e-08 2.82723526e-09
NL: NLBGS 12 ; 2.81978416e-09 4.97376895e-10
NL: NLBGS 13 ; 4.96064272e-10 8.74999266e-11
NL: NLBGS Converged
('x', array([-0.01251987,  0.00136932, -0.11111688]))
('y', array([-0.00156529, -0.19602066, -0.01105954]))

Although the final solution of the coupled problem is identical, the residual norm used by the nonlinear solver grows with the number of processors. This does not happen when coupling a distributed component to a distributed component, or a non-distributed component to a non-distributed component; it only occurs when the two are mixed.

I believe the cause of this discrepancy is the underlying petsc_vector class used for parallel problems, defined in the OpenMDAO source code. Specifically, the norm for that class is defined as follows:

    def get_norm(self):
        """
        Return the norm of this vector.

        Returns
        -------
        float
            norm of this vector.
        """
        return self._system.comm.allreduce(np.linalg.norm(self._data))

This method uses an allreduce to accumulate the vector components on all processors into the norm. While this gives the correct result for any vector belonging to a distributed component (since the vector's components are partitioned across the processors), a serial component's vector holds an identical copy on every processor, so its entries are counted multiple times in the norm, with the multiplicity depending on the number of processors used.
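
To make the over-counting concrete, here is a minimal sketch (plain NumPy, simulating three ranks rather than running under MPI, with made-up residual values) of what the allreduce of per-processor norms effectively computes when a serial vector is duplicated on every rank:

    import numpy as np

    # Made-up residual data: the serial component's residual is replicated on
    # every rank, while the distributed component's residual is split across ranks.
    serial_res = np.array([0.3, -0.1, 0.2])                             # identical copy on all 3 ranks
    distrib_res = [np.array([0.5]), np.array([-0.4]), np.array([0.1])]  # one entry per rank

    # What a 1-processor run reports: a single 2-norm over all unique entries.
    true_norm = np.linalg.norm(np.concatenate([serial_res] + distrib_res))

    # What the allreduce of per-rank norms computes on 3 processors: each rank
    # takes the norm of its local data (full serial copy plus its slice of the
    # distributed vector), and those norms are summed across ranks.
    reported_norm = sum(np.linalg.norm(np.concatenate([serial_res, d]))
                        for d in distrib_res)

    print(true_norm)      # each entry counted exactly once
    print(reported_norm)  # inflated: serial entries counted on every rank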

Although the effect is negligible in the example I have shown, it would be larger for more complex models that may be run on large numbers of processors. This could lead to problems with convergence, parallel-consistency studies, and solver tolerances. Is there a way to avoid this?
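
For reference, one way to make the norm independent of the processor count is to accumulate the sum of squares of only the locally owned entries with an allreduce and take the square root of the global sum. This is only a sketch of the idea, not the actual OpenMDAO fix; `owned_data` is a hypothetical array assumed to hold each vector entry on exactly one rank (duplicated serial entries would have to be skipped on all but one rank before calling it):

    import numpy as np
    from mpi4py import MPI

    def consistent_norm(owned_data, comm=MPI.COMM_WORLD):
        # Local sum of squares over the entries this rank owns exclusively.
        local_sq = np.sum(np.asarray(owned_data) ** 2)
        # Combine across ranks, then take the root of the global sum, so the
        # result matches a serial 2-norm regardless of comm.size.
        global_sq = comm.allreduce(local_sq, op=MPI.SUM)
        return np.sqrt(global_sq)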

Kenneth Moore

Thanks for the report. This is definitely a bug, and it has now been fixed in the latest code in the OpenMDAO repository (0d50e7e2c26140b603460f2324e3d1d95513264a).

The latest release (2.8) also contains this fix.
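
If you want to confirm that your installation includes the fix, checking the installed package version (assuming a standard pip install of OpenMDAO) is enough:

    import openmdao
    print(openmdao.__version__)  # should report 2.8 or later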
