I have a Data Factory with a pipeline copy activity like this:
{
    "type": "Copy",
    "name": "Copy from storage to SQL",
    "inputs": [
        {
            "name": "storageDatasetName"
        }
    ],
    "outputs": [
        {
            "name": "sqlOutputDatasetName"
        }
    ],
    "typeProperties": {
        "source": {
            "type": "BlobSource"
        },
        "sink": {
            "type": "SqlSink"
        }
    },
    "policy": {
        "concurrency": 1,
        "retry": 3
    },
    "scheduler": {
        "frequency": "Month",
        "interval": 1
    }
}
The input data is about 90 MB in size, roughly 1.5 million rows, split into approximately 20 block blob files of about 4.5 MB each in Azure Storage. Here is a sample of the data (CSV):
A81001,1,1,1,2,600,3.0,0.47236654,141.70996,0.70854986
A81001,4,11,0,25,588,243.0,5.904582,138.87576,57.392536
A81001,7,4,1,32,1342,278.0,7.5578647,316.95795,65.65895
The sink is an Azure SQL Server of tier S2, rated at 50 DTUs. I created a simple table with sensible data types and no keys, indexes, or anything fancy, just columns:
CREATE TABLE [dbo].[Prescriptions](
[Practice] [char](6) NOT NULL,
[BnfChapter] [tinyint] NOT NULL,
[BnfSection] [tinyint] NOT NULL,
[BnfParagraph] [tinyint] NOT NULL,
[TotalItems] [int] NOT NULL,
[TotalQty] [int] NOT NULL,
[TotalActCost] [float] NOT NULL,
[TotalItemsPerThousand] [float] NOT NULL,
[TotalQtyPerThousand] [float] NOT NULL,
[TotalActCostPerThousand] [float] NOT NULL
)
The source, sink and Data Factory are all in the same region (North Europe).
According to Microsoft's "Copy Activity Performance and Tuning Guide", for an Azure Storage source and an Azure SQL S2 sink I should get about 0.4 MBps. By my calculations, that means the 90 MB should transfer in around half an hour (is that right?).
For some reason, it copies 70,000 rows very quickly and then seems to hang. Using SQL Management Studio I can see that the row count in the database table is exactly 70,000, and it hasn't increased at all in 7 hours. Yet the copy task is still running, with no errors:
Any ideas why this is hanging at 70,000 rows? I can't see anything unusual about the 70,001st data row which would cause a problem. I've tried completely trashing the data factory and starting again, and I always get the same behaviour. I have another copy activity with a smaller table (8,000 rows), which completes in 1 minute.
Just to answer my own question in case it helps anyone else:
The issue was with null values. The reason that my run was hanging at 70,000 rows was that at row 76560 of my blob source file, there was a null value in one of the columns. The HIVE script I had used to generate this blob file had written the null value as '\N'. Also, my sink SQL table specified 'NOT NULL' as part of the column, and the column was a FLOAT value.
So I made two changes. First, I added the following property to my blob dataset definition:
"nullValue": "\\N"
Second, I made my SQL table column nullable. It now runs all the way through and doesn't hang! :)
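Making the column nullable is a one-line ALTER; `TotalActCost` here is just an example, and the same applies to whichever column actually received the null:

```sql
-- Allow NULLs in the column that the \N values map to
ALTER TABLE [dbo].[Prescriptions]
ALTER COLUMN [TotalActCost] [float] NULL;
```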
The problem is that Data Factory didn't error; it just got stuck. It would have been nice if the job had failed with a helpful error message telling me which row of data was the problem. I think that because the default write batch size is 10,000, this is why it was stuck at 70,000 rather than at 76,560.