Merging records from multiple tables into one table and removing duplicates on a text field

杰夫·斯沃茨

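I'm working on an application in which entities such as agencies, people, customers, and jobs are each represented in their own table. The original developer also created a notes table for each entity, named agentnotes, personnotes, customernotes, jobnotes, and so on. Later, a feature was added to the note entry page: when you create a note for a person, you can choose to write the same note to any related agency, customer, or job. Obviously, this led to a large number of duplicate notes across all entity types.

We want to consolidate all notes into a single notes collection, in which each note exists as a single instance tagged with its different related records. We then want to put that into Elasticsearch for searching, so ultimately we will export to JSON.

The problem is that we're dealing with 1.4 million notes in total, and the note body is a text field in SQL Server. Here is some of the code I have so far.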

using (var cn = new DbContext(DataSource))
{
    foreach (var agencynote in cn.AgenciesNotes.Where(x => !x.Processed).Take(100).ToList())
    {
        decimal customerid, peopleid;
        customerid = peopleid = 0;

        var custnote = cn.CustomerNotes.FirstOrDefault(x => x.Notes == agencynote.Notes);
        if (custnote != null)
        {
            customerid = custnote.CustomerID;
            custnote.Processed = true;
        }

        var peoplenote = cn.PeopleNotes.FirstOrDefault(x => x.Notes == agencynote.Notes);
        if (peoplenote != null)
        {
            peopleid = peoplenote.PeopleID;
            peoplenote.Processed = true;
        }
        
        var newNote = new NotesAll()
        {
            AgencyID = agencynote.AgencyID,
            CustomerID = customerid,
            EnteredDate = agencynote.EnteredDate,
            Notes = agencynote.Notes,
            NotesTypeID = agencynote.NotesTypeID,
            PeopleId = peopleid
        };

        cn.NotesAlls.Add(newNote);
        cn.SaveChanges();
    }
}

When I run it, it breaks on this line.

var custnote = cn.CustomerNotes.FirstOrDefault(x => x.Notes == agencynote.Notes);

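The error says that you cannot compare varchar and text fields. First of all, both fields are defined as text in the database, and the data annotations on the EF model also specify [Column(TypeName = "text")]. So, any idea why it thinks one is varchar and the other is text?

Also, is there a better way to do this, especially given the ultimate goal of generating JSON files for Elasticsearch? I know it will take a long time to complete, but I'm not sure whether there is another way to remove the duplicates. Thanks.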

阿尔姆胡兰

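You can convert the text columns to varchar(max), or call .ToString() on them in client-side code. It's hard to say, though, whether that is the "right" solution for your end goal.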
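
As a minimal sketch of the first suggestion, the comparison can also be made by casting inside the query, which leaves the column types untouched. The temp tables #AgencyNotes and #CustomerNotes and their sample rows are hypothetical stand-ins, not the real schema.

```sql
-- Hypothetical stand-in tables; the real schema will differ.
create table #AgencyNotes   (AgencyID int, Notes text);
create table #CustomerNotes (CustomerID int, Notes text);

insert #AgencyNotes   values (1, 'called about invoice'), (2, 'left voicemail');
insert #CustomerNotes values (10, 'called about invoice'), (11, 'renewed contract');

-- text columns cannot be compared with = directly,
-- so both sides are cast to varchar(max) for the comparison.
select  a.AgencyID, c.CustomerID, cast(a.Notes as varchar(max)) as Notes
from    #AgencyNotes a
join    #CustomerNotes c
        on cast(a.Notes as varchar(max)) = cast(c.Notes as varchar(max));

drop table #AgencyNotes, #CustomerNotes;
```

With the sample rows above, only the note shared by agency 1 and customer 10 matches.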

That said... it may be easier to deduplicate them directly in SQL.

Example of a pure SQL solution:

set nocount on;
-- pretend data
create table A(aid int identity(1,1), note text);
create table B(bid int identity(1,1), note text);
create table C(cid int identity(1,1), note text);
create table D(did int identity(1,1), note text);
-- helper tally table
create table ints (i int identity(1,1));
go
insert ints default values;
go 100 -- slow but concise demo code

-- bunch-o-junk notes, some shared
insert A (note) select cast(i as char(3)) from ints where i % 2 = 0;
insert B (note) select cast(i as char(3)) from ints where i % 3 = 0;
insert C (note) select cast(i as char(3)) from ints where i % 4 = 0;
insert D (note) select cast(i as char(3)) from ints where i % 5 = 0;

-- end of prep, start of actual solution
alter table A alter column note varchar(max);
alter table B alter column note varchar(max);
alter table C alter column note varchar(max);
alter table D alter column note varchar(max);
go

-- notes and associated ids from all tables for any note shared across 2 or more tables
select      a.aid, b.bid, c.cid, d.did, coalesce(a.note, b.note, c.note, d.note)
from        A
full join   B  on a.note = b.note
full join   C  on c.note = a.note or c.note = b.note
full join   D  on d.note = a.note or d.note = b.note or d.note = c.note
cross apply (  -- using this construction because it is easy to extend to more tables
                select  count(c)
                from    (values (aid), (bid), (cid), (did)) v (c)
            ) u (c)
where       u.c > 1;
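
An alternative sketch, not part of the answer's query: stack all four tables with union all and group by the note text, which yields one row per distinct note without chaining or conditions across the joins. It assumes the demo tables A through D created above. Because max() keeps only one id per table, it also presumes that a given note appears at most once within any single table.

```sql
-- One row per distinct note, with the id from each table that contains it.
select  n.note,
        max(case when n.src = 'A' then n.id end) as aid,
        max(case when n.src = 'B' then n.id end) as bid,
        max(case when n.src = 'C' then n.id end) as cid,
        max(case when n.src = 'D' then n.id end) as did
from    (
            -- the cast is a no-op if the columns are already varchar(max)
            select 'A' as src, aid as id, cast(note as varchar(max)) as note from A
            union all
            select 'B', bid, cast(note as varchar(max)) from B
            union all
            select 'C', cid, cast(note as varchar(max)) from C
            union all
            select 'D', did, cast(note as varchar(max)) from D
        ) n
group by n.note
having  count(distinct n.src) > 1;  -- keeps only notes shared by 2 or more tables
```

Dropping the having clause keeps notes that exist in only one table as well, which is closer to the single consolidated notes set described in the question.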

Edited on 2021-08-11