Deleting duplicate data in MySQL

八度

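I am trying to emulate the accepted answer to this SO question: Delete all Duplicate Rows except for One in MySQL? [duplicate], with a bit of a twist: I want the data in one table (an auto-increment ID) to determine which rows get deleted in another table. There is a SQLFiddle here showing the data.

In the fiddle referenced above, the end result I'm looking for is for eventdetails_new to have the rows with Event_ID = 4 and 6 deleted (EVENTDETAILS_IDs 5 & 6 and 9 & 10), leaving the rows for Event_ID 3 and 5 (EVENTDETAILS_IDs 3 & 4 and 7 & 8). I hope that makes sense. Ideally the rows in events_new with the same Event_IDs would be deleted as well (I haven't started on that yet, so there is no code sample).

Here is the query I am trying to get working, but I'm in a bit over my head: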

SELECT *
FROM eventdetails_new AS EDN1, eventdetails_new AS EDN2
INNER JOIN events_new AS E1 ON `E1`.`Event_ID` = `EDN1`.`Event_ID`
INNER JOIN events_new AS E2 ON `E2`.`Event_ID` = `EDN2`.`Event_ID`
WHERE `E1`.`Event_ID` > `E2`.`Event_ID` 
AND `E1`.`DateTime` = `E2`.`DateTime`
AND events_new.EventType_ID = 6;

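Here is the same SQLFiddle with the results of that query, which are not good. I can see Event_ID in the data, but for some reason the query can't. I'm not sure how to go about fixing this.

I know this is a SELECT query, but I couldn't figure out a way to have two aliased tables in a DELETE query (which I think I need?). I figure that if I can get the SELECT working, I can delete the rows with some C# code. Ideally, though, all of this could be done in a single query or a set of statements without leaving MySQL.

Here is my first cut at the query, but it is just as bad: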

DELETE e1 FROM eventdetails_new e1 
WHERE `events_new`.`Event_ID` > `events_new`.`Event_ID` 
AND events_new.DateTime = events_new.DateTime AND events_new.EventType_ID = 6;

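SQLFiddle won't let me run this query at all, so it isn't much help. It does, however, give me the same error as the one above: Error Code: 1054. Unknown column 'events_new.Event_ID' in 'where clause'

I'm by no means married to either of these queries if there is a better way. The end result I'm looking for is to delete a bunch of duplicate data.

I have many thousands of these rows, and I know that roughly 1/3 of them are duplicates that need to be removed before we take the database live.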

八度

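This is what I ended up doing. My coworker and I came up with a query that gives us the list of Event_IDs that have duplicate data (we actually used Access 2010's query builder and then MySQL-ified the result). Bear in mind that this is a complete solution, whereas the original question did not go into as much detail about the linked tables. If you have questions about it, feel free to ask and I'll do my best to help: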

SELECT `Events_new`.`Event_ID`
    FROM Events_new
    GROUP BY `Events_new`.`PCBID`, `Events_new`.`EventType_ID`, `Events_new`.`DateTime`, `Events_new`.`User`
    HAVING (((COUNT(`Events_new`.`PCBID`)) > 1) AND ((COUNT(`Events_new`.`User`)) > 1) AND ((COUNT(`Events_new`.`DateTime`)) > 1))
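
To illustrate what a grouping query of this kind finds, here is a minimal sketch that runs against an in-memory SQLite database through Python's built-in sqlite3 module, used purely as a stand-in for MySQL. The table layout, the sample rows, and the helper name find_duplicate_event_ids are assumptions made for the example. So is the choice to keep the lowest Event_ID in each group, which the original query does not specify.

```python
import sqlite3


def find_duplicate_event_ids(conn):
    """Return every Event_ID that duplicates an earlier row.

    Rows count as duplicates when PCBID, EventType_ID, DateTime and User
    all match. The lowest Event_ID of each group is treated as the
    survivor (an assumption of this sketch). NULLs never compare equal,
    so rows with NULL in a grouping column are not matched here.
    """
    sql = """
        SELECT e.Event_ID
        FROM Events_new AS e
        JOIN (
            SELECT PCBID, EventType_ID, DateTime, User,
                   MIN(Event_ID) AS keep_id
            FROM Events_new
            GROUP BY PCBID, EventType_ID, DateTime, User
            HAVING COUNT(*) > 1
        ) AS g
          ON  e.PCBID = g.PCBID
          AND e.EventType_ID = g.EventType_ID
          AND e.DateTime = g.DateTime
          AND e.User = g.User
        WHERE e.Event_ID > g.keep_id
        ORDER BY e.Event_ID
    """
    return [row[0] for row in conn.execute(sql)]


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Events_new (
           Event_ID INTEGER PRIMARY KEY,
           PCBID TEXT, EventType_ID INTEGER, DateTime TEXT, User TEXT)"""
)
# Invented sample rows: 4 duplicates 3, 6 duplicates 5, 7 is unique.
conn.executemany(
    "INSERT INTO Events_new VALUES (?, ?, ?, ?, ?)",
    [
        (3, "PCB1", 6, "2013-01-01 10:00:00", "alice"),
        (4, "PCB1", 6, "2013-01-01 10:00:00", "alice"),
        (5, "PCB2", 6, "2013-01-02 11:00:00", "bob"),
        (6, "PCB2", 6, "2013-01-02 11:00:00", "bob"),
        (7, "PCB3", 6, "2013-01-03 12:00:00", "carol"),
    ],
)
duplicate_ids = find_duplicate_event_ids(conn)
print(duplicate_ids)
```

Joining each group back to the table returns all surplus rows in one pass, rather than one ID per group.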

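From this list, I processed each Event_ID iteratively to remove the duplicates. Basically, I had to delete all of the child rows, starting at the lowest table and working upward, so as not to run afoul of the foreign key constraints.

The code below was written in LinqPAD as C# statements (sbCommonFunctions is an in-house DLL designed to make most, but not all, database functions work the same way or more simply):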
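
The ordering requirement can be sketched in a few lines. The example below again uses Python's sqlite3 as a stand-in for MySQL. The three-table schema is a trimmed-down assumption based on the column names that appear in this answer, and delete_event_tree is a hypothetical helper, not part of sbCommonFunctions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript(
    """
    CREATE TABLE events_new (Event_ID INTEGER PRIMARY KEY);
    CREATE TABLE testinformation_new (
        TestInformation_ID INTEGER PRIMARY KEY,
        Event_ID INTEGER NOT NULL REFERENCES events_new(Event_ID));
    CREATE TABLE teststeps_new (
        TestSteps_ID INTEGER PRIMARY KEY,
        TestInformation_TestInformation_ID INTEGER NOT NULL
            REFERENCES testinformation_new(TestInformation_ID));
    INSERT INTO events_new VALUES (3), (4);
    INSERT INTO testinformation_new VALUES (10, 3), (11, 4);
    INSERT INTO teststeps_new VALUES (100, 10), (101, 11);
    """
)


def delete_event_tree(conn, event_id):
    """Delete one event and its descendants, lowest level first."""
    with conn:  # single transaction: commit on success, roll back on error
        conn.execute(
            """DELETE FROM teststeps_new
               WHERE TestInformation_TestInformation_ID IN (
                   SELECT TestInformation_ID
                   FROM testinformation_new
                   WHERE Event_ID = ?)""",
            (event_id,),
        )
        conn.execute(
            "DELETE FROM testinformation_new WHERE Event_ID = ?", (event_id,)
        )
        conn.execute("DELETE FROM events_new WHERE Event_ID = ?", (event_id,))


# Deleting the parent row first violates the foreign key and is rejected.
try:
    with conn:
        conn.execute("DELETE FROM events_new WHERE Event_ID = 4")
    parent_first_rejected = False
except sqlite3.IntegrityError:
    parent_first_rejected = True

delete_event_tree(conn, 4)
remaining = {
    table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in ("events_new", "testinformation_new", "teststeps_new")
}
print(parent_first_rejected, remaining)
```

Wrapping the three deletes in one transaction means a failure partway through leaves no half-deleted event behind.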

sbCommonFunctions.Database testDB = new sbCommonFunctions.Database();
testDB.Connect("production", "database", "user", "password");
List<string> listEventIDs = new List<string>();
List<string> listEventDetailIDs = new List<string>();
List<string> listTestInformationIDs = new List<string>();
List<string> listTestStepIDs = new List<string>();
List<string> listMeasurementIDs = new List<string>();
string dtQuery = (String.Format(@"SELECT `Events_new`.`Event_ID`
        FROM Events_new
        GROUP BY `Events_new`.`PCBID`,
        `Events_new`.`EventType_ID`,
        `Events_new`.`DateTime`,
        `Events_new`.`User`
        HAVING (((COUNT(`Events_new`.`PCBID`)) > 1)
        AND ((COUNT(`Events_new`.`User`)) > 1) 
        AND ((COUNT(`Events_new`.`DateTime`)) > 1))"));

int iterations = 0;
DataTable dtEventIDs = getDT(dtQuery, testDB);
while (dtEventIDs.Rows.Count > 0)
{
    Console.WriteLine(dtEventIDs.Rows.Count);
    Console.WriteLine(iterations);
    iterations++;
    foreach(DataRowView eventID in dtEventIDs.DefaultView)
    {
        listEventIDs.Add(eventID.Row[0].ToString());
        DataTable dtEventDetails = testDB.QueryDatabase(String.Format(
        "SELECT * FROM EventDetails_new WHERE Event_ID = {0}",
           eventID.Row[0]));
        foreach(DataRowView drvEventDetail in dtEventDetails.DefaultView)
        {
            listEventDetailIDs.Add(drvEventDetail.Row[0].ToString());
        }   
        DataTable dtTestInformation = testDB.QueryDatabase(String.Format(
        @"SELECT TestInformation_ID 
        FROM TestInformation_new 
        WHERE Event_ID = {0}",
            eventID.Row[0]));
        foreach(DataRowView drvTest in dtTestInformation.DefaultView)
        {
            listTestInformationIDs.Add(drvTest.Row[0].ToString());
            DataTable dtTestSteps = testDB.QueryDatabase(String.Format(
            @"SELECT TestSteps_ID 
            FROM TestSteps_new 
            WHERE TestInformation_TestInformation_ID = {0}",
               drvTest.Row[0]));
            foreach(DataRowView drvTestStep in dtTestSteps.DefaultView)
            {
                listTestStepIDs.Add(drvTestStep.Row[0].ToString());
                DataTable dtMeasurements = testDB.QueryDatabase(String.Format(
                @"SELECT Measurements_ID 
                FROM Measurements_new 
                WHERE TestSteps_TestSteps_ID = {0}",
                   drvTestStep.Row[0]));
                foreach(DataRowView drvMeasurements in dtMeasurements.DefaultView)
                {
                    listMeasurementIDs.Add(drvMeasurements.Row[0].ToString());
                }
            }
        }
    }
    testDB.Disconnect();
    string mysqlConnection = 
    "server=server;\ndatabase=database;\npassword=password;\nUser ID=user;";
    MySqlConnection connection = new MySqlConnection(mysqlConnection);
    connection.Open();
    //start unwinding the duplicates from the lowest level upward
    whackDuplicates(listMeasurementIDs, "measurements_new", "Measurements_ID", connection);
    whackDuplicates(listTestStepIDs, "teststeps_new", "TestSteps_ID", connection);
    whackDuplicates(listTestInformationIDs, "testinformation_new", "testInformation_ID", connection);
    whackDuplicates(listEventDetailIDs, "eventdetails_new", "eventdetails_ID", connection);
    whackDuplicates(listEventIDs, "events_new", "event_ID", connection);
    connection.Close();
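    //clear the collected IDs so the next pass does not re-issue deletes for rows already removed
    listMeasurementIDs.Clear();
    listTestStepIDs.Clear();
    listTestInformationIDs.Clear();
    listEventDetailIDs.Clear();
    listEventIDs.Clear();
    //update iterator from inside the clause in case there are more duplicates.
    dtEventIDs = getDT(dtQuery, testDB);
}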

}//goofy curly brace to allow LinqPAD to deal with inline classes
public void whackDuplicates(List<string> listOfIDs,
                            string table,
                            string pkID, 
                            MySqlConnection connection)
{
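    //table and column names cannot be bound as parameters, so they are concatenated;
    //only pass trusted, hard-coded identifiers here. The ID itself is bound as a parameter.
    foreach(string ID in listOfIDs)
    {
        MySqlCommand command = connection.CreateCommand();
        command.CommandText = "DELETE FROM " + table + " WHERE " + pkID + " = @id";
        command.Parameters.AddWithValue("@id", ID);
        command.ExecuteNonQuery();
    }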
}
public DataTable getDT(string query, sbCommonFunctions.Database db)
{
    return db.QueryDatabase(query);
//} is left out deliberately: LinqPAD has a weird way of dealing with inline
//classes, and the last one can't have a closing curly brace (and the
//first one has to have an extra curly brace above it, go figure)

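Basically, this is one giant while loop whose iterator is refreshed from inside the loop until the number of duplicate Event_IDs drops to zero (it took 5 iterations, because some of the data had as many as six duplicates).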
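
The reason several passes are needed can be reproduced on a small scale. When a grouped query yields only one Event_ID per duplicate group, each pass removes one copy per group, so a group of six copies takes five passes. The sketch below uses Python's sqlite3 in place of MySQL, with an invented two-column table, and it picks MAX(Event_ID) as the single ID per group. That choice is an assumption made to keep the example deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Events_new ("
    "Event_ID INTEGER PRIMARY KEY, PCBID TEXT, DateTime TEXT)"
)
# Six identical copies of one event (IDs 1-6) plus one unique event (ID 7).
rows = [("PCB1", "2013-01-01 10:00:00")] * 6 + [("PCB2", "2013-01-02 11:00:00")]
conn.executemany("INSERT INTO Events_new (PCBID, DateTime) VALUES (?, ?)", rows)

# Returns exactly one Event_ID for every group that still has duplicates.
ONE_PER_GROUP = """
    SELECT MAX(Event_ID)
    FROM Events_new
    GROUP BY PCBID, DateTime
    HAVING COUNT(*) > 1
"""

passes = 0
while True:
    ids = [row[0] for row in conn.execute(ONE_PER_GROUP)]
    if not ids:
        break  # no group has more than one row left
    conn.executemany(
        "DELETE FROM Events_new WHERE Event_ID = ?", [(i,) for i in ids]
    )
    passes += 1
conn.commit()

remaining = [
    row[0]
    for row in conn.execute("SELECT Event_ID FROM Events_new ORDER BY Event_ID")
]
print(passes, remaining)
```

Here the loop ends after five passes with only the lowest-numbered copy and the unique row left.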
