What's the best way to update a very large file with C#?

Huan Jiang

I'm not asking only about reading a large file, or about reading/writing an XML file, for which I know there are XML-related classes. Let me describe more specifically what I'm trying to do:

I have a very large file, about 10 TB, which I cannot load into memory all at once. That means I cannot do the following:

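        var lines = File.ReadAllLines("LargeFile.txt"); // tries to load all 10 TB into memory
        var t = 1L << 40; // 1L, not 1: an int shift count wraps, so 1 << 40 evaluates to 256
        for (var i = t; i < 2 * t; i++)
        {
            lines[i] = someWork();
        }

        File.WriteAllLines("LargeFile.txt", lines);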

I want to read and update the lines that fall between the 1 TB and 2 TB marks of the file.

What's the best approach to doing this? Examples using .NET classes or third-party libraries would be helpful. I'm also interested in how other languages handle this problem.


I tried David's suggestion of using Position. However, I feel it doesn't work:

1. The size of the FileStream seems fixed. I can modify the bytes, but a write overwrites the existing data byte by byte. If my new data is larger or smaller than the original line, I won't be able to update it correctly.
2. I didn't find an O(1) way to convert a line number to a position. It still takes O(n) to find the position.

Below is my attempt:

    // Requires: using System.IO; using System.Linq; using System.Text;
    public static void ReadWrite()
    {
        var fn = "LargeFile.txt";
        File.WriteAllLines(fn, Enumerable.Range(1, 20).Select(x => x.ToString()));

        var targetLine = 11; // zero-based
        long pos = -1;
        using (var fs = new FileStream(fn, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            var remaining = targetLine;
            // Still takes O(n) on average: the file is scanned byte by byte to find the position.
            // I'm not sure whether there is a better way to jump to the start of line x in O(1).
            while (remaining > 0)
            {
                var b = fs.ReadByte();
                if (b == -1) break;            // reached end of file before the target line
                if (b == '\n') remaining--;
            }
            if (remaining == 0) pos = fs.Position; // first byte of the target line
        }

        using (var fs = new FileStream(fn, FileMode.Open, FileAccess.ReadWrite))
        {
            var data = Encoding.UTF8.GetBytes("999");
            fs.Position = pos;
            // If the new data differs in size from the current line,
            // the write runs over (or falls short of) the following bytes.
            fs.Write(data, 0, data.Length);
        }
    }
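
On the second point above, one possible workaround (a sketch, not something proposed in the original post) is to pay the O(n) scan once and keep a sparse index of line-start offsets. Later lookups then seek to the nearest indexed line and scan at most `stride` lines forward. The names `LineIndex`, `Build`, `Find`, and `stride` are hypothetical, and the sketch assumes lines end in '\n' (which also covers "\r\n") and that the file does not change between building and using the index.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LineIndex
{
    // Scans the file once and records the byte offset at which every
    // stride-th line starts (zero-based line numbers, '\n'-terminated lines).
    public static List<long> Build(string path, int stride)
    {
        var offsets = new List<long> { 0L }; // line 0 starts at byte 0
        var buffer = new byte[1 << 20];      // read in 1 MiB chunks
        long position = 0;                   // file offset of buffer[0]
        long line = 0;                       // newlines seen so far
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            int read;
            while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++)
                {
                    if (buffer[i] != (byte)'\n') continue;
                    line++;
                    // The byte after this '\n' is the first byte of line number `line`.
                    if (line % stride == 0) offsets.Add(position + i + 1);
                }
                position += read;
            }
        }
        return offsets;
    }

    // Returns the byte offset at which targetLine starts, scanning fewer
    // than `stride` lines forward from the nearest indexed line.
    public static long Find(string path, List<long> offsets, int stride, long targetLine)
    {
        int slot = (int)(targetLine / stride);
        long remaining = targetLine % stride;
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            fs.Position = offsets[slot];
            while (remaining > 0)
            {
                int b = fs.ReadByte();
                if (b < 0) throw new ArgumentOutOfRangeException(nameof(targetLine));
                if (b == '\n') remaining--;
            }
            return fs.Position;
        }
    }
}
```

The index costs one `long` per `stride` lines, so a larger stride trades memory for a slightly longer forward scan on each lookup. If the file ends with a newline, the index may contain a final entry equal to the file length, so callers should treat an offset at end of file as "no such line".
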

David Browne - Microsoft

You don't have to read through the first 1 TB to modify the middle of the file. FileStream supports random access. For example:

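    string fn = @"c:\temp\huge.dat";
    // Open with FileAccess.ReadWrite instead if you also want to write at the new position.
    using (var fs = new FileStream(fn, FileMode.Open, FileAccess.Read, FileShare.Read))
    {
        // Jump straight to byte offset 1 GiB; the bytes before it are not read.
        fs.Position = (1024L * 1024L * 1024L);
        //. . .
    }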

Once you reposition the FileStream, you can read and write at the current location, or open a StreamReader to read text from the file. You must, of course, make sure that you move to a byte offset at which a character begins in the file's encoding.
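
As a rough illustration of writing at a repositioned stream, the sketch below overwrites one line in place without changing the file's length. It is only one way to do it and rests on several assumptions: the text is UTF-8, `offset` is the first byte of a line, and padding a shorter replacement with trailing spaces is acceptable to whatever reads the file. `InPlaceEdit` and `OverwriteLine` are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;

public static class InPlaceEdit
{
    // Overwrites the line starting at byte `offset`, keeping the file length unchanged.
    // Returns false (and writes nothing) if the replacement needs more bytes
    // than the existing line occupies.
    public static bool OverwriteLine(string path, long offset, string replacement)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.None))
        {
            fs.Position = offset;

            // Measure the existing line in bytes, up to (not including) '\r', '\n', or end of file.
            int oldLength = 0;
            int b;
            while ((b = fs.ReadByte()) != -1 && b != '\n' && b != '\r') oldLength++;

            byte[] data = Encoding.UTF8.GetBytes(replacement);
            if (data.Length > oldLength) return false;

            // Pad with spaces so the new line occupies exactly the old byte count.
            var padded = new byte[oldLength];
            for (int i = 0; i < padded.Length; i++) padded[i] = (byte)' ';
            Array.Copy(data, padded, data.Length);

            // Seek back to the start of the line and overwrite it.
            fs.Position = offset;
            fs.Write(padded, 0, padded.Length);
            return true;
        }
    }
}
```

A file cannot have bytes inserted into or removed from its middle, so a replacement that is longer than the original line (or one that must not be padded) requires rewriting everything after the edit point, typically by streaming the file into a new one.
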
