What's the proper way to parse a very large JSON file in Ruby?

Alexander

How can we parse a JSON file in Ruby?

require 'json'

JSON.parse File.read('data.json')

What if the file is very large and we don't want to load it into memory at once? How would we parse it then?

mdegis

Since you said you don't want to load it into memory all at once, parsing the file in chunks may be more suitable for you. You can use the yajl-ffi gem to achieve this. From its documentation:

For larger documents, we can use an IO object to stream it into the parser. We still need room for the parsed object, but the document itself is never fully read into memory.

require 'yajl/ffi'
stream = File.open('/tmp/test.json')
obj = Yajl::FFI::Parser.parse(stream)

However, when streaming small documents from disk, or over the network, the yajl-ruby gem will give us the best performance.
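For that simpler case, here is a minimal yajl-ruby sketch (assuming the yajl-ruby gem is installed and the parsed object itself fits in memory; the file name is just an example):

require 'yajl'

# Yajl reads from the IO directly, so the raw JSON text is never held
# in memory as one big Ruby string; only the parsed object is.
obj = File.open('data.json', 'r') { |io| Yajl::Parser.parse(io) }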

Huge documents arriving over the network in small chunks to an EventMachine receive_data loop is where Yajl::FFI is uniquely suited. Inside an EventMachine::Connection subclass we might have:

def post_init
  @parser = Yajl::FFI::Parser.new
  @parser.start_document { puts "start document" }
  @parser.end_document   { puts "end document" }
  @parser.start_object   { puts "start object" }
  @parser.end_object     { puts "end object" }
  @parser.start_array    { puts "start array" }
  @parser.end_array      { puts "end array" }
  @parser.key            { |k| puts "key: #{k}" }
  @parser.value          { |v| puts "value: #{v}" }
end

def receive_data(data)
  begin
    @parser << data
  rescue Yajl::FFI::ParserError => e
    close_connection
  end
end

The parser accepts chunks of the JSON document and parses up to the end of the available buffer. Passing in more data resumes the parse from the prior state. When an interesting state change happens, the parser notifies all registered callback procs of the event.
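The same chunk-by-chunk feeding works outside EventMachine too. A minimal sketch, assuming a local file called data.json and an arbitrary 4 KB chunk size:

require 'yajl/ffi'

parser = Yajl::FFI::Parser.new
parser.key   { |k| puts "key: #{k}" }
parser.value { |v| puts "value: #{v}" }

File.open('data.json', 'rb') do |io|
  # Each << parses as far as the available buffer allows; the next
  # chunk resumes the parse from the saved state.
  parser << io.read(4096) until io.eof?
end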

The event callback is where we can do interesting data filtering and passing to other processes. The above example simply prints state changes, but the callbacks might look for an array named rows and process sets of these row objects in small batches. Millions of rows, streaming over the network, can be processed in constant memory space this way.
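As a rough illustration of that idea, here is a sketch of callbacks that collect the objects inside a top-level "rows" array and hand them off in batches. The array name, the batch size, and the process_batch handler are all assumptions for the example, and it only handles flat row objects (no nesting inside a row):

require 'yajl/ffi'

BATCH_SIZE = 500                       # arbitrary batch size for the example
parser     = Yajl::FFI::Parser.new

in_rows = false                        # true while inside the "rows" array
row     = nil                          # row object currently being assembled
row_key = nil                          # key whose value we are waiting for
batch   = []

process_batch = ->(rows) { puts "processing #{rows.size} rows" }  # stand-in handler

parser.key do |k|
  if row
    row_key = k                        # a key inside the current row
  elsif k == 'rows'
    in_rows = :pending                 # the next start_array opens the rows array
  end
end

parser.start_array { in_rows = true if in_rows == :pending }

parser.end_array do
  if in_rows == true && row.nil?       # the rows array itself just closed
    process_batch.call(batch) unless batch.empty?
    batch.clear
    in_rows = false
  end
end

parser.start_object { row = {} if in_rows == true }

parser.end_object do
  next unless row
  batch << row
  row = nil
  if batch.size >= BATCH_SIZE
    process_batch.call(batch)
    batch.clear
  end
end

parser.value { |v| row[row_key] = v if row && row_key }

# Feed the document in chunks; this could just as well be the
# EventMachine receive_data loop shown above.
File.open('data.json', 'rb') do |io|
  parser << io.read(8192) until io.eof?
end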
