I have a very large JSON-like file, but it is not using proper JSON syntax: the object keys are not quoted. I'd like to write a script to fix the file, so that I can load it with json.loads
.
I need to match all words followed by a colon and replace them with the quoted word. I think the regex is \w+\s*:
and that I should use re.sub
, but I'm not exactly sure how to do it.
How can I take the following input and get the given output?
# In
{abc : "xyz", cde : {}, fgh : ["hfz"]}
# Out
{"abc" : "xyz", "cde" : {}, "fgh" : ["hfz"]}
# In
{
a: "b",
b: {
c: "d",
d: []
},
e: "f"
}
# Out
{
"a": "b",
"b": {
"c": "d",
"d": []
},
"e": "f"
}
Rather than a potentially fragile regex solution, you can take advantage of the fact that while your log file isn't valid JSON, it is valid YAML. Using the PyYAML library, you can load it into a Python data structure and then write it back out as valid JSON:
import json
import yaml
with open("original.log") as f:
data = yaml.load(f)
with open("jsonified.log", "w") as f:
json.dump(data, f)
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments