Here is the CSV file I am working with:
"A","B","C","D","E","F","G","H","I","J"
"88",18,1,"<Req TID=""34"" ReqType=""MS""><IISO /><CID>2</CID><MemID>0000</MemID><MemPass /><RequestData><S>[REMOVED]</S><Na /><La /><Card>[REMOVED]</Card><Address /><HPhone /><Mail /></ReqData></Req>","<Response T=""3"" RequestType=""MS""><MS><Memb><PrivateMembers /><Ob>0-12-af</Ob><Locator /></Memb><S>[REMOVED]</S><CNum>[REMOVED]</CNum><FName /><LaName /><Address /><HPhone /><Email /><IISO /><MemID /><MemPass /><T /><CID /><T /></MS></Response>",0-JAN-10 12.00.02 AM,27-JUN-15 12.00.00 AM,"26",667,0
"22",22,1,"<Req TID=""45"" ReqType=""MS""><IISO /><CID>4</CID><MemID>0000</MemID><MemPass /><RequestData><S>[REMOVED]</S><Na /><La /><Card>[REMOVED]</Card><Address /><HPhone /><Mail /></ReqData></Req>","<Response T=""10"" RequestType=""MS""><MS><Memb><PrivateMembers /><Ob>0-12-af</Ob><Locator /></Memb><S>[REMOVED]</S><CNum>[REMOVED]</CNum><FName /><LaName /><Address /><HPhone /><Email /><IISO /><MemID /><MemPass /><T /><CID /><T /></MS></Response>",0-JAN-22 12.00.02 AM,27-JUN-22 12.00.00 AM,"26",667,0
"32",22,1,"<Req TID=""15"" ReqType=""MS""><IISO /><CID>45</CID><MemID>0000</MemID><MemPass /><RequestData><S>[REMOVED]</S><Na /><La /><Card>[REMOVED]</Card><Address /><HPhone /><Mail /></ReqData></Req>","<Response T=""10"" RequestType=""MS""><MS><Memb><PrivateMembers /><Ob>0-12-af</Ob><Locator /></Memb><S>[REMOVED]</S><CNum>[REMOVED]</CNum><FName /><LaName /><Address /><HPhone /><Email /><IISO /><MemID /><MemPass /><T /><CID /><T /></MS></Response>",0-JAN-20 12.00.02 AM,27-JUN-34 12.00.00 AM,"26",667,0
So far I have written two generator functions that parse the XML data in column E and convert the XML tags and their text into a Python dictionary. Specifically, the flatten_dict()
function returns an iterable sequence of (key, value) pairs. One can turn this into a list of pairs with list(flatten_dict(root)).
That generates the following list of tuples:
[('ResponseRequestType', 'MS'),
('ResponseT', '10'),
('PrivateMembers', None),
('Ob', '0-12-af'),
('Locator', None),
('S', '[REMOVED]'),
('CNum', '[REMOVED]'),
('FName', None),
('LaName', None),
('Address', None),
('HPhone', None),
('Email', None),
('IISO', None),
('MemID', None),
('MemPass', None),
('T', None),
('CID', None),
('T', None)]
The implementation block, starting with cell In[75], returns the CSV file's columns as keys and, for each column, a list of the values from every row.
A list of pairs L (or an iterable of pairs) can also be transposed with zip(*L).
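As a small illustration of the transposition mentioned above, here is a minimal sketch. The sample pairs are hypothetical and only mimic the shape of the list shown earlier.

```python
# Hypothetical sample: a few (key, value) pairs shaped like the list above.
sample_pairs = [('Ob', '0-12-af'), ('S', '[REMOVED]'), ('FName', None)]

# zip(*sample_pairs) regroups the pairs into one tuple of keys
# and one tuple of values.
keys, values = zip(*sample_pairs)

print(keys)    # ('Ob', 'S', 'FName')
print(values)  # ('0-12-af', '[REMOVED]', None)
```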
My error occurs in cell In[79] when I attempt to construct a dictionary from the generator. I have reviewed several related posts.
I realize that dict() needs a sequence of (key, value) tuples, but I cannot see why I receive this error. I am using Python 3.4 and Jupyter notebooks for my experimentation.
I welcome constructive (emphasis on constructive here) feedback.
# In[37]:
import xml.etree.ElementTree as ET

def flatten_list(aList, prefix=''):
    for i, element in enumerate(aList, 1):
        eprefix = "{}{}".format(prefix, i)
        if element:
            # treat like dict
            if len(element) == 1 or element[0].tag != element[1].tag:
                yield flatten_dict(element, eprefix)
            # treat like list
            elif element[0].tag == element[1].tag:
                yield flatten_list(element, eprefix)
        elif element.text:
            text = element.text.strip()
            if text:
                yield eprefix[:].rstrip('.'), element.text

def flatten_dict(parent_element, prefix=''):
    prefix = prefix + parent_element.tag
    if parent_element.items():
        for k, v in parent_element.items():
            yield prefix + k, v
    for element in parent_element:
        eprefix = element.tag
        if element:
            # treat like dict - we assume that if the first two tags
            # in a series are different, then they are all different.
            if len(element) == 1 or element[0].tag != element[1].tag:
                yield flatten_dict(element, prefix=prefix)
            # treat like list - we assume that if the first two tags
            # in a series are the same, then the rest are the same.
            else:
                # here, we put the list in dictionary; the key is the
                # tag name the list elements all share in common, and
                # the value is the list itself
                yield flatten_list(element, prefix=eprefix)
            # if the tag has attributes, add those to the dict
            if element.items():
                for k, v in element.items():
                    yield eprefix+k
        # this assumes that if you've got an attribute in a tag,
        # you won't be having any text. This may or may not be a
        # good idea -- time will tell. It works for the way we are
        # currently doing XML configuration files...
        elif element.items():
            for k, v in element.items():
                yield eprefix+k
        # finally, if there are no child tags and no attributes, extract
        # the text
        else:
            yield eprefix, element.text
# In[75]:
from glob import iglob
import csv
from collections import OrderedDict
from xml.etree.ElementTree import ParseError
from parsexml2 import flatten_dict, flatten_list
import xml.etree.cElementTree as ElementTree
import csv

headers = set()
results = []
with open('s.csv', 'rU') as infile:
    reader = csv.DictReader(infile)
    data = {}
    for item in reader:
        for header, value in item.items():
            try:
                data[header].append(value)
            except KeyError:
                data[header] = [value]
client_responses = data['E'] #returns a list of values
for client_response in client_responses:
    print('\n' + client_response)
    xml_string = (''.join(client_response)) #may be not necessary
    print(xml_string)
    xml_string = xml_string.replace('&', '')
    xml_string = xml_string.replace('�','')
    print(xml_string)
    try:
        roots = ElementTree.fromstring(xml_string) #serialization step here
    except ET.ParseError:
        print("catastrophic failure")
        continue
# In[79]:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-79-7053ec7639d9> in <module>()
----> 1 dict(flatten_dict(root))
ValueError: dictionary update sequence element #0 has length 3; 2 is required
There are a few issues in your flatten_dict code.
As noted by @PatrickMaupin, you sometimes yield a single value, as in yield eprefix+k. dict() cannot build an entry from that: it treats the bare string as a sequence of characters rather than as a (key, value) pair, so this could be what is causing the error. I believe what you want there is yield eprefix+k, v.
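To make the first point concrete, here is a minimal sketch that reproduces the same kind of error with two toy generators. The names bare_keys and key_value_pairs are hypothetical and stand in for the buggy and the corrected branch of flatten_dict.

```python
def bare_keys():
    # Mimics the buggy branch: only the key string is yielded, no value.
    yield 'abc'

def key_value_pairs():
    # Mimics the suggested fix: a (key, value) tuple is yielded.
    yield 'abc', 'v'

try:
    dict(bare_keys())
except ValueError as exc:
    # dict() tried to unpack the three characters of 'abc' as one pair.
    message = str(exc)
    print(message)

fixed = dict(key_value_pairs())
print(fixed)  # {'abc': 'v'}
```

On CPython the printed message reads "dictionary update sequence element #0 has length 3; 2 is required", the same form as the traceback in the question.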
You also sometimes yield flatten_dict(element, prefix=prefix) (or its flatten_list() counterpart), which does not work either. Let's take a simple example:
>>> def a():
... yield a()
...
>>> for i in a():
... print(i)
...
<generator object a at 0x00593B48>
As you can see, this yielded the generator object itself; it did not iterate over that generator and yield its actual results. For that, you need to iterate over the generator and yield the results manually. Example:
if len(element) == 1 or element[0].tag != element[1].tag:
    for k,v in flatten_dict(element, prefix=prefix):
        yield k,v
Or, from Python 3.3 onwards, you can use the yield from construct to yield the values from another iterable. Example:
if len(element) == 1 or element[0].tag != element[1].tag:
    yield from flatten_dict(element, prefix=prefix)
The same applies to flatten_list().
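Putting both fixes together, the two functions might look roughly like the sketch below. It is not a definitive rewrite: it keeps the structure of the code in the question, yields eprefix + k, v for attributes, and delegates to nested generators with yield from. It also replaces "if element:" with "if len(element):", an assumption on my part made because newer Python versions warn when an Element is tested for truth. The sample XML is a trimmed, hypothetical version of a column E value.

```python
import xml.etree.ElementTree as ET

def flatten_list(elements, prefix=''):
    """Yield (key, value) pairs for children that share one tag."""
    for i, element in enumerate(elements, 1):
        eprefix = "{}{}".format(prefix, i)
        if len(element):  # the element has children
            if len(element) == 1 or element[0].tag != element[1].tag:
                yield from flatten_dict(element, eprefix)
            else:
                yield from flatten_list(element, eprefix)
        elif element.text and element.text.strip():
            yield eprefix.rstrip('.'), element.text

def flatten_dict(parent_element, prefix=''):
    """Yield (key, value) pairs for a parent and its descendants."""
    prefix = prefix + parent_element.tag
    for k, v in parent_element.items():
        yield prefix + k, v
    for element in parent_element:
        eprefix = element.tag
        if len(element):  # the element has children
            if len(element) == 1 or element[0].tag != element[1].tag:
                # delegate instead of yielding the generator object itself
                yield from flatten_dict(element, prefix=prefix)
            else:
                yield from flatten_list(element, prefix=eprefix)
            for k, v in element.items():
                yield eprefix + k, v  # always a (key, value) pair
        elif element.items():
            for k, v in element.items():
                yield eprefix + k, v
        else:
            yield eprefix, element.text

# Trimmed, hypothetical sample modelled on a column E value.
sample = ('<Response T="10" RequestType="MS"><MS><Memb><PrivateMembers />'
          '<Ob>0-12-af</Ob><Locator /></Memb><S>x</S><FName /></MS></Response>')
root = ET.fromstring(sample)
flat_pairs = list(flatten_dict(root))
flat = dict(flat_pairs)
print(flat)
```

Note that dict() keeps only the last value for a repeated key, so tags that occur more than once in a response (such as T in the question's data) collapse into a single entry.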