I have a tuple, named a, with 10 columns. Sample data looks like this:
((12, '22', '32'),
Column-1 Column-2 Column-3 colum-4 Column-5 Colum-6 Colum-7 Week ACCT_YEAR NAME
12 22 32 … … … … 51 2016 Name-1
12 22 32 … … … … 51 2016 Name-2
12 22 32 … … … … 51 2016 Name-3
12 22 32 … … … … 51 2016 Name-4
12 22 32 … … … … 51 2016 Name-5
12 22 32 … … … … 51 2016 Name-6
12 22 32 … … … … 52 2016 Name-7
12 22 32 … … … … 52 2016 Name-8
12 22 32 … … … … 52 2016 Name-9
12 22 32 … … … … 52 2016 Name-10
12 22 32 … … … … 52 2016 Name-11
12 22 32 … … … … 52 2016 Name-12
12 22 32 … … … … 52 2016 Name-13
12 22 32 … … … … 52 2016 Name-14)
I want to convert it to a pandas DataFrame, so I used the following code:
y=pd.DataFrame(list(a))
But y.shape[0] returns 2, and when I print y I see that it contains 2 rows: the second row holds the column headings, and the first row holds data for some of the columns and None for the others. It also has more columns than my tuple a has. Can you please suggest how to do this correctly in Python 3.6?
The output of a.__repr__() is given below:
((12, '22', '32'), Column-1 Column-2 Column-3 Column-4 Column-5 \\\n1101 12 22 32 ... ... \n1102 12 22 32 ... ... \n1103 12 22 32 ... ... \n1104 12 22 32 ... ... \n1105 12 22 32 ... ... \n1106 12 22 32 ... ... \n1107 12 22 32 ... ... \n1108 12 22 32 ... ... \n1109 12 22 32 ... ... \n1110 12 22 32 ... ... \n1111 12 22 32 ... ... \n1112 12 22 32 ... ... \n1113 12 22 32 ... ... \n1114 12 22 32 ... ... \n1115 12 22 32 ... ... \n1116 12 22 32 ... ... \n1117 12 22 32 ... ... \n1118 12 22 32 ... ... \n1119 12 22 32 ... ... \n1120 12 22 32 ... ... \n1121 12 22 32 ... ... \n1122 12 22 32 ... ... \n1123 12 22 32 ... ... \n1124 12 22 32 ... ... \n1125 12 22 32 ... ... \n1126 12 22 32 ... ... \n1127 12 22 32 ... ... \n1128 12 22 32 ... ... \n\n Column-6 Column-7 W20162016k 51CC51_Y201651R \\\n1101 ... ... 515151P325151M51 2016 \n1102 ... ... 51 51 \n1103 ... ... 0000453 2016 \n1104 ... ... 0000512 2016 \n1105 ... ... 51 51 \n1106 ... ... 51 51 \n1107 ... ... 51 51 \n1108 ... ... 51 51 \n1109 ... ... 0000561 2016 \n1110 ... ... 0000871 2016 \n1111 ... ... 51 51 \n1112 ... ... 51 51 \n1113 ... ... 51 51 \n1114 ... ... C51 51 \n1115 ... ... 0000604 51 \n1116 ... ... 51 51 \n1117 ... ... 51 51 \n1118 ... ... 511 51 \n1119 ... ... 51 51 \n1120 ... ... 51 51 \n1121 ... ... 51 51 \n1122 ... ... 51 51 \n1123 ... ... 51 51 \n1124 ... ... 51 51 \n1125 ... ... 51 51 \n1126 ... ... 51 51 \n1127 ... ... 51 51 \n1128 ... ... 5151 51 \n\n N51M2016 \n1101 C \n1102 C \n1103 C \n1104 C \n1105 C \n1106 C \n1107 C \n1108 C \n1109 C \n1110 C \n1111 C \n1112 C \n1113 C \n1114 C \n1115 C \n1116 C \n1117 C \n1118 C \n1119 C \n1120 C \n1121 C \n1122 C \n1123 C \n1124 C \n1125 C \n1126 C \n1127 C \n1128 C )"
The answer depends strongly on the string in your tuple. If what you copied is actually what is in the string, you have to convert that string into something pandas can parse, which is why I added the regex substitution.
import pandas as pd
import io
import re
a = (('12','22','32'),
"""Column-1 Column-2 Column-3 colum-4 Column-5 Colum-6 Colum-7 Week ACCT_YEAR NAME
12 22 32 … … … … 51 2016 Name-1
12 22 32 … … … … 51 2016 Name-2
12 22 32 … … … … 51 2016 Name-3""")
# Replace each run of spaces with a comma so that read_csv can split the columns.
# This substitution is only valid if there are absolutely no spaces in the values.
b = re.sub(r' +', ',', a[1])
y = pd.read_csv(io.StringIO(b))
y
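As a possible alternative to the regex substitution, `pd.read_csv` can split on whitespace directly when given `sep=r"\s+"`. The sketch below assumes, as the code above does, that the second element of the tuple is a plain string and that no value contains a space. The sample tuple is a cut-down stand-in for the data in the question, not the real data.

```python
import io

import pandas as pd

# Cut-down stand-in for the tuple in the question (assumption: a[1] is a string).
a = (
    ("12", "22", "32"),
    """Column-1 Column-2 Week ACCT_YEAR NAME
12 22 51 2016 Name-1
12 22 51 2016 Name-2
12 22 52 2016 Name-3""",
)

# sep=r"\s+" treats any run of whitespace as one column separator,
# so the text does not need to be rewritten into comma-separated form first.
y = pd.read_csv(io.StringIO(a[1]), sep=r"\s+")

print(y.shape)
print(list(y.columns))
```

The same caveat applies as for the regex version: a value that contains a space would be split into two columns.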
NB: this answer assumes that the first value in tuple a is not part of the data that should be read into the DataFrame. That makes this more a question of how to read data stored in a string into a pandas.DataFrame than of how to convert a tuple.
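The output in the question does not make it fully clear whether the second element of the tuple is a string or already a DataFrame, so it may be worth checking the type before parsing. The following is a minimal sketch under that uncertainty. The helper name `second_element_to_frame` is hypothetical, and the whitespace-based parsing again assumes that no value contains a space.

```python
import io

import pandas as pd


def second_element_to_frame(a):
    """Return a[1] as a DataFrame (hypothetical helper, not a pandas API)."""
    data = a[1]
    if isinstance(data, pd.DataFrame):
        # Already tabular: nothing to parse.
        return data
    if isinstance(data, str):
        # Whitespace-separated text: split columns on runs of whitespace.
        return pd.read_csv(io.StringIO(data), sep=r"\s+")
    raise TypeError(f"unsupported type for a[1]: {type(data).__name__}")


# Usage with a small made-up tuple whose second element is text.
sample = (("12", "22", "32"), "Week ACCT_YEAR NAME\n51 2016 Name-1\n52 2016 Name-2")
frame = second_element_to_frame(sample)
print(frame.shape)
```

Running `type(a[1])` on the real tuple shows which branch applies.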