I have a UTF-8 encoded, comma-delimited CSV file in which one of the columns contains multiple commas; however, I need to import that column as a single column for further manipulation. The data frame looks like this:
C1 C2 C3 C4 C5 C6 C7.... C27
1, 2, 3, 4, 5, A,B,C, 2 .......
3, 5, 3, 4, 6, A,B,C,D, 8 .......
1, 2, 2, 5, 8, A,B, 7 .......
3, 5, 3, 4, 6, ABCDE, 8 .......
1, 2, 3, 4, 5, A,B,C,D 2 .......
So column 6 contains some Chinese characters as well as a varying number of commas. Columns 5 and 7 are entirely numeric. The data frame has 27 columns in total. I want the characters in the 6th column treated as the value of a single cell rather than as values for several variables.
I know that you can wrap the field in quotation marks first, but I'm wondering how exactly you would do it. I have more than 1000 files like this that I have to open.
Any suggestions would be appreciated!
A follow-up question: what if the number of columns differs between files? Is it possible to use a regular expression to define the pattern of the columns, get the number of columns first, and then decide how to split them?
My current plan is to get the columns of each file first, save them to a CSV file, and then use the method from the possible duplicate question. But any suggestions for a more efficient way would be appreciated!
Since you know the expected number of columns, you can take the first five fields and the last 21 fields of each row as they are. Whatever is left in the middle is the sixth column, split at its commas, so join those pieces back together. You can just change num_cols for other files.

import csv

filename = 'mycsv.csv'
num_cols = 27  # "The data frame has 27 columns in total"
text_col = 5   # zero-based index of the 6th column, the one with the extra commas
num_after = num_cols - text_col - 1  # 21 columns follow the 6th

with open(filename, newline='', encoding='utf-8') as f:
    reader = csv.reader(f)
    for row in reader:
        if len(row) < num_cols:
            print(f'Row has {len(row)} fields; expected at least {num_cols}.')
            continue
        split_at = len(row) - num_after
        before_sixth = row[:text_col]   # columns 1-5
        after_sixth = row[split_at:]    # the last 21 columns
        # The fields in between are the pieces of the 6th column; rejoin them.
        sixth = ','.join(row[text_col:split_at])
        new_row = before_sixth + [sixth] + after_sixth
        print(new_row)
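For the follow-up about files with different numbers of columns, one option is to read the column count from each file's header instead of hard-coding it. The following is only a minimal sketch under two assumptions that the question does not confirm: every file starts with a header row whose names contain no commas, and the comma-laden column sits at the same position (`text_col`) in every file. The function name `merge_extra_commas` and the sample data are made up for illustration.

```python
import csv
import io


def merge_extra_commas(lines, text_col=5):
    """Yield rows with the over-split column at index text_col rejoined.

    Assumption: the first row is a header with the correct column count.
    """
    reader = csv.reader(lines)
    header = next(reader)
    num_cols = len(header)               # column count taken from the header
    num_after = num_cols - text_col - 1  # columns to the right of the text column
    yield header
    for row in reader:
        if len(row) < num_cols:
            raise ValueError(f'Row has {len(row)} fields; expected at least {num_cols}.')
        split_at = len(row) - num_after
        merged = ','.join(row[text_col:split_at])  # rejoin the pieces
        yield row[:text_col] + [merged] + row[split_at:]


# Small in-memory sample with 7 columns; the 6th one contains commas.
sample = io.StringIO(
    "C1,C2,C3,C4,C5,C6,C7\n"
    "1,2,3,4,5,A,B,C,2\n"
    "3,5,3,4,6,ABCDE,8\n"
)
rows = list(merge_extra_commas(sample))
for row in rows:
    print(row)
```

For real files, pass a file object opened with `open(path, newline='', encoding='utf-8')` in place of the `io.StringIO` sample. If the cleaned rows are written back out with `csv.writer`, any field containing a comma is quoted automatically under the default quoting mode, which gives the quoted form mentioned in the question.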