Python: Read a CSV file in which one column contains multiple commas

RTian

I have a UTF-8 encoded, comma-delimited CSV file in which one of the columns contains multiple commas; however, I need to import that column as a single column for further manipulation. The data frame looks like this:

C1 C2 C3 C4 C5 C6      C7.... C27
1, 2, 3, 4, 5, A,B,C,   2 .......
3, 5, 3, 4, 6, A,B,C,D, 8 .......
1, 2, 2, 5, 8, A,B,     7 .......
3, 5, 3, 4, 6, ABCDE,   8 .......
1, 2, 3, 4, 5, A,B,C,D  2 .......

So column 6 contains Chinese characters as well as a varying number of commas. Columns 5 and 7 are always numeric. The data frame has 27 columns in total. I want the text in the 6th column treated as the value of a single cell rather than as values for several variables.

I know that you can wrap the field in quotation marks first, but I'm wondering how exactly you would do it. I have more than 1000 files like this that I have to open.

Any suggestions would be appreciated!

A follow-up question: what if the number of columns differs between files? Is it possible to use a regular expression to define the pattern of the columns, get the number of columns first, and then decide how to split them?

I am now thinking of getting the columns of each file first and saving them to a CSV file, and then using the method in the possible duplicate question. But any suggestions for a more efficient way would be appreciated!

Doug

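Since you know how many columns each row should have, and only the 6th column can contain extra commas, you can slice each row from both ends. The first 5 fields are columns 1 to 5, the last 21 fields are columns 7 to 27, and whatever is left in the middle belongs to the 6th column, so you join those fields back together with commas. You can just change num_cols for other files.

import csv

filename = 'mycsv.csv'
num_cols = 27  # "The data frame has 27 columns in total"
text_col = 5   # zero-based index of the 6th column, the one with extra commas
num_after = num_cols - text_col - 1  # 21 columns come after the 6th

with open(filename, newline='', encoding='utf-8') as f:
    reader = csv.reader(f)
    for row in reader:
        if len(row) < num_cols:
            print(f'Line {reader.line_num} does not contain at least {num_cols} columns.')
            continue
        before_sixth = row[:text_col]             # columns 1 to 5
        after_sixth = row[len(row) - num_after:]  # columns 7 to 27
        # everything in between is the 6th column, split at its own commas
        sixth = ','.join(row[text_col:len(row) - num_after])
        new_row = before_sixth + [sixth] + after_sixth
        print(new_row)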
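
For the follow-up (different column counts per file, and more than 1000 files), here is a minimal sketch of one way to generalize this. It assumes each file starts with a well-formed header row, so the expected column count can be read from the header instead of being hard-coded, and that the position of the text column is known. The names merge_overflow, repair_csv and repair_file are made up for this example. Writing the rows back out with csv.writer also takes care of the quotation marks: by default it quotes any field that contains a comma.

```python
import csv
import io


def merge_overflow(row, total_cols, text_idx):
    """Rejoin the fields that the comma-containing column was split into."""
    extra = len(row) - total_cols  # number of surplus fields in this row
    if extra < 0:
        raise ValueError(f"expected at least {total_cols} fields, got {len(row)}")
    merged = ",".join(row[text_idx:text_idx + extra + 1])
    return row[:text_idx] + [merged] + row[text_idx + extra + 1:]


def repair_csv(src, dst, text_idx=5):
    """Copy src to dst, merging surplus fields into column text_idx.

    Assumption: the first row is a header with exactly one field per column.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst)  # quotes fields containing commas by default
    header = next(reader)
    writer.writerow(header)
    for row in reader:
        writer.writerow(merge_overflow(row, len(header), text_idx))


def repair_file(src_path, dst_path, text_idx=5):
    """Same as repair_csv, but for files on disk (UTF-8 assumed)."""
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        repair_csv(src, dst, text_idx)


# Small in-memory demo: 4 columns, the 3rd one (index 2) may contain commas.
sample = "C1,C2,C3,C4\n1,2,A,B,C,9\n3,4,ABCDE,8\n"
out = io.StringIO()
repair_csv(io.StringIO(sample), out, text_idx=2)
print(out.getvalue())
```

To process a whole directory you could loop over the files (for example with pathlib.Path.glob) and call repair_file on each one. The repaired files can then be read normally with csv.reader or pandas.read_csv, because the text column is now quoted. This only works when exactly one column per file can contain stray commas; with two such columns there is no way to tell where one ends and the other begins.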
