Python: Read a CSV file in which one column contains multiple commas

RTian

I have a UTF-8 encoded, comma-delimited CSV file in which one of the columns contains multiple commas; however, I need to import it as a single column for further manipulation. The data frame looks like:

C1 C2 C3 C4 C5 C6      C7.... C27
1, 2, 3, 4, 5, A,B,C,   2 .......
3, 5, 3, 4, 6, A,B,C,D, 8 .......
1, 2, 2, 5, 8, A,B,     7 .......
3, 5, 3, 4, 6, ABCDE,   8 .......
1, 2, 3, 4, 5, A,B,C,D  2 .......

So column 6 contains some Chinese characters as well as a varying number of commas. Columns 5 and 7 are all numeric. The data frame has 27 columns in total. I want the characters in the 6th column treated as the value of one cell rather than as values for several variables.

I know that you can quote the field first, but I'm wondering how exactly you would do that. I have more than 1000 files like this that I have to open.

Any suggestions would be appreciated!

A follow-up question: what if the number of columns differs between files? Is it possible to use a regular expression to define the pattern of the columns, get the number of columns first, and then decide how to split them?

I am now thinking of getting the columns of each file first and saving them to a CSV file, and then using the method in the possible duplicate question. But any suggestions for a more efficient way would be appreciated!

Doug

Since you know the desired number of columns, you can take a fixed number of fields from the front of each row and a fixed number from the back; whatever is left in the middle belongs to the 6th column and can be joined back together with commas. Slicing keeps the order and any repeated values, which set() would not. You can just change num_cols for other files.

import csv

filename = 'mycsv.csv'
num_cols = 27  # "The data frame has 27 columns in total"
n_before = 5   # columns 1-5 come before the 6th column
n_after = num_cols - n_before - 1  # 21 columns come after the 6th column

with open(filename, newline='', encoding='utf-8') as f:
    reader = csv.reader(f)
    for row in reader:
        if len(row) < num_cols:
            print(f'The row does not contain at least {num_cols} columns: {row}')
            continue
        before_sixth = row[:n_before]
        after_sixth = row[-n_after:]  # everything after the 6th column
        # the fields in between were split out of the 6th column; rejoin them
        sixth = ','.join(row[n_before:len(row) - n_after])
        new_row = before_sixth + [sixth] + after_sixth
        print(new_row)
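
For the quoting approach and the follow-up about files with different column counts, here is a minimal sketch. It assumes each file starts with a header line that has exactly one name per column, so the column count can be read from the header instead of being hard-coded. The names merge_wide_column, repair_csv and wide_idx are hypothetical helpers made up for this illustration. The output is written with csv.writer, which by default quotes any field that contains a comma.

```python
import csv
import io


def merge_wide_column(row, wide_idx, num_cols):
    """Rejoin the fields that were split out of the column at wide_idx."""
    n_after = num_cols - wide_idx - 1      # number of columns after the wide one
    end = len(row) - n_after               # index where the trailing columns start
    middle = ','.join(row[wide_idx:end])   # put the original commas back
    return row[:wide_idx] + [middle] + row[end:]


def repair_csv(src, dst, wide_idx=5):
    """Copy src to dst so that the wide column is written as one quoted field.

    Assumption: the first line is a header with one name per column.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst)               # default QUOTE_MINIMAL quotes fields with commas
    header = next(reader)
    num_cols = len(header)                 # column count taken from the header
    writer.writerow(header)
    for row in reader:
        if len(row) < num_cols:
            raise ValueError(f'expected at least {num_cols} fields, got {len(row)}')
        writer.writerow(merge_wide_column(row, wide_idx, num_cols))


# Small in-memory demonstration with 3 columns, the 2nd one containing commas.
src = io.StringIO('C1,C2,C3\n1,A,B,C,2\n3,ABCDE,8\n')
dst = io.StringIO()
repair_csv(src, dst, wide_idx=1)
print(dst.getvalue())
```

With real files, src and dst would be opened with newline='' and encoding='utf-8', and the call could be repeated in a loop over all the files. The repaired files can then be read by any standard CSV reader, since the wide column is now a single quoted field.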

Edited at 2021-05-26
