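I'm trying to get some items from a CSV file, but there's a problem: the rows have different numbers of columns, so I can't read it with the pandas.read_csv(filepath) function. I need to open it and then be able to select some of the items shown. The CSV file looks like this: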
"Path","File","Date Acquired","Sample","Misc"
"C:\msdchem\2\DATA\AlbertVirgili\DaniGM\","DGM_CPTIS003 1h.D","25-Mar-19, 11:55:48","DGM_CPTIS003 1h",""
"INT FID1A.CH"
"Mon Mar 25 17:48:31 2019"
"Peak","R.T.","Start","End","PK TY","Height","Area","Pct Max","Pct Total"
1, 2.082, 2.063, 2.189, "BB ",223849319,4951058782,100.00, 46.349
2, 2.317, 2.281, 2.386,"BB",73209942,1093871144, 22.09, 10.240
3, 3.343, 3.224, 3.403,"BB ",93165657,2220621038, 44.85, 20.788
4, 5.538, 5.409, 5.598,"BB",51783798,1975386485, 39.90, 18.492
5, 5.744, 5.693, 5.803,"BB",24084957,360235490, 7.28, 3.372
6, 8.716, 8.676, 8.776,"BB ",8566883, 80973220, 1.64, 0.758
"Path","File","Date Acquired","Sample","Misc"
"C:\msdchem\2\DATA\AlbertVirgili\DaniGM\","DGM_CPTIS003 2h.D","25-Mar-19, 12:15:42","DGM_CPTIS003 2h",""
"INT FID1A.CH"
"Mon Mar 25 12:31:45 2019"
"Peak","R.T.","Start","End","PK TY","Height","Area","Pct Max","Pct Total"
1, 2.083, 2.064, 2.194, "BB ",232382153,5255486688,100.00, 59.673
2, 2.318, 2.282, 2.384,"BB",37916041,587535474, 11.18, 6.671
3, 3.322, 3.241, 3.381,"BB ",67715293,1373898201, 26.14, 15.600
4, 5.509, 5.406, 5.569,"BB",39502747,1227609422, 23.36, 13.939
5, 5.731, 5.689, 5.791,"BB",17799521,230201751, 4.38, 2.614
6, 8.717, 8.674, 8.776,"BB",12367646,132409300, 2.52, 1.503
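What I need is to read the items under the headers Peak, R.T., Start, End, PK TY, ... but I can't, because those rows have a different length from the rows before them (the ones under the headers Path, File, Date Acquired, ...). I can't use skiprows to drop rows 0-3 and 11-14 either, because the number of rows in the part I want to read is not always the same (this type of file is generated by an external program, and I can't change its structure). Is there a way to read only the part of the CSV that sits under the header I want, so that I can select the data I need from those values?
Thanks in advance for your help.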
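You need to do some preprocessing. When you consume data from an external system, it's common to have to handle integration points like this one.
The external file contains structured data: a series of blocks of CSV rows, each preceded by 5 header lines. The last header line contains the CSV column labels.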
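Another way to exploit that structure, sketched here under the assumption that every peak table starts with a line beginning `"Peak"` and every metadata block starts with a line beginning `"Path"`: walk the lines, collect each `"Peak"` header together with the data rows that follow it, and stop collecting at the next `"Path"` line. Nothing depends on a fixed row count, so blocks of any length work. The helper name `read_peak_tables` is hypothetical.

```python
from io import StringIO

import pandas as pd


def read_peak_tables(text):
    """Return one DataFrame per peak table found in `text` (hypothetical helper).

    Assumes a table starts at a line beginning with "Peak" and ends at the
    next line beginning with "Path" (or at the end of the text).
    """
    chunks = []     # one list of lines per peak table, header line first
    current = None  # the chunk being filled, or None inside a metadata block
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        if stripped.startswith('"Peak"'):
            current = [stripped]      # header row of a new peak table
            chunks.append(current)
        elif stripped.startswith('"Path"'):
            current = None            # metadata block: stop collecting
        elif current is not None:
            current.append(stripped)  # data row of the current table
    # skipinitialspace handles the blanks after commas, e.g. `2.189, "BB "`
    return [
        pd.read_csv(StringIO("\n".join(chunk)), skipinitialspace=True)
        for chunk in chunks
    ]
```

With the contents of the file in a string, `read_peak_tables(text)[0]` would be the first peak table and `[1]` the second.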
Read in the contents of the external file (pasted in as a string literal here). Adjust the code below to your needs.
import pandas as pd
from io import StringIO

external_file_content = r'''
"Path","File","Date Acquired","Sample","Misc"
"C:\msdchem\2\DATA\AlbertVirgili\DaniGM\","DGM_CPTIS003 1h.D","25-Mar-19, 11:55:48","DGM_CPTIS003 1h"," "
"INT FID1A.CH"
"Mon Mar 25 17:48:31 2019"
"Peak","R.T.","Start","End","PK TY","Height","Area","Pct Max","Pct Total"
1, 2.082, 2.063, 2.189,"BB ",223849319,4951058782,100.00, 46.349
2, 2.317, 2.281, 2.386,"BB ",73209942,1093871144, 22.09, 10.240
3, 3.343, 3.224, 3.403,"BB ",93165657,2220621038, 44.85, 20.788
4, 5.538, 5.409, 5.598,"BB ",51783798,1975386485, 39.90, 18.492
5, 5.744, 5.693, 5.803,"BB ",24084957,360235490, 7.28, 3.372
6, 8.716, 8.676, 8.776,"BB ",8566883, 80973220, 1.64, 0.758
"Path","File","Date Acquired","Sample","Misc"
"C:\msdchem\2\DATA\AlbertVirgili\DaniGM\","DGM_CPTIS003 2h.D","25-Mar-19, 12:15:42","DGM_CPTIS003 2h"," "
"INT FID1A.CH"
"Mon Mar 25 12:31:45 2019"
"Peak","R.T.","Start","End","PK TY","Height","Area","Pct Max","Pct Total"
1, 2.083, 2.064, 2.194,"BB ",232382153,5255486688,100.00, 59.673
2, 2.318, 2.282, 2.384,"BB ",37916041,587535474, 11.18, 6.671
3, 3.322, 3.241, 3.381,"BB ",67715293,1373898201, 26.14, 15.600
4, 5.509, 5.406, 5.569,"BB ",39502747,1227609422, 23.36, 13.939
5, 5.731, 5.689, 5.791,"BB ",17799521,230201751, 4.38, 2.614
6, 8.717, 8.674, 8.776,"BB ",12367646,132409300, 2.52, 1.503
'''
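The string literal above stands in for the file. To load the real export from disk instead, something along these lines should work; the file name `report.csv`, the helper name `load_external_file` and the default `utf-8` encoding are all assumptions, since the question does not say what encoding the external program writes (a wrong guess typically shows up as a `UnicodeDecodeError`).

```python
from pathlib import Path


def load_external_file(path, encoding="utf-8"):
    """Read the whole export into one string (hypothetical helper).

    The encoding is an assumption; pass the one the external program uses.
    """
    return Path(path).read_text(encoding=encoding)


# Hypothetical file name; replace it with the path of the real export.
# external_file_content = load_external_file("report.csv")
```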
Split the content into separate parts, using the well-defined "Path" header line as the delimiter:
parts = external_file_content.split('"Path","File","Date Acquired","Sample","Misc"')
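Because the content starts with the `"Path"` header, the first element of `parts` is only the text before it (a single newline in the literal above), and each following element holds one block. A minimal sketch of dropping such whitespace-only parts, using a shortened made-up string in place of the real data:

```python
HEADER = '"Path","File","Date Acquired","Sample","Misc"'

# Made-up stand-in for the file: a leading newline, then two blocks.
sample = "\n" + HEADER + "\nblock one\n" + HEADER + "\nblock two\n"
parts = sample.split(HEADER)

# parts[0] is whatever came before the first header (here just "\n").
blocks = [part for part in parts if part.strip()]
print(len(parts), len(blocks))  # prints "3 2"
```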
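Select a single part for further processing into a pandas DataFrame, configuring pd.read_csv to skip 4 rows (the line break left over from the split, then the path, "INT FID1A.CH" and date lines) so that the "Peak" line becomes the header:
df = pd.read_csv(StringIO(parts[1]), skiprows=4)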
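The line above reads only `parts[1]`, the first block. A sketch that loads every block into one DataFrame, assuming the same `"Path",...` delimiter: instead of a fixed `skiprows`, each part is sliced from its `"Peak"` header onward (a swapped-in technique, so the number of metadata lines no longer matters), read with `pd.read_csv`, and the pieces are stacked with `pd.concat`. The names `read_all_blocks`, `BLOCK_HEADER` and `PEAK_HEADER` are hypothetical.

```python
from io import StringIO

import pandas as pd

BLOCK_HEADER = '"Path","File","Date Acquired","Sample","Misc"'
PEAK_HEADER = '"Peak"'  # assumed first field of every peak-table header row


def read_all_blocks(content):
    """Return the peak rows of every block in one DataFrame (hypothetical helper)."""
    frames = []
    for part in content.split(BLOCK_HEADER):
        start = part.find(PEAK_HEADER)
        if start == -1:
            continue  # e.g. the text before the first block header
        # Start reading at the "Peak" header; skipinitialspace strips the
        # blanks that follow some commas in the data rows.
        frames.append(pd.read_csv(StringIO(part[start:]), skipinitialspace=True))
    # keys= adds an outer index level that numbers the blocks 0, 1, ...
    return pd.concat(frames, keys=list(range(len(frames))), names=["block", "row"])
```

`read_all_blocks(external_file_content).loc[1]` would then be the second block. If no `"Peak"` header is found at all, `pd.concat` raises a `ValueError` because it has nothing to concatenate.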
Display the first rows of the DataFrame:
df.head(5)
Peak R.T. Start End PK TY Height Area Pct Max Pct Total
0 1 2.082 2.063 2.189 BB 223849319 4951058782 100.00 46.349
1 2 2.317 2.281 2.386 BB 73209942 1093871144 22.09 10.240
2 3 3.343 3.224 3.403 BB 93165657 2220621038 44.85 20.788
3 4 5.538 5.409 5.598 BB 51783798 1975386485 39.90 18.492
4 5 5.744 5.693 5.803 BB 24084957 360235490 7.28 3.372
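Once a block is in a DataFrame, ordinary pandas selection gives you the values you need. As an illustration only, the sketch below picks the retention time and area of the peaks whose `Pct Total` exceeds 15 (an arbitrary threshold), using a few columns of the first three rows above typed in by hand:

```python
import pandas as pd

# First three rows of the first block (subset of columns), typed in by hand.
df = pd.DataFrame(
    {
        "Peak": [1, 2, 3],
        "R.T.": [2.082, 2.317, 3.343],
        "Area": [4951058782, 1093871144, 2220621038],
        "Pct Total": [46.349, 10.240, 20.788],
    }
)

# The boolean mask selects rows and the list selects columns. Column names
# containing dots or spaces need bracket access, not attribute access.
major = df.loc[df["Pct Total"] > 15, ["R.T.", "Area"]]
print(major)
```

Here `major` keeps peaks 1 and 3, since peak 2 accounts for only 10.240 % of the total.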