pd.read_html changes the number format

Mary

I can't keep the value 1,2,3,4,5,6 from the CCCCCCC column: after pd.read_html, its format changes to 123456. My expected result should keep 1,2,3,4,5,6.

HTML code

html = """<html>
<body>
<div id="MMMMMMMM" class="MMMMMMMMMMM" style="">
        <table class="OOOOOOOO" style="">
            <thead>
                <tr class="PPPPPPPPPP">
                    <td colspan="3" style="font-size:14px;font-weight:bold;" class="QQQQQQQQQQ">AAAAAAA</td>
                </tr>
                <tr class="RRRRRRRRRR">
                    <td>BBBBBB</td>
                    <td>CCCCCCC</td>
                    <td>AAAAAAA</td>
                </tr>
            </thead>
            <tbody>
                    <tr class="SSSSSSSS">
                        <td rowspan="1">DDDDDD</td>
                        <td class="L_LLLL67">1,2,3,4,5,6</td>
                        <td class="L_LLLL67 f_tar">1234.56</td>
                    </tr>
                    <tr class="">
                        <td rowspan="3">EEEEEEEEE</td>
                        <td class="L_LLLL67">1,2,3,4,5,6</td>
                        <td class="L_LLLL67 f_tar">1234.56</td>
                    </tr>
                        <tr class="">
                            <td class="L_LLLL67">1,2,3,4,5,6</td>
                            <td class="L_LLLL67 f_tar">1234.56</td>
                        </tr>
                        <tr class="">
                            <td class="L_LLLL67">1,2,3,4,5,6</td>
                            <td class="L_LLLL67 f_tar">1234.56</td>
                        </tr>
                    <tr class="">
                        <td rowspan="1">FFFFFFFFF</td>
                        <td class="L_LLLL67">1,2,3,4,5,6</td>
                        <td class="L_LLLL67 f_tar">1234.56</td>
                    </tr>
                    <tr class="TTTTTT">
                        <td rowspan="1">GGGGGGGGG</td>
                        <td class="L_LLLL67">1,2,3,4,5,6</td>
                        <td class="L_LLLL67 f_tar">1234.56</td>
                    </tr>
                    <tr class="">
                        <td rowspan="1">HHHHHHHHH</td>
                        <td class="L_LLLL67">1,2,3,4,5,6</td>
                        <td class="L_LLLL67 f_tar">1234.56</td>
                    </tr>
                    <tr class="TTTTTTT">
                        <td rowspan="1">IIIIIIIIII</td>
                        <td class="L_LLLL67">1,2,3,4,5,6</td>
                        <td class="L_LLLL67 f_tar">1234.56</td>
                    </tr>
                    <tr class="">
                        <td rowspan="1">JJJJJJJJ</td>
                        <td class="L_LLLL67">1,2,3,4,5,6</td>
                        <td class="L_LLLL67 f_tar">1234.56</td>
                    </tr>
                    <tr class="TTTTT">
                        <td rowspan="2">KKKKKKKK</td>
                        <td class="L_LLLL67">1/2/3/4/5/6</td>
                        <td class="L_LLLL67 f_tar">1234.56</td>
                    </tr>
                        <tr class="TTTTTT">
                            <td class="L_LLLL67">1/2/3/4/5/6</td>
                            <td class="L_LLLL67 f_tar">1234.56</td>
                        </tr>
            </tbody>
        </table>
</body>
</html>"""

Python code

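from io import StringIO

from bs4 import BeautifulSoup
import pandas as pd

soup = BeautifulSoup(html, 'html.parser')
table = soup.find('div', attrs={'id': 'MMMMMMMM'})
# pandas 2.1+ warns when literal HTML is passed, so wrap the string in StringIO
df_list = pd.read_html(StringIO(str(table)), header=1)
df_list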

Result

 [        BBBBBB      CCCCCCC  AAAAAAA
 0       DDDDDD       123456  1234.56
 1    EEEEEEEEE       123456  1234.56
 2    EEEEEEEEE       123456  1234.56
 3    EEEEEEEEE       123456  1234.56
 4    FFFFFFFFF       123456  1234.56
 5    GGGGGGGGG       123456  1234.56
 6    HHHHHHHHH       123456  1234.56
 7   IIIIIIIIII       123456  1234.56
 8     JJJJJJJJ       123456  1234.56
 9     KKKKKKKK  1/2/3/4/5/6  1234.56
 10    KKKKKKKK  1/2/3/4/5/6  1234.56]

Expected result

 [        BBBBBB      CCCCCCC  AAAAAAA
 0       DDDDDD  1,2,3,4,5,6  1234.56
 1    EEEEEEEEE  1,2,3,4,5,6  1234.56
 2    EEEEEEEEE  1,2,3,4,5,6  1234.56
 3    EEEEEEEEE  1,2,3,4,5,6  1234.56
 4    FFFFFFFFF  1,2,3,4,5,6  1234.56
 5    GGGGGGGGG  1,2,3,4,5,6  1234.56
 6    HHHHHHHHH  1,2,3,4,5,6  1234.56
 7   IIIIIIIIII  1,2,3,4,5,6  1234.56
 8     JJJJJJJJ  1,2,3,4,5,6  1234.56
 9     KKKKKKKK  1/2/3/4/5/6  1234.56
 10    KKKKKKKK  1/2/3/4/5/6  1234.56]
 
NK03

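You need to add the thousands parameter and set it to None. It defaults to ',', so pd.read_html treats the commas in 1,2,3,4,5,6 as thousands separators and removes them, which produces 123456.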

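from io import StringIO

from bs4 import BeautifulSoup
import pandas as pd

soup = BeautifulSoup(html, 'html.parser')
table = soup.find('div', attrs={'id': 'MMMMMMMM'})
# thousands=None stops ',' from being treated as a thousands separator
df_list = pd.read_html(StringIO(str(table)), header=1, thousands=None)
df_list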
Output:
[        BBBBBB      CCCCCCC  AAAAAAA
 0       DDDDDD  1,2,3,4,5,6  1234.56
 1    EEEEEEEEE  1,2,3,4,5,6  1234.56
 2    EEEEEEEEE  1,2,3,4,5,6  1234.56
 3    EEEEEEEEE  1,2,3,4,5,6  1234.56
 4    FFFFFFFFF  1,2,3,4,5,6  1234.56
 5    GGGGGGGGG  1,2,3,4,5,6  1234.56
 6    HHHHHHHHH  1,2,3,4,5,6  1234.56
 7   IIIIIIIIII  1,2,3,4,5,6  1234.56
 8     JJJJJJJJ  1,2,3,4,5,6  1234.56
 9     KKKKKKKK  1/2/3/4/5/6  1234.56
 10    KKKKKKKK  1/2/3/4/5/6  1234.56]
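To illustrate why the default ',' turns 1,2,3,4,5,6 into 123456 while 1/2/3/4/5/6 is left alone, here is a simplified, stdlib-only model of thousands-separator handling. It is a sketch, not pandas' actual implementation: the function name strip_thousands is hypothetical, and it assumes a cell counts as numeric when, after removing the separator, only an optional sign, digits and an optional decimal part remain.

```python
import re
from typing import Optional


def strip_thousands(cell: str, thousands: Optional[str] = ",") -> str:
    """Hypothetical, simplified model of thousands-separator handling.

    If the separator is present and the cell looks numeric once it is
    removed, return the cell without the separator; otherwise return
    the cell unchanged. thousands=None disables the behavior entirely.
    """
    if thousands is None or thousands not in cell:
        return cell
    candidate = cell.replace(thousands, "")
    # Assumed numeric pattern: optional sign, digits, optional decimal part
    if re.fullmatch(r"[+-]?\d+(\.\d+)?", candidate):
        return candidate
    return cell


cells = ["1,2,3,4,5,6", "1/2/3/4/5/6", "1234.56"]
print([strip_thousands(c) for c in cells])        # ['123456', '1/2/3/4/5/6', '1234.56']
print([strip_thousands(c, None) for c in cells])  # ['1,2,3,4,5,6', '1/2/3/4/5/6', '1234.56']
```

In this model, '/' is not the separator, so 1/2/3/4/5/6 never matches and stays as text, which lines up with the KKKKKKKK rows in both outputs above.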
