Say I am trying to build a DataFrame that prints out like a table for checking sectors:
   SectorDescription                SectorCode
0  State Energy Data Systems        SEDS
1  Coal Data                        COAL
2  Petroleum Data                   PET
3  Natural Gas Data                 NG
4  Electricity Data                 ELEC
5  Petroleum Imports Data           PET_IMPORTS
6  Short-Term Energy Outlook Data   STEO
7  International Energy Data        INTL
8  Annual Energy Outlook Data       AEO
Right now I have:
QuandlEIASector = {"State Energy Data Systems": "SEDS",
                   "Coal Data": "COAL",
                   "Petroleum Data": "PET",
                   "Natural Gas Data": "NG",
                   "Electricity Data": "ELEC",
                   "Petroleum Imports Data": "PET_IMPORTS",
                   "Short-Term Energy Outlook Data": "STEO",
                   "International Energy Data": "INTL",
                   "Annual Energy Outlook Data": "AEO"}
What I did was:
import pandas as pd

QuandlEIASectorList = pd.DataFrame()
QuandlEIASectorList['SectorDescription'] = QuandlEIASector.keys()
QuandlEIASectorList['SectorCode'] = QuandlEIASector.values()
QuandlEIASectorList  # displayed as a table in Jupyter/IPython
Is there a quicker way, for example a Python comprehension or one-liner, to assign column values to a pandas DataFrame?
Create a Series from the dictionary and then convert it to a DataFrame:
QuandlEIASectorList = (pd.Series(QuandlEIASector)
                       .rename_axis('SectorDescription')
                       .reset_index(name='SectorCode'))
Similarly, name the Series up front so that reset_index reuses that name for the values column:
QuandlEIASectorList = (pd.Series(QuandlEIASector, name='SectorCode')
                       .rename_axis('SectorDescription')
                       .reset_index())
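A quick check that the two Series variants build the same frame. This is a sketch on a shortened stand-in dictionary; the name `sectors` is hypothetical and stands for QuandlEIASector:

```python
import pandas as pd

# Shortened stand-in for QuandlEIASector (hypothetical name `sectors`)
sectors = {"State Energy Data Systems": "SEDS",
           "Coal Data": "COAL",
           "Petroleum Data": "PET"}

# Variant 1: the values column is named in reset_index(name=...)
a = (pd.Series(sectors)
     .rename_axis('SectorDescription')
     .reset_index(name='SectorCode'))

# Variant 2: the Series carries the name, and reset_index() reuses it
b = (pd.Series(sectors, name='SectorCode')
     .rename_axis('SectorDescription')
     .reset_index())

# Both give the same two columns in dict order, with a default 0..n-1 index
print(a.equals(b))
print(a)
```

rename_axis sets the index name, which reset_index then turns into the SectorDescription column.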
Your own approach can also be written with the DataFrame constructor:
QuandlEIASectorList = pd.DataFrame({'SectorDescription': list(QuandlEIASector.keys()),
                                    'SectorCode': list(QuandlEIASector.values())})
Or:
QuandlEIASectorList = pd.DataFrame(list(QuandlEIASector.items()),
                                   columns=['SectorDescription', 'SectorCode'])
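Since the question explicitly asks about a comprehension, here is a sketch of the items() variant written as a list comprehension over the (key, value) pairs. It is equivalent to passing list(QuandlEIASector.items()) directly. For printing "like a table" it uses DataFrame.to_string(index=False) to hide the row labels. The name `sectors` is a hypothetical, shortened stand-in for QuandlEIASector:

```python
import pandas as pd

# Shortened stand-in for QuandlEIASector (hypothetical name `sectors`)
sectors = {"State Energy Data Systems": "SEDS",
           "Coal Data": "COAL",
           "Petroleum Data": "PET",
           "Natural Gas Data": "NG"}

# One row per (description, code) pair; dicts keep insertion order (Python 3.7+),
# so rows come out in the order the sectors were written
df = pd.DataFrame([(desc, code) for desc, code in sectors.items()],
                  columns=['SectorDescription', 'SectorCode'])

# to_string(index=False) prints the table without the 0..n-1 row labels
print(df.to_string(index=False))
```

The comprehension adds nothing over list(sectors.items()) here. It only pays off if each pair needs transforming on the way in.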
Performance with a dictionary of 10k keys:
import numpy as np

QuandlEIASector = dict(zip([f'{x} data' for x in np.arange(10000)],
                           [f'{x} keys' for x in np.arange(10000)]))
In [73]: %%timeit
...: QuandlEIASectorList = pd.DataFrame()
...: QuandlEIASectorList['SectorDescription'] = QuandlEIASector.keys()
...: QuandlEIASectorList['SectorCode'] = QuandlEIASector.values()
...:
5.94 ms ± 52.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [74]: %%timeit
...: (pd.Series(QuandlEIASector)
...: .rename_axis('SectorDescription')
...: .reset_index(name='SectorCode'))
...:
5.37 ms ± 261 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [75]: %%timeit
...: (pd.Series(QuandlEIASector, name='SectorCode')
...: .rename_axis('SectorDescription')
...: .reset_index())
...:
5.34 ms ± 211 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [76]: %%timeit
...: pd.DataFrame({'SectorDescription':list(QuandlEIASector.keys()),
...: 'SectorCode': list(QuandlEIASector.values())})
...:
2.26 ms ± 20.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [77]: %%timeit
...: pd.DataFrame(list(QuandlEIASector.items()),
...: columns=['SectorDescription','SectorCode'])
...:
3.15 ms ± 38.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
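The %%timeit cells above only run in IPython. Below is a sketch of the same comparison as a plain script using the standard timeit module. The function names are hypothetical. The question's approach is wrapped in list() here as a precaution, which is slightly different from what was timed above. Absolute numbers will depend on the machine and pandas version:

```python
import timeit

import pandas as pd

# Synthetic 10k-key mapping, mirroring the benchmark above (range instead of np.arange)
d = {f'{x} data': f'{x} keys' for x in range(10000)}
COLS = ['SectorDescription', 'SectorCode']


def assign_columns():
    # The question's approach: start empty, then assign each column
    df = pd.DataFrame()
    df[COLS[0]] = list(d.keys())
    df[COLS[1]] = list(d.values())
    return df


def from_series():
    # Named Series -> index becomes the description column
    return pd.Series(d, name=COLS[1]).rename_axis(COLS[0]).reset_index()


def from_dict_of_lists():
    return pd.DataFrame({COLS[0]: list(d.keys()), COLS[1]: list(d.values())})


def from_items():
    return pd.DataFrame(list(d.items()), columns=COLS)


candidates = [assign_columns, from_series, from_dict_of_lists, from_items]

# Check every approach builds the same table before comparing speed
reference = from_items()
for fn in candidates:
    assert fn().equals(reference), fn.__name__

for fn in candidates:
    # Best of 5 repeats of 10 calls each, reported per call in milliseconds
    best = min(timeit.repeat(fn, number=10, repeat=5)) / 10
    print(f'{fn.__name__:>20}: {best * 1e3:.2f} ms')
```

Taking the minimum of the repeats, rather than the mean, reduces noise from other processes.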