Stack/Unstack Pandas Data Frame

Shck Tchamna

I have the following dataset Excel Dummy DataSet that consists of a concatenation of several Tables in an Excel sheet. They are all stacked vertically. The columns of the different tables are same, col_x, col_y, col_t, except for the column Y that varies as the tables change (See the figure below).

I somehow manage to get the output. However, I wonder if there is a simpler|more efficient way to do this?

enter image description here

This is what I have tried

import pandas as pd

# Import Data
path = r"/content/test_data.xlsx"
df_original = pd.read_excel(path, skiprows=4, usecols= range(0,4), header=None)
df_original.columns=["col_x","col_y","col_z","col_t"]

# Begining of the code
mask_col_x = df_original["col_x"] == "col_x"
df_break = df_original[mask_col_x]
index_break_list = df_break.index

range_list = []

for i, val in enumerate(index_break_list):

    if i < len(index_break_list)-1:

        span1 = (val+1,index_break_list[i+1],df_original["col_y"][val])

        range_list.append(span1)

    span1 = (val+1,len(df_original),df_original["col_y"][val])

range_list.append(span1)

dataframe_list = []

for elt in range_list:

    df_sub = df_original.iloc[elt[0]:elt[1]].copy()

    df_sub["Value y"] = elt[2]

    dataframe_list.append(df_sub)
 

new_df = pd.concat(dataframe_list,axis=0)

new_df.to_csv("test_data_result_combined.csv")
jezrael

You can create column Value y by mask with Series.where and then forward filling missing values by ffill and last filter out rows by invert mask by ~:

path = "test_data.xlsx"
df_original = pd.read_excel(path, skiprows=4, usecols= range(0,4), header=None)
df_original.columns=["col_x","col_y","col_z","col_t"]

mask_col_x = df_original["col_x"] == "col_x"
df_original['Value y'] = df_original["col_y"].where(mask_col_x).ffill()

new_df = df_original[~mask_col_x]
print (new_df)
      col_x          col_y          col_z          col_t Value y
1    index1  val_y1_table1  val_z1_table1  val_t1_table1      y1
2    index2  val_y2_table1  val_z2_table1  val_t2_table1      y1
3    index3  val_y3_table1  val_z3_table1  val_t3_table1      y1
4    index4  val_y4_table1  val_z4_table1  val_t4_table1      y1
6    index5  val_y1_table2  val_z1_table2  val_t1_table2      y2
7    index6  val_y2_table2  val_z2_table2  val_t2_table2      y2
8    index7  val_y3_table2  val_z3_table2  val_t3_table2      y2
10   index8  val_y1_table3  val_z1_table3  val_t1_table3      y3
11   index9  val_y2_table3  val_z2_table3  val_t2_table3      y3
13  index10  val_y1_table4  val_z1_table4  val_t1_table4      y4
15  index11  val_y1_table5  val_z1_table5  val_t1_table5      y5
16  index12  val_y2_table5  val_z2_table5  val_t2_table5      y5
17  index13  val_y3_table5  val_z3_table5  val_t3_table5      y5
18  index14  val_y4_table5  val_z4_table5  val_t4_table5      y5
19  index15  val_y5_table5  val_z5_table5  val_t5_table5      y5
20  index16  val_y6_table5  val_z6_table5  val_t6_table5      y5
21  index17  val_y7_table5  val_z7_table5  val_t7_table5      y5

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related