pandas - How do I get the difference between rows of a DataFrame on the same column?

sree Published at Dev


I have two DataFrames that are identical apart from the 'year' and 'value' columns. For each name + month pair, I need the difference between the two years on the 'value' column, and I want to append those differences to the combined data set as extra rows.

import pandas as pd

x1 = {
    "year": ["2018", "2018", "2018", "2018", "2018", "2018"],
    "name": ["abc", "xyz", "pqr", "stu", "hij", "efg"],
    "month": ["Jan-18", "Feb-18", "Mar-18", "Apr-18", "May-18", "Jun-18"],
    "value": [100, 200, 300, 400, 500, 600],
}
x2 = {
    "year": ["2019", "2019", "2019", "2019", "2019", "2019"],
    "name": ["abc", "xyz", "pqr", "stu", "hij", "efg"],
    "month": ["Jan-18", "Feb-18", "Mar-18", "Apr-18", "May-18", "Jun-18"],
    "value": [700, 300, 200, 500, 600, 100],
}
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
y1 = pd.concat([pd.DataFrame(x1), pd.DataFrame(x2)], ignore_index=True)

print(y1)

The result should contain extra rows like rows 12 and 13 below:

    year name   month  value
0   2018  abc  Jan-18    100
1   2018  xyz  Feb-18    200
...
...
6   2019  abc  Jan-18    700
7   2019  xyz  Feb-18    300
...
...
12   diff  abc  Jan-18    (700-100)
13   diff  xyz  Feb-18    (300-200)

BEN_YO

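Sort by year with sort_values, then use groupby on name and month and take diff of the value column. The first row of each group has no earlier row to subtract, so its diff is NaN and dropna removes it.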

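y2 = y1.copy()
# Sort by year so that, within each group, 2018 comes before 2019.
y2 = y2.sort_values('year')
# Per (name, month) group, subtract the previous row's value; the first row of each group becomes NaN.
y2['value'] = y2.groupby(['name', 'month'])['value'].diff()
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
y1 = pd.concat([y1, y2.dropna().assign(year='diff')])
y1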
    year name   month  value
0   2018  abc  Jan-18  100.0
1   2018  xyz  Feb-18  200.0
2   2018  pqr  Mar-18  300.0
3   2018  stu  Apr-18  400.0
4   2018  hij  May-18  500.0
5   2018  efg  Jun-18  600.0
6   2019  abc  Jan-18  700.0
7   2019  xyz  Feb-18  300.0
8   2019  pqr  Mar-18  200.0
9   2019  stu  Apr-18  500.0
10  2019  hij  May-18  600.0
11  2019  efg  Jun-18  100.0
6   diff  abc  Jan-18  600.0
7   diff  xyz  Feb-18  100.0
8   diff  pqr  Mar-18 -100.0
9   diff  stu  Apr-18  100.0
10  diff  hij  May-18  100.0
11  diff  efg  Jun-18 -500.0
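
In the output above the diff rows keep their original index labels (6 to 11), whereas the question asked for them to continue from 12. A minimal self-contained sketch of the same groupby + diff idea, assuming it is acceptable to renumber the index with ignore_index=True; the names diff and result are illustrative and not part of the original answer:

```python
import pandas as pd

x1 = {
    "year": ["2018"] * 6,
    "name": ["abc", "xyz", "pqr", "stu", "hij", "efg"],
    "month": ["Jan-18", "Feb-18", "Mar-18", "Apr-18", "May-18", "Jun-18"],
    "value": [100, 200, 300, 400, 500, 600],
}
x2 = {
    "year": ["2019"] * 6,
    "name": ["abc", "xyz", "pqr", "stu", "hij", "efg"],
    "month": ["Jan-18", "Feb-18", "Mar-18", "Apr-18", "May-18", "Jun-18"],
    "value": [700, 300, 200, 500, 600, 100],
}
y1 = pd.concat([pd.DataFrame(x1), pd.DataFrame(x2)], ignore_index=True)

# Sort by year so each (name, month) group runs 2018 -> 2019, then take the
# row-to-row difference within each group; the first row of a group is NaN.
diff = y1.sort_values("year", kind="stable")
diff = diff.assign(value=diff.groupby(["name", "month"])["value"].diff())

# Keep only rows that have a computed difference and relabel them.
diff = diff.dropna(subset=["value"]).assign(year="diff")

# ignore_index=True renumbers the combined frame 0..17 (assumption: the
# original index labels do not need to be preserved).
result = pd.concat([y1, diff], ignore_index=True)
print(result)
```

With this variant the six diff rows carry the labels 12 to 17, and the value column is still float because the NaN-producing diff step forces a float dtype.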


Edited at 2020-12-5
