How to set a column by slicing values of other columns

Mario Diez Martínez

I have a dataframe with the ruling party of the US, but the column is in the format yyyy-yyyy: 'democrat', and I want my final dataframe to look like yyyy: 'democrat'. Instead of the range of years each party ruled, I want one column with every year between 1945 and 2022 and another column containing the string 'democrat' or 'republican'.


This is what I've been trying:

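import pandas as pd

us_gov = pd.read_csv('/Users/elgasko/Documents/NUMERO ARMAS NUCLEARES/presidents.csv')
us_gov = us_gov.iloc[31:,1:4]
us_gov=us_gov[['Years In Office','Party']]
# sort_values returns a new frame, so the result has to be assigned
us_gov = us_gov.sort_values(by=['Years In Office'])
years=range(1945,2023)
us_gov_def=pd.DataFrame(years, columns=['Year'])
us_gov_def.set_index('Year', drop=True, append=False, inplace=True, verify_integrity=False)
# None gives an object column, so party names can be stored in it
us_gov_def.insert(0, column='Party', value=None)

for i in range(len(us_gov)):
    # 'Years In Office' has the form 'yyyy-yyyy'
    string=us_gov.iloc[i]['Years In Office']
    inicio=string[0:4]
    inicio=int(float(inicio))
    final=string[5:9]
    final=int(float(final))
    for j in us_gov_def.index :
        # end year excluded: in a handover year the next row sets the party
        if j in range(inicio,final):
            # assign with .loc[row, column]; use .iloc because us_gov's labels
            # no longer match positions after iloc[31:] and sorting
            us_gov_def.loc[j, 'Party'] = us_gov['Party'].iloc[i]

#https://github.com/awhstin/Dataset-List/blob/master/presidents.csv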
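A sketch of the same idea without the inner loop over years: a boolean mask on the year index selects every year of a period at once. It assumes each 'Years In Office' value has the form 'yyyy-yyyy'. To stay self-contained, it uses the three sample rows from the answer instead of presidents.csv, and the range 1945-1960 instead of 1945-2022. As with range(inicio, final), the end year is excluded, so in a handover year the incoming party is kept.

```python
import pandas as pd

# Sample rows standing in for presidents.csv (same rows as in the answer)
us_gov = pd.DataFrame({
    'Years In Office': ['1933-1945', '1945-1953', '1953-1961'],
    'Party': ['Democratic', 'Democratic', 'Republican'],
})

# One row per year, indexed by year; object dtype so strings can be stored
us_gov_def = pd.DataFrame(index=pd.Index(range(1945, 1961), name='Year'),
                          columns=['Party'], dtype=object)

for period, party in zip(us_gov['Years In Office'], us_gov['Party']):
    inicio, final = (int(y) for y in period.split('-'))
    # End year excluded, like range(inicio, final) in the loop above
    in_period = (us_gov_def.index >= inicio) & (us_gov_def.index < final)
    us_gov_def.loc[in_period, 'Party'] = party

print(us_gov_def)
```

The mask makes the exclusive end explicit. A `.loc` label slice such as `us_gov_def.loc[inicio:final, 'Party']` would instead include both ends.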
ouroboros1

One solution could be as follows:

import pandas as pd

data = {'Years In Office': ['1933-1945','1945-1953','1953-1961'],
      'Party': ['Democratic', 'Democratic', 'Republican']}

df = pd.DataFrame(data)

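# Split 'yyyy-yyyy' into its two years, then turn each pair into an
# inclusive range of years (one range per original row)
df['Years In Office'] = df['Years In Office'].str.split('-').explode()\
    .groupby(level=0).apply(lambda x: range(x.astype(int).min(), 
                                            x.astype(int).max()+1))
# One row per year; ignore_index renumbers the rows 0..n-1
df = df.explode('Years In Office', ignore_index=True)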

print(df)

   Years In Office       Party
0             1933  Democratic
1             1934  Democratic
2             1935  Democratic
3             1936  Democratic
4             1937  Democratic
5             1938  Democratic
6             1939  Democratic
7             1940  Democratic
8             1941  Democratic
9             1942  Democratic
10            1943  Democratic
11            1944  Democratic
12            1945  Democratic
13            1945  Democratic
14            1946  Democratic
15            1947  Democratic
16            1948  Democratic
17            1949  Democratic
18            1950  Democratic
19            1951  Democratic
20            1952  Democratic
21            1953  Democratic
22            1953  Republican
23            1954  Republican
24            1955  Republican
25            1956  Republican
26            1957  Republican
27            1958  Republican
28            1959  Republican
29            1960  Republican
30            1961  Republican
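If the groupby/apply step is hard to follow, you can build the same one-row-per-year frame with a plain list comprehension. It turns each 'start-end' string into a list of years before exploding. Here is a minimal sketch on the same sample data. Like the code above, it includes both end years, so it yields the same duplicated years.

```python
import pandas as pd

data = {'Years In Office': ['1933-1945', '1945-1953', '1953-1961'],
        'Party': ['Democratic', 'Democratic', 'Republican']}
df = pd.DataFrame(data)

# Split 'yyyy-yyyy' into start and end and list every year in between (inclusive)
df['Years In Office'] = [list(range(int(start), int(end) + 1))
                         for start, end in df['Years In Office'].str.split('-')]

# One row per year; ignore_index renumbers the rows 0..n-1
df = df.explode('Years In Office', ignore_index=True)
df['Years In Office'] = df['Years In Office'].astype(int)

print(df)
```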

Notice that you will end up with duplicates:

print(df[df['Years In Office'].duplicated(keep=False)])

   Years In Office       Party
12            1945  Democratic
13            1945  Democratic
21            1953  Democratic
22            1953  Republican

This is because consecutive periods share a year: one period's end year is the next period's start year (e.g. '1933-1945', '1945-1953'). If you don't want this, you could add:

df = df.groupby('Years In Office', as_index=False).agg({'Party':', '.join})
print(df.loc[df['Years In Office'].isin([1945, 1953])])

   Years In Office                   Party
12            1945  Democratic, Democratic
20            1953  Democratic, Republican

Or, starting again from the exploded df rather than the aggregated one, you could drop only the duplicate rows where the ruling party does not change, e.g.:

df = df[~df.duplicated()].reset_index(drop=True)
print(df.loc[df['Years In Office'].isin([1945, 1953])])

   Years In Office       Party
12            1945  Democratic
20            1953  Democratic
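The question asks for exactly one party per year over 1945-2022. Another option is to keep, for each year, the party listed last, and then filter the year range. When the rows are in chronological order, the last-listed party is the one taking office in a handover year. That tie-breaking rule is an assumption, not part of the answer. Here is a minimal sketch on the same sample data:

```python
import pandas as pd

data = {'Years In Office': ['1933-1945', '1945-1953', '1953-1961'],
        'Party': ['Democratic', 'Democratic', 'Republican']}
df = pd.DataFrame(data)

# One row per year, both end years included
df['Year'] = [list(range(int(start), int(end) + 1))
              for start, end in df['Years In Office'].str.split('-')]
df = df.explode('Year', ignore_index=True).astype({'Year': int})

# Assumed rule: where two periods share a year, keep the later row,
# i.e. the party taking office that year (input rows are chronological)
df = df.drop_duplicates(subset='Year', keep='last')

# Restrict to the requested years and index by year
result = df.loc[df['Year'].between(1945, 2022), ['Year', 'Party']].set_index('Year')
print(result)
```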
21            1953  Republican

