How to apply a function that splits multiple numbers to the fields of a column in a dataframe in Python?

Elisa L.

I need to apply a function that splits multiple numbers from the fields of a dataframe.

In this dataframe there a all the kids' measurements that are needed for a school: Name, Height, Weight, and Unique Code, and their dream career.

  • The name is only formed of alpha-characters. But some kids might have both first name and middle name. (e.g. Vivien Ester)
  • The height is known to be >= 100 cm for every child.
  • The weight is known to be < 70 kg for every child.
  • The unique code is known to be any number, but it is associated with the letters 'AX', for every child. But the AX may not always be stick to the number (e.g. 7771AX), it might be a space next to it. (e.g. 100 AX)
  • Every kid has its dream career

They could appear in any order, but they always follow the rules from above. However, for some kids some measurements could not appear (e.g.: height or unique code or both are missing or all are missing).

So the dataframe is this:

data = { 'Dream Career': ['Scientist', 'Astronaut', 'Software Engineer', 'Doctor', 'Fashion Designer', 'Teacher', 'Architect'],
    'Measurements': ['Rachel 24.3 100.25 100 AX', '100.5 Tuuli 30.1', 'Michael 28.0 7771AX 113.75', 'Vivien Ester 40AX 115.20', 'Oliver 40.5', 'Julien 35.1 678 AX 111.1', 'Bee 20.0 100.80 88AX']
       }

df = pd.DataFrame (data, columns = ['Dream Career','Measurements'])

And it looks like this:

        Dream Career                Measurements
0          Scientist   Rachel 24.3 100.25 100 AX
1          Astronaut            100.5 Tuuli 30.1
2  Software Engineer  Michael 28.0 7771AX 113.75
3             Doctor    Vivien Ester 40AX 115.20
4   Fashion Designer                 Oliver 40.5
5            Teacher    Julien 35.1 678 AX 111.1
6          Architect        Bee 20.0 100.80 88AX

I try to split all of these measurements into different columns, based on the specified rules.

So the final dataframe should look like this:

       Dream Career         Names  Weight  Height Unique Code
0          Scientist       Rachael    24.3  100.25       100AX
1          Astronaut         Tuuli    30.1  100.50         NaN
2  Software Engineer       Michael    28.0  113.75      7771AX
3             Doctor  Vivien Ester     NaN  115.20        40AX
4   Fashion Designer        Oliver    40.5     NaN         NaN
5            Teacher        Julien    35.1  111.10       678AX
6          Architect           Bee    10.0  100.80        88AX

I tried this code and it works very well, but only on single strings. And I need to do this while in the dataframe and still keep every's kid's associate dream career (so the order is not lost).

num_rx = r'[-+]?\.?\d+(?:,\d{3})*\.?\d*(?:[eE][-+]?\d+)?'
def get_weight_height(s):
    nums = re.findall(num_rx, s)
    height = np.nan
    weight = np.nan
    if (len(nums) == 0):
        height = np.nan
        weight = np.nan
    elif (len(nums) == 1):
        if float(nums[0]) >= 100:
            height = nums[0]
            weight = np.nan
        else:
            weight = nums[0]
            height = np.nan
    elif (len(nums) == 2):
        if float(nums[0]) >= 100:
            height = nums[0]
            weight = nums[1]
        else:
            height = nums[1]
            weight = nums[0]
    return height, weight

class_code = {'Small': 'AX', 'Mid': 'BX', 'High': 'CX'}

def hasNumbers(inputString):
    return any(char.isdigit() for char in inputString)

def extract_measurements(string, substring_name):
    height = np.nan
    weight = np.nan
    unique_code = np.nan
    name = ''
    if hasNumbers(string):
        num_rx = r'[-+]?\.?\d+(?:,\d{3})*\.?\d*(?:[eE][-+]?\d+)?'
        nums = re.findall(num_rx, string)
        if (substring_name in string):
            special_match = re.search(rf'{num_rx}(?=\s*{substring_name}\b)', string)
            if special_match:
                unique_code = special_match.group()
                string = string.replace(unique_code, '')
                unique_code = unique_code + substring_name
            if len(nums) >= 2 & len(nums) <= 3:
                height, weight = get_weight_height(string)
        else:
            height, weight = get_weight_height(string)
    name = " ".join(re.findall("[a-zA-Z]+", string))
    name = name.replace(substring_name,'')
    return format(float(height), '.2f'), float(weight), unique_code, name

And I apply it like this:

string = 'Anya 101.30 23 4546AX'
height, weight, unique_code, name = extract_measurements(string, class_code['Small'])        
print( 'name is: ', name, '\nh is: ', height, '\nw is: ', weight, '\nunique code is: ', unique_code)

The results are very good.

I tried to apply the function on the dataframe, but I don't know how, I tried this as I got inspired from this and this and this... but they are all different than my problem:

df['height'], df['weight'], df['unique_code'], df['name'] = extract_measurements(df['Measurements'], class_code['Small']) 

I cannot figure out how to apply it on my dataframe. Please help me.

I am at the very beginning, I highly appreciate all the help if you could possibly help me!

piterbarg

Use apply for rows (axis=1) and choose 'expand' option. Then rename columns and concat to the original df:

pd.concat([df,(df.apply(lambda row : extract_measurements(row['Measurements'], class_code['Small']), axis = 1, result_type='expand')
   .rename(columns = {0:'height', 1:'weight', 2:'unique_code', 3:'name'})
)], axis = 1)

output:

    Dream Career       Measurements                  height    weight  unique_code    name
--  -----------------  --------------------------  --------  --------  -------------  ------------
 0  Scientist          Rachel 24.3 100.25 100 AX        100       100  100AX          Rachel
 1  Astronaut          100.5 Tuuli 30.1                 100       100  nan            Tuuli
 2  Software Engineer  Michael 28.0 7771AX 113.75       100       100  7771AX         Michael
 3  Doctor             Vivien Ester 40AX 115.20         100       100  40AX           Vivien Ester
 4  Fashion Designer   Oliver 40.5                      100       100  nan            Oliver
 5  Teacher            Julien 35.1 678 AX 111.1         100       100  678AX          Julien
 6  Architect          Bee 20.0 100.80 88AX             100       100  88AX           Bee

(note I stubbed def get_weight_height(string) function because your coded did not include it, to always return 100,100)

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

Apply a function with two fields of dataframe

pandas DataFrame, how to apply function to a specific column?

Apply Python function to one pandas column and apply the output to multiple columns

Why Python splits read function into multiple syscalls?

How to apply a function to a column of a Spark DataFrame?

Apply function to a column in dataframe - python

How to apply a function with multiple conditions to pandas dataframe?

Pandas Dataframe Multiple column headers, apply function

How do you apply a function to a dataframe column

Update Multiple Fields of Dataframe from Function return values using Apply, Python 3.6

How to apply function on the basis of column condition in a dataframe

How to apply method/function to a dataframe in python

How to apply regex for multiple phrases on a dataframe column?

How to apply lambda function to column of dataframe correctly?

Python dataframe apply function groupby and per column

How to apply function to multiple fields in jq

Pandas DataFrame - How to apply Lambda Function on multiple columns and create a new column

python - pass dataframe column as argument in apply function

How to apply multiple column filter(String) in python

How to apply Window function to multiple columns in DataFrame

How to apply my function to DataFrame column?

How to apply an if between function on a column in pandas' DataFrame

How to apply a function to a DataFrame column?

How to apply a custom function to the same dataframe multiple times in python?

How to apply a function with multiple arguments and create a dataframe?

How to apply a function to a dataframe column in R?

how to apply lme function to each column of dataframe?

How to apply a function to a column(text) in a dataframe(R)?

How to apply multiple functions to same column in Python?