How to apply a function to two columns of Pandas dataframe


Suppose I have a df with columns 'ID', 'col_1', and 'col_2', and I define a function:

f = lambda x, y : my_function_expression.

Now I want to apply f to the two columns 'col_1' and 'col_2' of df to compute a new column 'col_3' element-wise, something like:

df['col_3'] = df[['col_1','col_2']].apply(f)  
# Pandas gives : TypeError: ('<lambda>() takes exactly 2 arguments (1 given)'

How can I do this?

Detailed sample added below:

import pandas as pd

df = pd.DataFrame({'ID': ['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2': [1, 4, 5]})
mylist = ['a', 'b', 'c', 'd', 'e', 'f']

def get_sublist(sta, end):
    return mylist[sta:end+1]

#df['col_3'] = df[['col_1','col_2']].apply(get_sublist,axis=1)

I expect the above to produce the following df:

  ID  col_1  col_2            col_3
0  1      0      1       ['a', 'b']
1  2      2      4  ['c', 'd', 'e']
2  3      3      5  ['d', 'e', 'f']


There is a clean, one-line way of doing this in Pandas:

df['col_3'] = df.apply(lambda x: f(x.col_1, x.col_2), axis=1)

This allows f to be a user-defined function with multiple input values, and uses (safe) column names rather than (unsafe) numeric indices to access the columns.
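
To make the mechanism concrete, here is a minimal sketch of what apply does with axis=1: it calls the given function once per row and passes that row as a Series indexed by column name, which is why row.col_1 and row.col_2 resolve to scalar values. The function add is a hypothetical stand-in for f, and per_row and seen_types are illustration-only names, not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2': [1, 4, 5]})

# Hypothetical stand-in for f: any function of two scalar arguments.
def add(x, y):
    return x + y

# Record the type of the object that apply hands over on each call.
seen_types = []

def per_row(row):
    seen_types.append(type(row).__name__)
    # row is indexed by column name, so attribute access yields one value per column.
    return add(row.col_1, row.col_2)

df['col_3'] = df.apply(per_row, axis=1)

print(set(seen_types))
print(df['col_3'].tolist())
```

Without axis=1, apply would instead pass each whole column as a single argument, which is consistent with the "takes exactly 2 arguments (1 given)" error in the question.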

Example with data (based on original question):

import pandas as pd

df = pd.DataFrame({'ID': ['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2': [1, 4, 5]})
mylist = ['a', 'b', 'c', 'd', 'e', 'f']

def get_sublist(sta, end):
    return mylist[sta:end+1]

df['col_3'] = df.apply(lambda x: get_sublist(x.col_1, x.col_2), axis=1)

Output of print(df):

  ID  col_1  col_2      col_3
0  1      0      1     [a, b]
1  2      2      4  [c, d, e]
2  3      3      5  [d, e, f]

If your column names contain spaces or share a name with an existing dataframe attribute, you can index with square brackets:

df['col_3'] = df.apply(lambda x: f(x['col 1'], x['col 2']), axis=1)
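
As a small illustration of the name-collision case, the sketch below uses two hypothetical column names chosen for the example: 'size', which collides with the existing size property on a row, and 'col 2', which contains a space. Attribute access on 'size' silently returns the number of entries in the row rather than the column value, while bracket access returns the stored value.

```python
import pandas as pd

# Hypothetical column names: 'size' shadows an existing property,
# and 'col 2' cannot be written as an attribute at all.
df = pd.DataFrame({'size': [10, 20, 30], 'col 2': [1, 2, 3]})

# Attribute access picks up the built-in property: the count of entries in the row.
by_attribute = df.apply(lambda row: row.size, axis=1)

# Bracket access looks up the label and returns the actual column values.
by_bracket = df.apply(lambda row: row['size'] + row['col 2'], axis=1)

print(by_attribute.tolist())
print(by_bracket.tolist())
```

Because the attribute form fails silently here instead of raising an error, bracket indexing is the more defensive choice whenever column names are not fully under your control.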

