Apply a Python function to one pandas column and assign the output to multiple columns

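Hello community,

I have read a lot of answers and blog posts, but I can't figure out what simple thing I am missing! I am using a 'conditions' function to define all the conditions and apply it to one DataFrame column. If a condition is met, it should create/update two new DataFrame columns, 'cat' and 'subcat'.

It would be a great help if you could help me out here!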

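import pandas as pd

# 'desc' needs five entries to match 'remark'; values taken from the table below
data = {'remark': ['NA', 'NA', 'Category1', 'Category2', 'Category3'],
        'desc': ['Present', 'Present', 'NA', 'Present', 'NA']}

df = pd.DataFrame(data)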

The DataFrame looks like this:

          remark       desc
0         NA           Present      
1         NA           Present        
2         Category1    NA                   
3         Category2    Present                   
4         Category3    NA

I wrote a function that defines the conditions, as follows:

def conditions(s):
    # Map one remark value to a (cat, subcat) pair
    if s == 'Category1':
        x = 'insufficient'
        y = 'resolution'
    elif s == 'Category2':
        x = 'insufficient'
        y = 'information'
    elif s == 'Category3':
        x = 'Duplicate'
        y = 'ID repeated'
    else:
        x = 'NA'
        y = 'NA'

    return (x, y)

I have tried several ways of applying the above function to the DataFrame column, but with no luck.

df[['cat','subcat']] = df['remark'].apply(lambda x: pd.Series([conditions(df)[0],conditions(df)[1]]))

My expected DataFrame should look like this:

          remark       desc        cat           subcat
0         NA           Present     NA            NA      
1         NA           Present     NA            NA
2         Category1    NA          insufficient  resolution         
3         Category2    Present     insufficient  information              
4         Category3    NA          Duplicate     ID repeated

Many thanks.

Solution

One way to solve this is with a list comprehension:

df[['cat','subcat']] = [("insufficient","resolution")  if word == "Category1" else 
                         ("insufficient","information") if word == "Category2" else
                         ("Duplicate","ID repeated")    if word == "Category3" else 
                         ("NA","NA")
                         for word in df.remark]

  remark      desc               cat         subcat
0   NA        Present          NA              NA
1   NA        Present          NA              NA
2   Category1   NA          insufficient    resolution
3   Category2   Present     insufficient    information
4   Category3   NA          Duplicate       ID repeated

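@dm2's answer shows how to get this result with your function. The first apply(conditions) creates a Series containing tuples; the second apply(pd.Series) turns each tuple into separate columns, forming a DataFrame that can then be assigned to cat and subcat.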

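The reason I suggest a list comprehension is that you are working with strings, and in pandas it is often faster to process strings in vanilla Python. Also, with the list comprehension the processing is done in one pass: you don't need to apply the conditions function and then call pd.Series. That should speed things up. Testing will confirm or debunk this.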
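One way to run that test is sketched below with the standard timeit module. The benchmark data (the five sample remarks repeated 200 times) and the two helper function names are assumptions made for illustration, and the timings will vary with the machine and the pandas version, so no expected numbers are given.

```python
import timeit

import pandas as pd


def conditions(s):
    """Return a (cat, subcat) tuple for one remark value."""
    if s == 'Category1':
        return ('insufficient', 'resolution')
    elif s == 'Category2':
        return ('insufficient', 'information')
    elif s == 'Category3':
        return ('Duplicate', 'ID repeated')
    return ('NA', 'NA')


# Hypothetical benchmark data: the five sample remarks repeated 200 times.
remarks = ['NA', 'NA', 'Category1', 'Category2', 'Category3'] * 200
df = pd.DataFrame({'remark': remarks})


def with_comprehension():
    # Build all (cat, subcat) tuples in one pass over the column.
    out = df.copy()
    out[['cat', 'subcat']] = [conditions(word) for word in out['remark']]
    return out


def with_double_apply():
    # Build a Series of tuples, then expand each tuple with pd.Series.
    out = df.copy()
    out[['cat', 'subcat']] = out['remark'].apply(conditions).apply(pd.Series)
    return out


# Both approaches must agree before their timings are worth comparing.
assert (with_comprehension().to_numpy().tolist()
        == with_double_apply().to_numpy().tolist())

for func in (with_comprehension, with_double_apply):
    seconds = timeit.timeit(func, number=3)
    print(f'{func.__name__}: {seconds:.3f}s for 3 runs')
```

Increasing the repeat count gives a better picture of how each approach scales on a large DataFrame.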

You can do it like this:

df[['cat','subcat']] = df['remark'].apply(conditions).apply(pd.Series)

Output:

  remark      desc               cat         subcat
0   NA        Present          NA              NA
1   NA        Present          NA              NA
2   Category1   NA          insufficient    resolution
3   Category2   Present     insufficient    information
4   Category3   NA          Duplicate       ID repeated

Edit: This may be the simpler way to apply a function you already have, but if you have a huge DataFrame, look at the list comprehension answer by @sammywemmy for faster code.
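To make the two steps visible, here is a minimal sketch that keeps the intermediate Series of tuples in its own variable. The names pairs and expanded are made up for illustration, and the second step builds a DataFrame from the list of tuples, which is a swapped-in alternative to .apply(pd.Series) rather than the method used in the answer.

```python
import pandas as pd


def conditions(s):
    """Return a (cat, subcat) tuple for one remark value."""
    if s == 'Category1':
        return ('insufficient', 'resolution')
    elif s == 'Category2':
        return ('insufficient', 'information')
    elif s == 'Category3':
        return ('Duplicate', 'ID repeated')
    return ('NA', 'NA')


df = pd.DataFrame({
    'remark': ['NA', 'NA', 'Category1', 'Category2', 'Category3'],
    'desc': ['Present', 'Present', 'NA', 'Present', 'NA'],
})

# Step 1: one tuple per row, held in a single Series.
pairs = df['remark'].apply(conditions)

# Step 2: spread each tuple across two named columns, keeping df's index
# so the rows line up on assignment.
expanded = pd.DataFrame(pairs.tolist(), index=df.index,
                        columns=['cat', 'subcat'])

df[['cat', 'subcat']] = expanded
print(df)
```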

You are passing the whole DataFrame (df) to conditions; you only need to pass the lambda variable (x).

df[['cat','subcat']] = df['remark'].apply(lambda x: pd.Series([*conditions(x)]))

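The * in front of an iterable unpacks it, so you don't need to call the same function twice to extract the outputs. Maybe the compiler can optimize the double call away, but I don't think so...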
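A quick way to check the point about calling the function twice is to count the calls. The sketch below adds a global counter to conditions purely for illustration; the counter and the variable names are not part of the original code.

```python
import pandas as pd

calls = 0  # illustration only: counts how often conditions() runs


def conditions(s):
    """Return a (cat, subcat) tuple and record that the function ran."""
    global calls
    calls += 1
    if s == 'Category1':
        return ('insufficient', 'resolution')
    elif s == 'Category2':
        return ('insufficient', 'information')
    elif s == 'Category3':
        return ('Duplicate', 'ID repeated')
    return ('NA', 'NA')


remarks = pd.Series(['NA', 'Category1', 'Category3'])

# Indexing the result twice runs the function twice per row.
calls = 0
twice = remarks.apply(lambda x: pd.Series([conditions(x)[0], conditions(x)[1]]))
calls_when_indexing = calls

# Unpacking with * runs it once per row and yields the same values.
calls = 0
once = remarks.apply(lambda x: pd.Series([*conditions(x)]))
calls_when_unpacking = calls

print(calls_when_indexing, calls_when_unpacking)
print(once.equals(twice))
```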

You can also do this with a mapping dictionary.
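The mapping-dictionary idea could look something like the sketch below. It assumes pandas' Series.map for the lookup, which the answer does not spell out, and the two dictionary names (cat_map, subcat_map) are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'remark': ['NA', 'NA', 'Category1', 'Category2', 'Category3'],
    'desc': ['Present', 'Present', 'NA', 'Present', 'NA'],
})

# Hypothetical lookup tables: remark value -> cat, and remark value -> subcat.
cat_map = {
    'Category1': 'insufficient',
    'Category2': 'insufficient',
    'Category3': 'Duplicate',
}
subcat_map = {
    'Category1': 'resolution',
    'Category2': 'information',
    'Category3': 'ID repeated',
}

# Series.map returns NaN for remarks missing from the dictionary,
# so fillna supplies the 'NA' default used in the question.
df['cat'] = df['remark'].map(cat_map).fillna('NA')
df['subcat'] = df['remark'].map(subcat_map).fillna('NA')
print(df)
```

Adding a new category then only means adding one entry to each dictionary instead of another elif branch.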

