Plotly: How to insert a categorical variable into a parallel coordinates plot?
So far, I have tried this:
import pandas as pd
import plotly.graph_objects as go
df = pd.read_csv('https://raw.githubusercontent.com/vyaduvanshi/helper-files/master/parallel_coordinates.csv')
dimensions = list([
    dict(range=[df['gm_Retail & Recreation'].min(), df['gm_Retail & Recreation'].max()], label='Retail & Recreation', values=df['gm_Retail & Recreation']),
    dict(range=[df['gm_Grocery & Pharmacy'].min(), df['gm_Grocery & Pharmacy'].max()], label='Grocery & Pharmacy', values=df['gm_Grocery & Pharmacy']),
    dict(range=[df['gm_Parks'].min(), df['gm_Parks'].max()], label='Parks', values=df['gm_Parks']),
    dict(range=[df['gm_Transit Stations'].min(), df['gm_Transit Stations'].max()], label='Transit Stations', values=df['gm_Transit Stations']),
    dict(range=[df['gm_Workplaces'].min(), df['gm_Workplaces'].max()], label='Workplaces', values=df['gm_Workplaces']),
    dict(range=[df['gm_Residential'].min(), df['gm_Residential'].max()], label='Residential', values=df['gm_Residential']),
    # dict(range=[0, len(df)], values=df['country'], label='Country'),
])
fig = go.Figure(data=go.Parcoords(line = dict(color = '#ff0000',colorscale = 'Electric',showscale = True,cmin = -4000,cmax = -100),dimensions=dimensions))
fig.show()
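It returns this plot:
What I want to do is map these lines to the last column, the country column (categorical). My attempt is commented out in the code snippet. I am trying to work out how to link these values to the categorical countries. Maybe the index would be one way? I would also like to color-code the lines by country, so I assume I need a list of distinct colors. I am stuck and would appreciate some help.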
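Solution
In your case, you can do this by creating a dummy variable that represents each unique element of df['country']. Your dataset is in long format, so the dummy values will be repeated across rows. Don't worry, though: the code below takes care of that. You can then specify the last dimension as: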
dict(range=[0,df['dummy'].max()],tickvals = dfg['dummy'],ticktext = dfg['country'],label='Country',values=df['dummy']),
Finally, assign a color range to the lines with:
line = dict(color = df['dummy'],colorscale = [[0,'rgba(200,0,0,0.1)'],[0.5,'rgba(0,200,0,0.1)'],[1,'rgba(0,0,200,0.1)']])
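The three-stop colorscale above blends smoothly between colors, so countries with neighboring dummy values get similar shades. If you want one flat color per country, one option is a stepped colorscale in which each color is listed twice, at the start and at the end of its band. The following is a minimal sketch under that assumption; discrete_colorscale and palette are hypothetical names that are not part of Plotly or of the original answer.

```python
def discrete_colorscale(colors):
    """Build a stepped Plotly-style colorscale: one flat band per color."""
    n = len(colors)
    if n == 0:
        raise ValueError("need at least one color")
    scale = []
    for i, color in enumerate(colors):
        # The same color at both ends of [i/n, (i+1)/n] gives a band with no gradient.
        scale.append([i / n, color])
        scale.append([(i + 1) / n, color])
    return scale


# Hypothetical palette: one entry per unique country, in dummy-code order.
palette = ["rgba(200,0,0,0.1)", "rgba(0,200,0,0.1)", "rgba(0,0,200,0.1)"]
colorscale = discrete_colorscale(palette)
```

To use it, pass colorscale=colorscale in the line dict together with cmin=-0.5 and cmax=len(palette)-0.5, so that each integer dummy value falls in the middle of its own band.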
Plot:
Complete code:
import pandas as pd
import plotly.graph_objects as go
df = pd.read_csv('https://raw.githubusercontent.com/vyaduvanshi/helper-files/master/parallel_coordinates.csv')
group_vars = df['country'].unique()
dfg = pd.DataFrame({'country':df['country'].unique()})
dfg['dummy'] = dfg.index
df = pd.merge(df,dfg,on = 'country',how='left')
dimensions = list([
    dict(range=[df['gm_Retail & Recreation'].min(), df['gm_Retail & Recreation'].max()], label='Retail & Recreation', values=df['gm_Retail & Recreation']),
    dict(range=[df['gm_Grocery & Pharmacy'].min(), df['gm_Grocery & Pharmacy'].max()], label='Grocery & Pharmacy', values=df['gm_Grocery & Pharmacy']),
    dict(range=[df['gm_Parks'].min(), df['gm_Parks'].max()], label='Parks', values=df['gm_Parks']),
    dict(range=[df['gm_Transit Stations'].min(), df['gm_Transit Stations'].max()], label='Transit Stations', values=df['gm_Transit Stations']),
    dict(range=[df['gm_Workplaces'].min(), df['gm_Workplaces'].max()], label='Workplaces', values=df['gm_Workplaces']),
    dict(range=[df['gm_Residential'].min(), df['gm_Residential'].max()], label='Residential', values=df['gm_Residential']),
    # Categorical axis: numeric dummy values, labeled with the country names
    dict(range=[0, df['dummy'].max()], tickvals=dfg['dummy'], ticktext=dfg['country'], label='Country', values=df['dummy']),
])
# Color each line by its country's dummy value
fig = go.Figure(data=go.Parcoords(line = dict(color = df['dummy'],colorscale = [[0,'rgba(200,0,0,0.1)'],[0.5,'rgba(0,200,0,0.1)'],[1,'rgba(0,0,200,0.1)']]),dimensions=dimensions))
fig.show()
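The dummy-variable step in the complete code (unique countries, index as dummy, then a left merge) could also be sketched with pandas.factorize, which assigns integer codes in order of first appearance. This is an alternative under stated assumptions, not the method used in the answer, and the small DataFrame below is made-up illustration data rather than the CSV from the question.

```python
import pandas as pd

# Made-up long-format data standing in for the real CSV (an assumption).
df = pd.DataFrame({
    "country": ["India", "Brazil", "India", "Japan", "Brazil"],
    "gm_Parks": [-10, -25, -12, -5, -30],
})

# factorize returns one integer code per row plus the unique labels,
# numbered in order of first appearance.
codes, uniques = pd.factorize(df["country"])
df["dummy"] = codes

# Lookup table with one row per country, usable for tickvals / ticktext.
dfg = pd.DataFrame({"country": uniques, "dummy": range(len(uniques))})
```

Here df['dummy'] feeds the values and color arguments, while dfg['dummy'] and dfg['country'] feed tickvals and ticktext, just as in the merged version.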
Another suggestion: use df.infer_objects() to automatically infer the data type of each column.