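How to replace for/while loops with np.select
My goal is to speed up my code considerably. I don't know how to do it, but I think it can be done with np.select.
This is the current output of my code: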
date starting_temp average_high average_low limit_temp observation_date Date_Limit_reached
2019-12-03 22:30:00 NaN 13.0 14.8 NaN nan
2019-12-03 23:00:00 NaN 14.7 14.9 NaN nan
2019-12-03 23:30:00 NaN 13.0 13.9 NaN nan
2019-12-04 00:00:00 13.2 13.0 14.7 NaN 2019-12-04 10:00:00
2019-12-04 00:30:00 NaN 14.0 13.8 NaN nan
2019-12-04 01:00:00 NaN 13.9 13.8 NaN nan
2019-12-04 01:30:00 NaN 13.6 14.8 NaN nan
2019-12-04 02:00:00 NaN 13.1 14.5 NaN nan
2019-12-04 02:30:00 NaN 14.9 13.7 NaN nan
2019-12-04 03:00:00 NaN 14.2 14.1 NaN nan
2019-12-04 03:30:00 NaN 13.4 14.1 NaN nan
2019-12-04 04:00:00 NaN 14.3 13.0 NaN nan
2019-12-04 04:30:00 NaN 13.5 14.1 NaN nan
2019-12-04 05:00:00 NaN 13.6 13.4 NaN nan
2019-12-04 05:30:00 NaN 14.5 13.9 NaN nan
2019-12-04 06:00:00 NaN 14.4 14.5 NaN nan
2019-12-04 06:30:00 NaN 13.7 14.2 NaN nan
2019-12-04 07:00:00 NaN 13.7 14.2 NaN nan
2019-12-04 07:30:00 NaN 13.2 14.4 NaN nan
2019-12-04 08:00:00 NaN 13.9 13.1 NaN nan
2019-12-04 08:30:00 NaN 13.9 14.4 NaN nan
2019-12-04 09:00:00 NaN 14.4 13.9 NaN nan
2019-12-04 09:30:00 NaN 14.4 13.8 NaN nan
2019-12-04 10:00:00 NaN 15.0 14.0 NaN nan
2019-12-04 10:30:00 NaN 13.2 13.2 NaN nan
2019-12-04 11:00:00 NaN 14.0 13.3 NaN nan
2019-12-04 11:30:00 NaN 14.2 13.4 NaN nan
2019-12-04 12:00:00 NaN 14.2 13.4 NaN nan
2019-12-04 12:30:00 NaN 13.7 13.6 NaN nan
2019-12-04 13:00:00 NaN 14.1 13.3 NaN nan
2019-12-04 13:30:00 NaN 13.1 14.1 NaN nan
2019-12-04 14:00:00 NaN 13.2 14.3 NaN nan
2019-12-04 14:30:00 NaN 13.7 13.8 NaN nan
The code that produces the final df['Date_Limit_reached'] column is far too slow; I've added it below. I'd like to restructure it around np.select if possible:
import math

new_col = []
df_size = len(df)
# Loop over the dataframe
for ind in df.index:
    if not math.isnan(df['starting_temp'][ind]):
        entry_price_val = df['starting_temp'][ind]
        count = 0
        hasValue = False
        while count < df_size:
            if df['starting_temp'][ind] > df['limit_temp'][ind] and df['limit_temp'][ind] >= df['asklow'][count] and df['date'][count] >= df['observation_date'][ind]:
                new_col.append(df['date'][count])
                hasValue = True
                break  # Stop at the first matching row
            elif df['starting_temp'][ind] < df['limit_temp'][ind] and df['limit_temp'][ind] <= df['average_high'][count] and df['date'][count] >= df['observation_date'][ind]:
                new_col.append(df['date'][count])
                hasValue = True
                break  # Stop at the first matching row
            count += 1
        # If no row matched, append NaN to the column
        if not hasValue:
            new_col.append(float('nan'))
    else:
        new_col.append(float('nan'))
df['Date_Limit_reached'] = new_col
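Solution
Since the df is missing, I can't run the code. My suggestions are:
- Use fewer flags and work with concrete values instead. That makes the code more readable: hasValue -> val.
- If there is an entry where df['starting_temp'][ind] == df['limit_temp'][ind], you will run into a problem, because neither case is triggered. Maybe that is what makes the code slow.
- You can evaluate the first boolean expression once, before the while loop. That also takes care of the problem from the previous point.
- You never use entry_price_val.
- For further improvement, vectorize the data, which is possible in all of the loops. (Not shown in the code, because I can't test it.)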
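The precomputation suggested above is the part where np.select fits most naturally. A minimal sketch, assuming the columns are plain floats; the helper name `limit_direction`, the codes -1/0/1, and the demo values are illustrative and not part of the original code:

```python
import numpy as np
import pandas as pd

def limit_direction(df):
    """Classify every row once instead of re-testing it inside the while loop.

    -1: starting_temp is above limit_temp (search the lows)
     1: starting_temp is below limit_temp (search the highs)
     0: equal, or one of the two values is NaN (nothing to search for)
    """
    start = df["starting_temp"].to_numpy(dtype=float)
    limit = df["limit_temp"].to_numpy(dtype=float)
    # Comparisons against NaN are False, so such rows fall through to default.
    return np.select([start > limit, start < limit], [-1, 1], default=0)

# Made-up demo values, only to show the three outcomes.
demo = pd.DataFrame({
    "starting_temp": [13.2, 14.0, 15.1, np.nan],
    "limit_temp": [14.0, 14.0, 14.5, 14.0],
})
direction = limit_direction(demo)
print(direction)
```

Rows coded 0 can be skipped outright, which also covers the equality case that neither branch of the original loop handles.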
Here is the code I suggest:
import math

new_col = []
df_size = len(df)
for ind in df.index:
    val = float('nan')  # use data instead of flags
    if not math.isnan(df['starting_temp'][ind]):
        count = 0
        if df['starting_temp'][ind] > df['limit_temp'][ind]:
            while count < df_size:
                if df['limit_temp'][ind] >= df['asklow'][count] and df['date'][count] >= df['observation_date'][ind]:
                    val = df['date'][count]
                    break  # Stop at the first matching row
                count += 1
        elif df['starting_temp'][ind] < df['limit_temp'][ind]:
            while count < df_size:
                if df['limit_temp'][ind] <= df['average_high'][count] and df['date'][count] >= df['observation_date'][ind]:
                    val = df['date'][count]
                    break  # Stop at the first matching row
                count += 1
    new_col.append(val)
df['Date_Limit_reached'] = new_col
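The snippet is untested, so it needs to be checked for correctness, and it can probably be improved further (hints available on request).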
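The vectorization mentioned in the last suggestion could look roughly like the sketch below, which replaces the inner while loop with a boolean mask over NumPy arrays. It rests on several assumptions: df has a default RangeIndex (so labels and positions coincide, as the original loop implies), `date` and `observation_date` are datetime columns, and the low column is called `asklow` as in the code (the printed table shows `average_low`, so the name is left as a parameter). The function name `date_limit_reached` and the demo data are made up, and unmatched rows come back as NaT rather than float NaN.

```python
import numpy as np
import pandas as pd

def date_limit_reached(df, low_col="asklow", high_col="average_high"):
    """Return the first date on or after observation_date at which limit_temp is hit."""
    dates = df["date"].to_numpy()
    obs = df["observation_date"].to_numpy()
    low = df[low_col].to_numpy(dtype=float)
    high = df[high_col].to_numpy(dtype=float)
    start = df["starting_temp"].to_numpy(dtype=float)
    limit = df["limit_temp"].to_numpy(dtype=float)

    out = np.full(len(df), np.datetime64("NaT"), dtype="datetime64[ns]")
    # Only rows that have a starting_temp need a search at all.
    for i in np.flatnonzero(~np.isnan(start)):
        if start[i] > limit[i]:
            hit = (low <= limit[i]) & (dates >= obs[i])
        elif start[i] < limit[i]:
            hit = (high >= limit[i]) & (dates >= obs[i])
        else:
            continue  # equal values or NaN limit: nothing to look for
        if hit.any():
            out[i] = dates[hit.argmax()]  # argmax gives the first True position
    return pd.Series(out, index=df.index)

# Made-up data: one row with a starting_temp, limit first reached at 01:00.
sample = pd.DataFrame({
    "date": pd.date_range("2019-12-04", periods=4, freq="30min"),
    "starting_temp": [13.2, np.nan, np.nan, np.nan],
    "average_high": [13.0, 13.5, 14.2, 14.0],
    "asklow": [12.9, 13.3, 13.1, 13.0],
    "limit_temp": [14.0, np.nan, np.nan, np.nan],
    "observation_date": [pd.Timestamp("2019-12-04 00:30"), pd.NaT, pd.NaT, pd.NaT],
})
result = date_limit_reached(sample)
print(result)
```

The remaining Python loop only visits rows that have a starting_temp, which is a small fraction of the table in the output shown in the question, so most of the work happens inside NumPy.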