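How do I find the minimum sequence length needed to get enough valid values?
I have a time-series DataFrame with several columns, each of which contains NaNs independently of the others.
Every sequence should contain at least a given number, "LEN", of valid elements in each column. (By "sequence" I mean the values at the indices collected before a given row.)
Iterating is extremely slow, but it would look something like this: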
LEN = 100


def get_seq_start_where_every_col_has_enough_valids(df, seq_end_ix, LEN):
    # Determine the index label from which every column contains at least
    # "LEN" valid elements up to seq_end_ix. Returns None if some column
    # does not have "LEN" valid elements yet.
    first_SEQ_LEN_Sample_start_ix = seq_end_ix
    for col in df.columns:
        col_df = df[col].dropna()
        valid_ix = col_df[col_df.index <= seq_end_ix].index
        if len(valid_ix) < LEN:
            return None
        temp = valid_ix[-LEN]  # label of the LEN-th most recent valid element
        if temp < first_SEQ_LEN_Sample_start_ix:
            first_SEQ_LEN_Sample_start_ix = temp
    return first_SEQ_LEN_Sample_start_ix


# df is the DataFrame described above, with a sorted index
maximum_sequence_len = 0
for seq_end_ix in df.index:  # for every index
    seq_start_ix = get_seq_start_where_every_col_has_enough_valids(df, seq_end_ix, LEN)
    if seq_start_ix is None:
        continue  # not enough valid elements before this row
    necessary_len = len(df.loc[seq_start_ix:seq_end_ix])
    if maximum_sequence_len < necessary_len:
        maximum_sequence_len = necessary_len
An example:
LEN = 6  # in this example every column needs at least 6 valid elements in the preceding rows
print(df)
>>>>
         A    B    C    D    E      F
index
0        1    1    1    1    1      1
1        1    1    1    1    1      1
2        1    1    1    1    1 |    1
3      NaN    1    1  NaN    1 |    1
4      NaN    1    1  NaN    1 |    1
5        1    1    1    1    1 |    1
6        1    1    1    1  NaN |    1
7      NaN    1    1  NaN    1 |    1
8      NaN    1    1    1    1 |    1
9        1    1    1    1  NaN |    1
10       1    1    1    1  NaN |    1
11       1    1    1  NaN  NaN |    1
12       1    1    1    1  NaN |    1
13       1    1    1    1  NaN |    1
14       1  NaN    1    1  NaN |*   1
16       1    1    1    1    1    NaN
17     NaN    1    1    1    1      1
18     NaN    1    1    1    1    NaN
19       1    1    1    1    1      1
# ==> Result: 13
# * Here, the longest sequence needed to get at least 6 valid elements in EVERY column has a length of 13 rows (the rows marked with "|"). Note that if the other columns contained more NaNs in the marked rows, it would probably take more than 13.
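The expected result for this example can be checked without the nested loops. The following is only a minimal sketch, assuming that window length is counted in rows by position (so the missing index 15 is simply skipped); the function name `max_lookback_len` and the parameter `min_valid` are hypothetical names, not part of the original post. For each column it looks up the row of the `min_valid`-th most recent valid value with a cumulative count, then takes the earliest such row across columns.

```python
import numpy as np
import pandas as pd


def max_lookback_len(df: pd.DataFrame, min_valid: int) -> int:
    """Longest look-back window (in rows) needed so that every column
    holds at least `min_valid` non-NaN values; 0 if no row qualifies."""
    valid = df.notna().to_numpy()
    n_rows, n_cols = valid.shape
    # starts[i, j]: first row position of the window ending at row i for column j
    starts = np.full((n_rows, n_cols), -1)
    for j in range(n_cols):
        pos = np.flatnonzero(valid[:, j])  # row positions of the valid values
        cnt = np.cumsum(valid[:, j])       # number of valid values up to each row
        ok = cnt >= min_valid
        # the window starts at the min_valid-th most recent valid value
        starts[ok, j] = pos[cnt[ok] - min_valid]
    usable = (starts >= 0).all(axis=1)     # rows where every column has enough
    if not usable.any():
        return 0
    lengths = np.arange(n_rows)[usable] - starts[usable].min(axis=1) + 1
    return int(lengths.max())


# The example frame from above (index 15 is missing on purpose).
nan = np.nan
rows = {
    0: [1, 1, 1, 1, 1, 1],
    1: [1, 1, 1, 1, 1, 1],
    2: [1, 1, 1, 1, 1, 1],
    3: [nan, 1, 1, nan, 1, 1],
    4: [nan, 1, 1, nan, 1, 1],
    5: [1, 1, 1, 1, 1, 1],
    6: [1, 1, 1, 1, nan, 1],
    7: [nan, 1, 1, nan, 1, 1],
    8: [nan, 1, 1, 1, 1, 1],
    9: [1, 1, 1, 1, nan, 1],
    10: [1, 1, 1, 1, nan, 1],
    11: [1, 1, 1, nan, nan, 1],
    12: [1, 1, 1, 1, nan, 1],
    13: [1, 1, 1, 1, nan, 1],
    14: [1, nan, 1, 1, nan, 1],
    16: [1, 1, 1, 1, 1, nan],
    17: [nan, 1, 1, 1, 1, 1],
    18: [nan, 1, 1, 1, 1, nan],
    19: [1, 1, 1, 1, 1, 1],
}
df = pd.DataFrame(list(rows.values()), index=list(rows.keys()), columns=list("ABCDEF"))
print(max_lookback_len(df, 6))  # 13
```

The loop runs over columns only, so the cost grows with the number of columns rather than with rows times columns.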
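The problem is that I want to create sequence samples but do not know how long they have to be so that each sample has at least "LEN" valid elements in every column.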
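Solution
Essentially, you need to maintain a vector of counters, one counter per column.
Once every counter has reached at least LEN (6 in the example), the vector of counters should signal "window-ready". When the window (start_index, end_index) is ready, you can emit all rows in the window, restart the window (start_index and end_index) from the current position, and reset all counters to zero.
Repeat until the end of the data.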
Algorithm get_windows(data[][], LEN)
    counters: array of integers of length = data.cols, values initialized to 0
Begin
    window_start_index = 0
    window_end_index = 0   // always the index of the current row
    for each row in data
        for each col in row
            if (value(col) != NaN)
                counters[index(col)]++
            end if
        next // col
        // check if the window must continue past this row
        continue_flag = false
        for each counter in counters
            if (counter < LEN)
                continue_flag = true
                exit for loop
            end if
        next // counter
        if (not continue_flag)
            // we have a window (window_start_index, window_end_index),
            // both inclusive
            // do something with the window
            // start the next window after this row and reset counters
            window_start_index = window_end_index + 1
            for each counter in counters
                counter = 0
            next
        end if
        window_end_index++
    next // row
End Algorithm
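For reference, the pseudocode could be sketched in Python roughly as follows. This is one possible reading rather than a definitive implementation: it assumes that windows do not overlap, that the next window starts on the row after the one that completed the previous window, and that positions are 0-based row positions. The parameter name `min_valid` is a hypothetical stand-in for LEN.

```python
import numpy as np
import pandas as pd


def get_windows(df: pd.DataFrame, min_valid: int):
    """Yield (start, end) row positions, both inclusive, of consecutive
    non-overlapping windows in which every column has at least
    `min_valid` non-NaN values."""
    valid = df.notna().to_numpy()
    counters = np.zeros(valid.shape[1], dtype=int)  # one counter per column
    start = 0
    for end, row in enumerate(valid):
        counters += row                       # count the valid values in this row
        if (counters >= min_valid).all():     # "window-ready"
            yield start, end
            start = end + 1                   # next window begins after this row
            counters[:] = 0


nan = np.nan
small = pd.DataFrame({
    "a": [1, nan, 1, 1, 1, nan, 1],
    "b": [1, 1, nan, 1, 1, 1, 1],
})
print(list(get_windows(small, 2)))  # [(0, 2), (3, 4)]
```

Note that this partitions the data into back-to-back windows; it does not compute the sliding look-back length for every row that the question describes, so trailing rows that never fill the counters (rows 5 and 6 in the small frame) are not emitted.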
Is a single-pass algorithm like this what you need?