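Find overlapping rows by date-time

How to find rows whose date-time intervals overlap

I have a data frame like the one below, where min and max are the start and end of an interval, core_name is the instance name, and length_mins is the length of the interval.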

                   min                 max length_mins core_name
1  2020-07-28 03:05:30 2020-07-28 05:45:15 159.75 mins       0,1
2  2020-07-14 14:29:30 2020-07-14 16:36:45 127.25 mins      0,10
3  2020-07-16 15:32:45 2020-07-16 16:16:00  43.25 mins      0,11
4  2020-07-17 02:37:30 2020-07-17 05:27:30 170.00 mins      0,11
5  2020-07-18 02:42:00 2020-07-18 05:24:30 162.50 mins      0,11
6  2020-07-25 02:21:15 2020-07-25 04:59:15 158.00 mins      0,12
7  2020-07-16 15:40:15 2020-07-16 16:13:45  33.50 mins      0,13
8  2020-07-16 13:18:30 2020-07-16 16:13:30 175.00 mins      0,15
9  2020-07-16 14:43:00 2020-07-16 15:49:30  66.50 mins       0,2
10 2020-07-14 14:29:30 2020-07-14 16:55:15 145.75 mins       0,4
11 2020-07-16 13:32:45 2020-07-16 17:21:00 228.25 mins       0,6
12 2020-07-27 02:15:30 2020-07-27 05:04:15 168.75 mins       0,6
13 2020-07-14 14:29:30 2020-07-14 16:53:30 144.00 mins       0,8
14 2020-07-16 16:40:30 2020-07-16 21:19:45 279.25 mins       1,0
15 2020-07-14 21:03:15 2020-07-14 22:49:45 106.50 mins       1,1
16 2020-07-15 03:32:45 2020-07-15 06:15:15 162.50 mins      1,10
17 2020-07-16 15:58:15 2020-07-16 21:18:30 320.25 mins      1,10
18 2020-07-14 18:44:00 2020-07-14 20:00:15  76.25 mins      1,11
19 2020-07-14 21:12:00 2020-07-15 00:56:00 224.00 mins      1,11
20 2020-07-16 16:32:30 2020-07-16 19:30:15 177.75 mins      1,12
21 2020-07-14 15:39:15 2020-07-15 00:35:15 536.00 mins      1,13
22 2020-07-16 15:14:15 2020-07-16 21:14:00 359.75 mins      1,14
23 2020-07-14 14:29:30 2020-07-15 00:48:45 619.25 mins      1,15
24 2020-07-16 16:34:00 2020-07-16 20:58:15 264.25 mins      1,16
25 2020-07-14 20:19:15 2020-07-15 00:54:30 275.25 mins      1,17
26 2020-07-16 16:35:00 2020-07-16 21:18:00 283.00 mins      1,18
27 2020-07-14 14:29:30 2020-07-14 19:20:45 291.25 mins      1,19
28 2020-07-14 20:13:00 2020-07-15 01:00:45 287.75 mins      1,19
29 2020-07-16 16:27:45 2020-07-16 21:07:15 279.50 mins       1,2
30 2020-07-14 14:29:30 2020-07-15 00:57:30 628.00 mins       1,3
31 2020-07-16 16:32:30 2020-07-16 21:15:45 283.25 mins       1,4
32 2020-07-14 20:42:15 2020-07-15 00:44:45 242.50 mins       1,5
33 2020-07-16 16:25:00 2020-07-16 21:16:45 291.75 mins       1,6
34 2020-07-14 18:24:00 2020-07-14 23:08:15 284.25 mins       1,7
35 2020-07-16 02:29:30 2020-07-16 05:11:00 161.50 mins       1,7
36 2020-07-16 16:37:45 2020-07-16 21:16:30 278.75 mins       1,8
37 2020-07-14 14:29:30 2020-07-15 00:59:15 629.75 mins       1,9
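
As a quick sanity check on the columns, length_mins appears to be simply max minus min expressed in minutes. The sketch below reproduces the value for the first row of the table; the timestamps are parsed as UTC purely for illustration (an assumption, since the original data uses the local time zone).

```r
# First row of the table: min = 03:05:30, max = 05:45:15 on 2020-07-28.
# tz = "UTC" is an assumption made only to keep the example reproducible.
start <- as.POSIXct("2020-07-28 03:05:30", tz = "UTC")
end   <- as.POSIXct("2020-07-28 05:45:15", tz = "UTC")

# difftime() returns the interval length in the requested units
length_mins <- difftime(end, start, units = "mins")
print(length_mins)
```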

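I need to:

  1. find the rows that overlap each other,
  2. count the overlaps,
  3. get, for each core, the list of cores it overlaps with.

This is the result I get: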
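
The core of step 1 is a pairwise test: two closed intervals overlap when each one starts no later than the other ends. A minimal sketch of that test follows; the helper name intervals_overlap and the toy timestamps are hypothetical and not part of the original data.

```r
# Inclusive overlap test for intervals [s1, e1] and [s2, e2].
# intervals_overlap is a hypothetical helper name used for illustration.
intervals_overlap <- function(s1, e1, s2, e2) {
  s1 <= e2 & s2 <= e1
}

# Toy timestamps (made up for this example)
a_start <- as.POSIXct("2020-07-14 14:00:00", tz = "UTC")
a_end   <- as.POSIXct("2020-07-14 16:00:00", tz = "UTC")
b_start <- as.POSIXct("2020-07-14 15:30:00", tz = "UTC")
b_end   <- as.POSIXct("2020-07-14 18:00:00", tz = "UTC")
c_start <- as.POSIXct("2020-07-15 03:00:00", tz = "UTC")
c_end   <- as.POSIXct("2020-07-15 05:00:00", tz = "UTC")

ab <- intervals_overlap(a_start, a_end, b_start, b_end)  # a and b share 15:30-16:00
ac <- intervals_overlap(a_start, a_end, c_start, c_end)  # c starts after a has ended
```

Because the function only uses vectorized comparisons, it also accepts whole columns of start and end times.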

                   min                 max length_mins core_name overlaps
1  2020-07-14 14:29:30 2020-07-15 00:59:15 629.75 mins       1,9       15
2  2020-07-14 14:29:30 2020-07-15 00:57:30 628.00 mins       1,3       15
3  2020-07-14 14:29:30 2020-07-15 00:48:45 619.25 mins      1,15       15
4  2020-07-14 15:39:15 2020-07-15 00:35:15 536.00 mins      1,13       15
5  2020-07-16 15:14:15 2020-07-16 21:14:00 359.75 mins      1,14       15
6  2020-07-16 13:32:45 2020-07-16 17:21:00 228.25 mins       0,6       15
7  2020-07-16 15:58:15 2020-07-16 21:18:30 320.25 mins      1,10       14
8  2020-07-14 18:24:00 2020-07-14 23:08:15 284.25 mins       1,7       12
9  2020-07-16 16:25:00 2020-07-16 21:16:45 291.75 mins       1,6       11
10 2020-07-16 16:32:30 2020-07-16 21:15:45 283.25 mins       1,4       11
11 2020-07-16 16:35:00 2020-07-16 21:18:00 283.00 mins      1,18       11
12 2020-07-16 16:27:45 2020-07-16 21:07:15 279.50 mins       1,2       11
13 2020-07-16 16:40:30 2020-07-16 21:19:45 279.25 mins       1,0       11
14 2020-07-16 16:37:45 2020-07-16 21:16:30 278.75 mins       1,8       11
15 2020-07-16 16:34:00 2020-07-16 20:58:15 264.25 mins      1,16       11
16 2020-07-16 16:32:30 2020-07-16 19:30:15 177.75 mins      1,12       11
17 2020-07-14 14:29:30 2020-07-14 19:20:45 291.25 mins      1,19       10
18 2020-07-14 20:13:00 2020-07-15 01:00:45 287.75 mins      1,19       10
19 2020-07-14 20:19:15 2020-07-15 00:54:30 275.25 mins      1,17       10
20 2020-07-14 20:42:15 2020-07-15 00:44:45 242.50 mins       1,5       10
21 2020-07-14 21:12:00 2020-07-15 00:56:00 224.00 mins      1,11       10
22 2020-07-14 21:03:15 2020-07-14 22:49:45 106.50 mins       1,1       10
23 2020-07-14 14:29:30 2020-07-14 16:55:15 145.75 mins       0,4        8
24 2020-07-14 14:29:30 2020-07-14 16:53:30 144.00 mins       0,8        8
25 2020-07-14 14:29:30 2020-07-14 16:36:45 127.25 mins      0,10        8
26 2020-07-16 13:18:30 2020-07-16 16:13:30 175.00 mins      0,15        7
27 2020-07-14 18:44:00 2020-07-14 20:00:15  76.25 mins      1,11        7
28 2020-07-16 15:32:45 2020-07-16 16:16:00  43.25 mins      0,11        7
29 2020-07-16 15:40:15 2020-07-16 16:13:45  33.50 mins      0,13        7
30 2020-07-16 14:43:00 2020-07-16 15:49:30  66.50 mins       0,2        6
31 2020-07-17 02:37:30 2020-07-17 05:27:30 170.00 mins      0,11        1
32 2020-07-27 02:15:30 2020-07-27 05:04:15 168.75 mins       0,6        1
33 2020-07-18 02:42:00 2020-07-18 05:24:30 162.50 mins      0,11        1
34 2020-07-15 03:32:45 2020-07-15 06:15:15 162.50 mins      1,10        1
35 2020-07-16 02:29:30 2020-07-16 05:11:00 161.50 mins       1,7        1
36 2020-07-28 03:05:30 2020-07-28 05:45:15 159.75 mins       0,1        1
37 2020-07-25 02:21:15 2020-07-25 04:59:15 158.00 mins      0,12        1

                                                            cores_list
1  1,9;0,10;0,4;0,8;1,1;1,11;1,13;1,15;1,17;1,19;1,3;1,5;1,7
2  1,3;0,7;1,9
3  1,15;0,9
4  1,13;0,9
5  1,14;0,11;0,2;0,6;1,0;1,10;1,12;1,16;1,18;1,2;1,4;1,8
6  0,6;0,14;1,8
7      1,8
8               1,9
9                     1,8
10                    1,8
11                    1,18;0,8
12                    1,8
13                    1,0;0,8
14                    1,8;0,6
15                    1,16;0,8
16                    1,12;0,8
17                        1,19;0,9
18                        1,9
19                        1,9
20                        1,9
21                        1,9
22                                                 1,9
23                                 0,9
24                                 0,9
25                                 0,9
26                                    0,14
27                                     1,9
28                                    0,14
29                                    0,14
30                                         0,14
31                                                                0,11
32                                                                 0,6
33                                                                0,11
34                                                                1,10
35                                                                 1,7
36                                                                 0,1
37                                                                0,12

This is my code, with example data:

# find overlaps

library(dplyr)

library(lubridate)

data.example <-
  structure(
    list(
      min = structure(
        c(
          1595894730,1594726170,1594902765,1594942650,1595029320,1595632875,1594903215,1594894710,
          1594899780,1594726170,1594895565,1595805330,1594726170,1594906830,1594749795,1594773165,
          1594904295,1594741440,1594750320,1594906350,1594730355,1594901655,1594726170,1594906440,
          1594747155,1594906500,1594726170,1594746780,1594906065,1594726170,1594906350,1594748535,
          1594905900,1594740240,1594855770,1594906665,1594726170
        ),tzone = "",class = c("POSIXct","POSIXt")
      ),max = structure(
        c(
          1595904315,1594733805,1594905360,1594952850,1595039070,1595642355,1594905225,1594905210,
          1594903770,1594734915,1594909260,1595815455,1594734810,1594923585,1594756185,1594782915,
          1594923510,1594746015,1594763760,1594917015,1594762515,1594923240,1594763325,1594922295,
          1594763670,1594923480,1594743645,1594764045,1594922835,1594763850,1594923345,1594763085,
          1594923405,1594757295,1594865460,1594923390,1594763955
        ),tzone = "",class = c("POSIXct","POSIXt")
      ),length_mins = structure(
        c(
          159.75,127.25,43.25,170,162.5,158,33.5,175,66.5,145.75,228.25,168.75,144,279.25,106.5,
          162.5,320.25,76.25,224,177.75,536,359.75,619.25,264.25,275.25,283,291.25,287.75,279.5,
          628,283.25,242.5,291.75,284.25,161.5,278.75,629.75
        ),class = "difftime",units = "mins"
      ),core_name = c(
        "0,1","0,10","0,11","0,11","0,11","0,12","0,13","0,15","0,2","0,4","0,6","0,6","0,8",
        "1,0","1,1","1,10","1,10","1,11","1,11","1,12","1,13","1,14","1,15","1,16","1,17","1,18",
        "1,19","1,19","1,2","1,3","1,4","1,5","1,6","1,7","1,7","1,8","1,9"
      )
    ),row.names = c(NA,-37L),class = "data.frame"
  )

print ( data.example)


data.example <- data.example  %>% mutate (overlaps = 1,cores_list = c(core_name))

print ("Calculating rows overlaps")

for (i in 1:(nrow(data.example)-1)) {
  
  min_el1 <- data.example[i,]$min
  max_el1 <- data.example[i,]$max
  
  for (k in (i+1):nrow(data.example)) {
    
    min_el2 <- data.example[k,]$min
    max_el2 <- data.example[k,]$max
    
    el1_interval <- interval(min_el1,max_el1)
    el2_interval <- interval(min_el2,max_el2)
    overlaps <- int_overlaps(el1_interval,el2_interval)
    
    if (overlaps == T) {
      
      print (paste ("row",i,"overlaps with row",k))
      
      data.example[k,]$overlaps <- data.example[k,]$overlaps +1
      data.example[i,]$overlaps <- data.example[i,]$overlaps +1
      
      
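      # append row k's core name to row i's list
      if ( !grepl( data.example[i,]$cores_list,data.example[k,]$core_name,fixed = TRUE)) {
        data.example[i,]$cores_list <- paste(data.example[i,]$cores_list,data.example[k,]$core_name,sep=';')
      }
      
      # append row i's core name to row k's list
      if ( !grepl( data.example[k,]$cores_list,data.example[i,]$core_name,fixed = TRUE)) {
        data.example[k,]$cores_list <- paste(data.example[k,]$cores_list,data.example[i,]$core_name,sep=';')
      }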
    }
  }
}


data.example <-  data.example %>% arrange(desc(overlaps),desc(length_mins))

print (data.example)

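I am happy with the result, but my code is very slow. With a few hundred rows it takes several minutes to run. I am sure the nested loop can be avoided and the code sped up significantly. Any help would be greatly appreciated.

Solution

This should work; it seems to match the desired output.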

library( data.table )
#make it a data.table
setDT( data.example )
#create temp id column and set it as key (for use with .EACHI later on)
data.example[,id := .I ]
setkey( data.example,id )
#self join on subset by row
data.example[ data.example,c("overlaps","cores_list") := {
                temp <- data.example[ min <= i.max & max >= i.min,]
                list( nrow(temp),paste0( temp$core_name,collapse = ";") )
              },by = .EACHI ]
#if desired, you can drop the id column using: data.example[,id := NULL]
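
For comparison, the same inclusive overlap condition can be sketched without any packages by building an n-by-n logical matrix with outer(). This is an illustrative alternative rather than the method used in the answer; the small data frame df and its values are hypothetical, and the matrix needs memory proportional to the square of the row count, which should be fine for a few hundred rows.

```r
# Hypothetical toy data: rows 1 and 2 overlap, row 3 stands alone
df <- data.frame(
  min = as.POSIXct(c("2020-07-14 14:00:00", "2020-07-14 15:30:00",
                     "2020-07-15 03:00:00"), tz = "UTC"),
  max = as.POSIXct(c("2020-07-14 16:00:00", "2020-07-14 18:00:00",
                     "2020-07-15 05:00:00"), tz = "UTC"),
  core_name = c("0,1", "0,2", "1,0"),
  stringsAsFactors = FALSE
)

# Work on plain numbers (seconds since the epoch)
s <- as.numeric(df$min)
e <- as.numeric(df$max)

# ov[i, j] is TRUE when row i starts no later than row j ends
# and row i ends no earlier than row j starts
ov <- outer(s, e, "<=") & outer(e, s, ">=")

# Each row overlaps itself, so the count starts at 1,
# like the counter in the question's code
df$overlaps <- rowSums(ov)

# Collect the core names of all overlapping rows (including the row itself)
df$cores_list <- apply(ov, 1, function(hit) paste(df$core_name[hit], collapse = ";"))

print(df)
```

Like the data.table version, this counts a row as overlapping itself and lists its own core name, so an isolated interval ends up with overlaps equal to 1.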
