Create sublists of indexes, each sublist referencing a unique set of tuples in a list of tuples

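I am trying to create sublists of indexes from a list of tuples. Indexes of tuples that have any element in common should be grouped together, and indexes of unique tuples should stay separate. A tuple counts as unique when none of its elements equals the element in the same position of any other tuple in the list. Example: build a list that groups the same company together, where "same company" means the same name, or the same registration number, or the same CEO name.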

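# Registration numbers are written as strings: 0002 is not a valid integer literal in Python 3.
# Assumed values: companyB's number and companyC's CEO, chosen to agree with the desired output below.
company_list = [("companyA", "0002", "ceoX"), ("companyB", "0002", "ceoY"), ("companyC", "0003", "ceoX"), ("companyD", "0004", "ceoZ")]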

The desired output would be:

[[0,1,2],[3]]

Does anyone know a solution to this problem?
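To make the matching rule concrete, here is a minimal sketch of the pairwise test described in the question. The helper name `same_company` and the sample values are assumptions for illustration, chosen to agree with the desired output.

```python
def same_company(a, b):
    """True when two records agree in at least one position (name, number or CEO)."""
    return any(x == y for x, y in zip(a, b))


# Hypothetical sample data (assumed values, not taken verbatim from the question).
companies = [
    ("companyA", "0002", "ceoX"),
    ("companyB", "0002", "ceoY"),
    ("companyC", "0003", "ceoX"),
    ("companyD", "0004", "ceoZ"),
]

# List every pair of indexes whose records match directly.
pairs = [
    (i, j)
    for i in range(len(companies))
    for j in range(i + 1, len(companies))
    if same_company(companies[i], companies[j])
]
print(pairs)
```

With this data only (0, 1) and (0, 2) match directly. Indexes 1 and 2 share nothing with each other, yet they belong in the same sublist because both match index 0, so the grouping has to follow chains of matches rather than direct matches alone.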

Solution

The companies form a graph. You want to build clusters from the connected companies.

Try this:

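company_list = [
  # Assumed values: companyB's number (2) and companyC's CEO ("ceoX"),
  # chosen so that the expected grouping is [[0, 1, 2], [3]].
  ("companyA", 2, "ceoX"),
  ("companyB", 2, "ceoY"),
  ("companyC", 3, "ceoX"),
  ("companyD", 4, "ceoZ"),
]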

# Prepare indexes
by_name = {}
by_number = {}
by_ceo = {}
for i,t in enumerate(company_list):
  if t[0] not in by_name:
    by_name[t[0]] = []
  by_name[t[0]].append(i)
  if t[1] not in by_number:
    by_number[t[1]] = []
  by_number[t[1]].append(i)
  if t[2] not in by_ceo:
    by_ceo[t[2]] = []
  by_ceo[t[2]].append(i)

# BFS to propagate each group id to all connected companies.
# Group ids start out as the company's own index and only ever decrease,
# so every connected component ends up labelled with its smallest index.
from collections import deque

groups = list(range(len(company_list)))
for i in range(len(company_list)):
  g = groups[i]
  queue = deque([g])
  while queue:
    x = queue.popleft()
    groups[x] = g
    t = company_list[x]
    # Neighbours are all companies sharing a name, number or CEO with x
    for y in by_name[t[0]] + by_number[t[1]] + by_ceo[t[2]]:
      if g < groups[y]:
        queue.append(y)

# Assemble result: collect the indexes that carry each group id.
# Members of a group need not be adjacent in the list, so group by id
# rather than by runs of equal ids.
result = {}
for i, g in enumerate(groups):
  result.setdefault(g, []).append(i)
result = list(result.values())
print(result)
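The same clustering can also be sketched with a disjoint-set (union-find) structure instead of BFS. This is an alternative technique, not the method used in the answer above; the names `group_indexes`, `find`, `union` and `first_seen`, as well as the sample data, are assumptions made for illustration.

```python
def group_indexes(records):
    """Group record indexes whose tuples share a value in the same position."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            # Keep the smaller index as the root of the merged set.
            parent[max(ri, rj)] = min(ri, rj)

    first_seen = {}  # (position, value) -> first index holding that value
    for i, record in enumerate(records):
        for key in enumerate(record):
            if key in first_seen:
                union(i, first_seen[key])
            else:
                first_seen[key] = i

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# Hypothetical sample data (assumed values).
companies = [
    ("companyA", 2, "ceoX"),
    ("companyB", 2, "ceoY"),
    ("companyC", 3, "ceoX"),
    ("companyD", 4, "ceoZ"),
]
print(group_indexes(companies))
```

Keying the lookup on (position, value) means a company name can never match a CEO name by accident, and each record is visited once rather than compared with every other record.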

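Fafl's answer is certainly better. If you are not worried about performance, here is a brute-force solution that may be easier to read. I have tried to make it clear with some comments.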

def find_index(res, target_index):
    """Return the position of the sublist in `res` containing `target_index`, or None."""
    for index, sublist in enumerate(res):
        if target_index in sublist:
            # yes, it's present
            return index

    return None  # not present
        
def main():
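    company_list = [
        # Assumed values: the numbers for companyB and companyE and the CEO
        # for companyC are illustrative.
        ('companyA', '0002', 'CEOX'),
        ('companyB', '0002', 'CEOY'),
        ('companyC', '0003', 'CEOX'),
        ('companyD', '0004', 'CEOZ'),
        ('companyE', '0004', 'CEOM'),
    ]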

    res = []

    for index, company_detail in enumerate(company_list):
        # start a new sublist in `res` unless `index` already belongs to one
        if find_index(res, index) is None:
            res.append([index])

        for c_index, c_company_detail in enumerate(company_list):
            # inner loop: compare this company with every other company
            if c_index == index:
                # same company, ignore
                continue
            if any(a == b for a, b in zip(company_detail, c_company_detail)):
                # name, number or CEO matches, so both belong to one sublist
                mine = find_index(res, index)
                theirs = find_index(res, c_index)
                if theirs is None:
                    res[mine].append(c_index)
                elif theirs != mine:
                    # the match links two existing sublists: merge them
                    res[mine].extend(res.pop(theirs))

    res = sorted(sorted(sublist) for sublist in res)  # predictable order
    print(res)

if __name__ == '__main__':
    main()

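Check this out; I made many attempts at it. I may be missing some test cases. Performance-wise, I think it is fine. I used set() and pop matched tuples off the list.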

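company_list = [
  # Stand-in data (assumed): the sample from the question.
  ("companyA", "0002", "ceoX"),
  ("companyB", "0002", "ceoY"),
  ("companyC", "0003", "ceoX"),
  ("companyD", "0004", "ceoZ"),
]
# Map each tuple to its original position (assumes all tuples are distinct).
index = {val: key for key, val in enumerate(company_list)}
res = []
while company_list:
    new_idx = 0
    temp = []
    # Take the first remaining tuple as the seed of a new group.
    val = company_list.pop(new_idx)
    temp.append(index[val])
    while new_idx < len(company_list):
        # Two 3-tuples with nothing in common hold 6 distinct values (given
        # distinct values within each tuple), so fewer than 6 means a shared value.
        # Caveats: field positions are ignored, and candidates are compared
        # with the seed only, so chains of matches are not followed.
        if len(set(val + company_list[new_idx])) < 6:
            temp.append(index[company_list.pop(new_idx)])
        else:
            new_idx += 1

    res.append(temp)

print(res)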
