Dictionary of dictionaries: printing dictionaries that share at least two common keys

d = {'g1': {'p1': 1, 'p2': 5, 'p3': 11, 'p4': 1},
     'g2': {'p1': 7, 'p3': 1, 'p4': 2, 'p5': 8, 'p9': 11},
     'g3': {'p7': 7, 'p8': 7},
     'g4': {'p8': 9, 'p9': 1, 'p10': 7, 'p11': 8, 'p12': 3},
     'g5': {'p1': 4, 'p13': 1},
     'g6': {'p1': 4, 'p6': 2, 'p13': 1}}

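For the given dictionary 'd', I want to return the clusters of sub-dictionaries that share at least two ('n') keys, where each shared key is present in every sub-dictionary of the cluster. The values of the sub-dictionaries do not matter here. In other words, the intersection of the key sets of all sub-dictionaries in a cluster must have length at least two (or 'n').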

Solution

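I hope I understood correctly what you want. This approach is clumsy, and I fear it is quite inefficient. I added a dictionary g6 to d in order to produce more interesting output: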

#! /usr/bin/env python3

d = {'g1': {'p1': 1, 'p2': 5, 'p3': 11, 'p4': 1},
     'g2': {'p1': 7, 'p3': 1, 'p4': 2, 'p5': 8, 'p9': 11},
     'g3': {'p7': 7, 'p8': 7},
     'g4': {'p8': 9, 'p9': 1, 'p10': 7, 'p11': 8, 'p12': 3},
     'g5': {'p1': 4, 'p13': 1},
     'g6': {'p1': 1, 'p9': 2, 'p11': 12}}

clusters = {}

# Group the dictionary names by their exact key set.
for key, value in d.items():
    cluster = frozenset(value.keys())
    if cluster not in clusters: clusters[cluster] = set()
    clusters[cluster].add(key)

# For every two distinct key sets sharing at least two keys, record the shared
# keys as a new cluster. Iterate over snapshots because clusters grows here.
for a in list(clusters.keys()):
    for b in list(clusters.keys()):
        if len(a & b) > 1 and a ^ b:
            cluster = frozenset(a & b)
            if cluster not in clusters: clusters[cluster] = set()
            for x in clusters[a]: clusters[cluster].add(x)
            for x in clusters[b]: clusters[cluster].add(x)

print("Primitive clusters")
for key, value in clusters.items():
    if len(value) == 1:
        print("The dictionary %s has the keys %s" % (next(iter(value)), ",".join(key)))

print("---------------------")
print("Non-primitive clusters:")
for key, value in clusters.items():
    if len(value) > 1:
        print("The dictionaries %s share the keys %s" % (",".join(value), ",".join(key)))

I think you should first "invert" the dictionary; after that, finding the solution is easy:

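import collections

# Map each sub-key to the names of the dictionaries that contain it.
inverted = collections.defaultdict(list)

for key, items in d.items():
    for sub_key in items:
        inverted[sub_key].append(key)

# Print every sub-key that occurs in at least two dictionaries.
for sub_key, keys in inverted.items():
    if len(keys) >= 2:
        print(sub_key, keys)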
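
The inverted index above lists, for each sub-key, the dictionaries that contain it, but it does not yet say which dictionaries share two or more keys. One possible way to get there, shown here only as a sketch, is to count for every pair of dictionary names how many sub-keys list both of them. The function name pairs_sharing_keys and its parameter n are illustrative assumptions, not part of the original answer, and the data is the d from the question.

```python
import collections
import itertools

d = {'g1': {'p1': 1, 'p2': 5, 'p3': 11, 'p4': 1},
     'g2': {'p1': 7, 'p3': 1, 'p4': 2, 'p5': 8, 'p9': 11},
     'g3': {'p7': 7, 'p8': 7},
     'g4': {'p8': 9, 'p9': 1, 'p10': 7, 'p11': 8, 'p12': 3},
     'g5': {'p1': 4, 'p13': 1},
     'g6': {'p1': 4, 'p6': 2, 'p13': 1}}


def pairs_sharing_keys(d, n=2):
    """Hypothetical helper: map each pair of names to its shared-key count,
    keeping only the pairs that share at least n keys."""
    # Invert: sub-key -> names of the dictionaries that contain it.
    inverted = collections.defaultdict(list)
    for name, items in d.items():
        for sub_key in items:
            inverted[sub_key].append(name)

    # Every pair of names listed under one sub-key shares that sub-key.
    shared = collections.Counter()
    for names in inverted.values():
        for pair in itertools.combinations(sorted(names), 2):
            shared[pair] += 1

    return {pair: count for pair, count in shared.items() if count >= n}


print(pairs_sharing_keys(d))
# {('g1', 'g2'): 3, ('g5', 'g6'): 2}
```

This only covers pairs; clusters of three or more dictionaries would still need an intersection over all of their key sets.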

Something like

for keya in d:
    tempd = {}
    keys = set()
    tempset = set(d[keya].keys())

    for keyb in d:
        tempset &= d[keyb].keys()

        if len(tempset) >= 2:
            keys.add(keyb)

    print({key: d[key] for key in keys})

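might work. Edit: no, it doesn't quite work. I need to think about it.

If you reduce the problem to clusters of length 2 only (that is, pairs of dictionaries), it becomes much clearer: generating fixed-length subsequences from a given iterable is exactly the job of itertools.combinations: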

>>> import itertools
>>> list(itertools.combinations(d,2))
[('g5','g4'),('g5','g3'),('g5','g2'),('g5','g1'),('g4','g3'),('g4','g2'),
 ('g4','g1'),('g3','g2'),('g3','g1'),('g2','g1')]

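Once you realize that the view returned by d.keys() behaves like a set (in Python 3; in Python 2 it is a list), you can find the keys that any two dictionaries have in common: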

>>> d['g1'].keys() & d['g2'].keys()
{'p3','p1','p4'}

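& is the set intersection operator: it gives us the set of all items that both sets contain. We can therefore test whether two dictionaries share at least two keys by checking the length of this set, which gives: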

>>> common_pairs = [[x,y] for x,y in itertools.combinations(d,2)
...                 if len(d[x].keys() & d[y].keys()) >= 2]
>>> common_pairs
[['g2','g1']]

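Handling an unknown cluster size is a little harder: unless we hard-code the size, we cannot use the & operator directly. Fortunately, the set class gives us a way to get the intersection of n sets, in the form of set.intersection. Called on the class, it requires its first argument to be a real set, so it rejects a dict_keys instance, but you can easily work around that by calling set: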

>>> set.intersection(d['g1'].keys(),d['g2'].keys(),d['g5'].keys())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: descriptor 'intersection' requires a 'set' object but received a 'dict_keys'
>>> set.intersection(set(d['g1']),set(d['g2']),set(d['g5']))
{'p1'}

You should be able to generalize this fairly easily to clusters of size 2 to n.
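
One way that generalization might look, as a brute-force sketch rather than a definitive implementation: try every combination of dictionary names for each cluster size and keep those whose key sets intersect in at least n keys. The function name clusters_sharing_keys and the parameters n and max_size are assumptions made for illustration, and the data is the d from the question.

```python
import itertools

d = {'g1': {'p1': 1, 'p2': 5, 'p3': 11, 'p4': 1},
     'g2': {'p1': 7, 'p3': 1, 'p4': 2, 'p5': 8, 'p9': 11},
     'g3': {'p7': 7, 'p8': 7},
     'g4': {'p8': 9, 'p9': 1, 'p10': 7, 'p11': 8, 'p12': 3},
     'g5': {'p1': 4, 'p13': 1},
     'g6': {'p1': 4, 'p6': 2, 'p13': 1}}


def clusters_sharing_keys(d, n=2, max_size=None):
    """Hypothetical helper: yield (names, common_keys) for every group of
    two or more dictionaries whose key sets share at least n keys."""
    if max_size is None:
        max_size = len(d)
    for size in range(2, max_size + 1):
        for names in itertools.combinations(sorted(d), size):
            # Convert each key view to a real set so set.intersection accepts it.
            common = set.intersection(*(set(d[name]) for name in names))
            if len(common) >= n:
                yield names, common


for names, common in clusters_sharing_keys(d):
    print(names, sorted(common))
# ('g1', 'g2') ['p1', 'p3', 'p4']
# ('g5', 'g6') ['p1', 'p13']
```

Because it enumerates every combination, the number of candidate clusters grows exponentially with the number of dictionaries, so this is only practical for small inputs.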
