今天在GREENPLUM的执行计划中看到了materialize,不知道这个步骤是做什么的,查了下文档,没有相关解释。没事,GREENPLUM不行还有POSTGRESQL呢,GOOGLE了一把,找到了答案。
Q:
What does materialize do? I'm joining two tables,not views or anything like that.
A:
A materialize node means the output of whatever is below it in the tree (which can be a scan,or a full set of joins or something like that) is materalized into memory before the upper node is executed. This is usually done when the outer node needs a source that it can re-scan for some reason or other.
So in your case,the planner is determining that the result of a scan on one of your tables will fit in memory,and it till make it possible to choose an upper join operation that requires rescans while still being cheaper.
具体的SQL如下
aligputf8=# select count(1) from ttt1;
count
-------
10000
(1 row)
aligputf8=#
aligputf8=# select count(1) from ttt2;
count
-------
10000
(1 row)
aligputf8=# explain select * from ttt1 as a,ttt2 as b where a.id=3;
QUERY PLAN
---------------------------------------------------------------------------------------------
Gather Motion 6:1 (slice2; segments: 6) (cost=0.00..1409.07 rows=1667 width=8)
-> Nested Loop (cost=0.00..1409.07 rows=1667 width=8)
-> Broadcast Motion 1:6 (slice1; segments: 1) (cost=0.00..137.07 rows=6 width=4)
-> Seq Scan on ttt1 a (cost=0.00..137.00 rows=1 width=4)
Filter: id = 3
-> Seq Scan on ttt2 b (cost=0.00..112.00 rows=1667 width=4)
(6 rows)
aligputf8=# explain select * from ttt1 as a,ttt2 as b where a.id=3;
aligputf8=#
aligputf8=# explain select * from ttt1 as a,ttt2 as b where a.id<3000;
QUERY PLAN
------------------------------------------------------------------------------------------------
Gather Motion 6:1 (slice2; segments: 6) (cost=122.00..3599268.93 rows=4998334 width=8)
-> Nested Loop (cost=122.00..3599268.93 rows=4998334 width=8)
-> Broadcast Motion 6:6 (slice1; segments: 6) (cost=0.00..346.93 rows=2999 width=4)
-> Seq Scan on ttt1 a (cost=0.00..137.00 rows=500 width=4)
Filter: id < 3000
-> Materialize (cost=122.00..222.00 rows=1667 width=4)
-> Seq Scan on ttt2 b (cost=0.00..112.00 rows=1667 width=4)
(7 rows)
我的理解就是PG为了加快nested loop循环的速度,把b表的数据缓存在了内存中。我们可以看到前面一个SQL,只对b表数据进行1次扫描,因此并不需要进行缓存;后者需要进行2999次扫描。
版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。