转载：Understanding How PostgreSQL Executes a Query

转载来自：http://blog.sciencenet.cn/home.php?mod=space&uid=643407&do=blog&id=517715

UnderstandingHowPostgreSQLExecutesaQuery

Before going much further,you should understand the procedure thatPostgreSQLfollows whenever itexecutesaqueryon your behalf.

After thePostgreSQLserver receivesaqueryfrom the client application,the text of thequeryis handed to theparser. The parser scans through thequeryand checks it for syntax errors. If thequeryis syntactically correct,the parser will transform thequerytext intoaparse tree.Aparse tree isadata structure that represents themeaningof yourqueryinaformal,unambiguous form.

Given thequery

SELECT customer_name,balance FROM customers WHERE balance > 0 ORDER BY balance

the parser might come up withaparse tree structured as shown in Figure 4.5.

Figure 4.5. A sample parse tree.

After the parser has completed parsing thequery,the parse tree is handed off to the planner/optimizer.

The planner is responsible for traversing the parse tree and finding all possible plans for executing thequery. The plan might includeasequential scan through the entire table and index scans if useful indexes have been defined. If thequeryinvolves two or more tables,the planner can suggestanumber of different methods for joining the tables. The execution plans are developed in terms ofqueryoperators. Eachqueryoperator transforms one or moreinput setsinto an intermediate result set. The Seq Scan operator,for example,transforms an input set (the physical table) intoaresult set,filtering out any rows that don't meet thequeryconstraints. The Sort operator producesaresult set by reordering the input set according to one or more sort keys. I'll describe each of thequeryoperators in more detailalittle later. Figure 4.6 shows an example ofasimple execution plan (it isanew example; it isnotrelated to the parse tree in Figure 4.5).

Figure 4.6. simple execution plan.

You can see that complex queries are broken down into simple steps. The input set foraqueryoperator at the bottom of the tree is usuallyaphysical table. The input set for an upper-level operator is the result set ofalower-level operator.

When all possible execution plans have been generated,the optimizer searches for the least-expensive plan. Each plan is assigned an estimated execution cost. Cost estimates are measured in units of disk I/O. An operator that readsasingle block of 8,192 bytes (8K) from the disk hasacost of one unit. CPU time is also measured in disk I/O units,but usually asafraction. For example,the amount of CPU time required to processasingle tuple is assumed to be 1/100^thofasingle disk I/O. You can adjust many of the cost estimates. Eachqueryoperator hasadifferent cost estimate. For example,the cost ofasequential scan of an entire table is computed as the number of 8K blocks in the table,plus some CPU overhead.

After choosing the (apparently) least-expensive execution plan,thequeryexecutor starts at the beginning of the plan and asks the topmost operator to producearesult set. Each operator transforms its input set intoaresult set?the input set may come from another operator lower in the tree. When the topmost operator completes its transformation,the results are returned to the client application.

EXPLAIN

The EXPLAIN statement gives you some insight intohowthePostgreSQLqueryplanner/optimizer decides to executeaquery.

First,you should know that the EXPLAIN statement can be used only to analyze SELECT,INSERT,DELETE,UPDATE,and DECLARE...CURSOR commands.

The syntax for the EXPLAIN command is

EXPLAIN [ANALYZE][VERBOSE] query ;

Let's start by looking atasimple example:

perf=# EXPLAIN ANALYZE SELECT * FROM recalls; NOTICE: QUERY PLAN: Seq Scan on recalls (cost=0.00..9217.41 rows=39241 width=1917) (actual time=69.35..3052.72 rows=39241 loops=1) Total runtime: 3144.61 msec

The format of the execution plan can bealittle mysterious at first. For each step in the execution plan,EXPLAIN prints the following information:

The type of operation required.
The estimated cost of execution.
If you specified EXPLAIN ANALYZE,the actual cost of execution. If you omit the ANALYZE keyword,thequeryis planned but not executed,and the actual cost is not displayed.

In this example,PostgreSQLhas decided to performasequential scan of the recalls table (Seq Scan on recalls). There are many operations thatPostgreSQLcan use to executeaquery. I'll explain the operation type in more detail inamoment.

There are three data items in the cost estimate. The first set of numbers (cost=0.00..9217.41) is an estimate ofhow"expensive" this operation will be. "Expensive" is measured in terms of disk reads. Two numbers are given: The first number representshowquickly the first row in the result set can be returned by the operation; the second (which is usually the most important) representshowlong the entire operation should take. The second data item in the cost estimate (rows=39241) showshowmany rowsPostgreSQLexpects to return from this operation. The final data item (width=1917) is an estimate of the width,in bytes,of the average row in the result set.

If you include the ANALYZE keyword in the EXPLAIN command,PostgreSQLwill execute thequeryand display theactualexecution costs.

Cost Estimates

I will remove the cost estimates from some of the EXPLAIN results in this chapter to make the planabit easier to read. Don't be confused by this?the EXPLAIN command will always print cost estimates.

This wasasimple example.PostgreSQLrequired only one step to execute thisquery(asequential scan on the entire table). Many queries require multiple steps and the EXPLAIN command will show you each of those steps. Let's look atamore complex example:

perf=# EXPLAIN ANALYZE SELECT * FROM recalls ORDER BY yeartxt; NOTICE: PLAN: Sort (cost=145321.51..145321.51 rows=39241 width=1911) (actual time=13014.92..13663.86 rows=39241 loops=1) ->Seq Scan on recalls (cost=0.00..9217.41 rows=39241 width=1917) (actual time=68.99..3446.74 rows=39241 loops=1) Total runtime: 16052.53 msec

This example showsatwo-stepqueryplan. In this case,the first step is actually listed at the end of the plan. When you readaqueryplan,it is important to remember that each step in the plan produces an intermediate result set. Each intermediate result set is fed into the next step of the plan.

Looking at this plan,PostgreSQLfirst produces an intermediate result set by performingasequential scan (Seq Scan) on the entire recalls table. That step should take about 9,217 disk page reads,and the result set will have about 39,241 rows,averaging 1,917 bytes each. Notice that these estimates are identical to those produced in the first example?and in both cases,you are executingasequential scan on the entire table.

After the sequential scan has finished building its intermediate result set,it is fed into the next step in the plan. The final step in this particular plan isasort operation,which is required to satisfy our ORDER BY clause^[8]. The sort operation reorders the result set produced by the sequential scan and returns the final result set to the client application.

^[8]An ORDER BY clause does not requireasort operation in all cases. The planner/optimizer may decide that it can use an index to order the result set.

The Sort operation expectsasingle operand?aresult set. The Seq Scan operation expectsasingle operand?atable. Some operations require more than one operand. Here isajoin between the recalls table and the mfgs table:

perf=# EXPLAIN SELECT * FROM recalls,mfgs perf-# WHERE recalls.mfgname = mfgs.mfgname; NOTICE: PLAN: Merge Join -> Sort -> Seq Scan on recalls -> Sort -> Seq Scan on mfgs

If you use your imagination,you will see that thisqueryplan is actuallyatree structure,as illustrated in Figure 4.7.

Figure 4.7. Execution plan viewed as a tree.

WhenPostgreSQLexecutesthisqueryplan,it starts at the top of the tree. The Merge Join operation requires two result sets for input,soPostgreSQLmust move down one level in the tree; let's assume that you traverse the left child first. Each Sort operation requiresasingle result set for input,so again thequeryexecutor moves down one more level. At the bottom of the tree,the Seq Scan operation simply readsarow fromatable and returns that row to its parent. AfteraSeq Scan operation has scanned the entire table,the left-hand Sort operation can complete. As soon as the left-hand Sort operation completes,the Merge Join operator will evaluate its right child. In this case,the right-hand child evaluates the same way as the left-hand child. When both Sort operations complete,the Merge Join operator will execute,producing the final result set.

So far,you've seen threequeryexecution operators in the execution plans.PostgreSQLcurrently has 19queryoperators. Let's look at each in more detail.

Seq Scan

The Seq Scan operator is the most basicqueryoperator. Any single-tablequerycan be carried out using the Seq Scan operator.

Seq Scan works by starting at the beginning of the table and scanning to the end of the table. For each row in the table,Seq Scan evaluates thequeryconstraints^[9](that is,the WHERE clause); if the constraints are satisfied,the required columns are added to the result set.

^[9]The entire WHERE clause may not be evaluated for each row in the input set.PostgreSQLevaluates only the portions of the clause that apply to the given row (if any). Forasingle-table SELECT,the entire WHERE clause is evaluated. Foramulti-table join,only the portion that applies to the given row is evaluated.

As you saw earlier in this chapter,atable can include dead (that is,deleted) rows and rows that may not be visible because they have not been committed. Seq Scan does not include dead rows in the result set,but it must read the dead rows,and that can be expensive inaheavily updated table.

The cost estimate foraSeq Scan operator gives youahint abouthowthe operator works:

Seq Scan on recalls (cost=0.00..9217.41 rows=39241 width=1917)

The startup cost is always 0.00. This implies that the first row ofaSeq Scan operator can be returned immediately and that Seq Scan does not read the entire table before returning the first row. If you openacursor againstaquerythat uses the Seq Scan operator (and no other operators),the first FETCH will return immediately?you won't have to wait for the entire result set to be materialized before you can FETCH the first row. Other operators (such as Sort)doread the entire input set before returning the first row.

The planner/optimizer choosesaSeq Scan if there are no indexes that can be used to satisfy thequery.ASeq Scan is also used when the planner/optimizer decides that it would be less expensive (or just as expensive) to scan the entire table and then sort the result set to meet an ordering constraint (such as an ORDER BY clause).

Index Scan

An Index Scan operator works by traversing an index structure. If you specifyastarting value for an indexed column (WHERE record_id >= 1000,for example),the Index Scan will begin at the appropriate value. If you specify an ending value (such as WHERE record_id < 2000),the Index Scan will complete as soon as it finds an index entry greater than the ending value.

The Index Scan operator has two advantages over the Seq Scan operator. First,aSeq Scan must read every row in the table?it can only remove rows from the result set by evaluating the WHERE clause for each row. Index Scan may not read every row if you provide starting and/or ending values. Second,aSeq Scan returns rows in table order,not in sorted order. Index Scan will return rows in index order.

Not all indexes are scannable. The B-Tree,R-Tree,and GiST index types can be scanned;aHash index cannot.

The planner/optimizer uses an Index Scan operator when it can reduce the size of the result set by traversingarange of indexed values,or when it can avoidasort because of the implicit ordering offered by an index.

Sort

The Sort operator imposes an ordering on the result set.PostgreSQLuses two different sort strategies: an in-memory sort and an on-disk sort. You can tuneaPostgreSQLinstance by adjusting the value of the sort_mem runtime parameter. If the size of the result set exceeds sort_mem,Sort will distribute the input set toacollection of sorted work files and then merge the work files back together again. If the result set will fit in sort_mem*1024 bytes,the sort is done in memory using the QSort algorithm.

ASort operator never reduces the size of the result set?it does not remove rows or columns.

Unlike Seq Scan and Index Scan,the Sort operator must process the entire input set before it can return the first row.

The Sort operator is used for many purposes. Obviously,aSort can be used to satisfy an ORDER BY clause. Somequeryoperators require their input sets to be ordered. For example,the Unique operator (we'll see that inamoment) eliminates rows by detecting duplicate values as it reads throughasorted input set. Sort will also be used for some join operations,group operations,and for some set operations (such as INTERSECT and UNION).

Unique

The Unique operator eliminates duplicate values from the input set. The input set must be ordered by the columns,and the columns must be unique. For example,the following command

SELECT DISTINCT mfgname FROM recalls;

might produce this execution plan:

Unique -> Sort -> Seq Scan on recalls

The Sort operation in this plan orders its input set by the mfgname column. Unique works by comparing the unique column(s) from each row to the previous row. If the values are the same,the duplicate is removed from the result set.

The Unique operator removes only rows?it does not remove columns and it does not change the ordering of the result set.

Unique can return the first row in the result set before it has finished processing the input set.

The planner/optimizer uses the Unique operator to satisfyaDISTINCT clause. Unique is also used to eliminate duplicates inaUNION.

LIMIT

The LIMIT operator is used to limit the size ofaresult set.PostgreSQLuses the LIMIT operator for both LIMIT and OFFSET processing. The LIMIT operator works by discarding the firstxrows from its input set,returning the nextyrows,and discarding the remainder. If thequeryincludes an OFFSET clause,xrepresents the offset amount; otherwise,xis zero. If thequeryincludesaLIMIT clause,yrepresents the LIMIT amount; otherwise,yis at least as large as the number of rows in the input set.

The ordering of the input set is not important to the LIMIT operator,but it is usually important to the overallqueryplan. For example,thequeryplan for thisquery

perf=# EXPLAIN SELECT * FROM recalls LIMIT 5; NOTICE: PLAN: Limit (cost=0.00..0.10 rows=5 width=1917) -> Seq Scan on recalls (cost=0.00..9217.41 rows=39241 width=1917)

shows that the LIMIT operator rejects all but the first five rows returned by the Seq Scan. On the other hand,thisquery

perf=# EXPLAIN ANALYZE SELECT * FROM recalls ORDER BY yeartxt LIMIT 5; NOTICE: PLAN: Limit (cost=0.00..0.10 rows=5 width=1917) ->Sort (cost=145321.51..145321.51 rows=39241 width=1911) ->Seq Scan on recalls (cost=0.00..9217.41 rows=39241 width=1917)

shows that the LIMIT operator returns the first five rows from an ordered input set.

The LIMIT operator never removes columns from the result set,but it obviously removes rows.

The planner/optimizer usesaLIMIT operator if thequeryincludesaLIMIT clause,an OFFSET clause,or both. If thequeryincludes onlyaLIMIT clause,the LIMIT operator can return the first row before it processes the entire set.

Aggregate

The planner/optimizer produces an Aggregate operator whenever thequeryincludes an aggregate function. The following functions are aggregate functions: AVG(),COUNT(),MAX(),MIN(),STDDEV(),SUM(),and VARIANCE().

Aggregate works by reading all the rows in the input set and computing the aggregate values. If the input set is not grouped,Aggregate producesasingle result row. For example:

movies=# EXPLAIN SELECT COUNT(*) FROM customers; Aggregate (cost=22.50..22.50 rows=1 width=0) -> Seq Scan on customers (cost=0.00..20.00 rows=1000 width=0)

If the input setisgrouped,Aggregate produces one result row for each group:

movies=# EXPLAIN movies-# SELECT COUNT(*),EXTRACT( DECADE FROM birth_date ) movies-# FROM customers movies-# GROUP BY EXTRACT( DECADE FROM birth_date ); NOTICE: PLAN: Aggregate (cost=69.83..74.83 rows=100 width=4) -> Group (cost=69.83..72.33 rows=1000 width=4) -> Sort (cost=69.83..69.83 rows=1000 width=4) -> Seq Scan on customers (cost=0.00..20.00 rows=1000 width=4)

Notice that the row estimate of an ungrouped aggregate is always 1; the row estimate ofagroup aggregate is 1/10^thof the size of the input set.

Append

The Append operator is used to implementaUNION. An Append operator will have two or more input sets. Append works by returning all rows from the first input set,then all rows from the second input set,and so on until all rows from all input sets have been processed.

Here isaqueryplan that shows the Append operator:

The cost estimate for an Append operator is simply the sum of cost estimates for all input sets. An Append operator can return its first row before processing all input rows.

The planner/optimizer uses an Append operator whenever it encountersaUNION clause. Append is also used when you select fromatable involved in an inheritance hierarchy. In Chapter 3,"PostgreSQLSQL Syntax and Use," I defined three tables,as shown in Figure 4.8.

Figure 4.8. Inheritance hierarchy.

The dvds table inherits from video,as does the tapes table. If you SELECT from dvds or video,PostgreSQLwill respond withasimplequeryplan:

movies=# EXPLAIN SELECT * FROM dvds; Seq Scan on dvds (cost=0.00..20.00 rows=1000 width=122) movies=# EXPLAIN SELECT * FROM tapes; Seq Scan on tapes (cost=0.00..20.00 rows=1000 width=86)

Remember,because of the inheritance hierarchy,advdisavideo andatapeisavideo. If you SELECT from video,you would expect to see all dvds,all tapes,and all videos. Thequeryplan reflects the inheritance hierarchy:

movies=# EXPLAIN SELECT * FROM video; Result(cost=0.00..60.00 rows=3000 width=86) ->Append(cost=0.00..60.00 rows=3000 width=86) ->Seq Scan on video (cost=0.00..20.00 rows=1000 width=86) ->Seq Scan on tapes video (cost=0.00..20.00 rows=1000 width=86) ->Seq Scan on dvds video (cost=0.00..20.00 rows=1000 width=86)

Look closely at the width clause in the preceding cost estimates. If you SELECT from the dvds table,the width estimate is 122 bytes per row. If you SELECT from the tapes table,the width estimate is 86 bytes per row. When you SELECT from video,all rows are expected to be 86 bytes long. Here are the commands used to create the tapes and dvds tables:

movies=# CREATE TABLE tapes ( ) INHERITS( video ); movies=# CREATE TABLE dvds movies-# ( movies(# region_id INTEGER,movies(# audio_tracks VARCHAR[] movies(# ) INHERITS ( video );

You can see thatarow from the tapes table is identical toarow in the video table?you would expect them to be the same size (86 bytes).Arow in the dvds table containsavideo plusafew extra columns,so you would expectadvds row to be longer thanavideo row. When you SELECT from the video table,you want all videos.PostgreSQLdiscards any columns that are not inherited from the video table.

Result

The Result operator is used in three contexts.

aResult operator is used to executeaquerythat does not retrieve data fromatable:

movies=# EXPLAIN SELECT timeofday(); Result

In this form,the Result operator simply evaluates the given expression(s) and returns the results.

Result is also used to evaluate the parts ofaWHERE clause that don't depend on data retrieved fromatable. For example:

movies=# EXPLAIN SELECT * FROM tapes WHERE 1 <> 1; Result ->Seq Scan on tapes

This might seem likeasillyquery,but some client applications will generateaqueryof this form as an easy way to retrieve the metadata (that is,column definitions) foratable.

The planner/optimizer also generatesaResult operator if the top node in thequeryplan is an Append operator. This isarather obscure rule that has no performance implications; it just happens to make thequeryplanner and executorabit simpler for thePostgreSQLdevelopers to maintain.

Nested Loop

The Nested Loop operator is used to performajoin between two tables.ANested Loop operator requires two input sets (given thataNested Loop joins two tables,this makes perfect sense).

Nested Loop works by fetching each from one of the input sets (called theouter table). For each row in the outer table,the other input (called theinner table) is searched forarow that meets the join qualifier.

Here is an example:

perf=# EXPLAIN perf-# SELECT * FROM customers,rentals perf=# WHERE customers.customer_id = rentals.customer_id; Nested Loop -> Seq Scan on rentals -> Index Scan using customer_id on customers

The outer table is always listed first in thequeryplan (in this case,rentals is the outer table). To execute this plan,the Nested Loop operator will read each row^[10]in the rentals table. For each rentals row,Nested Loop reads the corresponding customers row using an indexed lookup on the customer_id index.

^[10]Actually,Nested Loop reads only those rows that meet thequeryconstraints.

ANested Loop operator can be used to perform inner joins,left outer joins,and unions.

Because Nested Loop does not process the entire inner table,it can't be used for other join types (full,right join,and so on).

Merge Join

The Merge Join operator also joins two tables. Like the Nested Loop operator,Merge Join requires two input sets: an outer table and an inner table. Each input set must be ordered by the join columns.

Let's look at the previousquery,this time executed asaMerge Join:

Merge Join starts reading the first row from each table (see Figure 4.9).

Figure 4.9. Merge Join?Step 1.

If the join columns are equal (as in this case),Merge Join createsanew row containing the necessary columns from each input table and returns the new row. Merge Join then moves to the next row in the outer table and joins it with the corresponding row in the inner table (see Figure 4.10).

Figure 4.10. Merge Join?Step 2.

Next,Merge Join reads the third row in the outer table (see Figure 4.11).

Figure 4.11. Merge Join?Step 3.

Now Merge Join must advance the inner table twice before another result row can be created (see Figure 4.12).

Figure 4.12. Merge Join?Step 4.

After producing the result row for customer_id = 3,Merge Join moves to the last row in the outer table and then advances the inner table toamatching row (see Figure 4.13).

Figure 4.13. Merge Join?Step 5.

Merge Join completes by producing the final result row (customer_id = 4).

You can see that Merge Join works by walking through two sorted tables and finding matches?the trick is in keeping the pointers synchronized.

This example shows aninner join,but the Merge Join operator can be used for other join types by walking through the sorted input sets in different ways. Merge Join can do inner joins,outer joins,and unions.

Hash and Hash Join

The Hash and Hash Join operators work together. The Hash Join operator requires two input sets,again called the outer and inner tables. Here isaqueryplan that uses the Hash Join operator:

movies=# EXPLAIN movies-# SELECT * FROM customers,rentals movies-# WHERE rentals.customer_id = customers.customer_id; Hash Join -> Seq Scan on customers -> Hash -> Seq Scan on rentals

Unlike other join operators,Hash Join does not require either input set to be ordered by the join column. Instead,the inner table isalwaysahash table,and the ordering of the outer table is not important.

The Hash Join operator starts by creating its inner table using the Hash operator. The Hash operator createsatemporary Hash index that covers the join column in the inner table.

Once the hash table (that is,the inner table) has been created,Hash Join reads each row in the outer table,hashes the join column (from the outer table),and searches the temporary Hash index foramatching value.

AHash Join operator can be used to perform inner joins,Hei; font-size:14px; line-height:25.200000762939453px">Group

The Group operator is used to satisfyaGROUP BY clause.Asingle input set is required by the Group operator,7 and it must be ordered by the grouping column(s).

Group can work in two distinct modes. If you are computingagrouped aggregate,Group will return each row in its input set,following each group withaNULL row to indicate the end of the group (the NULL row is for internal bookkeeping only,and it will not show up in the final result set). For example:

Notice that the row count in the Group operator's cost estimate is the same as the size of its input set.

If you arenotcomputingagroup aggregate,Group will return one row for each group in its input set. For example:

movies=# EXPLAIN movies-# SELECT EXTRACT( DECADE FROM birth_date ) FROM customers movies-# GROUP BY EXTRACT( DECADE FROM birth_date ); Group (cost=69.83..69,83 rows=100 width=4) -> Sort (cost=69.83..69.83 rows=1000 width=4) -> Seq Scan on customers (cost=0.00..20.00 rows=1000 width=4)

In this case,the estimated row count is 1/10^thof the Group operator's input set.

Subquery Scan and Subplan

ASubquery Scan operator is used to satisfyaUNION clause; Subplan is used for subselects. These operators scan through their input sets,adding each row to the result set. Each of these operators are used for internal bookkeeping purposes and really don't affect the overallqueryplan?you can usually ignore them.

Just so you know when they are likely to be used,here are two samplequeryplans that show the Subquery Scan and Subplan operators:

perf=# EXPLAIN perf-# SELECT * FROM recalls WHERE mfgname = 'FORD' perf-# UNION perf=# SELECT * FROM recalls WHERE yeartxt = '1983'; Unique ->Sort ->Append ->Subquery Scan *SELECT* 1 ->Seq Scan on recalls ->Subquery Scan *SELECT* 2 ->Seq Scan on recalls movies=# EXPLAIN movies-# SELECT * FROM customers movies-# WHERE customer_id IN movies-# ( movies(# SELECT customer_id FROM rentals movies(# ); NOTICE: PLAN: Seq Scan on customers (cost=0.00..3.66 rows=2 width=47) SubPlan -> Seq Scan on rentals (cost=0.00..1.04 rows=4 width=4) Tid Scan

The Tid Scan (tuple ID scan) operator is rarely used.Atuple is roughly equivalent toarow. Every tuple has an identifier that is unique withinatable?this is called the tuple ID. When you selectarow,you can ask for the row's tuple ID:

The "ctid" isaspecial column (similar to the oid) that is automaticallyapart of every row.Atuple ID is composed ofablock number andatuple number within the block. All the rows in the previous sample are stored in block 0 (the first block of the table file). The customers row for "Panky,Henry" is stored in tuple 3 of block 0.

After you knowarow's tuple ID,you can request that row again by using its ID:

movies=# SELECT customer_id,customer_name FROM customers movies-# WHERE ctid = '(0,3)'; customer_id | customer_name -------------+--------------- 3 | Panky,Henry

The tuple ID works likeabookmark.Atuple ID,however,is valid only withinasingle transaction. After the transaction completes,the tuple ID should not be used.

The Tid Scan operator is used whenever the planner/optimizer encountersaconstraint of the form ctid =expressionorexpression= ctid.

The fastest possible way to retrievearow is by its tuple ID. When you SELECT by tuple ID,the Tid Scan operator reads the block specified in the tuple ID and returns the requested tuple.

Materialize

The Materialize operator is used for some subselect operations. The planner/optimizer may decide that it is less expensive to materializeasubselect once than to repeat the work for each top-level row.

Materialize will also be used for some merge-join operations. In particular,if the inner input set ofaMerge Join operator is not produced byaSeq Scan,an Index Scan,aSort,oraMaterialize operator,the planner/optimizer will insertaMaterialize operator into the plan. The reasoning behind this rule is not obvious?it has more to do with the capabilities of the other operators than with the performance or the structure of your data. The Merge Join operator is complex; one requirement of Merge Join is that the input sets must be ordered by the join columns.Asecond requirement is that the inner input set must berepositionable; that is,Merge Join needs to move backward and forward through the input set. Not all ordered operators can move backward and forward. If the inner input set is produced by an operator that is not repositionable,the planner/optimizer will insertaMaterialize.

Setop (Intersect,Intersect All,Except,Except All)

There are four Setop operators: Setop Intersect,Setop Intersect All,Setop Except,and Setop Except All. These operators are produced only when the planner/optimizer encounters an INTERSECT,INTERSECT ALL,EXCEPT,or EXCEPT ALL clause,respectively.

All Setop operators require two input sets. The Setop operators work by first combining the input sets intoasorted list,and then groups of identical rows are identified. For each group,the Setop operator counts the number of rows contributed by each input set. Finally,each Setop operator uses the counts to determinehowmany rows to add to the result set.

I think this will be easier to understand by looking at an example. Here are two queries; the first selects all customers born in the 1960s:

movies=# SELECT * FROM customers movies-# WHERE EXTRACT( DECADE FROM birth_date ) = 196; customer_id | customer_name | phone | birth_date | balance -------------+----------------------+----------+------------+--------- 3 | Panky,Henry | 555-1221 | 1968-01-21 | 0.00 4 | Wonderland,Alice N. | 555-1122 | 1969-03-05 | 3.00

The second selects all customers withabalance greater than 0:

movies=# EXPLAIN movies-# SELECT * FROM customers movies-# WHERE EXTRACT( DECADE FROM birth_date ) = 196 movies-# INTERSECT movies-# SELECT * FROM customers WHERE balance > 0; SetOp Intersect -> Sort -> Append -> Subquery Scan *SELECT* 1 -> Seq Scan on customers -> Subquery Scan *SELECT* 2 -> Seq Scan on customers

Thequeryexecutor starts by executing the two subqueries and then combining the results intoasorted list. An extra column is added that indicates which input set contributed each row:

The SetOp operator finds groups of duplicate rows (ignoring the input set pseudo-column). For each group,SetOp counts the number of rows contributed by each input set. The number of rows contributed by the outer set is called count(outer). The number of rows contributed by the inner result set is called count(inner).

Here ishowthe sample looks after counting each group:

The first group containsasingle row,contributed by the inner input set. The second group containsasingle row,contributed by the outer input set. The final group contains two rows,one contributed by each input set.

When SetOp reaches the end ofagroup of duplicate rows,it determineshowmany copies to write into the result set according to the following rules:

INTERSECT? If count(outer) > 0 and count(inner) > 0,write one copy of the row to the result set; otherwise,the row is not included in the result set.
INTERSECT ALL? If count(outer) > 0 and count(inner) > 0,writencopies of the row to the result set; wherenis the greater count(outer) and count(inner).
EXCEPT? If count(outer) > 0 and count(inner) = 0,write one copy of the row to the result set.
EXCEPT ALL? If count(inner) >= count(outer),writencopies of the row to the result set; wherenis count(outer) - count(inner).

转载：Understanding How PostgreSQL Executes a Query

相关推荐