PostgreSQL 查询优化--CTE使用

背景1：

当我们需要查询很多客户的，离当前时间最近订单时

参考网址：

http://bonesmoses.org/2014/05/08/trumping-the-postgresql-query-planner/

创建测试表

CREATETABLEtest_order
(
client_idINTNOTNULL,order_dateTIMESTAMPNOTNULL,fillerTEXTNOTNULL
);

插入测试数据

INSERTINTOtest_order
SELECTs1.id,(CURRENT_DATE-INTERVAL'1000days')::DATE
+generate_series(1,s1.id%1000),repeat('',20)
FROMgenerate_series(1,10000)s1(id);

CREATEINDEXidx_test_order_client_id_order_date
ONtest_order(client_id,order_dateDESC);

执行普通SQL

不走索引

EXPLAINANALYZE
SELECTclient_id,max(order_date)
FROMtest_order
GROUPBYclient_id;

"Executiontime:5741.682ms"

使用索引

EXPLAIN ANALYZE SELECT DISTINCT ON (client_id) client_id,order_date FROM test_order ORDER BY client_id,order_date DESC;

"Executiontime:4628.510ms"

优化后SQL

EXPLAINANALYZE
WITHRECURSIVEskipAS
(
(SELECTclient_id,order_date
FROMtest_order
ORDERBYclient_id,order_dateDESC
LIMIT1)
UNIONALL
(SELECT(SELECTmin(client_id)
FROMtest_order
WHEREclient_id>skip.client_id
)ASclient_id,(SELECTmax(order_date)
FROMtest_order
WHEREclient_id=(
SELECTmin(client_id)
FROMtest_order
WHEREclient_id>skip.client_id
)
)ASorder_date
FROMskip
WHEREskip.client_idISNOTNULL)
)
SELECT*
FROMskip;

"Executiontime:865.889ms"

查询结果

client_id;order_date
1;"2014-03-0900:00:00"
2;"2014-03-1000:00:00"
3;"2014-03-1100:00:00"
4;"2014-03-1200:00:00"
5;"2014-03-1300:00:00"
6;"2014-03-1400:00:00"
7;"2014-03-1500:00:00"
8;"2014-03-1600:00:00"
9;"2014-03-1700:00:00"
10;"2014-03-1800:00:00"
11;"2014-03-1900:00:00"
12;"2014-03-2000:00:00"
13;"2014-03-2100:00:00"
14;"2014-03-2200:00:00"
15;"2014-03-2300:00:00"
16;"2014-03-2400:00:00"
17;"2014-03-2500:00:00"
18;"2014-03-2600:00:00"
19;"2014-03-2700:00:00"
20;"2014-03-2800:00:00"
21;"2014-03-2900:00:00"
22;"2014-03-3000:00:00"
23;"2014-03-3100:00:00"
24;"2014-04-0100:00:00"

背景2：

参考网址：

https://yq.aliyun.com/articles/65202?spm=5176.8091938.0.0.tZZBTS

有一个这样的场景，一张小表A，里面存储了一些ID，大约几百个。

（比如说巡逻车辆ID，环卫车辆的ID，公交车，微公交的ID）。

另外有一张日志表B，每条记录中的ID是来自前面那张小表的，但不是每个ID都出现在这张日志表中，比如说一天可能只有几十个ID会出现在这个日志表的当天的数据中。

（比如车辆的行车轨迹数据，每秒上报轨迹，数据量就非常庞大）。

那么我怎么快速的找出今天没有出现的ID呢。

（哪些巡逻车辆没有出现在这个片区，是不是偷懒了？哪些环卫车辆没有出行，哪些公交或微公交没有出行）？

select id from A where id not in (select id from B where time between ? and ?);

这个QUERY会很慢，有什么优化方法呢。
当然，你还可以让车辆签到的方式来解决这个问题，但是总有未签到的，或者没有这种设计的时候，那么怎么解决呢？

--A
createtablea(idintprimarykey,infotext);
--B
createtableb(idintprimarykey,aidint,crt_timetimestamp);
createindexb_aidonb(aid);

--a表插入1000条
insertintoaselectgenerate_series(1,1000),md5(random()::text);
--b表插入500万条，只包含aid的500个id。
insertintobselectgenerate_series(1,5000000),generate_series(1,500),clock_timestamp();
优化前：

select*fromawhereidnotin(selectaidfromb);

执行时间：大于1min

优化后：

select*fromawhereidnotin(withrecursiveskipas(
(
selectmin(aid)aidfrombwhereaidisnotnull
)
unionall
(
select(selectmin(aid)aidfrombwhereb.aid>s.aidandb.aidisnotnull)
fromskipswheres.aidisnotnull
)--这里的wheres.aidisnotnull一定要加,否则就死循环了.
)
selectaidfromskipwhereaidisnotnull);

执行时间：46msec

情景3:

生成树形结构

参考网址：

http://blog.databasepatterns.com/2014/02/trees-paths-recursive-cte-postgresql.html

createtablesubregions(
idsmallintprimarykey,nametextnotnull,parent_idsmallintnullreferencessubregions(id)
);

insertintosubregionsvalues
(1,'World',null),(2,'Africa',1),(5,'SouthAmerica',419),(9,'Oceania',(11,'WesternAfrica',2),(13,'CentralAmerica',(14,'EasternAfrica',(15,'NorthernAfrica',(17,'MiddleAfrica',(18,'SouthernAfrica',(19,'Americas',(21,'NorthernAmerica',19),(29,'Caribbean',(30,'EasternAsia',142),(34,'SouthernAsia',(35,'South-EasternAsia',(39,'SouthernEurope',150),(53,'AustraliaandNewZealand',9),(54,'Melanesia',(57,'Micronesia',(61,'Polynesia',(142,'Asia',(143,'CentralAsia',(145,'WesternAsia',(150,'Europe',(151,'EasternEurope',(154,'NorthernEurope',(155,'WesternEurope',(419,'LatinAmericaandtheCaribbean',19);
Andyouwantedtomakeaprettytreelikethis:

World
Africa
EasternAfrica
MiddleAfrica
NorthernAfrica
SouthernAfrica
WesternAfrica
Americas
LatinAmericaandtheCaribbean
Caribbean
CentralAmerica
SouthAmerica
NorthernAmerica
Asia
CentralAsia
EasternAsia
South-EasternAsia
SouthernAsia
WesternAsia
Europe
EasternEurope
NorthernEurope
SouthernEurope
WesternEurope
Oceania
AustraliaandNewZealand
Melanesia
Micronesia
Polynesia
Here'showyou'ddoit:
withrecursivemy_expressionas(

--startwiththe"anchor",i.e.allofthenodeswhoseparent_idisnull:
select
id,nameaspath,nameastree,0aslevel
fromsubregions
where
parent_idisnull

unionall

--thentherecursivepart:
select
current.idasid,previous.path||'>'||current.nameaspath,previous.level+1)||current.nameastree,previous.level+1aslevel
fromsubregionscurrent
joinmy_expressionaspreviousoncurrent.parent_id=previous.id
)
select
tree
frommy_expression
orderby
path

路径间加入父节点和分割

select
path
frommy_expression
orderby
path

输出结果：

World
World>Africa
World>Africa>EasternAfrica
World>Africa>MiddleAfrica
World>Africa>NorthernAfrica
World>Africa>SouthernAfrica
World>Africa>WesternAfrica
World>Americas
World>Americas>LatinAmericaandtheCaribbean
World>Americas>LatinAmericaandtheCaribbean>Caribbean
World>Americas>LatinAmericaandtheCaribbean>CentralAmerica
World>Americas>LatinAmericaandtheCaribbean>SouthAmerica
World>Americas>NorthernAmerica
World>Asia
World>Asia>CentralAsia
World>Asia>EasternAsia
World>Asia>South-EasternAsia
World>Asia>SouthernAsia
World>Asia>WesternAsia
World>Europe
World>Europe>EasternEurope
World>Europe>NorthernEurope
World>Europe>SouthernEurope
World>Europe>WesternEurope
World>Oceania
World>Oceania>AustraliaandNewZealand
World>Oceania>Melanesia
World>Oceania>Micronesia
World>Oceania>Polynesia

PostgreSQL 查询优化--CTE使用

有一个这样的场景，一张小表A，里面存储了一些ID，大约几百个。

相关推荐