本文分享自华为云社区《》,作者: O泡果奶~。
当前DM(P1、P3、CBGDM)存在维度表与主表关联时使用会计期作为关联条件,会导致出现大内存占用或未识别数据倾斜的问题
【场景一】f.period_id = 维度表.period_id1.1、【问题描述】主表和维度表关联过程中将会计期作为关联条件,导致维度表未进行分区剪枝,可能会产生大内存占用的情况
1.2、【原始SQL】仅呈现SQL中的问题,详细SQL见附件
FROM
DMACC.dm_adp_ar_trx_dtl_tmp F
INNER JOIN DMDIM.DM_DIM_REGION_RC_D REG ON F.COA_GEO_PC_KEY = REG.GEO_PC_KEY
INNER JOIN DMDIM.DM_DIM_PRODUCT_T_D T9 ON F.PROD_KEY = T9.PROD_KEY
AND T9.PROD_POV_ID = 1
INNER JOIN DMDIM.DM_DIM_PROJECT_D J ON F.PROJ_KEY = J.PROJ_KEY
INNER JOIN DMDIM.DM_DIM_CONTRACT_D HT ON HT.CONTRACT_KEY = F.CONTRACT_KEY
LEFT JOIN DMCOMMON.DWR_CONFIG_DOMESTIC_FINANCE_V FIN ON F.COA_COMPANY_KEY = FIN.COMPANY_KEY
AND F.COA_GEO_PC_KEY = FIN.GEO_PC_KEY
left join DMAR.DWB_FMD_DIM_INVOICE_PAY_PLAN_D PP ON F.AR_INVOICE_PAY_PLAN_ID = PP.AR_INVOICE_PAY_PLAN_ID
AND F.PERIOD_ID = PP.PERIOD_ID
LEFT JOIN DMARDI.DWR_DIM_AR_INVOICE_V INV ON F.AR_INVOICE_ID = INV.AR_INVOICE_ID
INNER JOIN DMARDI.DWR_DIM_AR_APPLICATION_V APP ON F.AR_APPLICATION_RECORD_ID = APP.AR_APPLICATION_RECORD_ID
INNER JOIN DMARDI.DWR_DIM_AR_RECEIPT_V RCP ON F.AR_RECEIPT_RECORD_ID = RCP.AR_RECEIPT_RECORD_ID
INNER JOIN DMARDI.DWR_DIM_AR_RECEIPT_TYPE_V RT ON RCP.RECEIPT_RECORD_TYPE_ID = RT.AR_RECEIPT_TYPE_ID
LEFT JOIN (
SELECT C
.CONTRACT_KEY,
D.COMPANY_KEY,
R.FIRST_SHIP_DATE
FROM
DMDIM.dm_dim_contract_d C,
DMDIM.DM_DIM_COMPANY_D D,
DMARDI.DWR_CTRCT_FIRST_SHIP_DATE_R R
WHERE
C.CONTRACT_ID = R.CONTRACT_ID
AND D.COMPANY_ID = R.COMPANY_ID
) FR ON F.CONTRACT_KEY = FR.CONTRACT_KEY
AND F.COA_COMPANY_KEY = FR.COMPANY_KEY
INNER JOIN DMDIM.DM_DIM_SALES_MODE_D MO ON F.SALES_MODE_KEY = MO.SALES_MODE_KEY
JOIN DMDIM.DM_DIM_JOURNAL_SOURCE_D T29 ON F.JE_SOURCE_ID = T29.JE_SOURCE_ID
JOIN DMDIM.DM_DIM_JOURNAL_CATEGORY_D T30 ON F.JE_CATEGORY_ID = T30.JE_CATEGORY_ID
1.3、【性能分析】
从上图的执行计划可以看出,由于用会计期作为关联条件,导致维度表未进行分区剪枝,数据量大,不但产生了数据倾斜,同时还由于数据量大出现了关联下盘,大大降低了sql执行性能。
主表只有一个会计期,可以识别出对应的会计期,然后对SQL进行如下改写:
FROM
DMACC.dm_adp_ar_trx_dtl_tmp F
INNER JOIN DMDIM.DM_DIM_REGION_RC_D REG ON F.COA_GEO_PC_KEY = REG.GEO_PC_KEY
INNER JOIN DMDIM.DM_DIM_PRODUCT_T_D T9 ON F.PROD_KEY = T9.PROD_KEY
AND T9.PROD_POV_ID = 1
INNER JOIN DMDIM.DM_DIM_PROJECT_D J ON F.PROJ_KEY = J.PROJ_KEY
INNER JOIN DMDIM.DM_DIM_CONTRACT_D HT ON HT.CONTRACT_KEY = F.CONTRACT_KEY
LEFT JOIN DMCOMMON.DWR_CONFIG_DOMESTIC_FINANCE_V FIN ON F.COA_COMPANY_KEY = FIN.COMPANY_KEY
AND F.COA_GEO_PC_KEY = FIN.GEO_PC_KEY
LEFT JOIN DMAR.DWB_FMD_DIM_INVOICE_PAY_PLAN_D PP ON F.AR_INVOICE_PAY_PLAN_ID = PP.AR_INVOICE_PAY_PLAN_ID
AND PP.PERIOD_ID = '202406'
LEFT JOIN DMARDI.DWR_DIM_AR_INVOICE_V INV ON F.AR_INVOICE_ID = INV.AR_INVOICE_ID
INNER JOIN DMARDI.DWR_DIM_AR_APPLICATION_V APP ON F.AR_APPLICATION_RECORD_ID = APP.AR_APPLICATION_RECORD_ID
INNER JOIN DMARDI.DWR_DIM_AR_RECEIPT_V RCP ON F.AR_RECEIPT_RECORD_ID = RCP.AR_RECEIPT_RECORD_ID
INNER JOIN DMARDI.DWR_DIM_AR_RECEIPT_TYPE_V RT ON RCP.RECEIPT_RECORD_TYPE_ID = RT.AR_RECEIPT_TYPE_ID
LEFT JOIN (
SELECT C
.CONTRACT_KEY,
D.COMPANY_KEY,
R.FIRST_SHIP_DATE
FROM
DMDIM.dm_dim_contract_d C,
DMDIM.DM_DIM_COMPANY_D D,
DMARDI.DWR_CTRCT_FIRST_SHIP_DATE_R R
WHERE
C.CONTRACT_ID = R.CONTRACT_ID
AND D.COMPANY_ID = R.COMPANY_ID
) FR ON F.CONTRACT_KEY = FR.CONTRACT_KEY
AND F.COA_COMPANY_KEY = FR.COMPANY_KEY
INNER JOIN DMDIM.DM_DIM_SALES_MODE_D MO ON F.SALES_MODE_KEY = MO.SALES_MODE_KEY
JOIN DMDIM.DM_DIM_JOURNAL_SOURCE_D T29 ON F.JE_SOURCE_ID = T29.JE_SOURCE_ID
JOIN DMDIM.DM_DIM_JOURNAL_CATEGORY_D T30 ON F.JE_CATEGORY_ID = T30.JE_CATEGORY_ID
经优化后,执行计划如下图所示,维度表进行了分区剪枝,数据量减少,缓解了数据倾斜,也避免了关联下盘的问题。