Oracle 21C是融合数据库,可以支持任何数据、任何工作负载。不仅有区块链表、原生json数据类型、持久内存支持等技术创新,也有对传统老用户极度友好的自带机器学习算法:

d7e56911dfd789abf17a3c536004f967.png

    不用额外加载、学习python,只用我们最熟悉的sql语句调用dbms包就能进行数据分析,这太爽了。我们这里以MSET-SPRT 算法为例,切身体会一下Oracle 21c自带机器学习算法的友好便捷。

    MSET-SPRT(多元状态估计技术-序列概率比检验)算法是一种用于监视关键过程的非线性、非参数异常检测技术。MSET-SPRT算法可检测到细微的异常,同时产生最少的错误警报。算法根据监视信号正常操作序列的可用历史数据校准预期行为,将学习到的系统行为整合到持久的MSET-SPRT模型中。这样,我们就可以将模型应用于新记录以检测异常行为。

一、数据准备

我们用DBA们比较熟悉的SH用户下的sales表来做实验,我们选取1998年到1999年的数据生成mset_build_sh_data视图来训练数据。

SQL> CREATE OR replace VIEW mset_build_sh_data
2    AS SELECT time_id, sum(quantity_sold) quantity,
3    sum(amount_sold) amount from (SELECT * FROM sh.sales WHERE
4    time_id <= '30-DEC-99') GROUP BY time_id ORDER BY time_id;


View created.

我们选取sales表2000年以后的数据生成mset_test_sh_data视图来测试数据。

SQL> CREATE OR replace VIEW mset_test_sh_data
2    AS SELECT time_id, sum(quantity_sold) quantity, sum(amount_sold)
3    amount FROM (SELECT * FROM sh.sales WHERE time_id > '30-DEC-99')
4    GROUP BY time_id ORDER BY time_id;


View created.

我们创建mset_sh_settings表来存放算法执行需要的配置数据。

SQL> CREATE TABLE mset_sh_settings(setting_name VARCHAR2(30),
2    setting_value VARCHAR2(128));


Table created.

我们把MSET-SPRT设置为模型算法,并配置算法的几个详细参数。

SQL> BEGIN
2    -- Select MSET-SPRT as the algorithm
3    INSERT INTO mset_sh_settings
4    VALUES(dbms_data_mining.algo_name,
5    dbms_data_mining.algo_mset_sprt);
6    -- Turn on automatic data preparation
7    INSERT INTO mset_sh_settings
8    VALUES(dbms_data_mining.prep_auto,
9    dbms_data_mining.prep_auto_on);
10    -- Set memory vector
11    INSERT INTO mset_sh_settings
12    VALUES(dbms_data_mining.mset_memory_vectors, 100);
13    -- Set alpha
14    INSERT INTO mset_sh_settings
15    VALUES(dbms_data_mining.MSET_ALPHA_PROB, 0.1);
16    -- Set alert count
17    INSERT INTO mset_sh_settings
18    VALUES(dbms_data_mining.MSET_ALERT_COUNT, 3);
19    -- Set alert window
20    INSERT INTO mset_sh_settings
21    VALUES(dbms_data_mining.MSET_ALERT_WINDOW, 5);
22    -- Examples of other possible settings are:
23    -- (dbms_data_mining.mset_beta_prob, 0.1)
24    -- (dbms_data_mining.mset_adb_height, 0.01)
25    -- (dbms_data_mining.mset_std_tolerance, 3)
26    -- (dbms_data_mining.mset_heldaside, 500)
27    commit;
28    END;
29    /


PL/SQL procedure successfully completed.

二、建立数据模型

我们通过Oracle 21c自带的dbms_data_mining包调用mset_test_sh_data视图来建立MSET模型

SQL> BEGIN
2    dbms_data_mining.create_model(model_name => 'MSET_SH_MODEL',
3    mining_function => 'CLASSIFICATION',
4    data_table_name => 'mset_build_sh_data',
5    case_id_column_name => 'time_id',
6    target_column_name => '',
7    settings_table_name => 'MSET_SH_SETTINGS');
8    END;
9    /


PL/SQL procedure successfully completed.

我们查询模型设置,看看算法参数是否生效

SQL> column setting_name format a30
SQL>   column setting_value format a30
SQL>   SELECT setting_name, setting_value
2    FROM user_mining_model_settings
3    WHERE model_name = 'MSET_SH_MODEL'
4    ORDER BY setting_name;


SETTING_NAME                   SETTING_VALUE
------------------------------ ------------------------------
ALGO_NAME                      ALGO_MSET_SPRT
MSET_ADB_HEIGHT                .05
MSET_ALERT_COUNT               3
MSET_ALERT_WINDOW              5
MSET_ALPHA_PROB                .1
MSET_BETA_PROB                 .1
MSET_HELDASIDE                 10000
MSET_MEMORY_VECTORS            100
MSET_STD_TOLERANCE             3
ODMS_DETAILS                   ODMS_ENABLE
ODMS_MISSING_VALUE_TREATMENT   ODMS_MISSING_VALUE_AUTO
ODMS_RANDOM_SEED               0
ODMS_SAMPLING                  ODMS_SAMPLING_DISABLE
PREP_AUTO                      ON


14 rows selected.

我们查询模型属性信息

SQL> column attribute_name format a40
SQL>   column attribute_type format a20
SQL>   SELECT attribute_name, attribute_type
2    FROM user_mining_model_attributes
3    WHERE model_name = 'MSET_SH_MODEL'
4    ORDER BY attribute_name;


ATTRIBUTE_NAME                           ATTRIBUTE_TYPE
---------------------------------------- --------------------
AMOUNT                                   NUMERICAL
QUANTITY                                 NUMERICAL


2 rows selected.

MSET_SH_MODEL模型会有一些辅助视图来记录相关信息。

SQL> col view_name format a30
SQL>   col view_type format a50
SQL>   SELECT view_name, view_type FROM user_mining_model_views
2    WHERE model_name='MSET_SH_MODEL'
3    ORDER BY view_name;


VIEW_NAME
------------------------------
VIEW_TYPE
--------------------------------------------------
DM$VCMSET_SH_MODEL
Scoring Cost Matrix


DM$VGMSET_SH_MODEL
Global Name-Value Pairs


DM$VNMSET_SH_MODEL
Normalization and Missing Value Handling


DM$VSMSET_SH_MODEL
Computed Settings


DM$VTMSET_SH_MODEL
Classification Targets


DM$VWMSET_SH_MODEL
Model Build Alerts




6 rows selected.

以DM$VGMSET_SH_MODEL视图为例,我们可以通过DBA们熟悉的sql语句来查看NUM_ROWS信息

SQL> column name format a20
SQL>   column numeric_value format a20
SQL>   column string_value format a15
SQL>   SELECT name,
2    to_char(numeric_value, '99999') numeric_value,
3    string_value FROM DM$VGMSET_SH_MODEL
4    ORDER BY name;


NAME                 NUMERIC_VALUE        STRING_VALUE
-------------------- -------------------- ---------------
NUM_ROWS                729


1 row selected.

三、测试数据模型

我们用1998年到1999年的数据来建立模型,用2000年后的数据来测试模型。我们选择01年12月16日到01年12月25日的数据来显示数据样本的预测和概率

SQL> col prob format 0.999
SQL>   col pred format 9999
SQL>   col time_id format a20
SQL>   SELECT time_id, prediction(mset_sh_model using *) over
2    (ORDER BY time_id) pred, prediction_probability (mset_sh_model using *)
3    over (ORDER BY time_id) prob
4    FROM (SELECT * FROM mset_test_sh_data WHERE time_id > '15-DEC-01' AND
5    time_id <= '25-DEC-01' ) ORDER BY time_id;


TIME_ID               PRED   PROB
-------------------- ----- ------
16-DEC-01                1  1.000
17-DEC-01                1  0.833
18-DEC-01                1  0.667
19-DEC-01                0  0.500
20-DEC-01                0  0.500
21-DEC-01                0  0.500
22-DEC-01                0  0.500
23-DEC-01                1  0.667
24-DEC-01                1  0.833
25-DEC-01                1  0.833


10 rows selected.

还可以更进一步,用sql查看 2000年至2001年出现异常的所有日期

SQL> SELECT time_id, pred FROM (SELECT time_id, prediction(mset_sh_model using *)
2    over (ORDER BY time_id) pred FROM mset_test_sh_data) WHERE pred = 0;


TIME_ID               PRED
-------------------- -----
21-SEP-00                0
22-SEP-00                0
19-DEC-01                0
20-DEC-01                0
21-DEC-01                0
22-DEC-01                0


6 rows selected.

显示2000年至2001年的异常总数

SQL> col min(prob) format 0.999
SQL>   col max(prob) format 0.999
SQL>   SELECT pred, count(pred), min(prob), max(prob) FROM (
2    SELECT prediction(mset_sh_model using *) over (ORDER BY time_id) pred,
3    prediction_probability(mset_sh_model using *) over (ORDER BY time_id)
4    prob FROM mset_test_sh_data ) GROUP BY pred ORDER BY pred;


PRED COUNT(PRED) MIN(PROB) MAX(PROB)
----- ----------- --------- ---------
0           6     0.500     0.500
1         725     0.667     1.000


2 rows selected.

显示异常率:异常数/数据总数

SQL> 
SQL> col anomalyrate format 9.999
SQL>   SELECT 1-sum(correct)/count(*) AS anomalyrate
  2    FROM (SELECT decode(prediction(mset_sh_model using *) over
  3    (ORDER BY time_id), 1, 1) AS correct FROM mset_test_sh_data );


ANOMALYRATE
-----------
       .008


1 row selected.

显示预测详细信息

SQL> SET long 1000;
SQL>   col anomalydetails format a80
SQL>   SELECT time_id, prediction(mset_sh_model using *) over (ORDER BY time_id)
2    pred, prediction_details(mset_sh_model using *) over (ORDER BY time_id)
3    anomalyDetails FROM mset_test_sh_data WHERE time_id > '15-DEC-01' AND
4    time_id <= '25-DEC-01' ORDER BY time_id;


TIME_ID               PRED
-------------------- -----
ANOMALYDETAILS
--------------------------------------------------------------------------------
16-DEC-01                1
<Details algorithm="MSET-SPRT" class="1">
<Attribute name="QUANTITY" actualValue="558" weight="1" rank="1"/>
<Attribute name="AMOUNT" actualValue="117095.21" weight="1" rank="2"/>
</Details>


17-DEC-01                1
<Details algorithm="MSET-SPRT" class="1">
<Attribute name="AMOUNT" actualValue="281264.3" weight="1" rank="1"/>
<Attribute name="QUANTITY" actualValue="1507" weight=".8" rank="2"/>
</Details>


18-DEC-01                1
<Details algorithm="MSET-SPRT" class="1">
<Attribute name="AMOUNT" actualValue="114550.12" weight="1" rank="1"/>
<Attribute name="QUANTITY" actualValue="2447" weight=".6" rank="2"/>
</Details>


19-DEC-01                0
<Details algorithm="MSET-SPRT" class="0">
<Attribute name="QUANTITY" actualValue="367" weight=".6" rank="1"/>
</Details>


20-DEC-01                0
<Details algorithm="MSET-SPRT" class="0">
<Attribute name="QUANTITY" actualValue="1060" weight=".6" rank="1"/>
</Details>


21-DEC-01                0
<Details algorithm="MSET-SPRT" class="0">
<Attribute name="QUANTITY" actualValue="473" weight=".6" rank="1"/>
</Details>


22-DEC-01                0
<Details algorithm="MSET-SPRT" class="0">
<Attribute name="QUANTITY" actualValue="374" weight=".6" rank="1"/>
</Details>


23-DEC-01                1
<Details algorithm="MSET-SPRT" class="1">
<Attribute name="AMOUNT" actualValue="101917.27" weight="1" rank="1"/>
<Attribute name="QUANTITY" actualValue="1034" weight=".6" rank="2"/>
</Details>


24-DEC-01                1
<Details algorithm="MSET-SPRT" class="1">
<Attribute name="AMOUNT" actualValue="201537.77" weight="1" rank="1"/>
<Attribute name="QUANTITY" actualValue="1662" weight=".8" rank="2"/>
</Details>


25-DEC-01                1
<Details algorithm="MSET-SPRT" class="1">
<Attribute name="AMOUNT" actualValue="88312.19" weight="1" rank="1"/>
<Attribute name="QUANTITY" actualValue="470" weight=".8" rank="2"/>
</Details>




10 rows selected.

这一系列的操作都可以通过sql语句调用我们无比熟悉的DBMS包、视图、表来直接实现,简单便捷,减少了额外的学习成本、数据复制成本,是融合数据库便捷高效的又一例证。

编辑:殷海英

Logo

一站式 AI 云服务平台

更多推荐