最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

PolarDB MySQL 加索引卡主的整体解决方案

网站源码admin4浏览0评论

PolarDB MySQL 加索引卡主的整体解决方案

在使用PolarDB for MySQL的过程中,我们遇到一个问题,PolarDB 8.02的小版本8.02.2.24 在添加索引的时候,会有部分情况无法添加索引,添加索引失败。具体表现是如下图:

在添加索引后会停滞,如果你等不及,直接KILL 添加索引的会话(实际上不KILL长时间也会索引添加失败),则BUG触发,稍后会返回如下报错信息,然后在怎么添加索引都不在报错而是直接无响应。

代码语言:javascript代码运行次数:0运行复制
Fail to get table lock on replica; 
You can set polar_support_mdl_sync_preemption = ON;

但即使你打开了 polar_support_mdl_sync_preemption = ON; 也无济于事,还会在添加索引的时候报这个错误,同时也不是每次都会报错,而是在有大事务情况下,在读节点上很长时间无法运行完毕的情况下,会出现这个报错,而且只要出现这个报错,就会导致后续在添加索引的失败。解决方案为重启从节点。

这里提示:解决问题的关键是升级数据库版本,从8.02.2.24升级到 8.0.2.2.27,实际上8.0.2.2.24的BUG不少,这里列举一下。

1 优化文件系统在DDL过程中频繁创建和删除表文件时的性能稳定性。 2 新增高优先级DDL能力,保证DDL执行成功。

这两个部分是在8.02.2.25中修复的。

故障的原理是

1 在添加索引的情况下,基于POLARDB的原理,shared storage的架构,所以主从是一个存储,在这样的情况下,如果在主库将添加索引的SESSION KILL,会卡主,因为主库和从库之间有线程,从库不返回,主库就一直等导致的。

2 解决方案只能强行重启从节点,给主库一个信号,然后如果此时在建立索引则会建立索引立即成功。

在排查POLARDB FOR MySQL的数据库时,可以利用如下的命令

/force_node='p-2zexq8l8pzx8m6i'/ select * from information_schema.processlist where db = 'scm_suppl'; /force_node='p-2zec58b6uxm16a1'/ select * from information_schema.processlist where db = 'scm_suppl'; /force_node='p-2zeij811u776s5m'/ select * from information_schema.processlist where db = 'scm_supply';

在具体查询PolarDB的各个节点的信息,需要在语句前加入force_node 来具体查询各个节点的具体的状态信息。

查看DDL 工作的进度

代码语言:javascript代码运行次数:0运行复制
SELECT THREAD_ID,EVENT_ID, EVENT_NAME, WORK_COMPLETED, WORK_ESTIMATED,(WORK_COMPLETED/WORK_ESTIMATED)*100 as PROGRESS FROM performance_schema.events_stages_current;

如一些常见的问题

解决方案 1 及时进行从节点的事务的COMMIT ,如何发现没有及时的COMMIT 在从节点执行,并发现和解决

1 在主节点运行 /force_node='主节点名'/ show processlist; 如果在其中发现了如图中的位置写着,wait for syncing with replicas。 那么就证明的DDL 在被从库卡着

2 此时你需要去各个从库来对

代码语言:javascript代码运行次数:0运行复制
select * from information_schema.innodb_log_mdl_slot where slot_state = "SLOT_ACQUIRING"

来去找到从库那个语句在让主库的DDL等待的原因。

你可以开启 polar_slave_work_on_nonblock_mdl_mode = on

他的主要目的是为了解决从库中的长事务的问题,而不是长SQL的问题,长SQL这个解决不了,你还的自己在从库发现长时间运行的SQL并且杀死他,而不是依靠polar_slave_work_on_nonblock_mdl_mode 来解决问题。

同时在进行DDL操作可以通过POLARDB 中的如下语句来查看对应的操作的进度。

代码语言:javascript代码运行次数:0运行复制
SELECT THREAD_ID,EVENT_ID, EVENT_NAME, WORK_COMPLETED, WORK_ESTIMATED,(WORK_COMPLETED/WORK_ESTIMATED)*100 as PROGRESS FROM performance_schema.events_stages_current;

+-----------+----------+------------------------------------------------------+----------------+----------------+----------+
| THREAD_ID | EVENT_ID | EVENT_NAME                                           | WORK_COMPLETED | WORK_ESTIMATED | PROGRESS |
+-----------+----------+------------------------------------------------------+----------------+----------------+----------+
|   3057989 |       13 | stage/innodb/alter table (read PK and internal sort) |          56634 |         330135 |  17.1548 |
+-----------+----------+------------------------------------------------------+----------------+----------------+----------+
1 row in set (0.00 sec)

然后通过如下的语句,将3057989放入可以查询到对应的语句

代码语言:javascript代码运行次数:0运行复制
SELECT esc.THREAD_ID, esc.EVENT_NAME, esc.WORK_COMPLETED, esc.WORK_ESTIMATED, pl.INFO FROM performance_schema.events_stages_current esc LEFT JOIN performance_schema.threads th ON esc.thread_id = th.thread_id LEFT JOIN information_schema.PROCESSLIST pl ON th.PROCESSLIST_ID = pl.ID;

+-----------+------------------------------------------------------+----------------+----------------+-----------------------------------------------------------------------------------------+
| THREAD_ID | EVENT_NAME                                           | WORK_COMPLETED | WORK_ESTIMATED | INFO                                                                                    |
+-----------+------------------------------------------------------+----------------+----------------+-----------------------------------------------------------------------------------------+
|   3057989 | stage/innodb/alter table (read PK and internal sort) |          77034 |         330519 | ALTER TABLE test.test ALGORITHM=INPLACE, ADD testA VARCHAR(20) NOT NULL DEFAULT 'testA' |
+-----------+------------------------------------------------------+----------------+----------------+-----------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

还可以通过获得 medtadata_locks 信息来查看具体performance_schema.threads 中持有的MDL线程

代码语言:javascript代码运行次数:0运行复制
SELECT * FROM performance_schema.metadata_locks;

+-------------+--------------------+------------------+-------------+-----------------------+---------------------+---------------+-------------+------------------------+-----------------+----------------+
| OBJECT_TYPE | OBJECT_SCHEMA      | OBJECT_NAME      | COLUMN_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE           | LOCK_DURATION | LOCK_STATUS | SOURCE                 | OWNER_THREAD_ID | OWNER_EVENT_ID |
+-------------+--------------------+------------------+-------------+-----------------------+---------------------+---------------+-------------+------------------------+-----------------+----------------+
| GLOBAL      | NULL               | NULL             | NULL        |       139949462878336 | INTENTION_EXCLUSIVE | STATEMENT     | GRANTED     | sql_base:3103       |         3055785 |              1 |
| TABLE       | test               | test             | NULL        |       139931318980224 | SHARED_WRITE        | TRANSACTION   | GRANTED     | sql_parse:6479      |         3055785 |              1 |
| COMMIT      | NULL               | NULL             | NULL        |       139931318980480 | INTENTION_EXCLUSIVE | EXPLICIT      | GRANTED     | handler:1669        |         3055785 |              1 |
| TABLE       | performance_schema | metadata_locks   | NULL        |       139934227366144 | SHARED_READ         | TRANSACTION   | GRANTED     | sql_parse:6479      |         3057612 |              1 |
| GLOBAL      | NULL               | NULL             | NULL        |       139934216849664 | INTENTION_EXCLUSIVE | STATEMENT     | GRANTED     | sql_base:5519       |         3057989 |             13 |
| SCHEMA      | test               | NULL             | NULL        |       139934216849408 | INTENTION_EXCLUSIVE | TRANSACTION   | GRANTED     | sql_base:5506       |         3057989 |             13 |
| TABLE       | test               | test             | NULL        |       139934216848640 | SHARED_UPGRADABLE   | TRANSACTION   | GRANTED     | sql_parse:6479      |         3057989 |             13 |
| BACKUP LOCK | NULL               | NULL             | NULL        |       139934216849280 | INTENTION_EXCLUSIVE | TRANSACTION   | GRANTED     | sql_base:5526       |         3057989 |             13 |
| TABLESPACE  | NULL               | test/test        | NULL        |       139934216848384 | INTENTION_EXCLUSIVE | TRANSACTION   | GRANTED     | lock:815            |         3057989 |             13 |
| TABLE       | test               | #sql-17d9_2ea89a | NULL        |       139934216848896 | EXCLUSIVE           | STATEMENT     | GRANTED     | sql_table:15054     |         3057989 |             13 |
| GLOBAL      | NULL               | NULL             | NULL        |       139934216850176 | INTENTION_EXCLUSIVE | TRANSACTION   | GRANTED     | dictionary_impl:416 |         3057989 |             13 |
| TABLESPACE  | NULL               | test/test        | NULL        |       139934216849920 | EXCLUSIVE           | TRANSACTION   | GRANTED     | dictionary_impl:397 |         3057989 |             13 |
+-------------+--------------------+------------------+-------------+-----------------------+---------------------+---------------+-------------+------------------------+-----------------+----------------+
12 rows inset (0.00 sec)

SELECT * FROM performance_schema.threads WHERE THREAD_ID = "3057612"in performance_schema.metadata_locks table ";

如果系统中存在Wait for syncing with replicas,PolarDB采用集群架构,因此在主节点上执行DDL操作时,需要等待所有只读节点释放对应的MDL锁。若您在使用show processlist时,观察到DDL操作处于Wait for syncing with replicas的状态,则说明在只读节点上持有了对应表的MDL锁。

在主库进行查找发现 Wait for syncing with replicas

代码语言:javascript代码运行次数:0运行复制
/*force_node='pi-bp10k7631d6k3****'*/ SHOW PROCESSLIST;
+-----------+-----------------+-----------------------+--------------------+----------------+---------+--------------------------------+-------------------------------------------------------------+
| Id        | User            | Host                  | db                 | Command        | Time    | State                          | Info                                                        |
+-----------+-----------------+-----------------------+--------------------+----------------+---------+--------------------------------+-------------------------------------------------------------+
|        98 | event_scheduler | localhost             | NULL               | Daemon         | 1307512 | Waiting on empty queue         | NULL                                                        |
|       109 | replicator      | 11.111.XX.XX:62549    | NULL               | Polar Log Dump |       1 | Reading log from innodb        | NULL                                                        |
|       113 | replicator      | 11.111.XX.XX:62560    | NULL               | Polar Log Ack  |       1 | Receiving from client          | NULL                                                        |
|       133 | replicator      | 10.13.64.70:42712     | NULL               | Polar Log Dump |       1 | Reading log from innodb        | NULL                                                        |
|       138 | replicator      | 10.13.64.70:42723     | NULL               | Polar Log Ack  |       0 | Receiving from client          | NULL                                           
|   3064011 | root            | 127.0.0.1:59703       | NULL               | Sleep          |     742 |                                | NULL                                                        |
|   3064013 | root            | 127.0.0.1:59710       | NULL               | Sleep          |      21 |                                | NULL                                                        |
|   3064015 | root            | 127.0.0.1:59713       | NULL               | Sleep          |      21 |                                | NULL                                                        |
|   3064018 | root            | 127.0.0.1:59716       | NULL               | Sleep          |       1 |                                | NULL                                                        |
|   3067041 | zyg_root        | 172.17.28.253:48594   | test               | Query          |       6 | Wait for syncing with replicas | alter table t1 add column d varchar(10),algorithm = inplace |

+-----------+-----------------+-----------------------+--------------------+----------------+---------+--------------------------------+-------------------------------------------------------------+
21 rows inset (0.00 sec)

然后转到从节点,通过metadata_locks 语句找到对应的进程在通过 performance_schema.threads表来进一步查找对应的线程信息。最后决定是如何将这些阻塞DDL的操作仅先处理。

代码语言:javascript代码运行次数:0运行复制
/*force_node='pi-bp186ko4o21wl****'*/ SELECT * FROM performance_schema.metadata_locks;
+-------------+--------------------+----------------+-------------+-----------------------+---------------------+---------------+-------------+--------------------+-----------------+----------------+
| OBJECT_TYPE | OBJECT_SCHEMA      | OBJECT_NAME    | COLUMN_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE           | LOCK_DURATION | LOCK_STATUS | SOURCE             | OWNER_THREAD_ID | OWNER_EVENT_ID |
+-------------+--------------------+----------------+-------------+-----------------------+---------------------+---------------+-------------+--------------------+-----------------+----------------+
| TABLE       | test               | t1             | NULL        |       139394298895872 | SHARED_READ         | TRANSACTION   | GRANTED     | sql_parse:7688  |         3513381 |              1 |
| TABLE       | test               | t1             | NULL        |       139394298602240 | SHARED_READ         | TRANSACTION   | GRANTED     | sql_parse:7688  |         3519277 |              1 |
| TABLE       | test               | t1             | NULL        |       139917548369664 | SHARED_READ         | TRANSACTION   | GRANTED     | sql_parse:7688  |         3519279 |              1 |
| TABLE       | test               | t1             | NULL        |       139394296661888 | SHARED_READ         | TRANSACTION   | GRANTED     | sql_parse:7688  |         3519278 |              1 |
| TABLE       | test               | t1             | NULL        |       139394297595520 | SHARED_READ         | TRANSACTION   | GRANTED     | sql_parse:7688  |         3519276 |              1 |
| SCHEMA      | test               | NULL           | NULL        |       139464322084864 | INTENTION_EXCLUSIVE | EXPLICIT      | GRANTED     | sql_table:17404 |              57 |              1 |
| TABLE       | test               | t1             | NULL        |       139464322084992 | EXCLUSIVE           | EXPLICIT      | PENDING     | sql_table:17410 |              57 |              1 |
| TABLE       | performance_schema | metadata_locks | NULL        |       139394296038784 | SHARED_READ         | TRANSACTION   | GRANTED     | sql_parse:7688  |         3518506 |              1 |
+-------------+--------------------+----------------+-------------+-----------------------+---------------------+---------------+-------------+--------------------+-----------------+----------------+
8 rows inset (0.00 sec)

/*force_node='pi-bp186ko4o21wl****'*/SELECT * FROM performance_schema.threads WHERE THREAD_ID IN (3519278,3513381,3519279,3519276,3519277)\G
*************************** 1. row ***************************
          THREAD_ID: 3513381
               NAME: thread/sql/one_connection
               TYPE: FOREGROUND
     PROCESSLIST_ID: 538961413
   PROCESSLIST_USER: zyg_root
   PROCESSLIST_HOST: 172.17.28.253
     PROCESSLIST_DB: test
PROCESSLIST_COMMAND: Connect
   PROCESSLIST_TIME: 103
  PROCESSLIST_STATE: User sleep
   PROCESSLIST_INFO: select *,sleep(60) from t1
   PARENT_THREAD_ID: NULL
               ROLE: NULL
       INSTRUMENTED: YES
            HISTORY: YES
    CONNECTION_TYPE: TCP/IP
       THREAD_OS_ID: 63826
     RESOURCE_GROUP: NULL

这里我简短的画一个处理问题的流程图

本文参与 腾讯云自媒体同步曝光计划,分享自微信公众号。原始发表:2025-04-08,如有侵权请联系 cloudcommunity@tencent 删除索引mysqlnull解决方案事务
发布评论

评论列表(0)

  1. 暂无评论