使用python将逐条insert转为批量insert
需求背景
研发有时候交付过来一批sql,(可能是navicat之类的客户端导出的)
todo.sql 文件内容类似如下:
代码语言:sql复制INSERT INTO `sbtest`.`sbtest123`( `name`, `age`, `sex`) VALUES ('zhangsan',22,'Male');
INSERT INTO `sbtest`.`sbtest123`( `name`, `age`, `sex`) VALUES ('zhangsan',22,'Male');
INSERT INTO `sbtest`.`sbtest123`( `name`, `age`, `sex`) VALUES ('zhangsan',22,'Male');
INSERT INTO `sbtest`.`sbtest123`( `name`, `age`, `sex`) VALUES ('zhangsan',22,'Male');
INSERT INTO `sbtest`.`sbtest123`( `name`, `age`, `sex`) VALUES ('zhangsan',22,'Male');
INSERT INTO `sbtest`.`sbtest123`( `name`, `age`, `sex`) VALUES ('zhangsan',22,'Male');
INSERT INTO `sbtest`.`sbtest123`( `name`, `age`, `sex`) VALUES ('zhangsan',22,'Male');
INSERT INTO `sbtest`.`sbtest123`( `name`, `age`, `sex`) VALUES ('zhangsan',22,'Male');
INSERT INTO `sbtest`.`sbtest123`( `name`, `age`, `sex`) VALUES ('zhangsan',22,'Male');
INSERT INTO `sbtest`.`sbtest123`( `name`, `age`, `sex`) VALUES ('zhangsan',22,'Male');
INSERT INTO `sbtest`.`sbtest123`( `name`, `age`, `sex`) VALUES ('zhangsan',22,'Male');
INSERT INTO `sbtest`.`sbtest123`( `name`, `age`, `sex`) VALUES ('zhangsan',22,'Male');
INSERT INTO `sbtest`.`sbtest123`( `name`, `age`, `sex`) VALUES ('zhangsan',22,'Male');
INSERT INTO `sbtest`.`sbtest123`( `name`, `age`, `sex`) VALUES ('zhangsan',22,'Male');
INSERT INTO `sbtest`.`sbtest123`( `name`, `age`, `sex`) VALUES ('zhangsan',22,'Male');
INSERT INTO `sbtest`.`sbtest123`( `name`, `age`, `sex`) VALUES ('zhangsan',22,'Male');
INSERT INTO `sbtest`.`sbtest123`( `name`, `age`, `sex`) VALUES ('zhangsan',22,'Male');
这是一批逐条insert的语句,插入数据库会很慢。
因此考虑将其转为批量插入方式,提升其导入效率。
代码实现
代码语言:python代码运行次数:0运行复制# 将多行的逐条insert文件,合并为N个批量insert sql。注意每条insert必须是在每一行,不能换行。
import os
import subprocess
# 待处理的文件名(这里需要根据实际情况修改)
origial_file="todo.sql"
# 切分的行数(这里为了演示调的比较小,生产建议设为1000到5000
batch_size=10
# 数据库的表名和列名(这里需要根据实际情况修改)
column_list=" `name`, `age`, `sex`"
table_name="`sbtest`.`sbtest`"
# 清理下环境,防止有老的文件存在
try:
os.remove('batch.sql')
except:
pass
# 将某个文件按照每1k行切分为一个个小文件,文件名以output_开头
command = f"split -l {batch_size} {origial_file} output_"
result = subprocess.run(command, shell=True, check=True)
# 检查命令的输出
if result.returncode == 0:
print("Command executed successfully!")
else:
print("Command failed with return code:", result.returncode)
sys.exit(10)
# 获取当前目录
current_directory = os.getcwd()
# 遍历当前目录下的所有文件和目录
for filename in os.listdir(current_directory):
# 检查文件是否以output_开头
if filename.startswith("output_"):
print(filename)
with open (filename,'r') as f:
#insert_statements=f.readlines()
insert_statements = [line.strip() for line in f.readlines()]
# print(insert_statements)
# 提取每条语句的VALUES部分
values = [stmt.rstrip(';\n').split("VALUES")[1] for stmt in insert_statements]
print(values)
# 拼接成一条合并的INSERT语句
merged_statement = f"INSERT INTO {table_name} ({column_list}) VALUES " + ", ".join(values) + ';\n'
with open ('batch.sql','a') as ff:
ff.write(merged_statement)
try:
os.remove(filename)
except:
pass
生成的sql如下: