手机浏览 RSS 2.0 订阅 膘叔的简单人生 , 腾讯云RDS购买 | 超便宜的Vultr , 注册 | 登陆
浏览模式: 标准 | 列表分类:DataBase

小议mysql变量及其优化

本文属摘抄文章,以作备份

原文地址为:http://www.tbqu.com/post/251.html

mysql里的变量分系统变量和用户变量.
系统变量
其中系统变量,又分全局(global)和会话(session)两种.
全局系统变量,主要影响整个mysql实例的全局设置.
大部分变量都是作为mysql的服务器调节参数存在.新增或修改这类变量, 会影响mysql的运行方式.
比如: auto_increment_increment 代表序列的自增值, 默认为1

另外, mysql中,只有一部分变量是支持运行时动态修改的.
变量修改的作用范围,是那些重新创建连接到mysql服务器的客户端连接.

如果你的连接是由应用通过连接池来维护的,并且,是长连接的,那么,此时在服务器运行过程中,动态地修改全局变量对你是没有什么影响的.

 

通过全局变量, 我们可以想到一些应用场景:
1, 在存储过程, 函数, sql里,使用一些业务相关的全局变量.
经常编写业务相关的存储过程,函数的朋友, 可能会需要针对不同业务,设置能共享访问的全局变量.

比如, 公司的总人数,很多存储过程都需要调用这个值,但是因为这个值,不是经常会改变的,不需要每次都count.
所以大多数人会考虑把它cache一把,那么mysql提供的全局变量,就是一个好的存储场所.

2, 数据库配置中心

首先,这不一定是一种好的解决方式, 特别是大型地分布式系统.但是想到了,就跟大家分享一下.
主要就是利用了三层,四层架构的系统, 必不可少的数据库层来实现的.

大家知道, 应用的配置, 大多数是用配置文件或配置服务器来实现, 前者需要跟着项目打包, 部署.
后者, 只要在应用初始化或运行时,去配置中心取配置就行了.(淘宝这边两者都在用)

而数据库的配置中心, 其实类似配置服务器, 只不过灵活运用了mysql的变量机制.
它能继承配置服务器的多数优点, 但最大的特色就是在 sql, function , procedure 里都可以很方便地引用到.(相比用表的方式, 要方便一些)
并且本身就具有缓存, 移植得话, mysql得导出也是很快.
所以小系统, 想快速得搭建一个配置中心,利用mysql感觉还是不错的.

调用语法:
select * from user_info where id = @@global.admin_id — 例子,通过全局里配置的管理员id来取用户信息

以下是全局变量的一些常用命令,比较简单:
set global auto_increment_increment=1; — 设置序列的增长值
show global variables; — 显示所有的global变量
show global variables like ‘%test%’ — 查询包含test字符串的global变量

会话系统变量, 主要用于在当前客户端连接的生命周期内.它的变量值是全局变量的一份拷贝.
如果连接断开, 对当前会话变量所做修改都会被重置.

比如, 服务器会话变量 autocommit 默认为 true , 而你在非常连的客户端连接里设置了 false , 那么在执行完sql后, 连接就会断开. 此后,如果创建新的连接来执行sql, autocommit又会默认 true.

会话变量的使用场景与全局类似,只是生命周期不同, 因此可以用来
统计同一连接内请求sql次数, sql类型等信息.

session变量的一些常用操作:
set session auto_increment_increment=1; — 设置序列的增长值
show variables; or show session variables; — 如果不指明, 默认使用session变量
show variables like ‘%test%’ or show session variables lile ‘%test%’ — 查询包含test字符串的session变量

用户变量
基于会话变量实现的, 可以暂存值, 并传递给同一连接里的下一条sql使用的变量.
当客户端连接退出时,变量会被释放.

应用场景:
同一连接,未关闭情况下,帮你暂存一些计算结果.

比如
select @admin_id:=max(id) from user_info;
select * from user_info where id = @admin_id
以上两条sql在同一connection中完成.
另外,注意,用户变量前只有1个@, 2个@是用于存取系统变量的
业务相关的变量定义太多,无法区分系统全局变量?

这类问题,可以通过mysql提供的结构化系统变量来解决.

他们有独特的表示形式:
instance_name.test_var

比一般的变量多了一个instance_name.这样,你就可以很方便地对他们进行分类,比如 app.test_var,就可以区别于系统的var了.

PS: 其实,普通的系统变量,它也属于结构化变量,只不过他们默认隶属于 default 这个 instance, 不需要显示出来罢了.
影响mysql性能的系统变量
以下内容参考 http://www.mysqlperformanceblog.com/2006/06/08/mysql-server-variables-sql-layer-or-storage-engine-specific/

bulk_insert_buffer_size

批量插入缓存大小, 这个参数是针对MyISAM存储引擎来说的.适用于在一次性插入100-1000+条记录时, 提高效率.默认值是8M.可以针对数据量的大小,翻倍增加.

concurrent_insert
并发插入, 当表没有空洞(删除过记录), 在某进程获取读锁的情况下,其他进程可以在表尾部进行插入.
值可以设0不允许并发插入,1当表没有空洞时,执行并发插入,2.不管是否有空洞都执行并发插入.
默认是1.针对表的删除频率来设置.

delay_key_write

针对MyISAM存储引擎,延迟更新索引.意思是说,update记录时,先将数据up到磁盘,但不up索引,将索引存在内存里,当表关闭时,将内存索引,写到磁盘. 值为 0不开启, 1开启. 默认开启.

delayed_insert_limit, delayed_insert_timeout, delayed_queue_size
延迟插入, 将数据先交给内存队列, 然后慢慢地插入.但是这些配置,不是所有的存储引擎都支持, 目前来看, 常用的InnoDB不支持, MyISAM支持. 根据实际情况调大, 一般默认够用了.

expire_logs_days

自动删除超过指定天数的日志. 建议为0,表示“不自动删除”.

flush, flush_time

是否启用, 同步表数据到磁盘.以及自动同步的间隔时间.
针对flush_time, 官方建议只在Windows 9x或Me,或有最小资源的系统中使用该选项.所以,建议关闭.

ft_boolean_syntax, ft_max_word_len, ft_min_word_len,ft_query_expansion_limit, ft_stopword_file

针对MyISAM设置的参数, 全文搜索特性. 如果你不使用 FULLTEXT 索引,就不需要优化了. 详见mysql参考手册.

join_buffer_size

用于表间关联(join)的缓存大小.建议设为 131072.(128K)

key_buffer_size

索引块缓存区大小, 针对MyISAM存储引擎,该值越大,性能越好.但是超过操作系统能承受的最大值,反而会使mysql变得不稳定.
如果不是MyISAM存储引擎,一般设置为 4-32M大小.

key_cache_age_threshold, key_cache_block_size, key_cache_division_limit

键值缓存的相关设置.需要针对实际情况调优.只是针对MyISAM储存引擎.

large_pages

是否启动大页面支持.意思是,可以一些缓存分配更大的空间.这个特性已经被InnoDB, MyISAM等常用存储引擎支持.

low_priority_updates

低优先级更新.意思是, 所有的写操作(表写锁), 包括update,delete,insert等都需要等待读操作完成后才执行 (表读锁解开).
因为是针对表的锁,所以,这里仅支持MyISAM.

max_write_lock_count

最大写锁数量.这个变量的含义是, 当写锁达到一定数量后, 就不限制读锁, 允许一部分读锁进入.(可以读数据了,否则需要等待写锁释放后,才能读)

因为是针对表的锁,所以,这里仅支持MyISAM.

preload_buffer_size

重载索引时分配的缓冲区大小, 该变量仅支持MyISAM.

read_buffer_size, read_rnd_buffer_size

每个线程连续扫描时为扫描的每个表分配的缓冲区的大小(字节)。如果进行多次连续扫描,可能需要增加该值, 默认值为131072。

sort_buffer_size

每个排序线程分配的缓冲区的大小。增加该值可以加快ORDER BY或GROUP BY操作.默认数值是2097144(2M),建议加大到 16777208 (16M)。

timed_mutexes

显示mutexes的统计信息, 默认关闭OFF

tmp_table_size

临时表的大小,在未超过大小之前进行的操作是在内存中的, 当超过后,mysql会自动转换到硬盘上.

Tags: mysql, 变量优化

MySQL优化指导建议[不知道能不能发,所以加密]

这篇日志被加密了,请输入密码后查看。

用Amoeba构架MySQL分布式数据库环境

国内对于MYSQL PROXY的研究也越来越多了,最近在淘宝DBA上看到这个,贴上来与君共享:

Amoeba是一个类似MySQL Proxy的 分布式数据库中间代理层软件,是由陈思儒开发的一个开源的java项目。其主要功能包括读写分离,垂直分库,水平分库等,经过测试,发现其功能和稳定性都 非常的不错,如果需要构架分布式数据库环境,采用Amoeba是一个不错的方案。目前Amoeba一共包括For aladdin,For MySQL和For Oracle三个版本,本文主要关注For MySQL版本的一个读写分离实现。实际上垂直切分和水平切分的架构也相差不大,改动几个配置就可以轻松实现。

下图是一个采用Amoeba的读写分离技术结合MySQL的Master-Slave Replication的一个分布式系统的架构:

Amoeba处于在应用和数据库之间,扮演一个中介的角色,将应用传递过来的SQL语句经过分析后,将写的语句交给Master库执行,将读的语句 路由到Slave库执行(当然也可以到Master读,这个完全看配置)。Amoeba实现了简单的负载均衡(采用轮询算法)和Failover。如果配 置了多个读的库,则任何一个读的库出现宕机,不会导致整个系统故障,Amoeba能自动将读请求路由到其他可用的库上,当然,写还是单点的依赖于 Master数据库的,这个需要通过数据库的切换,或者水平分割等技术来提升Master库的可用性。

Amoeba可以在不同机器上启动多个,并且做同样的配置来进行水平扩展,以分担压力和提升可用性,可以将Amoeba和MySQL装在同一台机 器,也可以装在不同的机器上,Amoeba本身不做数据缓存,所以对于内存消耗很少,主要是CPU占用。对于应用来说,图中的三个Amoeba就是三台一 模一样的MySQL数据库,连接其中任何一台都是可以的,所以需要在应用端有一个Load balance和Failover的机制,需要连接数据库时从三台中随机挑选一台即可,如果其他任何一台出现故障,则可以自动Failover到剩余的可 用机器上。MySQL的JDBC驱动从connector-j 3.17版本起已经提供了这样的负载均衡和故障切换的功能,那么剩下的事情对于应用来说就很简单了,不需要做太多的改动就能搭建一套高可用的MySQL分 布式数据库环境,何乐而不为?

Tags: amoeba, mysql, 分布式

wwwsqldesigner:a Visual web-based SQL modelling tool

看到这个的时候很吃惊,最初看到类似功能的时候是在PHPMYADMIN 3上面,想不到现在已经发展到这个地步了,实在让人大吃一“斤”呀。

WWW SQL Designer allows users to create database designs, which can be saved/loaded and exported to SQL scripts. Various databases and languages are supported. Ability to import existing database design.

About

Hi and welcome to WWW SQL Designer! This tool allows you to draw and create database schemas (E-R diagrams) directly in browser, without the need for any external programs (flash). You only need JavaScript enabled. The Designer works perfectly in Mozillas (Firefox, Seamonkey), Internet Explorers (6, 7, 8), Safari and Operas. Konqueror works, but the experience is limited.

Many database features are supported, such as keys, foreign key constraints, comments and indexes. You can either save your design (for further loading & modifications), print it or export as SQL script. It is possible to retrieve (import) schema from existing database.

WWW SQL Designer was created by Ondrej Zara and is built atop the oz.js JavaScript module. It is distributed under GNU GPL license.

网址:http://code.google.com/p/wwwsqldesigner/

我是从:http://parandroid.com/www-sql-designer-open-source-sql-based-web-creation-tools/看来的

它支持很多的数据库功能:

  • 强制外关键字
  • 注解
  • 索引

当数据库图表创建后,你可以:

  • 保存你的数据表设计或稍后修改它们
  • 打印它
  • 作为 SQL脚本导出

它甚至还可以从已存在的数据库中检索出概况图标,支持MySQL, MSSQL, SQLite 和 Web2py。感兴趣的朋友看查看DEMO.

 

 

图片附件:
大小: 18.81 K
尺寸: 430 x 188
浏览: 2420 次
点击打开新窗口浏览全图

Tags: designer, sql, web

ORDER BY RAND()

本文作者好象是写mysql proxy的,这是他写的关于order by rand的效率方面的文章。事实上我们在使用中是尽量避免采用order by rand这种的,但应该如何写,他给了我们一些其他的解决方法,并将其效率进行了分析,我觉得很有用,就留下来了。

原文网址:http://jan.kneschke.de/2007/2/15/order-by-rand

内容如下:
If you read the MySQL manual you might have seen the ORDER BY RAND() to randomize the the rows and using the LIMIT 1 to just take one of the rows.

SELECT name
FROM random
ORDER BY RAND()
LIMIT 1;

This example works fine and is fast if you only when let's say 1000 rows. As soon as you have 10000 rows the overhead for sorting the rows becomes important. Don't forget: we only sort to throw nearly all the rows away.

I never liked it. And there are better ways to do it. Without a sorting. As long as we have a numeric primary key.

For the first examples we assume the be ID is starting at 1 and we have no holes between 1 and the maximum value of the ID.

move the work into the application

First idea: We can simplify the whole job if we calculate the ID beforehand in the application.

SELECT MAX(id) FROM random;
## generate random id in application
SELECT name FROM random WHERE id = <random-id>

As MAX(id) == COUNT(id) we just generate random number between 1 and MAX(id) and pass it into the database to retrieve the random row.

The first SELECT is a NO-OP and is optimized away. The second is a eq_ref against a constant value and also very fast.

move the job into the database

But is it really necessary to do it in the application ? Can't we do it in the database ?

# generating a random ID
> SELECT RAND() * MAX(id) FROM random;
+------------------+
| RAND() * MAX(id) |
+------------------+
| 689.37582507297 |
+------------------+
# oops, this is a double, we need an int

> SELECT CEIL(RAND() * MAX(id)) FROM random;
+-------------------------+
| CEIL(RAND() * MAX(id)) |
+-------------------------+
| 1000000 |
+-------------------------+
# better. But how is the performance:

> EXPLAIN
SELECT CEIL(RAND() * MAX(id)) FROM random;
+----+-------------+-------+-------+------+-------------+
| id | select_type | table | type | rows | Extra |
+----+-------------+-------+-------+------+-------------+
| 1 | SIMPLE | random | index | 1000000 | Using index |
+----+-------------+-------+-------+------+-------------+
## a index scan ? we lost our optimization for the MAX()

> EXPLAIN
SELECT CEIL(RAND() * (SELECT MAX(id) FROM random));
+----+-------------+-------+------+------+------------------------------+
| id | select_type | table | type | rows | Extra |
+----+-------------+-------+------+------+------------------------------+
| 1 | PRIMARY | NULL | NULL | NULL | No tables used |
| 2 | SUBQUERY | NULL | NULL | NULL | Select tables optimized away |
+----+-------------+-------+------+------+------------------------------+
## a simple Sub-Query is bringing us the performance back.

Ok, now we know how to generate the random ID, but how to get the row ?

> EXPLAIN
SELECT name
FROM random
WHERE id = (SELECT CEIL(RAND() *
(SELECT MAX(id)
FROM random));
+----+-------------+--------+------+---------------+------+---------+------+---------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+---------+------------------------------+
| 1 | PRIMARY | random | ALL | NULL | NULL | NULL | NULL | 1000000 | Using where |
| 3 | SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
+----+-------------+--------+------+---------------+------+---------+------+---------+------------------------------+
> show warnings;
+-------+------+------------------------------------------+
| Level | Code | Message |
+-------+------+------------------------------------------+
| Note | 1249 | Select 2 was reduced during optimization |
+-------+------+------------------------------------------+

NO, NO, NO. Don't go this way. This is the most obvious, but also the most wrong way to do it. The reason: the SELECT in the WHERE clause is executed for every row the outer SELECT is fetching. This leads to 0 to 4091 rows, depending on your luck.

We need a way to make sure that the random-id is only generated once:

SELECT name
FROM random JOIN
(SELECT CEIL(RAND() *
(SELECT MAX(id)
FROM random)) AS id
) AS r2
USING (id);
+----+-------------+------------+--------+------+------------------------------+
| id | select_type | table | type | rows | Extra |
+----+-------------+------------+--------+------+------------------------------+
| 1 | PRIMARY | <derived2> | system | 1 | |
| 1 | PRIMARY | random | const | 1 | |
| 2 | DERIVED | NULL | NULL | NULL | No tables used |
| 3 | SUBQUERY | NULL | NULL | NULL | Select tables optimized away |
+----+-------------+------------+--------+------+------------------------------+

The inner SELECT is generating a constant TEMPORARY table and the JOIN is selecting just on single row. Perfect.

No Sorting, No Application, Most parts of the query optimized away.

adding holes to the numbers

To generalize the last solution we add the possibility of holes, like when you DELETE rows.

SELECT name
FROM random AS r1 JOIN
(SELECT (RAND() *
(SELECT MAX(id)
FROM random)) AS id)
AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 1;
+----+-------------+------------+--------+------+------------------------------+
| id | select_type | table | type | rows | Extra |
+----+-------------+------------+--------+------+------------------------------+
| 1 | PRIMARY | <derived2> | system | 1 | |
| 1 | PRIMARY | r1 | range | 689 | Using where |
| 2 | DERIVED | NULL | NULL | NULL | No tables used |
| 3 | SUBQUERY | NULL | NULL | NULL | Select tables optimized away |
+----+-------------+------------+--------+------+------------------------------+

The JOIN now adds all the IDs which are greater or equal than our random value and selects only the direct neighboor if a direct match is not possible. BUT as soon as one row is found, we stop (the LIMIT 1). And we read the rows according to the index (ORDER BY id ASC). As we are using >= instead of a = we can get rid of a the CEIL and get the same result with a little less work.

Equal Distribution

As soon as the distribution of the IDs is not equal anymore our selection of rows isn't really random either.

> select * from holes;
+----+----------------------------------+----------+
| id | name | accesses |
+----+----------------------------------+----------+
| 1 | d12b2551c6cb7d7a64e40221569a8571 | 107 |
| 2 | f82ad6f29c9a680d7873d1bef822e3e9 | 50 |
| 4 | 9da1ed7dbbdcc6ec90d6cb139521f14a | 132 |
| 8 | 677a196206d93cdf18c3744905b94f73 | 230 |
| 16 | b7556d8ed40587a33dc5c449ae0345aa | 481 |
+----+----------------------------------+----------+

The RAND function is generating IDs like 9 to 15 which all lead to the id 16 to be selected as the next higher number.

There is no real solution for this problem, but your data is mostly constant you can add a mapping table which maps the row-number to the id:

> create table holes_map ( row_id int not NULL primary key, random_id int not null);
> SET @id = 0;
> INSERT INTO holes_map SELECT @id := @id + 1, id FROM holes;
> select * from holes_map;
+--------+-----------+
| row_id | random_id |
+--------+-----------+
| 1 | 1 |
| 2 | 2 |
| 3 | 4 |
| 4 | 8 |
| 5 | 16 |
+--------+-----------+

The row_id is now free of holes again and we can run our random query again:

SELECT name FROM holes
JOIN (SELECT r1.random_id
FROM holes_map AS r1
JOIN (SELECT (RAND() *
(SELECT MAX(row_id)
FROM holes_map)) AS row_id)
AS r2
WHERE r1.row_id >= r2.row_id
ORDER BY r1.row_id ASC
LIMIT 1) as rows ON (id = random_id);

After 1000 fetches we see a equal distribution again:

> select * from holes;
+----+----------------------------------+----------+
| id | name | accesses |
+----+----------------------------------+----------+
| 1 | d12b2551c6cb7d7a64e40221569a8571 | 222 |
| 2 | f82ad6f29c9a680d7873d1bef822e3e9 | 187 |
| 4 | 9da1ed7dbbdcc6ec90d6cb139521f14a | 195 |
| 8 | 677a196206d93cdf18c3744905b94f73 | 207 |
| 16 | b7556d8ed40587a33dc5c449ae0345aa | 189 |
+----+----------------------------------+----------+

Multiple Rows at once

If you want to get more than one row returned, you can:

  • execute the Query several times
  • write a stored procedure which is executing the query and stores the result in a temp-table
  • (make a UNION)[http://jan.kneschke.de/2007/2/22/analyzing-complex-queries]

Performance

Now let's see what happends to our performance. We have 3 different queries for solving our problems.

  • Q1. ORDER BY RAND()
  • Q2. RAND() * MAX(ID)
  • Q3. RAND() * MAX(ID) + ORDER BY ID

Q1 is expected to cost N * log2(N), Q2 and Q3 are nearly constant.

The get real values we filled the table with N rows ( one thousand to one million) and executed each query 1000 times.

100        1.000      10.000     100.000    1.000.000
Q1 0:00.718s 0:02.092s 0:18.684s 2:59.081s 58:20.000s
Q2 0:00.519s 0:00.607s 0:00.614s 0:00.628s 0:00.637s
Q3 0:00.570s 0:00.607s 0:00.614s 0:00.628s 0:00.637s

As you can see the plain ORDER BY RAND() is already behind the optimized query at only 100 rows in the table.

A more detailed analysis of those queries is at analyzing-complex-queries.

 

Tags: mysql, order, rand, explain, analyze