top of page

Help Needed Worldwide


HOT MySQL InnoDB Sorted Index Builds LINK

I'm trying to import a large SQL file that was generated by mysqldump for an InnoDB table but it is taking a very long time even after adjusting some parameters in my.cnf and disabling AUTOCOMMIT (as well as FOREIGN_KEY_CHECKS and UNIQUE_CHECKS but the table does not have any foreign or unique keys). But I'm wondering if it's taking so long because of the several indexes in the table.

HOT MySQL InnoDB Sorted Index Builds

Download File:

Percona Server, a branch of MySQL, experimented with a mysqldump --optimize-keys option. When you use this option, it changes the output of mysqldump to have CREATE TABLE with no indexes, then INSERT all data, then ALTER TABLE to add the indexes after the data is loaded. See -server/LATEST/management/innodb_expanded_fast_index_creation.html

The change buffer is a special data structure used for caching changes in secondary index pages that are not in the buffer pool. The innodb_change_buffering parameter helps in reducing the substantial I/O operations used to keep secondary indexes up-to-date after data manipulation language (DML) operations. This parameter is used to control the extent of change buffering operations.

Although the second method is better suited for needs (look up will be faster for a small range scans) I have a very strong caveat for you: If you fill the PRIMARY KEY with column data already sorted, you may very well end up with some index fragmentation (See my answer to the Oct 26, 2012 post How badly does innodb fragment in the face of somewhat out-of-order insertions?) Not a bad tradeoff if you know you are doing small range scans.

In that instance, the same caveat applies. You will have a fragmented PRIMARY KEY. The proof of that was expressed in my answer to the May 01, 2014 post Why would the size of MySQL MyISAM table indexes (aka MYI file) not match after mysqldump import?

To see why, let's look at the EXPLAIN output. It shows that the query is fast because it does a point lookup on the indexed column username (as shown by the line spans /"alyssa"-...). Furthermore, the column post_timestamp is already in an index, and sorted (since it's a monotonically increasing part of the primary key).

In the previous example, the primary table is indexed oncolumn_a. Additionally, there is a secondary clustering index (namedindex_b) sorted on column_b. Unlike non-clustered indexes, clusteringindexes include all the columns of a table and can be used as coveringindexes. For example, the following query will run very fast using theclustering index_b:

Hot creating an index will be slower than creating the index offline, andprogress depends how busy the mysqld server is with other tasks. Progress of theindex creation can be seen by using the SHOW PROCESSLIST command (in anotherclient). Once the index creation completes, the new index will be used in futurequery plans.

The InnoDB storage engine maintains its own buffer pool for caching data and indexes in main memory. By default, with the innodb_file_per_table setting enabled, each new InnoDB table and its associated indexes are stored in a separate file. When the innodb_file_per_table option is disabled, InnoDB stores tables and indexes in the single system tablespace, which may consist of several files (or raw disk partitions). As of MySQL 5.7.6, InnoDB tables can also be stored in general tablespaces, which are shared tablespaces that can store data for multiple tables. InnoDB tables can handle large quantities of data, even on operating systems where file size is limited to 2GB.

Enabling the innodb_file_per_table option to put the data and indexes for individual tables into separate files, instead of in a single giant system tablespace. This setting is required to use some of the other features, such as table compression and fast truncation.

If InnoDB is not your default storage engine, you can determine if your database server or applications work correctly with InnoDB by restarting the server with --default-storage-engine=InnoDB defined on the command line or with default-storage-engine=innodb defined in the [mysqld] section of the my.cnf configuration file.

The inverted index is partitioned into six auxiliary index tables to support parallel index creation. By default, two threads tokenize, sort, and insert words and associated data into the index tables. The number of threads is configurable using the innodb_ft_sort_pll_degree option. When creating FULLTEXT indexes on large tables, consider increasing the number of threads.

The innodb_ft_cache_size variable is used to configure the full-text index cache size (on a per-table basis), which affects how often the full-text index cache is flushed. You can also define a global full-text index cache size limit for all tables in a given instance using the innodb_ft_total_cache_size option.

Deleting a record that has a full-text index column could result in numerous small deletions in the auxiliary index tables, making concurrent access to these tables a point of contention. To avoid this problem, the Document ID (DOC_ID) of a deleted document is logged in a special FTS_*_DELETED table whenever a record is deleted from an indexed table, and the indexed record remains in the full-text index. Before returning query results, information in the FTS_*_DELETED table is used to filter out deleted Document IDs. The benefit of this design is that deletions are fast and inexpensive. The drawback is that the size of the index is not immediately reduced after deleting records. To remove full-text index entries for deleted records, you must run OPTIMIZE TABLE on the indexed table with innodb_optimize_fulltext_only=ON to rebuild the full-text index. For more information, see Optimizing InnoDB Full-Text Indexes.

The adaptive hash index (AHI) lets InnoDB perform more like an in-memory database on systems with appropriate combinations of workload and ample memory for the buffer pool, without sacrificing any transactional features or reliability. This feature is enabled by the innodb_adaptive_hash_index option, or turned off by --skip-innodb_adaptive_hash_index at server startup.

Based on the observed pattern of searches, MySQL builds a hash index using a prefix of the index key. The prefix of the key can be any length, and it may be that only some of the values in the B-tree appear in the hash index. Hash indexes are built on demand for those pages of the index that are often accessed.

As of MySQL 5.7.8, the adaptive hash index search system is partitioned. Each index is bound to a specific partition, and each partition is protected by a separate latch. Partitioning is controlled by the innodb_adaptive_hash_index_parts configuration option. Prior to MySQL 5.7.8, the adaptive hash index search system was protected by a single latch which could become a point of contention under heavy workloads. The innodb_adaptive_hash_index_parts option is set to 8 by default. The maximum setting is 512.

As of MySQL 5.7.5, InnoDB performs a bulk load instead of inserting one index record at a time when creating or rebuilding indexes. This method of index creation is also known as a sorted index build. Sorted index builds are not supported for spatial indexes.

Sorted index builds use a bottom up approach to building an index. With this approach, a reference to the right-most leaf page is held at all levels of the B-tree. The right-most leaf page at the necessary B-tree depth is allocated and entries are inserted according to their sorted order. Once a leaf page is full, a node pointer is appended to the parent page and a sibling leaf page is allocated for the next insert. This process continues until all entries are inserted, which may result in inserts up to the root level. When a sibling page is allocated, the reference to the previously pinned leaf page is released, and the newly allocated leaf page becomes the right-most leaf page and new default insert location.

To set aside space for future index growth, you can use the innodb_fill_factor configuration option to reserve a percentage of B-tree page space. For example, setting innodb_fill_factor to 80 will reserve 20 percent of the space in B-tree pages during a sorted index build. This setting applies to both B-tree leaf and non-leaf pages. It does not apply to external pages used for TEXT or BLOB entries. The amount of space that is reserved may not be exactly as configured, as the innodb_fill_factor value is interpreted as a hint rather than a hard limit.

Redo logging is turned off during a sorted index build. Instead, there is a checkpoint to ensure that the index build can withstand a crash or failure. The checkpoint forces a write of all dirty pages to disk. During a sorted index build, the page cleaner thread is signaled periodically to flush dirty pages to ensure that the checkpoint operation can be processed quickly. Normally, the page cleaner thread flushes dirty pages when the number of clean pages falls below a set threshold. For sorted index builds, dirty pages are flushed promptly to reduce checkpoint overhead and to parallelize IO and CPU activity.

Sorted index builds may result in optimizer statistics that differ from those generated by the previous method of index creation. The difference in statistics, which is not expected to affect workload performance, is due to the different algorithm that is used to populate the index.

Gap locking can be disabled explicitly. This occurs if you change the transaction isolation level to READ COMMITTED or enable the innodb_locks_unsafe_for_binlog system variable (which is now deprecated). Under these circumstances, gap locking is disabled for searches and index scans and is used only for foreign-key constraint checking and duplicate-key checking.

INSERT INTO T SELECT ... FROM S WHERE ... sets an exclusive index record lock (without a gap lock) on each row inserted into T. If the transaction isolation level is READ COMMITTED, or innodb_locks_unsafe_for_binlog is enabled and the transaction isolation level is not SERIALIZABLE, InnoDB does the search on S as a consistent read (no locks). Otherwise, InnoDB sets shared next-key locks on rows from S. InnoDB has to set locks in the latter case: In roll-forward recovery from a backup, every SQL statement must be executed in exactly the same way it was done originally. 350c69d7ab


Welcome to the group! You can connect with other members, ge...

bottom of page