Simulation Name: sqlldrtest
To use the Oracle bulk loader, you need a control file, which specifies how data should be loaded into the database; and a data file, which specifies what data should be loaded. You will learn how to create these files in turn. A simple control file has the following form:
APPEND INTO TABLE <tableName>
FIELDS TERMINATED BY '<separator>'
(<list of all attribute names to load>)
- < dataFile> is the name of the data file. If you did not give a file name extension for <dataFile>, Oracle will assume the default extension ".dat".
Therefore, it is a good idea to name every data file with an extension, and specify the complete file name with the extension.
- <tableName> is the name of the table to which data will be loaded. Of course, it should have been created already before the bulk load operation.
- The optional keyword APPEND says that data will be appended to <tableName>. If APPEND is omitted, the table must be empty before the bulk load operation or else an error will occur.
- <separator> specifies the field separator for your data file. This can be any string. It is a good idea to use a string that you know will never appear in the data, so the separator will not be confused with data fields.
- Finally, list the names of attributes of <tableName> that are set by your data file, separated by commas and enclosed in parentheses. This list need not be the complete list of attributes in the actual schema of the table,
nor must it be arranged in the same order as the attributes when the table was created -- sqlldr will match attributes to by their names in the table schema.
Any attributes unspecified in the list of attributes will be set to NULL.
Bulk Inserts: Using the SQL*Loader Direct Path Option
When used in the Conventional Path mode, SQL*Loader reads records from a file, generates INSERT commands, and passes them to the Oracle kernel. Oracle then finds places for those rows in free blocks in the table and updates any associated indexes.
In Direct Path mode, SQL*Loader creates formatted data blocks and writes directly to the
datafiles. This requires occasional checks with the database to get new locations for data blocks, but no other I/O with the database kernel is required. The result is a data load process that is dramatically faster than Conventional Path mode.
If the table is indexed, the indexes will be placed in DIRECT PATH state during the load. After the load is complete, the new keys (index column values) will be sorted and merged with the
existing keys in the index. To maintain the temporary set of keys, the load will create a temporary index segment that is at least as large as the largest index on the table. The space requirements for this can be minimized by presorting the data and using the SORTED INDEXES clause in the SQL*Loader control file.
To minimize the amount of dynamic space allocation necessary during the load, the data segment that you are loading into should already be created, with all the space it will need already allocated. You should also presort the data on the columns of the largest index in the table. Sorting the data and leaving the indexes on the table during a Direct Path load will usually
yield better performance than if you were to drop the indexes before the load and then re-create them after it completed.
However, keep in mind that direct path load operations always use new extents. Therefore, if you use parallel DELETEs and then follow it with parallel direct path loads, you will potentially have an ever-increasing amount of free space in every block, and the disk space allocated to the table will increase much faster than you expect.
To take advantage of the Direct Path option, the table cannot be clustered, and there can be no other active transactions against it. During the load, only NOT NULL, UNIQUE, and PRIMARY KEY constraints will be enforced; after the load has completed, the CHECK and FOREIGN KEY constraints can be automatically reenabled. To force this to occur, use the REENABLE DISABLED_CONSTRAINTS
clause in the SQL*Loader control file.
Table insert triggers
The only exception to this reenabling process is that table insert triggers, when reenabled, are not executed for each of the new rows in the table. A separate process must manually perform whatever commands were to have been performed by this type of trigger.
The SQL*Loader Direct Path loading option provides significant performance improvements over the SQL*Loader Conventional Path loader in loading data into Oracle tables by bypassing SQL processing, buffer cache management, and unnecessary reads for the data blocks. The Parallel Data Loading option of SQL*Loader allows multiple processes to load data into the same table, utilizing spare resources on the system and thereby reducing the overall elapsed times for loading. Given enough CPU and I/O resources, this can significantly reduce the overall loading times.
To use Parallel Data Loading, start multiple SQL*Loader sessions using the PARALLEL keyword (otherwise, SQL*Loader puts an exclusive lock on the table). Each session is an independent session requiring its own control file. The following listing shows an example of a Direct Path load using the DIRECT=TRUE parameter on the command line:
sqlldr userid=rjb/rjb control=part1.ctl direct=true parallel=true
sqlldr userid=rjb/rjb control=part2.ctl direct=true parallel=true
sqlldr userid=rjb/rjb control=part3.ctl direct=true parallel=true
Each session creates its own log, bad, and discard files (part1.log, part2.log, part3.log, part1.bad, part2.bad, and so on) by default. Because you have multiple sessions loading data into the same table, only the APPEND option is allowed for Parallel Data Loading. The SQL*Loader REPLACE, TRUNCATE, and INSERT options are not allowed for Parallel Data Loading. If you need to delete the table's data before starting the load, you must manually delete the data (via DELETE or TRUNCATE commands). You cannot use SQL*Loader to delete the rows automatically if you are using Parallel Data Loading.