| Lesson 3 |
Use SQL*Loader to load LOB data |
| Objective |
Load LOB data with SQL*Loader in Oracle 23ai |
Loading LOB Data with SQL*Loader in Oracle 23ai
Oracle Database supports a wide variety of data types for storing unstructured content — graphics, documents, audio files, video, and other binary or character data — through its Large Object (LOB) data types. SQL*Loader can load this content directly into LOB columns in bulk, without requiring a custom application or PL/SQL script for each file. This lesson covers the two primary SQL*Loader approaches for LOB loading, the control file syntax that drives them, Direct Path loading for high-throughput scenarios, and the three datafile record formats SQL*Loader supports.
Supported LOB Data Types in Oracle 23ai
SQL*Loader supports loading all Oracle LOB data types in Oracle Database 23ai:
| LOB Type |
Description |
| BLOB |
Binary Large Object — stores unstructured binary data such as images, audio, video, and binary files. |
| CLOB |
Character Large Object — stores single-byte or multi-byte character data. |
| NCLOB |
National Character Large Object — stores character data in the database national character set. |
| BFILE |
External file reference — a read-only pointer to a file stored in the operating system file system outside the database. |
LOB Loading Approaches
SQL*Loader provides two approaches for loading LOB data. The choice depends on the size of the LOB content and whether it can be embedded in the primary datafile.
LOBFILE Method (Recommended for Large LOBs)
The LOBFILE method separates the LOB content from the primary datafile. The primary datafile contains one record per row to be loaded, including metadata fields and a filename reference that identifies the corresponding LOB file. The actual LOB content lives in secondary files — LOBFILEs — which SQL*Loader reads separately and loads into the target LOB column for each row.
This approach handles LOBs of any size without embedding binary content in the primary datafile, making it the recommended method for images, documents, and other large binary or character objects. The control file uses the
LOBFILE() clause to identify which field supplies the secondary file name, and
TERMINATED BY EOF to instruct SQL*Loader to read the entire secondary file as a single LOB value.
The
FILLER keyword marks a field that is read from the primary datafile record but not loaded into any table column — it exists only to supply the filename reference to the
LOBFILE() clause.
Example: LOBFILE control file for loading CLOB and BLOB columns
LOAD DATA
INFILE 'main_data.dat'
INTO TABLE my_lob_table
FIELDS TERMINATED BY ','
(
id INTEGER EXTERNAL,
clob_filename FILLER CHAR(100),
clob_content LOBFILE(clob_filename) TERMINATED BY EOF,
blob_filename FILLER CHAR(100),
blob_content LOBFILE(blob_filename) TERMINATED BY EOF
)
In this example,
clob_filename and
blob_filename are FILLER fields — SQL*Loader reads their values from each primary record and uses them to locate the corresponding secondary files, but does not store the filenames in the table. The
clob_content and
blob_content columns receive the full content of the referenced files.
Inline Loading
LOB data can also be embedded directly in the primary datafile, bounded by large CHAR field lengths or delimiters. This approach is practical only for small LOBs — text fragments, short documents, or small character data — where embedding binary content in a delimited file is feasible. For large binary objects such as images or audio files, the LOBFILE method is strongly preferred.
Loading Images into BLOB Columns Using PL/SQL
For individual LOB insertions or when integrating LOB loading into an application, PL/SQL provides a programmatic alternative to SQL*Loader bulk loading. The
DBMS_LOB package supports reading an external file into a BLOB column through an Oracle directory object. This approach complements SQL*Loader — it is appropriate for low-volume or application-driven insertions, while SQL*Loader is the right tool for bulk loading large numbers of LOB files.
Step 1: Create a Table with a BLOB Column
CREATE TABLE images (
id NUMBER,
image BLOB
);
Step 2: Load an Image Using DBMS_LOB
DECLARE
l_blob BLOB;
l_bfile BFILE := BFILENAME('DIRECTORY_NAME', 'filename.jpg');
BEGIN
INSERT INTO images (id, image) VALUES (1, EMPTY_BLOB())
RETURNING image INTO l_blob;
DBMS_LOB.fileopen(l_bfile, DBMS_LOB.file_readonly);
DBMS_LOB.loadfromfile(l_blob, l_bfile,
DBMS_LOB.getlength(l_bfile));
DBMS_LOB.fileclose(l_bfile);
COMMIT;
END;
DIRECTORY_NAME must be an Oracle directory object pointing to the OS directory containing the image file. The database user must have READ privilege on the directory object.
filename.jpg is the name of the file to load from that directory.
EMPTY_BLOB() initializes the LOB locator in the row before content is written. A LOB locator must exist in the row before DBMS_LOB.loadfromfile can write to it.
- Once stored, BLOB content can be retrieved, updated, and backed up using SQL and PL/SQL.
For loading five or more image files — the scenario described in this lesson's objective — SQL*Loader with the LOBFILE method is more efficient than repeated PL/SQL calls.
SQL*Loader Control File for Bulk Loading
A SQL*Loader session requires a control file that specifies how data should be interpreted and loaded, and a datafile that contains the source records. A basic control file for loading delimited flat-file data has the following structure:
LOAD DATA
INFILE <dataFile>
APPEND INTO TABLE <tableName>
FIELDS TERMINATED BY '<separator>'
(<list of attribute names to load>)
<dataFile> — the name of the input datafile. If no extension is given, SQL*Loader assumes .dat. Always specify the full filename including the extension.
<tableName> — the target table. The table must exist before the load session runs.
APPEND — adds rows to the existing table. If omitted, the table must be empty before loading or an error occurs. Other loading modes are INSERT, REPLACE, and TRUNCATE.
<separator> — the field delimiter string. Choose a string that cannot appear in the data to prevent it from being mistaken for a field boundary.
- Attribute list — the column names to populate, separated by commas and enclosed in parentheses. The list does not need to be complete or in the same order as the table's column definitions — SQL*Loader matches fields to columns by name. Columns omitted from the list are set to NULL.
Direct Path Loading
Conventional Path vs. Direct Path
In Conventional Path mode, SQL*Loader generates SQL
INSERT statements and passes them to the Oracle kernel. Oracle locates free blocks in the table, inserts each row, and updates associated indexes through the standard SQL processing pipeline — the same path used by any application inserting rows.
In Direct Path mode
[1], SQL*Loader formats complete data blocks and writes them directly to the database datafiles, bypassing the SQL engine, the buffer cache, and the normal block-read overhead. The result is significantly faster load throughput for large data volumes. Direct Path mode requires occasional coordination with the database to obtain new block locations, but no other I/O with the database kernel is required during the load itself.
Index Handling During Direct Path Loads
If the target table has indexes, they are placed in DIRECT PATH state for the duration of the load. After the load completes, SQL*Loader sorts the new key values and merges them with the existing index. To support this merge, SQL*Loader creates a temporary index segment at least as large as the largest index on the table.
To minimize space usage and improve performance:
- Presort the input data on the columns of the largest index before starting the load.
- Use the
SORTED INDEXES clause in the control file to inform SQL*Loader that the data is pre-sorted.
- Ensure the target data segment has sufficient space already allocated before the load begins, to minimize dynamic extent allocation during the load.
Leaving indexes in place and presorting the data typically yields better performance than dropping indexes before the load and recreating them afterward — the merge operation is faster than a full index rebuild when the data is sorted.
Space management note: Direct Path load operations always write to new extents. If parallel
DELETE operations are followed by parallel Direct Path loads repeatedly, free space accumulates within existing blocks and disk allocation grows faster than expected.
Direct Path Constraints and Restrictions
- The target table cannot be a clustered table.
- No other active transactions may be running against the table during the load.
- During the load, only
NOT NULL, UNIQUE, and PRIMARY KEY constraints are enforced.
CHECK and FOREIGN KEY constraints are automatically re-enabled after the load completes when the REENABLE DISABLED_CONSTRAINTS clause is included in the control file.
- Table insert triggers, when re-enabled after the load, are not executed for the rows that were loaded during the Direct Path operation. Any logic those triggers would have performed must be applied manually after the load completes.
Parallel Direct Path Loading
The Parallel Data Loading option runs multiple independent SQL*Loader sessions against the same table simultaneously, using available CPU and I/O resources to reduce total elapsed load time. Each session requires its own control file and operates independently.
sqlldr userid=rjb/rjb control=part1.ctl direct=true parallel=true
sqlldr userid=rjb/rjb control=part2.ctl direct=true parallel=true
sqlldr userid=rjb/rjb control=part3.ctl direct=true parallel=true
Each session generates its own log, bad, and discard files —
part1.log,
part2.log,
part3.log,
part1.bad,
part2.bad, and so on. The
PARALLEL keyword must be specified; without it, SQL*Loader places an exclusive lock on the table, preventing concurrent sessions from running.
Parallel Data Loading restrictions:
- Only the
APPEND loading option is permitted. INSERT, REPLACE, and TRUNCATE are not allowed.
- If the table must be emptied before loading, it must be done manually using
DELETE or TRUNCATE before the parallel sessions start. SQL*Loader cannot delete existing rows when Parallel Data Loading is in use.
Datafile Record Formats
SQL*Loader reads data from one or more files specified in the control file. From SQL*Loader's perspective, datafile content is organized as records. Three record formats are supported and specified using the
INFILE parameter.
Fixed Record Format
All records in the datafile are the same byte length. This is the least flexible format but yields the best read performance. Specified as:
INFILE datafile_name "fix n"
Where
n is the exact byte length of every record. Record length is always measured in bytes, even when character-length semantics are in effect for other fields in the file, because a single file may contain a mix of byte-length and character-length fields.
Newline characters are not required to terminate records in fixed format — the byte count alone determines record boundaries.
Example 5-3: Loading Data in Fixed Record Format
load data
infile 'example.dat' "fix 11"
into table example
fields terminated by ',' optionally enclosed by '"'
(col1, col2)
example.dat:
001, cd,
0002,fghi,
00003,lmn,
1, "pqrs",
0005,uvwx,
Each physical record in
example.dat is exactly 11 bytes. The first record is
001, cd, — 9 characters plus 2 bytes to reach the 11-byte boundary. The second record is
0002,fghi,\n where the newline is the eleventh byte.
Variable Record Format
Each record is preceded by a length field that specifies the byte length of that record. This format is more flexible than fixed format and is common on operating systems that use length-prefixed record files.
Stream Record Format
Records are terminated by a record terminator, typically a newline character. This is the default format when no format specification is given in the
INFILE clause.
Note: If data is specified inline within the control file using
INFILE *, the data is always interpreted in stream record format with the default record terminator, regardless of any other format specification.
The next lesson describes the table management enhancements covered in the remainder of this module.
[1] Direct Path mode: Oracle Direct Path mode is a data loading method that bypasses the SQL engine and writes formatted data blocks directly to database datafiles. It avoids transaction logging overhead, buffer cache management, and unnecessary block reads, producing significantly faster load throughput for large datasets compared to Conventional Path mode.
