
Lesson 4: Character Sets
Objective: Understand how to define a character set for a database

Define Character Sets

The NLS_LANG parameter establishes the language, territory, and client character set for a client session, which drives language-dependent behavior such as the language of messages and the format of dates.
The database character set determines how character data is stored in the database.
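As an illustration of what NLS_LANG carries, the value has the general form LANGUAGE_TERRITORY.CHARSET, for example AMERICAN_AMERICA.AL32UTF8. The following is a minimal sketch in Python that splits such a value into its three parts. The function name parse_nls_lang and the fallback value are assumptions made for this example and are not part of any Oracle tool.

```python
import os


def parse_nls_lang(value):
    """Split an NLS_LANG string into (language, territory, charset).

    Hypothetical helper for illustration only. Components that are
    missing from the string come back as None.
    """
    # Everything after the first "." is the client character set.
    head, _, charset = value.partition(".")
    # Before the ".", the first "_" separates language from territory.
    language, _, territory = head.partition("_")
    return (language or None, territory or None, charset or None)


if __name__ == "__main__":
    # The fallback value is an assumption used only when NLS_LANG is unset.
    value = os.environ.get("NLS_LANG", "AMERICAN_AMERICA.AL32UTF8")
    print(parse_nls_lang(value))
```

Only the third part names a character set, and it describes the client side. The character set used for storage is a property of the database itself.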

Character sets

A character set is a group of characters that defines how to represent valid values for a particular language in the database. A character set that supports English allows upper- and lowercase representations of each letter of the alphabet, the 10 digits that make up numbers, and all valid punctuation characters. The character set for other languages, such as Chinese, can include thousands of valid characters.
Oracle supports character sets in which every character fits in a single byte, such as those for English, and character sets that require multiple bytes per character, such as those for many Asian languages. One hundred and eighty different character sets come with your Oracle database. Oracle also supports Unicode, an industry standard that can represent a wide variety of single-byte and multibyte languages within one character set.
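The difference between single-byte and multibyte character sets can be seen outside the database as well. The sketch below uses Python's built-in codecs to count how many bytes a character needs in several encodings. The encoding names are Python codec names, not Oracle character set names, and the sample characters are chosen only for illustration.

```python
# Sample characters: an English letter, an accented letter, a Chinese character.
SAMPLES = ["A", "é", "中"]
# Python codec names (assumed for illustration; Oracle uses its own names).
ENCODINGS = ["ascii", "latin-1", "utf-8", "gb2312"]


def encoded_length(ch, encoding):
    """Return the number of bytes ch needs in encoding, or None if the
    encoding cannot represent the character at all."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


def byte_width_table(samples=SAMPLES, encodings=ENCODINGS):
    """Map each sample character to its byte width in every encoding."""
    return {
        ch: {enc: encoded_length(ch, enc) for enc in encodings}
        for ch in samples
    }


if __name__ == "__main__":
    for ch, widths in byte_width_table().items():
        print(ch, widths)
```

The letter A takes one byte in every encoding listed, while the Chinese character cannot be represented in the single-byte encodings and takes two or three bytes in the multibyte ones.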

Defining a character set

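You define the character set for your database when you create the database. Changing it afterward is not a simple setting change: it generally means migrating the data, for example by exporting it and importing it into a new database or by using Oracle's Database Migration Assistant for Unicode, so choose the character set carefully up front.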

Character Set Encoding

When computer systems process characters, they use numeric codes instead of the graphical representation of the character.
For example, when the database stores the letter A, it actually stores a numeric code that is interpreted by software as the letter. These numeric codes are especially important in a global environment because of the potential need to convert data between different character sets.
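To make the idea of numeric codes and conversion concrete, here is a small sketch in Python. It shows the code for the letter A and converts the same text from one encoding to another. The helper names code_point and convert are hypothetical, and the encodings are Python codec names standing in for database character sets.

```python
def code_point(ch):
    """Return the numeric code that represents a single character."""
    return ord(ch)


def convert(data, source_encoding, target_encoding):
    """Convert bytes from one character set to another by decoding them
    to characters and encoding those characters again."""
    return data.decode(source_encoding).encode(target_encoding)


if __name__ == "__main__":
    # The letter A is handled as the number 65 (hexadecimal 41).
    print(code_point("A"), hex(code_point("A")))

    # The same word needs different bytes in different character sets.
    latin1_bytes = "café".encode("latin-1")
    utf8_bytes = convert(latin1_bytes, "latin-1", "utf-8")
    print(latin1_bytes, utf8_bytes)
```

The accented letter occupies one byte in the first encoding and two in the second, which is why data must be converted, not copied byte for byte, when it moves between systems that use different character sets.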

What is an Encoded Character Set?

You specify an encoded character set when you create a database. The character set you choose determines which languages can be represented in the database. It also affects:
  1. How you create the database schema
  2. How you develop applications that process character data
  3. How the database works with the operating system
  4. Performance
A group of characters (for example, alphabetic characters, ideographs, symbols, punctuation marks, and control characters) can be encoded as a character set. An encoded character set assigns a unique numeric code to each character in the character repertoire. The numeric codes are called code points or encoded values. The following table shows examples of characters that have been assigned a numeric code value in the ASCII character set.

Table: Encoded Characters in the ASCII Character Set

Character   Description        Code Value (hexadecimal)
!           Exclamation Mark   21
#           Number Sign        23
$           Dollar Sign        24
1           Number 1           31
2           Number 2           32
3           Number 3           33
A           Uppercase A        41
B           Uppercase B        42
C           Uppercase C        43
a           Lowercase a        61
b           Lowercase b        62
c           Lowercase c        63
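The code values in the table are hexadecimal, and they can be reproduced with a few lines of Python. This is a minimal sketch; the function name ascii_hex is chosen for this example only.

```python
# The characters listed in the table, in the same order.
TABLE_CHARACTERS = "!#$123ABCabc"


def ascii_hex(ch):
    """Return the ASCII code value of ch as two hexadecimal digits."""
    # Encoding to ASCII yields one byte; format that byte as hexadecimal.
    return format(ch.encode("ascii")[0], "02X")


if __name__ == "__main__":
    for ch in TABLE_CHARACTERS:
        print(ch, ascii_hex(ch))
```

In decimal, the same values are 33 for the exclamation mark, 65 for uppercase A, and 97 for lowercase a.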

In the next lesson, you will learn how to use a character set.