| Lesson 3 | SQL Strings and substrings |
| Objective | Run queries that contain string and substring functions to see how they work. |
In practice, string functions are useful in both reporting and data preparation. A reporting query may combine first and last names into one display column, while a data-cleaning query may remove extra spaces or standardize capitalization. A validation query may check whether a field contains a prefix, suffix, or domain name. Because of that, string and substring functions appear frequently in the SELECT, WHERE, and ORDER BY clauses of SQL statements.
The exact names of string functions vary somewhat across database platforms. SQL Server, Oracle, PostgreSQL, and MySQL all support strong string-processing capabilities, but they do not always use the same function names or syntax. That is why this lesson emphasizes both the basic ideas and the cross-database differences. Once you understand what a function is supposed to do, it becomes easier to adapt your query to the SQL dialect you are using.
This lesson focuses on running queries that contain string and substring functions so you can see how they work. As you move through the examples, pay attention to three core ideas: combining strings, measuring or transforming strings, and extracting part of a string. Those three operations form the foundation for much more advanced SQL text processing later on.
One of the most common functions is CONCAT(), which joins two or more strings into a single result. This is useful when you want to display a full name, build a label, or combine columns with punctuation or spacing.
SELECT CONCAT('Hello', ' World');
This returns Hello World.
Another common function is LENGTH(), which returns the number of characters in a string. In SQL Server, the equivalent is usually LEN().
SELECT LENGTH('Hello World');
SELECT LEN('Hello World');
Both examples return 11, although the specific function name depends on the database platform.
The UPPER() and LOWER() functions are used to normalize letter case. These functions are helpful when you want output to appear consistently or when you want case-insensitive comparisons to behave more predictably.
SELECT UPPER('Hello World');
SELECT LOWER('Hello World');
TRIM() removes leading and trailing spaces. Older or vendor-specific implementations may also use LTRIM() and RTRIM() separately.
SELECT TRIM(' Hello World ');
SELECT LTRIM(' Hello World');
SELECT RTRIM('Hello World ');
REPLACE() substitutes one substring for another. This is helpful when standardizing data or correcting repeated text patterns.
SELECT REPLACE('Hello World', 'World', 'SQL');
This returns Hello SQL.
Older examples sometimes use the plus sign operator to concatenate strings:
SELECT au_Fname + ' ' + au_Lname
FROM Authors;
This syntax is associated primarily with SQL Server. It is not portable across all major database systems, so a more modern and cross-database-friendly approach is to use CONCAT() instead.
SELECT CONCAT(au_Fname, ' ', au_Lname)
FROM Authors;
This version makes the space explicit. That point matters because the database engine does not automatically insert a space for you. If you omit the literal space, the values will run together. In other words, concatenating John and Smith without a space produces JohnSmith, not John Smith.
Some systems also support concatenation operators other than +. Oracle and PostgreSQL commonly use the double-pipe operator:
SELECT au_Fname || ' ' || au_Lname
FROM Authors;
This illustrates an important practical lesson: the idea of concatenation is universal, but the exact syntax is not. When writing SQL for a specific platform, always confirm whether the database expects CONCAT(), +, or ||.
Another useful variation is CONCAT_WS(), which means concatenate with separator. It inserts the separator between values automatically.
SELECT CONCAT_WS(' ', au_Fname, au_Lname)
FROM Authors;
This can make your code easier to read, especially when you are joining several text columns into one display value.
The general idea is simple: identify the source string, choose a starting position, and choose how many characters to return.
If you want to get the first three characters of the au_Lname column, you can use the following query:
SELECT SUBSTRING(au_Lname, 1, 3)
FROM Authors;
This returns a new result column containing only the first three characters of each last name. This type of query is useful when you want abbreviations, prefixes, or partial codes.
You can also apply substring functions to a string literal rather than a table column. The following query extracts five characters starting at the ninth position:
SELECT SUBSTRING('goodbye cruel world', 9, 5);
This returns cruel.
Most modern SQL platforms support substring operations, but the syntax is not always identical. SQL Server and MySQL commonly use:
SUBSTRING(string, start, length)
Oracle commonly uses:
SUBSTR(string, start, length)
PostgreSQL also supports a more standards-oriented form:
SUBSTRING(string FROM start FOR length)
Examples:
SELECT SUBSTR('goodbye cruel world', 9, 5);
SELECT SUBSTRING('goodbye cruel world' FROM 9 FOR 5);
All of these are based on the same idea: start at a given character position and return a specified number of characters.
LEFT() and RIGHT() can be easier to read.
SELECT LEFT(au_Lname, 3)
FROM Authors;
SELECT RIGHT(au_Lname, 4)
FROM Authors;
The first query returns the first three characters of each last name. The second returns the last four characters. These functions are often clearer than SUBSTRING() when your goal is straightforward extraction from one side of the string.
Not every database uses the same set of convenience functions, but the concept remains useful. Even if your platform does not support LEFT() or RIGHT() directly, you can usually reproduce the same result with a substring expression.
In SQL Server, CHARINDEX() returns the starting position of a substring within another string:
SELECT CHARINDEX('World', 'Hello World');
This returns 7 because the letter W in World begins at the seventh character position.
Other databases use different names for similar functionality. PostgreSQL commonly uses POSITION(), while Oracle often uses INSTR().
SELECT POSITION('World' IN 'Hello World');
SELECT INSTR('Hello World', 'World') FROM dual;
These functions are especially useful when combined with substring extraction. For example, you might first locate the @ symbol in an email address and then extract the domain name that follows it.
For example, names or categories may be stored with irregular capitalization. You can normalize them with UPPER() or LOWER():
SELECT UPPER(city_name),
LOWER(email_address)
FROM Contacts;
Leading and trailing spaces are another common issue, especially when data has been imported from external systems. Those spaces may cause sorting, grouping, and comparison problems. TRIM() removes those unwanted characters cleanly:
SELECT TRIM(customer_name)
FROM Customers;
REPLACE() can also help standardize data. For example, a query might convert one abbreviation into another or replace punctuation that should not appear in a cleaned output column.
SELECT REPLACE(phone_number, '-', '')
FROM Customers;
That query strips hyphens from a phone number, making it easier to compare values consistently.
Suppose you want to format full names for a report:
SELECT CONCAT(au_Fname, ' ', au_Lname) AS full_name
FROM Authors;
Suppose you want only the domain portion of an email address in SQL Server:
SELECT SUBSTRING(email_address,
CHARINDEX('@', email_address) + 1,
LEN(email_address))
FROM Users;
In a system that supports POSITION(), the same idea can be adapted to a different syntax:
SELECT SUBSTRING(email_address FROM POSITION('@' IN email_address) + 1)
FROM Users;
You might also use substring functions to identify product families from SKU codes:
SELECT LEFT(product_code, 3) AS sku_prefix
FROM Inventory;
Or you may need to convert mixed-case user input into a normalized display style before presenting results to an end user. These are the kinds of problems string functions solve every day in database work.
As your SQL skills grow, you will see these functions used in filters, joins, search conditions, reports, ETL tasks, validation queries, and data-quality checks. Even when more advanced tools are available, the ability to manipulate character data directly in SQL remains a practical and important skill.
The key is not to memorize every vendor-specific variation immediately. Instead, focus on the underlying operations: join text, measure text, transform text, search text, and extract text. Once those patterns are clear, you can translate them from one SQL dialect to another much more easily.
Continue to the exercise and test these functions directly so you can see how concatenation, trimming, searching, and substring extraction behave in actual query results.