Full Text Searching  «Prev  Next»
Lesson 3Considerations for full-text searching
ObjectiveDescribe the components you should consider when using full-text searching.
When using Full-Text Searching in SQL Server 2022, several key components and considerations are involved to ensure efficient, accurate, and maintainable search capabilities across large textual data sets. Here's a breakdown of the critical components:
  1. ✅ Full-Text Catalog
    • Purpose: A logical container for full-text indexes.
    • Note: In SQL Server 2022, creating a catalog is optional. You can associate a full-text index directly with a table.
    • Command Example:
      CREATE FULLTEXT CATALOG MyCatalog AS DEFAULT;
      
  2. ✅ Full-Text Index
    • Purpose: Enables full-text search on one or more columns of a table.
    • Requirements:
      • The table must have a unique, non-nullable single-column index (usually a PRIMARY KEY).
      • Columns must be of a supported data type (CHAR, VARCHAR, TEXT, NVARCHAR, NTEXT, XML, etc.).
    • Command Example:
      CREATE FULLTEXT INDEX ON Documents(Content LANGUAGE 1033)
      KEY INDEX PK_Documents_ID
      ON MyCatalog;
                
  3. ✅ Supported Data Types
    • CHAR, VARCHAR, TEXT, NCHAR, NVARCHAR, NTEXT, IMAGE, XML, VARBINARY(MAX) (if associated with a type column)
  4. ✅ Language-Specific Considerations
    • Language ID determines the linguistic rules used for word-breaking and stemming.
    • SQL Server supports multiple languages via word breakers and stemmers.
  5. ✅ Full-Text Engine Components
    • Word Breakers and Stemmers: Break text into tokens and reduce words to their base forms.
    • Noise Words (Stopwords): Common words (like "the", "and") ignored during indexing and querying.
    • Thesaurus Files: XML-based mappings for synonyms to enhance search flexibility.
  6. ✅ Queries Using Full-Text Search

    You can use the following predicates:

    Predicate Description
    CONTAINS Searches for precise matches and supports complex syntax with prefixes, wildcards, and logical operators.
    FREETEXT Searches for meaning or intent, leveraging a thesaurus and linguistic analysis to understand the query.
    CONTAINSTABLE / FREETEXTTABLE Return ranked result sets, including a RANK column to indicate relevance.

Example:
SELECT Title
FROM Documents
WHERE CONTAINS(Content, '("SQL Server*" AND "indexing")');
  • ✅ Population (Indexing)
    • Manual or Automatic: You can trigger population manually or let SQL Server update it based on change tracking.
    • Modes:
      • AUTO – automatic change tracking
      • MANUAL – manual repopulation required
      • FULL or INCREMENTAL population options available
  • ✅ Security Considerations
    • Users need appropriate SELECT permission on full-text indexed columns.
    • Full-text indexes are stored separately from the regular indexes, in the Full-Text Catalog directory.
  • ✅ Performance Optimization
    • Use filters to narrow searches.
    • Avoid unnecessary use of FREETEXT, as it's more resource-intensive.
    • Index only the required columns.
    • Monitor using sys.fulltext_catalogs, sys.fulltext_indexes, and DMVs like sys.dm_fts_index_population.
  • ✅ Maintenance and Monitoring Tools
    • Catalog Status: FULLTEXTCATALOGPROPERTY('MyCatalog', 'PopulateStatus')
    • Population Status: Use sys.dm_fts_index_population
    • Rebuilding Indexes:
      ALTER FULLTEXT INDEX ON Documents START FULL POPULATION;
                

  • Full text Searching in SQL Server 2022
    Full text Searching in SQL Server 2022

    Considerations for full-text Searching
    There are several things to consider before you begin using full-text searching. First of all, you must install Microsoft full-text searching, an option presented to you when you install SQL Server 2022. However, this option is not selected by default, so you must choose to install this option to be able to use the Microsoft Search service. Without SSMS, menu options for full-text searching will be disabled. Transact-SQL commands used against the database will also fail.

    Full-text Indexes

    In SQL Server 2022, full-text indexes can only be created on columns with character-based data types, because the full-text engine tokenizes and indexes "words and phrases", not numeric values.
    ✅ Valid Data Types for Full-Text Indexing:
    1. char, varchar
      • Fixed-length non-Unicode string data.
      • Variable-length non-Unicode string data.
    2. nchar, nvarchar
      • Fixed-length Unicode string data.
      • Variable-length Unicode string data.
    3. text, ntext (deprecated but still supported)
      • Variable-length non-Unicode data with a maximum length of 2^31-1 (2 GB).
      • Variable-length Unicode data with a maximum length of 2^30-1 (1 GB).
    4. xml
      • Stores XML data.
    5. image, varbinary(max) (with a type column and filters)
      • Variable-length binary data with a maximum length of 2^31-1 (2 GB).
      • Variable-length binary data with a maximum length of 2^31-1 (2 GB), often used for images or other large binary objects.

    Following Datatypes are Not Allowed:
    * `int`, `bigint`, `money`, `decimal`, `float`, `datetime`, etc.
    🔍 Why?
    Full-text indexing uses word breakers, stemmers, stopwords, and linguistic analysis to build an inverted index. These processes only make sense on "textual data", not numeric values or binary math-based types.

    One full-text index

    It is important to remember that, like clustered relational indexes, there can be only one full-text index on any given table. All desired columns that are specified as part of the index will be added to that one index. In the next lesson, we will discuss how to define your full-text indexes.
    This paragraph highlights a crucial limitation of full-text indexes in SQL Server, drawing a parallel to clustered relational indexes:
    • One Full-Text Index Per Table: The most important point is that a table can have only one full-text index. This is a significant design constraint to keep in mind when planning your database schema for full-text search capabilities.
    • Consolidated Columns: If you want to include multiple columns in your full-text search, they must all be part of this single full-text index. You can't create separate full-text indexes for different sets of columns on the same table. All "desired columns" are combined into that one index.
    • Forward-Looking: The final sentence indicates that the subsequent lesson will delve into the practical steps of defining these full-text indexes, building upon this foundational understanding of their limitations.

    In summary, this reminds the user to be strategic when designing their full-text search implementation, as they have only one opportunity per table to define which columns will be searchable via this mechanism.

    CREATE FULLTEXT INDEX` statement

    The `CREATE FULLTEXT INDEX` statement in SQL Server 2022 is used to create a full-text index on a table or indexed view to support linguistic-based searches on text data. This index enables you to run queries using `CONTAINS`, `FREETEXT`, `CONTAINSTABLE`, and `FREETEXTTABLE`—which go far beyond `LIKE` by enabling stemming, proximity, inflectional forms, and thesaurus-based searches.
    🔹 Syntax Overview
    CREATE FULLTEXT INDEX ON table_name
    (
        column_name1 [TYPE COLUMN type_column_name] [LANGUAGE language_term_id],
        column_name2 [LANGUAGE language_term_id],
        ...
    )
    KEY INDEX unique_index_name
    [ON fulltext_catalog_name]
    [WITH <fulltext_index_options>];
    
    🔹 Key Clauses Explained
    Clause Description
    ON table_name The name of the base table for which the full-text index is being created.
    (columns...) Specifies the columns to be indexed; only character-based data types are allowed.
    TYPE COLUMN Required if the column is varbinary(max) or image—tells SQL Server what type of document (e.g., .pdf, .docx).
    LANGUAGE Specifies the language-specific word breaker and stemmer (e.g., 1033 = English).
    KEY INDEX Mandatory—specifies the unique key index on the table (typically the primary key).
    ON catalog_name Optional—specifies the full-text catalog to associate with the index. If omitted, the default catalog is used.
    WITH Used to define index properties like change tracking and stoplist behavior.

    🔹 Example: Basic Full-Text Index
    CREATE FULLTEXT INDEX ON Articles
    (
        Title LANGUAGE 1033,
        Body LANGUAGE 1033
    )
    KEY INDEX PK_Articles_ArticleID
    ON MyFullTextCatalog
    WITH CHANGE_TRACKING AUTO;
    

    This index allows full-text searches on the `Title` and `Body` columns using English linguistic rules.
    🔹 Options in the `WITH` Clause
    Option Description
    CHANGE_TRACKING { AUTO | MANUAL | OFF } Determines how index updates are managed. AUTO is most common.
    STOPLIST { SYSTEM | OFF | stoplist_name } Controls which stopwords (noise words) are ignored during indexing.

    🔹 Special Cases
    ➤ `TYPE COLUMN`
    Used for `varbinary(max)` or `image` columns, which store binary documents.
    CREATE FULLTEXT INDEX ON Documents
    (
        DocumentContent TYPE COLUMN DocumentType
    )
    KEY INDEX PK_Documents_DocumentID
    ON DocCatalog;
    

    Here, `DocumentType` might contain values like `'.docx'`, `'.pdf'`.
    ➤ Indexed View Support
    You can also create a full-text index on an indexed view, but it must meet strict requirements (e.g., schema-bound view, deterministic expressions).
    ✅ Best Practices
    • Ensure the base table has a unique single-column NOT NULL index.
    • Use a dedicated full-text catalog for large datasets to isolate I/O.
    • Choose the correct LANGUAGE for accurate linguistic processing.
    • Monitor population status with:

      SELECT FULLTEXTCATALOGPROPERTY('MyFullTextCatalog', 'PopulateStatus');
      

    SEMrush Software 3 SEMrush Banner 3