Data analysis is concerned with the NATURE and USE of data. It involves the identification of the data elements which are needed to support the data processing system of the organization, the placing of these elements into logical groups and the definition of the relationships between the resulting groups. Other approaches, for example D.F.Ds and Flowcharts, have been concerned with the flow of data-dataflow methodologies. Data analysis is one of several data structure based methodologies.
A rule associated with a shared resource that determines which users can access the resource, and at what permissions level.
A relation in which every field is a member of the relation's primary key.
The Apriori algorithm is an algorithm that attempts to operate on database records, particularly transactional records, or records including certain numbers of fields or items. It is one of a number of algorithms using a "bottom-up approach" to incrementally contrast complex records, and it is useful in today's complex machine learning and artificial intelligence projects.
An atomic transaction is an indivisible and irreducible series of database operations such that either all occur, or nothing occurs. A guarantee of atomicity prevents updates to the database occurring only partially, which can cause greater problems than rejecting the whole series outright. As a consequence, the transaction cannot be observed to be in progress by another database client.
A characteristic of an entity; data that identifies or describes an entity. Usually represented as a column in a table, attributes store data values.
A table stored in a database.
BLOB domain type
Short for “binary large object” a domain type used to store binary objects such as graphics in a relational database.
The smallest area that can be addressed on a magnetic disk.
An expression that results in a value of either TRUE or FALSE. Boolean expressions are also called comparison expressions, conditional expressions, and relational expressions.
Business Analytics is the process by which businesses use statistical methods and technologies for analyzing historical data in order to gain new insight and improve strategic decision-making.
Business intelligence (BI) combines business analytics, data mining, data visualization, data tools and infrastructure, and best practices to help organizations to make more data-driven decisions.
Items in a business environment that are related, and about which data need to be stored (e.g., customers, products, orders, etc.).
A set of rules or conditions describing the business polices that apply to the data stored on a company databases.
A column or set of columns that could can used as a primary key for a table.
A type of table join: the Cartesian product of two tables consists of the combination of every record in one table with every record in another table.
Short for computer-aided software-engineering tools, CASE tools are software packages used to design and implement information systems.
Also called the data dictionary. The catalog is the foundation of the database--it's where the RDBMS finds which tables are in a database, which columns are in the tables, which columns are primary or foreign keys, and what type of data to expect in those columns.
One of several types of entity-relationship (ER) diagrams. The Chen model is named after the inventor of ER diagramming.
A group of table columns or rows often accessed together.
Clustering brings records from two or more tables together on a hard disk to improve joins between the two tables.
Part of the structure of a database table; also known as a field or, during early database design, as an attribute.
In relational database design, a composite entity is used to provide an indirect link between two entities in aM:N (many-to-many) relationship. Also called a linking table.
composite key attribute
A key attribute belonging to a composite entity, comprising the key attributes from each of the two entities
linked by the composite entity. Also called a composite primary key.
composite primary key
Another name for a composite key attribute.
concatenated primary key
A primary key made up of more than one field.
A description of the structure of a database.
Rules a database designer imposes upon certain elements in a database to preserve data integrity.
One of several types of entity-relationship (ER) diagrams, the Crowsfoot model neatly packages entities with
their attributes by placing them in boxes. Also referred to as the Information Engineering model.
The "raw facts" stored in a database.
data definition language (DDL)
A collection of programming statements that describe and define data and data relationships in a database.
Also called a catalog. The data dictionary is the foundation of the database, and it is where the RDBMS finds which tables are in a database,
which columns are in the tables, which columns are primary or foreign keys, and what type of data to expect in those columns.
data flow diagram
A diagram illustrating the flow of data in an organization, including data sources, data storage, and data transformation processes.
A term used to describe the quality (in terms of accuracy, consistency, and validity) of data in a database, in the sense that values required to enforce data relationships actually exist. Problems with data integrity occur when a value in one table that’s supposed to relate to a value in another can’t, because the second value either has been deleted or was never entered.
data manipulation language (DML)
A collection of programming statements used to manipulate a database.
An information extraction activity whose goal is to discover hidden facts contained in databases. Using a combination of machine learning, statistical analysis, modeling techniques and database technology, data mining finds patterns and subtle relationships in data and infers rules that allow the prediction of future results. Typical applications include market segmentation, customer profiling, fraud detection, evaluation of retail promotions, and credit risk analysis.
The category of data stored in a database. Each field in a database must have a data type assigned to it, such as Integer, Text, or Date/Time.
Data entered at the intersection of a row (record) and column (field); the data describes or identifies the subject of the record.
A formal structure for organizing and linking data in a database.
To move a table to a lower normal form (e.g., from 3NF to 2NF).
A descriptor column contains a non-key attribute of an entity.
An attribute upon which another attribute is functionally dependent.
A characteristic of certain Relational Database Managment Systems(RDBMS) in which storing parts of a database on more than one computer does not affect the RDBMS's ability to work with the database.
In a traditional relational database, the user begins by specifying a series of column types and names for a table. Information is then added as rows of values, with each of those named columns as a cell of each row. You cannot have additional values that were not specified when you created the table, and every value must be present, even if it is as a NULL value.
You do not specify ahead of time what names will be in each table using a schema. In theory, each record could contain a completely different set of named values, though in practice, the application layer often relies on an informal schema, with the client code expecting certain named values to be present.
The key advantage of this document-oriented approach is its flexibility. You can add or remove the equivalent of columns with no penalty, as long as the application layer does not rely on the values that were removed.
A good analogy is the difference between languages where you declare the types of variables ahead of time, and those where the type is inferred by the compiler or interpreter. You lose information that can be used to automatically check correctness and
optimize for performance, but it becomes a lot easier to prototype and experiment.
Determines the type of data values that are permitted for that attribute.
Rules that require values of attributes to come from specific domains (e.g., text, numbers, date, etc.).
A single stand-alone unit or a business object about which data are stored in a database; usually synonymous with a database table.
A key attribute that uniquely identifies each entity.
Entities that participate in a relationship are associated; the three types of entity relationships (associations) are one-to-one, one-to-many, and many-to-many.
entity-relationship (ER) diagram
A diagram used during the design phase of database development to illustrate the organization of and relationships between data during database design.
Also referred to as a natural join, an equi-join is a type of table join. The common field contains equivalent values.
(Entered in the glossary db under entity-relationship diagram.) A diagram used during the design phase of database development to illustrate the organization of and relationships between data during database design.
The smallest structure in a table; contains data describing the subject of a table.
Short for field variable.
first normal form (1NF)
A table is in first normal form if it contains no repeating groups.
A field (or combination of fields) used to link tables; a corresponding primary key field occurs in the same database.
Fourth Normal Form (4NF)
A relation is in 4NF if it all of its determinants are candidate keys and it contains no multi-valued dependencies.
Fully functional dependency
In (2NF), all non-key values must be fully functionally dependent on the primary key and no partial dependencies are allowed. A partial dependency exists when a field is fully dependent on a part of a composite primary key.
A one-way relationship between two attributes so that, for any unique value of one attribute, there is only one value of the other attribute.
A database model that organizes data in a top-down (inverted tree) structure.
Horizontal partitioning divides a table's records into two or more tables
One of several types of entity-relationship (ER) diagrams. Information Engineering (Crowsfoot) model - One of several types of entity-relationship (ER) diagrams; usually referred to as the "Crowsfoot" model.
Using SQL to create a database; the fourth step in the database life cycle.
An index is an ordered list of values in a field, exactly like the index in the back of a book.
Creating a list of values in a column to speed searches, joins, and queries.
A collection of "raw facts" (data) organized in such a way that they have value beyond the value of the facts themselves.
Interrelated components (e.g., people, hardware, software, databases, telecommunications, policies, and procedures) that input, process, output, and store data to provide an organization with useful information.
A link between tables in a database that displays only the rows with a match (true value of the join condition) for both join tables. An inner join does not guarantee the return of every row of data that you expect will be returned.
instance of entity.
A group of attributes that identify and describe one occurrence of an entity; the equivalent of a record in a table.
A mathematical operation that returns the integer portion of a division and discards the remainder.
Database design considerations based on other countries' differing data standards, alphabets, and writing systems.
An operation that links table records based on data in common fields.
An attribute that identifies an entity; an entity identifier.
A key column contains information that distinguishes that person, place, or thing from every other item represented in the table.
Any type of database that has been in use by an organization for several years.
Another name for a composite entity. Used to provide an indirect link between two entities in a M:N (many-to-many) relationship.
The second stage in the DBLC: creating a logical schema, followed by data normalization.
The overall logical plan of a database; typically a completed ER diagram.
Mail Exchange Record
DNS database record type that indicates which host can transfer messages for a particular domain.
In the language of relational database design, mandatory entity is another name for the entity that has mandatory participation in an entity relationship.
In the language of database design, participation describes the nature of the relationship between two tables. Mandatory participation describes a relationship in which least one record in a table must exist before any records can be added toits associated table.
In relational database design, a many-to-many (M:N) relationship exists when, for one instance of entity A, there exists zero, one, or many instances of entity B; and for one instance of entity B, there exists zero, one, or many instances of entity A.
A determinant consisting of more than one field.
An attribute with more than one data value; must be resolved into a single-valued attribute.
A condition in which a primary-key value in one table determines multiple values in two or more independent tables.
Also referred to as a equi-join, an natural join is a type of table join. The common field contains equivalent values.
An extension of the hierarchical model. (See hierarchical model.)
Non-key attributes are the attributes or fields of a table, other than candidate key fields in a table.
A set of theoretical rules for the design of database tables.
(Entered in the db under normal form.) A set of theoretical rules for the design of database tables.
The process of applying increasingly stringent rules to a relational database to correct any problems associated with poor design.
To break up large tables into smaller, more efficient tables without losing any information.
An unknown value in a column or field.
Unknown values in a column or field
A design that allows you to not only define what data you can store in your database, you also define a set of actions (methods) associated with the data.
A database model that stores data as objects.
In relational database design, a one-to-many (1:N) relationship exists when, for one instance of entity A, there exists zero, one, or many instances of entity B; but for one instance of entity B, there exists zero or one instance of entity A.
In relational database design, a one-to-one (1:1) relationship exists when zero or one instance of entity A can be associated with zero or one instance of entity B, and zero or one instance of entity B can be associated with zero or one instance of entity A.
In the language of database design, participation refers to the nature of the relationship between tables. Optional participation describes a relationship in which no records need to exist in a table before records are added to its associated table.
A link between tables in a database. When two tables are combined by an outer join, any records from the base table with no matching records from the other table are included in the results and any columns where no values are available are filled with nulls .
Typically a filing system where data is stored on a variety of paper forms.
The process of breaking tables into parts to improve retrieval performance.
Permissions are database tasks that a user or group of users is allowed to carry out. There are several types of permissions the administrator can grant:..· Select: The user can retrieve data from a table or view..· Insert: The user can create new records in a table or view. The user may also be limited to inserting values into particular fields..· Update: The user can modify existing values in a table or view, again for an entire table or specific columns..· Delete: The user can remove existing records from a table or view..· References: The user can use a field in an existing table or view a
field as a foreign key in a table that the user creates. This permission may be limited to particular fields..· All: The user has all of the permissions listed above.
Partitioning a table, in database lexicon, divides a single table into two or more tables to limit the amount of data the RDBMS has to retrieve at once. There are two types of partitioning operations: .· Horizontal partitioning, which splits a table's records into two or more tables .· Vertical partitioning, which splits a table's columns into two or more tables.
The underlying physical storage of data in a database, managed by the RDBMS.
Predictive analytics is the use of data, statistical algorithms and machine learning techniques to identify the likelihood of future outcomes based on historical data.
(v.) To extract data from a database; (n.) a set of SQL statements for extracting particular data from a database.
R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (now Lucent Technologies) by John Chambers and colleagues.
R-trees are tree data structures used for spatial access methods, i.e., for indexing multi-dimensional information such as geographical coordinates, rectangles or polygons.
The R-tree was proposed by Antonin Guttman in 1984 and has found significant use in both theoretical and applied contexts.
A common real-world usage for an R-tree might be to store spatial objects such as restaurant locations or the polygons that typical maps are made of: streets, buildings, outlines of lakes, coastlines, etc. and then find answers quickly to queries such as "Find all museums within 2 km of my current location", "retrieve all road segments within 2 km of my location" (to display them in a navigation system) or "find the nearest gas station" (although not taking roads into account). The R-tree can also accelerate nearest neighbor search for various distance metrics, including great-circle distance.
A particular instance of the subject of a table.
Duplicate data in a database; although redundant data can never be totally removed, a major goal of database design is to eliminate as much redundant data as possible.
The means of maintaining the integrity of data between one or more tables that relate to each other, in other words, that a column of data in a table has a null or matching value in a corresponding table. Referential integrity is usually enforced with foreign keys.
Another word for table. Relations represent entities in database design.
A database built on the relational model, which organizes data into tables comprising columns and rows.
Relational Database Management System (RDBMS)
A software package that manages and provides access to a database. These packages follow Codd’s 12 rules of relational databases and normally use SQL to access data.
A formal structure that organizes data into relations (i.e., tables).
Relations are expressed in a shorthand called relational notation, a textual interpretation of the ER diagram.
If the same attribute occurs in more than one table, a relationship exists between those two tables.
An attribute that has more than one value in a table row.
The stage in the database design cycle when designers find out everything they can about the data the client needs to store in the database and the conditions under which that data needs to be accessed.
A type of inner join, so named because it creates two copies of a table and then joins records from the tables where the tables have equivalent values in designated fields.
The overall logical plan of a database.
second normal form (2NF)
A table is in second normal form when every attribute is functionally dependent on the entire primary key.
A branch of mathematics that studies how groups of objects interact.
An attribute with more than one data value; must be resolved into a single-valued attribute.
SQL is an acronym for Structured Query Language. It provides a set of commands that can be used to add data to a database, retrieve that data, and update it. SQL, often pronounced “sequel”, is universally supported by relationaldatabase vendors.
Structured Query Language (SQL)
The standard language used by all relational databases, including Oracle 13c, DB2 Version 11, Microsoft SQL Server 2016.
The person responsible for administering a multi-user computer system; duties range from setting up and configuring system components (e.g., an RDBMS) to performing maintenance procedures (e.g., database backups) on the system.
A collection of data arranged in rows and columns. A table is the largest structure in a relational database.
third normal form (3NF)
A table is in third normal form if it contains no transitive dependencies.
In logic, a three-valued logic (also trinary logic, trivalent, ternary, or trilean, sometimes abbreviated 3VL) is any of several many-valued logic systems in which there are three truth values indicating true, false and some indeterminate third value. This is contrasted with the more commonly known bivalent logics (such as classical sentential or Boolean logic) which provide only for true and false. Conceptual form and basic ideas were initially created by Jan Lukasiewicz and C. I. Lewis. These were then re-formulated by Grigore Moisil in an axiomatic algebraic form, and also extended to n-valued logics in 1945.
A term derived from the mathematical equation for a transitive property: If a>b and b>c, then a>c. In terms of a database, a transitive dependency refers to a relationship between two entities that is dependent upon a second relationship between one of the first two entities and a third entity.
Amathematical property which states that if A=B and B=C, then A=C.
In the lexicon of relational database design another word for rows or records.
Saved queries created with SQL. User views specify which users are permitted access to what data in a database.
To divide a table into multiple tables, placing different columns in each new table.
A database object that enables you to define the columns and rows that a specific user can see. A view can also serve as a tool for enforcing security within your database.
A table stored in the computer’s memory. Virtual tables themselves are not stored in the database; rather, the definition of the view is stored and given a name. Users call up that name, and the view is created (from base tables) on the fly. When a user closes the view, it "disappears" from memory, only to be recreated the next time its name is invoked.
In the language of relational database design, weak entity is another name for the entity that has optionalparticipation in an entity relationship.
A popular style of data-flow diagram. It includes symbols that illustrate who handles the data in a database, where the data moves, where the data are stored, and what is done to the data.