Business Intelligence

โˆ’

Video 1: Introduction to Data & Business Intelligence

This session covers the foundational aspects of data, its quality, and the evolution and necessity of Business Intelligence in modern decision-making.

Big Data
๐Ÿ“Š

Data is a "collection of facts which have similar attributes or characteristics."

Data Lifecycle

๐Ÿ“ฅ
Collected
๐Ÿ“
Measured
๐Ÿ“„
Reported
๐Ÿ“ˆ
Visualized
๐Ÿ”
Analyzed

Types of Data

1

Unstructured Data

Has no particular format or sequence and no data model, so it is not easily usable. It doesn't follow rules or semantics and has no easily identifiable structure.

Examples: Webpages, Memos, Videos, Email
2

Semi-structured Data

Similar entities are grouped with insufficient metadata. No fixed data model but contains tags and elements.

Examples: XML and other markup languages used in data integration
3

Structured Data

Conforms to a data model with data stored in rows and columns. Attributes in a group are consistent, residing in fixed fields. Definition, format, and meaning are explicitly known.

Examples: Databases, Spreadsheets

Data Distribution

Data Type Distribution
Unstructured 80%
Semi-structured 10%
Structured 10%

The refinement of data into action is depicted as a hierarchy:

Action (Decide)
โ†‘
Knowledge (Analyze)
โ†‘
Information (Integrate)
โ†‘
Data (Collect)

Data is obtained from multiple disparate sources (integrated).

Data Quality Attributes

  1. Accuracy

     Data correctly represents the real-world entities and values it is intended to model.

  2. Completeness

     All required data is present and no critical information is missing.

  3. Consistency

     Data values are uniform across different datasets and systems without contradictions.

  4. Timeliness

     Data is available and up-to-date when needed for decision-making processes.
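These attributes can be checked programmatically. Below is a minimal sketch; the record fields, required-field list, and 7-day freshness threshold are illustrative assumptions, not from the source:

```python
from datetime import date, timedelta

# Hypothetical customer record for illustration.
record = {"name": "Asha", "city": "Pune", "balance": 1500.0,
          "last_updated": date.today() - timedelta(days=2)}

def completeness(rec, required):
    # Completeness: all required fields present and non-empty.
    return all(rec.get(f) not in (None, "") for f in required)

def timeliness(rec, max_age_days=7):
    # Timeliness: data refreshed recently enough for decision-making.
    return (date.today() - rec["last_updated"]).days <= max_age_days

print(completeness(record, ["name", "city", "balance"]))  # True
print(timeliness(record))                                 # True
```

Accuracy and consistency checks would similarly compare the record against a trusted source and against sibling systems.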

Business Intelligence (BI) unites data, technology, analytics, and human knowledge to optimize business decisions and ultimately drive an enterprise's success.

Evolution of BI

1956
Invention of hard disk.
1970
Decision Support System (DSS).
1980
Executive Information System (EIS).
1990
Data Warehouse, Business Intelligence.
2000
Dashboards, Scorecards.
2010
Analytics, Big Data, Mobile BI, Personal BI.

Features of BI

  1. Fact-based decision making.
  2. Single version of truth.
  3. 360-degree perspective on business.
  4. Virtual team members on the same page.

Need for BI

  1. Too much data, too little insight.
  2. Need to blend unstructured data to support better decision making.
  3. React faster to changing markets.
  4. Improve control over important processes.
  5. Improve efficiency within the organization so as to increase productivity.
๐Ÿ—„๏ธ

What is a Database?

A repository of information that serves as a backing storage for applications, enabling efficient data management and retrieval.

Key Characteristics

1

Application Storage

Repository of information that is used as a backing storage for some applications.

2

Current Data

Stores current data for immediate access and operations.

3

Day-to-Day Operations

Used for daily operational tasks and business processes.

4

Process-Oriented

Focus on specific business processes and operational workflows.

5

Size Range

Typically ranges from 100 MB to a few GB in size.

6

Transaction Processing

Supports transaction processing for data integrity and consistency.

7

User Types

Used by officials, DBAs, and database professionals.

โšก

OLTP (Online Transaction Processing)

Handles day-to-day transactions that result from enterprise operations, designed for frequent queries and update operations.

Key Features

1

Transaction Handling

Day-to-day handling of transactions that result from enterprise operation.

2

Normalized Data

Stores data in normalized form to remove duplication.

3

High Granularity

Stores data at high granularity, in a detailed format.

4

Frequent Operations

Designed mainly to support frequent queries and update operations.

๐Ÿ“Š

OLAP (Online Analytical Processing)

Analyzes historical data from data warehouses for strategic decision making using complex queries.

Key Features

1

Historical Data Analysis

Utilizes historical data in a data warehouse for the purpose of making management decisions.

2

Complex Queries

Analyzes historical data (terabytes) using complex queries.

3

Data Aggregation

Mostly data aggregation, with relatively few update operations.

4

Strategic Decisions

Gives information for strategic decision making.

โˆ’

Video 2: Data Warehousing Concepts & Approaches

This session delves into data warehousing, data mining, and different approaches to building data warehouses, along with key measurement terminology and decision types.

Data Mining & Data Warehouse Approaches
๐Ÿข

Data Warehouse

A repository that deals with multiple subject areas. It is not a product, it's a data environment. Data warehousing is the process of designing, building, and maintaining a data warehouse system.

Key Characteristics

๐Ÿ“Š

Subject Oriented

Subject areas are populated from one or more databases

๐Ÿ”—

Integrated

Data consolidated from multiple sources

๐Ÿ”’

Non Volatile

Data is permanent and stable over time

โฐ

Time Variant

Tracks historical changes over time

Core Features

1

Historical Data Storage

Stores the historical data

2

Knowledge Workers

Users are knowledge workers who make informed decisions

3

Enterprise Scale

Size ranges from 100 GB to TB scale

4

Analytical Processing

Supports analytical processing for insights

5

Long-term Requirements

Used for long term informational requirements and decision support

Data Warehouse Flow

๐Ÿ“ฅ
Data Sources

Multiple databases & systems

โ†’
๐Ÿ”„
ETL Process

Extract, Transform, Load

โ†’
๐Ÿข
Data Warehouse

Centralized repository

โ†’
๐Ÿ“Š
Analytics

Reports & Insights

โ†’
๐Ÿ’ก
Decision Making

Strategic decisions

โ›๏ธ

Data Mining

Process of discovering meaningful correlations, patterns, and trends by sifting through large amounts of data stored in repositories.

๐Ÿ”ฌ

Technologies & Techniques

It employs pattern recognition technologies, as well as statistical and mathematical techniques.

Data Mining Process

๐Ÿ”„
Transformation

Convert raw data into analyzable format

โ†’
๐Ÿ’พ
Storage

Store transformed data efficiently

โ†’
๐Ÿ“ฆ
Delivery

Present insights and findings

Core Techniques

๐Ÿ”

Pattern Recognition

Identify recurring patterns and relationships in data

๐Ÿ“Š

Statistical Analysis

Apply statistical methods to understand data distribution

๐Ÿงฎ

Mathematical Models

Use mathematical algorithms to predict outcomes

๐Ÿค–

Machine Learning

Leverage AI algorithms for automated insights

Key Applications

๐ŸŽฏ

Market Basket Analysis

Identify product associations and customer buying patterns

๐Ÿ‘ฅ

Customer Segmentation

Group customers based on similar characteristics and behaviors

๐Ÿ”ฎ

Predictive Analytics

Forecast future trends and outcomes using historical data

โš ๏ธ

Fraud Detection

Identify unusual patterns indicating fraudulent activities

๐Ÿ—๏ธ

Data Warehouse Approaches

The approach for building a data warehouse is based on business requirements. Two primary methodologies exist, each with distinct philosophies and implementation strategies.

1

Ralph Kimball

Dimension Modelling (Bottom-Up)

Denormalized dimensional model - fewer tables, fewer joins

DW is made up of all the data marts in an enterprise.

Bottom-Up Approach
DM1
DM2
...
DMn
↓
Enterprise DW

(bottom-up approach: individual data marts are built first and together constitute the enterprise DW)

2

William H. Inmon

Normalized Modelling (Top-Down)

3NF - more tables, more joins

An enterprise has one data warehouse; and data marts source their information from the data warehouse.

Top-Down Approach
Enterprise DW
โ†“
DM1
DM2
...
DMn

(top-down approach)

Detailed Comparison

Criterion
๐Ÿ‘ค
Ralph Kimball
๐Ÿ‘ค
William H. Inmon
โฑ๏ธ
Time
less time
time consuming
๐Ÿ”ง
Maintenance
difficult, redundant maintenance, subject to revisions
easy maintenance
๐Ÿ’ฐ
Cost
low initial cost, subsequent phases cost same
high initial cost, subsequent phases cost less
๐Ÿš€
Start-up
shorter time for initial set-up
longer start-up time
๐Ÿ‘ฅ
Team
generalist team
specialist team
๐ŸŽฏ
Scope
individual business areas
enterprise-wide
๐Ÿ“Š

Measurement System Terminology

Understanding the hierarchy of measurement concepts from basic counts to actionable insights

1

Measure

Countability

The basic unit of quantification - what can be counted or measured

2

Metric

Degree of measure

A standard unit that provides context and scale to measurements

3

Indicator

Metric to track performance

A specific metric used to monitor and evaluate performance

4

Index

Composite set of indicators

A combined measure made up of multiple indicators

5

Key Performance Indicator (KPI)

Measurable value to check efficiency of company

Critical metrics that directly measure success against strategic objectives

Measurement Hierarchy

๐Ÿ“
Measure
โ†’
๐Ÿ“Š
Metric
โ†’
๐Ÿ“ˆ
Indicator
โ†’
๐Ÿ“‘
Index
โ†’
๐ŸŽฏ
KPI
๐ŸŽฏ

Types of Decision

Understanding decision types based on impact and frequency - from strategic long-term planning to operational day-to-day activities

Decision Matrix

Low Frequency
High Impact
High Frequency
Low Impact
๐Ÿ›๏ธ

Strategic

High Impact Low Frequency
โš™๏ธ

Tactical

Balanced
๐Ÿ”„

Operational

High Frequency Low Impact
โ†
โ†
โ†
โ†
๐Ÿ›๏ธ

Strategic Decisions

Impact: High
Frequency: Low

Long-term, organization-wide decisions that shape the future direction and competitive position

Examples: Market Entry Mergers Business Model
โš™๏ธ

Tactical Decisions

Impact: Medium
Frequency: Medium

Mid-term decisions that implement strategic plans through resource allocation and process optimization

Examples: Budget Allocation Hiring Plans Campaign Launch
๐Ÿ”„

Operational Decisions

Impact: Low
Frequency: High

Day-to-day decisions that ensure smooth operations and immediate task execution

Examples: Task Assignment Inventory Orders Daily Scheduling
โˆ’

Video 3: Business Analytics & BI Component Framework

This session focuses on Business Analytics, its various types, and the comprehensive framework of Business Intelligence components.

Business Analytics & BI Component Framework
๐Ÿ“Š

Business Analytics (BA)

It allows users to examine and manipulate data to drive positive business actions. Business users can make well-informed, fact-based decisions to support their organization's tactical and strategic goals.

Three Pillars of Business Analytics

๐Ÿ”™

Know the past

What happened?

Why it happened?

Historical Analysis
๐Ÿ”

Analyze present

What is happening?

Why it is happening?

Current Insights
๐Ÿ”ฎ

Predict future

What will happen?

Predictive Modeling

Business Analytics Flow

๐Ÿ”™
Past

Historical Data

โ†’
๐Ÿ”
Present

Current Analysis

โ†’
๐Ÿ”ฎ
Future

Predictions

โ†’
๐ŸŽฏ
Action

Business Decisions

โš–๏ธ

BI vs. BA

Understanding the key differences between Business Intelligence and Business Analytics - from reporting capabilities to predictive modeling and testing methodologies.

๐Ÿ“ˆ

BI (Business Intelligence)

1
Reporting (KPIs, metrics)

Track key performance indicators and business metrics

2
Automated monitoring/alerting

Set thresholds for automatic notifications

3
Dashboards

Visual representation of business data

4
Score cards

Measure performance against objectives

5
OLAP (Cubes, slice, dice, drilling)

Multi-dimensional data analysis

6
Ad hoc query

On-demand data exploration

๐Ÿง 

BA (Business Analytics)

1
Statistical/quantitative analysis

Apply statistical methods to data

2
Data mining

Discover patterns in large datasets

3
Predictive modeling

Forecast future outcomes and trends

4
Multivariate testing

Test multiple variables simultaneously

Key Differences

๐Ÿ“ˆ
BI Focus

Descriptive analytics - What happened and why

๐Ÿง 
BA Focus

Predictive & prescriptive analytics - What will happen

๐ŸŽฏ
Combined Power

Together they provide complete business insight

๐Ÿ“Š

Types of Business Analytics

Business Analytics encompasses four key types, progressing from understanding the past to automating future decisions.

Four Analytics Types

1
๐Ÿ“Š

Descriptive

What happened?

Analyzes historical data to understand past performance and trends

2
๐Ÿ”

Diagnostic

Why did it happen?

Investigates causes and reasons behind past outcomes

3
๐Ÿ”ฎ

Predictive

What will happen?

Forecasts future outcomes using statistical models and ML

4
๐Ÿง 

Prescriptive

What should I do?

Recommends optimal actions based on predictions

Decision Making Categories

๐Ÿ‘ค

1. Decision Support

Provides insights and recommendations to aid human decision-making

๐Ÿค–

2. Decision Automation

Automates decisions based on predefined rules and algorithms

Basic Domains within BA

๐Ÿ“ข
Marketing
๐Ÿ›๏ธ
Retail
๐Ÿ’ฐ
Financial services
๐Ÿšš
Supply chain
โœˆ๏ธ
Transportation
๐Ÿ‘ฅ
Customer
๐Ÿ› ๏ธ

Tools used for BI

Business Intelligence tools span across data extraction, reporting, and analysis phases of the BI lifecycle.

๐Ÿ“ฅ

Data Extraction

Informatica

Enterprise data integration platform

IBM Datastage

ETL and data integration tool

๐Ÿ“„

Reporting

IBM Cognos

Business intelligence suite

Qlikview

Visual analytics platform

Power BI

Microsoft's analytics service

๐Ÿ“ˆ

Analyzing

SAS

Statistical analysis software

IBM SPSS

Predictive analytics platform

BI Tool Pipeline

๐Ÿ“ฅ
Extract

Data Extraction

โ†’
๐Ÿ“„
Report

Visualization

โ†’
๐Ÿ“ˆ
Analyze

Insights Generation

๐Ÿ›๏ธ

BI Component Framework

A comprehensive framework that outlines all essential components and their relationships in a Business Intelligence ecosystem.

1

Business Layer

Business drivers: manufacturing, telecommunications, IT, banking, finance, transportation, travel, energy.

๐Ÿ“‹

1. Business Requirement (it is the product of 3 steps of the business process)

1
Business demands (Why?): These are impulses that initiate the need to act, e.g., increased competition, changing economy, changing technology, changing regulations.
2
Business goals (What?): Actions taken in response to business demands, e.g., increased productivity, improved market share, cost reduction etc.
3
Business strategies (How?): Action plans for the business to accomplish its goals, e.g., outsourcing, partnerships, customer retention.
๐Ÿ’ฐ

2. Business Value

When a strategy is implemented against certain business goals, certain costs are involved. The final output of this process should create business value whose ratio to the costs involved is favorable.

๐Ÿ“ˆ
Return on Investment (ROI)

It is the benefit to the investor resulting from an investment of some resource.

Formula:
ROI = Net income / investment
๐Ÿ’Ž
Return on Asset (ROA)

Indicator of how profitable a company is relative to its total assets; it is the rate of return earned on invested capital (ROA = Net income / Total assets).

Example: A company has a net income of $1 million and has total assets of $5 million. Then its ROA is 20%.
๐Ÿ’ต
Total Cost of Ownership (TCO)

It is a financial estimate intended to help buyers and owners determine the direct and indirect costs of a product or system.

Example: The cost of owning a vehicle from the time of purchase by the owner, through its operation and maintenance, to the time it leaves the owner's possession.
โš–๏ธ
Total Value of Ownership (TVO)

Comparative measure that evaluates the TCO and any additional benefits, such as the mobility of laptops when compared to desktops.
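The ROI and ROA formulas can be checked with a short sketch; the figures below are the ones from the ROA example above:

```python
def roi(net_income, investment):
    # ROI: benefit to the investor as a fraction of the amount invested.
    return net_income / investment

def roa(net_income, total_assets):
    # ROA: profitability relative to total assets.
    return net_income / total_assets

# From the example: $1M net income on $5M of total assets.
print(f"ROA = {roa(1_000_000, 5_000_000):.0%}")  # ROA = 20%
```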

3

Program Management

Seamless integration of people, projects, and job activities, and smooth functioning of the entire program.

1
Business priorities.
2
State missions and goals.
3
Predict strategies and risks.
4
Define deliverables from multiple projects.
5
Manage dependencies.
6
Measure cost and value and publish ROI.
7
Capture and monitor business rules.
8
Manage technical and non-technical infrastructure.
4

Development

The process of development consists of:

๐Ÿ—„๏ธ
1
Database/Data Warehouse development
โ†’
๐Ÿ”„
2
Data integration system development
โ†’
๐Ÿ“Š
3
Business analytics development
๐ŸŽฏ

Complete BI Framework Overview

The BI Component Framework integrates Business Layer requirements and value metrics with Program Management oversight, guiding the Development process to create a robust Business Intelligence ecosystem that delivers measurable business outcomes.

โˆ’

Video 4: Multi-dimensional Data Modeling (MDDM) - Part 1

This session introduces Multi-dimensional Data Modeling, addressing challenges with relational models and explaining key concepts like surrogate keys, dimensions, facts, and schemas.

Multi-dimensional Data Modeling (MDDM)
โš ๏ธ

Challenges with Relational Model

1

Concurrent Updates

The database is concurrently updated by millions of users. This results in transactional locks on data, making read operations slower and more time consuming.

2

High Normalization

Data is highly normalized, usually in 3NF; this requires data to be fetched from multiple tables, with a greater number of joins.

3

Alphanumeric Primary Keys

Joins between tables with alphanumeric primary keys result in high memory consumption, which is why numeric key columns are preferred.

4

Scattered Transactional Data

Transactional data is scattered over multiple tables. Makes aggregation and summarization difficult.

5

Limited Time-Series Analysis

Data like month, year, quarter values for a date component is not available. Unable to do time series based analysis.

๐Ÿ“Š

What is MDDM

Arrangement of database tables designed to optimize the query, analysis, and reporting needs of an enterprise.

It aids data retrieval operations and is also called the Dimensional or Dimension model.

Used to:

๐Ÿ’ก

Summarizes data; for example, products with the highest profit margin for the last three quarters give a clearer picture than raw product sales details.

โšก

Mitigates performance-related problems:

  • Slow turnaround time for complex queries.
  • Too many joins between multiple tables.
  • Large size of database.

Data Flow Process

๐Ÿ“ฅ
Collected
โ†’
๐Ÿ“
Measured
โ†’
๐Ÿ“„
Reported
โ†’
๐Ÿ“ˆ
Visualized
โ†’
๐Ÿ”
Analyzed
๐Ÿ”‘

Key MDDM Concepts

1

Surrogate Key

It is an additional primary key of dimension table which is populated using ETL (extract, transform, load) process.

Reasons:
Reason 1:

Primary keys are usually alphanumeric and are designed with the intent of providing additional information. For analysis we need to form multiple joins, and joins on these alphanumeric keys consume a lot of memory, which is why numeric surrogate columns are used.

Reason 2:

Capture changes in master data. For example: petrol prices change frequently. In a transactional processing system, updating the record is easy. But for time series analysis in a data warehouse, we need historical data. A surrogate key allows us to have multiple rows of the same product with different effective dates to preserve history.
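A minimal sketch of Reason 2: the same natural key (product_id) kept in two rows with different numeric surrogate keys, so facts recorded before the price change still join to the old price. All table contents are illustrative assumptions:

```python
# Product dimension: two versions of the same product, one per price.
dim_product = [
    {"sk": 1, "product_id": "P-101", "price": 96.0, "effective_from": "2024-01-01"},
    {"sk": 2, "product_id": "P-101", "price": 98.5, "effective_from": "2024-03-01"},
]

# Fact rows reference the surrogate key, not the natural key.
fact_sales = [{"product_sk": 1, "qty": 10}, {"product_sk": 2, "qty": 4}]

lookup = {row["sk"]: row for row in dim_product}
for f in fact_sales:
    # Each fact resolves to the price that was in effect when it was loaded.
    print(f["qty"], "units @", lookup[f["product_sk"]]["price"])
```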

2

Dimension

Descriptive information used to group transactional data. This provides context to analyze transactional information. It provides the "who, what, where, when, why, and how" context surrounding a business process event. It may have attributes/columns.

Examples:
Customer Product Date Time Location
3

Fact

Transactional data which is being analyzed by business user.

Examples:
Sales Qty Sold Profit Loss
4

Hierarchy

A set of related dimensional attributes (columns).

Example:
Level 1
Country Name
โ†’
Level 2
State Name
โ†’
Level 3
City

Panipat is a city in the state of Haryana in India.

๐Ÿ“Š

A hierarchy facilitates different views of summarized data.

For example, viewing sales by country (less detail) vs. viewing sales by city (more detail).

Drill down: in-depth reporting by moving from higher to lower levels of a hierarchy.

5

Grain

The level of a hierarchy at which data is collected or stored in the fact table.

Example:

If you store data in the fact table for each city on daily basis, then city and day are grains.

It is the most detailed attribute from the dimension tables that is carried into the fact table.

6

Schema

An arrangement of fact and dimension tables is called a schema. You can use an ER diagram, a logical data model, or a physical data model to represent details of the schema.

Example:

A fact table Sales is connected to the dimensions Customer, Date, Product, and Geography (star schema).
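A tiny in-memory sqlite3 sketch of such a star schema (table names, columns, and figures are assumptions for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (product_sk INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date    (date_sk INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_sales  (product_sk INTEGER, date_sk INTEGER, amount REAL);
INSERT INTO dim_product VALUES (1, 'Soap'), (2, 'Oil');
INSERT INTO dim_date VALUES (1, 2024), (2, 2025);
INSERT INTO fact_sales VALUES (1, 1, 120.0), (2, 1, 80.0), (1, 2, 150.0);
""")

rows = con.execute("""
    SELECT d.year, SUM(f.amount)          -- one join per dimension
    FROM fact_sales f
    JOIN dim_date d ON d.date_sk = f.date_sk
    GROUP BY d.year
    ORDER BY d.year
""").fetchall()
print(rows)  # [(2024, 200.0), (2025, 150.0)]
```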

โˆ’

Video 5: Multi-dimensional Data Modeling (MDDM) - Part 2

This session continues with MDDM, focusing on various types of dimensions, detailed explanations of different schema types, and categories of facts.

Multi-dimensional Data Modeling (MDDM)
๐Ÿ“

Types of Dimensions

1

Slowly Changing Dimension (SCD)

Dimension whose attribute value changes infrequently.

Reason: error in data, e.g. spelling mistakes; change in name, address, price.
Type-1
Overwrite History

Old data is replaced with new data, hence history will not be preserved.

Adv.
Easy to implement because it is a simple update.
Disadv.
Old information deleted permanently.
Type-2
Preserves History

New record is inserted to add new information. Both old and new record coexist in the table to preserve history. The new record with new info is assigned a new primary key.

Adv.
Allows to keep all history.
Disadv.
Size of table increases rapidly with each update, frequent update and insert causes performance and maintenance problems.
Type-3
Partially Preserves History

Change the dimension structure, add a new column or modify existing columns.

Adv.
(i) Doesn't create storage problem, old data gets updated. (ii) Allows to preserve limited amount of historical data.
Disadv.
Stores limited historical data, not useful in time series analysis.
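The Type-1 vs. Type-2 behaviour above can be sketched in a few lines; the field names and the example change of city are illustrative assumptions:

```python
import copy

dim_customer = [{"sk": 1, "cust_id": "C-9", "city": "Delhi", "current": True}]

def scd_type1(rows, cust_id, new_city):
    # Type-1: overwrite in place -- history is lost.
    for r in rows:
        if r["cust_id"] == cust_id:
            r["city"] = new_city
    return rows

def scd_type2(rows, cust_id, new_city):
    # Type-2: expire the current row, insert a new row with a new surrogate key.
    for r in rows:
        if r["cust_id"] == cust_id and r["current"]:
            r["current"] = False
    rows.append({"sk": max(r["sk"] for r in rows) + 1,
                 "cust_id": cust_id, "city": new_city, "current": True})
    return rows

history = scd_type2(copy.deepcopy(dim_customer), "C-9", "Mumbai")
print(history)  # old Delhi row kept (current=False) plus new Mumbai row
```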
2

Rapidly Changing Dimension

Dimension whose one or more attribute values change rapidly over a period of time.

Example:

Age of a person, test score of student, credit history or account status of customer.

3

Role Playing Dimension

A single dimension used in different contexts within an analysis; such dimensions are role-playing. For example, a Date dimension can play the roles of order date, ship date, and delivery date.

4

Degenerate Dimension

Dimension without any descriptive or contextual information. In real life, a degenerate dimension doesn't have its own separate existence. It is good to keep such data along with transaction data to avoid managing one extra table.

5

Junk Dimension

In real life, these are rarely used.

1

Dimension table that consists of columns with few unique values - typically status flags (yes/no), boolean values (true/false), or classifications (male/female).

2

Contains attributes that neither belong to the fact table nor dimension tables.

3

They can have values such as true/false or yes/no which have no correlation with each other; you can convert such values into a descriptive attribute.

4

Example: Sales fact table in a retail store can have fields like transaction id, coupon-indicator, and Customer-card-type (silver, gold, platinum). The above mentioned fields can be combined into a single junk dimension to be used in a warehouse.

๐Ÿ—๏ธ

Types of Schemas

A Schema defines the structure of a data warehouse, organizing fact and dimension tables to optimize analytical queries. Below are the primary types of schemas used in multi-dimensional data modeling.

1

Star Schema

The Star Schema is the simplest data warehouse schema. It features a central fact table surrounded by denormalized dimension tables, keeping all attributes related to a dimension in one table so queries need fewer joins.

Fact Table (Sales)
Dimension
(Time)
Dimension
(Product)
Dimension
(Customer)
Dimension
(Location)
โœ“
Advantages:

Simpler queries, faster aggregation, quicker initial setup.

โœ—
Disadvantages:

Data redundancy due to denormalization.

2

Snowflake Schema

The Snowflake Schema is an extension of the Star Schema where dimension tables are normalized into multiple related tables. This reduces data redundancy but increases the complexity of the conceptual design and requires extra effort for more joins.

Dimension (Time)
Normalized Dim (Year)
Fact Table (Sales)
Dimension (Product)
Normalized Dim (Category)
Normalized Dim (Brand)
Dimension (Location)
Normalized Dim (City)
โœ“
Advantages:

Reduced data redundancy, better data integrity, smaller disk space usage.

โœ—
Disadvantages:

More complex queries (more joins), slower query performance for some operations, harder to maintain.

3

Fact Constellation Schema (Galaxy Schema)

The Fact Constellation Schema involves multiple fact tables that share dimension tables. It is useful when there are complex business processes that require collection of measures and dimensions at different levels or when there is a need to avoid excessive normalization while handling large amounts of data.

Fact Table (Sales)
Fact Table (Inventory)
Dim (Product)
Dim (Time)
Dim (Warehouse)
Dim (Time)
Note: the Time dimension is shared between the Sales and Inventory fact tables.
โœ“
Advantages:

Flexibility to model multiple processes, can share dimension tables across fact tables.

โœ—
Disadvantages:

Very complex to design and manage, difficult to implement and query.

๐Ÿ“Š

Types of Facts

+
1

Additive Facts

Can be summarized across all dimensions (star, snowflake schema).

Example:
Quantity sold, Amount
โœ“ Full Aggregation
ยฑ
2

Semi-Additive Facts

Can be summarized across one or few dimensions but not across all. (Usually cannot be summed across time dimension).

Example:
Current balance, Unit cost
โš  Partial Aggregation
โœ•
3

Non-Additive Facts

Cannot be summarized across any dimension.

Example:

Ratios and percentages (e.g., profit margin), which cannot be meaningfully summed across any dimension.

โœ• No Aggregation

Quick Comparison

Type
Aggregation
Use Case
Additive
All dimensions
Sales, Revenue, Quantity
Semi-Additive
Some dimensions (not time)
Balance, Inventory level
Non-Additive
No aggregation
Ratios, Percentages
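The distinction can be verified with a short sketch: revenue and profit (additive) may be summed, while profit margin (a non-additive ratio) must be recomputed from the totals. The numbers are illustrative:

```python
sales = [{"revenue": 100.0, "profit": 50.0},   # 50% margin
         {"revenue": 300.0, "profit": 30.0}]   # 10% margin

# Additive facts: safe to sum across rows.
total_revenue = sum(r["revenue"] for r in sales)
total_profit = sum(r["profit"] for r in sales)

# Non-additive fact: summing per-row margins gives a meaningless 60%.
summed_margins = sum(r["profit"] / r["revenue"] for r in sales)
true_margin = total_profit / total_revenue   # the correct 20%
print(summed_margins, true_margin)
```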
โˆ’

Video 6: Unix/Shell Scripting - Basics

This session introduces the fundamentals of Unix and Shell Scripting, covering its features, importance in BI, file system, permissions, and basic commands.

Unix/Shell Scripting
โš™๏ธ

Features of Unix

1

Portability

2

Security

3

Background Processing

4

Piping

5

Interactivity

6

Scalability

7

Redirection

8

Stable & Reliable

9

High Availability Cluster

๐Ÿ’ผ

Unix in BI

Unix adds more functionality to BI by providing powerful tools for data collection, data warehousing, data integration, data manipulation, data loading (into star schemas), and reporting.

1

Unix scripts are for automated data loading into data warehouse.

2

Unix shell scripting is used for data cleaning and data manipulation.

3

Shell scripts play an important role in ETL process.

4

Unix scripts are used to prepare a new file in case of error.

๐Ÿ“

Unix File System

Describes the structure and organization of how data is stored on a disk.

1

Boot Block

Occupies beginning of the FS.

2

Super Block

Maintains state of the FS (its size, where to find inode list etc.).

3

Inode Block

Follows the super block; holds the internal representation of files (inodes).

4

Data Block

Contains data (size of blocks can vary from 512 bytes to 8 KB).

File System Structure Flow
Boot Block
โ†’
Super Block
โ†’
Inode Block
โ†’
Data Block

The root ('/') is the highest directory.

Key Features of Unix File System

1

Consistent placement of file data.

2

Create and delete files.

3

Dynamic growth of files.

4

Protection of file data.

5

It allocates read/write data in blocks.

6

Keeps track of files with inode numbers.

7

Info regarding owner, group etc.

๐Ÿ”

File Access Permissions

Permissions are set for 3 classes:

1

File owner

2

File owner's group

3

Others

Permissions:

Read (r)
Write (w)
Execute (x)

Two modes of setting permissions:

1
Absolute Mode (Numeric)

Uses numbers for 'chmod'

Read (r):
4
Write (w):
2
Execute (x):
1
Example:
chmod 755 file.txt
(rwx for owner, rx for group, rx for others)
Example:
chmod 777 file.txt
(rwx for owner, group, others)
2
Symbolic Mode

Uses characters to add (+), remove (-), or set (=) permissions

u=owner
g=group
o=others
a=all
Symbols: r=read, w=write, x=execute
Example:
chmod u+x file.sh
(add execute permission for owner)
Example:
chmod go-rwx file.txt
(remove rwx for group and others)
๐Ÿ’ก

Note: chmod is used for setting/changing file/directory permissions.
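The r=4, w=2, x=1 arithmetic behind absolute mode can be sketched as follows (a purely illustrative helper, not a real chmod):

```python
def octal_digit(read=False, write=False, execute=False):
    # One digit per class (owner, group, others): r=4, w=2, x=1.
    return 4 * read + 2 * write + 1 * execute

owner = octal_digit(read=True, write=True, execute=True)   # rwx -> 7
group = octal_digit(read=True, execute=True)               # r-x -> 5
others = octal_digit(read=True, execute=True)              # r-x -> 5
print(f"{owner}{group}{others}")  # 755, as in `chmod 755 file.txt`
```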

๐Ÿ—‚๏ธ

Path Types

1

Absolute Path

Path from the root ('/').

Example:
/home/user/report.txt
2

Relative Path

Path relative to current directory.

Example:
report.txt
๐Ÿ“
pwd (present working directory)
๐Ÿ’ป

Shell Features and Basic Commands

Command structure:

Command [Options] [Arguments]

Options modify the behaviour of a command.

The shell:

1

Is a command interpreter. It helps to run programs.

2

Variables are used to store output.

3

Accepts user input.

4

Executes commands.

Unix shell allows these kinds of commands:

1

An internal command.

2

An executable file that contains a series of shell commands.

3

An a.out executable file that contains object code and binary data.

Shell features:

1
cd new_dir (change directory)
2
cd /home/user/new_dir (absolute path)
3
cd - (moves to previous directory)
4
cd (with no argument, changes to the user's home directory)
5
cd .. (moves up one directory)
6
pwd
7
cal
8
mkdir
9
rmdir
10
touch
11
cp
12
mv
13
rm
14
clear
15
ls
16
date

Common Unix Commands

Command Description / Example
who am i Displays only the current user's own entry (user id, terminal, and login host, e.g. 10.123.198.255)
who Lists all users currently logged in.
clear Clears the screen.
date Displays date and time.
Example: date +%D (outputs date in MM/DD/YY)
Example: date +%T (outputs time in HH:MM:SS)
Example: date +%m (outputs the month as a number, 01-12)
Example: date +%y (outputs year in 2 digits)
cal Displays calendar.
Example: cal 2 2014 (February 2014)
Example: cal -3 (previous, current, next month)
Example: cal -y (shows year calendar)
ls Lists directory contents.
Example: ls -l (long listing format: permissions, owner, group, size, date)
Example: ls -lh (human-readable file sizes)
Example: ls -la (all files including hidden)
Example: ls -lt (sorted by modification time, newest first)
cat Concatenates and displays file content.
Assume: file1.txt contains "World", and file2.txt contains "Nation"
Example: cat file1.txt file2.txt
# Output: World and Nation, each on its own line

Example: cat file1.txt file2.txt > output.txt (merges the two files into output.txt)
Example: cat output.txt
# Output: World and Nation (the merged contents)
touch Creates empty files or updates timestamps.
Example: touch newfile
cp Copies files/directories.
Example: cp file1.txt file2.txt (copies file1.txt to file2.txt)
Example: cp -r dir_one dir_two (recursive copy of a directory)
mv Moves or renames files/directories.
Example: mv oldname.txt newname.txt
Example: mv file.txt parent/child/newdirectory
rm Removes files/directories.
Example: rm file.txt
Example: rm -r directory (recursive removal)
df Disk free: reports file system disk space usage.
du Disk usage: estimates file space usage.
โˆ’

Video 7: Unix/Shell Scripting - Advanced Commands & Text Processing

This session dives into advanced Unix commands, focusing on redirection, piping, utility commands, and a range of text processing tools.

Unix/Shell Scripting
๐Ÿ”€

Redirection and Piping

1
>

Standard output redirection (overwrite).

2
<

Standard input.

3
2>

Standard error.

4
|

Pipe (output of one command becomes input of another).

Example:
ls -l | wc -l

Counts the lines of output, i.e. roughly the number of files/directories in the current directory (ls -l's "total" header line is also counted)

๐Ÿ› ๏ธ

Utility Commands

1
ps Process status
1
ps (processes in current shell)
2
ps -e (all processes)
3
ps -f (detailed summary)
4
ps -a (processes of all users attached to a terminal)
2
nohup Runs commands in the background, immune to hangups
nohup sort content.txt &
ps -ef | grep sort
cat nohup.out
3
zip Compresses files
zip temp.zip temp.txt
4
unzip Decompresses files
unzip temp.zip
# Extracts temp.txt; the archive temp.zip remains.

tr (translate)

The tr command is used to translate or delete characters.

$ cat sample.txt | tr '[a-z]' '[A-Z]'

# Converts lower case letters to upper case

$ cat sample.txt | tr -d ' '

# Deletes all space characters

cut (extract characters/fields)

The cut command is used to extract characters or fields from each line of a file or stdin.

-c Extracts specified characters
-f Extracts specified fields
-d Specifies the delimiter (default is tab)
$ cut -c 1-5 sample.txt

# Extracts characters 1 to 5 from each line

$ cut -d ':' -f 1,3 /etc/passwd

# Extracts 1st and 3rd fields delimited by ':'

head and tail

The head command displays the first part of a file, and tail displays the last part.

$ head -5 sample.txt

# Displays the first 5 lines

$ tail -10 sample.txt

# Displays the last 10 lines

$ tail -f logfile.log

# Continuously monitors the file for new content (-f flag)

cmp (compare two files byte by byte)

The cmp command compares two files byte by byte and reports the first difference.

$ cmp file1.txt file2.txt

# Compares two files and shows the byte and line where they differ

comm (compare two sorted files line by line)

The comm command compares two sorted files line by line and produces three columns of output:

  • Column 1: Lines unique to file1
  • Column 2: Lines unique to file2
  • Column 3: Lines common to both files
$ comm file1.txt file2.txt

# Compares two sorted files

$ comm -12 file1.txt file2.txt

# Shows only lines common to both files

wc (word count)

The wc command counts lines, words, and characters in a file.

-l Counts the number of lines
-w Counts the number of words
-c Counts the number of characters
$ wc sample.txt

# Shows lines, words, and characters

$ wc -l sample.txt

# Shows only the line count

sort

The sort command sorts the lines of a text file.

$ sort sample.txt

# Sorts the file in ascending order

$ sort -r sample.txt

# Sorts in reverse (descending) order

$ sort -n numbers.txt

# Sorts numerically (useful for numeric data)

$ sort -u sample.txt

# Sorts and removes duplicate lines (unique)

grep (Global Regular Expression Print)

The grep command searches for patterns in files and prints matching lines.

-n Shows line numbers along with matching lines
-i Ignores case (case-insensitive search)
-v Inverts the match (shows non-matching lines)
$ grep "error" logfile.log

# Searches for the word "error" in the log file

$ grep -n "warning" logfile.log

# Shows line numbers with matching lines

$ grep -i "ERROR" logfile.log

# Case-insensitive search

$ grep -v "success" logfile.log

# Shows all lines that don't contain "success"

โˆ’

Video 8: Unix/Shell Scripting - AWK, SED & Scripting Constructs

This session concludes Unix/Shell Scripting with in-depth coverage of AWK and SED, along with essential shell scripting constructs like variables, arguments, loops, and conditional statements.

Unix/Shell Scripting
๐Ÿ“

awk

Pattern scanning and processing language.

Basic awk script structure
awk 'BEGIN {print "Initializing"} {print "Processing line:", $0} END {print "Finished"}' param.txt
# Example: if param.txt has two lines "line1" and "line2"
# Output:
# Initializing
# Processing line: line1
# Processing line: line2
# Finished
AWK to count total lines
awk 'BEGIN {total=0} {total++} END {print "Total lines:", total}' param.txt
# Example: if param.txt has 3 lines, output: Total lines: 3
AWK to print number of fields and the last field for each line
# Assume 'emp.dat' contains:
# John Doe 10000
# Jane Smith 15000
awk '{print NF, $NF}' emp.dat
# Output:
# 3 10000
# 3 15000
AWK to sum a specific column (e.g., salary in 'emp.dat')
awk '{sum += $3} END {print "Total Salary:", sum}' emp.dat
# Output: Total Salary: 25000

Features of awk:

1
Stream oriented

Processes data line by line as a continuous stream.

2
Regular expressions / conditional for pattern match

Supports powerful pattern matching using regex and conditions.

3
User specified actions / script

Allows custom actions and scripts for flexible data processing.

4
Reads / processes

Efficiently reads and processes text files and streams.

5
Complex string position

Handles complex string manipulations and positional operations.

โœ๏ธ

sed

Non-interactive stream editor. Supports regular expressions for matching and substitution. Used for filtering and transforming text.

Common sed commands:

1 's' Substitute
2 'a' Append
3 'i' Insert
4 'c' Change
5 'd' Delete
6 'p' Print
Assume 'text.txt' contains:
line 1: apple banana
line 2: cherry apple
line 3: banana grape
Replace first occurrence of 'apple' on each line
sed 's/apple/orange/' text.txt
Output:
# line 1: orange banana
# line 2: cherry orange
# line 3: banana grape
Replace all occurrences of 'apple' on each line
sed 's/apple/orange/g' text.txt
Output:
# line 1: orange banana
# line 2: cherry orange
# line 3: banana grape
Append "New data after line 1" after line 1
sed '1a\
New data after line 1' text.txt
Output:
# line 1: apple banana
# New data after line 1
# line 2: cherry apple
# line 3: banana grape
Insert "New data before line 1" before line 1
sed '1i\
New data before line 1' text.txt
Output:
# New data before line 1
# line 1: apple banana
# line 2: cherry apple
# line 3: banana grape
Delete line 1
sed '1d' text.txt
Output:
# line 2: cherry apple
# line 3: banana grape
Delete lines 1 to 2
sed '1,2d' text.txt
Output:
# line 3: banana grape
Print line 1 (suppress default output with -n)
sed -n '1p' text.txt
Output:
# line 1: apple banana
๐Ÿ’ป

Shell Scripting Constructs

Essential building blocks for shell script automation and control flow.

Variables:

#!/bin/bash
num1=10
num2=20
echo "sum of num1 and num2 is: $((num1 + num2))"
echo "difference: $((num1 - num2))"
echo "product: $((num1 * num2))"
echo "division: $((num1 / num2))"  # integer division: 10 / 20 gives 0
echo "modulo: $((num1 % num2))"
๐Ÿ’ก

Note: Escape sequences: '\n' for newline, '\t' for tab (with echo, use 'echo -e' so they are interpreted).

Command Line Arguments:

#!/bin/bash
echo "Name of the file: $0"
echo "Total number of arguments passed: $#"
echo "The arguments passed: $*"
echo "The 1st argument: $1"
echo "The 2nd argument: $2"
echo "The 3rd argument: $3"
echo "sum of args passed: $(( $1 + $2 + $3 ))"
# Example output when run as: ./script.sh 10 20 30
# Name of the file: ./script.sh
# Total number of arguments passed: 3
# The arguments passed: 10 20 30
# The 1st argument: 10
# The 2nd argument: 20
# The 3rd argument: 30
# Sum of args passed: 60

Loops:

1
For Loop:
#!/bin/bash
for i in 10 20 30; do
    echo "Output: $i"
done
# Prints:
# Output: 10
# Output: 20
# Output: 30
2
While Loop:
#!/bin/bash
a=10
while [ $a -gt 0 ]; do
    echo "Value of a: $a"
    a=$(( a - 1))
done
# Output: Countdown from 10 to 1
3
Until Loop:
#!/bin/bash
a=0
until [ $a -eq 10 ]; do
    echo "Value of a: $a"
    a=$(( a + 1))
done
# Output: Count up from 0 to 9

Conditional Statements ('if', 'case'):

1. 'if' statements - Operators:
1. Numeric Comparison:
'-eq' Equal to
'-ne' Not equal to
'-lt' Less than
'-le' Less than or equal to
'-gt' Greater than
'-ge' Greater than or equal to
2. String Comparison:
'=' Equal to
'!=' Not equal to
'-z' String is null (zero length)
'-n' String is not null (non-zero length)
3. File Test Operators:
'-f' File exists and is a regular file
'-d' File exists and is a directory
'-s' File exists and has size > 0 (not empty)
'-r' File exists and is readable
'-w' File exists and is writable
'-x' File exists and is executable
4. Logical Operators:
'&&' AND (both commands execute if successful)
'||' OR (second command executes if first is unsuccessful)
#!/bin/bash
read -p "Username: " user_var
read -sp "Password: " pass_var  # -s: silent (no echo), -p: prompt

if [ "$user_var" = "YourUsername" ] && [ "$pass_var" = "YourPassword" ]; then
    echo "Welcome $user_var"
else
    echo "Invalid Username or Password"
fi
#!/bin/bash
read -p "File or Directory Name: " fname

if [ -f "$fname" ]; then
    echo "$fname is a regular file."
elif [ -d "$fname" ]; then
    echo "$fname is a directory."
else
    echo "$fname does not exist or is not a regular file/directory."
fi
2. 'case' statement:
#!/bin/bash
echo "Please enter a fruit:"
read FRUIT_INPUT
case $FRUIT_INPUT in
    "apple") echo "You like apples.";;
    "banana") echo "You like bananas.";;
    *) echo "Unknown fruit.";;
esac

Control Commands:

break

Exit loop, 'break n' (exit nth loop).

continue

Jump to start of loop, 'continue n' (continue nth loop).

exit

Terminate program.

Functions in Shell Scripting:

1
Simple Function
# a simple function
my_function() {
    echo "Hello from my function!"
    # Called like a command; arguments given after its name are available as $1, $2, ...
}
2
Function with Return Value
# Function with return value
add_numbers() {
    local num1=$(( $1 + 2 )) # local makes var a local variable
    # Return from a function
    return $num1
}

# Calling Functions
my_function arg1 arg2   # Arguments are available as $1 and $2 inside the function
add_numbers 5           # Returns 7 (5 + 2); read the return status with $?

String Functions (common Bash string operations):

1
${#string} or $(expr length "$string")

Returns length of string.

2
$(echo "$string" | sed 's/find/replace/')

Substitute the first match of a pattern on each line.

3
$(echo "$string" | sed 's/find/replace/g')

Global substitute (all matches).

4
${string:start_pos:length}

Extract substring (e.g., ${string:0:3} gives the first three characters).

5
${string/substring/}

Remove the first occurrence of a substring (use ${string//substring/} for all occurrences).

6
$(echo "$string" | tr 'a-z' 'A-Z') or ${string^^}

Convert to upper case.

7
$(echo "$string" | tr 'A-Z' 'a-z') or ${string,,}

Convert to lower case.

โˆ’

Video 9: Advanced SQL Concepts - Part 1

This session begins our deep dive into Advanced SQL Concepts, covering SSMS, database types, sequences, data types, and the powerful ROLLUP and CUBE operators.

Advanced SQL Concepts
๐Ÿ—„๏ธ

SQL Server Management Studio (SSMS)

Most user-friendly interface to perform common database tasks. A database in SQL Server is made up of a collection of tables containing rows and columns and stores structured data.

1

System Databases:

Built-in databases.

1
'master'

Contains system-level info, logins, config. SQL Server cannot start without it.

2
'model'

Template for creating new user-defined databases.

3
'resource'

Read-only database, contains all system objects. Logical appearance in 'sys' schema.

4
'msdb'

Used for scheduling jobs, auto-backup of database, sending alerts.

5
'tempdb'

Temporary workspace for user and internal objects. Content recreated on restart.

2

User-defined Databases:

Created by users to store data.

A sequence is a user-defined schema-bound object that generates a sequence of numeric values according to the specification with which it was created. It can be used to generate primary key values automatically, similar to identity columns, but offers more flexibility as a single sequence can be used by multiple tables, and values can be retrieved before insertion into a table.

-- Creating a sequence named Emp_Seq that starts at 101 and increments by 1
CREATE SEQUENCE Emp_Seq
    START WITH 101
    INCREMENT BY 1;

-- Using the sequence to insert a value into an Employees table
-- This assumes EmployeeID is not an IDENTITY column, but takes values from a sequence
INSERT INTO Employees (EmployeeID, Name)
VALUES (NEXT VALUE FOR Emp_Seq, 'John Doe');
๐Ÿ’ก

Note: There should be only one Identity column on a table, and its value will be generated automatically.

1
'numeric(p, s)'

Exact numeric data. 'p' (precision) total digits, 's' (scale) digits after decimal. Example: numeric(5,2) for 123.45.

2
'decimal(p, s)'

Similar to 'numeric'.

3
'money'

Monetary data type.

4
'date'

Stores date only (YYYY-MM-DD).

5
'time'

Stores time only (HH:MM:SS).

6
'datetime'

Stores both date and time.

Extensions to 'GROUP BY' clause for subtotals and grand totals.

1
'ROLLUP'

Generates subtotals for hierarchical grouping and a grand total. Order of columns matters.

Consider the following 'Sales' data:

Sales Table Sample Data:
-- Sales Table Sample Data:
-- Country | State | Amount
-- --------|-------|--------
-- INDIA   | AP    | 10
-- INDIA   | KA    | 20
-- USA     | NY    | 30
-- USA     | CA    | 40
ROLLUP Example:
SQL Query:
SELECT Country, State, SUM(Amount) AS TotalSales
FROM Sales
GROUP BY ROLLUP(Country, State);
Example Output for ROLLUP(Country, State):
-- Country | State | TotalSales
-- --------|-------|------------
-- INDIA   | AP    | 10
-- INDIA   | KA    | 20
-- INDIA   | NULL  | 30  -- Subtotal for INDIA (sum of AP and KA)
-- USA     | NY    | 30
-- USA     | CA    | 40
-- USA     | NULL  | 70  -- Subtotal for USA (sum of NY and CA)
-- NULL    | NULL  | 100 -- Grand Total (sum of all amounts)
2
'CUBE'

Produces subtotals for all possible combinations of grouping columns and a grand total. Order of columns does not typically matter.

Using the same 'Sales' data as above:

CUBE Example:
SQL Query:
SELECT Country, State, SUM(Amount) AS TotalSales
FROM Sales
GROUP BY CUBE(Country, State);
Example Output for CUBE(Country, State):
-- Country | State | TotalSales
-- --------|-------|------------
-- INDIA   | AP    | 10
-- INDIA   | KA    | 20
-- INDIA   | NULL  | 30  -- Subtotal for INDIA (Country only)
-- USA     | CA    | 40
-- USA     | NY    | 30
-- USA     | NULL  | 70  -- Subtotal for USA (Country only)
-- NULL    | AP    | 10  -- Subtotal for State AP (across all countries)
-- NULL    | KA    | 20  -- Subtotal for State KA (across all countries)
-- NULL    | NY    | 30  -- Subtotal for State NY (across all countries)
-- NULL    | CA    | 40  -- Subtotal for State CA (across all countries)
-- NULL    | NULL  | 100 -- Grand Total
โฌ†๏ธ

'TOP' Clause

Returns a specified number or percentage of rows. WITH TIES includes rows that tie for the last position.

Syntax:

SELECT TOP (n | n PERCENT) [WITH TIES] Column(s) FROM Table [ORDER BY Column]

Examples:
Example 1: SELECT TOP 10 (Returns first 10 rows)
SELECT TOP 10 * FROM products 
ORDER BY Price DESC;
Example 2: SELECT TOP PERCENT (Returns top 5%)
SELECT TOP 5 PERCENT * FROM employees 
ORDER BY Salary DESC;
Example 3: WITH TIES (Includes rows that tie for last position)
SELECT TOP 3 WITH TIES EmployeeName, Salary 
FROM Employees 
ORDER BY Salary DESC;
๐Ÿ“

'INTO' Clause

Creates a new table in the default filegroup and then inserts the query results into it. It checks constraints on the new table. 'NOT NULL' constraints are imported.

Key Characteristics:
1
Creates New Table

Automatically creates a table based on the column structure of the SELECT query.

2
Inserts Data

Populates the newly created table with data from the query results.

3
Checks Constraints

Enforces constraints on the new table including NOT NULL definitions.

4
Default Filegroup

Creates the table in the default filegroup unless otherwise specified.

Examples:
Example 1: Create backup table from existing table
SELECT * INTO EmployeeBackup 
FROM Employees;
Example 2: Create table with filtered data
SELECT EmployeeName, Salary INTO SalaryReport 
FROM Employees 
WHERE Salary > 50000;
Example 3: Create table from joined tables
SELECT E.EmployeeName, E.Salary, D.DepartmentName 
INTO EmployeeDepartmentReport 
FROM Employees E 
JOIN Departments D ON E.DeptID = D.DeptID;
โš ๏ธ
Important:

SELECT INTO cannot be used if the target table already exists. To insert data into an existing table, use INSERT INTO SELECT instead.

โˆ’

Video 10: Advanced SQL Concepts - Part 2

This session continues with Advanced SQL, covering NULL handling, temporary tables, identity columns, the differences between TRUNCATE and DELETE, and various ranking functions.

Advanced SQL Concepts
โš™๏ธ

Session Settings Controlling NULL Behavior

Critical SQL Server settings that control how NULL values are handled in comparisons and string concatenations.

1

ANSI_NULLS

When ON (recommended):
  • Any comparison with NULL (e.g., 'col' = NULL) results in UNKNOWN
  • NULL = NULL is UNKNOWN. Use IS NULL or IS NOT NULL to check for NULLs.
When OFF (discouraged):
  • NULL = NULL evaluates to TRUE
2

CONCAT_NULL_YIELDS_NULL

When ON (recommended):
  • Concatenating a string with NULL results in NULL
When OFF (discouraged):
  • Concatenating a string with NULL results in the non-NULL string (NULL is treated as empty string)
๐Ÿ“ฆ

Temporary Tables

Tables created for temporary storage of intermediate results.

Types of Temporary Tables:

1
Local Temporary Tables ( # prefix)

Visible to the current session. Dropped when session ends.

CREATE TABLE #localTemp (ID INT, Name VARCHAR(50));
2
Global Temporary Tables ( ## prefix)

Visible to all sessions. Dropped when all sessions referencing them disconnect.

CREATE TABLE ##globalTemp (OrderID INT, Amount DECIMAL(10,2));
๐Ÿ†”

'IDENTITY' Column

Automatically generates numeric values for new rows (usually for primary keys). IDENTITY(seed, increment)

The IDENTITY property automatically generates unique numeric values for rows as they are inserted into a table.

Example - IDENTITY Column:
CREATE TABLE with IDENTITY:
CREATE TABLE employees (
  empId INT IDENTITY(1,1) PRIMARY KEY,  -- Starts at 1, increments by 1
  empname VARCHAR(25) NOT NULL
);

INSERT INTO employees (empname) VALUES ('Mark');  -- ID auto-generated
๐Ÿ“Œ
Note:

Identity_insert allows explicit insertion into an identity column when set ON.

๐Ÿ—‘๏ธ

'TRUNCATE' vs. 'DELETE'

Both remove rows, but differ in operation, logging, and performance.

Feature | TRUNCATE TABLE | DELETE FROM
Operation | DDL (Data Definition Language); deallocates data pages. | DML (Data Manipulation Language); removes rows one by one.
Transaction Log | Minimal logging (page deallocation); faster for large tables. | Logs each deleted row; slower for large tables.
Rollback | Can be rolled back within an explicit transaction, but complex due to page deallocation. | Fully transactional; can always be rolled back.
Identity Column | Resets to the seed value. | Does NOT reset; continues from the last value.
WHERE Clause | Cannot be used; removes all rows. | Can be used to remove specific rows.
Triggers | Does not fire DELETE triggers. | Fires DELETE triggers.
Table Locks | Acquires an exclusive table lock. | Acquires row/page/table locks.
Performance | Faster on large tables. | Slower on large tables.
๐Ÿ†

Ranking Functions

Assigns a rank to each row within an ordered partition of a result set.

Window functions that assign ranks based on row ordering within partitions:

1
ROW_NUMBER()

Assigns a unique, sequential integer. Ties get different numbers arbitrarily.

SELECT ROW_NUMBER() OVER (PARTITION BY Dept ORDER BY Salary DESC) AS RankNum FROM Employees;
2
RANK()

Assigns same rank to ties, then skips subsequent rank numbers.

SELECT RANK() OVER (PARTITION BY Dept ORDER BY Salary DESC) AS RankNum FROM Employees;
3
DENSE_RANK()

Assigns same rank to ties, does NOT skip subsequent rank numbers.

SELECT DENSE_RANK() OVER (PARTITION BY Dept ORDER BY Salary DESC) AS RankNum FROM Employees;
4
NTILE(n)

Divides rows into 'n' groups. Useful for quartiles, deciles, percentiles.

SELECT NTILE(4) OVER (ORDER BY Salary DESC) AS Quartile FROM Employees;
โˆ’

Video 11: Advanced SQL Concepts - Part 3

This final SQL session covers conversion, date/time functions, error handling with TRY...CATCH, and the powerful concept of stored procedures, concluding with a review of practical SQL snippets.

Advanced SQL Concepts
๐Ÿ”„

Conversion Functions ('CAST', 'CONVERT')

Convert data from one type to another. CAST is ANSI standard, CONVERT is SQL Server specific with formatting options.

1
CAST(expression AS data_type)

ANSI standard. Converts a value to a specified data type.

SELECT CAST(100 AS DECIMAL(5,2));  -- Converts Integer 100 to decimal 100.00
SELECT CAST('2023-12-31' AS DATE);  -- Converts string to date type
2
CONVERT(data_type, expression, style)

SQL Server specific, with optional style for formatting.

SELECT CONVERT(VARCHAR(10), GETDATE(), 101);  -- Style 101 formats the date as mm/dd/yyyy
SELECT CONVERT(INT, 123.45);  -- Converts decimal to integer
๐Ÿ“…

Date and Time Functions

Functions for retrieving, manipulating, and calculating date and time values.

1
GETDATE()

Returns current system date and time.

2
DATEPART(interval, date)

Returns integer part of date. Interval can be YEAR, QUARTER, MONTH, WEEK, DAY, HOUR, MINUTE, SECOND, etc.

SELECT DATEPART(YEAR, GETDATE());      -- Year
SELECT DATEPART(QUARTER, GETDATE());   -- Quarter
SELECT DATEPART(MONTH, GETDATE());     -- Month (1-12)
SELECT DATEPART(DAY, GETDATE());       -- Day of month
SELECT DATEPART(DAYOFYEAR, GETDATE()); -- Day of year
SELECT DATEPART(WEEK, GETDATE());      -- Week (1-53)
SELECT DATEPART(HOUR, GETDATE());      -- Hour
SELECT DATEPART(MINUTE, GETDATE());    -- Minute
SELECT DATEPART(WEEKDAY, GETDATE());   -- Weekday (1-7)
3
DATEADD(interval, number, date)

Adds/subtracts interval. Useful for calculating future or past dates.

SELECT DATEADD(MONTH, 3, GETDATE());  -- Adds 3 months to current date
SELECT DATEADD(DAY, -7, GETDATE());    -- Subtracts 7 days from current date
4
DATEDIFF(interval, startDate, endDate)

Calculates difference between two dates in specified interval.

SELECT DATEDIFF(DAY, '2023-01-01', GETDATE());  -- Days between Jan 1, 2023 and today
๐Ÿ›ก๏ธ

Error Handling ('TRY...CATCH')

Robust error handling in T-SQL. Catch exceptions and handle them gracefully.

TRY...CATCH blocks allow you to handle errors gracefully without stopping execution:

TRY...CATCH Structure & Example:
BEGIN TRY
  -- TRY block
  SELECT 1 / 0;  -- Example: division by zero
  
  SELECT 'Success!' AS Message;
END TRY

BEGIN CATCH
  -- Error handling logic
  SELECT
    ERROR_NUMBER() AS ErrorNumber,
    ERROR_SEVERITY() AS ErrorSeverity,
    ERROR_STATE() AS ErrorState,
    ERROR_PROCEDURE() AS ErrorProcedure,
    ERROR_MESSAGE() AS ErrorMessage;
END CATCH;
Error Information Functions (Available in CATCH block):
1
ERROR_NUMBER()

Returns the error number.

2
ERROR_SEVERITY()

Returns the error severity level.

3
ERROR_STATE()

Returns the error state.

4
ERROR_PROCEDURE()

Returns name of the procedure where error occurred.

5
ERROR_MESSAGE()

Returns the error message text.

6
ERROR_LINE()

Returns the line number where error occurred.

๐Ÿ“ฆ

Stored Procedures

Precompiled collections of SQL statements stored as named objects. Enable reusability and improved performance.

Advantages:
1

Improved performance.

2

Reduced network congestion.

3

Code reusability.

4

Maintainability and Reliability.

Types:
1
System Defined Stored Procedures

Built-in procedures for administrative tasks (e.g., 'sp_who', 'sp_helpdb').

2
User Defined Stored Procedures

Created by users for specific business logic.

Syntax & Examples:
Example 1: Create Simple Stored Procedure
CREATE PROCEDURE WelcomeProcedure
AS
  BEGIN
    PRINT 'Welcome to stored procedure execution!';
  END;
Example 2: Procedure with Parameters
CREATE PROCEDURE GetCustomerDetails @CustomerId INT , @CustomerName VARCHAR(50) OUTPUT
AS
  BEGIN
    SELECT @CustomerName = CustomerName 
    FROM Customers 
    WHERE CustomerID = @CustomerId;
  END;
  
  -- Execute the procedure
  DECLARE @Name VARCHAR(50);
  EXEC GetCustomerDetails @CustomerId = 1, @CustomerName = @Name OUTPUT;
  PRINT @Name;
Example 3: Procedure that Executes Query
CREATE PROCEDURE GetAllCustomers
AS
  BEGIN
    SELECT * FROM Customers;
  END;
  
  -- Execute the procedure
  EXEC GetAllCustomers;
Important Operations:
1

WITH ENCRYPTION: Encrypts procedure definition.

2

DROP PROCEDURE: Deletes a procedure.

๐Ÿ’ก

Example SQL Snippets

Comprehensive collection of practical SQL queries covering key concepts from all sessions.

1
SELECT TOP 10 products by price
SELECT TOP 10 ProductName, Price FROM products ORDER BY Price DESC;
2
SELECT TOP 3 employees by salary, including ties
SELECT TOP 3 WITH TIES EmployeeName, Salary FROM Employees ORDER BY Salary DESC;
3
Get the current year using DATEPART
SELECT DATEPART(YEAR, GETDATE());
4
Add 3 months to the current date
SELECT DATEADD(MONTH, 3, GETDATE());
5
Calculate days between two dates
SELECT DATEDIFF(DAY, '2023-01-01', GETDATE());
6
Create a table with an IDENTITY column for auto-incrementing ID
CREATE TABLE Emp (
  EmpID INT IDENTITY(1,1) PRIMARY KEY,
  EmpName VARCHAR(50)
);
7
Insert a row into a table with an IDENTITY column
INSERT INTO Emp (EmpName) VALUES ('John');  -- EmpID will be auto-generated
8
TRUNCATE a table (removes all rows, resets identity, faster than DELETE for all rows)
TRUNCATE TABLE Emp;
9
DELETE specific rows from a table
DELETE FROM Emp WHERE EmpID = 1;
10
TRY...CATCH block to handle potential errors
BEGIN TRY
  SELECT 1 / 0;  -- This will cause a division by zero error
END TRY
BEGIN CATCH
  -- Selects various error details when an error occurs
  SELECT ERROR_NUMBER(), ERROR_SEVERITY(), ERROR_STATE(), ERROR_PROCEDURE(), ERROR_LINE(), ERROR_MESSAGE();
END CATCH;
11
Stored Procedure with IF conditions for input validation
CREATE PROCEDURE CheckUserStatus
  @gender CHAR(1),
  @maritalStatus VARCHAR(10)
AS
BEGIN
  IF NOT (@gender IN ('M', 'F'))
  BEGIN
    RETURN -1;  -- Invalid gender
  END
  IF NOT (@maritalStatus IN ('Married', 'Single'))
  BEGIN
    RETURN -2;  -- Invalid marital status
  END
  -- Further logic here if inputs are valid
END;
12
Using a sequence for insertion into a table
-- Assumes Emp_Seq sequence is created and Employee_Bangalore table exists.
INSERT INTO Employee_Bangalore VALUES (NEXT VALUE FOR Emp_Seq, 'John');
13
Using ROLLUP for hierarchical aggregation
-- Sample Sales table as shown in ROLLUP section above.
SELECT Country, [State], SUM(Amount) AS TotalSales FROM Sales GROUP BY ROLLUP(Country, [State]);
14
Using CUBE for all possible aggregation combinations
-- Sample Sales table as shown in CUBE section above.
SELECT Country, [State], SUM(Amount) AS TotalSales FROM Sales GROUP BY CUBE(Country, [State]);
15
Using COALESCE (similar to NVL in Oracle)
-- Returns the first non-NULL expression among its arguments
SELECT COALESCE(NULL, 'Default Value');  -- Output: 'Default Value'
SELECT COALESCE(10, NULL, 20);           -- Output: 10
16
ROUND and FLOOR functions
SELECT ROUND(49.99, 1);  -- Rounds 49.99 to one decimal place, Output: 50.0
SELECT FLOOR(49.99);      -- Returns the largest integer less than or equal to 49.99, Output: 49
17
Illustrative String/Character functions (some equivalent to standard T-SQL)
-- SUBSTR is typically SUBSTRING in T-SQL
SELECT SUBSTRING('Example String', 1, 7);  -- Standard T-SQL equivalent
-- Returns 'Example' (illustrative, use SUBSTRING in T-SQL)

-- CAP, SHORT, LONG are likely conceptual functions for demonstration, not standard SQL
-- SELECT CAP('example');           -- Convert to uppercase first letter (illustrative)
-- SELECT SHORT('VeryLongString');  -- Shorter string (illustrative)
-- SELECT LONG('Short');            -- Extend string (illustrative)

-- CHR is common in SQL dialects
SELECT CHAR(65);  -- Returns 'A' (SQL Server; Oracle uses CHR(65))

-- REVERSE is common in SQL dialects
SELECT REVERSE('Text');  -- Returns 'txeT' (SQL Server)
โˆ’

Video 12: Probability & Counting Techniques - Part 1

This session introduces fundamental probability concepts, including experiments, sample spaces, events, and basic probability rules, along with the product and sum rules for counting.

Probability & Counting Techniques
๐ŸŽฒ

Basic Probability Concepts

Foundational concepts for understanding probability theory and uncertain outcomes.

1
Experiment

An activity whose outcome is uncertain.

Example: Rolling a die, drawing a card, value of stock.

2
Sample Space (S)

Set of all possible outcomes.

Example: For a die roll, S = {1, 2, 3, 4, 5, 6}.

3
Event (E)

A subset of the sample space.

Example: Getting an even number on a die roll E = {2, 4, 6}.

Probability of an Event P(E):

P(E) = Number of favorable outcomes / Total number of possible outcomes

Example:

In a bag, there are 3 Red balls and 2 Blue balls. Total = 5 balls. The probability of drawing a Red ball is P(Red Ball) = 3/5 = 0.6.

Types of Events:
1
Independent Events

Occurrence of one does not affect the other.

Example: Tossing a coin twice. The outcome of the first toss does not affect the second.

2
Mutually Exclusive Events

Cannot occur at the same time. P(A โˆฉ B) = 0.

Example: When rolling a single die, the event of rolling an even number {2, 4, 6} and the event of rolling an odd number {1, 3, 5} are mutually exclusive.

Basic Probability Rules:
1

0 โ‰ค P(E) โ‰ค 1

2

P(not A) = 1 - P(A) (Complement Rule)

3

If A and B are mutually exclusive: P(A โˆช B) = P(A) + P(B) (Addition Rule for Mutually Exclusive Events)

4

General Addition Rule: P(A โˆช B) = P(A) + P(B) - P(A โˆฉ B)

Note: A โˆฉ B: Event A AND Event B both occurred.

A โˆช B: Event A OR Event B occurred (or both).

๐Ÿ”ข

Counting Techniques

Fundamental rules for counting the number of ways to accomplish tasks or arrange objects.

1
Product Rule

If there are m ways to do one task and n ways to do a second task, then there are m × n ways to do both.

Example (Movement Ways):

If there are 5 ways to go from A to B, 6 ways from B to C, 5 ways from C to D, and 4 ways from D to E, the total number of ways to travel from A to E is:

N(Total Ways) = 5 ร— 6 ร— 5 ร— 4 = 600

2
Sum Rule

If a task can be done in m ways OR in n ways (mutually exclusive alternatives), then there are m + n ways in total.

Example (Meal Choices):

You can choose a main course from 3 options (Chicken, Fish, Veg) OR a dessert from 2 options (Cake, Ice Cream). If you only pick one item total, then there are 3 + 2 = 5 total choices.
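Both counting rules are easy to sanity-check in code. A minimal Python sketch using the route and meal numbers from the examples above:

```python
# Product rule: multiply the counts for sequential, independent choices.
# Routes A->B, B->C, C->D, D->E from the travel example above.
ways = [5, 6, 5, 4]
total_routes = 1
for w in ways:
    total_routes *= w
print(total_routes)  # 600

# Sum rule: add the counts for mutually exclusive alternatives.
main_courses = 3
desserts = 2
print(main_courses + desserts)  # 5
```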

โˆ’

Video 13: Probability & Counting Techniques - Part 2

This final session on Probability & Counting Techniques covers permutations, combinations, measures of central tendency, dispersion, and shape, concluding with practical quiz questions.

Probability & Counting Techniques

Permutations & Combinations

Understanding arrangements and selections in probability

1
Permutations

Arrangements where order matters.

Formula:

P(n, r) = n! / (n-r)!

Example (Arranging 2 of 3 letters):

Arrange 2 letters chosen from 3 distinct letters (A, B, C). Here n=3, r=2.

P(3, 2) = 3! / (3-2)! = 3! / 1! = 6 รท 1 = 6

Possible arrangements: AB, AC, BA, BC, CA, CB

2
Combinations

Selections where order does not matter.

Formula:

C(n, r) = n! / (r!(n-r)!)

Example (Cricket Team Selection):

Select 9 players from a squad of 20, where 2 specific players are always selected and 3 are never selected.

1. Total players in squad: 20

2. Players always selected: 2

3. Players never selected: 3

4. Remaining pool to choose from: 20 - 2 - 3 = 15 players

5. Players still needed for the team: 9 - 2 = 7 players

C(15, 7) = 15! / (7!(15-7)!) = 15! / (7! ร— 8!) = (15 ร— 14 ร— 13 ร— 12 ร— 11 ร— 10 ร— 9) / (7 ร— 6 ร— 5 ร— 4 ร— 3 ร— 2 ร— 1) = 6435

Answer: There are 6435 ways to select the team.
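Python's math module ships perm and comb, which can confirm both worked examples (the letter arrangement and the cricket selection). A minimal sketch:

```python
import math

# Permutations: order matters. P(n, r) = n! / (n - r)!
perms = math.perm(3, 2)   # 2-letter arrangements from {A, B, C}
print(perms)              # 6

# Combinations: order does not matter. C(n, r) = n! / (r! (n - r)!)
# Cricket example: 2 players always in, 3 always out -> choose 7 of 15
combs = math.comb(15, 7)
print(combs)              # 6435
```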

Central Tendency Measures

Understanding data location and typical values

1
Mean (Arithmetic Mean)

Average value. Sensitive to outliers.

Formulas:

Sample Mean (xฬ„): ฮฃx / n

Population Mean (ฮผ): ฮฃx / N

2
Median

Middle value when data is ordered. Less affected by extreme values.

Example: For dataset {1, 3, 5, 7, 9}, Median is 5

Example: For dataset {1, 2, 3, 4}, Median is (2+3)/2 = 2.5

3
Mode

Most frequent value. A dataset can have multiple modes.

Example: For {1, 2, 2, 3, 4}, Mode is 2

Example: For {1, 1, 2, 2, 3}, Modes are 1 and 2 (bimodal)
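The standard library's statistics module computes all three central-tendency measures directly; a short illustrative sketch:

```python
import statistics

data = [1, 3, 5, 7, 9]
print(statistics.mean(data))    # 5 (sum 25 / count 5)
print(statistics.median(data))  # 5 (middle value of the ordered data)

# With an even number of values, the median averages the two middle ones
print(statistics.median([1, 2, 3, 4]))        # 2.5

# mode returns the single most frequent value; multimode returns all of them
print(statistics.mode([1, 2, 2, 3, 4]))       # 2
print(statistics.multimode([1, 1, 2, 2, 3]))  # [1, 2] (bimodal)
```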

Measures of Dispersion

Understanding data spread and variation

1
Range

Difference between maximum and minimum values.

Formula:

Range = Max Value - Min Value

Example: For dataset {10, 20, 5, 25, 15}, Range = 25 - 5 = 20

2
Variance

Average of squared differences from the mean.

Formulas:

Population Variance (ฯƒยฒ): ฮฃ(x - ฮผ)ยฒ / N

Sample Variance (sยฒ): ฮฃ(x - xฬ„)ยฒ / (n-1)

3
Standard Deviation

Square root of variance. Same units as data.

Formulas:

Population SD (ฯƒ): โˆš(ฮฃ(x - ฮผ)ยฒ / N)

Sample SD (s): โˆš(ฮฃ(x - xฬ„)ยฒ / (n-1))

4
Interquartile Range (IQR)

Spread of middle 50% data. Robust to outliers.

Formula:

IQR = Qโ‚ƒ - Qโ‚

Example: For ordered data {1, 2, 3, 5, 6, 7, 8}, Qโ‚ = (2+3)/2 = 2.5, Qโ‚ƒ = (6+7)/2 = 6.5, IQR = 6.5 - 2.5 = 4
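These dispersion measures can likewise be computed with the statistics module. Note that statistics.quantiles with method='inclusive' reproduces the quartiles in the IQR example above:

```python
import statistics

marks = [10, 20, 5, 25, 15]
data_range = max(marks) - min(marks)
print(data_range)  # 20

sample = [1, 2, 3, 5, 6, 7, 8]
# variance/stdev use the sample (n-1) denominator; pvariance/pstdev use N
print(statistics.variance(sample))
print(statistics.stdev(sample))

# Quartiles (inclusive method) match the worked IQR example
q1, q2, q3 = statistics.quantiles(sample, n=4, method='inclusive')
print(q1, q3, q3 - q1)  # 2.5 6.5 4.0
```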

Measures of Shape

Understanding distribution symmetry and peakedness

1
Skewness

Measures asymmetry of distribution.

Positive Skew (Right-skewed):

Tail on right is longer. Mean > Median > Mode

Example: Income distribution in a country (few high earners)

Negative Skew (Left-skewed):

Tail on left is longer. Mean < Median < Mode

Example: Scores on a very easy test (most score high)

2
Kurtosis

Measures "tailedness" and peakedness of distribution.

Leptokurtic (High Kurtosis):

Heavy tails, sharp peak (more outliers)

Mesokurtic (Normal Kurtosis):

Similar to normal distribution

Platykurtic (Low Kurtosis):

Light tails, flat peak (fewer outliers)
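Skewness has no standard-library helper, but the moment-based (population) formula is short to write. A sketch under that assumption; the two small datasets below are invented purely for illustration:

```python
import statistics

def skewness(data):
    """Population (moment-based) skewness: mean cubed deviation / sd^3."""
    mu = statistics.fmean(data)
    sd = statistics.pstdev(data)
    n = len(data)
    return sum((x - mu) ** 3 for x in data) / (n * sd ** 3)

right_skewed = [1, 2, 2, 3, 10]  # long right tail (like income data)
left_skewed = [0, 8, 9, 9, 10]   # long left tail (like an easy test)
print(skewness(right_skewed) > 0)  # True: positive skew
print(skewness(left_skewed) < 0)   # True: negative skew
```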

Probability & Statistics Quiz

Test your understanding with practical problems

Question 1
Student Enrollment Probability

Total students: 400

Students enrolled in Data Science: 320

Find the probability that a randomly selected student is enrolled in Data Science.

Solution:

P(Data Science Student) = 320 / 400 = 0.8

Question 2
Conditional Probability - Class Distribution

Total students: 1000

Total girls: 410

Girls from Class XII: 10% of 410 = 41

Find the probability that a randomly selected girl is from Class XII.

Solution:

P(Class XII | Girl) = (Number of Class XII Girls) / (Total Number of Girls) = 41 / 410 = 0.1

Question 3
Permutations - Password Creation

How many 4-letter passwords can be created from the letters A, B, C, D, E (with no repetition)?

Solution:

This is a permutation problem since order matters in passwords.

P(5, 4) = 5! / (5-4)! = 5! / 1! = 120 / 1 = 120 passwords

Question 4
Combinations - Committee Selection

How many ways can a committee of 3 members be selected from a group of 8 people?

Solution:

This is a combination problem since order doesn't matter for committee membership.

C(8, 3) = 8! / (3!(8-3)!) = 8! / (3! ร— 5!) = (8 ร— 7 ร— 6) / (3 ร— 2 ร— 1) = 336 / 6 = 56 ways
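All four quiz answers can be re-derived in a few lines of Python (math.perm, math.comb, and Fraction for the probabilities):

```python
import math
from fractions import Fraction

q1 = Fraction(320, 400)   # Q1: P(Data Science student)
q2 = Fraction(41, 410)    # Q2: P(Class XII | Girl)
q3 = math.perm(5, 4)      # Q3: 4-letter passwords, order matters, no repeats
q4 = math.comb(8, 3)      # Q4: committees of 3 from 8, order irrelevant
print(float(q1), float(q2), q3, q4)  # 0.8 0.1 120 56
```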