
Data Governance Concepts

A comprehensive guide for data analysts to understand data governance principles, data quality, profiling, lineage, and how they apply in banking. Every concept is illustrated with real examples from the sample data loaded in this platform.

1
What is Data Governance?

Data Governance is the framework of policies, processes, roles, and standards that ensures an organization's data is managed as a strategic asset. It defines who can take what action, upon what data, in what situations, and using what methods.

People
Data Stewards, Data Owners, Data Custodians, and Governance Councils who are accountable for data decisions.
Process
Workflows for term creation, approval, classification review, data quality monitoring, and issue remediation.
Technology
This platform! Tools for glossaries, metadata catalogs, lineage tracking, quality dashboards, and AI-assisted governance.
Analogy: Think of data governance like city planning. Just as cities need zoning laws (policies), building codes (standards), inspectors (stewards), and permits (approval workflows) to function — organizations need data governance to ensure data is accurate, secure, and used responsibly.
2
Why Data Governance Matters in Banking

Banks handle some of the most sensitive data on earth — customer identities, account balances, transactions, credit scores, and compliance records. Poor data governance in banking leads to regulatory fines, fraud losses, and broken customer trust.

Without Governance
  • Different teams call the same thing different names
  • Nobody knows which system is the "source of truth"
  • PII data ends up in test environments unmasked
  • Regulators ask for data lineage and you can't produce it
  • Reports don't match because definitions differ
With Governance
  • One agreed definition for "Customer ID" across all systems
  • Clear lineage from Core Banking → EDW → Reports
  • PCI data (Credit Card Number) is tokenized everywhere
  • Audit trail for every data change and approval
  • Consistent, trusted numbers for regulatory reporting
Key regulations driving data governance in banking: BCBS 239, GDPR, PCI-DSS, IFRS 9, AML/KYC, SOX
3
Business Glossary

A Business Glossary is a curated dictionary of business terms used across an organization. Each term has a single, approved definition so that everyone — from analysts to executives — speaks the same language.

What a Business Term Contains:
Field | Example (from this platform) | Why It Matters
Term Name | Account Balance | Unique, standardized name
Definition | "The current available balance in a customer deposit account..." | One agreed meaning
Domain | Account | Which business area owns it
Classification | CONFIDENTIAL | How sensitive it is
Status | APPROVED | Went through review workflow
Is CDE? | Yes | Critical for regulatory reporting
Owner | Data Steward from Account domain | Who to contact for questions
Try it: Go to Browse Terms to explore all 45 banking terms in this platform. Click any term to see its full definition, classification, metadata links, and lineage.
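
As a rough illustration of the fields above, a business term can be modeled as a small record. This is a minimal sketch: the class name `BusinessTerm`, its attribute names, and the `is_approved` helper are hypothetical and are not the platform's actual data model.

```python
from dataclasses import dataclass


# Attribute names mirror the table above; the class itself is a
# hypothetical illustration, not the platform's schema.
@dataclass(frozen=True)
class BusinessTerm:
    name: str
    definition: str
    domain: str
    classification: str
    status: str
    is_cde: bool
    owner: str

    def is_approved(self) -> bool:
        """True once the term has completed the review workflow."""
        return self.status == "APPROVED"


account_balance = BusinessTerm(
    name="Account Balance",
    definition="The current available balance in a customer deposit account...",
    domain="Account",
    classification="CONFIDENTIAL",
    status="APPROVED",
    is_cde=True,
    owner="Data Steward from Account domain",
)
print(account_balance.name, account_balance.is_approved())
```

Freezing the record is a design choice in this sketch: a change to an approved definition would then have to produce a new record, which fits the idea of an audited approval workflow.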
4
Data Domains & Ownership

A Data Domain is a logical grouping of data by business area. Each domain has a Data Owner (accountable executive) and Data Stewards (day-to-day managers). This ensures every piece of data has clear accountability.

Banking Domains in This Platform:
Customer: KYC, Demographics
Account: Deposits, Loans, Cards
Transaction: Payments, Transfers
Risk: Credit, Market, Operational
Compliance: AML, Sanctions, Fraud
Finance: GL, Regulatory Reporting
Product: Banking Products
Reference Data: Codes, Lookups
Try it: Go to Domains to see all 10 domains, their subject areas, and which terms belong to each.
5
Data Classification & Sensitivity

Data Classification categorizes data by its sensitivity level, which determines how it must be stored, transmitted, accessed, and disposed of. Getting classification wrong can lead to data breaches and regulatory fines.

Classification Levels (as used in this platform):
Level | Classification | Handling | Example Terms
1 | PUBLIC | No restrictions. Can share externally. | Transaction Currency, Branch Code
2 | INTERNAL | Internal systems only. Don't share outside. | KYC Status, Account Type, Transaction Type
3 | CONFIDENTIAL | Encrypted storage, access logging, approval for sharing. | Account Balance, Credit Score, Customer ID
4 | HIGHLY CONFIDENTIAL | MFA required. Full audit trail. Encrypt at rest and in transit. | National ID Number, SAR Reports
5 | PII | GDPR/CCPA. Data masking in non-prod. Right to erasure. | Customer Full Name, Date of Birth, Email, Phone
5 | PCI | PCI-DSS Level 1. Tokenization. Quarterly scans. | Credit Card Number, Card CVV
Real-world impact: If a "Credit Card Number" is classified as INTERNAL instead of PCI, it might end up in a test database without tokenization — a PCI-DSS violation that can cost millions in fines. This is why classification must be reviewed and approved through a formal workflow.
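
The misclassification scenario above can be sketched as a simple control check. The control flags below paraphrase the handling column of the classification table; the `HANDLING` dictionary, the flag names, and the `missing_controls` function are assumptions made for illustration, not platform features.

```python
# Required controls per classification, paraphrased from the table above.
# The dictionary layout and flag names are illustrative assumptions.
HANDLING = {
    "PUBLIC": set(),
    "INTERNAL": {"internal_only"},
    "CONFIDENTIAL": {"encrypt", "access_logging"},
    "HIGHLY CONFIDENTIAL": {"encrypt", "mfa", "audit_trail"},
    "PII": {"mask_in_non_prod"},
    "PCI": {"tokenize"},
}


def missing_controls(classification: str, environment: str, controls: set) -> set:
    """Return the required controls that a column does not have in place."""
    required = set(HANDLING[classification])
    if environment == "prod":
        # Masking is only demanded outside production in this sketch.
        required.discard("mask_in_non_prod")
    return required - controls


# A card number column in a test database with only internal-only access:
in_place = {"internal_only"}
print(missing_controls("INTERNAL", "test", in_place))  # wrong label hides the gap
print(missing_controls("PCI", "test", in_place))       # correct label exposes it
```

With the wrong label the check reports nothing missing, which is exactly how an untokenized card number slips through; with the correct PCI label the missing tokenization is flagged.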
6
Critical Data Elements (CDEs)

A Critical Data Element is a data field that is essential for regulatory reporting, risk management, or key business decisions. CDEs receive heightened governance — stricter quality checks, mandatory lineage documentation, and formal approval workflows.

Why CDEs Matter (BCBS 239):

The Basel Committee's BCBS 239 standard requires banks to identify their critical data elements and ensure they are accurate, complete, timely, and traceable. Regulators can ask: "Show me every system that touches your Capital Adequacy Ratio calculation and prove the data is correct at every step."

Examples from this platform (31 CDEs out of 45 terms):
Customer ID
Account Balance
Transaction Amount
Credit Score
Probability of Default
Capital Adequacy Ratio
Credit Card Number
Try it: Go to CDE Registry to see all CDEs with their physical database table/column mappings, verification status, and confidence scores.
8
Data Lineage

Data Lineage tracks the journey of data from its origin to its final destination — every system it passes through, every transformation applied, and every report it feeds. It answers: "Where did this number come from, and can I trust it?"

Transformation Types:
DIRECT: 1:1 copy
TRANSFORM: Format change
LOOKUP: Reference table join
AGGREGATION: SUM, COUNT, AVG
DERIVATION: Calculated field
SNAPSHOT: Point-in-time capture
Example: Account Balance Lineage
Core Banking: ACCOUNT_MASTER.WORKING_BALANCE
→ EDW: FACT_DAILY_BALANCE.CLOSING_BALANCE
→ Power BI: RPT_CAPITAL_ADEQUACY.TOTAL_ASSETS
Transformations: SNAPSHOT (end-of-day T-1) → AGGREGATION (SUM by asset class, quarterly)
Try it: Go to Lineage to browse all 10 lineage mappings, and Lineage Graph to see the visual flow.
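
To make the question "Where did this number come from?" concrete, the Account Balance example can be stored as a list of edges and walked backwards. This is a minimal sketch: the edge-list layout, the system-name prefixes, and the `trace_upstream` function are assumptions, and it supposes each field has a single upstream source.

```python
# Each edge is (source, target, transformation type). Field names come from
# the Account Balance example; the system prefixes and tuple layout are
# assumptions for illustration.
LINEAGE = [
    ("CoreBanking.ACCOUNT_MASTER.WORKING_BALANCE",
     "EDW.FACT_DAILY_BALANCE.CLOSING_BALANCE", "SNAPSHOT"),
    ("EDW.FACT_DAILY_BALANCE.CLOSING_BALANCE",
     "PowerBI.RPT_CAPITAL_ADEQUACY.TOTAL_ASSETS", "AGGREGATION"),
]


def trace_upstream(target, edges=LINEAGE):
    """Walk from a field back to its origin, listing (source, transformation)."""
    # Assumes one parent per field; real lineage graphs can fan in.
    parents = {tgt: (src, kind) for src, tgt, kind in edges}
    path, seen = [], set()
    while target in parents and target not in seen:
        seen.add(target)  # guards against accidental cycles
        source, kind = parents[target]
        path.append((source, kind))
        target = source
    return path


for source, kind in trace_upstream("PowerBI.RPT_CAPITAL_ADEQUACY.TOTAL_ASSETS"):
    print(f"{kind:<12} from {source}")
```

The walk lists the nearest hop first, so the report field is shown as an AGGREGATION of the EDW column, which in turn is a SNAPSHOT of the Core Banking balance.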
9
Data Quality

Data Quality measures how well data serves its intended purpose. High-quality data is the foundation of trustworthy analytics and regulatory reporting. As a data analyst, you must always question: "Can I trust this data?"

The 6 Dimensions of Data Quality:
1. Accuracy
Does the data correctly represent the real-world entity?
Example: Is the Account Balance in the EDW the same as in the Core Banking system?
2. Completeness
Are all required data values present?
Example: Does every Customer record have a KYC Status? (should be 100% for regulatory compliance)
3. Timeliness
Is the data available when needed and up-to-date?
Example: Is the Account Balance snapshot from T-1 (yesterday) or T-3 (three days ago)?
4. Consistency
Does the same data match across different systems?
Example: Customer Name in Core Banking = Customer Name in EDW = Customer Name in CRM?
5. Uniqueness
Is each record represented only once?
Example: Does each Customer ID appear exactly once in CUSTOMER_MASTER? No duplicates?
6. Validity
Does the data conform to defined rules and formats?
Example: Is the IBAN no longer than 34 characters and the correct length for its country? Does Transaction Currency follow ISO 4217?
Why it matters for you as an analyst: Before building any dashboard or report, always ask:
  1. Where does this data come from? (check Lineage)
  2. What does this field actually mean? (check Glossary)
  3. Is this field complete and accurate? (check Data Quality scores)
  4. Who approved this data's definition? (check Approval status)
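
Three of the dimensions above (completeness, uniqueness, validity) reduce to simple ratios that can be computed before a report is built. The sketch below uses hypothetical sample rows; the function names are illustrative, and the currency pattern only checks the three-uppercase-letter shape of a code, not membership in the official ISO 4217 list.

```python
import re

# Hypothetical sample rows; column names follow the examples in the text.
customers = [
    {"CUSTOMER_ID": "C001", "KYC_STATUS": "VERIFIED", "CURRENCY": "USD"},
    {"CUSTOMER_ID": "C002", "KYC_STATUS": None, "CURRENCY": "usd"},
    {"CUSTOMER_ID": "C002", "KYC_STATUS": "PENDING", "CURRENCY": "EUR"},
]


def completeness(rows, column):
    """Share of rows where the column is populated."""
    return sum(r[column] is not None for r in rows) / len(rows)


def uniqueness(rows, column):
    """Ratio of distinct values to rows (1.0 means no duplicates)."""
    values = [r[column] for r in rows]
    return len(set(values)) / len(values)


def validity(rows, column, pattern):
    """Share of non-null values that fully match the pattern."""
    values = [r[column] for r in rows if r[column] is not None]
    return sum(bool(re.fullmatch(pattern, v)) for v in values) / len(values)


CURRENCY_SHAPE = r"[A-Z]{3}"  # shape check only, not a full ISO 4217 lookup

print(f"KYC_STATUS completeness: {completeness(customers, 'KYC_STATUS'):.0%}")
print(f"CUSTOMER_ID uniqueness:  {uniqueness(customers, 'CUSTOMER_ID'):.0%}")
print(f"CURRENCY validity:       {validity(customers, 'CURRENCY', CURRENCY_SHAPE):.0%}")
```

Accuracy, timeliness, and consistency usually need a second system or a timestamp to compare against, which is why they are left out of this single-table sketch.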
10
Data Profiling

Data Profiling is the process of examining data from a source and collecting statistics and summaries about it. It's the first step in understanding data quality — you can't fix what you haven't measured.

What Data Profiling Reveals:
Metric | Description | Example
Row Count | Total records in the table | CUSTOMER_MASTER: 1.2M rows
Null Rate | % of missing values per column | EMAIL column: 23% null
Distinct Count | Number of unique values | ACCOUNT_TYPE: 5 distinct values
Min/Max | Range of values | WORKING_BALANCE: -50,000 to 12,500,000
Pattern Analysis | Format patterns detected | PHONE_NO: 89% match "+XXX-XXXXXXXX"
Value Distribution | How values are spread | KYC_STATUS: VERIFIED(72%), PENDING(18%), EXPIRED(10%)
Uniqueness | Are values unique? | CUSTOMER_ID: 100% unique (as expected for PK)
Referential Integrity | Do FK relationships hold? | ACCOUNT_MASTER.CUSTOMER_ID: 99.8% match in CUSTOMER_MASTER
Profiling in Action: Imagine you profile the CREDIT_SCORES.SCORE_VALUE column and find values of -1 and 0. A credit score should be between 300 and 850, so these values signal a data quality issue. Perhaps -1 is used as a placeholder for "not calculated yet." You'd flag this, create a data quality rule, and work with the Risk data steward to resolve it.
Try it: Upload your own data via Upload Data to profile CSV/Excel files and see column-level statistics.
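
The credit score scenario can be reproduced with a few lines of column profiling. This is a minimal sketch on a hypothetical sample of SCORE_VALUE; the `profile` and `out_of_range` helpers are illustrative names, not the platform's profiler.

```python
from collections import Counter


def profile(values):
    """Collect basic profile statistics for one column."""
    present = [v for v in values if v is not None]
    return {
        "row_count": len(values),
        "null_rate": 1 - len(present) / len(values) if values else 0.0,
        "distinct_count": len(set(present)),
        "min": min(present, default=None),
        "max": max(present, default=None),
        "distribution": Counter(present),
    }


def out_of_range(values, low, high):
    """Return the non-null values that fall outside [low, high]."""
    return [v for v in values if v is not None and not low <= v <= high]


scores = [720, 655, -1, 0, None, 810, -1]  # hypothetical SCORE_VALUE sample

stats = profile(scores)
print(stats["row_count"], stats["distinct_count"], stats["min"], stats["max"])
print("Suspicious values:", out_of_range(scores, 300, 850))
```

The minimum of -1 is the first hint that something is off; the range check then isolates the exact values to raise with the data steward.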
11
Data Stewardship & Roles

Data Stewardship defines who is responsible for data at every level. Without clear ownership, data issues get ignored — "everyone's responsibility is nobody's responsibility."

Role | Responsibility | Platform Example
Admin | Configure the platform, manage users, set up AI providers | Can access all features, AI Settings, User Management
Data Owner | Executive accountable for a domain. Final approval authority. | Approves/rejects terms in the Approval Inbox
Data Steward | Day-to-day manager. Creates terms, reviews quality, manages lineage. | Creates terms, sets classifications, maps metadata
Data Custodian | Technical staff. Manages databases, ETL, and security controls. | Manages Technical Metadata and lineage mappings
Viewer | Reads glossary and reports. Cannot modify data. | Browses terms, views lineage, reads dashboards
Try it: Approval Inbox shows the dual-approval workflow. Terms go through DRAFT → PENDING_REVIEW → PENDING_APPROVAL → APPROVED.
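
The status progression above behaves like a small state machine. The sketch below uses the four status names from the workflow; the rule that a rejection sends a term back to DRAFT and the `advance` function are assumptions for illustration, since the text does not describe how rejections are handled.

```python
# Status names come from the workflow above. The transition table and the
# "rejection returns to DRAFT" rule are assumptions for illustration.
NEXT_STATUS = {
    "DRAFT": "PENDING_REVIEW",
    "PENDING_REVIEW": "PENDING_APPROVAL",
    "PENDING_APPROVAL": "APPROVED",
}


def advance(status: str, approved: bool = True) -> str:
    """Move a term one step forward, or back to DRAFT if it is rejected."""
    if status not in NEXT_STATUS:
        raise ValueError(f"No transition defined from {status}")
    return NEXT_STATUS[status] if approved else "DRAFT"


status = "DRAFT"
while status != "APPROVED":
    status = advance(status)
    print(status)
```

Keeping the transitions in one table means an unknown or final status fails loudly instead of silently skipping a review step.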
12
Regulatory Landscape

Banking is among the most heavily regulated industries. Data governance is not optional: regulators mandate it. Here are the key regulations and how they map to this platform:

Regulation | What It Requires | Platform Feature
BCBS 239 | Accurate risk data aggregation. Identify CDEs. Full lineage. | CDE Registry, Data Lineage, Approval Workflows
GDPR | Protect personal data. Right to erasure. Data minimization. | PII Classification, Metadata mapping (find all PII)
PCI-DSS | Secure payment card data. Tokenization. Access controls. | PCI Classification, CDE flagging for card data
IFRS 9 | Accurate Expected Credit Loss (ECL) calculation. | Lineage from PD/LGD/EAD → ECL Report
AML/KYC | Know your customer. Monitor suspicious transactions. | Glossary terms for SAR, Sanctions, KYC Status
Ready to Explore?

Now that you understand the concepts, explore further: