Data Governance Concepts
A comprehensive guide for data analysts to understand data governance principles, data quality, profiling, lineage, and how they apply in banking. Every concept is illustrated with real examples from the sample data loaded in this platform.
What is Data Governance?
Data Governance is the framework of policies, processes, roles, and standards that ensures an organization's data is managed as a strategic asset. It defines who can take what action, upon what data, in what situations, and using what methods.
People
Data Stewards, Data Owners, Data Custodians, and Governance Councils who are accountable for data decisions.
Process
Workflows for term creation, approval, classification review, data quality monitoring, and issue remediation.
Technology
This platform! Tools for glossaries, metadata catalogs, lineage tracking, quality dashboards, and AI-assisted governance.
Why Data Governance Matters in Banking
Banks handle some of the most sensitive data on earth — customer identities, account balances, transactions, credit scores, and compliance records. Poor data governance in banking leads to regulatory fines, fraud losses, and broken customer trust.
Without Governance
- Different teams call the same thing different names
- Nobody knows which system is the "source of truth"
- PII data ends up in test environments unmasked
- Regulators ask for data lineage and you can't produce it
- Reports don't match because definitions differ
With Governance
- One agreed definition for "Customer ID" across all systems
- Clear lineage from Core Banking → EDW → Reports
- PCI data (Credit Card Number) is tokenized everywhere
- Audit trail for every data change and approval
- Consistent, trusted numbers for regulatory reporting
Business Glossary
A Business Glossary is a curated dictionary of business terms used across an organization. Each term has a single, approved definition so that everyone — from analysts to executives — speaks the same language.
What a Business Term Contains:
| Field | Example (from this platform) | Why It Matters |
|---|---|---|
| Term Name | Account Balance | Unique, standardized name |
| Definition | "The current available balance in a customer deposit account..." | One agreed meaning |
| Domain | Account | Which business area owns it |
| Classification | CONFIDENTIAL | How sensitive it is |
| Status | APPROVED | Went through review workflow |
| Is CDE? | Yes | Critical for regulatory reporting |
| Owner | Data Steward from Account domain | Who to contact for questions |
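A glossary term and its review workflow can be modeled as a small record plus a set of allowed status changes. The sketch below is a minimal illustration, not the platform's actual data model. The field names mirror the table above. The statuses other than APPROVED (DRAFT, PENDING_APPROVAL, REJECTED), the `ALLOWED` transition map, and the `move_to` method are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # Hypothetical workflow states; only APPROVED appears in the platform example.
    DRAFT = "DRAFT"
    PENDING_APPROVAL = "PENDING_APPROVAL"
    APPROVED = "APPROVED"
    REJECTED = "REJECTED"


# Allowed transitions in this sketch: a steward submits, an owner decides.
ALLOWED = {
    Status.DRAFT: {Status.PENDING_APPROVAL},
    Status.PENDING_APPROVAL: {Status.APPROVED, Status.REJECTED},
    Status.REJECTED: {Status.DRAFT},  # rework, then resubmit
    Status.APPROVED: set(),           # terminal in this sketch
}


@dataclass
class BusinessTerm:
    name: str
    definition: str
    domain: str
    classification: str
    is_cde: bool
    owner: str
    status: Status = Status.DRAFT

    def move_to(self, new_status: Status) -> None:
        """Change status only along an allowed workflow edge."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"{self.status.value} -> {new_status.value} not allowed")
        self.status = new_status


term = BusinessTerm(
    name="Account Balance",
    definition="The current available balance in a customer deposit account...",
    domain="Account",
    classification="CONFIDENTIAL",
    is_cde=True,
    owner="Data Steward from Account domain",
)
term.move_to(Status.PENDING_APPROVAL)
term.move_to(Status.APPROVED)
print(term.status.value)  # APPROVED
```

Keeping transitions in one map makes the workflow auditable. Any status change that skips review raises an error instead of silently succeeding.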
Data Domains & Ownership
A Data Domain is a logical grouping of data by business area. Each domain has a Data Owner (accountable executive) and Data Stewards (day-to-day managers). This ensures every piece of data has clear accountability.
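Accountability can be checked mechanically: every domain should name an owner and at least one steward. A minimal sketch, assuming a hypothetical `Domain` record and `accountability_gaps` helper. The person names are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Domain:
    name: str
    owner: Optional[str]                               # accountable executive
    stewards: list[str] = field(default_factory=list)  # day-to-day managers


def accountability_gaps(domains: list[Domain]) -> dict[str, list[str]]:
    """Return, per domain, the ownership roles that are still unassigned."""
    gaps: dict[str, list[str]] = {}
    for d in domains:
        missing = []
        if not d.owner:
            missing.append("Data Owner")
        if not d.stewards:
            missing.append("Data Steward")
        if missing:
            gaps[d.name] = missing
    return gaps


domains = [
    Domain("Account", owner="Head of Deposits", stewards=["A. Steward"]),
    Domain("Risk", owner=None, stewards=["R. Steward"]),
]
print(accountability_gaps(domains))  # {'Risk': ['Data Owner']}
```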
Banking Domains in This Platform:
Data Classification & Sensitivity
Data Classification categorizes data by its sensitivity level, which determines how it must be stored, transmitted, accessed, and disposed of. Getting classification wrong can lead to data breaches and regulatory fines.
Classification Levels (as used in this platform):
| Level | Classification | Handling | Example Terms |
|---|---|---|---|
| 1 | PUBLIC | No restrictions. Can share externally. | Transaction Currency, Branch Code |
| 2 | INTERNAL | Internal systems only. Don't share outside. | KYC Status, Account Type, Transaction Type |
| 3 | CONFIDENTIAL | Encrypted storage, access logging, approval for sharing. | Account Balance, Credit Score, Customer ID |
| 4 | HIGHLY CONFIDENTIAL | MFA required. Full audit trail. Encrypt at rest and in transit. | National ID Number, SAR Reports |
| 5 | PII | GDPR/CCPA. Data masking in non-prod. Right to erasure. | Customer Full Name, Date of Birth, Email, Phone |
| 5 | PCI | PCI-DSS Level 1. Tokenization. Quarterly scans. | Credit Card Number, Card CVV |
Critical Data Elements (CDEs)
A Critical Data Element is a data field that is essential for regulatory reporting, risk management, or key business decisions. CDEs receive heightened governance — stricter quality checks, mandatory lineage documentation, and formal approval workflows.
Why CDEs Matter (BCBS 239):
The Basel Committee's BCBS 239 standard requires banks to identify their critical data elements and ensure they are accurate, complete, timely, and traceable. Regulators can ask: "Show me every system that touches your Capital Adequacy Ratio calculation and prove the data is correct at every step."
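The heightened governance for CDEs can be sketched as a checklist that becomes stricter when a field is flagged as a CDE. The 0.99 and 0.95 quality thresholds, the `FieldGovernance` record, and the `cde_gaps` function are hypothetical choices for illustration. Each bank sets its own thresholds.

```python
from dataclasses import dataclass

# Hypothetical thresholds: CDEs get a stricter quality bar than ordinary fields.
MIN_QUALITY_CDE = 0.99
MIN_QUALITY_DEFAULT = 0.95


@dataclass
class FieldGovernance:
    name: str
    is_cde: bool
    status: str            # e.g. "APPROVED"
    has_lineage: bool      # is end-to-end lineage documented?
    quality_score: float   # 0.0 to 1.0, from data quality checks


def cde_gaps(f: FieldGovernance) -> list[str]:
    """List governance gaps; CDEs must also be approved and have lineage."""
    gaps = []
    threshold = MIN_QUALITY_CDE if f.is_cde else MIN_QUALITY_DEFAULT
    if f.quality_score < threshold:
        gaps.append(f"quality {f.quality_score:.2f} < {threshold:.2f}")
    if f.is_cde:
        if f.status != "APPROVED":
            gaps.append("definition not approved")
        if not f.has_lineage:
            gaps.append("lineage not documented")
    return gaps


car = FieldGovernance("Capital Adequacy Ratio", is_cde=True,
                      status="APPROVED", has_lineage=False, quality_score=0.97)
print(cde_gaps(car))
# ['quality 0.97 < 0.99', 'lineage not documented']
```

The same score of 0.97 would pass for a non-CDE field, which is the point: the flag, not the data, decides how much scrutiny applies.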
Data Lineage
Data Lineage tracks the journey of data from its origin to its final destination — every system it passes through, every transformation applied, and every report it feeds. It answers: "Where did this number come from, and can I trust it?"
Transformation Types:
Example lineage path: ACCOUNT_MASTER.WORKING_BALANCE → FACT_DAILY_BALANCE.CLOSING_BALANCE → RPT_CAPITAL_ADEQUACY.TOTAL_ASSETS
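Answering a regulator who asks for every system that touches a reported number is a graph traversal: walk the lineage edges backwards from the report column. A minimal sketch, assuming the three columns listed above form a single chain in that order (source to report). The `EDGES` list and the `upstream` function are hypothetical; a real graph would come from the metadata catalog.

```python
from collections import deque

# Edges point downstream: source column -> target column (assumed order).
EDGES = [
    ("ACCOUNT_MASTER.WORKING_BALANCE", "FACT_DAILY_BALANCE.CLOSING_BALANCE"),
    ("FACT_DAILY_BALANCE.CLOSING_BALANCE", "RPT_CAPITAL_ADEQUACY.TOTAL_ASSETS"),
]


def upstream(target: str, edges: list[tuple[str, str]]) -> list[str]:
    """Breadth-first walk against edge direction to find every upstream column."""
    parents: dict[str, list[str]] = {}
    for src, dst in edges:
        parents.setdefault(dst, []).append(src)
    seen: set[str] = set()
    order: list[str] = []
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for p in parents.get(node, []):
            if p not in seen:  # guard against cycles and shared ancestors
                seen.add(p)
                order.append(p)
                queue.append(p)
    return order


print(upstream("RPT_CAPITAL_ADEQUACY.TOTAL_ASSETS", EDGES))
# ['FACT_DAILY_BALANCE.CLOSING_BALANCE', 'ACCOUNT_MASTER.WORKING_BALANCE']
```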
Data Quality
Data Quality measures how well data serves its intended purpose. High-quality data is the foundation of trustworthy analytics and regulatory reporting. As a data analyst, you must always question: "Can I trust this data?"
The 6 Dimensions of Data Quality:
1. Accuracy: Does the data correctly represent the real-world entity?
2. Completeness: Are all required data values present?
3. Timeliness: Is the data available when needed and up to date?
4. Consistency: Does the same data match across different systems?
5. Uniqueness: Is each record represented only once?
6. Validity: Does the data conform to defined rules and formats?
Before trusting a dataset, ask:
- Where does this data come from? (check Lineage)
- What does this field actually mean? (check Glossary)
- Is this field complete and accurate? (check Data Quality scores)
- Who approved this data's definition? (check Approval status)
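Several of the six dimensions reduce to simple ratios once you pick a column and a rule. The sketch below computes completeness, uniqueness, and consistency on made-up rows. The sample records, the second system (`crm`), and the "last row wins" rule for duplicate keys are all assumptions for illustration.

```python
# Hypothetical extract of a customer table; None marks a missing value.
core_banking = [
    {"customer_id": "C001", "email": "a@example.com", "kyc_status": "VERIFIED"},
    {"customer_id": "C002", "email": None, "kyc_status": "PENDING"},
    {"customer_id": "C002", "email": "b@example.com", "kyc_status": "PENDING"},
    {"customer_id": "C003", "email": "c@example.com", "kyc_status": "EXPIRED"},
]
# KYC status for the same customers in a second (hypothetical) system.
crm = {"C001": "VERIFIED", "C002": "VERIFIED", "C003": "EXPIRED"}


def completeness(rows: list[dict], column: str) -> float:
    """Share of rows where the column is populated."""
    return sum(r[column] is not None for r in rows) / len(rows)


def uniqueness(rows: list[dict], column: str) -> float:
    """Distinct values divided by row count; 1.0 means no duplicates."""
    return len({r[column] for r in rows}) / len(rows)


def consistency(rows: list[dict], other: dict, key: str, column: str) -> float:
    """Share of keys whose value agrees with the other system."""
    latest = {r[key]: r[column] for r in rows}  # last row wins per key
    matches = sum(other.get(k) == v for k, v in latest.items())
    return matches / len(latest)


print(completeness(core_banking, "email"))        # 0.75
print(uniqueness(core_banking, "customer_id"))    # 0.75
print(round(consistency(core_banking, crm, "customer_id", "kyc_status"), 2))  # 0.67
```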
Data Profiling
Data Profiling is the process of examining data from a source and collecting statistics and summaries about it. It's the first step in understanding data quality — you can't fix what you haven't measured.
What Data Profiling Reveals:
| Metric | Description | Example |
|---|---|---|
| Row Count | Total records in the table | CUSTOMER_MASTER: 1.2M rows |
| Null Rate | % of missing values per column | EMAIL column: 23% null |
| Distinct Count | Number of unique values | ACCOUNT_TYPE: 5 distinct values |
| Min/Max | Range of values | WORKING_BALANCE: -50,000 to 12,500,000 |
| Pattern Analysis | Format patterns detected | PHONE_NO: 89% match "+XXX-XXXXXXXX" |
| Value Distribution | How values are spread | KYC_STATUS: VERIFIED(72%), PENDING(18%), EXPIRED(10%) |
| Uniqueness | Are values unique? | CUSTOMER_ID: 100% unique (as expected for PK) |
| Referential Integrity | Do FK relationships hold? | ACCOUNT_MASTER.CUSTOMER_ID: 99.8% match in CUSTOMER_MASTER |
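Most of the metrics in the table can be computed with the Python standard library alone. A minimal sketch with hypothetical helper names (`profile_column`, `pattern`, `fk_match_rate`) and made-up sample values. The pattern rule (letters become A, digits become X) is one common convention, chosen here because it produces the "+XXX-XXXXXXXX" style shown in the table.

```python
import re
from collections import Counter


def profile_column(values: list) -> dict:
    """Basic profiling statistics for one column; None counts as null."""
    non_null = [v for v in values if v is not None]
    profile = {
        "row_count": len(values),
        "null_rate": 1 - len(non_null) / len(values) if values else 0.0,
        "distinct_count": len(set(non_null)),
        "distribution": Counter(non_null).most_common(),
    }
    # Min/max only make sense for numeric columns.
    if non_null and all(isinstance(v, (int, float)) for v in non_null):
        profile["min"], profile["max"] = min(non_null), max(non_null)
    return profile


def pattern(value: str) -> str:
    """Reduce a value to its shape: letters become 'A', then digits become 'X'."""
    return re.sub(r"\d", "X", re.sub(r"[A-Za-z]", "A", value))


def fk_match_rate(child_keys: list, parent_keys: set) -> float:
    """Share of non-null foreign keys that exist in the parent table."""
    keys = [k for k in child_keys if k is not None]
    return sum(k in parent_keys for k in keys) / len(keys) if keys else 1.0


kyc = ["VERIFIED", "VERIFIED", "PENDING", None]
p = profile_column(kyc)
print(p["null_rate"], p["distinct_count"])  # 0.25 2
print(p["distribution"])                    # [('VERIFIED', 2), ('PENDING', 1)]
print(pattern("+971-55123456"))             # +XXX-XXXXXXXX
print(round(fk_match_rate(["C001", "C002", "C009"], {"C001", "C002", "C003"}), 2))  # 0.67
balances = profile_column([-50_000, 1_200, 12_500_000])
print(balances["min"], balances["max"])     # -50000 12500000
```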
Example: Suppose you profile the CREDIT_SCORES.SCORE_VALUE column and find values of -1 and 0. A credit score should be between 300 and 850, so these values point to a data quality issue; perhaps -1 is used as a placeholder for "not calculated yet." You'd flag this, create a data quality rule, and work with the Risk data steward to resolve it.
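The rule from this example could be written as a validity check that records which rows fail. A minimal sketch: `check_score_range` and `RuleResult` are hypothetical names. Nulls are deliberately not counted as failures here, on the assumption that missing scores belong to a separate completeness rule.

```python
from dataclasses import dataclass

# Valid range as stated above; -1 and 0 fall outside it and get flagged.
SCORE_MIN, SCORE_MAX = 300, 850


@dataclass
class RuleResult:
    rule: str
    checked: int
    failed: list  # (row_index, value) pairs that broke the rule

    @property
    def pass_rate(self) -> float:
        return 1 - len(self.failed) / self.checked if self.checked else 1.0


def check_score_range(scores: list) -> RuleResult:
    """Validity rule: every non-null SCORE_VALUE must lie in [300, 850]."""
    failed = [(i, s) for i, s in enumerate(scores)
              if s is not None and not SCORE_MIN <= s <= SCORE_MAX]
    return RuleResult("SCORE_VALUE between 300 and 850", len(scores), failed)


result = check_score_range([720, -1, 655, 0, 810])
print(result.failed)     # [(1, -1), (3, 0)]
print(result.pass_rate)  # 0.6
```

Returning the failing row indices, not just a score, gives the Risk data steward something concrete to investigate.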
Data Stewardship & Roles
Data Stewardship defines who is responsible for data at every level. Without clear ownership, data issues get ignored — "everyone's responsibility is nobody's responsibility."
| Role | Responsibility | Platform Example |
|---|---|---|
| Admin | Configure the platform, manage users, set up AI providers | Can access all features, AI Settings, User Management |
| Data Owner | Executive accountable for a domain. Final approval authority. | Approves/rejects terms in the Approval Inbox |
| Data Steward | Day-to-day manager. Creates terms, reviews quality, manages lineage. | Creates terms, sets classifications, maps metadata |
| Data Custodian | Technical staff. Manages databases, ETL, and security controls. | Manages Technical Metadata and lineage mappings |
| Viewer | Reads glossary and reports. Cannot modify data. | Browses terms, views lineage, reads dashboards |
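Roles translate naturally into a permission check. The sketch paraphrases the table above. The `Action` names, the exact split of permissions between Data Steward and Data Custodian, and the `can` function are assumptions, not the platform's actual access model.

```python
from enum import Enum, auto


class Action(Enum):
    READ = auto()
    CREATE_TERM = auto()
    SET_CLASSIFICATION = auto()
    MAP_METADATA = auto()
    MANAGE_LINEAGE = auto()
    APPROVE_TERM = auto()
    MANAGE_USERS = auto()


# Permissions paraphrased from the roles table; the exact split is an assumption.
PERMISSIONS = {
    "Viewer": {Action.READ},
    "Data Custodian": {Action.READ, Action.MAP_METADATA, Action.MANAGE_LINEAGE},
    "Data Steward": {Action.READ, Action.CREATE_TERM, Action.SET_CLASSIFICATION,
                     Action.MAP_METADATA, Action.MANAGE_LINEAGE},
    "Data Owner": {Action.READ, Action.APPROVE_TERM},
    "Admin": set(Action),  # Admin can access all features
}


def can(role: str, action: Action) -> bool:
    """Return True if the role may perform the action; unknown roles get nothing."""
    return action in PERMISSIONS.get(role, set())


print(can("Data Steward", Action.CREATE_TERM))   # True
print(can("Data Steward", Action.APPROVE_TERM))  # False
print(can("Viewer", Action.MAP_METADATA))        # False
```

The steward can create a term but not approve it. Keeping creation and approval in different roles is what makes the approval workflow meaningful.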
Regulatory Landscape
Banking is among the most heavily regulated industries. Data governance is not optional; regulators mandate it. Here are the key regulations and how they map to this platform:
| Regulation | What It Requires | Platform Feature |
|---|---|---|
| BCBS 239 | Accurate risk data aggregation. Identify CDEs. Full lineage. | CDE Registry, Data Lineage, Approval Workflows |
| GDPR | Protect personal data. Right to erasure. Data minimization. | PII Classification, Metadata mapping (find all PII) |
| PCI-DSS | Secure payment card data. Tokenization. Access controls. | PCI Classification, CDE flagging for card data |
| IFRS 9 | Accurate Expected Credit Loss (ECL) calculation. | Lineage from PD/LGD/EAD → ECL Report |
| AML/KYC | Know your customer. Monitor suspicious transactions. | Glossary terms for SAR, Sanctions, KYC Status |
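For GDPR, "find all PII" amounts to querying the metadata catalog by classification. A minimal sketch over a hypothetical in-memory catalog. Apart from CUSTOMER_MASTER and EMAIL, the table and column names are made up, and `columns_with` is an illustrative helper. Because each column is linked to a glossary term, CRM_CONTACTS.CONTACT_MAIL is found even though a search for "EMAIL" in column names would miss it.

```python
from collections import defaultdict

# Hypothetical catalog rows: (table, column, linked glossary term, classification).
CATALOG = [
    ("CUSTOMER_MASTER", "FULL_NAME", "Customer Full Name", "PII"),
    ("CUSTOMER_MASTER", "EMAIL", "Email", "PII"),
    ("CUSTOMER_MASTER", "CUSTOMER_ID", "Customer ID", "CONFIDENTIAL"),
    ("CARD_MASTER", "CARD_NO", "Credit Card Number", "PCI"),
    ("CRM_CONTACTS", "CONTACT_MAIL", "Email", "PII"),
]


def columns_with(classification: str, catalog=CATALOG) -> dict[str, list[str]]:
    """Group every column carrying a classification by table, e.g. for an erasure request."""
    found = defaultdict(list)
    for table, column, _term, cls in catalog:
        if cls == classification:
            found[table].append(column)
    return dict(found)


print(columns_with("PII"))
# {'CUSTOMER_MASTER': ['FULL_NAME', 'EMAIL'], 'CRM_CONTACTS': ['CONTACT_MAIL']}
```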