AI Fairness in Housing Lending demo
01

Problem

Machine learning models used in lending decisions inherit the biases present in their training data — and historical housing lending data in the United States is shaped by decades of documented discriminatory practice. When an ML model trained on Fannie Mae mortgage records learns which applicant profiles correlate with loan approval, it does not learn from neutral data: it learns from the outcomes of a market that systematically disadvantaged applicants by race and gender. Measuring that bias requires formal fairness metrics that go beyond accuracy, and mitigating it requires pre-processing techniques that reweight the training data before a classifier ever sees it. The critical question is not whether bias can be measured — it can — but whether bias mitigation techniques that work at the data level survive the training process and produce a fairer classifier at the output level. The answer has direct implications for any organization deploying predictive models in high-stakes domains.


02

Solution

This Georgia Tech CS 6603 final project applied two established fairness metrics, Disparate Impact (DI) and Statistical Parity Difference (SPD), to 2,558,959 Fannie Mae single-family mortgage originations from 2008, analyzing outcomes across Race and Gender as protected attributes. The metrics showed disparities in three of the four protected-class/outcome combinations examined (Borrower Income Ratio and Loan-to-Value ratio, each by Race and by Gender), quantifying the bias embedded in the historical record. A Reweighting pre-processing technique was then applied to adjust sample weights and balance favorable-outcome rates across demographic groups before classifier training. Two classifiers were trained, one on the original data and one on the reweighted data, and their fairness metrics were compared on held-out test sets. The finding: Reweighting improved fairness metrics on the dataset, but the bias re-emerged almost entirely after classifier training, demonstrating that pre-processing bias mitigation alone is insufficient for producing a fair deployed model.


03

Skills Acquired

What those metrics reveal — and what the mitigation results say about the limits of pre-processing fairness interventions — is the core of what follows.


04

Deep Dive

When a lender uses an ML model to evaluate a mortgage application, it processes numbers — income ratios, loan-to-value ratios, credit history. But embedded in that data are patterns shaped by decades of systemic inequality. The question this project asks is not whether bias exists in housing lending data. It asks: how much, for whom, and can pre-processing fix it?

This was the final project for CS 6603 (AI, Ethics, and Society), part of the Online Master of Science in Computer Science (OMSCS) program at Georgia Tech. The project applies two established fairness metric algorithms (Disparate Impact and Statistical Parity Difference) to the 2008 Fannie Mae Single-Family Mortgage Dataset across Race and Gender, then applies a Reweighting pre-processing bias mitigation technique and measures whether its effect survives classifier training.

The Fannie Mae 2008 dataset covers 2,558,959 single-family mortgage originations. This is not a synthetic dataset — it is a record of real lending decisions made on real homes during the year the U.S. housing market collapsed.

Step 1 — Dataset Selection

The dataset selected is the Enterprise Public Use Database: Single-Family Properties: National File A: Release of 2008 Data (the Fannie Mae dataset). It contains 2,558,959 observations and 16 variables and belongs to the Housing and Public Administration regulated domain, which is subject to the Fair Housing Act and the Equal Credit Opportunity Act.

The dependent/outcome variables selected are Borrower Income Ratio and Loan-to-Value (LTV) Ratio at Origination.

Table 1 shows the 5 variables in the dataset associated with a legally recognized protected class:

Variable | Protected Classes Associated with each Variable
2000 Census Tract - Percent Minority | Race, Color, Ethnicity, National Origin
Borrower Race or National Origin, and Ethnicity | Race, Color, Ethnicity, National Origin
Co-Borrower Race or National Origin, and Ethnicity | Race, Color, Ethnicity, National Origin
Borrower Gender | Gender
Co-Borrower Gender | Gender

Table 1 — Variables in the 2008 Single-Family Residences Fannie Mae Dataset

Table 2 shows the Legal Precedence/Law associated with each protected class from Table 1:

Protected Class | Legal Precedence/Law
Race | Civil Rights Act of 1964, 1991
Color | Civil Rights Act of 1964, 1991
Ethnicity | Civil Rights Act of 1964, 1991
National Origin | Civil Rights Act of 1964, 1991
Gender | Equal Pay Act of 1963; Civil Rights Act of 1964, 1991

Table 2 — Legal Precedence/Law for each Protected Class in Table 1


Step 2 — Explore the Dataset

Table 3 displays the members/subgroups associated with the protected class variables in the dataset. Each member/subgroup is encoded as a discrete numeric code.

Variable | Protected Classes | Members/Subgroups
2000 Census Tract - Percent Minority | Race, Color, Ethnicity, National Origin | 1 = 0–<10%; 2 = 10–<30%; 3 = 30–100%; 9 = Missing
Borrower Race or National Origin, and Ethnicity | Race, Color, Ethnicity, National Origin | 1 = American Indian or Alaska Native; 2 = Asian; 3 = Black or African American; 4 = Native Hawaiian or Other Pacific Islander; 5 = White; 6 = Two or more races; 7 = Hispanic or Latino; 9 = Not available/not applicable
Co-Borrower Race or National Origin, and Ethnicity | Race, Color, Ethnicity, National Origin | 1 = American Indian or Alaska Native; 2 = Asian; 3 = Black or African American; 4 = Native Hawaiian or Other Pacific Islander; 5 = White; 6 = Two or more races; 7 = Hispanic or Latino; 9 = Not available/not applicable
Borrower Gender | Gender | 1 = Male; 2 = Female; 3 = Not provided (mail/telephone); 4 = Not applicable; 9 = Missing
Co-Borrower Gender | Gender | 1 = Male; 2 = Female; 3 = Not provided (mail/telephone); 4 = Not applicable; 9 = Missing

Table 3 — Members/Subgroups Associated with Protected Class Variables in the Dataset

For this Final Project, two protected classes were selected: Race and Gender. Table 4 shows the four combinations of protected class and outcome variable used throughout the analysis:

Combination # | Protected Class | Dependent/Outcome Variable
1 | Race | Borrower Income Ratio
2 | Race | Loan-to-Value Ratio (LTV) at Origination
3 | Gender | Borrower Income Ratio
4 | Gender | Loan-to-Value Ratio (LTV) at Origination

Table 4 — Combinations of Selected Protected Classes with Selected Dependent/Outcome Variables

Table 5 describes the subgroups of each outcome variable:

Dependent/Outcome Variable | Subgroups | Description
Borrower Income Ratio | 1 = 0–60%; 2 = >60–100%; 3 = >100% | Borrower Income Ratio is a borrower's annual income divided by the area median family income for 2008. The lower the Borrower Income Ratio percentage, the lower a Borrower's income.
Loan-to-Value (LTV) at Origination | 1 = >0–≤60%; 2 = >60–≤80%; 3 = >80–≤90%; 4 = >90–≤95%; 5 = >95% | LTV Ratio at Origination is the amount of the loan divided by the home value. Lower LTVs are generally achieved by wealthier borrowers as this indicates that they can afford to make a higher down payment.

Table 5 — Subgroups of the Selected Dependent/Outcome Variables

Tables 6–9 show the frequency distributions for all four combinations. The "favorable" outcome for Income Ratio is 0–60% (lowest income tier, indicating the unprivileged group's concentration); the "favorable" outcome for LTV is 0–80% (indicating lower LTV, associated with wealthier borrowers).

Table 6 — Combination #1: Distribution of Borrower Income Ratio by Race

Race | 0–60% | >60–100% | >100% | Total
American Indian or Alaska Native | 996 | 3,650 | 1,572 | 6,218
Asian | 10,243 | 84,576 | 29,317 | 124,136
Black or African American | 36,970 | 44,923 | 32,148 | 114,041
Native Hawaiian or Other Pacific Islander | 1,478 | 6,969 | 2,926 | 11,373
White | 225,555 | 1,042,756 | 402,109 | 1,670,420
Two or More Races | 773 | 3,613 | 1,572 | 5,958

Bar chart: Distribution of Borrower Income Ratio by Race

Plot 1 — Distribution of Borrower Income Ratio by Race. White borrowers dominate the >100% income tier; Black/African American borrowers are more concentrated in the 0–60% (lowest income) tier.

Table 7 — Combination #2: Distribution of LTV by Race

Race | 0–60% | 61–80% | 81–90% | 91–95% | >95%
American Indian or Alaska Native | 1,554 | 2,680 | 667 | 391 | 311
Asian | 33,522 | 68,276 | 12,162 | 5,542 | 1,980
Black or African American | 16,961 | 40,878 | 13,206 | 10,021 | 10,038
Native Hawaiian or Other Pacific Islander | 2,619 | 5,811 | 1,317 | 805 | 440
White | 380,089 | 809,653 | 171,429 | 102,427 | 73,974
Two or More Races | 1,148 | 2,824 | 737 | 480 | 255

Bar chart: Distribution of LTV by Race

Plot 2 — Distribution of LTV by Race. White borrowers have the largest 61–80% LTV share; Black/African American borrowers show relatively higher concentrations in the upper LTV buckets (81–95%+), indicating lower down-payment capacity.

Table 8 — Combination #3: Distribution of Borrower Income Ratio by Gender

Gender | 0–60% | >60–100% | >100%
Female | 176,904 | 318,333 | 211,999
Male | 147,157 | 1,068,533 | 349,659

Bar chart: Distribution of Borrower Income Ratio by Gender

Plot 3 — Distribution of Borrower Income Ratio by Gender. Male borrowers have a much larger count in the >60–100% income tier (roughly 3.4× female borrowers) and about 1.6× the female count in the >100% tier, reflecting the dataset's overall gender imbalance in middle- and high-income mortgage originations.

Table 9 — Combination #4: Distribution of LTV by Gender

Gender | 0–60% | 61–80% | 81–90% | 91–95% | >95%
Female | 153,112 | 307,563 | 75,052 | 44,172 | 35,988
Male | 352,309 | 778,662 | 170,749 | 102,219 | 70,850

Bar chart: Distribution of LTV by Gender

Plot 4 — Distribution of LTV by Gender. Both genders concentrate in the 61–80% LTV range, but male borrowers' 61–80% count is about 2.5× that of female borrowers, reflecting the larger male borrower pool in the dataset.


Step 3 — Fairness Metric Algorithms

Two fairness metric algorithms were applied across all four combinations. The privileged group for Race is Asian; the unprivileged group is Black or African American. For Gender, privileged is Male; unprivileged is Female.

Table 12 lists the specific formulas for each evaluation topic. For LTV, the "favorable" outcome is LTV 0–80% (lower is better). For Income Ratio, the "favorable" outcome is 0–60% (used to measure the gap in low-income concentration).

Evaluation Topic | Metric | Formula
Race & LTV | DI | (# African American Borrowers with LTV 0–80% ÷ Total # African American) ÷ (# Asian Borrowers with LTV 0–80% ÷ Total # Asian)
Race & LTV | SPD | (# African Americans with LTV 0–80% ÷ Total # African Americans) − (# Asians with LTV 0–80% ÷ Total # Asians)
Gender & LTV | DI | (# Female Borrowers with LTV 0–80% ÷ Total # Female) ÷ (# Male Borrowers with LTV 0–80% ÷ Total # Male)
Gender & LTV | SPD | (# Females with LTV 0–80% ÷ Total # Females) − (# Males with LTV 0–80% ÷ Total # Males)
Race & Borrower Income Ratio | DI | (# African Americans with Income Ratio 0–60% ÷ Total # African Americans) ÷ (# Asians with Income Ratio 0–60% ÷ Total # Asians)
Race & Borrower Income Ratio | SPD | (# African Americans with Income Ratio 0–60% ÷ Total # African Americans) − (# Asians with Income Ratio 0–60% ÷ Total # Asians)
Gender & Borrower Income Ratio | DI | (# Females with Income Ratio 0–60% ÷ Total # Females) ÷ (# Males with Income Ratio 0–60% ÷ Total # Males)
Gender & Borrower Income Ratio | SPD | (# Females with Income Ratio 0–60% ÷ Total # Females) − (# Males with Income Ratio 0–60% ÷ Total # Males)

Table 12 — DI and SPD Formulas per Evaluation Topic
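
As a worked example of the Race & LTV formulas, the following minimal sketch recomputes DI and SPD directly from the Table 7 counts. The function and variable names are illustrative assumptions and are not taken from the project notebook; only the counts come from the document.

```python
def favorable_rate(favorable: int, total: int) -> float:
    """Share of a group's borrowers that received the favorable outcome."""
    return favorable / total


def disparate_impact(unprivileged_rate: float, privileged_rate: float) -> float:
    """DI: ratio of the unprivileged rate to the privileged rate."""
    return unprivileged_rate / privileged_rate


def statistical_parity_difference(unprivileged_rate: float, privileged_rate: float) -> float:
    """SPD: unprivileged rate minus privileged rate."""
    return unprivileged_rate - privileged_rate


# LTV bucket counts from Table 7, in the order 0-60%, 61-80%, 81-90%, 91-95%, >95%.
black_ltv = [16_961, 40_878, 13_206, 10_021, 10_038]  # unprivileged group
asian_ltv = [33_522, 68_276, 12_162, 5_542, 1_980]    # privileged group

# The favorable outcome is LTV 0-80%, i.e. the first two buckets.
black_rate = favorable_rate(sum(black_ltv[:2]), sum(black_ltv))
asian_rate = favorable_rate(sum(asian_ltv[:2]), sum(asian_ltv))

di = disparate_impact(black_rate, asian_rate)
spd = statistical_parity_difference(black_rate, asian_rate)
print(f"Race & LTV: DI = {di:.4f}, SPD = {spd:.4f}")
```

Rounded to four decimals, this should agree with the Race & LTV values reported for the original dataset (DI 0.7576, SPD −0.2031).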

Table 11 — Results across all four evaluation combinations (DI and SPD):

Evaluation Topic | DI | DI In Range? | SPD
Race & LTV | 0.7576 | No (below 0.8) | −0.2031
Gender & LTV | 0.9754 | Yes | −0.0189
Race & Borrower Income Ratio | 3.9288 | No (above 1.25) | 0.2417
Gender & Borrower Income Ratio | 2.6607 | No (above 1.25) | 0.1561

Table 11 — DI and SPD for each Protected Class within Borrower Income Ratio and LTV
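
The "DI In Range?" column can be reproduced with a simple threshold check. The sketch below assumes the acceptable band is 0.8 to 1.25 inclusive, which is inferred from the "below 0.8" and "above 1.25" labels in Table 11; the helper name is hypothetical.

```python
DI_LOWER = 0.8   # assumed lower bound, from the "below 0.8" label
DI_UPPER = 1.25  # assumed upper bound, from the "above 1.25" label


def di_in_range(di: float) -> str:
    """Label a Disparate Impact value the way Table 11 does."""
    if di < DI_LOWER:
        return "No (below 0.8)"
    if di > DI_UPPER:
        return "No (above 1.25)"
    return "Yes"


# DI values as reported in Table 11.
table_11_di = {
    "Race & LTV": 0.7576,
    "Gender & LTV": 0.9754,
    "Race & Borrower Income Ratio": 3.9288,
    "Gender & Borrower Income Ratio": 2.6607,
}

labels = {topic: di_in_range(di) for topic, di in table_11_di.items()}
for topic, label in labels.items():
    print(f"{topic}: {label}")
```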

The clearest signal: Race & LTV has a DI of 0.7576 — meaning Black or African American borrowers achieve favorable LTV ratios (0–80%) at only 75.76% the rate of Asian borrowers. The SPD of −0.2031 means there is a 20.3 percentage-point gap in favorable LTV outcomes between the two groups.

Gender & LTV is the one combination within the acceptable DI range (0.9754), with a corresponding SPD of only −0.0189 (about 1.9 percentage points). This stands in contrast to Gender & Borrower Income Ratio, where a DI of 2.6607 shows that female borrowers fall into the lowest income tier at roughly 2.7 times the rate of male borrowers, a structural outcome of the gender wage gap embedded in 2008 lending data.


Step 3 (continued) — Reweighting: Pre-Processing Bias Mitigation

Reweighting is a pre-processing bias mitigation technique: it assigns different weights to training samples so that the distribution across privileged and unprivileged groups becomes statistically balanced — without modifying the underlying feature values or adding synthetic data. It is applied to the dataset before any model is trained.
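
To make the idea concrete, here is a minimal sketch of one common reweighing formulation, in which every (group, outcome) cell receives the weight (group total × outcome total) ÷ (grand total × cell count). Whether the project used exactly this formula is an assumption, since the write-up does not state it. The counts are the Black and Asian rows of Table 7 collapsed into favorable (LTV 0–80%) and unfavorable (LTV above 80%); all names are illustrative.

```python
# Cell counts derived from Table 7: favorable = LTV 0-80%, unfavorable = LTV > 80%.
counts = {
    ("Black", "favorable"): 16_961 + 40_878,
    ("Black", "unfavorable"): 13_206 + 10_021 + 10_038,
    ("Asian", "favorable"): 33_522 + 68_276,
    ("Asian", "unfavorable"): 12_162 + 5_542 + 1_980,
}


def reweighing_weights(cell_counts):
    """Weight per cell: expected count under independence / observed count."""
    n = sum(cell_counts.values())
    group_totals, outcome_totals = {}, {}
    for (group, outcome), count in cell_counts.items():
        group_totals[group] = group_totals.get(group, 0) + count
        outcome_totals[outcome] = outcome_totals.get(outcome, 0) + count
    return {
        (group, outcome): group_totals[group] * outcome_totals[outcome] / (n * count)
        for (group, outcome), count in cell_counts.items()
    }


weights = reweighing_weights(counts)


def weighted_favorable_rate(group):
    """Favorable-outcome rate for a group after applying the sample weights."""
    fav = weights[(group, "favorable")] * counts[(group, "favorable")]
    unf = weights[(group, "unfavorable")] * counts[(group, "unfavorable")]
    return fav / (fav + unf)


weighted_di = weighted_favorable_rate("Black") / weighted_favorable_rate("Asian")
weighted_spd = weighted_favorable_rate("Black") - weighted_favorable_rate("Asian")
print(f"Weighted DI = {weighted_di:.4f}, weighted SPD = {weighted_spd:.4f}")
```

In this idealized two-group form the weighted favorable rates become equal, so DI is 1 and SPD is 0 up to floating-point error. The reweighted values the project reports are close to, but not exactly, those ideals, so its implementation or group definitions may differ from this sketch.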

Table 13 shows the changes after Reweighting is applied to the original dataset (from Table 11):

Fairness Metric | Race & LTV | Gender & LTV | Race & Borrower Income Ratio | Gender & Borrower Income Ratio
Original DI | 0.7576 | 0.9754 | 3.9288 | 2.6607
Reweighted DI | 0.9849 | 0.9949 | 0.9822 | 1.0111
Change from Original (DI) | −0.2273 | −0.0195 | 2.9466 | 1.6497
Original SPD | −0.2031 | −0.0189 | 0.2417 | 0.1561
Reweighted SPD | 0.0020 | 0.0002 | 0.0024 | 0.0016
Change from Original (SPD) | −0.2051 | −0.0191 | 0.2393 | 0.1546

Table 13 — Changes after Reweighting Applied to Original Dataset from Table 11

All four DI values moved into or very close to the ideal range after Reweighting. All four SPD values collapsed to near zero. The technique works on the dataset — the question is whether those improvements hold once a classifier is trained.


Step 4 — Classifier Training and Final Analysis

To test whether bias mitigation survives classifier training, the pipeline was run in two tracks using train/test splits.

Track A — Original Dataset

70% Training / 30% Testing Split

Table 14 shows the split of the original dataset:

Dataset | % of Original | # of Rows
Original Dataset | 100% | 2,558,959
Training Dataset | 70% | 1,791,271
Testing Dataset | 30% | 767,688

Table 14 — Splitting the Original Dataset

A classifier was trained on the 70% training split and fairness metrics were re-evaluated on the 30% test set (Table 15):

Fairness Metric | Race & LTV | Gender & LTV | Race & Borrower Income Ratio | Gender & Borrower Income Ratio
DI | 0.7584 | 0.9763 | 3.9346 | 2.6756
SPD | −0.2025 | −0.0182 | 0.2426 | 0.1569

Table 15 — Fairness Metrics for Testing Dataset After Training Classifier on Original Dataset

Track B — Transformed (Reweighted) Dataset

80% Training / 20% Testing Split

Table 16 shows the split of the transformed (reweighted) dataset:

Dataset | % of Original | # of Rows
Transformed Dataset | 100% | 2,558,959
Training Dataset | 80% | 2,047,167
Testing Dataset | 20% | 511,792

Table 16 — Splitting the Transformed Dataset
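
The row counts in Tables 14 and 16 follow directly from the split fractions. The sketch below shows the arithmetic along with one simple way to draw a shuffled split. The rounding rule, the seed, and the helper names are assumptions for illustration; the project's actual splitting code is not shown in this write-up.

```python
import random

TOTAL_ROWS = 2_558_959  # number of observations reported for the dataset


def split_sizes(n_rows: int, train_fraction: float) -> tuple[int, int]:
    """Return (train_rows, test_rows); the test set takes the remainder."""
    n_train = round(n_rows * train_fraction)
    return n_train, n_rows - n_train


def split_indices(n_rows: int, train_fraction: float, seed: int = 0):
    """Shuffle row indices reproducibly and cut them into train/test lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train, _ = split_sizes(n_rows, train_fraction)
    return indices[:n_train], indices[n_train:]


track_a = split_sizes(TOTAL_ROWS, 0.70)  # original dataset, 70/30
track_b = split_sizes(TOTAL_ROWS, 0.80)  # transformed dataset, 80/20
print(f"Track A: train={track_a[0]:,} test={track_a[1]:,}")
print(f"Track B: train={track_b[0]:,} test={track_b[1]:,}")

# Small demonstration of the index split on 10 rows.
train_idx, test_idx = split_indices(10, 0.70)
```

Note that the two tracks use different split fractions, so the two test sets differ in size as well as in the rows they contain.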

A classifier was trained on the 80% reweighted training data and fairness metrics were re-evaluated on the 20% test set (Table 17):

Fairness Metric | Race & LTV | Gender & LTV | Race & Borrower Income Ratio | Gender & Borrower Income Ratio
DI | 0.7605 | 0.9773 | 3.8923 | 2.6745
SPD | −0.2007 | −0.0174 | 0.2404 | 0.1569

Table 17 — Fairness Metrics for Testing Dataset After Training Classifier on Transformed Dataset

Tables 18–21 show the final analysis demonstrating the full step-by-step progression for each combination:

Table 18 — Race (Independent) to LTV (Dependent) Fairness Metrics

Dataset Analysis Item | Disparate Impact | Change vs. Previous | SPD | Change vs. Previous
Original Dataset | 0.7576 | NA | −0.2031 | NA
After Transforming Dataset | 0.9849 | Positive Change | 0.0020 | Positive Change
After Classifier on Original | 0.7584 | Negative Change | −0.2025 | Negative Change
After Classifier on Transformed | 0.7605 | Very Minimal | −0.2007 | Very Minimal

Table 19 — Gender (Independent) to LTV (Dependent) Fairness Metrics

Dataset Analysis Item | Disparate Impact | Change vs. Previous | SPD | Change vs. Previous
Original Dataset | 0.9754 | NA | −0.0189 | NA
After Transforming Dataset | 0.9949 | Positive Change | 0.0002 | Very Minimal
After Classifier on Original | 0.9763 | Negative Change | −0.0182 | Negative Change
After Classifier on Transformed | 0.9773 | Very Minimal | −0.0174 | Very Minimal

Table 20 — Race (Independent) to Borrower Income Ratio (Dependent) Fairness Metrics

Dataset Analysis Item | Disparate Impact | Change vs. Previous | SPD | Change vs. Previous
Original Dataset | 3.9288 | NA | 0.2417 | NA
After Transforming Dataset | 0.9822 | Positive Change | 0.0024 | Positive Change
After Classifier on Original | 3.9346 | Negative Change | 0.2426 | Negative Change
After Classifier on Transformed | 3.8923 | Very Minimal | 0.2404 | Very Minimal

Table 21 — Gender (Independent) to Borrower Income Ratio (Dependent) Fairness Metrics

Dataset Analysis Item | Disparate Impact | Change vs. Previous | SPD | Change vs. Previous
Original Dataset | 2.6607 | NA | 0.1561 | NA
After Transforming Dataset | 1.0111 | Positive Change | 0.0016 | Positive Change
After Classifier on Original | 2.6756 | Negative Change | 0.1569 | Negative Change
After Classifier on Transformed | 2.6745 | Very Minimal | 0.1569 | Very Minimal

Reweighting successfully balanced the dataset — but the classifier trained on that balanced dataset reverted the fairness metrics almost entirely back to the original biased values. Pre-processing alone is not sufficient when the structural patterns driving the disparity are deeply embedded in the features themselves.
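
One way to quantify "reverted almost entirely" is to measure how far each DI value sits from parity (DI = 1) and ask what share of the dataset-level improvement is still present after training on the transformed data. The gap measure |DI − 1| and the "retained share" are illustrative summary statistics introduced here, not metrics used in the project; the DI values are the ones reported in Tables 18–21.

```python
# DI at each stage, in the order: original dataset, after transforming the
# dataset, after classifier on original, after classifier on transformed.
di_by_combination = {
    "Race & LTV": [0.7576, 0.9849, 0.7584, 0.7605],
    "Gender & LTV": [0.9754, 0.9949, 0.9763, 0.9773],
    "Race & Borrower Income Ratio": [3.9288, 0.9822, 3.9346, 3.8923],
    "Gender & Borrower Income Ratio": [2.6607, 1.0111, 2.6756, 2.6745],
}


def gap_from_parity(di: float) -> float:
    """Distance of a Disparate Impact value from the ideal value of 1."""
    return abs(di - 1.0)


def retained_share(stages: list[float]) -> float:
    """Fraction of the reweighting improvement that survives classifier training."""
    original, reweighted, _, classifier_on_transformed = stages
    gained = gap_from_parity(original) - gap_from_parity(reweighted)
    kept = gap_from_parity(original) - gap_from_parity(classifier_on_transformed)
    return kept / gained


retained = {name: retained_share(stages) for name, stages in di_by_combination.items()}
for name, share in retained.items():
    print(f"{name}: {share:.1%} of the DI improvement retained")
```

Under this measure, every combination retains only a small fraction of the improvement seen at the dataset level, and the Gender & Borrower Income Ratio value comes out slightly negative because the post-training DI is marginally further from parity than the original.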

This is the core finding: bias mitigation must extend beyond the dataset into the model training process (in-processing) and, in some cases, into post-processing adjustments to model outputs. For the Fannie Mae 2008 data, the income and LTV disparities between racial and gender groups are not artifacts of sampling — they are features. A classifier trained to predict outcomes from those features will learn to perpetuate them.


Step 5 — Team Confirmation

I am a team of one.

Appendix 3.1 — Source Code for Step 2: Explore the Dataset

The following is the source code for Step 2 (Explore the Dataset). It reads the Fannie Mae text file and produces the four frequency distribution tables and bar graphs (Tables 6–9, Plots 1–4).

# STEP 2: Explore the Dataset
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Created 4 Tables and associated Graphs to compare the 2 selected Protected Classes
# (Race and Gender) with the 2 Dependent/Outcome Variables
# (Borrower Income Ratio and LTV Ratio at Origination)
def analyze_loans(file_path):
    try:
        races, genders, incomes, ltvs = [], [], [], []

        with open(file_path, 'r') as file:
            for line in file:
                fields = line.strip().split()

                # Field #'s are from Enterprise Public Use Database: Single-Family Properties:
                # National File A: RELEASE OF 2008 DATA
                races.append(int(fields[9]))    # Field 10
                genders.append(int(fields[11]))  # Field 12
                incomes.append(int(fields[5]))   # Field 6
                ltvs.append(int(fields[6]))     # Field 7

        df = pd.DataFrame({
            'Race':   races,
            'Gender': genders,
            'Income': incomes,
            'LTV':    ltvs
        })

        race_labels = {
            1: 'American Indian/Alaska Native',
            2: 'Asian',
            3: 'Black/African American',
            4: 'Native Hawaiian/Pacific Islander',
            5: 'White',
            6: 'Two or more races',
        }
        gender_labels = {1: 'Male', 2: 'Female'}
        income_labels = {1: '0-60%', 2: '>60-100%', 3: '>100%'}
        ltv_labels    = {1: '0-60%', 2: '61-80%', 3: '81-90%', 4: '91-95%', 5: '>95%'}

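        # Codes without a label above (race 7 and 9; gender 3, 4, and 9) map to
        # NaN and are therefore left out of the pd.crosstab tables below.
        df['Race']   = df['Race'].map(race_labels)
        df['Gender'] = df['Gender'].map(gender_labels)
        df['Income'] = df['Income'].map(income_labels)
        df['LTV']    = df['LTV'].map(ltv_labels)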

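        # pd.crosstab sorts string column labels alphabetically ('>100%' would
        # come before '>60-100%'), so reindex to keep buckets in ascending order.
        income_order = list(income_labels.values())
        ltv_order    = list(ltv_labels.values())
        tables = {
            'Race_Income':   pd.crosstab(df['Race'],   df['Income']).reindex(columns=income_order, fill_value=0),
            'Race_LTV':      pd.crosstab(df['Race'],   df['LTV']).reindex(columns=ltv_order, fill_value=0),
            'Gender_Income': pd.crosstab(df['Gender'], df['Income']).reindex(columns=income_order, fill_value=0),
            'Gender_LTV':    pd.crosstab(df['Gender'], df['LTV']).reindex(columns=ltv_order, fill_value=0)
        }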

        def create_bar_plot(data, title, plot_number):
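            # Create the axes explicitly and hand them to pandas. Calling
            # plt.figure() first leaves an empty extra figure behind, because
            # DataFrame.plot() opens its own figure when no ax is given.
            fig, ax = plt.subplots(figsize=(15, 8))
            colors = plt.cm.Set3(np.linspace(0, 1, len(data.columns)))
            data.plot(kind='bar', color=colors, ax=ax)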
            plt.title(f'{title} (Plot {plot_number})', fontsize=14, pad=20)
            plt.xlabel('Categories', fontsize=12)
            plt.ylabel('Count', fontsize=12)
            plt.xticks(rotation=45, ha='right')
            plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
            plt.grid(True, axis='y', linestyle='--', alpha=0.7)
            plt.tight_layout()
            plt.show()

        plot_titles = {
            'Race_Income':  'Distribution of Income Ratio by Race',
            'Race_LTV':     'Distribution of LTV by Race',
            'Gender_Income': 'Distribution of Income Ratio by Gender',
            'Gender_LTV':   'Distribution of LTV by Gender'
        }

        for num, (key, table) in enumerate(tables.items(), 1):
            print(f"\nTable {num}: {plot_titles[key]}")
            print("-" * 80)
            table['Total'] = table.sum(axis=1)
            print("\nCounts:")
            print(table)
            plot_data = table.drop('Total', axis=1)
            create_bar_plot(plot_data, plot_titles[key], num)

    except Exception as e:
        print(f"Error processing file: {str(e)}")

file_path = "/Volumes/2.0 Seagate Backup/Learning/OMSCS/CS 6603/CS 6603 - Final Project/fnma_sf2008a_loans.txt"

if __name__ == "__main__":
    plt.style.use('default')
    analyze_loans(file_path)
    plt.close('all')

Why This Matters Beyond the Numbers

The 2008 housing crisis did not affect all borrowers equally. Higher LTV ratios at origination — concentrated among Black borrowers — meant less equity to absorb falling home prices. The income ratio disparity meant fewer resources to weather payment shocks. These were not random distributions; they were the product of decades of redlining, discriminatory lending, and unequal access to wealth-building assets.

When machine learning models are trained on this data without fairness constraints, they don't just predict outcomes — they encode historical discrimination as objective signal. A model trained to minimize prediction error on 2008 Fannie Mae data will, by construction, treat race and income-correlated features as predictive of risk, because in that dataset, they were.

The technical contribution of this project is the empirical demonstration that standard pre-processing (Reweighting) addresses dataset-level distributional bias but does not address the deeper issue: the features themselves carry the signal of historical inequity. Regulatory frameworks like the Equal Credit Opportunity Act (ECOA) prohibit explicit use of protected class variables — but they cannot prohibit models from learning proxies.

Key Takeaway

Measuring fairness is straightforward. Achieving it is not. Reweighting improves dataset statistics but does not change what a classifier learns. Structural bias in lending data requires structural solutions — both technical (in-processing and post-processing mitigation) and regulatory (auditable model requirements, disparate impact testing mandates, and ongoing monitoring across protected classes).


Technical Approach

The full analysis was implemented in a Jupyter Notebook, with the following pipeline:

View the Full Paper & Code

Source code (Jupyter Notebook) and the complete project report are available on GitHub.