Skip to main content

Import Normalization Rules

The PowerSeller Desktop App uses a multi-layer system to normalize incoming loan tape data from different sellers. Each seller sends data in their own format with their own terminology — the normalization rules translate everything into PowerSeller's standard schema.

Source: Azure SQL MI (PS608 / Watermark TPO), extracted 2026-03-17.

The Problem

Every seller's loan tape uses different:

  • Column names ("Loan Amount" vs "UPB" vs "Principal Balance" vs "loan_amount")
  • Value conventions ("Conv noPMI" vs "Conventional" vs "CONV" for the same loan type)
  • Data formats (LTV as 80.0 vs 0.80, dates as MM/DD/YYYY vs YYYY-MM-DD)
  • Product codes ("CONVENTIONAL 30-YEAR FIXED" vs "CF30" vs "Conv 30yr")
  • File formats (CSV, fixed-width, pipe-delimited, different column orders)

Users historically spend significant time writing SQL syntax rules to normalize each new seller's data. At WTPO, there are 1,610 normalization rules across 8 sellers.

Architecture

Layer 1: File Maps (47 maps, 1,618 column definitions)

Each seller/format combination has a file map that defines how to parse the incoming file.

Source: pxcat_external_file_maps + pxcat_external_file_map_cols

FieldPurpose
map_nameUnique identifier (e.g., "Citizens Import")
map_typeImport or export
table_nameTarget table (usually psbid_loans)
delimiter_nameField delimiter
date_formatDate parsing format
conversion_group_nameWhich value conversion rules to apply
conv_expr_group_nameWhich expression conversion rules to apply

Each map has column definitions specifying:

  • column_name — target column in PowerSeller
  • file_column_contents — source column name or position in the file
  • column_type — data type
  • length — field length
  • letter_case — case conversion
  • format — display format
  • import_blanks — how to handle empty values

Layer 2: Simple Value Mapping (1,016 rules)

Direct find-and-replace rules organized by seller. When a value in the incoming file matches from_value, it's replaced with to_value.

Source: pxcat_conversion_values

Conversion Groups at WTPO

SellerTarget TableRule Count
Citizenspsbid_loans~100+
First Alliancepsbid_loansvaries
First Homepsbid_loansvaries
Kind Lendingpsbid_loansvaries
LenderWorks FHMpsbid_loansvaries
LenderWorks ICMpsbid_loansvaries
Loan Steadypsbid_loansvaries
Mortgage Onepsbid_loansvaries
US Mortgagepsbid_loansvaries
Notionalloan_notionalvaries

Also includes export conversion groups for agency delivery (FHLMC, GNMA, Mornet).

Example Rules (Citizens)

ColumnFrom ValueTo Value
loan_typeConv noPMIConventional
loan_typeConv wPMIConventional
property_typeCondoCondominium
property_typeSingle Family DetachedSFR
amort_typeAmort ARMAdjustableRate
first_time_borrowerYtrue
first_time_borrowerNfalse
first_time_borrowerNULLfalse
lienprioritytypeFirstFirst Lien

Every seller needs their own set of these mappings because every seller uses different terminology.

Layer 3: Expression-Based Conversions (594 rules)

SQL syntax rules that evaluate conditions against other columns to compute derived values. More powerful than simple mapping — handles conditional logic and calculations.

Source: pxcat_conversion_expr_values

Example Rules (Citizens)

Product code to instrument mapping:

Syntax (IF condition)ColumnTo Value
product_code = 'CONVENTIONAL 30-YEAR FIXED'instrument_nameCF30
product_code = 'CONV 30 YR FIXED RATE HOME POSSIBLE'instrument_nameHP30
product_code = 'FHA FIXED 25-YEAR (GNMA II)'instrument_nameFF30
product_code = 'HomePossible 30yr Fixed'instrument_nameHP30
product_code = 'CONV 30-YEAR FIXED HOME READY'instrument_nameHR30

Computed values:

SyntaxColumnTo Value
1=1 (always)cltvcltv*100 (decimal to percentage)
1=1 (always)import_los_source'Citizens' (stamp the source)
1=1 (always)import_los_dategetdate()-0 (current timestamp)
refinancecashouttype = 'Refi Equity Takeout'cash_out_indicator'true'
refinancecashouttype <> 'Refi Equity Takeout'cash_out_indicator'false'

Why This Is Painful

For each new seller, someone (usually Jay or a power user) must:

  1. Get a sample loan tape from the seller
  2. Identify which columns map to which PowerSeller fields
  3. Identify all unique values in each field
  4. Write simple conversion rules for every non-standard value
  5. Write expression rules for conditional logic (product code → instrument)
  6. Test the import and fix mismatches
  7. Repeat when the seller changes their format

This process can take hours to days per seller and is error-prone. Changes to a seller's file format require manual rule updates.

How PSX Replaces This

PowerSeller X's loan tape normalizer (engine/field_registry.py) solves this problem programmatically:

Desktop App ApproachPSX approach
Manual SQL rules per sellerAutomatic field detection by name/alias
1,016 simple mapping rulesBuilt-in canonical value mappings
594 expression rulesProgrammatic transformations
Hours/days per new sellerMinutes — upload and auto-detect
Breaks when format changesFuzzy matching handles variations
Per-customer configurationCentralized, learns from all sellers

The PSX field registry maintains a dictionary of known aliases for each canonical field (e.g., loan_amount is also known as "UPB", "unpaid principal balance", "Principal Balance", "loan_amount", "baseloanamount"). When a seller uploads a loan tape, the normalizer matches columns automatically rather than requiring manual mapping rules.

This is a key value proposition of PSX over the Desktop App's import workflow.

Tables Reference

TableRowsPurpose
pxcat_external_file_maps47File format definitions (one per seller/format)
pxcat_external_file_map_cols1,618Column mapping within each file format
pxcat_conversion_groups25Named groups of value conversion rules
pxcat_conversion_values1,016Simple from → to value mappings
pxcat_conversion_expr_groups18Named groups of expression conversion rules
pxcat_conversion_expr_values594SQL syntax-based conditional conversions
pxcat_column_mappings27Column-level mapping definitions
pxcat_column_mapping_contexts2Mapping context definitions
pxcat_column_types14Supported column data types