PropRaven Marketplace — Data Dictionary v1.7

Snowflake Marketplace listing reference. All entities live in PROPRAVEN_MARKETPLACE.


Layer overview

SchemaPurposeTier
SILVERCanonical entities with stable PRPV IDsCore, Owner Graph, Intelligence, Lending
GOLD_GRAPHRelationship edges (ownership, transactions)Owner Graph, Intelligence, Lending
GOLD_INTELLIGENCEPre-computed scores + signalsIntelligence, Lending
LENDINGVertical bundle: 4.3M loans + 400K lenders pre-joinedLending
INSURANCEVertical bundle: 326M parcels with multi-peril risk vectorInsurance
SFR, CRE, GOVTECHVertical bundles (parcel cohorts)Vertical listings
PREVIEWFree Mecklenburg-only sample of every entityPreview
METAData dictionary, lineage, run audit(internal)

Customers query SILVER + GOLD; META is operational.


Stable IDs

Every PropRaven entity carries a prpv_<entity>_id that is deterministic and forever-stable:

PRPV_<ENTITY>_<HEX16>

Example: PRPV_PARCEL_B23F0E32A9BBDE04. Survives source updates, county APN re-issuance, and resolution-method changes within a major version.


System columns (every SILVER + GOLD table)

columntypemeaning
_loaded_attimestamp_ntzWhen this row entered the table
_source_as_oftimestamp_ntzSource-reported as-of
_source_systemstringBRONZE table of origin
_resolution_versionstringResolution run that produced this row
_match_methodstringdeterministic / probabilistic / name_classified / etc.
_confidence_scorefloat0–1; lower means more uncertainty

Filtering by confidence is a first-class operation. Production decisioning queries should use WHERE _confidence_score >= 0.95. Default tier (Core) exposes all rows; Intelligence tier lets customers filter.


SILVER entities

SILVER.PARCEL — 230,913,480 rows

columntypedescription
prpv_parcel_idvarcharStable Parcel ID
prpv_geography_idvarcharFK → GEOGRAPHY
state_fipsvarchar(2)
county_fipsvarchar(3)
apnvarcharVerbatim county parcel number
apn_normalizedvarcharDigits-stripped, upper-cased — used for joining
lot_size_sqftnumber
lot_size_acresnumberComputed if sqft present (1ac = 43,560 sqft)
zoning_code_rawvarchar
zoning_classvarchar(v1.1: normalized R/C/I/M/A/MX/O)
land_use_code_rawvarchar
land_use_classvarchar
legal_descriptionvarchar
geometrygeographyWKT polygon, EPSG:4326
centroid_latfloat
centroid_lonfloat
has_polygonbooleanPre-computed for fast filters
is_activebooleanFalse if subdivided/merged out

Cluster key: (state_fips, county_fips). Source provenance: PROPZILLA.CURATED.PARCEL_ENRICHED + PROPZILLA.CURATED.PARCEL_POLYGONS.


SILVER.PROPERTY — 230,913,480 rows (v1.3 property_type_class fix applied)

columntypedescription
prpv_property_idvarcharStable Property ID (1:1 with parcel in v1)
prpv_parcel_idvarcharFK → PARCEL
property_type_rawvarcharCounty classification verbatim
property_type_classvarcharCanonical MISMO-derived code: SFR / MF_SMALL / MF / CONDO / COOP / MOBILE / COMMERCIAL / INDUSTRIAL / AGRICULTURAL / VACANT / MIXED / INSTITUTIONAL / OTHER. v1.4 coverage: 96.37% national. 5-tier classifier with explicit confidence: T1 deterministic (1.00, MISMO direct), T2 source-normalized (0.95, CamelCase→canonical), T3 land-use inferred (0.85), T4 zoning inferred (0.75), T5 structural inferred (0.50, year-built+lot-size+improvement-value derivation for floor-locked states). 4% NULL = genuinely no-signal parcels (track via _match_method = 'no_signal', _confidence_score = 0). Filter WHERE _confidence_score >= 0.75 for production decisioning.
year_builtint
effective_year_builtintAfter major renovation
gross_building_area_sqftnumber
unit_countint
bedroom_countint
bathroom_countfloat
is_owner_occupiedbooleanMailing address matches situs after normalization

v1 limitation: condo units collapse to one Property per parcel with unit_count > 1. Per-unit modeling deferred to v1.1.


SILVER.OWNER — 27,175,989 rows (post-canonical-merge v1.1)

columntypedescription
prpv_owner_idvarcharStable Owner ID
source_entity_idvarcharInternal Propzilla cluster ID (er-* prefix)
name_rawvarcharOwner name as recorded
name_normalizedvarcharPunctuation-stripped, upper
entity_typevarcharINDIVIDUAL / LLC / CORP / LP / TRUST / GOVT / NONPROFIT / ESTATE / RELIGIOUS / FINANCIAL / OTHER / UNKNOWN
state_of_formationvarchar(2)
formation_datedate
is_active_entityboolean
registered_agent_namevarchar(v1.1: SoS-enriched)
parcel_countintDerived rollup
total_assessed_valuenumberDerived rollup

Resolution methodology: PE.OWNER_ENTITY_ID cluster + name-based entity-type classifier. Confidence varies:

  • 1.00 — exact name match in known-institutional list (Tier A)
  • 0.95 — entity type matched by name regex (Tier B)
  • 0.85 — er-i-* individual cluster (Tier C)

SILVER.TAX_RECORD — 188,095,269 rows

columntypedescription
prpv_tax_record_idvarcharStable Tax Record ID
prpv_parcel_idvarcharFK
tax_yearint
assessed_value_totalnumber
assessed_value_landnumber
assessed_value_improvementnumber
assessed_value_personalnumber
market_value_estimatenumber
taxable_value_totalnumber
tax_amountnumber(v1.1: from TAX_BILL_HISTORY)
is_delinquentboolean(v1.1)
homestead_exemptionboolean(v1.1)

v1 source: PROPZILLA.TAX_ROLL.ASSESSMENT_HISTORY. Multi-year history available.


SILVER.TRANSACTION — 21,100,000 rows (v1.5)

Title-transferring deeds only (warranty / quitclaim / sheriff / foreclosure). Mortgage-only documents are filtered into SILVER.LOAN.

columntypedescription
prpv_transaction_idvarcharStable Transaction ID
prpv_parcel_idvarcharFK (71% of rows match a known parcel)
recorded_datedate
sale_datedate
sale_pricenumber
doc_type_rawvarcharCounty classification verbatim
doc_type_classvarcharNormalized: WARRANTY / QUITCLAIM / SHERIFF / FORECLOSURE / DEED_IN_LIEU / TAX / EXECUTOR / TRUSTEE / OTHER
is_title_transferbooleanTrue for v1 cohort
is_distressedbooleansheriff/foreclosure/deed_in_lieu/tax — 186K nationally
grantor_name, grantee_namevarchar
book, page, instrument_numbervarchar

Source: PROPZILLA.RAW.DEED_TRANSACTIONS filtered by doc_type normalizer UDF.


SILVER.LOAN — 4,290,000 rows (v1.6)

Lien-creating documents: mortgage / deed-of-trust / HELOC / commercial / construction.

columntypedescription
prpv_loan_idvarcharStable Loan ID
prpv_parcel_idvarcharFK
lender_name_rawvarcharAs-recorded
lender_name_normalizedvarcharSuffix-collapsed (NA/FA/INC/CORP/LLC/LP) for canonical join
borrower_namevarchar
originated_upbnumberOriginal principal
loan_type_classvarcharMORTGAGE / DOT / HELOC / COMMERCIAL / CONSTRUCTION / OTHER
recorded_datedate
is_mers_nomineebooleanMERS nominee flag

Source: PROPZILLA.RAW.DEED_TRANSACTIONS filtered by mortgage doc_type. Top originators (by UPB): MERS $152.7B nominee, JPMorgan $54.7B, BofA $40.6B, Wells Fargo $37.0B, Citibank $31.4B.


SILVER.ADDRESS — 146,300,000 rows (v1.6)

Canonical USPS-deduped address index. Powers strict mailing-address equality joins (cf. PARCEL_ABSENTEE_STRICT_HIGHPREC).

columntypedescription
prpv_address_idvarcharStable Address ID
line1, city, state, zip5varcharCanonical components
is_po_boxboolean
parcel_count_situsintParcels using this as situs address
parcel_count_mailingintOwners using this as mailing address

SILVER.PERMIT — 44,776,760 rows

Only permits with a matched parcel are surfaced in v1. Unmatched permits remain in BRONZE.

columntypedescription
prpv_permit_idvarcharStable Permit ID
prpv_parcel_idvarcharFK
permit_numbervarchar
permit_typevarcharNormalized: NEW_CONSTRUCTION / ADDITION / RENOVATION / DEMOLITION / ROOF / ELECTRICAL / PLUMBING / MECHANICAL / OTHER
permit_type_rawvarchar
statusvarcharISSUED / IN_PROGRESS / FINALIZED / EXPIRED / REVOKED
valuationnumberReported job value
fee_amountnumber
filed_datedate
issued_datedate
finalized_datedate
applicant_name, contractor_name, contractor_license, owner_namevarchar
jurisdiction_idvarchar

SILVER.GEOGRAPHY — 1,827 rows

Geographic hierarchy roll-up. Join target for any spatial filter.

columntypedescription
prpv_geography_idvarchar
state_fips, state_abbr
county_fips (3-char), county_fips_5 (5-char)
cbsa_code, cbsa_name, csa_code, csa_name
regionNORTHEAST / MIDWEST / SOUTH / WEST
divisionCensus division (NEW_ENGLAND / MIDDLE_ATLANTIC / etc.)

GOLD_GRAPH edges

GOLD_GRAPH.EDGE_OWNER_PARCEL — 162,476,141 edges (post-canonical-merge v1.1)

columndescription
prpv_owner_id, prpv_parcel_idcomposite PK
effective_from, effective_totime-varying ownership (v1: latest only)
is_currentalways TRUE in v1
ownership_share1.0 (joint owners deferred to v1.1)

GOLD_GRAPH.EDGE_TRANSACTION_PARTY — 34,700,000 edges (v1.7)

GRANTOR/GRANTEE roles per transaction with owner-id resolution. 26% (9.0M) match a canonical owner via name; the rest carry prpv_owner_id = NULL with match_method='unmatched'.

columndescription
prpv_transaction_id, party_rolecomposite PK; party_role ∈ {GRANTOR, GRANTEE}
party_name, party_name_normalizedas-recorded + UPPER-stripped form
prpv_owner_idFK → SILVER.OWNER when name match succeeds
match_methodexact_name / unmatched
confidence_score0.85 when matched, 0.0 otherwise

Top grantees (matched only): Baton Rouge LA (6,200), NVR/Ryan Homes (4,191), HUD (4,151), Wells Fargo, IL DOT.


GOLD_INTELLIGENCE signals

PARCEL_ABSENTEE — 230,913,480 rows

Per-parcel: is_absentee (boolean), absentee_class (LOCAL_ABSENTEE / OUT_OF_COUNTY / OUT_OF_STATE / PO_BOX), confidence_score. Methodology: street-prefix comparison after normalization; confidence_score >= 0.95 requires both ZIP + state evidence.

OWNER_PORTFOLIO — 27,175,989 rows (post-canonical-merge v1.1)

Per-canonical-owner: parcel_count, total_assessed_value, state_count, county_count, property-type mix counts (SFR / MF / commercial / industrial / vacant / other), avg_year_built, primary_state.

PARCEL_INSTITUTIONAL_OWNED — 165,995,168 rows (rebuild after canonical-merge pending)

Per-parcel: is_institutional_owned, institutional_tier (A=known-list / B=>=10 parcels / C=>=5 parcels), is_known_institutional. Tier A confidence 1.0 (curated list match). Tier B 0.90, Tier C 0.75.

PARCEL_RECENT_TRANSFER — 230,913,480 rows

Per-parcel: last_sale_date, last_sale_price, days_since_sale, recency_bucket (LAST_90 / LAST_180 / LAST_365 / OVER_365 / NEVER), boolean flags per bucket. Note: PE.LAST_SALE_DATE has known noise (some 2026 dates from upstream backfills). Filter confidence_score >= 0.95 (requires both date + price) for clean signal.


PARCEL_DISTRESS — 230,913,480 rows (v1.5)

7-indicator composite distress score per parcel.

columntypedescription
is_distressedbooleanTrue when ≥2 indicators fire
distress_scoreint0–7, sum of fired indicators
signals_countintsame as score
f1_distressed_deedbooleansheriff / foreclosure / deed-in-lieu / tax deed in last 5y
f2_low_price_salebooleanlast sale ≤ 50% of latest assessed value
f3_llc_recent_transferbooleanLLC recent grantee within 24mo
f4_old_no_permitsbooleanyear_built < 1960 AND zero permits in 10y
f5_low_imp_to_landbooleanimprovement_value < 30% of land_value
f6_multi_distressbooleanoverlay of ≥2 of f1-f5
f7_vacant_oldbooleanproperty_type=VACANT AND year_built < 1980
confidence_scorefloatscales with signals_count

PARCEL_VALUE_ADD_CANDIDATE — 280,000,000 rows (v1.7)

6-indicator composite for redevelopment / repositioning targeting.

columntypedescription
is_value_add_candidatebooleanTrue for HIGH/MEDIUM tiers
value_add_scoreint0–6, sum of fired indicators
value_add_tiervarcharHIGH (4+) / MEDIUM (2-3) / LOW (0-1)
vai1_old_low_impbooleanyear_built < 1980 AND improvement < 40% land
vai2_recent_permitbooleanpermit issued in last 24mo
vai3_sub_market_salebooleanlast sale ≤ 70% of CBSA median PPSF
vai4_inst_recentbooleaninstitutional grantee within 24mo
vai5_mf_mixbooleanMF property in active CBSA
vai6_active_metrobooleanparcel in top-100 transaction-velocity CBSA

v1.7 rollout: 13,234 HIGH-tier targets, 8.26M MEDIUM nationally.


LENDER_EXPOSURE — 399,681 rows (v1.7)

Per-canonical-lender rollup. Suffix-collapsed across NA / FA / N.A. / INC / CORP / LLC / LP / BANK forms.

columntypedescription
lender_name_canonicalvarcharCanonical form
loan_countint
originated_upb_totalnumberSum of original principal
parcel_countintDistinct collateral parcels
primary_statevarchar(2)Largest-state by loan count
state_countint
loan_type_mixobjectOBJECT_AGG of loan_type_class → count
avg_origination_yearfloat
mers_nominee_sharefloatFraction of loans flagged as MERS-nominee

v1.7 canonicalization: Citibank entries 89K + 89K → 197K consolidated; JPMorgan 92K + 46K → 172K. Use lender_name_canonical for portfolio queries.



INSURANCE bundle (v1.7)

INSURANCE.PARCEL_RISK — 325,897,358 rows

Wide per-parcel multi-peril risk vector. Single join target for carriers / reinsurers / cat modelers.

column groupcolumns
identityprpv_parcel_id, state_fips, county_fips, census_tract, latitude, longitude
FEMA NRInri_overall_rating, nri_overall_score, nri_expected_annual_loss, nri_hurricane_rating, nri_tornado_rating, nri_wildfire_rating, nri_earthquake_rating, nri_social_vulnerability_rating, nri_community_resilience_rating
FEMA NFIPfema_flood_zone, flood_risk_score, flood_risk_category, tract_flood_claims_count, tract_flood_loss_ratio, historical_flood_damage_usd
FEMA Disastersfema_flood_disasters, fema_fire_disasters, fema_hurricane_disasters, fema_tornado_disasters, fema_severe_storm_disasters
NOAA Storm Events 30ynoaa_storm_count_30y, noaa_hurricane_count_30y, noaa_tornado_count_30y, noaa_hail_count_30y, noaa_flood_count_30y, noaa_wildfire_count_30y, noaa_storm_deaths_30y, noaa_storm_property_damage_usd_30y, noaa_top_storm_type, noaa_storm_score
USGS / ASCE7 seismicseismic_design_category, seismic_short_period_accel, seismic_one_sec_accel, seismic_design_short, seismic_design_one_sec, seismic_pgam
CDC SVIsvi_overall_pctile, svi_socioeconomic_pctile, svi_household_pctile, svi_minority_pctile, svi_housing_transport_pctile
Wildfirewildfire_score
Compositeclimate_risk_label, climate_risk_tier, env_risk_flag, lead_paint_child_risk_rate, nearest_fire_station_mi

Coverage: 305.0M (94%) NRI-rated. Sourced from PROPZILLA.CURATED.PARCEL_ENRICHED.

INSURANCE.PARCEL_HIGH_PERIL_FLAGS — 325,897,358 rows

Boolean flags for fast cohort filtering. Each flag fires on the worst NRI tier (VERY HIGH or RELATIVELY HIGH) or its scalar equivalent.

flagcriterioncount
high_peril_hurricaneNRI hurricane rating ≥ relatively-high20.1M
high_peril_tornadoNRI tornado rating ≥ relatively-highvaries
high_peril_wildfireNRI wildfire rating ≥ relatively-high71.9M
high_peril_earthquakeNRI earthquake rating ≥ relatively-high31.5M
in_sfha_flood_zoneFEMA flood zone ∈ {A, AE, AH, AO, AR, A99, V, VE}3.4M
high_flood_claim_historyNFIP claims ≥ 100 in tractvaries
high_seismic_design_catASCE7 SDC ∈ {D, E, F}35.1M
high_climate_risk_tiercomposite climate tier ≥ HIGHvaries
high_wildfire_scorewildfire score ≥ 70varies

META schema (internal)

  • META.ENTITY_DICTIONARY — entity-level descriptions
  • META.ATTRIBUTE_DICTIONARY — column-level (lazy-populated)
  • META.SOURCE_LINEAGE — which BRONZE table feeds each SILVER entity
  • META.RESOLUTION_VERSIONS — semver-tracked resolution runs
  • META.BUILD_RUNS — operational audit per build

Versioning policy

  • Major version (v1, v2) — breaking schema changes (column drops, type changes). Old major lives in parallel listing for 90 days.
  • Minor version (v1.1, v1.2) — additive (new entities, new columns, refined methodology). Backwards compatible.
  • Resolution version (v1.0.0-national) — stamped on every row; pin queries to a specific resolution if reproducibility matters.