PropRaven Marketplace — Data Dictionary v1.7
Snowflake Marketplace listing reference. All entities live in PROPRAVEN_MARKETPLACE.
Layer overview
| Schema | Purpose | Tier |
|---|---|---|
SILVER | Canonical entities with stable PRPV IDs | Core, Owner Graph, Intelligence, Lending |
GOLD_GRAPH | Relationship edges (ownership, transactions) | Owner Graph, Intelligence, Lending |
GOLD_INTELLIGENCE | Pre-computed scores + signals | Intelligence, Lending |
LENDING | Vertical bundle: 4.3M loans + 400K lenders pre-joined | Lending |
INSURANCE | Vertical bundle: 326M parcels with multi-peril risk vector | Insurance |
SFR, CRE, GOVTECH | Vertical bundles (parcel cohorts) | Vertical listings |
PREVIEW | Free Mecklenburg-only sample of every entity | Preview |
META | Data dictionary, lineage, run audit | (internal) |
Customers query SILVER + GOLD; META is operational.
Stable IDs
Every PropRaven entity carries a prpv_<entity>_id that is deterministic and forever-stable:
PRPV_<ENTITY>_<HEX16>
Example: PRPV_PARCEL_B23F0E32A9BBDE04. Survives source updates, county APN re-issuance, and resolution-method changes within a major version.
System columns (every SILVER + GOLD table)
| column | type | meaning |
|---|---|---|
_loaded_at | timestamp_ntz | When this row entered the table |
_source_as_of | timestamp_ntz | Source-reported as-of |
_source_system | string | BRONZE table of origin |
_resolution_version | string | Resolution run that produced this row |
_match_method | string | deterministic / probabilistic / name_classified / etc. |
_confidence_score | float | 0–1; lower means more uncertainty |
Filtering by confidence is a first-class operation. Production decisioning queries should use WHERE _confidence_score >= 0.95. Default tier (Core) exposes all rows; Intelligence tier lets customers filter.
SILVER entities
SILVER.PARCEL — 230,913,480 rows
| column | type | description |
|---|---|---|
prpv_parcel_id | varchar | Stable Parcel ID |
prpv_geography_id | varchar | FK → GEOGRAPHY |
state_fips | varchar(2) | |
county_fips | varchar(3) | |
apn | varchar | Verbatim county parcel number |
apn_normalized | varchar | Digits-stripped, upper-cased — used for joining |
lot_size_sqft | number | |
lot_size_acres | number | Computed if sqft present (1ac = 43,560 sqft) |
zoning_code_raw | varchar | |
zoning_class | varchar | (v1.1: normalized R/C/I/M/A/MX/O) |
land_use_code_raw | varchar | |
land_use_class | varchar | |
legal_description | varchar | |
geometry | geography | WKT polygon, EPSG:4326 |
centroid_lat | float | |
centroid_lon | float | |
has_polygon | boolean | Pre-computed for fast filters |
is_active | boolean | False if subdivided/merged out |
Cluster key: (state_fips, county_fips).
Source provenance: PROPZILLA.CURATED.PARCEL_ENRICHED + PROPZILLA.CURATED.PARCEL_POLYGONS.
SILVER.PROPERTY — 230,913,480 rows (v1.3 property_type_class fix applied)
| column | type | description |
|---|---|---|
prpv_property_id | varchar | Stable Property ID (1:1 with parcel in v1) |
prpv_parcel_id | varchar | FK → PARCEL |
property_type_raw | varchar | County classification verbatim |
property_type_class | varchar | Canonical MISMO-derived code: SFR / MF_SMALL / MF / CONDO / COOP / MOBILE / COMMERCIAL / INDUSTRIAL / AGRICULTURAL / VACANT / MIXED / INSTITUTIONAL / OTHER. v1.4 coverage: 96.37% national. 5-tier classifier with explicit confidence: T1 deterministic (1.00, MISMO direct), T2 source-normalized (0.95, CamelCase→canonical), T3 land-use inferred (0.85), T4 zoning inferred (0.75), T5 structural inferred (0.50, year-built+lot-size+improvement-value derivation for floor-locked states). 4% NULL = genuinely no-signal parcels (track via _match_method = 'no_signal', _confidence_score = 0). Filter WHERE _confidence_score >= 0.75 for production decisioning. |
year_built | int | |
effective_year_built | int | After major renovation |
gross_building_area_sqft | number | |
unit_count | int | |
bedroom_count | int | |
bathroom_count | float | |
is_owner_occupied | boolean | Mailing address matches situs after normalization |
v1 limitation: condo units collapse to one Property per parcel with unit_count > 1. Per-unit modeling deferred to v1.1.
SILVER.OWNER — 27,175,989 rows (post-canonical-merge v1.1)
| column | type | description |
|---|---|---|
prpv_owner_id | varchar | Stable Owner ID |
source_entity_id | varchar | Internal Propzilla cluster ID (er-* prefix) |
name_raw | varchar | Owner name as recorded |
name_normalized | varchar | Punctuation-stripped, upper |
entity_type | varchar | INDIVIDUAL / LLC / CORP / LP / TRUST / GOVT / NONPROFIT / ESTATE / RELIGIOUS / FINANCIAL / OTHER / UNKNOWN |
state_of_formation | varchar(2) | |
formation_date | date | |
is_active_entity | boolean | |
registered_agent_name | varchar | (v1.1: SoS-enriched) |
parcel_count | int | Derived rollup |
total_assessed_value | number | Derived rollup |
Resolution methodology: PE.OWNER_ENTITY_ID cluster + name-based entity-type classifier. Confidence varies:
1.00— exact name match in known-institutional list (Tier A)0.95— entity type matched by name regex (Tier B)0.85— er-i-* individual cluster (Tier C)
SILVER.TAX_RECORD — 188,095,269 rows
| column | type | description |
|---|---|---|
prpv_tax_record_id | varchar | Stable Tax Record ID |
prpv_parcel_id | varchar | FK |
tax_year | int | |
assessed_value_total | number | |
assessed_value_land | number | |
assessed_value_improvement | number | |
assessed_value_personal | number | |
market_value_estimate | number | |
taxable_value_total | number | |
tax_amount | number | (v1.1: from TAX_BILL_HISTORY) |
is_delinquent | boolean | (v1.1) |
homestead_exemption | boolean | (v1.1) |
v1 source: PROPZILLA.TAX_ROLL.ASSESSMENT_HISTORY. Multi-year history available.
SILVER.TRANSACTION — 21,100,000 rows (v1.5)
Title-transferring deeds only (warranty / quitclaim / sheriff / foreclosure). Mortgage-only documents are filtered into SILVER.LOAN.
| column | type | description |
|---|---|---|
prpv_transaction_id | varchar | Stable Transaction ID |
prpv_parcel_id | varchar | FK (71% of rows match a known parcel) |
recorded_date | date | |
sale_date | date | |
sale_price | number | |
doc_type_raw | varchar | County classification verbatim |
doc_type_class | varchar | Normalized: WARRANTY / QUITCLAIM / SHERIFF / FORECLOSURE / DEED_IN_LIEU / TAX / EXECUTOR / TRUSTEE / OTHER |
is_title_transfer | boolean | True for v1 cohort |
is_distressed | boolean | sheriff/foreclosure/deed_in_lieu/tax — 186K nationally |
grantor_name, grantee_name | varchar | |
book, page, instrument_number | varchar |
Source: PROPZILLA.RAW.DEED_TRANSACTIONS filtered by doc_type normalizer UDF.
SILVER.LOAN — 4,290,000 rows (v1.6)
Lien-creating documents: mortgage / deed-of-trust / HELOC / commercial / construction.
| column | type | description |
|---|---|---|
prpv_loan_id | varchar | Stable Loan ID |
prpv_parcel_id | varchar | FK |
lender_name_raw | varchar | As-recorded |
lender_name_normalized | varchar | Suffix-collapsed (NA/FA/INC/CORP/LLC/LP) for canonical join |
borrower_name | varchar | |
originated_upb | number | Original principal |
loan_type_class | varchar | MORTGAGE / DOT / HELOC / COMMERCIAL / CONSTRUCTION / OTHER |
recorded_date | date | |
is_mers_nominee | boolean | MERS nominee flag |
Source: PROPZILLA.RAW.DEED_TRANSACTIONS filtered by mortgage doc_type. Top originators (by UPB): MERS $152.7B nominee, JPMorgan $54.7B, BofA $40.6B, Wells Fargo $37.0B, Citibank $31.4B.
SILVER.ADDRESS — 146,300,000 rows (v1.6)
Canonical USPS-deduped address index. Powers strict mailing-address equality joins (cf. PARCEL_ABSENTEE_STRICT_HIGHPREC).
| column | type | description |
|---|---|---|
prpv_address_id | varchar | Stable Address ID |
line1, city, state, zip5 | varchar | Canonical components |
is_po_box | boolean | |
parcel_count_situs | int | Parcels using this as situs address |
parcel_count_mailing | int | Owners using this as mailing address |
SILVER.PERMIT — 44,776,760 rows
Only permits with a matched parcel are surfaced in v1. Unmatched permits remain in BRONZE.
| column | type | description |
|---|---|---|
prpv_permit_id | varchar | Stable Permit ID |
prpv_parcel_id | varchar | FK |
permit_number | varchar | |
permit_type | varchar | Normalized: NEW_CONSTRUCTION / ADDITION / RENOVATION / DEMOLITION / ROOF / ELECTRICAL / PLUMBING / MECHANICAL / OTHER |
permit_type_raw | varchar | |
status | varchar | ISSUED / IN_PROGRESS / FINALIZED / EXPIRED / REVOKED |
valuation | number | Reported job value |
fee_amount | number | |
filed_date | date | |
issued_date | date | |
finalized_date | date | |
applicant_name, contractor_name, contractor_license, owner_name | varchar | |
jurisdiction_id | varchar |
SILVER.GEOGRAPHY — 1,827 rows
Geographic hierarchy roll-up. Join target for any spatial filter.
| column | type | description |
|---|---|---|
prpv_geography_id | varchar | |
state_fips, state_abbr | ||
county_fips (3-char), county_fips_5 (5-char) | ||
cbsa_code, cbsa_name, csa_code, csa_name | ||
region | NORTHEAST / MIDWEST / SOUTH / WEST | |
division | Census division (NEW_ENGLAND / MIDDLE_ATLANTIC / etc.) |
GOLD_GRAPH edges
GOLD_GRAPH.EDGE_OWNER_PARCEL — 162,476,141 edges (post-canonical-merge v1.1)
| column | description |
|---|---|
prpv_owner_id, prpv_parcel_id | composite PK |
effective_from, effective_to | time-varying ownership (v1: latest only) |
is_current | always TRUE in v1 |
ownership_share | 1.0 (joint owners deferred to v1.1) |
GOLD_GRAPH.EDGE_TRANSACTION_PARTY — 34,700,000 edges (v1.7)
GRANTOR/GRANTEE roles per transaction with owner-id resolution. 26% (9.0M) match a canonical owner via name; the rest carry prpv_owner_id = NULL with match_method='unmatched'.
| column | description |
|---|---|
prpv_transaction_id, party_role | composite PK; party_role ∈ {GRANTOR, GRANTEE} |
party_name, party_name_normalized | as-recorded + UPPER-stripped form |
prpv_owner_id | FK → SILVER.OWNER when name match succeeds |
match_method | exact_name / unmatched |
confidence_score | 0.85 when matched, 0.0 otherwise |
Top grantees (matched only): Baton Rouge LA (6,200), NVR/Ryan Homes (4,191), HUD (4,151), Wells Fargo, IL DOT.
GOLD_INTELLIGENCE signals
PARCEL_ABSENTEE — 230,913,480 rows
Per-parcel: is_absentee (boolean), absentee_class (LOCAL_ABSENTEE / OUT_OF_COUNTY / OUT_OF_STATE / PO_BOX), confidence_score.
Methodology: street-prefix comparison after normalization; confidence_score >= 0.95 requires both ZIP + state evidence.
OWNER_PORTFOLIO — 27,175,989 rows (post-canonical-merge v1.1)
Per-canonical-owner: parcel_count, total_assessed_value, state_count, county_count, property-type mix counts (SFR / MF / commercial / industrial / vacant / other), avg_year_built, primary_state.
PARCEL_INSTITUTIONAL_OWNED — 165,995,168 rows (rebuild after canonical-merge pending)
Per-parcel: is_institutional_owned, institutional_tier (A=known-list / B=>=10 parcels / C=>=5 parcels), is_known_institutional.
Tier A confidence 1.0 (curated list match). Tier B 0.90, Tier C 0.75.
PARCEL_RECENT_TRANSFER — 230,913,480 rows
Per-parcel: last_sale_date, last_sale_price, days_since_sale, recency_bucket (LAST_90 / LAST_180 / LAST_365 / OVER_365 / NEVER), boolean flags per bucket.
Note: PE.LAST_SALE_DATE has known noise (some 2026 dates from upstream backfills). Filter confidence_score >= 0.95 (requires both date + price) for clean signal.
PARCEL_DISTRESS — 230,913,480 rows (v1.5)
7-indicator composite distress score per parcel.
| column | type | description |
|---|---|---|
is_distressed | boolean | True when ≥2 indicators fire |
distress_score | int | 0–7, sum of fired indicators |
signals_count | int | same as score |
f1_distressed_deed | boolean | sheriff / foreclosure / deed-in-lieu / tax deed in last 5y |
f2_low_price_sale | boolean | last sale ≤ 50% of latest assessed value |
f3_llc_recent_transfer | boolean | LLC recent grantee within 24mo |
f4_old_no_permits | boolean | year_built < 1960 AND zero permits in 10y |
f5_low_imp_to_land | boolean | improvement_value < 30% of land_value |
f6_multi_distress | boolean | overlay of ≥2 of f1-f5 |
f7_vacant_old | boolean | property_type=VACANT AND year_built < 1980 |
confidence_score | float | scales with signals_count |
PARCEL_VALUE_ADD_CANDIDATE — 280,000,000 rows (v1.7)
6-indicator composite for redevelopment / repositioning targeting.
| column | type | description |
|---|---|---|
is_value_add_candidate | boolean | True for HIGH/MEDIUM tiers |
value_add_score | int | 0–6, sum of fired indicators |
value_add_tier | varchar | HIGH (4+) / MEDIUM (2-3) / LOW (0-1) |
vai1_old_low_imp | boolean | year_built < 1980 AND improvement < 40% land |
vai2_recent_permit | boolean | permit issued in last 24mo |
vai3_sub_market_sale | boolean | last sale ≤ 70% of CBSA median PPSF |
vai4_inst_recent | boolean | institutional grantee within 24mo |
vai5_mf_mix | boolean | MF property in active CBSA |
vai6_active_metro | boolean | parcel in top-100 transaction-velocity CBSA |
v1.7 rollout: 13,234 HIGH-tier targets, 8.26M MEDIUM nationally.
LENDER_EXPOSURE — 399,681 rows (v1.7)
Per-canonical-lender rollup. Suffix-collapsed across NA / FA / N.A. / INC / CORP / LLC / LP / BANK forms.
| column | type | description |
|---|---|---|
lender_name_canonical | varchar | Canonical form |
loan_count | int | |
originated_upb_total | number | Sum of original principal |
parcel_count | int | Distinct collateral parcels |
primary_state | varchar(2) | Largest-state by loan count |
state_count | int | |
loan_type_mix | object | OBJECT_AGG of loan_type_class → count |
avg_origination_year | float | |
mers_nominee_share | float | Fraction of loans flagged as MERS-nominee |
v1.7 canonicalization: Citibank entries 89K + 89K → 197K consolidated; JPMorgan 92K + 46K → 172K. Use lender_name_canonical for portfolio queries.
INSURANCE bundle (v1.7)
INSURANCE.PARCEL_RISK — 325,897,358 rows
Wide per-parcel multi-peril risk vector. Single join target for carriers / reinsurers / cat modelers.
| column group | columns |
|---|---|
| identity | prpv_parcel_id, state_fips, county_fips, census_tract, latitude, longitude |
| FEMA NRI | nri_overall_rating, nri_overall_score, nri_expected_annual_loss, nri_hurricane_rating, nri_tornado_rating, nri_wildfire_rating, nri_earthquake_rating, nri_social_vulnerability_rating, nri_community_resilience_rating |
| FEMA NFIP | fema_flood_zone, flood_risk_score, flood_risk_category, tract_flood_claims_count, tract_flood_loss_ratio, historical_flood_damage_usd |
| FEMA Disasters | fema_flood_disasters, fema_fire_disasters, fema_hurricane_disasters, fema_tornado_disasters, fema_severe_storm_disasters |
| NOAA Storm Events 30y | noaa_storm_count_30y, noaa_hurricane_count_30y, noaa_tornado_count_30y, noaa_hail_count_30y, noaa_flood_count_30y, noaa_wildfire_count_30y, noaa_storm_deaths_30y, noaa_storm_property_damage_usd_30y, noaa_top_storm_type, noaa_storm_score |
| USGS / ASCE7 seismic | seismic_design_category, seismic_short_period_accel, seismic_one_sec_accel, seismic_design_short, seismic_design_one_sec, seismic_pgam |
| CDC SVI | svi_overall_pctile, svi_socioeconomic_pctile, svi_household_pctile, svi_minority_pctile, svi_housing_transport_pctile |
| Wildfire | wildfire_score |
| Composite | climate_risk_label, climate_risk_tier, env_risk_flag, lead_paint_child_risk_rate, nearest_fire_station_mi |
Coverage: 305.0M (94%) NRI-rated. Sourced from PROPZILLA.CURATED.PARCEL_ENRICHED.
INSURANCE.PARCEL_HIGH_PERIL_FLAGS — 325,897,358 rows
Boolean flags for fast cohort filtering. Each flag fires on the worst NRI tier (VERY HIGH or RELATIVELY HIGH) or its scalar equivalent.
| flag | criterion | count |
|---|---|---|
high_peril_hurricane | NRI hurricane rating ≥ relatively-high | 20.1M |
high_peril_tornado | NRI tornado rating ≥ relatively-high | varies |
high_peril_wildfire | NRI wildfire rating ≥ relatively-high | 71.9M |
high_peril_earthquake | NRI earthquake rating ≥ relatively-high | 31.5M |
in_sfha_flood_zone | FEMA flood zone ∈ {A, AE, AH, AO, AR, A99, V, VE} | 3.4M |
high_flood_claim_history | NFIP claims ≥ 100 in tract | varies |
high_seismic_design_cat | ASCE7 SDC ∈ {D, E, F} | 35.1M |
high_climate_risk_tier | composite climate tier ≥ HIGH | varies |
high_wildfire_score | wildfire score ≥ 70 | varies |
META schema (internal)
META.ENTITY_DICTIONARY— entity-level descriptionsMETA.ATTRIBUTE_DICTIONARY— column-level (lazy-populated)META.SOURCE_LINEAGE— which BRONZE table feeds each SILVER entityMETA.RESOLUTION_VERSIONS— semver-tracked resolution runsMETA.BUILD_RUNS— operational audit per build
Versioning policy
- Major version (
v1,v2) — breaking schema changes (column drops, type changes). Old major lives in parallel listing for 90 days. - Minor version (
v1.1,v1.2) — additive (new entities, new columns, refined methodology). Backwards compatible. - Resolution version (
v1.0.0-national) — stamped on every row; pin queries to a specific resolution if reproducibility matters.