Live in production · Rescores every active listing continuously

How we know what your car is worth.

No vibes. No "the market says". Every car on AllCars gets a 0–100 score against its real peer group: same model, same era, same engine, same condition signals. We do this with a 13-dimension similarity engine and a bunch of adaptive maths that copes with rare cars, sparse data, and the occasional outright lemon.

Dimensions
13
Value × Quality axes
Live comparables
11k+
Active across Cyprus
Full rescore
45–90s
Whole market in batch
Score range
0–100
Green · Amber · Red

Two axes. One score.

Every listing gets two independent scores. Value asks "is this priced well for what it is?" with 7 dimensions of price-relevance. Quality asks "is this car actually what it claims to be?" across 6 dimensions of condition and confidence. The final deal score is the two of them multiplied. Cheap-but-suspect and expensive-but-pristine both look bad on the chart, for very different reasons.

[Chart: Quality (condition · history · paperwork) against Value (cheaper for what it is). Quadrants: expensive · sketchy, expensive · pristine, cheap · sketchy (the lemon zone), cheap · pristine. Deal score = V × Q, e.g. 87.]

Asking price compared to peers. Mileage, year, engine size, fuel type, gearbox, body style, and where the asking price sits in the segment all feed in. A car priced 20% under peer median scores high on Value. Value alone isn't the deal, though.

How complete the description is, how many photos and how good they are, listing freshness, salvage / accident keywords, and a handful of identity-confidence checks. Quality is the "is this real?" axis.

Why multiply? A 9/10 Value times 9/10 Quality (= 81) beats a 10/10 Value times 4/10 Quality (= 40) every single time. That's how good buyers actually think when they walk a forecourt.
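In code, the multiplicative combination is one line. A minimal sketch (the function name and the 0–10 axis scale are illustrative, not the engine's internals):

```python
def deal_score(value: float, quality: float) -> float:
    """Combine two 0-10 axis scores into a 0-100 deal score.

    Multiplication punishes a weak axis harder than averaging would:
    a car has to be both well priced AND credible to score high.
    """
    return value * quality

assert deal_score(9, 9) == 81    # strong on both axes wins
assert deal_score(10, 4) == 40   # perfect price, shaky quality loses
```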

From raw listing to 0–100 score, in seven steps.

Every car flows through the same seven-stage refinery. Each stage adds confidence, removes noise, and guards against a specific failure mode the engine has learned to recognise the hard way.

01 Ingest + validate → 02 Find peers (Gower 13-D) → 03 Weight peers (Epanechnikov) → 04 Fair price (kernel-weighted) → 05 Shrink (Bayesian) → 06 Lemon check (V × Q gap) → 07 Score 0–100 + band
STEP 01

Ingest & validate

Reject impossible data at the door. Future years, 999,999 km mileage, location-only descriptions, the lot. Garbage-in is the cheapest bug to fix, so we fix it first.
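The gatekeeping reads as a plain rejection list. A sketch, where field names and thresholds are assumptions for illustration, not the production rules:

```python
import datetime

def validate_listing(listing: dict) -> list[str]:
    """Return rejection reasons; an empty list means the listing passes.

    Thresholds here are illustrative, not the engine's real values.
    """
    errors = []
    year = listing.get("year")
    if year is None or year > datetime.date.today().year:
        errors.append("impossible registration year")
    mileage = listing.get("mileage_km")
    if mileage is None or not (0 <= mileage < 999_999):
        errors.append("implausible mileage")
    description = (listing.get("description") or "").strip()
    if len(description.split()) < 3:
        errors.append("description too thin to score")  # e.g. location only
    return errors

# A listing that trips all three checks at the door:
bad = {"year": 2099, "mileage_km": 999_999, "description": "Limassol"}
assert len(validate_listing(bad)) == 3
```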

STEP 02

Find peer cars

Build the comparison group using Gower distance, a similarity metric that handles continuous fields (mileage, year) and categorical ones (fuel, body) in the same equation. No bucketing, no fudging.

STEP 03

Weight by similarity

Apply the Epanechnikov kernel so near-identical peers count fully and edge cases fade out gently. No cliff edges where one extra km of mileage flips a peer in or out of the group.

STEP 04

Compute fair price

Take the kernel-weighted price of the peer cloud. That's what the market is asking for this exact car shape, not for the model in general. The difference is usually €500 to €2,000.

STEP 05

Bayesian shrinkage

If the peer cloud is small or noisy, blend the local estimate toward the broader segment median. The shrinkage strength k = σ²_within / σ²_between adapts per segment, automatically. Nothing hand-tuned.

STEP 06

Lemon detection

Salvage and accident keywords, plus the Akerlof V–Q gap. A wide gap between Value (cheap) and Quality (poor) means the price is cheap for a reason. Worth asking what that reason is.

STEP 07

Final 0–100 score

Multiply, normalise, classify. Green at 70 and above (steal), amber 40 to 70 (fair), red below 40 (above market or flagged). You also get a fair-price band so you know how confident the number actually is.
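The banding itself is a simple threshold map. A sketch using the cut-offs stated above:

```python
def band(score: float) -> str:
    """Map a 0-100 deal score to its traffic-light band."""
    if score >= 70:
        return "green"  # steal
    if score >= 40:
        return "amber"  # fair
    return "red"        # above market or flagged

assert band(87) == "green"
assert band(55) == "amber"
assert band(12) == "red"
```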

+ ALWAYS-ON

Survivorship correction

Cars that vanish fast were probably good deals. The engine keeps a 60-day window of delisted listings with time decay, so peer prices reflect the real market, not just what stuck around because nobody wanted it.
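One way to sketch that recency weighting in code. The source states only a 60-day window with time decay; exponential decay and the half-life below are illustrative choices, not the engine's actual curve:

```python
import math

def delisted_weight(days_since_delisted: float, window_days: int = 60) -> float:
    """Recency weight for a delisted comparable: 1.0 when fresh,
    decaying toward zero, hard-cut outside the window."""
    if days_since_delisted > window_days:
        return 0.0
    half_life = window_days / 4.0  # hypothetical half-life of 15 days
    return math.exp(-math.log(2) * days_since_delisted / half_life)

assert delisted_weight(0) == 1.0                      # delisted today, full weight
assert delisted_weight(61) == 0.0                     # outside the 60-day window
assert abs(delisted_weight(15) - 0.5) < 1e-9          # one half-life later
```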

Finding peer cars without faking it.

The naive way is to bucket cars: "2018 A4 diesel manual" goes in one bucket and you compare to the bucket median. That breaks the second a car straddles a boundary. A 2017 A4 is basically the same car as a 2018, but a hard bucket says they're strangers.

Gower distance smooths this out. It computes a 0–1 similarity across all 13 dimensions at once, mixing continuous fields (mileage, year) with categorical ones (fuel, body, drivetrain) into one unified score. Closer = more weight in the next step.

d(A,B) = Σₖ wₖ · dₖ(A,B) / Σₖ wₖ
where each dₖ handles that dimension's type (numeric or categorical)

[Chart: distance bands — d < 0.10 twins, d < 0.20 close, d < 0.30 loose. Peers are weighted by distance from your car within bandwidth ±h: strong peers near the centre, weak peers toward the edge, everything beyond h out of the group.]
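A toy version of the mixed-type distance, with equal weights and three of the thirteen dimensions. Field names, ranges, and the equal weighting are illustrative:

```python
def gower_distance(a: dict, b: dict, spec: dict) -> float:
    """Gower (1971) dissimilarity over mixed-type fields, in [0, 1].

    `spec` maps field name -> ("num", range) for continuous fields
    (range-normalised absolute gap) or ("cat", None) for categorical
    ones (0 on match, 1 on mismatch). Equal weights for simplicity.
    """
    total, weight_sum = 0.0, 0.0
    for field, (kind, rng) in spec.items():
        if kind == "num":
            d = abs(a[field] - b[field]) / rng  # normalised numeric gap
        else:
            d = 0.0 if a[field] == b[field] else 1.0  # categorical match
        total += d
        weight_sum += 1.0
    return total / weight_sum

spec = {"year": ("num", 25), "mileage_km": ("num", 300_000), "fuel": ("cat", None)}
car_a = {"year": 2018, "mileage_km": 60_000, "fuel": "diesel"}
car_b = {"year": 2017, "mileage_km": 75_000, "fuel": "diesel"}
assert gower_distance(car_a, car_b, spec) < 0.10  # near twins on this toy spec
```

A 2017 and a 2018 of the same car end up a few hundredths apart instead of in different buckets, which is exactly the boundary problem this step fixes.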

Closer peers count more. Smoothly.

The Epanechnikov kernel is the bell curve's well-behaved cousin: smooth at the centre, zero at the edges, with the lowest mean-square error of any finite-support kernel. In plain English: peers near your car get full weight, peers at the edge of similarity fade gently out, and there's no arbitrary on/off threshold anywhere.

That gentleness matters a lot. A binary "in/out" peer rule gives jumpy estimates the moment one peer leaves or joins the group. Smooth weighting gives smooth scores. Smooth scores are honest scores.
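Putting the kernel and the weighted average together gives the fair-price step. A sketch with an assumed bandwidth of 0.3 over Gower distances; the engine's real bandwidth isn't published:

```python
def epanechnikov(u: float) -> float:
    """K(u) = 0.75 * (1 - u^2) for |u| <= 1, else 0: full weight at the
    centre, smooth fade, exactly zero beyond the bandwidth."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def fair_price(peers: list, bandwidth: float = 0.3) -> float:
    """Nadaraya-Watson style kernel-weighted peer price.

    `peers` is a list of (gower_distance, asking_price) pairs.
    """
    num = den = 0.0
    for dist, price in peers:
        w = epanechnikov(dist / bandwidth)
        num += w * price
        den += w
    return num / den

peers = [(0.05, 12_500), (0.12, 13_100), (0.28, 15_000), (0.45, 9_000)]
estimate = fair_price(peers)
assert epanechnikov(0.0) == 0.75
assert epanechnikov(1.5) == 0.0           # beyond bandwidth: zero, no cliff inside
assert 12_000 < estimate < 14_000         # the 0.45-distance outlier got no vote
```

Note the 9,000-euro outlier at distance 0.45: it contributes nothing, not because of a hand-drawn cut-off but because the kernel's support ends there.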

Few peers? Pull toward truth.

Some cars are common. There are 200+ Toyota Yaris listings live right now, easy. Other cars are unicorns. A 2018 Porsche Cayman GTS, manual, might have three peers in the whole country.

Three peers can lie. Maybe one is mispriced and drags the median. Bayesian shrinkage treats the small-peer estimate as suggestive, not gospel, and pulls it toward the broader segment mean. How hard it pulls is proportional to how noisy the local data is.

The shrinkage strength k is computed per segment from real data, not a hand-picked constant. Common cars stay close to their peer cloud. Rare cars relax toward sanity.

μ̂ = (n · μ_local + k · μ_segment) / (n + k), where k = σ²_within / σ²_between adapts per segment, automatically.

[Worked example: raw estimate €11,200 from n = 3 peers with high σ, segment anchor €14,800 from 142 peers across all 2018 sport coupés, shrunken estimate €13,400.]
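The shrinkage formula translates directly into code. The variance inputs below are made up purely to show the pull on a thin peer cloud:

```python
def shrunken_price(n: int, mu_local: float, mu_segment: float,
                   var_within: float, var_between: float) -> float:
    """Bayesian shrinkage of a local estimate toward the segment anchor.

    k = var_within / var_between grows when local peers are noisy
    relative to the segment spread, so sparse, scattered clouds lean
    harder on the anchor; big clean clouds barely move.
    """
    k = var_within / var_between
    return (n * mu_local + k * mu_segment) / (n + k)

# Three noisy peers at 11,200 get pulled toward the 14,800 anchor
# (variances are illustrative, chosen so k = 2):
est = shrunken_price(n=3, mu_local=11_200, mu_segment=14_800,
                     var_within=4.0e6, var_between=2.0e6)
assert est == 12_640.0            # between raw and anchor, closer to raw
assert 11_200 < est < 14_800
```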
[Chart: Quality vs Value quadrants — expensive · poor, expensive · pristine, cheap · poor (lemon zone), cheap · pristine (steal).]

When cheap means cheap-for-a-reason.

Economist George Akerlof won a Nobel for noticing that information asymmetry kills used-car markets. Sellers know the lemon. Buyers don't. The market unravels. V6 turns Akerlof's insight into one specific test: flag any car where Value scores high but Quality scores low.

That's the lemon signal. A car priced 30% under peer median with a thin description, no service history, salvage keywords and stale photos isn't a steal. It's a story you haven't heard yet.

Cars in the lemon zone don't get celebrated as great deals. They get a flag, so you know to ask the right questions before you wire a deposit.
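The V–Q gap test itself reduces to a comparison. A sketch with a hypothetical threshold (the engine's real cut-off isn't published, and the production check also folds in salvage keywords):

```python
def lemon_flag(value: float, quality: float,
               gap_threshold: float = 4.0) -> bool:
    """Akerlof V-Q gap test on 0-10 axis scores: a car that is cheap
    for its peers (high Value) but scores poorly on condition and
    confidence signals (low Quality) is cheap for a reason."""
    return value - quality >= gap_threshold

assert lemon_flag(value=9.0, quality=3.0) is True    # 30% under median, thin listing
assert lemon_flag(value=9.0, quality=8.5) is False   # genuinely well priced
```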

The thirteen dimensions.

Seven dimensions ask "what's it worth?". Six ask "is it real?". Together they describe a car the way a good mechanic would, not the way a spreadsheet would.

Value · 7 dimensions

price relevance
V1
Make · model · variant
hierarchical identity
V2
Year of registration
era-aware tax band
V3
Mileage
km vs peer median
V4
Engine size
cc and HP class
V5
Fuel type
petrol · diesel · EV · hybrid
V6
Body style
sedan · suv · coupe · estate
V7
Gearbox & drivetrain
manual · auto · 4×4 affect peer set

Quality · 6 dimensions

condition signals
Q1
Description depth
real specs vs filler
Q2
Photo set
count, resolution, freshness
Q3
Service & history
service-book signals
Q4
Salvage / accident keywords
denial-aware regex
Q5
Identity confidence
cross-source coherence
Q6
Listing freshness
days live · price drops

Six versions of trying to be honest.

Each version fixed something the last one got wrong. Each version is recorded in production history with its scores intact, so we know exactly when the engine got smarter and which calls would have flipped in hindsight.

V1

Median by bucket

Group cars by hard buckets, compare to the bucket median. Worked, barely, while the buckets stayed coarse. Broke at boundaries.

V2

Linear regression on 4 features

Year, mileage, engine, fuel. Better, but treated all cars as one global market. A 2018 Yaris and a 2018 Cayman aren't on the same line.

V3

Per-segment models

One model per segment. Fixed the global-line problem and then introduced a new one. Rare segments had three data points and lied confidently.

V4

Gower distance + KNN

Switched to per-listing peer discovery. Mixed continuous and categorical fields cleanly for the first time. Still binary in/out peers though, jumpy at the edges.

V5

Kernel weighting + first lemon flag

Epanechnikov kernel killed the cliff edges. First crude lemon detector via salvage-keyword regex. Still no shrinkage though, so sparse cars stayed shaky.

V6

Value × Quality + adaptive shrinkage

The current engine. Two-axis multiplicative score, adaptive Bayesian shrinkage that auto-tunes per segment, Akerlof V–Q gap for lemon detection, survivorship-bias correction over a 60-day delisted window with time decay, and full-market batch rescoring in 45 to 90 seconds. Five years of bad calls so you don't have to make them.

Gower 13-D · Epanechnikov kernel · Adaptive Bayesian shrink · V × Q lemon flag · Survivorship correction · 500+ tests

Now go check a real car.

Every listing in Cyprus, scored against its real peers, with the fair-price band shown. Free, forever.

The papers behind the pricing engine

None of the maths here is new. Every component of the V6 scorer traces back to a foundational paper. If you want to go deeper than a blog post, these are the originals.

  1. Gower, J. C. (1971). A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871.

    The original definition of Gower distance. Lets us blend continuous fields (year, mileage, price) with categorical ones (fuel, body, drivetrain) into a single 0–1 similarity score. The peer-finding step in V6 is built on this.

  2. Epanechnikov, V. A. (1969). Non-parametric estimation of a multivariate probability density. Theory of Probability and Its Applications 14(1), 153–158.

    Introduces the Epanechnikov kernel, the optimal smoothing kernel under mean-square error. V6 uses it to weight peers by similarity instead of cutting them off at an arbitrary threshold.

  3. Nadaraya, E. A. (1964). On estimating regression. Theory of Probability and Its Applications 9(1), 141–142. (Companion: Watson, G. S. 1964, Sankhyā A 26, 359–372.)

    The Nadaraya–Watson kernel regression estimator. Combine Gower distance with the Epanechnikov kernel and you have the framework V6 uses to estimate a fair price from local peers.

  4. James, W. & Stein, C. (1961). Estimation with quadratic loss. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1, 361–379.

    The James–Stein result that started shrinkage estimation. V6's adaptive Bayesian shrinkage pulls thin-segment estimates toward a prior with a strength proportional to local noise — direct descendant of this idea.

  5. Akerlof, G. A. (1970). The market for "lemons": Quality uncertainty and the market mechanism. The Quarterly Journal of Economics 84(3), 488–500.

    The Nobel-winning paper on used-car markets and information asymmetry. V6's Value × Quality decomposition exists specifically to surface Akerlof's lemons: cars that look cheap on price but score low on everything else.

  6. Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association 74(368), 829–836.

    The LOWESS / LOESS paper. V6 isn't strictly LOESS, but the engine inherits its bias against fitting through outliers and its preference for locally-weighted estimates over hard buckets.

  7. Heckman, J. J. (1979). Sample selection bias as a specification error. Econometrica 47(1), 153–161.

    The classic survivorship-bias paper. Sold cars vanish; cars that linger don't. V6's 60-day delisted-listing memory window with time decay is our local answer to the problem Heckman framed.

Links go directly to the publishers. A few are paywalled by the journals; preprints and free copies are usually findable on Google Scholar by paper title.