Live in production · Rescores every active listing continuously

How we know what your car is worth.

No vibes. No "the market says". Every car on AllCars gets a 0–100 score against its real peer group: same model, same era, same engine, same condition signals. We do this with a 13-dimension similarity engine and a bunch of adaptive maths that copes with rare cars, sparse data, and the occasional outright lemon.

Dimensions
13
Value × Quality axes
Live comparables
11k+
Active across Cyprus
Full rescore
45–90s
Whole market in batch
Score range
0–100
Green · Amber · Red

Two axes. One score.

Every listing gets two independent scores. Value asks "is this priced well for what it is?" with 7 dimensions of price-relevance. Quality asks "is this car actually what it claims to be?" across 6 dimensions of condition and confidence. The final deal score is the two of them multiplied. Cheap-but-suspect and expensive-but-pristine both look bad on the chart, for very different reasons.

[Chart: Quality (condition · history · paperwork) against Value (cheaper for what it is). Quadrants: expensive · sketchy, expensive · pristine, cheap · sketchy (the lemon zone), cheap · pristine. Deal score = V × Q, e.g. 87.]

Asking price compared to peers. Mileage, year, engine size, fuel type, gearbox, body style, and where the asking price sits in the segment all feed in. A car priced 20% under peer median scores high on Value. Value alone isn't the deal, though.

How complete the description is, how many photos and how good they are, listing freshness, salvage / accident keywords, and a handful of identity-confidence checks. Quality is the "is this real?" axis.

Why multiply? A 9/10 Value times 9/10 Quality (= 81) beats a 10/10 Value times 4/10 Quality (= 40) every single time. That's how good buyers actually think when they walk a forecourt.
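In code, the multiplicative combination is one line. A minimal sketch (the function name and the 0–10 axis scale are illustrative, not the engine's internals):

```python
def deal_score(value: float, quality: float) -> float:
    """Combine two 0-10 axis scores into a 0-100 deal score.

    Multiplication punishes a weak axis harder than averaging would:
    a car has to be both well priced AND credible to score high.
    """
    return value * quality

assert deal_score(9, 9) == 81    # strong on both axes wins
assert deal_score(10, 4) == 40   # perfect price, shaky quality loses
```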

From raw listing to 0–100 score, in seven steps.

Every car flows through the same seven-stage refinery. Each stage adds confidence, removes noise, and guards against a specific failure mode the engine has learned to recognise the hard way.

01 Ingest + validate → 02 Find peers (Gower 13-D) → 03 Weight peers (Epanechnikov) → 04 Fair price (kernel-weighted) → 05 Shrink (Bayesian) → 06 Lemon check (V × Q gap) → 07 Score 0–100 + band
STEP 01

Ingest & validate

Reject impossible data at the door. Future years, 999,999 km mileage, location-only descriptions, the lot. Garbage-in is the cheapest bug to fix, so we fix it first.
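The gatekeeping reads as a plain rejection list. A sketch, where field names and thresholds are assumptions for illustration, not the production rules:

```python
import datetime

def validate_listing(listing: dict) -> list[str]:
    """Return rejection reasons; an empty list means the listing passes.

    Thresholds here are illustrative, not the engine's real values.
    """
    errors = []
    year = listing.get("year")
    if year is None or year > datetime.date.today().year:
        errors.append("impossible registration year")
    mileage = listing.get("mileage_km")
    if mileage is None or not (0 <= mileage < 999_999):
        errors.append("implausible mileage")
    description = (listing.get("description") or "").strip()
    if len(description.split()) < 3:
        errors.append("description too thin to score")  # e.g. location only
    return errors

# A listing that trips all three checks at the door:
bad = {"year": 2099, "mileage_km": 999_999, "description": "Limassol"}
assert len(validate_listing(bad)) == 3
```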

STEP 02

Find peer cars

Build the comparison group using Gower distance, a similarity metric that handles continuous fields (mileage, year) and categorical ones (fuel, body) in the same equation. No bucketing, no fudging.

STEP 03

Weight by similarity

Apply the Epanechnikov kernel so near-identical peers count fully and edge cases fade out gently. No cliff edges where one extra km of mileage flips a peer in or out of the group.

STEP 04

Compute fair price

Take the kernel-weighted price of the peer cloud. That's what the market is asking for this exact car shape, not for the model in general. The difference is usually €500 to €2,000.

STEP 05

Bayesian shrinkage

If the peer cloud is small or noisy, blend the local estimate toward the broader segment median. The shrinkage strength k = σ²_within / σ²_between adapts per segment, automatically. Nothing hand-tuned.

STEP 06

Lemon detection

Salvage and accident keywords, plus the Akerlof V–Q gap. A wide gap between Value (cheap) and Quality (poor) means the price is cheap for a reason. Worth asking what that reason is.

STEP 07

Final 0–100 score

Multiply, normalise, classify. Green at 70 and above (steal), amber 40 to 70 (fair), red below 40 (above market or flagged). You also get a fair-price band so you know how confident the number actually is.
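The banding itself is a simple threshold map. A sketch using the cut-offs stated above:

```python
def band(score: float) -> str:
    """Map a 0-100 deal score to its traffic-light band."""
    if score >= 70:
        return "green"  # steal
    if score >= 40:
        return "amber"  # fair
    return "red"        # above market or flagged

assert band(87) == "green"
assert band(55) == "amber"
assert band(12) == "red"
```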

+ ALWAYS-ON

Survivorship correction

Cars that vanish fast were probably good deals. The engine keeps a 60-day window of delisted listings with time decay, so peer prices reflect the real market, not just what stuck around because nobody wanted it.
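One way to sketch that recency weighting in code. The source states only a 60-day window with time decay; exponential decay and the half-life below are illustrative choices, not the engine's actual curve:

```python
import math

def delisted_weight(days_since_delisted: float, window_days: int = 60) -> float:
    """Recency weight for a delisted comparable: 1.0 when fresh,
    decaying toward zero, hard-cut outside the window."""
    if days_since_delisted > window_days:
        return 0.0
    half_life = window_days / 4.0  # hypothetical half-life of 15 days
    return math.exp(-math.log(2) * days_since_delisted / half_life)

assert delisted_weight(0) == 1.0                      # delisted today, full weight
assert delisted_weight(61) == 0.0                     # outside the 60-day window
assert abs(delisted_weight(15) - 0.5) < 1e-9          # one half-life later
```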

Finding peer cars without faking it.

The naive way is to bucket cars: "2018 A4 diesel manual" goes in one bucket and you compare to the bucket median. That breaks the second a car straddles a boundary. A 2017 A4 is basically the same car as a 2018, but a hard bucket says they're strangers.

Gower distance smooths this out. It computes a 0–1 similarity across all 13 dimensions at once, mixing continuous fields (mileage, year) with categorical ones (fuel, body, drivetrain) into one unified score. Closer = more weight in the next step.

d(A,B) = Σₖ wₖ · dₖ(A,B) / Σₖ wₖ
where each dₖ handles that dimension's type (numeric or categorical)

[Chart: distance bands — d < 0.10 twins, d < 0.20 close, d < 0.30 loose. Peers are weighted by distance from your car within bandwidth ±h: strong peers near the centre, weak peers toward the edge, everything beyond h out of the group.]
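A toy version of the mixed-type distance, with equal weights and three of the thirteen dimensions. Field names, ranges, and the equal weighting are illustrative:

```python
def gower_distance(a: dict, b: dict, spec: dict) -> float:
    """Gower (1971) dissimilarity over mixed-type fields, in [0, 1].

    `spec` maps field name -> ("num", range) for continuous fields
    (range-normalised absolute gap) or ("cat", None) for categorical
    ones (0 on match, 1 on mismatch). Equal weights for simplicity.
    """
    total, weight_sum = 0.0, 0.0
    for field, (kind, rng) in spec.items():
        if kind == "num":
            d = abs(a[field] - b[field]) / rng  # normalised numeric gap
        else:
            d = 0.0 if a[field] == b[field] else 1.0  # categorical match
        total += d
        weight_sum += 1.0
    return total / weight_sum

spec = {"year": ("num", 25), "mileage_km": ("num", 300_000), "fuel": ("cat", None)}
car_a = {"year": 2018, "mileage_km": 60_000, "fuel": "diesel"}
car_b = {"year": 2017, "mileage_km": 75_000, "fuel": "diesel"}
assert gower_distance(car_a, car_b, spec) < 0.10  # near twins on this toy spec
```

A 2017 and a 2018 of the same car end up a few hundredths apart instead of in different buckets, which is exactly the boundary problem this step fixes.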

Closer peers count more. Smoothly.

The Epanechnikov kernel is the bell curve's well-behaved cousin: smooth at the centre, zero at the edges, with the lowest mean-square error of any finite-support kernel. In plain English: peers near your car get full weight, peers at the edge of similarity fade gently out, and there's no arbitrary on/off threshold anywhere.

That gentleness matters a lot. A binary "in/out" peer rule gives jumpy estimates the moment one peer leaves or joins the group. Smooth weighting gives smooth scores. Smooth scores are honest scores.
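Putting the kernel and the weighted average together gives the fair-price step. A sketch with an assumed bandwidth of 0.3 over Gower distances; the engine's real bandwidth isn't published:

```python
def epanechnikov(u: float) -> float:
    """K(u) = 0.75 * (1 - u^2) for |u| <= 1, else 0: full weight at the
    centre, smooth fade, exactly zero beyond the bandwidth."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def fair_price(peers: list, bandwidth: float = 0.3) -> float:
    """Nadaraya-Watson style kernel-weighted peer price.

    `peers` is a list of (gower_distance, asking_price) pairs.
    """
    num = den = 0.0
    for dist, price in peers:
        w = epanechnikov(dist / bandwidth)
        num += w * price
        den += w
    return num / den

peers = [(0.05, 12_500), (0.12, 13_100), (0.28, 15_000), (0.45, 9_000)]
estimate = fair_price(peers)
assert epanechnikov(0.0) == 0.75
assert epanechnikov(1.5) == 0.0           # beyond bandwidth: zero, no cliff inside
assert 12_000 < estimate < 14_000         # the 0.45-distance outlier got no vote
```

Note the 9,000-euro outlier at distance 0.45: it contributes nothing, not because of a hand-drawn cut-off but because the kernel's support ends there.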

Few peers? Pull toward truth.

Some cars are common. There are 200+ Toyota Yaris listings live right now, easy. Other cars are unicorns. A 2018 Porsche Cayman GTS, manual, might have three peers in the whole country.

Three peers can lie. Maybe one is mispriced and drags the median. Bayesian shrinkage treats the small-peer estimate as suggestive, not gospel, and pulls it toward the broader segment mean. How hard it pulls is proportional to how noisy the local data is.

The shrinkage strength k is computed per segment from real data, not a hand-picked constant. Common cars stay close to their peer cloud. Rare cars relax toward sanity.

μ̂ = (n · μ_local + k · μ_segment) / (n + k), where k = σ²_within / σ²_between adapts per segment, automatically.

[Worked example: raw estimate €11,200 from n = 3 peers with high σ, segment anchor €14,800 from 142 peers across all 2018 sport coupés, shrunken estimate €13,400.]
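The shrinkage formula translates directly into code. The variance inputs below are made up purely to show the pull on a thin peer cloud:

```python
def shrunken_price(n: int, mu_local: float, mu_segment: float,
                   var_within: float, var_between: float) -> float:
    """Bayesian shrinkage of a local estimate toward the segment anchor.

    k = var_within / var_between grows when local peers are noisy
    relative to the segment spread, so sparse, scattered clouds lean
    harder on the anchor; big clean clouds barely move.
    """
    k = var_within / var_between
    return (n * mu_local + k * mu_segment) / (n + k)

# Three noisy peers at 11,200 get pulled toward the 14,800 anchor
# (variances are illustrative, chosen so k = 2):
est = shrunken_price(n=3, mu_local=11_200, mu_segment=14_800,
                     var_within=4.0e6, var_between=2.0e6)
assert est == 12_640.0            # between raw and anchor, closer to raw
assert 11_200 < est < 14_800
```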
[Chart: Quality vs Value quadrants — expensive · poor, expensive · pristine, cheap · poor (lemon zone), cheap · pristine (steal).]

When cheap means cheap-for-a-reason.

Economist George Akerlof won a Nobel for noticing that information asymmetry kills used-car markets. Sellers know the lemon. Buyers don't. The market unravels. V6 turns Akerlof's insight into one specific test: flag any car where Value scores high but Quality scores low.

That's the lemon signal. A car priced 30% under peer median with a thin description, no service history, salvage keywords and stale photos isn't a steal. It's a story you haven't heard yet.

Cars in the lemon zone don't get celebrated as great deals. They get a flag, so you know to ask the right questions before you wire a deposit.
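The V–Q gap test itself reduces to a comparison. A sketch with a hypothetical threshold (the engine's real cut-off isn't published, and the production check also folds in salvage keywords):

```python
def lemon_flag(value: float, quality: float,
               gap_threshold: float = 4.0) -> bool:
    """Akerlof V-Q gap test on 0-10 axis scores: a car that is cheap
    for its peers (high Value) but scores poorly on condition and
    confidence signals (low Quality) is cheap for a reason."""
    return value - quality >= gap_threshold

assert lemon_flag(value=9.0, quality=3.0) is True    # 30% under median, thin listing
assert lemon_flag(value=9.0, quality=8.5) is False   # genuinely well priced
```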

The thirteen dimensions.

Seven dimensions ask "what's it worth?". Six ask "is it real?". Together they describe a car the way a good mechanic would, not the way a spreadsheet would.

Value · 7 dimensions

price relevance
V1
Make · model · variant
hierarchical identity
V2
Year of registration
era-aware tax band
V3
Mileage
km vs peer median
V4
Engine size
cc and HP class
V5
Fuel type
petrol · diesel · EV · hybrid
V6
Body style
sedan · suv · coupe · estate
V7
Gearbox & drivetrain
manual · auto · 4×4 affect peer set

Quality · 6 dimensions

condition signals
Q1
Description depth
real specs vs filler
Q2
Photo set
count, resolution, freshness
Q3
Service & history
service-book signals
Q4
Salvage / accident keywords
denial-aware regex
Q5
Identity confidence
cross-source coherence
Q6
Listing freshness
days live · price drops

Six versions of trying to be honest.

Each version fixed something the last one got wrong. Each version is recorded in production history with its scores intact, so we know exactly when the engine got smarter and which calls would have flipped in hindsight.

V1

Median by bucket

Group cars by hard buckets, compare to the bucket median. Worked, barely, while the buckets stayed coarse. Broke at boundaries.

V2

Linear regression on 4 features

Year, mileage, engine, fuel. Better, but treated all cars as one global market. A 2018 Yaris and a 2018 Cayman aren't on the same line.

V3

Per-segment models

One model per segment. Fixed the global-line problem and then introduced a new one. Rare segments had three data points and lied confidently.

V4

Gower distance + KNN

Switched to per-listing peer discovery. Mixed continuous and categorical fields cleanly for the first time. Still binary in/out peers though, jumpy at the edges.

V5

Kernel weighting + first lemon flag

Epanechnikov kernel killed the cliff edges. First crude lemon detector via salvage-keyword regex. Still no shrinkage though, so sparse cars stayed shaky.

V6

Value × Quality + adaptive shrinkage

The current engine. Two-axis multiplicative score, adaptive Bayesian shrinkage that auto-tunes per segment, Akerlof V–Q gap for lemon detection, survivorship-bias correction over a 60-day delisted window with time decay, and full-market batch rescoring in 45 to 90 seconds. Five years of bad calls so you don't have to make them.

Gower 13-D · Epanechnikov kernel · Adaptive Bayesian shrink · V × Q lemon flag · Survivorship correction · 500+ tests

Now go check a real car.

Every listing in Cyprus, scored against its real peers, with the fair-price band shown. Free, forever.

The papers behind the pricing engine

None of the maths here is new. Every component of the V6 scorer traces back to a foundational paper. If you want to go deeper than a blog post, these are the originals.

  1. Gower, J. C. (1971). A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871.

    The original definition of Gower distance. Lets us blend continuous fields (year, mileage, price) with categorical ones (fuel, body, drivetrain) into a single 0–1 similarity score. The peer-finding step in V6 is built on this.

  2. Epanechnikov, V. A. (1969). Non-parametric estimation of a multivariate probability density. Theory of Probability and Its Applications 14(1), 153–158.

    Introduces the Epanechnikov kernel, the optimal smoothing kernel under mean-square error. V6 uses it to weight peers by similarity instead of cutting them off at an arbitrary threshold.

  3. Nadaraya, E. A. (1964). On estimating regression. Theory of Probability and Its Applications 9(1), 141–142. (Companion: Watson, G. S. 1964, Sankhyā A 26, 359–372.)

    The Nadaraya–Watson kernel regression estimator. Combine Gower distance with the Epanechnikov kernel and you have the framework V6 uses to estimate a fair price from local peers.

  4. James, W. & Stein, C. (1961). Estimation with quadratic loss. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1, 361–379.

    The James–Stein result that started shrinkage estimation. V6's adaptive Bayesian shrinkage pulls thin-segment estimates toward a prior with a strength proportional to local noise — direct descendant of this idea.

  5. Akerlof, G. A. (1970). The market for "lemons": Quality uncertainty and the market mechanism. The Quarterly Journal of Economics 84(3), 488–500.

    The Nobel-winning paper on used-car markets and information asymmetry. V6's Value × Quality decomposition exists specifically to surface Akerlof's lemons: cars that look cheap on price but score low on everything else.

  6. Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association 74(368), 829–836.

    The LOWESS / LOESS paper. V6 isn't strictly LOESS, but the engine inherits its bias against fitting through outliers and its preference for locally-weighted estimates over hard buckets.

  7. Heckman, J. J. (1979). Sample selection bias as a specification error. Econometrica 47(1), 153–161.

    The classic survivorship-bias paper. Sold cars vanish; cars that linger don't. V6's 60-day delisted-listing memory window with time decay is our local answer to the problem Heckman framed.

Links go directly to the publishers. A few are paywalled by the journals; preprints and free copies are usually findable on Google Scholar by paper title.