How we know what your car is worth.
No vibes. No "the market says". Every car on AllCars gets a 0–100 score against its real peer group: same model, same era, same engine, same condition signals. We do this with a 13-dimension similarity engine and a bunch of adaptive maths that copes with rare cars, sparse data, and the occassional outright lemon.
Real comparables. Real maths. No vibes.
Most "fair price" tools throw a national average at you and call it done. That works for fridges. Not cars. A 2018 Audi A4 with 60,000 km, manual gearbox, full service history, sitting in Limassol is not the same car as a 2018 A4 in Nicosia with 180,000 km and a salvage title. The badge is identical. Everything else isnt. V6 treats them as the different cars they actually are.
Always fresh
The market moves every day. Prices drop, listings vanish, new ones land and shift the median. The engine rescores everything as that happens, so the fair price you see is the fair price right now, not last week's.
Honest about uncertainty
Rare car? Five comparables across the whole island? The engine knows it doesn't really know. So it shrinks the estimate toward the broader segment and gives you a wider band. No false confidence dressed up as a tight number.
Lemon-aware
A car that looks like an absolute steal but smells like a salvage rebuild gets flagged, not celebrated. The engine cross-checks Value against Quality. When they disagree by a lot, that's a signal worth pausing on.
Two axes. One score.
Every listing gets two independent scores. Value asks "is this priced well for what it is?" with 7 dimensions of price-relevance. Quality asks "is this car actually what it claims to be?" across 6 dimensions of condition and confidence. The final deal score is the two of them multiplied. Cheap-but-suspect and expensive-but-pristine both look bad on the chart, for very different reasons.
Asking price compared to peers. Mileage, year, engine size, fuel type, gearbox, body style, and where the asking price sits in the segment all feed in. A car priced 20% under peer median scores high on Value. Value alone isn't the deal though.
How complete the description is, how many photos and how good they are, listing freshness, salvage / accident keywords, and a handful of identity-confidence checks. Quality is the "is this real?" axis.
Why multiply? A 9/10 Value times 9/10 Quality (= 81) beats a 10/10 Value times 4/10 Quality (= 40) every single time. That's how good buyers actually think when they walk a forecourt.
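In code, the whole idea fits in a few lines. A minimal sketch, with illustrative names rather than the production API; each axis is 0–10 here, the site displays 0–100:

```python
def deal_score(value: float, quality: float) -> float:
    # Multiplying means a weakness on either axis drags the whole deal down.
    return value * quality  # 0-10 x 0-10 -> 0-100

print(deal_score(9, 9))    # 81: well priced AND believable
print(deal_score(10, 4))   # 40: a 'perfect' price can't rescue a suspect listing
```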
From raw listing to 0–100 score, in eight steps.
Every car flows through the same eight-stage refinery. Each stage adds confidence, removes noise, and guards against a specific failure mode the engine has learned to recognise the hard way.
Ingest & validate
Reject impossible data at the door. Future years, 999,999 km mileage, location-only descriptions, the lot. Garbage-in is the cheapest bug to fix, so we fix it first.
Find peer cars
Build the comparison group using Gower distance, a similarity metric that handles continuous fields (mileage, year) and categorical ones (fuel, body) in the same equation. No bucketing, no fudging.
Weight by similarity
Apply the Epanechnikov kernel so near-identical peers count fully and edge cases fade out gently. No cliff edges where one extra km of mileage flips a peer in or out of the group.
Compute fair price
Take the kernel-weighted price of the peer cloud. That's what the market is asking for this exact car shape, not for the model in general. The difference is usually €500 to €2,000.
Bayesian shrinkage
If the peer cloud is small or noisy, blend the local estimate toward the broader segment median. The shrinkage strength k = s²within / s²between adapts per segment, automatically. Nothing hand-tuned.
Lemon detection
Salvage and accident keywords, plus the Akerlof V–Q gap. A wide gap between Value (cheap) and Quality (poor) means the price is cheap for a reason. Worth asking what that reason is.
Final 0–100 score
Multiply, normalise, classify. Green at 70 and above (steal), amber 40 to 70 (fair), red below 40 (above market or flagged). You also get a fair-price band so you know how confident the number actually is.
Survivorship correction
Cars that vanish fast were probably good deals. The engine keeps a 60-day window of delisted listings with time decay, so peer prices reflect the real market, not just what stuck around because nobody wanted it.
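The 60-day window is pinned down above; the decay shape isn't, so the exponential half-life in this sketch is an assumption, picked only to show the mechanics:

```python
WINDOW_DAYS = 60          # from the pipeline description above
HALF_LIFE_DAYS = 20.0     # assumed, not a production setting

def delisted_weight(days_since_delisted: float) -> float:
    """Weight a delisted peer: recent sales count nearly fully, old ones fade."""
    if days_since_delisted > WINDOW_DAYS:
        return 0.0  # outside the memory window entirely
    return 0.5 ** (days_since_delisted / HALF_LIFE_DAYS)

print(round(delisted_weight(1), 2))   # ~0.97: sold yesterday, near full weight
print(round(delisted_weight(49), 2))  # ~0.18: seven weeks ago, mostly faded
```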
Finding peer cars without faking it.
The naive way is to bucket cars: "2018 A4 diesel manual" goes in one bucket and you compare to the bucket median. That breaks the second a car straddles a boundary. A 2017 A4 is basically the same car as a 2018, but a hard bucket says they're strangers.
Gower distance smooths this out. It computes a 0–1 similarity across all 13 dimensions at once, mixing continuous fields (mileage, year) with categorical ones (fuel, body, drivetrain) into one unified score. Closer = more weight in the next step.
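Here is a toy version of the idea over four fields. The field names and normalising ranges are illustrative; the production engine runs all 13 dimensions the same way:

```python
NUMERIC_RANGE = {"year": 30.0, "mileage_km": 400_000.0}  # assumed field spreads
CATEGORICAL = {"fuel", "gearbox"}

def gower_similarity(a: dict, b: dict) -> float:
    """Average per-field similarity in [0, 1] across the fields both cars share."""
    parts = []
    for field in a.keys() & b.keys():
        if field in NUMERIC_RANGE:
            # Continuous field: 1 minus the range-normalised absolute difference.
            parts.append(1 - abs(a[field] - b[field]) / NUMERIC_RANGE[field])
        elif field in CATEGORICAL:
            # Categorical field: exact match or nothing.
            parts.append(1.0 if a[field] == b[field] else 0.0)
    return sum(parts) / len(parts)

a4_2018 = {"year": 2018, "mileage_km": 60_000, "fuel": "diesel", "gearbox": "manual"}
a4_2017 = {"year": 2017, "mileage_km": 85_000, "fuel": "diesel", "gearbox": "manual"}
print(round(gower_similarity(a4_2018, a4_2017), 3))  # ~0.976: near-twins, no bucket cliff
```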
Closer peers count more. Smoothly.
The Epanechnikov kernel is the bell curve's well-behaved cousin: smooth at the centre, zero at the edges, with the lowest mean-square error of any finite-support kernel. In plain English: peers near your car get full weight, peers at the edge of similarity fade gently out, and there's no arbitrary on/off threshold anywhere.
That gentleness matters a lot. A binary "in/out" peer rule gives jumpy estimates the moment one peer leaves or joins the group. Smooth weighting gives smooth scores. Smooth scores are honest scores.
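Put kernel and distance together and you get a Nadaraya–Watson style weighted price. A sketch: the bandwidth and the use of a weighted mean are assumptions, not the production configuration:

```python
def epanechnikov(distance: float, bandwidth: float = 0.3) -> float:
    """Full weight at distance 0, smooth fade, exactly zero beyond the bandwidth."""
    u = distance / bandwidth
    return max(0.0, 0.75 * (1.0 - u * u))

def fair_price(peers: list[tuple[float, float]]) -> float:
    """peers: (gower_distance, asking_price) pairs. Kernel-weighted price."""
    weighted = [(epanechnikov(d), price) for d, price in peers]
    total = sum(w for w, _ in weighted)
    return sum(w * price for w, price in weighted) / total

peers = [(0.02, 14_900), (0.10, 15_400), (0.28, 18_000), (0.50, 9_000)]
print(round(fair_price(peers)))  # ~15300; the distant 9,000 outlier gets zero weight
```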
Few peers? Pull toward truth.
Some cars are common. There are 200+ Toyota Yaris listings live right now, easy. Other cars are unicorns. A 2018 Porsche Cayman GTS, manual, might have three peers in the whole country.
Three peers can lie. Maybe one is mispriced and drags the median. Bayesian shrinkage treats the small-peer estimate as suggestive, not gospel, and pulls it toward the broader segment mean. How hard it pulls is proportional to how noisy the local data is.
The shrinkage strength k is computed per segment from real data, not a hand-picked constant. Common cars stay close to their peer cloud. Rare cars relax toward sanity.
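A sketch of how that blend can work. The k formula is the one above; the way k is combined with peer count below is a standard empirical-Bayes form, assumed here rather than taken from the production code:

```python
import statistics

def shrunk_estimate(peer_prices: list[float], segment_median: float,
                    s2_within: float, s2_between: float) -> float:
    n = len(peer_prices)
    local = statistics.median(peer_prices)
    k = s2_within / s2_between      # noisier local data => larger k => harder pull
    w = n / (n + k)                 # more peers => trust the peer cloud more
    return w * local + (1 - w) * segment_median

# Three unicorn peers, one mispriced: the estimate relaxes toward the segment.
print(round(shrunk_estimate([62_000, 64_000, 45_000], 58_000,
                            s2_within=9e7, s2_between=3e7)))  # 60000
```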
When cheap means cheap-for-a-reason.
Economist George Akerlof won a Nobel for noticing that information asymmetry kills used-car markets. Sellers know the lemon. Buyers don't. The market unravels. V6 turns Akerlof's insight into one specific test: flag any car where Value scores high but Quality scores low.
That's the lemon signal. A car priced 30% under peer median with a thin description, no service history, salvage keywords and stale photos isn't a steal. It's a story you haven't heard yet.
Cars in the lemon zone don't get celebrated as great deals. They get a flag, so you know to ask the right questions before you wire a deposit.
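Both signals are described above; the gap threshold and the keyword list in this sketch are assumed values, not the production ones:

```python
SALVAGE_WORDS = ("salvage", "rebuilt", "write-off", "accident")  # assumed list
GAP_THRESHOLD = 4.0  # assumed, on a 0-10 per-axis scale

def lemon_flag(value: float, quality: float, description: str) -> bool:
    """Flag when the listing smells cheap-for-a-reason rather than cheap."""
    keyword_hit = any(w in description.lower() for w in SALVAGE_WORDS)
    akerlof_gap = value - quality  # high Value, low Quality = the lemon zone
    return keyword_hit or akerlof_gap >= GAP_THRESHOLD

print(lemon_flag(9.5, 3.0, "Quick sale, minor accident damage repaired"))  # True
print(lemon_flag(8.0, 7.5, "Full service history, two owners"))            # False
```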
The thirteen dimensions.
Seven dimensions ask "what's it worth?". Six ask "is it real?". Together they describe a car the way a good mechanic would, not the way a spreadsheet would.
Value · 7 dimensions · price relevance
Quality · 6 dimensions · condition signals
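Spelled out as a data structure, following the Value and Quality inputs listed earlier in this post; the identifier names are ours, not a published schema:

```python
VALUE_DIMENSIONS = [      # "is this priced well for what it is?"
    "mileage", "year", "engine_size", "fuel_type",
    "gearbox", "body_style", "price_position_in_segment",
]
QUALITY_DIMENSIONS = [    # "is this car actually what it claims to be?"
    "description_completeness", "photo_count", "photo_quality",
    "listing_freshness", "salvage_accident_keywords", "identity_confidence",
]
assert len(VALUE_DIMENSIONS) + len(QUALITY_DIMENSIONS) == 13
```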
Six versions of trying to be honest.
Each version fixed something the last one got wrong. Each version is recorded in production history with its scores intact, so we know exactly when the engine got smarter and which calls would have flipped in hindsight.
Median by bucket
naive baseline · Group cars by hard buckets, compare to bucket median. Worked at all of three steps wide. Broke at boundaries.
Linear regression on 4 features
first model · Year, mileage, engine, fuel. Better, but treated all cars as one global market. A 2018 Yaris and a 2018 Cayman aren't on the same line.
Per-segment models
segmentation · One model per segment. Fixed the global-line problem and then introduced a new one. Rare segments had three data points and lied confidently.
Gower distance + KNN
similarity-first · Switched to per-listing peer discovery. Mixed continuous and categorical fields cleanly for the first time. Still binary in/out peers though, jumpy at the edges.
Kernel weighting + first lemon flag
smoothing · Epanechnikov kernel killed the cliff edges. First crude lemon detector via salvage-keyword regex. Still no shrinkage though, so sparse cars stayed shaky.
Value × Quality + adaptive shrinkage
live · current. The current engine: two-axis multiplicative score, adaptive Bayesian shrinkage that auto-tunes per segment, the Akerlof V–Q gap for lemon detection, survivorship-bias correction over a 60-day delisted window with time decay, and full-market batch rescoring in 45 to 90 seconds. Five years of bad calls so you don't have to make them.
Now go check a real car.
Every listing in Cyprus, scored against its real peers, with the fair-price band shown. Free, forever.
The papers behind the pricing engine
None of the maths here is new. Every component of the V6 scorer traces back to a foundational paper. If you want to go deeper than a blog post, these are the originals.
[1] Gower, J. C. (1971). A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871.
The original definition of Gower distance. Lets us blend continuous fields (year, mileage, price) with categorical ones (fuel, body, drivetrain) into a single 0–1 similarity score. The peer-finding step in V6 is built on this.
[2] Epanechnikov, V. A. (1969). Non-parametric estimation of a multivariate probability density. Theory of Probability and Its Applications 14(1), 153–158.
Introduces the Epanechnikov kernel, the optimal smoothing kernel under mean-square error. V6 uses it to weight peers by similarity instead of cutting them off at an arbitrary threshold.
[3] Nadaraya, E. A. (1964). On estimating regression. Theory of Probability and Its Applications 9(1), 141–142. (Companion: Watson, G. S. 1964, Sankhyā A 26, 359–372.)
The Nadaraya–Watson kernel regression estimator. Combine Gower distance with the Epanechnikov kernel and you have the framework V6 uses to estimate a fair price from local peers.
[4] James, W. & Stein, C. (1961). Estimation with quadratic loss. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1, 361–379.
The James–Stein result that started shrinkage estimation. V6's adaptive Bayesian shrinkage pulls thin-segment estimates toward a prior with a strength proportional to local noise — direct descendant of this idea.
[5] Akerlof, G. A. (1970). The market for "lemons": Quality uncertainty and the market mechanism. The Quarterly Journal of Economics 84(3), 488–500.
The Nobel-winning paper on used-car markets and information asymmetry. V6's Value × Quality decomposition exists specifically to surface Akerlof's lemons: cars that look cheap on price but score low on everything else.
[6] Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association 74(368), 829–836.
The LOWESS / LOESS paper. V6 isn't strictly LOESS, but the engine inherits its bias against fitting through outliers and its preference for locally-weighted estimates over hard buckets.
[7] Heckman, J. J. (1979). Sample selection bias as a specification error. Econometrica 47(1), 153–161.
The classic survivorship-bias paper. Sold cars vanish; cars that linger don't. V6's 60-day delisted-listing memory window with time decay is our local answer to the problem Heckman framed.
Links go directly to the publishers. A few are paywalled by the journals; preprints and free copies are usually findable on Google Scholar by paper title.