How every car ends up here.
The internet says one thing, the market says another. Public car listings sit scattered across a dozen places, in a dozen formats, with the same physical car often appearing in three of them at once. The AllCars indexer takes that sprawl and turns it into one clean, deduplicated, enriched, continuously refreshed feed, so you can search the whole market the way you'd search one website.
One feed. Always fresh. Nothing lost.
We index public listings the same way a search engine indexes the public web. Every listing, deduplicated against the rest, enriched with the specs and tax data the listing itself doesn't carry, and tracked over time so the price story is preserved even when an ad disappears off the face of the internet.
Search the whole market
Stop hopping between tabs. Every public Cyprus car listing in one search box, with the same filters, the same scoring, the same fair-price band. It doesn't matter where the ad originally lives.
Always fresh
The market moves. The indexer runs continuously, so new listings, price drops and removals show up within hours, not weeks. What you see is the market right now, not a Tuesday snapshot from three weeks ago.
Nothing lost
Every observation gets logged. When a listing vanishes, the history doesn't vanish with it. First asking price, every drop, time to disappear, all kept. The market has a memory now.
From raw listing to search result, in eight stages.
Every listing flows through the same eight stations. Each one is a small, sharp idea, and each one earned its place by catching a specific class of failure the indexer used to ship. Painfully.
Discover
Find new public listings as they appear. Only stuff that's already publicly visible to anyone with a browser. No walls, no private feeds, no logins.
Validate
Reject impossible data at the door. Future-dated years, half-a-million-kilometre mileage, location strings dressed up as descriptions, the lot. Caught before they touch the index.
Normalise
One canonical form per make, model, body type, and fuel, across spelling variants, chassis codes and language quirks. "W211" and "E-Class" finally agree they're the same thing.
Parse
Pull structured tags out of free-form descriptions. Mileage hidden in a sentence, fuel type buried in a paragraph, service-history mentions, accident keywords. English, Greek, Greeklish, Russian and Russlish all handled.
Enrich
Bolt on the specs the listing doesn't carry. Horsepower, torque, fuel consumption, kerb weight, Cyprus road-tax band, all pulled from public vehicle databases. The listing becomes a vehicle.
Deduplicate
Three listings of the same physical car? One vehicle record, one timeline. Image fingerprints and spec match find the twins. Hard veto rules stop false merges from happening.
Track
Append-only observation log. Every price seen, every change, every disappearance, all preserved. The feed shows you today. The history shows you the whole story.
Index
Pre-compute everything search needs (enrichment, scoring, tags, road tax) into a hot index. Warm queries return in under half a second. The pricing engine plugs in here too.
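To make one of those stations concrete, here is a minimal sketch of the Parse idea: pulling a mileage figure out of a free-form description. The pattern, the unit spellings and the function name are assumptions for illustration, not the indexer's actual rules.

```python
import re
from typing import Optional

# Hypothetical pattern: matches "120,000 km", "95.000 χλμ", "120 000 км" or "85k km".
# The unit list is an illustrative assumption, not the indexer's real vocabulary.
MILEAGE_RE = re.compile(
    r"(?<!\d)(\d{1,3}(?:[.,\s]\d{3})+|\d+)\s*(k)?\s*(?:kms|km|χλμ|км)\b",
    re.IGNORECASE,
)


def parse_mileage_km(description: str) -> Optional[int]:
    """Return the first mileage mentioned in the text, in kilometres, or None."""
    match = MILEAGE_RE.search(description)
    if match is None:
        return None
    # Drop thousands separators ("120,000", "120.000", "120 000") before converting.
    value = int(re.sub(r"[.,\s]", "", match.group(1)))
    if match.group(2):  # "85k km" means 85 thousand
        value *= 1000
    return value
```

A real parser would need many more patterns (Greeklish and Russlish spellings, miles, odometer typos); this only shows the shape of turning a sentence into a structured field.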
Discovery, public listings only.
The indexer behaves like a polite reader. It looks at the same public pages a buyer would, in moderate cadences, with standard request hygiene. No private accounts, no walled areas, no personal-data harvesting.
What flows in is exactly what's already on the open web: the public listing. Make, model, year, mileage, price, public photos, the seller's own free-form description. That's the entire input.
Anything inside a login wall, anything marked private, and anything that looks like personal contact data is left alone. The index reflects the public market. Full stop.
Reject impossible data at the door.
The cheapest bug to fix is the one that never enters the system. Validation runs before anything else: future-dated registrations, integer-overflow mileage, prices below a spare-parts floor, descriptions made of nothing but a city name, parts-and-accessories listings dressed up as cars.
Everything that survives this gate is at least plausibly a real car, which means downstream stages don't need defensive logic for the obviously broken cases.
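A gate like this can be sketched in a few lines. Every threshold, the city list and all names below are assumptions made up for the example; the real limits are not published here.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Illustrative assumptions only, not the indexer's real limits.
MAX_MILEAGE_KM = 500_000
SPARE_PARTS_FLOOR_EUR = 300
CITY_NAMES = {"nicosia", "limassol", "larnaca", "paphos", "famagusta"}


@dataclass
class RawListing:
    year: int
    mileage_km: int
    price_eur: int
    description: str


def validation_errors(listing: RawListing, today: Optional[date] = None) -> List[str]:
    """Return the reasons a listing is implausible; an empty list means it passes."""
    today = today or date.today()
    errors = []
    if listing.year > today.year:
        errors.append("future-dated year")
    if not 0 <= listing.mileage_km < MAX_MILEAGE_KM:
        errors.append("implausible mileage")
    if listing.price_eur < SPARE_PARTS_FLOOR_EUR:
        errors.append("price below spare-parts floor")
    if listing.description.strip().lower() in CITY_NAMES:
        errors.append("description is only a city name")
    return errors
```

Returning the list of reasons, rather than a bare pass or fail, makes it easy to count which class of bad data shows up most often.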
One canonical name per car.
The same car shows up under five different names. Chassis codes in parentheses, brand variants spelled four ways, dealer-specific shorthand, language mixes. Without normalisation, search for one model and you miss half the market.
A rules engine collapses every variant down to one canonical identity per make, model, body and fuel type. So one search for "E-Class" finds them all, and the pricing engine peers them against each other properly.
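As a rough sketch of the idea, assuming a simple lookup-table design (the alias tables and function names here are hypothetical, and the real rules engine is certainly larger):

```python
import re
from typing import Tuple

# Hypothetical alias tables; a real rules engine would hold far more entries.
MAKE_ALIASES = {
    "mercedes": "Mercedes-Benz",
    "mercedes benz": "Mercedes-Benz",
    "mercedes-benz": "Mercedes-Benz",
    "merc": "Mercedes-Benz",
}
MODEL_ALIASES = {
    ("Mercedes-Benz", "w211"): "E-Class",
    ("Mercedes-Benz", "e class"): "E-Class",
    ("Mercedes-Benz", "e-class"): "E-Class",
}


def _clean(text: str) -> str:
    """Lower-case and collapse whitespace so spelling variants compare equal."""
    return re.sub(r"\s+", " ", text.strip().lower())


def normalise(make: str, model: str) -> Tuple[str, str]:
    """Map a raw make/model pair to its canonical identity when a rule exists."""
    canonical_make = MAKE_ALIASES.get(_clean(make), make.strip())
    cleaned = _clean(model)
    # Chassis codes often sit in parentheses, e.g. "E Class (W211)".
    inside = re.findall(r"\(([^)]*)\)", cleaned)
    outside = re.sub(r"\([^)]*\)", "", cleaned).strip()
    for candidate in [cleaned, outside, *inside]:
        canonical_model = MODEL_ALIASES.get((canonical_make, candidate))
        if canonical_model:
            return canonical_make, canonical_model
    # No rule matched: pass the model through unchanged.
    return canonical_make, model.strip()
```

With these tables, `normalise("Mercedes", "W211")` and `normalise("MERCEDES BENZ", "E Class (W211)")` both come back as `("Mercedes-Benz", "E-Class")`, so a single search term reaches both listings.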
Add the data the listing forgot.
A listing tells you what the seller chose to type. It rarely tells you horsepower, torque, fuel consumption, kerb weight, or what your road tax bill is going to look like in January. The indexer bolts these on from public vehicle databases the moment a model is recognised.
The Cyprus road-tax calculator is built in: three registration eras, CO2 bands, Euro surcharges. UK imports get the dual-rate display so there's no nasty surprise three weeks after you collect the car.
By the time a listing leaves enrichment, it's no longer a listing. It's a fully described vehicle.
Same car, one timeline.
The same physical car often appears in three listings at once. Different prices, different photos, different descriptions. Without dedup, every search returns the same car five times over and the price chart looks like noise.
The deduper merges twins using two signal families: image fingerprints (perceptual hashes computed over public photos so the original images aren't retained as personal data) and spec match (year + make + model + mileage + engine close enough to be the same car).
Hard veto rules stop false merges. Mismatch in body type, fuel, year, or colour and the merge is rejected, even if every other signal is screaming "match". Different cars stay different cars. Always.
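The interplay between signals and vetoes can be sketched like this. The thresholds, the field names and the choice to accept either signal on its own are all assumptions for the example; the photo hashes are taken as precomputed 64-bit integers.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet

# Illustrative thresholds, not the deduper's real tuning.
MAX_HASH_DISTANCE = 6        # differing bits allowed between two photo hashes
MAX_MILEAGE_GAP_KM = 2_000   # sellers round mileage differently


@dataclass(frozen=True)
class Listing:
    make: str
    model: str
    year: int
    body: str
    fuel: str
    colour: str
    engine_cc: int
    mileage_km: int
    photo_hashes: FrozenSet[int]  # 64-bit perceptual hashes, computed elsewhere


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def vetoed(a: Listing, b: Listing) -> bool:
    """Hard veto: any mismatch in body, fuel, year or colour blocks the merge."""
    return (a.body, a.fuel, a.year, a.colour) != (b.body, b.fuel, b.year, b.colour)


def images_match(a: Listing, b: Listing) -> bool:
    return any(
        hamming(x, y) <= MAX_HASH_DISTANCE
        for x in a.photo_hashes
        for y in b.photo_hashes
    )


def specs_match(a: Listing, b: Listing) -> bool:
    same_model = (a.make, a.model, a.engine_cc) == (b.make, b.model, b.engine_cc)
    return same_model and abs(a.mileage_km - b.mileage_km) <= MAX_MILEAGE_GAP_KM


def same_car(a: Listing, b: Listing) -> bool:
    # The veto runs first, so no positive signal can override it.
    if vetoed(a, b):
        return False
    # Assumption: either signal family is enough once the veto has passed.
    return images_match(a, b) or specs_match(a, b)
```

Checking the veto before any positive signal is what lets "different cars stay different cars" hold even when the photos are identical.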
Every observation, every change, kept.
The lifecycle layer is an append-only log of every observation. First asking price, every subsequent drop, the days the ad was live, the moment it disappeared. Nothing is overwritten. Nothing is forgotten.
When a listing vanishes, that's a signal too. Cars that disappear within 48 hours of a price drop probably sold. Cars that linger for 90 days probably didn't. The pricing engine reads this stream to correct for survivorship bias.
And the buyer gets the full price story instead of a snapshot.
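Here is a minimal sketch of what an append-only log and the "vanished soon after a drop" heuristic could look like. The class and field names are hypothetical, and day-level dates stand in for real timestamps.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Observation:
    listing_id: str
    seen_on: date
    price_eur: Optional[int]  # None records that the ad was no longer visible


class ObservationLog:
    """Append-only: rows are added, never updated or deleted."""

    def __init__(self) -> None:
        self._rows: List[Observation] = []

    def append(self, observation: Observation) -> None:
        self._rows.append(observation)

    def history(self, listing_id: str) -> List[Observation]:
        return [row for row in self._rows if row.listing_id == listing_id]


def price_story(history: List[Observation]) -> dict:
    """Summarise one listing's (non-empty) history: first price, drops, days live."""
    prices = [o.price_eur for o in history if o.price_eur is not None]
    return {
        "first_price": prices[0] if prices else None,
        "drops": sum(1 for prev, cur in zip(prices, prices[1:]) if cur < prev),
        "days_live": (history[-1].seen_on - history[0].seen_on).days,
        "gone": history[-1].price_eur is None,
    }


def probably_sold(history: List[Observation],
                  window: timedelta = timedelta(hours=48)) -> bool:
    """True if the ad vanished within `window` of a price drop."""
    if not history or history[-1].price_eur is not None:
        return False
    vanished_on = history[-1].seen_on
    priced = [o for o in history if o.price_eur is not None]
    return any(
        cur.price_eur < prev.price_eur and vanished_on - cur.seen_on <= window
        for prev, cur in zip(priced, priced[1:])
    )
```

Because a disappearance is stored as one more row rather than a delete, the summary and the sold heuristic can always be recomputed from the raw stream.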
Pre-computed. Hot. Sub-second.
When you tap "search", you don't want the engine to start thinking. You want the answer. The final stage materialises everything (enrichment, scoring, tags, road tax, deal band) into a hot index that warm queries hit in under half a second for typical filter combinations.
The pricing engine plugs in here. The browse experience plugs in here. Saved-search alerts plug in here. One source of truth, many surfaces.
When the upstream data changes, only the affected slice re-materialises. Incremental, not nuclear.
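One way to picture incremental re-materialisation is a dirty set: only vehicles flagged as changed get rebuilt. This is a sketch under that assumption, with hypothetical names and an in-memory dict standing in for the real index.

```python
from typing import Callable, Dict, List, Set

Doc = Dict[str, object]


class HotIndex:
    """Keeps pre-computed search documents and rebuilds only the changed ones."""

    def __init__(self, materialise: Callable[[Doc], Doc]) -> None:
        self._materialise = materialise
        self._docs: Dict[str, Doc] = {}
        self._dirty: Set[str] = set()

    def mark_changed(self, vehicle_id: str) -> None:
        self._dirty.add(vehicle_id)

    def refresh(self, source: Dict[str, Doc]) -> int:
        """Re-materialise the dirty slice; return how many records were touched."""
        touched = len(self._dirty)
        for vehicle_id in self._dirty:
            if vehicle_id in source:
                self._docs[vehicle_id] = self._materialise(source[vehicle_id])
            else:
                # Removed upstream: drop it from the search index.
                self._docs.pop(vehicle_id, None)
        self._dirty.clear()
        return touched

    def search(self, predicate: Callable[[Doc], bool]) -> List[Doc]:
        return [doc for doc in self._docs.values() if predicate(doc)]


def materialise(raw: Doc) -> Doc:
    # Stand-in for enrichment, scoring, tags and road tax: one derived field.
    return {**raw, "under_10k": raw["price_eur"] < 10_000}
```

A price change on one vehicle then costs one rebuild, however large the index is.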
What's running, right now.
The pipeline is not a side project. It runs continuously, gets instrumented like infrastructure, and earns every line through an incident. Here's the order-of-magnitude shape of it.
Public market, public data, light footprint.
AllCars indexes a public market. We treat the underlying sources the way any considered reader would, and we draw a hard line around personal data. These rules aren't aspirational. They're baked into the indexer itself.
Public listings only
Anything publicly visible to a logged-out browser is fair to read. Anything behind a wall is not. The indexer never attempts to access private feeds, paywalled content, or areas that require authentication.
No personal data
The index stores facts about cars, not facts about people. No personal contact details, no buyer or seller profiling, no tracking individuals across the network. The dataset describes vehicles in a public market.
Polite cadence
Refresh runs are paced to be a small fraction of normal public traffic and respect platform-level signals. The indexer is a quiet reader, not a load test.
Right to be removed
A removal request, a hide-from-index request, or a source-side opt-out is honoured promptly across the index, history, and search results. It's just me running this: message me on Telegram and the record is gone the same day.
Now go search the whole market.
One feed. Always fresh. Deduplicated, enriched, scored. Spend your attention on cars, not tabs.