Retail AI

Inside the Gem Engine: Better Decisions Drive Higher Conversion

Primoz Zajsek

Every shopper journey starts with a vague intent. "I'm looking for a TV." Not a spec, not a model number, but a situation: a living room, a budget, a use case that has not yet been translated into product attributes. Most AI shopping tools take that intent and search for products whose descriptions match the words the shopper used, rank the results by relevance, and return a list. The shopper is left to figure out which one is actually right for them. The Gem Engine works differently. It does not search. It decides.

What a decision layer is

A decision layer sits between what a shopper says and what a catalog contains, and does the translation work that neither the shopper nor a search algorithm can do reliably. The job has four distinct stages: understanding intent and discovering needs, structuring and validating requirements, scoring every candidate product, and recommending with honest trade-offs. Each stage is handled separately with different logic applied at each step, and that separation is what makes the recommendations accountable.

Stage one: understand and discover

A shopper who says "I'm looking for a TV for gaming" has not told the system what they need. They have told it their situation. The first stage does two things simultaneously: it maps the natural language query to the right product category in the catalog, and it decides what to do next, whether that is asking clarifying questions, refining existing requirements, or recommending directly if enough is already known.

When questions are needed, they are not generic. They are generated from the category's buying guides and the live catalog data, and the options shown to the shopper reflect products that are actually in stock. This is what makes the questions feel like talking to an expert rather than filling out a form. For a TV query, this means asking what the shopper mainly uses it for, how far they sit from the screen, and what their budget is: three questions whose combined answers are enough to filter a catalog of hundreds down to a handful of genuine matches.

Stage two: structure and validate

The shopper's answers are transformed into structured requirements, each one mapped to real product attributes in the catalog. The requirement "65 to 75 inch screen" becomes a precise filter expression against the screen size attribute in the product database. The field has to exist in the catalog for the filter to run. There is no hallucinating specifications that are not there.
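As a rough illustration of that rule, the sketch below builds a range filter only when the target attribute exists in the catalog schema. The schema, the attribute names, and the `build_range_filter` helper are hypothetical and are not taken from the Gem Engine itself.

```python
from dataclasses import dataclass

# Hypothetical catalog schema (attribute name -> value type); a real catalog differs.
CATALOG_SCHEMA = {"screen_size_in": float, "refresh_rate_hz": int, "price_eur": float}


@dataclass(frozen=True)
class RangeFilter:
    field: str
    minimum: float
    maximum: float

    def matches(self, product: dict) -> bool:
        # A product that lacks the attribute never matches.
        value = product.get(self.field)
        return value is not None and self.minimum <= value <= self.maximum


def build_range_filter(field, minimum, maximum, schema=CATALOG_SCHEMA):
    # Refuse to build a filter for an attribute the catalog does not have.
    if field not in schema:
        raise KeyError(f"catalog has no attribute {field!r}")
    return RangeFilter(field, minimum, maximum)


# "65 to 75 inch screen" expressed as a filter on an assumed attribute name.
screen_filter = build_range_filter("screen_size_in", 65, 75)

products = [
    {"id": "tv-a", "screen_size_in": 65.0},
    {"id": "tv-b", "screen_size_in": 55.0},
    {"id": "tv-c", "screen_size_in": 75.0},
]
matching_ids = [p["id"] for p in products if screen_filter.matches(p)]
```

Asking for a filter on an attribute the schema does not list raises an error instead of quietly inventing a field, which is the behaviour the paragraph above describes.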

Each requirement then passes through two parallel processes. The first runs it as an exact database query, returning a precise match count rather than a relevance score. When the system says 82 products match a requirement, that number is exact, verified against live inventory at the moment the shopper is asking. The second evaluates the quality of each requirement through what Gem calls SafeGuard, a dedicated evaluation layer that assigns every requirement one of five dispositions before it influences a single recommendation.

Clear, actionable requirements proceed to filtering normally. Requirements that are valid but cannot be expressed as database filters, such as "good picture quality," are flagged for per-product AI evaluation instead. Ambiguous requirements trigger a clarifying question rather than a guess. Requirements that raise advisory concerns, such as a budget that is tight for the features requested, surface a transparent warning to the shopper. Requirements that raise safety or compliance issues block recommendations entirely until resolved. SafeGuard exists for one reason: to prevent the engine from confidently recommending the wrong product.
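One way to picture the five dispositions is as an enumeration with a routing table. This is a minimal sketch: the disposition names, the action labels, and the `plan` function are illustrative assumptions, since the article describes the behaviour but not the identifiers.

```python
from enum import Enum


class Disposition(Enum):
    # Names are assumed for illustration; only the five behaviours come from the text.
    CLEAR = "clear"
    NEEDS_AI_EVALUATION = "needs_ai_evaluation"
    AMBIGUOUS = "ambiguous"
    ADVISORY = "advisory"
    BLOCKING = "blocking"


# Hypothetical action labels, one per disposition.
NEXT_ACTION = {
    Disposition.CLEAR: "apply_filter",
    Disposition.NEEDS_AI_EVALUATION: "score_per_product",
    Disposition.AMBIGUOUS: "ask_clarifying_question",
    Disposition.ADVISORY: "warn_shopper",
    Disposition.BLOCKING: "block_recommendations",
}


def plan(dispositions: dict[str, Disposition]) -> dict:
    """Map each requirement to its next action; one blocking requirement halts recommendations."""
    actions = {req: NEXT_ACTION[d] for req, d in dispositions.items()}
    blocked = Disposition.BLOCKING in dispositions.values()
    return {"actions": actions, "recommendations_allowed": not blocked}


result = plan({
    "screen size 65 to 75 inches": Disposition.CLEAR,
    "good picture quality": Disposition.NEEDS_AI_EVALUATION,
})
```

How an advisory requirement is processed beyond the warning is not specified in the text, so the sketch records only the warning.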

Stage three: score and select

Once requirements are validated, every candidate product in the catalog is scored against them using a hybrid approach. Requirements that can be expressed as precise database filters, covering attributes like screen size, refresh rate, and price range, are scored instantly via exact attribute matching. Either the product satisfies the condition or it does not. Requirements that cannot be reduced to a database filter, such as subjective preferences and qualitative attributes, are evaluated per product by dedicated AI models. The two scores merge into a single confidence score per product.
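The article does not say how the two scores are merged, so the sketch below assumes the simplest option: each filter requirement contributes 1.0 or 0.0, each AI-evaluated requirement contributes a score between 0 and 1, and the confidence score is their unweighted mean. The function name and requirement keys are hypothetical.

```python
def hybrid_score(filter_results: dict[str, bool], ai_scores: dict[str, float]) -> float:
    """Merge exact filter outcomes and per-product AI scores into one value in [0, 1].

    Assumption: an unweighted mean over all requirements. A production system
    may weight requirements differently.
    """
    scores = [1.0 if ok else 0.0 for ok in filter_results.values()]
    scores += list(ai_scores.values())
    if not scores:
        raise ValueError("at least one requirement is needed")
    return sum(scores) / len(scores)


# One product: passes two filters, fails one, and gets 0.5 from the AI scorer.
score = hybrid_score(
    {"screen_size": True, "refresh_rate": True, "display_technology": False},
    {"hdr_support": 0.5},
)
```

Because each requirement keeps its own entry, a product's final score can be traced back to the individual requirements it met or missed.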

The result is a ranked pool of candidates where the scoring logic is transparent. Quantifiable requirements were matched exactly and qualitative requirements were evaluated per product, and every product's position in the ranking can be explained in terms of the requirements it satisfies and the ones it does not. If no products satisfy all requirements, the system does not surface approximate matches hoping the shopper will not notice. It computes which requirements to relax first, preferring negotiable ones over non-negotiable ones, and offers the shopper a clear path forward.
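The relaxation logic could be sketched as follows, under two assumptions that are not in the article: each requirement carries a `negotiable` flag, and relaxing a requirement means dropping it entirely. The function and field names are hypothetical, and a real system would offer the relaxation to the shopper rather than apply it silently.

```python
def relaxation_order(requirements):
    # Negotiable requirements sort first; sorted() is stable, so ties keep input order.
    return sorted(requirements, key=lambda r: not r["negotiable"])


def relax_until_match(products, requirements):
    """Drop requirements one at a time, negotiable ones first, until some product matches."""
    active = list(requirements)
    dropped = []
    for candidate in relaxation_order(requirements):
        pool = [p for p in products if all(r["test"](p) for r in active)]
        if pool:
            return pool, dropped
        active.remove(candidate)
        dropped.append(candidate["name"])
    # Every requirement has been dropped, so all products qualify.
    return [p for p in products if all(r["test"](p) for r in active)], dropped


products = [
    {"id": "tv-a", "price": 1200, "size": 65},
    {"id": "tv-b", "price": 800, "size": 55},
]
requirements = [
    {"name": "size >= 65", "negotiable": False, "test": lambda p: p["size"] >= 65},
    {"name": "price <= 1000", "negotiable": True, "test": lambda p: p["price"] <= 1000},
]
pool, dropped = relax_until_match(products, requirements)
```

In this toy data no product meets both requirements, so the negotiable price limit is relaxed before the non-negotiable screen size.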

Stage four: recommend and guide

The anchor recommendation, the best overall match, is selected along with alternatives, each with explicit trade-offs covering why it was chosen, where it excels, and where it falls short. A recommendation without honest trade-offs is not guidance. It is a guess dressed up as confidence. The shopper who knows their recommended product is slightly above budget but the best match for every other requirement can make an informed decision in a way that a shopper who receives a product with no context cannot.

The conversation does not end at the recommendation. Contextual follow-up suggestions let the shopper refine, compare, or explore without starting over, and each follow-up carries the validated requirements forward so the system does not ask the same questions again.

What this looks like in practice

A real example from a live catalog: a shopper queries "kupujem tv," Slovenian for "I'm buying a TV." The system identifies the TV category, generates four questions based on the category's buying guide, and receives the shopper's answers. It extracts four structured requirements: screen size 65 to 75 inches, refresh rate 120Hz or above, display technology suited for dark rooms, and HDR support.

The first three are filter-computable. Screen size returns 100 exact matches, refresh rate returns 136, and display technology returns 82. The intersection of products satisfying all three is a focused pool ready for scoring. HDR support cannot be reliably expressed as a database filter for this catalog, so it goes to the per-product AI scorer instead. The hybrid score merges both results, and an anchor recommendation surfaces with alternatives, each with explicit trade-offs. The total time from query to recommendation is measured in seconds. The shopper sees each stage as it completes: requirements appearing, match counts populating, recommendations streaming.
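The intersection step can be illustrated with plain sets. The product IDs below are toy data invented for the sketch, not the real catalog. The article gives the per-requirement counts but not the size of the final pool, and the pool can never be larger than the smallest individual match set (82 in the example above).

```python
# Toy match sets keyed by requirement; IDs are invented for illustration.
matches = {
    "screen_size": {"tv-1", "tv-2", "tv-3", "tv-5"},
    "refresh_rate": {"tv-2", "tv-3", "tv-4", "tv-5"},
    "display_technology": {"tv-3", "tv-5", "tv-6"},
}

# Exact per-requirement match counts, as shown to the shopper.
match_counts = {name: len(ids) for name, ids in matches.items()}

# Products that satisfy every filter-computable requirement.
pool = set.intersection(*matches.values())
```

Only the products in `pool` would then go on to the per-product scorer for the remaining qualitative requirement.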

Why the architecture matters

The practical outcome is a shopping experience that performs differently from what most retailers have seen in AI demos. A demo is built on clean data and clear queries. The Gem Engine is built for production: real shopper behaviour in its full range, ambiguous inputs, incomplete catalogs, and the edge cases a demo never encounters. The SafeGuard evaluation layer catches quality issues before they reach the shopper. The hybrid scoring handles both precise and subjective requirements without collapsing them into a single probabilistic pass. Because the architecture is built around exact matching and transparent scoring rather than probabilistic generation, every recommendation is auditable. The retailer can see exactly why a product was recommended. The shopper can see exactly what requirements it satisfies. There is no black box.

The Gem Engine is live and deployable on your existing storefront in 2 to 4 weeks, without rebuilding your stack. Book a demo and we will build a live version on your actual catalog.