Case Study Datasentics: Search query understanding use cases
Originally published on Datasentics.com
Search on e-commerce websites is one of the most important factors in whether a customer completes a transaction, and therefore in whether the company earns revenue. On Heureka, one third of transactions involve a search somewhere on the path to conversion. Relevant results and filtering options lead to a better customer experience and increase the probability of conversion. Search queries are often unique and vary over time. The intention behind some queries can even shift slightly throughout the year, depending on the season. Query understanding using machine learning techniques allows us to identify the product category of most queries.
Heureka Group is the largest price comparator and shopping advisor in Europe. It is present in 9 countries in Central and Eastern Europe, with over 23 million visitors per month and a network of over 55,000 online stores.
The overall goal of this project was to shorten the customer journey, specifically the search part. The idea was to understand the user’s intention behind a search query and, based on this understanding, provide more relevant results. Specifically, our aim was to redirect users to the relevant product category and apply filters according to the information in the search query. For example, when someone searches for "blue t-shirt Nike men M", the goal is to redirect this user to the category "Men’s t-shirts" and apply the filters brand: Nike, size: M and colour: blue.
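To make the target output concrete, here is a minimal sketch of the filter-extraction step for the example query, assuming the category has already been determined. The attribute vocabulary, the `ParsedQuery` type and the `extract_filters` function are hypothetical and only illustrate the idea; they are not Heureka's actual implementation.

```python
from dataclasses import dataclass, field

# Hypothetical attribute vocabulary for a single category. In practice the
# values would come from the filter definitions of each category.
CATEGORY_ATTRIBUTES = {
    "Men's t-shirts": {
        "brand": {"nike", "adidas", "puma"},
        "size": {"s", "m", "l", "xl"},
        "colour": {"blue", "red", "black", "white"},
    }
}


@dataclass
class ParsedQuery:
    category: str
    filters: dict = field(default_factory=dict)    # attribute -> matched value
    unmatched: list = field(default_factory=list)  # tokens that matched nothing


def extract_filters(query: str, category: str) -> ParsedQuery:
    """Match each query token against the known attribute values of a category."""
    parsed = ParsedQuery(category=category)
    vocabulary = CATEGORY_ATTRIBUTES.get(category, {})
    for token in query.lower().split():
        for attribute, values in vocabulary.items():
            # Keep only the first value found for each attribute.
            if token in values and attribute not in parsed.filters:
                parsed.filters[attribute] = token
                break
        else:
            parsed.unmatched.append(token)
    return parsed


parsed = extract_filters("blue t-shirt Nike men M", "Men's t-shirts")
print(parsed.filters)
```

Tokens that match no attribute value (here "t-shirt" and "men") are kept separately, since in a real system they would typically be the words that help determine the category itself.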
Although the client already had access to large volumes of useful data, the data was not used to its full potential. The search queries were stored and occasionally analysed, but they were not used to adjust search results or to determine query properties based on historical clicks. Only when a user’s query exactly matched a category name or one of its pre-defined keywords was the user redirected to search results in that category. All other queries led to a generic full-text results page, which does not offer filtering options. Category names and their keywords had to be maintained manually. The client recognised this untapped potential and asked us to help develop the Query understanding solution.
We had to overcome several challenges:
- Very high number of categories: Products on Heureka are organised into more than 4,000 categories, some of which are very similar to each other.
- Specific parameters for each category: Each category has a specific set of attributes (e.g., brand, size, colour, type). These attributes are used as filters for that category and help users find what they are looking for more efficiently. Since each category has a different set of attributes and values, we first needed to understand the query category (specified or implied). We wanted to determine, with high enough certainty, the category to which the search query belongs and then redirect the customer to that category's results page. This way, the customer can take advantage of the offered filters and we can process the query further to extract category-specific attributes.
- No labelling available: Instead of labelled data, we used historical behavioural data, i.e., historical queries paired with the category of the result the user clicked.
- Low latency required: When rendering a web page with search results, every millisecond counts, because added delay can hurt both leave rate and conversion rate. Our solution was required to analyse the query within a few dozen milliseconds.
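The "no labelling" challenge above can be illustrated with a small sketch of how click logs might be turned into weak labels: a query gets a category label only when enough clicks agree on it. The function name, the log format and both thresholds are assumptions made for illustration, not values from the project.

```python
from collections import Counter, defaultdict


def build_weak_labels(click_log, min_clicks=3, min_share=0.8):
    """Derive (query -> category) labels from historical clicks.

    click_log is an iterable of (query, clicked_category) pairs.
    A query is labelled only if it has at least `min_clicks` clicks and
    its most-clicked category accounts for at least `min_share` of them.
    Both thresholds are hypothetical.
    """
    counts = defaultdict(Counter)
    for query, category in click_log:
        # Light normalisation so trivially different spellings are merged.
        counts[query.strip().lower()][category] += 1

    labels = {}
    for query, per_category in counts.items():
        total = sum(per_category.values())
        category, clicks = per_category.most_common(1)[0]
        if total >= min_clicks and clicks / total >= min_share:
            labels[query] = category
    return labels
```

Queries whose clicks are spread across several categories are left unlabelled in this sketch, which trades training-set size for cleaner labels.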
Over more than a year of ongoing cooperation with Heureka, we first manually analysed queries and conversion rates on different search results pages. The analyses showed that crucial business metrics such as conversion rate and leave rate were much better for users who were redirected to results in a specific category. Therefore, we needed to recognise, with very high precision, the product category the user wants to search in. To this end, we designed a modular, extensible solution that combines various detectors and manual rules; see the schema below. It first determines the product category using an ensemble of machine learning categorisation models and then extracts various attributes of the query, such as brand or colour. The categorisation models are trained on behavioural data about search queries and the associated product category. To address seasonality and changing search trends, we use a large volume of training data covering the entire past year and give higher weight to the most recent data.
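One simple way to give recent data a higher weight is exponential decay by sample age. The sketch below assumes that scheme, a one-year window and a 90-day half-life; the text does not state which weighting function was actually used, so all of these are illustrative choices.

```python
from datetime import date, timedelta


def weight_samples(samples, today, window_days=365, half_life_days=90.0):
    """Attach a recency weight to each training sample.

    samples is an iterable of (query, category, event_day) tuples.
    Samples outside the window are dropped; the rest get a weight that
    halves every `half_life_days` (an assumed decay scheme).
    """
    cutoff = today - timedelta(days=window_days)
    weighted = []
    for query, category, event_day in samples:
        if event_day < cutoff or event_day > today:
            continue  # outside the training window
        age_days = (today - event_day).days
        weight = 0.5 ** (age_days / half_life_days)
        weighted.append((query, category, weight))
    return weighted
```

Weights of this kind can be passed to most classifiers as per-sample weights during training, so that a query whose typical category changes with the season is dominated by its most recent behaviour.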
Query understanding is a very complex task. One of the reasons is that a single query can have multiple meanings or belong to multiple categories. Therefore, we created a "Voting master/result entity decision" component, where the results of the ML models and manual rules are collected, filtered through a set of criteria and then returned to the search API to be shown to users. We were able to do all of this while keeping total search latency below 80 ms. Our solution redirects approximately half of all queries to their corresponding categories with precision above 95 %. For example, when customers search for "HDR monitor Samsung", they instantly get monitors with the relevant filters checked and can apply any additional filters available for this category.
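The voting step can be sketched as follows. This is only an illustration of the general idea of collecting votes and filtering them through criteria: the `Vote` type, the averaging rule and both thresholds are assumptions, and the actual criteria used in the project are not described in this text.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Vote:
    source: str        # name of the model or manual rule that voted
    category: str
    confidence: float  # in [0, 1]


def decide_category(votes, min_score=0.6, min_margin=0.2) -> Optional[str]:
    """Return the category to redirect to, or None to fall back to full-text search.

    Each category's score is its summed confidence divided by the number
    of distinct voters. The winner must reach `min_score` and beat the
    runner-up by `min_margin`; both thresholds are hypothetical.
    """
    if not votes:
        return None
    n_sources = len({vote.source for vote in votes})
    scores = defaultdict(float)
    for vote in votes:
        scores[vote.category] += vote.confidence / n_sources

    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_category, best_score = ranked[0]
    runner_up_score = ranked[1][1] if len(ranked) > 1 else 0.0
    if best_score >= min_score and best_score - runner_up_score >= min_margin:
        return best_category
    return None
```

Returning `None` for ambiguous queries reflects the trade-off described above: only queries whose category is clear are redirected, and the rest stay on the generic results page.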
The solution is currently used on the Slovak and Czech Heureka websites. As a result, we have improved search-related metrics such as leave rate, CTR and click position.
- Improved leave rate by more than 12%
- Improved the median position of the product that users click on the results page by 25%
- Redirecting customers to categories allows them to use filters for their results
- With the category determined, we can continue working on understanding the rest of the query properties and thus make results even more relevant to the customer
Currently, we are working on further improving the relevance of the results using deep neural networks trained on customers’ search behaviour.