Google Chrome uses a machine learning model for address bar autocomplete. This model, likely a Multilayer Perceptron (MLP), processes numerous input signals to predict and rank suggestions.
Here’s a breakdown of these signals (a short sketch of turning a few of them into model-ready tensors follows the list):
Input Features:
User Browsing History:
- log_visit_count: (float32[-1,1]) Logarithmic count of user visits to the URL.
- log_typed_count: (float32[-1,1]) Logarithmic count of the URL being typed in the address bar.
- log_shortcut_visit_count: (float32[-1,1]) Logarithmic count of user visits to the URL via a desktop shortcut.
- elapsed_time_last_visit_days: (float32[-1,1]) Days elapsed since the user last visited the URL.
- log_elapsed_time_last_visit_secs: (float32[-1,1]) Logarithmic seconds elapsed since the user last visited the URL.
- elapsed_time_last_shortcut_visit_days: (float32[-1,1]) Days elapsed since the user last visited the URL via a desktop shortcut.
- log_elapsed_time_last_shortcut_visit_sec: (float32[-1,1]) Logarithmic seconds elapsed since the user last visited the URL via a desktop shortcut.
- num_bookmarks_of_url: (float32[-1,1]) Count of bookmarks associated with the URL.
- shortest_shortcut_len: (float32[-1,1]) Length of the shortest desktop shortcut for the URL.
Website Characteristics:
- length_of_url: (float32[-1,1]) Length of the URL string.
Match Characteristics:
- total_title_match_length: (float32[-1,1]) Total length of matches between the user’s input and the website title.
- total_bookmark_title_match_length: (float32[-1,1]) Total length of matches between the user’s input and the bookmark titles for the URL.
- total_host_match_length: (float32[-1,1]) Total length of matches between the user’s input and the URL host.
- total_path_match_length: (float32[-1,1]) Total length of matches between the user’s input and the URL path.
- total_query_or_ref_match_length: (float32[-1,1]) Total length of matches between the user’s input and the URL’s query or ref (fragment) components.
- first_url_match_position: (float32[-1,1]) Position of the first match between the user’s input and the URL.
- first_bookmark_title_match_position: (float32[-1,1]) Position of the first match between the user’s input and the bookmark titles for the URL.
- host_match_at_word_boundary: (float32[-1,1]) Boolean indicator of whether the host match occurs at a word boundary.
- has_non_scheme_www_match: (float32[-1,1]) Boolean indicator of whether a match occurs without considering the scheme (http/https) or “www” prefix.
- is_host_only: (float32[-1,1]) Boolean indicator of whether the user’s input matches the host only.
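Taken together, these 20 signals form the model’s input. Below is a minimal sketch of preparing a few of them, assuming a log(1 + x) transform for the logarithmic counts and a 0/1 encoding for the booleans; the raw values, the helper, and the exact scaling are illustrative, not Chrome’s actual code.

```python
import math

import numpy as np

# Hypothetical raw signals for one candidate URL. The names mirror the
# feature list above; the values and the log1p transform are assumptions.
raw = {
    "visit_count": 42,
    "typed_count": 7,
    "elapsed_time_last_visit_secs": 3600,
    "total_host_match_length": 6,
    "host_match_at_word_boundary": True,
}

def log_scaled(count: float) -> float:
    # Assumed transform: log(1 + x) compresses large counts.
    return math.log1p(count)

features = {
    "log_visit_count": log_scaled(raw["visit_count"]),
    "log_typed_count": log_scaled(raw["typed_count"]),
    "log_elapsed_time_last_visit_secs": log_scaled(raw["elapsed_time_last_visit_secs"]),
    "total_host_match_length": float(raw["total_host_match_length"]),
    "host_match_at_word_boundary": float(raw["host_match_at_word_boundary"]),  # bool -> 0.0 / 1.0
}

# Each feature is fed to the model as its own float32 tensor of shape
# [-1, 1] (batch size x 1), matching the type annotations above.
tensors = {name: np.array([[value]], dtype=np.float32) for name, value in features.items()}
print(tensors["log_visit_count"].shape)  # (1, 1)
```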
Model Processing:
These features are fed into the neural network. The network architecture, including specific layers and weights, is defined within the model file.
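Assuming the model ships as a TFLite flatbuffer (the filename below is hypothetical), its inputs and outputs can be listed with the standard TensorFlow Lite interpreter, since the flatbuffer is self-describing:

```python
import tensorflow as tf

# Hypothetical filename; the point is that a TFLite flatbuffer carries its
# own graph definition, so the layers can be inspected without Chrome's source.
interpreter = tf.lite.Interpreter(model_path="omnibox_scoring_model.tflite")
interpreter.allocate_tensors()

# Each of the 20 features should appear as a named float32 input of
# shape [-1, 1], per the feature list above.
for detail in interpreter.get_input_details():
    print(detail["name"], detail["dtype"], detail["shape_signature"])

# A single float32 [-1, 1] output: the relevance score.
print(interpreter.get_output_details()[0]["shape_signature"])
```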
Output:
The model outputs a prediction score (float32[-1,1]) representing the relevance of each potential autocomplete suggestion. Suggestions are ranked by this score, with higher-scoring entries placed nearer the top of the address bar dropdown.
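The ranking step itself is then a simple sort by score; a toy sketch with invented URLs and scores:

```python
# Toy ranking pass: suggestions are sorted by descending model score before
# being shown in the dropdown. The URLs and scores here are invented.
scored_suggestions = [
    ("https://github.com/", 0.91),
    ("https://mail.example.com/inbox", 0.34),
    ("https://news.example.com/", 0.62),
]
for url, score in sorted(scored_suggestions, key=lambda s: s[1], reverse=True):
    print(f"{score:.2f}  {url}")
```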
Model Architecture:
- Input Layer: 20 input features, each represented by a separate node (e.g., elapsed_time_last_shortcut_visit_days, log_visit_count, total_title_match_length).
- Concatenation Layer: All 20 input features are concatenated along axis 1, resulting in a single tensor of shape ? x 20. The “?” indicates a variable batch size.
- Dense Layer (FullyConnected): A fully connected layer (recreated in the sketch after this list) with:
- Weights: Shape 64 x 20, suggesting 64 neurons in this layer. The weights are quantized as int8 for efficiency.
- Bias: Shape 64, a bias term for each neuron.
- Activation Function: ReLU (Rectified Linear Unit).
- Quantization: Asymmetric quantization of inputs is applied.
- Dense Layer (FullyConnected): Another fully connected layer with:
- Weights: Shape 1 x 64, leading to a single output neuron.
- Bias: Shape 1, a bias term for the output neuron.
- Logistic Layer: This likely represents a sigmoid activation function applied to the output of the previous dense layer, producing a value between 0 and 1.
- Output Layer: A single output node (“sigmoid”) representing the predicted score.
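For reference, here is a Keras-style sketch of that stack, written purely from the shapes listed above; quantization is omitted, and the logistic step is folded into the final Dense activation. It is an illustration, not Chrome’s actual build definition.

```python
import tensorflow as tf

# Sketch of the described graph: 20 scalar inputs, concatenated to (?, 20),
# a 64-neuron ReLU layer, then a single sigmoid output neuron.
inputs = [tf.keras.Input(shape=(1,), name=f"feature_{i}") for i in range(20)]
x = tf.keras.layers.Concatenate(axis=1)(inputs)            # shape (?, 20)
x = tf.keras.layers.Dense(64, activation="relu")(x)        # 64 neurons over 20 inputs, bias 64
score = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # 1 neuron over 64 inputs, bias 1

model = tf.keras.Model(inputs=inputs, outputs=score)
model.summary()
```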
Key Observations:
- Simple Architecture: The model is a small MLP: one 64-neuron hidden dense layer with a ReLU activation, feeding a single-neuron output layer with a sigmoid activation.
- Quantization: The model employs quantization to reduce size and improve performance, using int8 weights for the first dense layer (illustrated in the sketch below).
- Feature Engineering: The input features are a combination of raw values and engineered features (e.g., logarithmic transformations, match lengths, boolean indicators).
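As a closing illustration of the quantization scheme mentioned above: asymmetric int8 quantization stores each value as an 8-bit integer plus a scale and zero point, recovering real values as scale * (q - zero_point). The scale and zero point below are invented for demonstration.

```python
import numpy as np

# Asymmetric int8 quantization in miniature. The model file stores the real
# scale and zero point alongside the weights; these values are made up.
scale, zero_point = 0.012, -3

def quantize(real: np.ndarray) -> np.ndarray:
    # Map real values onto the int8 grid, shifted by the zero point.
    return np.clip(np.round(real / scale) + zero_point, -128, 127).astype(np.int8)

def dequantize(q: np.ndarray) -> np.ndarray:
    # Recover approximate real values: scale * (q - zero_point).
    return scale * (q.astype(np.float32) - zero_point)

w = np.array([0.5, -0.25, 0.0], dtype=np.float32)
print(quantize(w))              # e.g. [ 39 -24  -3]
print(dequantize(quantize(w)))  # ~[ 0.504 -0.252  0.0 ], small rounding error
```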