How to Leverage your Schema.org Knowledge Graph for LLMs

Vernon August 3, 2023

It’s no secret that the AI revolution is well underway. According to a report by Accenture, 42% of companies want to make a large investment in ChatGPT in 2023.

Most organizations are trying to stay competitive by embracing the AI changes in the market and identifying ways to leverage “off-the-shelf” Large Language Models (LLMs) to optimize tasks and automate business processes.

However, as the adoption of generative AI accelerates, companies will need to fine-tune their Large Language Model (LLM) using their own data sets to maximize the value of the technology and address their unique needs. There is an opportunity for organizations to leverage their content Knowledge Graphs to accelerate their AI initiatives and get SEO benefits at the same time.

So what is an LLM?

A Large Language Model (LLM) is a type of generative artificial intelligence (AI) that relies on deep learning and massive data sets to understand, summarize, translate, predict and generate new content.

LLMs are most commonly used in natural language processing (NLP) applications like ChatGPT, where users can input a query in natural language and receive a generated response. Businesses can utilize these LLM-powered tools internally to provide employees with Q&A support or externally to deliver a better customer experience.

Despite the efficiency and benefits they offer, however, LLMs also have their challenges.

LLMs are known for their tendency to ‘hallucinate’, producing erroneous outputs that are not grounded in the training data or that stem from misinterpretations of the input prompt. They are expensive to train and run, hard to audit and explain, and often provide inconsistent answers.

Thankfully, you can use knowledge graphs to help mitigate some of these issues and provide structured and reliable information for the LLMs to use.

What is a Knowledge Graph?

A Knowledge Graph is a collection of relationships between things defined using a standardized vocabulary, from which new knowledge can be gained through inferencing. When knowledge is organized in a structured format, it enables efficiencies in the retrieval of information and improves accuracy.

For instance, most organizations have websites that consist of large amounts of information about the business – such as the products and services offered, locations, blogs, events, case studies, and more. However, the information exists as text on the website – which means the data is unstructured.

You can use structured data, also known as Schema Markup, to describe the content and entities on each page. You can also use structured data to connect the different topics on your site or link them to external authoritative knowledge bases (e.g. Wikidata).

Most users implement Schema Markup on their sites to help search engines understand and contextualize how the entities on their sites relate to each other. This semantic SEO tactic will then help search engines provide users with more accurate responses to their queries.

By connecting your Schema markup, you are also effectively developing your content Knowledge Graph – a marketing knowledge graph filled with information about your business. You can then use your content knowledge graph to provide valuable structured information to enhance the capabilities of LLMs for your business.
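As a rough illustration of what "connected" markup means, the sketch below builds a small JSON-LD graph in Python. The site URL, entity names, and the Wikidata placeholder are all hypothetical; the point is only that the service refers to the organization by its `@id` rather than redefining it.

```python
import json

# Hypothetical site URL and entities, used only for illustration.
SITE = "https://www.example.com"

organization = {
    "@type": "Organization",
    "@id": f"{SITE}/#organization",
    "name": "Example Co",
    # sameAs links the entity to an external knowledge base entry.
    # The identifier below is a placeholder, not a real Wikidata item.
    "sameAs": "https://www.wikidata.org/wiki/Q_PLACEHOLDER",
}

service = {
    "@type": "Service",
    "@id": f"{SITE}/services/consulting#service",
    "name": "Consulting",
    # Pointing at the organization's @id connects the two entities
    # instead of creating a second, disconnected copy of the organization.
    "provider": {"@id": organization["@id"]},
}

markup = {"@context": "https://schema.org", "@graph": [organization, service]}
json_ld = json.dumps(markup, indent=2)
print(json_ld)
```

Reusing the same `@id` wherever an entity is mentioned is what turns page-level markup into a graph that can be traversed across pages.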

LLMs and Schema Markup

To develop your content knowledge graph, you can create your Schema Markup to represent your content. One of the new ways SEOs can achieve this is to use an LLM to generate Schema Markup for a page. This sounds great in theory; however, there are several risks and challenges associated with this approach.

One such risk is property hallucination. This happens when the LLM makes up properties that don’t exist in the Schema.org vocabulary. Secondly, the LLM is likely unaware of Google’s required and recommended structured data properties, so it will guess at them and jeopardize your chances of achieving a rich result. To overcome this, you need a human to verify the structured data properties generated by the LLM.
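Part of that verification can be automated before a human reviews the output. The sketch below is a minimal example, assuming a small hand-maintained allowlist of properties per type; a real check would load the full Schema.org vocabulary, and `bookingLink` is a made-up property used to simulate a hallucination.

```python
# A deliberately tiny, hand-maintained subset of Schema.org properties,
# for illustration only. A real check would load the full vocabulary.
KNOWN_PROPERTIES = {
    "Physician": {
        "name", "url", "telephone", "address",
        "medicalSpecialty", "openingHours",
    },
}


def find_unknown_properties(markup: dict) -> list[str]:
    """Return property names that are not in the allowlist for the markup's @type."""
    allowed = KNOWN_PROPERTIES.get(markup.get("@type", ""), set())
    return sorted(
        key for key in markup
        if not key.startswith("@") and key not in allowed
    )


# Hypothetical LLM output containing one invented property.
llm_generated = {
    "@context": "https://schema.org",
    "@type": "Physician",
    "name": "Dr. Jane Doe",
    "medicalSpecialty": "Neurologic",
    "bookingLink": "https://www.example.com/book",  # made up for this example
}

unknown = find_unknown_properties(llm_generated)
print(unknown)
```

A check like this only catches properties that do not exist; it does not tell you whether the required and recommended properties for a rich result are present, so human review is still needed.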

LLMs are good at identifying entities on Wikidata. However, they lack knowledge of entities defined elsewhere on your site. This means the markup created by the LLM will contain duplicate entities, disconnected across pages on your site or even within a page, making it even more difficult for you to manage your entities.

In addition to creating duplicate entities, LLMs lack the ability to manage your Schema Markup at scale. They can only produce static Schema Markup for each page. If you make changes to the content on your site, your Schema Markup will not update dynamically, which results in schema drift.

With all the risks and challenges to this piecemeal approach, the Schema Markup created by the LLM is static, unconnected Schema Markup for a page – it doesn’t help you develop your content knowledge graph.

Instead, you should create your Schema Markup in a connected, scalable way that updates dynamically. That way, you’ll have an up-to-date knowledge graph that can be used not only for SEO but also to accelerate your AI experiences and initiatives.

Synergy Between Knowledge Graphs and LLMs

There are three main ways of leveraging the content knowledge graph to enhance the capabilities of LLMs for businesses.

Businesses can train their LLMs using their content knowledge graph.
Businesses can use LLMs to query their content knowledge graphs.
Businesses can structure their information in the form of a knowledge graph to help the LLM function more efficiently.

Training the LLM using Your Content Knowledge Graph

For a business to thrive in this technological age, connecting with customers through their preferred channel is crucial. LLM-powered AI experiences that answer questions in an automated, context-aware manner can support multi-channel digital strategies. By leveraging AI to support multiple channels, businesses can serve their customers through their preferred channels without having to hire more employees.

That said, if you want to leverage an AI chatbot to serve your customers, you want it to be providing your customers with the right answers at all times. However, LLMs don’t have the ability to perform a fact check. They generate responses based on patterns and probabilities. This results in issues such as inaccurate responses and hallucinations.

To mitigate this issue, businesses can use their content knowledge graphs to train and ground the LLM for specific use cases. In the case of an AI chatbot, the LLMs would need an understanding of what entities and relations you have in your business to provide accurate responses to your customers.

The Schema.org vocabulary is robust and by leveraging the wide range of properties available in the vocabulary, you can describe the entities on your website and how they are related with more specificity. The collection of website entities forms a content knowledge graph that is a comprehensive dataset that can ground your LLMs. The result is accurate, fact-based answers to enhance your AI experience.

Let’s illustrate how your content knowledge graph can train and inform your AI Chatbot.

A healthcare network in the US has a website with pages on their physicians, locations, specializations, services, etc. The physician page has content relating to the specific physician’s specialties, ratings, service areas and opening hours.

If the healthcare network has a content knowledge graph that captures all the information on their site, then when a user asks the AI Chatbot “I want to book a morning appointment with a neurologist in Minnesota this week”, the AI Chatbot can find the answer by accessing the healthcare network’s content knowledge graph. The response would be the names of the neurologists who serve patients in Minnesota and have morning appointments available, along with their booking links.
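To make the lookup concrete, here is a minimal sketch in Python. It assumes the physician entities have already been exported as simple records; the names, URLs, and hours are invented, and the opening time is used as a stand-in for "morning availability", since real appointment slots would come from a booking system.

```python
# Hypothetical physician entities, shaped loosely like Schema.org Physician
# markup. All names, URLs, and hours are invented for illustration.
physicians = [
    {"name": "Dr. Avery Lee", "medicalSpecialty": "Neurologic",
     "areaServed": "Minnesota", "opens": "08:00",
     "url": "https://www.example.com/physicians/avery-lee"},
    {"name": "Dr. Sam Ortiz", "medicalSpecialty": "Neurologic",
     "areaServed": "Wisconsin", "opens": "08:00",
     "url": "https://www.example.com/physicians/sam-ortiz"},
    {"name": "Dr. Kim Patel", "medicalSpecialty": "Cardiovascular",
     "areaServed": "Minnesota", "opens": "09:00",
     "url": "https://www.example.com/physicians/kim-patel"},
    {"name": "Dr. Noor Haddad", "medicalSpecialty": "Neurologic",
     "areaServed": "Minnesota", "opens": "13:00",
     "url": "https://www.example.com/physicians/noor-haddad"},
]


def find_physicians(entities, specialty, state, opens_before="12:00"):
    """Return (name, url) pairs matching specialty, state, and morning hours.

    Times are zero-padded "HH:MM" strings, so string comparison orders them
    correctly.
    """
    return [
        (p["name"], p["url"])
        for p in entities
        if p["medicalSpecialty"] == specialty
        and p["areaServed"] == state
        and p["opens"] < opens_before
    ]


matches = find_physicians(physicians, "Neurologic", "Minnesota")
print(matches)
```

Because each fact is an explicit property on an entity, the chatbot's answer comes from filtering known data rather than from the model's guess.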

The content knowledge graph is also readily available, so you can quickly deploy your knowledge graph and train your LLM. If you are a Schema App customer, we can easily export your content knowledge graph for you to train your LLM.

Using LLMs to Query Your Knowledge Graph

Instead of training the LLM, you can use the LLM to generate the queries to get the answers directly from your content knowledge graph.

This approach of generating answers through the LLM is less complicated, less expensive and more scalable. All you need is a content knowledge graph and a SPARQL endpoint. (Good news, Schema App offers both of these.)

Here is how it works:

The Schema App application loads the content model from your content knowledge graph. This is the set of Schema.org data types and properties that exist within your website knowledge graph.
The user then asks the Schema App application a question.
The Schema App application combines the question with the content model and asks the LLM to write a SPARQL query. Note: the only thing the LLM does is transform the question into a query.
The Schema App application then executes the SPARQL query against your content knowledge graph and either displays the results directly or uses the LLM to format them as a response.
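The steps above can be sketched roughly as follows. This is not Schema App's implementation: the function names, the prompt wording, and the `fake_llm` stand-in are all assumptions made for illustration, and executing the query against a SPARQL endpoint is left out.

```python
from typing import Callable


def build_sparql_prompt(question: str, content_model: dict[str, list[str]]) -> str:
    """Combine the user's question with the content model (types -> properties)."""
    model_lines = [
        f"- {type_name}: {', '.join(properties)}"
        for type_name, properties in sorted(content_model.items())
    ]
    return "\n".join([
        "Write one SPARQL SELECT query that answers the question.",
        "Use only these Schema.org types and properties "
        "(prefix schema: <https://schema.org/>):",
        *model_lines,
        f"Question: {question}",
        "Return only the query.",
    ])


def question_to_sparql(question: str, content_model: dict[str, list[str]],
                       llm: Callable[[str], str]) -> str:
    """Ask the LLM to translate the question; the LLM never sees the graph data."""
    return llm(build_sparql_prompt(question, content_model)).strip()


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM client; returns a canned query for the demo."""
    return """
PREFIX schema: <https://schema.org/>
SELECT ?name WHERE {
  ?p a schema:Physician ;
     schema:name ?name ;
     schema:medicalSpecialty schema:Neurologic .
}
"""


# Hypothetical content model loaded from the knowledge graph.
content_model = {"Physician": ["medicalSpecialty", "name"], "Place": ["address"]}
prompt = build_sparql_prompt("Which physicians are neurologists?", content_model)
query = question_to_sparql("Which physicians are neurologists?", content_model, fake_llm)
print(query)
```

Passing the LLM in as a plain callable keeps the sketch independent of any particular vendor's client library.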

This method is possible because the LLMs have a great understanding of SPARQL and can help translate the question from natural language to a SPARQL query.

By doing this, the LLM doesn’t have to hold the data in memory or be trained on the data because the answers exist within the content knowledge graph, which makes it stateless and a less resource-intensive solution. Furthermore, companies can avoid providing all their data to the LLM as this method introduces a control point to the knowledge graph owner to only allow questions on their data that they approve.
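One way to picture that control point is a check that runs on the generated query before it reaches the endpoint. The sketch below is deliberately naive and is an assumption about how such a gate could look, not a description of Schema App's product: it uses regular expressions rather than a SPARQL parser, and the approved property list is hypothetical.

```python
import re

# Hypothetical list of properties the graph owner is willing to expose.
APPROVED_PROPERTIES = {"name", "medicalSpecialty", "areaServed"}

# SPARQL Update keywords; any of these means the query is not read-only.
WRITE_KEYWORDS = re.compile(
    r"\b(INSERT|DELETE|DROP|CLEAR|LOAD|CREATE)\b", re.IGNORECASE
)

# Schema.org properties start with a lowercase letter, types with uppercase,
# so this pattern picks up schema:name but not schema:Physician.
PROPERTY_PATTERN = re.compile(r"\bschema:([a-z][A-Za-z0-9]*)")


def is_query_allowed(query: str) -> bool:
    """Allow only read-only queries that use approved properties."""
    if WRITE_KEYWORDS.search(query):
        return False
    used = set(PROPERTY_PATTERN.findall(query))
    return used <= APPROVED_PROPERTIES


allowed_query = (
    "PREFIX schema: <https://schema.org/>\n"
    "SELECT ?name WHERE { ?p a schema:Physician ; schema:name ?name . }"
)
unapproved_query = (
    "PREFIX schema: <https://schema.org/>\n"
    "SELECT ?t WHERE { ?p schema:telephone ?t . }"
)
write_query = "DELETE WHERE { ?s ?p ?o }"

print(is_query_allowed(allowed_query))
print(is_query_allowed(unapproved_query))
print(is_query_allowed(write_query))
```

A production gate would parse the query properly, since keyword matching can both miss cases and reject harmless queries that happen to contain one of these words in a string literal.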

This approach also overcomes some of the restrictions of the LLMs.

For example, LLMs have token limits, which restrict how much text can be included in the input and the output. This approach sidesteps the problem by using the LLM only to build the query and using the knowledge graph to answer it. Since SPARQL queries can run over gigabytes of data, they don’t have any token limitations. This means you can use an entire content knowledge graph without worrying about the word limit.

By using the LLM for the sole purpose of querying the knowledge graph, you can achieve your AI outcomes in an elegant, cost-effective manner and have control of your data while also overcoming some of the current LLM restrictions.

Optimizing LLMs by Managing Data in the form of a Knowledge Graph

“You can machine learn Obama’s birthplace every time you need it, but it costs a lot and you’re never sure it is correct.” – Jamie Taylor, Google Knowledge Graph

One of the most considerable costs of running an LLM is the inference cost (aka the cost of running a query through the LLM).

In comparison to a traditional query, LLMs like ChatGPT have to run on expensive GPUs to answer queries ($0.36 per query according to research), which can eat into profits in the long run.

Businesses can reduce the inference cost of the LLM by storing the historical responses or knowledge generated by the LLM in the form of a knowledge graph. That way, if a question is asked again, the LLM does not have to exhaust resources to regenerate the same answer. It can simply look up the answer stored in the knowledge graph.
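The lookup-before-generate idea can be sketched in a few lines of Python. In this minimal example a plain dictionary stands in for the knowledge graph store, and the `generate` callable stands in for the LLM; both are assumptions made to keep the sketch self-contained.

```python
from typing import Callable


class AnswerCache:
    """Look up stored answers first; call the LLM only for unseen questions."""

    def __init__(self, generate: Callable[[str], str]):
        self._generate = generate          # stand-in for an expensive LLM call
        self._answers: dict[str, str] = {} # stand-in for the knowledge graph store
        self.llm_calls = 0

    @staticmethod
    def _normalize(question: str) -> str:
        # Lowercase and collapse whitespace so trivially different phrasings match.
        return " ".join(question.lower().split())

    def ask(self, question: str) -> str:
        key = self._normalize(question)
        if key not in self._answers:
            self.llm_calls += 1
            self._answers[key] = self._generate(question)
        return self._answers[key]


cache = AnswerCache(lambda q: f"answer to: {q}")
first = cache.ask("What are your opening hours?")
second = cache.ask("  what are your opening HOURS? ")
print(first == second, cache.llm_calls)
```

Normalizing on exact text only catches near-identical questions; matching paraphrases would need the questions to be mapped to entities and properties in the graph, which is outside this sketch.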

Unstructured data that the LLM is trained on can also cause inefficiencies in the retrieval of information and high inference costs. Therefore, converting unstructured data such as documents and web pages into a knowledge graph can reduce information retrieval time and produce more reliable facts.

As the volume of data in the hybrid cloud environment continues to grow at an exponential rate, knowledge graphs play a crucial role in the management and organization of data. They contribute to the ‘Big Convergence’, which combines data management and knowledge management to ensure efficient organization and retrieval of information.

Build Your Knowledge Graph through Schema App

In summary, the integration of knowledge graphs with LLMs can significantly enhance decision-making accuracy, especially in the realm of Marketing.

The content knowledge graph is an excellent foundation to leverage schema data in LLM tools, leading to more AI-ready platforms. It’s an investment that could pay off handsomely, especially in a world increasingly reliant on AI and knowledge management.

At Schema App, we can help you quickly implement your Schema Markup data layer and develop a semantically relevant and ready-to-use content knowledge graph to prepare your organization for AI.

Regardless of whether you use Schema App to author your Schema Markup, we can produce a content knowledge graph for you. Schema App can capture the Schema.org data from your existing implementation using our Schema App Analyzer to develop your marketing knowledge graph.

Get in touch with our team to find out more about how Schema App can help you build your marketing knowledge graph to enhance your LLM.

Mark van Berkel is the co-founder and COO of Hunch Manifest and the creator of Schema App. Schema App is an end-to-end Schema Markup solution that helps enterprise SEO teams create, deploy and manage Schema Markup to stand out in search. He is an expert in Semantic Technology and Semantic Search Marketing. Mark built Schema App to solve his own challenges in writing and validating schema markup.
