Vertexgraph provides an AI assistant that can help you manage your organization and...
How Does the Vertexgraph Assistant Vectorize Articles?
The Vertexgraph assistant can vectorize the files and documents in your storage drives and then generate the desired content for you. To do this, Vertexgraph relies on Large Language Models (LLMs). The interesting part is that the assistant does not vectorize an entire article in the conventional sense. Instead, it processes text in tokenized form and represents it using embeddings. Here's how the vectorization process works in LLMs:
The first step is tokenization. When you upload files and documents to the Vertexgraph storage drive, the assistant first tokenizes the content of each file. Tokenization breaks the input text down into smaller units called tokens. These tokens can be as short as a single character or as long as a word. For example, the sentence "AI is amazing!" might be tokenized into ["AI", " is", " amazing", "!"]. After the files are tokenized, each token is associated with a word embedding, which is essentially a vector representation of the word or subword. These embeddings are pre-trained on large corpora of text and capture the semantic meaning and context of the words, so the assistant can "know" the meaning of the words it processes.
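The two steps above, splitting text into tokens and looking each token up in an embedding table, can be sketched as follows. The tokenizer rule, vocabulary, and embedding values here are purely illustrative assumptions, not Vertexgraph's actual implementation (real models use learned subword tokenizers and much larger vectors):

```python
import re

# Toy vocabulary mapping tokens to integer IDs (hypothetical).
VOCAB = {"AI": 0, " is": 1, " amazing": 2, "!": 3, "<unk>": 4}

# Toy embedding table: one small vector per vocabulary entry.
# Real models learn these vectors during pre-training on large corpora.
EMBEDDINGS = {
    0: [0.9, 0.1],
    1: [0.2, 0.8],
    2: [0.7, 0.7],
    3: [0.1, 0.1],
    4: [0.0, 0.0],
}

def tokenize(text):
    """Split text into word and punctuation tokens, keeping leading spaces."""
    return re.findall(r" ?\w+|[^\w\s]", text)

def embed(text):
    """Map text -> tokens -> IDs -> embedding vectors."""
    ids = [VOCAB.get(t, VOCAB["<unk>"]) for t in tokenize(text)]
    return [EMBEDDINGS[i] for i in ids]

tokens = tokenize("AI is amazing!")
print(tokens)          # ['AI', ' is', ' amazing', '!']
vectors = embed("AI is amazing!")  # one 2-d vector per token
```

Note that unknown words fall back to a dedicated `<unk>` entry; production tokenizers avoid this by splitting rare words into known subword pieces.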
Next, the assistant generates contextual embeddings for each token in a given context. These embeddings take into account the surrounding tokens, which means the same word can have different embeddings depending on its context; this is how the assistant distinguishes the meaning of a word under different circumstances. Finally, the assistant processes the tokens sequentially and generates an embedding for each token in the input text. The resulting sequence of embeddings represents the input text's semantic content and context.
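The key idea, that the same token gets different vectors in different contexts, can be demonstrated with a toy sketch. Real LLMs compute contextual embeddings with attention layers; the neighbour-averaging below is a deliberately simplified stand-in, and all vector values are made up for illustration:

```python
def contextualize(vectors):
    """Toy 'contextual' embedding: mix each token's vector with its
    immediate neighbours. Real LLMs use attention layers instead."""
    out = []
    for i in range(len(vectors)):
        window = vectors[max(0, i - 1): i + 2]
        out.append([sum(dim) / len(window) for dim in zip(*window)])
    return out

# The word "bank" starts from the same static vector in both sentences,
# but ends up with different contextual vectors (hypothetical values).
bank = [1.0, 0.0]
river = [0.0, 1.0]
money = [0.5, 0.5]

ctx_river = contextualize([river, bank])  # as in "river bank"
ctx_money = contextualize([money, bank])  # as in "money bank"
print(ctx_river[1])  # differs from ctx_money[1]
```

Because the vector for "bank" now depends on its neighbours, downstream components can tell the riverside sense from the financial sense, which a single static word embedding cannot do.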
In conclusion, instead of vectorizing the entire article into a single vector, the Vertexgraph assistant creates a sequence of contextual embeddings for the tokens in the article. This lets your agent capture the nuances of the language, including context and semantics, which ultimately enables the AI assistant to handle tasks like text generation, language understanding, and translation.
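One way such embedding sequences support language understanding is semantic comparison: pooling a sequence of token vectors into a single vector and measuring cosine similarity. This is a common downstream pattern, sketched here under the assumption of tiny hand-picked 2-d vectors; it is not a description of Vertexgraph's internal pipeline:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def pool(vectors):
    """Mean-pool a sequence of token embeddings into one vector."""
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]

# Hypothetical token-embedding sequences for three short texts.
doc_a = [[0.9, 0.1], [0.8, 0.2]]    # e.g. "AI systems"
doc_b = [[0.85, 0.15], [0.9, 0.1]]  # e.g. "machine learning"
doc_c = [[0.1, 0.9], [0.2, 0.8]]    # e.g. "apple pie"

# Texts about related topics score higher than unrelated ones.
print(cosine(pool(doc_a), pool(doc_b)) > cosine(pool(doc_a), pool(doc_c)))
```

Because similarity is computed on meaning rather than exact words, a query can match a stored document even when the two share no vocabulary.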