{"id":33618,"date":"2024-04-29T03:59:31","date_gmt":"2024-04-29T03:59:31","guid":{"rendered":"https:\/\/www.searchenginejournal.com\/google-infini-attention\/514869\/"},"modified":"2024-04-29T03:59:31","modified_gmt":"2024-04-29T03:59:31","slug":"googles-new-infini-attention-and-seo-via-sejournal-martinibuster","status":"publish","type":"post","link":"https:\/\/marketingnewsbox.com\/?p=33618","title":{"rendered":"Google\u2019s New Infini-Attention And SEO via @sejournal, @martinibuster"},"content":{"rendered":"<div><img decoding=\"async\" src=\"https:\/\/www.searchenginejournal.com\/wp-content\/uploads\/2024\/04\/infini-attention-259.jpg\" class=\"ff-og-image-inserted\"><\/div>\n<p>Google has published a research paper on a new technology called Infini-attention that allows it to process massively large amounts of data with \u201cinfinitely long contexts\u201d while also being capable of being easily inserted into other models to vastly improve their capabilities<\/p>\n<p>That last part should be of interest to those who are interested in Google\u2019s algorithm. Infini-attention is plug-and-play, which means it\u2019s relatively easy to insert into other models, including those in use by Google\u2019s core algorithm. The part about \u201cinfinitely long contexts\u201d may have implications for how some of Google\u2019s search systems can be updated.<\/p>\n<p>The name of the research paper is: <em>Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention<\/em><\/p>\n<h2>Memory Is Computationally Expensive For LLMs<\/h2>\n<p>Large Language Models (LLM) have limitations on how much data they can process at one time because the computational complexity and memory usage can spiral upward significantly. 
Infini-Attention gives the LLM the ability to handle longer contexts while keeping down the memory and processing power needed.<\/p>\n<p><em>The research paper explains:<\/em><\/p>\n<blockquote>\n<p>\u201cMemory serves as a cornerstone of intelligence, as it enables efficient computations tailored to specific contexts. However, Transformers \u2026and Transformer-based LLMs \u2026have a constrained context-dependent memory, due to the nature of the attention mechanism.<\/p>\n<p>Indeed, scaling LLMs to longer sequences (i.e. 1M tokens) is challenging with the standard Transformer architectures and serving longer and longer context models becomes costly financially.\u201d<\/p>\n<\/blockquote>\n<p><em>And elsewhere the research paper explains:<\/em><\/p>\n<blockquote>\n<p>\u201cCurrent transformer models are limited in their ability to process long sequences due to quadratic increases in computational and memory costs. Infini-attention aims to address this scalability issue.\u201d<\/p>\n<\/blockquote>\n<p>The researchers hypothesized that Infini-attention can scale to handle extremely long sequences with Transformers without the usual increases in computational and memory resources.<\/p>\n<h2>Three Important Features<\/h2>\n<p>Google\u2019s Infini-attention addresses the shortcomings of transformer models by incorporating three features that enable transformer-based LLMs to handle longer sequences without memory issues and to connect context from early in the sequence with context that appears much later in the sequence.<\/p>\n<p><strong>The features of Infini-Attention<\/strong><\/p>\n<ul>\n<li>Compressive Memory System<\/li>\n<li>Long-term Linear Attention<\/li>\n<li>Local Masked Attention<\/li>\n<\/ul>\n<h2>Compressive Memory System<\/h2>\n<p>Infini-attention uses what\u2019s called a compressive memory system. 
As more data is input (as part of a long sequence of data), the compressive memory system compresses some of the older information in order to reduce the amount of space needed to store the data.<\/p>\n<h2>Long-term Linear Attention<\/h2>\n<p>Infini-attention also uses what\u2019s called \u201clong-term linear attention mechanisms,\u201d which enable the LLM to process data that exists earlier in the sequence.<\/p>\n<p>This is important for tasks where the relevant context is spread across a large amount of data. It\u2019s like being able to discuss an entire book within the context of all of the chapters and explain how the first chapter relates to another chapter in the middle of the book.<\/p>\n<h2>Local Masked Attention<\/h2>\n<p>In addition to the long-term attention, Infini-attention also uses what\u2019s called local masked attention. This kind of attention processes nearby (localized) parts of the input data, which is useful for responses that depend on the closer parts of the data.<\/p>\n<p>Combining long-term and local attention helps solve the problem of transformers being limited in how much input data they can remember and use for context.<\/p>\n<p><em>The researchers explain:<\/em><\/p>\n<blockquote>\n<p>\u201cThe Infini-attention incorporates a compressive memory into the vanilla attention mechanism and builds in both masked local attention and long-term linear attention mechanisms in a single Transformer block.\u201d<\/p>\n<\/blockquote>\n<h2>Results Of Experiments And Testing<\/h2>\n<p>Infini-attention was tested against regular models for comparison across multiple benchmarks involving long input sequences, such as long-context language modeling, passkey retrieval, and book summarization tasks. 
Passkey retrieval is a test where the language model has to retrieve specific data from within an extremely long text sequence.<\/p>\n<p><strong>List of the three tests:<\/strong><\/p>\n<ol>\n<li>Long-context Language Modeling<\/li>\n<li>Passkey Test<\/li>\n<li>Book Summary<\/li>\n<\/ol>\n<h3>Long-Context Language Modeling And The Perplexity Score<\/h3>\n<p>The researchers write that the models with Infini-attention outperformed the baseline models and that increasing the training sequence length brought even further improvements in the <strong>Perplexity score.<\/strong> The Perplexity score is a metric that measures language model performance, with lower scores indicating better performance.<\/p>\n<p><em>The researchers shared their findings:<\/em><\/p>\n<blockquote>\n<p>\u201cInfini-Transformer outperforms both Transformer-XL \u2026and Memorizing Transformers baselines while maintaining 114x less memory parameters than the Memorizing Transformer model with a vector retrieval-based KV memory with length of 65K at its 9th layer. Infini-Transformer outperforms memorizing transformers with memory length of 65K and achieves 114x compression ratio.<\/p>\n<p>We further increased the training sequence length to 100K from 32K and trained the models on Arxiv-math dataset. 100K training further decreased the perplexity score to 2.21 and 2.20 for Linear and Linear + Delta models.\u201d<\/p>\n<\/blockquote>\n<h3>Passkey Test<\/h3>\n<p>The passkey test is where a random number is hidden within a long text sequence, and the model must retrieve it. The passkey is hidden near the beginning, middle, or end of the long text. The model was able to solve the passkey test up to a sequence length of 1 million tokens.<\/p>\n<blockquote>\n<p>\u201cA 1B LLM naturally scales to 1M sequence length and solves the passkey retrieval task when injected with Infini-attention. 
Infini-Transformers solved the passkey task with up to 1M context length when fine-tuned on 5K length inputs. We report token-level retrieval accuracy for passkeys hidden in a different part (start\/middle\/end) of long inputs with lengths 32K to 1M.\u201d<\/p>\n<\/blockquote>\n<h3>Book Summary Test<\/h3>\n<p>Infini-attention also excelled at the book summary test, outperforming top benchmarks and achieving new state-of-the-art (SOTA) performance.<\/p>\n<p><em>The results are described:<\/em><\/p>\n<blockquote>\n<p>\u201cFinally, we show that a 8B model with Infini-attention reaches a new SOTA result on a 500K length book summarization task after continual pre-training and task fine-tuning.<\/p>\n<p>\u2026We further scaled our approach by continuously pre-training a 8B LLM model with 8K input length for 30K steps. We then fine-tuned on a book summarization task, BookSum (Kry\u015bci\u0144ski et al., 2021) where the goal is to generate a summary of an entire book text.<\/p>\n<p>Our model outperforms the previous best results and achieves a new SOTA on BookSum by processing the entire text from book. \u2026There is a clear trend showing that with more text provided as input from books, our Infini-Transformers improves its summarization performance metric.\u201d<\/p>\n<\/blockquote>\n<h2>Implications Of Infini-Attention For SEO<\/h2>\n<p>Infini-attention is a breakthrough in modeling long- and short-range attention with greater efficiency than previous models. It also supports \u201c<em>plug-and-play continual pre-training and long-context adaptation by design<\/em>\u201d which means that it can easily be integrated into existing models.<\/p>\n<p>Lastly, the <em>\u201ccontinual pre-training and long-context adaptation<\/em>\u201d makes it ideal for scenarios where a stream of new data constantly needs to be added to train a model. 
That last part is super interesting because it may make Infini-attention useful for applications on the back end of Google\u2019s search systems, particularly where it is necessary to analyze long sequences of information and understand how a part near the beginning of the sequence relates to another part closer to the end.<\/p>\n<p>The fact that the researchers claim \u201cinfinitely long inputs\u201d is amazing, but what\u2019s really important for SEO is this mechanism\u2019s ability to handle long sequences of data in order to \u201cLeave No Context Behind,\u201d as well as its plug-and-play aspect. It gives an idea of how some of Google\u2019s systems could be improved if Google adapted Infini-attention to systems within their core algorithm.<\/p>\n<p><strong>Read the research paper:<\/strong><\/p>\n<p><a href=\"https:\/\/arxiv.org\/abs\/2404.07143\" target=\"_blank\" rel=\"noopener noreferrer\">Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention<\/a><\/p>\n<p><em>Featured Image by Shutterstock\/JHVEPhoto<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Google has published a research paper on a new technology called Infini-attention that can process massive amounts of data with \u201cinfinitely long contexts\u201d while also being easy to insert into other models to vastly improve their capabilities. That last part should be of interest to those who are interested in&#8230; 
<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[],"tags":[],"class_list":["post-33618","post","type-post","status-publish","format-standard","hentry"],"_links":{"self":[{"href":"https:\/\/marketingnewsbox.com\/index.php?rest_route=\/wp\/v2\/posts\/33618","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/marketingnewsbox.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/marketingnewsbox.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/marketingnewsbox.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/marketingnewsbox.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=33618"}],"version-history":[{"count":0,"href":"https:\/\/marketingnewsbox.com\/index.php?rest_route=\/wp\/v2\/posts\/33618\/revisions"}],"wp:attachment":[{"href":"https:\/\/marketingnewsbox.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=33618"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/marketingnewsbox.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=33618"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/marketingnewsbox.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=33618"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}