The need for search in Generative AI

Authored by Jessica Freedman, Associate at MizMaa Ventures.
jessica.freedman@mizmaa.com

The relevance and opportunity of AI-powered search in the creation craze. 

Generative AI is without doubt the buzzword of 2023. Generative AI (Gen-AI) is an umbrella term for algorithms that can be used to create new content, including audio, code, images, text, simulations, and videos.

In November 2022, OpenAI released ChatGPT, a free chatbot-like interface connected via API to GPT-3.5, a large language model (LLM). Gen-AI has a long research history, but it was the addition of this simple interface that tipped the balance to mass adoption and popularity (verging on hysteria). 

Source: ChatGPT, Prompt: "Explain the history of generative AI in 100 words"

Before ChatGPT, GPT-3.5 and other LLMs had only been accessible via API or through open-source libraries of fine-tuned LLMs (models trained for a specific domain). This limited use to developers and the tech-savvy.

ChatGPT was the first consumer-facing AI chatbot; it removed the tech barriers to interacting with AI models, democratized access, and catapulted Gen-AI into the mainstream. According to UBS Research, ChatGPT is the fastest-growing app in history, with a reported 100M users in its first few months.

With the advent of ChatGPT, and the ensuing flurry of enthusiasm and innovation, we are asking ourselves “How does this impact technology development in the next few years - and who will be the winners and losers in the next chapter of AI?” 

Infrastructure: Enabling technologies for Generative AI

The Infrastructure layer in Gen-AI encompasses the cloud platforms and hardware manufacturers responsible for running training and inference workloads for generative AI models.

Investors are quick to discuss the applications of Generative AI. A less commonly addressed topic is its infrastructure requirements, which remain largely unmet. This presents a huge opportunity for innovation…

Gen-AI is largely powered by Large Language Models (LLMs), a type of machine learning model that handles a wide range of text-based use cases and performs well in situations not seen during training (so-called “zero-shot” scenarios). The most well-known LLM is OpenAI’s GPT-4 (Generative Pre-trained Transformer 4).

Today, there are many infrastructure constraints that restrict the wider adoption and efficacy of Gen-AI. One such limitation is the high cost and latency of vector search. To address this, we sat down with Hyperspace CEO Ohad Levi to discuss the challenges of search within Generative AI:

A conversation with Ohad Levi: LLMs, Vector Search, and the Hardware Compute Gap

What is the relevance of unstructured data to Generative AI?

Ohad: “Unstructured data is data that doesn’t follow a conventional model or schema; it comes in the form of text, images and video, spoken language and music, and even genetic or molecular information. As you can imagine, it is far more complex to store and search than structured data, which can be stored in tables and documents with consistent characteristics. But there is huge value in unstructured data - it is the gatekeeper to multi-modal AI (algorithms that span multiple modalities, e.g. text-to-image generation).”

In fact, eighty percent of the world’s data is unstructured (roughly 100 zettabytes), and it is growing at about five times the rate of structured data (source: IDC).

What is a vector database and how is data searched?

Source: Hyperspace, Vector DB pipeline

The vector database is a new type of database that is becoming popular in the world of machine learning and AI. It stores unstructured data as vector embeddings and makes them searchable.

Ohad: “To extract contextual meaning from an image, the image must be converted into a multidimensional numerical vector (often referred to as a vector embedding). Once transformed into a vector, a search database will index the vectors, cluster them based on semantically meaningful similarity, and keep them in memory for real-time retrieval. This allows users to query the database and retrieve semantically similar objects from the stored image, text, or video representations.”

Vector DBs are integral to Gen-AI. When deploying LLMs (such as GPT-4) and other AI models, a vector database can be used to store the vector embeddings the models produce (text, images, audio, etc. converted into vectors). Depending on the data size, the vector database may need to store and search potentially billions of vector embeddings, performing a similarity search to find the best match to the user’s prompt (a word, sentence or paragraph).
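To make this pipeline concrete, here is a minimal sketch of the store-and-retrieve loop in Python. It is an illustration of the mechanics only, not Hyperspace’s implementation: the embed() function is a hypothetical stand-in for a real embedding model, and production vector databases replace the brute-force dot product below with specialised indexes.

```python
import numpy as np

# Hypothetical stand-in for a real embedding model. A real model maps
# semantically similar inputs to nearby vectors; this toy version only
# demonstrates the shape of the data flowing through the pipeline.
def embed(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)       # toy 384-dimensional embedding
    return v / np.linalg.norm(v)       # unit-normalise for cosine similarity

# "Index" the collection: store one embedding per object, kept in memory.
documents = ["a photo of a cat", "quarterly sales report", "jazz piano recording"]
index = np.stack([embed(d) for d in documents])

# Query: embed the prompt, then retrieve the most similar stored object.
query = embed("picture of a kitten")
scores = index @ query                 # cosine similarity of unit vectors
best = int(np.argmax(scores))
print(documents[best], float(scores[best]))
```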

To briefly validate that vector databases are a sizzling-hot space, I’ll point to Pinecone’s recent $100M Series B, led by Andreessen Horowitz, at a valuation of $750M. Pinecone is an external vector database that allows developers to store and work with the data generated by LLMs and other AI models. A16z considers vector DBs to be “an important infrastructure component, providing the storage, or “memory”, layer to a brand new AI application stack”.

What are the limitations of vector search today?

The demand for advanced search capabilities is growing exponentially. However, it is extremely difficult to perform real-time search at scale due to cloud computing costs, software limits, and strict user experience requirements, which dictate response latencies of under 100ms. Implementing vector search at scale therefore involves a trade-off between query performance and search accuracy, often leaving the user with suboptimal results.

Similarity search: A technique used to identify and retrieve items in a database that share similarities or patterns with a given query, often using distance metrics or feature representations.

Semantic search: An advanced search method that leverages natural language understanding to interpret context and meaning behind queries, providing more relevant and accurate results by considering user intent and query relationships.

Source: GradientFlow.com
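To see the distinction in practice, below is a small semantic-search sketch in Python using the open-source sentence-transformers library and its all-MiniLM-L6-v2 model (my choice for illustration - any text-embedding model would do). The query shares no keywords with the best-matching document; the match comes from meaning alone.

```python
from sentence_transformers import SentenceTransformer, util

# A small, general-purpose sentence-embedding model.
model = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "The company reported strong quarterly earnings.",
    "How to bake sourdough bread at home.",
    "Central banks raised interest rates again this month.",
]
doc_embeddings = model.encode(docs, convert_to_tensor=True)

# No keyword overlap with any document - semantic search should still
# surface the interest-rates sentence as the closest in meaning.
query_embedding = model.encode("monetary policy tightening", convert_to_tensor=True)
hits = util.semantic_search(query_embedding, doc_embeddings, top_k=1)
print(docs[hits[0][0]["corpus_id"]])
```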

Ohad: “Vector similarity search is built upon algorithms such as k-nearest neighbors (k-NN) and Approximate Nearest Neighbors (ANN), which are compute-intensive and do not scale well when running on pure software. The result is a trade-off: either low-quality search results, or high recall at unrealistic compute costs (driven by large index size), with a limited ability to run real-time search at billion-scale.”
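To ground that trade-off, here is an illustrative sketch using hnswlib, a popular open-source ANN library (again my choice for demonstration, not Hyperspace’s stack). The search-time parameter ef is exactly the kind of knob Ohad is describing: raising it explores more of the index graph, improving recall at the price of more compute per query.

```python
import numpy as np
import hnswlib

dim, n = 128, 100_000
data = np.random.random((n, dim)).astype(np.float32)

# Build an HNSW (graph-based approximate nearest neighbour) index.
index = hnswlib.Index(space="l2", dim=dim)
index.init_index(max_elements=n, ef_construction=200, M=16)
index.add_items(data, np.arange(n))

query = np.random.random((1, dim)).astype(np.float32)

# Exact 10 nearest neighbours by brute force, used to measure recall.
true_nn = set(np.argsort(((data - query) ** 2).sum(axis=1))[:10])

for ef in (10, 50, 200):               # search-time effort knob
    index.set_ef(ef)
    labels, _ = index.knn_query(query, k=10)
    recall = len(set(labels[0]) & true_nn) / 10
    print(f"ef={ef}: recall@10 = {recall:.2f}")
```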

The opportunity: The hardware required for Gen-AI to advance 

This new generation of AI requires a new tech stack. 

Hyperspace is leveraging domain-specific computing, designing a dedicated chip for search: a Search Processing Unit (SPU®). By programming a cloud-based chip for the purpose of search, Hyperspace is able to increase performance, scale to larger datasets, and reduce costs.

Ohad: “This is where our core technology comes into play; we are able to deliver superior performance, scalable technology and cost optimisation. We have gone much deeper in the technology stack, programming a dedicated chip for search that delivers 10x the performance and 5x the scalability while cutting compute costs in half.

“Generative AI is a relatively new field, and its hardware requirements have not yet been addressed. This is why our technology is based on domain-specific computing, leveraging cloud-based FPGAs to enable top data science teams to implement production-grade LLMs with vector search at top search quality.”

And, of course, even ChatGPT recognises the opportunity for FPGAs in vector search…

Source: ChatGPT, Prompt "How can FPGAs help improve vector search?"

While it is too early to say with any certainty where AI is headed, it is clear to us that there is great need and opportunity in the infrastructure layer. Generation and search are two sides of the same coin, and for Generative AI to advance, so too must search capabilities. MizMaa is proud to have seeded Hyperspace in April 2022 - bringing us a step closer to the future of search and AI.