BIG DATA
Vector database startup Pinecone Systems Inc. today announced a new, high-performance deployment option for customers that need to support the most demanding enterprise use cases.
It’s called Dedicated Read Nodes, or DRNs, and it’s now available in public preview, giving customers reserved capacity for low-latency queries with predictable performance and cost. The company said DRNs allow it to support a wider range of use cases with extreme but variable performance requirements.
Pinecone is the creator of an advanced vector database that can dynamically store, transform and index billions of high-dimensional data points, enabling it to respond rapidly and accurately to queries such as nearest-neighbor search.
Unlike relational databases, which store data in rows and columns, vector databases represent unstructured data as high-dimensional data points, each representing a vector or an array of numbers. One of the primary functions of a vector database is to perform similarity searches, which can quickly find vectors that are most similar to a given query vector using measures such as cosine similarity or Euclidean distance. Vector databases are seen as essential for artificial intelligence workloads, as large language models need rapid access to vast amounts of unstructured data.
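The similarity search described above can be illustrated with a minimal, brute-force sketch. A production vector database uses approximate nearest-neighbor indexes over billions of vectors, but the core idea is the same: rank stored vectors by cosine similarity to the query. The vectors and function names below are illustrative, not Pinecone's API.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbor(query, vectors):
    """Return the index of the stored vector most similar to `query`."""
    return max(range(len(vectors)),
               key=lambda i: cosine_similarity(query, vectors[i]))

# Toy "index" of three 3-dimensional embeddings.
index = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
]
query = [0.9, 0.11, 0.0]
print(nearest_neighbor(query, index))  # the third vector points in the closest direction
```

Swapping cosine similarity for Euclidean distance (and taking the minimum instead of the maximum) gives the other measure mentioned above.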
In a blog post, Pinecone explained that AI systems have complex requirements. Some applications, such as retrieval-augmented generation or RAG, AI agents, model prototypes and scheduled jobs, have “bursty” workloads: they maintain a low, steady flow of traffic most of the time, punctuated by sudden spikes in query volume. For such cases, Pinecone’s standard on-demand database is ideal, providing a combination of simplicity, elasticity and usage-based pricing.
However, some applications require consistent high throughput, operate at larger scales and can be extremely sensitive to latency. For instance, billion-vector-scale semantic searches, real-time recommendation systems and user-facing assistants with tight service-level objectives demand a more consistent level of performance, along with predictable costs at scale.
This is why Pinecone is introducing DRNs, a new deployment option where queries run on isolated, provisioned nodes that are dedicated to these kinds of workloads. With these nodes, the data stays “warm” in the system’s memory and on a local solid-state drive.
That means it can be accessed rapidly without “cold starts,” which are caused by the need to fetch information from object storage first. Because the nodes are dedicated to each workload, there are no issues with noisy neighbors or shared queues and query limits.
DRNs scale along two dimensions: replicas increase throughput and availability for greater resilience, while shards expand storage capacity. Users can add as many replicas and shards as they need to scale their workloads. To keep costs predictable, pricing is based on an hourly rate per node.
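The pricing model lends itself to simple back-of-the-envelope arithmetic. The sketch below assumes the total node count is shards times replicas and uses a hypothetical hourly rate; neither figure comes from Pinecone's published pricing.

```python
# Back-of-the-envelope cost model for a dedicated-node deployment.
# HOURLY_RATE is a hypothetical placeholder, not a published Pinecone price.
HOURLY_RATE = 2.50      # assumed USD per node-hour
HOURS_PER_MONTH = 730   # average hours in a month

def monthly_cost(shards, replicas, rate=HOURLY_RATE):
    """Assumes total nodes = shards * replicas, billed at a flat hourly rate."""
    nodes = shards * replicas
    return nodes * rate * HOURS_PER_MONTH

print(monthly_cost(shards=4, replicas=2))  # 8 nodes -> 14600.0
```

The point of such a model is that cost scales linearly and predictably with the node count, in contrast to per-request pricing, where spend tracks query volume.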
Pinecone said customers will benefit from the lowest possible latency and guaranteed high throughput to ensure more consistent performance for high query-per-second workloads. DRNs can also scale indefinitely, and the company further claims that customers will see lower, more predictable costs compared to its on-demand nodes, which are based on a per-request pricing model.
DRNs are a deployment option for the most demanding use cases, where companies require performance isolation, predictably low latency under heavy load and linear scaling as demand grows. In addition to billion-vector-scale search and recommendation systems, DRNs can also be useful for mission-critical AI applications, large enterprise or multitenant platforms that require isolation to prevent one workload from impacting another, and other applications that need performance at scale.
Pinecone said its DRNs have proven their reliability under real-world conditions for several early adopters. One customer is using DRN to support metadata-filtered real-time media searches on its design platform, and was able to sustain 600-queries-per-second performance with latency of just 45 milliseconds across 135 million vectors. The same customer also pushed it to the limit, running a load test that saw its node reach an impressive 2,200 queries per second with a P50 latency of just 60 milliseconds.
In another example, a customer running a large e-commerce marketplace deployed its recommendation engine on Pinecone’s DRNs to support 5,700 queries per second with a P50 latency of just 26 milliseconds across a database of 1.4 billion vectors.