UPDATED 14:35 EDT / SEPTEMBER 14 2023

BIG DATA

Open-source database PostgreSQL 16 brings performance and scaling enhancements

The developer behind the popular open-source PostgreSQL database software, the PostgreSQL Global Development Group, today announced the release of PostgreSQL 16 that includes enhancements to security, scaling and performance.

PostgreSQL is a relational database management system, also known as Postgres, which is free and open source. It supports both nonrelational and relational data types and comes with features designed to protect data integrity and create fault-tolerant environments. It’s also highly extensible and works well with extremely large datasets, which has made it a go to for the open-source developer community because it can also be used for artificial intelligence workloads.

Today’s announcement notably includes performance improvements to query parallelism, bulk data loading and replication. Developers and administrators will receive a better experience with additional syntax for SQL/JSON code use and new monitoring capabilities and better access control rules.

“As relational database patterns evolve, PostgreSQL continues to make performance gains in searching and managing data at scale,” said Dave Page, a PostgreSQL core team member. “PostgreSQL 16 gives users more methods to scale-up and scale-out their workloads, while giving them new ways to gain insights and optimize how they manage their data.”

Under the hood, PostgreSQL is getting optimizations to its query planner, which updates its ability to parallelize database table “joins.” This means that multiple rows can be joined from two or more tables based on a related column between them more efficiently and with much larger datasets. Parallelization allows the software to take advantage of multiple processors to make database queries run faster.

Enhancement to logical replication, which is a method of copying data from one database to another using a streaming protocol according to a publish-and-subscribe basis, will make it easier for users to push data to downstream servers. With PostgreSQL 16, logical replication can now be triggered from standby servers, which is when a database is being used as a replica for a backup or read-only queries. This allows users to shift replication work from busier servers to standby servers during times of high workloads, which can help reduce the total amount of strain on a network.

Logical replication also received a performance improvement that allows subscribers to apply large data streams using parallel workers.

With bulk loading using COPY, the new version can also boost performance up to 300% for both single and concurrent operations in some cases. In addition to that, the team added CPU acceleration using the parallel processing technique called “single instruction, multiple data,” or SIMD, for both x86 and ARM architectures.

The changes to bulk loading and hardware acceleration will make PostgreSQL 16 even more enticing for AI-related workloads. Although the management software is relational, it can be used to store and query AI “embeddings,” such as those from OpenAI LP’s GPT-4, which are vectors produced by generative AI models that represent semantic meanings of textual data. A popular embedding open-source vector extension for PostgreSQL is pgvector, which allows for efficiently searching similarity and querying such data.

It’s possible to create an AI-powered recommendation engine by pre-processing the data through an AI model and then send the tokenized response of values, also called embeddings, to PostgreSQL and store them in the database. That can then be used to query the database using distance computation to address the database based on vector information. This could be used, for example, by an AI-powered application to quickly find text information closely related in meaning contextually in the database through search.

Developers get more syntax options for the SQL/JSON standard, the lightweight SQL to JavaScript Object Notation language, including extra constructors and ways to represent data and non-decimal integers. The new version also improved general support for text collations, which is a system for rules about how text is sorted using sort order and character classification.

On the administrative side, PostgreSQL 16 introduced a new way to provide key input-output metrics for more granular analysis and monitoring. With the new version, the software introduced pg_stat_io, a new source for I/O metrics that allows for analysis of access patterns. It includes information about backend type, object, context and cluster-wide I/O statistics.

Access control and security was also enhanced in this release including improved management of configuration files that control client authentication and username maps. The new version adds regular expression matching for user and database names and include directives for external configuration files, which makes it easier to customize configurations based on changing needs.

Security-oriented client connection parameters have been added that allow clients to specify what kind of authentication they require from a server. The release also added support for the Kerberos computer authentication protocol for credential delegation, which will allow servers to use credentials to connect to trusted services.

Image: Pixabay

A message from John Furrier, co-founder of SiliconANGLE:

Your vote of support is important to us and it helps us keep the content FREE.

One click below supports our mission to provide free, deep, and relevant content.  

Join our community on YouTube

Join the community that includes more than 15,000 #CubeAlumni experts, including Amazon.com CEO Andy Jassy, Dell Technologies founder and CEO Michael Dell, Intel CEO Pat Gelsinger, and many more luminaries and experts.

“TheCUBE is an important partner to the industry. You guys really are a part of our events and we really appreciate you coming and I know people appreciate the content you create as well” – Andy Jassy

THANK YOU