This report is supported by Transwarp Technology Co. Ltd.
Based in the coastal city of Yantai in China’s Shandong province, HengFeng Bank has grown quickly since its founding in 2003. It currently operates more than 210 branches throughout Shandong, Chongqing, Sichuan, East China, Fujian, Yunnan, Xi’an and Beijing and boasts more than 1 trillion yuan ($145 billion) in assets, up 70 percent in just two years. In 2015 it reported a net profit of 8 billion yuan on revenue of 23.8 billion yuan.
Born of the Internet age, HengFeng has made technology a cornerstone of its strategy. It was the first bank in China to use the cloud for its core banking system, and the first to build its analytics on the Hadoop big data platform. Its “Customer 360” initiative captures information about all customer interactions with the bank via all channels for use by account managers.
Situation: Legacy Data Warehouse Platform Suboptimal for Future Scale and New Customer Engagement Apps
Information technology has been a double-edged sword for the consumer banking industry. On one hand, it has streamlined operations and introduced new channels to reach the customer. On the other, it has challenged banks to maintain the person-to-person connections that are so important to the banking experience, whether it’s routine transactions in a branch, small business loans or even commercial transactions. A recent McKinsey survey found that nearly 70 percent of Chinese consumers were open to banking with a digital-only bank.
Big data presents an opportunity to re-establish connections with customers and strengthen brand equity based upon personalized products and services that appeal to each customer’s unique interests. China’s HengFeng Bank is making this capability a cornerstone of its strategy.
“Technology has reduced the opportunities for banks to engage with their customers in person. Our human interaction with customers has reduced a lot so we are introducing digital channels to enable customers to bank with us at any time,” said HuiHui Li, chief executive of Consumer Bank at HengFeng (then called Evergrowing Bank) in a 2015 interview with Fintech Innovation.
As HengFeng’s legacy data warehouse grew, it became clear that traditional, proprietary relational database management systems would limit the bank’s ability to build a platform for the future. A platform built on a traditional RDBMS would be too expensive to handle both the much larger data volumes required by the bank’s existing data warehouse applications and a new class of real-time applications. The new real-time applications would enable capabilities such as on-the-spot risk prediction and personalized product recommendations, whether through online channels or in person. HengFeng needed a unified platform to support both the data warehouse and the real-time applications because all of the functionality depended on a single, integrated way to access all the data.
Open-source platforms built around Hadoop were a natural solution in terms of functionality and cost. But building a scalable warehouse that could also support sophisticated streaming capabilities would take time and require skills that are in short supply. The mix-and-match building blocks for advanced analytic applications that are part of the Apache and Hadoop ecosystems come with a high overhead of operational and developer complexity. The complexity associated with those open source solutions is what finally brought HengFeng to Shanghai-based Transwarp Technology Co. Ltd.
Solution: Transwarp Data Hub
In 2013 Transwarp Technology released Transwarp Data Hub (TDH), the first Hadoop distribution in China built on Spark. TDH has four main components: Inceptor, an analytic SQL database for the new data warehouse scenario; Discover, a platform for machine learning on big data; Hyperbase, a NewSQL database that handles large volumes of unstructured data and provides search; and Stream, a SQL-based streaming engine for building real-time applications.
Because TDH can support both a scalable data warehouse and a sophisticated streaming platform, it met HengFeng Bank’s selection criteria and was chosen as the bank’s big data platform. Unlike traditional Hadoop distributions originating in the Apache Software Foundation and sold by U.S. vendors, Transwarp built much of the technology underlying Data Hub on a specialized version of Spark. Constructing a new, ground-up foundation on top of Spark required several years of effort that wasn’t visible at the time, but the new foundation yielded significant benefits in speed, scale, and usability.
In terms of usability, developers can use a single language, the ANSI 2003 version of SQL, for batch processing, interactive analysis, graph analysis, streaming analytics, and search. Because TDH supports the full SQL 2003 standard, a wide variety of user-friendly business intelligence tools can work with it through ODBC/JDBC drivers. The analytic database, Inceptor, offers functionality that other Hadoop-based analytic engines lack. One example is database federation, in which Inceptor can analyze data from related tables even if some of the data resides in a remote RDBMS. Most Hadoop-based analytic engines don’t have this critical capability. Inceptor also supports vendor-specific dialects of SQL and stored procedures, which makes it easier for customers to migrate applications from competing products to TDH. TDH supports 98% of Oracle’s PL/SQL and 90% of DB2 SQL/PL and Teradata SQL.
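The database federation idea described above, one SQL query joining local tables with tables held in a separate database, can be illustrated with a small analogy. The sketch below uses Python’s built-in sqlite3 module and SQLite’s ATTACH DATABASE statement; it is not Inceptor syntax, and the table names, column names, and sample rows are hypothetical.

```python
import sqlite3

# Local "warehouse" database, kept in memory for this sketch.
conn = sqlite3.connect(":memory:")

# A second, separate in-memory database standing in for a remote RDBMS.
# (Assumption: SQLite's ATTACH is only an analogy for federation.)
conn.execute("ATTACH DATABASE ':memory:' AS remote")

# Hypothetical tables: transactions live locally, customers live "remotely".
conn.execute("CREATE TABLE main.transactions (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE remote.customers (customer_id INTEGER, name TEXT)")

conn.executemany(
    "INSERT INTO main.transactions VALUES (?, ?)",
    [(1, 120.0), (1, 80.0), (2, 50.0)],
)
conn.executemany(
    "INSERT INTO remote.customers VALUES (?, ?)",
    [(1, "Customer A"), (2, "Customer B")],
)

# One query spans both databases, as a federated engine would allow.
rows = conn.execute(
    """
    SELECT c.name, SUM(t.amount) AS total
    FROM main.transactions AS t
    JOIN remote.customers AS c ON c.customer_id = t.customer_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()

print(rows)  # [('Customer A', 200.0), ('Customer B', 50.0)]
```

The point of the analogy is that the query author writes ordinary SQL and does not need to copy the second database’s tables into the first before joining them.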
Stream, the streaming analytic product, uses the very same SQL dialect. Using SQL on streams simplifies analysis because developers can specify what they’re looking for exactly as if they were querying a DBMS. And because Stream supports the same SQL stored procedures as the DBMS, developers can build sophisticated applications that otherwise would require a separate language. Stream can also process events both individually and in batches, so it can handle both real-time and historical data. Supporting both event-at-a-time and batch processing is a major simplification for developers. Even the standard Apache Spark distribution can’t do both. Both Inceptor and Stream can access the same machine-learning tools from SQL. That integration extends the simplicity of SQL queries to much more sophisticated predictive analytics.
TDH’s speed and scale enable the same instance of the underlying platform to support not just the old applications, but newer ones that require greater capacity and performance. Like other high-performance analytic DBMSs, Inceptor supports an in-memory or SSD-based columnar store, named Holodesk, which also provides indexing to speed data access. With a high-performance columnar store plus Transwarp’s proprietary Spark-based engine, Inceptor can perform up to 10 times faster than the open source Apache Impala DBMS from Cloudera. Inceptor also supports updates using ACID transactions. ACID transactions enable analysis to operate continuously while maintaining data integrity as new data is ingested. That capability extends the time available for analysis, since no window has to be dedicated exclusively to ingest as it would be in traditional extract, transform, load processes.
Outcome: Faster, Cheaper, Better Enables Fundamentally New Applications
Migration and deployment were quick and easy relative to what a major platform change traditionally involves. A moderate amount of data modeling, stored-procedure optimization, and index rebuilding collectively took about two months. A migration of this type to TDH more typically takes about four months. Once the migration was completed, HengFeng found the platform to be not only much less expensive to operate but also more scalable and easier to extend with additional services.
Cost savings have been significant. For example, the legacy hardware alone for the Customer 360 application cost 5,600,000 RMB ($800,000). The application was redeployed on commodity hardware for 600,000 RMB ($86,000), a cost savings of roughly 89 percent. A 2,800,000 RMB ($400,000) Oracle software license for Customer 360 was replaced with a 400,000 RMB ($58,000) license for Transwarp’s solution.
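The savings percentages follow directly from the RMB figures quoted above. The short Python sketch below works through the arithmetic; the variable and function names are illustrative, and the combined figure is a derived number that assumes hardware and license were the only costs being compared.

```python
# RMB figures quoted in the text for the Customer 360 application.
legacy_hardware_rmb = 5_600_000
commodity_hardware_rmb = 600_000
oracle_license_rmb = 2_800_000
transwarp_license_rmb = 400_000


def savings_pct(before: float, after: float) -> float:
    """Percentage saved when a cost drops from `before` to `after`."""
    return (before - after) / before * 100


hardware_savings = savings_pct(legacy_hardware_rmb, commodity_hardware_rmb)
license_savings = savings_pct(oracle_license_rmb, transwarp_license_rmb)

# Derived figure (assumption: hardware plus license is the full comparison).
combined_savings = savings_pct(
    legacy_hardware_rmb + oracle_license_rmb,
    commodity_hardware_rmb + transwarp_license_rmb,
)

print(f"hardware: {hardware_savings:.1f}%")  # hardware: 89.3%
print(f"license:  {license_savings:.1f}%")   # license:  85.7%
print(f"combined: {combined_savings:.1f}%")  # combined: 88.1%
```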
The bank didn’t have to sacrifice performance for lower cost. Production reporting based on batch processing on the Oracle-based data warehouse used to take up to eight hours. That was cut to one hour with Transwarp. Integrating Customer 360 data was formerly a one- to two-hour batch process. That was reduced to less than six minutes. A risk management application that used to require two hours to run now executes in 10 minutes.
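Expressed as speedup factors, the run times above can be compared on one scale. The Python sketch below is a worked calculation from the quoted durations only; the names are illustrative, and because the Customer 360 figure is given as "less than six minutes," its results are lower bounds.

```python
def speedup(before_minutes: float, after_minutes: float) -> float:
    """How many times faster a job runs after the migration."""
    return before_minutes / after_minutes


# Production reporting: up to eight hours down to one hour.
reporting = speedup(8 * 60, 60)

# Customer 360 integration: one to two hours down to under six minutes,
# so six minutes gives a lower bound for each end of the range.
customer_360_low = speedup(60, 6)
customer_360_high = speedup(120, 6)

# Risk management application: two hours down to 10 minutes.
risk = speedup(120, 10)

print(f"reporting:    {reporting:.0f}x")  # reporting:    8x
print(f"customer 360: at least {customer_360_low:.0f}x to {customer_360_high:.0f}x")
# customer 360: at least 10x to 20x
print(f"risk:         {risk:.0f}x")  # risk:         12x
```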
The most crucial change was that customer information that is both up to date and fully integrated greatly increased the utility of applications built on the platform. Information can now be both structured and unstructured. That flexibility opens up new sources of information such as logs from the bank’s Web site and other online sources, whether stored in a database or analyzed on the fly from streaming sources. For example, the bank’s credit managers, investment advisers, and account managers all have access to better information when they are talking to clients. Not only do account managers know about a customer’s complete activity across all channels and touch points, but they know about that activity in real time. That combination enables more informed and productive sales conversations.
In addition to improving the utility of existing applications, the platform makes new applications possible. HengFeng can now build qualitatively better risk management applications. For example, China doesn’t have standardized credit ratings and histories. Instead, the bank is integrating its internal data with data from the National Bureau of Statistics as well as other external sources of information on companies and individuals. When bank credit managers are talking to companies about loans, they now have richer information with which to make risk assessments. The platform continues to monitor the information behind these assessments and alerts a manager if there’s abnormal activity that might change the risk.
Lower costs and a more capable processing engine are also enabling HengFeng Bank to tackle new application areas that were prohibitively expensive or even impossible with its previous infrastructure. A consumer-focused risk application now tracks multiple categories of previously unintegrated data, enabling account managers, credit managers, and investment managers to provide intelligent recommendations in real time for products such as car loans, home refinancing, or investments tailored to a consumer’s financial needs.
Beyond the improvements to existing applications and the introduction of new ones, the bank sees room for many years of innovation on the new platform. Digital banking in China is advancing at a rapid pace. Most payments are already being made on cell phones through services such as Alibaba’s Alipay and WeChat’s Wallet. Over the next decade, HengFeng sees an opportunity to convert its operations and channels using mobile, cloud, real-time, and big data technologies. HengFeng sees Transwarp’s TDH as the information infrastructure for this transformation.
Lessons Learned: Compromise Exists Between Extremes of Open Source and Traditional, Proprietary Products
Proprietary data warehouses have delivered on their promise of supporting business intelligence on carefully curated data. But the requirement for a unified platform that could support much greater data volumes and real-time analytics meant traditional pricing wouldn’t work. At the same time, the lower-priced, open source Hadoop and Apache Software Foundation ecosystems would introduce administrative and developer complexity that would drive the hidden total cost of ownership to unsupportable levels. While some highly sophisticated enterprises have the skills to work with open source technology, it’s not for everyone.
Customers should understand that there is a spectrum of solutions, not just the two extremes of proprietary and open source. Product lines such as Transwarp’s, Microsoft Azure’s and MapR’s, among others, feature standard interfaces to functionality that is in part built on proprietary technology. The proprietary technology is the key to the uniformity that enables them to be built, tested, and delivered as an integrated platform. While customers give up the option of having fully open source software, they keep standard interfaces and gain the simplicity and cost-of-ownership benefits of the underlying proprietary integration.