UPDATED 22:29 EDT / MAY 29 2019


AWS announces general availability of its document reading service Textract

Amazon Web Services Inc.’s Textract service, which uses machine learning to extract text and data from documents including tables and forms, is now generally available.

Textract was first announced at the AWS re:Invent conference in November as one of several new machine learning services designed for use by people with no machine learning expertise.

Amazon reckons the service is a big improvement over the traditional optical character recognition software that enterprises have previously relied on to extract text-based data from documents. The problem with traditional OCR is that it can’t recognize common layouts seen on forms and tables. As a result, OCR software is often inaccurate when attempting to pull data from those kinds of sources.

Amazon says Textract is more of an "OCR++ service" because it can recognize tables within a document and understand that the data is arranged in rows and columns.

“The power of Amazon Textract is that it accurately extracts text and structured data from virtually any document with no machine learning experience required,” Swami Sivasubramanian, AWS’s vice president of machine learning, said in a statement. “Subsequently, developers can analyze and query the extracted text and data using our database and analytics services like Amazon Elasticsearch Service, Amazon DynamoDB, and Amazon Athena and integrate with other machine learning services like Amazon Comprehend, Amazon Comprehend Medical, Amazon Translate, and Amazon SageMaker to help customers derive deeper meaning from the extracted text and data.”

Textract supports multiple input formats, including JPEG and PNG image files, scanned documents and PDF files.

Amazon’s announcement that Textract is now generally available was met with excitement by analyst Patrick Moorhead of Moor Insights & Strategy:

“I believe that Textract will be a game changer for industries like healthcare that still rely on printed documents,” Moorhead told SiliconANGLE. “Unlike OCR, Textract identifies text positionally so it’s accurate and useful.”

Numerous customers have been using Textract since it was made available in limited preview last year, including The Globe and Mail Inc., PricewaterhouseCoopers, UiPath Inc. and Alfresco Software Inc., Amazon said.

Textract is currently available in four AWS regions, namely US East (Ohio), US East (Northern Virginia), US West (Oregon) and EU (Ireland). The company said the service will be extended to more regions later in the year.

