UPDATED 22:29 EDT / MAY 29 2019

AWS announces general availability of its document reading service Textract

Amazon Web Services Inc.’s Textract service, which uses machine learning to extract text and data from documents including tables and forms, is now generally available.

Textract was first announced at the AWS re:Invent conference in November 2018 as one of several new machine learning services designed for use by people with no machine learning expertise.

Amazon reckons the service is a big improvement over the traditional optical character recognition software that enterprises have previously relied on to extract text-based data from documents. The problem with traditional OCR is that it can’t recognize common layouts seen on forms and tables. As a result, OCR software is often inaccurate when attempting to pull data from those kinds of sources.

Amazon says Textract is more of an “OCR++ service” because it can recognize tables within a document and understand that the data is arranged in rows and columns.
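To show what "understanding rows and columns" means for developers, here is a minimal sketch. It assumes the JSON returned by Textract's AnalyzeDocument operation (for example, boto3's `textract.analyze_document(..., FeatureTypes=["TABLES"])`), which describes a table as a `TABLE` block linked to `CELL` blocks that carry `RowIndex` and `ColumnIndex`. The function name `tables_from_blocks` and the sample response are hypothetical and hand-built so the code runs without calling AWS.

```python
def tables_from_blocks(blocks):
    """Rebuild each TABLE block as a list of rows (lists of cell strings)."""
    # Index every block by its Id so relationships can be followed.
    by_id = {b["Id"]: b for b in blocks}

    def child_ids(block):
        # Parents point to their children through CHILD relationships.
        ids = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "CHILD":
                ids.extend(rel["Ids"])
        return ids

    def cell_text(cell):
        # A cell's text is the concatenation of its WORD children.
        words = [by_id[i]["Text"] for i in child_ids(cell)
                 if by_id[i]["BlockType"] == "WORD"]
        return " ".join(words)

    tables = []
    for block in blocks:
        if block["BlockType"] != "TABLE":
            continue
        cells = [by_id[i] for i in child_ids(block)
                 if by_id[i]["BlockType"] == "CELL"]
        if not cells:
            tables.append([])
            continue
        n_rows = max(c["RowIndex"] for c in cells)
        n_cols = max(c["ColumnIndex"] for c in cells)
        grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
        for c in cells:
            # RowIndex and ColumnIndex are 1-based in the response.
            grid[c["RowIndex"] - 1][c["ColumnIndex"] - 1] = cell_text(c)
        tables.append(grid)
    return tables


# Hypothetical, hand-built response describing a 2x2 table.
sample = {
    "Blocks": [
        {"Id": "t1", "BlockType": "TABLE",
         "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]}]},
        {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
         "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
        {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
         "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
        {"Id": "c3", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 1,
         "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
        {"Id": "c4", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 2,
         "Relationships": [{"Type": "CHILD", "Ids": ["w4"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Item"},
        {"Id": "w2", "BlockType": "WORD", "Text": "Price"},
        {"Id": "w3", "BlockType": "WORD", "Text": "Coffee"},
        {"Id": "w4", "BlockType": "WORD", "Text": "$3.50"},
    ]
}

if __name__ == "__main__":
    print(tables_from_blocks(sample["Blocks"]))
    # prints [[['Item', 'Price'], ['Coffee', '$3.50']]]
```

Plain OCR would return these four words as a flat stream of text; the cell coordinates are what let the table be rebuilt as rows and columns.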

“The power of Amazon Textract is that it accurately extracts text and structured data from virtually any document with no machine learning experience required,” Swami Sivasubramanian, AWS’s vice president of machine learning, said in a statement. “Subsequently, developers can analyze and query the extracted text and data using our database and analytics services like Amazon Elasticsearch Service, Amazon DynamoDB, and Amazon Athena and integrate with other machine learning services like Amazon Comprehend, Amazon Comprehend Medical, Amazon Translate, and Amazon SageMaker to help customers derive deeper meaning from the extracted text and data.”

Textract supports multiple input formats, including JPEG and PNG image files, scanned documents and PDFs.

Amazon’s announcement that Textract is now generally available was met with excitement by analyst Patrick Moorhead of Moor Insights & Strategy:

“I believe that Textract will be a game changer for industries like healthcare that still rely on printed documents,” Moorhead told SiliconANGLE. “Unlike OCR, Textract identifies text positionally so it’s accurate and useful.”

Numerous customers have been using Textract since it was made available in limited preview last year, including The Globe and Mail Inc., PricewaterhouseCoopers, UiPath Inc. and Alfresco Software Inc., Amazon said.

Textract is currently available in four AWS regions, namely US East (Ohio), US East (Northern Virginia), US West (Oregon) and EU (Ireland). The company said the service will be extended to more regions later in the year.
