UPDATED 21:55 EDT / SEPTEMBER 15 2021

CLOUD

Amazon adds ability to scan Word and PDF documents to AWS Comprehend

Amazon Web Services Inc. today added new features to its Amazon Comprehend service that can extract custom details from documents in their native formats.

Amazon Comprehend's natural language processing capabilities include personally identifiable information extraction, entity extraction, document classification and sentiment analysis. These capabilities are designed to help users find insights within unstructured text such as email, dense paragraphs or social media feeds.

Amazon said its "Comprehend Custom" capability supports custom entity extraction and document classification for business- or domain-specific needs. In Amazon's words, "One pain point we heard from customers is that preprocessing other document formats, such as PDF, into plain text to use Amazon Comprehend is a challenge and takes time to complete."

Starting today, users of Amazon Comprehend can apply custom entity recognition to more document types without first converting files to plain text. Amazon Comprehend can now process document layouts such as dense text, lists and bullets in formats including PDF and Word. Previously, Amazon Comprehend worked only with plain text files.

There are some restrictions, such as a single file not allowing access to the service. Training a model requires at least 250 documents and 100 annotations per entity type. The service also calls Amazon Textract for custom entity recognition, and those Textract calls are billed separately.

“This feature can help with document processing workflows in business verticals such as insurance, mortgage, finance and more,” Anant Patel and Andrea Morton-Youmans from AWS said in a blog post. “The complexity of different document layouts and formats across these verticals makes it challenging to extract the information you need because you might not need every single data point on the page.”

Other benefits of the new functionality include using machine learning to extract custom entities with a single model through application programming interface calls.

“The information locked within documents is important to business operations and by using AI, you can now automate the process while reducing manual efforts and improving productivity, which delivers answers to customers faster,” Morton-Youmans noted in a separate blog post.

Image: AWS
