Guest Column | April 29, 2016

How To Incorporate Documents Into Your Big Data Strategy

By Greg Council, Parascript

Big Data forces a lot of big questions, and for good reason: it’s complex stuff. It encompasses both structured and unstructured information, and what’s new about Big Data is the tremendous volume of this information and the access to it.

As computing power continues to grow, more sophisticated machine learning can be applied to all sorts of data, and the question naturally becomes how to apply these powerful technologies in practice. That question becomes more complex when organizations wish to incorporate information stored within documents into their Big Data strategies.

Documents Core To Big Data Strategy

Rather than just tech-talk, let’s start with how documents provide significant value to an organization’s internal and external processes through Big Data. On the accounting side, there’s spend management. Every organization receives invoices and incurs employee expenses, both of which need to be managed to ensure that payment obligations are vetted and settled.

To perform these activities, proof of purchase or a record of the order is required. For business expenses, the record is the invoice and the purchase order. For individual employee expenses, receipts are often the primary record.

While many companies do an effective job at policy management for business-level expenses and individual employee reimbursements, rare is the company that uses the data in these documents within a Big Data strategy aimed at making business operations more efficient. And yet, consider what data these documents contain. They typically include vendor names, dates, amounts, addresses, shipping charges, and individual descriptions of items or services purchased.

Extracting The Right Data

What if these data were extracted and aggregated into monthly or quarterly reports based not only on spending categories, but also on the individual prices of items, the vendors involved in the sale, and the locations where purchases were made, and then blended and cross-referenced with publicly available retail data? All of a sudden, an organization has a benchmark of spending on an item-by-item basis compared with known retail prices, along with insight into where and when the best available prices were offered. It can also identify substitutions for some items that enable cost reductions. The result is a more coordinated and comprehensive sourcing capability.
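To make the benchmarking idea concrete, here is a minimal Python sketch. It assumes line items have already been extracted from invoices and receipts. The vendors, items, prices, and the `benchmark` function are all hypothetical, invented for illustration. The sketch aggregates spend per item per quarter and compares the average price paid with a retail reference price.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LineItem:
    vendor: str
    item: str
    quarter: str       # reporting period, e.g. "2016-Q1"
    unit_price: float
    quantity: int


# Hypothetical line items, as if already extracted from invoices and receipts.
line_items = [
    LineItem("Acme Office", "toner cartridge", "2016-Q1", 80.0, 10),
    LineItem("Paper Plus", "toner cartridge", "2016-Q1", 70.0, 10),
    LineItem("Acme Office", "copy paper (case)", "2016-Q1", 40.0, 5),
]

# Hypothetical publicly available retail prices, keyed by item description.
retail_prices = {"toner cartridge": 72.0, "copy paper (case)": 42.0}


def benchmark(items, retail):
    """Return {(quarter, item): (avg_paid, retail_price, pct_diff)}."""
    spend = defaultdict(float)
    units = defaultdict(int)
    for li in items:
        key = (li.quarter, li.item)
        spend[key] += li.unit_price * li.quantity
        units[key] += li.quantity

    report = {}
    for key, total in spend.items():
        retail_price = retail.get(key[1])
        if retail_price is None:
            continue  # no retail reference price, so nothing to compare against
        avg_paid = total / units[key]
        pct_diff = (avg_paid - retail_price) / retail_price * 100
        report[key] = (avg_paid, retail_price, pct_diff)
    return report


report = benchmark(line_items, retail_prices)
for (quarter, item), (avg_paid, retail_price, pct_diff) in sorted(report.items()):
    print(f"{quarter} {item}: paid {avg_paid:.2f} vs retail {retail_price:.2f} ({pct_diff:+.1f}%)")
```

A positive percentage marks items bought above the retail reference price; grouping by vendor or location instead of by item would follow the same pattern.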

Next, let’s look at the marketing department. Several years ago, Gartner analyst Craig Roth introduced the concept of big content as a marketing-specific application dealing with hard-to-produce content such as eBooks, white papers, presentations, and videos. While there is no direct tie between Big Data and big content, the ultimate goal is to utilize big content to engage prospective customers, take their digital exhaust, and analyze it to understand where the prospect is with their purchasing decision and what materials work best.

Data Essentials: What Prospect Viewed What In What Order

In this case, documents are at the core of the Big Data strategy, and the data itself is based upon what the prospect viewed or read, at what times, and in what order. One critical element in making this work is to classify and tag big content effectively once it is developed, and to keep the metadata up to date so that it reflects the marketing programs and any changes to sales stages or qualification criteria. In larger organizations, the marketing department can produce a prodigious amount of sales and marketing materials, and applying metadata efficiently, reliably, and consistently is a daunting task. Without the correct metadata, marketing automation systems cannot suggest other materials to prospects. With automated classification, metadata can be applied automatically upon final approval of any material, using visual and content analysis of the material. This metadata can then be exported to the marketing automation software so that inventory can be managed and the right marketing tool can be used at the right time.
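As a rough illustration of automated tagging, the Python sketch below assigns a sales stage to a piece of marketing material by keyword overlap. This is a deliberately simple stand-in for the visual and content analysis described above, not a description of any particular product. The stage names, keyword lists, and the `tag_material` function are assumptions made for the example.

```python
import re

# Hypothetical tagging rules: sales stage -> keywords that suggest it.
STAGE_KEYWORDS = {
    "awareness": {"introduction", "overview", "trends"},
    "evaluation": {"comparison", "benchmark", "roi"},
    "decision": {"pricing", "contract", "implementation"},
}


def tag_material(title, body):
    """Return a metadata record with the best-matching sales stage."""
    words = set(re.findall(r"[a-z]+", (title + " " + body).lower()))
    scores = {stage: len(words & kws) for stage, kws in STAGE_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return {
        "title": title,
        # Fall back to "unclassified" when no keyword matched at all.
        "sales_stage": best if scores[best] > 0 else "unclassified",
        "matched_keywords": sorted(words & STAGE_KEYWORDS[best]),
    }


metadata = tag_material(
    "ROI Comparison Guide",
    "A benchmark of total cost across three vendors.",
)
print(metadata)
```

A record like this could be exported to a marketing automation system in whatever format that system accepts. A production classifier would likely rely on trained models rather than fixed keyword lists.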

Automation Makes Possible High-Value Big Data Initiatives

Classification and data extraction can even be used earlier in the big content creation process. The marketing department can take product design documents and requirements, classify them, and automate the location and extraction of key information that would be relevant to a prospect. This information could include new features and support for new software platforms or versions, to name just a few. Using business rules, marketing staff can easily access key data and incorporate it into their own efforts.
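The business-rule step can be sketched in Python as a small set of patterns applied to a requirements document. The rule names, the "New feature:" and "Supports:" line conventions, and the sample document are hypothetical. Real design documents would need rules matched to their own layout.

```python
import re

# Hypothetical business rules: output field -> pattern that locates it.
RULES = {
    "new_features": re.compile(r"^New feature:\s*(.+)$", re.IGNORECASE | re.MULTILINE),
    "supported_platforms": re.compile(r"^Supports:\s*(.+)$", re.IGNORECASE | re.MULTILINE),
}


def extract_key_info(text):
    """Apply every rule to the text and collect the captured values."""
    return {
        field: [match.strip() for match in pattern.findall(text)]
        for field, pattern in RULES.items()
    }


# Hypothetical requirements document.
requirements_doc = """\
Release 4.2 requirements
New feature: batch export to CSV
New feature: role-based access
Supports: Windows Server 2012 R2
Internal note: revisit logging
"""

info = extract_key_info(requirements_doc)
print(info)
```

Lines that match no rule, such as the internal note, are simply ignored, so marketing staff see only the prospect-relevant details.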

Both of these examples can certainly be implemented without technology to automate them, but the volume of information is generally so great that the effort would outweigh the benefit. With automated classification and data extraction, organizations can rethink how seemingly irrelevant or unmanaged data trapped in documents can support larger, higher-value Big Data initiatives.