XML: The Future Of Content Management?
An outsourcing trend prompted this integrator to offer XML (extensible markup language) conversion services to gain more business within its client base.
Quality Associates, Inc. (QAI) developed the expertise and infrastructure to offer XML (extensible markup language) services to its clients, but that need initially fell outside the integrator’s core competency of developing in-house document management and imaging solutions. A growing customer demand that hit much closer to home for QAI was document collaboration.
“More of our clients are beginning to build workflow and collaboration requirements around their internal document management initiatives,” says Scott Swidersky, director of information services for QAI. “These clients need the ability to share documents with multiple individuals, in sometimes remote locations, in secure collaboration spaces. Many collaboration software suites have existed in the market for years, but there is an inherent difficulty in enabling paper content to be shared effectively within an online collaboration space.”
Seeking an answer to this common customer need, QAI uncovered a new solution called eCapture for eRoom (e4e). e4e is a joint product of document scanner manufacturer Visioneer and content management services company Daybreak Intellectual Capital Solutions (Daybreak ICS). The product is a software application that connects Visioneer’s OneTouch scanning interface to EMC Documentum’s eRoom collaboration platform. e4e includes a client component that is installed on a user’s PC and a server component installed on the eRoom server. These components communicate securely via HTTPS. After installation, the OneTouch setting is configured on a compatible Visioneer scanner for the desired document type (i.e. paper size, resolution, color depth), file format (e.g. searchable PDF, PDF, JPEG, BMP), and target eRoom (i.e. secure collaboration space). After this configuration process, users just press the selected button to scan documents directly into eRoom.
QAI became the first authorized reseller of the e4e solution to the federal government space in January of this year, and the product has been successful for the integrator. For example, QAI recently won a contract to install the e4e solution and Visioneer Strobe XP 470 scanners for the National Labor Relations Board (NLRB) offices in New York and Washington, D.C.
www.visioneer.com
www.daybreakics.com
ASPs: A Growing Trend In Government
In addition to providing a new revenue stream for Quality Associates, Inc. (QAI), XML (extensible markup language) conversion services also helped the integrator identify yet another emerging trend among its client base. “We’re finding that a lot of the data that we’re producing in XML is actually being published back to an ASP [application service provider],” says Scott Swidersky, director of information services for QAI. “A primary reason for this is the fact that there is currently a big push to share information within the federal government.”
Evidence of the information-sharing trend in government can be seen in programs such as the eRulemaking Initiative. The eRulemaking Initiative is a cross-agency e-government effort to build an integrated rulemaking docket and management system to ensure efficiency, economies of scale, and increased accountability of the federal rulemaking process to the public. The first accomplishment of the initiative was the launch of www.regulations.gov in 2003. This Web site provides an easy and consistent way for the public to search, view, and comment on proposed federal regulations.
“Cross-agency initiatives, like eRulemaking, are common in federal circles right now,” says Swidersky. “Rather than making one agency responsible to collect and host all of the data, these coalitions are turning to ASPs to host this information
Quality Associates, Inc. (QAI) is celebrating its 20th year as a document management integrator in 2006. The company has grown steadily over those 20 years, highlighted by an 81% spike in revenue in 2005 and an additional 25% revenue gain projected for this year. QAI’s longevity and success are largely due to its knack for identifying emerging trends in the industry and its ability to quickly ramp up new technologies and services to capitalize on those trends. The most recent example of this is the integrator’s investment in XML (extensible markup language) conversion capabilities.
Inshore XML Conversion Has Its Advantages
QAI’s core competency is developing in-house document management and imaging systems for a client base that consists primarily of federal, state, and local government agencies. Approximately two years ago, QAI realized that many of its government clients not only had a need to image documents, but also desired to convert their paper records, images, and electronic files into XML format. XML is a markup language that allows richly structured documents to be shared over the Web and easily repurposed. XML allows designers to create their own customized tags to indicate what role specific content plays in a document. For example:
<Client>
<name>Quality Associates Inc</name>
<street>9017 Red Branch Road</street>
<city>Columbia</city>
<state>MD</state>
<zip>21045</zip>
<phone>410-884-9100</phone>
</Client>
Tagging content this way enables the definition, transmission, validation, and interpretation of data between applications and organizations, provided the tags conform to an agreed-upon standard such as a schema or DTD (document type definition).
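As a minimal sketch of how a receiving application might interpret the sample record above, the following Python reads the tagged fields with the standard library's xml.etree.ElementTree module. The list of required fields is an illustrative assumption, and the check is a simple stand-in for validation, not a formal schema or DTD check.

```python
import xml.etree.ElementTree as ET

# The sample record from the article, held as a string.
CLIENT_XML = """
<Client>
  <name>Quality Associates Inc</name>
  <street>9017 Red Branch Road</street>
  <city>Columbia</city>
  <state>MD</state>
  <zip>21045</zip>
  <phone>410-884-9100</phone>
</Client>
"""

# Hypothetical list of fields a receiving application might require.
REQUIRED_FIELDS = ["name", "street", "city", "state", "zip", "phone"]


def parse_client(xml_text):
    """Return the record as a dict keyed by tag name."""
    root = ET.fromstring(xml_text.strip())
    if root.tag != "Client":
        raise ValueError(f"expected <Client>, got <{root.tag}>")
    record = {child.tag: (child.text or "").strip() for child in root}
    missing = [field for field in REQUIRED_FIELDS if not record.get(field)]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return record


client = parse_client(CLIENT_XML)
print(client["name"], "-", client["city"], client["state"])
```

Because each value is labeled by its role, the application can ask for the zip code or phone number by name instead of guessing from position or formatting.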
There are several benefits to using XML as a document representation format. Perhaps the most important is the fact that text elements are identified, not on the basis of what they look like, but on their significance in the context of a document. This opens up new possibilities for highly efficient information search and retrieval engines (e.g. intelligent data mining). Furthermore, because XML is plain text made up of ASCII (American Standard Code for Information Interchange) and Unicode characters, XML data can be moved freely among hardware, software, and operating system platforms. For example, this allows organizations to easily exchange information between an ECM (enterprise content management) system and a CRM (customer relationship management) system.
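To illustrate searching by significance and not by appearance, here is a small Python sketch using the standard library's xml.etree.ElementTree module. The `Clients` wrapper element, the second record, and the `names_in_state` helper are hypothetical; only the first record is drawn from the sample earlier in the article.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection; only the first record comes from the article.
CLIENTS_XML = """
<Clients>
  <Client>
    <name>Quality Associates Inc</name>
    <city>Columbia</city>
    <state>MD</state>
  </Client>
  <Client>
    <name>Example Agency</name>
    <city>Richmond</city>
    <state>VA</state>
  </Client>
</Clients>
"""

root = ET.fromstring(CLIENTS_XML.strip())


def names_in_state(clients_root, state):
    """Find clients whose <state> element equals the given value."""
    matches = clients_root.findall(f"./Client[state='{state}']")
    return [client.findtext("name") for client in matches]


maryland = names_in_state(root, "MD")
print(maryland)
```

The query matches on the meaning of the `state` tag, so it would not be confused by the letters "MD" appearing in a name or street address.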
Many of QAI’s clients turned to offshore service providers to fulfill their XML conversion needs, because they wanted to avoid the slow and labor-intensive process of manually converting documents into an XML format in-house. The rekeying and hand-tagging of content can also introduce both typographic and syntactic errors into the conversion process. The labor costs of quality control also made it difficult for U.S.-based companies to earn significant margins from XML conversion, so few service providers offer the service domestically. QAI realized it had an opportunity to earn significant revenue from existing and new customers if it could find a way to cost-effectively provide XML conversion services.
“Many businesses and government agencies are uncomfortable using offshore resources for data conversion because they are concerned about the security of the information,” says Scott Swidersky, director of information services for QAI. “In fact, agencies such as the DoD won’t even consider offshore XML conversion because of the sensitivity of the data. Several logistics and quality control issues also come into play with overseas service providers. Many of these concerns would be alleviated by using an inshore source for XML conversion.”
Four Steps To XML Conversion
QAI spent over a year and several hundred thousand dollars developing full-service XML conversion capabilities in-house. Capital expenses included numerous servers, a variety of XML authoring tools, a cleanup component, a quality control module, custom software development and integration costs, and additional physical square footage for workflow processes. QAI also needed to invest in a few key staff members with XML, SGML (standard generalized markup language), and quality control backgrounds.
“Our goal from the outset was to develop an XML infrastructure that would be able to support the conversion of tens of millions of records,” says Swidersky. “To accomplish this, while still making the service affordable to clients and profitable for our business, required us to automate several steps of the process and drastically reduce the amount of manual labor typically involved in XML conversion. We feel like we’ve developed a solution that has been able to accomplish that.”
QAI built an infrastructure that can convert any file that can be printed to PostScript or PDF into XML. The solution uses visual cues to uncover a document’s structure, much the same way humans do. Documents can be submitted to QAI for XML conversion in hard copy, PDF, or a variety of other formats. Paper documents are scanned into electronic files by QAI prior to XML conversion. The initial step in converting these files to XML is to “block” the document. This step involves drawing color-coded boxes around sections of each page to define text, tables, and images. Once blocking has been validated, OCR (optical character recognition) technologies are used to automatically capture content from the designated text areas on each page. This text is then verified and edited, and the content is saved in a PDF Normal format before being sent through an XML processing engine.
Content must then go through four processes within the XML processing engine to be converted into valid XML output:
1. A document’s PostScript or PDF representation must be analyzed to extract all information about the appearance of a document. This includes the characters in the document and their typography, and any other visual objects. This process extracts text directly from the input data stream, so all content is accurately retained during conversion.
2. Basic building blocks of document structure, including important visual cues and large-scale layout areas of each page, are identified.
3. Identified document building blocks are placed into a tree structure. This phase identifies sections, paragraphs, quotes, lists, tables, footnotes, and other graphical objects, forming a complete internal representation of the structured document.
4. The internal representation of the document is used to export an XML file that presents the document’s content in a logical structure and retains all relevant formatting information.
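The four steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not QAI's processing engine: step 1 is simulated with hard-coded text runs, the font-size threshold used as a visual cue is invented, and the element names (`document`, `section`, `title`, `para`) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Step 1 (simulated): text runs with appearance information, as an
# extractor might produce from a PDF. The values are made up.
RUNS = [
    {"text": "Four Steps To XML Conversion", "font_size": 18},
    {"text": "QAI built an infrastructure.", "font_size": 10},
    {"text": "Documents can be submitted in hard copy.", "font_size": 10},
    {"text": "Pricing", "font_size": 18},
    {"text": "Clients are charged per page.", "font_size": 10},
]

HEADING_MIN_SIZE = 14  # assumed visual cue: large type marks a heading


def identify_blocks(runs):
    """Step 2: label each run as a heading or paragraph from visual cues."""
    return [
        ("heading" if run["font_size"] >= HEADING_MIN_SIZE else "para", run["text"])
        for run in runs
    ]


def build_tree(blocks):
    """Step 3: group paragraphs under the heading that precedes them."""
    sections = []
    for kind, text in blocks:
        if kind == "heading":
            sections.append({"title": text, "paras": []})
        else:
            if not sections:  # body text that appears before any heading
                sections.append({"title": "", "paras": []})
            sections[-1]["paras"].append(text)
    return sections


def export_xml(sections):
    """Step 4: serialize the internal representation as XML."""
    doc = ET.Element("document")
    for section in sections:
        sec = ET.SubElement(doc, "section")
        ET.SubElement(sec, "title").text = section["title"]
        for para in section["paras"]:
            ET.SubElement(sec, "para").text = para
    return ET.tostring(doc, encoding="unicode")


sections = build_tree(identify_blocks(RUNS))
xml_output = export_xml(sections)
print(xml_output)
```

A production engine would rely on many more cues (position, spacing, typeface, table rules) and would export to a client-specified schema, but the flow from appearance to building blocks to tree to XML is the same.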
QAI charges clients on a per-page or per-kilocharacter basis for its XML conversion services. A kilocharacter is a way to measure the size of a file and is equivalent to 1,024 electronic characters. The XML files QAI generates are output in a client-specified schema or DTD (document type definition) and placed on optical media (e.g. DVD) or delivered to the customer via secure VPN (virtual private network) connectivity.
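As a quick illustration of per-kilocharacter billing, the following Python sketch converts a character count into kilocharacters using the 1,024-character definition above. The rate is a hypothetical placeholder, since the article gives no prices, and billing fractional kilocharacters without rounding up is also an assumption.

```python
CHARS_PER_KILOCHARACTER = 1024  # from the article's definition

# Hypothetical rate for illustration only; the article gives no prices.
RATE_PER_KILOCHARACTER = 0.50


def kilocharacters(text):
    """Return the size of the text in kilocharacters."""
    return len(text) / CHARS_PER_KILOCHARACTER


def conversion_cost(text, rate=RATE_PER_KILOCHARACTER):
    """Return the cost at the given rate, rounded to cents."""
    return round(kilocharacters(text) * rate, 2)


sample = "x" * 4096  # a stand-in for 4,096 characters of document text
print(kilocharacters(sample))
print(conversion_cost(sample))
```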
Because QAI’s XML conversion service offering required a significant upfront investment and is still in its early stages of availability, it is unclear how successful the initiative will be for QAI. However, the integrator is extremely optimistic.
“Our XML conversion services have generated a lot of interest within our existing client base,” says Swidersky. “We currently have six clients using the service, each of which has entrusted us to convert anywhere from 500,000 to 1 million records to XML. We feel this is going to be a very profitable stream of business for us.”