The tiniest of flaws in a massive forklift truck is crucial information for Ryan McLawhorn, quality-improvement manager at NACCO Industries. If his cargo-vehicle division can detect common problems and fix them during manufacturing, it can save millions on warranty claims.

That’s not easy when 80,000 claims roll in each year. So McLawhorn turned to data-mining software that examines service reports for precise trends.

This automated analysis is becoming remarkably agile at giving companies detailed answers to the old business question of “How are we doing?”

The technology can be made to work not only on service records and other internal data but also on the hue and cry of the Internet, where products and corporate reputations are discussed in blogs, message boards and e-commerce sites.

Eastman Kodak uses this unstructured-data analysis to identify connections in its own and its competitors’ patent filings. Government agents use it to hunt for insider trading or linkages between terrorist groups. Mayo Clinic researchers use it to scan physicians’ notes for evidence about the efficacy of treatments.

The breakthrough has been in getting computers to understand the content of the documents they scan.

Often by diagramming sentences as a grammar school student would, text-analysis programs can tell the difference between a blog that says a motorcycle is so fast “it smokes” and one that says the bike’s engine emits smoke.

Picking up on such details quickly is vital when fountains of data gush every minute.

“Our technology, on a simple laptop, can read through ‘Moby-Dick’ and analyze it in nine seconds,” said Craig Norris, head of Attensity, which supplied NACCO’s software.

To help broaden the potential of this kind of software, several companies planned to announce today an agreement on a technological standard that will allow computing engines for sorting unstructured data to work together.

The programming codes will be open source and freely available.

Cooperation is required because so many different kinds of unstructured-data engines have sprung up, driven in large part by the U.S. government’s demand for intelligence analysis. The CIA has funded several unstructured-data management companies, including Attensity.

Another CIA-backed company, Intelliseek, recently partnered with the Factiva information service to offer “reputation insight.”

Intelliseek scans 4 million Web logs and e-mail list servers, and Factiva combs media reports. Together they give companies a detailed analysis of how the public thinks about them at any given point.

“The world has become more democratic. In the old days the company would issue a message, and the only alternative to that was, people could meet on the street and talk about it,” said Randy Clark, marketing director of ClearForest, a data-analysis company. “Now those communications are pretty visible.”