SAS announced the acquisition of privately held Teragram, the leader in natural language processing (NLP) and advanced linguistic technology. The acquisition will enhance SAS’ own robust text mining and analytical BI offerings, and extend them to enterprise and mobile search. More than a decade ago, SAS was among the first companies to recognize the importance of text mining, the analysis of text and other unstructured data such as Web pages, documents, email, images and other information not stored in a structured database. Today, SAS leads this important and growing space.
“The addition of Teragram’s domain expertise and NLP technology will change the landscape of the BI and analytics markets,” said SAS CEO Jim Goodnight. “Teragram’s technologies augment, strengthen and extend SAS’ ability to combine structured and unstructured data – not only in our text mining solution but embedded across the entire SAS Enterprise Intelligence Platform – to drive better answers faster.”
Teragram, a 40-person firm headquartered in Cambridge, Mass., will be run as a SAS company. Terms of the acquisition deal were not disclosed. Teragram’s NLP technology is well-established, with a customer base including CNN, Forbes.com, NYTimes Digital, Sony, WashingtonPost.com, Wolters Kluwer, the World Bank and Yahoo!
“As the data explosion continues, companies need an intelligent way to make sense of it all, whether data is in structured databases or in the huge variety of unstructured sources,” said Yves Schabes, President of Teragram. “Teragram and its technology fit perfectly into SAS’ analytics and text mining efforts, as SAS continues to innovate in this rapidly growing market. We’re pleased to join a company that delivers the software businesses need to blend structured and unstructured data and reach better, timelier and more accurate decisions.”
Natural Language Processing
Teragram’s natural language processing (NLP) technologies help turn text – in many languages and from many sources – into usable information. NLP enables richer data processing at the level of words, linguistic relations and word meanings. Teragram has developed and maintains large annotated dictionaries containing several hundred million words in more than 30 languages.
Automatic Categorization
Teragram’s advanced categorization technologies provide instant, advanced classification of documents according to custom criteria, applied throughout the organization. This enables faster and more accurate access to documents organized by specific topics that match the interest of a given user, regardless of the original document’s location.
Natural Language Enterprise Search
For enterprise search, Teragram’s NLP technologies scan structured corporate databases and unstructured sources including text-based reports and Web pages to provide comprehensive answers from these multiple information sources.
“With today’s multinational companies and distributed workforces, as well as tremendous amounts of data in disparate systems and formats, it’s more important than ever to get quick and accurate answers to key business questions,” said Schabes. “Enterprise search is a competitive weapon for tapping an organization’s existing data resources. Combining SAS’ business intelligence, data integration and advanced analytics with Teragram’s NLP technologies will deliver answers to search queries in seconds.”
Teragram’s sophisticated search capabilities deliver an easy-to-use environment for BI, extending the availability and use of BI throughout organizations. The combination of SAS and Teragram technologies provides indexing driven not just by a report’s header, but by its actual content and the metadata associated with it.
Mobile Search
Teragram also brings SAS the next generation of mobile search, helping individuals scan information remotely and get answers faster. Using Teragram’s mobile search technology, individuals can store and retrieve information, connect to outside applications such as BI systems, and search databases from their BlackBerry, smart phone or other mobile device.
An Explosion in Unstructured Data
Business management expert Bill Jensen first decried the downsides of today’s information explosion back in 2001, in his book “Simplicity.” According to his research, echoed by others, the most conservative estimates currently show that business information is doubling every eighteen months. This data flood has only grown more pronounced in recent years, and much of this data lies outside traditional, “structured” databases. According to estimates, unstructured data comprises up to 70% of all business data. This unstructured data resides in customer comments and service notes, e-mail and chat threads, documents and surveys, blogs and RSS feeds, warranty claims, resumes, voicemail and phone logs, among other sources.
If businesses fail to include this unstructured data in their analyses – of customers, market opportunities, internal operations, supply chains, etc. – they are only seeing part of the complete picture, and can make bad decisions as a result. Powerful analytics like SAS’ can help organizations weave structured and unstructured data to uncover hidden patterns and trends, and then use this insight to make better decisions, solve problems and take advantage of opportunities.
SAS: A Pioneer in Text Mining
SAS has offered text mining capabilities within its software for more than a decade, and launched a specific text-mining product, SAS Text Miner, in 2002.
Today, SAS Text Miner is included in several industry-specific solutions, including SAS Warranty Analysis. Manufacturers such as Sub-Zero and Shanghai General Motors use this solution to bring together and analyze warranty claims and service data, much of it unstructured. They can discover potential problems early enough to take quick action, leading to improved product quality and enhanced customer satisfaction and loyalty.
Banks use text analytics on transcripts of customer calls and related metadata (such as length of call, hold time and number of transfers) to gauge customer satisfaction and predict outcomes (for example, whether a customer is a good credit risk or is likely to close an account). Insurance companies use text analytics on adjusters’ claim notes and demographic information to detect possible fraudulent claims.