Over 150 billion email messages are sent every day from about 3.5 billion email accounts worldwide, so it’s easy to understand why we miss important messages as we struggle to keep up with the surge. For some people, the situation has become so bad that email is no longer a reliable way to reach them, because they can’t quickly pick out the important messages.

A new research project coming out of Israel aims to solve part of the problem by using big data techniques to boil email messages down to their most important information and summarize them so they can be digested much more quickly, especially on mobile devices.

The project is led by Mark Last, an assistant professor at Ben Gurion University in Be’er-Sheva, Israel, and focuses on algorithms that condense blocks of text to their most important elements. For email, this could have two main benefits:

1. Create one-sentence email summaries to be used in preview panes so that users can quickly flip through a list of messages and see the main idea of each email without having to open it.

2. Summarize long emails into 100-200 words that highlight the key points.
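The article doesn’t describe the BGU algorithms in detail. To illustrate what summarizing a message into its key sentences can mean, here is a minimal sketch of a classic, generic extractive technique (word-frequency sentence scoring), not Last’s actual method. It assumes a naive sentence splitter and a tiny hypothetical stopword list; the function names `summarize` and `one_line_preview` are also hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; real systems use far larger ones.
STOPWORDS = {
    "a", "an", "the", "and", "or", "but", "to", "of", "in", "on", "for",
    "is", "are", "was", "were", "be", "it", "this", "that", "we", "you",
    "i", "with", "at", "by", "as", "so", "if", "will", "can", "please",
}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def rank_sentences(sentences: list[str]) -> list[int]:
    """Sentence indices ordered from most to least important."""
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(i: int) -> float:
        words = tokenize(sentences[i])
        # Average word frequency, so long sentences aren't automatically favored.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # sorted() is stable, so ties keep their original order.
    return sorted(range(len(sentences)), key=score, reverse=True)


def one_line_preview(text: str) -> str:
    """Preview-pane variant: the single highest-scoring sentence."""
    sentences = split_sentences(text)
    return sentences[rank_sentences(sentences)[0]] if sentences else ""


def summarize(text: str, max_words: int = 150) -> str:
    """Greedily keep top-ranked sentences that fit within max_words,
    then return them in their original order."""
    sentences = split_sentences(text)
    chosen, used = [], 0
    for i in rank_sentences(sentences):
        n = len(sentences[i].split())
        if used + n <= max_words:
            chosen.append(i)
            used += n
    return " ".join(sentences[i] for i in sorted(chosen))


email = (
    "Hi team. The budget review meeting moves to Friday. "
    "Please bring the updated budget numbers to the meeting. "
    "Lunch will be provided."
)
print(one_line_preview(email))
print(summarize(email, max_words=16))
```

Here the preview picks the sentence about the rescheduled meeting, because “budget” and “meeting” recur across the message. A `max_words` value between 100 and 200 would match the summary length the project targets; production summarizers use much richer features than raw word counts.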

Last and his team of researchers at BGU are accomplishing this with the tools of big data. In fact, Last has been working with big data to solve problems since 1996, when he was a PhD student at Tel Aviv University, long before anyone called it “big data.” Back then it was simply data mining on unstructured data (one of the key elements of big data), and Last’s PhD sponsor barely understood the web mining and text mining research he was doing.

Now, Last, who was born in Russia and came to Israel as a kid in 1977, is putting that experience to good use in what has become one of the hottest fields in IT. In 2008, he became a professor of Information Systems Engineering at Ben Gurion University, and one of his big projects has been using text mining to find terrorist sites on the web.

There are tens of thousands of terrorist organization sites on the Internet, but they often disguise themselves as news, information, or community sites. Last and his team have used algorithms called “characterization models” to scan the web and pinpoint terrorist sites by identifying words the sites use repeatedly, such as “enemy” and “martyr,” and the euphemisms they substitute for more explicit terms, such as “human bomb” rather than “suicide bomber.”
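The article gives no details of the characterization models beyond this description. As a loose illustration of the underlying idea (flagging a page when indicative terms appear unusually often), here is a minimal keyword-rate sketch. It is not BGU’s model: the term list simply reuses the article’s examples, and the scoring rule, the threshold, and the names `indicator_score` and `looks_suspicious` are hypothetical.

```python
import re

# Indicative terms taken from the article's examples. A real model would
# learn its terms and weights from labeled pages rather than hard-code them.
INDICATIVE_TERMS = ["enemy", "martyr", "human bomb"]


def term_counts(text: str, terms: list[str]) -> dict[str, int]:
    """Case-insensitive whole-word/phrase counts; 's?' also catches simple plurals."""
    lowered = text.lower()
    return {
        t: len(re.findall(r"\b" + re.escape(t) + r"s?\b", lowered))
        for t in terms
    }


def indicator_score(text: str, terms: list[str] = INDICATIVE_TERMS) -> float:
    """Indicative-term hits per 100 words of text."""
    n_words = len(re.findall(r"\w+", text))
    if n_words == 0:
        return 0.0
    return 100.0 * sum(term_counts(text, terms).values()) / n_words


def looks_suspicious(text: str, threshold: float = 1.0) -> bool:
    """Flag text whose indicative-term rate exceeds a hypothetical threshold."""
    return indicator_score(text) > threshold


print(indicator_score("The city council approved a new park budget on Tuesday."))
```

A fixed keyword list like this is easy to evade, which is one reason a research system would model many features at once rather than rely on a handful of words.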

This kind of data mining is clearly different from the text summarization used in the email research described above.

“In data mining, in text mining, we have different methods, different tools,” said Last.

Nevertheless, the two lines of work support each other, and both are aspects of big data. The text summarization work started as an initiative to summarize large numbers of news articles, short books, and documents on the web. This was especially useful to intelligence agencies, which used the technology to comb through thousands of news reports and web documents as quickly as possible. They could read the 100-200 word summaries of the pages or documents and then decide which ones deserved a closer look and which ones they could skip without wasting time.

Out of that work grew the idea, from Last and his team, of applying the same approach to email, where summarization could help users quickly sort through messages and separate the ones they need to pay attention to from the ones they can safely ignore.

By Jason Hiner