Gas for Google: Researcher develops fuel for Internet search engine

Thursday, November 10, 2005

Before the 21st century, the word "google" might have conjured up images of fairy tales or cartoons, but today Google has become as common as the verb "drive."

And that's just what Google does: it drives computer users to their destination.

Now, a University of North Texas researcher is helping create high-octane fuel that will drive computer users to more destinations than they thought possible.

Google Print, a division of Google, Inc., provided UNT researcher Rada Mihalcea with a $107,112 grant to continue her research on information retrieval to benefit Google Print's online electronic database of this kind of information.

The Google grant relates to her research in the field of natural-language processing, which looks at how language is used in information technology.

Mihalcea extracts information from texts online and is working to create a program that will sort through large texts available online and create keyword indexes and short summaries of them.

Her research could allow computer users to automatically generate indexes or summaries of the documents they retrieve in response to a search engine query, which would ease and speed up access to the information stored in these documents.

"There are millions of books and an enormous amount of information already available online, which is too hard to process and evaluate without this sort of access to concise information," Mihalcea says.

Mihalcea has already developed technology that can summarize short texts such as news articles, and she plans to use this technology as the foundation for her research to sum up larger texts such as books.

She says creating a program to determine the important themes of larger texts is challenging, since the important information could be spread out in megabytes of text and may be harder to locate.

Mihalcea's research could help the Google Print effort provide the means to generate indexes or summaries for books stored in electronic format, which could help users determine if they are interested in a particular book.

Google is currently working with libraries at Harvard University, Stanford University, the University of Michigan, the University of Oxford and the New York Public Library to digitally scan books and provide databases with specific book content.

A huge catalog of books that are already in the public domain, as well as books that publishers have submitted to Google, is currently available on print.google.com.

UNT News Service Phone Number: (940) 565-2108