Inteks - Custom software development company

  Search Engine


Project name

Search Engine

Customer

USA-based company (covered by NDA)

Business case

The current scale and growth of the World Wide Web make effective and accurate search and location of Web pages crucial. Today the only feasible way for a searcher to locate a particular Web-based source is to use a Web search engine. Generic large-scale search engines return thousands of web pages, and since many of them lack relevance to the query, searchers tend to look only at the first few results. That is why accurate ranking is critically important.
The Customer decided to build a Web-scale search engine that mitigates the problems of existing search systems. The goal of the project was to address issues of both quality and scalability by scaling search engine technology to keep pace with extraordinary web growth. Creating a search engine that scales even to today's web presents many challenges. Fast crawling technology is needed to gather web documents and keep them up to date. Storage space must be used efficiently to store the indices and the documents themselves.
The system had to keep local copies of documents retrieved from the Internet and needed fast data storage. The full size of the document repository, which contains all information about web pages (including the document header, the archived document body, etc.), was estimated at dozens of terabytes.
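As an illustration of storing a local copy compactly, the following is a minimal sketch of a repository record that keeps a small header next to a compressed document body. The class name, field names, and the choice of DEFLATE compression are assumptions made for this example; the page does not describe the actual storage format used in the project.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical repository record: header fields plus a compressed body. */
final class StoredDocument {
    final String url;            // header: where the document was fetched from
    final long fetchedAtMillis;  // header: when it was fetched
    final int originalLength;    // header: size of the uncompressed body in bytes
    final byte[] compressedBody; // archived body, DEFLATE-compressed (an assumption)

    private StoredDocument(String url, long fetchedAtMillis,
                           int originalLength, byte[] compressedBody) {
        this.url = url;
        this.fetchedAtMillis = fetchedAtMillis;
        this.originalLength = originalLength;
        this.compressedBody = compressedBody;
    }

    /** Compresses the body and wraps it together with its header fields. */
    static StoredDocument store(String url, long fetchedAtMillis, String body) {
        byte[] raw = body.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int written = deflater.deflate(buffer);
            out.write(buffer, 0, written);
        }
        deflater.end();
        return new StoredDocument(url, fetchedAtMillis, raw.length, out.toByteArray());
    }

    /** Restores the original body text from the archived bytes. */
    String body() {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressedBody);
            byte[] raw = new byte[originalLength];
            int offset = 0;
            while (offset < raw.length && !inflater.finished()) {
                offset += inflater.inflate(raw, offset, raw.length - offset);
            }
            return new String(raw, StandardCharsets.UTF_8);
        } catch (DataFormatException e) {
            throw new IllegalStateException("Corrupted archived body for " + url, e);
        } finally {
            inflater.end();
        }
    }
}
```

Keeping the uncompressed length in the header lets a reader allocate the output buffer once when restoring a document, which matters when the repository holds a very large number of records.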

Solution overview

Inteks specialists spent over 1000 man-hours investigating relevance calculation for large collections of documents. The result was an architecture designed to support novel research activities on large-scale web data. Thanks to its distributed data processing architecture, the search engine can be scaled to any target system, from desktops to high-end computers.
A novel "search by meaning" feature uses a thesaurus-based retrieval concept: the purpose of the Thesaurus is to provide meanings and synonyms for a given word and to store relations between words. The front-end application uses this information to provide a search-refining capability, which substantially improves the quality and relevance of search results. Two dictionaries were implemented: one as a wrapper around WordNet, the other user-defined. WordNet is an online lexical reference system whose design is inspired by current psycholinguistic theories of human lexical memory.
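The thesaurus-based refinement described above could be sketched roughly as follows. The Thesaurus interface, the UserThesaurus and QueryRefiner classes, and their methods are hypothetical names introduced for illustration only; they are not taken from the project. Only the user-defined dictionary is shown, and a WordNet wrapper would be a second implementation of the same interface.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical thesaurus abstraction: returns synonyms for a word. */
interface Thesaurus {
    Set<String> synonyms(String word);
}

/** User-defined dictionary backed by an in-memory map (an assumption). */
final class UserThesaurus implements Thesaurus {
    private final Map<String, Set<String>> entries;

    UserThesaurus(Map<String, Set<String>> entries) {
        this.entries = entries;
    }

    @Override
    public Set<String> synonyms(String word) {
        // Unknown words simply have no synonyms.
        return entries.getOrDefault(word.toLowerCase(Locale.ROOT), Set.of());
    }
}

/** Expands each query term with synonyms from every configured dictionary. */
final class QueryRefiner {
    private final List<Thesaurus> sources;

    QueryRefiner(List<Thesaurus> sources) {
        this.sources = sources;
    }

    /** Returns one group per query term: the term itself plus its synonyms. */
    List<Set<String>> refine(String query) {
        List<Set<String>> groups = new ArrayList<>();
        for (String term : query.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) {
                continue; // an empty query produces no groups
            }
            Set<String> group = new LinkedHashSet<>();
            group.add(term); // the original term always comes first
            for (Thesaurus source : sources) {
                group.addAll(source.synonyms(term));
            }
            groups.add(group);
        }
        return groups;
    }
}
```

A front end could present each group as a set of alternative terms, so that a query such as "fast car" can be refined to also match "automobile" when the user-defined dictionary lists it as a synonym of "car".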

Benefits

The Customer received a system that meets the highest market requirements. The functionality of the designed system is on par with the world's leading search engines, as the following facts show:
The search engine can index billions of web pages.
The search system processes a query in under 1 second on an index of 1 billion documents.

Tools and technologies

Java, JSP, TCP/IP, WordNet® lexical database (by Cognitive Science Laboratory), Linux
Copyright © Inteks LLC
2003-2016