In this book, we aim to provide a fairly comprehensive overview of the scalability and efficiency challenges in large-scale web search engines. More specifically, we cover the issues involved in the design of three separate systems present in every web-scale search engine: web crawling, indexing, and query processing systems. We present the performance challenges encountered in these systems and review a wide range of design alternatives employed as solutions to these challenges, focusing specifically on algorithmic and architectural optimizations. We discuss the available optimizations at different computational granularities, ranging from a single computer node to a collection of data centers. We offer both practitioners and theoreticians in the field some insight into the way large-scale web search engines operate and the design choices they adopt. Moreover, we survey the efficiency literature, providing pointers to a large number of relatively important research papers. Finally, we discuss some open research problems in the context of search engine efficiency.
We use the term string processing to denote any use of computers to process and manage strings, that is, sequences of symbols. This includes text retrieval, compression, computational biology, natural language processing, word theory, and related areas. Strings can also be extended to other dimensions, including images and complex objects such as trees or graphs. These areas are important for many applications, including text, image, and genetic databases. Nowadays, the most important motivation for research is searching and managing the World Wide Web. The Web contains terabytes of data, and searching for information is becoming as difficult as finding a needle in a haystack. Future versions of this workshop will focus on generic information retrieval, query languages, user interfaces, and visualization tools.