 Data Revolution

The search literature documents three "revolutions" so far. One of them, the interactive revolution, is still going on today. We are also in the midst of a fourth revolution: the data revolution, driven by growing capability and interest in recording, analyzing, and learning from user data, both in aggregate and individually. Data mining and machine learning models applied to query-click data have improved ranking, query suggestion, and search advertising. They have also given us a better understanding of searchers, their activities, and their satisfaction and success.
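Here is a minimal sketch of the simplest way query-click data can feed ranking. The Python below counts impressions and clicks per (query, URL) pair and orders candidate results by click-through rate (CTR). The log format, URLs, and function names are hypothetical. Raw CTR is also a naive signal: among other things, it ignores the position at which a result was shown. Production rankers combine many such features in learned models.

```python
from collections import Counter

def click_through_rates(log):
    """Compute CTR for each (query, url) pair.

    log: iterable of (query, url, clicked) tuples, one per impression.
    This flat format is a hypothetical simplification of a real query log.
    """
    impressions = Counter()
    clicks = Counter()
    for query, url, clicked in log:
        key = (query, url)
        impressions[key] += 1
        if clicked:
            clicks[key] += 1
    return {key: clicks[key] / n for key, n in impressions.items()}

def rerank(query, urls, ctr, prior=0.0):
    """Order candidate urls by observed CTR; unseen pairs fall back to `prior`."""
    return sorted(urls, key=lambda u: ctr.get((query, u), prior), reverse=True)

# Made-up log for an ambiguous query.
log = [
    ("jaguar", "cars.example/jaguar", True),
    ("jaguar", "zoo.example/jaguar", False),
    ("jaguar", "cars.example/jaguar", True),
    ("jaguar", "zoo.example/jaguar", True),
    ("jaguar", "zoo.example/jaguar", False),
]
ctr = click_through_rates(log)
print(rerank("jaguar", ["zoo.example/jaguar", "cars.example/jaguar"], ctr))
```

In this toy log the car page was clicked on both of its impressions and the zoo page on one of three, so the car page moves to the top.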

In the future, search tasks must be treated as first-class components of the search process. The focus should be on supporting end-to-end task completion, yet session data is still used only to augment individual queries. Web browser trails record behavioral traces in online environments, and these traces can be used to guide others. Search providers could mine such activity sequences to tap into populations' procedural search know-how. This know-how could then support search strategically. One example is guided tours that cover entire tasks rather than just the starting points provided by today's search results.
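To make the trail-mining idea concrete, here is a hypothetical sketch, not a description of any search provider's system. It counts page-to-page transitions across many users' trails. It then greedily follows the most common next step to form a simple "guided tour." The trail data, page names, and greedy strategy are all assumptions for illustration.

```python
from collections import Counter, defaultdict

def mine_transitions(trails):
    """Count page-to-page transitions across many users' browse trails."""
    transitions = defaultdict(Counter)
    for trail in trails:
        for current, nxt in zip(trail, trail[1:]):
            transitions[current][nxt] += 1
    return transitions

def guided_tour(transitions, start, max_steps=5):
    """Greedily follow the most frequent next step, never revisiting a page.

    Ties are broken by page name so the result is deterministic.
    """
    tour = [start]
    for _ in range(max_steps):
        # .get avoids inserting empty entries into the defaultdict.
        candidates = [
            (count, page)
            for page, count in transitions.get(tour[-1], Counter()).items()
            if page not in tour
        ]
        if not candidates:
            break
        _, best = max(candidates)
        tour.append(best)
    return tour

# Hypothetical trails from three users working on the same task.
trails = [
    ["search:tax return", "irs.example/forms", "irs.example/1040", "bank.example/statements"],
    ["search:tax return", "irs.example/forms", "irs.example/1040"],
    ["search:tax return", "blog.example/tips", "irs.example/forms", "irs.example/1040"],
]
transitions = mine_transitions(trails)
tour = guided_tour(transitions, "search:tax return")
print(tour)
```

A real system would need far more data, some notion of task boundaries, and a check that the popular path actually led to task success. The greedy walk only shows the basic idea of turning many trails into one suggested route.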

Web search engines have made significant progress in understanding the intended meaning of a query, for example by recognizing entity mentions and common query patterns. These are used to improve query precision and recall. They are also used to provide a direct "best answer" for the most likely query meaning. Beyond query text, search systems now have access to many new signals. Some come from new interaction modalities such as touch and gesture. Others come from sensors that track physiology, eye gaze, and locomotion. Cursor movements can also be collected at scale and used to interpret user activity when click-through data is unavailable. In addition to interaction signals, search engines increasingly use semantic data to better understand document content. This information comes from background knowledge graphs. It also comes from the documents themselves, in the form of embedded semantic data (microdata, RDFa, and JSON-LD) that uses the common schema.org vocabulary and markup standards.
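Of the three embedded formats, JSON-LD is the easiest to illustrate, because it lives in script elements whose type is application/ld+json. The sketch below uses only Python's standard library (html.parser and json) to pull those blocks out of a page. The sample page and the JsonLdExtractor class are made up for this example. Microdata and RDFa are spread across ordinary HTML attributes and would need separate handling.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script contents arrive as raw text; buffer them until </script>.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than failing the whole page

# A made-up page carrying schema.org markup as JSON-LD.
page = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Recipe",
 "name": "Banana bread", "totalTime": "PT1H"}
</script>
</head><body><p>...</p></body></html>
"""
parser = JsonLdExtractor()
parser.feed(page)
for item in parser.items:
    print(item["@type"], item["name"])
```

Once extracted, fields such as @type and totalTime can be indexed directly. A search engine can then tell that the page describes a recipe and how long it takes without inferring this from the page text.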

Thanks to cloud services, data collected from users and elsewhere is no longer siloed in specific machines or applications. Longitudinal data on searchers helps build rich models of their interests and expertise. Short- and long-term data from individuals are used to personalize search. When individual data is scarce, personalization can be extended to cohorts of similar users. Non-search services (such as productivity apps) can also provide data that enriches contextual models and improves search effectiveness. For example, a search for "VAR" issued from within a spreadsheet suggests that the intent is variance rather than value at risk. Other signals, such as spatial context and time, add further context about the search situation.
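One common way to back off from individual data to cohort data is to interpolate the two, as in the sketch below. It assumes two inputs: intent counts from a user's history, and a cohort-level intent distribution. The cohort numbers here are invented, for "VAR" searches issued from spreadsheets. The function applies Dirichlet-style smoothing. The parameter mu, an arbitrary illustrative value, sets how many observations the cohort prior is worth. The prior therefore dominates when personal history is thin and fades as history accumulates. All names are hypothetical.

```python
from collections import Counter

def smoothed_intent(user_counts, cohort_probs, mu=10.0):
    """Blend an individual's intent counts with a cohort prior.

    user_counts:  Counter of intents observed in this user's history.
    cohort_probs: dict of intent -> probability for the user's cohort
                  (assumed to sum to 1).
    mu:           weight of the prior, in pseudo-observations (must be > 0).
    """
    total = sum(user_counts.values())
    intents = set(user_counts) | set(cohort_probs)
    return {
        intent: (user_counts.get(intent, 0) + mu * cohort_probs.get(intent, 0.0))
        / (total + mu)
        for intent in intents
    }

# Hypothetical prior: what spreadsheet users usually mean by "VAR".
spreadsheet_cohort = {"variance": 0.8, "value at risk": 0.2}

new_user = smoothed_intent(Counter(), spreadsheet_cohort)
risk_analyst = smoothed_intent(Counter({"value at risk": 30}), spreadsheet_cohort)
print(new_user)
print(risk_analyst)
```

A user with no history inherits the cohort's 80% preference for variance. After 30 recorded "value at risk" intents, the estimate flips to 80% value at risk, because (30 + 10 × 0.2) / (30 + 10) = 0.8.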

Using massive amounts of activity data to better understand the human condition has enormous potential. Logs lack ground truth about search intent, success, experience, and attention. Even so, they can be used to characterize search behavior, build machine-learning models, and make meaningful discoveries, such as forecasting influenza in populations. However, access to this data is restricted to search providers or available only at significant cost from analytics companies, which impedes scientific progress. To address this, search providers have released limited search log data and other resources. For example, Microsoft recently released MS MARCO, a machine reading comprehension dataset. Some researchers have also made user study data widely available, which is an encouraging trend. Data.gov and other open data initiatives promote data availability, but not for search data, at least not yet.

Privacy and data reliability are critical when working with search interaction data (or any user data) to make intelligent inferences. Building user profiles and conducting detailed surveillance of people's activities raise privacy concerns that must be addressed. Systems should obtain user permission and clearly explain what is being recorded and how it will be used. Search providers must act responsibly and correct for biases in search results, user data, and user sampling. Many factors affect recorded human activity, for example cognitive biases, behavioral biases, common misconceptions, misinformation, and rumor. These factors can skew behavioral signals used in ranking algorithms, such as click-through rates, resulting in "filter bubbles." This must be taken into account during data collection and experimental analysis.
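Position bias is one well-studied way in which click-through rates get skewed: results shown higher up are examined, and therefore clicked, more often regardless of quality. The sketch below shows one standard correction, inverse propensity weighting. Each click is divided by the probability that its rank was examined. It assumes a position-based click model. The examination probabilities in PROPENSITY are invented for illustration; in practice they would have to be estimated, for instance from result-randomization experiments. The correction addresses only this one bias, not the broader cognitive or sampling biases mentioned above.

```python
from collections import defaultdict

# Hypothetical probability that a user examines each rank (1-based).
# Ranks not listed here would raise a KeyError in this sketch.
PROPENSITY = {1: 1.0, 2: 0.6, 3: 0.4, 4: 0.3, 5: 0.2}

def ips_click_rates(impressions):
    """Position-debiased click rates via inverse propensity weighting.

    impressions: iterable of (url, rank, clicked) tuples. Under a
    position-based click model with correct propensities, the average of
    clicked / PROPENSITY[rank] estimates how attractive the url is once
    it has been examined.
    """
    weighted = defaultdict(float)
    shown = defaultdict(int)
    for url, rank, clicked in impressions:
        shown[url] += 1
        if clicked:
            weighted[url] += 1.0 / PROPENSITY[rank]
    return {url: weighted[url] / shown[url] for url in shown}

# Made-up data: a.example always shown at rank 1, b.example at rank 3.
impressions = [("a.example", 1, i < 5) for i in range(10)]
impressions += [("b.example", 3, i < 2) for i in range(10)]
rates = ips_click_rates(impressions)
print(rates)
```

The raw click-through rates here are 0.5 and 0.2. After reweighting, both results come out at 0.5: in this toy data, b.example's lower CTR is explained entirely by its lower position. IPS estimates can be high-variance when propensities are small, so real systems often clip or otherwise regularize them.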

Many of these lessons apply beyond Web search. Many searches are domain-specific, as in legal, medical, and intellectual property search. Even within Web search, each vertical (such as images, video, and news) has its own presentation format and interaction method (for example, "infinite scrolling" in image search). The boundaries between vertical and general search are blurring as content from verticals bleeds into general result pages, changing how people interact with search results.

This post first appeared on With Me You Will Learn. Please read the original post: here
