Paper: Anticipating Information Needs Based on Check-in Activity

Resources

On this page we provide additional resources related
to the above-mentioned paper. The resources include
experiment details, datasets to download, further
analyses, and algorithms.

The paper is currently under review, so for now
we provide only previews of the files.

Crowdsourcing experiments

Experiment #1

We seek to measure the recall of the extracted information needs. We ask people to imagine being at a location from a given top-level POI category and provide us with the top three information needs that they would search for on a mobile device in that situation.
see experiment layout (graphics/experiment01-form.png)

Experiment #2 (textual mode)

The second experiment aims to determine how well we can rank information needs with respect to their relevance given an activity (i.e., $P(i|a)$). We ask study participants to rate the usefulness of a given information need with respect to a selected category on a 5-point Likert scale, from 'not useful' to 'very useful'. We evaluated the top 25 information needs for the 5 most visited second-level categories in each of the 9 top-level categories, amounting to 25 × 5 × 9 = 1125 distinct pairs of information need and activity.
see experiment layout
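The number of judged pairs follows directly from the design figures above. The snippet below is only a small arithmetic check; the constant and function names are illustrative and do not come from the paper or its datasets.

```python
# Design figures taken from the description of Experiment #2.
NEEDS_PER_CATEGORY = 25    # top-ranked information needs evaluated per category
SECOND_LEVEL_PER_TOP = 5   # most visited second-level categories per top-level category
TOP_LEVEL_CATEGORIES = 9   # top-level POI categories


def count_pairs(needs=NEEDS_PER_CATEGORY,
                second_level=SECOND_LEVEL_PER_TOP,
                top_level=TOP_LEVEL_CATEGORIES):
    """Number of distinct (information need, activity) pairs to be judged."""
    return needs * second_level * top_level


total_pairs = count_pairs()
print(total_pairs)  # 1125
```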

Experiment #3 (card-based mode)

The settings are identical to those of experiment #2, with one difference: the information needs are presented as information cards instead of text.
see experiment layout

Experiment #4

Experiment #4 focuses on collecting measurements for the temporal scope of information needs. In batches of 5, we presented the 30 top-ranked information needs in each top-level category. The task for the assessors was to decide when they would search for that piece of information in the given activity context: before, during, or after performing that activity. They could select more than one answer if they regarded the information need as useful in multiple time slots.
see experiment layout
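One possible way to summarize such multi-select answers is to compute, for each time slot, the fraction of assessors who selected it. The sketch below assumes a hypothetical in-memory format (one set of selected slots per assessor); it does not reflect the layout of the downloadable dataset, nor necessarily the aggregation used in the paper.

```python
from collections import Counter

SLOTS = ("before", "during", "after")


def temporal_scope(answers):
    """Return, per slot, the fraction of assessors who selected it.

    `answers` holds one collection of selected slots per assessor. Because an
    assessor may select several slots, the fractions need not sum to 1.
    """
    if not answers:
        raise ValueError("at least one assessor answer is required")
    # set() guards against a slot being listed twice by the same assessor.
    counts = Counter(slot for selected in answers for slot in set(selected))
    return {slot: counts[slot] / len(answers) for slot in SLOTS}


# Hypothetical answers from four assessors for a single information need.
example = [{"before"}, {"before", "during"}, {"during"}, {"before"}]
scope = temporal_scope(example)
print(scope)
```

Keeping the three fractions separate, rather than forcing a single label, preserves the fact that an information need can be useful in more than one time slot.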

Experiment #5 (top category)

Experiments #5 and #6 evaluate how well we can anticipate (i.e., rank) information needs given a past activity. Crowd judges are tasked with evaluating the usefulness of individual information needs, presented as cards, given the transition between two activities. We collected judgments for the top 10 information needs from each of the activities in the transition. Experiment #5 considers top-level activities.
see experiment layout

Experiment #6 (second category)

The settings are identical to those of experiment #5, with one difference: second-level activities are used.
see experiment layout

Overview table:

Experiment	#Tasks	Workers/task	Payment/task	Worker satisfaction	Payment total	Download dataset
#1	9	30	10.0 ¢	86.0%	\$ 27	download
#2	1125	5	0.60 ¢	68.0%	\$ 34	download
#3	1125	5	0.60 ¢	66.9%	\$ 34	download
#4	335	9	2.00 ¢	84.0%	\$ 60	download
#5	1148	5	0.75 ¢	60.0%	\$ 43	download
#6	1240	3	0.75 ¢	72.0%	\$ 28	download
Total					\$ 226	download all
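The payment totals can be cross-checked from the other columns. The sketch below assumes that "Payment/task" is the amount paid to each worker for each task, so that the total is tasks × workers per task × payment per task; this reading is an assumption, although it agrees with the reported totals after rounding to whole dollars.

```python
# (experiment, tasks, workers per task, payment per task in cents, reported total in USD)
ROWS = [
    ("#1", 9, 30, 10.0, 27),
    ("#2", 1125, 5, 0.60, 34),
    ("#3", 1125, 5, 0.60, 34),
    ("#4", 335, 9, 2.00, 60),
    ("#5", 1148, 5, 0.75, 43),
    ("#6", 1240, 3, 0.75, 28),
]


def total_usd(tasks, workers, cents):
    """Total payment in USD, assuming every worker is paid `cents` per task."""
    return tasks * workers * cents / 100


for name, tasks, workers, cents, reported in ROWS:
    computed = total_usd(tasks, workers, cents)
    print(f"{name}: computed {computed:.2f} USD, reported {reported} USD")

# Sum of the reported per-experiment totals.
reported_sum = sum(row[4] for row in ROWS)
print(f"Sum of reported totals: {reported_sum} USD")
```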

Datasets

Query suggestions

We retrieved query suggestions for a sample of Foursquare POIs (see Section 3.2.1). Here we provide this data after applying the cleaning steps described in the paper.

Normalized information needs

As described in Section 3.3.2, we normalized the information needs extracted from Google Suggestions. The clusterings provided by 3 assessors, as well as the final canonical set, are made available here.

Foursquare check-in dataset

Author: Dingqi Yang
Download at: https://sites.google.com/site/yangdingqi/home/foursquare-dataset