The 3taps Data Commons

The Search API

The 3taps Search API is responsible for searching against the database of postings. For example, it can be used to find all postings from a particular data source, category and location, or to find postings with a given annotation value.

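There are two ways in which the Search API can be used: in search mode, which returns the matching postings themselves, and in count mode, which returns the number of matching postings for each value of a chosen field.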

To improve performance, the Search API makes use of two important concepts: tiered search and pagination of search results.

Let's take a closer look at these two concepts.

Tiered Search

When it comes to postings, time is very important: recent postings are far more likely to be relevant than older postings. To improve the performance of the 3taps Search API, the postings are divided into tiers, where the first tier is optimised to access the most recent postings as quickly as possible. Older postings are still available in subsequent tiers, though they may take longer to access.

The exact number of tiers in the database is not hardwired, and there is no guarantee that a given tier will cover a given period of time. What is certain, however, is that tier 0 will contain the most recent or most "active" postings, and should always be tried first for any given search.

By default, all requests to the 3taps Search API will search for postings in tier 0. This can be overridden by supplying a tier=N parameter when making your request.

Note that the Search API tells you which tier to search next to obtain more results, in the next_tier field of its response.

Pagination of Search Results

A search request will often find too many postings to return all at once. For example, imagine a search for the word "the" in a tier of the posting database consisting of 17 million postings. This search may well match 12 million postings -- there is no way the Search API could read all those postings from the database tier, bundle them up and return them to the caller in a single response. Such a request would take several minutes at least, and the client would probably choke on the huge packet of data that was returned.

To avoid this, the Search API makes use of pagination -- that is, an initial search will return the first page of search results, which would typically be the "N" most recent matching postings within the specified tier.

To retrieve more postings, the caller makes another call to the Search API, asking for the second page of results.

More calls can be made to retrieve more postings, going further and further back through the posting database.

There is, however, a problem with this pagination model: the posting database is not static. New postings are being added all the time, pushing the existing postings further back in time. What this means for the Search API is ambiguous -- how should the Search API interpret the concept of the "second page of search results" when new postings have been added to the database?

To avoid this issue, the Search API returns an anchor along with the search results themselves.

This anchor is used to keep the pages of search results in the same place in the posting database, even as new postings come in. The anchor is also used to tell the caller that more postings have been added to the database since the initial search was made.

Note that the 3taps Search API tells you which page number to request next, in the next_page field. If the current tier has no more matches, next_page is reset to zero and next_tier points to the following tier.
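The tier and page rules above can be combined into a single client loop. The sketch below is a minimal illustration, not an official client: `iter_postings` and the `fetch` callable are hypothetical names, and `fetch` is assumed to send the given query parameters to the Search API and return the decoded JSON object. It keeps the anchor from the first response and follows next_page and next_tier until either one is -1.

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical transport: sends query parameters to the Search API and
# returns the decoded JSON response as a dict.
Fetch = Callable[[Dict[str, Any]], Dict[str, Any]]


def iter_postings(fetch: Fetch, params: Dict[str, Any],
                  max_requests: int = 1000) -> Iterator[Dict[str, Any]]:
    """Yield matching postings, page by page and tier by tier."""
    request = dict(params)  # the first request carries no anchor, page or tier
    anchor = None
    for _ in range(max_requests):  # safety limit against runaway paging
        response = fetch(request)
        if not response.get("success"):
            raise RuntimeError(response.get("error", "search failed"))
        if anchor is None:
            anchor = response.get("anchor")  # keep the anchor from the first response
        yield from response.get("postings", [])
        next_page = response.get("next_page", -1)
        next_tier = response.get("next_tier", -1)
        if next_page < 0 or next_tier < 0:  # -1 means there are no more results
            return
        # Repeat the original criteria and rpp, adding the pagination fields.
        request = dict(params, page=next_page, tier=next_tier)
        if anchor is not None:
            request["anchor"] = anchor
```

Passing the original `params` into every follow-up request keeps the criteria and rpp unchanged between pages, which the pagination model depends on.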

Search Criteria

A search request can include the following search criteria:

category_group

The desired 3taps category grouping code. Note that this parameter can include logical operators, as described below.

category

The desired 3taps category code. Note that this parameter can include logical operators, as described below.

location.country

The desired 3taps country code, or "*" to search for all postings which have a country value. Note that this parameter can include logical operators, as described below.

location.state

The desired 3taps state code, or "*" to search for all postings which have a state value. Note that this parameter can include logical operators, as described below.

location.metro

The desired 3taps metro area code, or "*" to search for all postings which have a metro value. Note that this parameter can include logical operators, as described below.

location.region

The desired 3taps region code, or "*" to search for all postings which have a region value. Note that this parameter can include logical operators, as described below.

location.county

The desired 3taps county code, or "*" to search for all postings which have a county value. Note that this parameter can include logical operators, as described below.

location.city

The desired 3taps city code, or "*" to search for all postings which have a city value. Note that this parameter can include logical operators, as described below.

location.locality

The desired 3taps locality code, or "*" to search for all postings which have a locality value. Note that this parameter can include logical operators, as described below.

location.zipcode

The desired 3taps ZIP code, or "*" to search for all postings which have a ZIP code value. Note that this parameter can include logical operators, as described below.

radius

The desired radius for a radius-based search. This should be a string consisting of a number and a suffix, like this:

radius=10.2mi

The number can be any whole or floating point value, and the following suffixes are currently supported:

ft = feet.
m = meters.
mi = miles.
km = kilometers.

There is no default value for the units. If a suffix is not supplied, the search request will be rejected and an error returned to the caller; this is to prevent confusion where, for example, the user assumes the radius is in miles but it is actually in meters, and surprising results are returned.

Note that if the radius parameter is supplied, the lat and long parameters must also be supplied.
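Because a radius without a unit suffix causes the request to be rejected, a client may want to check the value before sending it. The following sketch does this locally; `parse_radius` is a hypothetical helper, and the conversion to meters is only for illustration (the API itself receives the original string, such as `10.2mi`).

```python
import re

# Standard conversion factors to meters for the supported suffixes.
_UNITS_TO_METERS = {"ft": 0.3048, "m": 1.0, "mi": 1609.344, "km": 1000.0}
# A whole or decimal number followed immediately by one of the suffixes.
_RADIUS_RE = re.compile(r"^(\d+(?:\.\d+)?)(ft|mi|km|m)$")


def parse_radius(value: str) -> float:
    """Return the radius in meters, rejecting values with no unit suffix."""
    match = _RADIUS_RE.match(value.strip())
    if match is None:
        raise ValueError(f"radius must be a number followed by ft, m, mi or km: {value!r}")
    number, unit = match.groups()
    return float(number) * _UNITS_TO_METERS[unit]
```

Remember that a valid radius on its own is not enough: the lat and long parameters must accompany it.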

lat

The desired latitude for a radius-based search, in decimal degrees. Note that this parameter can also be used for distance-based sorting, as described below.

long

The desired longitude for a radius-based search, in decimal degrees. Note that this parameter can also be used for distance-based sorting, as described below.

source

The desired 3taps source code. Note that this parameter can include logical operators, as described below.

external_id

The desired external ID for a posting. External ID values are guaranteed to be unique within a single data source, but the same value may be used across sources. For this reason, you would normally search for both the source and the external ID at the same time. Note that this parameter can include logical operators, as described below.

heading

A freeform text string. Only postings with the given string in their heading will be included in the search results. Note that this parameter can make use of text operators, as described below.

body

A freeform text string. Only postings with the given string in their body will be included in the search results. Note that this parameter can make use of text operators, as described below.

text

A freeform text string. Only postings with the given string in their heading or body will be included in the search results. Note that this parameter can make use of text operators, as described below.

timestamp

This parameter restricts the search results to postings within a given range of timestamp values. The timestamp parameter can either be set to the special value all to match all postings, or you can use one of the following three formats to specify a desired date/time range:

MIN_TIMESTAMP..MAX_TIMESTAMP   
MIN_TIMESTAMP..   
..MAX_TIMESTAMP

MIN_TIMESTAMP and MAX_TIMESTAMP represent the minimum and maximum timestamp values, respectively. If both are supplied, then only postings within the given range of timestamp values will be included in the search results. If only a minimum timestamp is supplied, only postings later than that timestamp will be included. Similarly, if only a maximum timestamp is supplied, only postings earlier than that timestamp will be included.

Timestamp values can be represented in one of the following ways:

1364387696

An integer value represents a "unix time" value -- that is, the number of seconds since the 1st of January, 1970, in UTC.

2013-03-27 12:34:56

A string containing hyphens or colons is interpreted as a date/time value, in UTC. The following formats are accepted:

YYYY-MM-DD
YYYY-MM-DD HH:MM
YYYY-MM-DD HH:MM:SS

7d
30m

A string consisting of a number followed by a single letter represents a relative time value. The timestamp is calculated relative to the current date and time. For example, if the current date and time is 2013-03-28 12:06:30, and the relative time value is 30m, then the timestamp will be calculated as 30 minutes ago -- ie, 2013-03-28 11:36:30.

The following relative timestamp codes are currently supported:

s = seconds.
m = minutes.
h = hours.
d = days.
w = weeks.
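The server resolves these values itself, so they can be passed through unchanged. When a client needs the actual unix time (for logging, or to display the effective range), the three forms can be converted as sketched below. `to_unix_time` is a hypothetical helper, and because relative values are computed against the client's clock here, the result may differ slightly from the server's own interpretation.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Relative suffixes mapped to timedelta keyword arguments.
_RELATIVE_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}
# The accepted date/time layouts, most specific first.
_DATE_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d %H:%M", "%Y-%m-%d")


def to_unix_time(value: str, now: Optional[datetime] = None) -> int:
    """Convert a timestamp value in any of the three forms to unix time (UTC)."""
    value = value.strip()
    if value.isdigit():  # plain integer: already unix time
        return int(value)
    relative = re.fullmatch(r"(\d+)([smhdw])", value)
    if relative:  # e.g. 30m or 7d, counted back from "now"
        amount, unit = relative.groups()
        if now is None:
            now = datetime.now(timezone.utc)
        return int((now - timedelta(**{_RELATIVE_UNITS[unit]: int(amount)})).timestamp())
    for fmt in _DATE_FORMATS:  # date/time strings are interpreted as UTC
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue
        return int(parsed.replace(tzinfo=timezone.utc).timestamp())
    raise ValueError(f"unrecognised timestamp value: {value!r}")
```

For example, `to_unix_time("2013-03-27 12:34:56")` gives 1364387696, the same instant as the integer example above.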

id

This parameter can be used to search for a single posting with a single record ID, or a number of postings within a given range of record IDs. To search against a single record ID, the value should simply be the integer record ID, like this:

?id=1234567

Alternatively, to search against a range of record IDs, the parameter should look like this:

?id=1234567..1234890

price

This parameter restricts the search results to postings within a given range of price values. The parameter's value should be a string formatted in one of the following ways:

MIN_PRICE..MAX_PRICE
MIN_PRICE..
..MAX_PRICE
*

MIN_PRICE and MAX_PRICE represent the minimum and maximum price values. If both are supplied, then only postings within the given range of price values will be included in the search results. If only a minimum price is supplied, only postings greater than or equal to that minimum price will be included. Similarly, if only a maximum price is supplied, only postings less than or equal to that maximum price will be included.

If the price criterion is set to *, the search will match all postings which have a price.

Note that price values can be given as either integer or floating-point values.
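The timestamp, id and price criteria all share the same `MIN..MAX` range syntax. A small hypothetical helper like the one below can build these strings from optional bounds; it does not cover the special values `all` (timestamp) and `*` (price), which can simply be passed as-is.

```python
from typing import Optional, Union

Number = Union[int, float]


def range_param(minimum: Optional[Number] = None, maximum: Optional[Number] = None) -> str:
    """Build a MIN..MAX, MIN.. or ..MAX string for the timestamp, id and price criteria."""
    if minimum is None and maximum is None:
        raise ValueError("supply at least one bound")
    low = "" if minimum is None else str(minimum)   # open lower bound
    high = "" if maximum is None else str(maximum)  # open upper bound
    return f"{low}..{high}"
```

For instance, `range_param(1234567, 1234890)` produces the id range shown earlier, and `range_param(maximum=500)` produces `..500`.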

currency

Only include postings in the given currency. The parameter's value should be a 3-character ISO-4217 currency code.

annotations

A specially-formatted string identifying one or more annotation values. Only postings which match the given annotation value(s) will be included in the search results.

The value of this parameter should consist of one or more key:value pairs separated by either AND or OR, and the whole string surrounded by { and } characters. For example:

annotations={make:ford AND model:mustang}

Note that the value can be set to "*", which matches every posting that has the given annotation, regardless of its value. For example, the following search criteria:

annotations={size:large AND color:*}

will find all postings which have their size annotation set to large, and which have a color. Note that if you combine multiple annotation comparisons, you can use parentheses to control the order in which they are evaluated. For example:

annotations={(bedrooms:2br OR bedrooms:3br) AND dogs:yes}
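For the common case of a flat list of conditions joined by a single operator, the annotations string can be assembled from a mapping, as in the sketch below. `build_annotations` is a hypothetical helper; it does not add parentheses or escape special characters in keys or values, so grouped expressions like the bedrooms example above are easier to write by hand.

```python
from typing import Mapping


def build_annotations(pairs: Mapping[str, str], operator: str = "AND") -> str:
    """Join key:value pairs with AND or OR and wrap the result in braces."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be AND or OR")
    if not pairs:
        raise ValueError("at least one annotation is required")
    joined = f" {operator} ".join(f"{key}:{value}" for key, value in pairs.items())
    return "{" + joined + "}"
```

`build_annotations({"make": "ford", "model": "mustang"})` reproduces the first example in this section.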

status

Only include postings with the given status value. The following status values are currently supported:

registered   
for_sale   
for_hire      
for_rent  
wanted   
lost   
stolen   
found

Note that this parameter can include logical operators, as described below.

state

The state the posting is in. This will be one of the following values:

available   
unavailable   
expired

Note that this parameter can include logical operators, as described below.

has_image

If this parameter is supplied and has the value "1", only postings which have an image will be included in the search results. If this parameter has the value "0", only postings which do not have images will be included in the search results.

has_price

If this parameter is supplied and has the value "1", only postings which have a price value will be included in the search results. If this parameter has the value "0", only postings which do not have a price value will be included in the search results.

include_deleted

By default, deleted postings will not be included in the search results. If this parameter is supplied and has the value "1", however, deleted postings will be included in the search results.

only_deleted

If this parameter is supplied and has the value "1", only deleted postings will be included in the search results.

Logical Operators

A number of parameters can make use of logical operators to allow for more sophisticated search options. For these parameters, a single value can be supplied simply by providing the value itself, like this:

category=SBIK

However, special characters can be used to change the way in which the parameter is interpreted. Two special characters are currently supported:

Text Operators

For text-based searches, the following operators can be used to perform more sophisticated text searches. Firstly, if multiple words (separated by spaces) are provided, the search will find all postings containing every one of those words. For example, a search for:

 heading=big bike

will find postings with the word "big" and the word "bike" in their heading.

To find postings with an exact phrase in them, surround the desired search text with double quote characters, like this:

 heading="big bike"

This will only find postings containing the phrase "big bike".

The following text operators can be used, either with individual words or with quoted phrases, to build more sophisticated searches:

Note that these text operators can be combined to yield quite complex text searches. For example:

 heading="big bike"&~"trek superfly"

This will search for all postings containing the phrase "big bike", but without the phrase "trek superfly".

Radius-Based Searches

The Search API allows you to find all postings within a given radius of a point on the Earth's surface. To perform a radius-based search, you need to supply the following three parameters:

lat=...
long=...
radius=...

Note that when you request a radius-based search, only those postings which have a latitude and longitude value within the given search radius will be found. Postings which do not have a latitude and longitude value, and postings where this information is not correct, will not be included in the search results.

Sorting of Search Results

By default, search results are returned in descending order of timestamp -- that is, the newest posting will be returned first. This ordering can be changed if desired, using the sort parameter. The following sort orders are currently supported:

Timestamp

sort=timestamp

This sorts the postings in ascending time order -- that is, the oldest posting will be found first.

Reverse Timestamp

sort=-timestamp

This sorts the postings in descending time order -- that is, the newest posting will be found first. This is the default sort ordering for search results.

Price

sort=price

This sorts the postings in ascending price order -- that is, the cheapest posting will be found first.

Reverse Price

sort=-price

This sorts the postings in descending price order -- that is, the most expensive posting will be found first.

Distance

sort=distance

This sorts the postings by distance from a specified point. When sorting by distance, the closest postings will be found first. To sort the postings by distance, you must provide the following additional parameters:

lat

The latitude of the desired starting point, in decimal degrees.

long

The longitude of the desired starting point, in decimal degrees.


Note that if you sort the postings by anything other than the default (descending order of timestamp), the Search API may return postings in what appears to be the "wrong" order if you search across tiers. This is because each tier has its own distinct set of postings. So if, for example, you are sorting your search results by ascending price, when you move from one tier to the next your next page of search results will include the lowest-priced postings within that tier -- even if you have received lower-priced postings in an earlier page of results. It is up to you to correctly sort the postings when searching across multiple tiers with a non-default sort order.
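A minimal sketch of that client-side re-sort, assuming each page is a list of posting objects that include the price field (that is, price was requested via retvals). `merge_by_price` is a hypothetical helper; postings without a price are placed at the end.

```python
from typing import Any, Dict, Iterable, List


def merge_by_price(pages: Iterable[List[Dict[str, Any]]],
                   descending: bool = False) -> List[Dict[str, Any]]:
    """Combine postings from several pages or tiers and re-sort them by price."""
    merged = [posting for page in pages for posting in page]
    priced = [p for p in merged if p.get("price") is not None]
    unpriced = [p for p in merged if p.get("price") is None]
    # float() accepts both integer and floating-point prices.
    priced.sort(key=lambda p: float(p["price"]), reverse=descending)
    return priced + unpriced  # postings without a price go last
```

Because Python's sort is stable, postings with equal prices keep the order in which they were received.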


Index Searches

While the Search API is mostly used for general searching, it can also return just the internal record IDs of the matching postings, with no other details. This is often useful for generating visualisations based on the number of records that fall into certain groupings, without needing to display the details of the individual records.

To perform an index search, set the retvals parameter so that the Search API will only return the record ID:

retvals=id

When you do this, the number of matching records which can be returned in a single page of search results (as controlled by the rpp parameter) can be increased up to a maximum of 10,000.

Count Mode

When calling the Search API in count mode, you have to provide a count field to use for calculating the counts. The following fields can be used for count mode:

source

Calculates the number of matching postings for each data source.

category

Calculates the number of matching postings for each unique category value.

category_group

Calculates the number of matching postings for each unique category group value.

location.country

Calculates the number of matching postings for each country.

location.state

Calculates the number of matching postings for each state.

location.metro

Calculates the number of matching postings for each metro area.

location.region

Calculates the number of matching postings for each region.

location.county

Calculates the number of matching postings for each county.

location.city

Calculates the number of matching postings for each city.

location.locality

Calculates the number of matching postings for each locality.

location.zipcode

Calculates the number of matching postings for each ZIP code.

language

Calculates the number of matching postings for each ISO 639-1 language code.

currency

Calculates the number of matching postings for each ISO 4217 currency code.

status

Calculates the number of matching postings for each status value.

state

Calculates the number of matching postings for each posting state value (available, unavailable or expired).

flagged_status

Calculates the number of matching postings for each flagged status value.

Upon completion, the Search API will return the number of matching postings for each unique field value, rather than individual postings themselves. For example, a count by status might return the following:

registered = 816
for_sale = 4,612
for_hire = 1,208
for_rent = 910
wanted = 319
lost = 57
stolen = 63
found = 36

Calling the 3taps Search API

The Search API is accessed via a single entry point, which can be found at the following URL:

http://search.3taps.com

The Search API call is made using an HTTP GET request to the above URL. To make a search request, pass the following parameters in addition to the search criteria themselves:

auth_token (required)

The authentication token used to ensure that you are authorized to make this API call.

rpp (optional)

The desired number of results per page. This defaults to 10, but can be set to any value from 1 to 100 for ordinary searches, and up to 10,000 for index-only searches, as described above.
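The rpp limits can be checked before a request is sent. This is a hypothetical local check (`validate_rpp` is not part of the API) that mirrors the limits stated here; it treats a search as index-only when retvals is exactly `id`.

```python
def validate_rpp(rpp: int, retvals: str = "") -> int:
    """Check rpp against the documented limits: 1-100, or up to 10,000 for id-only searches."""
    fields = [f.strip() for f in retvals.split(",") if f.strip()]
    index_only = fields == ["id"]  # only the record ID is requested
    limit = 10_000 if index_only else 100
    if not 1 <= rpp <= limit:
        raise ValueError(f"rpp must be between 1 and {limit} (got {rpp})")
    return rpp
```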

retvals (optional)

A string listing the fields which should be returned back to the caller. The various fields should be separated by commas. At present, the following fields can be included in this parameter:

id
account_id
source
category
category_group
location
external_id
external_url
heading
body
timestamp
timestamp_deleted
expires
language
price
currency
images
annotations
status
state
immortal
deleted
flagged_status

If no retvals parameter is provided, the following default will be used:

id
source
category
location
external_id
external_url
heading
timestamp
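A sketch of building the retvals string from a list of field names, rejecting names that are not in the list above. `build_retvals`, `RETVAL_FIELDS` and `DEFAULT_RETVALS` are hypothetical names; the field sets are copied from this section and would need updating if the API gained new fields.

```python
from typing import Iterable

# Fields accepted by retvals, as listed above.
RETVAL_FIELDS = frozenset({
    "id", "account_id", "source", "category", "category_group", "location",
    "external_id", "external_url", "heading", "body", "timestamp",
    "timestamp_deleted", "expires", "language", "price", "currency", "images",
    "annotations", "status", "state", "immortal", "deleted", "flagged_status",
})
# The fields returned when no retvals parameter is supplied.
DEFAULT_RETVALS = ("id", "source", "category", "location",
                   "external_id", "external_url", "heading", "timestamp")


def build_retvals(fields: Iterable[str]) -> str:
    """Return a comma-separated retvals value after checking every field name."""
    fields = list(fields)
    unknown = [f for f in fields if f not in RETVAL_FIELDS]
    if unknown:
        raise ValueError(f"unsupported retvals field(s): {', '.join(unknown)}")
    return ",".join(fields)
```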

sort (optional)

A string indicating the desired sort order to use for the found postings. The following sort orders are currently supported:

timestamp

Sort the postings in ascending order of timestamp (ie, oldest to newest).

-timestamp

Sort the postings in descending order of timestamp (ie, newest to oldest).

price

Sort the postings in ascending order of price (ie, cheapest to most expensive).

-price

Sort the postings in descending order of price (ie, most expensive to cheapest).

distance

Sort the postings in ascending order of distance (ie, closest to furthest) from the point defined by the lat and long parameters.

If no sort parameter is provided, a default value of -timestamp will be used (ie, sorting from newest to oldest).


WARNING: when sorting the postings by price, make sure you include a currency in the search criteria. Otherwise, the search results will include postings in different currencies, and the sort order will be meaningless.
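The sort requirements described in this document (a currency when sorting by price, lat and long when sorting by distance) can be checked on the client before a request is made. `check_sort` is a hypothetical helper operating on a plain dict of query parameters.

```python
from typing import Any, Mapping

SORT_ORDERS = {"timestamp", "-timestamp", "price", "-price", "distance"}


def check_sort(params: Mapping[str, Any]) -> None:
    """Raise ValueError if the sort order is missing the parameters it depends on."""
    sort = params.get("sort", "-timestamp")  # -timestamp is the documented default
    if sort not in SORT_ORDERS:
        raise ValueError(f"unsupported sort order: {sort!r}")
    if sort in ("price", "-price") and not params.get("currency"):
        raise ValueError("sorting by price without a currency mixes currencies")
    if sort == "distance" and not ("lat" in params and "long" in params):
        raise ValueError("sort=distance requires both lat and long")
```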


count (optional)

A string containing the field to use for calculating the count. Supplying this parameter enables count mode. The following values can be supplied for this parameter:

source
category
category_group
location.country
location.state
location.metro
location.region
location.county
location.city
location.locality
location.zipcode
language
currency
status
state
flagged_status
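Putting the parameters together: a search is a plain HTTP GET to the entry point, with the auth token, control parameters and criteria in the query string. The sketch below uses only the Python standard library; `build_search_url` and `search` are hypothetical names, the token is a placeholder, and the code assumes the service responds with the JSON object described in the next section.

```python
import json
from typing import Any, Dict, Mapping
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "http://search.3taps.com"  # entry point given above


def build_search_url(auth_token: str, criteria: Mapping[str, Any]) -> str:
    """Combine the auth token and search parameters into a GET URL."""
    params = {"auth_token": auth_token}
    # Dotted names such as location.city are ordinary query keys.
    params.update({key: str(value) for key, value in criteria.items()})
    return f"{SEARCH_URL}/?{urlencode(params)}"


def search(auth_token: str, criteria: Mapping[str, Any],
           timeout: float = 30.0) -> Dict[str, Any]:
    """Issue the GET request and decode the JSON response body."""
    with urlopen(build_search_url(auth_token, criteria), timeout=timeout) as response:
        return json.load(response)
```

For example, `build_search_url("YOUR_TOKEN", {"category": "SBIK", "retvals": "id", "rpp": 10000})` describes an index search; `urlencode` takes care of escaping values such as annotation strings.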

Upon completion, the Search API will send back a response with a Content-Type of application/json, and the body of the response will be a JSON-formatted object. The contents of this object will vary depending on whether the Search API was called in search mode or count mode.

Search Mode Results

In search mode, the JSON-formatted object returned in the body of the HTTP response will include the following fields:

success

A boolean indicating whether or not the search request succeeded.

error

If the search request was not successfully processed, this will be a string describing what went wrong.

anchor

This will be a string that can be used in subsequent search requests to ensure that the pagination is consistent even if new postings are received. See above for details.

next_page

The "page" value to use to obtain the next set of search results. If there are no more results, a value of -1 will be returned.

next_tier

The "tier" value to use to obtain the next set of search results. If there are no more results, a value of -1 will be returned.

postings

An array containing the found postings, in the order specified by the sort=... parameter. Each entry in this array will be an object with all or some of the following fields, depending on the value of the retvals parameter, above:

id

The internal record ID used to uniquely identify this posting. Note that this will be a (possibly very large) integer.

account_id

A string identifying the user who submitted the posting.

source

The code for the source system where this posting originated.

category

The 3taps category code for this posting.

category_group

The 3taps category group code for this posting.

location

An object with some or all of the following fields:

lat

The latitude of this posting, in decimal degrees.

long

The longitude of this posting, in decimal degrees.

accuracy

An integer indicating the accuracy of the supplied lat/long value.

country

The 3taps country code for this posting.

state

The 3taps state code for this posting.

metro

The 3taps metro area code for this posting.

region

The 3taps region code for this posting.

county

The 3taps county code for this posting.

city

The 3taps city code for this posting.

locality

The 3taps locality code for this posting.

zipcode

The 3taps ZIP code for this posting.

formatted_address

A string containing the address associated with this posting as a one-line string, if this information is available.

external_id

A string that uniquely identifies the posting in the source system.

external_url

A URL pointing to the original posting.

heading

A string containing the heading for this posting.

body

A string containing the body of this posting, in plain (unformatted) text.

timestamp

The date and time at which the posting was created, as an integer number of seconds since the 1st of January 1970 ("unix time"), in UTC.

timestamp_deleted

The date and time at which the posting was deleted, as an integer number of seconds since the 1st of January 1970 ("unix time"), in UTC.

expires

The date and time at which this posting should expire, as an integer number of seconds since the 1st of January 1970 ("unix time"), in UTC.

language

The 2-character ISO 639-1 language code indicating which language the posting is in.

price

The price associated with this posting, if any.

currency

The 3-character ISO-4217 currency code indicating which currency the price is in.

images

An array of image objects representing the images associated with this posting. Each image object can have the following fields:

full

The URL pointing to the full-sized image for this posting.

full_width

The width, in pixels, for the full-sized image, if known.

full_height

The height, in pixels, for the full-sized image, if known.

thumbnail

The URL pointing to a thumbnail-sized image for this posting.

thumbnail_width

The width, in pixels, for the thumbnail-sized image, if known.

thumbnail_height

The height, in pixels, for the thumbnail-sized image, if known.

annotations

An object holding the various annotations associated with this posting. Each field in this object maps the annotation name to its associated value, formatted as a string.

status

The posting's current status. This will be one of the following values:

registered
for_sale
for_hire
for_rent
wanted
lost
stolen
found

state

The current state of the posting. This will be one of the following values:

available
unavailable
expired

immortal

true if the posting is immortal (ie, never expires), false otherwise.

deleted

true if the posting is deleted, false otherwise.

flagged_status

A number indicating the current "flagged" status of the posting. This will be one of the following values:

0 = The posting has never been flagged.
1 = The posting has been flagged by a user.
2 = The flagged status was overruled.

num_matches

The total number of matching postings found by this search. Note that this is the number of matching postings in the database, not the number of postings actually returned in the current page of search results.

time_taken

The total time taken to process this search request, in milliseconds. The total search time consists of the time taken to identify the matching records (time_search), plus the time taken to fetch the desired set of fields from those matching records (time_fetch).

time_search

The time taken to identify the matching records, in milliseconds.

time_fetch

The time taken to fetch the search results once the matching records have been identified, in milliseconds. Note that this is only available for tier 0 searches.

After making the initial search request, you can ask for more pages of results by reissuing the search request with the following additional parameters:

anchor

The anchor value returned by the Search API when the first request was made.

page

The page number for the desired page of postings. You should use the value returned by the previous call to the Search API in the next_page field to obtain the next page of search results.

tier

Which tier to use for the search. You should use the value returned by the previous call to the Search API in the next_tier field to ensure that you continue to search all the available postings in the 3taps system.

WARNING: Make sure you also include the other parameters you passed when making the original search request, in particular the search criteria and the rpp value. Changing these will result in the wrong set of postings being returned.

The response will be the same as for the initial search request, with the addition of one extra field in the returned JSON object:

new_postings

This is the number of new postings which have been added to the database since the initial search request was made.

Note that if new postings have come in, you can find these postings by reissuing the search request without the anchor parameter.

Count Mode Results

In count mode, the JSON-formatted object returned in the body of the HTTP response will include the following fields:

success

A boolean indicating whether or not the search request succeeded.

error

If the search request was not successfully processed, this will be a string describing what went wrong.

num_matches

The total number of matching postings found by this search.

counts

An array of values holding the various field values and the number of matching postings with that field value. For example, when performing a search with count=status, the JSON object will include something like the following:

 "counts": [{"count": 816, "term": "registered"},
            {"count": 4612, "term": "for_sale"},
            {"count": 1208, "term": "for_hire"},
            {"count": 910,  "term": "for_rent"},
            {"count": 319, "term": "wanted"},
            {"count": 57, "term": "lost"},
            {"count": 63, "term": "stolen"},
            {"count": 36, "term": "found"}]

As you can see, each item in the returned array is an object with count and term fields, where term is the value of the count field, and count is the number of matching postings with that count field value.
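Turning the counts array into a lookup table takes only a few lines once the response is decoded. The sketch assumes the response body is available as a string; `counts_to_dict` is a hypothetical helper.

```python
import json
from typing import Dict


def counts_to_dict(body: str) -> Dict[str, int]:
    """Map each term in a count-mode response to its number of matching postings."""
    response = json.loads(body)
    if not response.get("success"):
        raise RuntimeError(response.get("error", "count request failed"))
    return {item["term"]: item["count"] for item in response.get("counts", [])}
```

With the count=status response shown above, the result would map "for_sale" to 4612, "lost" to 57, and so on.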

time_taken

The total time taken to process this search request, in milliseconds.

time_search

The time taken to identify the matching records, in milliseconds.