Aggregate

xframes.aggregate.ARGMAX(agg_column, out_column)

Builtin arg maximum aggregator for groupby. For each group, it returns the value of out_column from the row in which agg_column is largest.

Examples

Get the movie with maximum rating per user.

>>> xf.groupby("user",
...            {'best_movie': aggregate.ARGMAX('rating', 'movie')})

xframes.aggregate.ARGMIN(agg_column, out_column)

Builtin arg minimum aggregator for groupby. For each group, it returns the value of out_column from the row in which agg_column is smallest.

Examples

Get the movie with minimum rating per user.

>>> xf.groupby("user",
...            {'worst_movie': aggregate.ARGMIN('rating', 'movie')})
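To make the semantics of ARGMAX and ARGMIN concrete, here is a minimal plain-Python sketch that works on a list of dicts instead of an XFrame. The helper `arg_extreme` and the sample data are hypothetical, and this is not how xframes implements the aggregators. Tie-breaking in xframes is not documented here; Python's `max` and `min` simply keep the first extreme row they see.

```python
from collections import defaultdict


def arg_extreme(rows, key_column, agg_column, out_column, pick=max):
    """Hypothetical helper: for each group, return out_column from the row
    whose agg_column is largest (pick=max, like ARGMAX) or smallest
    (pick=min, like ARGMIN)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_column]].append(row)
    return {
        key: pick(group, key=lambda r: r[agg_column])[out_column]
        for key, group in groups.items()
    }


ratings = [
    {"user": "ann", "movie": "Alien", "rating": 5},
    {"user": "ann", "movie": "Brazil", "rating": 3},
    {"user": "bob", "movie": "Alien", "rating": 2},
    {"user": "bob", "movie": "Cars", "rating": 4},
]

best = arg_extreme(ratings, "user", "rating", "movie", pick=max)
worst = arg_extreme(ratings, "user", "rating", "movie", pick=min)
print(best)   # {'ann': 'Alien', 'bob': 'Cars'}
print(worst)  # {'ann': 'Brazil', 'bob': 'Alien'}
```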

xframes.aggregate.CONCAT(src_column, dict_value_column=None)

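Builtin aggregator that combines the values from one or two columns within each group into a single dictionary, list, or array value. With only src_column, the group's values are collected into a list or array. With both src_column and dict_value_column, they are combined into a dictionary whose keys come from src_column and whose values come from dict_value_column.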

Examples

To combine values from two columns that belong to one group into one dictionary value:

>>> xf.groupby(["document"],
...            {"word_count": aggregate.CONCAT("word", "count")})

To combine values from one column that belong to one group into a list value:

>>> xf.groupby(["user"],
...            {"friends": aggregate.CONCAT("friend")})
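The two forms of CONCAT can be sketched in plain Python over a list of dicts. This is an illustration, not the xframes implementation. The helper `concat` and the data are hypothetical. The array-valued result is not modeled, and neither is element order, which a distributed groupby need not preserve. When a group repeats a key, this sketch keeps the last value.

```python
from collections import defaultdict


def concat(rows, key_column, src_column, dict_value_column=None):
    """Hypothetical helper mimicking CONCAT on plain dict rows."""
    if dict_value_column is None:
        # One column: collect the group's values into a list.
        groups = defaultdict(list)
        for row in rows:
            groups[row[key_column]].append(row[src_column])
    else:
        # Two columns: map each src_column value to its dict_value_column value.
        groups = defaultdict(dict)
        for row in rows:
            groups[row[key_column]][row[src_column]] = row[dict_value_column]
    return dict(groups)


docs = [
    {"document": "d1", "word": "spark", "count": 3},
    {"document": "d1", "word": "frame", "count": 1},
    {"document": "d2", "word": "spark", "count": 2},
]
print(concat(docs, "document", "word", "count"))
# {'d1': {'spark': 3, 'frame': 1}, 'd2': {'spark': 2}}

friends = [
    {"user": "ann", "friend": "bob"},
    {"user": "ann", "friend": "cat"},
    {"user": "bob", "friend": "ann"},
]
print(concat(friends, "user", "friend"))
# {'ann': ['bob', 'cat'], 'bob': ['ann']}
```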

xframes.aggregate.COUNT()

Builtin count aggregator for groupby. It counts the rows in each group.

Examples

Get the number of occurrences of each user.

>>> xf.groupby("user",
...            {'count': aggregate.COUNT()})

xframes.aggregate.MAX(src_column)

Builtin maximum aggregator for groupby.

Examples

Get the maximum rating of each user.

>>> xf.groupby("user",
...            {'rating_max': aggregate.MAX('rating')})

xframes.aggregate.MEAN(src_column)

Builtin average aggregator for groupby.

Examples

Get the average rating of each user.

>>> xf.groupby("user",
...            {'rating_mean': aggregate.MEAN('rating')})

xframes.aggregate.MIN(src_column)

Builtin minimum aggregator for groupby.

Examples

Get the minimum rating of each user.

>>> xf.groupby("user",
...            {'rating_min': aggregate.MIN('rating')})
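COUNT, MAX, MEAN and MIN each reduce a group to a single value. The plain-Python sketch below computes all four per group over a list of dicts, so you can see what each output column holds. The helper `summarize` and the data are hypothetical. The sketch does not model how xframes handles missing values.

```python
from collections import defaultdict


def summarize(rows, key_column, src_column):
    """Hypothetical helper: per-group COUNT, MAX, MEAN and MIN of src_column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_column]].append(row[src_column])
    return {
        key: {
            "count": len(values),               # like COUNT(): rows in the group
            "max": max(values),                 # like MAX(src_column)
            "mean": sum(values) / len(values),  # like MEAN(src_column)
            "min": min(values),                 # like MIN(src_column)
        }
        for key, values in groups.items()
    }


ratings = [
    {"user": "ann", "rating": 5},
    {"user": "ann", "rating": 3},
    {"user": "ann", "rating": 4},
    {"user": "bob", "rating": 2},
]
print(summarize(ratings, "user", "rating"))
# {'ann': {'count': 3, 'max': 5, 'mean': 4.0, 'min': 3}, 'bob': {'count': 1, 'max': 2, 'mean': 2.0, 'min': 2}}
```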

xframes.aggregate.QUANTILE(src_column, *args)

Builtin approximate quantile aggregator for groupby. It accepts one or more quantiles to query, given either as a single list or as separate arguments.

Examples

To extract the median:

>>> xf.groupby("user",
...            {'rating_quantiles': aggregate.QUANTILE('rating', 0.5)})

To extract several quantiles:

>>> xf.groupby("user",
...            {'rating_quantiles': aggregate.QUANTILE('rating', [0.25, 0.5, 0.75])})

Or, equivalently:

>>> xf.groupby("user",
...            {'rating_quantiles': aggregate.QUANTILE('rating', 0.25, 0.5, 0.75)})

The returned quantiles are approximate, with a guaranteed accuracy of 0.5% in rank. That is, if the 0.50 quantile is requested, the returned value lies between the true 0.495 and 0.505 quantiles.
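The 0.5% guarantee bounds the rank of the returned value, not its magnitude. The sketch below illustrates that guarantee; it is not the algorithm xframes uses to estimate quantiles. It computes an exact quantile with the nearest-rank rule, which is one of several common conventions and is assumed here. It then computes the interval that a 0.5%-accurate estimate may fall in. The function names are hypothetical.

```python
import math


def nearest_rank_quantile(values, q):
    """Exact quantile by the nearest-rank rule: the smallest value such that
    at least a fraction q of the data is less than or equal to it."""
    ordered = sorted(values)
    # round() guards against floating-point noise in q * n before the ceiling.
    rank = max(1, math.ceil(round(q * len(ordered), 9)))
    return ordered[rank - 1]


def acceptable_range(values, q, eps=0.005):
    """Interval an estimate with rank error eps (0.5% by default) may fall in."""
    low = nearest_rank_quantile(values, max(0.0, q - eps))
    high = nearest_rank_quantile(values, min(1.0, q + eps))
    return low, high


data = list(range(1, 1001))  # 1, 2, ..., 1000
print(nearest_rank_quantile(data, 0.5))  # 500
print(acceptable_range(data, 0.5))       # (495, 505)
```

For 1,000 evenly spaced values, a median estimate anywhere from 495 to 505 satisfies the stated accuracy.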

xframes.aggregate.SELECT_ONE(src_column)

Builtin aggregator for groupby which selects one row in the group.

Examples

Get one rating for each user.

>>> xf.groupby("user", {'rating':aggregate.SELECT_ONE('rating')})

If multiple columns are selected, they are guaranteed to come from the same row. For instance:

>>> xf.groupby("user",
...            {'rating': aggregate.SELECT_ONE('rating'),
...             'item': aggregate.SELECT_ONE('item')})

The selected 'rating' and 'item' values for each user come from the same row in the XFrame.
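The same-row guarantee can be pictured as choosing one whole row per group and then reading every requested column from that row. Below is a plain-Python sketch of that idea. The helper `select_one` and the data are hypothetical. The sketch keeps the first row seen, but which row xframes actually picks is not specified here.

```python
def select_one(rows, key_column, columns):
    """Hypothetical helper: pick one row per group, then return the requested
    columns from that single row so they stay consistent with each other."""
    chosen = {}
    for row in rows:
        # Keep the first row seen for each key; any row of the group would do.
        chosen.setdefault(row[key_column], row)
    return {
        key: {col: row[col] for col in columns}
        for key, row in chosen.items()
    }


ratings = [
    {"user": "ann", "item": "Alien", "rating": 5},
    {"user": "ann", "item": "Brazil", "rating": 3},
    {"user": "bob", "item": "Cars", "rating": 4},
]
picked = select_one(ratings, "user", ["rating", "item"])
print(picked)
# {'ann': {'rating': 5, 'item': 'Alien'}, 'bob': {'rating': 4, 'item': 'Cars'}}
```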

xframes.aggregate.STDV(src_column)

Builtin standard deviation aggregator for groupby.

Examples

Get the rating standard deviation of each user.

>>> xf.groupby("user",
...            {'rating_stdv': aggregate.STDV('rating')})

xframes.aggregate.SUM(src_column)

Builtin sum aggregator for groupby.

Examples

Get the sum of the rating column for each user.

>>> xf.groupby("user",
...            {'rating_sum': aggregate.SUM('rating')})

xframes.aggregate.VALUES(src_column)

Builtin aggregator that combines distinct values from one column in one group into a list value.

Examples

To combine the distinct values from one column that belong to one group into a list value:

>>> xf.groupby(["user"],
...            {"friends": aggregate.VALUES("friend")})
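VALUES differs from the single-column form of CONCAT because it drops duplicates within each group. Below is a minimal plain-Python sketch of that behavior. The helper `distinct_values` and the data are hypothetical. The sketch keeps values in first-seen order, but the order of the list that xframes returns should not be relied on.

```python
from collections import defaultdict


def distinct_values(rows, key_column, src_column):
    """Hypothetical helper: per-group list of distinct src_column values."""
    groups = defaultdict(list)
    seen = defaultdict(set)
    for row in rows:
        key, value = row[key_column], row[src_column]
        if value not in seen[key]:  # skip values already collected for this group
            seen[key].add(value)
            groups[key].append(value)
    return dict(groups)


friends = [
    {"user": "ann", "friend": "bob"},
    {"user": "ann", "friend": "bob"},
    {"user": "ann", "friend": "cat"},
    {"user": "bob", "friend": "ann"},
]
print(distinct_values(friends, "user", "friend"))
# {'ann': ['bob', 'cat'], 'bob': ['ann']}
```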

xframes.aggregate.VALUES_COUNT(src_column)

Builtin aggregator that combines distinct values from one column in one group into a dictionary value of unique values and their counts.

Examples

To combine values from one column that belong to one group into a dictionary that maps each friend to its count:

>>> xf.groupby(["user"],
...            {"friends": aggregate.VALUES_COUNT("friend")})
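The dictionary that VALUES_COUNT produces for each group is a frequency table. Python's `collections.Counter` provides one directly. The sketch below uses it on plain dict rows to show what one group's result contains. The helper `values_count` and the data are hypothetical, and this is not the xframes implementation.

```python
from collections import Counter, defaultdict


def values_count(rows, key_column, src_column):
    """Hypothetical helper: per-group dict of distinct value -> occurrence count."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[key_column]][row[src_column]] += 1
    # Convert each Counter to a plain dict for display.
    return {key: dict(counter) for key, counter in counts.items()}


friends = [
    {"user": "ann", "friend": "bob"},
    {"user": "ann", "friend": "bob"},
    {"user": "ann", "friend": "cat"},
    {"user": "bob", "friend": "ann"},
]
print(values_count(friends, "user", "friend"))
# {'ann': {'bob': 2, 'cat': 1}, 'bob': {'ann': 1}}
```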

xframes.aggregate.VARIANCE(src_column)

Builtin variance aggregator for groupby.

Examples

Get the rating variance of each user.

>>> xf.groupby("user",
...            {'rating_var': aggregate.VARIANCE('rating')})
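The sketch below computes VARIANCE and STDV together per group. The standard deviation is the square root of the variance. It uses Python's `statistics.pvariance` and `statistics.pstdev`, and that choice is an assumption: these are the population forms, which divide by n. This page does not say whether xframes divides by n or by n - 1, so check the source before relying on exact values. The helper `spread` and the data are hypothetical.

```python
import statistics
from collections import defaultdict


def spread(rows, key_column, src_column):
    """Hypothetical helper: per-group variance and standard deviation.

    ASSUMPTION: population definitions (divide by n); xframes may differ.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_column]].append(row[src_column])
    return {
        key: {
            "var": statistics.pvariance(values),  # like VARIANCE(src_column)
            "stdv": statistics.pstdev(values),    # like STDV(src_column)
        }
        for key, values in groups.items()
    }


ratings = [{"user": "ann", "rating": r} for r in [2, 4, 4, 4, 5, 5, 7, 9]]
ratings += [{"user": "bob", "rating": r} for r in [1, 3]]
result = spread(ratings, "user", "rating")
print(result["ann"])  # ann: variance 4, standard deviation 2
print(result["bob"])  # bob: variance 1, standard deviation 1
```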