Aggregate
-
xframes.aggregate.ARGMAX(agg_column, out_column)
Builtin arg maximum aggregator for groupby. For each group, it returns the value of out_column from the row in which agg_column is largest.
Examples
Get the movie with maximum rating per user.
>>> xf.groupby("user", {'best_movie':aggregate.ARGMAX('rating','movie')})
-
xframes.aggregate.ARGMIN(agg_column, out_column)
Builtin arg minimum aggregator for groupby. For each group, it returns the value of out_column from the row in which agg_column is smallest.
Examples
Get the movie with minimum rating per user.
>>> xf.groupby("user", {'worst_movie':aggregate.ARGMIN('rating','movie')})
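ARGMAX and ARGMIN compare values in one column but return a value from another. The following plain-Python sketch models that behavior on a list of row dictionaries. It does not use xframes, and the helper name `groupby_arg_extreme` is hypothetical. How xframes breaks ties is not stated here, so the sketch simply keeps the first row it sees with the winning value (an assumption).

```python
import operator


def groupby_arg_extreme(rows, key, agg_column, out_column, better):
    """Hypothetical model of ARGMAX/ARGMIN (not the xframes implementation).

    For each group, return out_column from the row whose agg_column value
    wins under `better` (operator.gt for ARGMAX, operator.lt for ARGMIN).
    Ties keep the first row seen, which is an assumption.
    """
    best = {}  # group key -> (winning agg_column value, out_column value)
    for row in rows:
        group = row[key]
        value = row[agg_column]
        if group not in best or better(value, best[group][0]):
            best[group] = (value, row[out_column])
    return {group: out for group, (_, out) in best.items()}


ratings = [
    {"user": "ann", "movie": "Alien", "rating": 4},
    {"user": "ann", "movie": "Brazil", "rating": 5},
    {"user": "bob", "movie": "Alien", "rating": 3},
    {"user": "bob", "movie": "Cars", "rating": 1},
]

# Analogous to ARGMAX('rating', 'movie') and ARGMIN('rating', 'movie').
print(groupby_arg_extreme(ratings, "user", "rating", "movie", operator.gt))
# {'ann': 'Brazil', 'bob': 'Alien'}
print(groupby_arg_extreme(ratings, "user", "rating", "movie", operator.lt))
# {'ann': 'Alien', 'bob': 'Cars'}
```

Passing the comparison as a parameter keeps one helper for both aggregators, since they differ only in which direction wins.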
-
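xframes.aggregate.CONCAT(src_column, dict_value_column=None)
Builtin aggregator that combines the values of one or two columns within each group into a single dictionary, list, or array value. When dict_value_column is given, the result is a dictionary mapping src_column values to dict_value_column values; otherwise, the src_column values are combined into a list or array value.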
Examples
To combine values from two columns that belong to one group into one dictionary value:
>>> xf.groupby(["document"], {"word_count": aggregate.CONCAT("word", "count")})
To combine values from one column that belong to one group into a list value:
>>> xf.groupby(["user"], {"friends": aggregate.CONCAT("friend")})
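The two CONCAT forms differ only in whether dict_value_column is given. This plain-Python sketch (hypothetical helper `groupby_concat`, no xframes involved) shows the shape each form produces. Two details are assumptions rather than documented behavior: list values appear in input order, and when a dictionary key repeats within one group, the later row wins. In a distributed XFrame, element order may not be guaranteed.

```python
def groupby_concat(rows, key, src_column, dict_value_column=None):
    """Hypothetical model of CONCAT (not the xframes implementation)."""
    result = {}
    for row in rows:
        group = row[key]
        if dict_value_column is None:
            # One column: collect every src_column value into a list.
            result.setdefault(group, []).append(row[src_column])
        else:
            # Two columns: map src_column -> dict_value_column.
            # A repeated key is overwritten by the later row (assumption).
            result.setdefault(group, {})[row[src_column]] = row[dict_value_column]
    return result


words = [
    {"document": "d1", "word": "spark", "count": 3},
    {"document": "d1", "word": "frame", "count": 1},
    {"document": "d2", "word": "spark", "count": 2},
]
friends = [
    {"user": "ann", "friend": "bob"},
    {"user": "ann", "friend": "cy"},
    {"user": "bob", "friend": "ann"},
]

print(groupby_concat(words, "document", "word", "count"))
# {'d1': {'spark': 3, 'frame': 1}, 'd2': {'spark': 2}}
print(groupby_concat(friends, "user", "friend"))
# {'ann': ['bob', 'cy'], 'bob': ['ann']}
```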
-
xframes.aggregate.COUNT()
Builtin count aggregator for groupby.
Examples
Get the number of occurrences of each user.
>>> xf.groupby("user", {'count':aggregate.COUNT()})
-
xframes.aggregate.MAX(src_column)
Builtin maximum aggregator for groupby.
Examples
Get the maximum rating of each user.
>>> xf.groupby("user", {'rating_max':aggregate.MAX('rating')})
-
xframes.aggregate.MEAN(src_column)
Builtin average aggregator for groupby.
Examples
Get the average rating of each user.
>>> xf.groupby("user", {'rating_mean':aggregate.MEAN('rating')})
-
xframes.aggregate.MIN(src_column)
Builtin minimum aggregator for groupby.
Examples
Get the minimum rating of each user.
>>> xf.groupby("user", {'rating_min':aggregate.MIN('rating')})
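COUNT, MAX, MEAN, and MIN each reduce a group to a single number. As a plain-Python illustration (the helper `groupby_summary` is hypothetical and not part of xframes), the sketch below buckets one column by group key and applies the equivalent reduction to each bucket. Note that COUNT counts rows and takes no column; handling of missing values is not modeled.

```python
from collections import defaultdict


def groupby_summary(rows, key, src_column):
    """Hypothetical model of COUNT, MAX, MEAN, and MIN over one column."""
    groups = defaultdict(list)  # group key -> list of src_column values
    for row in rows:
        groups[row[key]].append(row[src_column])
    return {
        group: {
            "count": len(values),               # like COUNT(): rows in the group
            "max": max(values),                 # like MAX(src_column)
            "mean": sum(values) / len(values),  # like MEAN(src_column)
            "min": min(values),                 # like MIN(src_column)
        }
        for group, values in groups.items()
    }


ratings = [
    {"user": "ann", "rating": 4},
    {"user": "ann", "rating": 5},
    {"user": "bob", "rating": 3},
    {"user": "bob", "rating": 1},
    {"user": "bob", "rating": 2},
]
print(groupby_summary(ratings, "user", "rating"))
# {'ann': {'count': 2, 'max': 5, 'mean': 4.5, 'min': 4},
#  'bob': {'count': 3, 'max': 3, 'mean': 2.0, 'min': 1}}
```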
-
xframes.aggregate.QUANTILE(src_column, *args)
Builtin approximate quantile aggregator for groupby. Accepts one or more quantiles to query, given either as separate arguments or as a single list.
Examples
- To extract the median:
>>> xf.groupby("user", {'rating_quantiles': aggregate.QUANTILE('rating', 0.5)})
- To extract a few quantiles:
>>> xf.groupby("user", {'rating_quantiles': aggregate.QUANTILE('rating', [0.25, 0.5, 0.75])})
- Or, equivalently:
>>> xf.groupby("user", {'rating_quantiles': aggregate.QUANTILE('rating', 0.25, 0.5, 0.75)})
The returned quantiles are guaranteed to be accurate to within 0.5%. That is, if the requested quantile is 0.50, the returned value lies between the true 0.495 and 0.505 quantiles.
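One way to read the 0.5% guarantee: for a requested quantile q, any returned value lying between the exact (q - 0.005) and (q + 0.005) quantiles is acceptable. The sketch below checks a candidate value against that band. It is not the xframes quantile algorithm; the helper names are hypothetical, and the nearest-rank quantile definition is an assumption, since several quantile definitions are in common use.

```python
import math


def exact_quantile(values, q):
    """Nearest-rank quantile of a non-empty list (one common definition)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))  # 1-based rank
    return ordered[rank - 1]


def within_tolerance(candidate, values, q, eps=0.005):
    """True if candidate lies between the exact (q - eps) and (q + eps) quantiles."""
    low = exact_quantile(values, max(0.0, q - eps))
    high = exact_quantile(values, min(1.0, q + eps))
    return low <= candidate <= high


values = list(range(1, 1001))  # the integers 1..1000
print(exact_quantile(values, 0.5))         # 500
print(within_tolerance(503, values, 0.5))  # True: inside the 0.495..0.505 band
print(within_tolerance(510, values, 0.5))  # False: outside the band
```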
-
xframes.aggregate.SELECT_ONE(src_column)
Builtin aggregator for groupby which selects one row in the group.
Examples
Get one rating for each user.
>>> xf.groupby("user", {'rating':aggregate.SELECT_ONE('rating')})
If multiple columns are selected, their values are guaranteed to come from the same row. For instance:
>>> xf.groupby("user", {'rating':aggregate.SELECT_ONE('rating'), 'item':aggregate.SELECT_ONE('item')})
The selected 'rating' and 'item' values for each user will come from the same row in the XFrame.
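The same-row guarantee means that selecting several columns with SELECT_ONE behaves like choosing one row per group and then reading every requested column from that row. This plain-Python sketch (hypothetical helper `groupby_select_one`) keeps the first row it sees for each group; which row xframes actually picks is not specified, so that choice is an assumption.

```python
def groupby_select_one(rows, key, columns):
    """Hypothetical model of several SELECT_ONE aggregators used together."""
    chosen = {}  # group key -> the single row picked for that group
    for row in rows:
        chosen.setdefault(row[key], row)  # keep the first row seen (assumption)
    # Read every requested column from the same chosen row.
    return {group: {c: row[c] for c in columns} for group, row in chosen.items()}


ratings = [
    {"user": "ann", "item": "Alien", "rating": 4},
    {"user": "ann", "item": "Brazil", "rating": 5},
    {"user": "bob", "item": "Cars", "rating": 1},
]
print(groupby_select_one(ratings, "user", ["rating", "item"]))
# {'ann': {'rating': 4, 'item': 'Alien'}, 'bob': {'rating': 1, 'item': 'Cars'}}
```

Picking a row first and projecting columns second is what makes the values consistent; selecting each column independently could mix values from different rows.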
-
xframes.aggregate.STDV(src_column)
Builtin standard deviation aggregator for groupby.
Examples
Get the rating standard deviation of each user.
>>> xf.groupby("user", {'rating_stdv':aggregate.STDV('rating')})
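This section does not say whether STDV divides by n (population standard deviation) or by n - 1 (sample standard deviation). The plain-Python sketch below (hypothetical helper `groupby_stdv`) assumes the population form via `statistics.pstdev`, and prints the sample form of the same data for comparison so the difference is visible.

```python
import statistics
from collections import defaultdict


def groupby_stdv(rows, key, src_column):
    """Hypothetical model of STDV, assuming the population (divide-by-n) form."""
    groups = defaultdict(list)  # group key -> list of src_column values
    for row in rows:
        groups[row[key]].append(row[src_column])
    return {group: statistics.pstdev(values) for group, values in groups.items()}


data = [2, 4, 4, 4, 5, 5, 7, 9]
ratings = [{"user": "ann", "rating": r} for r in data]
print(groupby_stdv(ratings, "user", "rating"))  # {'ann': 2.0}
# The sample (n - 1) form of the same data, for comparison: about 2.138.
print(statistics.stdev(data))
```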
-
xframes.aggregate.SUM(src_column)
Builtin sum aggregator for groupby.
Examples
Get the sum of the rating column for each user.
>>> xf.groupby("user", {'rating_sum':aggregate.SUM('rating')})
-
xframes.aggregate.VALUES(src_column)
Builtin aggregator that combines the distinct values of one column within each group into a list value.
Examples
To combine the distinct values of one column within each group into a list value:
>>> xf.groupby(["user"], {"friends": aggregate.VALUES("friend")})
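VALUES differs from the single-column form of CONCAT in keeping only distinct values. A plain-Python sketch (hypothetical helper `groupby_values`) de-duplicates within each group using an insertion-ordered dict as a set; the order of the resulting list is an assumption, not something this section promises.

```python
def groupby_values(rows, key, src_column):
    """Hypothetical model of VALUES: distinct src_column values per group."""
    distinct = {}  # group key -> dict used as an insertion-ordered set
    for row in rows:
        distinct.setdefault(row[key], {})[row[src_column]] = None
    return {group: list(values) for group, values in distinct.items()}


friends = [
    {"user": "ann", "friend": "bob"},
    {"user": "ann", "friend": "cy"},
    {"user": "ann", "friend": "bob"},  # duplicate, dropped by VALUES
    {"user": "bob", "friend": "ann"},
]
print(groupby_values(friends, "user", "friend"))
# {'ann': ['bob', 'cy'], 'bob': ['ann']}
```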
-
xframes.aggregate.VALUES_COUNT(src_column)
Builtin aggregator that combines the distinct values of one column within each group into a dictionary mapping each unique value to its count.
Examples
To combine the values of one column within each group into a dictionary of friend: count pairs:
>>> xf.groupby(["user"], {"friends": aggregate.VALUES_COUNT("friend")})
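VALUES_COUNT amounts to a per-group frequency table. The plain-Python sketch below (hypothetical helper `groupby_values_count`, not the xframes implementation) builds one `collections.Counter` per group and converts each to an ordinary dictionary.

```python
from collections import Counter, defaultdict


def groupby_values_count(rows, key, src_column):
    """Hypothetical model of VALUES_COUNT: value -> count, per group."""
    counts = defaultdict(Counter)  # group key -> Counter of src_column values
    for row in rows:
        counts[row[key]][row[src_column]] += 1
    return {group: dict(counter) for group, counter in counts.items()}


friends = [
    {"user": "ann", "friend": "bob"},
    {"user": "ann", "friend": "cy"},
    {"user": "ann", "friend": "bob"},
    {"user": "bob", "friend": "ann"},
]
print(groupby_values_count(friends, "user", "friend"))
# {'ann': {'bob': 2, 'cy': 1}, 'bob': {'ann': 1}}
```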