The X_PERCENTILE user-defined aggregate
X_PERCENTILE(y, 50)
That is, the above aggregate returns the value within a sample of y values that has an equal number of values (50 percent) above and below it in the sample.
- Uses a setup argument to enable the end user to specify the x-percentile to obtain
- Uses a pointer-valued state to hold the state information, and allocates and initializes the state in the INIT support function
- Uses a COMBINE function that must do more than just call the ITER support function
- Handles NULL values as aggregate arguments, including returning an SQL NULL value if the aggregate argument to return was NULL
- The user-specified setup argument
- The current number of values processed
- The current list of values processed
- The current list of whether the values processed are NULL.
#define MAX_N 1000
typedef struct percentile_state
{
mi_integer percentile;
mi_integer count;
mi_integer value_array[MAX_N];
mi_integer value_is_null[MAX_N];
} percentile_state_t;
8 + 4(MAX_N)
If X_PERCENTILE used an opaque-type state, this structure must be less than the maximum opaque-type size. For systems that have a 32 KB maximum opaque-type size, the X_PERCENTILE aggregate can use an opaque-type state as long as it is called in a query that finds 8190 or fewer rows. If the query finds more than 8190 rows, the state would not fit into an opaque type. To avoid this restriction, X_PERCENTILE implements the aggregate state as a pointer-valued state.
/* X_PERCENTILE INIT support function on INTEGER */
mi_pointer init_xprcnt(dummy, prcntile, fparam)
mi_integer dummy;
mi_integer prcntile;
MI_FPARAM *fparam;
{
percentile_state_t *state;
/* Allocate memory for the state from the PER_COMMAND
* pool
*/
state = (percentile_state_t *)
mi_dalloc(sizeof(percentile_state_t), PER_COMMAND);
/* Initialize the aggregate state */
if ( mi_fp_argisnull(fparam, 1) )
state->percentile = 50; /* median */
else
state->percentile = prcntile;
state->count = 0;
return ((mi_pointer)state);
}
- Handles a setup argument
This setup argument is the value that the end user specifies so that the aggregate can determine the value that has x percent values below and (100-x) percent above. If the end user provides an SQL NULL for the setup argument, X_PERCENTILE assumes a value of 50 and therefore calculates the median.
- Allocates PER_COMMAND memory for the pointer-valued state
The database server does not perform state management for pointer-valued states. Therefore, the INIT function must allocate the memory for the state. It also assigns the appropriate values to the percentile_state_t structure to initialize the state.
/* X_PERCENTILE ITER support function on INTEGER */
mi_pointer iter_xprcnt(state_ptr, value, fparam)
mi_pointer state_ptr;
mi_integer value;
MI_FPARAM *fparam;
{
mi_integer i, j;
mi_integer is_null = 0;
percentile_state_t *state =
(percentile_state_t *)state_ptr;
/* Check for NULL-valued 'value' */
if ( mi_fp_argisnull(fparam, 1) )
{
value = 0;
is_null = 1;
}
/* Find position of 'value' in ordered 'value_array' */
for ( i=0; i < state->count; i++ )
{
if ( state->value_array[i] > value )
break;
}
/* Increment number of values (count) */
++state->count;
/* Put value into ordered list of existing values */
for (j=state->count - 1; j > i; j--)
{
state->value_array[j] = state->value_array[j-1];
state->value_is_null[j] = state->value_is_null[j-1];
}
state->value_array[i] = value;
state->value_is_null[i] = is_null;
return ((mi_pointer)state);
}
- Increments the number of aggregate arguments processed (count)
- Stores the new aggregate argument in increasing sorted order in the value_array array
- Stores the is-NULL flag that corresponds to each aggregate argument in its corresponding position in the value_is_null array
The ITER function also handles a possible NULL-valued aggregate argument. Because the X_PERCENTILE aggregate is defined to handle NULL values (see Figure 1), the database server calls ITER for NULL-valued aggregate arguments.
/* X_PERCENTILE COMBINE support function on INTEGER */
mi_pointer combine_xprcnt(state1_ptr, state2_ptr)
mi_pointer state1_ptr, state2_ptr;
{
mi_integer i;
percentile_state_t *state1 =
(percentile_state_t *)state1_ptr;
percentile_state_t *state2 =
(percentile_state_t *)state2_ptr;
/* Merge the two ordered value arrays */
for ( i=0; i < state2->count; i++ )
(void) iter_xprcnt(state1_ptr,
state2->value_array[i]);
/* Free the PER_COMMAND memory allocated to the state of
* the 2nd parallel thread (state2). The memory
* allocated to the state of the 1st parallel thread
* (state1) holds the updated state. It is in the FINAL
* support function.
*/
mi_free(state2);
return (state1_ptr);
}
- Two ordered lists are merged into a single ordered list.
- Two counts are added together.
- Memory for one of the partial states is freed.
- A pointer to the merged aggregate state is returned.
/* X_PERCENTILE FINAL support function on INTEGER */
mi_integer final_xprcnt(state_ptr, fparam)
mi_pointer state_ptr;
MI_FPARAM *fparam;
{
mi_integer index, trunc_int;
mi_integer x_prcntile;
percentile_state_t *state =
(percentile_state_t *)state_ptr;
/* Obtain index position of x-percentile value */
trunc_int = (state->count) * (state->percentile);
index = trunc_int/100;
if ( (trunc_int % 100) >= 50 )
index++;
/* Obtain x-percentile value from sorted 'value_array' */
x_prcntile = state->value_array[index];
/* Check for NULL value so it can be returned as such */
if ( state->value_is_null[index] )
mi_fp_setreturnisnull(fparam, 0, MI_TRUE);
/* Free the PER_COMMAND memory allocated to the state */
mi_free(state);
/* Return retrieved x-percentile value */
return (x_prcntile);
}
- Calculates the x-percentile for the values in the sorted array
The FINAL function must obtain the index position for the value that represents the x-percentile, where x is the percentage that the end user has passed in as a setup argument.
- Deallocates PER_COMMAND memory for the pointer-valued state
The database server does not perform any state management for pointer-valued states. Therefore, the FINAL function must deallocate the PER_COMMAND state memory that the INIT function has allocated.
CREATE AGGREGATE x_percentile
WITH (INIT = init_x_prcntile,
ITER = iter_x_prcntile,
COMBINE = combine_x_prcntile,
FINAL = final_x_prcntile,
HANDLESNULLS);
Suppose that the INIT, ITER, COMBINE, and FINAL aggregate support functions for the X_PERCENTILE aggregate are compiled and linked into a shared-object module named percent.
On UNIX or Linux, the executable code for the X_PERCENTILE aggregate support functions would be in a shared library named percent.so.
CREATE FUNCTION init_x_prcntile(dummy INTEGER, x_percent INTEGER)
RETURNING POINTER
WITH (HANDLESNULLS)
EXTERNAL NAME
'/usr/udrs/aggs/percent/percent.so(init_xprcnt)'
LANGUAGE C;
CREATE FUNCTION iter_x_prcntile(agg_state POINTER,
one_value INTEGER)
RETURNS POINTER
WITH (HANDLESNULLS)
EXTERNAL NAME
'/usr/udrs/aggs/percent/percent.so(iter_xprcnt)'
LANGUAGE C;
CREATE FUNCTION combine_x_prcntile(agg_state1 POINTER,
agg_state2 POINTER)
RETURNS POINTER
WITH (HANDLESNULLS)
EXTERNAL NAME
'/usr/udrs/aggs/percent/percent.so(combine_xprcnt)'
LANGUAGE C;
CREATE FUNCTION final_x_prcntile(agg_state POINTER)
RETURNS INTEGER
WITH (HANDLESNULLS)
EXTERNAL NAME
'/usr/udrs/aggs/percent/percent.so(final_xprcnt)'
LANGUAGE C;
These CREATE FUNCTION statements use the SQL data type, POINTER, to indicate that the aggregate support functions accept a generic C pointer and perform their own memory management. They must all include the HANDLESNULLS routine modifier because the X_PERCENTILE aggregate handles NULL values.
SELECT X_PERCENTILE(col3, 25) FROM tab1;
5, 9, 13, 19, 24, 31
Because 25 percent of 6 values is 1.5, X_PERCENTILE obtains the list item that has 2 values (1.5 rounded up to the nearest integer) below it. The preceding query returns 13 as the quartile for col3.
INSERT INTO tab1 (7, NULL:complexnum_t, NULL);
(NULL), 5, 9, 13, 19, 24, 31
Twenty-five percent of 7 values is 1.75, so X_PERCENTILE obtains the list item that has 2 (1.75 truncated to the nearest integer) values below it. Now the quartile for col3 would be 9. If X_PERCENTILE was not registered with the HANDLESNULLS modifier, however, the database server would not call X_PERCENTILE for this newest row and the quartile for col3 would have been 13 (the quartile for 6 rows, even though col3 actually has 7 rows).
SELECT X_PERCENTILE(col3, 5) FROM tab1;
This query asks for the 5th percentile for the seven values in col3. Because 5 percent of 7 values is 0.35, X_PERCENTILE obtains the list item that has zero values (0.35 truncated to the nearest integer) below it. The preceding query returns NULL as the quartile for col3. The ITER function has stored NULL values as zeros in the sorted value_array. For the FINAL support function to determine when a value of zero indicates a NULL and when it indicates zero, it checks the value_is_null array. If the zero indicates a NULL value, FINAL calls the DataBlade API function mi_fp_setreturnisnull() to set the aggregate result to NULL.