The Gandiva Expression CompilerΒΆ
Gandiva is a runtime expression compiler that uses LLVM to generate efficient native code for projections and filters on Arrow record batches. Gandiva only handles projections and filters. For other transformation, see Compute Functions.
Gandiva was designed to take advantage of the Arrow memory format and modern hardware. Compiling expressions using LLVM allows the execution to be optimized to the local runtime environment and hardware, including available SIMD instructions. To minimize optimization overhead, all Gandiva functions are pre-compiled into LLVM IR (intermediate representation).
Building ExpressionsΒΆ
Gandiva provides a general expression representation where expressions are
represented by a tree of nodes. The expression trees are built using
gandiva::TreeExprBuilder
. The leaves of the expression tree are typically
field references, created by gandiva::TreeExprBuilder::MakeField()
, and
literal values, created by gandiva::TreeExprBuilder::MakeLiteral()
. Nodes
can be combined into more complex expression trees using:
gandiva::TreeExprBuilder::MakeFunction()
to create a function node. See available functions below.
gandiva::TreeExprBuilder::MakeIf()
to create if-else logic.
gandiva::TreeExprBuilder::MakeAnd()
andgandiva::TreeExprBuilder::MakeOr()
to create boolean expressions. (For βnotβ, use thenot(bool)
function inMakeFunction
.)
gandiva::TreeExprBuilder::MakeInExpressionInt32()
and the other βin expressionβ functions to create set membership tests.
Once an expression tree is built, they are wrapped in either gandiva::Expression
or gandiva::Condition
, depending on how they will be used.
Expression
is used in projections while Condition
is used filters.
As an example, here is how to create an Expression representing x + 3
and a
Condition representing x < 3
:
auto field_x_raw = arrow::field("x", arrow::int32());
auto field_x = TreeExprBuilder::MakeField(field_x_raw);
auto literal_3 = TreeExprBuilder::MakeLiteral(3);
auto field_result = arrow::field("result", arrow::int32());
auto add_node = TreeExprBuilder::MakeFunction("add", {field_x, literal_3}, arrow::int32());
auto expression = TreeExprBuilder::MakeExpression(add_node, field_result);
auto less_than_node = TreeExprBuilder::MakeFunction("less_than", {field_x, literal_3},
boolean());
auto condition = TreeExprBuilder::MakeCondition(less_than_node);
For simpler expressions, there are also convenience functions that allow you to
use functions directly in MakeExpression
and MakeCondition
:
auto expression = TreeExprBuilder::MakeExpression("add", {field_x, literal_3}, field_result);
auto condition = TreeExprBuilder::MakeCondition("less_than", {field_x, literal_3});
Projectors and FiltersΒΆ
Gandivaβs two execution kernels are gandiva::Projector
and
gandiva::Filter
. Projector
consumes a record batch and projects
into a new record batch. Filter
consumes a record batch and produces a
gandiva::SelectionVector
containing the indices that matched the condition.
For both Projector
and Filter
, optimization of the expression IR happens
when creating instances. They are compiled against a static schema, so the
schema of the record batches must be known at this point.
Continuing with the expression
and condition
created in the previous
section, here is an example of creating a Projector and a Filter:
auto schema = arrow::schema({field_x});
std::shared_ptr<Projector> projector;
auto status = Projector::Make(schema, {expression}, &projector);
ARROW_CHECK_OK(status);
std::shared_ptr<Filter> filter;
status = Filter::Make(schema, condition, &filter);
ARROW_CHECK_OK(status);
Once a Projector or Filter is created, it can be evaluated on Arrow record batches. These execution kernels are single-threaded on their own, but are designed to be reused to process record batches in parallel.
Execution is performed with gandiva::Projector::Evaluate()
and
gandiva::Filter::Evaluate()
. Filters produce gandiva::SelectionVector
,
a vector of row indices that matched the filter condition. When filtering and
projecting record batches, you can pass the selection vector into the projector
so that the projection is only evaluated on matching rows.
Here is an example of evaluating the Filter and Projector created above:
auto pool = arrow::default_memory_pool();
int num_records = 4;
auto array = MakeArrowArrayInt32({1, 2, 3, 4}, {true, true, true, true});
auto in_batch = arrow::RecordBatch::Make(schema, num_records, {array});
// Just project
arrow::ArrayVector outputs;
status = projector->Evaluate(*in_batch, pool, &outputs);
ARROW_CHECK_OK(status);
// Evaluate filter
gandiva::SelectionVector result_indices;
status = filter->Evaluate(*in_batch, &result_indices);
ARROW_CHECK_OK(status);
// Project with filter
arrow::ArrayVector outputs_filtered;
status = projector->Evaluate(*in_batch, selection_vector.get(),
pool, &outputs_filtered);
Available Gandiva FunctionsΒΆ
ComparisonsΒΆ
not |
|
equal |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
eq |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
same |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
not_equal |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
less_than |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
less_than_or_equal_to |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
greater_than |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
greater_than_or_equal_to |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
isnull |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
isnotnull |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
isnumeric |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
is_distinct_from |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
is_not_distinct_from |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
CastΒΆ
castBIGINT |
|
|
|
|
|
|
|
castINT |
|
|
|
castFLOAT4 |
|
|
|
|
|
|
|
castFLOAT8 |
|
|
|
|
|
|
|
|
|
castDECIMAL |
|
|
|
|
|
|
|
|
|
|
|
castDECIMALNullOnOverflow |
|
castDATE |
|
|
|
|
|
|
|
|
|
castTIMESTAMP |
|
|
|
|
|
castVARCHAR |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
castTIME |
|
castBIT |
|
castBOOLEAN |
|
castVARBINARY |
|
|
|
|
|
|
|
|
|
|
Arithmetic and MathΒΆ
add |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
subtract |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
multiply |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
divide |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
mod |
|
|
|
|
|
|
|
modulo |
|
|
|
|
|
|
|
div |
|
|
|
|
|
|
|
bitwise_and |
|
|
|
bitwise_or |
|
|
|
bitwise_xor |
|
|
|
bitwise_not |
|
|
|
round |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
cbrt |
|
|
|
|
|
|
|
|
|
|
|
exp |
|
|
|
|
|
|
|
|
|
|
|
log |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
log10 |
|
|
|
|
|
|
|
|
|
|
|
power |
|
pow |
|
sin |
|
|
|
|
|
|
|
|
|
|
|
cos |
|
|
|
|
|
|
|
|
|
|
|
asin |
|
|
|
|
|
|
|
|
|
|
|
acos |
|
|
|
|
|
|
|
|
|
|
|
tan |
|
|
|
|
|
|
|
|
|
|
|
atan |
|
|
|
|
|
|
|
|
|
|
|
sinh |
|
|
|
|
|
|
|
|
|
|
|
cosh |
|
|
|
|
|
|
|
|
|
|
|
tanh |
|
|
|
|
|
|
|
|
|
|
|
cot |
|
|
|
|
|
|
|
|
|
|
|
radians |
|
|
|
|
|
|
|
|
|
|
|
degrees |
|
|
|
|
|
|
|
|
|
|
|
atan2 |
|
|
|
|
|
|
|
|
|
|
|
abs |
|
ceil |
|
floor |
|
truncate |
|
|
|
|
|
trunc |
|
|
|
|
|
random |
|
|
|
rand |
|
|
OtherΒΆ
bin |
|
|
|
to_time |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
to_timestamp |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
sha |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
sha1 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
sha256 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
space |
|
|
|
convert_toDOUBLE |
|
convert_toDOUBLE_be |
|
convert_toFLOAT |
|
convert_toFLOAT_be |
|
convert_toINT |
|
convert_toINT_be |
|
convert_toBIGINT |
|
convert_toBIGINT_be |
|
convert_toBOOLEAN_BYTE |
|
Dates and TimestampsΒΆ
extractMillennium |
|
|
|
extractCentury |
|
|
|
extractDecade |
|
|
|
extractYear |
|
|
|
year |
|
|
|
|
|
|
|
extractQuarter |
|
|
|
extractMonth |
|
|
|
month |
|
|
|
|
|
|
|
extractWeek |
|
|
|
weekofyear |
|
|
|
|
|
|
|
yearweek |
|
|
|
|
|
|
|
extractDay |
|
|
|
|
|
day |
|
|
|
|
|
|
|
dayofmonth |
|
|
|
|
|
|
|
extractHour |
|
|
|
|
|
hour |
|
|
|
|
|
|
|
|
|
extractMinute |
|
|
|
|
|
minute |
|
|
|
|
|
|
|
|
|
extractSecond |
|
|
|
|
|
second |
|
|
|
|
|
|
|
|
|
date_trunc_Millennium |
|
|
|
date_trunc_Century |
|
|
|
date_trunc_Decade |
|
|
|
date_trunc_Year |
|
|
|
date_trunc_Quarter |
|
|
|
date_trunc_Month |
|
|
|
date_trunc_Week |
|
|
|
date_trunc_Day |
|
|
|
date_trunc_Hour |
|
|
|
date_trunc_Minute |
|
|
|
date_trunc_Second |
|
|
|
extractDoy |
|
|
|
extractDow |
|
|
|
extractEpoch |
|
|
|
to_date |
|
last_day |
|
|
|
sha |
|
|
|
|
|
|
|
sha1 |
|
|
|
|
|
|
|
sha256 |
|
|
|
|
|
|
|
convert_toTIME_EPOCH |
|
convert_toTIME_EPOCH_be |
|
convert_toTIMESTAMP_EPOCH |
|
convert_toTIMESTAMP_EPOCH_be |
|
convert_toDATE_EPOCH |
|
convert_toDATE_EPOCH_be |
|
months_between |
|
|
|
timestampdiffSecond |
|
timestampdiffMinute |
|
timestampdiffHour |
|
timestampdiffDay |
|
timestampdiffWeek |
|
timestampdiffMonth |
|
timestampdiffQuarter |
|
timestampdiffYear |
|
timestampaddSecond |
|
|
|
|
|
|
|
timestampaddMinute |
|
|
|
|
|
|
|
timestampaddHour |
|
|
|
|
|
|
|
timestampaddDay |
|
|
|
|
|
|
|
timestampaddWeek |
|
|
|
|
|
|
|
timestampaddMonth |
|
|
|
|
|
|
|
timestampaddQuarter |
|
|
|
|
|
|
|
timestampaddYear |
|
|
|
|
|
|
|
date_add |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
date_sub |
|
|
|
|
|
|
|
date_diff |
|
|
|
|
|
|
String TransformationsΒΆ
to_date |
|
|
|
sha |
|
|
|
sha1 |
|
|
|
sha256 |
|
|
|
starts_with |
|
ends_with |
|
is_substr |
|
locate |
|
|
|
position |
|
|
|
octet_length |
|
|
|
bit_length |
|
|
|
char_length |
|
length |
|
lengthUtf8 |
|
reverse |
|
ltrim |
|
|
|
rtrim |
|
|
|
btrim |
|
|
|
ascii |
|
base64 |
|
unbase64 |
|
upper |
|
lower |
|
initcap |
|
like |
|
|
|
ilike |
|
substr |
|
|
|
substring |
|
|
|
lpad |
|
|
|
rpad |
|
|
|
concatOperator |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
concat |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
byte_substr |
|
bytesubstring |
|
convert_fromUTF8 |
|
convert_fromutf8 |
|
convert_replaceUTF8 |
|
convert_replaceutf8 |
|
convert_toUTF8 |
|
replace |
|
binary_string |
|
left |
|
right |
|
split_part |
|
HashΒΆ
hash |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
hash32 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
hash32AsDouble |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
hash64 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
hash64AsDouble |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
hashSHA1 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
hashSHA256 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|