Reference¶
BQL Statements¶
CREATE SINK¶
Synopsis¶
CREATE SINK name TYPE type_name [WITH parameter_name = parameter_value [, ...]]
Description¶
CREATE SINK creates a new sink in a topology.
Parameters¶
- name
- The name of the sink to be created.
- type_name
- The type name of the sink.
- parameter_name
- The name of a sink-specific parameter.
- parameter_value
- The value for a sink-specific parameter.
Sink Parameters¶
The optional WITH clause specifies parameters specific to the sink.
See each sink’s documentation to find out parameters it provides.
Examples¶
To create a sink having the name “snk” with no sink-specific parameter:
CREATE SINK snk TYPE fluentd;
To create a sink with sink-specific parameters:
CREATE SINK fluentd TYPE fluentd WITH tag_field = "fluentd_tag";
As you can see, the name of a sink can be same as the type name of a sink.
CREATE SOURCE¶
Synopsis¶
CREATE [PAUSED] SOURCE name TYPE type_name [WITH parameter_name = parameter_value [, ...]]
Description¶
CREATE SOURCE creates a new source in a topology.
Parameters¶
PAUSED- The source is paused when it’s created if this option is used.
- name
- The name of the source to be created.
- type_name
- The type name of the source.
- parameter_name
- The name of a source-specific parameter.
- parameter_value
- The value for a source-specific parameter.
Source Parameters¶
The optional WITH clause specifies parameters specific to the source.
See each source’s documentation to find out parameters it provides.
Notes¶
Some sources stop after emitting all tuples. They can stop even before any
subsequent statement is executed. For such sources, specify PAUSED parameter
and run RESUME SOURCE after completely setting up a topology so
that no tuple emitted from the source will be lost.
Examples¶
To create a source having the name “src” with no source-specific parameter:
CREATE SOURCE src TYPE dropped_tuples;
To create a source with source-specific parameters:
CREATE SOURCE fluentd TYPE fluentd WITH bind = "0.0.0.0:12345",
tag_field = "my_tag";
As you can see, the name of a source can be same as the type name of a source.
CREATE STATE¶
Synopsis¶
CREATE STATE name TYPE type_name [WITH parameter_name = parameter_value [, ...]]
Description¶
CREATE STATE creates a new UDS (User Defined State) in a topology.
Parameters¶
- name
- The name of the UDS to be created.
- type_name
- The type name of the UDS.
- parameter_name
- The name of a UDS-specific parameter.
- parameter_value
- The value for a UDS-specific parameter.
UDS Parameters¶
The optional WITH clause specifies parameters specific to the UDS.
See each UDS’s documentation to find out parameters it provides.
Examples¶
To create a UDS named “my_uds” with no UDS-specific parameter:
CREATE STATE my_uds TYPE my_uds_type;
To create a UDS with UDS-specific parameters:
CREATE STATE my_ids TYPE snowflake_id WITH machine_id = 1;
CREATE STREAM¶
Synopsis¶
CREATE STREAM name AS select
Description¶
CREATE STREAM creates a new stream (a.k.a a continuous view) in a topology.
Parameters¶
- name
- The name of the stream to be created.
- select
- The
SELECTstatement to generate a stream. select can be anySELECTstatement including a statement usingUNION ALL.
Examples¶
To create a stream named “strm”:
CREATE STREAM strm AS SELECT RSTREAM * FROM src [RANGE 1 TUPLES];
To create a stream which merges all tuples from multiple streams:
CREATE STREAM strm AS
SELECT RSTREAM * FROM src1 [RANGE 1 TUPLES]
UNION ALL
SELECT RSTREAM * FROM src2 [RANGE 1 TUPLES];
DROP SINK¶
Synopsis¶
DROP SINK name
Description¶
DROP SINK drops a sink that is already created in a topology. The sink can
no longer be used after executing the statement.
Parameters¶
- name
- The name of the sink to be dropped.
DROP SOURCE¶
Synopsis¶
DROP SOURCE name
Description¶
DROP SOURCE drops a source that is already created in a topology. The
source is stopped and removed from a topology. After executing the statement,
the source cannot be used.
Parameters¶
- name
- The name of the source to be dropped.
DROP STATE¶
Synopsis¶
DROP STATE name
Description¶
DROP STATE drops a UDS that is already created in a topology. The UDS can
no longer be used after executing the statement.
Note
Even if a uds sink exist for the UDS, the sink will not be dropped when
dropping the UDS. The uds sink must be dropped manually.
Parameters¶
- name
- The name of the UDS to be dropped.
DROP STREAM¶
Synopsis¶
DROP STREAM name
Description¶
DROP STREAM drops a stream that is already created in a topology. The
stream can no longer be used after executing the statement.
Parameters¶
- name
- The name of the stream to be dropped.
INSERT INTO¶
Synopsis¶
INSERT INTO sink FROM stream
Description¶
INSERT INTO inserts tuples from a stream or a source to a sink.
Parameters¶
- sink
- The name of the sink to which tuples are inserted.
- stream
- The name of a stream or a source.
Examples¶
To insert tuples into a sink from a source having the name “src”:
INSERT INTO snk FROM src;
LOAD STATE¶
Synopsis¶
LOAD STATE name TYPE type_name [TAG tag]
[SET set_parameter_name = set_parameter_key]
[create_if_not_saved]
where create_if_not_saved is:
OR CREATE IF NOT SAVED
[WITH create_parameter_name = create_parameter_value]
Description¶
LOAD STATE loads a UDS that is previously saved by
SAVE STATE.
LOAD STATE fails if the UDS hasn’t been saved yet. When
OR CREATE IF NOT SAVED is specified, LOAD STATE creates a new UDS with
the given optional parameters if the UDS hasn’t been saved yet.
LOAD STATE, even with OR CREATE IF NOT SAVED, fails if the UDS doesn’t
support the statement.
Parameters¶
- name
- The name of the UDS to be loaded.
- type_name
- The type name of the UDS.
- tag
- The name of the user defined tag for versioning of the saved UDS data. When tag is omitted, “default” is used as the default tag name.
- set_parameter_name
- The name of a UDS-specific parameter defied for
LOAD STATE. - set_parameter_value
- The value for a UDS-specific parameter defined for
LOAD STATE. - create_parameter_name
- The name of a UDS-specific parameter defined for
CREATE STATE. - create_parameter_value
- The value for a UDS-specific parameter defined for
CREATE STATE.
LOAD STATE can have two sets of parameters: set_parameters and
create_parameters. set_parameters are used when there’s a saved UDS
data having the given tag. On the other hand, create_parameters are used
when the UDS hasn’t been saved yet. create_parameters are exactly same as
parameters that the UDS defines for CREATE STATE. However,
set_parameters are often completely different from create_parameters.
Because create_parameters are often saved as a part of the UDS’s
information by SAVE STATE, set_parameters doesn’t have to have the same
set of parameters defined in create_parameters.
There’re some use-cases that a UDS uses set_parameters:
Customize loading behavior
- When a UDS doesn’t provide a proper versioning of saved data,
LOAD STATEmay fail to load it due to the format incompatibility. In such a case, it’s difficult to modify saved binary data to have a format version number. Thus, providing a set_parameter specifying the format version number could be the only solution.
- When a UDS doesn’t provide a proper versioning of saved data,
Overwrite some saved values of create_parameters
Like create_parameters, set_parameters are specific to each UDS. See the documentation of each UDS to find out parameters it provides.
Examples¶
To load a UDS named “my_uds” and having the type “my_uds_type”:
LOAD STATE my_uds TYPE my_uds_type;
Note that “my_uds” needs to be saved before executing this statement. Otherwise, it fails.
To load a UDS named “my_uds” and having the type “my_uds_type” and assigned a tag “trained”:
LOAD STATE my_uds TYPE my_uds_type TAG trained;
To load a UDS with set_parameters:
LOAD STATE my_uds TYPE my_uds_type TAG trained
SET force_format_version = "v1";
To load a UDS that hasn’t been saved yet with OR CREATE IF NOT SAVED:
LOAD STATE my_uds TYPE my_uds_type OR CREATE IF NOT SAVED;
To load a UDS that hasn’t been saved yet with OR CREATE IF NOT SAVED with
create_parameters:
LOAD STATE my_uds TYPE my_uds_type OR CREATE IF NOT SAVED
WITH id = 1;
When the UDS hasn’t been saved previously, the statement above falls back into
CREATE STATE as follows:
CREATE STATE my_uds TYPE my_uds_type WITH id = 1;
OR CREATE IF NOT SAVED can be used with a tag and set_parameters:
LOAD STATE my_uds TYPE my_uds_type TAG trained SET force_format_version = "v1"
OR CREATE IF NOT SAVED WITH id = 1;
PAUSE SOURCE¶
Synopsis¶
PAUSE SOURCE name
Description¶
PAUSE SOURCE pauses a running source so that the source stops emitting
tuples until executing RESUME SOURCE on it again. Executing
PAUSE SOURCE on a paused source doesn’t affect anything.
PAUSE SOURCE fails if the source doesn’t support the statement.
Parameters¶
- name
- The name of the source to be paused.
RESUME SOURCE¶
Synopsis¶
RESUME SOURCE name
Description¶
RESUME SOURCE resumes a paused source so that the source can start to emit
tuples again. Executing RESUME SOURCE on a running source doesn’t affect anything.
RESUME SOURCE fails if the source doesn’t support the statement.
Parameters¶
- name
- The name of the source to be resumed.
Examples¶
A common use case of RESUME SOURCE to resume a source which is created by
CREATE PAUSED SOURCE.
CREATE PAUSED SOURCE src TYPE some_source_type WITH ...parameters...;
-- ... construct a topology connected to src ...
RESUME SOURCE src;
By doing this, no tuple emitted from src will be lost.
REWIND SOURCE¶
Synopsis¶
REWIND SOURCE name
Description¶
REWIND SOURCE rewinds a source so that the source emits tuples from the
beginning again.
REWIND SOURCE fails if the source doesn’t support the statement.
Parameters¶
- name
- The name of the source to be rewound.
SAVE STATE¶
Synopsis¶
SAVE STATE name [TAG tag]
Description¶
SAVE STATE saves a UDS to SensorBee’s storage. The location or the format
of the saved UDS depends on a storage that SensorBee uses and is not
controllable from BQL.
SAVE STATE fails if the UDS doesn’t support the statement.
Parameters¶
- name
- The name of the UDS to be saved.
- tag
- The name of the user defined tag for versioning of the saved UDS data. When tag is omitted, “default” is used as the default tag name.
Examples¶
To save a UDS named “my_uds” without a tag:
SAVE STATE my_uds;
To save a UDS with a tag “trained”:
SAVE STATE my_uds TAG trained;
SELECT¶
Synopsis¶
SELECT emitter {* | expression [AS output_name]} [, ...]
FROM from_item stream_to_relation_operator
[AS stream_alias] [, ...]
[WHERE condition [, ...]]
[GROUP BY expression [, ...]]
[HAVING condition [, ...]]
[UNION ALL select]
where emitter is:
{RSTREAM | ISTREAM | DSTREAM}
where stream_to_relation_operator is:
'[' RANGE range_number {TUPLES | SECONDS | MILLISECONDS}
[, BUFFER SIZE buffer_size]
[, drop_mode IF FULL]
']'
Description¶
SELECT retrieves tuples from one or more streams. The general processing of
SELECT is as follows:
- Each from_item is converted into a relation (i.e. a window) from a
stream. Then, each tuple emitted from a from_item is computed within
the window. If more than one element is specified in the
FROMclause, they are cross-joined. - When the
WHEREclause is specified, condition is evaluated for each set of tuples cross-joined in theFROMclause. Tuples which do not satisfy the condition are eliminated from the output. - If the
GROUP BYclause is specified, tuples satisfied the condition in theWHEREclause are combined into groups based on the result of expressions. Then, aggregate functions are performed on each group. When theHAVINGclause is given, it eliminates groups that do not satisfy the given condition. - The output tuples are computed using the
SELECToutput expressions for each tuple or group. - Computed output tuples are converted into a stream using
Relation-to-Stream Operators and emitted from the
SELECT. UNION ALL, if present, combines outputs from multipleSELECT. It simply emits all tuples from allSELECTwithout considering duplicates.
Parameters¶
Emitter¶
emitter controls how a SELECT emits resulting tuples.
RSTREAM- When
RSTREAMis specified, all tuples in a relation as a result of processing a newly coming tuple are output. See Relation-to-Stream Operators for more details. ISTREAM- When
ISTREAMis specified, tuples contained in the current relation but not in the previously computed relation are emitted. In other words, tuples that are newly inserted or updated since the previous computation are output. See Relation-to-Stream Operators for more details. DSTREAM- When
DSTREAMis specified, tuples contained in the previously computed relation but not in the current relation are emitted. In other words, tuples in the previous relation that are deleted or updated in the current relation are output. Note that output tuples are from the previous relation so that they have old values. See Relation-to-Stream Operators for more details.
FROM Clause¶
The FROM clause specifies one or more source streams for the SELECT
and converts those streams into relations using stream to relation operators.
The FROM clause contains following parameters:
- from_item
- from_item is a source stream which can be either a source, a stream, or a UDSF.
- range_number
range_number is a numeric value which specifies how many tuples are in the window. range_number is followed by one of interval types:
TUPLES,SECONDS, orMILLISECONDS.When
TUPLESis given, range_number must be a positive integer and the window can contain at most range_number tuples. If a new tuple is inserted into the window having range_number tuples, the oldest tuple is removed. “The oldest tuple” is the tuple that was inserted into the window before any other tuples, not the tuple having the oldest timestamp. The maximum range_number is 1048575 withTUPLESkeywords.When
SECONDSorMILLISECONDSis specified, range_number can be a positive number and the difference of the minimum and maximum timestamps of tuples in the window can be at most range_number seconds or milliseconds. If a new tuple is inserted into the window, tuples whose timestamp is range_number seconds or milliseconds earlier than the new tuple’s timestamp are removed. The maximum range_number is 86400 withSECONDSand 86400000 withMILLISECONDS.- buffer_size
- buffer_size specifies the size of buffer, or a queue, located between
from_item and the
SELECT. buffer_size must be an integer and greater than 0. The meximum buffer_size is 131071. - drop_mode
drop_mode controls how a new tuple is inserted into the buffer located between from_item and the
SELECTwhen the buffer is full. drop_mode can be one of the followings:WAIT- A new tuple emitted from from_item is blocked until the
SELECTconsumes at least one tuple.
- A new tuple emitted from from_item is blocked until the
DROP OLDEST- The oldest tuple in the buffer is removed and a new tuple is inserted into the buffer. “The oldest tuple” is the tuple that was inserted into the buffer before any other tuples, not the tuple having the oldest timestamp.
DROP NEWEST- The oldest tuple in the buffer is removed and a new tuple is inserted into the buffer. “The newest tuple” is the tuple that was inserted into the buffer after any other tuples, not the tuple having the newest timestamp.
Note
A buffer is different from a window. A buffer is placed in front of a window. A window gets a new tuple from a buffer and computes a new relation. A buffer is used not to block emitting tuples so that multiple
SELECTstatements can work concurrently without waiting for their receivers to consume tuples.- stream_alias
- stream_alias provides an alias of from_item and it can be referred
by the alias in other parts of the
SELECT. If the alias is given, the original name is hidden and cannot be used to refer from_item.
Fields of tuples can be referred by <field_name> or
<stream>:<field_name> in other clauses and the SELECT list. For
example, when the SELECT has FROM strm [RANGE 1 TUPLES] and strm
emits {"a":<some value>}, the field a can be referred by a or
strm:a. These two forms cannot be mixed in a SELECT statement.
The form <stream>:<field_name> is required when the FROM clause has
multiple input streams.
WHERE Clause¶
The SELECT can optionally have a WHERE clause. The WHERE clause
have a condition. The condition can be any expression that evaluates to a
result of type bool. Any tuple that does not satisfy the condition
(i.e. the result of the expression is false) will be eliminated from the
output.
Operators describes operators that can be used in the condition.
GROUP BY Clause¶
The GROUP BY clause is an optional clause and condenses into a single
tuple all selected tuples whose expressions specified in GROUP BY clause
result in the same value.
expression can be any expression using fields of an input tuple. When there’re multiple expressions in the clause, tuples having the same set of values computed from those expressions are grouped into a single tuple.
When the GROUP BY clause is present, any ungrouped field cannot be used
as an output field without aggregate functions. For example, when tuples have
4 fields a, b, c, and d, and the GROUP BY clause has
following expressions:
GROUP BY a, b + c
a can only be used as an output field:
SELECT a FROM stream [RANGE 1 TUPLES]
GROUP BY a, b + c;
Other fields need to be specified in aggregate functions:
SELECT a, max(b), min(b + c), avg(c * d) FROM stream [RANGE 1 TUPLES]
GROUP BY a, b + c;
Aggregate functions are evaluated for each group using all tuples in the group.
Note
The GROUP BY clause performs grouping within a window:
SELECT a FROM stream [RANGE 10 TUPLES]
GROUP BY a;
This SELECT computes at most 10 groups of tuples because there’re only
10 tuples in the window.
HAVING Clause¶
The HAVING clause is an optional clause and placed after the GROUP BY
clause. The HAVING clause has a condition and evaluate it for each group,
instead of each tuple. When ungrouped fields are used in the condition, they
need to be in aggregate functions:
SELECT a, max(b), min(b + c), avg(c * d) FROM stream [RANGE 1 TUPLES]
GROUP BY a, b + c HAVING min(b + c) > 1 AND avg(c * d) < 10;
In this example, b, c, and d are ungrouped fields and cannot
directly specified in the condition.
SELECT List¶
The SELECT list, placed between the emitter and the FROM clause,
defines the form of the output tuples emitted from the SELECT statement.
Each item in the list can be any expression. Each item (i.e. output field)
will have a name. When an expression only consists of a field name, the output
name of the expression will be the field name. For example, the output name
of strm:price in SELECT RSTREAM strm:price FROM ... will be price,
not strm:price. When the expression is a UDF call, the name of the UDF
will be used as the name of the output field. For example, the result of
count(*) is named as count. If an expression is other than a field
name or a UDF call, the output name will be col_n where n is replaced
with the number corresponding to n-th expression (counting from 0). The output
field name can manually be specified by AS output_name.
When the expression is *, all fields which have not been specified in the
SELECT list yet will be included. Output names of those fields will be
identical to the original field names.
If an expression results in a map, its output name can be AS *. In such
case, all fields of the map is extended to the top level fields. For example,
in SELECT RSTREAM a, b AS *, c FROM strm ..., when strm emits tuples
having
{
"a": v1,
"b": {
"d": v3,
"e": v4
},
"c": v2,
}
to the SELECT, its output will be
{
"a": v1,
"c": v2,
"d": v3,
"e": v4
}
When some fields have the same name, only one of them will be included in the result. It is undefined which field will be chosen as a result.
Notes¶
An Emitter and Its Performance¶
There’re some use case specific optimizations of the evaluation of the
SELECT and this subsection describes each optimization and its limitation.
Simple Transformation and Filtering¶
Performing a simple per-tuple transformation or filtering over an input stream is a very common task. Therefore, BQL optimizes statements having the following form:
SELECT RSTREAM projection FROM input [RANGE 1 TUPLES] WHERE condition;
Limitations of this optimization are:
- There can only be one input stream and its range is
[RANGE 1 TUPLES]. - The emitter must be
RSTREAM.
Evaluation in WHERE Clause¶
Each set of tuples cross-joined in the FROM clause is evaluated exactly once
in the WHERE clause. Therefore, all functions in the WHERE clause are
only called once for each set:
SELECT RSTREAM * FROM stream1 [RANGE 100 TUPLES], stream2 [RANGE 100 TUPLES]
WHERE random() < 0.2;
In this example, 80% of sets of cross-joined tuples are filtered out and only 20% of sets (around 20 tuples for each input from either stream) are emitted.
Commands¶
build_sensorbee¶
Because SensorBee is written in Go, all its dependencies including plugins
generally need to be statically linked, or at least cannot be dynamically
loaded at runtime. The build_sensorbee command is provided to support
building a custom sensorbee command.
Basic Usage¶
Prepare build.yaml configuration file and run build_sensorbee in the
same directory as build.yaml is located at:
$ ls
build.yaml
$ build_sensorbee
sensorbee_main.go
$ ls
build.yaml
sensorbee
sensorbee_main.go
sensorbee is the result executable file and sensorbee_main.go is a
Go file generated by build_sensorbee and passed for go build command
to build sensorbee.
Configuration¶
build_sensorbee requires a configuration file named build.yaml. The file
is written in YAML and has following optional sections:
pluginscommands
plugins¶
The plugins section is optional and may have a list of plugins as follows:
plugins:
- github.com/sensorbee/twitter/plugin
- github.com/sensorbee/fluentd/plugin
- github.com/sensorbee/nlp/plugin
- github.com/sensorbee/tutorial/ml/plugin
- github.com/sensorbee/jubatus/classifier/plugin
A plugin must be provided as a valid import path of Go. A path depends on each plugin.
commands¶
The commands section is optional and is used to customize subcommands that
the sensorbee command will have. By default, or when the section is empty,
subcommands include all standard commands:
runrunfileshelltopology
commands section is a map whose key is the name of subcommand. Standard
subcommands like run can be added by providing empty entries:
commands:
run:
shell:
With this configuration, the sensorbee command will only have run and
shell commands.
To add a custom command, an entry must have path parameter that is a Go
import path of the command:
commands:
run:
shell:
mytest:
path: "path/to/sb-mytest"
With this configuration, the sensorbee command will also have the mytest
subcommand. The subcommand is implemented at path/to/sb-mytest.
Names of commands must be unique and cannot be any of:
cliosversiontime
Prohibited names might be added in the future version.
Custom Subcommand Development¶
A custom subcommand for the sensorbee command can be developed as a Go
package. Only thing the package has to do is to provide a function
func SetUp() cli.Command. cli is gopkg.in/urfave/cli.v1.
A minimum example is provided in
the SensorBee tutorial repository:
package hello
import (
"fmt"
cli "gopkg.in/urfave/cli.v1"
)
func SetUp() cli.Command {
return cli.Command{
Name: "hello",
Usage: "say hello",
Action: func(c *cli.Context) error {
fmt.Println("hello")
return nil
},
}
}
This command prints “hello” when sensorbee hello is executed. See
https://github.com/urfave/cli to learn how to create a command using the
cli library.
A Complete Example¶
plugins:
- github.com/sensorbee/twitter/plugin
- github.com/sensorbee/fluentd/plugin
- github.com/sensorbee/nlp/plugin
- github.com/sensorbee/tutorial/ml/plugin
- github.com/sensorbee/jubatus/classifier/plugin
commands:
run:
runfile:
shell:
topology:
hello:
path: "github.com/sensorbee/tutorial/custom_command/hello"
Flags and Options¶
--config path or -c path
This option specifies the path to the configuration file to be used. Its default value is
build.yaml. With this option, a configuration file in another directory can be used as follows:$ build_sensorbee -c /path/to/dir/special_build.yaml
--download-plugins={true|false}
This option have to be
trueorfalse. When the value istrue,build_sensorbeedownloads (i.e.go get) all plugins listed inbuild.yaml. When it’sfalse,build_sensorbeedoesn’t download plugins and tries to used plugins as installed in the environment. The default value istrue.Specifying
falseis useful when the customsensorbeecommand needs to depend on a plugin that is in a special git branch or locally modified.
--help or -h
When this flag is given, the command shows the usage of itself and exits without doing anything.
--only-generate-source
When this flag is given,
build_sensorbeedoesn’t build asensorbeecommand but only generate a source code that can be built bygo buildcommand. For example:$ build_sensorbee --only-generate-source sensorbee_main.go $ go build -o sensorbee sensorbee_main.go
--out executable_name or -o executable_name
This option customizes the name of the output executable file. The default is
sensorbee.exein Windows andsensorbeein all other environment. The following command generates an executable namedmy_sensorbeeinstead ofsensorbee:$ build_sensorbee -o my_sensorbee
--source-filename filename
The filename of the Go source code file automatically generated by
build_sensorbeecan be specified by this option. The default value issensorbee_main.go.$ build_sensorbee --source-filename custom_main.goBy executing this command,
custom_main.gois generated instead ofsensorbee_main.go.
--version or -v
When this flag is given, the command prints the version of thebuild_sensorbeecommand.
sensorbee¶
sensorbee is the main command to manipulate SensorBee. sensorbee
consists of a set of following subcommands:
sensorbee command can needs to be created by build_sensorbee command and
all the example commands are written as ./sensorbee to emphasize that
there’s no default sensorbee command.
See each command’s reference for details.
Flags and Options¶
--help or -h
When this flag is given, the command prints the usage of itself.
--version or -v
The command prints the version of SensorBee.
sensorbee run¶
sensorbee run runs the SensorBee server that manages multiple topologies
that can dynamically modified at runtime.
Basic Usage¶
$ ./sensorbee run -c sensorbee.yaml
Configuration¶
A configuration file can optionally be provided for sensorbee run command.
The file is written in YAML and has following optional
sections:
loggingnetworkstoragetopologies
logging¶
The logging section customizes behavior of the logger. It has following
optional parameters:
target
The
targetparameter changes the destination of log messages. Its value can be of followings:
stdout: write log messages to stdoutstderr: write log messages to stderr- file path: a path to the log file
When the value is neither
stdoutnorstderr, it’s considered to be a file path. The default value of this parameter isstderr.
min_log_level
This option specifies the minimum level (severity) of log messages to be written. Valid values are one ofdebug,info,warn,warning,error, orfatal.warncan also bewarning. Whendebugis given, all levels of messages will be written into the log. When the value iserror, only log messages witherrororfatallevel will be written. The default value of this parameter isinfo.
log_dropped_tuples
The SensorBee server can prints a log message and contents of tuples when they’re dropped from a topology. When this option istrue, the server writes log messages reporting dropped tuples. When it’sfalse, the server doesn’t. The default value of this option isfalse.
log_destinationless_tuples
A destinationless tuple is one kind of dropped tuples that is caused when a source or a stream doesn’t have any destination and it drops a tuple. By settingtrueto this option, the server reports all destinationless tuples. The default value of this option isfalse. Note that, to log replication tuples, thelog_dropped_tuplesoption also needs to betrue.
summarize_dropped_tuples
This option turns on or off summarization of dropped tuple logging. Valid values for this option is
trueorfalse. When its value is thetrue, dropped tuples are summarized in log messages.Note
At the current version, only
blobfields are summarized to"(blob)". Other configuration parameters will be supported in the future version such as the maximum number of fields, the maximum depths of maps, the maximum length of arrays, and so on.When
falseis specified, each log message shows a complete JSON that are compatible to the original tuple. Although this is useful for debugging, tuples containing large binary data like images may result in disk.The default value of this option is
false.
Example:
logging:
target: /path/to/sensorbee.log
min_log_level: info
log_dropped_tuples: true
summarize_dropped_tuples: true
network¶
The network section has parameters related to server’s network
configuration. It has following optional parameters:
listen_on
This parameter controls how the server expose its listening port. The syntax of the value is likehost:port.hostcan be IP addresses such as0.0.0.0or127.0.0.1. Whenhostis given, the server only listens on the interface with the given host address. If thehostis omitted, the server listens on all available interfaces, that is, the server accepts connections from any host. The default value of this parameter is:15601.
Example:
network:
listen_on: ":15601"
storage¶
The storage section contains the configuration of storages used for saving
UDSs or other information. It has following optional subsections:
uds
uds¶
The uds subsection configures the storage for saving and loading UDSs. It
provides following optional parameters:
type
The type name of the storage.in_memoryis used as the default value.
params
paramshas subparameter specific to the giventype.
Currently, following types are available:
in_memoryfs
Descriptions of types and parameters are provided below:
in_memory
in_memorysaves UDSs in memory. It loses all saved data when the server restarts. This type doesn’t have any parameter.Example:
storage: uds: type: in_memory
fs
fssaves UDSs in the local file system. It has following required parameters:
dirdirhas the path to the directory that saved data will be stored.
fsalso has following optional parameters:
temp_dirtemp_dirhas the path to the temporary directory that is used when the UDS writes data. After the UDS has written all the data, the file is move to the directory specified bydirparameter. The same value asdiris used by default.The file name of each saved UDS is formatted as
<topology>-<name>-<tag>.state.Example:
storage: uds: type: fs params: dir: /path/to/uds_dir temp_dir: /tmp
topologies¶
The topologies section contains the configuration of topologies in the
following format:
topologies:
name_of_topology1:
... configuration for name_of_topology1 ...
name_of_topology2:
... configuration for name_of_topology2 ...
name_of_topology3:
... configuration for name_of_topology3 ...
... other topologies ...
Topologies listed in this section will be created at the startup of the server based on the sub-configuration of each topology. Following optional configuration parameters are provided for each topology:
bql_file
This parameter has the path to the file containing BQL statements for the topology. All statements are executed before the server gets ready. If the execution fails, the server would exit with an error.
Example:
$ ls
my_topology.bql
sensorbee.yaml
$ cat my_topology.bql
CREATE SOURCE fluentd TYPE fluentd;
CREATE STREAM users AS
SELECT RSTREAM users FROM fluentd [RANGE 1 TUPLES];
CREATE SINK user_file TYPE file WITH path = "users.jsonl";
$ cat sensorbee.yaml
topologies:
my_topology:
bql_file: my_topology.bql
$ ./sensorbee run -c sensorbee.yaml
As a result of these commands above, the server started with sensorbee.yaml
has a topology named my_topology. The topology has three nodes: fluentd,
users, and user_file.
Note
This is the only way to persist the configuration of topologies at the
moment. Any updates applied at runtime will not be reflected into the bql file.
For example, if the server restarts after creating a new stream in
my_topology, the new stream will be lost unless it’s explicitly added
to my_topology.bql manually.
The configuration of a topology can be empty:
topologies:
my_empty_topology:
In this case, an empty topology my_empty_topology will be created so that
the sensorbee topology create command doesn’t have to be executed every
time the server restarts.
A Complete Example¶
logging:
target: /path/to/sensorbee.log
min_log_level: info
log_dropped_tuples: true
summarize_dropped_tuples: true
network:
listen_on: ":15601"
storage:
uds:
type: fs
params:
dir: /path/to/uds_dir
temp_dir: /tmp
topologies:
empty_topology:
my_topology:
bql_file: /path/to/my_topology.bql
Flags and Options¶
--config path or -c path
This option receives the path of the configuration file. By default, the value is empty and no configuration file is used. This value can also be passed throughSENSORBEE_CONFIGenvironment variable.
--help or -h
When this flag is given, the command prints the usage of itself.
sensorbee runfile¶
sensorbee runfile runs a single BQL file. This command is mainly designed
for offline data processing but can be used as a standalone SensorBee process
that doesn’t expose any interface to manipulate the topology.
sensorbee runfile stops after all the nodes created by the given BQL file
stops. The command doesn’t stop if it contains a source that generates infinite
tuples or is rewindable. Other non-rewindable sources such as file stopping
when it emits all tuples written in a file can work well with the command.
Sources generally need to be created with PAUSED keyword in the
CREATE SOURCE statement. Without PAUSED, a source can start
emitting tuples before all nodes in a topology can correctly be set up.
Therefore, a BQL file passed to the command should look like:
CREATE PAUSED SOURCE source_1 TYPE ...;
CREATE PAUSED SOURCE source_2 TYPE ...;
...
CREATE PAUSED SOURCE source_n TYPE ...;
... CREATE STREAM, CREATE SINK, or other statements
RESUME SOURCE source_1;
RESUME SOURCE source_2;
...
RESUME SOURCE source_n;
With the --save-uds option described later, it saves UDSs at the end of its
execution.
Basic Usage¶
$ ./sensorbee runfile my_topology.bql
With options:
$ ./sensorbee runfile -c sensorbee.yaml -s '' my_topology.bql
Configuration¶
sensorbee runfile accepts the configuration file for sensorbee run. It
only uses logging and storage sections. The configuration file may
contain other sections as well and the same file for sensorbee run can also
be used for sensorbee runfile. See
its configuration for details.
Flags and Options¶
--config path or -c path
This option receives the path of the configuration file. By default, the value is empty and no configuration file is used. This value can also be passed throughSENSORBEE_CONFIGenvironment variable.
--help or -h
When this flag is given, the command prints the usage of itself.
--save-uds udss or -s udss
This option receives a list of names of UDSs separated by commas. UDSs listed in it will be saved at the end of execution. For example, when the option is
-s "a,b,c", UDSs nameda,b, andcwill be saved. To save all UDSs in a topology, pass an empty string:-s "".By default, all UDSs will not be saved at the end of execution.
--topology name or -t name
This option changes the name of the topology to be run with the given BQL file. The default name is taken from the file name of the BQL file. The name specified to this option will be used in log messages or saved UDS data. Especially, names of files containing saved UDS data has contains the name of the topology. Therefore, providing the same name as the topology that will be run bysensorbee runlater on allows users to prepare UDSs including pre-trained machine learning models in advance.
sensorbee shell or bql¶
sensorbee shell or bql starts a new shell to manipulate the SensorBee
server. The shell can be terminated by writing exit or typing C-d.
Both sensorbee shell and bql have the same interface, but bql is
installed by default while the sensorbee command needs to be built manually
to run sensorbee shell.
Basic Usage¶
To run sensorbee shell,
$ ./sensorbee shell -t my_topology
my_topology>
To run bql,
$ bql -t my_topology
my_topology>
Flags and options¶
--api-version version
This option changes the API version of the SensorBee server. The default value of this option isv1.
--help or -h
When this flag is given, the command prints the usage of itself.
--topology name or -t name
The name of a topology to be manipulated can be specified through this option so thatUSE topology_namedoesn’t have to be used in the shell. The default value is an empty name, that is, no topology is specified.
--uri
This option is used when the SensorBee server is running at non-localhost or using non-default port number (15601). The value should have a format likehttp://host:port/. The default value of this option ishttp://localhost:15601/.
sensorbee topology¶
sensorbee topology, or sensorbee t, is used to manipulate topologies on
the SensorBee server.
Note
This command is provided because the syntax of BQL statements that controls topologies has not been discussed enough yet.
The command consists of following subcommands:
sensorbee topology create <name> or sensorbee t c <name>
This command creates a new topology on the SensorBee server. The<name>argument is the name of the topology to be created.$?will be 0 if the command is successful. Otherwise, it’ll be non-zero. The command fails if the topology already exists on the server.
sensorbee topology drop <name> or sensorbee t drop <name>
This command drops an existing topology on the SensorBee server. The<name>argument is the name of the topology to be dropped.$?will be 0 if the command is successful. Otherwise, it’ll be non-zero. The command doesn’t fail even if the topology doesn’t exist on the server.
sensorbee topology list or sensorbee t l
This commands prints names of all topologies that the SensorBee server has, one name per line.
All commands share the same flags and options. Flags and options need to be given after the subcommand name:
$ ./sensorbee topology create --flag --option value my_topology
In this example, a flag --flag and an option --option value are
provided. The argument of the command, i.e. the name of topology, is
my_topology.
Flags and Options¶
--api-version version
This option changes the API version of the SensorBee server. The default value of this option isv1.
--help or -h
When this flag is given, the command prints the usage of itself.
--uri
This option is used when the SensorBee server is running at non-localhost or using non-default port number (15601). The value should have a format likehttp://host:port/. The default value of this option ishttp://localhost:15601/.
Function Reference¶
Common Mathematical Functions¶
For the functions below, if a given parameter is outside the mathematically valid range for that function (e.g., sqrt(-2), log(0), div(2.0, 0.0)) and the return type is float, then NaN is returned.
However, if the return type is int (e.g., div(2, 0)), there is no NaN option and an error will occur instead.
abs¶
abs(x)
Description¶
abs computes the absolute value of a number.
Parameter Types¶
xintorfloat
Return Type¶
same as input
Examples¶
| Function Call | Result |
|---|---|
abs(-17.4) |
17.4 |
cbrt¶
cbrt(x)
Description¶
cbrt computes the cube root of a number.
Parameter Types¶
xintorfloat
Return Type¶
float
Examples¶
| Function Call | Result |
|---|---|
cbrt(27.0) |
3.0 |
cbrt(-3) |
-1.4422495703074083 |
ceil¶
ceil(x)
Description¶
ceil computes the smallest integer not less than its argument.
Parameter Types¶
xintorfloat
Return Type¶
same as input
The return type is float for float input in order to avoid problems with input values that are too large for the int data type.
Examples¶
| Function Call | Result |
|---|---|
ceil(1.3) |
2.0 |
ceil(-1.7) |
-1.0 |
degrees¶
degrees(x)
Description¶
degrees converts radians to degrees.
Parameter Types¶
xintorfloat
Return Type¶
float
Examples¶
| Function Call | Result |
|---|---|
degrees(3.141592653589793) |
180.0 |
div¶
div(y, x)
Description¶
div computes the integer quotient y/x of two numbers y and x.
If x is 0.0 (float) then NaN will be returned; if it is 0 (integer) then a runtime error will occur.
Parameter Types¶
yintorfloatx- same as
y
Return Type¶
same as input
Examples¶
| Function Call | Result |
|---|---|
div(9, 4) |
2 |
div(9.3, 4.5) |
2.0 |
exp¶
exp(x)
Description¶
exp computes the exponential of a number.
Parameter Types¶
xintorfloat
Return Type¶
float
Examples¶
| Function Call | Result |
|---|---|
exp(1.0) |
2.718281828459045 |
floor¶
floor(x)
Description¶
floor computes the largest integer not greater than its argument.
Parameter Types¶
xintorfloat
Return Type¶
same as input
The return type is float for float input in order to avoid problems with input values that are too large for the int data type.
Examples¶
| Function Call | Result |
|---|---|
floor(1.3) |
1.0 |
floor(-1.7) |
-2.0 |
ln¶
ln(x)
Description¶
ln computes the natural logarithm of a number.
If the parameter is not strictly positive, NaN is returned.
Parameter Types¶
xintorfloat
Return Type¶
float
Examples¶
| Function Call | Result |
|---|---|
ln(2) |
0.6931471805599453 |
log¶
log(x)
log(b, x)
Description¶
log computes the logarithm of a number x to base b (default: 10).
Parameter Types¶
xintorfloatb(optional)- same as
x
Return Type¶
float
Examples¶
| Function Call | Result |
|---|---|
log(100) |
2.0 |
log(2.5, 6.25) |
2.0 |
log(2, 8) |
3.0 |
mod¶
mod(y, x)
Description¶
mod computes the remainder of integer division y/x of two numbers y and x.
If x is 0.0 (float) then NaN will be returned; if it is 0 (integer) then a runtime error will occur.
Parameter Types¶
yintorfloatx- same as
y
Return Type¶
same as input
Examples¶
| Function Call | Result |
|---|---|
mod(9, 4) |
1 |
mod(9.3, 4.5) |
0.3 |
pi¶
pi()
Description¶
pi returns the π constant (more or less 3.14).
Return Type¶
float
Examples¶
| Function Call | Result |
|---|---|
pi() |
3.141592653589793 |
power¶
power(a, b)
Description¶
power computes a raised to the power of b.
Parameter Types¶
aintorfloatb- same as
a
Return Type¶
float
The return type is float even for integer input in order to have a uniform behavior for cases such as power(2, -2).
Examples¶
| Function Call | Result |
|---|---|
power(9.0, 3.0) |
729.0 |
power(2, -1) |
0.5 |
radians¶
radians(x)
Description¶
radians converts degrees to radians.
Parameter Types¶
xintorfloat
Return Type¶
float
Examples¶
| Function Call | Result |
|---|---|
radians(180) |
3.141592653589793 |
round¶
round(x)
Description¶
round computes the nearest integer of a number.
Parameter Types¶
xintorfloat
Return Type¶
same as input
The return type is float for float input in order to avoid problems with input values that are too large for the int data type.
Examples¶
| Function Call | Result |
|---|---|
round(1.3) |
1.0 |
round(0.5) |
1.0 |
round(-1.7) |
-2.0 |
sign¶
sign(x)
Description¶
sign returns the sign of a number: 1 for positive numbers, -1 for negative numbers and 0 for zero.
Parameter Types¶
xintorfloat
Return Type¶
int
Examples¶
| Function Call | Result |
|---|---|
sign(2) |
1 |
sqrt¶
sqrt(x)
Description¶
sqrt computes the square root of a number.
If the parameter is negative, NaN is returned.
Parameter Types¶
xintorfloat
Return Type¶
float
Examples¶
| Function Call | Result |
|---|---|
sqrt(2) |
1.4142135623730951 |
trunc¶
trunc(x)
Description¶
trunc computes the truncated integer (towards zero) of a number.
Parameter Types¶
xintorfloat
Return Type¶
same as input
The return type is float for float input in order to avoid problems with input values that are too large for the int data type.
Examples¶
| Function Call | Result |
|---|---|
trunc(1.3) |
1.0 |
trunc(-1.7) |
-1.0 |
width_bucket¶
width_bucket(x, left, right, count)
Description¶
width_bucket computes the bucket to which x would be assigned in an equidepth histogram with count buckets in the range \([\text{left},\text{right}[\).
Points on a bucket border belong to the right bucket.
Points outside of the \([\text{left},\text{right}[\) range have bucket number \(0\) and \(\text{count}+1\), respectively.
Parameter Types¶
xintorfloatleftintorfloatrightintorfloatcountint
Return Type¶
int
Examples¶
| Function Call | Result |
|---|---|
width_bucket(5, 0, 10, 5) |
3 |
Pseudo-Random Functions¶
The characteristics of the functions below are equal to those from the Go rand module. They are not suitable for cryptographic applications.
random¶
random()
Description¶
random returns a pseudo-random number in the range \(0.0 <= x < 1.0\).
This function is not safe for use in cryptographic applications. See the Go math/rand package for details.
Return Type¶
float
Examples¶
| Function Call | Result |
|---|---|
random() |
0.6046602879796196 |
setseed¶
setseed(x)
Description¶
setseed initializes the seed for subsequent random() calls.
The parameter must be in the range \(-1.0 <= x <= 1.0\).
This function is not safe for use in cryptographic applications. See the Go math/rand package for details.
Parameter Types¶
xfloat
Trigonometric Functions¶
All trigonometric functions take arguments and return values of type float.
Trigonometric functions arguments are expressed in radians.
Inverse functions return values are expressed in radians.
String Functions¶
bit_length¶
bit_length(s)
Description¶
bit_length computes the number of bits in a string s.
Note that due to UTF-8 encoding, this is equal to octet_length(s) * 8,
not necessarily char_length(s) * 8.
Parameter Types¶
sstring
Return Type¶
int
Examples¶
| Function Call | Result |
|---|---|
bit_length("über") |
40 |
btrim¶
btrim(s)
btrim(s, chars)
Description¶
btrim removes the longest string consisting only of characters in chars (default: whitespace) from the start and end of s.
Parameter Types¶
sstringchars(optional)string
Return Type¶
string
Examples¶
| Function Call | Result |
|---|---|
btrim(" trim ") |
"trim" |
btrim("xyxtrimyyx", "xy") |
"trim" |
char_length¶
char_length(s)
Description¶
char_length computes the number of characters in a string.
Parameter Types¶
sstring
Return Type¶
int
Examples¶
| Function Call | Result |
|---|---|
char_length("über") |
4 |
concat¶
concat(s [, ...])
Description¶
concat concatenates all strings given as input arguments.
NULL values are ignored, i.e., treated like an empty string.
Parameter Types¶
sand all subsequent parametersstring
Return Type¶
string
Examples¶
| Function Call | Result |
|---|---|
concat("abc", NULL, "22") |
"abc22" |
concat_ws¶
concat_ws(sep, s [, ...])
Description¶
concat_ws concatenates all strings given as input arguments s using the separator sep.
NULL values are ignored.
Parameter Types¶
sepstringsand all subsequent parametersstring
Return Type¶
string
Examples¶
| Function Call | Result |
|---|---|
concat_ws(":", "abc", NULL, "22") |
"abc:22" |
format¶
format(s, [x, ...])
Description¶
format formats a variable number of arguments x according to a format string s.
See the Go package fmt for details of what formatting codes are allowed.
Parameter Types¶
sstringxand all subsequent parameters (optional)- any
Return Type¶
string
Examples¶
| Function Call | Result |
|---|---|
format("%s-%d", "abc", 22) |
"abc-22" |
lower¶
lower(s)
Description¶
lower converts a string s to lower case.
Non-ASCII Unicode characters are mapped to their lower case, too.
Parameter Types¶
sstring
Return Type¶
string
Examples¶
| Function Call | Result |
|---|---|
lower("ÜBer") |
"über" |
ltrim¶
ltrim(s)
ltrim(s, chars)
Description¶
ltrim removes the longest string consisting only of characters in chars (default: whitespace) from the start of s.
Parameter Types¶
sstringchars(optional)string
Return Type¶
string
Examples¶
| Function Call | Result |
|---|---|
ltrim(" trim ") |
"trim " |
ltrim("xyxtrimyyx", "xy") |
"trimyyx" |
md5¶
md5(s)
Description¶
md5 computes the MD5 checksum of a string s and returns it in hexadecimal format.
Parameter Types¶
sstring
Return Type¶
string
Examples¶
| Function Call | Result |
|---|---|
md5("abc") |
"900150983cd24fb0d6963f7d28e17f72" |
octet_length¶
octet_length(s)
Description¶
octet_length computes the number of bytes in a string s.
Note that due to UTF-8 encoding, this may differ from the number returned by char_length.
Parameter Types¶
sstring
Return Type¶
int
Examples¶
| Function Call | Result |
|---|---|
octet_length("über") |
5 |
overlay¶
overlay(s, repl, from)
overlay(s, repl, from, for)
Description¶
overlay replaces for characters in a string s with the string repl, starting at from.
(Index counting starts at 0.)
If for is not given, the length of repl is used as a default.
Parameter Types¶
sstringreplstringfromintfor(optional)int
Return Type¶
string
Examples¶
| Function Call | Result |
|---|---|
overlay("Txxxxas", "hom", 1) |
"Thomxas" |
overlay("Txxxxas", "hom", 1, 4) |
"Thomas" |
rtrim¶
rtrim(s)
rtrim(s, chars)
Description¶
rtrim removes the longest string consisting only of characters in chars (default: whitespace) from the end of s.
Parameter Types¶
sstringchars(optional)string
Return Type¶
string
Examples¶
| Function Call | Result |
|---|---|
rtrim(" trim ") |
" trim" |
rtrim("xyxtrimyyx", "xy") |
"xyxtrim" |
sha1¶
sha1(s)
Description¶
sha1 computes the SHA1 checksum of a string s and returns it in hexadecimal format.
Parameter Types¶
sstring
Return Type¶
string
Examples¶
| Function Call | Result |
|---|---|
sha1("abc") |
"a9993e364706816aba3e25717850c26c9cd0d89d" |
sha256¶
sha256(s)
Description¶
sha256 computes the SHA256 checksum of a string s and returns it in hexadecimal format.
Parameter Types¶
sstring
Return Type¶
string
Examples¶
| Function Call | Result |
|---|---|
sha256("abc") |
"ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad" |
strpos¶
strpos(s, t)
Description¶
strpos returns the index of the first occurrence of t in s (index counting starts at 0) or -1 if it is not found.
Parameter Types¶
sstringtstring
Return Type¶
int
Examples¶
| Function Call | Result |
|---|---|
strpos("high", "ig") |
1 |
substring¶
substring(s, r)
substring(s, from)
substring(s, from, for)
Description¶
substring(s, r) extracts the substring matching regular expression r from s.
See the Go regexp package for details of matching.
substring(s, from, for) returns the for characters of str starting from the from index.
(Index counting starts at 0.)
If for is not given, everything until the end of str is returned.
Which of those behaviors is used depends on the type of the second parameter (int or string).
Parameter Types¶
sstringrstringfromintfor(optional)int
Return Type¶
string
Examples¶
| Function Call | Result |
|---|---|
substring("Thomas", "...$") |
"mas" |
substring("Thomas", 1) |
"homas" |
substring("Thomas", 1, 3) |
"hom" |
Time Functions¶
distance_us¶
distance_us(u, v)
Description¶
distance_us computes the signed temporal distance from u to v in microseconds.
Parameter Types¶
utimestampvtimestamp
Return Type¶
int
Examples¶
| Function Call | Result |
|---|---|
distance_us("2016-02-09T05:40:25.123Z"::timestamp, "2016-02-09T05:41:25.456Z"::timestamp) |
60333000 |
distance_us(clock_timestamp(), clock_timestamp()) |
2 |
clock_timestamp¶
clock_timestamp()
Description¶
clock_timestamp returns the current date and time in UTC.
Return Type¶
timestamp
now¶
now()
Description¶
now returns the date and time in UTC of the point in time when processing of the current tuple started.
In particular and as opposed to clock_timestamp, the timestamp returned by now() does not change during a processing run triggered by the arrival of a tuple.
For example, in
SELECT RSTREAM clock_timestamp() AS a, clock_timestamp() AS b,
now() AS c, now() AS d FROM ...
the values of a and b are most probably different by a very short timespan, but c and d are equal by definition of now().
now cannot be used in an EVAL statement outside of a stream processing context.
Return Type¶
timestamp
Array Functions¶
Other Scalar Functions¶
Aggregate Functions¶
Aggregate functions compute a single result from a set of input values.
It should be noted that except for count, these functions return a NULL value when no rows are selected.
In particular, sum of no rows returns NULL, not zero as one might expect, and array_agg returns NULL rather than an empty array when there are no input rows.
The coalesce function can be used to substitute zero or an empty array for NULL when necessary.
Also note that most aggregate functions ignore singular NULL values in their input, i.e., processing is done as if this row had not been in the input.
(One notable exception is the array_agg function that includes input NULL values in its output.)
array_agg¶
array_agg(x)
Description¶
array_agg returns an array containing all input values, including NULL values.
There is no guarantee on the order of items in the result.
Use the ORDER BY clause to achieve a certain ordering.
Parameter Types¶
x- any
Return Type¶
array
avg¶
avg(x)
Description¶
avg computes the average (arithmetic mean) of all input values.
Parameter Types¶
xintorfloat(mixed types are allowed)
Return Type¶
float
bool_and¶
bool_and(x)
Description¶
bool_and returns true if all input values are true, otherwise false.
Parameter Types¶
xbool
Return Type¶
bool
bool_or¶
bool_or(x)
Description¶
bool_or returns true if at least one input value is true, otherwise false.
Parameter Types¶
xbool
Return Type¶
bool
count¶
count(x)
count(*)
Description¶
count returns the number of input rows for which x is not NULL, or the number of total rows if * is passed.
Parameter Types¶
x- any
Return Type¶
int
json_object_agg¶
json_object_agg(k, v)
Description¶
json_object_agg aggregates pairs of key k and value v as a map.
If both key and value are NULL, the pair is ignored.
If only the value is NULL, it is still added with the corresponding key.
It is an error if only the key is NULL.
It is an error if a key appears multiple times.
A map does not have an ordering, therefore there is no guarantee on the result map ordering, whether or not ORDER BY is used.
Parameter Types¶
kstringv- any
Return Type¶
map
max¶
max(x)
Description¶
max computes the maximum value of all input values.
Parameter Types¶
xintorfloat(mixed types are allowed)
Return Type¶
same as largest input value
median¶
median(x)
Description¶
median computes the median of all input values.
Parameter Types¶
xintorfloat(mixed types are allowed)
Return Type¶
float
min¶
min(x)
Description¶
min computes the minimum value of all input values.
Parameter Types¶
xintorfloat(mixed types are allowed)
Return Type¶
same as smallest input value