Reference¶

BQL Statements¶

CREATE SINK¶

Synopsis¶

CREATE SINK name TYPE type_name [WITH parameter_name = parameter_value [, ...]]

Description¶

CREATE SINK creates a new sink in a topology.

Parameters¶

name: The name of the sink to be created.
type_name: The type name of the sink.
parameter_name: The name of a sink-specific parameter.
parameter_value: The value for a sink-specific parameter.

Sink Parameters¶

The optional WITH clause specifies parameters specific to the sink. See each sink’s documentation to find out parameters it provides.

Examples¶

To create a sink having the name “snk” with no sink-specific parameter:

CREATE SINK snk TYPE fluentd;

To create a sink with sink-specific parameters:

CREATE SINK fluentd TYPE fluentd WITH tag_field = "fluentd_tag";

As you can see, the name of a sink can be same as the type name of a sink.

CREATE SOURCE¶

Synopsis¶

CREATE [PAUSED] SOURCE name TYPE type_name [WITH parameter_name = parameter_value [, ...]]

Description¶

CREATE SOURCE creates a new source in a topology.

Parameters¶

PAUSED: The source is paused when it’s created if this option is used.
name: The name of the source to be created.
type_name: The type name of the source.
parameter_name: The name of a source-specific parameter.
parameter_value: The value for a source-specific parameter.

Source Parameters¶

The optional WITH clause specifies parameters specific to the source. See each source’s documentation to find out parameters it provides.

Notes¶

Some sources stop after emitting all tuples. They can stop even before any subsequent statement is executed. For such sources, specify PAUSED parameter and run RESUME SOURCE after completely setting up a topology so that no tuple emitted from the source will be lost.

Examples¶

To create a source having the name “src” with no source-specific parameter:

CREATE SOURCE src TYPE dropped_tuples;

To create a source with source-specific parameters:

CREATE SOURCE fluentd TYPE fluentd WITH bind = "0.0.0.0:12345",
  tag_field = "my_tag";

As you can see, the name of a source can be same as the type name of a source.

CREATE STATE¶

Synopsis¶

CREATE STATE name TYPE type_name [WITH parameter_name = parameter_value [, ...]]

Description¶

CREATE STATE creates a new UDS (User Defined State) in a topology.

Parameters¶

name: The name of the UDS to be created.
type_name: The type name of the UDS.
parameter_name: The name of a UDS-specific parameter.
parameter_value: The value for a UDS-specific parameter.

UDS Parameters¶

The optional WITH clause specifies parameters specific to the UDS. See each UDS’s documentation to find out parameters it provides.

Examples¶

To create a UDS named “my_uds” with no UDS-specific parameter:

CREATE STATE my_uds TYPE my_uds_type;

To create a UDS with UDS-specific parameters:

CREATE STATE my_ids TYPE snowflake_id WITH machine_id = 1;

CREATE STREAM¶

Synopsis¶

CREATE STREAM name AS select

Description¶

CREATE STREAM creates a new stream (a.k.a a continuous view) in a topology.

Parameters¶

name: The name of the stream to be created.
select: The SELECT statement to generate a stream. select can be any SELECT statement including a statement using UNION ALL.

Examples¶

To create a stream named “strm”:

CREATE STREAM strm AS SELECT RSTREAM * FROM src [RANGE 1 TUPLES];

To create a stream which merges all tuples from multiple streams:

CREATE STREAM strm AS
    SELECT RSTREAM * FROM src1 [RANGE 1 TUPLES]
    UNION ALL
    SELECT RSTREAM * FROM src2 [RANGE 1 TUPLES];

DROP SINK¶

Synopsis¶

DROP SINK name

Description¶

DROP SINK drops a sink that is already created in a topology. The sink can no longer be used after executing the statement.

Parameters¶

name: The name of the sink to be dropped.

Examples¶

To drop a sink having the name “snk”:

DROP SINK snk;

DROP SOURCE¶

Synopsis¶

DROP SOURCE name

Description¶

DROP SOURCE drops a source that is already created in a topology. The source is stopped and removed from a topology. After executing the statement, the source cannot be used.

Parameters¶

name: The name of the source to be dropped.

Examples¶

To drop a source having the name “src”:

DROP SOURCE src;

DROP STATE¶

Synopsis¶

DROP STATE name

Description¶

DROP STATE drops a UDS that is already created in a topology. The UDS can no longer be used after executing the statement.

Note

Even if a uds sink exist for the UDS, the sink will not be dropped when dropping the UDS. The uds sink must be dropped manually.

Parameters¶

name: The name of the UDS to be dropped.

Examples¶

To drop a UDS named “my_uds”:

DROP STATE my_ds;

DROP STREAM¶

Synopsis¶

DROP STREAM name

Description¶

DROP STREAM drops a stream that is already created in a topology. The stream can no longer be used after executing the statement.

Parameters¶

name: The name of the stream to be dropped.

Examples¶

To drop a stream having the name “strm”:

DROP STREAM strm;

INSERT INTO¶

Synopsis¶

INSERT INTO sink FROM stream

Description¶

INSERT INTO inserts tuples from a stream or a source to a sink.

Parameters¶

sink: The name of the sink to which tuples are inserted.
stream: The name of a stream or a source.

Examples¶

To insert tuples into a sink from a source having the name “src”:

INSERT INTO snk FROM src;

LOAD STATE¶

Synopsis¶

LOAD STATE name TYPE type_name [TAG tag]
    [SET set_parameter_name = set_parameter_key]
    [create_if_not_saved]

where create_if_not_saved is:

    OR CREATE IF NOT SAVED
        [WITH create_parameter_name = create_parameter_value]

Description¶

LOAD STATE loads a UDS that is previously saved by SAVE STATE.

LOAD STATE fails if the UDS hasn’t been saved yet. When OR CREATE IF NOT SAVED is specified, LOAD STATE creates a new UDS with the given optional parameters if the UDS hasn’t been saved yet.

LOAD STATE, even with OR CREATE IF NOT SAVED, fails if the UDS doesn’t support the statement.

Parameters¶

name: The name of the UDS to be loaded.
type_name: The type name of the UDS.
tag: The name of the user defined tag for versioning of the saved UDS data. When tag is omitted, “default” is used as the default tag name.
set_parameter_name: The name of a UDS-specific parameter defied for LOAD STATE.
set_parameter_value: The value for a UDS-specific parameter defined for LOAD STATE.
create_parameter_name: The name of a UDS-specific parameter defined for CREATE STATE.
create_parameter_value: The value for a UDS-specific parameter defined for CREATE STATE.

LOAD STATE can have two sets of parameters: set_parameters and create_parameters. set_parameters are used when there’s a saved UDS data having the given tag. On the other hand, create_parameters are used when the UDS hasn’t been saved yet. create_parameters are exactly same as parameters that the UDS defines for CREATE STATE. However, set_parameters are often completely different from create_parameters. Because create_parameters are often saved as a part of the UDS’s information by SAVE STATE, set_parameters doesn’t have to have the same set of parameters defined in create_parameters.

There’re some use-cases that a UDS uses set_parameters:

Customize loading behavior
- When a UDS doesn’t provide a proper versioning of saved data, LOAD STATE may fail to load it due to the format incompatibility. In such a case, it’s difficult to modify saved binary data to have a format version number. Thus, providing a set_parameter specifying the format version number could be the only solution.
Overwrite some saved values of create_parameters

Like create_parameters, set_parameters are specific to each UDS. See the documentation of each UDS to find out parameters it provides.

Examples¶

To load a UDS named “my_uds” and having the type “my_uds_type”:

LOAD STATE my_uds TYPE my_uds_type;

Note that “my_uds” needs to be saved before executing this statement. Otherwise, it fails.

To load a UDS named “my_uds” and having the type “my_uds_type” and assigned a tag “trained”:

LOAD STATE my_uds TYPE my_uds_type TAG trained;

To load a UDS with set_parameters:

LOAD STATE my_uds TYPE my_uds_type TAG trained
    SET force_format_version = "v1";

To load a UDS that hasn’t been saved yet with OR CREATE IF NOT SAVED:

LOAD STATE my_uds TYPE my_uds_type OR CREATE IF NOT SAVED;

To load a UDS that hasn’t been saved yet with OR CREATE IF NOT SAVED with create_parameters:

LOAD STATE my_uds TYPE my_uds_type OR CREATE IF NOT SAVED
    WITH id = 1;

When the UDS hasn’t been saved previously, the statement above falls back into CREATE STATE as follows:

CREATE STATE my_uds TYPE my_uds_type WITH id = 1;

OR CREATE IF NOT SAVED can be used with a tag and set_parameters:

LOAD STATE my_uds TYPE my_uds_type TAG trained SET force_format_version = "v1"
    OR CREATE IF NOT SAVED WITH id = 1;

PAUSE SOURCE¶

Synopsis¶

PAUSE SOURCE name

Description¶

PAUSE SOURCE pauses a running source so that the source stops emitting tuples until executing RESUME SOURCE on it again. Executing PAUSE SOURCE on a paused source doesn’t affect anything.

PAUSE SOURCE fails if the source doesn’t support the statement.

Parameters¶

name: The name of the source to be paused.

Examples¶

To pause a source named “src”:

PAUSE SOURCE src;

RESUME SOURCE¶

Synopsis¶

RESUME SOURCE name

Description¶

RESUME SOURCE resumes a paused source so that the source can start to emit tuples again. Executing RESUME SOURCE on a running source doesn’t affect anything.

RESUME SOURCE fails if the source doesn’t support the statement.

Parameters¶

name: The name of the source to be resumed.

Examples¶

A common use case of RESUME SOURCE to resume a source which is created by CREATE PAUSED SOURCE.

CREATE PAUSED SOURCE src TYPE some_source_type WITH ...parameters...;

-- ... construct a topology connected to src ...

RESUME SOURCE src;

By doing this, no tuple emitted from src will be lost.

REWIND SOURCE¶

Synopsis¶

REWIND SOURCE name

Description¶

REWIND SOURCE rewinds a source so that the source emits tuples from the beginning again.

REWIND SOURCE fails if the source doesn’t support the statement.

Parameters¶

name: The name of the source to be rewound.

Examples¶

To rewind a source named “src”:

REWIND SOURCE src;

SAVE STATE¶

Synopsis¶

SAVE STATE name [TAG tag]

Description¶

SAVE STATE saves a UDS to SensorBee’s storage. The location or the format of the saved UDS depends on a storage that SensorBee uses and is not controllable from BQL.

SAVE STATE fails if the UDS doesn’t support the statement.

Parameters¶

name: The name of the UDS to be saved.
tag: The name of the user defined tag for versioning of the saved UDS data. When tag is omitted, “default” is used as the default tag name.

Examples¶

To save a UDS named “my_uds” without a tag:

SAVE STATE my_uds;

To save a UDS with a tag “trained”:

SAVE STATE my_uds TAG trained;

SELECT¶

Synopsis¶

SELECT emitter {* | expression [AS output_name]} [, ...]
    FROM from_item stream_to_relation_operator
        [AS stream_alias] [, ...]
    [WHERE condition [, ...]]
    [GROUP BY expression [, ...]]
    [HAVING condition [, ...]]
    [UNION ALL select]

where emitter is:

    {RSTREAM | ISTREAM | DSTREAM}

where stream_to_relation_operator is:

    '[' RANGE range_number {TUPLES | SECONDS | MILLISECONDS}
        [, BUFFER SIZE buffer_size]
        [, drop_mode IF FULL]
    ']'

Description¶

SELECT retrieves tuples from one or more streams. The general processing of SELECT is as follows:

Each from_item is converted into a relation (i.e. a window) from a stream. Then, each tuple emitted from a from_item is computed within the window. If more than one element is specified in the FROM clause, they are cross-joined.
When the WHERE clause is specified, condition is evaluated for each set of tuples cross-joined in the FROM clause. Tuples which do not satisfy the condition are eliminated from the output.
If the GROUP BY clause is specified, tuples satisfied the condition in the WHERE clause are combined into groups based on the result of expressions. Then, aggregate functions are performed on each group. When the HAVING clause is given, it eliminates groups that do not satisfy the given condition.
The output tuples are computed using the SELECT output expressions for each tuple or group.
Computed output tuples are converted into a stream using Relation-to-Stream Operators and emitted from the SELECT.
UNION ALL, if present, combines outputs from multiple SELECT. It simply emits all tuples from all SELECT without considering duplicates.

Parameters¶

Emitter¶

emitter controls how a SELECT emits resulting tuples.

RSTREAM: When RSTREAM is specified, all tuples in a relation as a result of processing a newly coming tuple are output. See Relation-to-Stream Operators for more details.
ISTREAM: When ISTREAM is specified, tuples contained in the current relation but not in the previously computed relation are emitted. In other words, tuples that are newly inserted or updated since the previous computation are output. See Relation-to-Stream Operators for more details.
DSTREAM: When DSTREAM is specified, tuples contained in the previously computed relation but not in the current relation are emitted. In other words, tuples in the previous relation that are deleted or updated in the current relation are output. Note that output tuples are from the previous relation so that they have old values. See Relation-to-Stream Operators for more details.

`FROM` Clause¶

The FROM clause specifies one or more source streams for the SELECT and converts those streams into relations using stream to relation operators.

The FROM clause contains following parameters:

from_item

from_item is a source stream which can be either a source, a stream, or a UDSF.

range_number

range_number is a numeric value which specifies how many tuples are in the window. range_number is followed by one of interval types: TUPLES, SECONDS, or MILLISECONDS.

When TUPLES is given, range_number must be a positive integer and the window can contain at most range_number tuples. If a new tuple is inserted into the window having range_number tuples, the oldest tuple is removed. “The oldest tuple” is the tuple that was inserted into the window before any other tuples, not the tuple having the oldest timestamp. The maximum range_number is 1048575 with TUPLES keywords.

When SECONDS or MILLISECONDS is specified, range_number can be a positive number and the difference of the minimum and maximum timestamps of tuples in the window can be at most range_number seconds or milliseconds. If a new tuple is inserted into the window, tuples whose timestamp is range_number seconds or milliseconds earlier than the new tuple’s timestamp are removed. The maximum range_number is 86400 with SECONDS and 86400000 with MILLISECONDS.

buffer_size

buffer_size specifies the size of buffer, or a queue, located between from_item and the SELECT. buffer_size must be an integer and greater than 0. The meximum buffer_size is 131071.

drop_mode

drop_mode controls how a new tuple is inserted into the buffer located between from_item and the SELECT when the buffer is full. drop_mode can be one of the followings:

WAIT
- A new tuple emitted from from_item is blocked until the SELECT consumes at least one tuple.
DROP OLDEST
- The oldest tuple in the buffer is removed and a new tuple is inserted into the buffer. “The oldest tuple” is the tuple that was inserted into the buffer before any other tuples, not the tuple having the oldest timestamp.
DROP NEWEST
- The oldest tuple in the buffer is removed and a new tuple is inserted into the buffer. “The newest tuple” is the tuple that was inserted into the buffer after any other tuples, not the tuple having the newest timestamp.

Note

A buffer is different from a window. A buffer is placed in front of a window. A window gets a new tuple from a buffer and computes a new relation. A buffer is used not to block emitting tuples so that multiple SELECT statements can work concurrently without waiting for their receivers to consume tuples.

stream_alias

stream_alias provides an alias of from_item and it can be referred by the alias in other parts of the SELECT. If the alias is given, the original name is hidden and cannot be used to refer from_item.

Fields of tuples can be referred by <field_name> or <stream>:<field_name> in other clauses and the SELECT list. For example, when the SELECT has FROM strm [RANGE 1 TUPLES] and strm emits {"a":<some value>}, the field a can be referred by a or strm:a. These two forms cannot be mixed in a SELECT statement.

The form <stream>:<field_name> is required when the FROM clause has multiple input streams.

`WHERE` Clause¶

The SELECT can optionally have a WHERE clause. The WHERE clause have a condition. The condition can be any expression that evaluates to a result of type bool. Any tuple that does not satisfy the condition (i.e. the result of the expression is false) will be eliminated from the output.

Operators describes operators that can be used in the condition.

`GROUP BY` Clause¶

The GROUP BY clause is an optional clause and condenses into a single tuple all selected tuples whose expressions specified in GROUP BY clause result in the same value.

expression can be any expression using fields of an input tuple. When there’re multiple expressions in the clause, tuples having the same set of values computed from those expressions are grouped into a single tuple.

When the GROUP BY clause is present, any ungrouped field cannot be used as an output field without aggregate functions. For example, when tuples have 4 fields a, b, c, and d, and the GROUP BY clause has following expressions:

GROUP BY a, b + c

a can only be used as an output field:

SELECT a FROM stream [RANGE 1 TUPLES]
GROUP BY a, b + c;

Other fields need to be specified in aggregate functions:

SELECT a, max(b), min(b + c), avg(c * d) FROM stream [RANGE 1 TUPLES]
GROUP BY a, b + c;

Aggregate functions are evaluated for each group using all tuples in the group.

Note

The GROUP BY clause performs grouping within a window:

SELECT a FROM stream [RANGE 10 TUPLES]
GROUP BY a;

This SELECT computes at most 10 groups of tuples because there’re only 10 tuples in the window.

`HAVING` Clause¶

The HAVING clause is an optional clause and placed after the GROUP BY clause. The HAVING clause has a condition and evaluate it for each group, instead of each tuple. When ungrouped fields are used in the condition, they need to be in aggregate functions:

SELECT a, max(b), min(b + c), avg(c * d) FROM stream [RANGE 1 TUPLES]
GROUP BY a, b + c HAVING min(b + c) > 1 AND avg(c * d) < 10;

In this example, b, c, and d are ungrouped fields and cannot directly specified in the condition.

`SELECT` List¶

The SELECT list, placed between the emitter and the FROM clause, defines the form of the output tuples emitted from the SELECT statement.

Each item in the list can be any expression. Each item (i.e. output field) will have a name. When an expression only consists of a field name, the output name of the expression will be the field name. For example, the output name of strm:price in SELECT RSTREAM strm:price FROM ... will be price, not strm:price. When the expression is a UDF call, the name of the UDF will be used as the name of the output field. For example, the result of count(*) is named as count. If an expression is other than a field name or a UDF call, the output name will be col_n where n is replaced with the number corresponding to n-th expression (counting from 0). The output field name can manually be specified by AS output_name.

When the expression is *, all fields which have not been specified in the SELECT list yet will be included. Output names of those fields will be identical to the original field names.

If an expression results in a map, its output name can be AS *. In such case, all fields of the map is extended to the top level fields. For example, in SELECT RSTREAM a, b AS *, c FROM strm ..., when strm emits tuples having

{
    "a": v1,
    "b": {
        "d": v3,
        "e": v4
    },
    "c": v2,
}

to the SELECT, its output will be

{
    "a": v1,
    "c": v2,
    "d": v3,
    "e": v4
}

When some fields have the same name, only one of them will be included in the result. It is undefined which field will be chosen as a result.

Notes¶

An Emitter and Its Performance¶

There’re some use case specific optimizations of the evaluation of the SELECT and this subsection describes each optimization and its limitation.

Simple Transformation and Filtering¶

Performing a simple per-tuple transformation or filtering over an input stream is a very common task. Therefore, BQL optimizes statements having the following form:

SELECT RSTREAM projection FROM input [RANGE 1 TUPLES] WHERE condition;

Limitations of this optimization are:

There can only be one input stream and its range is [RANGE 1 TUPLES].
The emitter must be RSTREAM.

Evaluation in `WHERE` Clause¶

Each set of tuples cross-joined in the FROM clause is evaluated exactly once in the WHERE clause. Therefore, all functions in the WHERE clause are only called once for each set:

SELECT RSTREAM * FROM stream1 [RANGE 100 TUPLES], stream2 [RANGE 100 TUPLES]
    WHERE random() < 0.2;

In this example, 80% of sets of cross-joined tuples are filtered out and only 20% of sets (around 20 tuples for each input from either stream) are emitted.

Commands¶

build_sensorbee¶

Because SensorBee is written in Go, all its dependencies including plugins generally need to be statically linked, or at least cannot be dynamically loaded at runtime. The build_sensorbee command is provided to support building a custom sensorbee command.

Basic Usage¶

Prepare build.yaml configuration file and run build_sensorbee in the same directory as build.yaml is located at:

$ ls
build.yaml
$ build_sensorbee
sensorbee_main.go
$ ls
build.yaml
sensorbee
sensorbee_main.go

sensorbee is the result executable file and sensorbee_main.go is a Go file generated by build_sensorbee and passed for go build command to build sensorbee.

Configuration¶

build_sensorbee requires a configuration file named build.yaml. The file is written in YAML and has following optional sections:

plugins
commands

`plugins`¶

The plugins section is optional and may have a list of plugins as follows:

plugins:
  - github.com/sensorbee/twitter/plugin
  - github.com/sensorbee/fluentd/plugin
  - github.com/sensorbee/nlp/plugin
  - github.com/sensorbee/tutorial/ml/plugin
  - github.com/sensorbee/jubatus/classifier/plugin

A plugin must be provided as a valid import path of Go. A path depends on each plugin.

`commands`¶

The commands section is optional and is used to customize subcommands that the sensorbee command will have. By default, or when the section is empty, subcommands include all standard commands:

run
runfile
shell
topology

commands section is a map whose key is the name of subcommand. Standard subcommands like run can be added by providing empty entries:

commands:
  run:
  shell:

With this configuration, the sensorbee command will only have run and shell commands.

To add a custom command, an entry must have path parameter that is a Go import path of the command:

commands:
  run:
  shell:
  mytest:
    path: "path/to/sb-mytest"

With this configuration, the sensorbee command will also have the mytest subcommand. The subcommand is implemented at path/to/sb-mytest.

Names of commands must be unique and cannot be any of:

cli
os
version
time

Prohibited names might be added in the future version.

Custom Subcommand Development¶

A custom subcommand for the sensorbee command can be developed as a Go package. Only thing the package has to do is to provide a function func SetUp() cli.Command. cli is gopkg.in/urfave/cli.v1. A minimum example is provided in the SensorBee tutorial repository:

package hello

import (
    "fmt"

    cli "gopkg.in/urfave/cli.v1"
)

func SetUp() cli.Command {
    return cli.Command{
        Name:  "hello",
        Usage: "say hello",
        Action: func(c *cli.Context) error {
            fmt.Println("hello")
            return nil
        },
    }
}

This command prints “hello” when sensorbee hello is executed. See https://github.com/urfave/cli to learn how to create a command using the cli library.

A Complete Example¶

plugins:
  - github.com/sensorbee/twitter/plugin
  - github.com/sensorbee/fluentd/plugin
  - github.com/sensorbee/nlp/plugin
  - github.com/sensorbee/tutorial/ml/plugin
  - github.com/sensorbee/jubatus/classifier/plugin

commands:
  run:
  runfile:
  shell:
  topology:
  hello:
    path: "github.com/sensorbee/tutorial/custom_command/hello"

Flags and Options¶

--config path or -c path

This option specifies the path to the configuration file to be used. Its default value is build.yaml. With this option, a configuration file in another directory can be used as follows:
$ build_sensorbee -c /path/to/dir/special_build.yaml

--download-plugins={true|false}

This option have to be true or false. When the value is true, build_sensorbee downloads (i.e. go get) all plugins listed in build.yaml. When it’s false, build_sensorbee doesn’t download plugins and tries to used plugins as installed in the environment. The default value is true.

Specifying false is useful when the custom sensorbee command needs to depend on a plugin that is in a special git branch or locally modified.

--help or -h

When this flag is given, the command shows the usage of itself and exits without doing anything.

--only-generate-source

When this flag is given, build_sensorbee doesn’t build a sensorbee command but only generate a source code that can be built by go build command. For example:
$ build_sensorbee --only-generate-source
sensorbee_main.go
$ go build -o sensorbee sensorbee_main.go

--out executable_name or -o executable_name

This option customizes the name of the output executable file. The default is sensorbee.exe in Windows and sensorbee in all other environment. The following command generates an executable named my_sensorbee instead of sensorbee:
$ build_sensorbee -o my_sensorbee

--source-filename filename

The filename of the Go source code file automatically generated by build_sensorbee can be specified by this option. The default value is sensorbee_main.go.
$ build_sensorbee --source-filename custom_main.go
By executing this command, custom_main.go is generated instead of sensorbee_main.go.

--version or -v

When this flag is given, the command prints the version of the build_sensorbee command.

sensorbee¶

sensorbee is the main command to manipulate SensorBee. sensorbee consists of a set of following subcommands:

sensorbee run
sensorbee runfile
sensorbee shell or bql
sensorbee topology

sensorbee command can needs to be created by build_sensorbee command and all the example commands are written as ./sensorbee to emphasize that there’s no default sensorbee command.

See each command’s reference for details.

Flags and Options¶

--help or -h

When this flag is given, the command prints the usage of itself.

--version or -v

The command prints the version of SensorBee.

sensorbee run¶

sensorbee run runs the SensorBee server that manages multiple topologies that can dynamically modified at runtime.

Basic Usage¶

$ ./sensorbee run -c sensorbee.yaml

Configuration¶

A configuration file can optionally be provided for sensorbee run command. The file is written in YAML and has following optional sections:

logging
network
storage
topologies

`logging`¶

The logging section customizes behavior of the logger. It has following optional parameters:

target

The target parameter changes the destination of log messages. Its value can be of followings:

stdout: write log messages to stdout

stderr: write log messages to stderr

file path: a path to the log file

When the value is neither stdout nor stderr, it’s considered to be a file path. The default value of this parameter is stderr.

min_log_level

This option specifies the minimum level (severity) of log messages to be written. Valid values are one of debug, info, warn, warning, error, or fatal. warn can also be warning. When debug is given, all levels of messages will be written into the log. When the value is error, only log messages with error or fatal level will be written. The default value of this parameter is info.

log_dropped_tuples

The SensorBee server can prints a log message and contents of tuples when they’re dropped from a topology. When this option is true, the server writes log messages reporting dropped tuples. When it’s false, the server doesn’t. The default value of this option is false.

log_destinationless_tuples

A destinationless tuple is one kind of dropped tuples that is caused when a source or a stream doesn’t have any destination and it drops a tuple. By setting true to this option, the server reports all destinationless tuples. The default value of this option is false. Note that, to log replication tuples, the log_dropped_tuples option also needs to be true.

summarize_dropped_tuples

This option turns on or off summarization of dropped tuple logging. Valid values for this option is true or false. When its value is the true, dropped tuples are summarized in log messages.

Note

At the current version, only blob fields are summarized to "(blob)". Other configuration parameters will be supported in the future version such as the maximum number of fields, the maximum depths of maps, the maximum length of arrays, and so on.

When false is specified, each log message shows a complete JSON that are compatible to the original tuple. Although this is useful for debugging, tuples containing large binary data like images may result in disk.

The default value of this option is false.

Example:

logging:
  target: /path/to/sensorbee.log
  min_log_level: info
  log_dropped_tuples: true
  summarize_dropped_tuples: true

`network`¶

The network section has parameters related to server’s network configuration. It has following optional parameters:

listen_on

This parameter controls how the server expose its listening port. The syntax of the value is like host:port. host can be IP addresses such as 0.0.0.0 or 127.0.0.1. When host is given, the server only listens on the interface with the given host address. If the host is omitted, the server listens on all available interfaces, that is, the server accepts connections from any host. The default value of this parameter is :15601.

Example:

network:
  listen_on: ":15601"

`storage`¶

The storage section contains the configuration of storages used for saving UDSs or other information. It has following optional subsections:

uds

`uds`¶

The uds subsection configures the storage for saving and loading UDSs. It provides following optional parameters:

type

The type name of the storage. in_memory is used as the default value.

params

params has subparameter specific to the given type.

Currently, following types are available:

in_memory
fs

Descriptions of types and parameters are provided below:

in_memory

in_memory saves UDSs in memory. It loses all saved data when the server restarts. This type doesn’t have any parameter.

Example:
storage:
  uds:
    type: in_memory

fs

fs saves UDSs in the local file system. It has following required parameters:

dir

dir has the path to the directory that saved data will be stored.

fs also has following optional parameters:

temp_dir

temp_dir has the path to the temporary directory that is used when the UDS writes data. After the UDS has written all the data, the file is move to the directory specified by dir parameter. The same value as dir is used by default.

The file name of each saved UDS is formatted as <topology>-<name>-<tag>.state.

Example:
storage:
  uds:
    type: fs
    params:
      dir: /path/to/uds_dir
      temp_dir: /tmp

`topologies`¶

The topologies section contains the configuration of topologies in the following format:

topologies:
  name_of_topology1:
    ... configuration for name_of_topology1 ...
  name_of_topology2:
    ... configuration for name_of_topology2 ...
  name_of_topology3:
    ... configuration for name_of_topology3 ...
  ... other topologies ...

Topologies listed in this section will be created at the startup of the server based on the sub-configuration of each topology. Following optional configuration parameters are provided for each topology:

bql_file

This parameter has the path to the file containing BQL statements for the topology. All statements are executed before the server gets ready. If the execution fails, the server would exit with an error.

Example:

$ ls
my_topology.bql
sensorbee.yaml
$ cat my_topology.bql
CREATE SOURCE fluentd TYPE fluentd;
CREATE STREAM users AS
    SELECT RSTREAM users FROM fluentd [RANGE 1 TUPLES];
CREATE SINK user_file TYPE file WITH path = "users.jsonl";
$ cat sensorbee.yaml
topologies:
  my_topology:
    bql_file: my_topology.bql
$ ./sensorbee run -c sensorbee.yaml

As a result of these commands above, the server started with sensorbee.yaml has a topology named my_topology. The topology has three nodes: fluentd, users, and user_file.

Note

This is the only way to persist the configuration of topologies at the moment. Any updates applied at runtime will not be reflected into the bql file. For example, if the server restarts after creating a new stream in my_topology, the new stream will be lost unless it’s explicitly added to my_topology.bql manually.

The configuration of a topology can be empty:

topologies:
  my_empty_topology:

In this case, an empty topology my_empty_topology will be created so that the sensorbee topology create command doesn’t have to be executed every time the server restarts.

A Complete Example¶

logging:
  target: /path/to/sensorbee.log
  min_log_level: info
  log_dropped_tuples: true
  summarize_dropped_tuples: true

network:
  listen_on: ":15601"

storage:
  uds:
    type: fs
    params:
      dir: /path/to/uds_dir
      temp_dir: /tmp

topologies:
  empty_topology:
  my_topology:
    bql_file: /path/to/my_topology.bql

Flags and Options¶

--config path or -c path

This option receives the path of the configuration file. By default, the value is empty and no configuration file is used. This value can also be passed through SENSORBEE_CONFIG environment variable.

--help or -h

When this flag is given, the command prints the usage of itself.

sensorbee runfile¶

sensorbee runfile runs a single BQL file. This command is mainly designed for offline data processing but can be used as a standalone SensorBee process that doesn’t expose any interface to manipulate the topology.

sensorbee runfile stops after all the nodes created by the given BQL file stops. The command doesn’t stop if it contains a source that generates infinite tuples or is rewindable. Other non-rewindable sources such as file stopping when it emits all tuples written in a file can work well with the command.

Sources generally need to be created with PAUSED keyword in the CREATE SOURCE statement. Without PAUSED, a source can start emitting tuples before all nodes in a topology can correctly be set up. Therefore, a BQL file passed to the command should look like:

CREATE PAUSED SOURCE source_1 TYPE ...;
CREATE PAUSED SOURCE source_2 TYPE ...;
...
CREATE PAUSED SOURCE source_n TYPE ...;

... CREATE STREAM, CREATE SINK, or other statements

RESUME SOURCE source_1;
RESUME SOURCE source_2;
...
RESUME SOURCE source_n;

With the --save-uds option described later, it saves UDSs at the end of its execution.

Basic Usage¶

$ ./sensorbee runfile my_topology.bql

With options:

$ ./sensorbee runfile -c sensorbee.yaml -s '' my_topology.bql

Configuration¶

sensorbee runfile accepts the configuration file for sensorbee run. It only uses logging and storage sections. The configuration file may contain other sections as well and the same file for sensorbee run can also be used for sensorbee runfile. See its configuration for details.

Flags and Options¶

--config path or -c path

This option receives the path of the configuration file. By default, the value is empty and no configuration file is used. This value can also be passed through SENSORBEE_CONFIG environment variable.

--help or -h

When this flag is given, the command prints the usage of itself.

--save-uds udss or -s udss

This option receives a list of names of UDSs separated by commas. UDSs listed in it will be saved at the end of execution. For example, when the option is -s "a,b,c", UDSs named a, b, and c will be saved. To save all UDSs in a topology, pass an empty string: -s "".

By default, all UDSs will not be saved at the end of execution.

--topology name or -t name

This option changes the name of the topology to be run with the given BQL file. The default name is taken from the file name of the BQL file. The name specified to this option will be used in log messages or saved UDS data. Especially, names of files containing saved UDS data has contains the name of the topology. Therefore, providing the same name as the topology that will be run by sensorbee run later on allows users to prepare UDSs including pre-trained machine learning models in advance.

sensorbee shell or bql¶

sensorbee shell or bql starts a new shell to manipulate the SensorBee server. The shell can be terminated by writing exit or typing C-d.

Both sensorbee shell and bql have the same interface, but bql is installed by default while the sensorbee command needs to be built manually to run sensorbee shell.

Basic Usage¶

To run sensorbee shell,

$ ./sensorbee shell -t my_topology
my_topology>

To run bql,

$ bql -t my_topology
my_topology>

Flags and options¶

--api-version version

This option changes the API version of the SensorBee server. The default value of this option is v1.

--help or -h

When this flag is given, the command prints the usage of itself.

--topology name or -t name

The name of a topology to be manipulated can be specified through this option so that USE topology_name doesn’t have to be used in the shell. The default value is an empty name, that is, no topology is specified.

--uri

This option is used when the SensorBee server is running at non-localhost or using non-default port number (15601). The value should have a format like http://host:port/. The default value of this option is http://localhost:15601/.

sensorbee topology¶

sensorbee topology, or sensorbee t, is used to manipulate topologies on the SensorBee server.

Note

This command is provided because the syntax of BQL statements that controls topologies has not been discussed enough yet.

The command consists of following subcommands:

sensorbee topology create <name> or sensorbee t c <name>

This command creates a new topology on the SensorBee server. The <name> argument is the name of the topology to be created. $? will be 0 if the command is successful. Otherwise, it’ll be non-zero. The command fails if the topology already exists on the server.

sensorbee topology drop <name> or sensorbee t drop <name>

This command drops an existing topology on the SensorBee server. The <name> argument is the name of the topology to be dropped. $? will be 0 if the command is successful. Otherwise, it’ll be non-zero. The command doesn’t fail even if the topology doesn’t exist on the server.

sensorbee topology list or sensorbee t l

This commands prints names of all topologies that the SensorBee server has, one name per line.

All commands share the same flags and options. Flags and options need to be given after the subcommand name:

$ ./sensorbee topology create --flag --option value my_topology

In this example, a flag --flag and an option --option value are provided. The argument of the command, i.e. the name of topology, is my_topology.

Flags and Options¶

--api-version version

This option changes the API version of the SensorBee server. The default value of this option is v1.

--help or -h

When this flag is given, the command prints the usage of itself.

--uri

This option is used when the SensorBee server is running at non-localhost or using non-default port number (15601). The value should have a format like http://host:port/. The default value of this option is http://localhost:15601/.

Function Reference¶

Common Mathematical Functions¶

For the functions below, if a given parameter is outside the mathematically valid range for that function (e.g., sqrt(-2), log(0), div(2.0, 0.0)) and the return type is float, then NaN is returned. However, if the return type is int (e.g., div(2, 0)), there is no NaN option and an error will occur instead.

`abs`¶

abs(x)

Description¶

abs computes the absolute value of a number.

Parameter Types¶

x: int or float

Return Type¶

same as input

Examples¶

Function Call	Result
`abs(-17.4)`	`17.4`

`cbrt`¶

cbrt(x)

Description¶

cbrt computes the cube root of a number.

Parameter Types¶

x: int or float

Return Type¶

float

Examples¶

Function Call	Result
`cbrt(27.0)`	`3.0`
`cbrt(-3)`	`-1.4422495703074083`

`ceil`¶

ceil(x)

Description¶

ceil computes the smallest integer not less than its argument.

Parameter Types¶

x: int or float

Return Type¶

same as input

The return type is float for float input in order to avoid problems with input values that are too large for the int data type.

Examples¶

Function Call	Result
`ceil(1.3)`	`2.0`
`ceil(-1.7)`	`-1.0`

`degrees`¶

degrees(x)

Description¶

degrees converts radians to degrees.

Parameter Types¶

x: int or float

Return Type¶

float

Examples¶

Function Call	Result
`degrees(3.141592653589793)`	`180.0`

`div`¶

div(y, x)

Description¶

div computes the integer quotient y/x of two numbers y and x. If x is 0.0 (float) then NaN will be returned; if it is 0 (integer) then a runtime error will occur.

Parameter Types¶

y: int or float
x: same as y

Return Type¶

same as input

Examples¶

Function Call	Result
`div(9, 4)`	`2`
`div(9.3, 4.5)`	`2.0`

`exp`¶

exp(x)

Description¶

exp computes the exponential of a number.

Parameter Types¶

x: int or float

Return Type¶

float

Examples¶

Function Call	Result
`exp(1.0)`	`2.718281828459045`

`floor`¶

floor(x)

Description¶

floor computes the largest integer not greater than its argument.

Parameter Types¶

x: int or float

Return Type¶

same as input

The return type is float for float input in order to avoid problems with input values that are too large for the int data type.

Examples¶

Function Call	Result
`floor(1.3)`	`1.0`
`floor(-1.7)`	`-2.0`

`ln`¶

ln(x)

Description¶

ln computes the natural logarithm of a number. If the parameter is not strictly positive, NaN is returned.

Parameter Types¶

x: int or float

Return Type¶

float

Examples¶

Function Call	Result
`ln(2)`	`0.6931471805599453`

`log`¶

log(x)
log(b, x)

Description¶

log computes the logarithm of a number x to base b (default: 10).

Parameter Types¶

x: int or float
b (optional): same as x

Return Type¶

float

Examples¶

Function Call	Result
`log(100)`	`2.0`
`log(2.5, 6.25)`	`2.0`
`log(2, 8)`	`3.0`

`mod`¶

mod(y, x)

Description¶

mod computes the remainder of integer division y/x of two numbers y and x. If x is 0.0 (float) then NaN will be returned; if it is 0 (integer) then a runtime error will occur.

Parameter Types¶

y: int or float
x: same as y

Return Type¶

same as input

Examples¶

Function Call	Result
`mod(9, 4)`	`1`
`mod(9.3, 4.5)`	`0.3`

`pi`¶

pi()

Description¶

pi returns the π constant (more or less 3.14).

Return Type¶

float

Examples¶

Function Call	Result
`pi()`	`3.141592653589793`

`power`¶

power(a, b)

Description¶

power computes a raised to the power of b.

Parameter Types¶

a: int or float
b: same as a

Return Type¶

float

The return type is float even for integer input in order to have a uniform behavior for cases such as power(2, -2).

Examples¶

Function Call	Result
`power(9.0, 3.0)`	`729.0`
`power(2, -1)`	`0.5`

`radians`¶

radians(x)

Description¶

radians converts degrees to radians.

Parameter Types¶

x: int or float

Return Type¶

float

Examples¶

Function Call	Result
`radians(180)`	`3.141592653589793`

`round`¶

round(x)

Description¶

round computes the nearest integer of a number.

Parameter Types¶

x: int or float

Return Type¶

same as input

The return type is float for float input in order to avoid problems with input values that are too large for the int data type.

Examples¶

Function Call	Result
`round(1.3)`	`1.0`
`round(0.5)`	`1.0`
`round(-1.7)`	`-2.0`

`sign`¶

sign(x)

Description¶

sign returns the sign of a number: 1 for positive numbers, -1 for negative numbers and 0 for zero.

Parameter Types¶

x: int or float

Return Type¶

int

Examples¶

Function Call	Result
`sign(2)`	`1`

`sqrt`¶

sqrt(x)

Description¶

sqrt computes the square root of a number. If the parameter is negative, NaN is returned.

Parameter Types¶

x: int or float

Return Type¶

float

Examples¶

Function Call	Result
`sqrt(2)`	`1.4142135623730951`

`trunc`¶

trunc(x)

Description¶

trunc computes the truncated integer (towards zero) of a number.

Parameter Types¶

x: int or float

Return Type¶

same as input

The return type is float for float input in order to avoid problems with input values that are too large for the int data type.

Examples¶

Function Call	Result
`trunc(1.3)`	`1.0`
`trunc(-1.7)`	`-1.0`

`width_bucket`¶

width_bucket(x, left, right, count)

Description¶

width_bucket computes the bucket to which x would be assigned in an equidepth histogram with count buckets in the range $[\text{left},\text{right}[$. Points on a bucket border belong to the right bucket. Points outside of the $[\text{left},\text{right}[$ range have bucket number $0$ and $\text{count}+1$, respectively.

Parameter Types¶

x: int or float
left: int or float
right: int or float
count: int

Return Type¶

int

Examples¶

Function Call	Result
`width_bucket(5, 0, 10, 5)`	`3`

Pseudo-Random Functions¶

The characteristics of the functions below are equal to those from the Go rand module. They are not suitable for cryptographic applications.

`random`¶

random()

Description¶

random returns a pseudo-random number in the range $0.0 <= x < 1.0$.

This function is not safe for use in cryptographic applications. See the Go math/rand package for details.

Return Type¶

float

Examples¶

Function Call	Result
`random()`	`0.6046602879796196`

`setseed`¶

setseed(x)

Description¶

setseed initializes the seed for subsequent random() calls. The parameter must be in the range $-1.0 <= x <= 1.0$.

This function is not safe for use in cryptographic applications. See the Go math/rand package for details.

Parameter Types¶

x: float

Trigonometric Functions¶

All trigonometric functions take arguments and return values of type float. Trigonometric functions arguments are expressed in radians. Inverse functions return values are expressed in radians.

`acos`¶

acos(x)

Description¶

acos computes the inverse cosine of a number.

`asin`¶

asin(x)

Description¶

asin computes the inverse sine of a number.

`atan`¶

atan(x)

Description¶

atan computes the inverse tangent of a number.

`cos`¶

cos(x)

Description¶

cos computes the cosine of a number.

`cot`¶

cot(x)

Description¶

cot computes the cotangent of a number.

`sin`¶

sin(x)

Description¶

sin computes the sine of a number.

`tan`¶

tan(x)

Description¶

tan computes the tangent of a number.

String Functions¶

`bit_length`¶

bit_length(s)

Description¶

bit_length computes the number of bits in a string s. Note that due to UTF-8 encoding, this is equal to octet_length(s) * 8, not necessarily char_length(s) * 8.

Parameter Types¶

s: string

Return Type¶

int

Examples¶

Function Call	Result
`bit_length("über")`	`40`

`btrim`¶

btrim(s)
btrim(s, chars)

Description¶

btrim removes the longest string consisting only of characters in chars (default: whitespace) from the start and end of s.

Parameter Types¶

s: string
chars (optional): string

Return Type¶

string

Examples¶

Function Call	Result
`btrim(" trim ")`	`"trim"`
`btrim("xyxtrimyyx", "xy")`	`"trim"`

`char_length`¶

char_length(s)

Description¶

char_length computes the number of characters in a string.

Parameter Types¶

s: string

Return Type¶

int

Examples¶

Function Call	Result
`char_length("über")`	`4`

`concat`¶

concat(s [, ...])

Description¶

concat concatenates all strings given as input arguments. NULL values are ignored, i.e., treated like an empty string.

Parameter Types¶

s and all subsequent parameters: string

Return Type¶

string

Examples¶

Function Call	Result
`concat("abc", NULL, "22")`	`"abc22"`

`concat_ws`¶

concat_ws(sep, s [, ...])

Description¶

concat_ws concatenates all strings given as input arguments s using the separator sep. NULL values are ignored.

Parameter Types¶

sep: string
s and all subsequent parameters: string

Return Type¶

string

Examples¶

Function Call	Result
`concat_ws(":", "abc", NULL, "22")`	`"abc:22"`

`format`¶

format(s, [x, ...])

Description¶

format formats a variable number of arguments x according to a format string s.

See the Go package fmt for details of what formatting codes are allowed.

Parameter Types¶

s: string
x and all subsequent parameters (optional): any

Return Type¶

string

Examples¶

Function Call	Result
`format("%s-%d", "abc", 22)`	`"abc-22"`

`lower`¶

lower(s)

Description¶

lower converts a string s to lower case. Non-ASCII Unicode characters are mapped to their lower case, too.

Parameter Types¶

s: string

Return Type¶

string

Examples¶

Function Call	Result
`lower("ÜBer")`	`"über"`

`ltrim`¶

ltrim(s)
ltrim(s, chars)

Description¶

ltrim removes the longest string consisting only of characters in chars (default: whitespace) from the start of s.

Parameter Types¶

s: string
chars (optional): string

Return Type¶

string

Examples¶

Function Call	Result
`ltrim(" trim ")`	`"trim "`
`ltrim("xyxtrimyyx", "xy")`	`"trimyyx"`

`md5`¶

md5(s)

Description¶

md5 computes the MD5 checksum of a string s and returns it in hexadecimal format.

Parameter Types¶

s: string

Return Type¶

string

Examples¶

Function Call	Result
`md5("abc")`	`"900150983cd24fb0d6963f7d28e17f72"`

`octet_length`¶

octet_length(s)

Description¶

octet_length computes the number of bytes in a string s. Note that due to UTF-8 encoding, this may differ from the number returned by char_length.

Parameter Types¶

s: string

Return Type¶

int

Examples¶

Function Call	Result
`octet_length("über")`	`5`

`overlay`¶

overlay(s, repl, from)
overlay(s, repl, from, for)

Description¶

overlay replaces for characters in a string s with the string repl, starting at from. (Index counting starts at 0.) If for is not given, the length of repl is used as a default.

Parameter Types¶

s: string
repl: string
from: int
for (optional): int

Return Type¶

string

Examples¶

Function Call	Result
`overlay("Txxxxas", "hom", 1)`	`"Thomxas"`
`overlay("Txxxxas", "hom", 1, 4)`	`"Thomas"`

`rtrim`¶

rtrim(s)
rtrim(s, chars)

Description¶

rtrim removes the longest string consisting only of characters in chars (default: whitespace) from the end of s.

Parameter Types¶

s: string
chars (optional): string

Return Type¶

string

Examples¶

Function Call	Result
`rtrim(" trim ")`	`" trim"`
`rtrim("xyxtrimyyx", "xy")`	`"xyxtrim"`

`sha1`¶

sha1(s)

Description¶

sha1 computes the SHA1 checksum of a string s and returns it in hexadecimal format.

Parameter Types¶

s: string

Return Type¶

string

Examples¶

Function Call	Result
`sha1("abc")`	`"a9993e364706816aba3e25717850c26c9cd0d89d"`

`sha256`¶

sha256(s)

Description¶

sha256 computes the SHA256 checksum of a string s and returns it in hexadecimal format.

Parameter Types¶

s: string

Return Type¶

string

Examples¶

Function Call	Result
`sha256("abc")`	`"ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"`

`strpos`¶

strpos(s, t)

Description¶

strpos returns the index of the first occurrence of t in s (index counting starts at 0) or -1 if it is not found.

Parameter Types¶

s: string
t: string

Return Type¶

int

Examples¶

Function Call	Result
`strpos("high", "ig")`	`1`

`substring`¶

substring(s, r)
substring(s, from)
substring(s, from, for)

Description¶

substring(s, r) extracts the substring matching regular expression r from s. See the Go regexp package for details of matching.

substring(s, from, for) returns the for characters of str starting from the from index. (Index counting starts at 0.) If for is not given, everything until the end of str is returned.

Which of those behaviors is used depends on the type of the second parameter (int or string).

Parameter Types¶

s: string
r: string
from: int
for (optional): int

Return Type¶

string

Examples¶

Function Call	Result
`substring("Thomas", "...$")`	`"mas"`
`substring("Thomas", 1)`	`"homas"`
`substring("Thomas", 1, 3)`	`"hom"`

`upper`¶

upper(s)

Description¶

upper converts a string s to upper case. Non-ASCII Unicode characters are mapped to their upper case, too.

Parameter Types¶

s: string

Return Type¶

string

Examples¶

Function Call	Result
`upper("ÜBer")`	`"ÜBER"`

Time Functions¶

`distance_us`¶

distance_us(u, v)

Description¶

distance_us computes the signed temporal distance from u to v in microseconds.

Parameter Types¶

u: timestamp
v: timestamp

Return Type¶

int

Examples¶

Function Call	Result
`distance_us("2016-02-09T05:40:25.123Z"::timestamp, "2016-02-09T05:41:25.456Z"::timestamp)`	`60333000`
`distance_us(clock_timestamp(), clock_timestamp())`	`2`

`clock_timestamp`¶

clock_timestamp()

Description¶

clock_timestamp returns the current date and time in UTC.

Return Type¶

timestamp

`now`¶

now()

Description¶

now returns the date and time in UTC of the point in time when processing of the current tuple started. In particular and as opposed to clock_timestamp, the timestamp returned by now() does not change during a processing run triggered by the arrival of a tuple. For example, in

SELECT RSTREAM clock_timestamp() AS a, clock_timestamp() AS b,
    now() AS c, now() AS d FROM ...

the values of a and b are most probably different by a very short timespan, but c and d are equal by definition of now().

now cannot be used in an EVAL statement outside of a stream processing context.

Return Type¶

timestamp

Array Functions¶

`array_length`¶

array_length(a)

Description¶

array_length computes the number of elements in an array a. Elements with a NULL value are also counted.

Parameter Types¶

a: array

Return Type¶

int

Examples¶

Function Call	Result
`array_length([3, NULL, "foo"])`	`3`

Other Scalar Functions¶

`coalesce`¶

coalesce(x [, ...])

Description¶

coalesce returns the first non-null input parameter or NULL if there is no such parameter.

Parameter Types¶

x and all subsequent: any

Return Type¶

same as input

Examples¶

Function Call	Result
`coalesce(NULL, 17, "foo")`	`17`

Aggregate Functions¶

Aggregate functions compute a single result from a set of input values. It should be noted that except for count, these functions return a NULL value when no rows are selected. In particular, sum of no rows returns NULL, not zero as one might expect, and array_agg returns NULL rather than an empty array when there are no input rows. The coalesce function can be used to substitute zero or an empty array for NULL when necessary.

Also note that most aggregate functions ignore singular NULL values in their input, i.e., processing is done as if this row had not been in the input. (One notable exception is the array_agg function that includes input NULL values in its output.)

`array_agg`¶

array_agg(x)

Description¶

array_agg returns an array containing all input values, including NULL values. There is no guarantee on the order of items in the result. Use the ORDER BY clause to achieve a certain ordering.

Parameter Types¶

x: any

Return Type¶

array

`avg`¶

avg(x)

Description¶

avg computes the average (arithmetic mean) of all input values.

Parameter Types¶

x: int or float (mixed types are allowed)

Return Type¶

float

`bool_and`¶

bool_and(x)

Description¶

bool_and returns true if all input values are true, otherwise false.

Parameter Types¶

x: bool

Return Type¶

bool

`bool_or`¶

bool_or(x)

Description¶

bool_or returns true if at least one input value is true, otherwise false.

Parameter Types¶

x: bool

Return Type¶

bool

`count`¶

count(x)
count(*)

Description¶

count returns the number of input rows for which x is not NULL, or the number of total rows if * is passed.

Parameter Types¶

x: any

Return Type¶

int

`json_object_agg`¶

json_object_agg(k, v)

Description¶

json_object_agg aggregates pairs of key k and value v as a map. If both key and value are NULL, the pair is ignored. If only the value is NULL, it is still added with the corresponding key. It is an error if only the key is NULL. It is an error if a key appears multiple times.

A map does not have an ordering, therefore there is no guarantee on the result map ordering, whether or not ORDER BY is used.

Parameter Types¶

k: string
v: any

Return Type¶

map

`max`¶

max(x)

Description¶

max computes the maximum value of all input values.

Parameter Types¶

x: int or float (mixed types are allowed)

Return Type¶

same as largest input value

`median`¶

median(x)

Description¶

median computes the median of all input values.

Parameter Types¶

x: int or float (mixed types are allowed)

Return Type¶

float

`min`¶

min(x)

Description¶

min computes the minimum value of all input values.

Parameter Types¶

x: int or float (mixed types are allowed)

Return Type¶

same as smallest input value

`string_agg`¶

string_agg(x, sep)

Description¶

string_agg returns a string with all values of x concatenated, separated by the (non-aggregate) sep parameter.

Parameter Types¶

x: string
sep: string (scalar)

Return Type¶

string

`sum`¶

sum(x)

Description¶

sum computes the sum of all input values.

Parameter Types¶

x: int or float (mixed types are allowed)

Return Type¶

float if the input contains a float, int otherwise