Cloudera Enterprise 6.3.x

Flume Kudu Sink

The Flume Kudu sink reads events from a Flume channel and writes them to a Kudu table. If Kudu is installed on the host where the Flume agent runs, the Flume start script detects it and adds the Kudu sink to the Flume classpath, so the sink can be used without any additional environment configuration.

To use the Kudu sink, set the sink type in the Flume configuration to org.apache.kudu.flume.sink.KuduSink.

For more information on the Flume Kudu sink, including all configuration parameters, see the Apache Kudu documentation.

  Important: Flume does not create Kudu tables. To write to a Kudu table using the Flume Kudu sink, create the table in Kudu in advance.
Example:
a1.sinks.k1.type = org.apache.kudu.flume.sink.KuduSink
a1.sinks.k1.masterAddresses = kudu.master.address.example.com
a1.sinks.k1.tableName = mytable
a1.sinks.k1.producer = org.apache.kudu.flume.sink.SimpleKuduOperationsProducer
a1.sinks.k1.batchSize = 1000
a1.sinks.k1.kerberosPrincipal = myflumeprincipal
a1.sinks.k1.kerberosKeytab = myflume.keytab
  Note: Cloudera Manager provides two Flume substitution variables, $KERBEROS_PRINCIPAL and $KERBEROS_KEYTAB, which you can use to set the principal name and the keytab file path, respectively, on each host.
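The sink and producer parameters above are ordinary Flume properties. The following Java sketch is not part of Flume or Kudu; the class and helper names are hypothetical. It parses a configuration like the one above with java.util.Properties and collects the keys under the producer. prefix with the prefix stripped, assuming (as the parameter names suggest) that this is how the sink passes producer settings to the producer. It also shows why the regular-expression examples on this page write \\d: standard properties parsing turns each \\ into a single backslash.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

public class KuduSinkConfigSketch {
    // Parses "key = value" lines with java.util.Properties semantics; among other
    // things, this turns each pair of backslashes in the text into one backslash.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Collects every property whose key starts with prefix, stripping the prefix.
    static Map<String, String> subProperties(Properties props, String prefix) {
        Map<String, String> result = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                result.put(key.substring(prefix.length()), props.getProperty(key));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Configuration text exactly as it would appear in the agent's properties file.
        String config = String.join("\n",
            "a1.sinks.k1.type = org.apache.kudu.flume.sink.KuduSink",
            "a1.sinks.k1.masterAddresses = kudu.master.address.example.com",
            "a1.sinks.k1.tableName = mytable",
            "a1.sinks.k1.producer = org.apache.kudu.flume.sink.RegexpKuduOperationsProducer",
            "a1.sinks.k1.producer.pattern = (?<id>\\\\d+),(?<value>\\\\w+)",
            "a1.sinks.k1.producer.operation = upsert");

        Properties props = parse(config);
        Map<String, String> producerConf = subProperties(props, "a1.sinks.k1.producer.");

        System.out.println("table:    " + props.getProperty("a1.sinks.k1.tableName"));
        System.out.println("producer: " + props.getProperty("a1.sinks.k1.producer"));
        System.out.println("producer settings: " + producerConf);
    }
}
```

Running main prints the producer settings as {operation=upsert, pattern=(?<id>\d+),(?<value>\w+)}: the pattern value has single backslashes, which is the form a Java regular-expression engine expects, assuming the agent configuration is loaded with standard Java properties semantics.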
The Kudu sink has the following four types of producers available by default, which control how Flume events are written into the Kudu table:
  • org.apache.kudu.flume.sink.AvroKuduOperationsProducer

    This is an Avro serializer that generates one operation per event by deserializing the event body as an Avro record and mapping its fields to columns in a Kudu table.

    Example:
    a1.sinks.k1.producer = org.apache.kudu.flume.sink.AvroKuduOperationsProducer
    a1.sinks.k1.producer.operation = upsert
    a1.sinks.k1.producer.schemaPath = /tmp/myschema.json
    

    For more information, see the Apache Kudu documentation.

  • org.apache.kudu.flume.sink.RegexpKuduOperationsProducer

    This is an operations producer that generates one or more Kudu Insert or Upsert operations per Flume event by parsing the event body as text with a custom regular expression. The regular expression uses Java named-capturing groups whose names match column names in the Kudu table, and each captured value is coerced to the type of the column with the same name.

    Example:
    a1.sinks.k1.producer = org.apache.kudu.flume.sink.RegexpKuduOperationsProducer
    a1.sinks.k1.producer.pattern = (?<id>\\d+),(?<value>\\w+)
    a1.sinks.k1.producer.operation = upsert
    a1.sinks.k1.producer.unmatchedRowPolicy = IGNORE
    

    For more information, see the Apache Kudu documentation.

  • org.apache.kudu.flume.sink.SimpleKeyedKuduOperationsProducer

    This is a simple serializer that generates one Insert or Upsert per event by writing the event body into a BINARY column. Each event must carry a header whose name is the key column name and whose value is the key column value. The key column name is configurable, but the column type must be STRING. Multiple key columns are not supported.

    Example:
    a1.sinks.k1.producer = org.apache.kudu.flume.sink.SimpleKeyedKuduOperationsProducer
    a1.sinks.k1.producer.operation = upsert
    a1.sinks.k1.producer.keyColumn = id
    a1.sinks.k1.producer.payloadColumn = value
    

    For more information, see the Apache Kudu documentation.

  • org.apache.kudu.flume.sink.SimpleKuduOperationsProducer

    This is a simple serializer that generates one Insert or Upsert per event by writing the event body into a BINARY column. The event headers are discarded.

    Example:
    a1.sinks.k1.producer = org.apache.kudu.flume.sink.SimpleKuduOperationsProducer
    a1.sinks.k1.producer.operation = upsert
    a1.sinks.k1.producer.payloadColumn = value
    

    For more information, see the Apache Kudu documentation.
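
The behavior described for RegexpKuduOperationsProducer can be illustrated with a small Java sketch built on java.util.regex. It is not the Kudu implementation: the class name, the ColumnType enum, and the toRows helper are hypothetical, and the column types stand in for the schema of the target Kudu table. Each match of the pattern in the event body becomes one row, and each named-capturing group supplies the value for the column of the same name, converted to that column's type. The sketch does not model the unmatchedRowPolicy parameter; a body with no matches simply yields an empty list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexpProducerSketch {
    // Hypothetical stand-in for the column types of the target Kudu table.
    enum ColumnType { INT64, DOUBLE, BOOL, STRING }

    // Converts captured text to a Java value matching the column type.
    static Object coerce(String raw, ColumnType type) {
        switch (type) {
            case INT64:  return Long.parseLong(raw);
            case DOUBLE: return Double.parseDouble(raw);
            case BOOL:   return Boolean.parseBoolean(raw);
            default:     return raw;
        }
    }

    // Produces one row per match of the pattern in the event body; each named
    // group supplies the value for the column with the same name.
    static List<Map<String, Object>> toRows(String body, Pattern pattern,
                                            Map<String, ColumnType> columns) {
        List<Map<String, Object>> rows = new ArrayList<>();
        Matcher m = pattern.matcher(body);
        while (m.find()) {
            Map<String, Object> row = new LinkedHashMap<>();
            for (Map.Entry<String, ColumnType> col : columns.entrySet()) {
                String raw = m.group(col.getKey()); // named-capturing group lookup
                if (raw != null) {
                    row.put(col.getKey(), coerce(raw, col.getValue()));
                }
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Same pattern as the producer.pattern example, after properties unescaping.
        Pattern pattern = Pattern.compile("(?<id>\\d+),(?<value>\\w+)");
        Map<String, ColumnType> columns = new LinkedHashMap<>();
        columns.put("id", ColumnType.INT64);
        columns.put("value", ColumnType.STRING);

        // One event body containing two matches yields two rows.
        System.out.println(toRows("1,alpha 2,beta", pattern, columns));
    }
}
```

With the body 1,alpha 2,beta, main prints [{id=1, value=alpha}, {id=2, value=beta}], that is, two operations for a single event.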

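Similarly, the SimpleKeyedKuduOperationsProducer and SimpleKuduOperationsProducer descriptions can be sketched as mappings from an event to a row. The Event class and the keyedRow and simpleRow helpers below are hypothetical stand-ins, not Flume or Kudu APIs; rows are plain maps from column name to value, and throwing an exception when the key header is missing is the sketch's own choice rather than documented sink behavior.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class SimpleProducersSketch {
    // Hypothetical stand-in for a Flume event: string headers plus a byte[] body.
    static final class Event {
        final Map<String, String> headers;
        final byte[] body;

        Event(Map<String, String> headers, byte[] body) {
            this.headers = headers;
            this.body = body;
        }
    }

    // SimpleKuduOperationsProducer-style mapping: the body goes into the
    // BINARY payload column and the headers are ignored.
    static Map<String, Object> simpleRow(Event e, String payloadColumn) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put(payloadColumn, e.body);
        return row;
    }

    // SimpleKeyedKuduOperationsProducer-style mapping: the header named after
    // the key column supplies the STRING key; the body is the BINARY payload.
    static Map<String, Object> keyedRow(Event e, String keyColumn, String payloadColumn) {
        String key = e.headers.get(keyColumn);
        if (key == null) {
            // The sketch rejects events that lack the key header.
            throw new IllegalArgumentException("missing header: " + keyColumn);
        }
        Map<String, Object> row = new LinkedHashMap<>();
        row.put(keyColumn, key);
        row.put(payloadColumn, e.body);
        return row;
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("id", "user-42");
        Event event = new Event(headers, "hello".getBytes(StandardCharsets.UTF_8));

        // Column names follow the keyColumn and payloadColumn values in the examples above.
        Map<String, Object> keyed = keyedRow(event, "id", "value");
        System.out.println(keyed.get("id") + " -> "
            + new String((byte[]) keyed.get("value"), StandardCharsets.UTF_8));
        System.out.println(simpleRow(event, "value").keySet());
    }
}
```

Here main prints user-42 -> hello for the keyed mapping and [value] for the simple mapping, which keeps only the payload column.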
Page generated August 29, 2019.