🌀 Apache Flume Programming Overview

Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating, and moving large amounts of log data (especially from web servers) to a centralized data store like HDFS or HBase.

Flume is mainly used in Big Data ecosystems for ingesting streaming log data into Hadoop.

✅ Core Use Case

⚙️ Flume Architecture Components

Component   Description
Source      Receives data from external sources (e.g., syslogs, files, etc.)
Channel     Acts as a buffer (e.g., memory or file-based)
Sink        Sends data to a final destination (e.g., HDFS, HBase)

➡ Flow: Source → Channel → Sink
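
To make the flow concrete, here is a minimal plain-Java sketch of the same idea. It is only a toy model, not Flume code: the class FlowSketch and the methods produce and drain are hypothetical names, and a bounded ArrayBlockingQueue stands in for a memory channel. It shows how a channel decouples the source from the sink and how a full buffer pushes back on the source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Toy model of Source -> Channel -> Sink. None of these types are Flume classes.
class FlowSketch {

    // Source side: offer each line to the channel and count how many fit.
    // offer() returns false instead of blocking when the buffer is full.
    static int produce(List<String> lines, BlockingQueue<String> channel) {
        int accepted = 0;
        for (String line : lines) {
            if (channel.offer(line)) {
                accepted++;
            }
        }
        return accepted;
    }

    // Sink side: remove up to batchSize events from the channel and return them.
    static List<String> drain(BlockingQueue<String> channel, int batchSize) {
        List<String> delivered = new ArrayList<>();
        channel.drainTo(delivered, batchSize);
        return delivered;
    }

    public static void main(String[] args) {
        // The channel holds at most 3 events, so the fourth line is rejected.
        BlockingQueue<String> channel = new ArrayBlockingQueue<>(3);
        int accepted = produce(Arrays.asList("a", "b", "c", "d"), channel);
        System.out.println("accepted=" + accepted);             // accepted=3
        System.out.println("delivered=" + drain(channel, 2));   // delivered=[a, b]
        System.out.println("still buffered=" + channel.size()); // still buffered=1
    }
}
```

The source and the sink never call each other directly; each one only touches the channel, which is why either side can be swapped without changing the other.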

📋 Example: Flume Agent Configuration

Create a config file like flume.conf:

agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

# Source: tail log file
agent1.sources.src1.type = exec
agent1.sources.src1.command = tail -F /var/log/syslog

# Channel: memory buffer
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 1000
agent1.channels.ch1.transactionCapacity = 100

# Sink: HDFS
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://localhost:9000/logs/
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.hdfs.writeFormat = Text

# Bind source/channel/sink
agent1.sources.src1.channels = ch1
agent1.sinks.sink1.channel = ch1
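
A common mistake in a config like this is a source or sink bound to a channel name that was never declared. Because the file uses the Java properties format, a small stand-alone check can be sketched with java.util.Properties. WiringCheck and its methods are hypothetical helpers, not part of Flume, and the check covers only the binding keys shown above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;

// Hypothetical helper, not part of Flume: checks source/sink bindings in a config.
class WiringCheck {

    // Splits a space-separated list such as "ch1 ch2"; a missing key gives an empty list.
    static List<String> names(Properties p, String key) {
        String value = p.getProperty(key, "").trim();
        if (value.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(value.split("\\s+"));
    }

    // Returns one message per wiring problem; an empty list means none were found.
    static List<String> check(String configText, String agent) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(configText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Set<String> declared = new HashSet<>(names(p, agent + ".channels"));
        List<String> problems = new ArrayList<>();

        // A source may feed several channels: <agent>.sources.<name>.channels
        for (String src : names(p, agent + ".sources")) {
            List<String> targets = names(p, agent + ".sources." + src + ".channels");
            if (targets.isEmpty()) {
                problems.add("source " + src + " is not bound to any channel");
            }
            for (String ch : targets) {
                if (!declared.contains(ch)) {
                    problems.add("source " + src + " uses undeclared channel " + ch);
                }
            }
        }

        // A sink reads from exactly one channel: <agent>.sinks.<name>.channel
        for (String sink : names(p, agent + ".sinks")) {
            List<String> target = names(p, agent + ".sinks." + sink + ".channel");
            if (target.size() != 1 || !declared.contains(target.get(0))) {
                problems.add("sink " + sink + " must read from one declared channel");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        String config = String.join("\n",
            "agent1.sources = src1",
            "agent1.channels = ch1",
            "agent1.sinks = sink1",
            "agent1.sources.src1.channels = ch1",
            "agent1.sinks.sink1.channel = ch2"); // deliberate typo: ch2 is not declared
        System.out.println(check(config, "agent1"));
    }
}
```

Note the asymmetry the check relies on: a source uses the plural key channels, while a sink uses the singular key channel.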
  

▶️ Running Flume

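Use the following command; the --name value must match the agent name used as the property prefix in the config file (agent1 here):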

flume-ng agent --conf ./conf/ --conf-file flume.conf --name agent1 -Dflume.root.logger=INFO,console
  

🧠 Programming with Flume

Flume itself is configuration-driven, but you can extend or customize it using Java:

Custom Source / Sink in Java

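import org.apache.flume.EventDrivenSource;
import org.apache.flume.source.AbstractSource;

public class MyCustomSource extends AbstractSource implements EventDrivenSource {
    @Override
    public void start() {
        // Begin collecting data here and hand each event to
        // getChannelProcessor().processEvent(event)
        super.start(); // records that the source has started
    }

    @Override
    public void stop() {
        // Release anything opened in start() (threads, sockets, files)
        super.stop(); // records that the source has stopped
    }
}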
  

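Register it in your Flume config by setting the source type to the fully qualified class name (the compiled JAR must be on Flume's classpath, for example under the plugins.d directory):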

agent1.sources.src1.type = com.mycompany.MyCustomSource
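
What usually goes inside start() and stop() of an event-driven source is a background worker that is launched on start and shut down on stop. The sketch below models that lifecycle in plain Java so it runs without Flume on the classpath. Everything in it is an assumption made for illustration: LifecycleSketch is a hypothetical class, the BlockingQueue is a made-up input feed, and the Consumer stands in for the channel processor that a real source would call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Plain-Java stand-in for an event-driven source; none of this is Flume API.
class LifecycleSketch {
    private final BlockingQueue<String> input;  // hypothetical external feed
    private final Consumer<String> handler;     // stands in for the channel processor
    private volatile boolean running;
    private Thread worker;

    LifecycleSketch(BlockingQueue<String> input, Consumer<String> handler) {
        this.input = input;
        this.handler = handler;
    }

    // start() returns quickly; the collecting happens on a background thread.
    synchronized void start() {
        running = true;
        worker = new Thread(() -> {
            while (running) {
                try {
                    String line = input.poll(100, TimeUnit.MILLISECONDS);
                    if (line != null) {
                        handler.accept(line);
                    }
                } catch (InterruptedException e) {
                    return; // stop() interrupted the wait; exit the loop
                }
            }
        }, "source-worker");
        worker.start();
    }

    // stop() signals the worker and waits for it to exit.
    synchronized void stop() {
        running = false;
        if (worker == null) {
            return;
        }
        worker.interrupt();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt
        }
    }

    synchronized boolean isRunning() {
        return worker != null && worker.isAlive();
    }

    // Runs one full start/collect/stop cycle and returns what the handler saw.
    static List<String> collect(List<String> lines) {
        List<String> seen = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(lines.size());
        LifecycleSketch source = new LifecycleSketch(
            new LinkedBlockingQueue<>(lines),
            line -> { seen.add(line); done.countDown(); });
        source.start();
        try {
            done.await(5, TimeUnit.SECONDS); // wait until every line was handled
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        source.stop();
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        System.out.println(collect(Arrays.asList("a", "b", "c")));
    }
}
```

The design point is that start() must not block: it only launches the worker, and stop() is responsible for making sure that worker has really finished before returning.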

🔌 Built-in Sources, Channels, and Sinks

Source         Channel   Sink
exec           memory    hdfs
netcat         file      logger
avro           jdbc      hbase
spooling dir   -         file roll

📦 Installation Steps

  1. Install Java 8+
  2. Download Flume:
    wget https://downloads.apache.org/flume/1.11.0/apache-flume-1.11.0-bin.tar.gz
  3. Extract the archive and change into the new directory:
    tar -xvzf apache-flume-1.11.0-bin.tar.gz
    cd apache-flume-1.11.0-bin
          

🧭 Learning Roadmap for Flume

🟢 Beginner

🟡 Intermediate

🔴 Advanced