{"id":79565,"date":"2020-08-20T14:27:42","date_gmt":"2020-08-20T08:57:42","guid":{"rendered":"https:\/\/techvidvan.com\/tutorials\/?p=79565"},"modified":"2020-08-20T14:27:42","modified_gmt":"2020-08-20T08:57:42","slug":"apache-flume-tutorial","status":"publish","type":"post","link":"https:\/\/techvidvan.com\/tutorials\/apache-flume-tutorial\/","title":{"rendered":"Apache Flume Tutorial for Beginners"},"content":{"rendered":"<p>Looking for a complete guide to Apache Flume? This is the right place for you.<\/p>\n<p>In this article, you will learn what Apache Flume is, why we use it, and much more. Apache Flume is a tool used to transfer data from different sources to the Hadoop Distributed File System (HDFS). The article covers the basic concepts related to Flume.<\/p>\n<h3>What is Apache Flume?<\/h3>\n<p>Flume is an open-source, distributed, and reliable system designed for collecting, aggregating, and transferring huge volumes of log data from many different sources to a centralized repository. The centralized repository can be HDFS, HBase, or a similar store.<\/p>\n<p>If we want to transfer social-media-generated data, email messages, or log data to Hadoop, we can use Apache Flume. It is a top-level project of the Apache Software Foundation.<\/p>\n<p>Apache Flume was designed mainly to copy streaming data (log data) from different web servers to HDFS.<\/p>\n<h3>Why Apache Flume?<\/h3>\n<p>A company typically runs many services across multiple servers, and these servers produce large volumes of logs. With the advent of Big Data technologies such as Apache Hadoop, companies want to analyze these logs to generate insights.<\/p>\n<p>Businesses want to analyze this log data to understand customer behavior.<\/p>\n<p>To process these logs, they require a scalable and reliable distributed data collection service. 
This service must be capable of transferring logs from the web servers to a system that can store and process them (such as HDFS).<\/p>\n<p>This is where Apache Flume comes into the picture. It is an open-source, distributed data collection service that we can use to transfer data from source to destination.<\/p>\n<p>Flume is a highly available service used for collecting, aggregating, and transporting massive amounts of logs into HDFS. It has tunable reliability mechanisms for failover and recovery.<\/p>\n<h3>Features of Apache Flume<\/h3>\n<p>1. It is an open-source framework.<\/p>\n<p>2. It is a highly available, robust, and fault-tolerant service.<\/p>\n<p>3. Apache Flume has tunable reliability mechanisms for failover and recovery.<\/p>\n<p>4. It supports complex data flows such as fan-in, fan-out, and multi-hop flows. It also supports contextual routing and backup routes.<\/p>\n<p>5. Flume is horizontally scalable.<\/p>\n<p>6. Flume supports a large set of sources, channels, and sinks.<\/p>\n<p>7. We can use Apache Flume to efficiently ingest log data from multiple servers into a centralized data store.<\/p>\n<p>8. Apache Flume allows us to collect data from web servers in real time as well as in batch mode.<\/p>\n<p>9. We can easily move data generated by social networking sites and e-commerce sites into the Hadoop Distributed File System through Flume.<\/p>\n<p>10. Flume offers a steady data flow between read and write operations.<\/p>\n<p>11. It offers high throughput and low latency.<\/p>\n<p>12. 
Flume is an inexpensive distributed system.<\/p>\n<h3>Apache Flume Architecture<\/h3>\n<p><a href=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/sites\/2\/2020\/08\/apache-flume-architeture-TV.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-79584\" src=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/sites\/2\/2020\/08\/apache-flume-architeture-TV.jpg\" alt=\"apache flume architeture \" width=\"896\" height=\"404\" \/><\/a><\/p>\n<p>The architecture of Apache Flume is simple and flexible. The image above shows the Apache Flume architecture.<\/p>\n<p>Data generators such as Facebook, Twitter, and e-commerce sites produce massive amounts of data. This data is collected by the individual Flume agents running on them.<\/p>\n<p>A data collector then collects the data from the Flume agents, aggregates it, and pushes it into a centralized store, which can be HBase or HDFS.<\/p>\n<h4>Flume Event<\/h4>\n<p>An event is the unit of data that is transferred from source to destination.<\/p>\n<h4>Flume Agent<\/h4>\n<p>A Flume agent is an independent Java Virtual Machine (JVM) process. It receives events from clients or from other Flume agents and passes them on to another Flume agent or to the centralized store.<br \/>\nA Flume agent contains three main components. Let&#8217;s explore each of them in detail.<\/p>\n<h4>Source<\/h4>\n<p>It is the component of a Flume agent that receives data from the data generators. 
The source passes the data it receives to one or more Flume channels in the form of events.<br \/>\nFlume supports several types of sources.<br \/>\nExample \u2212 Thrift source, Exec source, Avro source, Twitter 1% firehose source, etc.<\/p>\n<h4>Channel<\/h4>\n<p>It is the component of a Flume agent that receives events from the source and buffers them until the Flume sinks consume them.<br \/>\nFlume supports several types of channels.<br \/>\nExample \u2212 JDBC channel, Memory channel, File channel, etc.<\/p>\n<h4>Sink<\/h4>\n<p>It is the component of a Flume agent that consumes events from the Flume channel and delivers them to the next destination, which can be a centralized store or another Flume agent.<br \/>\nExample \u2212 HDFS sink.<\/p>\n<h3>Additional Components of Flume Agent<\/h3>\n<p>A few more components play a vital role in event transfer.<\/p>\n<p><strong>Interceptors<\/strong><br \/>\nInterceptors inspect or alter the Flume events transferred between source and channel.<\/p>\n<p><strong>Channel Selectors<\/strong><br \/>\nChannel selectors determine which channel an event is written to when multiple channels exist. They are of two types: replicating (the default) and multiplexing.<\/p>\n<p><strong>Sink Processors<\/strong><br \/>\nSink processors invoke a particular sink from a sink group, for example to provide failover or load balancing across sinks.<\/p>\n<h3>Apache Flume &#8211; Data Flow<\/h3>\n<p>Flume supports complex data flows. The three types of data flow in Flume are:<\/p>\n<p><strong>1. Multi-hop Flow<\/strong><br \/>\nIn a multi-hop flow, an event passes through two or more Flume agents before reaching its final destination.<\/p>\n<p><strong>2. Fan-out Flow<\/strong><br \/>\nIn a fan-out flow, an event flows from one source to multiple channels. It is of two types \u2212 replicating and multiplexing.<\/p>\n<p><strong>3. 
Fan-in Flow<\/strong><br \/>\nIn a fan-in flow, events are transferred from many sources to one channel.<\/p>\n<h3>Flume Advantages<\/h3>\n<ul>\n<li>Flume lets us store streaming data in centralized repositories (HBase, HDFS).<\/li>\n<li>It offers a steady data flow between producer and consumer during reads and writes.<\/li>\n<li>Flume supports contextual routing.<\/li>\n<li>It guarantees reliable message delivery.<\/li>\n<li>Flume is open source, reliable, fault-tolerant, scalable, extensible, customizable, and manageable.<\/li>\n<\/ul>\n<h3>Flume Disadvantages<\/h3>\n<ul>\n<li>It provides weak ordering guarantees.<\/li>\n<li>Flume doesn\u2019t guarantee the uniqueness of messages; that is, duplicate messages may be delivered.<\/li>\n<li>Flume can have a complex topology, and reconfiguring it is challenging.<\/li>\n<li>Apache Flume sometimes suffers from scalability and reliability issues.<\/li>\n<\/ul>\n<h3>Apache Flume Applications<\/h3>\n<ul>\n<li>E-commerce companies use Flume to analyze customer behavior in a particular region.<\/li>\n<li>Flume is useful for dumping large datasets produced by application servers into HDFS at high speed.<\/li>\n<li>Flume is useful for detecting fraud.<\/li>\n<li>It is useful in IoT applications.<\/li>\n<li>We can use it for collecting and aggregating machine-generated and sensor-generated data.<\/li>\n<li>We can use Flume in alerting or SIEM (security information and event management) systems.<\/li>\n<\/ul>\n<h3>Summary<\/h3>\n<p>In short, Apache Flume is an open-source system for collecting and moving data from multiple servers to HDFS or HBase. It can transfer data in real time as well as in batch mode. Flume is highly robust, scalable, and fault-tolerant.<\/p>\n<p>It supports complex data flows as well as contextual routing. A Flume agent consists of a source, a channel, and a sink. 
Apache Flume supports a large set of sources, channels, and sinks.<\/p>\n<p>E-commerce companies use Flume to move their server data to HDFS and then process this data to understand customer behavior.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Looking for a complete guide for Apache Flume? This is the right place for you. This Apache Flume tutorial article will provide you the complete guide for Apache Flume. In this article, you will&#46;&#46;&#46;<\/p>\n","protected":false},"author":1,"featured_media":79583,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[544],"tags":[3081,3082,3083,3084,3085,3086],"class_list":["post-79565","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-hadoop","tag-apache-flume-tutorial","tag-features-of-apache-flume","tag-flume-advantages-and-limitations","tag-flume-applications","tag-flume-architecture","tag-flume-tutorial"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.7 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Apache Flume Tutorial for Beginners - TechVidvan<\/title>\n<meta name=\"description\" content=\"Learn what is Flume, Why Apache Flume, Features of Flume, Flume Architecture, Flume Data Flow, Advantages &amp; Disadvantages of Flume, Applications of Flume\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/techvidvan.com\/tutorials\/apache-flume-tutorial\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Apache Flume Tutorial for Beginners - TechVidvan\" \/>\n<meta property=\"og:description\" content=\"Learn what is Flume, Why Apache Flume, Features of Flume, Flume Architecture, Flume Data Flow, Advantages &amp; Disadvantages of 
Flume, Applications of Flume\" \/>\n<meta property=\"og:url\" content=\"https:\/\/techvidvan.com\/tutorials\/apache-flume-tutorial\/\" \/>\n<meta property=\"og:site_name\" content=\"TechVidvan\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/TechVidvan\/\" \/>\n<meta property=\"article:published_time\" content=\"2020-08-20T08:57:42+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2020\/08\/Apache-Flume-Tutorial-TV.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"TechVidvan Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@vidvantech\" \/>\n<meta name=\"twitter:site\" content=\"@vidvantech\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"TechVidvan Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Apache Flume Tutorial for Beginners - TechVidvan","description":"Learn what is Flume, Why Apache Flume, Features of Flume, Flume Architecture, Flume Data Flow, Advantages & Disadvantages of Flume, Applications of Flume","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/techvidvan.com\/tutorials\/apache-flume-tutorial\/","og_locale":"en_US","og_type":"article","og_title":"Apache Flume Tutorial for Beginners - TechVidvan","og_description":"Learn what is Flume, Why Apache Flume, Features of Flume, Flume Architecture, Flume Data Flow, Advantages & Disadvantages of Flume, Applications of Flume","og_url":"https:\/\/techvidvan.com\/tutorials\/apache-flume-tutorial\/","og_site_name":"TechVidvan","article_publisher":"https:\/\/www.facebook.com\/TechVidvan\/","article_published_time":"2020-08-20T08:57:42+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2020\/08\/Apache-Flume-Tutorial-TV.jpg","type":"image\/jpeg"}],"author":"TechVidvan Team","twitter_card":"summary_large_image","twitter_creator":"@vidvantech","twitter_site":"@vidvantech","twitter_misc":{"Written by":"TechVidvan Team","Est. 
reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/techvidvan.com\/tutorials\/apache-flume-tutorial\/#article","isPartOf":{"@id":"https:\/\/techvidvan.com\/tutorials\/apache-flume-tutorial\/"},"author":{"name":"TechVidvan Team","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/person\/e9c26e74dd3d87421f7ada9433b8cd22"},"headline":"Apache Flume Tutorial for Beginners","datePublished":"2020-08-20T08:57:42+00:00","mainEntityOfPage":{"@id":"https:\/\/techvidvan.com\/tutorials\/apache-flume-tutorial\/"},"wordCount":1144,"commentCount":0,"publisher":{"@id":"https:\/\/techvidvan.com\/tutorials\/#organization"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/apache-flume-tutorial\/#primaryimage"},"thumbnailUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2020\/08\/Apache-Flume-Tutorial-TV.jpg","keywords":["Apache Flume Tutorial","Features of Apache Flume","Flume advantages and limitations","Flume applications","Flume Architecture","flume tutorial"],"articleSection":["Hadoop Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/techvidvan.com\/tutorials\/apache-flume-tutorial\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/techvidvan.com\/tutorials\/apache-flume-tutorial\/","url":"https:\/\/techvidvan.com\/tutorials\/apache-flume-tutorial\/","name":"Apache Flume Tutorial for Beginners - TechVidvan","isPartOf":{"@id":"https:\/\/techvidvan.com\/tutorials\/#website"},"primaryImageOfPage":{"@id":"https:\/\/techvidvan.com\/tutorials\/apache-flume-tutorial\/#primaryimage"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/apache-flume-tutorial\/#primaryimage"},"thumbnailUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2020\/08\/Apache-Flume-Tutorial-TV.jpg","datePublished":"2020-08-20T08:57:42+00:00","description":"Learn what is Flume, Why Apache Flume, Features of Flume, Flume Architecture, Flume Data Flow, 
Advantages & Disadvantages of Flume, Applications of Flume","breadcrumb":{"@id":"https:\/\/techvidvan.com\/tutorials\/apache-flume-tutorial\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/techvidvan.com\/tutorials\/apache-flume-tutorial\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/techvidvan.com\/tutorials\/apache-flume-tutorial\/#primaryimage","url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2020\/08\/Apache-Flume-Tutorial-TV.jpg","contentUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2020\/08\/Apache-Flume-Tutorial-TV.jpg","width":1200,"height":628,"caption":"Apache Flume Tutorial"},{"@type":"BreadcrumbList","@id":"https:\/\/techvidvan.com\/tutorials\/apache-flume-tutorial\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/techvidvan.com\/tutorials\/"},{"@type":"ListItem","position":2,"name":"Apache Flume Tutorial for Beginners"}]},{"@type":"WebSite","@id":"https:\/\/techvidvan.com\/tutorials\/#website","url":"https:\/\/techvidvan.com\/tutorials\/","name":"TechVidvan 
Blogs","description":"","publisher":{"@id":"https:\/\/techvidvan.com\/tutorials\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/techvidvan.com\/tutorials\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/techvidvan.com\/tutorials\/#organization","name":"TechVidvan","url":"https:\/\/techvidvan.com\/tutorials\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/logo\/image\/","url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2024\/03\/techvidvan-logo-200x50-1.webp","contentUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2024\/03\/techvidvan-logo-200x50-1.webp","width":200,"height":50,"caption":"TechVidvan"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/TechVidvan\/","https:\/\/x.com\/vidvantech"]},{"@type":"Person","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/person\/e9c26e74dd3d87421f7ada9433b8cd22","name":"TechVidvan Team","description":"The TechVidvan Team delivers practical, beginner-friendly tutorials on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. 
Our experts are here to help you upskill and excel in today\u2019s tech industry."}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts\/79565","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/comments?post=79565"}],"version-history":[{"count":0,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts\/79565\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/media\/79583"}],"wp:attachment":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/media?parent=79565"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/categories?post=79565"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/tags?post=79565"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}