Apache Software Foundation: all about ASF and its Big Data projects

The Apache Software Foundation oversees the development of Big Data software such as Hadoop, Hive or Cassandra. Find out everything you need to know about it.

The history of the Apache Software Foundation begins in 1999 in Forest Hill, Maryland, USA, where the foundation was founded.

Apache Software Foundation: what is it?

The Apache Software Foundation (ASF) is a non-profit organization that oversees software development. Its name is a tribute “to the various Native American nations collectively referred to as the Apache, well known for their strategic military skills and endurance”.

To become a member, it is necessary to contribute to the foundation’s Open Source projects. Individual candidates must be nominated and approved by a majority of members. Corporations, however, are not permitted to become members.

What are the different Apache projects?


In total, the Apache Foundation oversees more than 100 top-level projects, many of which have sub-projects of their own. Here are some of its most renowned top-level projects.

The Apache HTTP Server is probably the foundation’s best-known project. It is a free, open-source web server that runs on most operating systems, such as Microsoft Windows, macOS, UNIX and NetWare.

According to a survey conducted by Netcraft, 60% of all websites on the Internet use Apache Server. If derivatives are included, this percentage rises to 62%. Thus, Apache is used more than all other web servers combined.

Another example is the open-source Apache Cassandra distributed database system. Initially created by Facebook, this distributed database is designed to store and manage large amounts of data spread across different servers. It is one of the most widely used NoSQL databases for storing and processing large volumes of data.
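To give an idea of how Cassandra spreads data across servers, here is a toy sketch of partition-key hashing in pure Python. This is illustrative only, not the real Cassandra driver API: the node names and the use of MD5 are simplifying assumptions (real Cassandra uses a configurable partitioner, virtual nodes and replication).

```python
import hashlib

# Hypothetical cluster: three named nodes (illustrative only).
NODES = ["node-a", "node-b", "node-c"]

def node_for(partition_key: str) -> str:
    """Pick a node by hashing the partition key, as a Cassandra-style
    partitioner does (simplified: no virtual nodes, no replication)."""
    digest = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    return NODES[digest % len(NODES)]

# Rows with the same partition key always land on the same node,
# while different keys spread across the cluster.
placement = {key: node_for(key) for key in ("user:1", "user:2", "user:3")}
print(placement)
```

Because the mapping depends only on the key, any client can compute where a row lives without asking a central coordinator, which is part of why this design scales well.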

Similarly, Apache has taken under its wing the CloudStack project. It is a Cloud management platform for delivering Infrastructure as a Service (IaaS) in Cloud Computing environments.

Apache also oversees the development of Hadoop, Big Data’s main platform. It is a free, Java-based programming framework capable of supporting the processing of large datasets in a distributed computing environment.
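Hadoop’s processing model is MapReduce: map each input split to key-value pairs, shuffle the pairs by key, then reduce each group. The classic word-count example can be sketched in a few lines of pure Python; this runs in one process for illustration, whereas real Hadoop jobs run across a cluster on HDFS/YARN.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document: str):
    # Map: emit a (word, 1) pair for every word in the input split.
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the values for each key.
    return {word: sum(counts) for word, counts in groups.items()}

splits = ["big data big ideas", "data beats ideas"]
pairs = chain.from_iterable(map_phase(s) for s in splits)
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'big': 2, 'data': 2, 'ideas': 2, 'beats': 1}
```

The key point is that the map and reduce functions never see the whole dataset, which is what lets Hadoop distribute them across many machines.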

Another well-known project in the field of Big Data is none other than Apache Hive. It is an open-source Data Warehouse system for querying and analyzing large datasets stored in Hadoop.

Apache Spark is a fast engine for large-scale data processing. It offers high-level APIs in Java, Scala and Python, along with a large set of libraries for stream processing, Machine Learning and graph analysis.
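A distinctive feature of Spark’s high-level API is that transformations are chained and evaluated lazily: nothing runs until an action is called. Here is a tiny in-memory imitation of that style in pure Python; the `MiniRDD` class is hypothetical and only mimics the shape of Spark’s API (real Spark partitions the data across a cluster).

```python
class MiniRDD:
    """Toy stand-in for a Spark RDD: records transformations lazily."""

    def __init__(self, data, ops=None):
        self._data = data          # in real Spark, partitioned across workers
        self._ops = ops or []      # transformations recorded, not yet run

    def map(self, fn):
        # Transformations return a new dataset and do no work yet.
        return MiniRDD(self._data, self._ops + [("map", fn)])

    def filter(self, pred):
        return MiniRDD(self._data, self._ops + [("filter", pred)])

    def collect(self):
        # Actions trigger evaluation of the whole recorded pipeline.
        items = iter(self._data)
        for kind, fn in self._ops:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

squares = MiniRDD(range(6)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(squares.collect())  # [0, 4, 16]
```

Lazy evaluation lets the engine see the whole pipeline before running it, which is what allows Spark to optimize and distribute the work.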

Let’s also mention Apache Kafka: an open-source software platform for stream processing, initially developed by LinkedIn and donated to the Apache Foundation. It is written in Scala and Java, and aims to provide a unified low-latency platform to support real-time data flows.
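Conceptually, Kafka stores each topic as an append-only log, and each consumer group tracks its own read offset, so independent consumers can replay the same stream. The sketch below is a toy in-memory model of that idea in pure Python, not the real Kafka client API; all class and topic names are invented for illustration.

```python
from collections import defaultdict

class MiniLog:
    """Toy append-only log with per-group offsets (illustrative only)."""

    def __init__(self):
        self._topics = defaultdict(list)   # topic -> list of records
        self._offsets = defaultdict(int)   # (group, topic) -> next offset

    def produce(self, topic, record):
        # Producers only ever append; records are never modified in place.
        self._topics[topic].append(record)

    def consume(self, group, topic, max_records=10):
        # Each consumer group advances its own offset independently.
        offset = self._offsets[(group, topic)]
        batch = self._topics[topic][offset:offset + max_records]
        self._offsets[(group, topic)] += len(batch)
        return batch

log = MiniLog()
log.produce("clicks", {"user": 1})
log.produce("clicks", {"user": 2})
print(log.consume("analytics", "clicks"))  # both records
print(log.consume("analytics", "clicks"))  # [] — offset already advanced
print(log.consume("audit", "clicks"))      # a new group replays everything
```

Keeping offsets on the consumer side is what makes the low-latency, multi-subscriber design cheap: the broker just appends and serves ranges of the log.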

In total, Apache is developing more than 30 open-source projects related to Big Data or, more generally, to data. Other Big Data projects include Airavata, Apex, Beam, BookKeeper, CouchDB, Crunch, DataFu, Drill, Falcon, Flume, Helix, Ignite, Knox, MetaModel, Phoenix, Storm, Tajo and Zeppelin.
