Data has always been one of the biggest problems in general computing, and especially so in recent years, with the explosion of data and the challenge of extracting useful information from billions and billions of bytes. I remember a time not long ago when handling (not even processing) gigabytes of data was a big deal. Today, it is not unheard of to process 9 TB of data per day, in near real time.
Enough nostalgia. Let's talk about today. I attended an event at Bangalore Alpha Lab with two talks on data handling, processing and the world today. The first, “MySQL cluster setup”, delivered by Subramanyam Kasibhat, leaned towards implementing highly available, high-performance database systems. The second, delivered by Sunil Sabat, was about the new BI landscape, big data and the cloud. At roughly an hour each, the sessions covered the basics of their topics, with some examples and hands-on work, in a very interesting and informative way. Both were accompanied by insightful discussion of the concepts and technologies involved.
The first talk covered setting up MySQL replication to effectively emulate a cluster. It revolved around GTIDs (global transaction identifiers), which were added in MySQL 5.6. The demo ran on Amazon Web Services, but could not go into much detail for lack of time. The presentation and slides were detailed enough to get the concepts across, covering various topologies and a simple replication example with a failover switch. The AWS instances and keys were offered to anyone willing to take a closer look at the setup. Subramanyam kindly offered to keep the instances running for a couple of days and to share the AMIs with anyone interested.
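To make the GTID idea a little more concrete for myself, here is a minimal Python sketch of how GTID sets can be compared when deciding which replica to promote during a failover. This is not the setup from the talk. It only assumes the textual GTID set format (`source_uuid:1-5:8`, with several UUIDs separated by commas), and the function names and the UUID in the usage note are made up for illustration.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:8,uuid2:1-3' into {uuid: set of transaction numbers}.

    Expanding ranges into Python sets is fine for a small illustration,
    but would be wasteful for real servers with millions of transactions.
    """
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        txns = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            start, _, end = interval.partition("-")
            # A bare number such as '8' is a single-transaction interval.
            txns.update(range(int(start), int(end or start) + 1))
    return result


def missing_transactions(source_set, replica_set):
    """Return {uuid: [transaction numbers]} present on the source but not the replica."""
    source = parse_gtid_set(source_set)
    replica = parse_gtid_set(replica_set)
    missing = {}
    for uuid, txns in source.items():
        gap = txns - replica.get(uuid, set())
        if gap:
            missing[uuid] = sorted(gap)
    return missing


def pick_failover_candidate(replicas):
    """Pick the replica name whose executed GTID set holds the most transactions.

    Hypothetical selection rule for illustration; real failover tooling
    considers more than a simple count.
    """
    def executed_count(name):
        return sum(len(txns) for txns in parse_gtid_set(replicas[name]).values())

    return max(replicas, key=executed_count)
```

For example, with a source at `3e11fa47-71ca-11e1-9e33-c80aa9429562:1-10` and two replicas at `...:1-7` and `...:1-9`, `missing_transactions` reports transactions 8 to 10 as missing on the first replica, and `pick_failover_candidate` prefers the second. MySQL itself provides the `GTID_SUBTRACT()` and `GTID_SUBSET()` SQL functions for this kind of comparison, so the sketch is only meant to show the idea.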
The second talk of the day was on big data, specifically how the new BI landscape calls for the processing concepts of data processing frameworks such as the Hadoop stack. With examples and numbers, Sunil covered the current state of BI and what the future requires of Business Intelligence. He explained the concepts of storage, processing (with MapReduce, in this case) and presentation (with various visualization tools). He also mentioned various Hadoop providers catering to different requirements.
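As a rough illustration of the MapReduce processing model mentioned above, here is the classic word count written as a minimal single-process Python sketch. It is not Hadoop code and not an example from the talk. The function names are my own, and a real framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Calling `word_count(["big data", "Big cloud data"])` counts `big` and `data` twice each and `cloud` once. The point of the model is that the map and reduce functions see only one document or one key at a time, which is what lets a framework spread the work over a cluster.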
Discussions ensued throughout the presentations whenever a debatable topic came up. The most interesting, at least to me, were the discussions on what qualifies as ‘big data’ and on the synchronous versus asynchronous nature of database replication. I wish I could write more about these, but this is new territory for me, and I don’t have much to say or verify yet. That should change soon. Anyway, on to the photos.