Big Data

Data is at the core of modernisation efforts going on today. Data is growing at a rapid pace and modern technologies are evolving to address this growth. Follow along to read more on Cloud Native Big Data developments.

Apache Druid On Kubernetes - Part 1

Introduction Let me start off with a quick personal introduction and then move to the topic at hand. I have

Aug 22, 2022

3 min read

Introduction to The World of Data - (OLTP, OLAP, Data Warehouses, Data Lakes and more)

[This article was originally published here] In this article, I hope to paint a picture of the modern data world

Aug 2, 2022

10 min read

Hands-On Introduction to Apache Iceberg - Data Lakehouse Engineering

[This article was originally published here] As a Developer Advocate for Dremio, I spend a lot of time doing research

Aug 2, 2022

4 min read

Streaming data on object storage: Thoughts

Object stores are the gold standard for cloud native data persistence. So, it is natural to want to store streaming

Jul 16, 2022

3 min read

SQL Query on MinIO

Full-fledged analytical applications, AI and ML workloads, and dashboards all need a high-performance query engine that understands standard SQL.

Feb 26, 2022

3 min read

SQL Query on Parquet Files with DataFusion

The Rust big data ecosystem is all set for the big time, with Arrow and its surrounding projects (DataFusion, Ballista) leading the pack.

Jan 16, 2022

3 min read

Big Data ecosystem turning to Rust: an overview

Java is synonymous with the last generation of Big Data tools and technologies. But a lot has changed since the 2000s. Latest

Dec 28, 2021

3 min read

The Curious Case of Small Files

Background Most files, by virtue of their average size and usage patterns, are clearly cut out for

Dec 1, 2020

4 min read

Streaming Data Tools & Techniques

Introduction Streaming data is exactly what it sounds like, a continuously flowing stream of data generated by one or multiple

Sep 1, 2020

6 min read

Deploy Spark on Kubernetes

Introduction YARN has been the default orchestration platform for tools from the Hadoop ecosystem. This has started changing in recent times.

Aug 1, 2020

6 min read