Apache Falcon-Data Governance for Hadoop

Apache Falcon is a data governance engine that defines, schedules, and monitors data management policies. Falcon allows Hadoop administrators to centrally define their data pipelines, and then Falcon uses those definitions to auto-generate workflows in Apache Oozie. InMobi is one of ...

How-to Implement Role-based Security in Impala using Apache Sentry

Apache Sentry (incubating) is the Apache Hadoop ecosystem tool for role-based access control (RBAC). In this how-to, I will demonstrate how to implement Sentry for RBAC in Impala. I feel this introduction is best motivated by a use case. Data warehouse ...

Hadoop and NoSQL Now Data Warehouse-Worthy-Gartner

Not long ago, the rules for what constituted a data warehouse were fairly well defined. The schema was fixed, you could say, and was based primarily on relational database technology designed to process structured data. My, how times have changed. ...

Avoiding Split Brainedness in HA Hadoop Clusters

The US Patent Office recently granted Zettaset a patent for the underlying technology in its Hadoop high availability that prevents a "split-brain" situation where multiple master nodes think they're in control of the Hadoop cluster. It's a feather in the ...

Big Workflow-The Future of Big Data Computing

How can organizations embrace — instead of brace for — the rapidly intensifying collision of public and private clouds, HPC environments and Big Data? The current go-to solution for many organizations is to run these technology assets in siloed, specialized ...
1 22 23 24 25 26 28