- 9 Sections
- 56 Lessons
- 90 Days
- Course Resources & Tools (1 lesson)
- Module 1: Introduction to Linux for Data Professionals (11 lessons)
- 2.1 Lecture 1: Why Linux Dominates Data Infrastructure?
- 2.2 Lecture 2: Filesystem, Permissions, and Processes Overview
- 2.3 Lecture 3: Understanding systemctl Commands for Hadoop and Spark
- 2.4 Lecture 4: Navigating Large Datasets in the CLI
- 2.5 Lecture 5: Working with CSV, JSON, and Log Files
- 2.6 Video: Introduction to Linux for Data Professionals
- 2.7 Generate Your 1 GB Practice Dataset
- 2.8 Hands-On Activity: Analyze a 2 GB CSV File Using Linux Command-Line Tools (3 Days)
- 2.9 Assignment: Shell ETL – Filter & Aggregate Sales Data (No Python/R) (3 Days)
- 2.10 Linux Command Reference for Data Ops
- 2.11 Quiz: Introduction to Linux (20 Questions)
- Module 2: Environment Setup & Cluster Configuration (11 lessons)
- 3.1 Lecture 1: Installing Linux on VM, WSL, or Cloud (Ubuntu Server 22.04)
- 3.2 Lecture 2: User Management, SSH Key Setup, and Inter-Node Communication
- 3.3 Bonus: Difference Between useradd and adduser in Linux
- 3.4 Lecture 3: Basics of Networking, /etc/hosts, and Passwordless SSH
- 3.5 Lecture 4: Introduction to systemd Services and Daemons for Distributed Components
- 3.6 Video: Environment Setup & Cluster Configuration
- 3.7 3-Node Cluster Configuration Guide
- 3.8 Sample /etc/hosts File
- 3.9 Hands-On Activity: Create a 3-Node Linux Cluster
- 3.10 Assignment: Configure Passwordless SSH and Verify Node Connectivity (3 Days)
- 3.11 Quiz: Environment Setup & Cluster Configuration (20 Questions)
- Module 3: Hadoop & HDFS on Linux (10 lessons)
- 4.1 Lecture 1: Understanding Hadoop Architecture
- 4.2 Lecture 2: Installing Java and Hadoop on Linux
- 4.3 Lecture 3: Starting and Testing the HDFS Cluster
- 4.4 Lecture 4: Running Your First MapReduce Job on Linux
- 4.5 Video: Hadoop & HDFS on Linux
- 4.6 Hadoop Deployment Automation Resources
- 4.7 Hands-on Activity: Deploy a Single-Node to 3-Node Hadoop Cluster
- 4.8 Hands-on Activity: Upload and Process a CSV in HDFS
- 4.9 Assignment: Bash Automation — Deploy Hadoop on Multiple Nodes (3 Days)
- 4.10 Quiz: Hadoop and HDFS on Linux (20 Questions)
- Module 4: Spark on Linux (10 lessons)
- 5.1 Lecture 1: Apache Spark Overview — Executors, Drivers, and YARN
- 5.2 Lecture 2: Installing and Running Spark Standalone on Linux
- 5.3 Lecture 3: Submitting Jobs via CLI and Python Scripts
- 5.4 Lecture 4: Integrating Spark with HDFS
- 5.5 Video: Spark on Linux
- 5.6 Spark Setup and ETL Practice
- 5.7 Hands-on Activity: Set Up Apache Spark Standalone and Verify via Web UI
- 5.8 Hands-on Activity: Run a PySpark Job to Read and Write Data in HDFS
- 5.9 Assignment: Build & Submit a Spark Job via Shell Script Automation (3 Days)
- 5.10 Quiz: Spark on Linux (20 Questions)
- Module 5: Linux for ETL and Automation (8 lessons)
- 6.1 Lecture 1: ETL Overview Using Linux Tools
- 6.2 Lecture 2: Building Pipelines with Shell Scripts
- 6.3 Lecture 3: Integrating Linux Scripts with Airflow or Luigi
- 6.4 Lecture 4: Using curl, wget, and jq for API and JSON Data Ingestion
- 6.5 Video: Linux for ETL and Automation
- 6.6 Hands-on Activity: Automate a Daily Data Fetch & Transform with Bash
- 6.7 Assignment: Build a Fully Automated ETL Shell Script That Pulls CSV Data from an API and Loads It into Local HDFS (3 Days)
- 6.8 Quiz: Linux for ETL and Automation (20 Questions)
- Module 6: Performance Tuning for Data Workloads (9 lessons)
- 7.1 Lecture 1: CPU, Memory, and I/O Profiling Tools
- 7.2 Lecture 2: Linux Kernel Parameters and Tuning for Hadoop/Spark
- 7.3 Lecture 3: Filesystems & Storage Fundamentals for Big-Data Workloads (Disk I/O Optimization)
- 7.4 Lecture 4: Using cgroups and ulimit for Resource Control
- 7.5 Video: Performance Tuning for Data Workloads
- 7.6 Hands-On Activity: Measure Spark Job Performance Before and After Memory Tuning
- 7.7 Resources for Performance Tuning and Monitoring
- 7.8 Assignment: Tune and Document 3 Kernel Parameters for Better Data Performance (3 Days)
- 7.9 Quiz: Performance Tuning for Data Workloads (20 Questions)
- Module 7: Security and Access Management (7 lessons)
- 8.1 User and Group Management for Data Clusters
- 8.2 File Permissions and ACLs in HDFS and Linux
- 8.3 Using Sudoers and Restricting Access for Jobs
- 8.4 Key Management, SSH Hardening, and Basic Firewalls
- 8.5 Hands-On Activity: Simulating Hadoop Security — User ACLs & SSH Key Rotation (No Real Cluster Required)
- 8.6 Troubleshooting: Fixing Common ACL & SSH Key Rotation Issues
- 8.7 Quiz: Security and Access Management (20 Questions)
- Module 8: Capstone Project — Data Pipeline on Linux (4 lessons)
