Google Data Engineer
Overview
This course consists of three modules:
- Participants will be able to design, build, operationalize, secure & monitor data processing systems with an emphasis on security & compliance; scalability & efficiency; reliability & fidelity
- Participants will be able to leverage, deploy, and continuously train pre-existing ML models
- Participants will be able to ensure solution quality and design data processing systems
4 Days
Pre-Requisites
Participants must have an understanding of how data works and what it can deliver for the organization
Course Outline
- Theory, Practice, and Tests
- Lab: Setting Up A GCP Account
- Lab: Using the Cloud Shell
- About this section
- Compute Options
- Google Compute Engine (GCE)
- Practicals:
- Lab: Creating a VM Instance
- Editing a VM Instance
- Creating a VM Instance Using the Command Line
- Creating and Attaching a Persistent Disk
- More GKE
- Creating Kubernetes Cluster
- Deploying a WordPress Container
- App Engine
- Contrasting App Engine, Compute Engine & Container Engine
- Lab: Deploy and Run an App Engine App
- Storage Options
- Quick Take
- Cloud Storage
- Lab: Working with Cloud Storage Buckets
- Bucket and Object Permissions
- Lifecycle Management on Buckets
- Fix for AccessDeniedException: 403 Insufficient Permission
- Lab: Running A Program on a Virtual Machine Instance and Storing Results on Cloud Storage
- Live Migration
- Machine Types and Billing
- Sustained Use and Committed Use Discounts
- Rightsizing Recommendations
- RAM Disk
- Images
- Startup Scripts and Baked Images
- VPCs And Subnets
- Global VPCs, Regional Subnets
- IP Addresses
- Lab: Working with Static IP Addresses
- Routes
- Firewall Rules
- Practicals:
- Lab: Working with Firewalls
- Working with Auto Mode and Custom Mode Networks
- Bastion Host
- Lab: Working with Cloud VPN
- Cloud Router
- Lab: Using Cloud Routers for Dynamic Routing
- Dedicated Interconnect, Direct & Carrier Peering
- Shared VPCs
- Lab: Shared VPCs
- VPC Network Peering
- Lab: VPC Peering
- Cloud DNS & Legacy Networks
- Networking
- Managed and Unmanaged Instance Groups
- Types of Load Balancing
- Overview of HTTP(S) Load Balancing
- Forwarding Rules, URL Maps & Target Proxy
- Backend Service & Backends
- Load Distribution & Firewall Rules
- Lab: HTTP(S) Load Balancing
- Content-Based Load Balancing
- SSL Proxy & TCP Proxy Load Balancing
- Lab: SSL Proxy Load Balancing
- Network Load Balancing
- Internal Load Balancing
- Autoscalers
- Lab: Autoscaling with Managed Instance Groups
Ops & Security
- Stackdriver
- Stackdriver Logging
- Stackdriver Resource Monitoring
- Stackdriver Error Reporting & Debugging
- Lab: Using Deployment Manager
- Deployment Manager & Stackdriver
- Cloud IAM: User accounts, Service accounts and API Credentials
- Cloud IAM: Roles, Identity-Aware Proxy, Best Practices
- Lab: Cloud IAM
- Operations and Security
- Practicals: Migrating Data Using the Transfer Service; gcloud init
- Lab: Cloud Storage Versioning, Directory Sync
- Cloud SQL
- Lab: Creating A Cloud SQL Instance
- Running Commands on Cloud SQL Instance
- Bulk Loading Data into Cloud SQL Tables
- More Cloud Spanner
- Lab: Working with Cloud Spanner
- Bigtable Intro
- Columnar Store
- Denormalised
- Column Families
- Bigtable Performance
- Getting the HBase Prompt
- Lab: Bigtable Demo
- Datastore
- Lab: Datastore Demo
- Dataflow Intro
- Apache Beam
- Lab: Running a Python Dataflow Program
- Running a Java Dataflow Program
- Implementing Word Count in Dataflow Java
- Executing the Word Count Dataflow
- Executing MapReduce in Dataflow in Python
- Executing MapReduce in Dataflow in Java
- Dataproc
- Lab: Creating & Managing A Dataproc Cluster
- Creating A Firewall Rule to Access Dataproc
- Running A PySpark Job on Dataproc
- Running PySpark REPL Shell & Pig Scripts on Dataproc
- Submitting A Spark Jar to Dataproc
- Working with Dataproc Using the gcloud CLI
- Pub/Sub
- Working with Pub/Sub on the Command Line
- Working with Pub/Sub Using the Web Console
- Setting Up a Pub/Sub Publisher Using the Python Library
- Setting Up a Pub/Sub Subscriber Using the Python Library
- Publishing Streaming Data into Pub/Sub
- Reading Streaming Data from Pub/Sub and Writing It to BigQuery
- Executing a Pipeline to Read Streaming Data and Write to BigQuery
- Pub/Sub Source, BigQuery Sink
- Datalab
- Creating and Working on a Datalab Instance
- Importing and Exporting Data Using Datalab
- Using the Charting API in Datalab
- Taxicab Prediction: Setting Up the Dataset
- Taxicab Prediction: Training and Running the Model
- The Vision, Translate, NLP and Speech APIs
- The Vision API for Label and Landmark Detection
- Introducing the Hadoop Ecosystem
- Hadoop
- HDFS
- MapReduce
- YARN
- Hive
- Hive vs. RDBMS
- HQL vs. SQL
- OLAP in Hive
- Windowing in Hive
- Pig
- Spark
- Streams Intro
- Microbatches
- Window Types
- Hadoop Ecosystem
- Introduction
