Google Data Engineer

Overview

This course consists of three modules. On completion, participants will be able to:

- Design, build, operationalize, secure, and monitor data processing systems, with an emphasis on security and compliance, scalability and efficiency, and reliability and fidelity
- Leverage, deploy, and continuously train pre-existing ML models
- Design data processing systems and ensure solution quality

Duration
4 Days

Pre-Requisites
Participants should understand how data works and what it can deliver for an organization

Course Outline

Introduction

Theory, Practice, and Tests
Lab: Setting Up A GCP Account
Lab: Using the Cloud Shell

Compute

- About this section
- Compute Options
- Google Compute Engine (GCE)
- Practicals:
- Lab: Creating a VM Instance
- Editing a VM Instance
- Creating a Virtual Machine Instance Using the Command Line
- Creating and Attaching a Persistent Disk

Google Kubernetes Engine (GKE), formerly Container Engine

More GKE

Creating a Kubernetes Cluster
Deploying a WordPress Container

App Engine

Contrasting App Engine, Compute Engine & Kubernetes Engine (GKE)

Lab: Deploying and Running an App Engine App

Storage

Storage Options
Quick Take
Cloud Storage
- Lab: Working with Cloud Storage Buckets
- Bucket and Object Permissions
- Lifecycle Management on Buckets
- Fix for AccessDeniedException: 403 Insufficient Permission
- Lab: Running A Program on a Virtual Machine Instance and Storing Results on Cloud Storage

Virtual Machines and Images

Live Migration
Machine Types and Billing
Sustained Use and Committed Use Discounts
Rightsizing Recommendations
RAM Disk
Images
Startup Scripts and Baked Images

VPCs and Interconnecting Networks

VPCs and Subnets
Global VPCs, Regional Subnets
IP Addresses
Lab: Working with Static IP Addresses
Routes
Firewall Rules
Practicals:
- Lab: Working with Firewalls
- Working with Auto Mode and Custom Mode Networks
- Bastion Host

Cloud VPN

Lab: Working with Cloud VPN
Cloud Router
Lab: Using Cloud Routers for Dynamic Routing
Dedicated Interconnect, Direct Peering & Carrier Peering
Shared VPCs
Lab: Shared VPCs
VPC Network Peering
Lab: VPC Peering
Cloud DNS & Legacy Networks
Networking

Managed Instance Groups and Load Balancing

Managed and Unmanaged Instance Groups
Types of Load Balancing
Overview of HTTP(S) Load Balancing
Forwarding Rules, URL Maps & Target Proxies
Backend Service & Backends
Load Distribution & Firewall Rules
- Lab: HTTP(S) Load Balancing
- Content-Based Load Balancing
SSL Proxy & TCP Proxy Load Balancing
- Lab: SSL Proxy Load Balancing
Network Load Balancing
Internal Load Balancing
Autoscalers
- Lab: Autoscaling with Managed Instance Groups

Ops & Security

Stackdriver
Stackdriver Logging
- Stackdriver Resource Monitoring
- Stackdriver Error Reporting & Debugging

Cloud Deployment Manager

Lab: Using Deployment Manager
Deployment Manager & Stackdriver

Cloud Endpoints

Cloud IAM: User accounts, Service accounts and API Credentials
Cloud IAM: Roles, Identity Aware Proxy, Best Practices
- Lab: Cloud IAM

Data Protection

Operations and Security

Transfer Service

Practicals:
- Migrating Data Using the Transfer Service
- gcloud init
Lab: Cloud Storage Versioning, Directory Sync

Cloud SQL, Cloud Spanner, OLTP, RDBMS

Cloud SQL
- Lab: Creating A Cloud SQL Instance
- Running Commands on a Cloud SQL Instance
- Bulk Loading Data into Cloud SQL Tables

Cloud Spanner

More Cloud Spanner
Lab: Working with Cloud Spanner

Bigtable ~ HBase = Columnar Store

Bigtable Intro
Columnar Store
Denormalised
Column Families
Bigtable Performance
Getting the HBase Prompt
Lab: Bigtable Demo

Datastore ~ Document Database

Datastore
Lab: Datastore demo

Dataflow: Apache Beam

Dataflow Intro
Apache Beam
- Lab: Running a Python Dataflow Program
- Running a Java Dataflow Program
- Implementing Word Count in Dataflow Java
- Executing the Word Count Dataflow
- Executing MapReduce in Dataflow in Python
- Executing MapReduce in Dataflow in Java

Dataproc: Manage Hadoop

Dataproc
- Lab: Creating & Managing A Dataproc Cluster
- Creating A Firewall Rule to Access Dataproc
- Running A PySpark Job on Dataproc
- Running PySpark REPL Shell & Pig Scripts on Dataproc
- Submitting A Spark Jar to Dataproc
- Working with Dataproc Using the gcloud CLI

Pub/Sub for Streaming

Pub/Sub
- Working with Pub/Sub on the Command Line
- Working with Pub/Sub Using the Web Console
- Setting Up a Pub/Sub Publisher Using the Python Library
- Setting Up a Pub/Sub Subscriber Using the Python Library
- Publishing Streaming Data into Pub/Sub
- Reading Streaming Data from Pub/Sub and Writing It to BigQuery
- Executing a Pipeline to Read Streaming Data and Write to BigQuery
- Pub/Sub Source, BigQuery Sink

Datalab ~ Jupyter

Datalab
- Creating and Working on Datalab Instance
- Importing and Exporting Data Using Datalab
- Using the Charting API In Datalab

Pre-trained ML APIs: Vision, Translate, NLP and Speech

Taxicab Prediction: Setting Up the Dataset
Taxicab Prediction: Training and Running the Model
The Vision, Translate, NLP and Speech APIs
The Vision API for Label and Landmark Detection

Additional

Introducing the Hadoop Ecosystem
Hadoop
HDFS
MapReduce
YARN
Hive
Hive vs. RDBMS
HQL vs. SQL
OLAP in Hive
Windowing in Hive
Pig
Spark
Streams Intro
Microbatches
Window Types
Hadoop Ecosystem
