Become a certified Google Cloud Professional Data Engineer in 3 months.

Note: I passed this exam in July 2020, so this article should remain relevant for the near future.

My journey with Google BigQuery began in 2015. Back then it was a separate entity from GCP. Several of our GA360 clients were curious about Google’s new platforms, and a handful of them wanted to combine their web analytics (GA360) data with their CRM data or their app analytics data.

By 2018, I had worked with most, if not all, of the data engineering products in the GCP environment. Some of the products I was able to work on are listed below:

  • App Engine (not required for Data Engineering)
  • Cloud Storage
  • Pub/Sub
  • Dataflow
  • BigQuery
  • Dataprep
  • Dataproc
  • Datalab
  • Others

How I benefited from this certification and how you could too.

Above all, it opened up many opportunities to advance my career. I also work regularly with GMP, GCP, GTM, Google Optimize, Data Studio, and similar tools, and since I already held certifications for all of them except GCP, I decided to get certified in GCP as well.

When I first prepared for the Google Data Engineering certification in 2018, anxiety crept in whenever Hadoop, Spark, or other big data jargon came up, and I gave up several times.

2020 was different. In a rush of adrenaline, I booked a date for the certification exam even though I was unsure of my preparation, hoping that a fixed date would push me to power through the learning material. I ended up taking the online proctored exam. A word of caution if you plan to do the same: make sure you are in a quiet, distraction-free environment. The best part about the mock exam is that its format is the same as the actual exam.

Your reasons may vary: you may want to keep up with the growing cloud and data field, or you may already have the skills to use Google Cloud and want to demonstrate them to a future employer or client.

So the reasons to pursue this exam are twofold: either you want to commit a few months to learning GCP, or you are already a veteran and would like a badge to show for it. This certificate demonstrates your proficiency in designing and building data processing systems with an emphasis on security and compliance, scalability and efficiency, reliability and fidelity, and flexibility and portability.

You can work with GCP without a certification, of course; the certification simply validates your GCP skills.

What topics do you need to cover?

A Professional Data Engineer enables data-driven decision making by collecting, transforming, and publishing data. A data engineer should be able to design, build, operationalize, secure, and monitor data processing systems with a particular emphasis on security and compliance; scalability and efficiency; reliability and fidelity; and flexibility and portability. A data engineer should also be able to leverage, deploy, and continuously train pre-existing machine learning models.

1. Designing data processing systems

  • Mapping storage systems to business requirements
  • Data modeling
  • Tradeoffs involving latency, throughput, transactions
  • Distributed systems
  • Schema design
  • Data publishing and visualization (e.g., BigQuery)
  • Batch and streaming data (e.g., Cloud Dataflow, Cloud Dataproc, Apache Beam, Apache Spark and Hadoop ecosystem, Cloud Pub/Sub, Apache Kafka)
  • Online (interactive) vs. batch predictions
  • Job automation and orchestration (e.g., Cloud Composer)
  • Choice of infrastructure
  • System availability and fault tolerance
  • Use of distributed systems
  • Capacity planning
  • Hybrid cloud and edge computing
  • Architecture options (e.g., message brokers, message queues, middleware, service-oriented architecture, serverless functions)
  • At least once, in-order, and exactly once, etc., event processing
  • Awareness of current state and how to migrate a design to a future state
  • Migrating from on-premises to cloud (Data Transfer Service, Transfer Appliance, Cloud Networking)
  • Validating a migration
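The at-least-once versus exactly-once distinction above (and the idempotency requirement that reappears in section 4) is easiest to see with a toy consumer. Pub/Sub, for example, delivers messages at least once by default, so a subscriber can see the same message twice. The sketch below is broker-agnostic and uses a hypothetical `Message` type with a unique `message_id`; it only illustrates the idea of deduplicating on an ID and is not how Pub/Sub or Dataflow implement it internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    # Hypothetical message shape: a unique ID assigned by the broker plus a payload.
    message_id: str
    amount: int


def process_at_least_once(messages):
    """Naive consumer: every delivery is applied, so redeliveries double-count."""
    total = 0
    for msg in messages:
        total += msg.amount
    return total


def process_effectively_once(messages):
    """Idempotent consumer: deduplicate on message_id before applying the effect."""
    seen = set()
    total = 0
    for msg in messages:
        if msg.message_id in seen:
            continue  # redelivery of a message we already processed; skip it
        seen.add(msg.message_id)
        total += msg.amount
    return total


# An at-least-once broker may redeliver "m2" (for example, after a missed ack).
deliveries = [Message("m1", 10), Message("m2", 5), Message("m2", 5), Message("m3", 7)]
print(process_at_least_once(deliveries))     # 27
print(process_effectively_once(deliveries))  # 22
```

In a real pipeline the set of seen IDs would have to live in durable storage with some expiry window, which is exactly the kind of trade-off the exam likes to probe.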

2. Building and operationalizing data processing systems

  • Effective use of managed services (Cloud Bigtable, Cloud Spanner, Cloud SQL, BigQuery, Cloud Storage, Cloud Datastore, Cloud Memorystore)
  • Storage costs and performance
  • Lifecycle management of data
  • Data cleansing
  • Batch and streaming
  • Transformation
  • Data acquisition and import
  • Integrating with new data sources
  • Provisioning resources
  • Monitoring pipelines
  • Adjusting pipelines
  • Testing and quality control
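For the lifecycle management and storage cost items above, it helps to have seen a Cloud Storage lifecycle configuration at least once. The sketch below builds one in the JSON shape that `gsutil lifecycle set` accepts: objects move from Standard to Nearline, then to Coldline, and are finally deleted. The function name and the day thresholds are illustrative assumptions, not recommendations.

```python
import json


def lifecycle_config(nearline_after_days=30, coldline_after_days=90, delete_after_days=365):
    """Build a Cloud Storage lifecycle configuration as a Python dict.

    The day thresholds are illustrative defaults, not recommendations.
    """
    return {
        "rule": [
            {
                # Move ageing Standard objects to the cheaper Nearline class.
                "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                "condition": {"age": nearline_after_days, "matchesStorageClass": ["STANDARD"]},
            },
            {
                # Move older Nearline objects on to Coldline.
                "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
                "condition": {"age": coldline_after_days, "matchesStorageClass": ["NEARLINE"]},
            },
            {
                # Delete objects once they pass the retention threshold.
                "action": {"type": "Delete"},
                "condition": {"age": delete_after_days},
            },
        ]
    }


if __name__ == "__main__":
    # Save this output as lifecycle.json and apply it with:
    #   gsutil lifecycle set lifecycle.json gs://<your-bucket>
    print(json.dumps(lifecycle_config(), indent=2))
```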

3. Operationalizing machine learning models

  • ML APIs (e.g., Vision API, Speech API)
  • Customizing ML APIs (e.g., AutoML Vision, AutoML Text)
  • Conversational experiences (e.g., Dialogflow)
  • Ingesting appropriate data
  • Retraining of machine learning models (Cloud Machine Learning Engine, BigQuery ML, Kubeflow, Spark ML)
  • Continuous evaluation
  • Distributed vs. single machine
  • Use of edge compute
  • Hardware accelerators (e.g., GPU, TPU)
  • Machine learning terminology (e.g., features, labels, models, regression, classification, recommendation, supervised and unsupervised learning, evaluation metrics)
  • Impact of dependencies of machine learning models
  • Common sources of error (e.g., assumptions about data)
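In supervised learning, each example has features (the inputs) and a label (the known answer). Evaluation metrics such as precision and recall for classification compare a model's predicted labels with the true ones. This is a minimal pure-Python sketch using made-up binary labels, where 1 marks the positive class:

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_accuracy(y_true, y_pred):
    """Precision: how many predicted positives were right.
    Recall: how many actual positives were found.
    Accuracy: share of all predictions that were right."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(y_true) if y_true else 0.0
    return precision, recall, accuracy


labels      = [1, 0, 1, 1, 0, 0, 1, 0]  # true labels (made-up data)
predictions = [1, 1, 0, 1, 1, 0, 1, 0]  # model output (made-up data)
print(precision_recall_accuracy(labels, predictions))  # (0.6, 0.75, 0.625)
```

Precision and recall pull in different directions, so exam questions often hinge on which mistake, a false positive or a false negative, costs more for the business case.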

4. Ensuring solution quality

  • Identity and access management (e.g., Cloud IAM)
  • Data security (encryption, key management)
  • Ensuring privacy (e.g., Data Loss Prevention API)
  • Legal compliance (e.g., Health Insurance Portability and Accountability Act (HIPAA), Children’s Online Privacy Protection Act (COPPA), FedRAMP, General Data Protection Regulation (GDPR))
  • Building and running test suites
  • Pipeline monitoring (e.g., Stackdriver)
  • Assessing, troubleshooting, and improving data representations and data processing infrastructure
  • Resizing and autoscaling resources
  • Performing data preparation and quality control (e.g., Cloud Dataprep)
  • Verification and monitoring
  • Planning, executing, and stress testing data recovery (fault tolerance, rerunning failed jobs, performing retrospective re-analysis)
  • Choosing between ACID, idempotent, eventually consistent requirements
  • Mapping to current and future business requirements
  • Designing for data and application portability (e.g., multi-cloud, data residency requirements)
  • Data staging, cataloging, and discovery

What are the cost, validity, and prerequisites for this exam?

The Google Cloud Professional Data Engineer exam costs $200, and the certification is valid for 2 years.

Prerequisites: None
Recommended experience: 3+ years of industry experience including 1+ years designing and managing solutions using GCP.

Which courses helped me clear this exam?

You have reached the heart of this article!

I recommend taking these courses in the following order.

  1. Google Cloud Professional Data Engineer Course [2019 Update]
    1. Website: Udemy
    2. Level: Beginner
    3. Time: 4.5 hours
    4. Usefulness: 60/100
    5. Cost: $11
  2. Data Engineering, Big Data, and Machine Learning on GCP Specialization
    1. Website: Coursera
    2. Level: Intermediate
    3. Time: 1-2 months
    4. Usefulness: 80/100
    5. Cost: $49 USD per month (after 7-day free trial)
    6. Note: Take the practice test 4-5 times, until you consistently score 90% or higher.
  3. Google Cloud Certified Professional Data Engineer
    1. Website: Linux Academy
    2. Level: Difficult
    3. Time: 20 hours
    4. Usefulness: 100/100
    5. Cost: $49 USD per month (after 7-day free trial)
    6. Note: Take the practice test 4-5 times, until you consistently score 90% or higher.
  4. Preparing for the Google Cloud Professional Data Engineer Exam
    1. Website: Coursera
    2. Level: Advanced
    3. Time: Approx. 7 hours
    4. Usefulness: 50/100
    5. Cost: $49 USD per month (after 7-day free trial)
    6. Note: Take the practice test 4-5 times, until you consistently score 90% or higher.
  5. Awesome GCP
    1. Website: YouTube
    2. Level: Advanced
    3. Time: Approx. 12 hours
    4. Usefulness: 95/100
    5. Cost: Free
    6. Note: Try to understand the concept while listening to each answer.
  6. Data-Engineering-on-GCP-Cheatsheet
    1. Note: It's a summary document covering each GCP data engineering product.
  7. Practice questions:
    1. Note: Try to solve as many of these questions as you can.

Bonus Tips:

  1. Understand the differences between the GCP products (Dataflow, Dataproc, Datastore, Bigtable, BigQuery, Pub/Sub) and when each one should be used.
  2. The case studies are no longer included in the exam, so do not spend too much time on them.
  3. Learn the basics of SQL, since SQL questions may come up in the BigQuery section.
  4. The practice exams from Linux Academy and Coursera, as well as the Data Engineering readiness test, follow a pattern quite similar to the real exam, but the questions won't be the same.
  5. The exam focuses on applying the products. You should know when, why, and how to use each one, and the questions mostly test the when and the why.
  6. Whenever a question mentions Hadoop, Spark, Hive, or Pig, think Dataproc.
  7. Whenever a question mentions Apache Beam or a pipeline, think Dataflow.
  8. Whenever a question requires globally available storage that supports ACID transactions, think Cloud Spanner.
  9. Whenever a question mentions Cassandra, think Bigtable.
  10. Understand the permissions and roles in each product.
  11. Know how to select the right storage option:
      [Figure: GCP storage options]
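Tips 6 to 11 largely come down to pattern matching, so they can be written as a tiny Python study aid. The keyword table is taken directly from the tips. The `choose_storage` function and its parameters are hypothetical and follow a simplified version of the commonly taught GCP storage decision tree, which is an assumption rather than an official Google rule. Real exam questions need more nuance than this.

```python
# Keyword heuristics from tips 6, 7 and 9 (a study aid, not an official rule set).
KEYWORD_HINTS = {
    "Cloud Dataproc": ("hadoop", "spark", "hive", "pig"),
    "Cloud Dataflow": ("apache beam", "pipeline"),
    "Cloud Bigtable": ("cassandra",),
}


def hint_products(question):
    """Return the products whose trigger keywords appear in the question text."""
    text = question.lower()
    return sorted(
        product
        for product, keywords in KEYWORD_HINTS.items()
        if any(keyword in text for keyword in keywords)
    )


def choose_storage(structured, transactional=False, relational=True,
                   global_scale=False, analytics_sql=True):
    """Simplified walk through a GCP storage decision tree (assumed flow).

    Unstructured data -> Cloud Storage.
    Transactional + relational -> Cloud SQL, or Cloud Spanner at global scale (tip 8).
    Transactional + non-relational -> Cloud Datastore.
    Analytical -> BigQuery for SQL warehousing, Cloud Bigtable for low-latency NoSQL.
    """
    if not structured:
        return "Cloud Storage"
    if transactional:
        if relational:
            return "Cloud Spanner" if global_scale else "Cloud SQL"
        return "Cloud Datastore"
    return "BigQuery" if analytics_sql else "Cloud Bigtable"


print(hint_products("Migrate existing Spark and Hive jobs with minimal changes"))
# ['Cloud Dataproc']
print(choose_storage(structured=True, transactional=True, global_scale=True))
# Cloud Spanner
```

Substring matching is deliberately crude here; the point is the mental shortcut, not a classifier.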

When I finally hit that ‘Submit’ button after an arduous journey, I was still unsure whether I had made the mark. If you feel the same way, relax!

Do not stop practicing.  

If you take the online proctored exam, you'll find out whether you passed or failed as soon as you complete it. Google then takes 10-15 days to email your certificate along with a Google Cloud certification merchandise code.

Good luck with your certification!
