IES Management College And Research Centre

Image from Google Jackets

Google bigquery the definitive guide: data warehousing, analytics, and machine learning at scale

By: Publication details: Mumbai O'reilly/Shroff Publishers &Distributors Pvt Ltd 2019Description: xvii, 475 PaperISBN:
  • 978-93-5213-921-7
Subject(s): DDC classification:
  • 006.3/Lak/Tig
Contents:
Table of Contents Preface Who Is This Book For? Conventions Used in This Book Using Code Examples O’Reilly Online Learning How to Contact Us Acknowledgments 1. What Is Google BigQuery? Data Processing Architectures Relational Database Management System MapReduce Framework BigQuery: A Serverless, Distributed SQL Engine Working with BigQuery Deriving Insights Across Datasets ETL, EL, and ELT Powerful Analytics Simplicity of Management How BigQuery Came About What Makes BigQuery Possible? Separation of Compute and Storage Storage and Networking Infrastructure Managed Storage Integration with Google Cloud Platform Security and Compliance Summary 2. Query Essentials Simple Queries Retrieving Rows by Using SELECT Aliasing Column Names with AS Filtering with WHERE SELECT *, EXCEPT, REPLACE Subqueries with WITH Sorting with ORDER BY Aggregates Computing Aggregates by Using GROUP BY Counting Records by Using COUNT Filtering Grouped Items by Using HAVING Finding Unique Values by Using DISTINCT A Brief Primer on Arrays and Structs Creating Arrays by Using ARRAY_AGG Array of STRUCT TUPLE Working with Arrays UNNEST an Array Joining Tables The JOIN Explained INNER JOIN CROSS JOIN OUTER JOIN Saving and Sharing Query History and Caching Saved Queries Views Versus Shared Queries Summary 3. Data Types, Functions, and Operators Numeric Types and Functions Mathematical Functions Standard-Compliant Floating-Point Division SAFE Functions Comparisons Precise Decimal Calculations with NUMERIC Working with BOOL Logical Operations Conditional Expressions Cleaner NULL-Handling with COALESCE Casting and Coercion Using COUNTIF to Avoid Casting Booleans String Functions Internationalization Printing and Parsing String Manipulation Functions Transformation Functions Regular Expressions Summary of String Functions Working with TIMESTAMP Parsing and Formatting Timestamps Extracting Calendar Parts Arithmetic with Timestamps Date, Time, and DateTime Working with GIS Functions Summary 4. Loading Data into BigQuery The Basics Loading from a Local Source Specifying a Schema Copying into a New Table Data Management (DDL and DML) Loading Data Efficiently Federated Queries and External Data Sources How to Use Federated Queries When to Use Federated Queries and External Data Sources Interactive Exploration and Querying of Data in Google Sheets SQL Queries on Data in Cloud Bigtable Transfers and Exports Data Transfer Service Exporting Stackdriver Logs Using Cloud Dataflow to Read/Write from BigQuery Moving On-Premises Data Data Migration Methods Summary 5. Developing with BigQuery Developing Programmatically Accessing BigQuery via the REST API Google Cloud Client Library Accessing BigQuery from Data Science Tools Notebooks on Google Cloud Platform Working with BigQuery, pandas, and Jupyter Working with BigQuery from R Cloud Dataflow JDBC/ODBC drivers Incorporating BigQuery Data into Google Slides (in G Suite) Bash Scripting with BigQuery Creating Datasets and Tables Executing Queries BigQuery Objects Summary 6. Architecture of BigQuery High-Level Architecture Life of a Query Request BigQuery Upgrades Query Engine (Dremel) Dremel Architecture Query Execution Storage Storage Data Metadata Summary 7. Optimizing Performance and Cost Principles of Performance Key Drivers of Performance Controlling Cost Measuring and Troubleshooting Measuring Query Speed Using REST API Measuring Query Speed Using BigQuery Workload Tester Troubleshooting Workloads Using Stackdriver Reading Query Plan Information Increasing Query Speed Minimizing I/O Caching the Results of Previous Queries Performing Efficient Joins Avoiding Overwhelming a Worker Using Approximate Aggregation Functions Optimizing How Data Is Stored and Accessed Minimizing Network Overhead Choosing an Efficient Storage Format Partitioning Tables to Reduce Scan Size Clustering Tables Based on High-Cardinality Keys Time-Insensitive Use Cases Batch Queries File Loads Summary Checklist 8. Advanced Queries Reusable Queries Parameterized Queries SQL User-Defined Functions Reusing Parts of Queries Advanced SQL Working with Arrays Window Functions Table Metadata Data Definition Language and Data Manipulation Language Beyond SQL JavaScript UDFs Scripting Advanced Functions BigQuery Geographic Information Systems Useful Statistical Functions Hash Algorithms Summary 9. Machine Learning in BigQuery What Is Machine Learning? Formulating a Machine Learning Problem Types of Machine Learning Problems Building a Regression Model Choose the Label Exploring the Dataset to Find Features Creating a Training Dataset Training and Evaluating the Model Predicting with the Model Examining Model Weights More-Complex Regression Models Building a Classification Model Training Evaluation Prediction Choosing the Threshold Customizing BigQuery ML Controlling Data Split Balancing Classes Regularization k-Means Clustering What’s Being Clustered? Clustering Bicycle Stations Carrying Out Clustering Understanding the Clusters Data-Driven Decisions Recommender Systems The MovieLens Dataset Matrix Factorization Making Recommendations Incorporating User and Movie Information Custom Machine Learning Models on GCP Hyperparameter Tuning AutoML Support for TensorFlow Summary 10. Administering and Securing BigQuery Infrastructure Security Identity and Access Management Identity Role Resource Administering BigQuery Job Management Authorizing Users Restoring Deleted Records and Tables Continuous Integration/Continuous Deployment Cost/Billing Exports Dashboards, Monitoring, and Audit Logging Availability, Disaster Recovery, and Encryption Zones, Regions, and Multiregions BigQuery and Failure Handling Durability, Backups, and Disaster Recovery Privacy and Encryption Regulatory Compliance Data Locality Restricting Access to Subsets of Data Removing All Transactions Related to a Single Individual Data Loss Prevention CMEK Data Exfiltration Protection Summary Index
Summary: Book Description Work with petabyte-scale datasets while building a collaborative, agile workplace in the process. This practical book is the canonical reference to Google BigQuery, the query engine that lets you conduct interactive analysis of large datasets. BigQuery enables enterprises to efficiently store, query, ingest, and learn from their data in a convenient framework. With this book, you’ll examine how to analyze data at scale to derive insights from large datasets efficiently. Valliappa Lakshmanan, tech lead for Google Cloud Platform, and Jordan Tigani, engineering director for the BigQuery team, provide best practices for modern data warehousing within an autoscaled, serverless public cloud. Whether you want to explore parts of BigQuery you’re not familiar with or prefer to focus on specific tasks, this reference is indispensable.
List(s) this item appears in: Book Alert-November-2019
Tags from this library: No tags from this library for this title. Log in to add tags.
Star ratings
    Average rating: 0.0 (0 votes)
Holdings
Item type Current library Call number Status Date due Barcode Item holds
Book Book Main Library 006.3/Lak/Tig/36989 (Browse shelf(Opens below)) Available 11136989
Total holds: 0

Table of Contents
Preface
Who Is This Book For?
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Acknowledgments
1. What Is Google BigQuery?
Data Processing Architectures
Relational Database Management System
MapReduce Framework
BigQuery: A Serverless, Distributed SQL Engine
Working with BigQuery
Deriving Insights Across Datasets
ETL, EL, and ELT
Powerful Analytics
Simplicity of Management
How BigQuery Came About
What Makes BigQuery Possible?
Separation of Compute and Storage
Storage and Networking Infrastructure
Managed Storage
Integration with Google Cloud Platform
Security and Compliance
Summary
2. Query Essentials
Simple Queries
Retrieving Rows by Using SELECT
Aliasing Column Names with AS
Filtering with WHERE
SELECT *, EXCEPT, REPLACE
Subqueries with WITH
Sorting with ORDER BY
Aggregates
Computing Aggregates by Using GROUP BY
Counting Records by Using COUNT
Filtering Grouped Items by Using HAVING
Finding Unique Values by Using DISTINCT
A Brief Primer on Arrays and Structs
Creating Arrays by Using ARRAY_AGG
Array of STRUCT
TUPLE
Working with Arrays
UNNEST an Array
Joining Tables
The JOIN Explained
INNER JOIN
CROSS JOIN
OUTER JOIN
Saving and Sharing
Query History and Caching
Saved Queries
Views Versus Shared Queries
Summary
3. Data Types, Functions, and Operators
Numeric Types and Functions
Mathematical Functions
Standard-Compliant Floating-Point Division
SAFE Functions
Comparisons
Precise Decimal Calculations with NUMERIC
Working with BOOL
Logical Operations
Conditional Expressions
Cleaner NULL-Handling with COALESCE
Casting and Coercion
Using COUNTIF to Avoid Casting Booleans
String Functions
Internationalization
Printing and Parsing
String Manipulation Functions
Transformation Functions
Regular Expressions
Summary of String Functions
Working with TIMESTAMP
Parsing and Formatting Timestamps
Extracting Calendar Parts
Arithmetic with Timestamps
Date, Time, and DateTime
Working with GIS Functions
Summary
4. Loading Data into BigQuery
The Basics
Loading from a Local Source
Specifying a Schema
Copying into a New Table
Data Management (DDL and DML)
Loading Data Efficiently
Federated Queries and External Data Sources
How to Use Federated Queries
When to Use Federated Queries and External Data Sources
Interactive Exploration and Querying of Data in Google Sheets
SQL Queries on Data in Cloud Bigtable
Transfers and Exports
Data Transfer Service
Exporting Stackdriver Logs
Using Cloud Dataflow to Read/Write from BigQuery
Moving On-Premises Data
Data Migration Methods
Summary
5. Developing with BigQuery
Developing Programmatically
Accessing BigQuery via the REST API
Google Cloud Client Library
Accessing BigQuery from Data Science Tools
Notebooks on Google Cloud Platform
Working with BigQuery, pandas, and Jupyter
Working with BigQuery from R
Cloud Dataflow
JDBC/ODBC drivers
Incorporating BigQuery Data into Google Slides (in G Suite)
Bash Scripting with BigQuery
Creating Datasets and Tables
Executing Queries
BigQuery Objects
Summary
6. Architecture of BigQuery
High-Level Architecture
Life of a Query Request
BigQuery Upgrades
Query Engine (Dremel)
Dremel Architecture
Query Execution
Storage
Storage Data
Metadata
Summary
7. Optimizing Performance and Cost
Principles of Performance
Key Drivers of Performance
Controlling Cost
Measuring and Troubleshooting
Measuring Query Speed Using REST API
Measuring Query Speed Using BigQuery Workload Tester
Troubleshooting Workloads Using Stackdriver
Reading Query Plan Information
Increasing Query Speed
Minimizing I/O
Caching the Results of Previous Queries
Performing Efficient Joins
Avoiding Overwhelming a Worker
Using Approximate Aggregation Functions
Optimizing How Data Is Stored and Accessed
Minimizing Network Overhead
Choosing an Efficient Storage Format
Partitioning Tables to Reduce Scan Size
Clustering Tables Based on High-Cardinality Keys
Time-Insensitive Use Cases
Batch Queries
File Loads
Summary
Checklist
8. Advanced Queries
Reusable Queries
Parameterized Queries
SQL User-Defined Functions
Reusing Parts of Queries
Advanced SQL
Working with Arrays
Window Functions
Table Metadata
Data Definition Language and Data Manipulation Language
Beyond SQL
JavaScript UDFs
Scripting
Advanced Functions
BigQuery Geographic Information Systems
Useful Statistical Functions
Hash Algorithms
Summary
9. Machine Learning in BigQuery
What Is Machine Learning?
Formulating a Machine Learning Problem
Types of Machine Learning Problems
Building a Regression Model
Choose the Label
Exploring the Dataset to Find Features
Creating a Training Dataset
Training and Evaluating the Model
Predicting with the Model
Examining Model Weights
More-Complex Regression Models
Building a Classification Model
Training
Evaluation
Prediction
Choosing the Threshold
Customizing BigQuery ML
Controlling Data Split
Balancing Classes
Regularization
k-Means Clustering
What’s Being Clustered?
Clustering Bicycle Stations
Carrying Out Clustering
Understanding the Clusters
Data-Driven Decisions
Recommender Systems
The MovieLens Dataset
Matrix Factorization
Making Recommendations
Incorporating User and Movie Information
Custom Machine Learning Models on GCP
Hyperparameter Tuning
AutoML
Support for TensorFlow
Summary
10. Administering and Securing BigQuery
Infrastructure Security
Identity and Access Management
Identity
Role
Resource
Administering BigQuery
Job Management
Authorizing Users
Restoring Deleted Records and Tables
Continuous Integration/Continuous Deployment
Cost/Billing Exports
Dashboards, Monitoring, and Audit Logging
Availability, Disaster Recovery, and Encryption
Zones, Regions, and Multiregions
BigQuery and Failure Handling
Durability, Backups, and Disaster Recovery
Privacy and Encryption
Regulatory Compliance
Data Locality
Restricting Access to Subsets of Data
Removing All Transactions Related to a Single Individual
Data Loss Prevention
CMEK
Data Exfiltration Protection
Summary
Index

Book Description
Work with petabyte-scale datasets while building a collaborative, agile workplace in the process. This practical book is the canonical reference to Google BigQuery, the query engine that lets you conduct interactive analysis of large datasets. BigQuery enables enterprises to efficiently store, query, ingest, and learn from their data in a convenient framework. With this book, you’ll examine how to analyze data at scale to derive insights from large datasets efficiently.

Valliappa Lakshmanan, tech lead for Google Cloud Platform, and Jordan Tigani, engineering director for the BigQuery team, provide best practices for modern data warehousing within an autoscaled, serverless public cloud. Whether you want to explore parts of BigQuery you’re not familiar with or prefer to focus on specific tasks, this reference is indispensable.

There are no comments on this title.

to post a comment.

Circulation Timings: Monday to Saturday: 8:30 AM to 9:30 PM | Sundays/Bank Holiday during Examination Period: 10:00 AM to 6:00 PM