What are the best practices for setting up machine learning pipelines with TFX?

Beyond what the TFX Guide covers, what other best practices are there for setting up machine learning pipelines?

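The original ACM KDD '17 TFX paper describes TFX's capabilities and how they support deploying ML in production at scale. It is well worth a read. For more recent coverage, Building Machine Learning Pipelines by Hannes Hapke and Catherine Nelson (ISBN: 9781492053194, published by O'Reilly Media, Inc. in July 2020) covers best practices well.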

Here is the table of contents (courtesy of O'Reilly Media).

Foreword
Preface
What Are Machine Learning Pipelines?
Who Is This Book For?
Why TensorFlow and TensorFlow Extended?
Overview of the Chapters
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Acknowledgments
1. Introduction
Why Machine Learning Pipelines?
When to Think About Machine Learning Pipelines
Overview of the Steps in a Machine Learning Pipeline
Data Ingestion and Data Versioning
Data Validation
Data Preprocessing
Model Training and Tuning
Model Analysis
Model Versioning
Model Deployment
Feedback Loops
Data Privacy
Pipeline Orchestration
Why Pipeline Orchestration?
Directed Acyclic Graphs
Our Example Project
Project Structure
Our Machine Learning Model
Goal of the Example Project
Summary
2. Introduction to TensorFlow Extended
What Is TFX?
Installing TFX
Overview of TFX Components
What Is ML Metadata?
Interactive Pipelines
Alternatives to TFX
Introduction to Apache Beam
Setup
Basic Data Pipeline
Executing Your Basic Pipeline
Summary
3. Data Ingestion
Concepts for Data Ingestion
Ingesting Local Data Files
Ingesting Remote Data Files
Ingesting Data Directly from Databases
Data Preparation
Splitting Datasets
Spanning Datasets
Versioning Datasets
Ingestion Strategies
Structured Data
Text Data for Natural Language Problems
Image Data for Computer Vision Problems
Summary
4. Data Validation
Why Data Validation?
TFDV
Installation
Generating Statistics from Your Data
Generating Schema from Your Data
Recognizing Problems in Your Data
Comparing Datasets
Updating the Schema
Data Skew and Drift
Biased Datasets
Slicing Data in TFDV
Processing Large Datasets with GCP
Integrating TFDV into Your Machine Learning Pipeline
Summary
5. Data Preprocessing
Why Data Preprocessing?
Preprocessing the Data in the Context of the Entire Dataset
Scaling the Preprocessing Steps
Avoiding a Training-Serving Skew
Deploying Preprocessing Steps and the ML Model as One Artifact
Checking Your Preprocessing Results in Your Pipeline
Data Preprocessing with TFT
Installation
Preprocessing Strategies
Best Practices
TFT Functions
Standalone Execution of TFT
Integrate TFT into Your Machine Learning Pipeline
Summary
6. Model Training
Defining the Model for Our Example Project
The TFX Trainer Component
run_fn() Function
Running the Trainer Component
Other Trainer Component Considerations
Using TensorBoard in an Interactive Pipeline
Distribution Strategies
Model Tuning
Strategies for Hyperparameter Tuning
Hyperparameter Tuning in TFX Pipelines
Summary
7. Model Analysis and Validation
How to Analyze Your Model
Classification Metrics
Regression Metrics
TensorFlow Model Analysis
Analyzing a Single Model in TFMA
Analyzing Multiple Models in TFMA
Model Analysis for Fairness
Slicing Model Predictions in TFMA
Checking Decision Thresholds with Fairness Indicators
Going Deeper with the What-If Tool
Model Explainability
Generating Explanations with the WIT
Other Explainability Techniques
Analysis and Validation in TFX
ResolverNode
Evaluator Component
Validation in the Evaluator Component
TFX Pusher Component
Summary
8. Model Deployment with TensorFlow Serving
A Simple Model Server
The Downside of Model Deployments with Python-Based APIs
Lack of Code Separation
Lack of Model Version Control
Inefficient Model Inference
TensorFlow Serving
TensorFlow Architecture Overview
Exporting Models for TensorFlow Serving
Model Signatures
Inspecting Exported Models
Setting Up TensorFlow Serving
Docker Installation
Native Ubuntu Installation
Building TensorFlow Serving from Source
Configuring a TensorFlow Server
REST Versus gRPC
Making Predictions from the Model Server
Getting Model Predictions via REST
Using TensorFlow Serving via gRPC
Model A/B Testing with TensorFlow Serving
Requesting Model Metadata from the Model Server
REST Requests for Model Metadata
gRPC Requests for Model Metadata
Batching Inference Requests
Configuring Batch Predictions
Other TensorFlow Serving Optimizations
TensorFlow Serving Alternatives
BentoML
Seldon
GraphPipe
Simple TensorFlow Serving
MLflow
Ray Serve
Deploying with Cloud Providers
Use Cases
Example Deployment with GCP
Model Deployment with TFX Pipelines
Summary
9. Advanced Model Deployments with TensorFlow Serving
Decoupling Deployment Cycles
Workflow Overview
Optimization of Remote Model Loading
Model Optimizations for Deployments
Quantization
Pruning
Distillation
Using TensorRT with TensorFlow Serving
TFLite
Steps to Optimize Your Model with TFLite
Serving TFLite Models with TensorFlow Serving
Monitoring Your TensorFlow Serving Instances
Prometheus Setup
TensorFlow Serving Configuration
Simple Scaling with TensorFlow Serving and Kubernetes
Summary
10. Advanced TensorFlow Extended
Advanced Pipeline Concepts
Training Multiple Models Simultaneously
Exporting TFLite Models
Warm Starting Model Training
Human in the Loop
Slack Component Setup
How to Use the Slack Component
Custom TFX Components
Use Cases of Custom Components
Writing a Custom Component from Scratch
Reusing Existing Components
Summary
11. Pipelines Part 1: Apache Beam and Apache Airflow
Which Orchestration Tool to Choose?
Apache Beam
Apache Airflow
Kubeflow Pipelines
Kubeflow Pipelines on AI Platform
Converting Your Interactive TFX Pipeline to a Production Pipeline
Simple Interactive Pipeline Conversion for Beam and Airflow
Introduction to Apache Beam
Orchestrating TFX Pipelines with Apache Beam
Introduction to Apache Airflow
Installation and Initial Setup
Basic Airflow Example
Orchestrating TFX Pipelines with Apache Airflow
Pipeline Setup
Pipeline Execution
Summary
12. Pipelines Part 2: Kubeflow Pipelines
Introduction to Kubeflow Pipelines
Installation and Initial Setup
Accessing Your Kubeflow Pipelines Installation
Orchestrating TFX Pipelines with Kubeflow Pipelines
Pipeline Setup
Executing the Pipeline
Useful Features of Kubeflow Pipelines
Pipelines Based on Google Cloud AI Platform
Pipeline Setup
TFX Pipeline Setup
Pipeline Execution
Summary
13. Feedback Loops
Explicit and Implicit Feedback
The Data Flywheel
Feedback Loops in the Real World
Design Patterns for Collecting Feedback
Users Take Some Action as a Result of the Prediction
Users Rate the Quality of the Prediction
Users Correct the Prediction
Crowdsourcing the Annotations
Expert Annotations
Producing Feedback Automatically
How to Track Feedback Loops
Tracking Explicit Feedback
Tracking Implicit Feedback
Summary
14. Data Privacy for Machine Learning
Data Privacy Issues
Why Do We Care About Data Privacy?
The Simplest Way to Increase Privacy
What Data Needs to Be Kept Private?
Differential Privacy
Local and Global Differential Privacy
Epsilon, Delta, and the Privacy Budget
Differential Privacy for Machine Learning
Introduction to TensorFlow Privacy
Training with a Differentially Private Optimizer
Calculating Epsilon
Federated Learning
Federated Learning in TensorFlow
Encrypted Machine Learning
Encrypted Model Training
Converting a Trained Model to Serve Encrypted Predictions
Other Methods for Data Privacy
Summary
15. The Future of Pipelines and Next Steps
Model Experiment Tracking
Thoughts on Model Release Management
Future Pipeline Capabilities
TFX with Other Machine Learning Frameworks
Testing Machine Learning Models
CI/CD Systems for Machine Learning
Machine Learning Engineering Community
Summary
A. Introduction to Infrastructure for Machine Learning
What Is a Container?
Introduction to Docker
Introduction to Docker Images
Building Your First Docker Image
Diving into the Docker CLI
Introduction to Kubernetes
Some Kubernetes Definitions
Getting Started with Minikube and kubectl
Interacting with the Kubernetes CLI
Defining a Kubernetes Resource
Deploying Applications to Kubernetes
B. Setting Up a Kubernetes Cluster on Google Cloud
Before You Get Started
Kubernetes on Google Cloud
Selecting a Google Cloud Project
Setting Up Your Google Cloud Project
Creating a Kubernetes Cluster
Accessing Your Kubernetes Cluster with kubectl
Using Your Kubernetes Cluster with kubectl
Persistent Volume Setups for Kubeflow Pipelines
C. Tips for Operating Kubeflow Pipelines
Custom TFX Images
Exchange Data Through Persistent Volumes
TFX Command-Line Interface
TFX and Its Dependencies
TFX Templates
Publishing Your Pipeline with TFX CLI

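There are many, and which one you prefer is really a personal choice. I found these books very good, along with some online material.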

  1. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems
  2. TensorFlow tutorial for experts
  3. Deep Learning with Python
  4. TensorFlow for Deep Learning

Online reading:

  1. http://web.stanford.edu/class/cs20si/syllabus.html
  2. http://download.tensorflow.org/paper/whitepaper2015.pdf