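Data Science
Book published in 2014 by Southeast University Press, Nanjing
Data Science is a book published in 2014 by Southeast University Press (Nanjing), written by Rachel Schutt and Cathy O'Neil.
Synopsis
  This book grew out of the lecture notes for the "Introduction to Data Science" course at Columbia University. It defines the scope of data science and is a practical guide that attends to the human side of the field while covering data science in depth and from multiple angles; it is billed as a hands-on handbook for the big data era. The book aims to help readers generalize from examples when solving important problems. Its coverage includes: data science and its workflow, statistical models and machine learning algorithms, information extraction and the creation of statistical variables, data visualization and social networks, predictive models and causal analysis, and data preprocessing and engineering methods. The book also looks ahead to the future development of data science.
Table of Contents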
Preface
1.Introduction: What Is Data Science?
Big Data and Data Science Hype
Getting Past the Hype
Why Now?
Datafication
The Current Landscape (with a Little History)
Data Science Jobs
A Data Science Profile
Thought Experiment: Meta-Definition
OK, So What Is a Data Scientist, Really?
In Academia
In Industry
2.Statistical Inference, Exploratory Data Analysis, and the Data Science Process
Statistical Thinking in the Age of Big Data
Statistical Inference
Populations and Samples
Populations and Samples of Big Data
Big Data Can Mean Big Assumptions
Modeling
Exploratory Data Analysis
Philosophy of Exploratory Data Analysis
Exercise: EDA
The Data Science Process
A Data Scientist's Role in This Process
Thought Experiment: How Would You Simulate Chaos?
Case Study: RealDirect
How Does RealDirect Make Money?
Exercise: RealDirect Data Strategy
3.Algorithms
Machine Learning Algorithms
Three Basic Algorithms
Linear Regression
k-Nearest Neighbors (k-NN)
k-means
Exercise: Basic Machine Learning Algorithms
Solutions
Summing It All Up
Thought Experiment: Automated Statistician
4.Spam Filters, Naive Bayes, and Wrangling
Thought Experiment: Learning by Example
Why Won't Linear Regression Work for Filtering Spam?
How About k-nearest Neighbors?
Naive Bayes
Bayes Law
A Spam Filter for Individual Words
A Spam Filter That Combines Words: Naive Bayes
Fancy It Up: Laplace Smoothing
Comparing Naive Bayes to k-NN
Sample Code in bash
Scraping the Web: APIs and Other Tools
Jake's Exercise: Naive Bayes for Article Classification
Sample R Code for Dealing with the NYT API
5.Logistic Regression
Thought Experiments
Classifiers
Runtime
You
Interpretability
Scalability
M6D Logistic Regression Case Study
Click Models
The Underlying Math
Estimating α and β
Newton's Method
Stochastic Gradient Descent
Implementation
Evaluation
Media 6 Degrees Exercise
Sample R Code
6.Time Stamps and Financial Modeling
Kyle Teague and GetGlue
Timestamps
Exploratory Data Analysis (EDA)
Metrics and New Variables or Features
What's Next?
Cathy O'Neil
Thought Experiment
Financial Modeling
In-Sample, Out-of-Sample, and Causality
Preparing Financial Data
Log Returns
Example: The S&P Index
Working out a Volatility Measurement
Exponential Downweighting
The Financial Modeling Feedback Loop
Why Regression?
Adding Priors
A Baby Model
Exercise: GetGlue and Timestamped Event Data
Exercise: Financial Data
7.Extracting Meaning from Data
William Cukierski
Background: Data Science Competitions
Background: Crowdsourcing
The Kaggle Model
A Single Contestant
Their Customers
Thought Experiment: What Are the Ethical Implications of a Robo-Grader?
Feature Selection
Example: User Retention
Filters
Wrappers
Embedded Methods: Decision Trees
Entropy
The Decision Tree Algorithm
Handling Continuous Variables in Decision Trees
Random Forests
User Retention: Interpretability Versus Predictive Power
David Huffaker: Google's Hybrid Approach to Social Research
Moving from Descriptive to Predictive
Social at Google
Privacy
Thought Experiment: What Is the Best Way to Decrease Concern and Increase Understanding and Control?
8.Recommendation Engines: Building a User-Facing Data Product at Scale
A Real-World Recommendation Engine
Nearest Neighbor Algorithm Review
Some Problems with Nearest Neighbors
Beyond Nearest Neighbor: Machine Learning Classification
The Dimensionality Problem
Singular Value Decomposition (SVD)
Important Properties of SVD
Principal Component Analysis (PCA)
Alternating Least Squares
Fix V and Update U
Last Thoughts on These Algorithms
Thought Experiment: Filter Bubbles
Exercise: Build Your Own Recommendation System
Sample Code in Python
9.Data Visualization and Fraud Detection
Data Visualization History
Gabriel Tarde
Mark's Thought Experiment
What Is Data Science, Redux?
Processing
Franco Moretti
A Sample of Data Visualization Projects
Mark's Data Visualization Projects
New York Times Lobby: Moveable Type
Project Cascade: Lives on a Screen
Cronkite Plaza
eBay Transactions and Books
Public Theater Shakespeare Machine
Goals of These Exhibits
Data Science and Risk
About Square
The Risk Challenge
The Trouble with Performance Estimation
Model Building Tips
Data Visualization at Square
Ian's Thought Experiment
Data Visualization for the Rest of Us
Data Visualization Exercise
……
10.Social Networks and Data Journalism
11.Causality
12.Epidemiology
13.Lessons Learned from Data Competitions: Data Leakage and Model Evaluation
14.Data Engineering: MapReduce, Pregel, and Hadoop
15.The Students Speak
16.Next-Generation Data Scientists, Hubris, and Ethics
Index
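About the Authors
Authors: Rachel Schutt (USA), Cathy O'Neil (USA)
Rachel Schutt is Senior Vice President of Data Science at News Corp, an adjunct professor of statistics at Columbia University, and a founding member of the Education Committee of the Institute for Data Sciences and Engineering.
Cathy O'Neil is a senior data scientist at Johnson Research Labs. She holds a Ph.D. in mathematics from Harvard University, did postdoctoral work in the mathematics department at MIT, and was formerly a professor at Barnard College.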