Data Science: a book published in 2014 by Southeast University Press, Nanjing

Data Science (《数据科学》) is a book published in 2014 by Southeast University Press in Nanjing. Its authors are Rachel Schutt and Cathy O'Neil.

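Synopsis

This book grew out of the lecture notes for the "Introduction to Data Science" course at Columbia University. It delineates the scope of data science and serves as a practical guide that, with attention to the human side of the field, introduces data science in depth and from multiple angles; it has been described as a hands-on handbook for the big data era. The book aims to help readers generalize from examples when tackling important problems. Topics include: data science and its workflow, statistical models and machine learning algorithms, information extraction and the creation of statistical variables, data visualization and social networks, predictive models and causal analysis, and data preprocessing and engineering methods. The book also looks ahead to the future development of data science.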

Table of Contents

Preface

1. Introduction: What Is Data Science?

Big Data and Data Science Hype

Getting Past the Hype

Why Now?

Datafication

The Current Landscape (with a Little History)

Data Science Jobs

A Data Science Profile

Thought Experiment: Meta-Definition

OK, So What Is a Data Scientist, Really?

In Academia

In Industry

2. Statistical Inference, Exploratory Data Analysis, and the Data Science Process

Statistical Thinking in the Age of Big Data

Statistical Inference

Populations and Samples

Populations and Samples of Big Data

Big Data Can Mean Big Assumptions

Modeling

Exploratory Data Analysis

Philosophy of Exploratory Data Analysis

Exercise: EDA

The Data Science Process

A Data Scientist's Role in This Process

Thought Experiment: How Would You Simulate Chaos?

Case Study: RealDirect

How Does RealDirect Make Money?

Exercise: RealDirect Data Strategy

3. Algorithms

Machine Learning Algorithms

Three Basic Algorithms

Linear Regression

k-Nearest Neighbors (k-NN)

k-means

Exercise: Basic Machine Learning Algorithms

Solutions

Summing It All Up

Thought Experiment: Automated Statistician

4. Spam Filters, Naive Bayes, and Wrangling

Thought Experiment: Learning by Example

Why Won't Linear Regression Work for Filtering Spam?

How About k-nearest Neighbors?

Naive Bayes

Bayes Law

A Spam Filter for Individual Words

A Spam Filter That Combines Words: Naive Bayes

Fancy It Up: Laplace Smoothing

Comparing Naive Bayes to k-NN

Sample Code in bash

Scraping the Web: APIs and Other Tools

Jake's Exercise: Naive Bayes for Article Classification

Sample R Code for Dealing with the NYT API

5. Logistic Regression

Thought Experiments

Classifiers

Runtime

You

Interpretability

Scalability

M6D Logistic Regression Case Study

Click Models

The Underlying Math

Estimating α and β

Newton's Method

Stochastic Gradient Descent

Implementation

Evaluation

Media 6 Degrees Exercise

Sample R Code

6. Time Stamps and Financial Modeling

Kyle Teague and GetGlue

Timestamps

Exploratory Data Analysis (EDA)

Metrics and New Variables or Features

What's Next?

Cathy O'Neil

Thought Experiment

Financial Modeling

In-Sample, Out-of-Sample, and Causality

Preparing Financial Data

Log Returns

Example: The S&P Index

Working out a Volatility Measurement

Exponential Downweighting

The Financial Modeling Feedback Loop

Why Regression?

Adding Priors

A Baby Model

Exercise: GetGlue and Timestamped Event Data

Exercise: Financial Data

7. Extracting Meaning from Data

William Cukierski

Background: Data Science Competitions

Background: Crowdsourcing

The Kaggle Model

A Single Contestant

Their Customers

Thought Experiment: What Are the Ethical Implications of a Robo-Grader?

Feature Selection

Example: User Retention

Filters

Wrappers

Embedded Methods: Decision Trees

Entropy

The Decision Tree Algorithm

Handling Continuous Variables in Decision Trees

Random Forests

User Retention: Interpretability Versus Predictive Power

David Huffaker: Google's Hybrid Approach to Social Research

Moving from Descriptive to Predictive

Social at Google

Privacy

Thought Experiment: What Is the Best Way to Decrease Concern and Increase Understanding and Control?

8. Recommendation Engines: Building a User-Facing Data Product at Scale

A Real-World Recommendation Engine

Nearest Neighbor Algorithm Review

Some Problems with Nearest Neighbors

Beyond Nearest Neighbor: Machine Learning Classification

The Dimensionality Problem

Singular Value Decomposition (SVD)

Important Properties of SVD

Principal Component Analysis (PCA)

Alternating Least Squares

Fix V and Update U

Last Thoughts on These Algorithms

Thought Experiment: Filter Bubbles

Exercise: Build Your Own Recommendation System

Sample Code in Python

9. Data Visualization and Fraud Detection

Data Visualization History

Gabriel Tarde

Mark's Thought Experiment

What Is Data Science, Redux?

Processing

Franco Moretti

A Sample of Data Visualization Projects

Mark's Data Visualization Projects

New York Times Lobby: Moveable Type

Project Cascade: Lives on a Screen

Cronkite Plaza

eBay Transactions and Books

Public Theater Shakespeare Machine

Goals of These Exhibits

Data Science and Risk

About Square

The Risk Challenge

The Trouble with Performance Estimation

Model Building Tips

Data Visualization at Square

Ian's Thought Experiment

Data Visualization for the Rest of Us

Data Visualization Exercise

……

10. Social Networks and Data Journalism

11. Causality

12. Epidemiology

13. Lessons Learned from Data Competitions: Data Leakage and Model Evaluation

14. Data Engineering: MapReduce, Pregel, and Hadoop

15. The Students Speak

16. Next-Generation Data Scientists, Hubris, and Ethics

Index

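About the Authors

Authors: Rachel Schutt (USA) and Cathy O'Neil (USA)

Rachel Schutt is Senior Vice President of Data Science at News Corp, an adjunct professor of statistics at Columbia University, and a founding member of the Education Committee of the Institute for Data Sciences and Engineering.

Cathy O'Neil is a senior data scientist at Johnson Research Labs. She holds a PhD in mathematics from Harvard University, was a postdoctoral researcher in the mathematics department at MIT, and was formerly a professor at Barnard College.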

Last revised: 2024-02-23 09:59