ABOUT THE BOOK

Dozens of times per day, we all interact with intelligent machines that are constantly learning from the wealth of data now available to them. These machines – from smartphones and talking robots to self-driving cars – are remaking the world of the twenty-first century in the same way that the Industrial Revolution remade the world of the nineteenth.

AIQ is based on a simple premise: if you want to understand the modern world and where it’s heading, you have to know a little bit of the mathematical language spoken by intelligent machines. AIQ will teach you that language, but in an unconventional way: through stories rather than equations.

You will meet a fascinating cast of historical characters – from Isaac Newton to Florence Nightingale – who have a lot to teach you about data, probability and better thinking. Along the way you will see how these same ideas are playing out in the modern age of big data and intelligent machines. And rather than ushering in the dystopian future so familiar from science fiction, you will see how these technologies can help us overcome some of our built-in cognitive weaknesses, and give us all the chance to lead happier, healthier and more fulfilled lives.

INDEX

The page references in this index correspond to the printed edition from which this ebook was created. To find a specific word or phrase from the index, please use the search feature of your ebook reader.

Page numbers in italics refer to figures.

Abd al-Rahman al-Sufi, 49, 58

Aiken, Howard, 117

Alexa (Amazon digital assistant), 5, 8, 11, 15, 115, 120, 140

Alfa Romeo 6C, 5–6

algorithms, 24

in AI, 3–4

anomaly detection and, 164, 166, 168, 170, 172, 177

assumptions and bias in, 209–212, 226–228, 230, 233–236

chains of, 3–4

data accumulation and, 6

deep-learning, 2, 44, 73–74, 209

health care and, 196–197, 199–202, 204, 207, 210–212, 230

journalism and, 111

natural language processing and, 111, 129, 133, 137, 139, 142–143

pattern recognition and, 54, 74

personalization and, 13–15, 24, 34, 36–37, 39

politics and, 37

recidivism-prediction algorithms, 234–235

suggestions versus search, 15

Alibaba, 1, 69, 178

Alipay, 171

Amazon, 1, 15

Alexa, 5, 8, 11, 15, 115, 120, 140

data storage, 6

delivery speed, 69

Echo, 110, 178

market dominance, 9

recommender system, 18

Web Services, 6

American Association for the Advancement of Science, 21

American Statistical Association, 21

Journal of the American Statistical Association, 23, 242n6

Andromeda. See Great Andromeda Nebula

anomaly detection, 145–175

averaging, 155, 163

bias (type of anomaly), 157, 158

coin clipping and, 150–153, 155, 159–161, 169

Formula 1 racing and, 163–164, 172–173, 175, 178, 190, 201

fraud and, 169–171

importance of variability, 148–149

law enforcement and, 167–169

Moneyball, 171–175

NBA and, 173–175

overdispersion (type of anomaly), 157, 159

Patriots coin toss record and, 145–148, 154, 156, 166

simulated coin toss record, 147

smart cities and, 148–149, 164–169

square-root rule (de Moivre’s equation) and, 156–159, 162–163, 165, 168

Trial of the Pyx (Royal Mint fraud protection), 149, 153–163, 175

Apple

data storage, 6

iPhone, 117–118, 142

market dominance, 9

pattern-recognition system, 53

Aristophanes: The Frogs, 152

artificial intelligence (AI)

algorithms and, 3–4

anxieties regarding, 7–10

assumptions and, 209–228

bias in, bias out, 212, 233–237

contraception and, 61–62

criminal justice system and, 234–236

democratization of, 2, 6, 8, 228

diffusion and dissemination of, 2–3, 8

enabling technological trends, 5–6, 63

image classification, 5, 45, 46, 73

image recognition, 11, 61, 66, 200

model rust, 212, 228–233

models versus reality, 215–217

Moravec paradox, 81

policy, 8

rage to conclude bias, 212, 213–217, 225–226, 228

robot cars, 5–8, 77–79, 80–82, 91–97, 100, 106, 197, 200

salaries, 1

SLAM problem (simultaneous localization and mapping), 80–81, 91–92, 95

speech recognition, 5, 114–115, 124–125, 127, 130–131, 140

talent and workforce, 1

twenty questions game and, 133–140

See also anomaly detection; Bayes’s rule; health care and medicine; natural language processing (NLP); pattern recognition; personalization; prediction rules

assumptions, 209–228

astronomy, 48–53, 58–61, 68–76

Alpha Centauri, 52

Bayes’s rule and, 106

Great Andromeda Nebula, 49–53, 50, 58

Leavitt’s original equation, 60

Leavitt’s prediction rule data, 56

measuring stars, 52–53

Milky Way, 48–53, 57–58, 75, 242n6

nebulae, 49–53, 58

oscillation of a pulsating star, 55

parallax, 52–57

pulsating stars, 54–59, 63, 71–72, 75–76

statistics and, 189

Athey, Alex, 168

automation, 9, 70. See also robotics

autonomous cars. See robot cars

availability heuristic, 7

Baidu, 1, 178

base-rate neglect, 100

Bayes’s rule

coin flips and, 103–104

discovery of, 91

as an equation, 106–107

investing and, 100–105

mammograms and, 96–100, 98–99, 107, 194

medical diagnostics and, 96–100

robot cars and, 81–82, 91–96, 94, 100, 106

search for Air France Flight 447 and, 106

search for USS Scorpion and, 82–90

usefulness of, 81–82, 95–105

Bayesian search, 84–90, 101–102, 106

essential steps of, 84–86

posterior probabilities, 86, 91, 94, 97, 99–100, 104–105, 107

prior beliefs and revised beliefs, 85

prior beliefs and search for USS Scorpion, 89

prior probabilities, 84–86, 88–89, 91, 94–95, 97, 99–100, 105, 107

Bel Geddes, Norman, 117

Belichick, Bill, 145

Bell Labs, 79

BellKor’s Pragmatic Chaos, 14, 31–32

Berglund Scherwitzl, Elina, 61–62

Bernoulli, Johann, 162

big data. See data science; data sets

birth control. See contraception and birth control

bisphosphonates, 210–212

Black Lives Matter, 37

Borges, Jorge Luis: “The Library of Babel,” 131

Bornn, Luke, 173–174

brachistochrone curve, 162

Brooklyn Nets, 173–175

Buffett, Warren, 101–102

cancer, 2, 15

bisphosphonates and, 210–212

breast cancer, 96–100, 107, 194

colorectal cancer, 40, 210

esophageal cancer, 209–211

lymphoma, 40

medical imaging, 199

skin cancer, 200–202

surgery, 198

targeted therapy, 39–41

car accidents, 222–224

Cardwell, Chris, 210–211

Carnegie Mellon University, 124–125, 167

cars. See car accidents; Formula 1 racing; robot cars

CDC. See Centers for Disease Control and Prevention (CDC)

Centers for Disease Control and Prevention (CDC), 229–230

chatbots, 111, 115, 118, 141

China

chatbots, 111

robotic automation, 9, 79–80

tech companies, 1, 171

toilet paper theft (Temple of Heaven Park), 43–44, 49

Churchill, Winston, 20

Cinematch (Netflix recommender system), 13

civil rights activists and organizations, 37, 75

Clinton, Bill, 67

Clinton, Hillary, 37, 110, 233

cloud computing, 63, 172, 202, 207

as AI enabler, 6

coin clipping, 150–153, 155, 159–161, 169

coin toss

Bayes’s rule and, 103–104

New England Patriots and, 145–148, 154, 156, 166

Cold War, 5, 83

Columbia University: Statistical Research Group (SRG), 20–24

computers

BINAC, 120

compilers, 113, 121–123, 245n3

interpreters and compilers, 245n3

speed of, 5–6

subroutines, 121–122

UNIVAC, 5, 119, 122, 143

See also Hopper, Grace

conditional probability, 16–17, 24–30, 34–35, 80, 90–91

asymmetry of, 27, 104

health care and, 39–41

personalization and, 16–17, 41

weather and, 16

See also Bayes’s rule

contraception and birth control

assumptions and, 218–226

history of, 218

Natural Cycles (phone app), 61–62

rhythm method, 62

Cook, E. T., 186

Correctional Offender Management Profiling for Alternative Sanctions (COMPAS), 234–236

Craven, John, 82, 84, 86–90

credit cards

digital assistants and, 110

fraud, 4, 148, 164, 170

Crimean War, 179–189

criminal justice system, 234–236

cucumbers, 2–4, 45, 48–49, 72, 130, 178

data, missing, 25–29, 31

data accumulation, pace of, 6

data mining, 63, 172

data science, 1, 8, 10–11

anomaly detection and, 155–158, 163, 168–172

assumptions and, 215–217, 227

democracy and, 38

feature engineering, 70

health care and, 40, 185–190, 196, 203–207, 229–230, 232

imputation, 27

institutional commitment and, 179

legacy of Florence Nightingale, 179, 185–189

lurking variable, 222–223

pattern recognition and, 45

personalization and, 15–16

prediction rules and, 62, 65, 69–70

user-based collaborative filtering, 33

data sets, 28

anomalies in, 164–165, 169–170, 172

assumptions and, 209, 211, 225, 227–228

bias in, bias out, 233–237

ImageNet Visual Recognition Challenge, 72

massive, 6, 16, 34, 39, 63, 66–70, 133, 142, 164–165, 169–170, 172, 200, 204

pattern recognition and, 45, 47–48, 55–56, 59–61, 63, 66–68

privacy, 206–207

sharing, 204–206

databases

compilers and, 119, 122

health care, 40, 178, 191, 195, 200, 204–205, 210–211, 227

natural language processing, 61, 63, 72, 74, 131, 133

Netflix, 29–31

smart cities, 168

de Moivre’s equation (square-root rule), 156–159, 162–163, 165, 168

decision-making, 238

anomaly detection and, 163

human, 9, 37, 236–237

voting, 37

deep learning, 2, 44, 130, 171, 200, 209

corn yields and, 73

electricity demands and, 73

gender portrayals in film and, 73–74

honeybees and, 73

prediction rules and, 70–74

privacy and, 74–75

Descartes Labs, 73

Dickens, Charles

Christmas Carol, A, 133–134, 136

Martin Chuzzlewit, 185, 190

digital assistants

Alexa (Amazon), 1, 5, 8, 11, 15, 110, 115, 120, 140, 178

algorithms and, 3–4

Google Home, 3, 5, 110

medicine and, 202

speech recognition and, 5

DiMaggio, Joe, 212–217, 220, 226

Dole, Bob, 67

Duke University, 193, 196–197

early-warning systems, 197–198, 203

Earth, 51–53, 55, 58, 79

Echo, Amazon, 110, 178

e-commerce, 1, 69, 178

Eggo, Rosalind, 231–233

Einstein, Albert, 20, 75, 156

energy industry, 2

Facebook, 237

advertisers, 36–37

anomaly detection, 164

“data for gossip” bargain, 36

data sets, 204

data storage, 6

image classification and recognition, 1, 5, 45, 61

market dominance, 9

pattern-recognition system, 53, 61

personalization, 15, 39, 41

presidential election of 2016 and, 36–38

targeted marketing, 36

Facebook Messenger, 111

fake news, 7, 142, 226

financial industry, 2

Bayes’s rule and investing, 100–105

gambling strategy, 100–102

indexing strategy, 100–101, 105

Fitbit, 178, 202

Ford, Henry, 70

Formula 1 racing, 163–164, 172–173, 175, 178, 190, 201

Fowler, Samuel Lemuel, 116

Friedman, Milton, 21

Friends (television series), 111–112, 137, 142

Gawande, Atul: The Checklist Manifesto, 194

Geena Davis Institute on Gender in Media, 73

gender

bias in films, 73–74

stereotypes, 75–76

word vectors and, 139–140

Google

anomaly detection, 164

data sets, 204

data storage, 6

image classification, 73

Inception (neural-network model), 66–69, 72

market dominance, 9

pattern-recognition system, 53

personalization, 15

search engine, 15, 131

self-driving car, 5, 78, 177

speech recognition, 114–115

TensorFlow, 2

word2vec model, 133, 137, 140, 142

Google 411, 131

Google DeepMind, 201

Google Doodle, 144

Google Flu Trends, 228–233

Google Home, 3, 5, 110

Google Ngram Viewer, 129

Google Translate, 61, 110, 114–115, 132, 142–143

Google Voice, 140

Gould, Stephen Jay, 213

GPS technology, 16, 92, 168, 206

Great Andromeda Nebula, 49–53, 50, 58. See also astronomy

Great Recoinage (1696), 150, 160–162, 246n16

Green, Jane, 210–211

Greenblatt, Joel, 101

Gresham’s law, 152

Guest, William “Bull Dog,” 86–87

Hall, John, 186

Harvard Computers (math team), 54

Harvard Mark I, 116–119

HBO, 15, 35

health care and medicine

AI and, 189–208

contraception failure rates, 219, 225

Crohnology, 39

data-science legacy of Florence Nightingale, 185–189

data sharing, 204–206

future trends, 203–208

General Practice Research Database (U.K.), 211

genomic profiling, 40–41

glomerular filtration rate (GFR), 191–197, 249n39

Google searches for “how long does flu last,” 230

health care policy, 33

hospitals, 177, 179, 181–184, 186, 188–198, 202–208

ibrutinib (cancer drug), 40–41

incentives, 203–204

influenza, 228–233, 240

innovation and, 177–208

kidney disease, 190–197, 203–206, 249n39

kidney function over time, 192

mammogram screening, 96–100, 107, 194

medical imaging, 199–201

Nightingale’s coxcomb diagram (1858), 187

patient-centered social networks, 39

PatientsLikeMe, 39

privacy and security, 206–207

remote medicine, 202–203

smart medical devices, 198–199

targeted therapy, 39–40

Tiatros, 39

See also cancer; Nightingale, Florence

Heller, Katherine, 193–195, 197, 204, 207

Herbert, Sidney, 183

Herd, Andy, 111

Higgs boson, 61

Hofstadter, Douglas, 120

Hopper, Grace, 115–124, 143–144

“Amazing Grace,” 115

compiler breakthrough, 118–123

death of, 144

early years and education, 115–116

final years, 143–144

FLOW-MATIC (data-processing compiler), 123

Google Doodle in honor of, 144

Harvard Mark I and, 116–119

top-down, rules-based approach of, 124–125, 131

UNIVAC and, 119, 122, 143

World War II and, 116–118

Hotelling, Harold, 20

House of Cards (Netflix series), 14, 40

Howard, Dwight, 174

Hubble, Edwin, 58–59, 75–76

Hubble Space Telescope, 48, 58–59

Hunter College, 21

IBM

early speech-recognition, 124

Harvard Mark I, 116–119

Watson supercomputer, 109–110, 115, 202

ImageNet Visual Recognition Challenge, 62

image classification, 5, 45, 46, 73

image recognition, 11, 200

prediction rules and, 61, 66

Imperial College London, 198

imputation, 27

influenza, 228–233, 240

Influenza-like Illness Surveillance Network (ILINet), 229–230

Infor, 173, 175

Instagram, 16

data storage, 6

hashtags, 68

psychology and, 227

Insulet, 199

investing. See financial industry

James, LeBron, 21, 174

Japan, 2, 130, 132, 178

Jefferson, Thomas, 38

Johnson, Lyndon, 83

Johnson, Mark, 73

Kaiserswerth (German charity hospital), 181, 186

Kant, Immanuel, 50

kidney disease, 190–197, 203–206, 249n39

Koike, Makoto, 2, 45, 178

Kolmogorov–Smirnov statistics, 164

Kubrick, Stanley, 109

Lagerman, Björn, 73

Laplace, Pierre-Simon, 91, 163

law enforcement, 2

Lazer, David, 231

Leavitt, Henrietta, 5, 49, 53–61, 63, 68, 71–72, 75–76, 130

death of, 75

early years and education, 54

gender stereotypes and, 75–76

original equation of, 60

prediction rule data of, 56

prediction rule discovered by, 55–61

Legendre, Adrien-Marie, 60–61, 65, 68, 243n9

Leonard, Kawhi, 174

Lewis, Michael, 171

LIDAR (light detection and ranging sensor), 92–93, 95

image of a highway, 93

See also robot cars

Lin, Jeremy, 175

London School of Hygiene & Tropical Medicine, 231

Lynch, Peter, 101

Macaulay, Charles, 153

machine learning, 193, 206, 234. See also artificial intelligence (AI); Bayes’s rule; neural networks

Netflix and, 13–15

mammograms, 96–100, 107, 194

Manhattan Project, 118

market dominance, 9–10

mathematics, 10–11

computer science and, 113–123

conditional probability, 16–17, 24–30, 34–35, 40–41, 80

“math skill,” 33

Newton’s worst mathematical mistake, 149–153, 159–163

Nightingale, Florence, and, 180, 186, 193

pattern recognition and, 47, 49–50, 53, 55, 56, 57, 60, 65–68

principle of least squares, 60, 65–66

square-root rule (de Moivre’s equation), 156–159, 162–163, 165, 168

suggestion engines and, 15–31

twenty questions game and, 133–139

word vectors and, 139–141

See also Bayes’s rule; neural networks; prediction rules

maximum heart rate, 47–48, 53, 63–65

equations for, 64

Mayor’s Office of Data Analytics (MODA), 164–166

medicine. See health care and medicine

Medtronic, 199

Menger, Karl, 18–19

Microsoft, 9

Microsoft Azure, 6

modeling

assumptions and, 215–217, 224–225

deep-learning models, 73

imputation and, 27

Inception, 66–69, 72

latent feature, 32–34, 39–40

massive models, 63–69

missing data and, 25–29, 31

model rust, 212, 228–233

natural language processing and, 129–133, 136, 142–143

prediction rules as, 47–48

reality versus, 215–217

rules-based (top-down) models, 112, 114, 124, 126, 129

training the model, 47–48, 70, 72, 129

Moneyball, 171–175

Moore’s law, 5–6

Moravec paradox, 81

Morgenstern, Oskar, 18, 20

Musk, Elon, 7

natural language processing (NLP), 109–142

ambiguity and, 127–128

bottom-up approach, 114, 129

chatbots, 111, 115, 118, 141

digital assistants, 1, 3–5, 8, 11, 15, 110, 115, 120, 140, 178, 202

future trends, 142–143

Google Translate, 61, 110, 114–115, 132, 142–143

growth of statistical NLP, 128–133

knowing how versus knowing that, 128–129, 132

natural language revolution, 112, 124–128

“New Deal” for human-machine linguistic interaction, 114

prediction rules and, 114, 129–131

programming language revolution, 112–123

robustness and, 126–127

rule bloat and, 125–126

speech recognition, 5, 114–115, 124–125, 127, 130–131, 140

top-down approach, 112, 114, 124, 126, 129

word co-location statistics, 137

word vectors, 114–115, 133–137, 139–141

naturally occurring radioactive materials (NORM), 167

Netflix

Crown, The (series), 34–35

data scientists, 15

history of, 13–14

House of Cards (series), 14, 40

Netflix Prize for recommender system, 13–14, 31–32

personalization, 13–15, 18, 22, 25, 34–36, 40–41, 177

recommender systems, 13–14, 29–33

neural networks, 111–112, 114, 137, 141–142, 200–202, 233

deep learning and, 2, 44, 70–74, 130, 171, 200, 209

Friends new episodes and, 111–112, 137, 142

Inception model, 66–69, 72

prediction rules and, 63, 66, 70–73

New England Patriots, 145–148, 154, 156, 166

Newton, Isaac, 149–153, 159–163, 175

Nightingale, Florence, 179–190, 193–194, 197, 207–208

coxcomb diagram (1858), 187

Crimean War and, 182–184

early years and training, 180–182

evidence-based medicine legacy of, 188–189

“lady with the lamp,” 179

medical statistics legacy of, 186–188

nursing reform legacy of, 185–186

Nvidia, 66, 201

Obama, Barack, 110

Office of Scientific Research and Development, 21

parallax, 52–53

pattern recognition, 43–76

cucumber sorting, 45

input and output, 45–48, 53, 57, 59, 61–63, 70–72

learning a pattern, 59, 67–80

maximum heart rate and, 47–48, 53, 63–65, 64

prediction rules and, 48, 53, 57–72, 75

toilet paper theft and, 43–44, 49

See also prediction rules

PayPal, 170–171, 178

personalization, 13–42

conditional probability and, 16–17, 24–30, 34–35, 39–41

latent feature models and, 32–34, 39–40

Netflix and, 13–15, 18, 22, 25, 34–36, 40–41, 177

Wald’s survivability recommendations for aircraft and, 22–29

See also recommender systems; suggestion engines

philosophy, 50, 128–129, 132

Pickering, Edward C., 54

politics, 36–39, 67, 74, 110, 142, 233

prediction rules

contraception and, 61–62

deep learning and, 70–74

evaluation of, 59–61

Google Translate and, 61

Great Andromeda Nebula and, 49–53, 58

image recognition and, 61

massive data and, 66–68

massive models and, 63–66

as models, 47–48

natural language processing and, 114, 129–131

neural networks and, 63, 66, 70–73

overfitting problem, 67–68, 130

training the model, 47–48, 70, 72, 129

trial and error strategy, 68–69

Price, Richard, 91

principle of least squares, 60, 65–66

privacy, 7, 10, 44, 74–75, 178, 204–208

ProPublica, 234

Quetelet, Adolphe, 189

rage to conclude bias, 212, 213–217, 225–226, 228

ransomware, 207

Reagan, Ronald, 143

recommender systems, 13–18

health care and, 40

large-scale, 40

legacy of, 35–41

Netflix, 13–14, 29–33

See also suggestion engines

Rees, Mina, 21

Reinhart, Alex, 167–168

robot cars, 5–8, 77–79, 97, 197, 200

Bayes’s rule and, 81–82, 91–96, 94, 100, 106

introspection and extrapolation (dead reckoning), 94–95

LIDAR image of a highway, 93

LIDAR (light detection and ranging sensor), 92–93, 95

SLAM problem (simultaneous localization and mapping) and, 80–81, 91–92, 95

Waymo, 78, 80

robotics

Bayes’s rule and, 81–105

in China, 9

revolution of, 79–82

SLAM problem (simultaneous localization and mapping), 80–81, 91–92, 95

search for USS Scorpion and, 82–90, 106

Stanford Cart, 79

Theseus (life-size autonomous mouse), 79

See also robot cars

Rose, Pete, 214–215

Royal Mint, 149–163

coin clipping, 150–153, 155, 159–161, 169

Great Recoinage (1696), 150, 160–162, 246n16

Newton, Isaac and, 149–153, 159–163, 175

Trial of the Pyx, 149, 153–163, 175

Russell, Alexander Wilson, 115–116

S&P 500, 101

Salesforce, 111

Sapir, Edward, 125

Sarandos, Ted, 15

SAT (standardized test), 33, 139–140

Scherwitzl, Raoul, 62

Schlesinger, Karl, 18–19

Schuschnigg, Kurt, 19

Schweinfurt-Regensburg mission (World War II), 26–27

sci-fi

AI anxiety and, 7, 238

robots, 126

self-driving cars. See robot cars

Sendak, Mark, 196, 203–205, 207

Shannon, Claude, 79

Shapley, Harlow, 57, 242n6

Sieveking, Edward Henry, 185

Silicon Valley, 9, 17, 40, 149

Skype, 111

SLAM problem (simultaneous localization and mapping), 80–81, 91–92, 95

Slattery, Francis A., 82. See also USS Scorpion

Slipher, Vesto, 51

smart cities, 148–149, 164–169

smart medical devices, 198–199

smart watch, 178

smartphones, 111, 178, 200

apps, 61–62, 197, 204

GPS technology, 92, 168

social media, 37, 48, 219, 227. See also Facebook

SpaceX, 7

speech recognition, 5, 114–115, 124–125, 127, 130–131, 140

Harpy (early program), 124–125

See also natural language processing (NLP)

Spotify, 18, 41, 42, 237

conditional probability, 16

personalization, 15

square-root rule (de Moivre’s equation), 156–159, 162–163, 165, 168

Standard and Poor’s, 101

Stanford University, 79, 200–202

Statistical Research Group (SRG), 20–24

statistics, 19–20, 22, 24, 106, 155–158, 179, 186–190

Stigler, George, 21

Stripe (payment system), 171

submarines. See USS Scorpion

suggestion engines, 13

bright side of, 39–41

dark side of, 35–39

as “doppelgänger software,” 15

targeted marketing and, 35–36

See also recommender systems

super-utilizer, 192

survivorship bias, 23–24

2001: A Space Odyssey (film), 109

Takats, Zoltan, 198, 207

Tandem, 199

Teller, Edward, 20

Tencent, 1

Tesla, 7

Thrun, Sebastian, 200, 207

Tiatros (PTSD-centered social network), 39

toilet paper theft, 43–44, 49

Trial of the Pyx, 149, 153–163, 175

Trump, Donald, 233

Tufte, Edward, 225

Uber, 80, 206

Ulam, Stanislaw, 20

UNIVAC, 5, 119, 122, 143

USS Scorpion, 82–90, 106

bow section, 90

prior beliefs and search for USS Scorpion, 89

Varroa mites, 73

Vassar College, 21, 116, 118

von Neumann, John, 20, 67–68, 118

Wald, Abraham, 17–29, 31, 104, 242n6

early years and education, 18–19

member of Statistical Research Group (Columbia), 20–24

sequential sampling, 22

survivability recommendations for aircraft, 22–29

in United States, 19–21

Wallis, W. Allen, 20–21

WannaCry (ransomware attack), 207

waterfall diagram, 97–100, 98, 106–107

Watson (IBM supercomputer), 109–110, 115, 202

Waymo (autonomous-car company), 78, 80

WeChat, 111, 143

word vectors, 114–115, 133–137, 139–141

word2vec model (Google), 133, 137, 140, 142

World War I, 41–42, 229

World War II, 17–18

Battle of the Bulge, 21

Bayesian search and, 84

Hopper, Grace, and, 116–117

Schweinfurt-Regensburg mission, 26–27

Statistical Research Group (Columbia) and, 20–24

Wald’s survivability recommendations for aircraft, 22–29

Yormark, Brett, 173

YouTube, 6, 15

Zillow, 65–66, 227

ACKNOWLEDGMENTS

Together we want to thank the two people most responsible for nurturing this book from its earliest stages: Lisa Gallagher of DeFiore & Company and Tim Bartlett of St. Martin’s Press. This is the first thing either of us have written that wasn’t for an academic audience, and we started with almost no sense of what writing and publishing a “real” book would actually entail. We are so grateful to Lisa for seeing and cultivating the potential in those first scribbled drafts, which now look so painfully clunky. We are equally grateful to Tim, both for taking a gamble on two data scientists foolhardy enough to try their hands at writing, and for giving us such unerring advice along the way. We are also indebted to Doug Young of Transworld for his valuable editorial feedback.

We also thank the many other people at DeFiore, St. Martin’s Press, and Macmillan who have been so helpful, including Robert Allen, Alan Bradshaw, Jeff Capshew, Laura Clark, Jennifer Enderlin, Tracey Guest, Leah Johanson, Linda Kaplan, Alice Pfeifer, Gabrielle Piraino, Jason Prince, Sally Richardson, Brisa Robinson, Mary Beth Roche, Robert Van Kolken, Laura Wilson, and George Witte. We give special thanks to India Cooper, whose magnificent editing efforts have put into stark relief the difference between a professional writer and two amateurs like us. Thanks also to Larry Finlay, Bill Scott-Kerr, and the rest of the Transworld team for their support.

We thank Ellen Zippi for her invaluable help in researching this book. We are also grateful to many of our colleagues for sharing stories and expertise, most especially Steven Levitt for introducing us to Lisa Gallagher, and David Madigan for drawing our attention to the two studies on bisphosphonate usage described in chapter 7. We thank Rosalind Eggo, Katherine Heller, and Mark Sendak for their time and trouble in agreeing to be interviewed. Thanks also to those family members who tirelessly read early drafts and gave their feedback: Catherine Aiken, Patricia and Josh Lowry, Anne and George Scott, and Brian Woods.

PERSONAL ACKNOWLEDGMENTS

I am grateful to my co-author, James Scott. Above all, I thank my family for their love and support: my wife, Anne Gron, and our children, Emma, Michael, and Sarah.

—Nick Polson

Thank you to Nick Polson. I owe Nick for so much in my career that I cannot possibly list it all here; this book is but the latest in a long string of projects and ideas that he has so generously shared with me. I expect that over the decades to come, I will look back and see Nick as the single most important influence on my professional life, and the best friend I ever had in this field. I also want to thank the three most important teachers I ever had: Bill Jeffreys, Jim Berger, and John Trimble. Without Bill and Jim, I would never have become a statistician. Without John’s kindness and generosity, I would never have known how to “tighten/sharpen/brighten” my way to better prose. I also thank my parents, who gave me so much—not least of which was their example. Finally, I am grateful to my wife, Abigail Aiken, for just about everything. I love you, and I could not have helped write this book without your support.

—James Scott

AIQ: How Artificial Intelligence Works and How We Can Harness Its Power for a Better World

ABOUT THE AUTHORS

Nick Polson is Professor of Econometrics and Statistics at the Chicago Booth School of Business. He is a Bayesian statistician involved in research in machine intelligence, deep learning and computational methods. He regularly speaks to large audiences in the US, UK and the rest of Europe.

James Scott is Associate Professor of Statistics at the University of Texas. James is a statistician and data scientist who studies Bayesian inference and computational methods for big data. James lives in Austin, Texas, with his wife, Abigail.

TRANSWORLD PUBLISHERS

61–63 Uxbridge Road, London W5 5SA

www.penguin.co.uk

Transworld is part of the Penguin Random House group of companies whose addresses can be found at global.penguinrandomhouse.com

Penguin logo

First published in Great Britain in 2018 by Bantam Press

an imprint of Transworld Publishers

Copyright © Nicholas Polson and James Scott 2018

Cover design by David Baldeosingh Rotstein

Nicholas Polson and James Scott have asserted their right under the Copyright, Designs and Patents Act 1988 to be identified as the authors of this work.

Every effort has been made to obtain the necessary permissions with reference to copyright material, both illustrative and quoted. We apologize for any omissions in this respect and will be pleased to make the appropriate acknowledgements in any future edition.

A CIP catalogue record for this book is available from the British Library

Version 1.0

Epub ISBN 9781473554368

ISBN 9780593079775

This ebook is copyright material and must not be copied, reproduced, transferred, distributed, leased, licensed or publicly performed or used in any way except as specifically permitted in writing by the publishers, as allowed under the terms and conditions under which it was purchased or as strictly permitted by applicable copyright law. Any unauthorized distribution or use of this text may be a direct infringement of the author’s and publisher’s rights and those responsible may be liable in law accordingly.

1 3 5 7 9 10 8 6 4 2

CONTENTS

Cover
About the Book
Title Page
Dedication
Introduction
1. The Refugee
2. The Candlestick Maker
3. The Reverend and the Submarine
4. Amazing Grace
5. The Genius at the Royal Mint
6. The Lady with the Lamp
7. The Yankee Clipper
Acknowledgments
Notes
Index
About the Authors
Copyright

To Diana and to Anne.

—NP

To my grandparents, to Margaret Aiken, and to the golden guinea.

—JS

INTRODUCTION

WE TEACH DATA science to hundreds of students per year, and they’re all fascinated by artificial intelligence. And they ask great questions. How does a car learn to drive itself? How does Alexa understand what I’m saying? How does Spotify pick such good playlists for me? How does Facebook recognize my friends in the photos I upload? These students realize that AI isn’t some sci-fi droid from the future; it’s right here, right now, and it’s changing the world one smartphone at a time. They all want to understand it, and they all want to be a part of it.

And our students aren’t the only ones enthusiastic about AI. They’re joined in their exaltation by the world’s largest companies—from Amazon, Facebook, and Google in America to Baidu, Tencent, and Alibaba in China. As you may have heard, these big tech firms are waging an expensive global arms race for AI talent, which they judge to be essential to their future. For years we’ve seen them court freshly minted PhDs with offers of $300,000+ salaries and much better coffee than we have in academia. Now we’re seeing many more companies jump into the AI recruiting fray—firms sitting on piles of data in, say, insurance or the oil business, who are coming along with whopping salary offers and fancy espresso machines of their own.

Yet while this arms race is real, we think there’s a much more powerful trend at work in AI today—a trend of diffusion and dissemination, rather than concentration. Yes, every big tech company is trying to hoard math and coding talent. But at the same time, the underlying technologies and ideas behind AI are spreading with extraordinary speed: to smaller companies, to other parts of the economy, to hobbyists and coders and scientists and researchers everywhere in the world. That democratizing trend, more than anything else, is what has our students today so excited, as they contemplate a vast range of problems practically begging for good AI solutions.

Who would have thought, for example, that a bunch of undergraduates would get so excited about the mathematics of cucumbers? Well, they did when they heard about Makoto Koike, a car engineer from Japan whose parents own a cucumber farm. Cucumbers in Japan come in a dizzying variety of sizes, shapes, colors, and degrees of prickliness—and based on these visual features, they must be separated into nine different classes that command different market prices. Koike’s mother used to spend eight hours per day sorting cucumbers by hand. But then Koike realized that he could use a piece of open-source AI software from Google, called TensorFlow, to accomplish the same task, by coding up a “deep-learning” algorithm that could classify a cucumber based on a photograph. Koike had never used AI or TensorFlow before, but with all the free resources out there, he didn’t find it hard to teach himself how. When a video of his AI-powered sorting machine hit YouTube, Koike became an international deep-learning/cucumber celebrity. It wasn’t merely that he had given people a feel-good story, saving his mother from hours of drudgery. He’d also sent an inspiring message to students and coders across the world: that if AI can solve problems in cucumber farming, it can solve problems just about anywhere.
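To make the cucumber example concrete: Koike's actual system was a TensorFlow deep-learning model trained on photographs, but the core idea — learn to map measured features of a cucumber to one of several market classes — can be illustrated with a toy classifier. Everything below is invented for illustration (the synthetic "cucumber" features, the three classes, the training setup); it is a minimal sketch of learning a classifier from labeled examples, not Koike's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_cucumbers(n_per_class):
    """Synthetic data: each class clusters around its own (length, curvature) mean."""
    means = np.array([[20.0, 0.1], [28.0, 0.3], [35.0, 0.6]])  # hypothetical classes
    X = np.vstack([m + rng.normal(scale=1.0, size=(n_per_class, 2)) for m in means])
    y = np.repeat(np.arange(3), n_per_class)
    return X, y

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train(X, y, steps=500, lr=0.1):
    """Gradient descent on cross-entropy loss for a linear softmax classifier."""
    W = np.zeros((X.shape[1], 3))
    b = np.zeros(3)
    onehot = np.eye(3)[y]
    for _ in range(steps):
        p = softmax(X @ W + b)
        grad = p - onehot                      # gradient of cross-entropy w.r.t. scores
        W -= lr * (X.T @ grad) / len(X)
        b -= lr * grad.mean(axis=0)
    return W, b

X, y = make_cucumbers(100)
X = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize features before training
W, b = train(X, y)
pred = softmax(X @ W + b).argmax(axis=1)
print("training accuracy:", (pred == y).mean())
```

A deep-learning sorter like Koike's differs in one key way: instead of hand-measured features, it learns its own features directly from raw pixels by stacking many such layers.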

That message is now spreading quickly. Doctors are now using AI to diagnose and treat cancer. Electrical companies use AI to improve power-generating efficiency. Investors use it to manage financial risk. Oil companies use it to improve safety on deep-sea rigs. Law enforcement agencies use it to hunt terrorists. Scientists use it to make new discoveries in astronomy and physics and neuroscience. Companies, researchers, and hobbyists everywhere are using AI in thousands of different ways, whether to sniff for gas leaks, mine iron, predict disease outbreaks, save honeybees from extinction, or quantify gender bias in Hollywood films. And this is just the beginning.

We see the real story of AI as the story of this diffusion: from a handful of core math concepts stretching back decades, or even centuries, to the supercomputers and talking/thinking/cucumber-sorting machines of today, to the new and ubiquitous digital wonders of tomorrow. Our goal in this book is to tell you that story. It is partly a story of technology, but it is mainly a story of ideas, and of the people behind those ideas—people from a much earlier age, people who were just keeping their heads down and solving their own problems involving math and data, and who had no clue about the role their solutions would come to play in inventing the modern world. By the end of that story, you’ll understand what AI is, where it came from, how it works, and why it matters in your life.

What Does “AI” Really Mean?

When you hear “AI,” don’t think of a droid. Think of an algorithm.

An algorithm is a set of step-by-step instructions so explicit that even something as literal-minded as a computer can follow them. (You may have heard the joke about the robot who got stuck in the shower forever because of the algorithm on the shampoo bottle: “Lather. Rinse. Repeat.”) On its own, an algorithm is no smarter than a power drill; it just does one thing very well, like sorting a list of numbers or searching the web for pictures of cute animals. But if you chain lots of algorithms together in a clever way, you can produce AI: a domain-specific illusion of intelligent behavior. For example, take a digital assistant like Google Home, to which you might pose a question like “Where can I find the best breakfast tacos in Austin?” This query sets off a chain reaction of algorithms:

One algorithm converts the raw sound wave into a digital signal.

Another algorithm translates that signal into a string of English phonemes, or perceptually distinct sounds: “brek-fust-tah-koze.”

The next algorithm segments those phonemes into words: “breakfast tacos.”

Those words are sent to a search engine—itself a huge pipeline of algorithms that processes the query and sends back an answer.

Another algorithm formats the response into a coherent English sentence.

A final algorithm verbalizes that sentence in a non-robotic-sounding way: “The best breakfast tacos in Austin are at Julio’s on Duval Street. Would you like directions?”

And that’s AI. Pretty much every AI system—whether it’s a self-driving car, an automatic cucumber sorter, or a piece of software that monitors your credit card account for fraud—follows this same “pipeline-of-algorithms” template. The pipeline takes in data from some specific domain, performs a chain of calculations, and outputs a prediction or a decision.
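
The pipeline template above can be sketched in a few lines of code. This is a toy illustration only—every function name here is an invented stand-in, not any real assistant's API—but it shows the structural point: each stage is an ordinary, narrow algorithm, and intelligence-like behavior emerges from chaining them.

```python
# A minimal sketch of the "pipeline-of-algorithms" idea. All stages are
# hypothetical stubs; a real system would replace each with a genuine
# signal-processing or machine-learning component.
def digitize(sound_wave):
    """Convert a raw sound wave into a digital signal (stubbed)."""
    return f"signal({sound_wave})"

def to_phonemes(signal):
    """Translate the digital signal into phonemes (stubbed)."""
    return "brek-fust-tah-koze"

def to_words(phonemes):
    """Segment phonemes into English words (stubbed)."""
    return "breakfast tacos"

def search(query):
    """Stand-in for an entire search-engine pipeline (stubbed)."""
    return "Julio's on Duval Street"

def phrase_response(result):
    """Format the raw answer as a coherent English sentence (stubbed)."""
    return f"The best breakfast tacos in Austin are at {result}."

def assistant(sound_wave):
    # The key idea: the output of each algorithm feeds the next.
    pipeline = [digitize, to_phonemes, to_words, search, phrase_response]
    data = sound_wave
    for stage in pipeline:
        data = stage(data)
    return data

print(assistant("raw audio"))
# prints "The best breakfast tacos in Austin are at Julio's on Duval Street."
```

Note that no single stage is "intelligent"; each does one narrow job, like a power drill, and the illusion of understanding comes from the chain as a whole.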

There are two distinguishing features of the algorithms used in AI. First, these algorithms typically deal with probabilities rather than certainties. An algorithm in AI, for example, won’t say outright that some credit card transaction is fraudulent. Instead, it will say that the probability of fraud is 92%—or whatever it thinks, given the data. Second, there’s the question of how these algorithms “know” what instructions to follow. In traditional algorithms, like the kind that run websites or word processors, those instructions are fixed ahead of time by a programmer. In AI, however, those instructions are learned by the algorithm itself, directly from “training data.” Nobody tells an AI algorithm how to classify credit card transactions as fraudulent or not. Instead, the algorithm sees lots of examples from each category (fraudulent, not fraudulent), and it finds the patterns that distinguish one from the other. In AI, the role of the programmer isn’t to tell the algorithm what to do. It’s to teach the algorithm how to train itself, using data and the rules of probability.
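
Both features can be seen in a deliberately tiny example. This is not how any production fraud system works—the data, the single "bucket" feature, and the function names are all invented for illustration—but it shows a rule being learned from labeled examples rather than hand-coded, and an output that is a probability rather than a verdict.

```python
# Toy illustration of learning from training data: the "rule" is simply
# the empirical fraud rate observed in each bucket of transactions.
# All data below is made up for the example.
from collections import defaultdict

# Hypothetical labeled examples: (transaction_size, is_fraud)
training = [
    ("small", 0), ("small", 0), ("small", 0), ("small", 1),
    ("large", 1), ("large", 1), ("large", 0), ("large", 1),
]

# Count fraud cases and totals per bucket -- this IS the "training."
counts = defaultdict(lambda: [0, 0])  # bucket -> [fraud_count, total]
for bucket, label in training:
    counts[bucket][0] += label
    counts[bucket][1] += 1

def fraud_probability(bucket):
    """Return the learned probability of fraud, not a yes/no answer."""
    fraud, total = counts[bucket]
    return fraud / total

print(fraud_probability("large"))  # 0.75: a probability, not a certainty
```

Nobody wrote a rule saying "large transactions are risky"; the pattern came entirely from the examples, which is the essential shift from traditional programming to AI.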

How Did We Get Here?

Modern AI systems, like a self-driving car or a home digital assistant, are pretty new on the scene. But you might be surprised to learn that most of the big ideas in AI are actually old—in many cases, centuries old—and that our ancestors have been using them to solve problems for generations. For example, take self-driving cars. Google debuted its first such car in 2009. But you’ll learn in chapter 3 that one of the main ideas behind how these cars work was discovered by a Presbyterian minister in the 1750s—and that this idea was used by a team of mathematicians over 50 years ago to solve one of the Cold War’s biggest mysteries.

Or take image classification, like the software that automatically tags your friends in Facebook photos. Algorithms for image processing have gotten radically better over the last five years. But in chapter 2, you’ll learn that the key ideas here date to 1805—and that these ideas were used a century ago, by a little-known astronomer named Henrietta Leavitt, to help answer one of the deepest scientific questions that humans have ever posed: How big is the universe?

Or even take speech recognition, one of the great AI triumphs of recent years. Digital assistants like Alexa and Google Home are remarkably fluent with language, and they’ll only get better. But the first person to get a computer to understand English was a rear admiral in the U.S. Navy, and she did so almost 70 years ago. (See chapter 4.)

Those are just three illustrations of a striking fact: no matter where you look in AI, you’ll find an idea that people have been kicking around for a long time. So in many ways, the big historical puzzle isn’t why AI is happening now, but why it didn’t happen long ago. To explain this puzzle, we must look to three enabling technological forces that have brought these venerable ideas into a new age.

The first AI enabler is the decades-long exponential growth in the speed of computers, usually known as Moore’s law. It’s hard to convey intuitively just how fast computers have gotten. The cliché used to be that the Apollo astronauts landed on the moon with less computing power than a pocket calculator. But this no longer resonates, because … what’s a pocket calculator? So we’ll try a car analogy instead. In 1951, one of the fastest computers was the UNIVAC, which performed 2,000 calculations per second, while one of the fastest cars was the Alfa Romeo 6C, which traveled 110 miles per hour. Both cars and computers have improved since 1951—but if cars had improved at the same rate as computers, then a modern Alfa Romeo would travel at 8 million times the speed of light.
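
The back-of-the-envelope arithmetic behind that analogy is easy to reproduce. The figures below are rough assumptions of our own (a modern supercomputer-class machine at roughly 2 × 10^17 operations per second is not a number from the text), so the result should be read as an order of magnitude, not a precise claim.

```python
# Rough sketch of the car analogy. All hardware figures are assumed
# round numbers for illustration, not measurements.
univac_ops_per_sec = 2_000      # UNIVAC, 1951
modern_ops_per_sec = 2e17       # assumed: supercomputer-class machine today

speedup = modern_ops_per_sec / univac_ops_per_sec  # ~1e14

alfa_mph = 110                  # Alfa Romeo 6C, 1951
scaled_mph = alfa_mph * speedup # the car, improved at the computer's rate

light_mph = 6.706e8             # speed of light in miles per hour
print(scaled_mph / light_mph)   # on the order of ten million times c
```

Whatever exact hardware figure you plug in, the scaled-up car exceeds the speed of light by many orders of magnitude, which is the point of the analogy.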

The second AI enabler is the new Moore’s law: the explosive growth in the amount of data available, as all of humanity’s information has become digitized. The entire contents of the Library of Congress amount to about 10 terabytes of storage, but in 2013 alone, the big four tech firms—Google, Apple, Facebook, and Amazon—collected about 120,000 times as much data as this. And that’s a lifetime ago in internet years. The pace of data accumulation is accelerating faster than an Apollo rocket; in 2017, more than 300 hours of video were uploaded to YouTube every minute, and more than 100 million images were posted to Instagram every day. More data means smarter algorithms.

The third AI enabler is cloud computing. This trend is nearly invisible to consumers, but it’s had an enormous democratizing effect on AI. To illustrate this, we’ll draw an analogy here between data and oil. Imagine if all companies of the early twentieth century had owned some oil, but they had to build the infrastructure to extract, transport, and refine that oil on their own. Any company with a new idea for making good use of its oil would have faced enormous fixed costs just to get started; as a result, most of the oil would have sat in the ground. Well, the same logic holds for data, the oil of the twenty-first century. Most hobbyists or small companies would face prohibitive costs if they had to buy all the gear and expertise needed to build an AI system from their data. But the cloud-computing resources provided by outfits like Microsoft Azure, IBM, and Amazon Web Services have turned that fixed cost into a variable cost, radically changing the economic calculus for large-scale data storage and analysis. Today, anyone who wants to make use of their “oil” can now do so cheaply, by renting someone else’s infrastructure.

When you put those three trends—faster chips, massive data sets, and cloud computing—together with all those good old ideas, you get a supernova-like explosion in both the demand and the capacity for using AI to solve real problems.

AI Anxieties

We’ve told you how excited our students are about AI, and how the world’s largest firms are rushing to embrace it. But we’d be lying if we said that everyone was so bullish about these new technologies. In fact, many people are anxious, whether about jobs, data privacy, wealth concentration, or Russians with fake-news Twitter-bots. Some people—most famously Elon Musk, the tech entrepreneur behind Tesla and SpaceX—paint an even scarier picture: one where robots become self-aware, decide they don’t like being ruled by people, and start ruling us with a silicon fist.

Let’s talk about Musk’s worry first; his views have gotten a lot of attention, presumably because people take notice when a member of the billionaire disrupter class talks about artificial intelligence. Musk has claimed that in developing AI technology, humanity is “summoning a demon,” and that smart machines are “our biggest existential threat” as a species.

After you’ve read our book, you’ll be able to decide for yourself whether you think these worries are credible. We want to warn you up front, however, that it’s very easy to fall into a trap that cognitive scientists call the “availability heuristic”: the mental shortcut in which people evaluate the plausibility of a claim by relying on whatever immediate examples happen to pop into their minds. In the case of AI, those examples are mostly from science fiction, and they’re mostly evil—from the Terminator to the Borg to HAL 9000. We think that these sci-fi examples have a strong anchoring effect that makes many people view the “evil AI” narrative less skeptically than they should. After all, just because we can dream it and make a film about it doesn’t mean we can build it. Nobody today has any idea how to create a robot with general intelligence, in the manner of a human or a Terminator. Maybe your remote descendants will figure it out; maybe they’ll even program their creation to terrorize the remote descendants of Elon Musk. But that will be their choice and their problem, because no option on the table today even remotely foreordains such a possibility. Now, and for the foreseeable future, “smart” machines are smart only in their specific domains.

Moreover, consider the opportunity cost of worrying that we’ll soon be conquered by self-aware robots. To focus on this possibility now is analogous to the de Havilland Aircraft Company, having flown the first commercial jetliner in 1952, worrying about the implications of warp-speed travel to distant galaxies. Maybe one day, but right now there are far more important things to worry about—like, to press the jetliner analogy a little further, setting smart policy for all those planes in the air today.

This issue of policy brings us to a whole other set of anxieties about AI, much more plausible and immediate. Will AI create a jobless world? Will machines make important decisions about your life, with zero accountability? Will the people who own the smartest robots end up owning the future?

These questions are deeply important, and we hear them discussed all the time—at tech conferences, in the pages of the world’s major newspapers, and over lunch among our colleagues. We should let you know up front that you won’t find the answers to these questions in our book, because we don’t know them. Like our students, we are ultimately optimistic about the future of AI, and we hope that by the end of the book, you will share that optimism. But we’re not labor economists, policy experts, or soothsayers. We’re data scientists—and we’re also academics, meaning that our instinct is to stay firmly in our lane, where we’re confident of our expertise. We can teach you about AI, but we can’t tell you for sure what the future will bring.

We can tell you, however, that we’ve encountered a common set of narratives that people use to frame this subject, and we find them all incomplete. These narratives emphasize the wealth and power of the big tech firms, but they overlook the incredible democratization and diffusion of AI that’s already happening. They highlight the dangers of machines making important decisions using biased data, but they fail to acknowledge the biases or outright malice in human decision-making that we’ve been living with forever. Above all, they focus intensely on what machines may take away, but they lose sight of what we’ll get in return: different and better jobs, new conveniences, freedom from drudgery, safer workplaces, better health care, fewer language barriers, new tools for learning and decision-making that will help us all be smarter, better people.

Take the issue of jobs. In America, jobless claims kept hitting new lows from 2010 through 2017, even as AI and automation gained steam as economic forces. The pace of robotic automation has been even more relentless in China, yet wages there have been soaring for years. That doesn’t mean AI hasn’t threatened individual people’s jobs. It has, and it will continue to do so, just as the power loom threatened the jobs of weavers, and just as the car threatened the jobs of buggy-whip makers. New technologies always change the mix of labor needed in the economy, putting downward pressure on wages in some areas and upward pressure in others. AI will be no different, and we strongly support job-training and social-welfare programs to provide meaningful help for those displaced by technology. A universal basic income might even be the answer here, as many Silicon Valley bosses seem to think; we don’t claim to know. But arguments that AI will create a jobless future are, so far, completely unsupported by actual evidence.

Then there’s the issue of market dominance. Amazon, Google, Facebook, and Apple are enormous companies with tremendous power. It is critical