Dozens of times per day, we all interact with intelligent machines that are constantly learning from the wealth of data now available to them. These machines – from smartphones and talking robots to self-driving cars – are remaking the world of the twenty-first century in the same way that the Industrial Revolution remade the world of the nineteenth.
AIQ is based on a simple premise: if you want to understand the modern world and where it’s heading, you have to know a little bit of the mathematical language spoken by intelligent machines. AIQ will teach you that language, but in an unconventional way: through stories rather than equations.
You will meet a fascinating cast of historical characters – from Isaac Newton to Florence Nightingale – who have a lot to teach you about data, probability and better thinking. Along the way you will see how these same ideas are playing out in the modern age of big data and intelligent machines. And rather than ushering in the dystopian future so familiar from science fiction, you will see how these technologies can help us overcome some of our built-in cognitive weaknesses, and give us all the chance to lead happier, healthier and more fulfilled lives.
The page references in this index correspond to the printed edition from which this ebook was created. To find a specific word or phrase from the index, please use the search feature of your ebook reader.
Page numbers in italics refer to figures.
Abd al-Rahman al-Sufi, 49, 58
Aiken, Howard, 117
Alexa (Amazon digital assistant), 5, 8, 11, 15, 115, 120, 140
Alfa Romeo 6C, 5–6
algorithms, 24
in AI, 3–4
anomaly detection and, 164, 166, 168, 170, 172, 177
assumptions and bias in, 209–212, 226–228, 230, 233–236
chains of, 3–4
data accumulation and, 6
deep-learning, 2, 44, 73–74, 209
health care and, 196–197, 199–202, 204, 207, 210–212, 230
journalism and, 111
natural language processing and, 111, 129, 133, 137, 139, 142–143
pattern recognition and, 54, 74
personalization and, 13–15, 24, 34, 36–37, 39
politics and, 37
recidivism-prediction algorithms, 234–235
suggestions versus search, 15
Alibaba, 1, 69, 178
Alipay, 171
Amazon, 1, 15
Alexa, 5, 8, 11, 15, 115, 120, 140
data storage, 6
delivery speed, 69
Echo, 110, 178
market dominance, 9
recommender system, 18
Web Services, 6
American Association for the Advancement of Science, 21
American Statistical Association, 21
Journal of the American Statistical Association, 23, 242n6
Andromeda. See Great Andromeda Nebula
anomaly detection, 145–175
averaging, 155, 163
bias (type of anomaly), 157, 158
coin clipping and, 150–153, 155, 159–161, 169
Formula 1 racing and, 163–164, 172–173, 175, 178, 190, 201
fraud and, 169–171
importance of variability, 148–149
law enforcement and, 167–169
Moneyball, 171–175
NBA and, 173–175
overdispersion (type of anomaly), 157, 159
Patriots coin toss record and, 145–148, 154, 156, 166
simulated coin toss record, 147
smart cities and, 148–149, 164–169
square-root rule (de Moivre’s equation) and, 156–159, 162–163, 165, 168
Trial of the Pyx (Royal Mint fraud protection), 149, 153–163, 175
Apple
data storage, 6
iPhone, 117–118, 142
market dominance, 9
pattern-recognition system, 53
Aristophanes: The Frogs, 152
artificial intelligence (AI)
algorithms and, 3–4
anxieties regarding, 7–10
assumptions and, 209–228
bias in, bias out, 212, 233–237
contraception and, 61–62
criminal justice system and, 234–236
democratization of, 2, 6, 8, 228
diffusion and dissemination of, 2–3, 8
enabling technological trends, 5–6, 63
image classification, 5, 45, 46, 73
image recognition, 11, 61, 66, 200
model rust, 212, 228–233
models versus reality, 215–217
Moravec paradox, 81
policy, 8
rage to conclude bias, 212, 213–217, 225–226, 228
robot cars, 5–8, 77–79, 80–82, 91–97, 100, 106, 197, 200
salaries, 1
SLAM problem (simultaneous localization and mapping), 80–81, 91–92, 95
speech recognition, 5, 114–115, 124–125, 127, 130–131, 140
talent and workforce, 1
twenty questions game and, 133–140
See also anomaly detection; Bayes’s rule; health care and medicine; natural language processing (NLP); pattern recognition; personalization; prediction rules
assumptions, 209–228
astronomy, 48–53, 58–61, 68–76
Alpha Centauri, 52
Bayes’s rule and, 106
Great Andromeda Nebula, 49–53, 50, 58
Leavitt’s original equation, 60
Leavitt’s prediction rule data, 56
measuring stars, 52–53
Milky Way, 48–53, 57–58, 75, 242n6
nebulae, 49–53, 58
oscillation of a pulsating star, 55
parallax, 52–57
pulsating stars, 54–59, 63, 71–72, 75–76
statistics and, 189
Athey, Alex, 168
automation, 9, 70. See also robotics
autonomous cars. See robot cars
availability heuristic, 7
Baidu, 1, 178
base-rate neglect, 100
Bayes’s rule
coin flips and, 103–104
discovery of, 91
as an equation, 106–107
investing and, 100–105
mammograms and, 96–100, 98–99, 107, 194
medical diagnostics and, 96–100
robot cars and, 81–82, 91–96, 94, 100, 106
search for Air France Flight 447 and, 106
search for USS Scorpion and, 82–90
usefulness of, 81–82, 95–105
Bayesian search, 84–90, 101–102, 106
essential steps of, 84–86
posterior probabilities, 86, 91, 94, 97, 99–100, 104–105, 107
prior beliefs and revised beliefs, 85
prior beliefs and search for USS Scorpion, 89
prior probabilities, 84–86, 88–89, 91, 94–95, 97, 99–100, 105, 107
Bel Geddes, Norman, 117
Belichick, Bill, 145
Bell Labs, 79
BellKor’s Pragmatic Chaos, 14, 31–32
Berglund Scherwitzl, Elina, 61–62
Bernoulli, Johann, 162
big data. See data science; data sets
birth control. See contraception and birth control
bisphosphonates, 210–212
Black Lives Matter, 37
Borges, Jorge Luis: “The Library of Babel,” 131
Bornn, Luke, 173–174
brachistochrone curve, 162
Brooklyn Nets, 173–175
Buffett, Warren, 101–102
cancer, 2, 15
bisphosphonates and, 210–212
breast cancer, 96–100, 107, 194
colorectal cancer, 40, 210
esophageal cancer, 209–211
lymphoma, 40
medical imaging, 199
skin cancer, 200–202
surgery, 198
targeted therapy, 39–41
car accidents, 222–224
Cardwell, Chris, 210–211
Carnegie Mellon University, 124–125, 167
cars. See car accidents; Formula 1 racing; robot cars
CDC. See Centers for Disease Control and Prevention (CDC)
Centers for Disease Control and Prevention (CDC), 229–230
chatbots, 111, 115, 118, 141
China
chatbots, 111
robotic automation, 9, 79–80
tech companies, 1, 171
toilet paper theft (Temple of Heaven Park), 43–44, 49
Churchill, Winston, 20
Cinematch (Netflix recommender system), 13
civil rights activists and organizations, 37, 75
Clinton, Bill, 67
Clinton, Hillary, 37, 110, 233
cloud computing, 63, 172, 202, 207
as AI enabler, 6
coin clipping, 150–153, 155, 159–161, 169
coin toss
Bayes’s rule and, 103–104
New England Patriots and, 145–148, 154, 156, 166
Cold War, 5, 83
Columbia University: Statistical Research Group (SRG), 20–24
computers
BINAC, 120
compilers, 113, 121–123, 245n3
interpreters and compilers, 245n3
speed of, 5–6
subroutines, 121–122
UNIVAC, 5, 119, 122, 143
See also Hopper, Grace
conditional probability, 16–17, 24–30, 34–35, 80, 90–91
asymmetry of, 27, 104
health care and, 39–41
personalization and, 16–17, 41
weather and, 16
See also Bayes’s rule
contraception and birth control
assumptions and, 218–226
history of, 218
Natural Cycles (phone app), 61–62
rhythm method, 62
Cook, E. T., 186
Correctional Offender Management Profiling for Alternative Sanctions (COMPAS), 234–236
Craven, John, 82, 84, 86–90
credit cards
digital assistants and, 110
fraud, 4, 148, 164, 170
Crimean War, 179–189
criminal justice system, 234–236
cucumbers, 2–4, 45, 48–49, 72, 130, 178
data, missing, 25–29, 31
data accumulation, pace of, 6
data mining, 63, 172
data science, 1, 8, 10–11
anomaly detection and, 155–158, 163, 168–172
assumptions and, 215–217, 227
democracy and, 38
feature engineering, 70
health care and, 40, 185–190, 196, 203–207, 229–230, 232
imputation, 27
institutional commitment and, 179
legacy of Florence Nightingale, 179, 185–189
lurking variable, 222–223
pattern recognition and, 45
personalization and, 15–16
prediction rules and, 62, 65, 69–70
user-based collaborative filtering, 33
data sets, 28
anomalies in, 164–165, 169–170, 172
assumptions and, 209, 211, 225, 227–228
bias in, bias out, 233–237
ImageNet Visual Recognition Challenge, 72
massive, 6, 16, 34, 39, 63, 66–70, 133, 142, 164–165, 169–170, 172, 200, 204
pattern recognition and, 45, 47–48, 55–56, 59–61, 63, 66–68
privacy, 206–207
sharing, 204–206
databases
compilers and, 119, 122
health care, 40, 178, 191, 195, 200, 204–205, 210–211, 227
natural language processing, 61, 63, 72, 74, 131, 133
Netflix, 29–31
smart cities, 168
de Moivre’s equation (square-root rule), 156–159, 162–163, 165, 168
decision-making, 238
anomaly detection and, 163
human, 9, 37, 236–237
voting, 37
deep learning, 2, 44, 130, 171, 200, 209
corn yields and, 73
electricity demands and, 73
gender portrayals in film and, 73–74
honeybees and, 73
prediction rules and, 70–74
privacy and, 74–75
Descartes Labs, 73
Dickens, Charles
Christmas Carol, A, 133–134, 136
Martin Chuzzlewit, 185, 190
digital assistants
Alexa (Amazon), 1, 5, 8, 11, 15, 110, 115, 120, 140, 178
algorithms and, 3–4
Google Home, 3, 5, 110
medicine and, 202
speech recognition and, 5
DiMaggio, Joe, 212–217, 220, 226
Dole, Bob, 67
Duke University, 193, 196–197
early-warning systems, 197–198, 203
Earth, 51–53, 55, 58, 79
Echo, Amazon, 110, 178
e-commerce, 1, 69, 178
Eggo, Rosalind, 231–233
Einstein, Albert, 20, 75, 156
energy industry, 2
Facebook, 237
advertisers, 36–37
anomaly detection, 164
“data for gossip” bargain, 36
data sets, 204
data storage, 6
image classification and recognition, 1, 5, 45, 61
market dominance, 9
pattern-recognition system, 53, 61
personalization, 15, 39, 41
presidential election of 2016 and, 36–38
targeted marketing, 36
Facebook Messenger, 111
fake news, 7, 142, 226
financial industry, 2
Bayes’s rule and investing, 100–105
gambling strategy, 100–102
indexing strategy, 100–101, 105
Fitbit, 178, 202
Ford, Henry, 70
Formula 1 racing, 163–164, 172–173, 175, 178, 190, 201
Fowler, Samuel Lemuel, 116
Friedman, Milton, 21
Friends (television series), 111–112, 137, 142
Gawande, Atul: The Checklist Manifesto, 194
Geena Davis Institute on Gender in Media, 73
gender
bias in films, 73–74
stereotypes, 75–76
word vectors and, 139–140
Google
anomaly detection, 164
data sets, 204
data storage, 6
image classification, 73
Inception (neural-network model), 66–69, 72
market dominance, 9
pattern-recognition system, 53
personalization, 15
search engine, 15, 131
self-driving car, 5, 78, 177
speech recognition, 114–115
TensorFlow, 2
word2vec model, 133, 137, 140, 142
Google 411, 131
Google DeepMind, 201
Google Doodle, 144
Google Flu Trends, 228–233
Google Home, 3, 5, 110
Google Ngram Viewer, 129
Google Translate, 61, 110, 114–115, 132, 142–143
Google Voice, 140
Gould, Stephen Jay, 213
GPS technology, 16, 92, 168, 206
Great Andromeda Nebula, 49–53, 50, 58. See also astronomy
Great Recoinage (1696), 150, 160–162, 246n16
Green, Jane, 210–211
Greenblatt, Joel, 101
Gresham’s law, 152
Guest, William “Bull Dog,” 86–87
Hall, John, 186
Harvard Computers (math team), 54
Harvard Mark I, 116–119
HBO, 15, 35
health care and medicine
AI and, 189–208
contraception failure rates, 219, 225
Crohnology, 39
data-science legacy of Florence Nightingale, 185–189
data sharing, 204–206
future trends, 203–208
General Practice Research Database (U.K.), 211
genomic profiling, 40–41
glomerular filtration rate (GFR), 191–197, 249n39
Google searches for “how long does flu last,” 230
health care policy, 33
hospitals, 177, 179, 181–184, 186, 188–198, 202–208
ibrutinib (cancer drug), 40–41
incentives, 203–204
influenza, 228–233, 240
innovation and, 177–208
kidney disease, 190–197, 203–206, 249n39
kidney function over time, 192
mammogram screening, 96–100, 107, 194
medical imaging, 199–201
Nightingale’s coxcomb diagram (1858), 187
patient-centered social networks, 39
PatientsLikeMe, 39
privacy and security, 206–207
remote medicine, 202–203
smart medical devices, 198–199
targeted therapy, 39–40
Tiatros, 39
See also cancer; Nightingale, Florence
Heller, Katherine, 193–195, 197, 204, 207
Herbert, Sidney, 183
Herd, Andy, 111
Higgs boson, 61
Hofstadter, Douglas, 120
Hopper, Grace, 115–124, 143–144
“Amazing Grace,” 115
compiler breakthrough, 118–123
death of, 144
early years and education, 115–116
final years, 143–144
FLOW-MATIC (data-processing compiler), 123
Google Doodle in honor of, 144
Harvard Mark I and, 116–119
top-down, rules-based approach of, 124–125, 131
UNIVAC and, 119, 122, 143
World War II and, 116–118
Hotelling, Harold, 20
House of Cards (Netflix series), 14, 40
Howard, Dwight, 174
Hubble, Edwin, 58–59, 75–76
Hubble Space Telescope, 48, 58–59
Hunter College, 21
IBM
early speech-recognition, 124
Harvard Mark I, 116–119
Watson supercomputer, 109–110, 115, 202
ImageNet Visual Recognition Challenge, 62
image classification, 5, 45, 46, 73
image recognition, 11, 200
prediction rules and, 61, 66
Imperial College London, 198
imputation, 27
influenza, 228–233, 240
Influenza-like Illness Surveillance Network (ILINet), 229–230
Infor, 173, 175
Instagram, 16
data storage, 6
hashtags, 68
psychology and, 227
Insulet, 199
investing. See financial industry
James, LeBron, 21, 174
Japan, 2, 130, 132, 178
Jefferson, Thomas, 38
Johnson, Lyndon, 83
Johnson, Mark, 73
Kaiserswerth (German charity hospital), 181, 186
Kant, Immanuel, 50
kidney disease, 190–197, 203–206, 249n39
Koike, Makoto, 2, 45, 178
Kolmogorov–Smirnov statistics, 164
Kubrick, Stanley, 109
Lagerman, Björn, 73
Laplace, Pierre-Simon, 91, 163
law enforcement, 2
Lazer, David, 231
Leavitt, Henrietta, 5, 49, 53–61, 63, 68, 71–72, 75–76, 130
death of, 75
early years and education, 54
gender stereotypes and, 75–76
original equation of, 60
prediction rule data of, 56
prediction rule discovered by, 55–61
Legendre, Adrien-Marie, 60–61, 65, 68, 243n9
Leonard, Kawhi, 174
Lewis, Michael, 171
LIDAR (light detection and ranging sensor), 92–93, 95
image of a highway, 93
See also robot cars
Lin, Jeremy, 175
London School of Hygiene & Tropical Medicine, 231
Lynch, Peter, 101
Macaulay, Charles, 153
machine learning, 193, 206, 234. See also artificial intelligence (AI); Bayes’s rule; neural networks
Netflix and, 13–15
mammograms, 96–100, 107, 194
Manhattan Project, 118
market dominance, 9–10
mathematics, 10–11
computer science and, 113–123
conditional probability, 16–17, 24–30, 34–35, 40–41, 80
“math skill,” 33
Newton’s worst mathematical mistake, 149–153, 159–163
Nightingale, Florence, and, 180, 186, 193
pattern recognition and, 47, 49–50, 53, 55, 56, 57, 60, 65–68
principle of least squares, 60, 65–66
square-root rule (de Moivre’s equation), 156–159, 162–163, 165, 168
suggestion engines and, 15–31
twenty questions game and, 133–139
word vectors and, 139–141
See also Bayes’s rule; neural networks; prediction rules
maximum heart rate, 47–48, 53, 63–65
equations for, 64
Mayor’s Office of Data Analytics (MODA), 164–166
medicine. See health care and medicine
Medtronic, 199
Menger, Karl, 18–19
Microsoft, 9
Microsoft Azure, 6
modeling
assumptions and, 215–217, 224–225
deep-learning models, 73
imputation and, 27
Inception, 66–69, 72
latent feature, 32–34, 39–40
massive models, 63–69
missing data and, 25–29, 31
model rust, 212, 228–233
natural language processing and, 129–133, 136, 142–143
prediction rules as, 47–48
reality versus, 215–217
rules-based (top-down) models, 112, 114, 124, 126, 129
training the model, 47–48, 70, 72, 129
Moneyball, 171–175
Moore’s law, 5–6
Moravec paradox, 81
Morgenstern, Oskar, 18, 20
Musk, Elon, 7
natural language processing (NLP), 109–142
ambiguity and, 127–128
bottom-up approach, 114, 129
chatbots, 111, 115, 118, 141
digital assistants, 1, 3–5, 8, 11, 15, 110, 115, 120, 140, 178, 202
future trends, 142–143
Google Translate, 61, 110, 114–115, 132, 142–143
growth of statistical NLP, 128–133
knowing how versus knowing that, 128–129, 132
natural language revolution, 112, 124–128
“New Deal” for human-machine linguistic interaction, 114
prediction rules and, 114, 129–131
programing language revolution, 112–123
robustness and, 126–127
rule bloat and, 125–126
speech recognition, 5, 114–115, 124–125, 127, 130–131, 140
top-down approach, 112, 114, 124, 126, 129
word co-location statistics, 137
word vectors, 114–115, 133–137, 139–141
naturally occurring radioactive materials (NORM), 167
Netflix
Crown, The (series), 34–35
data scientists, 15
history of, 13–14
House of Cards (series), 14, 40
Netflix Prize for recommender system, 13–14, 31–32
personalization, 13–15, 18, 22, 25, 34–36, 40–41, 177
recommender systems, 13–14, 29–33
neural networks, 111–112, 114, 137, 141–142, 200–202, 233
deep learning and, 2, 44, 70–74, 130, 171, 200, 209
Friends new episodes and, 111–112, 137, 142
Inception model, 66–69, 72
prediction rules and, 63, 66, 70–73
New England Patriots, 145–148, 154, 156, 166
Newton, Isaac, 149–153, 159–163, 175
Nightingale, Florence, 179–190, 193–194, 197, 207–208
coxcomb diagram (1858), 187
Crimean War and, 182–184
early years and training, 180–182
evidence-based medicine legacy of, 188–189
“lady with the lamp,” 179
medical statistics legacy of, 186–188
nursing reform legacy of, 185–186
Nvidia, 66, 201
Obama, Barack, 110
Office of Scientific Research and Development, 21
parallax, 52–53
pattern recognition, 43–76
cucumber sorting, 45
input and output, 45–48, 53, 57, 59, 61–63, 70–72
learning a pattern, 59, 67–80
maximum heart rate and, 47–48, 53, 63–65, 64
prediction rules and, 48, 53, 57–72, 75
toilet paper theft and, 43–44, 49
See also prediction rules
PayPal, 170–171, 178
personalization, 13–42
conditional probability and, 16–17, 24–30, 34–35, 39–41
latent feature models and, 32–34, 39–40
Netflix and, 13–15, 18, 22, 25, 34–36, 40–41, 177
Wald’s survivability recommendations for aircraft and, 22–29
See also recommender systems; suggestion engines
philosophy, 50, 128–129, 132
Pickering, Edward C., 54
politics, 36–39, 67, 74, 110, 142, 233
prediction rules
contraception and, 61–62
deep learning and, 70–74
evaluation of, 59–61
Google Translate and, 61
Great Andromeda Nebula and, 49–53, 58
image recognition and, 61
massive data and, 66–68
massive models and, 63–66
as models, 47–48
natural language processing and, 114, 129–131
neural networks and, 63, 66, 70–73
overfitting problem, 67–68, 130
training the model, 47–48, 70, 72, 129
trial and error strategy, 68–69
Price, Richard, 91
principle of least squares, 60, 65–66
privacy, 7, 10, 44, 74–75, 178, 204–208
ProPublica, 234
Quetelet, Adolphe, 189
rage to conclude bias, 212, 213–217, 225–226, 228
ransomware, 207
Reagan, Ronald, 143
recommender systems, 13–18
health care and, 40
large-scale, 40
legacy of, 35–41
Netflix, 13–14, 29–33
See also suggestion engines
Rees, Mina, 21
Reinhart, Alex, 167–168
robot cars, 5–8, 77–79, 97, 197, 200
Bayes’s rule and, 81–82, 91–96, 94, 100, 106
introspection and extrapolation (dead reckoning), 94–95
LIDAR image of a highway, 93
LIDAR (light detection and ranging sensor), 92–93, 95
SLAM problem (simultaneous localization and mapping) and, 80–81, 91–92, 95
Waymo, 78, 80
robotics
Bayes’s rule and, 81–105
in China, 9
revolution of, 79–82
SLAM problem (simultaneous localization and mapping), 80–81, 91–92, 95
search for USS Scorpion and, 82–90, 106
Stanford Cart, 79
Theseus (life-size autonomous mouse), 79
See also robot cars
Rose, Pete, 214–215
Royal Mint, 149–163
coin clipping, 150–153, 155, 159–161, 169
Great Recoinage (1696), 150, 160–162, 246n16
Newton, Isaac and, 149–153, 159–163, 175
Trial of the Pyx, 149, 153–163, 175
Russell, Alexander Wilson, 115–116
S&P 500, 101
Salesforce, 111
Sapir, Edward, 125
Sarandos, Ted, 15
SAT (standardized test), 33, 139–140
Scherwitzl, Raoul, 62
Schlesinger, Karl, 18–19
Schuschnigg, Kurt, 19
Schweinfurt-Regensburg mission (World War II), 26–27
sci-fi
AI anxiety and, 7, 238
robots, 126
self-driving cars. See robot cars
Sendak, Mark, 196, 203–205, 207
Shannon, Claude, 79
Shapley, Harlow, 57, 242n6
Sieveking, Edward Henry, 185
Silicon Valley, 9, 17, 40, 149
Skype, 111
SLAM problem (simultaneous localization and mapping), 80–81, 91–92, 95
Slattery, Francis A., 82. See also USS Scorpion
Slipher, Vesto, 51
smart cities, 148–149, 164–169
smart medical devices, 198–199
smart watch, 178
smartphones, 111, 178, 200
apps, 61–62, 197, 204
GPS technology, 92, 168
social media, 37, 48, 219, 227. See also Facebook
SpaceX, 7
speech recognition, 5, 114–115, 124–125, 127, 130–131, 140
Harpy (early program), 124–125
See also natural language processing (NLP)
Spotify, 18, 41, 42, 237
conditional probability, 16
personalization, 15
square-root rule (de Moivre’s equation), 156–159, 162–163, 165, 168
Standard and Poor’s, 101
Stanford University, 79, 200–202
Statistical Research Group (SRG), 20–24
statistics, 19–20, 22, 24, 106, 155–158, 179, 186–190
Stigler, George, 21
Stripe (payment system), 171
submarines. See USS Scorpion
suggestion engines, 13
bright side of, 39–41
dark side of, 35–39
as “doppelgänger software,” 15
targeted marketing and, 35–36
See also recommender systems
super-utilizer, 192
survivorship bias, 23–24
2001: A Space Odyssey (film), 109
Takats, Zoltan, 198, 207
Tandem, 199
Teller, Edward, 20
Tencent, 1
Tesla, 7
Thrun, Sebastian, 200, 207
Tiatros (PTSD-centered social network), 39
toilet paper theft, 43–44, 49
Trial of the Pyx, 149, 153–163, 175
Trump, Donald, 233
Tufte, Edward, 225
Uber, 80, 206
Ulam, Stanislaw, 20
UNIVAC, 5, 119, 122, 143
USS Scorpion, 82–90, 106
bow section, 90
prior beliefs and search for USS Scorpion, 89
Varroa mites, 73
Vassar College, 21, 116, 118
von Neumann, John, 20, 67–68, 118
Wald, Abraham, 17–29, 31, 104, 242n6
early years and education, 18–19
member of Statistical Research Group (Columbia), 20–24
sequential sampling, 22
survivability recommendations for aircraft, 22–29
in United States, 19–21
Wallis, W. Allen, 20–21
WannaCry (ransomware attack), 207
waterfall diagram, 97–100, 98, 106–107
Watson (IBM supercomputer), 109–110, 115, 202
Waymo (autonomous-car company), 78, 80
WeChat, 111, 143
word vectors, 114–115, 133–137, 139–141
word2vec model (Google), 133, 137, 140, 142
World War I, 41–42, 229
World War II, 17–18
Battle of the Bulge, 21
Bayesian search and, 84
Hopper, Grace, and, 116–117
Schweinfurt-Regensburg mission (World War II), 26–27
Statistical Research Group (Columbia) and, 20–24
Wald’s survivability recommendations for aircraft, 22–29
Yormark, Brett, 173
YouTube, 6, 15
Zillow, 65–66, 227
Together we want to thank the two people most responsible for nurturing this book from its earliest stages: Lisa Gallagher of DeFiore & Company and Tim Bartlett of St. Martin’s Press. This is the first thing either of us has written that wasn’t for an academic audience, and we started with almost no sense of what writing and publishing a “real” book would actually entail. We are so grateful to Lisa for seeing and cultivating the potential in those first scribbled drafts, which now look so painfully clunky. We are equally grateful to Tim, both for taking a gamble on two data scientists foolhardy enough to try their hands at writing, and for giving us such unerring advice along the way. We are also indebted to Doug Young of Transworld for his valuable editorial feedback.
We also thank the many other people at DeFiore, St. Martin’s Press, and Macmillan who have been so helpful, including Robert Allen, Alan Bradshaw, Jeff Capshew, Laura Clark, Jennifer Enderlin, Tracey Guest, Leah Johanson, Linda Kaplan, Alice Pfeifer, Gabrielle Piraino, Jason Prince, Sally Richardson, Brisa Robinson, Mary Beth Roche, Robert Van Kolken, Laura Wilson, and George Witte. We give special thanks to India Cooper, whose magnificent editing efforts have put into stark relief the difference between a professional writer and two amateurs like us. Thanks also to Larry Finlay, Bill Scott-Kerr, and the rest of the Transworld team for their support.
We thank Ellen Zippi for her invaluable help in researching this book. We are also grateful to many of our colleagues for sharing stories and expertise, most especially Steven Levitt for introducing us to Lisa Gallagher, and David Madigan for drawing our attention to the two studies on bisphosphonate usage described in chapter 7. We thank Rosalind Eggo, Katherine Heller, and Mark Sendak for their time and trouble in agreeing to be interviewed. Thanks also to those family members who tirelessly read early drafts and gave their feedback: Catherine Aiken, Patricia and Josh Lowry, Anne and George Scott, and Brian Woods.
I am grateful to my co-author, James Scott. Above all, I thank my family for their love and support: my wife, Anne Gron, and our children, Emma, Michael, and Sarah.
—Nick Polson
Thank you to Nick Polson. I owe Nick for so much in my career that I cannot possibly list it all here; this book is but the latest in a long string of projects and ideas that he has so generously shared with me. I expect that over the decades to come, I will look back and see Nick as the single most important influence on my professional life, and the best friend I ever had in this field. I also want to thank the three most important teachers I ever had: Bill Jeffreys, Jim Berger, and John Trimble. Without Bill and Jim, I would never have become a statistician. Without John’s kindness and generosity, I would never have known how to “tighten/sharpen/brighten” my way to better prose. I also thank my parents, who gave me so much—not least of which was their example. Finally, I am grateful to my wife, Abigail Aiken, for just about everything. I love you, and I could not have helped write this book without your support.
—James Scott
Nick Polson is Professor of Econometrics and Statistics at the Chicago Booth School of Business. He is a Bayesian statistician involved in research in machine intelligence, deep learning and computational methods. He regularly speaks to large audiences in the US, UK and the rest of Europe.
James Scott is Associate Professor of Statistics at the University of Texas. James is a statistician and data scientist who studies Bayesian inference and computational methods for big data. James lives in Austin, Texas with his wife, Abigail.
TRANSWORLD PUBLISHERS
61–63 Uxbridge Road, London W5 5SA
www.penguin.co.uk
Transworld is part of the Penguin Random House group of companies whose addresses can be found at global.penguinrandomhouse.com
First published in Great Britain in 2018 by Bantam Press
an imprint of Transworld Publishers
Copyright © Nicholas Polson and James Scott 2018
Cover design by David Baldeosingh Rotstein
Nicholas Polson and James Scott have asserted their right under the Copyright, Designs and Patents Act 1988 to be identified as the authors of this work.
Every effort has been made to obtain the necessary permissions with reference to copyright material, both illustrative and quoted. We apologize for any omissions in this respect and will be pleased to make the appropriate acknowledgements in any future edition.
A CIP catalogue record for this book is available from the British Library
Version 1.0 Epub ISBN 9781473554368
ISBN 9780593079775
This ebook is copyright material and must not be copied, reproduced, transferred, distributed, leased, licensed or publicly performed or used in any way except as specifically permitted in writing by the publishers, as allowed under the terms and conditions under which it was purchased or as strictly permitted by applicable copyright law. Any unauthorized distribution or use of this text may be a direct infringement of the author’s and publisher’s rights and those responsible may be liable in law accordingly.
1 3 5 7 9 10 8 6 4 2
To Diana and to Anne.
—NP
To my grandparents, to Margaret Aiken, and to the golden guinea.
—JS
WE TEACH DATA science to hundreds of students per year, and they’re all fascinated by artificial intelligence. And they ask great questions. How does a car learn to drive itself? How does Alexa understand what I’m saying? How does Spotify pick such good playlists for me? How does Facebook recognize my friends in the photos I upload? These students realize that AI isn’t some sci-fi droid from the future; it’s right here, right now, and it’s changing the world one smartphone at a time. They all want to understand it, and they all want to be a part of it.
And our students aren’t the only ones enthusiastic about AI. They’re joined in their exultation by the world’s largest companies—from Amazon, Facebook, and Google in America to Baidu, Tencent, and Alibaba in China. As you may have heard, these big tech firms are waging an expensive global arms race for AI talent, which they judge to be essential to their future. For years we’ve seen them court freshly minted PhDs with offers of $300,000+ salaries and much better coffee than we have in academia. Now we’re seeing many more companies jump into the AI recruiting fray—firms sitting on piles of data in, say, insurance or the oil business, who are coming along with whopping salary offers and fancy espresso machines of their own.
Yet while this arms race is real, we think there’s a much more powerful trend at work in AI today—a trend of diffusion and dissemination, rather than concentration. Yes, every big tech company is trying to hoard math and coding talent. But at the same time, the underlying technologies and ideas behind AI are spreading with extraordinary speed: to smaller companies, to other parts of the economy, to hobbyists and coders and scientists and researchers everywhere in the world. That democratizing trend, more than anything else, is what has our students today so excited, as they contemplate a vast range of problems practically begging for good AI solutions.
Who would have thought, for example, that a bunch of undergraduates would get so excited about the mathematics of cucumbers? Well, they did when they heard about Makoto Koike, a car engineer from Japan whose parents own a cucumber farm. Cucumbers in Japan come in a dizzying variety of sizes, shapes, colors, and degrees of prickliness—and based on these visual features, they must be separated into nine different classes that command different market prices. Koike’s mother used to spend eight hours per day sorting cucumbers by hand. But then Koike realized that he could use a piece of open-source AI software from Google, called TensorFlow, to accomplish the same task, by coding up a “deep-learning” algorithm that could classify a cucumber based on a photograph. Koike had never used AI or TensorFlow before, but with all the free resources out there, he didn’t find it hard to teach himself how. When a video of his AI-powered sorting machine hit YouTube, Koike became an international deep-learning/cucumber celebrity. It wasn’t merely that he had given people a feel-good story, saving his mother from hours of drudgery. He’d also sent an inspiring message to students and coders across the world: that if AI can solve problems in cucumber farming, it can solve problems just about anywhere.
That message is now spreading quickly. Doctors are now using AI to diagnose and treat cancer. Electrical companies use AI to improve power-generating efficiency. Investors use it to manage financial risk. Oil companies use it to improve safety on deep-sea rigs. Law enforcement agencies use it to hunt terrorists. Scientists use it to make new discoveries in astronomy and physics and neuroscience. Companies, researchers, and hobbyists everywhere are using AI in thousands of different ways, whether to sniff for gas leaks, mine iron, predict disease outbreaks, save honeybees from extinction, or quantify gender bias in Hollywood films. And this is just the beginning.
We see the real story of AI as the story of this diffusion: from a handful of core math concepts stretching back decades, or even centuries, to the supercomputers and talking/thinking/cucumber-sorting machines of today, to the new and ubiquitous digital wonders of tomorrow. Our goal in this book is to tell you that story. It is partly a story of technology, but it is mainly a story of ideas, and of the people behind those ideas—people from a much earlier age, people who were just keeping their heads down and solving their own problems involving math and data, and who had no clue about the role their solutions would come to play in inventing the modern world. By the end of that story, you’ll understand what AI is, where it came from, how it works, and why it matters in your life.
When you hear “AI,” don’t think of a droid. Think of an algorithm.
An algorithm is a set of step-by-step instructions so explicit that even something as literal-minded as a computer can follow them. (You may have heard the joke about the robot who got stuck in the shower forever because of the algorithm on the shampoo bottle: “Lather. Rinse. Repeat.”) On its own, an algorithm is no smarter than a power drill; it just does one thing very well, like sorting a list of numbers or searching the web for pictures of cute animals. But if you chain lots of algorithms together in a clever way, you can produce AI: a domain-specific illusion of intelligent behavior. For example, take a digital assistant like Google Home, to which you might pose a question like “Where can I find the best breakfast tacos in Austin?” This query sets off a chain reaction of algorithms:
One algorithm converts the raw sound wave into a digital signal.
Another algorithm translates that signal into a string of English phonemes, or perceptually distinct sounds: “brek-fust-tah-koze.”
The next algorithm segments those phonemes into words: “breakfast tacos.”
Those words are sent to a search engine—itself a huge pipeline of algorithms that processes the query and sends back an answer.
Another algorithm formats the response into a coherent English sentence.
A final algorithm verbalizes that sentence in a non-robotic-sounding way: “The best breakfast tacos in Austin are at Julio’s on Duval Street. Would you like directions?”
And that’s AI. Pretty much every AI system—whether it’s a self-driving car, an automatic cucumber sorter, or a piece of software that monitors your credit card account for fraud—follows this same “pipeline-of-algorithms” template. The pipeline takes in data from some specific domain, performs a chain of calculations, and outputs a prediction or a decision.
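The pipeline-of-algorithms template can be sketched in a few lines of code. This is a toy illustration, not anything like Google's actual system: each stage is a placeholder function that simply hands a canned result to the next stage, but the structure—data flowing through a chain of specialized algorithms—is the point.

```python
# A minimal sketch of the "pipeline-of-algorithms" template.
# Every stage here is a hypothetical stand-in for a real algorithm.

def digitize(sound_wave):
    # Stage 1: raw sound wave -> digital signal (placeholder).
    return list(sound_wave)

def to_phonemes(signal):
    # Stage 2: digital signal -> string of phonemes (placeholder).
    return "brek-fust-tah-koze"

def to_words(phonemes):
    # Stage 3: phonemes -> words (placeholder).
    return "breakfast tacos"

def search(query):
    # Stage 4: words -> raw answer from a search engine (placeholder).
    return "Julio's on Duval Street"

def compose(answer):
    # Stage 5: raw answer -> coherent English sentence.
    return f"The best breakfast tacos in Austin are at {answer}."

def pipeline(sound_wave):
    # Chain the stages: each algorithm's output is the next one's input.
    data = sound_wave
    for stage in (digitize, to_phonemes, to_words, search, compose):
        data = stage(data)
    return data

print(pipeline([0.1, -0.2, 0.3]))
```

Swap real speech-recognition, search, and text-to-speech algorithms into those placeholder stages and you have the skeleton of a digital assistant.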
There are two distinguishing features of the algorithms used in AI. First, these algorithms typically deal with probabilities rather than certainties. An algorithm in AI, for example, won’t say outright that some credit card transaction is fraudulent. Instead, it will say that the probability of fraud is 92%—or whatever it thinks, given the data. Second, there’s the question of how these algorithms “know” what instructions to follow. In traditional algorithms, like the kind that run websites or word processors, those instructions are fixed ahead of time by a programmer. In AI, however, those instructions are learned by the algorithm itself, directly from “training data.” Nobody tells an AI algorithm how to classify credit card transactions as fraudulent or not. Instead, the algorithm sees lots of examples from each category (fraudulent, not fraudulent), and it finds the patterns that distinguish one from the other. In AI, the role of the programmer isn’t to tell the algorithm what to do. It’s to tell the algorithm how to teach itself what to do, using data and the rules of probability.
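Both features—probabilities rather than certainties, and instructions learned from examples—can be seen in a toy classifier. The sketch below uses made-up transaction features and data, and a simple Bayes'-rule calculation (one of many possible learning methods, not necessarily what any real fraud system uses): the program is never told which transactions are fraudulent in general, only shown labeled examples, and its answer is a probability, not a verdict.

```python
# A toy "learn from training data" sketch: estimate P(fraud | features)
# from labeled examples via Bayes' rule with add-one smoothing.
# Feature names and data are invented for illustration.
from collections import Counter

training_data = [
    ({"foreign": True,  "midnight": True},  "fraud"),
    ({"foreign": True,  "midnight": False}, "fraud"),
    ({"foreign": False, "midnight": False}, "ok"),
    ({"foreign": False, "midnight": True},  "ok"),
    ({"foreign": False, "midnight": False}, "ok"),
]

def train(data):
    # Count class frequencies and feature-value frequencies per class.
    class_counts = Counter(label for _, label in data)
    feature_counts = Counter()
    for features, label in data:
        for name, value in features.items():
            feature_counts[(name, value, label)] += 1
    return class_counts, feature_counts

def prob_fraud(features, class_counts, feature_counts):
    # Bayes' rule: score each class, then normalize to a probability.
    total = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        score = class_counts[label] / total  # prior
        for name, value in features.items():
            count = feature_counts[(name, value, label)]
            score *= (count + 1) / (class_counts[label] + 2)  # smoothed
        scores[label] = score
    return scores["fraud"] / sum(scores.values())

model = train(training_data)
p = prob_fraud({"foreign": True, "midnight": True}, *model)
print(f"Probability of fraud: {p:.0%}")  # a probability, not a yes/no
```

Notice that the "instructions" for spotting fraud live in the counts learned from data, not in rules a programmer wrote down.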
Modern AI systems, like a self-driving car or a home digital assistant, are pretty new on the scene. But you might be surprised to learn that most of the big ideas in AI are actually old—in many cases, centuries old—and that our ancestors have been using them to solve problems for generations. For example, take self-driving cars. Google debuted its first such car in 2009. But you’ll learn in chapter 3 that one of the main ideas behind how these cars work was discovered by a Presbyterian minister in the 1750s—and that this idea was used by a team of mathematicians over 50 years ago to solve one of the Cold War’s biggest mysteries.
Or take image classification, like the software that automatically tags your friends in Facebook photos. Algorithms for image processing have gotten radically better over the last five years. But in chapter 2, you’ll learn that the key ideas here date to 1805—and that these ideas were used a century ago, by a little-known astronomer named Henrietta Leavitt, to help answer one of the deepest scientific questions that humans have ever posed: How big is the universe?
Or even take speech recognition, one of the great AI triumphs of recent years. Digital assistants like Alexa and Google Home are remarkably fluent with language, and they’ll only get better. But the first person to get a computer to understand English was a rear admiral in the U.S. Navy, and she did so almost 70 years ago. (See chapter 4.)
Those are just three illustrations of a striking fact: no matter where you look in AI, you’ll find an idea that people have been kicking around for a long time. So in many ways, the big historical puzzle isn’t why AI is happening now, but why it didn’t happen long ago. To explain this puzzle, we must look to three enabling technological forces that have brought these venerable ideas into a new age.
The first AI enabler is the decades-long exponential growth in the speed of computers, usually known as Moore’s law. It’s hard to convey intuitively just how fast computers have gotten. The cliché used to be that the Apollo astronauts landed on the moon with less computing power than a pocket calculator. But this no longer resonates, because … what’s a pocket calculator? So we’ll try a car analogy instead. In 1951, one of the fastest computers was the UNIVAC, which performed 2,000 calculations per second, while one of the fastest cars was the Alfa Romeo 6C, which traveled 110 miles per hour. Both cars and computers have improved since 1951—but if cars had improved at the same rate as computers, then a modern Alfa Romeo would travel at 8 million times the speed of light.
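The car analogy is easy to check with back-of-the-envelope arithmetic. The sketch below uses our own rough assumptions—a doubling in computing speed every 18 months or so, measured from the UNIVAC in 1951—rather than the benchmark data behind the book's exact figure, but it lands in the same ballpark of millions of times the speed of light.

```python
# Back-of-the-envelope version of the "Alfa Romeo vs. UNIVAC" comparison.
# Assumption (ours): computing speed doubles roughly every 18 months.
UNIVAC_YEAR = 1951
CURRENT_YEAR = 2018
DOUBLING_PERIOD_YEARS = 1.5       # assumed Moore's-law doubling time
CAR_SPEED_MPH = 110               # Alfa Romeo 6C, 1951
SPEED_OF_LIGHT_MPH = 670_616_629

# Apply the computer-speed growth factor to the 1951 car.
doublings = (CURRENT_YEAR - UNIVAC_YEAR) / DOUBLING_PERIOD_YEARS
growth_factor = 2 ** doublings
hypothetical_car_mph = CAR_SPEED_MPH * growth_factor
times_speed_of_light = hypothetical_car_mph / SPEED_OF_LIGHT_MPH

print(f"A 'Moore's-law Alfa Romeo' would go about "
      f"{times_speed_of_light:,.0f} times the speed of light.")
```

With these assumptions the answer comes out in the millions—the same order of magnitude as the figure in the text.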
The second AI enabler is the new Moore’s law: the explosive growth in the amount of data available, as all of humanity’s information has become digitized. The Library of Congress consumes 10 terabytes of storage, but in 2013 alone, the big four tech firms—Google, Apple, Facebook, and Amazon—collected about 120,000 times as much data as this. And that’s a lifetime ago in internet years. The pace of data accumulation is accelerating faster than an Apollo rocket; in 2017, more than 300 hours of video were uploaded to YouTube every minute, and more than 100 million images were posted to Instagram every day. More data means smarter algorithms.
The third AI enabler is cloud computing. This trend is nearly invisible to consumers, but it’s had an enormous democratizing effect on AI. To illustrate this, we’ll draw an analogy here between data and oil. Imagine if all companies of the early twentieth century had owned some oil, but they had to build the infrastructure to extract, transport, and refine that oil on their own. Any company with a new idea for making good use of its oil would have faced enormous fixed costs just to get started; as a result, most of the oil would have sat in the ground. Well, the same logic holds for data, the oil of the twenty-first century. Most hobbyists or small companies would face prohibitive costs if they had to buy all the gear and expertise needed to build an AI system from their data. But the cloud-computing resources provided by outfits like Microsoft Azure, IBM, and Amazon Web Services have turned that fixed cost into a variable cost, radically changing the economic calculus for large-scale data storage and analysis. Today, anyone who wants to make use of their “oil” can now do so cheaply, by renting someone else’s infrastructure.
When you put those trends together—faster chips, massive data sets, cloud computing, and above all good ideas—you get a supernova-like explosion in both the demand for AI and the capacity to use it to solve real problems.
We’ve told you how excited our students are about AI, and how the world’s largest firms are rushing to embrace it. But we’d be lying if we said that everyone was so bullish about these new technologies. In fact, many people are anxious, whether about jobs, data privacy, wealth concentration, or Russians with fake-news Twitter-bots. Some people—most famously Elon Musk, the tech entrepreneur behind Tesla and SpaceX—paint an even scarier picture: one where robots become self-aware, decide they don’t like being ruled by people, and start ruling us with a silicon fist.
Let’s talk about Musk’s worry first; his views have gotten a lot of attention, presumably because people take notice when a member of the billionaire disrupter class talks about artificial intelligence. Musk has claimed that in developing AI technology, humanity is “summoning a demon,” and that smart machines are “our biggest existential threat” as a species.
After you’ve read our book, you’ll be able to decide for yourself whether you think these worries are credible. We want to warn you up front, however, that it’s very easy to fall into a trap that cognitive scientists call the “availability heuristic”: the mental shortcut in which people evaluate the plausibility of a claim by relying on whatever immediate examples happen to pop into their minds. In the case of AI, those examples are mostly from science fiction, and they’re mostly evil—from the Terminator to the Borg to HAL 9000. We think that these sci-fi examples have a strong anchoring effect that makes many people view the “evil AI” narrative less skeptically than they should. After all, just because we can dream it and make a film about it doesn’t mean we can build it. Nobody today has any idea how to create a robot with general intelligence, in the manner of a human or a Terminator. Maybe your remote descendants will figure it out; maybe they’ll even program their creation to terrorize the remote descendants of Elon Musk. But that will be their choice and their problem, because no option on the table today even remotely foreordains such a possibility. Now, and for the foreseeable future, “smart” machines are smart only in their specific domains.
Moreover, consider the opportunity cost of worrying that we’ll soon be conquered by self-aware robots. To focus on this possibility now is analogous to the de Havilland Aircraft Company, having flown the first commercial jetliner in 1952, worrying about the implications of warp-speed travel to distant galaxies. Maybe one day, but right now there are far more important things to worry about—like, to press the jetliner analogy a little further, setting smart policy for all those planes in the air today.
This issue of policy brings us to a whole other set of anxieties about AI, much more plausible and immediate. Will AI create a jobless world? Will machines make important decisions about your life, with zero accountability? Will the people who own the smartest robots end up owning the future?
These questions are deeply important, and we hear them discussed all the time—at tech conferences, in the pages of the world’s major newspapers, and over lunch among our colleagues. We should let you know up front that you won’t find the answers to these questions in our book, because we don’t know them. Like our students, we are ultimately optimistic about the future of AI, and we hope that by the end of the book, you will share that optimism. But we’re not labor economists, policy experts, or soothsayers. We’re data scientists—and we’re also academics, meaning that our instinct is to stay firmly in our lane, where we’re confident of our expertise. We can teach you about AI, but we can’t tell you for sure what the future will bring.
We can tell you, however, that we’ve encountered a common set of narratives that people use to frame this subject, and we find them all incomplete. These narratives emphasize the wealth and power of the big tech firms, but they overlook the incredible democratization and diffusion of AI that’s already happening. They highlight the dangers of machines making important decisions using biased data, but they fail to acknowledge the biases or outright malice in human decision-making that we’ve been living with forever. Above all, they focus intensely on what machines may take away, but they lose sight of what we’ll get in return: different and better jobs, new conveniences, freedom from drudgery, safer workplaces, better health care, fewer language barriers, new tools for learning and decision-making that will help us all be smarter, better people.
Take the issue of jobs. In America, jobless claims kept hitting new lows from 2010 through 2017, even as AI and automation gained steam as economic forces. The pace of robotic automation has been even more relentless in China, yet wages there have been soaring for years. That doesn’t mean AI hasn’t threatened individual people’s jobs. It has, and it will continue to do so, just as the power loom threatened the jobs of weavers, and just as the car threatened the jobs of buggy-whip makers. New technologies always change the mix of labor needed in the economy, putting downward pressure on wages in some areas and upward pressure in others. AI will be no different, and we strongly support job-training and social-welfare programs to provide meaningful help for those displaced by technology. A universal basic income might even be the answer here, as many Silicon Valley bosses seem to think; we don’t claim to know. But arguments that AI will create a jobless future are, so far, completely unsupported by actual evidence.
Then there’s the issue of market dominance. Amazon, Google, Facebook, and Apple are enormous companies with tremendous power. It is critical