Why is machine learning so important? Here is everything you need to know.
Not just big data, but wide data
The enormous scale of data available to firms poses several challenges. Of course, big data may require advanced software and hardware to handle and store it. But machine learning is about how the analysis itself has to adapt to the size of the dataset, because big data is not just long but wide as well. For example, consider an online retailer’s database of customers in a spreadsheet. Each customer gets a row, so if there are lots of customers the dataset will be long. However, every variable also gets its own column, and we can now collect so much data on every customer (purchase history, browser history, mouse clicks, text from reviews) that the data are often wide as well, to the point where there can be even more columns than rows. Most of the tools in machine learning are designed to make better use of wide data.
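To make "long" versus "wide" concrete, here is a minimal sketch in Python. The customer records and column names are invented for illustration; the point is only that the table counts as wide once it has more columns than rows.

```python
# Hypothetical customer table: one dict per row, one key per column.
customers = [
    {"customer_id": 1, "purchases": 12, "page_views": 340,
     "mouse_clicks": 1250, "review_words": 87, "days_since_visit": 3},
    {"customer_id": 2, "purchases": 2, "page_views": 45,
     "mouse_clicks": 160, "review_words": 0, "days_since_visit": 41},
    {"customer_id": 3, "purchases": 7, "page_views": 210,
     "mouse_clicks": 900, "review_words": 15, "days_since_visit": 9},
]


def shape(rows):
    """Return (number of rows, number of distinct columns)."""
    columns = set()
    for row in rows:
        columns.update(row.keys())
    return len(rows), len(columns)


n_rows, n_cols = shape(customers)
# The data are "wide" when there are more columns than rows.
is_wide = n_cols > n_rows
print(n_rows, n_cols, is_wide)
```

A real retailer would have far more rows, but also potentially thousands of columns once clicks and review text are broken out into separate variables.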
Predictions, not causality
The most common application of machine learning tools is to make predictions. Here are a few examples of prediction problems in a business:
Making personalized recommendations for customers
Forecasting long-term customer loyalty
Anticipating the future performance of employees
Rating the credit risk of loan applicants
These settings share some common features. For one, they are all complex environments, where the right decision might depend on a lot of variables (which means they require “wide” data). They also have some outcome to validate the results of a prediction – like whether someone clicks on a recommended item, or whether a customer buys again. Finally, there is an important business decision to be made that requires an accurate prediction.
One important difference from traditional statistics is that you’re not focused on causality in machine learning. That is, you might not need to know what happens when you change the environment. Instead you are focusing on prediction, which means you might only need a model of the environment to make the right decision. This is just like deciding whether to leave the house with an umbrella: we have to predict the weather before we decide whether to bring one. The weather forecast is very helpful but it is limited; the forecast might not tell you how clouds work, or how the umbrella works, and it won’t tell you how to change the weather. The same goes for machine learning: personalized recommendations are forecasts of people’s preferences, and they are helpful, even if they won’t tell you why people like the things they do, or how to change what they like. If you keep these limitations in mind, the value of machine learning will be a lot more obvious.
Separating the signal from the noise
So far we’ve talked about when machine learning can be useful. But how is it used in practice? It would be impossible to cover it all in one article, but roughly speaking there are three broad concepts that capture most of what goes on under the hood of a machine learning algorithm: feature extraction, which determines what data to use in the model; regularization, which determines how the data are weighted within the model; and cross-validation, which tests the accuracy of the model. Each of these concepts helps us identify and separate “signal” (valuable, consistent relationships that we want to learn) from “noise” (random correlations that won’t occur again in the future, that we want to avoid). Every dataset has a mix of signal and noise, and these concepts will help you sort through that mix to make better predictions.
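As a rough illustration of how regularization and cross-validation fit together, here is a minimal sketch in Python. It assumes one specific form of regularization (a ridge penalty on a one-variable model with no intercept) and a simple k-fold split; the data and the function names `fit_ridge` and `cross_validated_error` are made up for this example and are not taken from any particular library.

```python
def fit_ridge(xs, ys, penalty):
    """One-variable ridge regression without an intercept.

    Minimizes sum((y - w*x)**2) + penalty * w**2, which has the
    closed-form solution w = sum(x*y) / (sum(x*x) + penalty).
    A larger penalty shrinks the weight w toward zero.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)


def cross_validated_error(xs, ys, penalty, k=3):
    """Average held-out mean squared error over k folds."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        # Every k-th observation is held out; the rest are used for fitting.
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        w = fit_ridge([xs[i] for i in train_idx],
                      [ys[i] for i in train_idx], penalty)
        mse = sum((ys[i] - w * xs[i]) ** 2 for i in test_idx) / len(test_idx)
        fold_errors.append(mse)
    return sum(fold_errors) / k


# Hypothetical data: y is roughly 2 * x plus a little noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

# Compare candidate penalties by their error on data the model did not see.
for penalty in (0.0, 1.0, 10.0):
    print(penalty, cross_validated_error(xs, ys, penalty))
```

The penalty with the lowest held-out error is the one you would keep. In this toy data there is one clean variable and little noise, so the unpenalized fit does best; with many noisy columns, a positive penalty often wins because it stops the model from chasing random correlations.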
Think of “feature extraction” as the process of figuring out what variables the model will use. Sometimes this simply means dumping all the raw data straight in, but many machine learning techniques can build new variables, called “features”, that aggregate important signals spread out over many variables in the raw data. Without feature extraction, such a signal would be too diluted across the raw variables to have an effect. One example is face recognition, where the “features” are actual facial features (nose length, eye color, skin tone, etc.) that are calculated with information from many different pixels in an image. In a music store, you could have features for different genres: for instance, you could combine all the rock sales into a single feature, all the classical sales into another feature, and so on.
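The music store example can be sketched in a few lines of Python. The sales records, album names, and genre labels below are all hypothetical; the sketch only shows how many album-level columns collapse into a handful of genre-level features.

```python
from collections import defaultdict

# Hypothetical raw data: one record per (customer, album) sale.
sales = [
    ("alice", "rock_album_1"),
    ("alice", "rock_album_2"),
    ("alice", "classical_album_1"),
    ("bob", "classical_album_1"),
    ("bob", "classical_album_2"),
]

# Hypothetical hand-labelled lookup from album to genre.
album_genre = {
    "rock_album_1": "rock",
    "rock_album_2": "rock",
    "classical_album_1": "classical",
    "classical_album_2": "classical",
}


def genre_features(sales, album_genre):
    """Count each customer's purchases per genre."""
    counts = defaultdict(lambda: defaultdict(int))
    for customer, album in sales:
        counts[customer][album_genre[album]] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {customer: dict(by_genre) for customer, by_genre in counts.items()}


features = genre_features(sales, album_genre)
print(features)
```

Instead of one sparse column per album, a model now sees one "rock" count and one "classical" count per customer, which concentrates the signal.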
There are many different ways to extract features, and the most useful ones are often automated. That means that rather than hand-picking the genre for each album, you can find “clusters” of albums that tend to be bought by the same people, and learn the “genres” from the data (you might even discover new genres you didn’t know existed). This is also very common with text data, where you can extract underlying “topics” of discussion based on which words and phrases tend to appear together in the same documents. However, domain experts can still be helpful in suggesting features, and in making sense of the clusters that the machine finds.
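Here is a deliberately simple sketch of learning "genres" from purchase data in Python. It is a stand-in for real clustering algorithms, not a recommendation: it links two albums when they share at least `min_shared` buyers and then treats each connected group of albums as a cluster. The purchase data, the threshold, and the function name `cluster_albums` are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical purchase data: album -> set of customers who bought it.
buyers = {
    "album_a": {"ann", "ben", "cat"},
    "album_b": {"ann", "ben"},
    "album_c": {"ben", "cat"},
    "album_d": {"dan", "eve"},
    "album_e": {"dan", "eve", "fay"},
}


def cluster_albums(buyers, min_shared=2):
    """Group albums that are linked by at least `min_shared` common buyers."""
    # Step 1: link every pair of albums with enough buyers in common.
    neighbors = {album: set() for album in buyers}
    for a, b in combinations(buyers, 2):
        if len(buyers[a] & buyers[b]) >= min_shared:
            neighbors[a].add(b)
            neighbors[b].add(a)

    # Step 2: each connected group of linked albums becomes one cluster.
    clusters, seen = [], set()
    for start in buyers:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            album = stack.pop()
            if album in component:
                continue
            component.add(album)
            stack.extend(neighbors[album] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


clusters = cluster_albums(buyers)
print(clusters)
```

No genre labels went in, yet the albums fall into two groups based purely on who bought them. Deciding what each group actually means is where a domain expert still helps.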
(Clustering is a complex problem, and sometimes these tools are used just to organize data, rather than make a prediction. This type of machine learning is called “unsupervised learning”, because there is no measured outcome that is being used as a target for prediction.)