Learning Problem
#statistics
By collecting input data $X$ and output data $Y$ for a problem:
$$ \begin{align*} X &= [x_{1}, x_{2}, \dots, x_{N}] \\ Y &= [y_{1}, y_{2}, \dots, y_{N}] \\ \end{align*} $$We want to find the ideal target function $f$ where $Y = f(X)$, but most of the time we can only find a "good enough" solution $g \approx f$, since real-world problems have no analytic solution.
In machine learning, we have a learning algorithm $A$ and a hypothesis set $H$ that contains candidate formulas, where
$$ \begin{align*} g &= A(X, Y, H) \\ Y &= g(X) \\ \end{align*} $$Note that $X$, $Y$ and $f$ are determined by the problem. What we choose are $H$ and $A$, which together are informally referred to as the learning model.
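As an illustrative sketch (not from the notes), $A$ can be as simple as picking from a small finite $H$ the hypothesis with the fewest mistakes on the data; the function names and the toy target here are made up for the example.

```python
# Illustrative sketch: a learning algorithm A that selects from a tiny
# finite hypothesis set H the hypothesis making the fewest mistakes
# on the training data (X, Y).

def A(X, Y, H):
    """Return the hypothesis g in H that best fits the data."""
    def mistakes(h):
        return sum(1 for x, y in zip(X, Y) if h(x) != y)
    return min(H, key=mistakes)

# Toy problem: the (unknown) target is f(x) = x % 2.
X = [0, 1, 2, 3, 4, 5]
Y = [x % 2 for x in X]

H = [
    lambda x: 0,                 # always predict 0
    lambda x: x % 2,             # parity
    lambda x: 1 if x > 2 else 0, # simple threshold
]

g = A(X, Y, H)
print([g(x) for x in X])  # → [0, 1, 0, 1, 0, 1], matching Y
```

In real settings $H$ is infinite (e.g. all weight vectors of a linear model), so $A$ searches it with optimization rather than enumeration.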
Example: The Perceptron Model
Consider a credit card application problem: we have historical application data that includes $d$-dimensional customer metadata (salary, debt, job, years of residence, etc.) and the final decision (approve or deny).
Data
$$ \begin{align*} \vec{x} &= [x_{1}, x_{2}, \dots, x_{d}] \\ y &\in \{1, -1\} \\ \vec{X} &= [\vec{x}_{1}, \vec{x}_{2}, \dots, \vec{x}_{N}] \\ \vec{Y} &= [y_{1}, y_{2}, \dots, y_{N}] \\ \end{align*} $$
Model
Here we choose a linear hypothesis whose weights represent the different importances of the parameters.
$$ H(\vec{x}) = \operatorname{sign}\left(b + \sum_{i=1}^d w_ix_i\right) $$where $\sum_{i=1}^d w_ix_i$ is the credit score and $b$ is a bias term encoding the threshold: the application is approved when the score exceeds the threshold ($\operatorname{sign} = +1$) and denied otherwise.
To simplify, let $w_0 = b$ and $x_0 = 1$, thus we have
$$ \begin{align*} H(\vec{x}) &= \operatorname{sign}(\vec{w}^T\vec{x}) \\ \vec{w} &= [b, w_1, w_2, \dots, w_d] \\ \vec{x} &= [1, x_1, x_2, \dots, x_d] \\ \end{align*} $$
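The augmented form $\operatorname{sign}(\vec{w}^T\vec{x})$ is a one-liner in code. A minimal sketch (the function name and the numbers are illustrative, not from the notes):

```python
import numpy as np

# Perceptron hypothesis H(x) = sign(w^T x), with x_0 = 1 prepended so
# that w_0 = b absorbs the threshold into the dot product.

def perceptron_hypothesis(w, x):
    """Evaluate sign(w^T x) for one augmented input vector x."""
    return 1 if np.dot(w, x) > 0 else -1

# d = 2 customer features; w = [b, w_1, w_2]
w = np.array([-3.0, 1.0, 0.5])   # b = -3 encodes the threshold
x = np.array([1.0, 4.0, 2.0])    # x_0 = 1 prepended to the raw features

print(perceptron_hypothesis(w, x))  # score 1*4 + 0.5*2 = 5 beats 3 → +1
```

Learning then amounts to searching for a $\vec{w}$ that classifies the historical decisions correctly (e.g. with the perceptron learning algorithm).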