Derivations of Common Machine Learning Algorithms
Linear Regression
Prediction and error:
$$y^{(i)}=\theta^{T}x^{(i)}+\varepsilon^{(i)} \quad (1)$$
Since the error follows a Gaussian distribution:
$$p(\varepsilon^{(i)})=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(\varepsilon^{(i)})^{2}}{2\sigma^{2}}\right) \quad (2)$$
Substituting (1) into (2) gives the conditional density of $y^{(i)}$ given $x^{(i)}$:
$$p(y^{(i)}\mid x^{(i)};\theta)=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(y^{(i)}-\theta^{T}x^{(i)})^{2}}{2\sigma^{2}}\right)$$
Likelihood function:
$$L(\theta)=\prod_{i=1}^{m}p(y^{(i)}\mid x^{(i)};\theta)=\prod_{i=1}^{m}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(y^{(i)}-\theta^{T}x^{(i)})^{2}}{2\sigma^{2}}\right)$$
Interpretation: the likelihood asks which parameter $\theta$, combined with the observed data, makes the observed outcomes most probable; maximizing it picks the $\theta$ under which the data look most like the truth.
Log-likelihood: taking the logarithm converts the product into a sum:
$$\log L(\theta)=\log\prod_{i=1}^{m}p(y^{(i)}\mid x^{(i)};\theta)=\log\prod_{i=1}^{m}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(y^{(i)}-\theta^{T}x^{(i)})^{2}}{2\sigma^{2}}\right)$$
Expanding:
$$\log\prod_{i=1}^{m}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(y^{(i)}-\theta^{T}x^{(i)})^{2}}{2\sigma^{2}}\right)=m\log\frac{1}{\sqrt{2\pi}\,\sigma}-\frac{1}{\sigma^{2}}\cdot\frac{1}{2}\sum_{i=1}^{m}(y^{(i)}-\theta^{T}x^{(i)})^{2}$$
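As a quick sanity check of this expansion, here is a minimal NumPy sketch; the data, $\sigma$, and dimensions are arbitrary illustrative choices, not from the derivation:

```python
import numpy as np

# Verify numerically that the sum of per-sample Gaussian log-densities
# equals the expanded closed form above. All values here are illustrative.
rng = np.random.default_rng(0)
m, d, sigma = 50, 3, 0.5
X = rng.normal(size=(m, d))
theta = rng.normal(size=d)           # an arbitrary candidate parameter
y = X @ theta + rng.normal(scale=sigma, size=m)

resid = y - X @ theta

# Left side: sum over samples of each residual's Gaussian log-density.
lhs = np.sum(np.log(1.0 / (np.sqrt(2.0 * np.pi) * sigma))
             - resid**2 / (2.0 * sigma**2))

# Right side: m*log(1/(sqrt(2*pi)*sigma)) - (1/sigma^2)*(1/2)*sum(resid^2).
rhs = (m * np.log(1.0 / (np.sqrt(2.0 * np.pi) * sigma))
       - (1.0 / sigma**2) * 0.5 * np.sum(resid**2))

assert np.isclose(lhs, rhs)
```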
Maximum likelihood: we want the likelihood to be as large as possible. In the expansion above, the first term does not depend on $\theta$, so maximizing $\log L(\theta)$ amounts to maximizing
$$-\frac{1}{2}\sum_{i=1}^{m}(y^{(i)}-\theta^{T}x^{(i)})^{2}$$
which is equivalent to minimizing the least-squares cost
$$J(\theta)=\frac{1}{2}\sum_{i=1}^{m}(y^{(i)}-\theta^{T}x^{(i)})^{2}$$
Writing the data as a design matrix $X$ (whose $i$-th row is $x^{(i)T}$) and target vector $y$:
$$J(\theta)=\frac{1}{2}\sum_{i=1}^{m}(y^{(i)}-\theta^{T}x^{(i)})^{2}=\frac{1}{2}(X\theta-y)^{T}(X\theta-y)$$
$$\nabla_{\theta}J(\theta)=\nabla_{\theta}\left(\frac{1}{2}(X\theta-y)^{T}(X\theta-y)\right)=\nabla_{\theta}\left(\frac{1}{2}(\theta^{T}X^{T}-y^{T})(X\theta-y)\right)$$
$$=\nabla_{\theta}\left(\frac{1}{2}\left(\theta^{T}X^{T}X\theta-\theta^{T}X^{T}y-y^{T}X\theta+y^{T}y\right)\right)$$
$$=\frac{1}{2}\left(2X^{T}X\theta-X^{T}y-(y^{T}X)^{T}\right)=X^{T}X\theta-X^{T}y$$
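The gradient formula can be checked against finite differences; the sketch below uses arbitrary illustrative data:

```python
import numpy as np

# Finite-difference check of the gradient X^T X theta - X^T y.
# X, y, and theta below are arbitrary illustrative values.
rng = np.random.default_rng(1)
m, d = 30, 3
X = rng.normal(size=(m, d))
y = rng.normal(size=m)
theta = rng.normal(size=d)

def J(t):
    r = X @ t - y
    return 0.5 * r @ r   # J(theta) = 1/2 (X theta - y)^T (X theta - y)

grad_analytic = X.T @ X @ theta - X.T @ y

# Central differences along each coordinate direction.
eps = 1e-6
grad_numeric = np.array([(J(theta + eps * e) - J(theta - eps * e)) / (2 * eps)
                         for e in np.eye(d)])

assert np.allclose(grad_analytic, grad_numeric, atol=1e-5)
```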
Setting the gradient to zero (assuming $X^{T}X$ is invertible) yields the normal-equation solution:
$$\theta=(X^{T}X)^{-1}X^{T}y$$
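Below is a minimal sketch of this closed-form solution; the synthetic data are hypothetical, chosen only to exercise the formula:

```python
import numpy as np

# Closed-form linear regression via the normal equation (illustrative data).
rng = np.random.default_rng(2)
m, d = 100, 4
X = rng.normal(size=(m, d))
theta_true = np.array([2.0, -1.0, 0.5, 3.0])
y = X @ theta_true + rng.normal(scale=0.1, size=m)

# theta = (X^T X)^{-1} X^T y; solving the linear system is preferred over
# forming the explicit inverse, for numerical stability.
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check with least squares, which also handles rank-deficient X.
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(theta_hat, theta_lstsq)
print(theta_hat)   # should be close to theta_true
```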