Machine Learning

Linear Regression

Prediction and error: $y^{(i)} = \theta^{T} x^{(i)} + \varepsilon^{(i)} \quad (1)$

Since the errors are assumed to follow a Gaussian distribution: $p(\varepsilon^{(i)}) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(\varepsilon^{(i)})^{2}}{2\sigma^{2}}\right) \quad (2)$
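As a concrete illustration of this generative model, here is a minimal NumPy sketch; `theta_true`, `sigma`, and the feature distribution are illustrative assumptions, not values from the original text:

```python
import numpy as np

rng = np.random.default_rng(0)

m = 100                                    # number of samples (illustrative)
theta_true = np.array([2.0, -3.0])         # hypothetical "true" parameters
sigma = 0.5                                # noise standard deviation, eq. (2)

# Design matrix: a bias column plus one feature, so theta_true[0] is the intercept.
X = np.column_stack([np.ones(m), rng.uniform(-1.0, 1.0, m)])
eps = rng.normal(0.0, sigma, m)            # Gaussian error term, eq. (2)
y = X @ theta_true + eps                   # prediction plus error, eq. (1)
```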

Substituting (1) into (2), the density of $y^{(i)}$ given $x^{(i)}$ is:

$p(y^{(i)} \mid x^{(i)}; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(i)} - \theta^{T} x^{(i)})^{2}}{2\sigma^{2}}\right)$

Likelihood function: $L(\theta) = \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}; \theta) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(i)} - \theta^{T} x^{(i)})^{2}}{2\sigma^{2}}\right)$

Interpretation: the likelihood measures, for a given choice of parameters, how plausible it is that this combination of parameters and data produced the observed values; maximum likelihood seeks the parameters under which the observed data are most probable.

Log-likelihood: taking the log of both sides turns the product into a sum:
$\log L(\theta) = \log \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}; \theta) = \log \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(i)} - \theta^{T} x^{(i)})^{2}}{2\sigma^{2}}\right)$

Expanding, the log of the product becomes a sum of per-sample logs, each of which splits into a constant and a squared-error term:
$\log \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(i)} - \theta^{T} x^{(i)})^{2}}{2\sigma^{2}}\right) = \sum_{i=1}^{m} \left[\log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{(y^{(i)} - \theta^{T} x^{(i)})^{2}}{2\sigma^{2}}\right] = m \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{\sigma^{2}} \cdot \frac{1}{2} \sum_{i=1}^{m} (y^{(i)} - \theta^{T} x^{(i)})^{2}$
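A quick numerical sanity check of this expansion, as a standalone sketch in which random draws stand in for the residuals $y^{(i)} - \theta^{T} x^{(i)}$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, sigma = 100, 0.5
resid = rng.normal(0.0, sigma, m)   # stand-ins for the residuals y_i - theta^T x_i

# Left side: sum over samples of the per-sample log density.
lhs = np.sum(np.log(1.0 / (np.sqrt(2.0 * np.pi) * sigma)) - resid**2 / (2.0 * sigma**2))
# Right side: m*log(1/(sqrt(2*pi)*sigma)) - (1/sigma^2) * (1/2) * sum(resid^2).
rhs = m * np.log(1.0 / (np.sqrt(2.0 * np.pi) * sigma)) - 0.5 / sigma**2 * np.sum(resid**2)
print(np.isclose(lhs, rhs))         # True
```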

Maximum likelihood: we want the likelihood to be as large as possible. In the expansion above, the first term is a constant that does not depend on $\theta$, so maximizing the log-likelihood reduces to maximizing:
$-\frac{1}{2} \sum_{i=1}^{m} (y^{(i)} - \theta^{T} x^{(i)})^{2}$

Equivalently, it suffices to minimize $J(\theta) = \frac{1}{2} \sum_{i=1}^{m} (y^{(i)} - \theta^{T} x^{(i)})^{2}$, which is exactly the least-squares objective.

In matrix form, with $X$ the design matrix whose $i$-th row is $x^{(i)T}$ and $y$ the vector of targets:
$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} (y^{(i)} - \theta^{T} x^{(i)})^{2} = \frac{1}{2} (X\theta - y)^{T} (X\theta - y)$
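A short sketch verifying that the vectorized form matches the per-sample sum (all arrays here are arbitrary illustrative data):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 50, 3
X = rng.normal(size=(m, n))       # arbitrary design matrix
theta = rng.normal(size=n)        # arbitrary parameter vector
y = rng.normal(size=m)            # arbitrary targets

J_sum = 0.5 * np.sum((y - X @ theta) ** 2)   # 1/2 * sum_i (y_i - theta^T x_i)^2
r = X @ theta - y
J_mat = 0.5 * (r @ r)                        # 1/2 * (X theta - y)^T (X theta - y)
print(np.isclose(J_sum, J_mat))              # True
```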

$\nabla_{\theta} J(\theta) = \nabla_{\theta} \left(\frac{1}{2} (X\theta - y)^{T} (X\theta - y)\right) = \nabla_{\theta} \left(\frac{1}{2} (\theta^{T} X^{T} - y^{T})(X\theta - y)\right)$

$= \nabla_{\theta} \left(\frac{1}{2} (\theta^{T} X^{T} X \theta - \theta^{T} X^{T} y - y^{T} X \theta + y^{T} y)\right)$

$= \frac{1}{2} (2 X^{T} X \theta - X^{T} y - (y^{T} X)^{T}) = X^{T} X \theta - X^{T} y$
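The closed-form gradient can be checked against a finite-difference approximation; a minimal sketch with arbitrary illustrative data:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 40, 3
X, y = rng.normal(size=(m, n)), rng.normal(size=m)
theta = rng.normal(size=n)

grad = X.T @ X @ theta - X.T @ y             # closed-form gradient derived above

# Central finite difference on the first coordinate of theta.
J = lambda t: 0.5 * np.sum((X @ t - y) ** 2)
h = 1e-6
e0 = np.zeros(n); e0[0] = 1.0
fd = (J(theta + h * e0) - J(theta - h * e0)) / (2.0 * h)
print(np.isclose(grad[0], fd))               # True
```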

Setting the gradient to zero gives the normal equation, whose solution is $\theta = (X^{T} X)^{-1} X^{T} y$ (assuming $X^{T} X$ is invertible).
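A minimal sketch of solving the normal equation with NumPy, reusing the synthetic-data setup assumed earlier:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 100
theta_true = np.array([2.0, -3.0])            # hypothetical "true" parameters
X = np.column_stack([np.ones(m), rng.uniform(-1.0, 1.0, m)])
y = X @ theta_true + rng.normal(0.0, 0.5, m)

# Normal equation: solve (X^T X) theta = X^T y.
# Solving the linear system is numerically safer than forming the explicit inverse.
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(theta_hat)                              # close to theta_true for large m
```

In practice, `np.linalg.lstsq` (SVD-based) or a QR factorization is generally preferred, since it stays stable when $X^{T} X$ is ill-conditioned or singular, where the explicit closed form can fail.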

