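Hi, thank you very much for writing alphaFM; it is fast and easy to use. However, on my data its results are far worse than an LR model (compared using alpha LR). May I ask, in your own tests of alphaFM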

How well does alphaFM perform? about alphafm HOT 19 OPEN

castellanzhang commented on August 20, 2024

How well does alphaFM perform?

from alphafm.

Comments (19)

CastellanZhang commented on August 20, 2024 1

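@420742882, one more thing about the input order of the samples: it is best to feed them in timestamp order, not in an order such as grouped by user or with all positive samples placed together. Because FTRL is an online learning framework, it is sensitive to sample order; if you have no timestamps, it is best to shuffle the data.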
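
The ordering advice above can be sketched as follows. This is only an illustration: the helper name `order_samples`, the `(timestamp, line)` pairing, and the sample lines are assumptions made for the example, not part of alphaFM.

```python
import random


def order_samples(samples, has_timestamps, seed=0):
    """Order (timestamp, line) pairs for one online-learning pass.

    Hypothetical helper: sorts by timestamp when timestamps are
    available, otherwise shuffles with a fixed seed.
    """
    samples = list(samples)
    if has_timestamps:
        # sorted() is stable, so equal timestamps keep their input order.
        return sorted(samples, key=lambda pair: pair[0])
    random.Random(seed).shuffle(samples)
    return samples


# Made-up samples with all positives grouped first: the ordering to avoid.
grouped = [(3, "1 f1:1"), (4, "1 f2:1"), (1, "0 f1:1"), (2, "0 f3:1")]

by_time = order_samples(grouped, has_timestamps=True)
shuffled = order_samples(grouped, has_timestamps=False)
```

This version holds everything in memory, so it only suits data that fits there; larger data sets would need an external sort or shuffle.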

CasyWang commented on August 20, 2024

Any conclusion here?

CastellanZhang commented on August 20, 2024

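@420742882, machine learning depends on many stages: data selection, feature construction, model selection, parameter tuning, evaluation metrics, and so on. Discussing in general terms which of two algorithms is better is meaningless, and there is also the "no free lunch theorem". Otherwise everyone could just use the "best" algorithm on every problem and be done with it. In my experience, on most of our advertising CTR data, with the same high-dimensional features as input and well-tuned parameters, FM improves AUC over LR at the thousandths level. If the feature engineering or the parameter tuning is done poorly, it is quite normal for even a DNN to lose to LR, let alone FM.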

CasyWang commented on August 20, 2024

Is the comparison against an LR without cross features? If not, how can the comparison be made fair? After all, how good LR is depends on hand-crafted feature crosses. Thanks for your kind reply.

CastellanZhang commented on August 20, 2024

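@CasyWang, exactly the same high-dimensional features (both cross features and non-cross features), i.e., exactly the same samples as input; only the model differs.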

420742882 commented on August 20, 2024

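@CastellanZhang, sorry, I did not explain myself clearly. What I meant was comparing alpha LR with our own LR, using exactly the same samples and features. alpha LR performs much worse than our LR. As for FM improving over LR at the thousandths level, I agree with that.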

CastellanZhang commented on August 20, 2024

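@420742882, I do not know how your own LR is implemented. If it is also online-learning FTRL, the results should be exactly the same. If it is a traditional batch optimizer such as OWL-QN, then data volume matters. Our advertising business deals with large-scale data, i.e., the samples are plentiful relative to the feature dimensionality. I have done rigorous comparisons, and a single pass over the samples is enough to match the batch LR algorithm, or even beat it. If the data set is not large, multiple passes may be needed. Someone asked me a similar question before; see: CastellanZhang/alphaFM_softmax#2
As I said, each problem has to be analyzed on its own terms.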
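
For readers unfamiliar with what "online learning FTRL" means for LR, here is a minimal sketch of the per-coordinate FTRL-Proximal update for logistic regression, as it is commonly described. It is not alphaFM's code: the class name, the default hyperparameters, the absence of a bias term, and the 0/1 label convention are all assumptions made for illustration.

```python
import math


class FTRLLogisticRegression:
    """Illustrative FTRL-Proximal LR over sparse {feature: value} dicts."""

    def __init__(self, alpha=0.05, beta=1.0, l1=0.1, l2=0.1):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = {}  # per-feature accumulated adjusted gradients
        self.n = {}  # per-feature accumulated squared gradients

    def _weight(self, i):
        # Weights are derived lazily from z and n; L1 zeroes small ones.
        z, n = self.z.get(i, 0.0), self.n.get(i, 0.0)
        if abs(z) <= self.l1:
            return 0.0
        sign = 1.0 if z > 0 else -1.0
        return -(z - sign * self.l1) / (
            (self.beta + math.sqrt(n)) / self.alpha + self.l2
        )

    def predict(self, x):
        s = sum(self._weight(i) * v for i, v in x.items())
        s = max(min(s, 35.0), -35.0)  # clamp to keep exp() in range
        return 1.0 / (1.0 + math.exp(-s))

    def update(self, x, y):
        """One online step on a single sample; y is 0 or 1."""
        p = self.predict(x)
        for i, v in x.items():
            g = (p - y) * v  # gradient of logloss w.r.t. weight i
            n_old = self.n.get(i, 0.0)
            sigma = (math.sqrt(n_old + g * g) - math.sqrt(n_old)) / self.alpha
            self.z[i] = self.z.get(i, 0.0) + g - sigma * self._weight(i)
            self.n[i] = n_old + g * g
        return p


# Toy stream, already interleaved; each sample is seen once per pass.
model = FTRLLogisticRegression()
stream = [({"a": 1.0}, 1), ({"b": 1.0}, 0)] * 50
for x, y in stream:
    model.update(x, y)
p_a = model.predict({"a": 1.0})
p_b = model.predict({"b": 1.0})
```

Each sample updates the model once and is then discarded, which is why the order in which samples arrive affects the result.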

420742882 commented on August 20, 2024

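The data volume used in the experiment (hundreds of millions of samples) and the feature dimensionality are both large enough. I will look into it further. Thanks!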

420742882 commented on August 20, 2024

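@CastellanZhang, thanks a lot! After reshuffling the samples, AUC is now very close to the baseline LR: the gap was 10 percentage points before and is 1 percentage point now. Would sorting the samples by timestamp give even better results?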

CastellanZhang commented on August 20, 2024

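@420742882, you're welcome. I cannot say for certain whether sorting by timestamp will be better; you will have to keep experimenting. All I can say is that doing so will certainly not be much worse. Beyond that, tune the parameters carefully or increase the sample size, and it should be able to match LR.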

420742882 commented on August 20, 2024

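@CastellanZhang After tuning the w_alpha parameter, AUC and logloss both beat the baseline LR. But in the online A/B test, performance dropped sharply. What could the cause be? Is the predicted CTR inaccurate? Our samples were subsampled, which biases the predicted CTR upward. Comparing the predictions of FTRL and LR, FTRL predicts an even higher CTR.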
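
On the point that sampling inflates the predicted CTR: one commonly used correction, assuming only negatives were downsampled and the keep rate is known, rescales the predicted odds by that rate. The sketch below is a generic illustration, not something the thread confirms was tried; the function name `recalibrate` and the parameter `negative_keep_rate` are made up for the example.

```python
def recalibrate(p, negative_keep_rate):
    """Map a probability predicted on negatively downsampled data
    back to the original data distribution.

    negative_keep_rate: fraction of negatives kept in training,
    in (0, 1]. Assumes positives were not sampled.
    """
    if not 0.0 < negative_keep_rate <= 1.0:
        raise ValueError("negative_keep_rate must be in (0, 1]")
    # Downsampling negatives multiplies the odds by 1 / keep_rate,
    # so the correction multiplies the odds by keep_rate.
    return p / (p + (1.0 - p) / negative_keep_rate)


# With 10% of negatives kept, a raw prediction of 0.5 maps to about 0.09.
corrected = recalibrate(0.5, 0.1)
```

The mapping is monotonic, so it changes the predicted values but not how samples are ranked against each other.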

CastellanZhang commented on August 20, 2024

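@420742882, AUC and logloss both being better than the baseline shows that the training tool itself is fine. As for online performance, I know nothing about your specific business and cannot judge. Besides, many factors are involved online, and most likely the training tool is not the cause; you will need to debug the whole pipeline.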
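
Since the discussion rests on the two offline metrics, a small pure-Python sketch of both may help. The function names and the toy data are illustrative. The example also shows a general property worth keeping in mind here: AUC depends only on the ranking of the predictions, so scaling every prediction leaves it unchanged, while logloss does change.

```python
import math


def log_loss(labels, probs, eps=1e-15):
    """Mean negative log-likelihood for 0/1 labels."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def auc(labels, probs):
    """Rank-based AUC; tied predictions share their average rank."""
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    ranks = [0.0] * len(probs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and probs[order[j + 1]] == probs[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]
halved = [p / 2 for p in probs]  # same ranking, different scale

same_auc = auc(labels, probs) == auc(labels, halved)
same_logloss = log_loss(labels, probs) == log_loss(labels, halved)
```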

420742882 commented on August 20, 2024

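@CastellanZhang Hi. Our business is CTR prediction. In offline tests AUC improves by two thousandths and logloss is also better than the baseline, but online performance is still worse than the baseline. Could you suggest some directions for troubleshooting? The new model and the base model both use the same online code, so a bug there is unlikely.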

shenleiz commented on August 20, 2024

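Hello, while using alphaFM I found that online performance was fine at first, but after a while it declined and was no longer as good as when the model first went live. I am not sure whether this is a problem with FTRL or something else. Have you run into anything similar?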

CastellanZhang commented on August 20, 2024

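@shenleiz, many of our FTRL models have been running for more than two years without problems. As far as I know FTRL is a very mature algorithm that many companies have used widely since 2013, so there is no need to worry about whether the algorithm itself is correct; think carefully about which stage went wrong. In particular, if it worked before and has recently stopped working, what changed between then and now? If you start everything over from scratch, does it work again as it did at first? You will need to experiment and analyze this yourself.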

CasyWang commented on August 20, 2024

After a model has been running for a while, its dimensionality keeps growing and some degree of overfitting appears, which causes performance to drop. At that point, retraining is needed.

lcshr123 commented on August 20, 2024

"After a model has been running for a while, its dimensionality keeps growing and some degree of overfitting appears, which causes performance to drop. At that point, retraining is needed."

Is this a common problem with FTRL? I have run into the same situation. I would have expected online learning to be able to keep training indefinitely.

aromazyl commented on August 20, 2024

FTRL itself is designed for optimizing convex problems, and FM is non-convex.
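
For context on the non-convexity remark, the standard second-order FM model can be written as below. The symbols are the usual textbook ones and are not taken from the alphaFM code: n is the number of features and k is the dimension of the latent vectors.

```latex
\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i
  + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j,
\qquad
\langle \mathbf{v}_i, \mathbf{v}_j \rangle = \sum_{f=1}^{k} v_{i,f} \, v_{j,f}
```

The first two terms are the same linear form LR uses, so the loss is convex in w_0 and the w_i. The pairwise term multiplies entries of two different latent vectors, and that product is what makes the loss non-convex when all the v_i are optimized jointly.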

dotsonliu commented on August 20, 2024

Which performs better, xlearn or alphafm? In offline AUC, xlearn was 3 points higher.
