huangjing

很多人认为explore机制在冷启动中会非常有效，但是工业界很少看到explore用到广告系统的介绍和传闻。当然也许是因为场景限制，即一个广告系统只有一次冷启动，所以没有太大的工业应用价值。但是，我最近参与的广告系统正好会遇到非常多的冷启动问题，即我们的广告系统会不断的有新的广告类型和形式出现，因此有很多冷启动场景，按照道理explore在初期会有比较好的效果，但是在我们最近对广告系统的实验中，发现想象很美好，现实很骨感。同时，我们也很惊讶的发现，LR比你想象的更强大，更稳健，果然是simple but not simpler。

根据我们的实验结果，总结了explore机制的几个鸡肋原因：

通常情况下，我们认为ee有两个地方可能会出效果

1.冷启动：系统刚开始启动的时候，LR数据不够。

在我们的广告系统中，这个假设并不对。

（1）广告库的主体变化不大（与新闻推荐，物料每天都变化相比），探测空间没那么大

（2）即使初期数据不够，LR有一定的overfitting也没关系，因为在广告系统中，总展现一个广告没问题，并且是好的（持续带来好的pay），其实在LR中加入自解释特征其实给这些好的广告是有策略倾斜的，在广告系统，这个策略倾斜是好，对于新闻推荐是不好的，因为没有人愿意看老新闻，但是好的老新闻在LR系统中会给比较好的点击率，此时新的新闻出现的机会会少（尽管比历史上证明差的新闻出现概率高，但仍然不够）

（3）实际数据也标明，即使一天的数据，也能训练一个auc还可以的LR，这时的LR具有这样的性质：展现充分的广告预估ctr基本为后验平均ctr（通过自解释特征），展现不充分的广告通过泛化的特征有一个天然的explore（即肯定比线上已知表现差的广告展现概率高，比线上已知表现好的广告展现概率低），和ee相比，它只是不具备展现次数越少，被explore出来的概率越大的这种性质罢了。

其次，ee算法存在本身的局限性：

不管e&e用的什么算法，核心都是有个均值，然后有个上下届估计（或者分布估计），E&E认为这个均值是确定的，不会随着其他环境变化而变化。而对于广告系统，即使我们选取query-广告这么细的粒度，都会发现不是定值，还会受到地域，位次等众多因素的影响，如果我们把这些环境都考虑进去，会使得单个item的粒度太小，数据积累慢，收敛慢，而如果我们不考虑这些因素，item的粒度太大，受环境因素影响，p不再是定值，后面的算法也就不会有好的效果。如果考虑LINUCB对环境的因素做线性建模，但是这样和LR就没太大差异了。

2.线上已经有LR了，但是LR可能会因为bias，有些新广告永远出不来。

（1）这种钻空子的行为在线上比较难收到预期的效果，从原理上说，除非有大量好的广告在泛化特征上打分很低，导致不能展现出来。如果不做任何分析，直接上explore，有点撞大运的感觉了（很难保证能explore出好的）。但是如果仔细分析，并发现这些case，似乎从中找到规律加入合适的泛化特征效率更高。

（2）LR对样本的学习效率远高于E&E，因为E&E是对单样本做的学习，样本与样本之间没有关联，而LR是对样本的特征做的学习，一个样本会同时带来多个特征的更新

在我们实际的实验中，2已经被充分证明，撞大运很难，而1从前期分析来看，希望不是很大，我们似乎只能寄希望E&E+LR在探测样本上会被LR效率更高，从而收敛比单纯LR快了。实验马上会进行，后续会再补充实验的效果。

PPT(请翻墙！！)

R代码

library(ggplot2)
#######################################
# multi-armed bandit problem
# try number
#######################################


N = 10000
# ad info
ctr   <- c(0.3, 0.1, 0.2, 0.5)

#########################
# experiment for e-greedy
#########################

result1=eps_greedy(N, ctr, 0.01)
result2=eps_greedy(N, ctr, 0.1)
result3=eps_greedy(N, ctr, 0.2)

best_index<-data.frame("algo1"=result1$best_ratio, 
                       "algo2"=result2$best_ratio, 
                       "algo3"=result3$best_ratio)

lost_com<-data.frame("algo1"=result1$regret, 
                     "algo2"=result2$regret, 
                     "algo3"=result3$regret)

ggplot(data=lost_com, aes(x=c(1:N))) + 
  geom_point(aes(y=algo1), colour="red") +
  geom_point(aes(y=algo2), colour="blue") + 
  geom_point(aes(y=algo3), colour="black")

ggplot(data=best_index, aes(x=c(1:N))) + 
  geom_line(aes(y=algo1), colour="red") +
  geom_line(aes(y=algo2), colour="blue") + 
  geom_line(aes(y=algo3), colour="black")


#########################
# experiment for UCB
#########################
result=UCB(N, ctr)
best_index<-data.frame("algo"=result$best_ratio)
ggplot(data=best_index, aes(x=c(1:N))) + 
  geom_point(aes(y=algo))


lost_com<-data.frame("algo"=result1$regret) 
lost_com$com <- log(c(1:N))
ggplot(data=lost_com, aes(x=c(1:N))) + 
  geom_line(aes(y=algo)) +geom_line(aes(y=38*com))

clk_shw = data.frame(result$clk_shw)
ggplot(data=clk_shw, aes(x=c(1:N))) + 
  geom_line(aes(y=X5), colour="red") +
  geom_line(aes(y=X6), colour="blue") + 
  geom_line(aes(y=X7), colour="black") + 
  geom_line(aes(y=X8), colour="green")  

interval_plot(as.vector(clk_shw[100,], mode = "numeric"))
interval_plot(as.vector(clk_shw[5000,], mode = "numeric"))
interval_plot(as.vector(clk_shw[7500,], mode = "numeric"))
interval_plot(as.vector(clk_shw[10000,], mode = "numeric"))


interval_plot<-function(s)
{
  cnt = length(s)/2
  data = data.frame("x"=rep(0, cnt*3), "y"=rep(0, cnt*3))
  for (i in 1:cnt)
  {
    clk = s[i]
    shw = s[i+cnt]
    data[3*i-2,1] = i
    data[3*i-1,1] = i
    data[3*i,1] = i
    data[3*i-2,2] = clk/shw
    data[3*i-1,2] = clk/shw + sqrt(2 * log(sum(shw))/shw)
    data[3*i,2] = clk/shw - sqrt(2 * log(sum(shw))/shw)
  }
  ggplot(data = data, aes(x=x, y=y)) + geom_point() + ylim(-0.5,1)
}


#########################
# experiment for thompson sampling
#########################

result1=thompson(N, ctr, alpha=1, beta=10)
result2=thompson(N, ctr, alpha=10, beta=100)
result3=thompson(N, ctr, alpha=100, beta=10000)

result1=thompson(N, ctr, alpha=1, beta=10)
result2=thompson(N, ctr, alpha=1, beta=10)
result3=thompson(N, ctr, alpha=1, beta=10)


best_index<-data.frame("algo1"=result1$best_ratio, 
                       "algo2"=result2$best_ratio, 
                       "algo3"=result3$best_ratio)

lost_com<-data.frame("algo1"=result1$regret, 
                     "algo2"=result2$regret, 
                     "algo3"=result3$regret)

ggplot(data=lost_com, aes(x=c(1:N))) + 
  geom_point(aes(y=algo1), colour="red") +
  geom_point(aes(y=algo2), colour="blue") + 
  geom_point(aes(y=algo3), colour="black")

ggplot(data=best_index, aes(x=c(1:N))) + 
  geom_line(aes(y=algo1), colour="red") +
  geom_line(aes(y=algo2), colour="blue") + 
  geom_line(aes(y=algo3), colour="black")

#########################
# experiment for the compare of 3 methods
#########################
result1=eps_greedy(N, ctr, 0.1)
result2=UCB(N, ctr)
result3=thompson(N, ctr)

best_index<-data.frame("algo1"=result1$best_ratio, 
                       "algo2"=result2$best_ratio, 
                       "algo3"=result3$best_ratio)

lost_com<-data.frame("algo1"=result1$regret, 
                     "algo2"=result2$regret, 
                     "algo3"=result3$regret)

ggplot(data=lost_com, aes(x=c(1:N))) + 
  geom_point(aes(y=algo1), colour="red") +
  geom_point(aes(y=algo2), colour="blue") + 
  geom_point(aes(y=algo3), colour="black")

ggplot(data=best_index, aes(x=c(1:N))) + 
  geom_line(aes(y=algo1), colour="red") +
  geom_line(aes(y=algo2), colour="blue") + 
  geom_line(aes(y=algo3), colour="black")

###############################################
# get best_chose_ratio according to best_index
###############################################
get_best_ratio <- function(ctr, best_index)
{
  best = which(ctr == max(ctr))[1]
  N = length(best_index)
  best_index_ratio = rep(0, N)
  s=0
  for (i in 1:N)
  {
    if (best_index[i] == best)
    {
      s = s + 1
    }
    best_index_ratio[i] = s/i
  }
  best_index_ratio
}


########################################################
# Algo1:
# e-greedy
########################################################
eps_greedy<-function(N, ctr, e=0.1){
  index <- c(1:length(ctr))
  best = 1
  lost = rep(0, N)
  clk <- rep(0, length(ctr))
  shw <- rep(1, length(ctr))
  best_index <- rep(0, N)
  for(i in 1:N){
    prob = runif(1)
    # explore
    if (prob < e)
    {
      try_index = sample(index[-best],1)
    }
    else
    {
      try_index = best 
    }
    best_index[i] = try_index
    shw[try_index] = shw[try_index] + 1
    x = runif(1)

    if (x <= ctr[try_index])
    {
      clk[try_index] = clk[try_index] + 1
    } 
    lost[i] = max(ctr) * i - sum(clk) 
    # renew best index   
    best = which(clk/shw == max(clk/shw))[1]
  }
  best_ratio = get_best_ratio(ctr, best_index)
  list("best_ratio"=best_ratio, "regret"=lost)
}

########################################################
# Algo2:
# ucb
########################################################
UCB<-function(N, ctr){
  index <- c(1:length(ctr))
  best = 1
  lost = rep(0, N)
  clk <- rep(0, length(ctr))
  shw <- rep(1, length(ctr))
  best_index <- rep(0, N)
  clk_shw <-matrix(data=NA, nrow = N, ncol = length(ctr) * 2)
  for(i in 1:N)
  {
    prior = clk/shw + sqrt(2 * log(sum(shw))/shw)
    clk_shw[i, 1:4] = clk
    clk_shw[i, 5:8] = shw
    best = which(prior == max(prior))[1]
    best_index[i] = best
    shw[best] = shw[best]+1
    x = runif(1)
    if (x <= ctr[best])
    {
      clk[best] = clk[best] + 1
    }
    lost[i] = max(ctr) * i - sum(clk) 
  }
  best_ratio = get_best_ratio(ctr, best_index)
  list("best_ratio"=best_ratio, "regret"=lost, "clk_shw"=clk_shw)
}

########################################################
# Algo3:
# Thompson Sampling
########################################################

thompson<-function(N, ctr, alpha = 1, beta = 1){
  index <- c(1:length(ctr))
  best = 1
  lost = rep(0, N)
  clk <- rep(0, length(ctr))
  shw <- rep(1, length(ctr))
  best_index <- rep(0, N)
  for(i in 1:N)
  {
    theta = rep(0, length(index))
    for(j in index)
    {
      theta[j] = rbeta(1, clk[j] + alpha, shw[j] - clk[j] + beta)
    }
    best = which(theta == max(theta))[1]
    best_index[i] = best
    shw[best] = shw[best]+1
    x = runif(1)
    if (x <= ctr[best])
    {
      clk[best] = clk[best] + 1
    }
    lost[i] = max(ctr) * i - sum(clk) 
  }
  best_ratio = get_best_ratio(ctr, best_index)
  list("best_ratio"=best_ratio, "regret"=lost)
}

EE-想说爱你不容易

Categories

Archives