About the evaluation test sets (uniem issue, CLOSED)

wangyuxinwhy commented on September 3, 2024
About the evaluation test sets

from uniem.

Comments (6)

hjq133 commented on September 3, 2024

I see that M3E also added the BQ, LCQMC, and PAWSX test sets to its training data. Comparisons on those datasets may not be fair.

Nipi64310 commented on September 3, 2024

Hello @hjq133,
In actual testing, though, m3e-base performs fairly poorly on these test sets.
The test script is below, adapted from https://github.com/bojone/BERT-whitening/tree/main/chn

import numpy as np
import scipy.stats
from datasets import load_dataset
from sentence_transformers import SentenceTransformer

# Config name: ATEC, BQ, LCQMC, PAWSX, or STS-B
dataset = load_dataset("shibing624/nli_zh", "BQ")
print(dataset)
print(dataset['test'][:2])

model_path = 'moka-ai/m3e-base'
model = SentenceTransformer(model_path, device='cuda')


def convert_to_vecs(data):
    """Encode both sentence columns; return (a_vecs, b_vecs, labels)."""
    a_vecs = model.encode(data['sentence1'], batch_size=32)
    b_vecs = model.encode(data['sentence2'], batch_size=32)
    return a_vecs, b_vecs, np.array(data['label'])


def transform_and_normalize(vecs, kernel=None, bias=None):
    """Apply the optional (kernel, bias) transform, then L2-normalize each row."""
    if not (kernel is None or bias is None):
        vecs = (vecs + bias).dot(kernel)
    norms = (vecs**2).sum(axis=1, keepdims=True)**0.5
    return vecs / np.clip(norms, 1e-8, np.inf)


def compute_corrcoef(x, y):
    """Spearman rank correlation coefficient."""
    return scipy.stats.spearmanr(x, y).correlation


a_vecs, b_vecs, labels = convert_to_vecs(dataset['test'])

# Normalize (no transform is passed here), take the row-wise dot product
# as cosine similarity, then correlate the similarities with the labels.
all_corrcoefs = []
a_vecs = transform_and_normalize(a_vecs)
b_vecs = transform_and_normalize(b_vecs)
sims = (a_vecs * b_vecs).sum(axis=1)
corrcoef = compute_corrcoef(labels, sims)
all_corrcoefs.append(corrcoef)
print(all_corrcoefs)

# [0.6381030399066687]

[image]

wangyuxinwhy commented on September 3, 2024

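From what I can tell, this is text2vec's earlier evaluation result.

As for why I did not use text2vec's evaluation sets, there were two main considerations:

  1. The comparison would be unfair. text2vec used both the positive and the negative examples from these datasets, while M3E used only the positive examples and none of the negatives. Even so, depending on how the data was split, label leakage may have occurred.
  2. In real use, users' data is almost always out-of-domain. For both fairness and practicality, I chose datasets that neither text2vec nor m3e had seen, and tried to make sure those datasets differ in domain and task.

For example, take the SOHU datasets in the image above: text2vec was trained on them, whereas M3E has never seen them, so that comparison is clearly problematic. That is why the text2vec author removed that row in the new README.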
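
The label-leakage concern in point 1 can be checked directly by measuring how many test pairs also occur in the training data. The following is a minimal sketch, not the procedure used for M3E or text2vec: the helper names `normalize_pair` and `leaked_fraction` are hypothetical, the toy sentences are made up, and exact string matching after trimming is an assumption (near-duplicates would be missed).

```python
def normalize_pair(sentence1, sentence2):
    """Return an order-insensitive, whitespace-trimmed key for a sentence pair."""
    a, b = sentence1.strip(), sentence2.strip()
    return (a, b) if a <= b else (b, a)


def leaked_fraction(train_pairs, test_pairs):
    """Fraction of test pairs that also occur (in either order) in the training pairs."""
    if not test_pairs:
        return 0.0
    seen = {normalize_pair(a, b) for a, b in train_pairs}
    hits = sum(normalize_pair(a, b) in seen for a, b in test_pairs)
    return hits / len(test_pairs)


# Toy data, made up for illustration only (not taken from BQ, LCQMC, or PAWSX).
train_pairs = [
    ("how do I reset my password", "password reset steps"),
    ("what is the weather today", "how is the weather today"),
]
test_pairs = [
    ("password reset steps", "how do I reset my password"),  # same pair, swapped order
    ("how do I repay the loan", "loan repayment steps"),
]

print(leaked_fraction(train_pairs, test_pairs))  # 0.5
```

A non-zero fraction on a real train/test split would suggest that scores on that test set are inflated for whichever model saw the overlapping pairs.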

hjq133 commented on September 3, 2024

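That comparison isn't quite right. Those numbers come from fine-tuning the pretrained model separately on each dataset, so you would need to train M3E on each dataset individually before testing.
See shibing624/text2vec#51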

wangyuxinwhy commented on September 3, 2024

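Right, you should look at the image in my reply above. Also, if you want to know which model suits you better, testing directly on your own scenario is the most reliable approach.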
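
Testing on your own scenario could look roughly like the sketch below: score each candidate encoder by the Spearman correlation between its cosine similarities and your own labels, the same metric the script earlier in this thread computes. Everything here is an assumption for illustration: `evaluate`, `toy_encode`, and the sample pairs are hypothetical, and the pure-Python `spearman` stands in for `scipy.stats.spearmanr` so the block runs without dependencies.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm > 0 else 0.0


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def evaluate(encode, pairs, labels):
    """Spearman correlation between labels and cosine similarity of encoded pairs."""
    sims = [cosine(encode(a), encode(b)) for a, b in pairs]
    return spearman(labels, sims)


def toy_encode(text, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Hypothetical stand-in encoder (letter counts); swap in a real model."""
    lowered = text.lower()
    return [lowered.count(ch) for ch in alphabet]


# Made-up in-domain pairs with binary similarity labels.
pairs = [
    ("reset my password", "password reset"),
    ("reset my password", "order pizza"),
    ("track my order", "where is my order"),
]
labels = [1, 0, 1]

score = evaluate(toy_encode, pairs, labels)
print(score)
```

To compare real models, `encode` can wrap each candidate, for example `lambda s: model.encode(s)` with a `SentenceTransformer` instance, and the model with the higher score on your own labeled pairs is the better fit for that scenario.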

Nipi64310 commented on September 3, 2024

Thanks for the reply.
