VSCode
Connecting to a remote server
-
First check whether an SSH key already exists locally (~/.ssh/id_rsa.pub). If not, create one:
ssh-keygen -t rsa -b 4096
-
Add your local public key to the remote host:
ssh-copy-id <username>@<server-ip>
-
Open the Remote-SSH configuration file (typically ~/.ssh/config) and add the following:
Host <alias>
    User <username>
    HostName <server-ip>
Internet access from the server (via proxy)
export http_proxy=10.61.3.141:8888
export https_proxy=10.61.3.141:8888
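Since the later steps run Python, it can help to confirm that the exported proxy variables are visible to the Python process. A minimal sketch using only the standard library; the helper name visible_proxies is made up for illustration:

```python
import urllib.request


def visible_proxies():
    """Return the http/https proxies that Python picks up from the environment."""
    proxies = urllib.request.getproxies()
    return {scheme: proxies.get(scheme) for scheme in ("http", "https")}


# Report what the current shell session exposes to Python.
for scheme, proxy in visible_proxies().items():
    print(f"{scheme}: {proxy or 'not set'}")
```

If either line reports "not set", the export lines were probably run in a different shell session.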
-
Model training
python -m colbert.train --mask-punctuation
Input: --triples './data_marco/triples.train.small.tsv'
Output: model checkpoint
-
Rerank topK
python -m colbert.test --checkpoint /data/users/caiyinqiong/colbert/ColBERT_2/ColBERT/experiments/dirty/train.py/2020-12-02_20.02.43/checkpoints/colbert.dnn
Input: model checkpoint
load_qrels (note: leave queries and collection unset)
topk
Output: ranking.tsv, ranking.metrics
-
Encode the collection and build the index
python -m colbert.index --checkpoint /data/users/caiyinqiong/colbert/ColBERT_2/ColBERT/experiments/dirty/train.py/2020-12-02_20.02.43/checkpoints/colbert.dnn
Input: model checkpoint
collection './data_marco/collection.tsv'
index_root
index_name
Output: index
-
Build the faiss index from the index
python -m colbert.index_faiss
Input: index_root
index_name
Output: the index built by faiss
-
First-stage retrieval
python -m colbert.retrieve --checkpoint /data/users/caiyinqiong/colbert/ColBERT_2/ColBERT/experiments/dirty/train.py/2020-12-02_20.02.43/checkpoints/colbert.dnn
Input: model checkpoint
queries: './data_marco/queries.dev.small.tsv'
collection: './data_marco/collection.small.tsv' (unused)
qrels: './data_marco/qrels.dev.small.tsv' (unused)
index_root
index_name
Output: ranking.tsv
-
Rerank topK
python -m colbert.rerank --checkpoint /data/users/caiyinqiong/colbert/ColBERT_2/ColBERT/experiments/dirty/train.py/2020-12-02_20.02.43/checkpoints/colbert.dnn
Input: model checkpoint
queries: './data_marco/queries.dev.small.tsv'
collection: './data_marco/collection.small.tsv' (unused)
qrels: './data_marco/qrels.dev.small.tsv' (unused)
topk
index_root (unused)
index_name (unused)
Output: ranking.tsv
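The index, faiss index, retrieval, and rerank steps depend on each other's outputs, so it can be useful to see them assembled in order. The sketch below only builds and prints the command lines; it does not run them. Several things here are assumptions rather than facts from these notes: the flag names other than --checkpoint are assumed to match the input names listed above (--collection, --queries, --topk, --index_root, --index_name), the topk input of the rerank step is assumed to be the ranking.tsv written by retrieval, and all placeholder paths are hypothetical.

```python
import shlex


def build_pipeline(checkpoint, index_root, index_name, ranking_path):
    """Return the stage commands, in execution order, as argv lists."""
    py = ["python", "-m"]
    index_args = ["--index_root", index_root, "--index_name", index_name]
    return [
        # 1. Encode the collection and write the index.
        py + ["colbert.index", "--checkpoint", checkpoint,
              "--collection", "./data_marco/collection.tsv"] + index_args,
        # 2. Build the faiss index on top of that index.
        py + ["colbert.index_faiss"] + index_args,
        # 3. First-stage retrieval over the faiss index.
        py + ["colbert.retrieve", "--checkpoint", checkpoint,
              "--queries", "./data_marco/queries.dev.small.tsv"] + index_args,
        # 4. Rerank the retrieved candidates (assumed to be retrieval's ranking.tsv).
        py + ["colbert.rerank", "--checkpoint", checkpoint,
              "--queries", "./data_marco/queries.dev.small.tsv",
              "--topk", ranking_path],
    ]


# Hypothetical placeholder values; substitute real paths before use.
commands = build_pipeline(
    checkpoint="/path/to/checkpoints/colbert.dnn",
    index_root="/path/to/indexes",
    index_name="my_index",
    ranking_path="/path/to/ranking.tsv",
)
for argv in commands:
    print(shlex.join(argv))
```

Keeping each command as an argv list means it can later be passed to subprocess.run unchanged, without shell quoting issues.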
-
Data file formats
triples.train.small.tsv: query \t pos_passage \t neg_passage
collection.tsv: id \t passage
queries.dev.small.tsv: id \t query
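To make the three layouts concrete, here is a minimal sketch that reads them with plain tab splitting. The helper names (read_tsv, load_id_text, load_triples) are made up for illustration and are not part of ColBERT, and the sample rows are invented, not real MS MARCO data. IDs are kept as strings, since the notes do not state their type.

```python
import io


def read_tsv(stream, n_fields):
    """Yield the fields of each tab-separated line, checking the field count."""
    for line_no, line in enumerate(stream, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != n_fields:
            raise ValueError(
                f"line {line_no}: expected {n_fields} fields, got {len(fields)}"
            )
        yield fields


def load_id_text(stream):
    """Read 'id \\t text' files (collection.tsv, queries.dev.small.tsv) into a dict."""
    return {item_id: text for item_id, text in read_tsv(stream, 2)}


def load_triples(stream):
    """Read 'query \\t pos_passage \\t neg_passage' lines into a list of tuples."""
    return [tuple(fields) for fields in read_tsv(stream, 3)]


# Tiny in-memory samples standing in for the real files.
collection = load_id_text(io.StringIO("0\tfirst passage\n1\tsecond passage\n"))
triples = load_triples(
    io.StringIO("sample query\trelevant passage\tirrelevant passage\n")
)
print(collection["1"])
print(triples[0])
```

For the real files, replace io.StringIO(...) with open(path, encoding="utf-8").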