VSCode
远程连接服务器
- 
    
先检查本地(~/.ssh/id_rsa_pub)是否有ssh key,如果没有,则创建一个:
ssh-keygen -t rsa -b 4096 - 
    
添加自己的本地公钥到远程主机:
ssh-copy-id 用户名@服务器ip - 
    
打开remote的配置文件,添加以下内容
Host 起个名字 User 用户名 HostName 服务器ip 
服务器上网
export http_proxy=10.61.3.141:8888
export https_proxy=10.61.3.141:8888
- 
    
模型训练
python -m colbert.train –mask-punctuation
输入:–triples ‘./data_marco/triples.train.small.tsv’
输出:模型的checkpoint
 - 
    
rerank topK
python -m colbert.test –checkpoint /data/users/caiyinqiong/colbert/ColBERT_2/ColBERT/experiments/dirty/train.py/2020-12-02_20.02.43/checkpoints/colbert.dnn
输入:模型的checkpoint
 load_qrels (注意:queries和collection不设值)
 topk
输出:ranking.tsv、ranking.metrics
 - 
    
编码collection,建立index
python -m colbert.index –checkpoint /data/users/caiyinqiong/colbert/ColBERT_2/ColBERT/experiments/dirty/train.py/2020-12-02_20.02.43/checkpoints/colbert.dnn
输入:模型的checkpoint
 collection ‘./data_marco/collection.tsv’
 index_root
 index_name
输出:index
 - 
    
由index得到faiss_index
python -m colbert.index_faiss
输入:index_root
 index_name
输出:faiss建立的index
 - 
    
第一阶段检索
python -m colbert.retrieve –checkpoint /data/users/caiyinqiong/colbert/ColBERT_2/ColBERT/experiments/dirty/train.py/2020-12-02_20.02.43/checkpoints/colbert.dnn
输入:模型的checkpoint
 queries:’./data_marco/queries.dev.small.tsv’
 collection:’./data_marco/collection.small.tsv’ (没用)
 qrels:’./data_marco/qrels.dev.small.tsv’ (没用)
 index_root
 index_name
输出:ranking.tsv
 - 
    
重排序topK
python -m colbert.rerank –checkpoint /data/users/caiyinqiong/colbert/ColBERT_2/ColBERT/experiments/dirty/train.py/2020-12-02_20.02.43/checkpoints/colbert.dnn
输入:模型的checkpoint
 queries:’./data_marco/queries.dev.small.tsv’
 collection:’./data_marco/collection.small.tsv’ (没用)
 qrels:’./data_marco/qrels.dev.small.tsv’ (没用)
 topk
 index_root (没用)
 index_name (没用)
输出:ranking.tsv
 
triples.train.small.tsv:quert \t pos_passage \t neg_passage
collection.tsv:id \t passage
queries.dev.small.tsv:id \t query