Reading ivecs-format data in C++

Posted on 2019-08-12 in Approximate Nearest Neighbor Search

Introduction

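Approximate nearest neighbor search usually involves data in the fvecs and ivecs formats. Raw vectors are generally stored as fvecs, while query results are generally stored as ivecs. An ivecs file mainly holds the ids of data points, stored as 4-byte integers that the program in this post reads as unsigned. Structurally, the file is a table: the number of rows equals the number of query points, and the number of columns equals the number of results returned per query plus 1, because the first position in each row stores that result count. With n queries and k results per query, the file therefore occupies n × (k + 1) × 4 bytes.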
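To make the layout concrete, here is a minimal sketch that writes rows in this layout and computes the expected file size. The names `write_ivecs` and `ivecs_file_size` are hypothetical helpers introduced for illustration, not part of any library. The sketch assumes `unsigned` is 4 bytes wide and that the machine's byte order matches the file's (little-endian on common hardware).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <limits>
#include <vector>

// Assumption: ids are stored in 4 bytes, so unsigned must be 4 bytes wide.
static_assert(sizeof(unsigned) == 4, "this sketch assumes a 4-byte unsigned");

// Hypothetical helper: writes each row as [int32 count][count x 4-byte ids].
// Returns false if the file cannot be opened or a write fails.
bool write_ivecs(const char* filename,
                 const std::vector<std::vector<unsigned> >& rows) {
    std::ofstream out(filename, std::ios::binary);
    if (!out.is_open()) return false;
    for (const auto& row : rows) {
        // The per-row count must fit in the 4-byte signed header.
        assert(row.size() <=
               static_cast<std::size_t>(std::numeric_limits<std::int32_t>::max()));
        const std::int32_t k = static_cast<std::int32_t>(row.size());
        out.write(reinterpret_cast<const char*>(&k), sizeof(k));
        out.write(reinterpret_cast<const char*>(row.data()),
                  row.size() * sizeof(unsigned));
    }
    return out.good();
}

// Expected size in bytes for n rows of k ids each: n * (k + 1) * 4.
std::size_t ivecs_file_size(std::size_t n, std::size_t k) {
    return n * (k + 1) * 4;
}
```

For example, two queries with three results each give 2 × (3 + 1) × 4 = 32 bytes, which is the relation a reader can invert to recover the number of rows from the file size.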

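The contents of an ivecs file can be read programmatically. The C++ program below reads an ivecs file, prints its contents, and reports the number of query points and the number of results returned per query.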

Reading ivecs-format data in C++

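#include <cstdlib>
#include <fstream>
#include <iostream>
#include <vector>

// Load an ivecs file in which every row has the same length.
// On return, num is the number of rows (query points) and dim is the
// number of ids stored per row (results returned per query).
void load_ivecs_data(const char* filename,
                     std::vector<std::vector<unsigned> >& results,
                     unsigned& num, unsigned& dim) {
    std::ifstream in(filename, std::ios::binary);
    if (!in.is_open()) {
        std::cerr << "open file error" << std::endl;
        std::exit(EXIT_FAILURE);
    }
    // The first 4 bytes hold the length of the first row.
    if (!in.read((char*)&dim, 4) || dim == 0) {
        std::cerr << "read header error" << std::endl;
        std::exit(EXIT_FAILURE);
    }
    // Derive the row count from the file size: each row occupies
    // (dim + 1) * 4 bytes, one count field plus dim ids.
    in.seekg(0, std::ios::end);
    std::ios::pos_type ss = in.tellg();
    size_t fsize = (size_t)ss;
    num = (unsigned)(fsize / (dim + 1) / 4);
    results.resize(num);
    for (unsigned i = 0; i < num; i++) results[i].resize(dim);

    in.seekg(0, std::ios::beg);
    for (size_t i = 0; i < num; i++) {
        in.seekg(4, std::ios::cur);  // skip the per-row count field
        in.read((char*)results[i].data(), dim * 4);
    }
    in.close();
}

int main(int argc, char** argv) {
    if (argc < 2) {
        std::cerr << "usage: " << argv[0] << " <file.ivecs>" << std::endl;
        return EXIT_FAILURE;
    }
    std::vector<std::vector<unsigned> > true_load;
    unsigned dim, num;
    load_ivecs_data(argv[1], true_load, num, dim);
    // Print one row of ids per query point.
    for (size_t i = 0; i < num; i++) {
        for (size_t j = 0; j < dim; j++) {
            std::cout << true_load[i][j] << " ";
        }
        std::cout << std::endl;
    }
    std::cout << "result_num:" << num << std::endl
              << "result dimension:" << dim << std::endl;
    return 0;
}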
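The loader above takes the row length from the first 4 bytes and skips every later count field without looking at it. As a possible alternative, the sketch below reads the file row by row and rejects rows whose count differs from the first row's, as well as truncated files. The name `read_ivecs_checked` is hypothetical, and the sketch again assumes a 4-byte `unsigned` and a native byte order that matches the file.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <utility>
#include <vector>

// Assumption: ids are stored in 4 bytes, so unsigned must be 4 bytes wide.
static_assert(sizeof(unsigned) == 4, "this sketch assumes a 4-byte unsigned");

// Hypothetical helper: reads an ivecs file row by row.
// Returns false if the file cannot be opened, is empty or truncated,
// or contains rows of differing lengths.
bool read_ivecs_checked(const char* filename,
                        std::vector<std::vector<unsigned> >& results) {
    results.clear();
    std::ifstream in(filename, std::ios::binary);
    if (!in.is_open()) return false;

    std::int32_t k = 0;
    // Each iteration consumes one row: a 4-byte count followed by k ids.
    while (in.read(reinterpret_cast<char*>(&k), sizeof(k))) {
        if (k <= 0) return false;  // invalid count field
        if (!results.empty() &&
            static_cast<std::size_t>(k) != results[0].size()) {
            return false;  // row length differs from the first row
        }
        std::vector<unsigned> row(static_cast<std::size_t>(k));
        if (!in.read(reinterpret_cast<char*>(row.data()),
                     row.size() * sizeof(unsigned))) {
            return false;  // file ends in the middle of a row
        }
        results.push_back(std::move(row));
    }
    // Succeed only on a clean end of file (no partial count field left over).
    return in.eof() && in.gcount() == 0 && !results.empty();
}
```

After a successful call, `results.size()` is the number of query points and `results[0].size()` is the number of results per query. The trade-off is one small read per row instead of a single size computation, which is usually acceptable for ground-truth files.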

References

[1] Cong Fu (付聪), NSG: Navigating Spread-out Graph For Approximate Nearest Neighbor Search, https://github.com/ZJULearning/nsg, accessed 2019-08-12.

© 2021 Mengzhao Wang