Linux下ls命令排版

2018-01-24 programming

想用C语言模拟一下ls命令的效果，想了几种默认情况下的排版以及实现，感觉都不太满意，于是就好奇地去看了一下ls的源码。

Google 得到 ls 一类的基础命令属于 coreutils 的一部分，然后在http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/ls.c找到了ls的源代码。
接着找到了其中的calculate_columns函数，具体的实现代码从第5068行开始，看起来不是太多，就尝试着阅读了一下。

/* Calculate the number of columns needed to represent the current set
   of files in the current display width.  */

static size_t
calculate_columns (bool by_columns)
{
  size_t filesno;		/* Index into cwd_file.  */
  size_t cols;			/* Number of files across.  */

  /* Normally the maximum number of columns is determined by the
     screen width.  But if few files are available this might limit it
     as well.  */
  size_t max_cols = MIN (max_idx, cwd_n_used);

  init_column_info ();

  /* Compute the maximum number of possible columns.  */
  for (filesno = 0; filesno < cwd_n_used; ++filesno)
    {
      struct fileinfo const *f = sorted_file[filesno];
      size_t name_length = length_of_file_name_and_frills (f);

      for (size_t i = 0; i < max_cols; ++i)
        {
          if (column_info[i].valid_len)
            {
              size_t idx = (by_columns
                            ? filesno / ((cwd_n_used + i) / (i + 1))
                            : filesno % (i + 1));
              size_t real_length = name_length + (idx == i ? 0 : 2);

              if (column_info[i].col_arr[idx] < real_length)
                {
                  column_info[i].line_len += (real_length
                                              - column_info[i].col_arr[idx]);
                  column_info[i].col_arr[idx] = real_length;
                  column_info[i].valid_len = (column_info[i].line_len
                                              < line_length);
                }
            }
        }
    }

  /* Find maximum allowed columns.  */
  for (cols = max_cols; 1 < cols; --cols)
    {
      if (column_info[cols - 1].valid_len)
        break;
    }

  return cols;
}

其中涉及到了几个前面的变量，比较重要的有以下几个：

cwd_n_used
```
/* Index of first unused slot in 'cwd_file'.  */
static size_t cwd_n_used;
```
cwd 应该是指当前工作目录（current working directory），这里cwd_n_used是cwd_file中已使用的数目，也就是指要打印的目录下一共有多少文件和文件夹。

max_idx

/* Maximum number of columns ever possible for this display.  */
static size_t max_idx;

当前显示区域能够放下的理论最大列数，可以找到它的初始值设置：

/* The minimum width of a column is 3: 1 character for the name and 2
   for the separating white space.  */
#define MIN_COLUMN_WIDTH	3

/* Determine the max possible number of display columns.  */
max_idx = line_length / MIN_COLUMN_WIDTH;
/* Account for first display column not having a separator,
   or line_lengths shorter than MIN_COLUMN_WIDTH.  */
max_idx += line_length % MIN_COLUMN_WIDTH != 0;

这里是假设了所有文件的文件名只有一个字符，然后两列之间隔两个空格，依此计算显示区域可以放下的理论最大列数。

column_info

/* Array with information about column filledness.  */
static struct column_info *column_info;

指向列信息数组的结构体指针，column_info[i]中存储当总列数为i时的列相关信息。结构体包含：

/* Information about filling a column.  */
struct column_info
{
  bool valid_len; //所有列的总宽度是否合法
  size_t line_len; //所有列的总宽度
  size_t *col_arr; //保存各列宽度的数组
};

从函数整体流程来看，先设置最大列数（max_cols）为理论最大列数（max_idx）和总文件数（cwd_n_used）中的较小值，当文件名可以在一行内全部显示时可以减少计算量。
然后遍历所有文件，对于每个文件（filesno），从总列数为0遍历到总列数为最大列数，当总列数为 i 时，计算该文件所在列（idx），并更新列数信息（column_info）中总列数为 i 时的情况（column_info[i]）。
最后在列数信息（column_info）中从大到小遍历，找到列总宽度合法（valid_len）的列。当列数最大时，行数自然就最小了。

emmmm，感觉有点暴力，每次都要尝试计算所有的可能排版方法。我觉得可以直接把总列数从大到小的循环放在外层，然后循环所有文件看列的总宽度是否合法，合法时直接跳出循环，感觉可能会好一点。

然后自己又遇到了汉字显示的问题：原本是利用strlen函数得到一列中文件名最长的那个的长度，加上两个空格作为这一列的宽度，然后像printf("%*s", max_width, filename);这样进行格式化输出。结果一个utf-8的汉字是三个字节，然后在虚拟终端上等宽字体占的宽度为2，此时同样是%5d，包含汉字的文件名显示的宽度就会比纯英文的更短。。。
例如：

int main(void)
{
    printf("%-5s%s\n", "念", "345");
    printf("%-5s%s\n", "12", "345");
    return 0;
}

结果为：
ls_bad_lenth 同样是%-5s，“念”占三个字节，因此后面补了两个空格；“12”占了两个字节，后面就补了三个空格。
因此需要对utf-8字符进行单独处理：

int getlen_str(char *str)
{
    /* 计算UTF-8文字在终端等宽字体下显示宽度
     *
     * UTF-8编码规则:
     * 1）对于单字节的符号，字节的第一位设为0，后面7位为这个符号的
     *    Unicode码。因此对于英语字母，UTF-8编码和ASCII码是相同的。
     * 2）对于n字节的符号（n > 1），第一个字节的前 n 位都设为1，第
     *    n+1位设为0，后面字节的前两位一律设为10。剩下的没有提及的
     *    二进制位，全部为这个符号的 Unicode 码。
     *
     * 参考http://www.ruanyifeng.com/blog/2007/10/ascii_unicode_and_utf-8.html
     */
    int res = 0, i;
    while(*str != '\0')
    {
        if(*str > 0)
        {
            res++;
            str++;
        }
        else
        {
            for(i = 7;i >= 0;i--)
                if(!((*str>>i)&1)) break;
            res += 2;
            str += 7-i;
        }
    }
    return res;
}

假设计算得到的该列显示宽度为5，那么就可以用5-getlen_str(str)个空格来占位：

int main(void)
{
    int width = 5;
    char str1[] = "念", str2[] = "12";
    printf("%s%-*s%s\n", str1, width - getlen_str(str1), "", "345");
    printf("%s%-*s%s\n", str2, width - getlen_str(str2), "", "345");
    return 0;
}

结果为：
ls_good_lenth 看起来整齐多了。

念

Linux下ls命令排版

Related Posts

云服务器系统更新后出现 I/O error2025-12-23

使用libvirt和cloud image快速创建linux测试环境2025-04-13

回首（2025.2）2025-02-05