Python file traversal and pattern-based filtering

Directory

The directory /home/ghost/workspace/Other/ has the following structure:

├── git  
├── input  
│   ├── csv  
│   │   ├── test_file_1.csv  
│   │   └── test_file_2.csv  
│   ├── test.csv  
│   ├── test_file_1.txt  
│   └── test_file_2.txt  
├── input-archive  
└── temp  

Now let's traverse and filter these files with Python.

List the entries at the current level

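import os

work_folder = '/home/ghost/workspace/Other'
# os.listdir returns the names (not full paths) of all entries directly under work_folder, in arbitrary order
print(os.listdir(work_folder))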

Output:

['input', 'temp', 'input-archive', 'git']  
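os.listdir returns files and folders alike. The result above contains only folders because this level happens to hold no files. A minimal sketch that keeps the two apart uses os.scandir instead. The helper split_entries, the temporary directory and the file notes.txt are hypothetical and exist only so the snippet runs on its own.

```python
import os
import tempfile

def split_entries(folder):
    """Hypothetical helper: return (sorted dir names, sorted file names) directly under folder."""
    dirs, files = [], []
    # os.scandir yields DirEntry objects; is_dir()/is_file() can often avoid an extra stat call
    with os.scandir(folder) as it:
        for entry in it:
            if entry.is_dir():
                dirs.append(entry.name)
            elif entry.is_file():
                files.append(entry.name)
    return sorted(dirs), sorted(files)

# Build a small throwaway tree (assumption: any writable temp dir) so the example runs anywhere
with tempfile.TemporaryDirectory() as base:
    for name in ['input', 'temp', 'input-archive', 'git']:
        os.mkdir(os.path.join(base, name))
    open(os.path.join(base, 'notes.txt'), 'w').close()  # hypothetical file
    dirs, files = split_entries(base)
    print(dirs)   # ['git', 'input', 'input-archive', 'temp']
    print(files)  # ['notes.txt']
```

Sorting the names is a deliberate choice. os.scandir, like os.listdir, makes no promise about order.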

Traverse all files and folders (including subfolders)

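os.walk yields one tuple per visited folder. In the code below, root is the path of the folder currently being visited, dirs is the list of its subfolder names, and files is the list of its file names.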

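exclude = ['git', 'temp']    # folders to skip during traversal
for root, dirs, files in os.walk(work_folder):
    for ex in exclude:
        if ex in dirs:
            # Removing a name from dirs in place stops os.walk from descending into it
            # (this only works with the default topdown=True)
            dirs.remove(ex)
    print(root, dirs, files)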

Output:

/home/ghost/workspace/Other ['input', 'input-archive'] []  
/home/ghost/workspace/Other/input ['csv'] ['test_file_1.txt', 'test_file_2.txt', 'test.csv']  
/home/ghost/workspace/Other/input/csv [] ['test_file_2.csv', 'test_file_1.csv']  
/home/ghost/workspace/Other/input-archive [] []  
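The same pruning idea combines naturally with a file-name pattern. The sketch below uses a hypothetical generator find_files and relies on fnmatch.filter for shell-style wildcards. It rebuilds a tree shaped like the one above in a temporary directory. The extra file temp/skipped.csv is a hypothetical addition meant to show that excluded folders are never entered. The slice assignment dirs[:] = ... replaces the remove() loop and has the same effect.

```python
import fnmatch
import os
import tempfile

def find_files(top, pattern, exclude=()):
    """Hypothetical helper: yield paths under top whose file name matches pattern,
    skipping any directory whose name is in exclude."""
    for root, dirs, files in os.walk(top):  # topdown=True is the default
        # Slice assignment edits the very list os.walk will descend into;
        # rebinding dirs to a new list would not affect the walk.
        dirs[:] = [d for d in dirs if d not in exclude]
        for name in fnmatch.filter(files, pattern):
            yield os.path.join(root, name)

# Recreate a tree like the example above in a temporary directory
with tempfile.TemporaryDirectory() as base:
    layout = {
        'input': ['test.csv', 'test_file_1.txt', 'test_file_2.txt'],
        'input/csv': ['test_file_1.csv', 'test_file_2.csv'],
        'temp': ['skipped.csv'],  # hypothetical file; should be pruned
        'git': [],
        'input-archive': [],
    }
    for rel_dir, names in layout.items():
        os.makedirs(os.path.join(base, rel_dir), exist_ok=True)
        for name in names:
            open(os.path.join(base, rel_dir, name), 'w').close()

    found = sorted(os.path.relpath(p, base)
                   for p in find_files(base, '*.csv', exclude={'git', 'temp'}))
    print(found)  # on POSIX: ['input/csv/test_file_1.csv', 'input/csv/test_file_2.csv', 'input/test.csv']
```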

Filtering files by pattern matching

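To find all csv files under a directory (including subdirectories), use the glob module. With recursive=True, the ** pattern matches zero or more nested directories, so the search descends through the whole tree.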

import glob

pat = '/home/ghost/workspace/Other/input/**/*.csv'
# Named csv_path rather than csv to avoid shadowing the standard csv module
for csv_path in glob.glob(pat, recursive=True):
    print(csv_path)

Output:

/home/ghost/workspace/Other/input/test.csv  
/home/ghost/workspace/Other/input/csv/test_file_2.csv  
/home/ghost/workspace/Other/input/csv/test_file_1.csv
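To see what recursive=True actually changes, the sketch below runs the same pattern with and without it. It also shows pathlib.Path.rglob as an alternative that is always recursive. It rebuilds only the input folder in a temporary directory (an assumption made so it runs anywhere) and prints paths relative to that folder.

```python
import glob
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as base:
    root = Path(base, 'input')
    (root / 'csv').mkdir(parents=True)
    for rel in ['test.csv', 'csv/test_file_1.csv', 'csv/test_file_2.csv', 'test_file_1.txt']:
        (root / rel).touch()

    pat = os.path.join(str(root), '**', '*.csv')
    # recursive=True: '**' matches zero or more directories, so input/test.csv is included
    deep = sorted(os.path.relpath(p, root) for p in glob.glob(pat, recursive=True))
    # recursive=False (the default): '**' acts like '*', i.e. exactly one directory level
    shallow = sorted(os.path.relpath(p, root) for p in glob.glob(pat))
    # pathlib equivalent of the recursive search
    via_rglob = sorted(str(p.relative_to(root)) for p in root.rglob('*.csv'))

    print(deep)       # on POSIX: ['csv/test_file_1.csv', 'csv/test_file_2.csv', 'test.csv']
    print(shallow)    # on POSIX: ['csv/test_file_1.csv', 'csv/test_file_2.csv']
    print(via_rglob)  # same as deep
```

Without recursive=True, the file test.csv sitting directly in input is missed. That is why the flag matters for the search above.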