Traversing and Filtering Files by Pattern in Python
Directory layout
The path /home/ghost/workspace/Other/ has the following structure:
├── git
├── input
│   ├── csv
│   │   ├── test_file_1.csv
│   │   └── test_file_2.csv
│   ├── test.csv
│   ├── test_file_1.txt
│   └── test_file_2.txt
├── input-archive
└── temp
The sections below use Python to traverse and filter these files.
Listing the folders at the current level
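import os
work_folder = '/home/ghost/workspace/Other'
# os.listdir returns the names of all entries (files and folders) directly inside work_folder
print(os.listdir(work_folder))
Output: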
['input', 'temp', 'input-archive', 'git']
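os.listdir returns files and folders alike, so the list above contains only folders because this level happens to hold no files. A minimal sketch of listing folders only, using os.scandir, follows; the helper name list_subfolders is hypothetical and not part of the original post.

```python
import os

def list_subfolders(folder):
    """Return the sorted names of the immediate subfolders of `folder`."""
    # os.scandir yields DirEntry objects, which can report whether they are directories
    with os.scandir(folder) as entries:
        return sorted(entry.name for entry in entries if entry.is_dir())

work_folder = '/home/ghost/workspace/Other'
if os.path.isdir(work_folder):  # run the demo only where the example tree exists
    print(list_subfolders(work_folder))
```

Sorting is optional; it is used here only because the order returned by the operating system is arbitrary.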
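Traversing all files and folders (including subfolders)
In the code below, root is the folder currently being visited, dirs holds the names of the folders directly inside root, and files holds the names of the files directly inside root. Removing a name from dirs in place tells os.walk not to descend into that folder.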
exclude = ['git', 'temp']  # folders to skip during traversal
for root, dirs, files in os.walk(work_folder):
    for ex in exclude:
        if ex in dirs:
            dirs.remove(ex)  # remove folders from dirs that should not be traversed
    print(root, dirs, files)
Output:
/home/ghost/workspace/Other ['input', 'input-archive'] []
/home/ghost/workspace/Other/input ['csv'] ['test_file_1.txt', 'test_file_2.txt', 'test.csv']
/home/ghost/workspace/Other/input/csv [] ['test_file_2.csv', 'test_file_1.csv']
/home/ghost/workspace/Other/input-archive [] []
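The pruning step can also be written as a single slice assignment. The sketch below is one possible variant, not the original author's code; the function name visited_folders is hypothetical. It returns every folder that os.walk actually enters, which makes the effect of the exclusion easy to check.

```python
import os

def visited_folders(top, exclude=()):
    """Return every folder os.walk enters under `top`, skipping excluded names."""
    visited = []
    for root, dirs, files in os.walk(top):
        # Slice assignment modifies the same list object that os.walk holds,
        # so the excluded folders are never entered.
        dirs[:] = [d for d in dirs if d not in exclude]
        visited.append(root)
    return visited

work_folder = '/home/ghost/workspace/Other'
for folder in visited_folders(work_folder, exclude=('git', 'temp')):
    print(folder)
```

A plain assignment such as dirs = [...] would not work here, because it only rebinds the local name and leaves the list used by os.walk unchanged.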
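Filtering files by pattern
Find all csv files under a directory, including its subdirectories. This uses the glob module: with recursive=True, the ** pattern matches zero or more nested directories, so the search covers input itself and everything below it.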
import glob
pat = '/home/ghost/workspace/Other/input/**/*.csv'
# recursive=True makes ** match any number of nested directories
for csv_path in glob.glob(pat, recursive=True):
    print(csv_path)
Output:
/home/ghost/workspace/Other/input/test.csv
/home/ghost/workspace/Other/input/csv/test_file_2.csv
/home/ghost/workspace/Other/input/csv/test_file_1.csv
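glob on its own cannot skip particular folders. If both pattern matching and folder exclusion are needed, one option is to combine os.walk with fnmatch.filter. This is a sketch under that assumption; the function name find_files and its parameters are hypothetical.

```python
import fnmatch
import os

def find_files(top, pattern, exclude=()):
    """Return sorted paths of files under `top` whose names match `pattern`,
    without descending into folders named in `exclude`."""
    matches = []
    for root, dirs, files in os.walk(top):
        dirs[:] = [d for d in dirs if d not in exclude]  # prune excluded folders
        # fnmatch.filter keeps only the file names that match the shell-style pattern
        for name in fnmatch.filter(files, pattern):
            matches.append(os.path.join(root, name))
    return sorted(matches)

work_folder = '/home/ghost/workspace/Other'
for path in find_files(work_folder, '*.csv', exclude=('git', 'temp')):
    print(path)
```

The pattern is applied to file names only, not to full paths, so '*.csv' matches csv files at every depth that the walk reaches.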