Situation: In Linux, I have a parent folder with almost 100 folders of various names. Each folder has a ResourceParent.xml file and hundreds of version-numbered subfolders, each of which has its own ResourceVer.xml file. I am interested in both the ResourceParent.xml in the first-level folder and the ResourceVer.xml in the LATEST version folder (the one with the highest number, e.g. ver548).

I need to search inside each file for three tags (.txt|.csv|.xls) and write the information inside these tags to a report.txt file. The tags are usually on the same line, so I think grep is OK.
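For illustration, the lines to match look something like this (the tag and file names here are made up):

<Attachment>summary.txt</Attachment>
<DataSource>results_2020.csv</DataSource>

Note that an unescaped . in a regular expression matches any character, so '\.txt|\.csv|\.xls' would match the literal dots more strictly, though the unescaped form still matches lines like these.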
What I've tried:
grep -nr -E ".txt|.csv|.xls" . > /dir/to/the/ReportFile.txt
This takes way too long, as it searches every one of the thousands of directories and produces a lot of unnecessary duplicated data. I have also tried going into each folder, depending on what I am looking for, and running the command there, which is a bit better (fewer duplicates and more relevant data), but it is still too cumbersome.
Question: How do I run a Linux script to search for tags in a file structure that looks like this?

Tags of interest inside the .xml files:
".txt|.csv|.xls"

Current location:
/dir

File of interest 1:
/dir/par/ResourceParent.xml

File of interest 2 (need the latest ver number):
/dir/par/ver###/ResourceVer.xml

Needed output file:
ResourceReport.txt
Update

I found that ls | tail -1 selects the folder with the greatest ver number, so I think the answer involves this...
Answer

Perhaps with two commands...
grep --include="ResourceParent.xml" -r -E '.txt|.csv|.xls' > file
for d in par*; do a=("$d"/*); b=($(printf '%s\n' "${a[@]}" | sort -V)); grep -HE '.txt|.csv|.xls' "${b[@]: -1}"/*; done >> file
The second one puts the contents of each directory at the par level into an array sorted by version number, so that you can search just the last item in the array. This seems to work (I am getting the last version number) and only takes a couple of seconds on my test directory structure (the first command takes about twice as long).
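Spelled out with comments, the second command looks like this (a sketch run from /dir, with par* and file as placeholders, and assuming no whitespace in the path names):

for d in par*; do
    a=("$d"/*)                                   # every entry inside this par directory
    b=($(printf '%s\n' "${a[@]}" | sort -V))     # version-sort, one path per line, latest last
    grep -HE '.txt|.csv|.xls' "${b[@]: -1}"/*    # search only inside that latest directory
done >> file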
If your version numbers are padded so that they sort naturally, for the second command you would be able to use simply:

for d in par*; do a=("$d"/*); grep -HE '.txt|.csv|.xls' "${a[@]: -1}"/*; done >> file
That is, if your numbers are ver1, ver2, ... ver100, you will need to sort the array, but if they are ver001, ver002, ... ver100, you will not need to sort it because the glob will already be in the right order anyway.
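A quick way to see the difference between the two sort orders:

printf 'ver1\nver2\nver10\n' | sort       # lexicographic: ver1 ver10 ver2
printf 'ver1\nver2\nver10\n' | sort -V    # version order:  ver1 ver2 ver10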
You may need to replace "${b[@]: -1}"/* with "${b[@]: -1}"/ResourceVer.xml; I did not create any other files in my test. You will presumably also need to replace par* with something that matches your real directory names (I think you said you have about 100 directories at this level).
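If those directories share no common prefix, the */ glob matches every directory at the current level; a sketch under that assumption:

for d in */; do
    a=("${d%/}"/*)                               # ${d%/} strips the trailing slash
    b=($(printf '%s\n' "${a[@]}" | sort -V))
    grep -HE '.txt|.csv|.xls' "${b[@]: -1}"/ResourceVer.xml
done >> file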
But maybe you wanted the data sorted by the directories at the par level, so that you get:

data from par1/ResourceParent.xml
data from par1/ver{latest}/ResourceVer.xml
data from par2/ResourceParent.xml
data from par2/ver{latest}/ResourceVer.xml
You could perform some text processing on the output file, but it depends how your par directories are named. Since I named them par1, par2, ... par200,

sort -V file >> betterfile

will do that job, assuming the filenames have no newlines.
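Alternatively, a single loop produces that interleaved order directly, with no sorting step afterwards (a sketch with the same placeholder names as above):

for d in par*; do
    grep -HE '.txt|.csv|.xls' "$d"/ResourceParent.xml
    a=("$d"/ver*)                                # only the version directories
    b=($(printf '%s\n' "${a[@]}" | sort -V))
    grep -HE '.txt|.csv|.xls' "${b[@]: -1}"/ResourceVer.xml
done > ResourceReport.txt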
You could also trim off the filenames by using grep -h (instead of -H) in the original commands (though that would mean you could not sort the data afterwards by the above method), or by text processing at the end. For example, if your filenames have no colons or newlines, this would be quite reliable:

sed 's/^[^:]*://' file

You can edit the file in place instead of writing to stdout by adding the -i flag to sed, after testing.
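For example, on a hypothetical line of grep output:

echo 'par1/ver548/ResourceVer.xml:<data>results.csv</data>' | sed 's/^[^:]*://'

prints just <data>results.csv</data>.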
Thanks to John1024, whose answer on U&L provides a great way to get the last filename that doesn't rely on parsing the output of ls or find, or gratuitously looping over the structure to count iterations.