C# - Returning Dictionary<FileHash, string[]> from LINQ query
Thanks in advance for any assistance. I'm not sure if this is possible, but I'm trying to get a list of duplicate files, using hashes to identify the list of files associated with each hash.
I have this below:
    Dictionary<FileHash, string[]> FindDuplicateFiles(string searchFolder)
    {
        Directory.GetFiles(searchFolder, "*.*")
            .Select(f => new
            {
                FileName = f,
                FileHash = Encoding.UTF8.GetString(new SHA1Managed()
                    .ComputeHash(new FileStream(f, FileMode.OpenOrCreate, FileAccess.Read)))
            })
            .GroupBy(f => f.FileHash)
            .Select(g => new { FileHash = g.Key, Files = g.Select(z => z.FileName).ToList() })
            .GroupBy(f => f.FileHash)
            .Select(g => new { FileHash = g.Key, Files = g.Select(z => z.Files).ToArray() });
    }
It compiles fine, but I'm curious whether there's a way to manipulate the results to return a dictionary.
Any suggestions, alternatives, or critiques are appreciated.
There's an extension method for this. Just stick it at the end of your existing query:
    .ToDictionary(x => x.FileHash, x => x.Files);
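To see the shape this produces without touching the disk, here is a minimal sketch that groups some in-memory pairs by hash and ends with ToDictionary. The sample file names, the placeholder hash strings, and the ToDictionaryDemo/GroupByHash names are all made up for illustration; the key type is string here rather than the FileHash type from the question's signature.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ToDictionaryDemo
{
    // Hypothetical stand-in for (file name, hash) pairs; not real hashes.
    static readonly (string FileName, string FileHash)[] Files =
    {
        ("a.txt", "hash1"),
        ("b.txt", "hash2"),
        ("c.txt", "hash1"),
    };

    // Group once by hash, then turn each group into a dictionary entry:
    // key = hash, value = array of file names sharing that hash.
    public static Dictionary<string, string[]> GroupByHash() =>
        Files.GroupBy(f => f.FileHash)
             .ToDictionary(g => g.Key, g => g.Select(f => f.FileName).ToArray());

    static void Main()
    {
        foreach (var pair in GroupByHash())
            Console.WriteLine($"{pair.Key}: {string.Join(", ", pair.Value)}");
    }
}
```

Note that ToDictionary throws an ArgumentException if two elements produce the same key, which is why the grouping has to happen first.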
However: using Encoding.UTF8.GetString to convert arbitrary binary data to a string is a bad idea. Use Convert.ToBase64String instead. The hash is not a UTF-8 encoded string, so don't treat it as one.
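A small sketch of why this matters, using made-up byte values rather than real hashes: bytes that are not valid UTF-8 get replaced with U+FFFD when decoded, so the original bytes cannot be recovered, and two different byte sequences can end up as the same string. Base64 round-trips exactly. The class and method names are only for illustration.

```csharp
using System;
using System.Linq;
using System.Text;

static class HashEncodingDemo
{
    // True if the bytes survive a decode/encode round trip through UTF-8.
    public static bool SurvivesUtf8(byte[] data) =>
        data.SequenceEqual(Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(data)));

    // True if the bytes survive a round trip through Base64.
    public static bool SurvivesBase64(byte[] data) =>
        data.SequenceEqual(Convert.FromBase64String(Convert.ToBase64String(data)));

    static void Main()
    {
        // Arbitrary bytes standing in for a hash; 0xFF and 0xFE never occur in valid UTF-8.
        byte[] hash = { 0xFF, 0xFE, 0x41, 0x80 };

        Console.WriteLine(SurvivesUtf8(hash));   // False: invalid bytes were replaced
        Console.WriteLine(SurvivesBase64(hash)); // True

        // Two different "hashes" that decode to the same string (a single U+FFFD).
        string a = Encoding.UTF8.GetString(new byte[] { 0xFF });
        string b = Encoding.UTF8.GetString(new byte[] { 0xFE });
        Console.WriteLine(a == b);               // True: distinct bytes, identical key
    }
}
```

With UTF-8 decoding, files with different hashes could therefore land in the same group.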
You're grouping by hash twice, which I suspect isn't what you want to do.
Alternatively, remove the previous GroupBy calls and use a lookup instead:
    var query = Directory.GetFiles(searchFolder, "*.*")
        .Select(f => new
        {
            FileName = f,
            FileHash = Convert.ToBase64String(
                new SHA1Managed().ComputeHash(...))
        })
        .ToLookup(x => x.FileHash, x => x.FileName);
That will give you a Lookup<string, string>, which is basically the files grouped by hash.
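As a rough sketch of consuming such a lookup: each entry is a group whose Key is the hash and whose elements are the file names, so the duplicates are just the groups with more than one element. The sample data and the LookupDemo/FindDuplicates names are assumptions for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LookupDemo
{
    // Returns one array of file names per hash that occurs more than once.
    public static List<string[]> FindDuplicates(
        IEnumerable<(string FileName, string FileHash)> files)
    {
        ILookup<string, string> byHash =
            files.ToLookup(f => f.FileHash, f => f.FileName);

        return byHash.Where(g => g.Count() > 1)
                     .Select(g => g.ToArray())
                     .ToList();
    }

    static void Main()
    {
        // Hypothetical (file name, hash) pairs; not real hashes.
        var files = new[]
        {
            ("a.txt", "hash1"),
            ("b.txt", "hash2"),
            ("c.txt", "hash1"),
        };

        foreach (var group in FindDuplicates(files))
            Console.WriteLine(string.Join(", ", group));
    }
}
```

Unlike a dictionary, indexing a lookup with a key that isn't present gives an empty sequence rather than throwing.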
One further thing to note: I suspect you'll be leaving the file streams open with this method. I suggest you write a small separate method to compute the hash of a file based on its name, making sure you close the stream (with a using statement in the normal way). That will end up making your query simpler too - something along the lines of:
    var query = Directory.GetFiles(searchFolder)
        .ToLookup(x => ComputeHash(x));
It's hard to simplify it much further than that :)
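For completeness, one possible shape for that ComputeHash helper, as a sketch rather than the definitive version. It assumes SHA1.Create() in place of new SHA1Managed() (which newer .NET versions mark obsolete) and File.OpenRead in place of the FileStream constructor from the question; the DuplicateFinder class name is made up.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

static class DuplicateFinder
{
    // Hypothetical helper: hashes one file and returns the hash as Base64.
    // Both the hash algorithm and the stream are disposed by the using statements.
    static string ComputeHash(string path)
    {
        using (var sha1 = SHA1.Create())
        using (var stream = File.OpenRead(path))
        {
            return Convert.ToBase64String(sha1.ComputeHash(stream));
        }
    }

    // Key = Base64 hash, elements = paths of the files with that hash.
    public static ILookup<string, string> FindDuplicateFiles(string searchFolder) =>
        Directory.GetFiles(searchFolder).ToLookup(path => ComputeHash(path));

    static void Main(string[] args)
    {
        string folder = args.Length > 0 ? args[0] : Directory.GetCurrentDirectory();

        // Only hashes shared by more than one file are duplicates.
        foreach (var group in FindDuplicateFiles(folder).Where(g => g.Count() > 1))
        {
            Console.WriteLine(group.Key);
            foreach (var file in group)
                Console.WriteLine("  " + file);
        }
    }
}
```

File.OpenRead opens an existing file read-only, which also avoids FileMode.OpenOrCreate from the question; that mode can create a file if the path doesn't exist.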
c# linq file duplicates