Add a function to filter letter combinations
#1
Hello, I would like to add a filter that checks whether a letter can be combined with another.
The goal would be to avoid the creation of impossible words like "ghjkkt", since people tend to use passwords made of, or inspired by, real words.
So my idea is to insert a control function that takes the current letter of the combination and the candidate next letter, and checks whether the combo would be valid or not.
If not, it returns false and the next candidate letter is checked; if true, hashcat can go on with the creation of the word.
It can be done with an array that, for each letter of the alphabet, lists the letters that cannot be put after it.
As an example, this ruleset is made for the Italian language: after the letter "a" anything can go, after the letter "b" none of QWTPSDFGHJKZXCV can be put, and so on.



Code:
#include <stdbool.h> // For the bool data type
#include <string.h>  // For the strchr() function
#include <ctype.h>  // For the toupper() function
// Definition of the rules for forbidden letters.
const char* rules[26] = {
    "",                        // A
    "QWTPSDFGHJKZXCV",          // B
    "QWTPDFGJKZXBNM",          // C
    "QWTPSFHGKZXCVBNM",        // D
    "",                        // E
    "QWTDGHJKZXCVB",            // F
    "QTPJKZXCBM",              // G
    "QWRTUPSDGFHJKLZXCVBNM",    // H
    "",                        // I
    "QWRTUPSDGFHJKLZXCVBNM",    // J, example value, following the pattern of other definitions
    "QWRTUPSDGFHJKLZXCVBNM",    // K, example value
    "WHJX",                    // L
    "QWDFGHJKZXN",              // M
    "WHKXM",                    // N
    "",                        // O
    "QWTDGHJKZXVB",            // P
    "QRTPSDFGHJKLZXCVBNM",      // Q
    "QWJKX",                    // R
    "JX",                      // S
    "QPDGJKZXCB",              // T
    "",                        // U
    "QWTPSDFGHJKZXCBM",        // V
    "QWRTUPSDGFHJKLZXCVBNM",    // W
    "QWRTUPSDGFHJKLZXCVBNM",    // X
    "QWRTUPSDGFHJKLZXCVBNM",    // Y
    "QWRTPDGFJKXCVB"            // Z
};
bool isValidCombination(char currentLetter, char nextLetter) {
    // Convert letters to uppercase for simplicity of comparison
    currentLetter = (char)toupper((unsigned char)currentLetter);
    nextLetter = (char)toupper((unsigned char)nextLetter);

    // Treat anything outside A-Z as unconstrained, so the rules
    // array is never indexed out of bounds
    if (currentLetter < 'A' || currentLetter > 'Z' ||
        nextLetter    < 'A' || nextLetter    > 'Z') {
        return true;
    }

    // Get the index corresponding to the current letter in the rules array
    int index = currentLetter - 'A';

    // Check if the next letter is present in the string of forbidden letters
    if (strchr(rules[index], nextLetter) != NULL) {
        // The next letter is forbidden after the current letter
        return false;
    }

    // The combination is valid
    return true;
}
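To make the idea concrete, here is a minimal test harness, meant to be compiled together with the rules array and isValidCombination() above. validWord() is a hypothetical helper of mine for illustration, not anything from hashcat:

Code:
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: a candidate passes only if every adjacent
// pair of letters is allowed by isValidCombination() above.
bool validWord(const char *word) {
    size_t len = strlen(word);
    for (size_t i = 0; i + 1 < len; i++) {
        if (!isValidCombination(word[i], word[i + 1])) {
            return false;
        }
    }
    return true;
}

int main(void) {
    const char *tests[] = { "casa", "ghjkkt", "pane", "hgghmhx" };
    for (int i = 0; i < 4; i++) {
        printf("%-8s -> %s\n", tests[i], validWord(tests[i]) ? "kept" : "dropped");
    }
    return 0;
}

With the Italian ruleset above, this keeps "casa" and "pane" and drops "ghjkkt" (the "hj" pair is forbidden) and "hgghmhx" (the "hg" pair is forbidden).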

What do you think, could it be useful? The goal is to form only realistic words and avoid all the nonsense like "hgghmhx".
If this can work, can someone give some hints on where to look in the source code to insert this? Thank you.
#2
As far as I know, this would reduce the overall performance for fast hashes dramatically, because the logic of dropping candidates this way is far too slow compared to just generating and hashing them.

On the other hand, there is already an optimizer for exactly this, the Markov chains, which is implemented.
#3
(03-28-2024, 02:35 PM)Snoopy Wrote: As far as I know, this would reduce the overall performance for fast hashes dramatically, because the logic of dropping candidates this way is far too slow compared to just generating and hashing them.

On the other hand, there is already an optimizer for exactly this, the Markov chains, which is implemented.

thanks for the reply.
Actually, I don't see how it would be slower overall, considering the number of combinations that would be saved this way.
I mean, maybe it would be slower for <10 char words, but for >10 there is no way this method would be slower than the normal algorithm...
#4
Unfortunately, Snoopy is likely correct about the performance impacts. A significant portion of the speed of hashcat in mask/bruteforce mode comes from being able to generate and distribute work extremely quickly across the computing device(s). Introducing a filtering step like this has been tried a few times and the performance penalty is usually pretty significant.

If your intent is just to reduce the amount of "non-human" looking candidates, then the markov chains and markov cutoff that are already implemented may already do what you are looking for. We order keyspace using the markov chains specifically to test "nonsense" candidates later or, with the cutoff, not at all.
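For reference, the relevant options look roughly like this. The mask, hash mode, and the italian.hcstat2 stats file are placeholders of mine; check --help on your version:

Code:
# Keyspace in -a 3 is ordered by per-position Markov stats by default;
# the threshold (-t / --markov-threshold) cuts off less probable chains:
hashcat -a 3 -m 0 -t 16 hashes.txt ?l?l?l?l?l?l?l?l

# Language-specific stats (trained separately, e.g. with hcstat2gen
# from hashcat-utils) can be supplied instead of the built-in ones:
hashcat -a 3 -m 0 hashes.txt ?l?l?l?l?l?l?l?l --markov-hcstat2 italian.hcstat2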
#5
(03-28-2024, 08:06 PM)Chick3nman Wrote: Unfortunately, Snoopy is likely correct about the performance impacts. A significant portion of the speed of hashcat in mask/bruteforce mode comes from being able to generate and distribute work extremely quickly across the computing device(s). Introducing a filtering step like this has been tried a few times and the performance penalty is usually pretty significant.

If your intent is just to reduce the amount of "non-human" looking candidates, then the markov chains and markov cutoff that are already implemented may already do what you are looking for. We order keyspace using the markov chains specifically to test "nonsense" candidates later or, with the cutoff, not at all.

Thank you for the clarification. I said that because every time I look at the combinations being tested, I always see nonsense like "hjdnudsn", with 3 consonants one after the other, or 2 consonants that would in no way be part of an actual word.