Suppose that you are given a string. Write a function to find the first nonrepeated character in that string. Here’s an example: suppose you are given the string “interview”. The first nonrepeated character in that string is ‘n’, because ‘i’ appears twice in the string. And the first nonrepeated character for “racecar” is ‘e’ – which is the first and only nonrepeated character. Explain the efficiency of your algorithm.




Finding a solution for this problem is quite easy – but the hard part is finding an efficient solution. The very first solution that you may come up with is this: starting from the beginning of the string, for each character in the string, compare it to all of the other characters in the string. If you find another character that matches the current character then you know that you have found a repeat of that character. As soon as you find a character that has no matches, then you have found your first nonrepeated character. And that solution works just fine. But, the question is: how efficient is this solution – in other words, what is the time order of this solution? Remember that time order means the same thing as Big O.

The time order of our first solution

Well, in the worst case we would have to go through the entire string to find that the very last character is the first non repeated character (like in the string “xxyyz”) or that there is no non-repeated character (like in the string “toot”). This means that for a string that has n characters, we would have to do n comparisons for each character, giving us n*n which is n2. In terms of Big O Notation, this means that the algorithm will be O(n2). For very large strings, it’s very likely that this algorithm will do close to n2 comparisons if not n2. This is because there’s a higher probability of there being more repeated characters in larger strings when compared to smaller strings.

How can we improve our first solution?




But, there must be an algorithm that has a better time order than O(n2), because this solution is just too simple – especially if asked in an interview setting. So, rather than asking ourselves how we can make the algorithm more efficient, let’s instead ask what specifically we can improve in the algorithm we currently have. Well, what were we doing in this algorithm, and why was it O(n2) ? One factor of the “n2” (or basically half of the time efficiency of the algorithm – which would be “n” in this case) is going through each and every character in the string to see if it is nonrepeated. Because that nonrepeated character can even be at the very end of the string, improving this part of the algorithm to improve efficiency does not seem possible. But, the other half of the algorithm is searching the entire string in order to find matches for each character. Now, that sounds like it can be made more efficient. By improving the search of this algorithm, we can improve the overall efficiency as well.

Data structures allow for more efficient searching

Any time you want to improve the search efficiency for a set of data, it’s always a good idea to put that data inside a data structure – because a data structure is meant for more efficient searching and organization of data. The search portion in our current algorithm accounts for O(n) time. How can we improve that time so that it’s less than O(n)? Well, what data structures are less than O(n)? There’s binary trees, which can be searched in O(log(n)). And there’s also arrays and hash tables – which both have constant time element lookup, meaning that the time to lookup something does not depend on the size of the input. Since that sounds promising, let’s try to use either an array or a hash table since they are good data structures to help with faster searches. This will hopefully reduce our time order complexity.

Using a hash table or an array to find the first nonrepeated character in a string

If we use an array or a hashtable, the question is what will we store in the index for the array, or the key for the hashtable? Well, because we need to be able to search the data structure by character, the most obvious choice for the key or index is the character. Because array indices have to be numeric, we could just convert a given character to an integer and then use it as an index. This means for ASCII strings we would have 128 possible values, and for Unicode strings we would have 65,536 possible values. But, what do we actually store inside the data structures? Because the problem asks us to find the first nonrepeated character, then why don’t we just store the number of times each character appears in the string. So, all we have to do is scan the string once and we can get a count for each of the characters that would tell us how many times each one appears in the string.

And once we have a count for all of the characters, then all we have to do is scan the data structure for the very first “1” that we encounter in the data structure. This would give us the first non-repeated character!

What’s the Big-O Notation of the new solution to finding the first nonrepeated character?

But, is the new algorithm actually an improvement over what we previously had? Well, let’s break it down: we scan the entire string once to build the data structure. And then, once we have built the data structure, we scan it to find the ‘1’ – which tells us that this is the first nonrepeated character. This means that in the worst case, there will be 2 ‘operations’ on each character in a given string, which would be 2n for a string of length n. In Big-O Notation, this is O(n), which is a lot more efficient than our previous algorithm which was O(n2).

Hash tables versus arrays




Now the question is whether we should use an array or a hashtable as our data structure? Let’s examine the differences between the two. Hashtables do have a higher lookup overhead when compared to arrays, so that’s one advantage of arrays. But, the biggest difference is in the memory that each data structure would require. A hash table would only need to store the characters that actually exist in the input string – so if a string contains the characters “abcdef”, then a hashtable would only need to store the characters in the string “abcdef”. An array, on the other hand, would need an element for every single possible value of a character. This is because with an array you can not skip indices – so we can not have an array with just 2 elements, where one index is 10 and one index is 99. That’s impossible – you would have to have elements at index 0 and 1 for a 2 element array. And, remember that we have to store the numeric values of the characters in the index position for this problem. This means that if we have a Unicode string, we would need to have 65,536 elements (assuming a 16 bit Unicode encoding) in our array to account for every possible character that could be in the string – whether or not it is actually in the string. This is because we simply do not know what is going to be in the string that’s being passed in – but with an array we have to be ready to accept all possible values. But for an ASCII string, an array would really only need 128 elements, because there are only 128 possible ASCII values.

Because of the memory requirements, arrays would be better when there are not many possible character values (like in an ASCII string) and when the strings are long (because hash tables have a higher lookup overhead than arrays). But, hashtables are much more efficient when strings are smaller in size or when there are lot of possible character values.

Let’s use a hashtable, just because most people are very familiar with arrays, but not so much with hashtables. Before we write the actual code, let’s write out the pseudocode. Here’s what it would look like:

Create a hash table data structure
that will store the count of each character

   Then, for each character in the string:
   If there is no value for that character in the
   hashtable then insert a '1'

   Otherwise, if there is a value for that character
   then increment the value by 1

After the first part is done, for each character in the
input string, look it up in the hashtable, if there is a '1',
then return that character because it will be the first
nonrepeating character in the string.
If none of the characters in the string have a count of 1,
then return null, meaning there are no nonrepeated
characters in the input string.

Example of using the Hashtable in Java:

If we implement the pseudocode above in Java this is what it would look like:


// returns the first nonrepeated Character in a string
public static Character findFirstNonRepeated( String input) {
// create a new hashtable:
Hashtable hashChar = new Hashtable( );

int j, strLength;
Character chr;
Integer intgr;

strLength = input.length( );

for (j =0; j < strLength; j++)
{
  chr = new Character(input.charAt( j ) );
  intgr = (Integer) hashChar.get(chr);
  if (intgr == null) {
     hashChar.put(chr, new Integer(1));
  }
  else
  {
    hashChar.put(chr, new Integer(intgr.intValue( ) + 1) );
  }
}

/* go through hashtable and search for the first 
    nonrepeated char:
*/
  
for(j = 0; j < length; j++)
{
  chr = new Character(input.charAt(j));
  if (((Integer) hashChar.get(chr)).intValue( ) == 1)
    return chr;
}
/* this only returns if the loop above doesn't find
   a nonrepeated character.
*/
return null;

}

}

We can actually improve the code above even more. Note that every time we put something into the hashtable, we are creating an object of type Integer. Why not just create 2 Object values which are also flags that represent 'one time', and 'more than one time' that will tell us how many times a given character appears in the string? This way, we don't have to create a new object every time we insert into the hashtable, which consumes a lot of memory unnecessarily. Here is what the code would look like after changing it to use flags instead of creating new objects:


// returns the first nonrepeated Character in a string
public static Character findFirstNonRepeated( String input) {
// create a new hashtable:
Hashtable hashChar = new Hashtable( );

int j, strLength;
Character chr;
Object oneTime = new Object( );
Object twoTimes = new Object( );

strLength = input.length( );

for (j =0; j < strLength; j++)
{
  chr = new Character(input.charAt( j ) );
  Object o = hashChar.get(chr);

  /*if there is no entry for that particular 
     character, then insert the oneTime flag:
  */
  if (o == null) {
     hashChar.put(chr, oneTime );
  }
  /*
   
  */
  else if (o == oneTime)
  {
    hashChar.put(chr, twoTimes);
  }
}

/* go through hashtable and search for the first 
    nonrepeated char:
*/
  
for(j = 0; j < length; j++)
{
  chr = new Character(input.charAt(j));
  if ( hashChar.get(c) == oneTime)
    return chr;
}
/* this only returns null if the loop above doesn't find
   a nonrepeated character in the hashtable
*/
return null;

}

}

And finally, we have our final solution. The lesson here is that using data structures can really help in making a solution to a programming problem more efficient - and is something you should always keep in mind.

Hiring? Job Hunting? Post a JOB or your RESUME on our JOB BOARD >>

Subscribe to our newsletter for more free interview questions.

  • Domenick D’Onofrio

    public static Character firstNonRepeat(String str) {
    char[] alphas = new char[26];
    boolean[] checked = new boolean[26];
    for (int i = 0; i < str.length(); ++i) {
    alphas[str.charAt(i) – 'a']++;
    }
    for (int i = 0; i easy to adapt this to ascii or extended ascii, but as the post describes using this implementation with unicode chars would be inefficient in terms of space.)

  • Keith Nicholas

    The first solution is not O(n^2 ), It may seem like that when you consider small strings. but actually, it is dependent on the number of different characters. When we are talking about ascii characters, then the first solution is going to be fine. Even in unicode, someone is going to have to go out of their way to construct a worse case string,

  • dotan

    “then all we have to do is scan the data structure for the very first “1” that we encounter in the data structure. This would give us the first non-repeated character!”

    in the array solution (bucket) this one is incorrect
    we will have to go over the string again as i see it
    first ‘1’ doesnt necessarily mean it is the first nonrepeated in the string
    we lose the input order when using a bucket

    Do I missing anything?

  • István Böhm

    Check the code. We read the string again, and check the value of the current character in the hashtable.

  • Jitendra Chahal

    public class FirstNonRepeatedStringChar {
    public static void main(String[] args) {
    String str = “aabbccdDefghijklmnKj”;
    for(int i=0; i<str.length(); i++)
    {
    boolean flag = true;
    char ch = str.charAt(i);
    for(int j =0; j<str.length(); j++)
    {
    if(i==j)
    continue;
    if(str.charAt(j)==ch)
    {
    flag = false;
    break;
    }
    }
    if(flag)
    {
    System.out.println(ch);
    break;
    }
    }
    }
    }

  • To The Point

    In the second for loop it is strLength not length, Thanks for the tutorial.

  • Holden

    Hash table do not preserve insertion order. how we can be sure that this character id the first non repeated one?

  • evvo

    A LinkedHashMap is more expensive to create and search than a simple array of integers (indexed by the char).

  • Victor

    this is O(n^2)…

  • Victor

    HashMap

    for some reason it didnt come out right in comment when there are no spaces around the “”

    Also, java.util.Hashtable is obsolete as of JDK 1.2.
    java.util.HashMap should be used instead.

  • Victor

    more concise method would be to do this:
    public static void main(String[] args)
    {
    Character c = getFirstDuplicate(“asfretegbdfsgdrgtdrgrad”);
    System.out.println((c == null ? “None” : c));
    }

    public static Character getFirstDuplicate(String str)
    {
    HashMap table = new HashMap();
    for (char c : str.toCharArray())
    {
    table.put(c, table.get(c) == null);
    }

    for (Character ch : str.toCharArray())
    {
    if (table.get(ch))
    return ch;
    }
    return null;
    }

  • Nishant

    public Character getFirstNonRepeatedChar(String input){
    for (int i=0;i<input.length();i++) {
    Character cc = input.charAt(i);
    if (input.lastIndexOf(cc) == input.indexOf(cc)){
    return cc;
    }
    }
    return null;
    }

  • Nishant

    public Character getFirstNonRepeatedChar(String input){
    for (int i=0;i<input.length();i++){
    Character cc = input.charAt(i);
    if (input.lastIndexOf(cc) == input.indexOf(cc)){
    return cc;
    }
    }
    return null;
    }

  • Take an empty integer array of 256 elements to hold the frequency of character of the string.

    Iterate through the string and construct the frequency array. Increment the current value of the frequency array by 1 at the location of the corresponding ASCII value of the character.

    Iterate through the string and check the frequency array. Return the first element whose frequency is 1.
    For explanation and code http://www.algoqueue.com/algoqueue/default/view/6881280/first-non-repeating-character-in-a-string

  • You can use a LinkedHashMap and loop through the entries in insertion order.

  • This is pretty bad…

  • adarsh

    Really impressive posts.Concepts are explained nicely

  • Muthu

    public class FindFirstNonRepeatedCharacter {

    public static void main(String[] args) {

    char name[] = "abbacstsrtrqmq".toCharArray();

    for (int i = 0; i < name.length; i++) {

    boolean uniqueChar = true;

    for (int j = 0; j < name.length; j++) {

    if (i != j && name[i] == name[j]) {

    uniqueChar = false;

    break;

    }

    }

    if (uniqueChar) {

    System.out.println(name[i]);

    break;

    }

    }

    }

    }

  • You're missing the fact that it isn't the keys of the Hashtable that it is iterating through, but just the input string a second time…the Hashtable doesn't need to maintain order, for the input string IS the order.

  • ThomasDalla

    That's why this is "2n" ("in the worst case, there will be 2 ‘operations’ on each character in a given string, which would be 2n for a string of length n"), you need to go through the list again to find out which one is the first one.
    Using a LinkedHashMap would allow to just retrieve the first unique character, if any.

  • mv

    HashTable will not maintain insertion order and hence the first '1' that you encounter in the HashTable might not be the first non-repeated character. Am I missing something?