Longest Substring Without Repeating Characters - Leetcode #3 Short & Simple Solution
Updated: Jun 3, 2022
Problem Statement
In our previous article we solved the two sum problem. This is another article in the series leetcode problem solutions and this article is a solution to leetcode 3 problem.
For a given input string s, return the length of the longest substring in s without repeating characters. s consists of English letters and digits.
Example
Example 1:
Input: s = "abcabcbb"
Output: 3
Example 2:
Input: s = "bbbbb"
Output: 1
Example 3:
Input: s = "pwwkew"
Output: 3
Solution
Let us try to understand the problem statement first. This is a pretty straightforward problem if you know what a substring is for a given string.
A substring a continuous sequence of characters in a given string. For example, the substrings of string "abcd" are: "a", "ab", "abc", "abcd", "b", "bc", "bcd", "c", "cd" and "d". If the characters are not continuous it is not considered a substring. "bd", "ad", "acd" etc. are not substrings because the characters are not continuous as compared to the original string s.
Note: Substring is different from a subsequence which may not be consecutive in nature.
Now that we know what a substring is, we need to write an algorithm that returns the length of the longest substring in string s. But wait, remember we cannot just return any substring that is longest as clearly mentioned in the problem statement, we are only allowed to consider substrings without repeating characters. "Substring without repeating characters" is nothing but a substring containing all unique characters, there should not be any duplicate characters/letters in the substring.
Now that the problem statement is clear, lets see what approach we can use to solve it:
Solution 1: Naive Approach
The most simple and straightforward solution to this problem is to check every possible substring for the given string s and return the length of the longest substring without repeating characters.
This algorithm involves the following steps:
For the given string s, generate all possible substrings. We can generate all possible substrings by using two loops. Fix the outer loop and move the inner loop (second loop) index to generate all possible substrings (refer code for more details).
As and when you obtain the substrings using the loops (described in step 1), you can check if the current substring contains only distinct characters or has duplicates. We can do this by going trough all the characters of the obtained substring using another loop and adding it to a hashmap.
During the iteration of the third loop, if you encounter any character that is already in the hashmap, the obtained substring contains duplicate characters. Otherwise all the characters in the substring are unique.
If the substring contains duplicate characters, skip the current substring, do not consider it for calculating the result.
If the substring contains all unique characters, check if the length of the current substring is greater than the length of all the unique character substrings obtained so far. If yes, update the result, else move on to the next substring and repeat steps 1-5.
The running time complexity of this algorithm O(n^3).
Code
Language: Go
Complexity Analysis
Time Complexity: O(n^3)
Time complexity is O(n^2) for finding the substrings using first and second loops and O(n) to find out if it is contains non repeating characters. So the total time complexity is O(n) * O(n^2) = O(n^3).
Space Complexity: O(1)
You might ask, we are making use of a hashmap to store characters of string, so how is the space complexity O(1). If you look at the problem statement carefully, it clearly states the given input string can contain only English letters and digits. Therefore the space needed to store these characters in a hashmap is constant i.e. 26+26 for upper and lower case English letters and 10 for digits. So in the worst case the space needed to store this in a hashmap is O(26+26+10) = O(62), which in general can be represented as O(1).
This approach can be optimized to improve the time complexity to O(n^2) which is better than our earlier time complexity of O(n^3), but can we do better?
Want to master coding? Looking to learn new skills and crack interviews? We recommend you to explore these tailor made courses:
Solution 2: Sliding Window Technique
This problem can be efficiently solved by using the two pointer sliding window technique. We slide the window forward each time we find a duplicate character. In this approach we need to process each character in the given string only once to find out the result.
For a given input string s this algorithm does the following steps:
We take two variables start and end to indicate the start and end of the window. The string between the start and end is our current substring. Initially both start and end point to the 0th index (1st character) of the string s, i.e. start = 0 and end = 0. Also create a result variable to store the required result.
Next we create a hashmap which helps us to efficiently check if the current substring contains any duplicate characters. Lets name it charIndexMap. Key to this hashmap is the current character and value of this hashmap is the position of the character represented by the hashmap key in the given string s.
In each iteration we check if the current character at end index is present in the charIndexMap or not.
If the character is not present, add it to the hashmap along with the index.
If the character is present in the map already, it means that the current character is a duplicate and we should not be considering the substring with this character while calculating the result. So, take the length of the substring from the start of the window until the previous character. If the current length is greater than the result, update the result. Also since we have found a duplicate character, we need to update the start index. Another important step here is to remove all characters from the previous window from our hashmap (since we have moved our window by updating the start variable). Finally add the current character to our hashmap along with its index.
How do we update the start variable once a duplicate character is found?
Remember along with storing the characters in hashmap, we also store the index of the character in the string as hashmap value. When we get a duplicate character during our iteration what does this value in hashmap mean? It simply means that the current duplicate character was earlier found at this index represented by the hashmap value (first occurrence). Using this info we discard all the characters before the first occurrence of the duplicate including the first occurrence (discard all characters from previous window). So our new start value will be start = map[current character] + 1.
Note: Character literals are represented by uint8 or rune datatype in go.
Simulation
Lets see the working of the above algorithm with an example:
Consider s = "abcabcbb" is the given string. Initially start = 0 and end = 0 and our hashmap is empty as shown in the figure below. Also we initialize result = 0.
Now we check if s[end] = 'a' is present in our hashmap. Obviously 'a' is not present in hashmap, so we add 'a' as hashmap key and 0 as hashmap value indicating 'a' is found at 0th position in string (index based position). Same is shown in figure below:
Next we increment end pointer.
Again check if s[end] = 'b' is present in map. As 'b' is not present we add it to hashmap as shown in diagram below.
Increment the end pointer, s[end] = 'c', and since it is not in map we add it to our hashmap.
Increment end again. Now s[end] = 'a'.
Now s[end]='a' is present in map, which means it is a duplicate character. Since we are looking only for substrings without repeating characters, we cannot consider the current substring "abca" for our result, so our valid substring is only till the previous character, i.e "abc".
Since we have found a duplicate the next step is to update result. result = max(result , length of substring from index 0 to 2) = max(result , length of substring "abc") = max (3,3) = 3.
Also since the current character is a duplicate, we need to update out start pointer. We update start as start = first occurrence of duplicate + 1. We can get the first occurrence of duplicate from our hashmap. So start = map['a'] + 1 = 0+1 = 1.
Next, remove all the characters before our new start index and previous start index from hashmap (in this case our previous start index was 0 and new start index which we calculated a just before this step is 1) since they are no longer in our new window, so we remove 'a' from hashmap.
Add the current character 'a' to map along with its index to hashmap. Now our diagram looks as shown below:
Increment end pointer, s[end] = 'b'.
Again 'b' is present in map. Update result = max(result , length of substring from index 1 to 3) = max(result , length of substring "bca") = max(3,3) = 3.
Delete previous window characters, i.e. delete 'b' from hashmap.
Update start = map['b']+1 = 1 + 1 = 2 and update hashmap, map['b'] = 4 for current character.
Again we increment end pointer, end = 5 which is character 'c' which is a duplicate.
We repeat the steps of calculate result, removing previous window characters from map, updating the start pointer and adding the current character to map.
result = max(result , length of substring from index 2 to 4) = max(result , length of substring "cab") = max(3,3) = 3.
start = map['c'] + 1 = 2+1 = 3.
Remove character 'c':2 from previous window.
Add the current character 'c': 5 to our hashmap. Now our diagram looks as below:
Increment end pointer again, now end = 6, which is character 'b'. 'b' is also present in our map(we found it earlier at 4 index in our string s).
result = max(result , length of substring from index 3 to 5) = max(result , length of substring "abc") = max(3,3) = 3.
start = map['b'] + 1 = 4+1 = 5 and remove all characters before our new start ('c':5, 'a':3 and 'b':4) from our map. Also we update the current character 'b':6 into map.
Finally we increment end again, our new end = 7. Again 'b' is present in hashmap.
result = max(result , length of substring from index 5 to 6) = max(result , length of substring "cb") = max(3,2) = 3 and start = map['b'] + 1 = 6+1 = 7. Add 'b':7 to map.
The end has now reached the end of the string s, so we terminate the loop and return the result.
Note: The order in which elements are stored in the map does not matter.
Code
Language: Go
Complexity Analysis
Time Complexity: O(n)
We check each character in the string only once. Yes we would also have to delete the elements from the hashmap once a duplicate character is found, but this is a constant time operation because at max the map can store only 62 elements in our case before a duplicate is found.
Space Complexity: O(1)
In the worst case we would have to store all n characters of the string s in hashmap, but this is a constant value which at max can go up to O(62) as explained in solution 1.
That is all for this article, thank you for taking your time to read this. If you have any questions or doubts, please let us know in the comments section below, we will be happy to answer you.
If you found this article useful, do not forget to subscribe to our website, your support motivates us to bring out more such articles in future (scroll down to the bottom of the page to find the subscription form).
You can explore more such amazing articles from code recipe in our blogs section.
Code Recipe Limited Time Offer: Get 100% discount on Code Recipe Membership Plan. Join now and get exclusive access to premium content for free. Hurry! Offer only available for a limited time. Join now.
Hello Everyone,
Code Recipe is now on YouTube! For videos on latest topic visit our YouTube channel: Code Recipe
https://www.youtube.com/channel/UC9qXo8tTfbXLVQFbc93fiBg
Do not forget to subscribe to our channel if you find the videos useful. Your support means a lot to us!
Happy Learning. Ba bye! 😊
Hey This part here: "Another important step here is to remove all characters from the previous window from our hashmap.." is misleading I think, since (as you explain later in the article) we only have to remove the characters from the window that are leading up to 'start' (including start), and not all characters from the window. Just like you said: "Using this info we discard all the characters before the first occurrence of the duplicate including the first occurrence". Cheers, Rayaqin
Hi Code Recipe, so I'm curious how this will work in a situation like this: "wapwkew" the longest substring is "apwke", and from the explanation, I don't get how this will get "apwke". Kindly explain. Thanks
implementation with c++
class Solution {
public:
int ct[266];
bool isok(){
for(int i=0;i<266;i++){
if(ct[i]>1){
return false;
}
}
return true;
}
int lengthOfLongestSubstring(string s) {
for(int i=0;i<266;i++){
ct[i]=0;
}
int ans=0,lt=0,rt=0,n=s.size();
while(lt<=rt&&rt<n){
if(isok()){
ct[s[rt]]++;
ans=max(ans,rt-lt);
rt++;
}else{
ct[s[lt++]]--;
}
}
if(isok())
ans=max(ans,rt-lt);
return ans;
}
};
Rust implementation (for someone care =)) ) use std::cmp::max;
use std::collections::HashMap;
fn longest_substring_without_repeat_character(str: String) -> i32 {
let mut seen = HashMap::new();
let mut longest = 0_i32;
let mut left = 0_usize;
for (idx, c) in str.char_indices() {
match seen.insert(c, idx) {
Some(found_index) => {
if found_index >= left {
left = found_index + 1;
} else {
longest = max(longest, idx as i32 - left as i32 + 1);
}
}
None => longest = max(longest, idx as i32 - left as i32 + 1),
}
}
longest
}