Extract Substring from a String in R using substring()
In R, you can extract a substring from a string using the base R substring() function. This function is useful when you know the starting and ending character positions of the text you want to extract.
R string positions are 1-based. This means the first character is at position 1, the second character is at position 2, and so on. The ending position is included in the extracted result.
Syntax of substring() in R
The syntax of the substring() function in R is:
substring(text, first, last)
where:
- text is the string (or character vector) to extract from
- first is the starting position of the substring (in the main string) to be extracted
- last is the ending position of the substring (in the main string) to be extracted. last is optional; it defaults to 1000000L, so when it is omitted the extraction runs to the end of the string.
The value returned by substring() is a character vector. If you pass a single string with a single first and last position, the result is one extracted string. If you pass multiple strings, R applies the extraction to each element.
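The return type described above can be checked with a short sketch. The variable names single and many, and the sample strings, are chosen here for illustration only.

```r
# One input string gives a character vector of length 1
single <- substring("TutorialKart", 1, 8)
print(is.character(single))  # TRUE
print(length(single))        # 1

# A vector of strings gives one result per element
many <- substring(c("alpha", "beta"), 2, 3)
print(many)                  # "lp" "et"
```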
Example 1 – Get Substring from a String in R
In this example, we take a string and extract a substring given its first and last positions.
r_strings_substring.R
# Example R program to find substring
str = 'Hello World! Welcome to learning R programming language'
# substring(text, first, last)
subStr = substring(str, 13, 23)
print (subStr)
Output
$ Rscript r_strings_substring.R
[1] " Welcome to"
Here, the substring starts from character position 13 and ends at character position 23. Because R includes both positions, the extracted text contains all characters from position 13 through position 23.
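One way to confirm which characters sit at positions 13 and 23 is to split the string into single characters and index them. This is a sketch for checking positions, not part of the original program; the variable chars is introduced here for illustration.

```r
str <- "Hello World! Welcome to learning R programming language"

# Split into individual characters so each position can be inspected
chars <- strsplit(str, "")[[1]]
print(chars[13])  # " " (the space after "World!")
print(chars[23])  # "o" (the last letter of "to")

# Joining positions 13 through 23 reproduces the substring() result
rebuilt <- paste(chars[13:23], collapse = "")
print(rebuilt == substring(str, 13, 23))  # TRUE
```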
Example 2 – Get Substring from a String in R – Without “last”
In this example, we take a string and extract a substring given only the first position. If the last position is not given, the end of the string is used as last.
r_strings_substring.R
# Example R program to find substring
str = 'Hello World! Welcome to learning R programming language'
# substring(text, first): last is omitted, so extraction runs to the end of the string
subStr = substring(str, 13)
print (subStr)
Output
$ Rscript r_strings_substring.R
[1] " Welcome to learning R programming language"
Since the last argument is not supplied, R extracts characters from position 13 to the end of the string.
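As a quick check, omitting last should give the same result as passing nchar(str) explicitly. The sketch below compares the two calls; the variable names are illustrative.

```r
str <- "Hello World! Welcome to learning R programming language"

to_end_default  <- substring(str, 13)              # last omitted
to_end_explicit <- substring(str, 13, nchar(str))  # last set to the string length

print(identical(to_end_default, to_end_explicit))  # TRUE
```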
Extract First N Characters from a String in R
To extract the first few characters of a string, set first to 1 and set last to the number of characters required.
Example
str <- "TutorialKart"
result <- substring(str, 1, 8)
print(result)
Output
[1] "Tutorial"
Extract Last N Characters from a String in R
To extract characters from the end of a string, use nchar() to find the string length. Then calculate the starting position from that length.
Example
str <- "TutorialKart"
last_three <- substring(str, nchar(str) - 2, nchar(str))
print(last_three)
Output
[1] "art"
In this example, nchar(str) returns the length of the string. The expression nchar(str) - 2 gives the starting position for the last three characters.
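The same calculation can be wrapped in a small helper. The function last_n() below is a hypothetical name, not a base R function; it is a minimal sketch that assumes n is a non-negative whole number and uses pmax() so the starting position never drops below 1 when a string is shorter than n.

```r
# Hypothetical helper: return the last n characters of each string in x
last_n <- function(x, n) {
  len <- nchar(x)
  # Start n - 1 characters before the end, but never before position 1
  substring(x, pmax(len - n + 1, 1), len)
}

print(last_n("TutorialKart", 3))   # "art"
print(last_n(c("R", "Kart"), 3))   # "R" "art"
```

For a string shorter than n, this sketch returns the whole string rather than raising an error.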
Extract Substrings from Multiple Strings in R
The substring() function can also work with a character vector. In the following example, the same character positions are extracted from each string in the vector.
names <- c("RLanguage", "PythonCode", "JavaScript")
result <- substring(names, 1, 4)
print(result)
Output
[1] "RLan" "Pyth" "Java"
This is useful when the same substring position must be extracted from many values, such as fixed-width codes, prefixes, or labels.
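As an illustration of the fixed-width case, suppose each code holds a two-letter country, a four-digit year, and a batch label. The code format and the values below are made up for this sketch.

```r
# Made-up fixed-width codes: 2-letter country, 4-digit year, batch label
codes <- c("IN2023A01", "US2024B17")

country <- substring(codes, 1, 2)
year    <- substring(codes, 3, 6)
batch   <- substring(codes, 7)     # from position 7 to the end

print(country)  # "IN" "US"
print(year)     # "2023" "2024"
print(batch)    # "A01" "B17"
```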
Extract Different Character Ranges from the Same String
You may also pass vectors to the first and last arguments. This lets you extract more than one character range from the same string.
str <- "ABCDEFGHIJ"
result <- substring(str, first = c(1, 4, 7), last = c(3, 6, 10))
print(result)
Output
[1] "ABC" "DEF" "GHIJ"
The first substring is taken from positions 1 to 3, the second from positions 4 to 6, and the third from positions 7 to 10.
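The start positions do not have to be typed by hand. The sketch below builds them with seq() to split a string into chunks of a fixed size; the variable names size, starts, and chunks are illustrative.

```r
str  <- "ABCDEFGHIJ"
size <- 3

# Start positions 1, 4, 7, 10; each chunk ends size - 1 characters later
starts <- seq(1, nchar(str), by = size)
chunks <- substring(str, starts, starts + size - 1)

print(chunks)  # "ABC" "DEF" "GHI" "J"
```

The last chunk is shorter because an end position beyond the string length stops at the final character.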
substring() vs substr() in R
Base R also provides substr(). Both substring() and substr() can extract text using character positions. The commonly used syntax for substr() is:
substr(x, start, stop)
For simple extraction with known start and end positions, the two functions often look similar in use.
str <- "R Programming"
print(substring(str, 3, 13))
print(substr(str, 3, 13))
Output
[1] "Programming"
[1] "Programming"
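Use substring() when you want to omit the end position or pass vectors of first and last positions to extract several ranges from one string. Use substr() for a single start and stop per string; its stop argument is required, and its result always has the same length as x.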
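The two functions differ when the position arguments are longer than the input. The sketch below passes the same position vectors to both; the variable names are illustrative.

```r
str <- "ABCDEFGHIJ"

# substring() returns one result per position pair
from_substring <- substring(str, c(1, 4, 7), c(3, 6, 10))
print(from_substring)  # "ABC" "DEF" "GHIJ"

# substr() returns one result per element of x, so only the first pair is used
from_substr <- substr(str, c(1, 4, 7), c(3, 6, 10))
print(from_substr)     # "ABC"
```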
Extract Substring by Pattern in R
The substring() function is position-based. If you need to extract text based on a pattern, use regular expression functions such as regexpr() and regmatches(), or use a string package function designed for pattern matching.
Example: extract the number from a string
str <- "Order ID: 4589"
match_position <- regexpr("[0-9]+", str)
result <- regmatches(str, match_position)
print(result)
Output
[1] "4589"
Choose substring() when character positions are known. Choose regular expressions when the required text changes position but follows a recognizable pattern.
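regexpr() finds only the first match in each string. When a string may contain several matches, gregexpr() can be used with regmatches() instead. The sample string below is made up for this sketch.

```r
str <- "Orders 4589, 4590 and 4601"

# gregexpr() locates every match; regmatches() returns a list with one element per input string
all_numbers <- regmatches(str, gregexpr("[0-9]+", str))[[1]]
print(all_numbers)  # "4589" "4590" "4601"
```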
Common Mistakes while Extracting Substrings in R
- Starting from zero: R uses position 1 for the first character, not position 0.
- Forgetting that last is included: substring(str, 1, 4) extracts four characters, not three.
- Using substring() for unknown positions: If the substring must be found by a pattern, use regular expressions instead.
- Ignoring spaces: Spaces count as characters, so they affect the first and last positions.
- Not checking string length: Use nchar() when extracting text from the end of a string.
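Several of these mistakes produce no error, which makes them easy to miss. The sketch below shows what R returns in each case; the sample string and variable names are illustrative.

```r
word <- "Tutorial Kart"   # 13 characters, including the space at position 9

from_zero  <- substring(word, 0, 3)   # a start below 1 is treated as 1
four_chars <- substring(word, 1, 4)   # positions 1, 2, 3 and 4
with_space <- substring(word, 8, 10)  # the space counts as a character
past_end   <- substring(word, 20)     # start beyond the end of the string

print(from_zero)   # "Tut"
print(four_chars)  # "Tuto"
print(with_space)  # "l K"
print(past_end)    # ""
```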
Frequently Asked Questions on R substring()
How do I extract a substring from a string in R?
Use substring(string, first, last). The first argument gives the starting character position, and last gives the ending character position.
Does R substring() start counting from 0 or 1?
R starts counting string positions from 1. The first character of a string is at position 1.
How do I extract text from a position to the end of a string in R?
Omit the last argument. For example, substring(str, 5) extracts text from position 5 to the end of str.
How do I extract the last characters of a string in R?
Use nchar() to calculate the string length. For example, substring(str, nchar(str) - 2, nchar(str)) extracts the last three characters.
Should I use substring() or regular expressions in R?
Use substring() when you know the exact character positions. Use regular expressions when the text must be extracted based on a pattern.
QA Checklist for R Substring Examples
- Confirm that every substring() example uses 1-based character positions.
- Check that the expected output includes the ending position supplied in last.
- Verify that spaces inside strings are counted as characters.
- Use nchar() when explaining examples that extract characters from the end.
- Use regex examples only for pattern-based extraction, not for simple fixed-position substrings.
Conclusion: Extract Substring from a String in R
In this R tutorial, we learned how to extract a substring from a string in R using the substring() function.
Use substring() for fixed character positions, nchar() when working from the end of a string, and regular expressions when the required substring must be found by a pattern.
TutorialKart.com