Strings


Replace substring

replace all characters "a" with "x"

"a1 a2 a3 a4".replace("a","x")

x1 x2 x3 x4

replace file ending ".txt" with ".csv"

filename = "mydata.txt"

newFilename = filename.replace(".txt",".csv")

mydata.csv

remove file ending ".txt"

filename = "mydata.txt"

rawname = filename.replace(".txt","")

mydata
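
Note that replace() substitutes every occurrence of ".txt", not only the one at the end of the name. A minimal sketch of a stricter alternative, assuming the standard-library function os.path.splitext() is acceptable here; the filename is made up for illustration:

```python
import os.path

# Hypothetical filename in which ".txt" also occurs in the middle
filename = "notes.txt.backup.txt"

# replace() removes every occurrence, not just the file ending
stripped_everywhere = filename.replace(".txt", "")

# os.path.splitext() splits off only the last extension
rawname, ending = os.path.splitext(filename)

print(stripped_everywhere)  # notes.backup
print(rawname)              # notes.txt.backup
print(ending)               # .txt
```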

Split and merge text strings

split

>>> 'a1,a2,a3'.split(',')

['a1', 'a2', 'a3']

get filename (get first list element)

>>> 'filename.txt.bz2'.split('.')[0]

'filename'

split() without an argument splits on any run of whitespace characters (space, tab \t, newline \n, carriage return \r, ...)
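
The difference between the default and an explicit separator can be sketched as follows; the sample text is made up for illustration:

```python
text = "a1  a2\ta3\na4"   # two spaces, a tab, and a newline as separators

# No argument: split on any run of whitespace, no empty strings in the result
words = text.split()

# Explicit single space: every space is a separator, so empty strings can appear
by_space = text.split(" ")

print(words)     # ['a1', 'a2', 'a3', 'a4']
print(by_space)  # ['a1', '', 'a2\ta3\na4']
```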


join - to concatenate a list to a string

>>> ','.join(['a1', 'a2', 'a3']) # comma separated

'a1,a2,a3'

>>> ' '.join(['a1', 'a2', 'a3']) # space separated

'a1 a2 a3'

Or simply use + to concatenate two or three words. For many words, join() is the better choice.

>>> 'a1' + ' ' + 'a2' + ' ' + 'a3'

'a1 a2 a3'
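
join() accepts only strings, so a list of numbers has to be converted first. A small sketch with made-up values:

```python
values = [4, 1, 3, 2]

# join() raises a TypeError for non-string items, so convert each item first
line = ",".join(str(v) for v in values)

print(line)  # 4,1,3,2
```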


# add prefix, only if not already present

s = 'hello world'

prefix = 'hello'

if not s.startswith(prefix):

    s = prefix + ' ' + s

'hello world'
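
The same check can be wrapped in a small helper. This is only a sketch; the function name ensure_prefix and its sep parameter are hypothetical, not part of the standard library:

```python
def ensure_prefix(s, prefix, sep=" "):
    """Return s with prefix in front, unless s already starts with it."""
    if s.startswith(prefix):
        return s
    return prefix + sep + s

already_there = ensure_prefix("hello world", "hello")
added = ensure_prefix("world", "hello")

print(already_there)  # hello world
print(added)          # hello world
```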


Sort and convert text strings

length of string

>>> len('hello')

5

sort a list, ignoring upper/lower case

>>> sorted(['C', 'b', 'd','A'], key=str.lower)

['A', 'b', 'C', 'd']

sort list in descending order

>>> sorted([4, 1, 3, 2], reverse=True)

[4, 3, 2, 1]
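
The key argument also helps when numbers are stored as strings. A short sketch with made-up values, showing that a plain string sort compares character by character:

```python
numbers_as_text = ['10', '9', '100', '1']

# Plain string sort compares character by character
as_text = sorted(numbers_as_text)

# key=int compares by numeric value; the items themselves stay strings
as_numbers = sorted(numbers_as_text, key=int)

print(as_text)     # ['1', '10', '100', '9']
print(as_numbers)  # ['1', '9', '10', '100']
```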


convert strings to numbers (and numbers back to strings)

>>> int('23')

23

>>> float('23.1')

23.1

>>> str(23)

'23'

see also: format()

format('hello','>20')

'               hello'
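
A few more format specifications, as a brief sketch; the widths and the sample number are chosen only for illustration:

```python
word = 'hello'

right = format(word, '>10')       # right-aligned in 10 characters
left = format(word, '<10')        # left-aligned in 10 characters
centered = format(word, '^11')    # centered in 11 characters
number = format(3.14159, '.2f')   # fixed point, two decimal places

print(repr(right))     # '     hello'
print(repr(left))      # 'hello     '
print(repr(centered))  # '   hello   '
print(number)          # 3.14
```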

convert all strings in a list into int numbers

s = ['4', '1', '3', '2']

n = [int(x) for x in s]

[4, 1, 3, 2]
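
An equivalent way to write the same conversion uses the built-in map(); which form to prefer is mostly a matter of taste:

```python
s = ['4', '1', '3', '2']

# map() applies int to every item; list() collects the results
n = list(map(int, s))

print(n)  # [4, 1, 3, 2]
```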

see also: List Comprehension

Search for substring

Check presence of substring

'day' in 'Friday'

True

get index location of substring

'Friday'.index('day')

3
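
index() raises a ValueError when the substring is missing, while find() returns -1 instead. A short sketch of both behaviours:

```python
word = 'Friday'

found = word.find('day')     # position of the first match
missing = word.find('xyz')   # -1 when there is no match

try:
    word.index('xyz')
    index_failed = False
except ValueError:
    index_failed = True      # index() raises instead of returning -1

print(found)         # 3
print(missing)       # -1
print(index_failed)  # True
```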

find all locations of a character (e.g., all positions of the letter 'l' in the word 'hello', as 0-based indices)

s='hello'

[idx for idx, letter in enumerate(s) if letter == 'l']

[2, 3]
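
The list comprehension above compares single characters only. For substrings of any length, one possible approach is to call str.find() repeatedly; the helper name find_all is hypothetical:

```python
def find_all(text, sub):
    """Return the 0-based start index of every occurrence of sub (overlaps included)."""
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # continue searching one character after the last match
        start = text.find(sub, start + 1)
    return positions

print(find_all('hello', 'l'))       # [2, 3]
print(find_all('abcabcabc', 'bc'))  # [1, 4, 7]
print(find_all('aaaa', 'aa'))       # [0, 1, 2]
```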

check for any substring present in a string (e.g., in filename)

filename = 'sample.fastq.gz'

any(seqType in filename for seqType in ['.fasta','.fa','.fastq','.fq'])

True
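
The in test matches anywhere in the filename. When only the ending should count, endswith() with a tuple of candidates is an alternative. A sketch with made-up filenames:

```python
seq_endings = ('.fasta', '.fa', '.fastq', '.fq')

# 'in' matches anywhere in the name, so this unrelated file is accepted
anywhere = any(e in 'my.fasta_notes.txt' for e in seq_endings)

# endswith() accepts a tuple and checks only the end of the string
notes_by_ending = 'my.fasta_notes.txt'.endswith(seq_endings)
fastq_by_ending = 'sample.fastq'.endswith(seq_endings)

print(anywhere)         # True
print(notes_by_ending)  # False
print(fastq_by_ending)  # True
```

A compressed name such as 'sample.fastq.gz' does not end in any of these endings, so the in test remains the simpler choice for that case.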


Lists

find a substring in list items

dates = ['May 2015','January 2012','December 2015','June 2014']

dates2015 = [s for s in dates if '2015' in s]

['May 2015', 'December 2015']

check if a substring is present in any list item

dates = ['May 2015','January 2012','December 2015','June 2014']

if any('2015' in s for s in dates):

    print('yes, 2015 is included')

read more:

→ Regular expression operations

→ Sorting (wiki.python.org)