Replace substring

replace all occurrences of "a" with "x"

"a1 a2 a3 a4".replace("a","x")

   x1 x2 x3 x4

replace file ending ".txt" with ".csv"

filename    = "mydata.txt"

newFilename = filename.replace(".txt",".csv")

  mydata.csv

remove file ending ".txt"

filename = "mydata.txt"

rawname  = filename.replace(".txt","")

  mydata
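Note that replace() substitutes every occurrence of the substring, not only the one at the end. A minimal sketch of an alternative, assuming the goal is to change only the final extension; the file name is a made-up example:

```python
import os.path

filename = "my.txt.backup.txt"   # hypothetical name containing ".txt" twice

# replace() changes every occurrence of ".txt"
replaced = filename.replace(".txt", ".csv")   # 'my.csv.backup.csv'

# os.path.splitext() splits off only the last extension
rawname, ext = os.path.splitext(filename)     # ('my.txt.backup', '.txt')
new_filename = rawname + ".csv"               # 'my.txt.backup.csv'
```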

Split and merge text strings 

split

>>>   'a1,a2,a3'.split(',') 

['a1', 'a2', 'a3']

get filename (get first list element)

>>>  'filename.txt.bz2'.split('.')[0] 

'filename'

By default, split() splits on any run of whitespace characters (space, tab \t, newline \n, carriage return \r, ...)
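A short sketch of how the default differs from an explicit separator, and how the number of splits can be limited; the sample strings are only illustrations:

```python
text = 'a1  a2\ta3\n'

# no argument: split on any run of whitespace, no empty strings in the result
words = text.split()            # ['a1', 'a2', 'a3']

# explicit separator: every single space is a split point
parts = 'a1  a2'.split(' ')     # ['a1', '', 'a2']

# rsplit() with maxsplit=1 splits only once, starting from the right
name, ext = 'filename.txt.bz2'.rsplit('.', 1)   # 'filename.txt', 'bz2'
```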


join  -  to concatenate a list to a string

>>> ','.join(['a1', 'a2', 'a3'])  #  comma separated

'a1,a2,a3'

>>> ' '.join(['a1', 'a2', 'a3'])  #  space separated

'a1 a2 a3'

Or simply use  +  to concatenate 2 or 3 words. For many words, join() is the better choice.

>>>  'a1' + ' ' + 'a2' + ' ' + 'a3' 

'a1 a2 a3'
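join() accepts only strings, so other types have to be converted first. A minimal sketch, with made-up sample values, that also shows the usual pattern of collecting parts in a list and joining once at the end:

```python
numbers = [4, 1, 3, 2]

# join() accepts only strings, so convert each item first
line = ','.join(str(n) for n in numbers)   # '4,1,3,2'

# build a longer string piece by piece: collect parts, join once at the end
parts = []
for i in range(1, 4):
    parts.append('a' + str(i))
joined = ' '.join(parts)                   # 'a1 a2 a3'
```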


# add prefix, only if not already present

s = 'hello world'

prefix = 'hello'

if not s.startswith(prefix):

    s = prefix + ' ' + s

'hello world'
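The same check can be wrapped in a small helper. This is only a sketch; the function name add_prefix is hypothetical:

```python
def add_prefix(s, prefix):
    """Return s with prefix prepended, unless s already starts with it."""
    if s.startswith(prefix):
        return s
    return prefix + ' ' + s

unchanged = add_prefix('hello world', 'hello')   # 'hello world'
extended = add_prefix('world', 'hello')          # 'hello world'
```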


Sort and convert text strings 

length of string

>>>  len('hello') 

5

sort list, ignoring case

>>>  sorted(['C', 'b', 'd','A'], key=str.lower) 

['A', 'b', 'C', 'd']

sort list in descending order

>>>  sorted([4, 1, 3, 2], reverse=True) 

[4, 3, 2, 1]
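sorted() returns a new list and leaves the original untouched, while the list method sort() reorders the list in place. A brief sketch with an arbitrary word list, also showing another key function:

```python
words = ['pear', 'Fig', 'banana', 'kiwi']

# sort by string length; items of equal length keep their original order
by_length = sorted(words, key=len)   # ['Fig', 'pear', 'kiwi', 'banana']

# sort() changes the list itself and returns None
result = words.sort(key=str.lower)   # words is now ['banana', 'Fig', 'kiwi', 'pear']
```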


convert strings to numbers (and back to a string)

>>>   int('23') 

23

>>>  float('23.1') 

23.1

>>>  str(23) 

'23'

see also: format()

format('hello','>20')

'               hello'
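A few more format specifications, as a quick sketch; the values are arbitrary:

```python
left   = format('hello', '<10')    # 'hello     '   left-aligned, width 10
center = format('hello', '^11')    # '   hello   '  centered, width 11
price  = format(23.14159, '.2f')   # '23.14'        two decimal places
padded = format(23, '05d')         # '00023'        zero-padded, width 5

# the same specs work inside f-strings (Python 3.6+)
value = 23.14159
msg = f'{value:.2f}'               # '23.14'
```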

convert all strings in a list into int numbers

s = ['4', '1', '3', '2']

n = [int(x) for x in s]

[4, 1, 3, 2]

see also: List Comprehension
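int() raises a ValueError for text that is not a valid integer. A minimal sketch of a tolerant conversion, assuming invalid items should simply be skipped; the helper name to_int_list is hypothetical:

```python
def to_int_list(strings):
    """Convert strings to ints, skipping items that are not valid integers."""
    result = []
    for x in strings:
        try:
            result.append(int(x))
        except ValueError:
            pass   # e.g. 'n/a' or '2.5' cannot be parsed by int()
    return result

n = to_int_list(['4', 'n/a', '3', '2.5', ' 7 '])   # [4, 3, 7]
```

Surrounding whitespace is accepted by int(), which is why ' 7 ' is converted.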

Search for substring

Check presence of substring

'day' in 'Friday' 

True

get index location of substring

'Friday'.index('day')

3
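index() raises a ValueError if the substring is missing, whereas find() returns -1 instead. A short sketch of both behaviours:

```python
pos = 'Friday'.find('day')         # 3, same result as index()
missing = 'Friday'.find('night')   # -1, substring not present

try:
    'Friday'.index('night')
    found = True
except ValueError:
    found = False                  # index() raised ValueError
```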

find all locations of a character (e.g., all positions of the letter 'l' in the word 'hello', as 0-based indices)

s='hello'

[idx for idx, letter in enumerate(s) if letter == 'l']

[2, 3]
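The enumerate() approach compares single characters only. For substrings of any length, one possible sketch uses repeated calls to find(); the helper name find_all is hypothetical:

```python
def find_all(s, sub):
    """Return start indices of all (possibly overlapping) occurrences of sub in s."""
    positions = []
    start = s.find(sub)
    while start != -1:
        positions.append(start)
        start = s.find(sub, start + 1)   # continue searching after the last hit
    return positions

multi = find_all('banana', 'ana')   # [1, 3]
single = find_all('hello', 'l')     # [2, 3]
```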

check whether any of several substrings is present in a string (e.g., in a filename)

filename = 'sample.fastq.gz'

any(seqType in filename for seqType in ['.fasta','.fa','.fastq','.fq'])

True
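The in test matches the substring anywhere in the name. If only the file ending should count, endswith() accepts a tuple of suffixes. A minimal sketch, assuming an optional '.gz' ending should be ignored; the second file name is a made-up counterexample:

```python
seq_endings = ('.fasta', '.fa', '.fastq', '.fq')

def is_sequence_file(filename):
    """True if filename ends with a sequence extension, ignoring a trailing '.gz'."""
    if filename.endswith('.gz'):
        filename = filename[:-3]   # drop the 3 characters of '.gz'
    return filename.endswith(seq_endings)

is_seq = is_sequence_file('sample.fastq.gz')       # True
not_seq = is_sequence_file('notes.fastq.txt')      # False, although '.fastq' occurs in the name
```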


Lists

find a substring in list items

dates = ['May 2015','January 2012','December 2015','June 2014']

dates2015 = [s for s in dates if '2015' in s]

['May 2015', 'December 2015']

check if a substring is present in any list item

dates = ['May 2015','January 2012','December 2015','June 2014']

if any('2015' in s for s in dates):

    print('yes, 2015 is included')

read more:

→ Regular expression operations

→ Sorting (wiki.python.org)