Finding character frequency using Python – Dictionary

As with word frequency, we sometimes need to find the character frequency in a text file containing a corpus. The script below does just that.

#open a file and find its character frequency
with open('raw_corpus.txt') as fp:
    lines = fp.read().split("\n")   #lines is a list holding each line of the file

#line counter (used only by the commented-out debug print below)
i = 1

#dictionary to save characters as keys and their frequencies as values
charcount={}


#to access file contents line by line
for line in lines:

    #convert to lowercase
    lower_line = line.lower()

    chars = lower_line

    #for loop to access current line characters 
    for char in chars:
        if char not in charcount:
            charcount[char] = 1
        else:
            charcount[char] += 1
    
    #print(i, "\t", lower_line)
    
    i = i + 1                 #increment i 


#print the dictionary with sorted keys(tokens) and values
for k in sorted(charcount):
    print (k, charcount[k])

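This script opens the file ‘raw_corpus.txt’, reads its contents and splits them into lines, then counts how often each character occurs and stores the counts in a dictionary. Because every line is converted to lowercase first, ‘A’ and ‘a’ are counted as the same character.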
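The same counting can also be sketched more compactly with collections.Counter from the standard library. This is an alternative to the dictionary loop in the script, not a replacement for it; the function name char_frequency and the sample string are assumptions made for illustration, and the sample stands in for the contents of the corpus file.

```python
from collections import Counter


def char_frequency(text):
    """Count lowercase characters in text, skipping newline characters.

    Skipping "\n" mirrors the original script, which splits the file
    on newlines before counting.
    """
    return Counter(ch for ch in text.lower() if ch != "\n")


# Hypothetical sample text standing in for the contents of raw_corpus.txt
sample = "Hello\nWorld"
counts = char_frequency(sample)

# Print characters in sorted order with their counts
for ch in sorted(counts):
    print(ch, counts[ch])
```

To use it on a real file, read the file contents with open(...).read() and pass the resulting string to char_frequency.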

A dictionary in Python is similar to a hash in Perl. It stores one value per key, so storing a value under a key that already exists overwrites the previous value.
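A short sketch of that overwrite behaviour, using made-up keys and values. It also shows dict.get with a default value, which is one common way to increment a count without the if/else test.

```python
charcount = {}

charcount["a"] = 1   # new key: the value is stored
charcount["a"] = 5   # same key again: the old value 1 is replaced by 5
charcount["b"] = 2   # a different key gets its own entry

# The dictionary holds one entry per key, so there are two entries here
print(len(charcount))

# get() returns the current count, or 0 if the key is missing,
# so this increments "a" from 5 to 6
charcount["a"] = charcount.get("a", 0) + 1
print(charcount["a"])
```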
