This graph shows the frequency distribution of the 500 most frequent 2-grams (bigrams) in Turkish. I have included the data for the top 100 2-grams below. Email me for the complete list.
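Turkish is an agglutinative language and relies heavily on suffixes, so I expected the top 2-grams to correspond to the most frequently used suffixes. The data below confirms this: the top entries include "ar", "la", "er" and "le", which make up the plural suffixes -lar/-ler, and "de" and "da", which are the locative suffixes. I will continue on this subject in another post.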
Notes:
- Frequencies are computed from Hurriyet and Zaman newspapers using columnist articles between 2001 and 2011.
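For readers who want to reproduce this kind of count on their own text, here is a minimal sketch in Python. It is not the script I used for the numbers in this post. It assumes that 2-grams are counted inside words only (never across spaces or punctuation) and that text is lowercased with Turkish rules first. The function names turkish_lower, bigram_frequencies and top_bigrams are made up for this illustration.

```python
import re
from collections import Counter

# Turkish-specific lowercasing: plain str.lower() turns "I" into "i",
# but in Turkish "I" lowercases to "ı" and "İ" lowercases to "i".
_TR_LOWER = str.maketrans({"I": "ı", "İ": "i"})


def turkish_lower(text):
    """Lowercase text using the Turkish dotted/dotless i rules."""
    return text.translate(_TR_LOWER).lower()


def bigram_frequencies(text):
    """Return {bigram: percentage of all bigrams} for letter pairs inside words."""
    counts = Counter()
    # Assumption: a word is a run of letters, and bigrams never span word boundaries.
    for word in re.findall(r"[^\W\d_]+", turkish_lower(text)):
        for i in range(len(word) - 1):
            counts[word[i:i + 2]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bigram: 100.0 * n / total for bigram, n in counts.items()}


def top_bigrams(text, k=100):
    """Return the k most frequent bigrams as (bigram, percentage) pairs."""
    freqs = bigram_frequencies(text)
    # Sort by descending frequency; break ties alphabetically for stable output.
    return sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

Calling top_bigrams(corpus_text, k=100) on a large enough corpus should give a ranking comparable in shape to the table below, though the exact percentages depend on the corpus and on the counting assumptions above.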
Data
Top 100 2-gram frequencies in Turkish (ranked down each column, left to right)
ar 2.196% ni 0.764% lı 0.564% rd 0.419%
la 2.006% ta 0.764% ha 0.554% ur 0.406%
an 1.933% ek 0.741% na 0.546% ru 0.402%
er 1.888% el 0.737% bu 0.545% iz 0.400%
in 1.734% ay 0.733% mi 0.544% ği 0.386%
le 1.727% et 0.712% at 0.540% ür 0.380%
de 1.539% iy 0.707% ad 0.525% nu 0.380%
en 1.350% ne 0.706% im 0.514% rl 0.375%
ın 1.336% ol 0.701% em 0.505% ey 0.374%
da 1.304% rı 0.686% nl 0.499% lm 0.372%
ya 1.188% nı 0.684% dı 0.494% iş 0.360%
ir 1.179% si 0.680% es 0.480% az 0.359%
ma 1.174% yo 0.677% ge 0.477% ce 0.358%
bi 1.107% ki 0.670% on 0.476% ık 0.350%
il 1.074% te 0.664% aş 0.472% be 0.349%
ka 1.021% am 0.650% ik 0.467% ul 0.338%
ra 0.974% sa 0.640% ıl 0.459% rk 0.330%
ri 0.951% ti 0.639% ed 0.450% ca 0.328%
ak 0.949% ye 0.638% tı 0.445% st 0.321%
nd 0.949% re 0.638% se 0.436% ld 0.319%
al 0.938% as 0.632% ün 0.435% du 0.313%
li 0.899% ba 0.628% is 0.432% lu 0.311%
di 0.860% ve 0.594% ke 0.430% ğı 0.309%
me 0.850% un 0.590% kl 0.428% gi 0.301%
or 0.815% sı 0.579% ır 0.424% mı 0.301%