Uiser:Illandancient/21t Scots Word Frequency

After finding weaknesses in the SCOTS Scottish Corpus, I have tried to pull together my own corpus of contemporary Scots texts (that is, only work published after the year 2000) from various dialects.

It's not much of a corpus, to be honest: only 195,000 or so words in total, drawn from poetry, newspaper columns, books and Scots websites such as Bella Caledonia and the Scots Language Centre. In terms of dialect word counts, there are 105,000 from Doric sources, 72,500 from Lallans, 5,643 from Orkney, 7,174 from Shetland and 4,827 from Southern.
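
As a quick check on those figures, the sketch below adds up the five dialect counts and works out each dialect's share of the whole. The numbers are the ones quoted above (the Doric and Lallans figures look rounded, so the total is approximate); the names `dialect_words` and `corpus_share` are made up for this illustration.

```python
# Dialect word counts as quoted in the paragraph above.
# The Doric and Lallans figures appear to be rounded.
dialect_words = {
    "Doric": 105_000,
    "Lallans": 72_500,
    "Orkney": 5_643,
    "Shetland": 7_174,
    "Southern": 4_827,
}

# Sum of the five figures: the "195,000 or so" total.
total = sum(dialect_words.values())


def corpus_share(counts):
    """Return each dialect's share of the corpus as a percentage."""
    grand_total = sum(counts.values())
    return {name: 100 * n / grand_total for name, n in counts.items()}


for name, pct in corpus_share(dialect_words).items():
    print(f"{name:<9}{dialect_words[name]:>8,}{pct:>7.1f}%")
print(f"{'Total':<9}{total:>8,}")
```

On these figures Doric and Lallans together make up the great majority of the corpus, which is worth bearing in mind when reading the raw counts in the table.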

But I'm adding more texts as I find them.
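
For anyone wanting to build a similar table, here is a minimal sketch of per-dialect counting that can be updated as new texts are added. It is not necessarily how the table below was made. The tokenisation rule (lower-case everything, keep apostrophes inside or at the end of a word) is an assumption, the function names are hypothetical, and the two sample sentences are invented placeholders rather than texts from the corpus.

```python
import re
from collections import Counter, defaultdict

DIALECTS = ["Doric", "Lallans", "Orkney", "Shetland", "Southern"]

# Assumed rule: runs of a-z, with apostrophes allowed inside a word
# (it's) or at its end (o', an'). A closing quote mark directly after
# a word would wrongly be kept, so real texts may need extra cleaning.
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)*'?")


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower().replace("\u2019", "'"))


def add_text(counts, dialect, text):
    """Add one text's word counts to the running tally for its dialect."""
    counts[dialect].update(tokenize(text))


def frequency_rows(counts, top_n=500):
    """Return rows of (word, total, one count per dialect), most frequent first."""
    totals = Counter()
    for dialect_counts in counts.values():
        totals.update(dialect_counts)
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    return [(word, total, *(counts[d][word] for d in DIALECTS))
            for word, total in ranked]


# Invented placeholder sentences, only to show the mechanics.
counts = defaultdict(Counter)
add_text(counts, "Doric", "Fit like the day? It's a fine day.")
add_text(counts, "Lallans", "The day is braw.")

for row in frequency_rows(counts, top_n=5):
    print(*row)
```

Keeping one `Counter` per dialect means a newly found text only needs one more `add_text` call before the rows are rebuilt.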

The word frequencies for the top 500 words are as follows:-

Word Total Occurrences Doric Lallans Orkney Shetland Southern
the 13206 7022 5463 376 49 296
an 5801 3348 1939 118 261 135
a 5668 3051 1965 179 221 252
o 4523 2300 2045 63 106 9
tae 3810 2390 1102 163 52 103
in 2887 1548 1095 87 81 76
wis 2347 1697 447 59 103 41
it 2018 1002 905 15 46 50
wi 1769 1016 634 13 96 10
that 1675 746 811 37 15 66
he 1579 872 597 56 53 1
at 1495 573 784 47 64 27
on 1295 730 459 42 47 17
her 1192 928 183 23 55 3
as 1136 633 433 33 30 7
is 1086 400 603 37 36 10
his 1059 804 177 36 40 2
be 1038 546 405 29 36 22
for 943 416 414 37 52 24
oot 881 588 194 37 35 27
i 865 505 198 19 142 1
day 811 696 82 7 16 10
ti 767 767
nae 758 572 153 2 22 9
fur 742 473 211 10 31 17
she 739 535 192 4 1 7
up 724 429 204 10 51 30
temp 718 718
hid 711 606 6 98 1
bit 705 567 36 36 40 26
and 700 125 435 76 12 52
ye 693 262 402 10 19
wes 666 666
ower 621 344 216 9 37 15
or 619 329 261 15 11 3
like 614 435 159 7 6 7
aboot 607 303 242 19 23 20
we 599 195 269 9 82 44
jonsar 588 588
this 574 291 251 22 2 8
frae 562 337 210 2 13
they 540 332 127 46 1 34
da 525 77 2 422 24
twa 493 370 102 9 12
sae 474 263 191 15 5
jist 464 356 100 1 7
him 460 250 158 15 33 4
doon 459 353 44 17 26 19
hae 451 262 166 14 8 1
mair 449 249 177 5 13 5
but 443 44 359 11 14 15
scots 423 67 354 2
fae 421 186 177 25 32 1
their 414 306 94 2 1 11
noo 385 298 55 18 12 2
fit 380 351 21 2 3 3
aye 378 202 143 12 12 9
fowk 376 204 171 1
fin 375 373 2
me 368 136 139 18 68 7
richt 366 246 120
eck 365 365
it's 365 233 99 7 26
aw 363 342 3 18
bi 362 165 189 3 5
aa 353 294 10 17 32
auld 351 210 129 3 3 6
you 347 303 29 3 8 4
them 342 255 63 21 1 2
affa 336 331 4 1
time 331 186 106 16 12 11
back 330 220 81 6 14 9
awa 327 221 91 6 9
weel 320 215 65 15 23 2
there 318 214 77 6 1 20
see 317 215 72 8 15 7
its 312 137 168 2 2 3
wid 311 248 19 24 16 4
fine 307 289 11 1 2 4
said 304 226 61 6 4 7
rainfall 303 303
nicht 299 216 83
oor 292 80 200 3 9
wee 283 214 56 13
win 283 263 18 2
ma 281 65 195 21
lang 280 156 109 5 8 2
hiz 277 277
dis 276 249 1 26
if 276 160 79 10 15 12
no 273 17 194 21 17 24
war 265 232 26 2 5
s 259 8 208 27 16
are 258 138 107 1 6 6
ah 255 3 224 28
been 255 123 103 11 5 13
rain 250 236 9 1 4
some 248 160 52 7 9 20
aff 247 151 70 8 10 8
pit 246 108 128 3 4 3
than 244 112 123 7 2
thocht 240 158 80 2
efter 239 158 62 5 9 5
ken 235 113 116 4 1 1
thai 227 2 225
wad 222 139 57 11 11 4
whit 222 6 176 25 12 3
cam 220 91 109 6 14
afore 213 139 45 10 12 7
ain 212 106 99 3 4
intae 212 139 62 1 4 6
kent 210 100 97 3 9 1
bot 207 207
minnie 207 206 1
scotland 207 33 174
ither 206 78 126 1 1
year 206 124 67 3 6 6
thon 203 99 103 1
mornin 201 197 2 2
were 201 118 61 3 6 13
ane 198 43 154 1
syne 197 96 101
ony 194 102 85 4 2 1
can 190 64 110 4 8 4
far 188 177 3 5 3
hame 187 118 58 6 5
cauld 185 158 22 1 4
sin 185 176 9
telt 185 113 65 4 3
yon 184 156 24 1 3
dry 181 169 2 6 4
again 179 127 36 3 9 4
een 179 145 16 6 12
gaed 178 109 67 2
yer 178 122 49 7
tho 177 90 83 4
heid 176 116 55 1 2 2
get 174 107 45 6 8 8
hed 173 135 10 18 10
hir 171 5 163 3
muckle 171 81 68 11 8 3
us 171 63 89 4 15
first 170 90 72 1 6 1
hoose 170 119 21 9 18 3
whan 170 158 7 5
hale 169 135 33 1
ae 165 131 24 3 7
maist 165 59 98 4 2 2
didna 163 59 100 4
shae 160 159 1
wye 159 144 8 7
will 157 66 82 2 4 3
did 154 93 48 2 11
of 154 86 44 5 14 5
bin 153 148 4 1
set 153 65 65 8 6 9
thair 151 150 1
gaun 147 78 62 7
new 146 73 49 6 5 13
thaim 144 141 3
bein 143 75 61 2 2 3
o' 143 22 2 78 41
gie 142 70 66 1 3 2
man 142 50 63 5 12 12
fair 141 107 25 3 6
got 141 85 37 9 4 6
mak 141 45 82 7 7
fer 140 138 1 1
mither 140 89 44 7
come 139 81 34 5 14 5
mirren 139 139
say 138 79 52 4 2 1
says 138 81 28 3 19 7
anither 137 60 74 1 1 1
maun 137 59 73 5
micht 137 75 60 2
ere's 136 136
three 136 120 14 1 1
roon 134 128 2 4
guid 133 26 84 9 3 11
here 132 70 44 6 7 5
still 132 76 31 4 11 10
sun 132 20 109 3
nor 131 48 79 1 3
gin 130 52 70 8
my 130 117 3 1 8 1
mony 129 53 68 8
made 128 74 47 1 2 4
baith 127 55 59 11 2
en 127 108 17 2
feel 127 96 23 2 4 2
wey 127 10 107 7 3
could 126 62 25 9 18 12
seen 124 78 34 4 1 7
sunday 123 115 2 6
tammas 121 1 120
days 120 68 47 3 1 1
doun 120 3 114 3
warld 119 37 82
when 119 9 90 3 2 15
tak 118 43 61 11 3
leid 117 30 87
faither 115 82 31 2
sna 115 115
by 114 37 72 2 3
fa 114 114
friday 114 110 4
atween 113 54 47 9 1 2
dae 113 34 72 7
gey 113 80 29 1 3
throu 113 2 111
door 111 89 16 4 2
haes 111 103 5 3
till 111 93 8 5 4 1
monday 110 110
bide 109 99 3 3 2 2
gang 109 72 33 4
near 109 95 10 3 1
so 109 76 9 2 17 5
ta 109 2 107
ere 108 108
think 108 64 41 2 1
coorse 106 79 21 6
he'd 106 91 14 1
thursday 106 105 1
wir 106 22 35 13 36
dinna 105 63 42
inno 105 105
tuesday 105 105
ben 104 46 57 1
john 104 12 92
saturday 104 104
whaur 104 1 100 3
wifie 104 102 2
air 103 92 7 2 1 1
bairns 103 63 33 3 4
out 103 7 96
div 102 99 3
wednesday 102 102
hiv 101 74 25 2
life 101 35 49 3 7 7
m 101 2 80 14 5
bonny 100 95 2 3
to 99 40 29 7 20 3
comes 98 76 13 3 5 1
dee 98 79 7 1 11
last 98 56 33 2 5 2
even 97 46 44 1 1 5
then 97 54 31 5 7
thing 97 52 40 2 3
wither 96 96
i'm 95 71 18 3 2 1
better 94 66 19 3 5 1
brian 94 94
gied 93 21 64 2 2 4
mind 93 49 31 2 4 7
nou 93 92 1
referendum 93 93
left 92 57 17 4 10 4
she'd 92 79 13
weet 92 84 1 2 5
face 91 55 19 6 10 1
jock 91 80 6 5
sic 91 31 57 3
niver 90 68 17 2 2 1
went 90 68 12 10
at's 89 87 2
braw 89 69 20
lik 89 2 70 14 3
gweed 88 88
had 88 11 70 4 2 1
tell 88 46 25 2 10 5
that's 88 23 54 5 6
tuik 88 58 30
fir 87 21 30 24 7 5
sair 86 35 48 3
yin 86 4 73 2 7
cood 85 85
haed 85 68 3 14
cud 84 80 4
i'll 84 76 5 3
let 84 52 26 4 2
throo 84 84
wisna 84 44 30 1 9
things 83 42 32 5 3 1
was 83 23 20 2 1 37
place 82 41 35 5 1
same 82 39 35 3 5
hard 81 46 23 4 5 3
bene 80 80
eence 79 78 1
meg 79 76 1 2
road 79 59 10 2 8
wirds 79 18 49 6 6
an' 78 24 50 4
didnae 78 46 27 5
isie 78 78
keep 78 55 13 4 3 3
licht 78 43 30 5
a'a 77 77
north 77 61 13 2 1
thir 77 3 34 40
took 77 36 23 2 11 5
winter 77 74 1 1 1
anely 76 57 19
blaw 76 75 1
turns 76 69 7
yird 76 19 57
aince 75 21 52 2
hit 75 21 6 46 2
bruce 74 71 3
canna 74 41 23 2 8
foo 74 71 3
kirk 74 50 22 2
wark 74 23 34 13 4
eneuch 73 43 30
siller 73 40 29 4
alang 72 28 39 3 2
rise 72 69 3
years 72 36 22 2 5 7
best 71 35 24 2 8 2
came 71 59 4 8
grun 71 64 7
side 71 46 12 4 5 4
government 70 14 56
warm 70 61 2 2 5
wisnae 70 57 9 1 3
folk 69 8 29 7 5 20
heard 69 40 19 2 3 5
frost 68 68
look 68 56 6 2 4
sooth 68 48 12 3 4 1
about 67 3 63 1
bairn 66 52 12 2
gien 66 36 28 2
men 66 30 26 3 7
watter 66 39 21 2 4
bed 65 37 22 5 1
gran 65 64 1
minnie's 65 65
thegither 65 29 36
til 65 1 63 1
we're 65 19 44 1 1
wife 65 53 8 2 2
body 64 39 21 4
pairt 64 20 40 2 2
wrang 64 42 18 4
haun 63 29 32 2
there's 63 20 35 1 7
your 63 52 6 3 2
anent 62 1 61
bricht 62 51 11
cuid 62 62
himsel 62 48 12 1 1
language 62 7 52 1 2
makkin 62 33 26 3
mebbe 62 51 10 1
na 62 54 7 1
need 62 34 23 1 2 2
sat 62 55 5 2
wha 62 6 48 7 1
wull 62 52 8 2
ay 61 53 8
bare 61 57 2 2
big 61 32 13 2 4 10
haud 61 26 33 1 1
onie 61 61
tap 61 45 10 1 3 2
ten 61 47 8 5 1
ah'm 60 41 3 14 2
haein 60 43 14 1 2
lad 60 40 15 3 2
mannie 60 39 16 5
ooto 60 60
past 60 40 14 1 2 3
r 60 2 58
times 60 34 23 1 2
bonnie 59 34 14 7 4
dull 59 58 1
naebody 59 29 26 3 1
sayin 59 27 32
sea 59 32 4 18 5
wesna 59 59
young 59 45 5 3 2 4
doric 58 52 6
fower 58 43 15
inti 58 58
toon 58 46 2 4 6
yet 58 29 21 4 4
ava 57 52 3 1 1
bittie 57 40 15 2
comin 57 30 16 10 1
fermer 57 51 6
help 57 45 4 4 4
neist 57 41 16
nivver 57 35 12 1 9
speired 57 38 17 2
wan 57 12 20 6 16 3
deid 56 30 25 1
dow 56 56
forrit 56 18 38
pairlament 56 56
room 56 48 5 1 2
taen 56 31 19 1 5
whin 56 2 4 22 28
yeir 56 56
car 55 52 3
eftir 55 7 47 1
english 55 6 46 2 1
fand 55 33 21 1
five 55 41 9 1 4
brocht 54 19 34 1
ey 54 54
feet 54 35 11 3 3 2
great 54 47 4 1 2
lay 54 41 11 2
sma 54 25 28 1
verra 54 26 28
while 54 16 28 5 3 2
claes 53 36 12 4 1
d 53 46 1 6
hair 53 40 7 1 5
hauns 53 31 22
het 53 44 9
human 53 20 31 1 1
much 53 28 18 2 5
oh 53 45 5 1 2
steenhillock 53 53
ilka 52 4 48
intil 52 7 45
neil 52 51 1
sicht 52 39 13
ahin 51 42 9
asks 51 51
green 51 40 4 3 4
grey 51 44 6 1
makk 51 50 1
naethin 51 33 16 2
wird 51 24 23 2 2
douglas 50 50
fell 50 30 16 1 2 1
fire 50 39 5 6
hes 50 46 4
leddrach 50 50
nur 50 47 3
rid 50 3 45 2
shoor 50 50
want 50 11 27 5 2 5
awaw 49 49
em 49 49
horse 49 49
ice 49 16 31 2
itsel 49 22 27
juist 49 29 4 5 11
sister 49 23 18 8
unner 49 10 38 1
wae 49 1 32 16
cryed 48 48
has 48 16 32
jenny 48 48
k 48 3 45
roch 48 44 4
saw 48 22 16 7 2 1
t 48 20 21 7
wuid 48 48
amang 47 11 34 2
cut 47 34 7 2 4
end 47 19 23 2 3
fish 47 17 6 11 1 12
hear 47 23 12 5 2 5
kintra 47 17 30
loon 47 46 1
maks 47 22 23 2
matty 47 47
naething 47 35 11 1
och 47 43 4
park 47 40 3 2 2
sky 47 38 4 3 2
turned 47 44 1 2
bad 46 25 18 1 2
black 46 28 11 1 5 1
dinnae 46 8 36 1 1
felt 46 14 27 1 3 1
god 46 25 19 2
held 46 21 22 2 1
ilkie 46 46
job 46 28 11 1 5 1
kind 46 18 26 1 1
quine 46 46
seemed 46 26 13 1 2 4
blue 45 30 10 5
fear 45 34 10 1
ferm 45 39 6
gettin 45 28 11 2 2 2
hersel 45 29 13 2 1
oan 45 7 5 33
puir 45 9 35 1
simmer 45 42 2 1
takk 45 45
try 45 38 4 2 1
afoir 44 44
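
Because the Doric and Lallans parts of the corpus are far larger than the others, the raw counts above are hard to compare across dialects. One common way round this (not something done in the table itself) is to express each count per 10,000 words of its own sub-corpus. The sketch below does that for two rows, using the sub-corpus sizes quoted in the introduction. Only rows that list all five dialect counts are used, since a row with fewer numbers cannot be matched to its columns in this layout. The names `SUBCORPUS_WORDS`, `RAW_COUNTS`, `per_10k` and `normalised` are made up for the example.

```python
# Sub-corpus sizes in words, as quoted in the introduction
# (the Doric and Lallans figures appear to be rounded).
SUBCORPUS_WORDS = {
    "Doric": 105_000,
    "Lallans": 72_500,
    "Orkney": 5_643,
    "Shetland": 7_174,
    "Southern": 4_827,
}

# Raw counts copied from two table rows that list all five dialects.
RAW_COUNTS = {
    "the": {"Doric": 7022, "Lallans": 5463, "Orkney": 376,
            "Shetland": 49, "Southern": 296},
    "wis": {"Doric": 1697, "Lallans": 447, "Orkney": 59,
            "Shetland": 103, "Southern": 41},
}


def per_10k(count, subcorpus_words):
    """Occurrences per 10,000 words of the sub-corpus."""
    return 10_000 * count / subcorpus_words


def normalised(word):
    """Return the per-10,000-word rate of a word in each dialect."""
    return {dialect: per_10k(n, SUBCORPUS_WORDS[dialect])
            for dialect, n in RAW_COUNTS[word].items()}


for word in RAW_COUNTS:
    rates = normalised(word)
    print(word, "  ".join(f"{d} {r:.1f}" for d, r in rates.items()))
```

With the smaller sub-corpora only a few thousand words each, rates for the rarer words will be noisy, so this is best treated as a rough guide.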

Sources

The texts in the corpus have come from a variety of sources and authors, including but not limited to: