What I did today
Why we need BERT as the base model
BERT is a pretrained language model that we can fine-tune to fit our needs. Fine-tuning updates a pretrained model's weights so that it aligns with a downstream task, and models further trained on domain-specific data tend to perform better on that domain.
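As a rough illustration of what fine-tuning looks like in code, here is a minimal sketch using Hugging Face Transformers. The checkpoint, the nsmc example dataset, the label count, and the hyperparameters are placeholders for illustration, not the project's actual setup.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "bert-base-multilingual-cased"  # any pretrained checkpoint to start from
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# nsmc is a public Korean movie-review sentiment dataset, used here only as an example.
dataset = load_dataset("nsmc")

def tokenize(batch):
    return tokenizer(batch["document"], truncation=True,
                     padding="max_length", max_length=128)

encoded = dataset.map(tokenize, batched=True)

args = TrainingArguments(output_dir="finetune-out",
                         num_train_epochs=1,
                         per_device_train_batch_size=16)

trainer = Trainer(model=model, args=args,
                  train_dataset=encoded["train"],
                  eval_dataset=encoded["test"])

trainer.train()  # updates the pretrained weights to fit the downstream classification task
```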
Selecting a BERT model from Hugging Face Transformers
bert-base-multilingual-cased is a model covering more than 100 languages, but it does not appear to be trained extensively on Korean. Its roughly 119,546-token vocabulary is shared across all of those languages, so only a small portion covers Korean.
When using the raw BERT (bert-base-multilingual-cased) tokenizer on Korean text:
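For a quick sanity check (my own illustration, with an arbitrary sample sentence), the raw multilingual tokenizer can be inspected like this:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

# An arbitrary Korean sentence; the exact output depends on the input text.
print(tokenizer.tokenize("한국어 자연어 처리는 어렵다"))
# Korean words tend to be split into many short subword pieces,
# which suggests limited Korean coverage in the shared vocabulary.
```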
- KoBERT: a model that overcomes the limitations of Google's bert-base-multilingual-cased in Korean-language performance.
- Reference GitHub repositories:
  - https://github.com/SKTBrain/KoBERT/tree/master/kobert_hf
  - https://github.com/monologg/KoBERT-Transformers
=> Selected KoBERT and am currently implementing it.
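A minimal loading sketch, following the SKTBrain kobert_hf README linked above; the skt/kobert-base-v1 checkpoint and KoBERTTokenizer come from that repo, while the classification head and num_labels=2 are placeholder assumptions of mine.

```python
# Tokenizer package install (from the SKTBrain repo above):
#   pip install 'git+https://github.com/SKTBrain/KoBERT.git#egg=kobert_tokenizer&subdirectory=kobert_hf'
from kobert_tokenizer import KoBERTTokenizer
from transformers import BertForSequenceClassification

tokenizer = KoBERTTokenizer.from_pretrained("skt/kobert-base-v1")
# num_labels=2 is a placeholder; the classification head is newly initialized
# and still needs fine-tuning on our own data.
model = BertForSequenceClassification.from_pretrained("skt/kobert-base-v1", num_labels=2)

inputs = tokenizer("분류할 한국어 문장", return_tensors="pt",
                   truncation=True, padding="max_length", max_length=64)
logits = model(**inputs).logits
print(logits.shape)  # torch.Size([1, 2])
```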
AI training is really difficult... Feedback received during mentoring: accuracy >= 90%!!! At least 90% accuracy is needed for free-text input to be meaningful.
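To track that target during evaluation, a hook like the following could be passed to the Trainer (my own sketch; the metric function and the use of scikit-learn are assumptions, not part of the current setup):

```python
import numpy as np
from sklearn.metrics import accuracy_score

def compute_metrics(eval_pred):
    # eval_pred is the (logits, labels) pair the Trainer passes during evaluation.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": accuracy_score(labels, preds)}

# Trainer(..., compute_metrics=compute_metrics) then reports "eval_accuracy"
# from trainer.evaluate(), which can be checked against the 0.90 target.
```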