AIうぉ--!(ai-wo-katsuyo-shitai !)

I want to show that I can use AI well!! In my own way.

Trying out TF-IDF (Windows, Python, target documents: English)


For reasons of my own, I tried running TF-IDF (Windows, Python, target documents: English).

Borrowed work

The code I used

https://tech.at-iroha.jp/?p=1576

Environment

Windows 10
Python 3.7

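Code change 1

The original seems to be written for something other than Windows (the dictionary path is a Unix-style one)?
I changed it as follows.

    # mecab = MeCab.Tagger("-O chasen -d /var/lib/mecab/dic/ipadic-utf8/")
    mecab = MeCab.Tagger("-Ochasen")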

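Code change 2

This time I wanted the target text to be English, so MeCab is probably unnecessary. I replaced the get_text processing with get_textEN, which in effect just reads the text.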

def get_textEN(input_file_name):
    # English text needs no morphological analysis here, so just return the file contents.
    with open(input_file_name, 'r', encoding='utf-8') as f:
        text = f.read()

    return text

Results

The texts were translated into English with DeepL.

doc_no: 0
        aim: 0.245413
        be: 0.245413
        citizen: 0.245413
        clean: 0.245413
        community: 0.245413
        corporate: 0.245413
        creation: 0.245413
        good: 0.245413
        international: 0.245413
        products: 0.245413
        prosperous: 0.245413
        provision: 0.245413
        safe: 0.245413
        trusted: 0.245413
        by: 0.197998
        contribute: 0.197998
        society: 0.197998
        through: 0.197998
        accessible: 0.000000
        affluent: 0.000000
        around: 0.000000
        at: 0.000000
        based: 0.000000
        business: 0.000000
        connect: 0.000000
        contributing: 0.000000
        customers: 0.000000
        dreams: 0.000000
        environment: 0.000000
        everything: 0.000000
        find: 0.000000
        foundation: 0.000000
        full: 0.000000
        future: 0.000000
        global: 0.000000
        heart: 0.000000
        information: 0.000000
        irreplaceable: 0.000000
        is: 0.000000
        it: 0.000000
        lowest: 0.000000
        make: 0.000000
        materially: 0.000000
        offer: 0.000000
        on: 0.000000
        online: 0.000000
        organize: 0.000000
        our: 0.000000
        people: 0.000000
        possible: 0.000000
        preservation: 0.000000
        price: 0.000000
        realize: 0.000000
        same: 0.000000
        security: 0.000000
        seek: 0.000000
        spiritually: 0.000000
        strive: 0.000000
        that: 0.000000
        time: 0.000000
        trust: 0.000000
        uncover: 0.000000
        usable: 0.000000
        while: 0.000000
        will: 0.000000
        wings: 0.000000
        with: 0.000000
        world: 0.000000
doc_no: 1
        based: 0.264426
        connect: 0.264426
        dreams: 0.264426
        foundation: 0.264426
        full: 0.264426
        future: 0.264426
        heart: 0.264426
        on: 0.264426
        security: 0.264426
        that: 0.264426
        trust: 0.264426
        wings: 0.264426
        with: 0.264426
        contribute: 0.213337
        world: 0.213337
        accessible: 0.000000
        affluent: 0.000000
        aim: 0.000000
        around: 0.000000
        at: 0.000000
        be: 0.000000
        business: 0.000000
        by: 0.000000
        citizen: 0.000000
        clean: 0.000000
        community: 0.000000
        contributing: 0.000000
        corporate: 0.000000
        creation: 0.000000
        customers: 0.000000
        environment: 0.000000
        everything: 0.000000
        find: 0.000000
        global: 0.000000
        good: 0.000000
        information: 0.000000
        international: 0.000000
        irreplaceable: 0.000000
        is: 0.000000
        it: 0.000000
        lowest: 0.000000
        make: 0.000000
        materially: 0.000000
        offer: 0.000000
        online: 0.000000
        organize: 0.000000
        our: 0.000000
        people: 0.000000
        possible: 0.000000
        preservation: 0.000000
        price: 0.000000
        products: 0.000000
        prosperous: 0.000000
        provision: 0.000000
        realize: 0.000000
        safe: 0.000000
        same: 0.000000
        seek: 0.000000
        society: 0.000000
        spiritually: 0.000000
        strive: 0.000000
        through: 0.000000
        time: 0.000000
        trusted: 0.000000
        uncover: 0.000000
        usable: 0.000000
        while: 0.000000
        will: 0.000000
doc_no: 2
        affluent: 0.240740
        business: 0.240740
        contributing: 0.240740
        environment: 0.240740
        global: 0.240740
        irreplaceable: 0.240740
        materially: 0.240740
        preservation: 0.240740
        realize: 0.240740
        same: 0.240740
        spiritually: 0.240740
        time: 0.240740
        while: 0.240740
        will: 0.240740
        at: 0.194227
        our: 0.194227
        society: 0.194227
        strive: 0.194227
        through: 0.194227
        accessible: 0.000000
        aim: 0.000000
        around: 0.000000
        based: 0.000000
        be: 0.000000
        by: 0.000000
        citizen: 0.000000
        clean: 0.000000
        community: 0.000000
        connect: 0.000000
        contribute: 0.000000
        corporate: 0.000000
        creation: 0.000000
        customers: 0.000000
        dreams: 0.000000
        everything: 0.000000
        find: 0.000000
        foundation: 0.000000
        full: 0.000000
        future: 0.000000
        good: 0.000000
        heart: 0.000000
        information: 0.000000
        international: 0.000000
        is: 0.000000
        it: 0.000000
        lowest: 0.000000
        make: 0.000000
        offer: 0.000000
        on: 0.000000
        online: 0.000000
        organize: 0.000000
        people: 0.000000
        possible: 0.000000
        price: 0.000000
        products: 0.000000
        prosperous: 0.000000
        provision: 0.000000
        safe: 0.000000
        security: 0.000000
        seek: 0.000000
        that: 0.000000
        trust: 0.000000
        trusted: 0.000000
        uncover: 0.000000
        usable: 0.000000
        wings: 0.000000
        with: 0.000000
        world: 0.000000
doc_no: 3
        customers: 0.281677
        everything: 0.281677
        find: 0.281677
        lowest: 0.281677
        offer: 0.281677
        online: 0.281677
        possible: 0.281677
        price: 0.281677
        seek: 0.281677
        uncover: 0.281677
        at: 0.227255
        it: 0.227255
        our: 0.227255
        strive: 0.227255
        accessible: 0.000000
        affluent: 0.000000
        aim: 0.000000
        around: 0.000000
        based: 0.000000
        be: 0.000000
        business: 0.000000
        by: 0.000000
        citizen: 0.000000
        clean: 0.000000
        community: 0.000000
        connect: 0.000000
        contribute: 0.000000
        contributing: 0.000000
        corporate: 0.000000
        creation: 0.000000
        dreams: 0.000000
        environment: 0.000000
        foundation: 0.000000
        full: 0.000000
        future: 0.000000
        global: 0.000000
        good: 0.000000
        heart: 0.000000
        information: 0.000000
        international: 0.000000
        irreplaceable: 0.000000
        is: 0.000000
        make: 0.000000
        materially: 0.000000
        on: 0.000000
        organize: 0.000000
        people: 0.000000
        preservation: 0.000000
        products: 0.000000
        prosperous: 0.000000
        provision: 0.000000
        realize: 0.000000
        safe: 0.000000
        same: 0.000000
        security: 0.000000
        society: 0.000000
        spiritually: 0.000000
        that: 0.000000
        through: 0.000000
        time: 0.000000
        trust: 0.000000
        trusted: 0.000000
        usable: 0.000000
        while: 0.000000
        will: 0.000000
        wings: 0.000000
        with: 0.000000
        world: 0.000000
doc_no: 4
        it: 0.433449
        world: 0.433449
        accessible: 0.268625
        around: 0.268625
        information: 0.268625
        is: 0.268625
        make: 0.268625
        organize: 0.268625
        people: 0.268625
        usable: 0.268625
        by: 0.216725
        affluent: 0.000000
        aim: 0.000000
        at: 0.000000
        based: 0.000000
        be: 0.000000
        business: 0.000000
        citizen: 0.000000
        clean: 0.000000
        community: 0.000000
        connect: 0.000000
        contribute: 0.000000
        contributing: 0.000000
        corporate: 0.000000
        creation: 0.000000
        customers: 0.000000
        dreams: 0.000000
        environment: 0.000000
        everything: 0.000000
        find: 0.000000
        foundation: 0.000000
        full: 0.000000
        future: 0.000000
        global: 0.000000
        good: 0.000000
        heart: 0.000000
        international: 0.000000
        irreplaceable: 0.000000
        lowest: 0.000000
        materially: 0.000000
        offer: 0.000000
        on: 0.000000
        online: 0.000000
        our: 0.000000
        possible: 0.000000
        preservation: 0.000000
        price: 0.000000
        products: 0.000000
        prosperous: 0.000000
        provision: 0.000000
        realize: 0.000000
        safe: 0.000000
        same: 0.000000
        security: 0.000000
        seek: 0.000000
        society: 0.000000
        spiritually: 0.000000
        strive: 0.000000
        that: 0.000000
        through: 0.000000
        time: 0.000000
        trust: 0.000000
        trusted: 0.000000
        uncover: 0.000000
        while: 0.000000
        will: 0.000000
        wings: 0.000000
        with: 0.000000

Example of a target text

To strive to find and uncover everything our customers seek online, and to offer it at the lowest possible price.
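
The doc_no 3 values for this sentence can be checked by hand. The sketch below assumes the borrowed code weights terms the way scikit-learn's TfidfVectorizer does by default, that is, smoothed IDF ln((1 + N) / (1 + df)) + 1 followed by L2 normalization. That is an assumption, not something confirmed from the original code, although the numbers it produces agree with the output above. The document frequencies are read off the output (a term's df is the number of doc_no blocks in which its value is nonzero), and the names smoothed_idf and doc3_terms are made up for this illustration.

```python
import math

N_DOCS = 5  # doc_no 0 to 4 in the output above


def smoothed_idf(df, n_docs=N_DOCS):
    """Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1."""
    return math.log((1 + n_docs) / (1 + df)) + 1


# term: (count in the doc_no 3 sentence, number of documents containing the term)
# Only terms with a nonzero value under doc_no 3 are listed.
doc3_terms = {
    "customers": (1, 1), "everything": (1, 1), "find": (1, 1),
    "lowest": (1, 1), "offer": (1, 1), "online": (1, 1),
    "possible": (1, 1), "price": (1, 1), "seek": (1, 1),
    "uncover": (1, 1),
    # These four also have a nonzero value in one other document.
    "at": (1, 2), "it": (1, 2), "our": (1, 2), "strive": (1, 2),
}

# Raw tf * idf per term, then L2-normalize over the document.
raw = {term: tf * smoothed_idf(df) for term, (tf, df) in doc3_terms.items()}
norm = math.sqrt(sum(value * value for value in raw.values()))
weights = {term: value / norm for term, value in raw.items()}

print(f"price: {weights['price']:.6f}")  # 0.281677
print(f"it:    {weights['it']:.6f}")     # 0.227255
```

Under these assumptions, "it" appears in 2 of the 5 documents, so its IDF (about 1.69) is only a little lower than that of a word appearing in a single document (about 2.10). With so few documents there is simply not much room for the weight to drop.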

Here, a word like "it" really ought to get a small value...
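
One common way to keep words like "it" out of the ranking is to drop stop words before counting. Below is a minimal sketch on the example sentence. The STOP_WORDS set is a small hand-picked list for illustration only (an assumption, not a standard list), and tokenize is a hypothetical helper, not part of the borrowed code. If the borrowed code is built on scikit-learn's TfidfVectorizer (not confirmed here), passing stop_words="english" to it would be the usual equivalent.

```python
import re

# Small illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = frozenset({
    "to", "and", "the", "it", "at", "our", "by", "is", "be", "on", "with", "that",
})


def tokenize(text, stop_words=frozenset()):
    """Lowercase the text, keep words of two or more characters, drop stop words."""
    words = re.findall(r"\b\w\w+\b", text.lower())
    return [word for word in words if word not in stop_words]


sentence = (
    "To strive to find and uncover everything our customers seek online, "
    "and to offer it at the lowest possible price."
)

all_tokens = tokenize(sentence)
filtered = tokenize(sentence, STOP_WORDS)

print(all_tokens)  # still contains "it", "at", "our"
print(filtered)    # function words removed before any TF-IDF weighting
```

With the stop words removed up front, "it" never enters the vocabulary, so it cannot receive a weight at all, however few documents there are.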

Comments

I should probably try this with more documents...
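
To get a rough feel for how much more documents would help, here is a small sketch. It assumes the smoothed IDF formula ln((1 + N) / (1 + df)) + 1 (scikit-learn's default, which may or may not be what the borrowed code uses), and it assumes, purely hypothetically, that a common word keeps appearing in 40% of the documents (as "it" does here, 2 out of 5) while a distinctive word appears in only one. The function names are made up for this illustration.

```python
import math


def smoothed_idf(df, n_docs):
    """Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1."""
    return math.log((1 + n_docs) / (1 + df)) + 1


def common_to_rare_ratio(n_docs, common_fraction=0.4):
    """IDF of a word found in a fixed fraction of documents, relative to a word found in one."""
    df_common = max(1, round(n_docs * common_fraction))
    return smoothed_idf(df_common, n_docs) / smoothed_idf(1, n_docs)


# The ratio shrinks as the corpus grows: the common word's IDF levels off,
# while the IDF of a word seen in a single document keeps increasing.
for n_docs in (5, 50, 500):
    print(n_docs, round(common_to_rare_ratio(n_docs), 3))
```

In a real English corpus "it" would likely appear in far more than 40% of the documents, so the gap would probably open up faster than this sketch suggests.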