Some extractors (get_money, get_durations) are simply not working, while get_dates is? help? #76

Open
@abrljak

Hi,
my app is a simple REST API endpoint that uses 3 extractors (money, durations and dates) to locate information in a supplied text. I pass the same input text to all 3 extractors:

The amount of 120.000 USD should be paid in 12 equal monthly instalments starting with Jun 16, 2024.

I get the following output:

{
    "dates": [
        "Sun, 16 Jun 2024 00:00:00 GMT"
    ],
    "durations": [],
    "money": []
}

It looks like get_dates() did its job perfectly, but the other 2 extractors returned nothing.
I have tried many different examples and tried downloading various tokenizers via nltk, hoping that I was just missing a dependency. I have no idea what might be wrong, and I have a feeling I am missing something really simple.

Here is my complete code:

from flask import Flask, request, jsonify

import nltk
import lexnlp.extract.en.money
import lexnlp.extract.en.durations
import lexnlp.extract.en.dates

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')
nltk.download('stopwords')
nltk.download('maxent_ne_chunker')
nltk.download('words')

app = Flask(__name__)

@app.route('/extract', methods=['POST'])
def extract_info():
    # Get the text from the request body; fall back to an empty dict
    # if the body is missing or is not valid JSON
    data = request.get_json(silent=True) or {}
    contract_text = data.get('text', '')

    if not contract_text:
        return jsonify({"error": "No text provided"}), 400

    money = list(lexnlp.extract.en.money.get_money(contract_text))
    durations = list(lexnlp.extract.en.durations.get_durations(contract_text))
    dates = list(lexnlp.extract.en.dates.get_dates(contract_text))

    return jsonify({
        "money": money,
        "durations": durations,
        "dates": dates
    })

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
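
To check whether the empty lists come from the extractors themselves rather than from the Flask/JSON layer, a minimal sketch like the one below could run each extractor directly on the same text and show the raw results. The helper name `run_extractors` is hypothetical and not part of lexnlp. The extractors are passed in as plain callables, so the helper itself does not assume anything about how lexnlp behaves.

```python
from typing import Callable, Dict, Iterable, List


def run_extractors(
    text: str,
    extractors: Dict[str, Callable[[str], Iterable]],
) -> Dict[str, List[str]]:
    """Run every extractor on the same text and collect repr() of each hit.

    Hypothetical debugging helper: exceptions are captured per extractor,
    so one failing extractor does not hide the results of the others.
    """
    results: Dict[str, List[str]] = {}
    for name, extract in extractors.items():
        try:
            # list comprehension forces generators to run inside the try block
            results[name] = [repr(item) for item in extract(text)]
        except Exception as exc:
            results[name] = [f"ERROR: {type(exc).__name__}: {exc}"]
    return results
```

With the imports from the app above, it could be called as `run_extractors(contract_text, {"money": lexnlp.extract.en.money.get_money, "durations": lexnlp.extract.en.durations.get_durations, "dates": lexnlp.extract.en.dates.get_dates})` from a plain Python shell inside the container. If the lists are still empty there, Flask and `jsonify` can be ruled out as the cause.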

I am running everything inside Docker, so here is the complete Dockerfile as well:

FROM python:3.9-slim
RUN apt-get update && \
    apt-get install -y build-essential git ca-certificates && \
    update-ca-certificates
RUN git --version
RUN pip install spacy numpy dateparser pyahocorasick unidecode quantulum3 regex nltk
RUN python -m spacy download en_core_web_sm
RUN pip install git+https://github.com/LexPredict/lexpredict-lexnlp.git@2.3.0
RUN pip install Flask
WORKDIR /app
COPY . /app
CMD ["python", "app.py"]

Can you please help?

Thank you,
A.
