Text Prediction Pipeline

The text pipeline is integrated into the UniversalMLObject architecture, with a few small variations.

Pipeline:

File Name	Description
server/dependency.py	Text objects are stored in the Object collection; text_content is an optional field
server/db_connection.py	Uses the object_collection functions
server/routers/prediction.py	Uses object_endpoint to predict, get a result, and store the result
server/prediction_worker/utility/main.py	Uses the same predict method as image and other forms of data

server/routers/prediction.py

Predict Function for text

Summary: This endpoint, at localhost:5000/model/predict, receives a list of text files, runs predictions on them with the given models, and stores the UniversalMLObject in the object_collection.

Params:

objects: A list of files to predict on. For text, each one currently has to be a '.txt' file containing the text to be predicted
models: A list of model names (strings)
current_user: The username of the current user
model_type: The type of the input files, "text" in this case

Return: A dictionary with the key "text", whose value is a list of hashes for the different files. Example:

{
   "text": [
       "bd931a6a2262fbab85c18b5a9bfa5a78",
       "cd931a6a2widjfidf5c18b5a9bfa5a78"
   ]
}
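
As a rough illustration of calling this endpoint, the sketch below builds (but does not send) a multipart POST request using only the Python standard library. The parameter names objects, models and model_type come from the list above; how the server expects them to be encoded (file parts versus form fields) is an assumption, the helper names are hypothetical, and authentication for current_user is not shown.

```python
import json
import uuid
from urllib import request

PREDICT_URL = "http://localhost:5000/model/predict"  # address from the summary


def _field(boundary, name, value):
    # One plain form field in multipart/form-data encoding.
    return (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        f"{value}\r\n"
    ).encode("utf-8")


def _file(boundary, name, filename, text):
    # One uploaded '.txt' file part.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
        "Content-Type: text/plain\r\n\r\n"
    ).encode("utf-8")
    return head + text.encode("utf-8") + b"\r\n"


def build_predict_request(texts, models, model_type="text"):
    """Build the request. texts maps a file name to its text content.

    Assumption: 'objects' are file parts; 'models' and 'model_type' are form fields.
    """
    boundary = uuid.uuid4().hex
    parts = [_field(boundary, "models", m) for m in models]
    parts.append(_field(boundary, "model_type", model_type))
    parts += [_file(boundary, "objects", fn, txt) for fn, txt in texts.items()]
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return request.Request(
        PREDICT_URL,
        data=b"".join(parts),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def text_hashes(response_body):
    # Extract the list of hashes from a response shaped like the example above.
    return json.loads(response_body)["text"]
```

The request could then be sent with request.urlopen(req) against a running server, and the body of the reply passed to text_hashes.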

Predict is mostly the same for all types of data. However, when model_type = 'text':

text_content is stored in the UniversalMLObject as a str
The 'utility.main.predict_object' function is then called with the params hash_md5 and text_content
For other forms of data, file_name is passed instead of text_content, so that the prediction worker can access the data in the shared folder by file_name

else:
  # Text branch: read the uploaded file and decode the bytes into a str
  text_content = file_obj.read().decode('UTF-8')
  for model in models:
    # One queue per model; the job receives the text itself, not a file name
    Queue(name=model, connection=redis).enqueue(
      'utility.main.predict_object', hash_md5, text_content,
      job_id=hash_md5 + model + str(uuid.uuid4())
    )
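
The difference between text and other data types can be sketched as a small helper that decides what each job carries. This is only an illustration under stated assumptions: build_job is a hypothetical name, and the real endpoint enqueues the job directly rather than returning a tuple.

```python
import uuid


def build_job(model_type, model, hash_md5, file_bytes, file_name):
    """Return (function path, args, job_id) for one prediction job.

    Hypothetical helper: text jobs carry the decoded content itself, while
    other types carry the file name so the worker can read the file from
    the shared folder.
    """
    if model_type == "text":
        payload = file_bytes.decode("utf-8")
    else:
        payload = file_name
    # Same job id scheme as the excerpt: hash + model name + random UUID.
    job_id = hash_md5 + model + str(uuid.uuid4())
    return "utility.main.predict_object", (hash_md5, payload), job_id
```

With RQ, the returned values would be passed on as Queue(name=model, connection=redis).enqueue(func, *args, job_id=job_id).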

server/prediction_worker/utility/main.predict_object

Summary: This function is used by the server to have the prediction workers run a prediction.

Params:

Object_hash: The MD5 hash of the Object
prediction_obj: For text, this is the text_content (the data as a str); for other data, it is a file path that can be used to find the data

Output:

Prediction result
If the prediction fails, an error message is printed to the console
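
The behaviour described above (return the result, or print an error if the prediction fails) could look roughly like the sketch below. It is not the project's implementation: predict_object_sketch and its predict_fn argument are hypothetical, standing in for whatever model the worker has loaded, and storing the result is left out.

```python
def predict_object_sketch(object_hash, prediction_obj, predict_fn):
    """Run one prediction and report failures on the console.

    prediction_obj is the text itself for text data, or a file path for
    other data; predict_fn is a hypothetical callable for the loaded model.
    """
    try:
        result = predict_fn(prediction_obj)
    except Exception as exc:
        # On failure, print an error message instead of raising.
        print(f"Prediction failed for object {object_hash}: {exc}")
        return None
    return result
```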

Example Text Models

Simple_text: Accepts any string and always returns Positive: 1 as the result
Sentiment_analysis: Hugging Face model that determines whether a piece of text is positive or negative and gives a score
NER_analysis: Hugging Face model that returns named entities from the text, including persons, locations, and organizations
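
To make the simplest of these concrete, here is a minimal sketch of a model that behaves like Simple_text. The class name, the predict method, and the dictionary shape of the result are all assumptions made for illustration; the source only states that the model returns Positive: 1 for any string.

```python
class SimpleTextModel:
    """Hypothetical stand-in for Simple_text: ignores its input."""

    def predict(self, text):
        if not isinstance(text, str):
            raise TypeError("text must be a str")
        # The result is constant regardless of the text content.
        return {"Positive": 1}
```

A model like this is handy for checking that the queue, worker, and storage steps are wired together before a real Hugging Face model is plugged in.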