SushainDevi/TextToVideo-Model

README for Text-to-Video Generation Project

Overview

This project implements a text-to-video generation system using deep learning. Given a text prompt, it generates a sequence of image frames that depict a moving shape (a circle) travelling in the direction or undergoing the transformation the prompt describes, and assembles those frames into a video. Possible applications include animation, educational content, and artistic expression.

Installation

To set up the project, follow these steps:

  1. Clone the repository:

    git clone <repository-url>
    cd <repository-directory>
  2. Install required packages:

    Use pip to install the necessary libraries:

    pip install numpy opencv-python pillow torch torchvision scipy
  3. Set up the environment:

    Use Python 3.6 or higher. If you intend to run on a GPU, install a PyTorch build with CUDA support.
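
A quick way to confirm the environment before training is sketched below. The helper name `describe_environment` is hypothetical and not part of this repository; it only reports the Python version check and whether PyTorch can see a CUDA device.

```python
import sys


def describe_environment():
    """Report whether the interpreter and PyTorch install look usable.

    Hypothetical helper for illustration; not part of the repository.
    """
    info = {"python_ok": sys.version_info >= (3, 6)}
    try:
        import torch  # imported lazily so the check also works before installation

        info["torch_version"] = torch.__version__
        info["cuda_available"] = torch.cuda.is_available()
    except ImportError:
        info["torch_version"] = None
        info["cuda_available"] = False
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as false, training should still run on the CPU, only more slowly.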

Usage

To generate videos based on text prompts, follow these steps:

  1. Generate the dataset:

    The dataset consists of videos generated from predefined prompts. Run the dataset generation script:

    python generate_dataset.py

    This will create a directory named training_dataset containing the generated video frames.

  2. Train the model:

    After generating the dataset, train the model using the following command:

    python train_model.py

    This will train the GAN architecture on the generated dataset.

  3. Generate videos from text prompts:

    After training, you can generate videos by running:

    python generate_video.py "circle moving down"

    Replace "circle moving down" with any other prompt from the predefined list.

Dataset Generation

The dataset is generated by creating 10-frame videos of a circle moving in various directions based on text prompts. The dataset generation process includes:

  • Creating a directory for the dataset.
  • Defining the number of videos and frames.
  • Generating images with moving shapes using the create_image_with_moving_shape function.
  • Applying Gaussian splatting to the generated images for a smoother visual effect.
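
The steps above can be sketched with NumPy alone. This is a minimal illustration, not the repository's code: `draw_circle_frame`, `gaussian_blur`, `make_clip`, and the `DIRECTIONS` table are hypothetical names, every prompt other than "circle moving down" is an illustrative guess, and "Gaussian splatting" is interpreted here as a plain Gaussian blur, which is an assumption.

```python
import numpy as np

# Hypothetical prompt table; only "circle moving down" appears in this README.
DIRECTIONS = {
    "circle moving down": (0, 1),
    "circle moving up": (0, -1),
    "circle moving right": (1, 0),
    "circle moving left": (-1, 0),
}


def draw_circle_frame(size, center, radius):
    """Return a size x size float image containing one filled white circle."""
    ys, xs = np.ogrid[:size, :size]
    cx, cy = center
    mask = (xs - cx) ** 2 + (ys - cy) ** 2 <= radius ** 2
    return mask.astype(np.float32)


def gaussian_blur(image, sigma=1.0):
    """Separable Gaussian blur (assumed stand-in for the smoothing step)."""
    half = int(3 * sigma)
    offsets = np.arange(-half, half + 1)
    kernel = np.exp(-(offsets ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()

    def blur_1d(line):
        return np.convolve(line, kernel, mode="same")

    blurred = np.apply_along_axis(blur_1d, 1, image)  # blur each row
    return np.apply_along_axis(blur_1d, 0, blurred)   # then each column


def make_clip(prompt, num_frames=10, size=64, radius=6, step=2):
    """Build a (num_frames, size, size) clip of a circle moving per the prompt."""
    dx, dy = DIRECTIONS[prompt]
    start = size // 2
    frames = []
    for t in range(num_frames):
        center = (start + dx * step * t, start + dy * step * t)
        frames.append(gaussian_blur(draw_circle_frame(size, center, radius)))
    return np.stack(frames)


if __name__ == "__main__":
    clip = make_clip("circle moving down")
    print(clip.shape)
```

The frame size, radius, and step are chosen so the circle stays inside the image for all 10 frames; the repository may use different values.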

Model Architecture

The project utilizes a Generative Adversarial Network (GAN) architecture comprising:

  • Text Embedding Layer: Converts text prompts into embeddings.
  • Generator: Generates video frames based on random noise and text embeddings.
  • Discriminator: Distinguishes between real and generated frames.

Key Classes

  • TextEmbedding: Embeds text prompts into a numerical format.
  • Generator: Transforms random noise and text embeddings into video frames.
  • Discriminator: Evaluates the authenticity of generated frames.
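
For orientation, here is one way the three classes could be laid out in PyTorch. The class names come from this README, but the layer sizes, constructor arguments, fully connected design, and the choice to average token embeddings are all assumptions; the repository's actual networks may differ (for example, they may be convolutional or condition the discriminator on the text).

```python
import torch
import torch.nn as nn


class TextEmbedding(nn.Module):
    """Maps a batch of token ids to one vector per prompt (mean of token embeddings)."""

    def __init__(self, vocab_size, embed_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)

    def forward(self, token_ids):
        # (batch, seq_len) -> (batch, seq_len, embed_dim) -> (batch, embed_dim)
        return self.embedding(token_ids).mean(dim=1)


class Generator(nn.Module):
    """Turns noise concatenated with a text embedding into one grayscale frame."""

    def __init__(self, noise_dim, embed_dim, frame_size=64):
        super().__init__()
        self.frame_size = frame_size
        self.net = nn.Sequential(
            nn.Linear(noise_dim + embed_dim, 256),
            nn.ReLU(),
            nn.Linear(256, frame_size * frame_size),
            nn.Tanh(),  # pixel values in [-1, 1]
        )

    def forward(self, noise, text_emb):
        x = torch.cat([noise, text_emb], dim=1)
        return self.net(x).view(-1, 1, self.frame_size, self.frame_size)


class Discriminator(nn.Module):
    """Scores a frame with the probability that it is real."""

    def __init__(self, frame_size=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(frame_size * frame_size, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid(),
        )

    def forward(self, frames):
        return self.net(frames)


if __name__ == "__main__":
    embed = TextEmbedding(vocab_size=10, embed_dim=8)
    gen = Generator(noise_dim=16, embed_dim=8)
    disc = Discriminator()
    tokens = torch.tensor([[1, 2, 3]])  # hypothetical token ids for one prompt
    frame = gen(torch.randn(1, 16), embed(tokens))
    print(frame.shape, disc(frame).shape)
```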

Training the Model

The model is trained using the following steps:

  1. Load the dataset using a custom TextToVideoDataset class.
  2. Initialize the generator and discriminator networks.
  3. Use binary cross-entropy loss for training.
  4. Optimize the networks using the Adam optimizer.
  5. Iterate through the dataset for a specified number of epochs, updating the generator and discriminator alternately.
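
The alternating update in steps 3 to 5 can be sketched as a single training step. This is a simplified illustration rather than the repository's training script: `build_toy_networks` and `train_step` are hypothetical names, the networks are tiny stand-ins without text conditioning, and the Adam settings (learning rate 2e-4, betas 0.5 and 0.999) are common GAN defaults assumed here, not values taken from the source.

```python
import torch
import torch.nn as nn


def build_toy_networks(noise_dim, frame_pixels):
    """Tiny stand-in generator and discriminator (no text conditioning)."""
    gen = nn.Sequential(
        nn.Linear(noise_dim, 32), nn.ReLU(),
        nn.Linear(32, frame_pixels), nn.Tanh(),
    )
    disc = nn.Sequential(
        nn.Linear(frame_pixels, 32), nn.LeakyReLU(0.2),
        nn.Linear(32, 1), nn.Sigmoid(),
    )
    return gen, disc


def train_step(gen, disc, g_opt, d_opt, real, noise_dim, criterion):
    """Run one discriminator update followed by one generator update."""
    batch = real.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # Discriminator: push real frames toward 1 and generated frames toward 0.
    d_opt.zero_grad()
    fake = gen(torch.randn(batch, noise_dim))
    d_loss = criterion(disc(real), real_labels) + criterion(
        disc(fake.detach()), fake_labels  # detach: no generator gradients here
    )
    d_loss.backward()
    d_opt.step()

    # Generator: try to make the discriminator label generated frames as real.
    g_opt.zero_grad()
    g_loss = criterion(disc(fake), real_labels)
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()


if __name__ == "__main__":
    noise_dim, frame_pixels = 16, 64
    gen, disc = build_toy_networks(noise_dim, frame_pixels)
    criterion = nn.BCELoss()
    g_opt = torch.optim.Adam(gen.parameters(), lr=2e-4, betas=(0.5, 0.999))
    d_opt = torch.optim.Adam(disc.parameters(), lr=2e-4, betas=(0.5, 0.999))
    real = torch.rand(8, frame_pixels) * 2 - 1  # random placeholder for real frames
    for epoch in range(3):
        d_loss, g_loss = train_step(gen, disc, g_opt, d_opt, real, noise_dim, criterion)
        print(f"epoch {epoch}: d_loss={d_loss:.3f} g_loss={g_loss:.3f}")
```

In a real run, `real` would come from a `DataLoader` wrapping the dataset class, and the generator would also receive the text embedding for each prompt.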

Generating Videos

The following functions create videos from text prompts:

  • generate_video: Generates a video based on a given text prompt.
  • frames_to_video: Converts generated frames into a video file.
  • save_frames_to_disk: Saves generated frames to a specified directory.

Contributing

Contributions are welcome! To contribute to this project:

  1. Fork the repository.
  2. Create a new branch for your feature or bug fix.
  3. Make your changes and commit them.
  4. Push to your fork and create a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

  • Special thanks to the contributors and the community for their support.
  • This project leverages PyTorch and other open-source libraries for deep learning and image processing.
