Modern Neural Network Technologies Text-to-Image: Scientific Visualization, 2023, Volume 15, Number 2, Pages 66 - 79
Abstract
This paper surveys state-of-the-art graphical text-to-image neural networks and methods of text-to-image conversion, analyzing the results achieved to date and the samples these systems produce. Ways of applying neural network text-to-image approaches to environmental monitoring, infrastructure and medical data analysis tasks are proposed. The paper reviews the results of neural network generation and their correlation with the linguistic constructions of the user's text queries, and identifies and classifies the flaws and artifacts typical of neural-network-generated images. The rapid development of neural network technologies in this field could have a significant impact on society, the professional market and the media, which makes the task of studying neural network images and distinguishing them from other graphic content particularly relevant.
Keywords: Machine Learning, Computer Vision and Pattern Recognition, Neural net-
work, Computer graphics, Text-to-image.
1. Introduction
The field of neural network technology is currently undergoing rapid development, becoming more sophisticated and acquiring new capabilities every day. Neural networks that process images in a variety of ways, from animating photographs to automatically creating full-fledged images from a user's text request, are becoming particularly popular.
The task of such a neural network is to form plausible images for a wide variety of sentences that exploit the compositional structure of language. A further task is the simultaneous management of multiple objects, their attributes and their spatial relations. To interpret a query sentence correctly, the algorithm must not only compose each object attribute correctly but also form the correct associations. For example, to visualize the sentence "hedgehog in red hat, yellow gloves, blue shirt and green trousers", the neural network needs to recognize the objects in the text and render them with the given combination of object and attribute, (hat, red), (gloves, yellow), (shirt, blue) and (trousers, green), without mixing them [1].
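As a toy illustration of the required attribute binding (not a representation used by any particular model), the correct pairing for this prompt can be written out explicitly:

```python
# Toy illustration of attribute binding for the example prompt above.
# A text-to-image model must associate each attribute with the right object;
# the correct pairing is written out here by hand.
prompt = "hedgehog in red hat, yellow gloves, blue shirt and green trousers"

correct_binding = {
    "hat": "red",
    "gloves": "yellow",
    "shirt": "blue",
    "trousers": "green",
}

# A failure case ("attribute leakage") would swap colours between objects,
# e.g. a yellow hat and red gloves generated for the same prompt.
```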
It should also be noted that the user of this type of neural network cannot yet predict in advance the visual result the network will produce for a given textual query. The correlation between the original query text and the resulting visual image is a separate class of problems that is currently being actively studied and addressed by the developers of the largest text-to-image neural networks, such as Midjourney.
Figure 1. A unique design for sneakers generated by Midjourney's neural network using the
query "nike sneakers in khokhloma style" [2].
It can be assumed that neural networks will evolve into a tool for one of the most sought-after business scenarios: personalizing content to individual user needs. However, the ability of neural networks to quickly and automatically generate an unlimited number of different images from a given textual description also opens up opportunities for scientific work.
With the ability to train off-the-shelf algorithms on thematically selected material (a prepared image database), it is possible to create specialized neural networks adapted to domain-specific terms and queries. For example, text-to-image neural networks could be applied in areas such as environmental monitoring or biomedical technology.
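As a minimal sketch of what such a prepared image database might look like in code, the following PyTorch dataset pairs domain images with expert-written captions; the directory layout, file names and caption format are assumptions for illustration only.

```python
# Hedged sketch: a domain-specific image-caption dataset of the kind an
# off-the-shelf text-to-image model could be fine-tuned on. Paths, file names
# and the captions.json format are illustrative assumptions.
import json
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset


class DomainImageCaptionDataset(Dataset):
    """Pairs of (image, domain-specific caption) for fine-tuning."""

    def __init__(self, root: str, captions_file: str = "captions.json"):
        self.root = Path(root)
        # captions.json is assumed to map file names to expert captions, e.g.
        # {"scan_001.png": "ultrasound image, healthy liver parenchyma"}.
        self.captions = json.loads((self.root / captions_file).read_text())
        self.names = sorted(self.captions)

    def __len__(self) -> int:
        return len(self.names)

    def __getitem__(self, idx: int):
        name = self.names[idx]
        image = Image.open(self.root / name).convert("RGB")
        return image, self.captions[name]
```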
Organizing environmental monitoring first requires collecting data. Since monitoring the processes taking place in the environment relies on data from many different sources - images, readings from heterogeneous sensors, textual data and others - the collected data are heterogeneous [3]. After analyzing the data and identifying the main components that most influence the overall situation, it becomes possible to summarize what is happening in textual form. An appropriate textual query can then be generated and a visual image modelled from the linguistic data.
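A hedged sketch of this monitoring-to-visualization pipeline is given below; summarize_observations() is a hypothetical stand-in for the domain-specific analysis step, and the resulting text would be passed to any text-to-image model.

```python
# Hedged sketch of the pipeline described above: heterogeneous monitoring data
# are reduced to a short textual description, which then serves as a prompt
# for a text-to-image model. All names and values here are illustrative.
from dataclasses import dataclass


@dataclass
class Observation:
    source: str   # e.g. "buoy-17", "coastal camera", "eyewitness report"
    kind: str     # "sensor", "image", "text", ...
    value: str    # textual rendering of the measurement or report


def summarize_observations(observations: list[Observation]) -> str:
    """Reduce heterogeneous monitoring data to a short textual description."""
    # A real system would first identify the components that most influence
    # the situation; here the textual values are simply joined together.
    return ", ".join(o.value for o in observations)


observations = [
    Observation("coastal camera", "image", "crimson sky"),
    Observation("buoy-17", "sensor", "high waves"),
    Observation("eyewitness report", "text", "a storm is approaching"),
]

prompt = summarize_observations(observations)
# prompt == "crimson sky, high waves, a storm is approaching"
# The prompt would then be sent to a text-to-image model, with the expert
# iterating on the wording until the picture reflects the phenomenon.
```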
Such combined analysis provides the most informative picture of the processes taking place and allows adequate conclusions to be drawn.
Using neural network graphics to rapidly generate illustrative images of the processes under study gives the most complete impression of what is happening. When textual eyewitness accounts are available, the visual picture of events can be quickly reconstructed and the overall situation visualized for further analysis.
An aggregate of various data can thus be transformed into visual form without relying on human imagination, while allowing online expert corrections that bring the final representation to the desired form, one that most accurately reflects the phenomenon being described. Figure 2 shows a visualization of a rather general query (query text: "crimson sky, high waves, a storm is approaching"). Nevertheless, the image is already highly detailed and presented in four versions, from which users can choose the one most suitable for their needs and make adjustments until a satisfactory result is achieved.
More specialized tasks, such as manufacturing or medicine, require specially trained neural networks capable of understanding professional jargon or scientific phrasing without allowing ambiguous interpretations. Given the vast amount of accumulated material and the existence of specialized archives in many fields, it may only be a matter of time before such domain-oriented graphical neural networks are developed.
Potentially, their application offers ample opportunities for analyzing various types of data, combining them and displaying the result in a clear and understandable way. They could also be used extensively in teaching and learning tools. For example, a neural network could depict the typical condition of an organ or tissue for a given set of symptoms listed in a query. If a textual description contains an indication of some pathology, the visual representation can help to highlight it and support the right decision.
Neural networks are already widely used in different fields of science. For example, tasks performed by an inpainting function (removing objects and then filling in the empty areas of an image so that the fill is unnoticeable) are in demand in archeology when a building of which only ruins remain must be recreated. A neural network can generate an image based on data about similar buildings and architectural styles.
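A minimal sketch of such inpainting, using the open-source Stable Diffusion inpainting pipeline from the diffusers library (the file names, mask and prompt are illustrative assumptions, not an archeological reconstruction workflow):

```python
# Hedged inpainting sketch with the open-source Stable Diffusion inpainting
# pipeline (Hugging Face diffusers). File names and the prompt are illustrative.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to("cuda")  # assumes a CUDA-capable GPU

ruin_photo = Image.open("ruins.png").convert("RGB")         # existing image
mask = Image.open("missing_walls_mask.png").convert("RGB")  # white = area to fill

restored = pipe(
    prompt="intact stone building in the same architectural style",
    image=ruin_photo,
    mask_image=mask,
).images[0]
restored.save("restored_building.png")
```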
2. Text-to-Image Graphical Neural Networks
This section presents brief descriptions of the largest and most popular text-to-image neural networks that have become widely known over the past year. These include Midjourney, which opened in March 2022; DALL-E 2, an updated version of the DALL-E model first demonstrated in January 2021; Stable Diffusion, an open-source neural network that has become the basis for dozens of new projects; and ruDALL-E, a Russian neural network based on generative models from SberDevices and Sber AI.
2.1 Midjourney
Midjourney [4] is proprietary software that creates images from text descriptions. The project was founded in February 2022 by scientist and entrepreneur David Holz. The Midjourney team positions itself as an independent research laboratory dedicated to expanding humanity's creative abilities.
Midjourney's work is enabled by two relatively recent technological breakthroughs in ar-
tificial intelligence: the ability of neural networks to understand human speech and create
images.
The neural network is trained to match textual descriptions with visual images across hundreds of millions of examples, using specially compiled collections that contain billions of images gathered from the Internet, together with matched image-text pairs. Such datasets can be commercial or open source, such as LAION [5], on which the well-known Stable Diffusion neural network was trained. Training of this kind allows various cross-modal tasks to be solved: generation of pictures from text descriptions, generation of text descriptions from pictures, regeneration or rendering of image parts, and so on. This makes it possible to advance such topical tasks as the visualization and completion of incomplete data.
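As an illustration of the text-to-picture direction of such cross-modal generation, the sketch below uses the open-source Stable Diffusion model through the Hugging Face diffusers library; the checkpoint name and sampling parameters are illustrative, and Midjourney's own proprietary pipeline is not shown here.

```python
# Hedged sketch: text-to-image generation with the open-source Stable Diffusion
# model via the Hugging Face diffusers library. Model id and parameters are
# illustrative and unrelated to Midjourney's proprietary pipeline.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # publicly available checkpoint
    torch_dtype=torch.float16,
).to("cuda")                             # assumes a CUDA-capable GPU

prompt = "red car on the road"           # example query discussed below
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("red_car.png")
```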
Midjourney, like most neural networks of this type, handles explicit, non-specific queries well. For example, given the query "red car on the road", it generates quite satisfactory options. One can experiment with the car's colour, size or background - these are still quite general queries. Problems may arise with more specific queries. A particular car model, for example, may already cause difficulties: the rarer the model is in the data available online, the lower the chance that the neural network will be able to draw it.
However, graphical neural networks are currently an extremely fast-developing area of computer graphics, so Midjourney versions are constantly being updated and improved. The paper [6] provides a comparative review of Midjourney versions v3 and v4, examining the key differences and features of the updated version. In March 2023, Midjourney version v5 was released, and its features are only beginning to be explored.
2.2 DALL-E 2
DALL-E 2 [7] is one of the most popular neural network graphics systems. It was developed by OpenAI with 12 billion parameters on the basis of GPT-3 (Generative Pre-trained Transformer 3, OpenAI's largest and most advanced language model) and trained to generate images from text descriptions using a dataset of text-image pairs. It can generate original images from textual descriptions and allows users to upload images and edit them, for example by adding elements. Furthermore, DALL-E can not only generate an image from scratch, it can also regenerate any rectangular area of an existing image.
According to the developers, "DALL-E 2 is an artificial intelligence system that can create realistic images and drawings from a natural language description".
DALL-E 2 started as a research project and is of interest primarily due to the publications of the developers, who have done extensive work in creating the algorithms and studying the behaviour and capabilities of the neural network [1, 8, 9].
The neural network can create images in a wide variety of drawing styles and techniques: the result may look like a frame from a cartoon or like a real photograph.
DALL-E 2 was trained on pairs of images and their respective captions. According to the developers, the pairs were taken from a combination of publicly available and licensed sources [10].
The software is currently available to a limited number of people, by subscription only. This is due both to limited server infrastructure capacity and to the developers' desire to control the development and self-learning of the neural network through user testing. In particular, because of concerns about misuse of the neural network, the developers carefully filter the content used for its training and block incoming requests on prohibited topics (violence, adult content, etc.).
Among the features provided in the latest updates are:
- higher image resolution;
- query processing in more than 107 languages, including Russian;
- high query recognition accuracy;
- the ability to set colour filters and image style;
- the ability to take an existing image as input and create a creative variation of it;
- the ability to refine an uploaded image.
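A hedged sketch of how the generation and variation features listed above can be exercised programmatically through the OpenAI Python SDK (openai>=1.0); the prompt, sizes and file names are illustrative assumptions:

```python
# Hedged sketch: image generation and variation through the OpenAI Python SDK.
# Requires an API key in the OPENAI_API_KEY environment variable; the prompt
# and file names are illustrative.
from openai import OpenAI

client = OpenAI()

# Generate an original image from a textual description.
generated = client.images.generate(
    model="dall-e-2",
    prompt="red car on the road",
    n=1,
    size="512x512",
)
print(generated.data[0].url)

# Create a creative variation of an existing image.
with open("red_car.png", "rb") as f:
    variation = client.images.create_variation(image=f, n=1, size="512x512")
print(variation.data[0].url)
```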
2.4 ruDALL-E
ruDALL-E [13] is a family of generative models from SberDevices and Sber AI. The neural network was developed and trained by Sber AI researchers, with partner support from scientists at the AIRI Artificial Intelligence Research Institute, on a combined Sber AI and SberDevices dataset of 1 billion text-image pairs. Teams from Sber AI, SberDevices, Samara University, AIRI and SberCloud actively participated in the project.
Specialists created and trained two versions of the model, named after two great Russian
abstractionists, Vasily Kandinsky and Kazimir Malevich:
ruDALL-E Kandinsky (XXL) with 12 billion parameters;
ruDALL-E Malevich (XL) with 1.3 billion parameters.
Both models are capable of generating colourful images on a variety of topics from a short textual description. According to the developers, Kandinsky uses reverse diffusion and can process queries in 101 languages without any loss in quality or speed. These include common languages such as Russian and English as well as rarer ones such as Mongolian. The system will understand a query even if it contains words in different languages.
Training the ruDALL-E neural network on the Christofari cluster was the largest computational task in Russia. It involved 196 NVIDIA A100 cards, each with 80 GB of memory. The whole training took 14 days, or 65,856 GPU-hours: the model was first trained for 5 days at 256x256 resolution, then for 6 days at 512x512 resolution, and for a further 3 days on maximally clean data.
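The stated total is consistent with the reported hardware and duration: 196 GPUs × 24 h/day × 14 days = 65,856 GPU-hours.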
The ruDALL-E Kandinsky 2.0 system is claimed to be the first multilingual diffusion neural network capable not only of accepting requests in different languages but also of reflecting the linguistic and visual shifts specific to different language cultures.
This statement is supported by a number of experiments [14]. In particular, queries such as "national dish" or "person with higher education" were tested (Figures 4 and 5). For the Russian-language query, the neural network produces predominantly white males, while for the same query in French the results are more varied. For the query in Chinese, the results contain more stylized images, but in most cases they also reflect the national component.
Figure 5. Testing the query "national dish" in Russian, Japanese and Hindi.
The author also conducted an experiment (Figure 6) on the FusionBrain platform [15], which confirmed the orientation of this neural network towards different language environments. The query "national dish", submitted in several languages, produced completely different results.
Figure 6. Testing the query "national dish" in Russian, Hindi and Italian (rows across).
It is worth noting that queries in different languages should be tested either on the above-mentioned platform or by interacting directly with the developers' repositories. The rudalle.ru platform is not adapted to such queries: it perceives a foreign language, identifies it, translates the query into Russian, and only then generates a visual image.
Such experiments open up a separate area of research, as preliminary studies suggest that neural networks of different language groups will have their own distortions and differences in the interpretation of the same phenomenon, depending on the mass culture associated with a particular language group.
Unfortunately, the original article [18] does not provide the exact text of the query, but judging from the result, we can conclude that it was a direct word-for-word translation along the lines of "wolf feet fed", and the neural network reproduced this query quite literally. Meanwhile, this proverb has a full English semantic analogue, "The dog that trots about finds a bone", as well as the translation offered by the online translator DeepL, "the wolf feeds the wolf", which imply completely different visual images while carrying the same meaning. Therefore, when giving a neural network a query, one should take into account the difference between a semantic translation and a direct translation, because the results can differ drastically. Making the right query thus becomes, in a sense, a profession: people who have learned to obtain the intended, high-quality result are already called "prompt engineers", and more and more offers to formulate a precise query for a neural network are appearing on freelance exchanges.
Another artifact that occurs quite often in Midjourney is bent spoons in pictures of food (Figures 9-10).
Texture artifacts. In this case, the artifacts do not affect the overall image and occur in places where the neural network cannot adequately process a highly detailed area or recreate the desired structure, such as hair, clothing fabric or skin. Such artifacts are inherent to neural networks that reconstruct part of an image, enhance its quality or generate an image from scratch.
More often than not, zooming in on the location of the artifact reveals a visible difference between the damaged area and the rest of the image. In Figure 11, for example, an odd ripple is visible in one section of the hair, unlike the rest of the hair. A neural network often produces this pixel-grid effect, but in most cases it is only visible at high magnification.
Conclusions
In this paper, state-of-the-art text-to-image graphical neural networks and methods of text-to-image transformation have been examined, and the results achieved have been analyzed. A number of problems in the images generated by these systems have been considered. Ways of applying neural network text-to-image approaches to environmental monitoring, infrastructure and medical data analysis tasks have been proposed.
References
1. Ramesh A., Pavlov M., Goh G., Gray S., Voss C., Radford A., Chen M., Sutskever I.,
2021. Zero-Shot Text-to-Image Generation, arXiv:2102.12092 [cs.CV],
https://doi.org/10.48550/arXiv.2102.12092
2. Telegram Channel «Neurodesign», 2023a, https://t.me/neurodes/343 (19 March
2023)
3. Yazikov E.G., Talovskaya A.V., Nadeina L.V., 2013. Geoecological environmental moni-
toring: coursebook / Tomsk Polytechnic University
4. Midjourney, https://www.midjourney.com/ (19 April 2023)
5. LAION. Large-scale Artificial Intelligence Open Network. https://laion.ai/ (19 March
2023)
6. Yubin Ma, 10 Incredible Prompt Styles to Try in Midjourney V4.
https://aituts.com/midjourney-v4-prompts-to-try/ (23 January 2023)
7. DALL·E 2, https://openai.com/product/dall-e-2 (19 April 2023)
8. Dhariwal P., Nichol A. 2021, Diffusion Models Beat GANs on Image Synthesis.
arXiv:2105.05233
https://doi.org/10.48550/arXiv.2105.05233
9. Radford A., Jong W.K., Hallacy C., Ramesh A., Goh G., Agarwal S., Sastry G., Askell A.,
Mishkin P., Clark J., Krueger G., Sutskever I. 2021. Learning Transferable Visual Models
From Natural Language Supervision. arXiv preprint arXiv:2103.00020 [cs.CV].
https://doi.org/10.48550/arXiv.2103.00020
10. DALL·E 2 Preview - Risks and Limitations, 2022, https://github.com/openai/dalle-2-
preview/blob/main/system-card.md#model (19 March 2023)
11. Stable Diffusion Online, https://stablediffusionweb.com/ (19 April 2023)
12. Alammar J. 2022, The Illustrated Stable Diffusion.
https://jalammar.github.io/illustrated-stable-diffusion/ (19 March 2023)
13. ruDALL-E, https://rudalle.ru/ (19 April 2023)
14. Shakhmatov A., Razhigayev A., Arkhipkin V., Nikolic A., Pavlov I., Kuznetsov A., Di-
mitrov D., Shavrina T., Markov S. Kandinsky 2.0 - the first multilingual diffusion for text-
based image generation.
https://habr.com/ru/company/sberbank/blog/701162/ (19 March 2023)
15. FusionBrain. https://fusionbrain.ai/diffusion (19 March 2023)
16. Isola, P., Zhu, J.-Y., Zhou, T., and Efros, A. A., 2017. Image-to-image translation with
conditional adversarial networks. In Proceedings of the IEEE conference on computer vision
and pattern recognition, pp. 1125–1134.
17. Koh, J. Y., Baldridge, J., Lee, H., and Yang, Y., 2021. Text-to-image generation ground-
ed by fine-grained user attention. In Proceedings of the IEEE/CVF Winter Conference on Ap-
plications of Computer Vision, pp. 237–246.
18. Midjourney and idioms.
https://pikabu.ru/story/midjourney_i_frazeologizmyi_9768400 (23 January 2023)
19. Telegram Channel «Neurodesign», 2023b, https://t.me/neurodes/619 (19 March
2023)
20. Telegram Channel «Neurodesign», 2023c, https://t.me/neurodes/303 (19 March
2023)
21. Telegram Channel «Neurodesign», 2023d, https://t.me/neurodes/750 (19 March
2023)
22. Makushin A. https://t.me/makushinphoto/541 (23 January 2023)
23. Gelbart H., 2023, Scammers are profiting from the earthquake in Turkey by raising
money, supposedly to help the victims. https://www.bbc.com/russian/news-64640487 (19
March 2023)