Compared with other clustering methods, self-optimal clustering has particular characteristics that must be taken into account when optimizing in such environments. A variety of clustering methods have already been proposed; each approaches the environment from a specific perspective and optimizes clustering by drawing inspiration from a different algorithm. In this research, we use the Fuzzy Q-learning algorithm for this purpose for the first time. In a Fuzzy Q-learning problem, an autonomous agent interacts with the environment through trial and error and learns to select the optimal action for reaching the goal. The agent moves through the environment, remembers the states it visits and the rewards it receives, and tries to behave in a way that maximizes the reward function. Because Fuzzy Q-learning combines reinforcement learning with fuzzy logic, it is an appropriate choice for solving this class of problems. In this thesis, we first define the Q-learning algorithm. After presenting it, we briefly examine how fuzzy logic can improve it, which leads to a proposed fuzzy reward function that reduces the complexity of clustering. We then evaluate the efficiency of the proposed algorithms with standard and appropriate tests. The test results indicate that the proposed method achieves acceptable efficiency.
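The trial-and-error loop described above can be sketched with the standard tabular Q-learning update, Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)], together with an ε-greedy action choice. This is a minimal illustration, not the thesis's method: the abstract does not define its states, actions, or fuzzy reward, so the function `fuzzy_reward`, its "poor/fair/good" labels, the triangular membership breakpoints, the normalized cluster-quality score it takes as input, and the parameters `alpha`, `gamma`, and `epsilon` are all hypothetical assumptions. Here fuzzy logic is applied only to the reward, as the abstract suggests, rather than to the state representation.

```python
import random
from collections import defaultdict


def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising to 1 at x == b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzy_reward(quality):
    """HYPOTHETICAL fuzzy reward for a cluster-quality score in [0, 1].

    Each fuzzy label ('poor', 'fair', 'good') maps to a crisp reward value;
    the output is the membership-weighted average of those values.
    """
    memberships = {
        -1.0: triangular(quality, -0.5, 0.0, 0.5),  # 'poor'
        0.0: triangular(quality, 0.0, 0.5, 1.0),    # 'fair'
        1.0: triangular(quality, 0.5, 1.0, 1.5),    # 'good'
    }
    total = sum(memberships.values())
    if total == 0.0:
        return 0.0
    return sum(value * mu for value, mu in memberships.items()) / total


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the best-known action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step; returns the updated Q(state, action)."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]
```

A typical step would choose an action with `epsilon_greedy(q, s, actions, 0.1, random.Random(0))`, apply it, score the resulting clustering, and pass `fuzzy_reward(score)` to `q_update`, with `q = defaultdict(float)` as the Q-table. Interpolating between fuzzy labels gives a smooth reward instead of a hard threshold, which is one plausible way a fuzzy reward could simplify the learning signal.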